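I wrote an awk script that looks for duplicate fields. The goal is to find the fields that two files have in common, output them, and remove duplicates. The function is as follows: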
find_same()
{
    # Field 5 of the `date` output; which field holds the time depends on the locale
    echo "start at `date | awk '{print $5}'`"
    echo -n "now rebuilding input files..."
    # Reduce each input file to its distinct first fields
    awk '{count[$1]++} END {for (number in count) print number "," count[number]}' "$file1" | awk -F, '{print $1 > "find-final1.txt"}'
    awk '{count[$1]++} END {for (number in count) print number "," count[number]}' "$file2" | awk -F, '{print $1 > "find-final2.txt"}'
    # Merge the two deduplicated lists; a field present in both files now appears twice
    cat find-final1.txt >> find-final2.txt
    echo -ne "ok! \n analyze files..."
    # Keep only the fields counted more than once, i.e. those common to both files
    awk '{count[$1]++} END {for (number in count) print number "," count[number]}' find-final2.txt | awk -F, '$2 > 1 {print $1 > "find-same.txt"}'
    echo -ne "ok! \n output files..."
    sort find-same.txt > "same_$file3"
    echo -e "ok! \n output file is same_$file3"
    rm -f find-*.txt
    echo "end at `date | awk '{print $5}'`"
    read anything
    ......
}
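However, if the last line of file 1 happens to also exist in file 2, that line is missing from the output. I am sure the logic of the code is fine, so why does a line fail to be output only when the match falls on the last line? I really can't figure it out.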
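For comparison, the same goal (fields common to both files, each listed once) can be sketched in a single awk pass without temporary files. This is only a minimal sketch under assumptions: the name find_common is hypothetical, the key is the first whitespace-separated field as in the function above, and the stripping of a trailing carriage return assumes (unconfirmed for these files) that mixed DOS/Unix line endings could make identical-looking fields compare as different.

```shell
#!/bin/sh
# find_common FILE1 FILE2   (hypothetical helper, not part of the original script)
# Print every first field that occurs in both files, once each, sorted.
find_common() {
    awk '
        { sub(/\r$/, "") }                     # assumption: drop a trailing CR if present
        NF == 0 { next }                       # skip blank lines
        FILENAME == ARGV[1] { seen[$1] = 1; next }       # first file: remember field 1
        ($1 in seen) && !printed[$1]++ { print $1 }      # second file: print each match once
    ' "$1" "$2" | sort
}
```

It could be called as find_common "$file1" "$file2" > "same_$file3". Since awk treats a final line without a trailing newline as a normal record, the last line of the first file is handled like any other line.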
[ Last edited by galford433 on 2007-12-5 17:07 ]