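I posted about this yesterday but it did not get much attention, so I am posting again today, this time with a concrete example.
A is the detail file and B is the keyword file.
For each line of A, take the value in the second column (comma-separated) and look it up in keyword file B (which has a single column). If the value is present, write the line to file C.
Sample data for A, B, and C:
File A: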
123,B,343Y65
321,C,6547657
435,D,23R4RT353
678,A,423242
File B:
A
B
File C:
123,B,343Y65
678,A,423242
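Current implementation:
sub filterFile
{
    my ($processFile, $filterFile1, $outputFile) = @_;
    my %hashFile;

    # Load the keywords from file B into a hash for fast lookups.
    open(my $keyFh, '<', $filterFile1) or die "can't open file: $filterFile1: $!";
    while (<$keyFh>)
    {
        chomp;
        $hashFile{$_} = 1;
    }
    close($keyFh);

    # Abort if file A cannot be opened (the original only printed a message and carried on).
    open(my $inFh, '<', $processFile) or die "can't open file: $processFile: $!";
    open(my $outFh, '>>', $outputFile) or die "can't open file: $outputFile: $!";

    # Append to file C every line of file A whose second field is a keyword.
    while (<$inFh>)
    {
        chomp;
        my @field = split /,/;
        print $outFh "$_\n" if defined $field[1] && exists $hashFile{$field[1]};
    }
    close($inFh);
    close($outFh);
}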
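One possible variation, offered as a sketch rather than a measured fix: the loop above splits every line into all of its fields even though only the second one is needed. The version below locates the second field with index and substr and leaves the rest of the line untouched. The name filter_file_fast and its variable names are hypothetical; it assumes the same three arguments and the same append-mode output as filterFile, and that lines end with a plain "\n". Whether it actually runs faster depends on how many fields the real lines have and on disk speed, so it is worth timing on a sample first.

```perl
use strict;
use warnings;

# Hypothetical variant of filterFile: same arguments, but only the second
# comma-separated field of each line is extracted.
sub filter_file_fast {
    my ($process_file, $keyword_file, $output_file) = @_;

    # Load the keywords (one per line) into a hash for constant-time lookups.
    open(my $kw_fh, '<', $keyword_file) or die "can't open file: $keyword_file: $!";
    my %keywords;
    while (my $kw = <$kw_fh>) {
        chomp $kw;
        $keywords{$kw} = 1;
    }
    close($kw_fh);

    open(my $in_fh,  '<',  $process_file) or die "can't open file: $process_file: $!";
    open(my $out_fh, '>>', $output_file)  or die "can't open file: $output_file: $!";

    while (my $line = <$in_fh>) {
        # Find the first comma; lines without one have no second field.
        my $start = index($line, ',');
        next if $start < 0;
        $start++;

        # The second field ends at the next comma, or at end of line.
        my $end = index($line, ',', $start);
        my $key;
        if ($end < 0) {
            $key = substr($line, $start);
            chomp $key;
        } else {
            $key = substr($line, $start, $end - $start);
        }

        # The line still carries its own newline, so print it unchanged.
        print {$out_fh} $line if exists $keywords{$key};
    }

    close($in_fh);
    close($out_fh);
}
```

It would be called the same way as the original, for example filter_file_fast('A', 'B', 'C'). A smaller change with a similar aim is to keep split but pass a limit, as in split /,/, $_, 3, so that Perl stops splitting after the second field.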
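Test case:
File A: 372 MB
File B: 20 keywords
Elapsed time: 75 seconds
The production files are expected to be around 2 GB, so I am looking for a faster algorithm. Thanks in advance to any experts who can help.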