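I have 100 million lines of strings, each about 150 characters long, and some of them are duplicates.
I want to get the final deduplicated strings along with the number of times each one occurs. Is there a fast algorithm for this in Perl?
I found the method below through Google, but it is not fast either.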
http://www.experts-exchange.com/ ... ous/Q_22722019.html
The following counts the unique user IDs in a 50-million-line log file:
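use strict;
use warnings;

open(my $in, '<', 'logfile.log') or die "Cannot open logfile.log: $!";
open(my $tmp, '>', 'temp.log') or die "Cannot open temp.log: $!";
while (my $line = <$in>) {
    chomp $line;                    # avoid a doubled newline if the ID is the last column
    my @data = split(/\t/, $line);  # logfile is tab-separated
    print $tmp "$data[5]\n";        # ID is in the 6th column
}
close($in);
close($tmp);
my $unique = `sort temp.log | uniq | wc -l`;  # count distinct IDs with external tools
chomp $unique;                                # strip the trailing newline from wc
print "There are $unique user IDs in the log file\n";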
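
The temp-file method above only reports how many distinct IDs there are, not how often each one occurs. A minimal sketch of per-string counting with a Perl hash follows. It assumes the set of distinct strings fits in memory, which may not hold for 100 million lines of about 150 characters if most of them are unique. The helper name count_lines and the sample data are hypothetical and are not taken from the linked page.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): count how often each
# line occurs on a filehandle and return a hash reference of line => count.
sub count_lines {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        $count{$line}++;    # one hash update per line, no sorting required
    }
    return \%count;
}

# Demo on an in-memory filehandle; open a real file for actual data.
my $sample = "aaa\nbbb\naaa\nccc\naaa\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory handle: $!";
my $count = count_lines($fh);
close($fh);

# Print each distinct string with its number of occurrences.
for my $str (sort keys %$count) {
    print "$str\t$count->{$str}\n";
}
```

If the distinct strings do not fit in memory, the external pipeline `sort file | uniq -c` produces the same per-string counts while letting sort spill to disk, though it is likely to be slower than the hash when the hash does fit.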