

[Hadoop&HBase] Question: Hadoop job gives wrong results after adding LZO compression [Solved]

Posted 2012-02-23 10:02
Last edited by 懶烊烊 on 2012-02-24 12:48

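Hi everyone,
I set up a Hadoop cluster and installed LZO compression, but a job run on the LZO-compressed input gives a different result from the same computation without LZO.

The original file: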
  cat a.log
  192.168.0.211 - - [26/Dec/2011:15:10:01 +0800] GET /js/272/272893.js HTTP/1.1 "304" 0 "http://www.86zw.com/Html/Book/33/33137/2794580.shtml" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) QQBrowser/6.9.11153.201" "-"
  192.168.0.212 - - [26/Dec/2011:15:10:01 +0800] GET /okno.php?user=troryzh HTTP/1.1 "200" 5591 "http://www.renao001.com/detail22_7555.shtml" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)" "2.52"
  192.168.0.211 - - [26/Dec/2011:15:10:01 +0800] GET /js/282/282002.js HTTP/1.1 "200" 220 "http://gg.ux120.com/zc/0005/00016.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Sicent; .NET CLR 2.0.50727)" "-"
  192.168.0.212 - - [26/Dec/2011:15:10:01 +0800] GET /js/282/282016.js HTTP/1.1 "304" 0 "http://www.bookbao.com/Search/q_%25u5341%25u5E74%25u4E00%25u54C1%25u6E29%25u5982%25u8A00" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; @ZOhdam%{qEY?-9:*EF6cSUp=G{gxfX:v4Us,G; SV1; QQDownload 691; 360SE)" "-"
  192.168.0.212 - - [26/Dec/2011:15:10:01 +0800] GET /js/270/270653.js HTTP/1.1 "304" 0 "http://www.kyks8.com/zuixin520/3/3889/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; Microsoft Windows Media Center PC 6.0)" "-"
  192.168.0.211 - - [26/Dec/2011:15:10:01 +0800] GET /ok.php?user=lmxh521 HTTP/1.1 "200" 5809 "http://www.wwe7.cn/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; 360SE)" "2.48"
  192.168.0.212 - - [26/Dec/2011:15:10:01 +0800] GET /xvi.php HTTP/1.1 "200" 4559 "http://www.shushuw.cn/search/%E8%8B%8D%E7%A9%B9/0.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" "2.61"
  192.168.0.212 - - [26/Dec/2011:15:10:01 +0800] GET /js/281/281779.js HTTP/1.1 "200" 356 "http://www.pp456.com/guochanju/17305/play.html?17305-0-13" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)" "-"
  192.168.0.211 - - [26/Dec/2011:15:10:01 +0800] GET /js/281/281640.js HTTP/1.1 "304" 0 "http://www.lenovo2008.com/files/article/html/0/30/6912.html" "-" "-"
  192.168.0.212 - - [26/Dec/2011:15:10:01 +0800] GET /vi.php HTTP/1.1 "200" 4547 "http://www.morui.com/book/5/5578/1126529.html" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB7.2)" "2.62"
I compressed it, uploaded it to Hadoop, and built the index as follows:
  lzop a.log
  bin/hadoop fs -put a.log.lzo lzoinputlzo
  bin/hadoop jar /home/hadoop/hadoop/lib/hadoop-lzo-0.4.15.jar com.hadoop.compression.lzo.DistributedLzoIndexer lzoinputlzo
  12/02/23 09:44:44 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
  12/02/23 09:44:44 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev ]
  12/02/23 09:44:44 INFO lzo.DistributedLzoIndexer: Adding LZO file hdfs://zsqy13:9000/user/hadoop/lzoinputlzo/a.log.lzo to indexing list (no index currently exists)
  12/02/23 09:44:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
  12/02/23 09:44:45 INFO input.FileInputFormat: Total input paths to process : 1
  12/02/23 09:44:45 INFO mapred.JobClient: Running job: job_201202221049_0013
  12/02/23 09:44:46 INFO mapred.JobClient:  map 0% reduce 0%
  12/02/23 09:44:59 INFO mapred.JobClient:  map 100% reduce 0%
  12/02/23 09:45:04 INFO mapred.JobClient: Job complete: job_201202221049_0013
  12/02/23 09:45:04 INFO mapred.JobClient: Counters: 15
  12/02/23 09:45:04 INFO mapred.JobClient:   Job Counters
  12/02/23 09:45:04 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=12736
  12/02/23 09:45:04 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
  12/02/23 09:45:04 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
  12/02/23 09:45:04 INFO mapred.JobClient:     Rack-local map tasks=1
  12/02/23 09:45:04 INFO mapred.JobClient:     Launched map tasks=1
  12/02/23 09:45:04 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
  12/02/23 09:45:04 INFO mapred.JobClient:   File Output Format Counters
  12/02/23 09:45:04 INFO mapred.JobClient:     Bytes Written=0
  12/02/23 09:45:04 INFO mapred.JobClient:   FileSystemCounters
  12/02/23 09:45:04 INFO mapred.JobClient:     HDFS_BYTES_READ=172
  12/02/23 09:45:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=21162
  12/02/23 09:45:04 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=8
  12/02/23 09:45:04 INFO mapred.JobClient:   File Input Format Counters
  12/02/23 09:45:04 INFO mapred.JobClient:     Bytes Read=51
  12/02/23 09:45:04 INFO mapred.JobClient:   Map-Reduce Framework
  12/02/23 09:45:04 INFO mapred.JobClient:     Map input records=1
  12/02/23 09:45:04 INFO mapred.JobClient:     Spilled Records=0
  12/02/23 09:45:04 INFO mapred.JobClient:     Map output records=1
  12/02/23 09:45:04 INFO mapred.JobClient:     SPLIT_RAW_BYTES=117
Testing locally on the raw data, counting how many times each IP appears, gives the following (the result is correct):
  cat a.log | ./mapper.py | sort | ./reducer.py
  192.168.0.211        4
  192.168.0.212        6
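The post does not include mapper.py or reducer.py. For readers following along, here is a minimal sketch of what such a pair might do, assuming the mapper emits the first whitespace-separated field of each line with a count of 1 and the reducer sums the counts per key. The function names `map_lines`, `reduce_pairs`, and `count_ips` are hypothetical and only mirror the `mapper | sort | reducer` pipeline above; they are not the original scripts.

```python
from itertools import groupby


def map_lines(lines):
    """Emit (ip, 1) for the first whitespace-separated field of each non-empty line."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0], 1


def reduce_pairs(sorted_pairs):
    """Sum the counts per key. The pairs must already be sorted by key,
    which is what the `sort` step between mapper and reducer provides."""
    for key, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield key, sum(count for _, count in group)


def count_ips(lines):
    """Hypothetical in-process equivalent of: mapper | sort | reducer."""
    return list(reduce_pairs(sorted(map_lines(lines))))
```

Calling `count_ips(open("a.log"))` on the file above should reproduce the 4 and 6 counts, provided the real scripts follow the same first-field logic.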
Running the computation on the LZO input:
  bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat -input lzoinputlzo -output outputlzo -file mapper.py -mapper mapper.py -file reducer.py -reducer reducer.py
  packageJobJar: [mapper.py, reducer.py, /home/hadoop/double/hadoop-hadoop/hadoop-unjar7460131778399100974/] [] /tmp/streamjob8185284251347424727.jar tmpDir=null
  12/02/23 09:46:31 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
  12/02/23 09:46:31 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev ]
  12/02/23 09:46:31 INFO mapred.FileInputFormat: Total input paths to process : 2
  12/02/23 09:46:31 INFO streaming.StreamJob: getLocalDirs(): [/home/hadoop/double/hadoop-hadoop/mapred/local]
  12/02/23 09:46:31 INFO streaming.StreamJob: Running job: job_201202221049_0014
  12/02/23 09:46:31 INFO streaming.StreamJob: To kill this job, run:
  12/02/23 09:46:31 INFO streaming.StreamJob: /home/hadoop/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=zsqy13:9001 -kill job_201202221049_0014
  12/02/23 09:46:31 INFO streaming.StreamJob: Tracking URL: http://zsqy13:50030/jobdetails.jsp?jobid=job_201202221049_0014
  12/02/23 09:46:32 INFO streaming.StreamJob:  map 0%  reduce 0%
  12/02/23 09:46:45 INFO streaming.StreamJob:  map 100%  reduce 0%
  12/02/23 09:46:56 INFO streaming.StreamJob:  map 100%  reduce 100%
  12/02/23 09:47:02 INFO streaming.StreamJob: Job complete: job_201202221049_0014
  12/02/23 09:47:02 INFO streaming.StreamJob: Output: outputlzo
Opening the job output shows something completely different from the local test (I can't make sense of it):
File: /user/hadoop/outputlzo/part-00000

  0        1
  1074        9
Does anyone know what causes this and how to fix it?

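Posted 2012-02-24 10:58
Answering my own question:
With the LZO input, the first field of every line the mapper reads is garbage that has nothing to do with our computation, so processing each line from the second field onward fixes it. (It took me a lot of testing to find this.)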
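A minimal sketch of that workaround follows, assuming the mapper counts the first field of the log line. As far as I know, when Hadoop Streaming runs with an input format other than the default TextInputFormat, it hands each record to the mapper as `key<TAB>value`. With DeprecatedLzoTextInputFormat that key is probably a position in the input rather than real garbage, which would be consistent with the `0` and `1074` in the job output above. That explanation is my assumption, not something confirmed in this thread. The names `extract_ip`, `map_records`, and `has_key` are hypothetical.

```python
def extract_ip(record, has_key=True):
    """Return the IP from one streaming record, or None if the record is empty.

    has_key=True assumes the record looks like "<key>\t<log line>", so
    everything up to the first tab is discarded before the log line is split.
    If no tab is present, the whole record is treated as the log line.
    """
    line = record
    if has_key:
        _key, sep, rest = record.partition("\t")
        if sep:
            line = rest
    fields = line.split()
    return fields[0] if fields else None


def map_records(records, has_key=True):
    """Yield "ip\t1" for every non-empty record."""
    for record in records:
        ip = extract_ip(record.rstrip("\n"), has_key)
        if ip is not None:
            yield f"{ip}\t1"
```

Inside mapper.py this could be driven with `for out in map_records(sys.stdin): print(out)`. For the local `cat a.log | ./mapper.py` test, where no key is prepended, `has_key=False` keeps the original behaviour.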

Posted 2012-02-24 11:06
Nice, answering your own question~