
Chinaunix


Script request: processing a log file with Python

#1  Posted 2009-11-10 10:32
The file format is:

192.168.0.181 - - [04/Nov/2009:14:35:18 +0800] "CONNECT mail.google.com:443 HTTP/1.1" 200 11163 TCP_MISS:DIRECT
192.168.0.181 - - [04/Nov/2009:14:35:18 +0800] "GET http://www.jingoal.com/favicon.ico HTTP/1.1" 302 662 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:14:35:19 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 342 TCP_MISS:DIRECT
192.168.0.181 - - [04/Nov/2009:14:35:19 +0800] "GET http://www.jingoal.com/portal/publicity/manage_bbs/news/main.html HTTP/1.1" 200 1976 TCP_MEM_HIT:NONE
192.168.0.181 - - [04/Nov/2009:14:35:20 +0800] "GET http://www.jingoal.com/favicon.ico HTTP/1.1" 302 662 TCP_MISS:DIRECT
192.168.0.181 - - [04/Nov/2009:14:35:21 +0800] "GET http://www.jingoal.com/portal/pu ... gmaterial/main.html HTTP/1.1" 200 3502 TCP_MEM_HIT:NONE
192.168.0.103 - - [04/Nov/2009:14:35:21 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 342 TCP_MISS:DIRECT
192.168.0.181 - - [04/Nov/2009:14:35:21 +0800] "GET http://www.jingoal.com/favicon.ico HTTP/1.1" 302 662 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:14:35:23 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 342 TCP_MISS:DIRECT
192.168.0.181 - - [04/Nov/2009:14:35:24 +0800] "GET http://www.jingoal.com/portal/cn/index.jsp HTTP/1.1" 200 5339 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:14:35:25 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 341 TCP_MISS:DIRECT
192.168.0.181 - - [04/Nov/2009:14:35:25 +0800] "GET http://www.jingoal.com/favicon.ico HTTP/1.1" 302 662 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:14:35:27 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 416 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:14:35:29 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 342 TCP_MISS:DIRECT
192.168.0.181 - - [04/Nov/2009:21:35:31 +0800] "CONNECT mail.google.com:443 HTTP/1.1" 200 1948 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:21:35:31 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 342 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:14:35:33 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 1198 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:14:35:35 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 341 TCP_MISS:DIRECT
192.168.0.181 - - [04/Nov/2009:14:35:36 +0800] "CONNECT mail.google.com:443 HTTP/1.1" 200 165 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:14:35:37 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 341 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:21:35:39 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 342 TCP_MISS:DIRECT
192.168.0.193 - - [04/Nov/2009:14:35:39 +0800] "POST http://www.jingoal.com/portal/pu ... stration/result.jsp HTTP/1.1" 200 333 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:21:35:41 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 342 TCP_MISS:DIRECT
192.168.0.193 - - [04/Nov/2009:21:35:41 +0800] "GET http://www.jingoal.com/favicon.ico HTTP/1.1" 302 662 TCP_MISS:DIRECT
192.168.0.127 - - [04/Nov/2009:14:35:42 +0800] "GET http://www.tpy100.com/product.aspx? HTTP/1.1" 200 11045 TCP_MISS:DIRECT
192.168.0.103 - - [04/Nov/2009:21:35:43 +0800] "POST http://207.46.124.200/gateway/gateway.dll? HTTP/1.1" 200 342 TCP_MISS:DIRECT
192.168.0.193 - - [04/Nov/2009:01:35:45 +0800] "GET http://www.jingoal.com/portal/cn/mobile/main.jsp HTTP/1.1" 200 3563 TCP_MISS:DIRECT


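Requirements:
       1. Sort by IP, and for each IP report the 20 domains it accessed most often, with their counts.
       2. Filter by time: only count entries between 9:00-12:00 and 13:00-18:00; ignore everything outside those windows.

[ Last edited by tony_413 on 2009-11-10 10:34 ]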

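#2  Posted 2009-11-10 11:09
I wrote one in shell myself, but as soon as the file gets large it is far too slow. It only sorts by IP and counts up to 20 domains for each IP; the counts are not sorted, and there is no time filtering.

I'm posting it here and would appreciate any pointers from the experts.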

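#!/bin/bash
# For every client IP, print how often each domain appears in the log,
# stopping after 20 domains per IP. Counts are not sorted and times are not filtered.
# Uses bash because of `echo -e` and `((i++))`.
SLog="/home/tony/shell/log/test.log"
IPs=`awk '{ print $1 }' "$SLog" | sort | uniq`
Doms=`awk -F"/" '{ print $5 }' "$SLog" | sort | uniq | grep -E "^[a-zA-Z0-9][a-z0-9]{0,}.\W.{1,}(com|cn|com.cn|net)$"`
#Doms=`awk '{ print $2 }' $SLog | sort | uniq`
#echo $Doms
echo -e "+-----------------+-----------------------------+-------+"
echo -e "|      IP         |      Site and Domain        | Count |"
echo -e "+-----------------+-----------------------------+-------+"
for ip in $IPs
do
        #ip_total=`grep "$ip" $SLog | wc -l`
        #echo -e "$ip\t$ip_total"
        # Number of domains printed so far for this IP; reset once per IP,
        # not once per domain, otherwise the limit of 20 never takes effect.
        i=1
        for dom in $Doms
        do
                # The whole log is scanned again for every IP/domain pair,
                # which is why this gets slow on large files.
                count=`grep "$ip" "$SLog" | grep -c "$dom"`
                if [ "$i" -le 20 ] && [ "$count" -gt 0 ]
                then
                        echo -e "|  $ip  |    $dom    |   $count   |"
                        echo -e "+-----------------+-----------------------------+-------+"
                        ((i++))
                fi
        done
done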

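#3  Posted 2009-11-10 12:54
Here's the general idea:

while True:
   # read one line
   # if the time falls within 9:00-12:00 or 13:00-18:00, carry on; otherwise continue
   # use the IP as the dict key and add the domain to its value (the value can itself be a nested dict keyed by URL)

Alternatively, just split these log lines into fields and store them in a database; then you can query them any way you like.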
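
The outline above could be fleshed out roughly as follows. This is only a sketch, not a tested solution for the real log: the regular expression is inferred from the sample lines in the first post, and the names `LINE_RE`, `host_of`, `in_working_hours` and `top_domains` are made up for illustration. It reads the hour straight from the timestamp, so 12:00:00 and 18:00:00 themselves fall outside the windows.

```python
import re
from collections import Counter, defaultdict

# Assumed layout, inferred from the sample lines in the first post:
# ip - - [dd/Mon/yyyy:HH:MM:SS zone] "METHOD target PROTOCOL" status size result
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^:\]]+:(?P<hour>\d{2}):\d{2}:\d{2} [^\]]*\] '
    r'"(?P<method>\S+) (?P<target>\S+)'
)


def host_of(target):
    """Return the host part of 'http://host/path' or of 'host:port'."""
    if "://" in target:
        target = target.split("://", 1)[1]
    return target.split("/", 1)[0].split(":", 1)[0].lower()


def in_working_hours(hour):
    """True for hours 9, 10, 11 and 13 through 17."""
    return 9 <= hour < 12 or 13 <= hour < 18


def top_domains(lines, limit=20):
    """Map each IP to its `limit` most requested hosts within working hours."""
    per_ip = defaultdict(Counter)   # ip -> Counter of host -> hits
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:               # skip lines that do not look like log entries
            continue
        if not in_working_hours(int(m.group("hour"))):
            continue
        per_ip[m.group("ip")][host_of(m.group("target"))] += 1
    # IPs are sorted as plain strings; most_common() sorts hosts by count.
    return {ip: hosts.most_common(limit) for ip, hosts in sorted(per_ip.items())}
```

Since a file object yields its lines one at a time, something like `top_domains(open(path))` would make a single pass over the log, which should avoid the repeated `grep` scans of the shell version. Here `path` stands for wherever the log actually lives.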

#4  Posted 2009-11-10 13:21
Thanks to the poster above.

How do I pull the IP and the domain out of a line once I've read it? Running the awk command from inside Python keeps giving me errors.

while True:
        os.system("awk '{ print $1}'" + line)
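
A likely reason for the errors: `os.system` passes the whole string to a shell, so the log line is appended to the awk command, where awk treats it as file names and the shell interprets its quotes and brackets. Once the line is already in Python there is no need to call awk at all, because `str.split()` with no arguments splits on runs of whitespace much like awk's default field splitting. A small sketch, where `ip_and_target` is a made-up helper name and the field positions are assumed from the sample log:

```python
def ip_and_target(line):
    """Split one log line on whitespace and pick out two fields.

    Field positions are assumed from the sample log in the first post:
    awk's $1 (client IP) is fields[0], and awk's $7 (the requested URL
    or host:port) is fields[6].
    Returns (ip, target), or None if the line has too few fields.
    """
    fields = line.split()
    if len(fields) < 7:
        return None
    return fields[0], fields[6]
```

For the first sample line this gives the IP `192.168.0.181` and the target `mail.google.com:443`; the domain still has to be cut out of the target in a second step.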

#5  Posted 2009-11-10 13:57

Reply to #4 (tony_413)

You can handle that yourself with regular expressions.
See the re module.

#6  Posted 2009-11-10 14:06
I'm not familiar with Python's re module. Could the poster above give me an example?

#7  Posted 2009-11-10 14:48
Any book on the subject will do,
such as Core Python Programming or Programming Python.
There should be ready-made examples online as well:
http://www.baidu.com/s?word=+pyt ... 3&wd=+python+re

#8  Posted 2009-11-10 14:52
I looked at a few and couldn't make sense of any of them. It would be best if the poster above gave an example.

For example, given: GET http://www.tpy100.com

extract www.tpy100.com

[ Last edited by tony_413 on 2009-11-10 15:00 ]
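
One way this could look with the `re` module is sketched below. The pattern and the helper name `extract_host` are illustrative assumptions, written against the request strings in the sample log (`GET http://...` and `CONNECT host:port`), not a general URL parser.

```python
import re

# A request method in capitals, one space, an optional "scheme://" prefix,
# then the host: everything up to the first '/', ':', quote or whitespace.
REQUEST_RE = re.compile(r'[A-Z]+ (?:\w+://)?([^/:\s"]+)')


def extract_host(request):
    """Return the host from a string like 'GET http://host/path', or None."""
    m = REQUEST_RE.match(request)
    return m.group(1) if m else None


print(extract_host("GET http://www.tpy100.com"))
print(extract_host("CONNECT mail.google.com:443"))
```

The parentheses around `[^/:\s"]+` form a capture group, and `m.group(1)` returns just that part of the match. `(?:...)` groups without capturing, and the trailing `?` makes the scheme optional, so that `CONNECT` lines without `http://` are handled too.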

#9  Posted 2009-11-10 14:54
awk '{w=substr($4,14,2)+0; if((w>=9&&w<12)||(w>=13&&w<18)){print $0}}' ww
awk  '{a[$1]++}END{for (i in a )if(a[i]>20){print i,a[i]}}' ww
awk is probably better suited to this kind of job; the two commands above can be combined.

[ Last edited by jiang_ocean on 2009-11-10 15:00 ]

#10  Posted 2009-11-10 17:08
I ran: awk  '{a[$1]++}END{for (i in a )if(a>20){print i,a}}'
and got this error:
awk: (FILENAME=log/test.log FNR=1359859) fatal: attempt to use array `a' in a scalar context
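
The message points at `a` being used without a subscript: in the command as run, `a>20` and `print i,a` refer to the whole array, where `a[i]` is presumably what was meant (an assumption; the `[i]` may simply have been lost when the command was posted). For comparison, the same per-IP count can be sketched in Python with `collections.Counter`. The helper name `busy_ips` and the default threshold are illustrative only.

```python
from collections import Counter


def busy_ips(lines, threshold=20):
    """Count requests per client IP; keep IPs with more than `threshold`.

    Mirrors the awk idea a[$1]++ followed by a filter on a[i] > 20.
    Blank lines are skipped.
    """
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return {ip: n for ip, n in counts.items() if n > threshold}
```

Note that this counts all requests per IP, not requests per IP and domain, so it is only part of what the first post asks for.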