Dennis Roberts
I have a script that parses a DNS query log and generates some statistics.
For a 750MB file, a Perl script using the same methods (splits) can
parse the file in 3 minutes. My Python script takes 25 minutes. That
is enough of a difference that unless I can figure out what I did
wrong, or find a better way of doing it, I might not be able to use Python
(since most of what I do is parsing various logs). The main reason to
try Python is that I had to look at some early scripts I wrote in Perl and
had no idea what the hell I was thinking or what the script even did!
After some googling and reading Eric Raymond's essay on Python, I jumped
in. Here is my script. I am looking for constructive comments -
please don't bash my newbie code.
#!/usr/bin/python -u
import string
import sys

clients = {}
queries = {}
count = 0

print "Each dot is 100000 lines..."
f = sys.stdin
while 1:
    line = f.readline()
    if count % 100000 == 0:
        sys.stdout.write(".")
    if line:
        splitline = string.split(line)
        try:
            (month, day, time, stype, source, qtype, query, ctype,
             record) = splitline
        except:
            print "problem splitting line", count
            print line
            break
        try:
            words = string.split(source,'#')
            source = words[0]
        except:
            print "problem splitting source", count
            print line
            break
        if clients.has_key(source):
            clients[source] = clients[source] + 1
        else:
            clients[source] = 1
        if queries.has_key(query):
            queries[query] = queries[query] + 1
        else:
            queries[query] = 1
    else:
        print
        break
    count = count + 1
f.close()

print count, "lines processed"
for numclient, count in clients.items():
    if count > 100000:
        print "%s,%s" % (numclient, count)
for numquery, count in queries.items():
    if count > 100000:
        print "%s,%s" % (numquery, count)
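
For comparison, here is a rough sketch of the same counting logic in a more compact form. It is not a measured fix for the 25-minute run time. It assumes Python 3 (the script above is Python 2), and the names `count_log`, `report`, and `main` are made up for illustration. It iterates over the file object directly instead of calling `readline()`, uses the `split` method on strings instead of the `string` module, and uses `collections.Counter` in place of the `has_key` checks. One behavioural difference: a malformed line raises `ValueError` here, where the script above prints the line and stops.

```python
import sys
from collections import Counter


def count_log(lines):
    """Count client addresses and query names from an iterable of log lines.

    Assumes the same nine whitespace-separated fields as the script above:
    month, day, time, stype, source, qtype, query, ctype, record.
    """
    clients = Counter()
    queries = Counter()
    processed = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 9:
            raise ValueError("problem splitting line %d: %r" % (processed, line))
        # Keep only the address part of "address#port".
        source = fields[4].split("#", 1)[0]
        clients[source] += 1
        queries[fields[6]] += 1
        processed += 1
    return processed, clients, queries


def report(counter, threshold=100000):
    """Return "name,count" strings for entries seen more than threshold times."""
    return ["%s,%s" % (name, n) for name, n in counter.items() if n > threshold]


def main():
    # Iterating over sys.stdin reads one line at a time without readline().
    processed, clients, queries = count_log(sys.stdin)
    print(processed, "lines processed")
    for row in report(clients) + report(queries):
        print(row)
```

To try it on a log, call `main()` at the bottom of the file and pipe the log into the script on standard input, as with the original. Whether this closes the gap with Perl would have to be timed on the real 750MB file.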