Parse a log file


kaklis

Hello to all!
I want to parse a log file with the following format, for example:
TIMESTAMP Operation FileName Bytes
12/Jan/2010:16:04:59 +0200 EXISTS sample3.3gp 37151
12/Jan/2010:16:04:59 +0200 EXISTS sample3.3gp 37151
12/Jan/2010:16:04:59 +0200 EXISTS sample3.3gp 37151
12/Jan/2010:16:04:59 +0200 EXISTS sample3.3gp 37151
12/Jan/2010:16:04:59 +0200 EXISTS sample3.3gp 37151
12/Jan/2010:16:05:05 +0200 DELETE sample3.3gp 37151

How can I count the operations for a month (e.g. a total of 40 operations:
30 EXISTS, 10 DELETE)?
Any tips?

Thanks in advance
Antonis
 

samwyse

time.strptime(string[, format])
Parse a string representing a time according to a format. The return
value is a struct_time as returned by gmtime() or localtime().

The format parameter uses the same directives as those used by
strftime(); it defaults to "%a %b %d %H:%M:%S %Y" which matches the formatting
returned by ctime(). If string cannot be parsed according to format,
or if it has excess data after parsing, ValueError is raised. The
default values used to fill in any missing data when more accurate
values cannot be inferred are (1900, 1, 1, 0, 0, 0, 0, 1, -1).

>>> import time
>>> ts = '12/Jan/2010:16:04:59 +0200'
>>> time.strptime(ts[:-6], '%d/%b/%Y:%H:%M:%S')
time.struct_time(tm_year=2010, tm_mon=1, tm_mday=12, tm_hour=16, tm_min=4, tm_sec=59, tm_wday=1, tm_yday=12, tm_isdst=-1)

I leave the conversion of the last six characters (the time zone
offset) as an exercise for the student. :)
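
One way to do that exercise on Python 3 (a sketch, not samwyse's method): datetime.strptime accepts the %z directive, which parses a numeric offset such as +0200, so the full timestamp can be parsed without slicing and the result is a timezone-aware datetime. This assumes English month abbreviations, since %b is locale-dependent.

```python
from datetime import datetime, timezone

ts = '12/Jan/2010:16:04:59 +0200'

# %z parses a numeric offset such as +0200 into a fixed-offset tzinfo,
# so the whole string can be parsed without slicing off the last six chars
dt = datetime.strptime(ts, '%d/%b/%Y:%H:%M:%S %z')

print(dt.isoformat())                           # 2010-01-12T16:04:59+02:00
print(dt.astimezone(timezone.utc).isoformat())  # 2010-01-12T14:04:59+00:00
```

Converting to UTC like this makes timestamps from lines with different offsets directly comparable.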
 

Tim Chase

It can be done pretty easily with a regexp to parse the relevant
bits:

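   import re

   # Capture month name, year and operation from each log line
   r = re.compile(r'\d+/([^/]+)/(\d+)\S+\s+\S+\s+(\w+)')
   stats = {}
   with open('log.txt') as f:
       for line in f:
           m = r.match(line)
           if m:  # the header line and malformed lines don't match
               # key is (month, year, operation), e.g. ('Jan', '2010', 'EXISTS')
               stats[m.groups()] = stats.get(m.groups(), 0) + 1
   print(stats)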

This prints out

   {('Jan', '2010', 'EXISTS'): 5, ('Jan', '2010', 'DELETE'): 1}


With the resulting data structure, you can compute coarser-grained
aggregates, such as the total number of operations per month, or remap
month-name abbreviations to integers so the results can be sorted
chronologically for output.

-tkc
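
A sketch of the post-processing Tim describes (not part of his post): roll the (month, year, operation) counts up into per-month totals and sort by year and month number. The MONTHS mapping is a hypothetical helper that assumes English month abbreviations, and stats is filled with the sample counts shown above rather than read from log.txt.

```python
from collections import defaultdict

# Hypothetical helper: English month abbreviations -> 1..12
MONTHS = {name: i for i, name in enumerate(
    ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], start=1)}

# Same shape as the dict built by the regex loop: (month, year, op) -> count
stats = {('Jan', '2010', 'EXISTS'): 5, ('Jan', '2010', 'DELETE'): 1}

# Regroup as (year, month_number) -> {op: count}
monthly = defaultdict(dict)
for (mon, year, op), count in stats.items():
    monthly[(int(year), MONTHS[mon])][op] = count

# Integer keys sort chronologically
for (year, month), ops in sorted(monthly.items()):
    total = sum(ops.values())
    detail = ', '.join(f'{n} {op}' for op, n in sorted(ops.items()))
    print(f'{year}-{month:02d}: {total} operations ({detail})')
    # prints: 2010-01: 6 operations (1 DELETE, 5 EXISTS)
```

Using (year, month) integer tuples as keys means a plain sorted() call orders the report correctly across year boundaries, which sorting on month-name strings would not.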
 

kaklis

Thank you both so much

Antonis
 
