log parser design question

avidfan

I need to parse a log file using Python and I need some advice/wisdom
on the best way to go about it:

The log file entries will consist of something like this:

ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete

ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688

and so on...

I need to be able to group these entries together, index them by ID
and IID, and search the context of each entry, and if a certain status
is found (such as wait), then be able to return the ID or IID
(depending...) of that entry.

So I was considering parsing them into a dictionary, where the key is
a tuple and the value is a list:

{('ID=8688', 'IID=98889998'): [
    'ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled',
    'locked working.lock',
    'status running',
    'status complete']}

I am keeping the full text of each entry in the list so that I can
recreate them for display if need be.
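
As a rough sketch of that layout (assuming entries are separated by
blank lines and every entry starts with an "ID=... IID=..." header; the
names group_entries and ids_containing are made up for illustration,
and the keys hold the bare numbers rather than 'ID=8688'), the grouping
and a search over the stored lines might look like this:

```python
import re

LOG_TEXT = """\
ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete

ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688
"""

# Assumption: a header line always starts with "ID=<digits> IID=<digits>".
HEADER = re.compile(r"ID=(\d+) IID=(\d+)")


def group_entries(text):
    """Map (id, iid) -> list of that entry's lines, in file order."""
    entries = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate entries
        match = HEADER.match(line)
        if match:
            # A new header starts a new entry keyed by (id, iid).
            current = entries.setdefault(match.groups(), [])
        if current is not None:
            current.append(line)
    return entries


def ids_containing(entries, needle):
    """Return the ID of every entry that has a line containing needle."""
    return [key[0] for key, lines in entries.items()
            if any(needle in line for line in lines)]


entries = group_entries(LOG_TEXT)
print(ids_containing(entries, "waiting on ID=8688"))
```

With the two sample entries this prints ['9009'], so a line found in
the list can be tied back to an element of its key.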

I am fairly new to Python, so could anyone offer any advice here
before I get too far and discover a fatal flaw that you might see
coming a mile away?

Would I, with this design, be able to, for example, search each list
for "waiting on ID=8688", and when found, be able to associate that
value with one of the elements of its key, "ID=9009"? Or is this
approach flawed? I'm assuming there is a better way, but I need
some advice...

I appreciate any thoughts.

Thanks.
 
Paul McGuire

For the parsing of this data, here is a pyparsing approach. Once
parsed, the pyparsing ParseResults data structures can be massaged into
a queryable list. See the examples at the end for accessing the
individual parsed fields.

-- Paul

data = """
ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete


ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688

"""
from pyparsing import (Group, Literal, Regex, Word, ZeroOrMore,
                       alphanums, nums, oneOf)

# basic building blocks
integer = Word(nums)
idref = "ID=" + integer.setResultsName("id")
iidref = "IID=" + integer.setResultsName("iid")
date = Regex(r"\d\d\.\d\d\.\d{4}")

# header fields of a log entry
logLabel = Group("execute" + oneOf("begin wait"))
logStatus = Group("status" + oneOf("enabled wait"))

# qualifier lines that may follow the header
lockQual = Group("locked" + Word(alphanums + "."))
waitingOnQual = Group("waiting on" + idref)
statusQual = Group("status" + oneOf("running complete wait"))
waitingToLockQual = Group(Literal("waiting to lock"))
statusQualifier = (statusQual | waitingOnQual | waitingToLockQual
                   | lockQual)

logEntry = (idref + iidref + logLabel.setResultsName("logtype") + "-"
            + date + logStatus.setResultsName("status")
            + ZeroOrMore(statusQualifier).setResultsName("quals"))

for tokens in logEntry.searchString(data):
    print(tokens)
    print(tokens.dump())
    print(tokens.id)
    print(tokens.iid)
    print(tokens.status)
    print(tokens.quals)
    print("")  # blank line between entries

prints:

['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-',
'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'],
['status', 'running'], ['status', 'complete']]
['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-',
'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'],
['status', 'running'], ['status', 'complete']]
- id: 8688
- iid: 98889998
- logtype: ['execute', 'begin']
- quals: [['locked', 'working.lock'], ['status', 'running'],
['status', 'complete']]
- status: ['status', 'enabled']
8688
98889998
['status', 'enabled']
[['locked', 'working.lock'], ['status', 'running'], ['status',
'complete']]

['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-',
'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status',
'wait'], ['waiting on', 'ID=', '8688']]
['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-',
'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status',
'wait'], ['waiting on', 'ID=', '8688']]
- id: 9009
- iid: 87234785
- logtype: ['execute', 'wait']
- quals: [['waiting to lock'], ['status', 'wait'], ['waiting on',
'ID=', '8688']]
- status: ['status', 'wait']
9009
87234785
['status', 'wait']
[['waiting to lock'], ['status', 'wait'], ['waiting on', 'ID=',
'8688']]
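
One way the "queryable list" step could look, sketched without
pyparsing so that it runs on its own: the dicts below are hand-copied
from the dump() output above, and by_id, by_iid, waiting_on and
entries_with_status are hypothetical names, not part of pyparsing.

```python
# Plain-Python stand-ins for the two parsed entries, copied from the
# dump() output above (an assumed layout, not a pyparsing structure).
parsed = [
    {"id": "8688", "iid": "98889998",
     "status": ["status", "enabled"],
     "quals": [["locked", "working.lock"],
               ["status", "running"],
               ["status", "complete"]]},
    {"id": "9009", "iid": "87234785",
     "status": ["status", "wait"],
     "quals": [["waiting to lock"],
               ["status", "wait"],
               ["waiting on", "ID=", "8688"]]},
]

# Index the entries by ID and by IID for direct lookup.
by_id = {entry["id"]: entry for entry in parsed}
by_iid = {entry["iid"]: entry for entry in parsed}


def waiting_on(entry):
    """Return the IDs this entry is waiting on (empty list if none)."""
    return [qual[-1] for qual in entry["quals"] if qual[0] == "waiting on"]


def entries_with_status(entries, status):
    """Return entries whose header status or a status qualifier matches."""
    hits = []
    for entry in entries:
        statuses = [entry["status"][1]]
        statuses += [q[1] for q in entry["quals"] if q[0] == "status"]
        if status in statuses:
            hits.append(entry)
    return hits


for entry in entries_with_status(parsed, "wait"):
    print(entry["id"], entry["iid"], "waits on", waiting_on(entry))
```

For the sample data only entry 9009 has status wait, and its
"waiting on" qualifier points back at ID 8688.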
 
avidfan

Paul,

Thanks! That's a great module. I've been going through the docs and
it seems to do exactly what I need...

I appreciate your help!
 
avidfan

http://www.camelrichard.org/roller/page/camelblog?entry=h3_parsing_log_files_with

Thanks, Paul!
 
