xml.sax parsing elements with the same name

amadain

I have an event log with hundreds of thousands of entries of the
form:

<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
    <result value="Blocked"/>
    <filters>
        <filter code="338" type="Filter_Name">
            <diagnostic>
                <result value="Triggered"/>
            </diagnostic>
        </filter>
        <filter code="338" type="Filter_Name">
            <diagnostic>
                <result value="Blocked"/>
            </diagnostic>
        </filter>
    </filters>
</event>

I am using xml.sax to parse the event log. The trouble with the file
above is that when I parse for the result value, I get the last result
value (Blocked, above). I want the result value Triggered (the second
result in the event).

My code is as follows:

def startElement(self, name, attrs):
    if name == 'event':
        self.eventTime = attrs.get('eventTimestamp', "")
        self.eventUniqueId = attrs.get('uniqueId', "")
    if name == 'result':
        self.resultValue = attrs.get('value', "")
    return

def endElement(self, name):
    if name == "event":
        result = eval(self.filter)
        if result:
            ...

How do I get the result value I require when elements have the same
name, as above?
 
John Bokma

amadain said:
How do I get the result value I require when elements have the same
name, as above?

You have to keep track of whether you're inside a filters section, and
keep track of the filter elements (first, second, etc.), assuming you
want the result value of the first filter.
 
John Bokma

amadain said:
How do I keep track? The first result value is outside a filters
section and the rest are inside one. Do you mean something like:

def startElement(self, name, attrs):
    if name == 'event':
        self.eventTime = attrs.get('eventTimestamp', "")
        self.eventUniqueId = attrs.get('uniqueId', "")
    if name == 'result':
        self.resultValue = attrs.get('value', "")
    if name == 'filters':
        if name == 'result':
            self.resultValueF = attrs.get('value', "")
    return

I was thinking about something like:

self.filterIndex = 0

in startElement:

if name == 'filter':
    self.filterIndex += 1
    return
if name == 'result' and self.filterIndex == 1:
    ... = attrs.get('value', '')

in endElement:

if name == 'filters':
    self.filterIndex = 0

That is, if you want the result of the first filter in filters.
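
For reference, the counter idea above could be assembled into a complete handler roughly as follows. This is only a sketch under assumptions: the class name EventHandler, the firstFilterResult attribute, and the events list are made up for illustration, and the sample input is the single event from the question.

```python
import xml.sax

# The event from the question, used as sample input.
SAMPLE = b"""<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Triggered"/></diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Blocked"/></diagnostic>
    </filter>
  </filters>
</event>"""


class EventHandler(xml.sax.ContentHandler):
    """Hypothetical handler: keeps the event-level result and the
    result of the first filter apart by counting filter elements."""

    def __init__(self):
        super().__init__()
        self.filterIndex = 0          # 0 means "no filter seen yet"
        self.eventUniqueId = ""
        self.resultValue = ""         # result directly under <event>
        self.firstFilterResult = ""   # result inside the first <filter>
        self.events = []              # one tuple per finished event

    def startElement(self, name, attrs):
        if name == "event":
            self.eventUniqueId = attrs.get("uniqueId", "")
            self.resultValue = ""
            self.firstFilterResult = ""
        elif name == "filter":
            self.filterIndex += 1
        elif name == "result":
            if self.filterIndex == 0:
                self.resultValue = attrs.get("value", "")
            elif self.filterIndex == 1:
                self.firstFilterResult = attrs.get("value", "")

    def endElement(self, name):
        if name == "filters":
            # Reset so the next event starts counting from zero again.
            self.filterIndex = 0
        elif name == "event":
            self.events.append(
                (self.eventUniqueId, self.resultValue, self.firstFilterResult)
            )


handler = EventHandler()
xml.sax.parseString(SAMPLE, handler)
```

After parsing, handler.events holds one (uniqueId, event result, first filter result) tuple per event, so the two result values no longer overwrite each other.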
 
Stefan Behnel

amadain, 11.01.2010 20:13:
I am using xml.sax to parse the event log.

You should give ElementTree's iterparse() a try (xml.etree package).
Instead of a stream of simple events, it will give you a stream of
subtrees, which are a lot easier to work with. You can intercept the event
stream on each 'event' tag, handle it completely in one obvious code step,
and then delete any content you are done with to save memory.

It's also very fast; you will likely not lose much performance compared to
xml.sax.

Stefan
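
A minimal sketch of the iterparse() approach described above, for illustration only. It assumes the events sit inside some root element (the <log> wrapper here is hypothetical, since the thread only shows a single event), and the function name first_filter_results is made up.

```python
import io
import xml.etree.ElementTree as ET

# The event from the question, wrapped in a hypothetical <log> root.
SAMPLE = b"""<log>
<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Triggered"/></diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Blocked"/></diagnostic>
    </filter>
  </filters>
</event>
</log>"""


def first_filter_results(source):
    """Yield (uniqueId, event result, first filter result) per event."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "event":
            continue
        # "result" matches only the direct child of <event>.
        overall = elem.find("result")
        # find() returns the first match in document order,
        # i.e. the result of the first filter.
        first = elem.find("filters/filter/diagnostic/result")
        yield (
            elem.get("uniqueId", ""),
            overall.get("value", "") if overall is not None else "",
            first.get("value", "") if first is not None else "",
        )
        # Drop the subtree's children once the event has been handled.
        elem.clear()


results = list(first_filter_results(io.BytesIO(SAMPLE)))
```

Because each 'event' subtree is complete when its end event arrives, the two result elements can be told apart by their path rather than by tracking parser state by hand.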
 
Adam Tauno Williams

Thank you. I will try that

If your document is reasonably complex, I usually define some modes like:

BPML_BOOTSTRAP_MODE = 0
BPML_PACKAGE_MODE = 1
BPML_PROCESS_MODE = 2
BPML_CONTEXT_MODE = 3
.....
BPML_EVENT_MODE = 10
BPML_FAULTS_MODE = 11
BPML_ATTRIBUTES_MODE = 12

- so I can self.mode.append(BPML_PROCESS_MODE) when I enter an element
(startElement) and self.mode = self.mode[:-1] when I leave an element
(endElement). This provides you with a complete 'stack trace' of how
you got where you are and still lets you efficiently process the stream
[versus using an evil document model]. In startElement you can check the
current mode and tag with something like -
....
elif (name == 'event' and self.mode[-1] == BPML_PROCESS_MODE):
....
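
Applied to the event log from the original question, the mode-stack idea might look like the sketch below. The mode constants, the MODES lookup table, and the ModeHandler class are hypothetical names chosen for this example, not part of the BPML code above; list.pop() is used in place of self.mode[:-1] and has the same effect on the stack.

```python
import xml.sax

# Hypothetical modes for the event log (not the BPML modes above).
EVENT_MODE = 0
FILTERS_MODE = 1
FILTER_MODE = 2
OTHER_MODE = 3

MODES = {"event": EVENT_MODE, "filters": FILTERS_MODE, "filter": FILTER_MODE}

SAMPLE = b"""<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Triggered"/></diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Blocked"/></diagnostic>
    </filter>
  </filters>
</event>"""


class ModeHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.mode = []            # stack of modes, innermost element last
        self.eventResult = ""
        self.filterResults = []

    def startElement(self, name, attrs):
        if name == "result":
            if self.mode and self.mode[-1] == EVENT_MODE:
                # Direct child of <event>.
                self.eventResult = attrs.get("value", "")
            elif FILTER_MODE in self.mode:
                # Somewhere below a <filter>.
                self.filterResults.append(attrs.get("value", ""))
        # Push a mode for every element so endElement can always pop.
        self.mode.append(MODES.get(name, OTHER_MODE))

    def endElement(self, name):
        self.mode.pop()


handler = ModeHandler()
xml.sax.parseString(SAMPLE, handler)
```

The stack records the full path to the current element, so the same tag name can be handled differently depending on where it appears.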
 
dontcare

If you are using Jython, then you might also want to consider VTD-XML,
which is a lot easier to use and faster than SAX; its native XPath
support may be useful too.

http://vtd-xml.sf.net
 
