xml.sax parsing elements with the same name

Discussion in 'Python' started by amadain, Jan 11, 2010.

  1. amadain

    amadain Guest

    I have an event log with 100s of thousands of entries with logs of the
    form:

    <event eventTimestamp="2009-12-18T08:22:49.035"
    uniqueId="1261124569.35725_PFS_1_1340035961">
    <result value="Blocked"/>
    <filters>
    <filter code="338" type="Filter_Name">
    <diagnostic>
    <result value="Triggered"/>
    </diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
    <diagnostic>
    <result value="Blocked"/>
    </diagnostic>
    </filter>
    </filters>
    </event>

    I am using xml.sax to parse the event log. The trouble with the file
    above is that when I parse for the result value I get the last result
    value ("Blocked" above). I want to get the result value "Triggered"
    (the second result in the event).

    my code is as follows:

    def startElement(self, name, attrs):
        if name == 'event':
            self.eventTime = attrs.get('eventTimestamp', "")
            self.eventUniqueId = attrs.get('uniqueId', "")
        if name == 'result':
            self.resultValue = attrs.get('value', "")
        return

    def endElement(self, name):
        if name == "event":
            result = eval(self.filter)
            if result:
                ...

    How do I get the result value I require when several elements have the
    same name, as above?
    amadain, Jan 11, 2010
    #1

  2. John Bokma

    John Bokma Guest

    amadain <> writes:

    > I have an event log with 100s of thousands of entries with logs of the
    > form:
    >
    > <event eventTimestamp="2009-12-18T08:22:49.035"
    > uniqueId="1261124569.35725_PFS_1_1340035961">
    > <result value="Blocked"/>
    > <filters>
    > <filter code="338" type="Filter_Name">
    > <diagnostic>
    > <result value="Triggered"/>
    > </diagnostic>
    > </filter>
    > <filter code="338" type="Filter_Name">
    > <diagnostic>
    > <result value="Blocked"/>
    > </diagnostic>
    > </filter>
    > </filters>
    > </event>
    >
    > I am using xml.sax to parse the event log. The trouble with the file
    > above is when I parse for result value I get the last result value
    > (Blocked from above). I want to get the result value triggered (the
    > second in the event).
    >
    > my code is as follows:
    >
    > def startElement(self, name, attrs):
    > if name == 'event':
    > self.eventTime = attrs.get('eventTimestamp',"")
    > self.eventUniqueId = attrs.get('uniqueId', "")
    > if name == 'result':
    > self.resultValue = attrs.get('value',"")
    > return
    >
    > def endElement(self, name):
    > if name=="event":
    > result= eval(self.filter)
    > if result:
    > ...
    >
    > How do I get the result value I require when events have the same
    > names like above?


    You have to keep track of whether you're inside a filters section, and
    keep track of which filter element you are in (first, second, etc.),
    assuming you want the result value of the first filter.

    --
    John Bokma

    Read my blog: http://johnbokma.com/
    Hire me (Perl/Python): http://castleamber.com/
    John Bokma, Jan 11, 2010
    #2

  3. John Bokma

    John Bokma Guest

    amadain <> writes:

    > On Jan 11, 7:26 pm, John Bokma <> wrote:
    >> amadain <> writes:



    >> > <event eventTimestamp="2009-12-18T08:22:49.035"
    >> > uniqueId="1261124569.35725_PFS_1_1340035961">
    >> >    <result value="Blocked"/>
    >> >       <filters>
    >> >           <filter code="338" type="Filter_Name">
    >> >               <diagnostic>
    >> >                    <result value="Triggered"/>
    >> >               </diagnostic>
    >> >           </filter>
    >> >           <filter code="338" type="Filter_Name">
    >> >               <diagnostic>
    >> >                    <result value="Blocked"/>
    >> >               </diagnostic>
    >> >           </filter>
    >> >       </filters>
    >> > </event>


    > How do I keep track? The first result value is outside the filters
    > section and the rest are inside it. Do you mean something like:
    >
    > def startElement(self, name, attrs):
    >     if name == 'event':
    >         self.eventTime = attrs.get('eventTimestamp', "")
    >         self.eventUniqueId = attrs.get('uniqueId', "")
    >     if name == 'result':
    >         self.resultValue = attrs.get('value', "")
    >     if name == 'filters':
    >         if name == 'result':
    >             self.resultValueF = attrs.get('value', "")
    >     return


    I was thinking about something like:

        self.filterIndex = 0

    in startElement:

        if name == 'filter':
            self.filterIndex += 1
            return
        if name == 'result' and self.filterIndex == 1:
            ... = attrs.get('value', '')

    in endElement:

        if name == 'filters':
            self.filterIndex = 0

    if you want the result of the first filter in filters.
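
For readers who want to try the counter idea end to end, here is a minimal sketch in Python 3. It is not the original poster's code. The `<log>` wrapper element, the `EventHandler` class name, and the `events` list are assumptions made so that the example runs on its own.

```python
import xml.sax

# One event from the thread, wrapped in a hypothetical <log> root element.
SAMPLE = b"""<log>
<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Triggered"/></diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Blocked"/></diagnostic>
    </filter>
  </filters>
</event>
</log>"""


class EventHandler(xml.sax.ContentHandler):
    """Records the result of the first <filter> in every <event>."""

    def __init__(self):
        super().__init__()
        self.filterIndex = 0   # 0 means "no <filter> seen yet in this <filters>"
        self.events = []       # hypothetical sink: (uniqueId, first filter result)

    def startElement(self, name, attrs):
        if name == 'event':
            self.eventUniqueId = attrs.get('uniqueId', '')
            self.firstFilterResult = ''
        elif name == 'filter':
            self.filterIndex += 1
        elif name == 'result' and self.filterIndex == 1:
            # Only reached for a <result> that follows the first <filter> start tag.
            self.firstFilterResult = attrs.get('value', '')

    def endElement(self, name):
        if name == 'filters':
            self.filterIndex = 0   # reset for the next event
        elif name == 'event':
            self.events.append((self.eventUniqueId, self.firstFilterResult))


handler = EventHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.events)
# [('1261124569.35725_PFS_1_1340035961', 'Triggered')]
```

The top-level result is skipped because the counter is still 0 when that element is seen. The result in the second filter is skipped because the counter is 2 by then.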

    John Bokma, Jan 11, 2010
    #3
  4. amadain, 11.01.2010 20:13:
    > I have an event log with 100s of thousands of entries with logs of the
    > form:
    >
    > <event eventTimestamp="2009-12-18T08:22:49.035"
    > uniqueId="1261124569.35725_PFS_1_1340035961">
    > <result value="Blocked"/>
    > <filters>
    > <filter code="338" type="Filter_Name">
    > <diagnostic>
    > <result value="Triggered"/>
    > </diagnostic>
    > </filter>
    > <filter code="338" type="Filter_Name">
    > <diagnostic>
    > <result value="Blocked"/>
    > </diagnostic>
    > </filter>
    > </filters>
    > </event>
    >
    > I am using xml.sax to parse the event log.


    You should give ElementTree's iterparse() a try (xml.etree package).
    Instead of a stream of simple events, it will give you a stream of
    subtrees, which are a lot easier to work with. You can intercept the event
    stream on each 'event' tag, handle it completely in one obvious code step,
    and then delete any content you are done with to save memory.

    It's also very fast; you will likely not lose much performance compared to
    xml.sax.
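
The iterparse() suggestion can be sketched roughly as follows in Python 3. This is only an illustration. The `<log>` wrapper element and the `first_filter_results` helper are assumptions, and the sample is read from an in-memory buffer rather than from the real log file.

```python
import io
import xml.etree.ElementTree as ET

# One event from the thread, wrapped in a hypothetical <log> root element.
SAMPLE = b"""<log>
<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Triggered"/></diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Blocked"/></diagnostic>
    </filter>
  </filters>
</event>
</log>"""


def first_filter_results(source):
    """Yield (uniqueId, overall result, first filter result) for each event."""
    for _, elem in ET.iterparse(source, events=('end',)):
        if elem.tag != 'event':
            continue
        # find() returns the first match in document order, or None.
        overall = elem.find('result')  # direct child of <event> only
        first = elem.find('filters/filter/diagnostic/result')
        yield (
            elem.get('uniqueId', ''),
            overall.get('value', '') if overall is not None else '',
            first.get('value', '') if first is not None else '',
        )
        elem.clear()  # drop the finished subtree's children to save memory


rows = list(first_filter_results(io.BytesIO(SAMPLE)))
print(rows)
# [('1261124569.35725_PFS_1_1340035961', 'Blocked', 'Triggered')]
```

Note that elem.clear() empties each event but leaves the emptied element attached to the root. For a very large log it may also be worth removing finished events from the root element.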

    Stefan
    Stefan Behnel, Jan 12, 2010
    #4
  5. On Mon, 2010-01-11 at 13:24 -0800, amadain wrote:
    > On Jan 11, 9:03 pm, John Bokma <> wrote:
    > > amadain <> writes:
    > > I was thinking about something like:
    > > self.filterIndex = 0
    > > in startElement:
    > > if name == 'filter':
    > > self.filterIndex += 1
    > > return
    > > if name == 'result' and self.filterIndex == 1:
    > > ... = attrs.get('value', '')
    > > in EndElement
    > > if name == 'filters':
    > > self.filterIndex = 0
    > > If you want the result of the first filter in filters

    > Thank you. I will try that


    If your document is reasonably complex, I usually define some modes like:

    BPML_BOOTSTRAP_MODE = 0
    BPML_PACKAGE_MODE = 1
    BPML_PROCESS_MODE = 2
    BPML_CONTEXT_MODE = 3
    .....
    BPML_EVENT_MODE = 10
    BPML_FAULTS_MODE = 11
    BPML_ATTRIBUTES_MODE = 12

    so I can self.mode.append(BPML_PROCESS_MODE) when I enter an element
    (startElement) and self.mode = self.mode[:-1] when I leave an element
    (endElement). This provides you with a complete 'stack trace' of how
    you got where you are and still lets you process the stream efficiently
    [versus using an evil document model]. In startElement you can check the
    current mode and tag with something like:
    ....
    elif (name == 'event' and self.mode[-1] == BPML_PROCESS_MODE):
    ....
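
Applied to the event log from this thread, the mode-stack idea might look like the sketch below (Python 3). The mode constants, the `MODE_FOR_TAG` table, the `ModeHandler` class, and the `<log>` wrapper element are all assumptions made for illustration. They are not taken from the BPML code described above.

```python
import xml.sax

# Hypothetical modes for the event log discussed in this thread.
ROOT_MODE, EVENT_MODE, FILTERS_MODE, FILTER_MODE, OTHER_MODE = range(5)

MODE_FOR_TAG = {
    'event': EVENT_MODE,
    'filters': FILTERS_MODE,
    'filter': FILTER_MODE,
}

# One event from the thread, wrapped in a hypothetical <log> root element.
SAMPLE = b"""<log>
<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Triggered"/></diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Blocked"/></diagnostic>
    </filter>
  </filters>
</event>
</log>"""


class ModeHandler(xml.sax.ContentHandler):
    """Uses a stack of modes to tell equally named elements apart."""

    def __init__(self):
        super().__init__()
        self.mode = [ROOT_MODE]   # the 'stack trace' of enclosing elements
        self.overall = []         # results that are direct children of <event>
        self.per_filter = []      # results nested anywhere inside a <filter>

    def startElement(self, name, attrs):
        if name == 'result':
            if self.mode[-1] == EVENT_MODE:
                self.overall.append(attrs.get('value', ''))
            elif FILTER_MODE in self.mode:
                self.per_filter.append(attrs.get('value', ''))
        # Push a mode for every element so endElement can always pop one.
        self.mode.append(MODE_FOR_TAG.get(name, OTHER_MODE))

    def endElement(self, name):
        self.mode.pop()


handler = ModeHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.overall, handler.per_filter)
# ['Blocked'] ['Triggered', 'Blocked']
```

Pushing a mode for every start tag, including tags without a mode of their own, keeps the stack balanced with a single pop in endElement.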

    --
    OpenGroupware developer:
    <http://whitemiceconsulting.blogspot.com/>
    OpenGroupware & Cyrus IMAPd documentation @
    <http://docs.opengroupware.org/Members/whitemice/wmogag/file_view>
    Adam Tauno Williams, Jan 15, 2010
    #5
  6. dontcare

    dontcare Guest

    If you are using Jython, you might also want to consider VTD-XML,
    which is a lot easier to use and faster than SAX. Its native XPath
    support may be useful too.

    http://vtd-xml.sf.net

    On Jan 12, 12:13 am, Stefan Behnel <> wrote:
    > amadain, 11.01.2010 20:13:
    >
    >
    >
    >
    >
    > > I have an event log with 100s of thousands of entries with logs of the
    > > form:

    >
    > > <event eventTimestamp="2009-12-18T08:22:49.035"
    > > uniqueId="1261124569.35725_PFS_1_1340035961">
    > >    <result value="Blocked"/>
    > >       <filters>
    > >           <filter code="338" type="Filter_Name">
    > >               <diagnostic>
    > >                    <result value="Triggered"/>
    > >               </diagnostic>
    > >           </filter>
    > >           <filter code="338" type="Filter_Name">
    > >               <diagnostic>
    > >                    <result value="Blocked"/>
    > >               </diagnostic>
    > >           </filter>
    > >       </filters>
    > > </event>

    >
    > > I am using xml.sax to parse the event log.

    >
    > You should give ElementTree's iterparse() a try (xml.etree package).
    > Instead of a stream of simple events, it will give you a stream of
    > subtrees, which are a lot easier to work with. You can intercept the event
    > stream on each 'event' tag, handle it completely in one obvious code step,
    > and then delete any content you are done with to save memory.
    >
    > It's also very fast; you will likely not lose much performance compared to
    > xml.sax.
    >
    > Stefan
    dontcare, Feb 9, 2010
    #6