Re: xml.parsers.expat loading xml into a dict and whitespace

Discussion in 'Python' started by kaens, May 23, 2007.

  1. kaens

    kaens Guest

    Ok, I can fix it by modifying

    if self.inOptions and self.curTag != "options":

    to

    if self.inOptions and self.curTag != "options" and self.curTag != "":

    but this feels really freaking ugly.

    Sigh.

    Any suggestions? I know I must be missing something.

    Also, I hate the tendency I have to figure stuff out shortly after
    posting to a mailing list or forum. Happens all the time, and I swear
    I don't solve stuff until I ask for help.
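A possibly less ugly alternative, offered as a rough sketch rather than a definitive fix: only collect character data while a child element is open, and store the joined text when that element closes. Whitespace between tags then never reaches the dict, because no child tag is current at that point. The class name `OptionsParser`, the `chunks` list and the `parse` method are invented for illustration; the handler names and the `inOptions`, `curTag` and `options` attributes follow the code quoted below. It assumes the flat layout described in the original question.

```python
import xml.parsers.expat


class OptionsParser:
    """Collects the text of each child of <options> into a dict (flat files only)."""

    def __init__(self):
        self.options = {}
        self.inOptions = False
        self.curTag = None   # name of the child element being read, if any
        self.chunks = []     # pieces of character data seen for curTag

    def handleStartElement(self, name, attributes):
        if name == "options":
            self.inOptions = True
        elif self.inOptions:
            self.curTag = name
            self.chunks = []

    def handleCharacterData(self, data):
        # expat may deliver one element's text in several calls, so append
        # instead of assigning. Data seen while no child is open is dropped.
        if self.curTag is not None:
            self.chunks.append(data)

    def handleEndElement(self, name):
        if name == "options":
            self.inOptions = False
        elif self.curTag is not None:
            self.options[self.curTag] = "".join(self.chunks)
            self.curTag = None

    def parse(self, text):
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = self.handleStartElement
        parser.EndElementHandler = self.handleEndElement
        parser.CharacterDataHandler = self.handleCharacterData
        parser.Parse(text, True)
        return self.options


xml_text = """<options>
<!-- a comment -->
<one>hey</one>
<two>bee</two>

<three>eff</three>
</options>"""

print(OptionsParser().parse(xml_text))
```

Using `None` instead of `""` for "no current tag" means the check in the character data handler is a single test, and the blank lines and the comment in the sample input should leave no entry in the dict.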

    On 5/23/07, kaens <> wrote:
    > Wait... it's because curTag gets reset to "" in the end handler, so the
    > whitespace after each closing tag is stored under the "" key of the dict.
    >
    > That doesn't explain why it happens on an XML file containing no
    > whitespace, unless it's counting newlines.
    >
    > Is there a way to just ignore whitespace and/or xml comments?
    >
    > On 5/23/07, kaens <> wrote:
    > > Hey everyone, this may be a stupid question, but I noticed the
    > > following and, as I'm pretty new to using XML and Python, I was
    > > wondering if I could get an explanation.
    > >
    > > Let's say I write a simple XML parser that just loads the content of
    > > each tag into a dict (the XML file doesn't have multiple levels of
    > > hierarchy in it; it's flat other than the parent node).
    > >
    > > so we have
    > > <parent>
    > > <option1>foo</option1>
    > > <option2>bar</option2>
    > > . . .
    > > </parent>
    > >
    > > (I'm using xml.parsers.expat)
    > > the parser sets a flag that says it's in the parent, and sets the
    > > value of the current tag it's processing in the start tag handler.
    > > The character data handler sets a dictionary value like so:
    > >
    > > dictName[curTag] = data
    > >
    > > after I'm done processing the file, I print out the dict, and the first value is
    > > <a few bits of whitespace> : <a whole bunch of whitespace>
    > >
    > > There are comments in the xml file - is this what is causing this?
    > > There are also blank lines. . .but I don't see how a blank line would
    > > be interpreted as a tag. Comments though, I could see that happening.
    > >
    > > Actually, I just did a test on an xml file that had no comments or
    > > whitespace and got the same behaviour.
    > >
    > > If I feed it the following xml file:
    > >
    > > <options>
    > > <one>hey</one>
    > > <two>bee</two>
    > > <three>eff</three>
    > > </options>
    > >
    > > it prints out:
    > > " :
    > >
    > > three : eff
    > > two : bee
    > > one : hey"
    > >
    > > wtf.
    > >
    > > For reference, here's the handler functions:
    > >
    > > def handleCharacterData(self, data):
    > >     if self.inOptions and self.curTag != "options":
    > >         self.options[self.curTag] = data
    > >
    > > def handleStartElement(self, name, attributes):
    > >     if name == "options":
    > >         self.inOptions = True
    > >     if self.inOptions:
    > >         self.curTag = name
    > >
    > >
    > > def handleEndElement(self, name):
    > >     if name == "options":
    > >         self.inOptions = False
    > >     self.curTag = ""
    > >
    > > Sorry if the whitespace in the code got mangled (fingers crossed...)
    > >

    kaens, May 23, 2007
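
On the question of whether newlines and comments count: a small experiment, assuming nothing beyond the standard `xml.parsers.expat` handlers, can log every event the parser reports. The `seen` list and the `on_*` function names are made up for this illustration.

```python
import xml.parsers.expat

seen = []  # (kind, payload) tuples in the order expat reports them


def on_start(name, attributes):
    seen.append(("start", name))


def on_end(name):
    seen.append(("end", name))


def on_chars(data):
    seen.append(("chars", data))


def on_comment(data):
    seen.append(("comment", data))


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = on_start
parser.EndElementHandler = on_end
parser.CharacterDataHandler = on_chars
parser.CommentHandler = on_comment
parser.Parse("<options>\n<!-- note -->\n<one>hey</one>\n</options>", True)

for kind, payload in seen:
    print(kind, repr(payload))
```

The log should show the newlines between tags arriving as character data, which would explain the stray entry even in a file with no blank lines. The comment text goes to the comment handler instead, so it never reaches the character data handler.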