kaens
Hey everyone, this may be a stupid question, but I noticed the
following, and as I'm pretty new to using XML and Python, I was
wondering if I could get an explanation.
Let's say I write a simple XML parser that just loads the content of
each tag into a dict (the XML file doesn't have multiple levels of
nesting in it; it's flat other than the parent node),
so we have
<parent>
    <option1>foo</option1>
    <option2>bar</option2>
    . . .
</parent>
(I'm using xml.parsers.expat.)
The start tag handler sets a flag that says the parser is inside the
parent, and records the name of the tag currently being processed.
The character data handler sets a dictionary value like so:
    dictName[curTag] = data
After I'm done processing the file, I print out the dict, and the first entry is
    <a few bits of whitespace> : <a whole bunch of whitespace>
There are comments in the xml file - is this what is causing this?
There are also blank lines. . .but I don't see how a blank line would
be interpreted as a tag. Comments though, I could see that happening.
Actually, I just did a test on an xml file that had no comments or
whitespace and got the same behaviour.
If I feed it the following xml file:
<options>
    <one>hey</one>
    <two>bee</two>
    <three>eff</three>
</options>
it prints out:
" :
three : eff
two : bee
one : hey"
wtf.
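One quick way to see what expat is actually handing to the character data handler is to log every chunk it delivers. This is only a minimal sketch: it assumes the file has a line break after each element, as in the listing above, and `SAMPLE` and `chunks` are names made up for illustration.

```python
import xml.parsers.expat

# Assumed layout of the test file: one element per line, no indentation.
SAMPLE = "<options>\n<one>hey</one>\n<two>bee</two>\n<three>eff</three>\n</options>"

chunks = []  # every piece of character data, in the order expat reports it

parser = xml.parsers.expat.ParserCreate()
# Collect the raw data instead of storing it in a dict.
parser.CharacterDataHandler = chunks.append
parser.Parse(SAMPLE, True)

for chunk in chunks:
    # repr() makes newlines and spaces visible.
    print(repr(chunk))
```

If the line breaks between elements show up in that list alongside the tag contents, then the handler is being called for text that sits between the tags as well as text inside them.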
For reference, here are the handler functions:
def handleCharacterData(self, data):
    if self.inOptions and self.curTag != "options":
        self.options[self.curTag] = data

def handleStartElement(self, name, attributes):
    if name == "options":
        self.inOptions = True
    if self.inOptions:
        self.curTag = name

def handleEndElement(self, name):
    if name == "options":
        self.inOptions = False
    self.curTag = ""
Sorry if the whitespace in the code got mangled (fingers crossed...)
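In case the mangled indentation makes the handlers hard to follow, here is a self-contained sketch that wires them into an expat parser and runs them on the small test file. The class name `OptionsParser`, the `parse` method, and the `SAMPLE` string are assumptions added for illustration, and the indentation of `handleEndElement` (resetting `curTag` after every end tag) is a guess chosen because it matches the printed output.

```python
import xml.parsers.expat


class OptionsParser:
    """Hypothetical wrapper class; only the three handlers come from the post."""

    def __init__(self):
        self.options = {}
        self.inOptions = False
        self.curTag = ""
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self.handleStartElement
        self.parser.EndElementHandler = self.handleEndElement
        self.parser.CharacterDataHandler = self.handleCharacterData

    def handleCharacterData(self, data):
        if self.inOptions and self.curTag != "options":
            self.options[self.curTag] = data

    def handleStartElement(self, name, attributes):
        if name == "options":
            self.inOptions = True
        if self.inOptions:
            self.curTag = name

    def handleEndElement(self, name):
        if name == "options":
            self.inOptions = False
        # Assumed to run for every end tag, not only </options>.
        self.curTag = ""

    def parse(self, text):
        self.parser.Parse(text, True)
        return self.options


# Assumed layout of the test file: one element per line.
SAMPLE = "<options>\n<one>hey</one>\n<two>bee</two>\n<three>eff</three>\n</options>"

result = OptionsParser().parse(SAMPLE)
for key, value in result.items():
    print(repr(key), ":", repr(value))
```

Printing with `repr()` rather than plain `print` shows exactly which characters ended up in the odd entry, which the bare " : " line in the original output hides.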