SAX parsing problem

anon

So I've encountered a strange behavior that I'm hoping someone can fill
me in on. I've written a simple handler that works with one small
exception: when the parser encounters a line with '&amp;' in it, it
only returns the portion that follows the occurrence.

For example, parsing a file with the line :
<key>mykey</key><value>some%20&amp;%20value</value>

results in getting "%20value" back from the characters method, rather
than "some%20&%20value".

After looking into this a bit, I found that SAX supports entities and
that it is probably treating the &amp; as an entity and processing
it in some way that I'm unaware of. I'm using the default
EntityResolver.

Any help/info would be much appreciated.

gh
 
David M. Cooke

anon said:
So I've encountered a strange behavior that I'm hoping someone can fill
me in on. I've written a simple handler that works with one small
exception: when the parser encounters a line with '&amp;' in it, it
only returns the portion that follows the occurrence.

For example, parsing a file with the line :
<key>mykey</key><value>some%20&amp;%20value</value>

results in getting "%20value" back from the characters method, rather
than "some%20&%20value".

After looking into this a bit, I found that SAX supports entities and
that it is probably treating the &amp; as an entity and processing
it in some way that I'm unaware of. I'm using the default
EntityResolver.

Are you sure you're not actually getting three chunks: "some%20", "&",
and "%20value"? The xml.sax.handler.ContentHandler.characters method
(which I presume you're using for SAX, as you don't mention!) is not
guaranteed to get all contiguous character data in one call. Also check
if .skippedEntity() methods are firing.
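
A minimal sketch of the usual workaround, for illustration only: collect
every chunk passed to characters() and join them in endElement(). The
class name BufferingHandler, the wrapping <pair> element, and the
chunk_calls counter are hypothetical, and the sample assumes the
ampersand is written as &amp; in the file, since a bare & would not be
well-formed XML.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BufferingHandler(ContentHandler):
    """Collect the full text of each element, however many
    characters() calls the parser uses to deliver it."""

    def __init__(self):
        super().__init__()
        self._chunks = []     # text pieces seen since the last tag
        self.chunk_calls = 0  # number of characters() calls, for inspection
        self.values = {}      # element name -> complete text

    def startElement(self, name, attrs):
        # A new element begins: discard any text buffered so far.
        self._chunks = []

    def characters(self, content):
        # May fire several times for one run of text; just accumulate.
        self.chunk_calls += 1
        self._chunks.append(content)

    def endElement(self, name):
        # The element is complete, so the joined chunks are its full text.
        self.values[name] = "".join(self._chunks)
        self._chunks = []


# Hypothetical sample document modelled on the line from the question.
doc = b"<pair><key>mykey</key><value>some%20&amp;%20value</value></pair>"

handler = BufferingHandler()
xml.sax.parseString(doc, handler)
print(handler.values["value"])
```

How many times characters() fires depends on the parser, so the sketch
records chunk_calls for inspection rather than relying on a fixed count.
This simple version only handles elements that contain text alone; mixed
content would need a stack of buffers.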
 
gh

David M. said:
Are you sure you're not actually getting three chunks: "some%20", "&",
and "%20value"? The xml.sax.handler.ContentHandler.characters method
(which I presume you're using for SAX, as you don't mention!) is not
guaranteed to get all contiguous character data in one call. Also check
if .skippedEntity() methods are firing.

Ya, skippedEntity() wasn't firing, but you are correct about receiving
three chunks. The characters handler routine is fired 3 times for a
single text block. Why does it do this? Is there a way to prevent
doing this?

Much thanks.

gh
 
Uche Ogbuji

The characters handler routine is fired 3 times for a
single text block. Why does it do this? Is there a way to prevent
doing this?

Continuing in the vein of closing matters cross-posted to XML-SIG:

http://mail.python.org/pipermail/xml-sig/2005-March/011013.html

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html
Introducing the Amara XML Toolkit - http://www.xml.com/pub/a/2005/01/19/amara.html
Gems from the Mines: 2002 to 2003 - http://www.xml.com/pub/a/2005/03/02/pyxml.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
Querying WordNet as XML - http://www.ibm.com/developerworks/xml/library/x-think29.html
Packaging XSLT lookup tables as EXSLT functions - http://www.ibm.com/developerworks/xml/library/x-tiplook2.html
 
