SAX parsing problem, when element contains text like "[text]"

Kai Schlamp

Hello.

I am trying to parse some XML results from PubMed (the largest biomedical
article database).
The document contains elements like this:
<articletitle>[Virus and RNA silencing]</articletitle>
My problem is that the characters method of my SAX handler is
called twice for "[Virus and RNA silencing]".
The first time I get "[Virus and RNA silencing" and the second time
"]".
I am not very experienced with XML and XML processing. Why does
this happen? Why isn't there one call for "[Virus and RNA silencing]"? And is
there a way to set a SAX property so that it behaves that way (only one
call)?

Best regards,
Kai
 
Arne Vajhøj

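This is expected behavior for a SAX parser. The parser is allowed to
deliver the text of a single element in several chunks, so characters
may be called more than once per element.

Your characters method should accumulate the content, and
endElement should do the final processing.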
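
The accumulate-then-process pattern could look roughly like the sketch below. It is only an illustration, not a drop-in solution: the class name TitleHandler and the helper parseTitles are made up for this example, and it assumes the element is named articletitle as in the question and that the parser is not namespace-aware (the default for SAXParserFactory), so the element name arrives in qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <articletitle> element.
class TitleHandler extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inTitle = false;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if ("articletitle".equals(qName)) {
            inTitle = true;
            buffer.setLength(0); // start each title with an empty buffer
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so only append here.
        if (inTitle) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("articletitle".equals(qName)) {
            // Only at this point is the element's text known to be complete.
            titles.add(buffer.toString());
            inTitle = false;
        }
    }

    List<String> getTitles() {
        return titles;
    }

    // Convenience helper (an assumption for this example): parse a string
    // and return the collected titles.
    static List<String> parseTitles(String xml) throws Exception {
        TitleHandler handler = new TitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }
}
```

With this handler it does not matter whether the parser reports "[Virus and RNA silencing]" in one call or splits it into "[Virus and RNA silencing" and "]"; the pieces are joined in the buffer and the complete string is handled once in endElement.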

Arne
 
