Problem reading " & apos; " from XML using SAX Parser

madan

Hi All,

I have an XML document that contains the following element:

<DataText>This is simple &apos; Text</DataText>


I have included " & apos ; " in the element called DataText.

When parsing the element, I am getting only the text that appears before
& apos ;

When I do not include " & apos ; ", I am able to get the full text from this
element.

I observed that in the method characters(char buf[], int offset, int
len)

the len parameter only covers the length from the start position to the
position where " & apos ; " starts.

How can I get the whole text, including the " & apos ; "?

Thanks

Note: while posting this request, " & apos; " is being formatted to
" ' ". That is the reason I included spaces in it.

Thanks
 
bugbear

Are you getting multiple calls to your handler?
How many calls (leading question) do you expect
your handler to get?

BugBear
 
Roedy Green

madan said:
I have included " & apos ; " in the element called DataText.

Just as in HTML, certain characters are reserved and have long forms
called entities, to use when they occur in the text as data:
&amp;, &lt;, &gt;, &apos; and &quot;. Unlike HTML, XML predefines only
those five entities. Character references take one of two forms:
decimal references, &#8478;, and hexadecimal references, &#x211e;.
Named character entities such as &eacute; don't work unless a DTD
declares them. You can also put any Unicode character that is not XML
markup directly in the document, and UTF-8 deals with encoding it.

If you meant the spaces in &_apos_;, it should be encoded &amp;_apos_;

If you did not mean the spaces, then it should be encoded: &amp;apos;
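
For anyone writing XML by hand, the five predefined entities can be applied with a small helper. This is only a sketch: the class and method names (XmlEscape, escape) are made up for illustration and are not part of any library.

```java
public class XmlEscape {

    /** Replaces the five characters for which XML predefines entities. */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An apostrophe meant as data becomes &apos;
        System.out.println(escape("This is simple ' Text"));
        // The literal six characters &apos; become &amp;apos;
        System.out.println(escape("&apos;"));
    }
}
```

In practice an XML writer (for example the StAX or DOM serializers in the JDK) does this escaping for you; the helper only makes the mapping visible.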
 
Lew

The OP had stated:
Note : while posting this request," & apos; " is being formatted to "
' " . thats the reason included space between them

Which leads me to wonder what they were using to enter their post. It was
plain text so there shouldn't have been an issue on the Usenet side. Anyhow,
it's pretty clear the OP didn't intend for the spaces to be in the final
literal representation, thus they were saying "&apos;".

To make sure I understand Roedy's answer: to encode the element, instead of
saying "&apos;" the OP should say "&amp;apos;", correct?
 
Roedy Green

Lew said:
To make sure I understand Roedy's answer: to encode the element, instead of
saying "&apos;" the OP should say "&amp;apos;", correct?

&apos; is the encoding for ' when you mean it as a literal character,
not as a string delimiter.
&amp;apos; is the encoding for &apos; when you mean it as a literal
string of characters, not as an encoding for '.
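
The difference between the two encodings is easy to check by parsing both. The following is a minimal sketch using the JDK's built-in DOM parser; the class and method names (EntityDecodeDemo, textOf) are hypothetical, and the element name DataText is taken from the original post.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDecodeDemo {

    /** Parses a one-element document and returns the root element's text. */
    public static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrapped so callers do not have to deal with checked exceptions.
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // &apos; decodes to a single apostrophe.
        System.out.println(textOf("<DataText>This is simple &apos; Text</DataText>"));
        // &amp;apos; decodes to the six literal characters &apos;
        System.out.println(textOf("<DataText>This is simple &amp;apos; Text</DataText>"));
    }
}
```

The first line printed is "This is simple ' Text" and the second is "This is simple &apos; Text".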
 
Lasse Reichstein Nielsen

madan said:
I have a XML which contains the following element

<DataText>This is simple &apos; Text</DataText>

I have included " & apos ; " in the element called DataText.

When parsing the element, am getting only the text that appears before
& apos ;

HOW are you parsing it?
If using a DOM parser, you will likely find that the resulting
tree is an element node named DataText with three children:
a text node, an entity node and another text node.
If you are expecting only one child and only looking at the element
node's firstChild, you will find only the text before the entity.

Other parsers might also split the text into separate chunks.

/L
 
madan

Hi All,

As Lasse Reichstein Nielsen said, the callback for the text inside the
element is called three times when '&apos;' is in the middle of the text.

I am using a SAX parser with a handler that extends DefaultHandler.

The method characters(char buf[], int offset, int len) is being called
three times, as said above.

I have temporarily resolved this by appending the text to a StringBuffer
and converting it to a String when the element ends.

But is this the expected behavior of a SAX parser, that it might
split the text into separate chunks if there are entities like
this in the text?

Madan N
 
Lew

Did you read bugbear's response to your original question?

Have you read the docs on the SAX callback methods?

Yes, it is the expected behavior, in the sense that you do not know how many
times the callback will be invoked to parse the text. It might split on any
arbitrary location, not just on entities.
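
The accumulate-until-endElement approach madan describes could look roughly like the sketch below. It assumes the element is called DataText as in the original post and a parser that is not namespace-aware (the SAXParserFactory default), so the element name arrives in qName. The class name DataTextHandler and its helper methods are made up for illustration. A StringBuilder is used instead of StringBuffer because the handler runs on a single thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DataTextHandler extends DefaultHandler {
    private final StringBuilder current = new StringBuilder();
    private final List<String> values = new ArrayList<>();
    private boolean inDataText = false;
    private int characterCalls = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("DataText".equals(qName)) {
            inDataText = true;
            current.setLength(0);          // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        if (inDataText) {
            characterCalls++;              // may be 1, 3, or any other number
            current.append(buf, offset, len);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("DataText".equals(qName)) {
            inDataText = false;
            values.add(current.toString()); // only now is the text complete
        }
    }

    public List<String> getValues() { return values; }

    public int getCharacterCalls() { return characterCalls; }

    /** Convenience wrapper: parses a string and returns the filled handler. */
    public static DataTextHandler parse(String xml) {
        try {
            DataTextHandler handler = new DataTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        DataTextHandler h = parse("<DataText>This is simple &apos; Text</DataText>");
        System.out.println(h.getValues().get(0));
        System.out.println("characters() calls: " + h.getCharacterCalls());
    }
}
```

The number of characters() calls printed depends on the parser implementation, which is exactly why the handler should not rely on it; the accumulated value is the same either way.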
 
