How to parse XML that contains & in the text?

sohan.soni

Hi,


The XML file content is:

<?xml version="1.0"?>
<!DOCTYPE RECORD SYSTEM ".\RECORD.dtd">
<RECORD EXT_SOURCE="DEFAULT" TEMPLATE="GAS_POOL_POINTS">
  <TABLE_NAME>GAS_POOL_POINTS</TABLE_NAME>
  <COLUMN>
    <COLUMN_NAME>GP_POOL</COLUMN_NAME>
    <PRIMARY_KEY>Y</PRIMARY_KEY>
    <COLUMN_VALUE>Some&Value</COLUMN_VALUE>
  </COLUMN>
</RECORD>



When parsing this XML file with Java code (i.e. converting the XML
document to a String), I get the following exception:

org.xml.sax.SAXParseException: Next character must be ";" terminating
reference to entity "Value".



I think some change is needed in the DTD so that a string containing &
is treated as a literal instead of as the start of an entity
reference.

In addition, the XML content is not under our control.

Please reply if you know how to handle this.
 
Daniel Dyer

> When Parsing (i.e. converting this XML doc to String) this XML file
> using Java code, I am getting following exception.
>
> org.xml.sax.SAXParseException: Next character must be ";" terminating
> reference to entity "Value".

Section 2.4 of the XML 1.0 specification:

"The ampersand character (&) and the left angle bracket (<) MUST NOT
appear in their literal form, except when used as markup delimiters, or
within a comment, a processing instruction, or a CDATA section. If they
are needed elsewhere, they MUST be escaped using either numeric character
references or the strings "&amp;" and "&lt;" respectively. The right angle
bracket (>) may be represented using the string "&gt;", and MUST, for
compatibility, be escaped using either "&gt;" or a character reference
when it appears in the string "]]>" in content, when that string is not
marking the end of a CDATA section."

> I think there is some changes/modification needed in DTD to treat the
> string in XML which contains & as a literal, instead of expecting some
> entity.

You can't fix this in the DTD. The XML is not well-formed, and the
parser is correct to reject it.

> Adding to this, XML content is not under our control.

Unfortunately, the only rational fix *is* to change the XML. Either use
&amp; or wrap the element data in a CDATA section. If the XML is
controlled by a third party, it would be reasonable to ask them to
change it, since it is not really XML at all if it is not well-formed.
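
As a rough illustration of the two fixes named here, the sketch below parses one element written with &amp; and one wrapped in a CDATA section, using the JDK's standard DOM parser. The class name EscapedAmpersandDemo and the helper rootText are made up for this example, and only the single COLUMN_VALUE element from the question is used rather than the whole file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapedAmpersandDemo {
    // Hypothetical helper: parses an XML string and returns the text
    // content of its root element.
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Fix 1: the ampersand written as the predefined entity &amp;
        String escaped = "<COLUMN_VALUE>Some&amp;Value</COLUMN_VALUE>";
        // Fix 2: the raw text wrapped in a CDATA section
        String cdata = "<COLUMN_VALUE><![CDATA[Some&Value]]></COLUMN_VALUE>";

        System.out.println(rootText(escaped));
        System.out.println(rootText(cdata));
    }
}
```

Both forms should give the application the same text, Some&Value, so the choice between them is up to whoever produces the XML.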

Dan.
 
Alex Hunsley

As the other reply said, this XML is not well-formed. It shouldn't
contain a 'naked' ampersand like that.
Do you have any chance at all to speak to the producer of this XML? It's
very reasonable to ask them to fix it. If you can't get them to fix it,
then how about:

1) Put in a fix yourself, e.g. do a search-and-replace kludge on the
content before the XML parser gets it: replace each naked '&' with
'&amp;' (and handle any other nasty characters that crop up).
2) At least tell the party producing the XML that it is broken. You may
help someone else down the line by doing this, if not yourself.
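
A minimal sketch of the search-and-replace kludge in option 1 might look like the following. The class name BareAmpersandFixer, the method escapeBareAmpersands and the regular expression are assumptions made for illustration, not a vetted solution. The pattern treats an '&' as naked when it is not followed by something shaped like an entity name or numeric character reference ending in ';'. It will also rewrite ampersands inside CDATA sections and comments, where they are legal, and it cannot tell text such as "Some&Value;" from a real entity reference.

```java
import java.util.regex.Pattern;

public class BareAmpersandFixer {
    // Matches '&' unless it starts something shaped like an entity
    // reference (&name;) or a character reference (&#38; or &#x26;).
    private static final Pattern BARE_AMP = Pattern.compile(
        "&(?!(?:[A-Za-z_:][A-Za-z0-9_:.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Hypothetical helper: returns the input with naked ampersands
    // replaced by &amp;, leaving existing references untouched.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String broken = "<COLUMN_VALUE>Some&Value</COLUMN_VALUE>";
        System.out.println(escapeBareAmpersands(broken));

        // Input that is already escaped passes through unchanged.
        System.out.println(escapeBareAmpersands("<a>x &amp; y &#38; z</a>"));
    }
}
```

The cleaned string can then be handed to the parser, for example through a StringReader. This stays a workaround: getting the producer to emit well-formed XML is still the more robust fix.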

lex
 
sohan.soni

Thanks Daniel,
That info really helped.

Regards
Sohan
 
sohan.soni

Thanks Lex,

Sohan
 
