How to parse XML that contains & in the text?

Discussion in 'Java' started by sohan.soni@gmail.com, Feb 14, 2007.

  1. Guest

    Hi,


    XML file content is:

    <?xml version="1.0"?>
    <!DOCTYPE RECORD SYSTEM ".\RECORD.dtd">
    <RECORD EXT_SOURCE="DEFAULT" TEMPLATE="GAS_POOL_POINTS">
    <TABLE_NAME>GAS_POOL_POINTS</TABLE_NAME>
    <COLUMN>
    <COLUMN_NAME>GP_POOL</COLUMN_NAME>
    <PRIMARY_KEY>Y</PRIMARY_KEY>
    <COLUMN_VALUE>Some&Value</COLUMN_VALUE>
    </COLUMN>
    </Record>

    When parsing this XML file with Java code (i.e. converting the XML
    document to a String), I get the following exception:

    org.xml.sax.SAXParseException: Next character must be ";" terminating
    reference to entity "Value".
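
    The failure can be reproduced without the DTD. The following is a minimal
    sketch, assuming the standard JAXP DocumentBuilder is the parser in use
    (the post does not say which API was called). The class and method names
    are made up for illustration, and the exact wording of the error message
    depends on the parser version.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not part of the original poster's code.
final class BareAmpersandDemo {

    // Returns null if the text parses, otherwise the parser's error message.
    static String parseError(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // A bare '&' starts an entity reference that is never terminated.
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected parser failure", e);
        }
    }

    public static void main(String[] args) {
        String broken = "<COLUMN><COLUMN_VALUE>Some&Value</COLUMN_VALUE></COLUMN>";
        System.out.println("Parser said: " + parseError(broken));
    }
}
```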



    I think some change is needed in the DTD so that a string containing &
    is treated as a literal instead of as the start of an entity reference.

    Note that the XML content is not under our control.

    Please reply if you know how to handle this.
     
    , Feb 14, 2007
    #1

  2. Daniel Dyer Guest

    On Wed, 14 Feb 2007 11:31:18 -0000, sohan.soni@gmail.com wrote:

    > When Parsing (i.e. converting this XML doc to String) this XML file
    > using Java code, I am getting following exception.
    >
    > org.xml.sax.SAXParseException: Next character must be ";" terminating
    > reference to entity "Value".
    >


    Section 2.4 of the XML 1.0 specification:

    "The ampersand character (&) and the left angle bracket (<) MUST NOT
    appear in their literal form, except when used as markup delimiters, or
    within a comment, a processing instruction, or a CDATA section. If they
    are needed elsewhere, they MUST be escaped using either numeric character
    references or the strings "&amp;" and "&lt;" respectively. The right angle
    bracket (>) may be represented using the string "&gt;", and MUST, for
    compatibility, be escaped using either "&gt;" or a character reference
    when it appears in the string "]]>" in content, when that string is not
    marking the end of a CDATA section."
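
    For whoever produces the XML, the rule quoted above amounts to escaping
    three characters in element text before writing it out. A minimal sketch
    follows; the class and method names are hypothetical, and it covers
    element content only (attribute values would also need their quote
    characters escaped).

```java
// Hypothetical helper; escapes text for use as XML element content.
final class XmlTextEscaper {

    static String escapeText(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break; // must always be escaped
                case '<': out.append("&lt;");  break; // must always be escaped
                case '>': out.append("&gt;");  break; // keeps "]]>" safe in content
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <COLUMN_VALUE>Some&amp;Value</COLUMN_VALUE>
        System.out.println("<COLUMN_VALUE>" + escapeText("Some&Value") + "</COLUMN_VALUE>");
    }
}
```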

    > I think there is some changes/modification needed in DTD to treat the
    > string in XML which contains & as a literal, instead of expecting some
    > entity.


    You can't fix this in the DTD. The XML is not well-formed, and the
    parser is correct to reject it.

    > Adding to this, XML content is not under our control.


    Unfortunately, the only rational fix *is* to change the XML. Either use
    &amp; or wrap the element data in a CDATA section. If the XML is
    controlled by a third party, it would be reasonable to ask them to
    change it, since it is not really XML at all if it is not well-formed.
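
    Both corrections give the application the same text. Here is a small
    sketch, assuming the JAXP DOM parser and a cut-down document without the
    DOCTYPE line; the class and method names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class showing the two fixes side by side.
final class EscapedAmpersandDemo {

    // Parses the XML text and returns the content of the first COLUMN_VALUE element.
    static String columnValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("COLUMN_VALUE").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String escaped = "<COLUMN><COLUMN_VALUE>Some&amp;Value</COLUMN_VALUE></COLUMN>";
        String cdata = "<COLUMN><COLUMN_VALUE><![CDATA[Some&Value]]></COLUMN_VALUE></COLUMN>";
        System.out.println(columnValue(escaped)); // prints Some&Value
        System.out.println(columnValue(cdata));   // prints Some&Value
    }
}
```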

    Dan.

    --
    Daniel Dyer
    http://www.uncommons.org
     
    Daniel Dyer, Feb 14, 2007
    #2

  3. Alex Hunsley Guest


    As the other reply said, this is not well-formed XML. It shouldn't
    contain a 'naked' ampersand like that.
    Do you have any chance at all to speak to the producer of this XML? It's
    very reasonable to ask them to fix it. If you can't get them to fix it,
    then how about:

    1) Put in a fix yourself, e.g. a search-and-replace kludge on the
    content before the XML parser gets it: replace each naked '&' with
    '&amp;' (and handle any other illegal characters that crop up).
    2) At least tell the party producing the XML that it is broken. You may
    help someone else down the line by doing this, if not yourself.
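
    Option 1 could look roughly like the sketch below. It is a heuristic and
    not a real repair: the regular expression, class name, and method names
    are assumptions made for illustration. It leaves alone anything that
    already looks like an entity or character reference, so text such as
    "AT&T;" would be left as an (undefined) entity reference, and it would
    also rewrite ampersands inside CDATA sections or comments, where they are
    legal as they stand.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical pre-processing kludge; apply to the raw text before parsing.
final class NakedAmpersandFixer {

    // '&' NOT followed by "name;", "#digits;" or "#xHEX;".
    private static final Pattern NAKED_AMPERSAND = Pattern.compile(
            "&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static String escapeNakedAmpersands(String rawXml) {
        return NAKED_AMPERSAND.matcher(rawXml).replaceAll("&amp;");
    }

    // Repairs the text, parses it, and returns the first COLUMN_VALUE content.
    static String columnValue(String rawXml) {
        try {
            String repaired = escapeNakedAmpersands(rawXml);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(repaired)));
            return doc.getElementsByTagName("COLUMN_VALUE").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String broken = "<COLUMN><COLUMN_VALUE>Some&Value</COLUMN_VALUE></COLUMN>";
        System.out.println(escapeNakedAmpersands(broken));
        System.out.println(columnValue(broken)); // prints Some&Value
    }
}
```

    Reading the file into a String first and passing the repaired text to the
    parser keeps the kludge in one place, which makes it easy to delete once
    the producer fixes their output.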

    lex
     
    Alex Hunsley, Feb 15, 2007
    #3
  4. Guest

    Thanks Daniel,
    That info really helped.

    Regards
    Sohan
     
    , Feb 18, 2007
    #4
  5. Guest

    Thanks Lex,

    Sohan
     
    , Feb 18, 2007
    #5