Re: SAX parser splits URL ...

Discussion in 'Java' started by Robert Klemme, Jun 27, 2012.

  1. On 27.06.2012 05:50, lbrt chx _ gemale wrote:
    > I have a URL in an XML file that looks like this:
    > ~
    > ...
    > <Location>http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip</Location>
    > ...
    > ~
    > http://xsdvalidation.utilities-online.info/
    > ~
    > is telling me the document itself is valid, but the SAX parser is
    > splitting the value at every "&"
    > ~
    > // __ start element iIxLvl: |3|Location
    > // __ start characters iIxLvl: |3|http://pagesinxt.com/?dn=www.outfo.org|
    > // __ start characters iIxLvl: |3|&|
    > // __ start characters iIxLvl: |3|flrdr=yes|
    > // __ start characters iIxLvl: |3|&|
    > // __ start characters iIxLvl: |3|nxte=zip|
    > // __ end element iIxLvl: |2|Location|
    > ~
    > I found some sort of an explanation here:
    > ~
    > http://stackoverflow.com/questions/1328538/how-do-i-escape-ampersands-in-xml
    > ~
    > which I couldn't make much sense of (I tried a few things)
    > ~
    > Is this related to a setting in the parser? Is there a way to fix that problem?


    That's not related to the parser - at least not to a particular one. It
    is a feature of XML: "&" starts an entity or character reference, which
    lets you include characters that have a special meaning in markup or
    that are not supported by the encoding you use when writing the
    document. A literal "&" therefore has to be escaped.

    The concept is known as "XML entity". Please see
    http://www.tizag.com/xmlTutorial/xmlentity.php
    http://www.javacommerce.com/displaypage.jsp?name=entities.sql&id=18238

    The relevant section of the standard:
    http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-references

    Bottom line: write each "&" in the URL as "&amp;", like this:

    <Location>http://pagesinxt.com/?dn=www.outfo.org&amp;flrdr=yes&amp;nxte=zip</Location>

    But please read up on XML more thoroughly - it pays off.
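
    If the XML is assembled as a string in your own code, the escaping can be
    sketched roughly as below. This is only a minimal sketch: the class name
    XmlEscapeSketch and the helper escapeXmlText are made up for illustration,
    and it only covers text content (attribute values would also need quotes
    escaped). A real XML writer API does this for you.

```java
public class XmlEscapeSketch {

    // Hypothetical helper (not part of any library): escapes the characters
    // that act as markup inside XML text content.
    // '&' is replaced first so the entities produced afterwards are not
    // escaped a second time.
    public static String escapeXmlText(String raw) {
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String url = "http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip";
        // Prints the <Location> element with both ampersands written as &amp;
        System.out.println("<Location>" + escapeXmlText(url) + "</Location>");
    }
}
```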

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Jun 27, 2012
    #1

  2. On Wednesday, June 27, 2012 7:34:18 AM UTC+2, Robert Klemme wrote:
    > On 27.06.2012 05:50, lbrt chx _ gemale wrote:
    > > I have a URL in an XML file that looks like this:
    > > ~
    > > ...
    > > <Location>http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip</Location>
    > > ...
    > > ~
    > > http://xsdvalidation.utilities-online.info/
    > > ~
    > > is telling me the document itself is valid, but the SAX parser is
    > > splitting the value at every "&"
    > > ~
    > > // __ start element iIxLvl: |3|Location
    > > // __ start characters iIxLvl: |3|http://pagesinxt.com/?dn=www.outfo.org|
    > > // __ start characters iIxLvl: |3|&|
    > > // __ start characters iIxLvl: |3|flrdr=yes|
    > > // __ start characters iIxLvl: |3|&|
    > > // __ start characters iIxLvl: |3|nxte=zip|
    > > // __ end element iIxLvl: |2|Location|


    I forgot to mention one thing: the SAX parser is free to hand over character data in any number of chunks, as long as it preserves the original order from the document and all characters in a single chunk come from the same external entity. That is why you see several characters() calls for one element; the handler has to concatenate them itself. See:

    http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)
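
    Collecting the chunks could look roughly like the sketch below. The class
    name LocationHandlerSketch and the helper parseLocation are assumptions
    made up for this example; only the JDK's javax.xml.parsers and
    org.xml.sax classes are real API. The handler buffers every characters()
    call and reads the buffer in endElement(). It resets the buffer on each
    start tag, which is fine for leaf elements like Location but would drop
    text in mixed content.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LocationHandlerSketch extends DefaultHandler {

    // Accumulates the text of the element currently being read.
    private final StringBuilder text = new StringBuilder();
    private String location;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start with an empty buffer for every element (leaf elements only).
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's text known to be complete.
        if ("Location".equals(qName)) {
            location = text.toString();
        }
    }

    // Hypothetical convenience method: parses the XML string and returns the
    // text of the last <Location> element, or null if there is none.
    public static String parseLocation(String xml) {
        LocationHandlerSketch handler = new LocationHandlerSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.location;
    }

    public static void main(String[] args) {
        String xml = "<Doc><Location>http://pagesinxt.com/?dn=www.outfo.org"
                + "&amp;flrdr=yes&amp;nxte=zip</Location></Doc>";
        // Prints the URL in one piece, with literal '&' characters.
        System.out.println(parseLocation(xml));
    }
}
```

    However many chunks the parser delivers, the handler ends up with the
    complete URL once the end tag is reached.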

    Kind regards

    robert
     
    Robert Klemme, Jun 27, 2012
    #2