Incorrect parsing of special characters

Discussion in 'Java' started by Dario Di Bella, Jun 17, 2004.

  1. Hi all,
    I hope someone can help me with this. I need to parse the following XML:

    ....
    <area name="promotore">
    <item id="004" code="003" description="attivita promotore">
    <![CDATA[»&nbsp;Attività&nbsp;Promotore]]>
    </item>
    </area>
    ....

    As you can see, I used a CDATA section to include special characters.
    Unfortunately, when I parse the file, the content of the "item" element
    turns out to be:

    Â»&nbsp;AttivitÃ &nbsp;Promotore

    i.e. the "Â" character is inserted at the beginning of the string and
    the "à" character is translated into "Ã ".

    I'm using the javax.xml.parsers.DocumentBuilder parser.

    Has anyone got any clue? Thanks.

    Dario
     
    Dario Di Bella, Jun 17, 2004
    #1

  2. Dario Di Bella wrote:

    > <![CDATA[»&nbsp;Attività&nbsp;Promotore]]>
    > Â»&nbsp;AttivitÃ &nbsp;Promotore
    >
    > i.e. the "Â" character is inserted at the beginning of the string and
    > the "à" character is translated into "Ã ".


    Check your charset encoding. This looks very much as if the encoding in
    which the XML comes and the encoding used to read it don't match.

    /Thomas
     
    Thomas Weidenfeller, Jun 18, 2004
    #2
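
One way to act on this advice is to ask the parser which encoding the document itself declares. The following is a minimal sketch, not code from the thread: the class and method names are made up for illustration, and it assumes a JAXP implementation that supports DOM Level 3 (Java 5 or later), where `Document.getXmlEncoding()` is available.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
class EncodingCheck {
    // Returns the encoding named in the XML declaration of the given bytes.
    // The DOM Level 3 specification says this is null when none is declared.
    static String declaredEncoding(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getXmlEncoding();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><area name=\"promotore\"/>";
        System.out.println(declaredEncoding(xml.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

If the value reported here differs from the encoding the file was actually saved in, the mismatch described above is the likely culprit.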

  3. Dario Di Bella wrote:
    > As you can see, I used a CDATA section to include special characters.
    > Unfortunately, when I parse the file, the content of the "item" element
    > turns out to be:
    >
    > Â»&nbsp;AttivitÃ &nbsp;Promotore
    >
    > i.e. the "Â" character is inserted at the beginning of the string and
    > the "à" character is translated into "Ã ".


    Does your document correctly declare its encoding? If it declares
    none, the parser defaults to UTF-8, whereas Windows text editors
    usually default to CP1252. Trying to parse CP1252-encoded text as
    UTF-8 could easily lead to the weirdness you describe.
     
    Michael Borgwardt, Jun 18, 2004
    #3
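
The symptoms from the first post (an extra "Â" before "»", and "à" turning into "Ã" plus a space-like character) are what typically appears when UTF-8 bytes are decoded with a single-byte charset such as ISO-8859-1 or CP1252. The sketch below only reproduces that effect on a string; it makes no claim about where in Dario's setup the mismatch occurred, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // which is the kind of mismatch discussed in this thread.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00BB is the right-pointing guillemet, U+00E0 is a-grave.
        String original = "\u00BB&nbsp;Attivit\u00E0&nbsp;Promotore";
        // U+00BB is bytes C2 BB in UTF-8, so it decodes to U+00C2 U+00BB.
        // U+00E0 is bytes C3 A0 in UTF-8, so it decodes to U+00C3 U+00A0.
        System.out.println(misread(original));
    }
}
```

Each non-ASCII character becomes two characters because UTF-8 stores it in two bytes, and a single-byte charset maps every byte to its own character.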
  4. Bjoern/Michael/Thomas,

    I solved this issue by declaring a different encoding ("iso-8859-1"
    instead of "utf-8"). Thank you very much for your help, and excuse me
    for bothering you with a trivial problem ;-)

    Best regards.

    Dario.
     
    Dario Di Bella, Jun 18, 2004
    #4
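
For readers who land here with the same problem, the fix can be sketched as follows. This is an illustrative reconstruction, not Dario's actual code: it assumes the file really is saved as ISO-8859-1, builds the document in memory instead of reading it from disk, and uses hypothetical class and method names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, for illustration only.
class DeclaredEncodingDemo {
    // Parses the bytes with javax.xml.parsers.DocumentBuilder and returns
    // the text content of the root element. The parser picks the charset
    // from the XML declaration because it is handed raw bytes.
    static String parseRootText(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // The declaration names the same encoding that is used to produce the bytes.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<item><![CDATA[\u00BB&nbsp;Attivit\u00E0&nbsp;Promotore]]></item>";
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseRootText(latin1Bytes));
    }
}
```

The key point is that the declared encoding and the encoding of the bytes on disk agree. Saving the file as UTF-8 and keeping the "utf-8" declaration would be an equally valid way to make them match.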