XML and Java: euro signs disappear

Discussion in 'XML' started by flm, May 11, 2005.

  1. flm

    flm Guest

    I've got an XML document that contains euro signs and looks like :

    <?xml version="1.0" encoding="utf-8"?>
    <merchant id="52">
    <product
    offerid="03543068131"
    deliverycost="6,90 €"
    />
    ....

    I use this bit of Java (jdk 1.4.2) code to parse it :

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse( file_ );

    The problem is that the euro signs are transformed into the character '?'
    (printing the value of a getAttribute( "deliverycost" ) gives ? on a
    utf-8 terminal)

    Thanks for any help,
    FL
     
    flm, May 11, 2005
    #1

  2. You have declared that your XML file is UTF-8 encoded but have used (as
    far as I can tell) a byte with value 128 to represent the euro, which isn't
    the UTF-8 encoding of character 8364, the euro sign.
    You either need to declare the encoding that you are actually using or
    express the character in an encoding-neutral form such as
    "& # 8364 ;"
    (without the spaces).
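To make the byte-level difference concrete, here is a small sketch (the class name `EuroBytes` and the helper `hex` are made up for illustration, and it assumes the JDK in use ships the `windows-1252` charset, which is where a lone byte 128 for the euro usually comes from). It shows how character 8364 is encoded in UTF-8 versus windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class; not part of the original poster's code.
class EuroBytes {

    /** Formats bytes as space-separated upper-case hex, e.g. "E2 82 AC". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // character 8364, the euro sign

        // UTF-8 needs three bytes for U+20AC.
        System.out.println("UTF-8:        " + hex(euro.getBytes(StandardCharsets.UTF_8)));

        // Assumption: windows-1252 is available; it uses the single byte 0x80 (128).
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println("windows-1252: " + hex(euro.getBytes(cp1252)));
    }
}
```

The first line printed ends in `E2 82 AC` and the second in `80`, so a file containing a bare byte 128 for the euro is not valid UTF-8, whatever its XML declaration says.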

    David
     
    David Carlisle, May 11, 2005
    #2

  3. Thank you for your reply, David.
    If I use & # 8364; or even & # x20ac; as you recommend, I get the same
    result.
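One way to separate a parsing problem from a display problem is to print the code points of the attribute value instead of the glyphs. The sketch below is only an illustration: the class `EuroCodePoints`, its helper names and the one-element sample document are assumptions, not the original file. It parses a document that uses the numeric reference and dumps each character as `U+XXXX`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustrative class; the sample XML is a cut-down stand-in for the real file.
class EuroCodePoints {

    /** Parses the XML and returns the deliverycost attribute of the root element. */
    static String deliveryCost(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return document.getDocumentElement().getAttribute("deliverycost");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    /** Renders each char as U+XXXX so the result does not depend on the terminal. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<product offerid=\"03543068131\" deliverycost=\"6,90 &#8364;\"/>";
        System.out.println(codePoints(deliveryCost(xml)));
    }
}
```

If the last code point printed is `U+20AC`, the parser has delivered the euro sign correctly and the '?' is being introduced later, when the string is written out.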

    FLM

     
    Francois-Louis Mommens, May 11, 2005
    #3
  4. "flm" <> writes:

    > The problem is that the euro signs are transformed into the character '?'
    > (printing the value of a getAttribute( "deliverycost" ) gives ? on a
    > utf-8 terminal)


    The problem is in the "printing", probably because your Writer object has
    the wrong encoding and/or a mismatched locale, or because you use
    System.out, which uses the locale-specified encoding, which may not be
    UTF-8. It's probably best to give an explicit encoding/charset.
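As a minimal sketch of "give an explicit encoding": wrap the output stream in a `PrintStream` that is told to use UTF-8, instead of relying on whatever `System.out` picked up from the locale. The class name `Utf8Out` and the helper `utf8` are illustrative; the `PrintStream(OutputStream, boolean, String)` constructor is available from JDK 1.4 onwards.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Illustrative class; not part of the original poster's code.
class Utf8Out {

    /** Wraps a stream in an auto-flushing PrintStream that always encodes as UTF-8. */
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Written as the bytes E2 82 AC regardless of the default encoding.
        out.println("6,90 \u20AC");
    }
}
```

This only fixes the bytes that leave the JVM; the terminal still has to interpret them as UTF-8 and have a font with a euro glyph.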

    -- Alain.
     
    Alain Ketterlin, May 11, 2005
    #4
  5. Francois-Louis Mommens wrote:

    > If I use & # 8364; or even & # x20ac; as you recommend, I get the same
    > result.


    Are you sure that your output terminal is able to render a Euro symbol
    properly? What happens if you do not use XML at all but try to output a
    Euro symbol '€' from a normal string?
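A sketch of that test without any XML involved (the class `EuroDefaultCharset` and its helper are hypothetical names; `Charset.defaultCharset()` needs Java 5 or later, so this would not compile on JDK 1.4.2). It reports the default charset, whether that charset can represent the euro at all, and then prints the euro from a plain string.

```java
import java.nio.charset.Charset;

// Illustrative class; not part of the original poster's code.
class EuroDefaultCharset {

    /** True if the given charset supports encoding and can represent U+20AC. */
    static boolean canEncodeEuro(Charset cs) {
        return cs.canEncode() && cs.newEncoder().canEncode('\u20AC');
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        System.out.println("default charset: " + def.name());
        System.out.println("can encode euro: " + canEncodeEuro(def));

        // If the charset cannot encode the euro, Java substitutes '?' on output.
        System.out.println("plain string:    \u20AC");
    }
}
```

A default such as ISO-8859-1 or US-ASCII cannot encode the euro, which would explain a '?' even when the parsed value is correct. On recent JDKs the encoding used by `System.out` may differ from the default charset, so treat the first two lines as a hint and the third as the actual test.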

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, May 11, 2005
    #5
  6. Hi there


    flm wrote:

    > I've got an XML document that contains euro signs and looks like :
    >
    > <?xml version="1.0" encoding="utf-8"?>
    > <merchant id="52">
    > <product
    > offerid="03543068131"
    > deliverycost="6,90 ?"
    > />
    > ...
    >
    > I use this bit of Java (jdk 1.4.2) code to parse it :
    >
    > DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    > DocumentBuilder builder = factory.newDocumentBuilder();
    > Document document = builder.parse( file_ );
    >
    > The problem is that the euro signs are transformed into the character '?'
    > (printing the value of a getAttribute( "deliverycost" ) gives ? on a
    > utf-8 terminal)


    If you want to post a UTF-8 file, use UTF-8 as the charset; set the default
    charset in your browser / newsreader to UTF-8.

    Set your locale to UTF-8, e.g. en_GB.UTF-8 or en_US.UTF-8.
    Set the default character set of your editor to UTF-8.
    Use a UTF-8 enabled terminal such as uxterm.
    Install Unicode fonts such as Cyberbit.ttf, Arial Unicode or GNU Unifont,
    and set a Unicode font as your default font.


    Regards,
    Rob
    --
    +----------------------------------------------------------------------+
    | The EU constitution will turn the EU into an USA colony |
    | Vote against the EU constitution in the referendum |
    +----------------------------------------------------------------------+
     
    Rob van der Putten, May 12, 2005
    #6
  7. Hi there


    Rob van der Putten wrote:

    > If you want to post a UTF-8 file, use UTF-8 as the charset; set the default
    > charset in your browser / newsreader to UTF-8.
    >
    > Set your locale to UTF-8, e.g. en_GB.UTF-8 or en_US.UTF-8.
    > Set the default character set of your editor to UTF-8.
    > Use a UTF-8 enabled terminal such as uxterm.
    > Install Unicode fonts such as Cyberbit.ttf, Arial Unicode or GNU Unifont,
    > and set a Unicode font as your default font.


    If all goes well, this should be UTF-8;

    Nicer typography in plain text files:

    ╔══════════════════════════════════════════╗
    ║                                          ║
    ║   • ‘single’ and “double” quotes         ║
    ║                                          ║
    ║   • Curly apostrophes: “We’ve been here” ║
    ║                                          ║
    ║   • Latin-1 apostrophe and accents: '´`  ║
    ║                                          ║
    ║   • ‚deutsche‘ „Anführungszeichen“       ║
    ║                                          ║
    ║   • †, ‡, ‰, •, 3–4, —, −5/+5, ™, …      ║
    ║                                          ║
    ║   • ASCII safety test: 1lI|, 0OD, 8B     ║
    ║                      ╭─────────╮         ║
    ║   • the euro symbol: │ 14.95 € │         ║
    ║                      ╰─────────╯         ║
    ╚══════════════════════════════════════════╝

    Russian:

    From a Unicode conference invitation:

    Зарегистрируйтесь сейчас на Десятую Международную Конференцию по
    Unicode, которая состоится 10-12 марта 1997 года в Майнце в Германии.
    Конференция соберет широкий круг экспертов по вопросам глобального
    Интернета и Unicode, локализации и интернационализации, воплощению и
    применению Unicode в различных операционных системах и программных
    приложениях, шрифтах, верстке и многоязычных компьютерных системах.

    Greek:

    From a speech of Demosthenes in the 4th century BC:

    Οὐχὶ ταὐτὰ παρίσταταί μοι γιγνώσκειν, ὦ ἄνδρες ᾿Αθηναῖοι,
    ὅταν τ᾿ εἰς τὰ πράγματα ἀποβλέψω καὶ ὅταν πρὸς τοὺς
    λόγους οὓς ἀκούω· τοὺς μὲν γὰρ λόγους περὶ τοῦ
    τιμωρήσασθαι Φίλιππον ὁρῶ γιγνομένους, τὰ δὲ πράγματ᾿
    εἰς τοῦτο προήκοντα, ὥσθ᾿ ὅπως μὴ πεισόμεθ᾿ αὐτοὶ
    πρότερον κακῶς σκέψασθαι δέον. οὐδέν οὖν ἄλλο μοι δοκοῦσιν
    οἱ τὰ τοιαῦτα λέγοντες ἢ τὴν ὑπόθεσιν, περὶ ἧς βουλεύεσθαι,
    οὐχὶ τὴν οὖσαν παριστάντες ὑμῖν ἁμαρτάνειν. ἐγὼ δέ, ὅτι μέν
    ποτ᾿ ἐξῆν τῇ πόλει καὶ τὰ αὑτῆς ἔχειν ἀσφαλῶς καὶ Φίλιππον
    τιμωρήσασθαι, καὶ μάλ᾿ ἀκριβῶς οἶδα· ἐπ᾿ ἐμοῦ γάρ, οὐ πάλαι
    γέγονεν ταῦτ᾿ ἀμφότερα· νῦν μέντοι πέπεισμαι τοῦθ᾿ ἱκανὸν
    προλαβεῖν ἡμῖν εἶναι τὴν πρώτην, ὅπως τοὺς συμμάχους
    σώσομεν. ἐὰν γὰρ τοῦτο βεβαίως ὑπάρξῃ, τότε καὶ περὶ τοῦ
    τίνα τιμωρήσεταί τις καὶ ὃν τρόπον ἐξέσται σκοπεῖν· πρὶν δὲ
    τὴν ἀρχὴν ὀρθῶς ὑποθέσθαι, μάταιον ἡγοῦμαι περὶ τῆς
    τελευτῆς ὁντινοῦν ποιεῖσθαι λόγον.

    Δημοσθένους, Γ´ ᾿Ολυνθιακὸς

    All the display, editing and conversion software you use should also be
    capable of handling UTF-8.


    Regards,
    Rob
     
    Rob vd Putten, May 12, 2005
    #7
  8. Hi there


    Martin Honnen wrote:

    > Are you sure that output terminal is able to render a Euro symbol
    > properly? What happens if you do not use XML at all but try to output a
    > Euro symbol '?' from a normal string?


    Most UTF-8 environments display dec 128 / hex 0x80 as a glyph looking
    something like:

    +----+
    | 00 |
    | 80 |
    +----+

    The same applies to other glyphs in the 128 / 0x80 ... 159 / 0x9F range:

    +----+
    | 00 |
    | 9F |
    +----+

    Maybe the UTF-8 is somehow being converted to CP-1252.
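To illustrate how much depends on which charset a decoder assumes, the sketch below (illustrative names `Byte80`, `decode` and `codePoints`; it assumes `windows-1252` is available in the JDK) decodes the single byte 0x80 three ways, and then decodes the UTF-8 bytes of the euro as if they were CP-1252.

```java
import java.nio.charset.Charset;

// Illustrative class; not part of the original poster's code.
class Byte80 {

    /** Decodes the bytes using the named charset. */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    /** Renders each char as U+XXXX so the result does not depend on the terminal. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] b80 = { (byte) 0x80 };
        // CP-1252 maps 0x80 to the euro sign.
        System.out.println("windows-1252: " + codePoints(decode(b80, "windows-1252")));
        // ISO-8859-1 maps it to the control character U+0080 (the boxed "0080" glyph).
        System.out.println("ISO-8859-1:   " + codePoints(decode(b80, "ISO-8859-1")));
        // A lone 0x80 is malformed UTF-8 and becomes the replacement character.
        System.out.println("UTF-8:        " + codePoints(decode(b80, "UTF-8")));

        // The reverse mix-up: the euro's UTF-8 bytes read as CP-1252 give three characters.
        byte[] euroUtf8 = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };
        System.out.println("UTF-8 bytes as windows-1252: "
                + codePoints(decode(euroUtf8, "windows-1252")));
    }
}
```

The first three lines print `U+20AC`, `U+0080` and `U+FFFD` respectively; the last prints `U+00E2 U+201A U+00AC`, the three-character sequence that shows up in place of a euro when UTF-8 text is read as CP-1252.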

    Try yudit, http://www.yudit.org/ to view and edit your files.


    Regards,
    Rob
     
    Rob van der Putten, May 12, 2005
    #8