XML and Java: euro signs disappear

flm

I've got an XML document that contains euro signs and looks like this:

<?xml version="1.0" encoding="utf-8"?>
<merchant id="52">
<product
offerid="03543068131"
deliverycost="6,90 €"
/>
....

I use this bit of Java (JDK 1.4.2) code to parse it:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse( file_ );

The problem is that the euro signs are transformed into the character '?'
(printing the value of getAttribute( "deliverycost" ) gives ? on a
UTF-8 terminal).

Thanks for any help,
FL
 
David Carlisle

You have declared that your XML file is UTF-8 encoded, but (as far as I
can tell) you have used a single byte with value 128 to represent the
euro. That is not the UTF-8 encoding of character 8364 (U+20AC), the
euro sign, which takes three bytes.
You need either to declare the encoding that you are actually using or
to express the character in an encoding-neutral form such as the
character reference
"& # 8364 ;"
(without the spaces).
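
To make the byte-level difference concrete, here is a minimal sketch that prints how the euro sign is encoded in UTF-8 and in windows-1252. The class name `EuroBytes` and the helper `hex` are made up for illustration, and the sketch assumes the JVM ships the `windows-1252` charset, which is not one of the charsets every Java platform is required to support. It sticks to pre-generics Java so it should also compile on an old JDK.

```java
import java.io.UnsupportedEncodingException;

public class EuroBytes {
    // Render a byte array as space-separated uppercase hex digits.
    public static String hex(byte[] bytes) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            String h = Integer.toHexString(bytes[i] & 0xFF).toUpperCase();
            if (h.length() < 2) {
                sb.append('0');
            }
            sb.append(h);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String euro = "\u20AC"; // EURO SIGN, decimal 8364

        // Three bytes in UTF-8: E2 82 AC
        System.out.println("UTF-8:        " + hex(euro.getBytes("UTF-8")));

        // One byte in windows-1252: 80 (decimal 128)
        System.out.println("windows-1252: " + hex(euro.getBytes("windows-1252")));
    }
}
```

A file that contains the single byte 80 for the euro is therefore a windows-1252 file, whatever its XML declaration says.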

David
 
Francois-Louis Mommens

Thanks for your reply, David.
If I use & # 8364; or even & # x20ac; as you recommend, I get the same
result.

FLM
 
Alain Ketterlin

flm said:
The problem is that the euro signs are transformed into the character '?'
(printing the value of getAttribute( "deliverycost" ) gives ? on a
UTF-8 terminal).

The problem is in the "printing", probably because your Writer object has
the wrong encoding and/or a mismatched locale, or because you use
System.out, which uses the locale-specified encoding, which may not be
UTF-8. It's probably best to give an explicit encoding/charset.
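
As a rough sketch of what "give an explicit encoding" could look like, the code below wraps an output stream in a PrintStream with a named charset. The class name `ExplicitEncodingOut` and the helper `printedBytes` are invented for this example; the three-argument PrintStream constructor is part of the JDK since 1.4. The helper captures the bytes instead of sending them to the terminal, which shows where the '?' comes from: a charset that has no euro sign substitutes '?' silently.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ExplicitEncodingOut {
    // Print text through a PrintStream that uses the named charset and
    // return the bytes that would have reached the terminal.
    public static byte[] printedBytes(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String price = "6,90 \u20AC";

        // US-ASCII cannot represent the euro sign, so it is replaced by '?'.
        System.out.println(printedBytes(price, "US-ASCII").length + " bytes as US-ASCII");

        // UTF-8 encodes the euro sign as three bytes.
        System.out.println(printedBytes(price, "UTF-8").length + " bytes as UTF-8");

        // The same idea applied to the real console: force UTF-8 output
        // regardless of the platform default encoding.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(price);
    }
}
```

Forcing UTF-8 only helps if the terminal really interprets its input as UTF-8; otherwise the three bytes show up as three unrelated characters.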

-- Alain.
 
Martin Honnen

Francois-Louis Mommens said:
If I use & # 8364; or even & # x20ac; as you recommend, I get the same
result.

Are you sure that your output terminal is able to render a Euro symbol
properly? What happens if you do not use XML at all but try to output a
Euro symbol '€' from a normal string?
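
One way to separate a parsing problem from a display problem is to print only ASCII: escape every non-ASCII character as a backslash-u sequence, once for a plain string and once for the attribute value that comes back from the parser. The sketch below assumes a tiny in-memory document modelled on the one in the question; `EuroCheck`, `escape` and `parseDeliveryCost` are hypothetical names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EuroCheck {
    // Replace every non-ASCII character by a backslash-u escape so the
    // result can be printed safely on any terminal.
    public static String escape(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                String h = Integer.toHexString(c).toUpperCase();
                sb.append("\\u");
                for (int pad = h.length(); pad < 4; pad++) {
                    sb.append('0');
                }
                sb.append(h);
            }
        }
        return sb.toString();
    }

    // Parse the document and return the deliverycost attribute of the root element.
    public static String parseDeliveryCost(byte[] xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getAttribute("deliverycost");
        } catch (Exception e) {
            throw new RuntimeException("Parse failed: " + e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Plain string, no XML involved.
        System.out.println("plain:  " + escape("6,90 \u20AC"));

        // Same value read back from the parser (numeric character reference).
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                   + "<product deliverycost=\"6,90 &#8364;\"/>";
        System.out.println("parsed: " + escape(parseDeliveryCost(xml.getBytes("UTF-8"))));
    }
}
```

If both lines end in the escape for U+20AC, the parser has done its job and the '?' is introduced later, when the string is written to the terminal.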
 
Rob van der Putten

Hi there

flm said:
I've got an XML document that contains euro signs and looks like this:

<?xml version="1.0" encoding="utf-8"?>
<merchant id="52">
<product
offerid="03543068131"
deliverycost="6,90 ?"
/>
...

I use this bit of Java (jdk 1.4.2) code to parse it :

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse( file_ );

The problem is that the euro signs are transformed into the character '?'
(printing the value of getAttribute( "deliverycost" ) gives ? on a
UTF-8 terminal).

If you want to post a UTF-8 file, use UTF-8 as the charset: set the default
charset in your browser / newsreader to UTF-8.

Set your locale to UTF-8, e.g. en_GB.UTF-8 or en_US.UTF-8.
Set the default character set of your editor to UTF-8.
Use a UTF-8 enabled terminal such as uxterm.
Install Unicode fonts such as Cyberbit.ttf, Arial Unicode or GNU Unifont
and make a Unicode font your default font.
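
To see which encoding the JVM picks up from the locale, something like the following sketch may help. `DefaultEncoding` and `consoleEncoding` are illustrative names; the reported values depend on the JVM version, the platform and the locale settings, and the encoding may be reported under a historical name such as UTF8 or Cp1252.

```java
import java.io.OutputStreamWriter;

public class DefaultEncoding {
    // Name of the encoding that a Writer created without an explicit charset
    // uses, i.e. the platform default derived from the locale.
    public static String consoleEncoding() {
        // The writer is deliberately not closed: closing it would close System.out.
        return new OutputStreamWriter(System.out).getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding property: " + System.getProperty("file.encoding"));
        System.out.println("default writer encoding: " + consoleEncoding());
    }
}
```

If this does not report a UTF-8 variant while the terminal expects UTF-8, the locale seen by the JVM is the first thing to check.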


Regards,
Rob
 
Rob vd Putten

Hi there

If you want to post a UTF-8 file, use UTF-8 as the charset: set the default
charset in your browser / newsreader to UTF-8.

Set your locale to UTF-8, e.g. en_GB.UTF-8 or en_US.UTF-8.
Set the default character set of your editor to UTF-8.
Use a UTF-8 enabled terminal such as uxterm.
Install Unicode fonts such as Cyberbit.ttf, Arial Unicode or GNU Unifont
and make a Unicode font your default font.

If all goes well, this should be UTF-8:

Nicer typography in plain text files:

╔══════════════════════════════════════════╗
║                                          ║
║   • ‘single’ and “double” quotes         ║
║                                          ║
║   • Curly apostrophes: “We’ve been here” ║
║                                          ║
║   • Latin-1 apostrophe and accents: '´`  ║
║                                          ║
║   • ‚deutsche‘ „Anführungszeichen“       ║
║                                          ║
║   • †, ‡, ‰, •, 3–4, —, −5/+5, ™, …      ║
║                                          ║
║   • ASCII safety test: 1lI|, 0OD, 8B     ║
║                      ╭─────────╮         ║
║   • the euro symbol: │ 14.95 € │         ║
║                      ╰─────────╯         ║
╚══════════════════════════════════════════╝

Russian:

From a Unicode conference invitation:

ЗарегиÑтрируйтеÑÑŒ ÑÐµÐ¹Ñ‡Ð°Ñ Ð½Ð° ДеÑÑтую Международную Конференцию по
Unicode, ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ ÑоÑтоитÑÑ 10-12 марта 1997 года в Майнце в Германии.
ÐšÐ¾Ð½Ñ„ÐµÑ€ÐµÐ½Ñ†Ð¸Ñ Ñоберет широкий круг ÑкÑпертов по вопроÑам глобального
Интернета и Unicode, локализации и интернационализации, воплощению и
применению Unicode в различных операционных ÑиÑтемах и программных
приложениÑÑ…, шрифтах, верÑтке и многоÑзычных компьютерных ÑиÑтемах.

Greek:

From a speech of Demosthenes in the 4th century BC:

Οὐχὶ ταὐτὰ παρίσταταί μοι γιγνώσκειν, ὦ ἄνδρες ᾿Αθηναῖοι,
ὅταν τ᾿ εἰς τὰ πράγματα ἀποβλέψω καὶ ὅταν πρὸς τοὺς
λόγους οὓς ἀκούω· τοὺς μὲν γὰρ λόγους περὶ τοῦ
τιμωρήσασθαι Φίλιππον ὁρῶ γιγνομένους, τὰ δὲ πράγματ᾿
εἰς τοῦτο προήκοντα, ὥσθ᾿ ὅπως μὴ πεισόμεθ᾿ αὐτοὶ
πρότερον κακῶς σκέψασθαι δέον. οὐδέν οὖν ἄλλο μοι δοκοῦσιν
οἱ τὰ τοιαῦτα λέγοντες ἢ τὴν ὑπόθεσιν, περὶ ἧς βουλεύεσθαι,
οὐχὶ τὴν οὖσαν παριστάντες ὑμῖν ἁμαρτάνειν. ἐγὼ δέ, ὅτι μέν
ποτ᾿ ἐξῆν τῇ πόλει καὶ τὰ αὑτῆς ἔχειν ἀσφαλῶς καὶ Φίλιππον
τιμωρήσασθαι, καὶ μάλ᾿ ἀκριβῶς οἶδα· ἐπ᾿ ἐμοῦ γάρ, οὐ πάλαι
γέγονεν ταῦτ᾿ ἀμφότερα· νῦν μέντοι πέπεισμαι τοῦθ᾿ ἱκανὸν
προλαβεῖν ἡμῖν εἶναι τὴν πρώτην, ὅπως τοὺς συμμάχους
σώσομεν. ἐὰν γὰρ τοῦτο βεβαίως ὑπάρξῃ, τότε καὶ περὶ τοῦ
τίνα τιμωρήσεταί τις καὶ ὃν τρόπον ἐξέσται σκοπεῖν· πρὶν δὲ
τὴν ἀρχὴν ὀρθῶς ὑποθέσθαι, μάταιον ἡγοῦμαι περὶ τῆς
τελευτῆς ὁντινοῦν ποιεῖσθαι λόγον.

Δημοσθένους, Γ´ ᾿Ολυνθιακὸς

All the display, editing and conversion software you use should also be
capable of handling UTF-8.


Regards,
Rob
 
Rob van der Putten

Hi there


Martin said:
Are you sure that output terminal is able to render a Euro symbol
properly? What happens if you do not use XML at all but try to output a
Euro symbol '?' from a normal string?

Most UTF-8 environments display dec 128 / hex 0x80 as a glyph looking
something like:

+----+
| 00 |
| 80 |
+----+

The same applies to other glyphs in the 128 / 0x80 ... 159 / 0x9F range:

+----+
| 00 |
| 9F |
+----+

Maybe UTF-8 is somehow being converted to CP-1252.
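
The effect of such a mix-up can be reproduced with a small sketch that decodes the same bytes under different charsets and prints the resulting code points in ASCII. `DecodeByte80`, `decode` and `codePoints` are names invented here, and the sketch assumes the JVM supports `windows-1252` (the Java name for CP-1252), which is common but not guaranteed on every platform.

```java
import java.io.UnsupportedEncodingException;

public class DecodeByte80 {
    // Decode bytes with the named charset; malformed input is replaced, not rejected.
    public static String decode(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName);
        }
    }

    // List the UTF-16 code units of a string as "U+XXXX" tokens.
    public static String codePoints(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            String h = Integer.toHexString(s.charAt(i)).toUpperCase();
            sb.append("U+");
            for (int pad = h.length(); pad < 4; pad++) {
                sb.append('0');
            }
            sb.append(h);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] single = { (byte) 0x80 };
        // A control character in Latin-1, the euro in CP-1252, invalid in UTF-8.
        System.out.println("0x80 as ISO-8859-1:   " + codePoints(decode(single, "ISO-8859-1")));
        System.out.println("0x80 as windows-1252: " + codePoints(decode(single, "windows-1252")));
        System.out.println("0x80 as UTF-8:        " + codePoints(decode(single, "UTF-8")));

        byte[] utf8Euro = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };
        // The UTF-8 euro read as CP-1252 turns into three unrelated characters.
        System.out.println("E2 82 AC as windows-1252: "
                + codePoints(decode(utf8Euro, "windows-1252")));
    }
}
```

The first three lines report U+0080, U+20AC and U+FFFD respectively, and the last one reports U+00E2 U+201A U+00AC, which is the familiar three-character mojibake for a euro sign.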

Try yudit, http://www.yudit.org/, to view and edit your files.


Regards,
Rob
 
