utf-8 not working

David Thielen

I have:

<?xml version="1.0" encoding="UTF-8"?>
<FLD DSC="KEYWORD">Reiner's Geschäftschance</FLD>

And when I do a Node.valueOf() I get back Reiner's GeschÃ¤ftschance
instead of Reiner's Geschäftschance

Any idea what might be going on? I'm using dom4j and java 1.4.

Thanks - dave


david@[email protected]
Windward Reports -- http://www.WindwardReports.com
DefendTek -- http://www.DefendTek.com
Page 2 Stage -- http://www.Page2Stage.com
Enemy Nations -- http://www.EnemyNations.com
me -- http://dave.thielen.com
Barbie Science Fair -- http://www.BarbieScienceFair.info
Hillary Clinton -- http://www.HillaryIn2004.org
(yes I have lots of links)
 
Jon Skeet

David Thielen said:
I have:

<?xml version="1.0" encoding="UTF-8"?>
<FLD DSC="KEYWORD">Reiner's Geschäftschance</FLD>

And when I do a Node.valueOf() I get back Reiner's GeschÃ¤ftschance
instead of Reiner's Geschäftschance

Any idea what might be going on? I'm using dom4j and java 1.4.

How are you displaying the data? The best thing to do here is look at
each phase in turn. In this case:

o What are the relevant bytes in the file?
o What is the Unicode character value (as an integer) of the character
after 'h'?

Take fonts and everything else out of the picture, just look at the
UTF-8 encoded values before loading, and the Unicode value after.
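The two checks above can be sketched roughly as follows. This is only an illustration: the class and method names (EncodingPhases, toHex, valueAfter) are made up for this example, and the byte array is an in-memory stand-in for the bytes you would actually read from the XML file. It is written for a current JDK; on Java 1.4 you would pass the charset name "UTF-8" as a string and avoid String.format.

```java
import java.nio.charset.StandardCharsets;

final class EncodingPhases {
    /** Formats bytes as space-separated two-digit hex values. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    /** Returns the numeric value of the char right after the first marker, or -1. */
    static int valueAfter(String text, char marker) {
        int i = text.indexOf(marker);
        return (i >= 0 && i + 1 < text.length()) ? text.charAt(i + 1) : -1;
    }

    public static void main(String[] args) {
        // Assumption: stand-in for the relevant bytes of the file on disk.
        byte[] fileBytes = "Gesch\u00e4ft".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(fileBytes));

        // Phase 2: the value of the character after 'h' once decoded.
        String decoded = new String(fileBytes, StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(valueAfter(decoded, 'h')));
    }
}
```

If the file really is UTF-8, the bytes after 68 ('h') should be c3 a4, and the decoded character after 'h' should have the value e4. Two characters there (c3 followed by a4) would point at the bytes being decoded with a single-byte charset somewhere along the way.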
 
Timo Kinnunen

David Thielen said:
I have:

<?xml version="1.0" encoding="UTF-8"?>
<FLD DSC="KEYWORD">Reiner's Geschäftschance</FLD>

And when I do a Node.valueOf() I get back Reiner's
GeschÃ¤ftschance instead of Reiner's Geschäftschance

Any idea what might be going on? I'm using dom4j and java 1.4.

Assuming that I copied the same bytes that you pasted into your
message, Emacs, IE and Mozilla agree that c3a4 (ä) should be an ä.

If you are using console output, a simple String.length() display
should tell you whether you have extra characters in your string.
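A small sketch of why the length check is telling, assuming the problem is UTF-8 bytes being decoded as ISO-8859-1 (that cause is a guess, not something established in this thread). The class and method names are hypothetical, and the code targets a current JDK rather than Java 1.4.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LengthCheck {
    /** Decodes the bytes with the given charset and returns the String length. */
    static int decodedLength(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        // The text from the original post, encoded as UTF-8.
        byte[] utf8 = "Reiner's Gesch\u00e4ftschance".getBytes(StandardCharsets.UTF_8);

        // Decoded correctly, the a-umlaut is a single char.
        System.out.println(decodedLength(utf8, StandardCharsets.UTF_8));

        // Decoded as ISO-8859-1, each of its two bytes becomes its own char,
        // so the String is one char longer.
        System.out.println(decodedLength(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

A length one greater than expected for each non-ASCII letter suggests the string itself is wrong, not just the way it is displayed.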
 
David Thielen

I'm looking at the String contents with the debugger in the line after

String val = node.valueOf("/root/element");

very weird



 
Jon Skeet

David Thielen said:
I'm looking at the String contents with the debugger in the line after

String val = node.valueOf("/root/element");

Rather than look in the debugger, dump it out one character at a time -
I've seen various debuggers fail nastily with strings :(
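Something along these lines would do the dump. It is a minimal sketch: CharDump and dump are names invented here, and the literal passed in main is a stand-in for the val returned by node.valueOf(...).

```java
final class CharDump {
    /** Returns one line per char: its index and its value as four hex digits. */
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            String hex = Integer.toHexString(s.charAt(i)).toUpperCase();
            sb.append(i).append(": U+");
            // Left-pad to four digits so the values line up.
            for (int pad = hex.length(); pad < 4; pad++) {
                sb.append('0');
            }
            sb.append(hex).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: in the real program this would be the String from dom4j.
        String val = "Gesch\u00e4ft";
        System.out.print(dump(val));
    }
}
```

Printing numeric values keeps the console's own encoding and fonts out of the picture, since only ASCII digits are written.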
 
