Unicode and such

EdwardH

The file "höhö" is shown as "h?h?" when I get a file.getName().

java.nio.charset.Charset.defaultCharset().name()
US-ASCII

System.getProperty("file.encoding")
ANSI_X3.4-1968


I've played around and set file.encoding to ascii, utf-8, utf-16, cp437
and iso-8859-1. Nothing helps.

Can anyone tell me what to do to fix this?

(I'm running an amd64 Linux system, btw.)
 
EdwardH

Try setting the encoding specifically at file open.

Where would one do that?

File doesn't take a (String filename, String encoding) constructor.
 
Chris Uppal

EdwardH said:
The file "höhö" is shown as "h?h?" when I get a file.getName().

java.nio.charset.Charset.defaultCharset().name()
US-ASCII

So the system has no way of printing out the name using the default charset.
If you check the four chars in the name then they, presumably, will not include
63 (the question mark), but will have the correct Unicode code point for ö
(U+00F6).
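Checking the individual chars, as suggested above, can be done with a small sketch like the following. It lists a directory and prints each name's UTF-16 chars as U+XXXX, so you can see whether ö arrived as U+00F6 or was already turned into ? (U+003F) or possibly U+FFFD before your code saw it; which one you get may depend on the JVM. The class and method names are made up for illustration, and the output is pure ASCII so the console charset does not matter.

```java
import java.io.File;

public class CharDump {

    // Render every char of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical usage: inspect the directory given on the command
        // line, or the current directory if none is given.
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] entries = dir.listFiles();
        if (entries == null) {
            System.err.println("Cannot list " + dir);
            return;
        }
        for (File f : entries) {
            System.out.println(describe(f.getName()));
        }
    }
}
```

If the dump already shows U+003F where ö should be, the problem is in how the name was decoded, not in how it is printed.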

You don't say how you are viewing the filename, but whatever it is (debugger,
System.out.println(), ...) will need to be told to use a charset that can
represent ö.
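For System.out.println(), one way to "tell it" is to wrap standard output in a PrintStream with an explicit charset. A minimal sketch, assuming the terminal itself expects UTF-8 (substitute another charset name if it does not); the class and helper names are hypothetical:

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {

    // Wrap any byte sink in a PrintStream that encodes chars as UTF-8
    // instead of using the JVM's default charset.
    static PrintStream utf8(OutputStream sink) {
        try {
            return new PrintStream(sink, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Write straight to the stdout file descriptor, bypassing System.out.
        PrintStream out = utf8(new FileOutputStream(FileDescriptor.out));
        // "höhö" written with escapes so the source file's encoding is irrelevant;
        // each ö goes out as the two bytes 0xC3 0xB6.
        out.println("h\u00f6h\u00f6");
    }
}
```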

System.getProperty("file.encoding")
ANSI_X3.4-1968


I've played around and set file.encoding to ascii, utf-8, utf-16, cp437
and iso-8859-1. Nothing helps.

I don't know (off the top of my head) what the 'file.encoding' property is used
for, but I very much doubt if it's relevant here. At a guess it's used as the
default charset for interpreting the /contents/ of files -- but that's a guess.

-- chris
 
EdwardH

EdwardH said:
Where would one do that?

File doesn't take a (String filename, String encoding) constructor.

Fixed!

export LC_CTYPE=en_US

It was previously POSIX, which I'm sure is short for "Piece of Shit IX".
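As far as I know, the fix works because on Linux the JVM derives its default charset (and the charset it uses to decode file names) from the C library locale. Which environment variable decides follows the usual POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG, and with none of them set you get the POSIX ("C") locale. The sketch below only mirrors that precedence rule to show which setting the JVM process sees; it does not ask the C library, and the class name is made up.

```java
import java.nio.charset.Charset;
import java.util.Map;

public class CtypeCheck {

    // Return the value that governs LC_CTYPE under POSIX precedence:
    // LC_ALL, then LC_CTYPE, then LANG. Unset or empty values fall through.
    static String effectiveCtype(Map<String, String> env) {
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "POSIX"; // nothing set: the default C/POSIX locale
    }

    public static void main(String[] args) {
        System.out.println("Effective LC_CTYPE: " + effectiveCtype(System.getenv()));
        System.out.println("Default charset:    " + Charset.defaultCharset().name());
    }
}
```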
 
zero

EdwardH said:
The file "höhö" is shown as "h?h?" when I get a file.getName().
[...]

omg I'm getting nightmares again... I had a similar problem with
retrieving a name from a Clipper database in an internship (I had to
convert an old Clipper program to Java). In the end I just gave up and
added some code that replaced the ö characters with their Unicode
equivalent.
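An alternative to replacing characters by hand (not what was done above) is to decode the raw bytes with the code page the old DOS program used. Which code page a given Clipper database uses is an assumption here: the sketch assumes IBM437, in which ö is the byte 0x94. IBM437 is one of Java's extended charsets, so the code checks that it is available before using it. Names are illustrative.

```java
import java.nio.charset.Charset;

public class LegacyDecode {

    // Decode bytes written by a DOS-era program using the given code page.
    // The code page is an assumption; IBM437 is only one common choice.
    static String decode(byte[] raw, String codePage) {
        if (!Charset.isSupported(codePage)) {
            throw new IllegalArgumentException("Charset not available: " + codePage);
        }
        return new String(raw, Charset.forName(codePage));
    }

    public static void main(String[] args) {
        // "höhö" as stored in code page 437: 'h' = 0x68, 'ö' = 0x94.
        byte[] raw = {0x68, (byte) 0x94, 0x68, (byte) 0x94};
        String name = decode(raw, "IBM437");
        System.out.println(name.equals("h\u00f6h\u00f6"));
    }
}
```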
 
Roedy Green

Where would one do that?

File doesn't take a (String filename, String encoding) constructor.

The File class has nothing to do with contents or reading or writing.
It is about file names and existence.

You need to look elsewhere. In regular file I/O it is the Readers and
Writers.
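For file contents, the encoding goes on the Reader or Writer wrapped around the byte stream, not on File. A minimal sketch, assuming the contents should be UTF-8; the class and method names are made up, and the File is whatever the caller passes in:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingIO {

    // Write text as UTF-8, regardless of the JVM's default charset.
    static void writeText(File file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // Read the whole file back, decoding it as UTF-8.
    static String readText(File file) throws IOException {
        try (Reader r = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }
}
```

Unlike FileReader and FileWriter, these constructors never consult the default charset, so the result does not change with the environment.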

In NIO, look at Charset and CharsetDecoder.
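With NIO, a CharsetDecoder lets you decide what happens to bytes that do not fit the charset, instead of silently getting ? or U+FFFD. A sketch that treats such bytes as an error; the class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class StrictDecode {

    // Decode strictly: malformed or unmappable input throws
    // CharacterCodingException instead of being replaced.
    // (REPORT is also the default for a fresh decoder; set here for clarity.)
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // True if the bytes form valid text in the given charset.
    static boolean isValid(byte[] bytes, Charset cs) {
        try {
            decodeStrict(bytes, cs);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

By contrast, the convenience method Charset.decode(ByteBuffer) replaces bad input rather than reporting it.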
 
Mike Schilling

Chris Uppal said:
I don't know (off the top of my head) what the 'file.encoding' property is used
for, but I very much doubt if it's relevant here. At a guess it's used as the
default charset for interpreting the /contents/ of files -- but that's a guess.

You're right; it's the default encoding used by FileReader and FileWriter.
 
Thomas Fritsch

Mike Schilling said:
Chris Uppal said:
I don't know (off the top of my head) what the 'file.encoding' property
is used for, [...] At a guess it's used as the
default charset for interpreting the /contents/ of files -- but that's a
guess.

You're right; it's the default encoding used by FileReader and FileWriter.

Even more: it's the default encoding used by
InputStreamReader, OutputStreamWriter,
String (constructor String(byte[]), method getBytes()).
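The String case is easy to see directly, and it reproduces the "h?h?" symptom from the first post. The sketch passes the charset explicitly, with US-ASCII standing in for the ANSI_X3.4-1968 default reported there; the no-argument getBytes() and new String(byte[]) would use whatever the default charset is. The class name is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Encode and decode with the same charset. Characters the charset
    // cannot represent are replaced on encoding (US-ASCII uses '?').
    static String roundTrip(String s, Charset cs) {
        byte[] bytes = s.getBytes(cs);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String name = "h\u00f6h\u00f6"; // "höhö", written with escapes
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII));          // h?h?
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // true
        // The no-argument forms use this default instead:
        System.out.println(Charset.defaultCharset().name());
    }
}
```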
 
Mike Schilling

Thomas Fritsch said:
Even more: it's the default encoding used by
InputStreamReader, OutputStreamWriter,
String (constructor String(byte[]), method getBytes()).

So it is. That is, it's the "default encoding", period. Misleadingly named,
if you ask me.