Unicode and such

Discussion in 'Java' started by EdwardH, Oct 20, 2005.

  1. EdwardH

    EdwardH Guest

    The file "höhö" is shown as "h?h?" when I get a file.getName().

    java.nio.charset.Charset.defaultCharset().name()
    US-ASCII

    System.getProperty("file.encoding")
    ANSI_X3.4-1968


    I've played around and set file.encoding to ascii, utf-8, utf-16, cp437
    and iso-8859-1. Nothing helps.

    Can anyone tell me what to do to fix this?

    (I'm running an amd64 Linux system, btw).
     
    EdwardH, Oct 20, 2005
    #1

  2. Roedy Green

    Roedy Green Guest

    On Thu, 20 Oct 2005 11:41:07 GMT, EdwardH
    <edwardh@N:O:S:p:A:M:edward.dyndns.org> wrote or quoted :

    >Can anyone tell me what to do to fix this?


    Try setting the encoding specifically at file open.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Again taking new Java programming contracts.
     
    Roedy Green, Oct 20, 2005
    #2

  3. EdwardH

    EdwardH Guest

    > Try setting the encoding specifically at file open.

    Where would one do that?

    File doesn't take a (String filename, String encoding) constructor.
     
    EdwardH, Oct 20, 2005
    #3
  4. Chris Uppal

    Chris Uppal Guest

    EdwardH wrote:
    > The file "höhö" is shown as "h?h?" when I get a file.getName().
    >
    > java.nio.charset.Charset.defaultCharset().name()
    > US-ASCII


    So the system has no way of printing out the name using the default charset.
    If you check the four chars in the name then they, presumably, will not include
    63 (the question mark), but will have the correct Unicode code point for ö
    (whatever that might be).

    You don't say how you are viewing the filename, but whatever it is (debugger,
    System.out.println(), ...) will need to be told to use a charset that can
    represent ö.
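
A minimal sketch of that check, assuming a hypothetical class name `NameDump` and a hypothetical default file name (neither is from the thread). It prints the code points of a file name and writes the name through a `PrintStream` whose encoding is set explicitly, so the output does not depend on the default charset. If ö was decoded correctly it should appear as U+00F6; seeing U+003F or U+FFFD instead would suggest the name was already damaged when the JVM decoded it, before any printing happened.

```java
import java.io.File;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class NameDump {
    // Format every code point of s as U+XXXX, separated by single spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Hypothetical default name; pass a real path as the first argument.
        File f = new File(args.length > 0 ? args[0] : "h\u00f6h\u00f6");

        // Explicit output encoding instead of the platform default.
        // The terminal must also be set to UTF-8 for this to display correctly.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(f.getName());
        out.println(codePoints(f.getName()));
    }
}
```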


    > System.getProperty("file.encoding")
    > ANSI_X3.4-1968
    >
    >
    > I've played around and set file.encoding to ascii, utf-8, utf-16, cp437
    > and iso-8859-1. Nothing helps.


    I don't know (off the top of my head) what the 'file.encoding' property is used
    for, but I very much doubt if it's relevant here. At a guess it's used as the
    default charset for interpreting the /contents/ of files -- but that's a guess.

    -- chris
     
    Chris Uppal, Oct 20, 2005
    #4
  5. EdwardH

    EdwardH Guest

    EdwardH wrote:
    >> Try setting the encoding specifically at file open.

    >
    >
    > Where would one do that?
    >
    > File doesn't take a (String filename, String encoding) constructor.


    Fixed!

    export LC_CTYPE=en_US

    It was previously POSIX, which I'm sure is short for "Piece of Shit IX".
     
    EdwardH, Oct 20, 2005
    #5
  6. zero

    zero Guest

    EdwardH <edwardh@N:O:S:p:A:M:edward.dyndns.org> wrote in news:nnL5f.148696
    $:

    > The file "höhö" is shown as "h?h?" when I get a file.getName().
    >
    > java.nio.charset.Charset.defaultCharset().name()
    > US-ASCII
    >
    > System.getProperty("file.encoding")
    > ANSI_X3.4-1968
    >
    >
    > I've played around and set file.encoding to ascii, utf-8, utf-16, cp437
    > and iso-8859-1. Nothing helps.
    >
    > Can anyone tell me what to do to fix this?
    >
    > (I'm running an amd64 Linux system, btw).


    omg I'm getting nightmares again... I had a similar problem with
    retrieving a name from a Clipper database in an internship (I had to
    convert an old Clipper program to Java). In the end I just gave up and
    added some code that replaced the ö characters with their Unicode
    equivalent.
     
    zero, Oct 20, 2005
    #6
  7. Roedy Green

    Roedy Green Guest

    On Thu, 20 Oct 2005 12:05:40 GMT, EdwardH
    <edwardh@N:O:S:p:A:M:edward.dyndns.org> wrote or quoted :

    >
    >Where would one do that?
    >
    >File doesn't take a (String filename, String encoding) constructor.


    The File class has nothing to do with contents or reading or writing.
    It is about file names and existence.

    You need to look elsewhere. In regular file i/o it is the Readers and
    Writers.

    In nio, look at Charset and CharsetDecoder.
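
A rough sketch of both routes, assuming a hypothetical class `ExplicitCharsetRead` and helper names that are not from the thread. The first method names the encoding when the Reader is built; the second decodes raw bytes through `java.nio.charset.Charset`. Note that this only affects file contents, not how file names are decoded.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

final class ExplicitCharsetRead {
    // Classic java.io: wrap the byte stream in a Reader with a named charset.
    static String readFirstLine(String path, String charsetName) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), Charset.forName(charsetName)));
        try {
            return in.readLine();
        } finally {
            in.close();
        }
    }

    // java.nio: decode a byte array with a named charset.
    // Charset.decode replaces malformed input instead of throwing.
    static String decode(byte[] bytes, String charsetName) {
        return Charset.forName(charsetName).decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ExplicitCharsetRead <file> <charset>");
            return;
        }
        System.out.println(readFirstLine(args[0], args[1]));
    }
}
```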



    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Again taking new Java programming contracts.
     
    Roedy Green, Oct 20, 2005
    #7
  8. "Chris Uppal" <-THIS.org> wrote in message
    news:43579d1a$0$38045$...
    > I don't know (off the top of my head) what the 'file.encoding' property is
    > used
    > for, but I very much doubt if it's relevant here. At a guess it's used as
    > the
    > default charset for interpreting the /contents/ of files -- but that's a
    > guess.


    You're right; it's the default encoding used by FileReader and FileWriter.
     
    Mike Schilling, Oct 21, 2005
    #8
  9. "Mike Schilling" <> wrote:
    > "Chris Uppal" <-THIS.org> wrote:
    >> I don't know (off the top of my head) what the 'file.encoding' property
    >> is used for, [...] At a guess it's used as the
    >> default charset for interpreting the /contents/ of files -- but that's a
    >> guess.

    >
    > You're right; it's the default encoding used by FileReader and FileWriter.
    >

    Even more: it's the default encoding used by
    InputStreamReader, OutputStreamWriter
    String ( constructor String(byte[]), method getBytes() )
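
To illustrate, here is a small sketch (the class name `DefaultEncodingDemo` and its helpers are made up for this example) that passes the charset explicitly rather than relying on the default. It also shows where a "?" can come from: encoding ö into a charset that cannot represent it, such as US-ASCII, substitutes a replacement byte.

```java
import java.nio.charset.Charset;

final class DefaultEncodingDemo {
    // Encode with a named charset instead of the no-argument getBytes().
    static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }

    // Decode with a named charset instead of new String(byte[]).
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String name = "h\u00f6h\u00f6";
        System.out.println("default charset: " + Charset.defaultCharset().name());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // Byte counts differ by charset: ö is two bytes in UTF-8, one in ISO-8859-1.
        System.out.println("UTF-8 bytes:      " + encode(name, "UTF-8").length);
        System.out.println("ISO-8859-1 bytes: " + encode(name, "ISO-8859-1").length);

        // US-ASCII cannot represent ö, so each one is replaced with '?'.
        System.out.println("via US-ASCII:     " + decode(encode(name, "US-ASCII"), "US-ASCII"));
    }
}
```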

    --
    "TFritsch$t-online:de".replace(':','.').replace('$','@')
     
    Thomas Fritsch, Oct 21, 2005
    #9
  10. "Thomas Fritsch" <> wrote in message
    news:djbh0f$mad$04$-online.com...
    > "Mike Schilling" <> wrote:
    >> "Chris Uppal" <-THIS.org> wrote:
    >>> I don't know (off the top of my head) what the 'file.encoding' property
    >>> is used for, [...] At a guess it's used as the
    >>> default charset for interpreting the /contents/ of files -- but that's a
    >>> guess.

    >>
    >> You're right; it's the default encoding used by FileReader and
    >> FileWriter.
    >>

    > Even more: it's the default encoding used by
    > InputStreamReader, OutputStreamWriter
    > String ( constructor String(byte[]), method getBytes() )
    >


    So it is. That is, it's the "default encoding", period. Misleadingly named,
    if you ask me.
     
    Mike Schilling, Oct 21, 2005
    #10