Cannot read German characters via FileInputStream

Discussion in 'Java' started by Zsolt, Jan 31, 2004.

  1. Zsolt

    Zsolt Guest

    Hi,

    the German characters will be replaced with question marks when I read a
    file using FileInputStream unless I set the LANG environment to "en_US" on
    Linux.

    export LANG=en_US

    How can I fix the problem without setting the LANG environment variable ?

    Zsolt
    Zsolt, Jan 31, 2004
    #1

  2. S Manohar

    S Manohar Guest

    Have you tried using an InputStreamReader?

    InputStreamReader isr = new InputStreamReader(myInputStream,
    Charset.forName("US-ASCII"));

    replacing the charset name with the one used by the file.

    You can create a BufferedReader from this to help with reading whole lines.
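
    A minimal sketch of that combination, assuming the file is encoded in ISO-8859-1 (the class name, method name and file path are made up for illustration). Passing the charset explicitly means the decoding no longer depends on the platform default derived from LANG:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReadWithCharset {

    // Reads a whole text file, decoding its bytes with the given charset
    // rather than with the platform default.
    static String readFile(String path, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the line terminator, so add one back.
                text.append(line).append('\n');
            }
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ReadWithCharset german.txt
        String text = readFile(args[0], StandardCharsets.ISO_8859_1);
        System.out.println(text.length() + " chars read");
    }
}
```

    If the file is really UTF-8 or something else, swap in that charset; the wrong one will decode without an error but give the wrong characters.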

    HTH
    S Manohar, Jan 31, 2004
    #2

  3. Zsolt

    Zsolt Guest

    Yes, I have tried that but it didn't help either.

    Zsolt
    Zsolt, Feb 1, 2004
    #3
  4. "Zsolt" <> writes:

    > Yes, I have tried that but it didn't help either.


    Try setting default file encoding, e.g.

    java -Dfile.encoding=iso8859-1 MyClass
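
    To see what the JVM actually picked up, something like the following can be run with and without the -D switch. This is only a sketch (the class name is made up), and newer JDKs may no longer derive the default from LANG, so the output depends on your version and locale:

```java
import java.nio.charset.Charset;

class ShowDefaultEncoding {

    // Reports the file.encoding system property and the charset the JVM
    // uses when no charset is passed to a Reader or Writer.
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```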
    Tor Iver Wilhelmsen, Feb 1, 2004
    #4
  5. Zsolt:

    >the German characters will be replaced with question marks when I read a
    >file using FileInputStream unless I set the LANG environment to "en_US" on
    >Linux.
    >
    >export LANG=en_US
    >
    >How can I fix the problem without setting the LANG environment variable ?


    Where do the question marks show up? In the file, or when you print
    them to standard output? In the latter case, the console may not have
    the correct encoding type.

    Besides, you should use a Reader instead of an InputStream. After all
    you are reading characters. Make sure the encoding is set correctly
    (so that it matches the file).
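
    For the output side, here is a rough sketch of printing through a PrintStream with an explicit encoding. It assumes the terminal expects UTF-8, which may not be true for yours; the class and method names are invented for the example. It also shows that an ASCII-only stream turns an umlaut into a question mark:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleEncoding {

    // Encodes text the way a PrintStream with the named charset would,
    // returning the raw bytes that would reach the terminal.
    static byte[] bytesPrinted(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true, charsetName);
        ps.print(text);
        ps.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumption: the terminal expects UTF-8; substitute its real encoding.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("Gr\u00fc\u00dfe");

        // An ASCII-only stream cannot represent u-umlaut and substitutes '?'.
        byte[] ascii = bytesPrinted("\u00fc", "US-ASCII");
        out.println("US-ASCII byte: 0x" + Integer.toHexString(ascii[0] & 0xff));
    }
}
```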

    Regards,
    Marco
    --
    Please reply in the newsgroup, not by email!
    Java programming tips: http://jiu.sourceforge.net/javatips.html
    Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
    Marco Schmidt, Feb 1, 2004
    #5
  6. Zsolt

    Zsolt Guest

    Hi Marco,

    the question marks show up when I print the contents (thus not in the input)
    I read.

    I have also tried to use FileReader but got the same result.

    Here is my code:

    private static String encoding = "ISO-8859-1";

    private static String readStream(InputStream sin)
            throws IOException
    {
        InputStreamReader in = null;
        if (encoding == null || encoding.length() == 0)
        {
            in = new InputStreamReader(sin);
        }
        else
        {
            in = new InputStreamReader(sin, encoding);
        }
        // log.println("Encoding: <" + in.getEncoding() + ">");
        StringWriter out = new StringWriter();
        try
        {
            char[] buf = new char[8 * 1024];
            for (int bytesRead = 0; (bytesRead = in.read(buf)) != -1; )
            {
                out.write(buf, 0, bytesRead);
            }
        }
        finally
        {
            in.close();
        }
        String cont = out.toString().trim();
        out.close();
        log.println("Read: <" + cont + ">");
        return cont;
    }
    Zsolt, Feb 2, 2004
    #6
  7. Jon A. Cruz

    Jon A. Cruz Guest

    Zsolt wrote:
    > Hi Marco,
    >
    > the question marks show up when I print the contents (thus not in the input)
    > I read.
    >



    Not necessarily.

    It could very well be that you read in the character correctly, but are
    displaying it via a pipeline that does the changing.

    If you walk the string in the middle and dump each of its characters
    (not bytes, characters) as hex, then you might be able to pinpoint problems.



    Make sure you use the correct encoding.

    >
    > in = new InputStreamReader(sin, encoding);
    >
    > }
    >
    > // log.println("Encoding: <" + in.getEncoding() + ">");
    >

    [SNIP]


    > for (int bytesRead = 0; (bytesRead = in.read(buf)) != -1; )


    OK. Since you're reading chars (16-bit unsigned values) and not bytes
    (8-bit signed values), 'bytesRead' is a misleading name.


    > String cont = out.toString().trim();
    >
    > out.close();
    >
    > log.println("Read: <" + cont + ">");


    Bingo!!!!

    Check what log.println() does. It could be the culprit mangling characters.

    Instead, use a small test file and add this for each char you read in:

    char c;
    c = ???;

    ....

    System.out.println( "The char '" + c + "' is \\u" + Integer.toHexString(
    c & 0x0ffff) );
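
    Filled out into something runnable, that check might look like the sketch below. The class and method names are made up, and the sample string in main stands in for whatever your reader returned:

```java
class CharDump {

    // Formats one char together with its code unit as four hex digits.
    static String describe(char c) {
        return "The char '" + c + "' is \\u" + String.format("%04x", (int) c);
    }

    // Prints one line per char of the string, so a wrongly decoded
    // character shows up as an unexpected hex value.
    static void dump(String s) {
        for (int i = 0; i < s.length(); i++) {
            System.out.println(describe(s.charAt(i)));
        }
    }

    public static void main(String[] args) {
        // Stand-in for the text returned by the reader under test.
        dump("Gr\u00fc\u00dfe");
    }
}
```

    If the hex value is 00df for the sharp s but the console still shows a question mark, the reading is fine and the problem is on the output side. If the hex value is 003f or fffd, the character was already lost while decoding.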

    Oh, and try to send something to your log like this:

    log.println("A test of [\u00df] and [\u00fc].");

    You should see

    A test of [ß] and [ü].

    (The sharp s, which looks like a capital 'B', and a 'u' with umlaut)
    Jon A. Cruz, Feb 8, 2004
    #7
