Encoding in file

Discussion in 'Java' started by Lukasz, Sep 27, 2006.

  1. Lukasz

    Lukasz Guest

    Hi,

    In my application I create some files and write some text into them. I
    want to use UTF-8 encoding, but both methods that I tried seem to
    ignore the specified encoding. I used:

    OutputStream fout= new FileOutputStream(nazwa);
    OutputStream bout= new BufferedOutputStream(fout);
    OutputStreamWriter out = new OutputStreamWriter(bout, "UTF8");

    and

    BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new
    FileOutputStream(nazwa),"UTF8"));

    The problem seems simple, but it is often hard to find an answer to
    the simplest questions.
     
    Lukasz, Sep 27, 2006
    #1

  2. Lukasz wrote on 27.09.2006 10:24:
    > Hi,
    >
    > In my application I create some files and write some text into them. I
    > want to use UTF-8 encoding, but both methods that I tried seem to
    > ignore the specified encoding. I used:


    Can you be more specific about what you mean by "seem to ignore"?

    > BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new
    > FileOutputStream(nazwa),"UTF8"));


    This works for me; the only difference is that I use "UTF-8" as the encoding name.
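One way to see whether the encoding is being applied is to write a non-ASCII character and count the bytes that land in the file. The following is only a sketch: the class name `Utf8FileCheck`, the helper `writeUtf8` and the sample word are made up for illustration, and it assumes Java 7 or later (for `StandardCharsets` and `java.nio.file`).

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileCheck {
    // Hypothetical helper: writes text through an OutputStreamWriter
    // that is given an explicit UTF-8 charset.
    public static void writeUtf8(Path path, String text) throws IOException {
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(path.toFile()), StandardCharsets.UTF_8))) {
            out.write(text);
        } // closing the writer flushes the buffered characters to the file
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("utf8-check", ".txt");
        // "zloty" spelled with U+0142 (l with stroke): 5 characters
        writeUtf8(tmp, "z\u0142oty");
        byte[] raw = Files.readAllBytes(tmp);
        // In UTF-8 the stroke-l takes two bytes, so the file holds 6 bytes
        System.out.println(raw.length + " bytes");
        Files.delete(tmp);
    }
}
```

If the file came out with 5 bytes instead, a single-byte encoding would have been used somewhere along the way. Forgetting to close or flush the writer is another common reason for a file that looks wrong or empty.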

    Thomas
     
    Thomas Kellerer, Sep 27, 2006
    #2

  3. Lukasz

    Lukasz Guest


    In UTF-8, for example, the " sign should be replaced with &quot; (or
    something like that). Neither of my methods does that.
     
    Lukasz, Sep 27, 2006
    #3
  4. Lukasz wrote on 27.09.2006 11:44:
    >
    > In UTF-8, for example, the " sign should be replaced with &quot;


    Not at all!

    What you are describing is HTML (or XML) "escaping".
    That has nothing to do with the encoding of characters.

    UTF-8 is a variable-length encoding: characters in the 7-bit ASCII range are
    stored as a single byte, and every other character is stored as two, three or
    four bytes.

    The " sign is in the ASCII range and is encoded as one byte (hex 22).
    The Euro symbol (U+20AC), for example, is outside the ASCII range and is
    encoded as three bytes in UTF-8 (E2 82 AC).
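One way to check how a given character is encoded is to print its UTF-8 bytes. This is a small sketch; the class name `Utf8Bytes` and the helper `hex` are made up for illustration, and it assumes Java 7 or later for `StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Hypothetical helper: returns the UTF-8 bytes of s
    // as space-separated upper-case hex pairs.
    public static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // mask to avoid sign extension of bytes >= 0x80
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\""));      // prints 22
        System.out.println(hex("\u20AC"));  // prints E2 82 AC
    }
}
```

Note that the quote character comes out unchanged as a single byte: the encoding never turns it into an entity.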

    Thomas
     
    Thomas Kellerer, Sep 27, 2006
    #4
  5. Lukasz

    Lukasz Guest


    And what should I do to replace the " sign with &quot;, and other
    characters with their XML escapes?
     
    Lukasz, Sep 27, 2006
    #5
  6. Lukasz wrote on 27.09.2006 12:12:
    > And what should I do to replace the " sign with &quot;, and other
    > characters with their XML escapes?
    >

    There is no standard API for this (as far as I know). You'll have to roll your
    own. But maybe the Jakarta site has something.
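A roll-your-own version could look roughly like the sketch below. The class name `XmlEscape` and the method `escapeXml` are made up for illustration; it only replaces the five characters that XML predefines entities for and does not deal with control characters that are illegal in XML.

```java
public class XmlEscape {
    // Hypothetical helper: replaces the five characters that have
    // predefined XML entities; everything else is copied unchanged.
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: say &quot;hi&quot; &amp; &lt;bye&gt;
        System.out.println(escapeXml("say \"hi\" & <bye>"));
    }
}
```

The escaped string is then written through the UTF-8 writer as before; escaping and encoding are two separate steps.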

    Thomas
     
    Thomas Kellerer, Sep 27, 2006
    #6
  7. Thomas Kellerer wrote:

    > There is no standard API for this (as far as I know). You'll have to
    > roll your own. But maybe the Jakarta site has something.
    >
    > Thomas


    If the information being written is actually XML, this should be a
    non-issue. I've found that you need to pass the UTF-8 encoding name to
    the OutputStreamWriter so that the file itself gets that encoding, and
    the code that serializes the XML must also be told to use UTF-8. The
    serializer then takes care of this "escaping" automatically.
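As a rough illustration of letting the serializer do the work, the sketch below builds a tiny DOM document and writes it with the JDK's `javax.xml.transform` API, with the output encoding set to UTF-8. The class name `XmlSerializeDemo`, the method `serialize` and the element and attribute names are made up; the exact layout of the output depends on the XML implementation in use, so no exact output is shown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlSerializeDemo {
    // Hypothetical helper: builds <note title="...">text</note> and
    // serializes it as UTF-8; the serializer escapes special characters.
    public static String serialize(String title, String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element note = doc.createElement("note");
        note.setAttribute("title", title);
        note.setTextContent(text);
        doc.appendChild(note);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Tell the serializer which encoding to produce and declare
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(bytes));
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // The raw strings are passed in unescaped
        System.out.println(serialize("a \"quoted\" title", "1 < 2 & 3"));
    }
}
```

For a real file, the `ByteArrayOutputStream` would be replaced by a `FileOutputStream`, so that the serializer writes the bytes in the encoding it declares.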

    = Steve =
    --
    Steve W. Jackson
    Montgomery, Alabama
     
    Steve W. Jackson, Sep 27, 2006
    #7
