Encoding in file

Lukasz · Sep 27, 2006

Hi,

In my application I create some files and write some text into them. I
want to use UTF-8 encoding, but both methods that I tried seem to
ignore the specified encoding. I used:

OutputStream fout= new FileOutputStream(nazwa);
OutputStream bout= new BufferedOutputStream(fout);
OutputStreamWriter out = new OutputStreamWriter(bout, "UTF8");

and

BufferedWriter out = new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream(nazwa), "UTF8"));

The problem seems simple, but it is often hard to find an answer to
the simplest question.

Thomas Kellerer · Sep 27, 2006

Can you be more specific about what you mean by "seem to ignore"?

BufferedWriter out = new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream(nazwa), "UTF8"));

This works for me; the only difference is that I use "UTF-8".

Thomas
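
The writer chain discussed above can be sketched as a small helper. This is only an illustration, assuming Java 7 or later (for try-with-resources and StandardCharsets, neither of which existed when this thread was written); the class name, method name, and file name are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not code from the thread.
final class Utf8WriteSketch {

    // Writes text to the named file as UTF-8, replacing any existing content.
    static void writeUtf8(String fileName, String text) throws IOException {
        // try-with-resources closes the writer chain, which also flushes
        // the buffered characters to the file.
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(fileName), StandardCharsets.UTF_8))) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // "example.txt" is a placeholder file name.
        writeUtf8("example.txt", "\"quoted\" and a Euro sign: \u20AC");
    }
}
```

Passing the Charset constant instead of the string "UTF8" avoids a misspelled encoding name and the checked UnsupportedEncodingException. Forgetting to close or flush the writer is a common reason for output that looks wrong or empty.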

Lukasz · Sep 27, 2006

In UTF-8, for example, the " sign should be replaced with ;quote (or
something like that). Neither of my methods does that.

Thomas Kellerer · Sep 27, 2006

Lukasz wrote on 27.09.2006 11:44:

In UTF-8, for example " sign should be replaced with ;quote

Not at all!

What you are describing is HTML (or XML) "escaping".
That has nothing to do with the encoding of characters.

UTF-8 is an encoding that stores characters outside the 7-bit ASCII range
as a variable number of bytes. Some characters are encoded with one byte, some
with two, some with three, and some with four.

The " sign is in the 7-bit ASCII range, and will be encoded with one byte
(hex 22).
The Euro symbol, for example, is outside the ASCII range (it is code point
U+20AC), and will be encoded with three bytes in UTF-8 (E2 82 AC).

Thomas
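
To see the byte counts for yourself, a minimal sketch along these lines prints the UTF-8 bytes of a string as hex. The class and method names are hypothetical, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting UTF-8 output, not code from the thread.
final class Utf8BytesSketch {

    // Returns the UTF-8 bytes of s as space-separated uppercase hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values format as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\""));      // prints 22
        System.out.println(utf8Hex("\u20AC"));  // prints E2 82 AC
    }
}
```

The quote character stays a single byte and is not replaced by anything, which is why a file written as UTF-8 still contains a plain " sign.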

Lukasz · Sep 27, 2006

And what should I do to replace this " sign with :quote, and other
characters with their XML escapes?

Thomas Kellerer · Sep 27, 2006

There is no standard API (as far as I know). You'll have to roll your own. But
maybe the Jakarta site has something.

Thomas
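
A "roll your own" version might look like the sketch below. It handles only the five predefined XML entities, so the quote becomes &quot; and so on; it is an illustration rather than a complete escaping routine, and the class and method names are hypothetical.

```java
// Hypothetical helper, not code from the thread.
final class XmlEscapeSketch {

    // Replaces the five characters that have predefined XML entities.
    // All other characters are copied through unchanged.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: say &quot;hi&quot; &amp; go
        System.out.println(escapeXml("say \"hi\" & go"));
    }
}
```

Escape each raw value exactly once; running already-escaped text through the method again would turn &quot; into &amp;quot;.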

Steve W. Jackson · Sep 27, 2006

If the information being written is actually XML, it should be a
non-issue. I've found that it's necessary to use the UTF-8 encoding
name on the OutputStreamWriter to ensure that the file itself gets that
encoding, but the method used to serialize the XML must also know that
it should use UTF-8 and it will automatically take care of this
"escaping".

= Steve =
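
As one possible illustration of letting the XML serializer handle both the encoding and the escaping, the sketch below uses the JAXP Transformer that ships with the JDK. The thread does not say which serializer was actually used, so treat this as an assumption; the class name, method name, and element name are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example, not code from the thread.
final class XmlUtf8Sketch {

    // Builds <note>text</note> and serializes it to target as UTF-8.
    static void writeNote(String text, OutputStream target) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element note = doc.createElement("note");
        note.setTextContent(text);  // raw text; the serializer escapes it
        doc.appendChild(note);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Tell the serializer which encoding to declare and to write.
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.transform(new DOMSource(doc), new StreamResult(target));
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeNote("a & b < c", buf);
        // The element is written as <note>a &amp; b &lt; c</note>
        System.out.println(buf.toString("UTF-8"));
    }
}
```

Here the StreamResult wraps an OutputStream, so the serializer does the character encoding itself. If you hand it a Writer instead, the Writer's encoding has to match the one set on the Transformer, which is the point made above.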
