ascii problem in writing to files


Jim Pete

I have a problem with the following piece of code:

FileWriter o = new FileWriter(new File("foo"));
o.write((char)140);
o.close();

FileReader f = new FileReader(certFile);
BufferedReader in = new BufferedReader(f);
String tmp = in.readLine();


When I read tmp, the character has changed to the value 63 rather than 140.

How can I fix this?



Thank you.
 

Roedy Green

When I read tmp, the character has changed to the value 63 rather than 140.

You used the default encoding. Char 140 is one of the characters that
has different meanings in different 8-bit encodings. You must specify
an encoding, for both writing and reading, that maps it the way you want.

Encoding problems flow from the transition between many different
8-bit encodings and Java's internal 16-bit Unicode encoding.

See http://mindprod.com/encoding.html to understand the process and
select a suitable encoding. See also
http://mindprod.com/jgloss/unicode.html

See also http://mindprod.com/fileio.html for sample code.

Another way of looking at the problem is that you are thinking in terms
of raw byte-level I/O, but you wrote and read with cooked
character-level I/O.
 

Jim Pete

I just tried all the data formats... nothing worked... I either get 63, 338
or more than two bytes...
 

Phil Hanna

Pete,

It's an encoding problem. Look at the first line of the javadocs for
FileWriter:

"Convenience class for writing character files. The constructors of
this class assume that the default character encoding and the default
byte-buffer size are acceptable. To specify these values yourself,
construct an OutputStreamWriter on a FileOutputStream."

So when you write

new FileWriter(new File("foo"))

you're getting the platform default encoding, which you can see if you
insert

System.out.println("The file writer encoding is " + o.getEncoding());

In my case, on Windows NT, this is "Cp1252", which is Windows Latin-1,
in which your character 140 is not supported.
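A small sketch of what Phil describes, assuming Java 7 or later (for `StandardCharsets`); the class and method names are made up for illustration. It encodes `(char) 140` with Cp1252 (which Java calls `windows-1252`) and with ISO-8859-1, which is roughly what the writer does internally with whichever encoding it was given:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodeChar140 {
    // Encode one char with the given charset and return the bytes as
    // unsigned values (0-255), so that byte 0x8C shows up as 140, not -116.
    static int[] encode(char c, Charset cs) {
        // getBytes(Charset) silently substitutes the charset's replacement
        // byte for any char it cannot map.
        byte[] bytes = String.valueOf(c).getBytes(cs);
        int[] unsigned = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            unsigned[i] = bytes[i] & 0xFF;
        }
        return unsigned;
    }

    public static void main(String[] args) {
        char c = (char) 140; // the char the original code writes
        // windows-1252 (Cp1252) has no mapping for this char, so '?' (63) comes out
        System.out.println(Arrays.toString(encode(c, Charset.forName("windows-1252")))); // [63]
        // ISO-8859-1 maps chars 0..255 one-to-one onto bytes 0..255
        System.out.println(Arrays.toString(encode(c, StandardCharsets.ISO_8859_1)));     // [140]
    }
}
```

The 63 is not a read-side problem at all: it is already in the file, because Cp1252 has no byte for that char.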

Solution: specify the encoding explicitly when you create the output
file. Instead of constructing a FileWriter directly, create a
FileOutputStream and then an OutputStreamWriter for it that specifies
the encoding:

String encoding = "ISO-8859-1"; // For example

OutputStream ostream = new FileOutputStream(new File("foo"));
Writer o = new OutputStreamWriter(ostream, encoding);

o.write((char) 140);
o.flush();
o.close();

Same deal on input; use

InputStream istream = new FileInputStream(new File("foo"));
Reader f = new InputStreamReader(istream, encoding);
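Phil's fix as a self-contained round trip. This is a sketch assuming Java 8 or later (try-with-resources, `UncheckedIOException`); it uses a temporary file instead of `foo`, passes `Charset` objects rather than encoding names, and reads with `read()` rather than `readLine()` to keep it short. The names `RoundTripFile` and `roundTrip` are illustrative:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripFile {
    // Write one char to a temporary file using the given charset, read it back
    // with the same charset, and return the char value that came back.
    static int roundTrip(char c, Charset cs) {
        try {
            File file = File.createTempFile("foo", ".txt");
            file.deleteOnExit();
            try (Writer out = new OutputStreamWriter(new FileOutputStream(file), cs)) {
                out.write(c);
            } // closing the writer flushes it, so no separate flush() is needed
            try (Reader in = new InputStreamReader(new FileInputStream(file), cs)) {
                return in.read(); // -1 would mean the file was empty
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip((char) 140, StandardCharsets.ISO_8859_1));     // 140
        System.out.println(roundTrip((char) 140, Charset.forName("windows-1252"))); // 63, the original symptom
    }
}
```

The important part is using the same charset on both ends; with `windows-1252` on both ends the sketch reproduces the original 63.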
 

Jim Pete

Isn't there an encoder that writes the exact bytes I tell it to... like
23=23, 160=160, 244=244, etc.?
 

Jim Pete

IT WORKED!!

ISO-8859-1 is the one I needed...


thanks

Jim Pete said:
Isn't there an encoder that writes the exact bytes I tell it to... like
23=23, 160=160, 244=244, etc.?
 

Phil Hanna

Isn't there an encoder that writes the exact bytes I tell it to... like
23=23, 160=160, 244=244, etc.?

Sure - if you're writing binary output, just use FileOutputStream
without any encoding (as opposed to a Writer of any kind).
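Phil's binary route, sketched under the assumption of Java 8 or later (for `UncheckedIOException`) with illustrative names: no Writer and no charset, just `FileOutputStream.write(int)` and `FileInputStream.read()`, using Jim's sample values plus 140:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class RawBytes {
    // Write the given values as raw bytes (no character encoding involved),
    // then read them back. Each value is expected to be in 0..255.
    static int[] roundTrip(int[] values) {
        try {
            File file = File.createTempFile("foo", ".bin");
            file.deleteOnExit();
            try (OutputStream out = new FileOutputStream(file)) {
                for (int v : values) {
                    out.write(v); // write(int) stores only the low 8 bits
                }
            }
            int[] result = new int[values.length];
            try (InputStream in = new FileInputStream(file)) {
                for (int i = 0; i < result.length; i++) {
                    result[i] = in.read(); // 0..255, or -1 at end of file
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] back = roundTrip(new int[] {23, 140, 160, 244});
        System.out.println(Arrays.toString(back)); // [23, 140, 160, 244]
    }
}
```

Because `read()` already returns an unsigned value in an int, there is no sign juggling on this path.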
 

Chris Smith

Jim said:
IT WORKED!!

ISO-8859-1 is the one I needed...

Jim,

I'd still read and pay attention to Phil's last reply. If you're
dealing with binary data and not characters, then using FileWriter is
wrong even if it can be coerced into working. If that's the case,
FileWriter is still costing you performance and confusing the heck out
of anyone trying to read your code. The OutputStream class and its
subclasses are appropriate for binary data (including even text if
you've pre-encoded it somehow, as it appears you might be doing), and
Writer and its subclasses are appropriate for writing some encoding of a
text file, when you have Unicode character data as your source.
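Chris's parenthetical about pre-encoded text can be illustrated with a sketch (Java 8 or later; names illustrative; a `ByteArrayOutputStream` stands in for a `FileOutputStream` so nothing touches disk). The same text, the OE ligature U+0152, goes out once through a `Writer` and once pre-encoded through a plain byte stream:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Arrays;

public class TextVsBytes {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Route 1: a Writer does the char-to-byte encoding for you.
    static byte[] viaWriter(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, CP1252)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Route 2: encode the text yourself, then write plain bytes to a stream.
    static byte[] viaStream(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] encoded = text.getBytes(CP1252);
        sink.write(encoded, 0, encoded.length);
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u0152"; // LATIN CAPITAL LIGATURE OE, byte 0x8C in windows-1252
        System.out.println(Arrays.equals(viaWriter(text), viaStream(text))); // true
        System.out.println(viaStream(text)[0] & 0xFF);                       // 140
    }
}
```

Both routes give the single byte 140; the only difference is where the char-to-byte step happens, which is why the class should match the kind of data you actually hold.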

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

Jon Skeet

Phil Hanna said:
In my case, on Windows NT, this is "Cp1252", which is Windows Latin-1,
in which your character 140 is not supported.

Quick point: Cp1252 and ISO-8859-1 are *not* the same thing - they
differ in the range 128 to 159 (0x80 to 0x9F).
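To check Jon's point, here is a sketch (Java 7 or later; names illustrative) that decodes byte 140 both ways and scans all 256 byte values for where the two charsets disagree. It also accounts for the 338 Jim reported: 338 is U+0152, the OE ligature, which is what Cp1252 makes of byte 140.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeByte {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode a single unsigned byte value (0-255) and return the resulting char value.
    static int decode(int unsignedByte, Charset cs) {
        byte[] one = { (byte) unsignedByte };
        return new String(one, cs).charAt(0);
    }

    // Return {first, last}: the lowest and highest byte values whose
    // decoding differs between the two charsets.
    static int[] differingRange(Charset a, Charset b) {
        int first = -1;
        int last = -1;
        for (int v = 0; v < 256; v++) {
            if (decode(v, a) != decode(v, b)) {
                if (first < 0) {
                    first = v;
                }
                last = v;
            }
        }
        return new int[] { first, last };
    }

    public static void main(String[] args) {
        System.out.println(decode(140, StandardCharsets.ISO_8859_1)); // 140
        System.out.println(decode(140, CP1252));                      // 338
        System.out.println(Arrays.toString(differingRange(StandardCharsets.ISO_8859_1, CP1252))); // [128, 159]
    }
}
```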
 

Roedy Green

I just tried all the data formats... nothing worked... I either get 63, 338
or more than two bytes...

First of all we have to decide what sort of file we want to
write/read.

1. Is this file for human consumption or intermediate storage for the
convenience of a computer?

In the first case, we have to pick an 8-bit character encoding. In
the second, we might pick a raw byte or binary encoding.

2. What does 140 mean to you? A small integer 140, the character OE,
or some other character? If some other character, in which 8-bit
encoding does 140 represent the desired character? See
http://mindprod.com/jgloss/encoding.html. Then go look at
http://mindprod.com/jgloss/unicode.html to find out which 16-bit
Unicode code point represents the desired character. See
http://mindprod.com/jgloss/literals.html for background.

With the answers to those two questions at hand, I can tell you what to
ask the File I/O Amanuensis at http://mindprod.com/fileio.html in
order to get it to generate the answer for you.

One thing to understand is that Java works internally with 16-bit
characters. They are sometimes translated to 8-bit encodings,
sometimes to 16-bit chars, sometimes to multibyte strings and
sometimes to counted strings. Rarely do you see raw big-endian 16-bit
characters lying about in a file.
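Roedy's list of translations can be made concrete by counting bytes. A sketch (Java 8 or later; illustrative names) that encodes the single char 140 several ways; the counted-string case uses `DataOutputStream.writeUTF`, which is one example of a counted format, not necessarily the one Roedy had in mind:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedSizes {
    // Number of bytes the string occupies in the given charset.
    static int size(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // Number of bytes DataOutputStream.writeUTF produces: a 2-byte length
    // prefix followed by the string in Java's modified UTF-8.
    static int writeUtfSize(String s) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        String s = String.valueOf((char) 140);
        System.out.println("ISO-8859-1 (8-bit):    " + size(s, StandardCharsets.ISO_8859_1)); // 1
        System.out.println("UTF-16BE (raw 16-bit): " + size(s, StandardCharsets.UTF_16BE));   // 2
        System.out.println("UTF-16 (with BOM):     " + size(s, StandardCharsets.UTF_16));     // 4
        System.out.println("UTF-8 (multibyte):     " + size(s, StandardCharsets.UTF_8));      // 2
        System.out.println("writeUTF (counted):    " + writeUtfSize(s));                      // 4
    }
}
```

Results of two or four bytes are one plausible source of the "more than two bytes" Jim ran into, though the thread does not say exactly which encodings he tried.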
 

Roedy Green

Isn't there an encoder that writes the exact bytes I tell it to... like
23=23, 160=160, 244=244, etc.?

ASCII is pretty close, but it handles only the 7-bit chars. Keep in
mind Java uses 16-bit chars internally. What you want is an encoding
that just truncates the high byte on write and pads with 0 on read.


Have a look at the description of 8859_1 at
http://mindprod.com/jgloss/encoding.html

This is the default American encoding, which I think leads novice
American Java programmers astray. They don't fully realise what an
encoding is, or that there is even an 8-bit / 16-bit problem.
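Roedy's "truncate the high byte and pad with 0" can be spelled out by hand. A sketch (Java 7 or later; hypothetical names) that does exactly that and compares the result with ISO-8859-1 for Jim's sample values:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TruncatePad {
    // Write side: keep only the low 8 bits of each char.
    // Chars above 255 are silently mangled, so this is only safe for 0..255.
    static byte[] truncate(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Read side: zero-extend each byte back to a 16-bit char.
    static String pad(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 0xFF);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String s = new String(new char[] {23, 140, 160, 244}); // Jim's sample values plus 140
        // For chars 0..255 the hand-rolled mapping gives the same bytes as ISO-8859-1
        System.out.println(Arrays.equals(truncate(s), s.getBytes(StandardCharsets.ISO_8859_1))); // true
        // and zero-padding on the way back restores the original string
        System.out.println(pad(truncate(s)).equals(s)); // true
    }
}
```

The two agree only for chars 0 to 255. Above that, Java's ISO-8859-1 encoder writes '?' while the hand-rolled version keeps whatever the low byte happens to be.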
 

Dale King

Jon Skeet said:
Quick point: Cp1252 and ISO-8859-1 are *not* the same thing - they
differ in the range 128 to 159 (0x80 to 0x9F).


He did not say they were the same. He said Cp1252 and Windows Latin-1 are
the same thing, which is true. Windows Latin-1 is another name for Cp1252,
just as ISO Latin-1 is another name for ISO-8859-1.
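Dale's point about names can be checked against Java's own charset registry. A sketch (names illustrative) that looks up a few aliases, including the `8859_1` spelling Roedy used, and prints Java's canonical names. "Windows Latin-1" and "ISO Latin-1" are descriptive labels rather than Java charset names, so they are left out:

```java
import java.nio.charset.Charset;

public class CharsetNames {
    // Look up a charset by any of its aliases and return Java's canonical name.
    static String canonical(String alias) {
        return Charset.forName(alias).name();
    }

    public static void main(String[] args) {
        System.out.println(canonical("Cp1252")); // windows-1252
        System.out.println(canonical("8859_1")); // ISO-8859-1
        System.out.println(canonical("latin1")); // ISO-8859-1
        // Different canonical names: two distinct charsets, not aliases of one.
        System.out.println(Charset.forName("Cp1252").equals(Charset.forName("ISO-8859-1"))); // false
    }
}
```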
 

Steve Horsley

I have a problem with the following piece of code:

FileWriter o = new FileWriter(new File("foo"));
o.write((char)140);
o.close();

FileReader f = new FileReader(certFile);
BufferedReader in = new BufferedReader(f);
String tmp = in.readLine();


When I read tmp, the character has changed to the value 63 rather than 140.

How can I fix this?



Thank you.

I just checked, and 140 is the Unicode control character PLU (Partial
Line Backward). Is this what you meant to write?

Please be aware of the difference between text and binary data, characters
and bytes. It is impossible to write text to a file without putting it
through some form of text->bytes encoding. If you want to write text, use
a Writer and specify the required character encoding. If you want to write
bytes, use an OutputStream and write bytes.

In Java, more than in any other language I know, the difference between
bytes and text characters is fundamental, and the I/O methods are very
different.

By the way, 0x3f (63) is the Unicode code for '?', which is used as a
substitute when encoding characters that are not supported in the target
character set.
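Rather than discovering the '?' after the fact, the encoder can be told to fail. A sketch (Java 7 or later; illustrative names) using `CharsetEncoder` with `CodingErrorAction.REPORT`:

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncode {
    // Try to encode text with no substitution; return false if some char
    // has no mapping in the charset, instead of silently producing '?'.
    static boolean encodesCleanly(String text, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) { // UnmappableCharacterException is a subclass
            return false;
        }
    }

    public static void main(String[] args) {
        String text = String.valueOf((char) 140);
        System.out.println(encodesCleanly(text, Charset.forName("windows-1252"))); // false
        System.out.println(encodesCleanly(text, StandardCharsets.ISO_8859_1));     // true
    }
}
```

The same kind of encoder can be handed to the `OutputStreamWriter(OutputStream, CharsetEncoder)` constructor, so that writing an unmappable char should raise an exception instead of quietly writing 63.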

Steve
 

Jon Skeet

Dale King said:
He did not say they were the same. He said Cp1252 and Windows Latin-1 are
the same thing, which is true. Windows Latin-1 is another name for Cp1252,
just as ISO Latin-1 is another name for ISO-8859-1.

Ah... fair enough. I've never heard of Cp1252 referred to as "Windows
Latin-1" before. Sorry Phil!
 

Steve Horsley

Ah... fair enough. I've never heard of Cp1252 referred to as "Windows
Latin-1" before. Sorry Phil!

And presumably, Windows-Cheese is ideal for writing on blackboards!

Steve
 
