reading file, ASCII 161 (meta-space) converted to question mark

Michael Muller · Sep 17, 2003

I'm trying to read an HTML file that has been generated by MS Excel.
When I use od -c to examine this file, I see lots of octal 240
(decimal 161) chars. This is supposedly a "meta space", whatever that
means. When I read the file in on Windows, everything works OK (the
characters stay as 240), but when I read the file in on Linux (RH9),
the "meta-spaces" are converted to question marks, rendering the HTML
unreadable.

My LANG env var on Unix is set to us_ENG.UTF-8. On Windows, it's not
set. I tried unsetting and exporting LANG on Linux -- no joy.
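A minimal sketch of what is probably happening, assuming the Excel file uses a single-byte encoding such as ISO-8859-1 or windows-1252 (the post doesn't say which). In those encodings the byte 0xA0 (octal 240) decodes to U+00A0. A JVM running under a UTF-8 locale decodes it as UTF-8 instead, where a lone 0xA0 is malformed, so the decoder substitutes U+FFFD. Writing that back out in an encoding with no mapping for U+FFFD produces '?'. The class `NbspDecodeDemo` is a made-up name, and `StandardCharsets` needs Java 7 or later, not the 1.4 releases in this thread.

```java
import java.nio.charset.StandardCharsets;

public class NbspDecodeDemo {
    // The single byte that od -c shows as octal 240 (hex A0).
    static final byte[] NBSP_BYTE = { (byte) 0240 };

    // ISO-8859-1 maps every byte straight to the code point with the same value.
    static String decodeLatin1() {
        return new String(NBSP_BYTE, StandardCharsets.ISO_8859_1);
    }

    // In UTF-8, 0xA0 is a continuation byte with no lead byte, so the
    // decoder substitutes the replacement character U+FFFD.
    static String decodeUtf8() {
        return new String(NBSP_BYTE, StandardCharsets.UTF_8);
    }

    // U+FFFD has no ISO-8859-1 mapping, so getBytes writes '?' instead.
    static byte[] utf8ThenLatin1() {
        return decodeUtf8().getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println((int) decodeLatin1().charAt(0));              // 160
        System.out.println(Integer.toHexString(decodeUtf8().charAt(0))); // fffd
        System.out.println((char) utf8ThenLatin1()[0]);                  // ?
    }
}
```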

Help! I'm using Java 1.4.2 on Linux and 1.4.1 on Windows. I sure hope
that's not the issue. The code that reads the file is appended.

Thanks in advance for any help anyone can offer,

-- Mike

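import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

private static String slurp(File file)
    throws IOException
{
    StringBuffer sb = new StringBuffer();
    char[] buf = new char[1024 * 4];
    // FileReader decodes bytes with the platform's default encoding
    BufferedReader br = new BufferedReader(new FileReader(file));
    try
    {
        int charsRead;  // read() counts chars, not bytes
        while ((charsRead = br.read(buf, 0, buf.length)) != -1)
        {
            sb.append(buf, 0, charsRead);
        }
    }
    finally
    {
        br.close();  // release the file handle even if read() throws
    }

    return sb.toString();
}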

Roedy Green · Sep 18, 2003

You want to find out which default encodings you are using, then recode
with explicit ones. See http://mindprod.com/jgloss/encoding.html
and http://mindprod.com/fileio.html
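A minimal sketch of that advice, not code taken from the pages above. It reports the charset `FileReader` would silently use, then runs the same read loop through an `InputStreamReader` with an explicitly named charset. `ExplicitSlurp` and its methods are hypothetical names, try-with-resources needs Java 7 or later, and windows-1252 is only a guess at what Excel wrote. If the file has a charset declaration in its `<meta>` tag, use that instead.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class ExplicitSlurp {
    // The charset FileReader falls back to on this JVM.
    static String defaultEncoding() {
        return Charset.defaultCharset().name();
    }

    // Same loop as slurp(), but the caller names the encoding
    // instead of inheriting the platform default.
    static String slurp(File file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024 * 4];
        try (Reader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            int charsRead;
            while ((charsRead = br.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, charsRead);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("default encoding: " + defaultEncoding());
        if (args.length > 0) {
            // Assumption: the Excel export is windows-1252; check its meta tag.
            String html = slurp(new File(args[0]), Charset.forName("windows-1252"));
            System.out.println(html.length() + " chars read");
        }
    }
}
```

On Java 18 and later the default charset is UTF-8 on every platform (JEP 400). Relying on the default would then fail on Windows too, and naming the charset avoids that.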

Neomorph · Sep 18, 2003

> I'm trying to read an HTML file that has been generated by MS Excel.
> When I use od -c to examine this file, I see lots of octal 240
> (decimal 161) chars.

That should be: 160 decimal.
You can't have an even octal number become an odd decimal number ;-)
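The arithmetic can be checked directly: octal 240 is 2·8² + 4·8 + 0 = 128 + 32 = 160, which is A0 in hex. Because 8 is even, the last octal digit alone decides parity, so a value ending in 0 can't be odd in any base. A small check (the class name `OctalCheck` is made up):

```java
public class OctalCheck {
    // Parse a string of octal digits (radix 8) into an int.
    static int octalToDecimal(String octal) {
        return Integer.parseInt(octal, 8);
    }

    public static void main(String[] args) {
        int value = octalToDecimal("240");              // 2*64 + 4*8 + 0*1
        System.out.println(value);                      // 160
        System.out.println(Integer.toHexString(value)); // a0
        System.out.println(value == 0240);              // true (0240 is a Java octal literal)
    }
}
```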

> This is supposedly a "meta space", whatever that
> means.

Usually used to 'connect' two words, so they are not split when realigning
text. It's the same as the non-breaking space in HTML (coded as &nbsp;).
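Java's own character classes reflect that "keep the words together" role. `Character.isSpaceChar` reports U+00A0 as a Unicode space, but `Character.isWhitespace` deliberately excludes non-breaking spaces. The default `\s` regex class only matches ASCII whitespace, so splitting on it leaves the two words joined. The sketch below illustrates this and is not from the thread. The class and method names are hypothetical.

```java
public class NbspProperties {
    static final char NBSP = '\u00A0';

    // Java's default \s regex class covers only ASCII whitespace, so a
    // non-breaking space keeps the two words in one token.
    static String[] splitWords(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Character.isSpaceChar(NBSP));   // true: Unicode space separator
        System.out.println(Character.isWhitespace(NBSP));  // false: non-breaking spaces are excluded
        System.out.println(splitWords("two" + NBSP + "words").length); // 1
        System.out.println(splitWords("two words").length);            // 2
    }
}
```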

> When I read the file in on Windows, everything works OK (the
> characters stay as 240), but when I read the file in on Linux (RH9),
> the "meta-spaces" are converted to question marks, rendering the HTML
> unreadable.

HTML should either contain only printable US-ASCII (32-126), or should
declare its character encoding (code page).
You should be replacing the 0240 (octal) with the code &nbsp; as long as
it's not part of a parameter value.
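A minimal sketch of that replacement. It assumes the file has already been decoded with the right charset, so the byte 0240 has become the character U+00A0. It swaps every occurrence for the `&nbsp;` entity. It is deliberately naive: it does not skip attribute (parameter) values, which is exactly the caveat above, so treat it as an illustration rather than a safe HTML rewrite. `NbspEscaper` is a hypothetical name.

```java
public class NbspEscaper {
    // Replace every U+00A0 in already-decoded text with the HTML entity.
    // Naive: it does not skip attribute values, which the post warns about.
    static String escapeNbsp(String html) {
        return html.replace("\u00A0", "&nbsp;");
    }

    public static void main(String[] args) {
        String decoded = "<td>12" + '\u00A0' + "345</td>";
        System.out.println(escapeNbsp(decoded)); // <td>12&nbsp;345</td>
    }
}
```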

> My LANG env var on Unix is set to us_ENG.UTF-8. On Windows, it's not
> set. I tried unsetting and exporting LANG on Linux -- no joy.

Either way, the Linux font probably has no glyph mapped to that character
code.

Cheers.

