How does Java read character values greater than 128/255?

RC · Dec 13, 2006

I have

BufferedReader bufferedReader =
    new BufferedReader(new FileReader(inputfile_name));

int c;
while ((c = bufferedReader.read()) > -1) {
    if (c > (int)128) {
        System.err.println(
            (char)c + " " +
            c + " " +
            Integer.toOctalString(c) + " " +
            Integer.toHexString(c)
        );
    }
}
bufferedReader.close();

This works fine: it prints every character whose value is greater than
128.

Now I do the same in C

if ((fp = fopen("inputfile_name", "r")) == NULL) {
    fprintf(stderr, "Can't open %s\n", argv[1]);
    exit(2);
}
int c;
while ((c = getc(fp)) != EOF) {
    if (c > 128) {
        printf("%c %d %o %x\n", c, c, c, c);
    }
}
fclose(fp);

But in C, reading the same file, nothing with a value greater than 128 gets
printed.

I just wonder why. How does Java read those characters with values greater
than 128?

Lew · Dec 13, 2006

RC said:
int c;
while ((c = bufferedReader.read()) > -1 ) {
if (c > (int)128) {

128 is already an int, so casting it to int has no effect.

System.err.println(
(char)c + " " +
c + " " +
Integer.toOctalString(c) + " " +
Integer.toHexString(c)
);
}
}
bufferedReader.close();

This works fine: it prints every character whose value is greater than
128.

Now I do the same in C

if ((fp = fopen("inputfile_name", "r")) == NULL) {
fprintf(stderr, "Can't open %s\n", argv[1]);
exit(2);
}
int c;
while ((c = getc(fp)) != EOF) {

The C function getc() returns one byte at a time (an unsigned char value converted to int, or EOF), whereas Java's Reader.read() returns a 16-bit char value.

if (c > 128) {
printf("%c %d %o %x\n", c, c, c, c);
}
}
fclose(fp);

But in C, reading the same file, nothing with a value greater than 128 gets
printed.
I just wonder why. How does Java read those characters with values greater
than 128?

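Java is likely not decoding the file as ASCII. FileReader uses the platform
default encoding, which may be UTF-8 or some other non-ASCII charset. Have you
tried the Java program with an InputStreamReader whose encoding is set to
"US-ASCII"?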

For a fuller answer one would need to know the contents of the file.

Check out the API docs for java.io.InputStreamReader and java.nio.charset.Charset.

- Lew
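To make Lew's suggestion concrete, here is a minimal sketch that replaces FileReader with an InputStreamReader and an explicit charset. The class name ExplicitCharsetRead and the helper nonAscii are made up for illustration, and the file name is assumed to arrive as the first command-line argument. Bytes that are not valid in the chosen charset should normally come back as the replacement character U+FFFD rather than being dropped.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original thread.
final class ExplicitCharsetRead {

    // Decodes the stream with the given charset and returns every
    // char value above 127, in the order it was read.
    static List<Integer> nonAscii(InputStream in, Charset charset) {
        List<Integer> found = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                if (c > 127) {
                    found.add(c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String fileName = args[0]; // assumption: path to the file under test
        System.out.println("default charset: " + Charset.defaultCharset());
        Charset[] candidates = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };
        for (Charset cs : candidates) {
            // Reopen the file for each charset so every pass starts at byte 0.
            try (InputStream in = new FileInputStream(fileName)) {
                System.out.println(cs + ": " + nonAscii(in, cs));
            }
        }
    }
}
```

Comparing the three lines of output against the default charset printed first should show which decoding the original FileReader version was silently using.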

Oliver Wong · Dec 13, 2006

RC said:
But in C, reading the same file, nothing with a value greater than 128 gets
printed.

I just wonder why. How does Java read those characters with values greater
than 128?

I think it's basically because C uses ASCII internally, while Java uses
a modified version of UTF-16 internally.

- Oliver

Mike Schilling · Dec 13, 2006

Oliver Wong said:
I think it's basically because C uses ASCII internally, while Java uses
a modified version of UTF-16 internally.

It's because the C code shown was reading bytes, while the Java code shown
was reading characters. Java that reads bytes, e.g.

InputStream strm;

int b = strm.read();

would never see anything outside the range [0..255] (apart from -1 at end of stream), while C that reads
"wide" characters, e.g.

wint_t c = getwc(stdin);

can see characters outside that range.
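The byte-versus-character distinction can be seen in a few lines. This is only a sketch: the class name BytesVersusChars and the two-byte sample (the UTF-8 encoding of U+00E9) are chosen for illustration. Note that InputStream.read() hands each byte back as an int from 0 to 255, or -1 at end of stream; the signed range -128..127 only shows up once a value is stored in a Java byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
final class BytesVersusChars {

    // The two bytes that encode U+00E9 (e with acute accent) in UTF-8.
    static final byte[] SAMPLE = {(byte) 0xC3, (byte) 0xA9};

    // Reads the first unit through a raw byte stream: no decoding happens.
    static int firstByte() {
        try (InputStream in = new ByteArrayInputStream(SAMPLE)) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the first unit through a Reader: both bytes are decoded into one char.
    static int firstChar() {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(SAMPLE), StandardCharsets.UTF_8)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("InputStream.read(): " + firstByte()); // 195
        System.out.println("Reader.read():      " + firstChar()); // 233
        System.out.println("stored in a byte:   " + SAMPLE[0]);   // -61
    }
}
```

The byte stream behaves like the C getc() loop, and the Reader behaves like the original Java program.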

Timothy Bendfelt · Dec 13, 2006

These two bits of code do not do the same thing. The Java code has the
opportunity to use the file encoding, including multi-byte schemes
(e.g. UTF-8), to re-map bytes in the file stream to characters represented
as UTF-16 code units. The C code should just be consuming bytes and
returning them as unsigned char values.

Question: Do both of them read the same number of characters from the
stream?

Question: What does Java think your default file encoding (code page)
is? You can force it to read US-ASCII or ISO-8859-1 (Latin-1) and run again.
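One way to answer the first question without touching the real file is to count units for a small sample under different charsets. The sketch below assumes a five-byte sample (the UTF-8 encoding of "caf" followed by U+00E9); the class name CountUnits and the helper countChars are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
final class CountUnits {

    // Counts how many chars a Reader produces when decoding data with charset.
    static int countChars(byte[] data, Charset charset) {
        int count = 0;
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), charset)) {
            while (reader.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // "caf" plus U+00E9, which takes two bytes in UTF-8.
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes:            " + data.length);
        System.out.println("chars as UTF-8:   "
            + countChars(data, StandardCharsets.UTF_8));
        System.out.println("chars as Latin-1: "
            + countChars(data, StandardCharsets.ISO_8859_1));
    }
}
```

For this sample the byte count is 5, decoding as UTF-8 yields 4 chars, and decoding as ISO-8859-1 yields 5, so a mismatch between the Java and C counts on the real file would point at a multi-byte encoding.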

Oliver Wong · Dec 13, 2006

Mike Schilling said:
Oliver Wong said:

I think it's basically because C uses ASCII internally, while Java
uses a modified version of UTF-16 internally.

It's because the C code shown was reading bytes, while the Java code shown
was reading characters. Java that reads bytes, e.g.

InputStream strm;

int b = strm.read();

would never see anything outside the range [0..255] (apart from -1 at end of stream), while C that reads
"wide" characters, e.g.

wint_t c = getwc(stdin);

can see characters outside that range.

I was referring to the language-built-in datatypes known as "char" in C
and "char" in Java. Both languages seem to assume that there is a finite
number of characters that will ever be used in computing (256 in the case of C,
65536 in the case of Java), and when they were shown wrong, libraries needed
to be added to support the extra characters.

The OQ (Original Question) was informally phrased (e.g. contrasting C's
printing versus Java's reading -- I would further argue that Java doesn't
"read" characters at all in this scenario, but instead reads bytes, and then
does some behind the scenes conversions to characters), so I was sort of
guessing at what the OP was really asking.

- Oliver
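The 65536 limit on Java's char is easy to observe. As a rough sketch (the class name BeyondSixteenBits and the choice of U+1F600 are just for illustration), a character above U+FFFF occupies two char values, and the code-point methods on String and Character are what count it as one.

```java
// Hypothetical demo class, not part of the original thread.
final class BeyondSixteenBits {

    // U+1F600 lies outside the 16-bit range, so Java stores it as a surrogate pair.
    static final String SAMPLE = new String(Character.toChars(0x1F600));

    // Number of 16-bit char units in the string.
    static int charCount() {
        return SAMPLE.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount() {
        return SAMPLE.codePointCount(0, SAMPLE.length());
    }

    public static void main(String[] args) {
        System.out.println("char units:  " + charCount());      // 2
        System.out.println("code points: " + codePointCount()); // 1
        System.out.println("code point:  0x"
            + Integer.toHexString(SAMPLE.codePointAt(0)));      // 0x1f600
    }
}
```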

Mike Schilling · Dec 14, 2006

Oliver Wong said:
Mike Schilling said:

Oliver Wong said:

But in C, reading the same file, nothing with a value greater than 128 gets
printed.

I just wonder why. How does Java read those characters with values greater
than 128?

I think it's basically because C uses ASCII internally, while Java
uses a modified version of UTF-16 internally.

It's because the C code shown was reading bytes, while the Java code
shown was reading characters. Java that reads bytes, e.g.

InputStream strm;

int b = strm.read();

would never see anything outside the range [0..255] (apart from -1 at end of stream), while C that
reads "wide" characters, e.g.

wint_t c = getwc(stdin);

can see characters outside that range.

I was referring to the language-built-in datatypes known as "char" in C
and "char" in Java. Both languages seem to assume that there is a finite
number of characters that will ever be used in computing (256 in the case of
C, 65536 in the case of Java), and when they were shown wrong, libraries
needed to be added to support the extra characters.

Yes, but that's an apples-to-oranges comparison. Java has "byte" and "char"
for octets and character-set-members respectively. C has "char" and
"wchar_t" for those purposes. The confusion (if any) arises from the fact
that C and Java use the same name ("char") for two different things.

The OQ (Original Question) was informally phrased (e.g. contrasting C's
printing versus Java's reading -- I would further argue that Java doesn't
"read" characters at all in this scenario, but instead reads bytes, and
then does some behind the scenes conversions to characters),

Any language (or library) that handles multi-byte character sets has to do
the same.

so I was sort of guessing at what the OP was really asking.

I was too.
