'A'++ == 'B': Always True?

Fritz Foetzl · Nov 18, 2004

[snip]

The OP (to whom I was responding) was asking an important question
about a fundamental difference between the languages that he was
accustomed to and Java. Languages like C process characters internally
in 'native' form with no translation during I/O, while Java processes
characters internally in Unicode and translates during I/O. This
difference trips up a LOT of beginning Java programmers, and I felt
that it was worthwhile to be explicit about what was going on.

...and the OP appreciates it. This has been a lively, stimulating
discussion - better than I anticipated. The difference between
character I/O and internal processing is important, and I've learned
much from reading this thread. Thanks to all who have responded!

ff

Gary Labowitz · Nov 18, 2004

Doug> you'll usually be using a character encoding that
Doug> translates 0x0000-0x007F into byte values 0x00-0x7F.
Doug> A counter-example would be if
Doug> you were running on an IBM mainframe,

Chris> I don't think that's relevant. A typical EBCDIC machine would
Chris> take a different route to get there, but the resulting output
Chris> would still be an 'A' followed by a 'B'.

Not true. EBCDIC does not guarantee that the alphabetic characters are
contiguous. I believe 'R' is not followed by 'S'. There may also be other
"breaks" in the sequence.

Gary Labowitz · Nov 18, 2004

Fritz Foetzl said:
"Doug Pardee" wrote:

[snip]

The OP (to whom I was responding) was asking an important question
about a fundamental difference between the languages that he was
accustomed to and Java. Languages like C process characters internally
in 'native' form with no translation during I/O, while Java processes
characters internally in Unicode and translates during I/O. This
difference trips up a LOT of beginning Java programmers, and I felt
that it was worthwhile to be explicit about what was going on.

...and the OP appreciates it. This has been a lively, stimulating
discussion - better than I anticipated. The difference between
character I/O and internal processing is important, and I've learned
much from reading this thread. Thanks to all who have responded!

Interesting. Also, since the OP used the postfix operator, I'm
wondering whether the un-incremented 'A' was being compared to 'B', in
which case the result would always be false.
As confused as ever, I remain
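
For what it's worth, here is a minimal sketch of the postfix question. It assumes a char variable, because the literal expression 'A'++ does not compile in Java (a literal is not a variable). The class and method names are made up for illustration.

```java
// Sketch: postfix vs. prefix increment when comparing a char to 'B'.
class PostfixDemo {
    // Postfix: the comparison sees the value from before the increment.
    static boolean postfixEqualsB(char start) {
        char c = start;
        return c++ == 'B';
    }

    // Prefix: the comparison sees the incremented value.
    static boolean prefixEqualsB(char start) {
        char c = start;
        return ++c == 'B';
    }

    public static void main(String[] args) {
        System.out.println(postfixEqualsB('A')); // false
        System.out.println(prefixEqualsB('A'));  // true
        System.out.println('A' + 1 == 'B');      // true (int 66 compared with char 66)
    }
}
```

So with a variable holding 'A', the postfix form compares 'A' to 'B' and yields false, while the prefix form and 'A' + 1 yield true.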

Chris Smith · Nov 19, 2004

Gary Labowitz said:
Not true. EBCDIC does not guarantee that the alphabetic characters are
contiguous. I believe 'R' is not followed by 'S'. There may also be other
"breaks" in the sequence.

That doesn't matter. The point is that the literal 'A' is a Unicode
code point. Incrementing it will always give the code point for 'B'.
The translation to EBCDIC is only performed during the output phase.
The resulting output may not contain consecutive EBCDIC values (I don't
know enough about EBCDIC to say whether it will or not), but it WILL be
A followed by B -- not because A and B are consecutive in EBCDIC, but
because A and B are consecutive in Unicode.
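
A rough sketch of that point, using 'R' and 'S' since those were mentioned as a possible break. It assumes the JVM ships the "IBM037" charset (one EBCDIC code page); that is not guaranteed on every runtime, so the code checks first. The class and helper names are hypothetical.

```java
import java.nio.charset.Charset;

// Sketch: chars are incremented as Unicode values; EBCDIC only appears when encoding.
class EbcdicDemo {
    // Assumption: "IBM037" is one EBCDIC code page that may or may not be installed.
    static final String EBCDIC_NAME = "IBM037";

    // Increment in the internal (Unicode) representation.
    static char next(char c) {
        return (char) (c + 1);
    }

    // Encode a single char and return its first byte as an unsigned value.
    static int encodeOne(char c, Charset cs) {
        byte[] bytes = String.valueOf(c).getBytes(cs);
        return bytes[0] & 0xFF;
    }

    public static void main(String[] args) {
        char r = 'R';
        char s = next(r);
        System.out.println(s == 'S'); // true: consecutive in Unicode

        if (Charset.isSupported(EBCDIC_NAME)) {
            Charset ebcdic = Charset.forName(EBCDIC_NAME);
            // The encoded bytes need not be consecutive, yet they still mean R and S.
            System.out.printf("R -> 0x%02X, S -> 0x%02X%n",
                    encodeOne(r, ebcdic), encodeOne(s, ebcdic));
        } else {
            System.out.println(EBCDIC_NAME + " is not available on this JVM");
        }
    }
}
```

Where the charset is available, the two encoded bytes should differ by more than one, while the chars themselves differ by exactly one.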

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation

Gary Labowitz · Nov 19, 2004

Chris Smith said:
That doesn't matter. The point is that the literal 'A' is a Unicode
code point. Incrementing it will always give the code point for 'B'.
The translation to EBCDIC is only performed during the output phase.
The resulting output may not contain consecutive EBCDIC values (I don't
know enough about EBCDIC to say whether it will or not), but it WILL be
A followed by B -- not because A and B are consecutive in EBCDIC, but
because A and B are consecutive in Unicode.

You are getting me confused. EBCDIC is not Unicode. Are you saying that on
an EBCDIC machine it uses Unicode for internal storage of data?
Unicode: 'A' = \u0041
EBCDIC: 'A' = 0xC1

John C. Bollinger · Nov 19, 2004

Michael said:
You really didn't get the point. The Java Language Specification isn't
even relevant at this point. The source code is originally composed of
bytes, not characters. And the compiler has to use some sort of encoding
to convert these bytes into characters. It's not the fault of the compiler
that it may use the wrong one.

And here again I voice my dissent. The source code is composed of
characters. JLS says so. The _representation_ of the source code in
most media consists of bytes, but that's not what we said we were
talking about, and it's not a generally useful thing to bring up in
discussion of language issues. As you say, the compiler must decode the
source code representation correctly in order to produce classes that
properly correspond to that source, but that's not a language issue,
it's a tools issue. Am I splitting hairs? Certainly! But so is all
the rest of this subthread.

John Bollinger

Chris Smith · Nov 19, 2004

Gary Labowitz said:
You are getting me confused. EBCDIC is not Unicode. Are you saying that on
an EBCDIC machine it uses Unicode for internal storage of data?

Yes. That's a quite fundamental concept of Java. Java *always* uses
Unicode to store internal character data. If it's running on an EBCDIC
machine, then it translates to EBCDIC during the output process.

Michael Borgwardt · Nov 19, 2004

Gary said:
You are getting me confused. EBCDIC is not Unicode. Are you saying that on
an EBCDIC machine it uses Unicode for internal storage of data?

As far as Java is concerned, the only thing that distinguishes an
"EBCDIC machine" from, say, an "ASCII machine" is the platform default
encoding, which is used for Char/String <--> byte[]/file conversion
in cases where the encoding is not explicitly specified.

Java chars and Strings are Unicode, or at least must behave as if they
were. A JVM implementation is free to use EBCDIC for its internal storage
of chars, but to fulfill the JLS, char must behave in every aspect as if
it were a 16-bit Unicode value, which means that 'A'+1 == 'B'. Since that
means that using EBCDIC for internal storage would make the JVM
implementation complex and inefficient without (IMO) any gains to show
for it, yes, JVMs on an "EBCDIC machine" are going to use Unicode for
internal storage of data.
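
To illustrate the role of the platform default encoding, here is a small sketch that converts the same String to bytes with and without an explicit charset. Only standard charsets are used for the explicit cases; the class and helper names are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: the String is the same; only the String <-> byte[] step depends on a charset.
class DefaultEncodingDemo {
    // Explicit encoding: the result does not depend on the platform.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // Encode and decode with the same charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String text = "AB";

        // No charset given: the platform default is used, so this line varies by machine.
        System.out.println(Charset.defaultCharset() + ": " + Arrays.toString(text.getBytes()));

        System.out.println(Arrays.toString(encode(text, StandardCharsets.US_ASCII))); // [65, 66]
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_16BE))); // [0, 65, 0, 66]
        System.out.println(roundTrip(text, StandardCharsets.UTF_16BE).equals(text));  // true
    }
}
```

Passing the charset explicitly is the usual way to keep byte output identical across machines, whatever their default encoding happens to be.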

Michael Borgwardt · Nov 19, 2004

John said:
And here again I voice my dissent. The source code is composed of
characters. JLS says so. The _representation_ of the source code in
most media consists of bytes, but that's not what we said we were
talking about, and it's not a generally useful thing to bring up in
discussion of language issues.

Well, IMO we were talking about whether 'A'+1 == 'B' always.
In practice, this expression is entered in some kind of text editor and
saved as a file, which is then fed to a compiler. So in practice
it may end up being false in some circumstances. But only through faulty
assumptions about or misuse of the tools, not the language.
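
A sketch of how such a tools mismatch can arise, simulated in plain Java rather than with a real compiler run. It assumes a source line containing a non-ASCII char literal saved as UTF-8 and then decoded as ISO-8859-1, which is roughly what happens when the encoding given to the compiler (for example via javac's -encoding option) does not match the file. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: the same file bytes yield different source characters under different decodings.
class SourceDecodingDemo {
    // Stands in for the compiler's first step: turning file bytes into characters.
    static String decodeAs(byte[] fileBytes, Charset cs) {
        return new String(fileBytes, cs);
    }

    public static void main(String[] args) {
        // One line of source text with a single non-ASCII char literal (e with acute accent).
        String source = "char c = '\u00e9';";
        byte[] savedAsUtf8 = source.getBytes(StandardCharsets.UTF_8);

        String decodedCorrectly = decodeAs(savedAsUtf8, StandardCharsets.UTF_8);
        String decodedWrongly = decodeAs(savedAsUtf8, StandardCharsets.ISO_8859_1);

        System.out.println(decodedCorrectly.equals(source)); // true
        System.out.println(decodedCorrectly.length());       // 13
        // The accented letter became two characters, so the "char literal" is no longer valid.
        System.out.println(decodedWrongly.length());         // 14
    }
}
```

Plain 'A' and 'B' survive any ASCII-compatible decoding, which is why the problem tends to show up only with non-ASCII literals or with a truly different encoding family.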

As you say, the compiler must decode the
source code representation correctly in order to produce classes that
properly correspond to that source, but that's not a language issue,
it's a tools issue. Am I splitting hairs? Certainly! But so is all
the rest of this subthread.

Certainly.
