Keith Thompson said:
If your system has CHAR_BIT==8, then '\400' violates a constraint.
See C99 6.4.4.4p9:
Constraints
The value of an octal or hexadecimal escape sequence shall
be in the range of representable values for the type unsigned
char for an integer character constant, or the unsigned type
corresponding to wchar_t for a wide character constant.
Of course the compiler is free to issue a warning and then go on to
treat it as '\0' (which is what gcc does).
Hmm. Since a character constant is of type int, I would have expected
'\x82' to have type int and value +130. But gcc and Sun's C compiler
agree that its value is -126.
C99 6.4.4.4p6 specifies the meaning of a hexadecimal escape sequence:
The hexadecimal digits that follow the backslash and the letter
x in a hexadecimal escape sequence are taken to be part of the
construction of a single character for an integer character
constant or of a single wide character for a wide character
constant. The numerical value of the hexadecimal integer so
formed specifies the value of the desired character or wide
character.
Note that it says "character"; it doesn't refer to the type (plain)
char.
And, of course, the constraint I already quoted says that the value of
the hexadecimal escape sequence must be in the range of type unsigned
char. If '\x82' has the value -126, then it violates the constraint,
which I don't think is the intent.
My tentative conclusion is that the value of '\x82' is supposed to be
+130, not -126, and that both gcc and Sun's compiler get this wrong.
I'd be interested in any counterarguments.
I think the answer lies further on, in p10 under semantics:
10 An integer character constant has type int. The value of an integer
character constant containing a single character that maps to a
single-byte execution character is the numerical value of the
representation of the mapped character interpreted as an integer.
The value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape
sequence that does not map to a single-byte execution character, is
implementation-defined. If an integer character constant contains a
single character or escape sequence, its value is the one that
results when an object with type char whose value is that of the
single character or escape sequence is converted to type int.
The actual int value of '\x82' is the value you'd get from a char with
that "character value" after being converted to int. Since char is
probably signed in the implementation you are using, gcc can give
-126.
I think all the gyrations are to avoid the possibility of an
implementation-defined conversion of an out-of-range value. I think
that is why the standard talks about the value of the character rather
than using more concrete C terms. '\x82' can't just be 0x82 because
then, with a signed char type,
char c = '\x82';
would be governed by 6.3.1.3 p3:
"Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or
an implementation-defined signal is raised."
Instead, '\x82' denotes some character (not char) value that is put in
a char object and that char is converted to int. I think -126 is
correct on signed char machines.
This issue isn't likely to cause problems in real-world code, since
character constants are usually used with objects of type char, signed
char, or unsigned char. There's no good reason to use '\x82' rather
than 0x82 if you want to store it in an int.
Agreed.
You do want to be able to assign '\x82' to a char, though, and that
requires that '\x82' (an int) be in the range of char. The hex part
must be in the range of unsigned char, and the value you finally get
is the result of putting a not entirely well-specified "character
value" into a char object and converting that to int.