Problem with gcc


Eric Sosman

Which has the potential to misbehave on ones' complement machines if
*cp is -0 (you might get 0 rather than UCHAR_MAX), so it's better to
cast the pointer:

if (isdigit(*(unsigned char *)cp)) ...

Note that if you've got two char variables cpz and cmz
holding the plus-zero and minus-zero representations, both
fputc(cpz,stream) and fputc(cmz,stream) output exactly the
same character: there's no way to tell from examining the
output which variable was written. That being the case, it's
not troubling that isdigit(cpz) == isdigit(cmz).
 

lawrence.jones

Keith Thompson said:
Note that it's not *converted* to unsigned char, it's *interpreted*
as unsigned char. That might have some odd effects on
sign-and-magnitude systems.

On the contrary, it *avoids* the odd effect of -0 being interpreted as 0
rather than as UCHAR_MAX, which it might well do if it were converted.
 

Phil Carmody

Flash Gordon said:
If it's a string then strcmp. This is not a problem because strcmp
will handle this case perfectly.


7.21.4 Comparison functions

[#1] The sign of a nonzero value returned by the comparison
functions memcmp, strcmp, and strncmp is determined by the
sign of the difference between the values of the first pair
of characters (both interpreted as unsigned char) that
differ in the objects being compared.


So strcmp thinks that, for the purposes of comparison of strings,
the characters in strings should be treated as unsigned chars.

So in some contexts, strings are more reasonably considered to be
a sequence of unsigned chars.

Why not all contexts?

Phil
 

Phil Carmody

Seebs said:
Yes, they do.


Cast arguments to the type strcmp expects. Or use 'char' for text data,
since it is the native type for text data.

What is the type that strcmp uses to compare such text?

Phil
 

Seebs

Personally, I think the language (including the library) would be
cleaner if plain char were required to be unsigned. But there are
historical reasons for leaving it up to the implementation. (I think
making plain char signed made for significantly more efficient code
on the PDP-11; it's likely the same issue occurred on other systems.)

I don't know about efficiency, but consider:

char x;
short x;
int x;
long x;
long long x;

Why should one of these five values default to unsigned, when the others
all default to signed?

I would sort of have preferred that 'signed' not be a keyword, and plain char
be always-signed. However, here we run into the essential clash between
'char' as thing to hold characters and 'char' as shortest basic integer
type.

Perhaps the correct solution would have been to use 'byte' and 'unsigned
byte', then have 'typedef <...> char_t' as the basic type used for strings,
etc.

-s
 

Ian Collins

Phil said:
7.21.4 Comparison functions

[#1] The sign of a nonzero value returned by the comparison
functions memcmp, strcmp, and strncmp is determined by the
sign of the difference between the values of the first pair
of characters (both interpreted as unsigned char) that
differ in the objects being compared.


So strcmp thinks that, for the purposes of comparison of strings,
the characters in strings should be treated as unsigned chars.

So in some contexts, strings are more reasonably considered to be
a sequence of unsigned chars.

Why not all contexts?

With the benefit of 20-20 hindsight, I'm sure we would use unsigned or
maybe Unicode. But we are stuck with the baggage of 7-bit ASCII.

As others have pointed out, when dealing with text, the value is what
matters, the representation is irrelevant. For non-textual data where
the value is relevant, use unsigned char.
 

Keith Thompson

Seebs said:
I don't know about efficiency, but consider:

char x;
short x;
int x;
long x;
long long x;

Why should one of these five values default to unsigned, when the others
all default to signed?

Because one of them is used primarily to hold character codes, which
are normally thought of as unsigned values.

There is a conflict between the idea that char is a type used to hold
character codes, and that char is a narrow integer type. The way C
has (mostly) resolved this conflict is steeped in historical accident,
and I think it would have been done quite differently if the language
were being designed from scratch today.
I would sort of have preferred that 'signed' not be a keyword, and plain char
be always-signed. However, here we run into the essential clash between
'char' as thing to hold characters and 'char' as shortest basic integer
type.

Perhaps the correct solution would have been to use 'byte' and 'unsigned
byte', then have 'typedef <...> char_t' as the basic type used for strings,
etc.

Or make char a fundamental type that isn't part of the family of
integer types.

One example: In Ada, "Character" is an enumerated type, and character
literals like 'x' are permitted as enumerators.

A solution for a hypothetical C-like language might be something like
this:

The integer types are byte, short, int, long, and long long
(or choose better names if you like). There are signed and
unsigned versions of each of these; each name by itself refers
to the signed version. For example, "byte" and "signed byte"
are different names for the same type; "unsigned byte" is another
type with the same size but a different range. (Or maybe "byte"
is an exception, with "byte" being an alias for "unsigned byte",
since unsigned bytes are more useful. Either way, the choice
is made by the language, *not* by the implementation.)

Type char is distinct from any of these types, and can hold
a single character value. char acts like an unsigned type,
in the sense that converting a char value to a sufficiently
wide type always yields a nonnegative value.

Deciding whether char is actually an integer type, and whether
conversions between char and (other) integer types may be done
implicitly, is left as an exercise.

Of course it's way too late to change C in this way.
 

Seebs

What is the type that strcmp uses to compare such text?

int. :p

What it actually compares, we're told, is the values "interpreted as
unsigned char" (which does not imply a conversion), but imagine that
you were to write this:

if (*(unsigned char *)s != *(unsigned char *)t)

The type used for this comparison is, of course, int. Unless int and
unsigned char are the same size, in which case, it's unsigned int.

-s
 

bartc

Flash Gordon said:
Why should it? In any case, as others mentioned, a cast will fix this.
Although I have to wonder why the char is being assigned to a larger
unsigned integer type in the first place, it seems an odd thing to do to
me.

Try this:

int offset;
char c;
unsigned char uc;

c=uc=130;
offset=10;

printf("Defchar <%u>\n",c+offset);
printf("Defchar <%d>\n",c+offset);
printf("Unsigned <%u>\n",uc+offset);
printf("Unsigned <%d>\n",uc+offset);

You expect 140 to be printed. But using a default char type, you will get
apparent nonsense displayed when this happens to be signed. Using an
explicit unsigned char, you get the 140 you expect, with both %d and %u
formats.

This can be fixed by workarounds, but really people have other matters to
worry about than fixing problems caused by C's idiosyncrasies.
 

John Kelly

No, when I use C, I work around its limitations.

Which C do you mean here? Kelly C, or internationally
agreed-upon C?
The string is data and the '\0' is metadata. The standard say it's all
data, but that's what someone else said. I think the '\0' is metadata,
serving as a pseudo length specifier.

If "the standard say" [sic] isn't good enough for you, what
is there to discuss?

It's so easy to make people angry here. Without even trying. This must
be Trolls' Paradise.
 

jacob navia

bartc a écrit :
Try this:

int offset;
char c;
unsigned char uc;

c=uc=130;
offset=10;

printf("Defchar <%u>\n",c+offset);
printf("Defchar <%d>\n",c+offset);
printf("Unsigned <%u>\n",uc+offset);
printf("Unsigned <%d>\n",uc+offset);

You expect 140 to be printed. But using a default char type, you will
get apparent nonsense displayed when this happens to be signed. Using an
explicit unsigned char, you get the 140 you expect, with both %d and %u
formats.

This can be fixed by workarounds, but really people have other matters to
worry about than fixing problems caused by C's idiosyncrasies.

Exactly!

If you can avoid bugs, why not avoid them?
 

Flash Gordon

bartc said:
Try this:

Why? I can see no good reason to do what you are doing.
int offset;
char c;
unsigned char uc;

c=uc=130;
offset=10;

printf("Defchar <%u>\n",c+offset);
printf("Defchar <%d>\n",c+offset);
printf("Unsigned <%u>\n",uc+offset);
printf("Unsigned <%d>\n",uc+offset);

You expect 140 to be printed. But using a default char type, you will
get apparent nonsense displayed when this happens to be signed. Using an
explicit unsigned char, you get the 140 you expect, with both %d and %u
formats.

Well, with the amount of implementation-defined (or maybe undefined)
behaviour involved you could get nonsense, but I get the numbers I
expect on my implementation.
This can be fixed by workarounds, but really people have other matters to
worry about than fixing problems caused by C's idiosyncrasies.

I've yet to see a good argument *why* you care about the numeric value
when you are using it as a character. In your example above you are
clearly using it as a number, so that is not relevant to the discussion.
 

bartc

Flash Gordon said:
bartc wrote:
I've yet to see a good argument *why* you care about the numeric value
when you are using it as a character. In your example above you are
clearly using it as a number, so that is not relevant to the discussion.

I don't understand. You have the situation where this code:

char c=130;

if (c+10==140)
puts("It worked as expected.");
else
puts("It didn't work!");

does not do what you expect, and you're perfectly happy with this?

char values containing character representations which are negative get
unexpectedly sign-extended when used in mixed arithmetic. Usually this is
undesirable, and unexpected if you are unaware of the signedness of your
char type.

You can fix this in *your* code, by using unsigned char types, but then you
get type mismatches with other code. Or you can stick (unsigned char)
everywhere, which is really going to help unclutter your code and make it
readable...

As to using character codes as numbers, well, I've been doing that for two
or three decades, and with codes up to 255 too. It does happen from time
to time that you do arithmetic with numbers representing character codes...
 

Ian Collins

bartc said:
I don't understand. You have the situation where this code:

char c=130;

if (c+10==140)
puts("It worked as expected.");
else
puts("It didn't work!");

does not do what you expect, and you're perfectly happy with this?

char values containing character representations which are negative get
unexpectedly sign-extended when used in mixed arithmetic. Usually this
is undesirable, and unexpected if you are unaware of the signedness of
your char type.

The question still remains: why are you doing mixed arithmetic on
character values?
 

bartc

Ian Collins said:
bartc wrote:

The question still remains: why are you doing mixed arithmetic on
character values?

Why not?

I didn't even need to make my code so elaborate:

char c=130;

if (c==130)

will fail, and it's not immediately obvious that it *is* mixed arithmetic.

And on my machine:

char c=255;
if (c==EOF)

will be true, but not true for unsigned char. I thought char signedness
wasn't supposed to matter...
 

Alan Curry

I don't understand. You have the situation where this code:

char c=130;

Already sloppy. 130 isn't a character, it's a number. The rest is just
a demonstration of "garbage in, garbage out"
 

Ian Collins

bartc said:
Why not?

I didn't even need to make my code so elaborate:

char c=130;

But why would you write such code? char represents a character, not a
numeric value. I could just as easily write

short n = 0x8000;

if( n == 0x8000 )

and in either case, my compiler would give me a handy warning.
if (c==130)

will fail, and it's not immediately obvious that it *is* mixed arithmetic.

And on my machine:

char c=255;
if (c==EOF)

will be true, but not true for unsigned char. I thought char signedness
wasn't supposed to matter...

It doesn't, if you use char to store characters.
 

Ben Bacarisse

Which has the potential to misbehave on ones' complement machines if
*cp is -0 (you might get 0 rather than UCHAR_MAX), so it's better to
cast the pointer:

if (isdigit(*(unsigned char *)cp)) ...

Nasty. I'd really, really hope that char would be unsigned on such a
machine! On a system like that -- with signed char -- even

while (!*cp) ...

breaks, does it not?
 
