Problem with gcc


Seebs

I don't understand. You have the situation where this code:

char c=130;

if (c+10==140)
    puts("It worked as expected.");
else
    puts("It didn't work!");

does not do what you expect, and you're perfectly happy with this?

I think that's undefined behavior (you assigned a value to c that didn't
fit in the type, possibly).
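
(Strictly speaking, C99 makes this particular conversion implementation-defined rather than undefined: storing an out-of-range value into a signed type yields an implementation-defined result or raises an implementation-defined signal.) A minimal sketch of what typically happens where plain char is a signed 8-bit type:

#include <stdio.h>
#include <limits.h>

int main(void) {
    char c = 130;   /* if char is signed and 8 bits, 130 does not fit; the
                       conversion result is implementation-defined, and on
                       two's-complement systems it typically comes out as -126 */

    printf("CHAR_MIN=%d CHAR_MAX=%d\n", CHAR_MIN, CHAR_MAX);
    printf("c=%d  c+10=%d\n", c, c + 10);   /* c is promoted to int before the add */
    return 0;
}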

-s
 

Ben Bacarisse

John Kelly said:
You can remove the *exam test.

But then you're testing whether '\0' is a space or not. Perhaps it
improves performance, but is it good programming?
if (*++keep) {
    *keep = '\0';
}

And here you could replace the whole 'if' with 'keep[1] = 0;'.
Neither of them is wrong, of course, but every test makes the reader
wonder why it is there.

But then you replace '\0' with '\0'. Which is worse, one extra test, or
a redundant action?

I can't imagine you think I'd suggest something I thought was bad
programming or a change that made the code worse, so your questions
are, I presume, rhetorical -- intended to prompt readers to decide for
themselves.

That was the purpose of my post. I think

while (isspace(*exam)) { ... }

and

keep[1] = 0;

are both slightly better, whereas you don't. At least now, anyone who
has not yet decided such matters for themselves can see the options.
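
For anyone who has not followed the earlier posts, here is roughly the shape of the code under discussion. The names exam and keep come from the thread; the surrounding function is an assumption, since the original was not quoted in full:

#include <ctype.h>
#include <string.h>

/* A sketch of the kind of trimming being discussed; 'exam' and 'keep'
   are the thread's names, but the rest of the function is a guess. */
static char *trim(char *s)
{
    char *exam = s;
    char *keep;

    /* Leading whitespace: no "*exam &&" guard is needed, because
       isspace('\0') is well defined and false. */
    while (isspace((unsigned char)*exam))
        exam++;

    /* Trailing whitespace: walk back from the end, then terminate
       unconditionally, in the spirit of "keep[1] = 0;". */
    keep = exam + strlen(exam);
    while (keep > exam && isspace((unsigned char)keep[-1]))
        keep--;
    *keep = '\0';

    return exam;
}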
 

John Kelly

I can't imagine you think I'd suggest something I thought was bad
programming or a change that made the code worse, so your questions
are, I presume, rhetorical -- intended to prompt readers to decide for
themselves.

Treating '\0' as data in a NUL terminated string seems unnatural to me,
despite what the standard says. I know it's data in the sense of taking
up storage, but I think of it as metadata, a pseudo length specifier.

At least now, anyone who has not yet decided such matters for
themselves can see the options.

I'm not saying your C idiom is bad. For C, it's good. But how good can
C be, is the question.
 

Ian Collins

Richard said:
Because he can?

He can also go to work naked, but that isn't a particularly good idea
either.
As I understand it, signed char and unsigned
char are integer types. One expects to be able to do mixed
arithmetic on integer types. However char is an exotic integer
type with the peculiar property that its signedness is undefined
by the language.

Which is why it shouldn't be used for mixed arithmetic.
The problem is simple: Char is an ill-defined integer type;
despite its name it is not a character type.

It is used to hold the representation of a single character.
 

Seebs

He can also go to work naked, but that isn't a particularly good idea
either.

I think that rather depends on his line of work. And also on whether
he telecommutes.
It is used to hold the representation of a single character.

Except when that has to be done in unsigned char, such as when using
the ctype functions...
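
The usual workaround is to cast through unsigned char at the call site; a small sketch (count_spaces is just an illustrative name):

#include <ctype.h>

/* The <ctype.h> functions take an int whose value must be representable
   as an unsigned char (or be EOF); passing a negative plain char is
   undefined behaviour, hence the cast. */
int count_spaces(const char *s)
{
    int n = 0;
    for (; *s; s++)
        if (isspace((unsigned char)*s))
            n++;
    return n;
}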

-s
 

Nick

John Kelly said:
Treating '\0' as data in a NUL terminated string seems unnatural to me,
despite what the standard says. I know it's data in the sense of taking
up storage, but I think of it as metadata, a pseudo length specifier.



I'm not saying your C idiom is bad. For C, it's good. But how good can
C be, is the question.

Surely if you are programming in C, you should use C idiom - as it's
what anyone reading your code is going to expect. If you find C's idiom
too repulsive to use (and I don't think anyone would claim that there
aren't things in C that would never get in there now but that we're
stuck with), then don't use C.

But to try and force it into some sort of "like all other programs" mode
just means that you're writing unfriendly C.
 

bartc

Seebs said:
I think that's undefined behavior (you assigned a value to c that didn't
fit in the type, possibly).

Doesn't it?

char c=130,d=140;

if (c+10==d) seems to work as you'd expect.
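
It "works" because both initializers go through the same implementation-defined narrowing, so the two values stay 10 apart; a small sketch of the usual outcome where plain char is a signed 8-bit type:

#include <stdio.h>

int main(void) {
    char c = 130, d = 140;  /* typically become -126 and -116 here;
                               the narrowing is implementation-defined */
    printf("c=%d d=%d c+10=%d\n", c, d, c + 10);
    printf("c+10 == d   : %d\n", c + 10 == d);    /* typically 1 */
    printf("c+10 == 140 : %d\n", c + 10 == 140);  /* typically 0 */
    return 0;
}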
 

bartc

Alan Curry said:
Already sloppy. 130 isn't a character, it's a number. The rest is just
a demonstration of "garbage in, garbage out"

Ok, so how do I assign a character code to c that happens to be the code
130, and that happens to have a different encoding from the one C
understands?

It's quite common to want to use character codes from 0 to 255, and it's
understandable that someone may want to use single chars and char arrays to
store them in.

But that's apparently full of gotchas. But what's more annoying is the
experts smugly explaining, every time, the distinction between numbers,
characters, character codes, glyphs, and whatnot.

Instead of explaining why you can't do this or that, why couldn't the C99
people have fixed the problem instead?
 

Nick

bartc said:
Ok, so how do I assign a character code to c that happens to be the code
130, and that happens to have a different encoding from the one C
understands?

It's quite common to want to use character codes from 0 to 255, and it's
understandable that someone may want to use single chars and char arrays to
store them in.

But that's apparently full of gotchas. But what's more annoying is the
experts smugly explaining, every time, the distinction between numbers,
characters, character codes, glyphs, and whatnot.

Instead of explaining why you can't do this or that, why couldn't the C99
people have fixed the problem instead?

Probably the usual problem: it would have broken stuff. I think I'd
have been a bit more ruthless than they were in several areas, but I'm
not a major organisation with a huge installed codebase - if I was, I
think I'd agree with what they did.

char is really only any good for holding characters, and only those
characters in a particular subset (I think this is what they call the
"execution character set") - crucially /not/ all the characters that can
be displayed on the host machine.

If you do treat them as characters, and never try to look at the numeric
values, it works surprisingly well, even when they are outside the normal
range. For example, I find that I can read and write UTF8 into normal C
strings (but don't expect strlen to give you what you'd expect!).
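
A concrete illustration of the strlen caveat: it counts bytes, so a two-byte UTF-8 sequence adds two to the length, not one.

#include <stdio.h>
#include <string.h>

int main(void) {
    const char *s = "caf\xc3\xa9";         /* "café": the é is two UTF-8 bytes */
    printf("strlen = %zu\n", strlen(s));   /* prints 5 bytes, not 4 characters */
    return 0;
}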
 

Ben Bacarisse

bartc said:
Ok, so how do I assign a character code to c that happens to be the code
130, and that happens to have a different encoding from the one C
understands?

You'd write

char c = '\x82';
It's quite common to want to use character codes from 0 to 255, and it's
understandable that someone may want to use single chars and char arrays to
store them in.

But that's apparently full of gotchas. But what's more annoying is the
experts smugly explaining, every time, the distinction between numbers,
characters, character codes, glyphs, and whatnot.

I suspect that there is a communication problem here, to some extent.
You've encountered a problem in the past which was fiddly to solve,
but in describing it you've simplified to the extent that the problem
disappears.

char data usually just comes from some input and goes to some output;
it is rare to calculate with the values. If you need to calculate
with the codes, you are doing arithmetic, so you need to be sure of the
range of the integer type you are using, and your code should probably
use explicitly signed or unsigned char types rather than plain char.
(A few calculations are covered by the various guarantees in the
standard (c - '0' for example) but these are the exception.)
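
The digit case is one of the few guarantees worth spelling out: '0' through '9' must have consecutive codes, so the familiar conversion below is portable even though most char arithmetic is not (digit_value is just an illustrative name):

#include <ctype.h>

/* Portable because the standard requires '0'..'9' to have consecutive,
   increasing codes; there is no comparable guarantee for letters. */
int digit_value(char ch)
{
    if (isdigit((unsigned char)ch))
        return ch - '0';
    return -1;   /* not a digit */
}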
Instead of explaining why you can't do this or that, why couldn't the C99
people have fixed the problem instead?

I think you are overstating the degree to which there is a problem.
How complex was it, eventually, to get round the problem that you
encountered? I can't say why "the problem" was not fixed because I am
not sure exactly what it is. Changing anything as basic as the way
C's char is defined, or the interface to the C library, would require
clear evidence of a major problem.
 

bartc

Ben Bacarisse said:
You'd write

char c = '\x82';

OK. But then you have this little anomaly:

int C = '\x82';
int D = 0x82;

You might expect C==D, but that isn't the case. Just something else to
explain that probably wouldn't need explaining if chars were not signed.
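
To make the anomaly concrete: where plain char is signed and 8 bits wide, the character constant takes its value from a char object holding that byte; a short sketch of the typical result:

#include <stdio.h>

int main(void) {
    int C = '\x82';  /* the value of a char object holding the byte 0x82,
                        converted to int: typically -126 where char is signed */
    int D = 0x82;    /* an ordinary int constant: 130                          */
    printf("C=%d D=%d C==D: %d\n", C, D, C == D);
    return 0;
}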
I suspect that there is a communication problem here, to some extent.
You've encountered a problem in the past which was fiddly to solve,
but in describing it you've simplified to the extent that the problem
disappears.

char data usually just comes from some input and goes to some output;
it is rare to calculate with the values.

If I was writing Cobol, that might be the case. But I do just as much
messing about with char codes as with integers.
I think you are overstating the degree to which there is a problem.

Up to now it was just one of those things. Every so often, something that
didn't work would suddenly start working as soon as a few 'unsigned char's
were sprinkled about. I don't use C seriously enough to worry about it. (And
if I did use it seriously there are plenty of other things to take issue
with.)

But when someone as experienced as jacob navia says there is a problem, then
you listen.
How complex was it, eventually, to get round the problem that you
encountered? I can't say why "the problem" was not fixed because I am
not sure exactly what it is. Changing anything as basic as the way
C's char is defined, or the interface to the C library, would require
clear evidence of a major problem.

Insisting on char being unsigned by default would, I think, be more useful
than otherwise. Why do all the C compilers on my Windows machine use signed
char? What is the advantage?

Someone who wants small negative integers can explicitly write signed char.
For all other purposes, there is no reason for char to be signed.
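
For what it's worth, gcc (and clang) accept -funsigned-char and -fsigned-char to override the platform default; inside the code, spelling the intent out avoids the question altogether. A small sketch:

#include <limits.h>
#include <stdio.h>

int main(void) {
    /* Whichever default the compiler picks, spelling the intent out in
       the code sidesteps it; CHAR_MIN reveals what plain char does here. */
    unsigned char u = 130;   /* always 130                      */
    signed char   s = -1;    /* the small-negative-integer case */

    printf("plain char is %s here (CHAR_MIN=%d)\n",
           CHAR_MIN < 0 ? "signed" : "unsigned", CHAR_MIN);
    printf("u=%d s=%d\n", u, s);
    return 0;
}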
 

Ben Bacarisse

bartc said:
OK. But then you have this little anomaly:

int C = '\x82';
int D = 0x82;

You might expect C==D, but that isn't the case.

Presumably you meant char C?
Just something else to
explain that probably wouldn't need explaining if chars were not
signed.

I don't think it is possible to "rescue" C from its ancient history.
You get into trouble if you assume that '\x82' necessarily equals
0x82, but I can't see why anyone has to make that assumption, though I
agree that many probably do.

Up to now it was just one of those things. Every so often, something
that didn't work would suddenly start working as soon as a few
unsigned char's were sprinkled about. I don't use C seriously enough
to worry about it. (And if I did use it seriously there are plenty of
other things to take issue with.)

But when someone as experienced as jacob navia says there is a
problem, then you listen.

I prefer to take note when the problem is explained. Currently, I
just can't see it, which I agree may be my fault, but I can't see a
problem just because someone is experienced. It needs to be
explained.
Insisting on char being unsigned by default would, I think, be more
useful than otherwise. Why do all the C compilers on my Windows
machine use signed char? What is the advantage?

You'd have to ask a compiler author. Jacob made char signed on his
Windows implementation so you could ask him.
Someone who wants small negative integers can explicitly write signed
char. For all other purposes, there is no reason for char to be
signed.

Changing C needs a powerful motivation, and I have not seen one
presented that merits a change to one of C's basic types.
 

Eric Sosman

John said:
No, when I use C, I work around its limitations.
Which C do you mean here? Kelly C, or internationally
agreed-upon C?
The string is data and the '\0' is metadata. The standard say it's all
data, but that's what someone else said. I think the '\0' is metadata,
serving as a pseudo length specifier.
If "the standard say" [sic] isn't good enough for you, what
is there to discuss?

It's so easy to make people angry here. Without even trying.

Not angry, no, but impatient. There's a serious point here,
one that is perhaps not appreciated by those who didn't use C in
the Bad Old Days. Before the ANSI Standard, "C" was whatever an
implementor felt like implementing. All one had to do was read
the White Book and start ringing one's favorite changes on it.
The result was that while it was often possible to write portable
"C," it was a difficult and clumsy business. The line about C
combining the power of assembly language with the portability of
assembly language dates from this era, when portability could be
achieved only by larding the source with #ifdef's to an extent
that would startle today's pampered practitioners.

When the ANSI Standard came along, imaginative implementors
were reined in somewhat and "C" became something that could be
agreed upon. You no longer needed an #ifdef to decide whether
to include <string.h> or <strings.h>, or to find out whether
sprintf() returned a count or a pointer, or to figure out whether
integer promotions preserved value or preserved sign, or ... It
became enormously easier (although still not trivial) to write
portable C code, portable to an extent that was economically
infeasible in the Bad Old Days.

And all this easing of difficulty and reduction of expense
came from -- what? From a single document that everyone could
point to and say "That is the official definition of C." Well,
not quite: The benefits flowed not from the document itself --
it's just words -- but from the agreement to accept the document
as the definition, the agreement that it was a bug, not a feature,
if one's C implementation failed to adhere to the Standard.

So when you decide to reject the Standard's definition of C
and substitute your own, it seems to me you are rejecting all the
benefits an agreed-upon definition generates. You appear to want
to return to the lawless Wild West, a place and time where life
was difficult and brief. (Yet even in the Wild West standards
had benefits: Just try finding ammunition for a .42 caliber
seven-shooter ...)
This must
be Trolls' Paradise.

Herein, alas, you are right.
 

John Kelly

"C," it was a difficult and clumsy business. The line about C
combining the power of assembly language with the portability of
assembly language dates from this era, when portability could be
achieved only by larding the source with #ifdef's to an extent
that would startle today's pampered practitioners.

C portability is still hard due to environmental differences. I can
port dh from Linux to BSD without too much work, but beyond that, say to
Solaris, the going gets tough.

So when you decide to reject the Standard's definition of C
and substitute your own, it seems to me you are rejecting all the
benefits an agreed-upon definition generates. You appear to want
to return to the lawless Wild West, a place and time where life
was difficult and brief.

My point about C string data vs. '\0' metadata was not a broad rejection
of standards. To me it was a trivial thing, a possible basis for an
interesting discussion of C programming vs. programming in general. My
apologies for arousing such ire.
 

Seebs

Instead of explaining why you can't do this or that, why couldn't the C99
people have fixed the problem instead?

Excellent question!

In this case, I think existing practice really was a killer -- quite simply,
way too many systems existed that had made each decision about what 'char'
was, and there were compelling advantages to having plain char represent
the system's native preferences.

Basically, I can't think of a fix that wouldn't break millions of lines of
code. Maybe the right solution would have been to introduce a new type,
but that seemed a bit drastic.

It certainly is a foundational flaw that 'char' is supposed to be both the
smallest addressable unit of at least 8 bits, and also the execution
character set. MHO.

-s
 

Seebs

OK. But then you have this little anomaly:

int C = '\x82';
int D = 0x82;

You might expect C==D, but that isn't the case. Just something else to
explain that probably wouldn't need explaining if chars were not signed.

#include <stdio.h>

int C = '\x100';
int D = 0x100;

int main(void) {
    printf("%d %d\n", C, D);
    return 0;
}

And there's something we wouldn't have to explain if we hadn't guaranteed
that characters were at least 9 bits.

Basically, I expect '' to give me the local character set, which may or may not
be the same thing as a particular numeric value.
But when someone as experienced as jacob navia says there is a problem, then
you listen.

Listen, yes. Agree, not always.
Insisting on char being unsigned by default I think would be more useful
than otherwise. Why do all the C compilers on my Windows machine use signed
char? What is the advantage?

I think one of the reasons people used that in the past was that it was
consistent with the other types; no qualifier means signed. There may be
systems on which the ability to represent "-1" in the same object you'd use
to hold members of your tiny character set is also useful.

-s
 

Seebs

C portability is still hard due to environmental differences. I can
port dh from Linux to BSD without too much work, but beyond that, say to
Solaris, the going gets tough.

Two issues here:

1. You're writing something intrinsically system-specific.
2. If you were more familiar with POSIX, you would probably not have this
problem.

Understanding the difference between "what this compiler accepts" and "what
is guaranteed for the set of environments I'm looking at" is important and
useful.
My point about C string data vs. '\0' metadata was not a broad rejection
of standards. To me it was a trivial thing, a possible basis for an
interesting discussion of C programming vs. programming in general. My
apologies for arousing such ire.

Again, you're showing classic NPD traits here.

No one's angry at you. They disagree with you and they're telling you why.
Disagreeing with you, or claiming you are incorrect, is not attacking. If
your brain perceives a disagreement as an attack, you are showing a diagnostic
criterion for NPD and should probably get that checked out, because that is
one of the more destructive cognitive problems you could possibly have.

-s
 

John Kelly

Again, you're showing classic NPD traits here.
No one's angry at you. They disagree with you and they're telling you why.
Disagreeing with you, or claiming you are incorrect, is not attacking. If
your brain perceives a disagreement as an attack, you are showing a diagnostic
criterion for NPD and should probably get that checked out, because that is
one of the more destructive cognitive problems you could possibly have.

Are you a licensed psychiatrist too?

You should be more careful about slandering people. Some might take it
seriously.
 

Richard Bos

bartc said:

Because, as you know damned well, it's the Wrong Thing to do. You use
plain char for characters. If you want a small integer, you use
explicitly signed or unsigned char.

When I read this whole whine about how it's _soooo_ broken that char
does not behave the same on all systems, I am reminded of 1950s writing
manuals which assumed that all pupils were right-handed. Teachers have
grown out of that prejudice; it's time that C programmers do the same.

Richard
 
