condition true or false? -> (-1 < sizeof("test"))

N

Nomen Nescio

BartC said:
That's true; the value of -3000000000u on my 32-bit C is well-defined;
completely wrong, but well-defined according to the Standard.

Actually only lcc-win32, out of my handful of C compilers, bothers to tell
me that that expression has an overflow.
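A minimal sketch of what the Standard mandates here, assuming a 32-bit
unsigned int: unary minus on an unsigned operand is defined to wrap
modulo 2^32, so the "overflow" yields a well-defined value and no
diagnostic is required.

#include <stdio.h>

int main(void) {
 printf("%u\n", -3000000000u);  /* 2^32 - 3000000000 = 1294967296 */
 return 0;
}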


The 'whinings' were to do with being dependent on compiler options for
figuring out why programs like this:

I asked about that before elsewhere: why can't/don't C compilers do a better
job of pointing out obvious problems, given that various lints have been
written to do just that? It seems so obvious to me that such logic should be
included in a compiler worthy of the name. I was told to go **** myself. I
didn't, but I understood I was treading on another UNIX golden calf, so...
unsigned int a=4;
signed int b=-2;

printf("%u<%d = %d\n", a, b, a<b);
printf("%d<%d = %d\n", 4, b, 4<b);
printf("%u<%d = %d\n", a, -2, a<-2);
printf("%d<%d = %d\n", 4, -2, 4<-2);

(notice the integer literals, or constants, or whatever you like to call
them today, have been correctly displayed as signed values) produce output
like this:

4<-2 = 1
4<-2 = 0
4<-2 = 1
4<-2 = 0

You don't need to know any C, or any language, for it to raise eyebrows. And
as it happened, I had trouble getting any of my four compilers to give any
warning, until someone told me to try -Wextra on gcc.

I don't know any C but it did raise my eyebrows. Looking into this a little:

#include <stdio.h>

int main() {
 unsigned int a = 4;
 signed int b = -2;

 printf("%u<%d = %d\n", a, b, a<b);
 printf("%d<%d = %d\n", 4, b, 4<b);
 printf("%u<%d = %d\n", a, -2, a<-2);
 printf("%d<%d = %d\n", 4, -2, 4<-2);
}

Works like yours:

./bartc
4<-2 = 1
4<-2 = 0
4<-2 = 1
4<-2 = 0

Agreed, not very helpful. Now let's try:

Solaris lint, comes with the system:

lint bartc.c
(9) warning: suspicious comparison of unsigned with negative constant: op "<"

function returns value which is always ignored
printf

Got one and missed one.

Even better, this:

Splint 3.1.2 --- 23 Nov 2011

bartc.c: (in function main)
bartc.c:7:32: Operands of < have incompatible types (unsigned int, int): a < b
To ignore signs in type comparisons use +ignoresigns
bartc.c:7:32: Format argument 3 to printf (%d) expects int gets boolean: a < b
To make bool and int types equivalent, use +boolint.
bartc.c:7:20: Corresponding format code
bartc.c:8:32: Format argument 3 to printf (%d) expects int gets boolean: 4 < b
bartc.c:8:20: Corresponding format code
bartc.c:9:33: Operands of < have incompatible types (unsigned int, int): a < -2
bartc.c:9:33: Format argument 3 to printf (%d) expects int gets boolean: a < -2
bartc.c:9:20: Corresponding format code
bartc.c:10:33: Format argument 3 to printf (%d) expects int gets boolean: 4 < -2
bartc.c:10:20: Corresponding format code
bartc.c:11:2: Path with no return in function declared to return int
There is a path through a function declared to return a value on which there
is no return statement. This means the execution may fall through without
returning a meaningful result to the caller. (Use -noret to inhibit warning)

Finished checking --- 7 code warnings

Conclusions: C (again) fails the least-surprise test, which is least
surprising, since it is a language that just happened, in an environment
where there was no premium on doing things right but a premium on doing them
cheap, without giving any help to the programmer. Resources were tight, and
small-and-ok beat big-and-good. What's the excuse now, in the 21st century?
Two thumbs up to splint, btw. Damn fine piece of code.

Any serious C coder probably should use some form of lint or, even better,
splint. How much C does someone need to know to complain about -1 being
silently converted to something like 4294967295?
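A one-line illustration, assuming a 32-bit int: the conversion is well
defined, which is exactly why no diagnostic is required.

#include <stdio.h>

int main(void) {
 unsigned int u = -1;  /* defined: -1 + 2^32 = 4294967295, i.e. UINT_MAX */
 printf("%u\n", u);
 return 0;
}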

I saw the problem and it is somewhat obvious (I write assembly code for
work) but that doesn't mean everybody gets it right all the time in a big
piece of code. The compiler should be more helpful. And lint should be built
into every C compiler.
A lot of my 'whinings' are backed up by people who know the language
inside-out. And although nothing can be done because the Standard is always
right, and the language is apparently set in stone, at least discussion
about various pitfalls can increase awareness.

Yes, C sucks, so why use it? I saw pretty quickly that discussing any
shortcomings of UNIX or Linux or C just creates a flamefest, no matter how
shitty or broken all that stuff is. When you start messing with people's
religion you're going to get your ass kicked. Although it's hard to say
which is preferable, a trip to the dentist or coding on x86, I guess x86
assembly is preferable to C. At least there aren't any surprises. Lots of
disappointment and gasps of horror, but no real surprises.
 
B

BartC

It should be the size of the array that the string gives rise to, so it
should vary with the string's length:

OK, that was just a silly mistake. (I changed the string in the caption,
not the actual string being measured. But it was quite late at night
here...)
 
B

BartC

unsigned int a = 4;
signed int b = -2;

printf("%u<%d = %d\n", a, b, a<b);
bartc.c:7:32: Operands of < have incompatible types (unsigned int, int): a < b
To ignore signs in type comparisons use +ignoresigns

Interesting error message, suggesting that use of mixed types is an error
here. As far as I know, comparing mixed types is perfectly legal. My
complaint is that, because it works by converting both to unsigned values,
there is a good chance the result will be completely wrong (ie. if one value
happens to be negative).

Actually, about 50% of arbitrary values within the range of INT_MAX, as this
program measures (using 0..RAND_MAX/2, and +/- RAND_MAX/2). Surely this is
too much for a language/implementation to just ignore?

The language sets out what should happen, and that's exactly what it does,
but I'm not sure how useful such compares are. (The explanation, I've been
told, is all results are perfectly correct according to the sound
mathematical principles on which C operates...)

#include <stdio.h>
#include <stdlib.h>

int main(void){
#define N 1000000
 int i,sresult,uresult,total=0,matches=0;
 unsigned int a;
 signed int b;

 for (i=1; i<=N; ++i) {
  a=rand()/2;
  b=rand()-RAND_MAX/2;
  sresult=(signed int)a<b;
  uresult=a<b;

//  printf("Signed/Signed:   %6d < %6d = %d\n", (signed int)a,b,sresult);
//  printf("Unsigned/signed: %6u < %6d = %d (b=>%u)\n\n", a, b, uresult, (unsigned int)b);

  ++total;
  matches+=sresult==uresult;
 }

 printf("Total mixed compares:   %d\n",total);
 printf("Correct mixed compares: %d (%.2f%%)\n",matches,100.0*(double)matches/total);
}

(Interestingly, for about 10 minutes I was convinced the program was giving
75% correct compares. Then I noticed I was using "=" instead of "=="...)
 
B

Ben Bacarisse

My complaint is that, because it works by converting both to unsigned values,
there is a good chance the result will be completely wrong (ie. if one value
happens to be negative).

Actually, about 50% of arbitrary values within the range of INT_MAX, as this
program measures (using 0..RAND_MAX/2, and +/- RAND_MAX/2). Surely this is
too much for a language/implementation to just ignore?

You haven't backed up your complaint that the results are "completely
wrong" because you haven't addressed my point about C's design goals.

The integer promotions (which are done first, before the type conversion
that so bothers you) in effect tell the programmer that the compiler
reserves the right to put narrow data into "natural width" registers and
to operate on these instead. The type conversion rules for arithmetic
operators tell the programmer that unsigned operations will be used for
mixed-type operations without any further widening of the data. For
most arithmetic operations, on 2's complement machines, this makes no
difference, but it does for compares.
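A small illustration of that point, assuming a 32-bit two's complement
int: the same add instruction serves both interpretations, but signed
and unsigned compares are genuinely different operations.

#include <stdio.h>

int main(void) {
 unsigned int a = 4;
 int b = -2;
 printf("%d\n", (int)(a + b));  /* 2: the add is sign-agnostic */
 printf("%d\n", a < b);         /* 1: b converts to 4294967294 first */
 return 0;
}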

You may view the choice of converting to unsigned as arbitrary but,
either way, the result of the compare will either be "wrong" for half of
the signed values or for half of the unsigned values (if signed compare
were to be used instead). I can only conclude that you'd like... what?
That compares be done by moving the data into the widest registers
available? That makes some type (for which there is no wider register)
a special case. Maybe you'd prefer to leave the data alone and have a
more complex series of instructions be issued so that the answer is
"right" no matter what signs the operands happen to have. In other
words, what is the compiled code for a<b that you'd like to see?

I don't mind if you say that C's goal of being cheap to compile on a
wide range of machines is wrong (I'd disagree, but that's another
matter), but what I don't think you can say is that C's type conversion
rules are wrong without saying what price you want to pay for "better"
ones.
The language sets out what should happen, and that's exactly what it does,
but I'm not sure how useful such compares are. (The explanation, I've been
told, is all results are perfectly correct according to the sound
mathematical principles on which C operates...)

If you are referring to me, I did not say that. In particular, I did
not say anything about the mathematical principles that C uses for
compares.
#include <stdio.h>
#include <stdlib.h>

int main(void){
#define N 1000000
int i,sresult,uresult,total=0,matches=0;
unsigned int a;
signed int b;

for (i=1; i<=N; ++i) {
a=rand()/2;
b=rand()-RAND_MAX/2;
sresult=(signed int)a<b;
uresult=a<b;

// printf("Signed/Signed: %6d < %6d = %d\n", (signed int)a,b,sresult);
// printf("Unsigned/signed: %6u < %6d = %d (b=>%u)\n\n", a, b, uresult, (unsigned int)b);

++total;
matches+=sresult==uresult;
}

printf("Total mixed compares: %d\n",total);
printf("Correct mixed compares: %d (%.2f%%)\n",matches,100.0*(double)matches/total);
}

Programming is fun, but why write a program to calculate something so
predictable? By the way, RAND_MAX need not be INT_MAX so you could have
got even more confusing data. (And don't you indent code?)
 
J

James Kuyper

On 05/19/2012 08:02 PM, BartC wrote:
....
On DMC (Digital Mars' C) it gives this:

(size_t)-1 4294967295
sizeof "test" 5
-1+sizeof "test" 4
INT_MAX 2147483647
SIZE_MAX 4294967295
-1<sizeof "test" 1

Exactly the same, except for the final compare.

I'm fairly certain that DMC is therefore non-conforming. Since you
approve of this result, I doubt that you'd be inclined to file a bug
report, but someone should.
 
B

BartC

For most arithmetic operations, on 2's complement machines, this makes no
difference, but it does for compares.

(See below)
You may view the choice of converting to unsigned as arbitrary but,
either way, the result of the compare will either be "wrong" for half of
the signed values or for half of the unsigned values (if signed compare
were to be used instead).

But the half of the signed values includes the very useful range from -1
downwards!

With signed compare, *any* value of int will work, but it will only go wrong
when the unsigned operand is above INT_MAX or so, which I would say is not
typical.

And you expect to take special care when using large numbers; you might not
expect to when comparing 5 with -1, for example.

Using unsigned compares, then *any* negative value of the signed operand
might give the wrong result (and if someone chooses signed types, it might
be because they expect some values below zero!).
I can only conclude that you'd like... what?
That compares be done by moving the data into the widest registers
available? That makes some type (for which there is no wider register)
a special case.

I'm just saying that using a signed compare, even on the same width, might
have been a more useful choice; all the combinations that cause problems
will be shifted into the top end of the unsigned operand.
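To illustrate the proposed rule (this is not what C does), one can force
the signed compare with a cast, assuming a 32-bit two's complement int:

#include <stdio.h>

int main(void) {
 unsigned int small = 5, huge = 4294967290u;
 printf("%d\n", (int)small > -1);  /* 1: small values behave */
 printf("%d\n", (int)huge > -1);   /* 0: the cast is implementation-defined,
                                      commonly -6, so huge values break */
 return 0;
}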
I don't mind if you say that C's goal of being cheap to compile on a
wide range of machines is wrong (I'd disagree, but that's another
matter), but what I don't think you can say is that C's type conversion
rules are wrong without saying what price you want to pay for "better"
ones.

Using signed instead of unsigned for mixed arithmetic (promoting one side to
signed) I don't think costs any more (unless sign extension was costly in
the 1970s). Requiring a stern warning (or insisting on explicit casts)
wouldn't have any runtime costs.

You're right that for some arithmetic, sign doesn't come into it; the
machine operation is the same. However, if you try and print the results of
mixed arithmetic as unsigned values, then you will get more surprises than
doing everything as signed (especially if results want to be negative, or
when doing divide or mod with one side negative).
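For instance (assuming a 32-bit int, and C99's truncating division for
the all-signed case):

#include <stdio.h>

int main(void) {
 printf("%d\n", -7 / 2);   /* -3: both operands signed */
 printf("%u\n", -7 / 2u);  /* 2147483644: -7 first becomes 4294967289 */
 return 0;
}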
If you are referring to me, I did not say that. In particular, I did
not say anything about the mathematical principles that C uses for
compares.

What was mentioned was "closed operations", "finite groups" and "additive
inverses". Also the fact that C uses "modulo arithmetic" for unsigned types,
which cannot overflow. All of which somehow suggests that unsigned types are a
completely different animal from the signed versions, with their own set of
rules.
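The modulo rule in one line, assuming nothing beyond <limits.h>:

#include <stdio.h>
#include <limits.h>

int main(void) {
 unsigned int x = UINT_MAX;
 printf("%u\n", x + 1u);  /* 0: (UINT_MAX + 1) mod 2^N, never "overflow" */
 return 0;
}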
....

Programming is fun, but why write a program to calculate something so
predictable?

I find probabilities unintuitive; I like to use actual measurements to back
up my surmises.
(And don't you indent code?)

(I use one-space indentation. But that seems to have disappeared in your
quoted version.)
 
E

Eric Sosman

[Follow-ups set to comp.std.c in view of thread drift.]

On 5/18/12 8:24 AM, James Kuyper wrote:
The language has not been set in stone; C2011 made a lot of changes.
However, this is a very fundamental feature of the language. The
standard already allows a warning for such code; if your compiler
doesn't provide one, complain to the compiler vendor. The C standard
could mandate a diagnostic only at the cost of rendering the behavior
undefined if the code is compiled and executed despite generation of
that diagnostic. Such a change would break a LOT of code, and would
therefore be unacceptable.

The language could define another "level" of messages besides just a
single type of "diagnostic". There are several things now in the
language for which, if the standard had wording allowing it to require
the emitting of a "warning" (vs an "error"), that could help programmers.
In many cases "good" compilers already implement them. One key is that
the implementation would need to define how to distinguish a "warning"
diagnostic from an "error" diagnostic, and allow for possible other
levels of messages (like "informational") which the standard doesn't
define/mandate.

Programs that created an "error" would have undefined behavior if
executed (if that was even possible), while programs which only
generated "warnings" should be able to be executed.

Doesn't this suggestion lead to four types of diagnostics?

1) "Errors" whose issuance is required by the Standard, and
which are required to produce translation failure,

2) "Errors" whose issuance is required by the Standard, and
which are allowed (but not required) to produce translation
failure,

3) "Warnings" whose issuance is required by the Standard, and
which are not allowed to produce translation failure, and

4) "Informational messages" whose issuance is entirely optional
(the Standard might not even mention them), and which are not
allowed to cause translation failure.

We already have [1] (one instance), [2] (many instances), and [4]
(an open-ended set); the new feature of your proposal is [3]. Are
you convinced [3] is the proper province of a language Standard?
Even if "yes," I think it would be hard to get universal agreement
on what particular diagnostics should be promoted from [4] to [3].
Note, for example, that some compilers can be told to suppress
specific diagnostics; this shows that the line between [3] and [4]
is indistinct and situational, not easily drawn by a Committee many
miles and years removed from a particular slice of code.

The business of a compiler is to compile if it can and to tell
you potentially useful things it discovers in the process, but
setting policy about the use of its outputs seems to me outside its
proper sphere. There's a natural tendency to dump extra work onto
the compiler, simply because it's always "there" and in plain sight;
people might fail to run lint but they can't avoid the compiler.
But if you've got a people problem you should talk with the problem
people, not with ISO/IEC!

The reason to want a warning class of diagnostics, to my mind, is that
it would be most useful for moving more features that clutter the
language to deprecated status. For example, a rule that a warning shall
be generated for any program that uses trigraphs, unless the first use of
them is the sequence ??= or an option has been provided to the
translator signifying that trigraphs are expected.

A "shall" that defers to a compiler option seems a pretty weak
requirement.
As for your grouping of messages, I would NOT advise the standard
distinguishing between 1 and 2 the way you have; if anything, the
difference should be based on whether the implementation HAS generated a
translation result (not just on whether it was allowed to), with the
disclaimer that any use of such a result is undefined behavior.

The distinction isn't mine; it's C-as-it-stands, today. The
lone [1] is the diagnostic generated by #error, which (as of C99)
requires the compiler to reject the code. The [2] category covers
all other diagnostics required by the Standard (in C90, even #error
is a [2]), because the compiler is allowed to overlook the problem
and press onward. If it does so the behavior is undefined, with the
usual proviso that an implementation is free to define behaviors for
situations the Standard leaves undefined.
I would agree that requiring a warning diagnostic for a signed/unsigned
comparison operation is probably more than should be required by the
standard, and many of the warnings that come out of current compilers
are beyond what should be expected of a "basic but conforming" C
compiler.

The phrase "quality of implementation" was used a lot when the
original ANSI Standard was new and people were getting used to it, but
it doesn't crop up as much nowadays. It seems to me a Standard does
well to steer clear of requiring a particular level of QoI -- or even
of suggesting one (look at how many people treated the Standard's
rand() example as prescriptive rather than illustrative). Besides,
I think the debate would become very politicized very quickly: If a
Committee tried to require a behavior that Compiler X exhibits but
Compiler Y does not, the decision might well be driven not by the
merits of the behaviors, but by the clout of the X and Y vendors.
Also, notions of "good practice" evolve more rapidly than Standards
can; we could easily wind up codifying what's already deprecated!

The users will "vote with their feet" in any case; the Standard
should stay out of the way and let the implementors do their own best
to attract foot traffic.
 
R

Richard Damon

Using signed instead of unsigned for mixed arithmetic (promoting one side to
signed) I don't think costs any more (unless sign extension was costly in
the 1970s). Requiring a stern warning (or insisting on explicit casts)
wouldn't have any runtime costs.

You're right that for some arithmetic, sign doesn't come into it; the
machine operation is the same. However, if you try and print the results of
mixed arithmetic as unsigned values, then you will get more surprises than
doing everything as signed (especially if results want to be
negative, or when doing divide or mod with one side negative).

My guess is that the reasoning comes down to minimizing undefined behavior.
Note that overflow and such is well defined for unsigned values, but not
for signed values. There is also the fact that the conversion of a
signed to an unsigned value is defined by the standard. For an unsigned
value to signed, it is only defined IF the value is representable.
Converting an unsigned value greater than INT_MAX to an int is
explicitly undefined behavior.
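The asymmetry in a short sketch, assuming a 32-bit int:

#include <stdio.h>

int main(void) {
 unsigned int u = -2;      /* always defined: wraps to 4294967294 */
 int s = 4294967290u;      /* value exceeds INT_MAX: no portable result */
 printf("%u %d\n", u, s);  /* the second value is commonly -6 on two's
                              complement machines; don't rely on it */
 return 0;
}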

Also, it is possibly better if the "surprises" happen in more common
cases than uncommon ones. The common cases are more apt to be tested,
and the issue found. The uncommon case might get missed in unit testing
and not show up until the program is fielded, causing strange
behavior at the customer's site.

If the unsigned number should never get big enough in practice to reach
the problem case, then the question becomes why the value was made
unsigned in the first place. One basic principle that I use is that you
do NOT make a variable unsigned just because it shouldn't have negative
values, but only if you need some of the properties of unsigned arithmetic.
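A classic example of that principle being violated, sketched here as a
hypothetical print_reverse helper: making the index unsigned "because it
can't be negative" turns the loop condition into a tautology.

#include <stdio.h>

void print_reverse(const int *v, unsigned int n) {
 unsigned int i;
 for (i = n - 1; i >= 0; --i)  /* bug: i >= 0 is always true for unsigned i;
                                  i wraps from 0 to UINT_MAX instead of
                                  stopping */
  printf("%d\n", v[i]);
}

(One conventional fix is for (i = n; i-- > 0; ) -- or just use a signed
index.)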
 
B

BartC

Richard Damon said:
On 5/20/12 8:59 AM, BartC wrote:
You're right that for some arithmetic, sign doesn't come into it; the
machine operation is the same. However, if you try and print the results of
mixed arithmetic as unsigned values, then you will get more surprises
tha[n] doing everything as signed (especially if results want to be
negative, or when doing divide or mod with one side negative).
My guess is that the reasoning comes down to minimizing undefined behavior.
Note that overflow and such is well defined for unsigned values, but not
for signed values.
There is also the fact that the conversion of a
signed to an unsigned value is defined by the standard. For an unsigned
value to signed, it is only defined IF the value is representable.
Converting an unsigned value greater than INT_MAX to an int is
explicitly undefined behavior.

That sounds reasonable. They seem to have imparted almost 'magical'
properties to unsigned types, so that you can do anything you like to them
without ever getting a wrong result, because any result is always correct
even if it is arithmetically wrong.
If the unsigned number should never get big enough in practice to reach
the problem case, then the question becomes why the value was made
unsigned in the first place. One basic principle that I use is that you
do NOT make a variable unsigned just because it shouldn't have negative
values, but only if you need some of the properties of unsigned
arithmetic.

One property that is useful is to store bigger numbers without having to use
a wider type. But you don't necessarily want everything else that comes with
it.

Also, if you're using 'sizeof', then you don't have the choice to make it
signed, even if all the objects in your program are well within the range of
signed int. And with 'char' you might not know if it is signed or unsigned.
 
B

Ben Bacarisse

BartC said:
(See below)


But the half of the signed values includes the very useful range from -1
downwards!

I'm going to stop here. I don't think a discussion of useful halves is
likely to be very enlightening -- especially as the "alternate C" is
likely to be different in other respects that I don't have time to work
through.
With signed compare, *any* value of int will work, but it will only go wrong
when the unsigned operand is above INT_MAX or so, which I would say is not
typical.

And you expect to take special care when using large numbers; you might not
expect to when comparing 5 with -1, for example.

Using unsigned compares, then *any* negative value of the signed operand
might give the wrong result (and if someone chooses signed types, it might
be because they expect some values below zero!).


I'm just saying that using a signed compare, even on the same width, might
have been a more useful choice; all the combinations that cause problems
will be shifted into the top end of the unsigned operand.

They would be undefined rather than just problematic, but presumably
you'd change that as well.
Using signed instead of unsigned for mixed arithmetic (promoting one side to
signed) I don't think costs any more (unless sign extension was costly in
the 1970s). Requiring a stern warning (or insisting on explicit casts)
wouldn't have any runtime costs.

You're right that for some arithmetic, sign doesn't come into it; the
machine operation is the same. However, if you try and print the results of
mixed arithmetic as unsigned values, then you will get more surprises than
doing everything as signed (especially if results want to be
negative, or when doing divide or mod with one side negative).


What was mentioned was "closed operations", "finite groups" and "additive
inverses". Also the fact that C uses "modulo arithmetic" for unsigned types,
which cannot overflow. All of which somehow suggests that unsigned types are a
completely different animal from the signed versions, with their own set of
rules.

That's indeed what I said. Of the three main kinds of arithmetic that C
can do (floating, signed int and unsigned int) only unsigned arithmetic
has a simple mathematical model.
I find probabilities unintuitive; I like to use actual measurements to back
up my surmises.


(I use one-space indentation. But that seems to have disappeared in your
quoted version.)

Curious. Only the for loop had any indentation in the post I saw, but
that single space did indeed vanish in the quoted part.
 
J

James Kuyper

Richard Damon said:
On 5/20/12 8:59 AM, BartC wrote:
You're right that for some arithmetic, sign doesn't come into it; the
machine operation is the same. However, if you try and print the results of
mixed arithmetic as unsigned values, then you will get more surprises
tha[n] doing everything as signed (especially if results want to be
negative, or when doing divide or mod with one side negative).
My guess is that the reasoning comes down to minimizing undefined behavior.
Note that overflow and such is well defined for unsigned values, but not
for signed values.
There is also the fact that the conversion of a
signed to an unsigned value is defined by the standard. For an unsigned
value to signed, it is only defined IF the value is representable.
Converting an unsigned value greater than INT_MAX to an int is
explicitly undefined behavior.

That sounds reasonable. They seem to have imparted almost 'magical'
properties to unsigned types, so that you can do anything you like to them
without ever getting a wrong result, because any result is always correct
even if it is arithmetically wrong.

There is a "magical" property, but it's a property of the C standard:
namely, that it, by definition, defines what "correct behavior" for an
implementation of C is. It can be internally inconsistent, badly
designed, or poorly written, among other things, but it can't be wrong.
If you don't like that specification, you can use a different language,
or advocate that it be changed.

The C standard has a great many very precise specifications of the
behavior of unsigned integers. An implementation of C that gets any of
those specifications wrong is non-conforming. That's a completely
different issue from the fact that you don't like the specifications
they've made.

Note: "mathematically wrong" would require identification of specific
aspects of math that are supposed to correspond to C operators. Oddly
enough, mathematics is a sufficiently broad field of study that it does
encompass concepts that correspond very well with the behavior required
by the C standard; they're just not the concepts you think should be
relevant. The problem is not that the choices made by the C standard are
mathematically wrong, it's only that they're different from the ones you
think they should have made.
One property that is useful is to store bigger numbers without having to use
a wider type. But you don't necessarily want everything else that comes with
it.

C chose not to provide that option; you get the option of storing bigger
numbers without change of arithmetic features by going from signed char,
to short, to int, to long, to long long; or by going from int_least8_t,
to int_least16_t, to int_least32_t, to int_least64_t (the fact that you
have two independent ways of doing that is a bit of confusion caused by
the necessity of backwards compatibility). The key point is that you can
only add individual bits to the size of a variable without changing its
signedness if it's a bit-field.
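For example (a sketch; the field names are made up):

struct packed {
 unsigned int flags : 4;  /* 4 bits, 0..15 */
 signed int   delta : 12; /* 12 bits, -2048..2047: chosen width,
                             explicitly signed */
};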
Also, if you're using 'sizeof', then you don't have the choice to make it
signed, even if all the objects in your program are well within the range of
signed int.

Just cast the result of sizeof to a signed type that you know to be big
enough. That's what casts are for.
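A sketch of that advice, assuming the sizes involved fit comfortably in
long:

#include <stdio.h>

int main(void) {
 char buf[100];
 long n = (long)sizeof buf;  /* bring the size into signed arithmetic */
 printf("%d\n", -1 < n);     /* 1: the compare is now all-signed */
 return 0;
}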
And with 'char' you might not know if it is signed or unsigned.

If that's important, use signed char or unsigned char. That's what those
types were invented for. You should use plain char only for text
processing, not for storing small numbers.
 
R

Richard Damon

Richard Damon said:
On 5/20/12 8:59 AM, BartC wrote:
You're right that for some arithmetic, sign doesn't come into it; the
machine operation is the same. However, if you try and print the results of
mixed arithmetic as unsigned values, then you will get more surprises
tha[n] doing everything as signed (especially if results want to be
negative, or when doing divide or mod with one side negative).
My guess is that the reasoning comes down to minimizing undefined behavior.
Note that overflow and such is well defined for unsigned values, but not
for signed values.
There is also the fact that the conversion of a
signed to an unsigned value is defined by the standard. For an unsigned
value to signed, it is only defined IF the value is representable.
Converting an unsigned value greater than INT_MAX to an int is
explicitly undefined behavior.

That sounds reasonable. They seem to have imparted almost 'magical'
properties to unsigned types, so that you can do anything you like to them
without ever getting a wrong result, because any result is always correct
even if it is arithmetically wrong.

They are defined to give the answer that standard computer hardware that
works on base 2 numbers would give. This meets C's goal of being an
efficient language. At the time the language was first being defined, a
similar rule for signed numbers could not be made, as there wasn't a
single model for signed arithmetic in use on computers; while two's
complement was common, so were one's complement and signed magnitude.

The only way to get "arithmetically correct" values for mathematical
expressions would be to use an arbitrary precision math package for the
arithmetic.

One of the reasons C does not try to get signed/unsigned comparison
"right" by your rules is that there exists (to my knowledge) no
processor with an instruction that will natively do the operation. You
can write the code explicitly to make it work out. As an example,
instead of:

int s;
unsigned u;
int flag;
/* ... */
flag = s < u;

you need to use

flag = (s < 0) || (s < u);

to get the right answer by your definition. Why isn't this built into
the language? Well, it would make comparison operators a great big
exception to the usual arithmetic conversions, and sometimes it is better
to be consistent rather than intuitive, since once you grok the language
specification the inconsistencies become non-intuitive.
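A runnable version of that workaround, wrapped in a hypothetical helper
called less_mixed: the short-circuit guard makes the unsigned compare
safe, because s is known to be non-negative when it is reached.

#include <stdio.h>

static int less_mixed(int s, unsigned int u) {
 return s < 0 || (unsigned int)s < u;
}

int main(void) {
 printf("%d\n", less_mixed(-2, 4u));  /* 1: the "expected" answer */
 printf("%d\n", less_mixed(4, 2u));   /* 0 */
 return 0;
}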
One property that is useful is to store bigger numbers without having to use
a wider type. But you don't necessarily want everything else that comes with
it.

Also, if you're using 'sizeof', then you don't have the choice to make it
signed, even if all the objects in your program are well within the range of
signed int. And with 'char' you might not know if it is signed or unsigned.

If you know that all of your object sizes are going to be well within
the range of signed int, just store them in signed int. Assigning a
size_t to an int is not an error, nor is passing an int to a function
declared to take a size_t. The only place this doesn't work is if you
have to pass the address of a size_t to something, and I can't think of
any standard functions that take a size_t* parameter. The other case is
passing one to a function like printf, which uses varargs.
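As of C99 there is also a dedicated length modifier for size_t, so the
varargs case needs no cast either:

#include <stdio.h>

int main(void) {
 printf("%zu\n", sizeof "test");  /* 5: %zu is the size_t conversion */
 return 0;
}

(C90 implementations predate %zu, hence the cast-to-unsigned-long idiom.)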

For combining ints and unsigneds to squeeze out that last drop of
range, well, you are going to need to be VERY careful looking over the
operations to make sure you haven't caused an overflow. Note that by
your mathematical model there is no appropriate type the compiler can
use for taking the difference of two numbers without going to a larger
type. Since this is inefficient, C won't do that by default, but you can
force it to happen by widening the arguments before performing the
operation.

Using plain char to hold numbers is generally a mistake because of the
unknown signedness; if you use char-sized values to save space, ALWAYS make
them explicitly signed or unsigned.
 
N

Nomen Nescio

Forwarding this to guys who write code in real languages to see what they
think of this. AFAIK you cannot get something like that past the compiler in
Ada...and you would have to define a type or a subtype to even have an
unsigned int unless you use a modular type IIRC. In FORTRAN I don't remember
an unsigned integer but I haven't used it much since FORTRAN IV.

Basically C gives the coder no help in the example you wrote. It doesn't
make sense to do what you did. It's almost surely an error and it should at
least be flagged as a warning. The fact that people calling themselves C
programmers can defend a compiler just letting this go by without at least
a warning, and flame you and call you an idiot and a noob, really says a
lot about their total lack of discipline and explains the pathetic state of
buffer overflows and race conditions, ad nauseam, found in C code.
 
R

Richard Maine

Nomen Nescio said:
Forwarding this to guys who write code in real languages to see what they
think of this. AFAIK you cannot get something like that past the compiler in
Ada...and you would have to define a type or a subtype to even have an
unsigned int unless you use a modular type IIRC. In FORTRAN I don't remember
an unsigned integer but I haven't used it much since FORTRAN IV.

Correct on the factual question. Standard Fortran has no unsigned kinds.
Some compilers do such a thing as an extension, but as is the nature of
extensions, one cannot guarantee that all compilers would make the same
choices on the fine points.

I won't comment on the rest. The parts you cited already sounded like a
flame fest, and using a term like "real languages" doesn't help much. I won't
participate in any discussion that has already degraded to that point.
 
T

Tim Rentsch

Richard Damon said:
The language could define another "level" of messages besides just a
single type of "diagnostic". There are several things now in the
language for which, if the standard had wording allowing it to require
the emitting of a "warning" (vs an "error"), that could help programmers.
In many cases "good" compilers already implement them. One key is that
the implementation would need to define how to distinguish a "warning"
diagnostic from an "error" diagnostic, and allow for possible other
levels of messages (like "informational") which the standard doesn't
define/mandate.

Programs that created an "error" would have undefined behavior if
executed (if that was even possible), while programs which only
generated "warnings" should be able to be executed.

This seems implicitly to offer two related but distinct suggestions,
namely, that two types of messages ("errors" and "warnings") be able
to be differentiated, and that certain constructions be identified
as requiring warning messages.

As to the first suggestion, this isn't a behavioral change but a
documentational one, ie, that implementations describe not one
subset of message outputs but two. Considered by itself, this
change doesn't seem to offer much value, because any implementation
that goes to the trouble of providing warning messages is likely
also to describe which are which.

As to the second suggestion, IMO the idea of changing the Standard
so that some legal program constructions are required to produce
warning messages, with no effect on program semantics, has nothing
to recommend it.
 
K

Kaz Kylheku

I don't know any C but it did raise my eyebrows. Looking into this a little:

#include <stdio.h>

int main() {
 unsigned int a = 4;
 signed int b = -2;

 printf("%u<%d = %d\n", a, b, a<b);
 printf("%d<%d = %d\n", 4, b, 4<b);
 printf("%u<%d = %d\n", a, -2, a<-2);
 printf("%d<%d = %d\n", 4, -2, 4<-2);
}

Works like yours:

./bartc
4<-2 = 1
4<-2 = 0
4<-2 = 1
4<-2 = 0

Agreed, not very helpful. Now let's try:

Solaris lint, comes with the system:

lint bartc.c
(9) warning: suspicious comparison of unsigned with negative constant: op "<"

You forgot the obvious: compile with "gcc -Wall -W".

test.c: In function ‘main’:
test.c:7:33: warning: comparison between signed and unsigned integer expressions
test.c:9:34: warning: comparison between signed and unsigned integer expressions
test.c:11:1: warning: control reaches end of non-void function
function returns value which is always ignored
printf
Got one and missed one.

Even better, this:
Splint 3.1.2 --- 23 Nov 2011

This is not better. Splint is not a tool that you can simply download, compile
and then run in this manner. Proper use of Splint requires you to RTFM and then
fine-tune the tool to actually look for problems. Making the most of Splint
requires special annotations in the code.

If you just invoke it this way, you get reams of spewage full of all kinds of
false positive identifications of situations, most of which are not
erroneous in any way.
 
T

Tim Rentsch

[snip]
[snip] The
lone [1] is the diagnostic generated by #error, which (as of C99)
requires the compiler to reject the code. The [2] category covers
all other diagnostics required by the Standard (in C90, even #error
is a [2]), because the compiler is allowed to overlook the problem
and press onward. If it does so the behavior is undefined, with the
usual proviso that an implementation is free to define behaviors for
situations the Standard leaves undefined. [snip]

Micro-nit: I think #error in C90 is not quite the same as the
[2] category, because all of those are necessarily undefined
behavior. A #error in C90 does not violate any syntax rule or
constraint, nor can I find any other reason it would give rise
to undefined behavior.

Other than the u-nit, very nice analysis.
 
T

Tim Rentsch

[snip, and minor snip later]
I used this code:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void){
 printf("(size_t)-1 %lu\n", (unsigned long)((size_t)-1));
 printf("sizeof \"test\" %lu\n", (unsigned long)(sizeof "test"));
 printf("-1+sizeof \"test\" %lu\n", (unsigned long)(-1 + sizeof "test"));
 printf("INT_MAX %lu\n", (unsigned long)INT_MAX);
 printf("SIZE_MAX %lu\n", (unsigned long)SIZE_MAX);
 printf("-1<sizeof \"test\" %lu\n", (unsigned long)(-1 < sizeof "test"));
}

On gcc, it gave this output:

(size_t)-1 4294967295
sizeof "test" 5
-1+sizeof "test" 4
INT_MAX 2147483647
SIZE_MAX 4294967295
-1<sizeof "test" 0

On DMC (Digital Mars' C) it gives this:

(size_t)-1 4294967295
sizeof "test" 5
-1+sizeof "test" 4
INT_MAX 2147483647
SIZE_MAX 4294967295
-1<sizeof "test" 1

Exactly the same, except for the final compare.

Is there a chance that the Digital Mars compiler was
run in a non-conforming mode? I'd be very surprised
if Walter got this wrong.
 
G

glen herrmannsfeldt

(snip)
Basically C gives the coder no help in the example you wrote.
It doesn't make sense to do what you did. It's almost surely an
error and it should at least be flagged as a warning.
(snip)
(snip)

You should post a bug report for lcc-win32 then.

Unsigned types wrap, and don't overflow. (At least in C89.)

(snip)
Mixing signed and unsigned types can give surprising results.
The results are defined, but it is usual for compilers to
give a warning.

Note that C89 (and I believe later) allows signed integer types
to have two's complement, one's complement, or sign-magnitude
representation, and that affects some expressions mixing signed
and unsigned values. In those cases, the results are implementation
dependent.

(If anyone writes a C compiler for the 7090 we can check this.)

(snip)
C programmers way too often don't check the return value of I/O
operations. Many I/O errors go unnoticed that way.

(snip)
This is new since C89, but has to be allowed for backward compatibility.

As far as I can tell, this is a complaint against unsigned arithmetic
wrapping instead of generating overflow. C89 leaves undefined the
effect of overflow on signed integer expressions. One could still be
surprised in that case.

-- glen
 
B

BartC

Is there a chance that the Digital Mars compiler was
run in a non-conforming mode? I'd be very surprised
if Walter got this wrong.

I didn't see anything obvious amongst all the options. And I made sure it
was the latest version.
 
