unsigned short addition/subtraction overflow

Andy

Hi,
Are 1 through 4 defined behaviors in C?

unsigned short i;
unsigned long li; /* 32-bit wide */

1. i = 65535 + 3;
2. i = 1 - 3;
3. li = (unsigned long)0xFFFFFFFF + 3;
4. li = 1 - 3;

TIA
Andy
 
James Hu

Are 1 through 4 defined behaviors in C?

unsigned short i;
unsigned long li; /* 32-bit wide */

1. i = 65535 + 3;
2. i = 1 - 3;
3. li = (unsigned long)0xFFFFFFFF + 3;
4. li = 1 - 3;

Yes.

-- James
 
Robert Stankowic

pete said:
No.
65536 is an allowable value for INT_MAX.
(65535 + 3) would be integer overflow
and undefined behavior in that case.

From N869: 6.2.5
9 The range of nonnegative values of a signed integer type is a subrange of
the corresponding unsigned integer type, and the representation of the same
value in each type is the same.28) A computation involving unsigned operands
can never overflow, because a result that cannot be represented by the
resulting unsigned integer type is reduced modulo the number that is one
greater than the largest value that can be represented by the resulting type.

However, I am not sure about
i = 1 - 3;
and
li = 1 - 3;
but I think it is defined as far as I understand the integer promotion
rules.

cheers
Robert
 
CBFalconer

pete said:
No.
65536 is an allowable value for INT_MAX.
(65535 + 3) would be integer overflow
and undefined behavior in that case.

Good catch. However, change "would" to "could".
 
pete

Robert said:
From N869: 6.2.5
9 The range of nonnegative values of a
signed integer type is a subrange of
the corresponding unsigned integer type,
and the representation of the same value in each
type is the same.28)
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned
integer type is
reduced modulo the number that is one greater than
the largest value that can be
represented by the resulting type.

What's your point?
If INT_MAX equals 65535,
then there are no unsigned operands in (65535 + 3), just two ints.
However, I am not sure about
i = 1 - 3;
and
li = 1 - 3;
but I think it is defined as far as I
understand the integer promotion rules.

The standard defines what happens when any integer
is the right operand of the assignment operator,
and the left operand is an unsigned type.
 
pete

CBFalconer said:
Good catch. However, change "would" to "could".

I think I got that right.
"in that case" meaning the case when 65536 was equal to INT_MAX,
then (65535 + 3) most definitely would overflow,
and definitely be undefined behavior.
The particular form in which the undefined behavior manifests
would be a different matter, if that's what you mean.
 
Andy

Hi,
Are 1 through 4 defined behaviors in C?

unsigned short i;
unsigned long li; /* 32-bit wide */

1. i = 65535 + 3;
2. i = 1 - 3;
3. li = (unsigned long)0xFFFFFFFF + 3;
4. li = 1 - 3;

TIA
Andy

Actually, what I really meant to ask about is unsigned operations.
This is what I want to know: are the following defined in
C, and are they always guaranteed to wrap around?

unsigned short i;
unsigned long li; /* 32-bit wide */

1. i = (unsigned short)65535 + (unsigned short)3;
2. i = (unsigned short)1 - (unsigned short)3;
3. li = (unsigned long)0xFFFFFFFF + (unsigned long)3;
4. li = (unsigned long)1 - (unsigned long)3;

TIA
Andy
 
Kevin Goodsell

pete said:
No.
65536 is an allowable value for INT_MAX.

I don't think so. INT_MAX pretty much has to be an odd number, I think.
In fact, to satisfy the requirement that the non-negative integer values
have the same representation as the same values for the corresponding
unsigned type, and the requirement that unsigned types use a pure binary
representation, I think it's safe to say that INT_MAX must be (2^N)-1
for some integer N, which must be 15 or greater.

-Kevin
 
Kevin Goodsell

Andy said:
Actually, what I really meant to ask about is unsigned operations.
This is what I want to know: are the following defined in
C, and are they always guaranteed to wrap around?

unsigned short i;
unsigned long li; /* 32-bit wide */

1. i = (unsigned short)65535 + (unsigned short)3;

You could write these more easily this way:

i = 65535u + 3u;
2. i = (unsigned short)1 - (unsigned short)3;
3. li = (unsigned long)0xFFFFFFFF + (unsigned long)3;
4. li = (unsigned long)1 - (unsigned long)3;

It's difficult to produce undefined behavior with unsigned values.
(Unless you divide by zero, maybe. I don't actually know what the
standard says about that, which surprises me.) Overflow doesn't occur
with unsigned types, but it's possible for unsigned types that are
narrower than int to be promoted to (signed) int, which may allow
overflow (and undefined behavior) to occur.

-Kevin
 
James Hu

No.
65536 is an allowable value for INT_MAX.
(65535 + 3) would be integer overflow
and undefined behavior in that case.

Good catch. I did consider overflow, but I assumed that (65535+3) was
identical to writing (65538) because of computation at translation
time versus computation at run time.

-- James
 
Chris Torek

All C guarantees here is "at least" 32 bits, but here it does
not really matter.

You could write these more easily this way:

i = 65535u + 3u;

Actually, this is potentially quite different.

If ANSI/ISO C used the *correct* rules (according to me :) )
it would be precisely the same, but we are stuck with quite
bogus widening rules due to a mistaken decision in the 1980s:
"when a narrow unsigned integer type widens, the resulting
type is signed if all the unsigned values fit, otherwise
it is unsigned".

In this particular case, unsigned short widens to either
unsigned int or signed int. Which one we get depends on the
properties of the implementation. This is a really dumb idea,
made in an attempt to be "less surprising" than the "right"
way ("narrow unsigned widens to unsigned"), that actually
turns out to be *more* surprising. But again we are stuck
with the wrong decision -- so let me define it.

What you must do is look in <limits.h> (perhaps by writing a
small C program, since the header may not exist) and compare
the values of USHRT_MAX and INT_MAX. One of the following two
cases will necessarily hold:

a) USHRT_MAX > INT_MAX.

This occurs on, e.g., the 16-bit PDP-11 and old 16-bit
MS-DOS C compilers. Here USHRT_MAX is 65535 while INT_MAX
is 32767.

b) USHRT_MAX <= INT_MAX.

This occurs on, e.g., today's 16-bit-short 32-bit-int C
compilers. Here USHRT_MAX is 65535 while INT_MAX is
2147483647.

In case (a), an unsigned short expression -- no matter what its
actual value is -- that appears in an arithmetic expression is
widened to unsigned int. Thus (unsigned short)65535 is
identical to (unsigned int)65535 or 65535U.

In case (b), however, an unsigned short -- no matter what its actual
value is -- is widened to a *signed* int. Thus (unsigned short)65535
is identical to (int)65535 or 65535.

If we have two "unsigned short"s, values 65535 and 3 respectively,
and go to add them, we continue to have "case a" and "case b".
In case (a), the sum is 65535U + 3U, which has type unsigned int
and value 2. In case (b), the sum is 65535 + 3, which has type
signed int and value 65538.

In either case, when storing the final values back into an unsigned
short, it is reduced mod (USHRT_MAX+1), so that i becomes 2. The
place where this becomes a problem is not when we stuff the result
back into an unsigned variable, but rather when we compare it in
what the original 1989 C rationale called a "questionably signed"
expression.

Suppose we have the following code fragment:

unsigned short us = 65535;
int i = -1;

if (us > i)
    printf("65535 > -1\n");
else
    printf("65535 <= -1\n");

According to ANSI C's ridiculous rules (which we must obey anyway),
we decide whether this comparison uses "unsigned int" or "signed
int" based on whether USHRT_MAX exceeds INT_MAX. Once again, we
have the two cases:

case (a), USHRT_MAX > INT_MAX (PDP-11): "us" widens to an
unsigned int, value 65535U; i widens to an unsigned int,
value 65535U. 65535U > 65535U is false and we print
"65535 <= -1". This is, supposedly, "surprising" -- but
it happens!

case (b), USHRT_MAX <= INT_MAX (VAX etc): "us" widens to
a signed int, value 65535; i remains signed int, value
-1. 65535 > -1 is true and we print "65535 > -1". This
is supposedly "not surprising" (which is probably true),
but in fact it is only SOMETIMES true.

As far as I am concerned, it is *much* better to be "predictably
surprising" than "unpredictably surprising based on the relative
values of USHRT_MAX and INT_MAX". The reason is that, while C
programmers do get surprised, they get surprised *once*, the *first*
time they mix signed and unsigned this way. This gives them the
opportunity to learn that the results are surprising; from then
on, they have no excuse to be surprised. Moreover, the logic
is trivial to follow: "unsigned widens to unsigned" means "put
an unsigned into an expression and it takes over."

Instead, we have a language where the code "works as expected" --
until it is moved to a machine where case (a) holds instead of case
(b). Programmers learn that mixing signed and unsigned is harmless
and "never surprises", only to find someday that, no, the language
is considerably more perverse than that. The logic is difficult
as well: "unsigned takes over except when it doesn't, based on the
relative values of the corresponding MAXes."

Kevin said:
It's difficult to produce undefined behavior with unsigned values.

As long as you stick with unsigned int or unsigned long, anyway,
so that the broken widening rules do not trick you into accidentally
using signed values.

Kevin said:
(Unless you divide by zero, maybe. I don't actually know what the
standard says about that, which surprises me.)

Division by zero produces undefined behavior, even for 1U / 0U and
the like.

Kevin said:
Overflow doesn't occur
with unsigned types, but it's possible for unsigned types that are
narrower than int to be promoted to (signed) int, which may allow
overflow (and undefined behavior) to occur.

Yes. I claim that this rule is a terrible one; but I note that we
are stuck with it. The best approach is to avoid it -- make sure
you explicitly widen your narrow unsigned types to wider unsigned
types if the result (overflow or result of "questionably signed"
comparison) can matter. This kind of code is undeniably ugly, but
then, working around broken portions of any language (not just C)
is usually ugly.
 
Kevin Goodsell

Chris said:
Actually, this is potentially quite different.

Yes, obviously. Not sure what I was thinking there. I think I suffer
from "short blindness" - I either miss the word 'short' or
sub-consciously translate it to 'int'. This wasn't the first time.

Thanks for pointing out the error.

-Kevin
 
Peter Nilsson

pete said:
pete wrote: ....

I meant 65536.

Why? Neither is likely, but 65536 is considerably less so. Some would
argue (myself included) that 65536 is impossible on a conforming
implementation (be that C90 or C99).
 
pete

Peter said:
Why? Neither is likely, but 65536 is considerably less so. Some would
argue (myself included) that 65536 is impossible on a conforming
implementation (be that C90 or C99).

You would be right.
 
Christopher Benson-Manica

Chris Torek said:
Actually, this is potentially quite different.
If ANSI/ISO C used the *correct* rules (according to me :) )
it would be precisely the same, but we are stuck with quite
bogus widening rules due to a mistaken decision in the 1980s:
"when a narrow unsigned integer type widens, the resulting
type is signed if all the unsigned values fit, otherwise
it is unsigned".

Wow, what a great article! The only thing I'm unclear on now is why
such a seemingly obvious point escaped the C89 people, and why you
weren't around to dissuade them ;)
 
Chris Torek

Wow, what a great article! The only thing I'm unclear on now is why
such a seemingly obvious point escaped the C89 people, and why you
weren't around to dissuade them ;)

I was but a poor student at the time (making about four bucks an
hour, with a limit of 20 hrs/week, as "student staff") and could
not afford exotic vacation trips to ANSI C committee meetings. :)
I did, however, hear from someone who did go to them that this was
actually something of a "hotly debated" topic.

The VAX PCC did it "my" way, and apparently Plauger's C compiler(s)
did it the other way. The "base document" -- i.e., K&R-1 -- did
not even allow for the possibility of "unsigned short" and "unsigned
char", and if you have "narrow unsigned always widens to unsigned"
as a rule, you need an exception for plain char if/when plain char
is unsigned (as on the IBM 370), so that EOF can be negative.

The results of the rules differ only in "questionably signed" cases,
which are rare enough. But the ANSI rules are so ugly to work with
that I would prefer a special exception for "plain char is unsigned
on this implementation, yet nonetheless widens to signed int". Note
that this exception would force the constraint that CHAR_MAX < INT_MAX,
even when char is unsigned, which would have the happy side effect
of making stdio "work right".
 
