Signed <-> Unsigned conversions.


Charles Sullivan

Is there a C standard for how conversions between signed and unsigned
variables of the same size are implemented? E.g., if I have:

signed int s1, s2;
unsigned int u1;

s1 = -100;
u1 = s1;
s2 = u1;

printf("s2 = %d\n", s2);

With my (gcc) compiler, s2 has the same value (-100) as s1, but I'm
wondering if that's a C standard.

Thanks for your help.
 

John Kelly

Is there a C standard for how conversions between signed and unsigned
variables of the same size are implemented? E.g., if I have:

signed int s1, s2;
unsigned int u1;

s1 = -100;
u1 = s1;

at this point u1 = 4294967196


which cannot be represented in s2. Now if you believe in standards, C99
says:

Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined
or an implementation-defined signal is raised.

So your program may freak out, or it may give you a reasonable answer,
depending on your compiler "implementation."

printf("s2 = %d\n", s2);

With my (gcc) compiler, s2 has the same value (-100) as s1, but I'm
wondering if that's a C standard.

Thanks for your help.

I'm still waiting for someone to point out an implementation that raises
a signal in this case.
 

osmium

John Kelly said:
at this point u1 = 4294967196



which cannot be represented in s2. Now if you believe in standards, C99
says:

Or that could have been written in English as

"The result is defined by the implementation."
 

Keith Thompson

Charles Sullivan said:
Is there a C standard for how conversions between signed and unsigned
variables of the same size are implemented? E.g., if I have:

signed int s1, s2;
unsigned int u1;

s1 = -100;
u1 = s1;
s2 = u1;

printf("s2 = %d\n", s2);

With my (gcc) compiler, s2 has the same value (-100) as s1, but I'm
wondering if that's a C standard.

If you want *all* the gory details, grab a copy of the latest
almost-official C standard draft from
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf>
and read the section on Conversions (6.3).

To summarize:

s1 = -100 assigns an int value to an int object; no conversion
is needed.

u1 = s1 performs an implicit conversion of the int value -100 from
int to unsigned int. For a value out of range of the unsigned
type, the standard specifies that the conversion is done "by
repeatedly adding or subtracting one more than the maximum value
that can be represented in the new type until the value is in the
range of the new type" (this refers to mathematical values, not
to addition or subtraction as it would be done in C). Which is a
somewhat roundabout way of describing the way it typically works on
systems that use a 2's-complement representation for signed integers
(other (rare) systems must do whatever is needed to produce the
same result). So the value stored in u1 is UINT_MAX + 1 - 100.
If UINT_MAX is 65535, u1 is 65436u; if UINT_MAX is 4294967295,
u1 is 4294967196u.

s2 = u1 attempts to convert the unsigned int value UINT_MAX +
1 - 100 from unsigned int to int. Typically this will just
reverse the previous conversion and yield -100, but the standard
doesn't guarantee this; it says that "either the result is
implementation-defined or an implementation-defined signal is
raised". (The permission to raise a signal is new in C99; in C90,
it merely produced an implementation-defined result.)

So the behavior you're seeing is typical, but not guaranteed.
 

Keith Thompson

John Kelly said:
at this point u1 = 4294967196

Only if UINT_MAX happens to be 4294967295 (which is common but not
universal).

which cannot be represented in s2. Now if you believe in standards, C99
says:

What does C99 say if you don't believe in standards? What on Earth does
"believe in standards" even mean?

[...]
I'm still waiting for someone to point out an implementation that raises
a signal in this case.

Why are you waiting for that? The standard permits it; that doesn't
imply that any implementation takes advantage of that permission.
(I know of none that do so.)
 

Keith Thompson

osmium said:
"John Kelly" wrote: [...]
Or that could have been written in English as

"The result is defined by the implementation."

How is that an improvement? How is "the result is
implementation-defined" not English?

Note that the standard defines the term "implementation-defined"
in 3.4.1.
 

BGB / cr88192

Behind China Blue Eyes said:
On a typical twos complement machine, it uses the same bits, just labelled
with a signed or unsigned type. That's because the address arithmetic works
out the same if you ignore overflow. Whilst many of these machines can
detect integer overflow, I don't recall any C compilers that signal it.
Division produces different values signed and unsigned, but not addition
and subtraction.

it would be extra effort to detect and to signal on many CPU's, and would
reduce performance, so it is generally not worth the bother.

in my case, when I wrote a compiler, I largely ignored the difference
between signed and unsigned types except in cases where it actually matters.
typically, the same CPU instructions are used either way, so it doesn't
matter too much.

a little more effort though is preserving the correct behavior of smaller
integer types when typically using full-width registers and operations,
since a lot of code will be unhappy if a calculation which exploits overflow
behavior just happens to perform the calculation with a wider-than-expected
value range.

example:
unsigned char a, b;
int c;

a=254; b=3;
c=a+b;

c should be 1, rather than 257.

it may seem trivial, but code can actually misbehave or break due to these
sorts of issues (as some naively written hash function produces a different
value, ...).

as well as finding the least-cost way of pulling this off:
for example, a byte-width addition will preserve the correct behavior, but
maybe only certain registers have a byte-width addition instruction;
otherwise, one may have to use a full-width addition and use a mask or
zero-extend opcode to fix the result (but this shouldn't always be done, as
maybe this costs an additional clock cycle over the prior option);
....

but, yeah, it is all this little fiddly crap which makes compiler writing
"fun" (and leads to lots of difficult-to-track-down bugs...).

or such...
 

osmium

Keith Thompson said:
How is that an improvement? How is "the result is
implementation-defined" not English?

Note that the standard defines the term "implementation-defined"
in 3.4.1.

It is one single thought in a simple sentence. The messing with word
arrangement was just my way of saying it, that was not the point I was
trying to make. Why force the reader to tangle with a compound thought when
a single thought is the only information being conveyed? This is the
convoluted way lawyers talk.

On the tangential thing, "defined by the implementation" does not need to be
defined someplace else in the spec. It is a meaningful thought, all by
its lonesome.
 

Keith Thompson

BGB / cr88192 said:
it would be extra effort to detect and to signal on many CPU's, and would
reduce performance, so it is generally not worth the bother.

in my case, when I wrote a compiler, I largely ignored the difference
between signed and unsigned types except in cases where it actually matters.
typically, the same CPU instructions are used either way, so it doesn't
matter too much.

And the rules in the C standard are specifically designed to allow (but
not require) you to do this.
a little more effort though is preserving the correct behavior of smaller
integer types when typically using full-width registers and operations,
since a lot of code will be unhappy if a calculation which exploits overflow
behavior just happens to perform the calculation with a wider-than-expected
value range.

example:
unsigned char a, b;
int c;

a=254; b=3;
c=a+b;

c should be 1, rather than 257.

No, c should be 257, and if it isn't you have a bug in your compiler
(assuming it's intended to be a conforming C compiler). C99 6.5.6p4:

If both operands have arithmetic type, the usual arithmetic
conversions are performed on them.

In this case, the "usual arithmetic conversions" (6.3.1.8) apply the
"integer promotions" (6.3.1.1p2), which promote unsigned char to int
(or, rarely, to unsigned int, but the result is the same either way).

[...]
 

Seebs

It is one single thought in a simple sentence.

But it screws the reader over.

Most readers will, given only your sentence, not consider "a signal
is raised" to be a possible outcome. The reason for the bifurcation
is that these are two separate outcomes, and the implementation must
document not only which one it picked, but the details of the choice.

But it is important to distinguish between this and a case where you
definitely get a number as a result, but the implementation defines
which one.

-s
 

Keith Thompson

osmium said:
Keith Thompson said:
osmium said:
"John Kelly" wrote: [...]
Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined
or an implementation-defined signal is raised.

Or that could have been written in English as

"The result is defined by the implementation."

How is that an improvement? How is "the result is
implementation-defined" not English?

Note that the standard defines the term "implementation-defined"
in 3.4.1.

It is one single thought in a simple sentence. The messing with word
arrangement was just my way of saying it, that was not the point I was
trying to make. Why force the reader to tangle with a compound thought when
a single thought is the only information being conveyed? This is the
convoluted way lawyers talk.

Sorry, I still don't understand why "defined by the implementation" is
any clearer than "implementation-defined".
On the tangential thing, "defined by the implementation" does not need to be
defined someplace else in the spec. It is a meaningful thought, all by
its lonesome.

Oh? If something is "defined by the implementation", does that mean the
implementation has to document it, or can the implementers just choose
to define it in a certain way and leave it at that?

You've seen the nitpicking debates we have here over terms that *are*
defined by the standard. Do you think less rigor would be an
improvement?
 

Keith Thompson

Keith Thompson said:
osmium said:
Keith Thompson said:
[...]
Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined
or an implementation-defined signal is raised.

Or that could have been written in English as

"The result is defined by the implementation."

How is that an improvement? How is "the result is
implementation-defined" not English?

Note that the standard defines the term "implementation-defined"
in 3.4.1.

It is one single thought in a simple sentence. The messing with word
arrangement was just my way of saying it, that was not the point I was
trying to make. Why force the reader to tangle with a compound thought when
a single thought is the only information being conveyed? This is the
convoluted way lawyers talk.

Sorry, I still don't understand why "defined by the implementation" is
any clearer than "implementation-defined".

After reading Seebs's response, I think I might understand what you're
saying.

Are you suggesting that the entire phrase:

either the result is implementation-defined or an
implementation-defined signal is raised.

could be replaced by

The result is defined by the implementation.

? I had assumed you only meant to replace just "the result is
implementation-defined".

The current C99 wording says that exactly one of two things
must happen: *either* the conversion yields a result (and the
implementation must define what that result is), or the conversion
raises a signal (and the implementation must specify what that
signal is). Merely saying that "The result is defined by the
implementation" does not suggest any possibility other than yielding
a result -- or, if it does, it doesn't limit the possibilities to
raising a signal.
 

osmium

Keith Thompson said:
Keith Thompson said:
osmium said:
[...]
Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined
or an implementation-defined signal is raised.

Or that could have been written in English as

"The result is defined by the implementation."

How is that an improvement? How is "the result is
implementation-defined" not English?

Note that the standard defines the term "implementation-defined"
in 3.4.1.

It is one single thought in a simple sentence. The messing with word
arrangement was just my way of saying it, that was not the point I was
trying to make. Why force the reader to tangle with a compound thought when
a single thought is the only information being conveyed? This is the
convoluted way lawyers talk.

Sorry, I still don't understand why "defined by the implementation" is
any clearer than "implementation-defined".

After reading Seebs's response, I think I might understand what you're
saying.

Are you suggesting that the entire phrase:

either the result is implementation-defined or an
implementation-defined signal is raised.

could be replaced by

The result is defined by the implementation.

? I had assumed you only meant to replace just "the result is
implementation-defined".

The current C99 wording says that exactly one of two things
must happen: *either* the conversion yields a result (and the
implementation must define what that result is), or the conversion
raises a signal (and the implementation must specify what that
signal is). Merely saying that "The result is defined by the
implementation" does not suggest any possibility other than yielding
a result -- or, if it does, it doesn't limit the possibilities to
raising a signal.

Yes, your later interpretation is what I meant. I really don't enjoy
dissecting the spec and would rather end it here. I would rather deal with
simpler things like changing the gravitational constant.
 

BGB / cr88192

Keith Thompson said:
And the rules in the C standard are specifically designed to allow (but
not require) you to do this.

yes, ok.

No, c should be 257, and if it isn't you have a bug in your compiler
(assuming it's intended to be a conforming C compiler). C99 6.5.6p4:

If both operands have arithmetic type, the usual arithmetic
conversions are performed on them.

In this case, the "usual arithmetic conversions" (6.3.1.8) apply the
"integer promotions" (6.3.1.1p2), which promote unsigned char to int
(or, rarely, to unsigned int, but the result is the same either way).

interesting...

I had thought the correct semantics were to find the common promoted type of
'a' and 'b', which both being unsigned char, would be unsigned char, then
convert each side to this type.

then, one performs the arithmetic, which does its usual overflow thing, and
then one upcasts to int, giving the final result (which would be 1 by this
reasoning).

checking:
hmm, it seems both MSVC and GCC give 257, unexpected...

yet, if I switch to 'unsigned int' and 'long long', and try a similar
experiment, then the final result is truncated to the range of unsigned int
(the exact opposite behavior), in both GCC and MSVC (testing both 32 and 64
bit).


hmm...
 

Keith Thompson

BGB / cr88192 said:
yes, ok.



interesting...

I had thought the correct semantics were to find the common promoted type of
'a' and 'b', which both being unsigned char, would be unsigned char, then
convert each side to this type.

then, one performs the arithmetic, which does its usual overflow thing, and
then one upcasts to int, giving the final result (which would be 1 by this
reasoning).

checking:
hmm, it seems both MSVC and GCC give 257, unexpected...

yet, if I switch to 'unsigned int' and 'long long', and try a similar
experiment, then the final result is truncated to the range of unsigned int
(the exact opposite behavior), in both GCC and MSVC (testing both 32 and 64
bit).

It's not at all surprising once you realize that C has no arithmetic
operations for types of rank lower than int. The "usual arithmetic
conversions", which most operators apply to their operands, promote
any type narrower than int/unsigned int to int (if int can represent
all values of the original type) or to unsigned int (otherwise).
Only then are the rules that convert the operands to a common
type applied.

This makes things a lot easier for systems (many of them, I think) that
don't provide hardware instructions for narrow arithmetic.
 

Peter Nilsson

John Kelly said:
at this point u1 = 4294967196

If, and only if, UINT_MAX is 4294967295.
which cannot be represented in s2.  Now if you believe
in standards, C99 says:

Please stop quoting non-posted text with > as if it was written
by a previous poster.

N1256 6.3.1.3p2:

"Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-
defined or an implementation-defined signal is raised."
So your program may freak out, or it may give you a
reasonable answer, depending on your compiler
"implementation."

Any implementation-defined result is reasonable since
it was presumably reasonable for compiler writers to
implement it that way.

But you could mention that by far the most common
conversion doesn't involve signals and simply performs
the inverse conversion of signed to unsigned.

It's highly likely on implementations you're ever likely to
see, but it isn't guaranteed. There are still some sign-
magnitude DSP chipsets being designed where the conversion
of unsigned to signed may not be what you might expect.
I'm still waiting for someone to point out an
implementation that raises a signal in this case.

Wait away. You and I may never see one in our lifetimes.
But the fact that a group of people put their money where
their mouth was, personally paying to be 'volunteers' on
the Committee to put this change in place, suggests they
had at least one such actual implementation in mind.

Even without knowing what that implementation is, I dare
say you're not quite so stunned to know that floating
point signals are not at all uncommon. There have been
implementations of high ranked integers using floating
point representations.

In order to preserve the congruency between signed and
unsigned value bits, it may well be necessary to use a
non-normalised or biased mantissa. I wouldn't find it at
all surprising for such an implementation to raise a
signal rather than go out of its (already convoluted)
way to avoid one.
 

BGB / cr88192

Keith Thompson said:
BGB / cr88192 said:
Keith Thompson said:
interesting...

I had thought the correct semantics were to find the common promoted type
of 'a' and 'b', which both being unsigned char, would be unsigned char,
then convert each side to this type.

then, one performs the arithmetic, which does its usual overflow thing,
and then one upcasts to int, giving the final result (which would be 1 by
this reasoning).

checking:
hmm, it seems both MSVC and GCC give 257, unexpected...

yet, if I switch to 'unsigned int' and 'long long', and try a similar
experiment, then the final result is truncated to the range of unsigned
int (the exact opposite behavior), in both GCC and MSVC (testing both 32
and 64 bit).

It's not at all surprising once you realize that C has no arithmetic
operations for types of rank lower than int. The "usual arithmetic
conversions", which most operators apply to their operands, promote
any type narrower than int/unsigned int to int (if int can represent
all values of the original type) or to unsigned int (otherwise).
Only then are the rules that convert the operands to a common
type applied.

This makes things a lot easier for systems (many of them, I think) that
don't provide hardware instructions for narrow arithmetic.

this is useful to know, but is admittedly not the first time I have
misunderstood the standard...

yeah, I guess it is convenient then as one doesn't have to fiddle with
things like trying to keep char or short types properly truncated in the
middle of expressions.


oh well, I am considering rewriting my codegen anyways (which in my compiler
is what contains this logic), as the current implementation is needlessly
crufty, and I may be in effect better off starting over with a different
design (considering going from the current AST->RPN->ASM design to an
AST->TAC->ASM design...).

forsaking the codegen internals being a big hackish stack-machine would
likely be an improvement, and would hopefully simplify debugging.

this may sound drastic, but actually most of my framework's architecture is
unlikely to be affected all that much (although still likely a bit of effort
to do a rewrite of the codegen).


or such...
 

BartC

BGB / cr88192 said:
interesting...

I had thought the correct semantics were to find the common promoted type
of 'a' and 'b', which both being unsigned char, would be unsigned char,
then convert each side to this type.

I thought the same; one compiler of mine also gives the result c=1. This is
not for C, so I don't need to care about standards. It just seemed more
intuitive for any arithmetic (binary) op to worry only about its two
operand types, and not about an arbitrary third type.

(However, a more recent project of mine would give c=257, but only because
'char' is considered a 32-bit type for calculation purposes. Then, explicit
narrowing casts would be needed to get the same 1 result.)
 
