conversion of signed integer to unsigned integer

junky_fellow

Can anybody please explain this:

[ N869 6.3.1.3 ]

When a value with integer type is converted to another integer type
other than _Bool,
if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.


Thanx ...
 
Eric Sosman

Can anybody please explain this:

[ N869 6.3.1.3 ]

When a value with integer type is converted to another integer type
other than _Bool,
if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.

unsigned short us; /* assume USHRT_MAX == 65535 */
us = -1000000;

The variable is unsigned, and not capable of expressing a
negative value. The assignment must set it to a non-negative
value, and 6.3.1.3 describes how that value is computed:

-1000000
+65536 (USHRT_MAX+1)
========
-934464
+65536
========
-868928
:
:
========
-16960
+65536
========
48576 (final answer, Regis)

Various computational shortcuts are available; all those
additions need not actually be performed to get to the answer.
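
For instance, a single modular reduction gives the same result. A
minimal sketch (assuming, as above, USHRT_MAX == 65535, and that long
can hold the starting value):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    long v = -1000000L;
    unsigned short us = (unsigned short)v; /* conversion per 6.3.1.3 */

    /* Shortcut: one modular reduction instead of repeated additions. */
    long m = (long)USHRT_MAX + 1; /* 65536 */
    long r = v % m;               /* -16960 (C99: % truncates toward zero) */
    if (r < 0)
        r += m;                   /* 48576 */

    printf("us = %u, shortcut = %ld\n", (unsigned)us, r);
    return 0;
}

Both values printed are 48576, matching the ladder above.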
 
J

junky_fellow

Eric said:
Can anybody please explain this:

[ N869 6.3.1.3 ]

When a value with integer type is converted to another integer type
other than _Bool,
if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.

unsigned short us; /* assume USHRT_MAX == 65535 */
us = -1000000;

The variable is unsigned, and not capable of expressing a
negative value. The assignment must set it to a non-negative
value, and 6.3.1.3 describes how that value is computed:

-1000000
+65536 (USHRT_MAX+1)
========
-934464
+65536
========
-868928
:
:
========
-16960
+65536
========
48576 (final answer, Regis)

Various computational shortcuts are available; all those
additions need not actually be performed to get to the answer.

Consider,
signed char sc = -4; /* binary = 11111100 */
unsigned char uc = sc;

Now, if I print the value of sc it is 252 (binary 11111100).

So, as you can see, no conversion has been done. The bit values in
"uc" are exactly the same as in "sc".

Then, what's the need of performing so many additions/subtractions?
 
Jean-Claude Arbaut

Consider,
signed char sc = -4; /* binary = 11111100 */
unsigned char uc = sc;

Now, if I print the value of sc it is 252 (binary 11111100).

So, as you can see, no conversion has been done. The bit values in
"uc" are exactly the same as in "sc".

There's great confusion between bit pattern and actual value here.
First, 252 = -4 + 256, needed to get a valid unsigned char.
Second, not all machines are two's complement, though many are
nowadays.

As Eric said, there are computational shortcuts, and you
should note they depend on the machine (hence, on the
implementation of the standard). It's pure coincidence
if a signed char holding "s" and an unsigned char holding
"256+s" actually have the same representation.
 
Jean-Claude Arbaut

On 17/06/2005 15:49, in BED8A19F.50FE%[email protected],
« Jean-Claude Arbaut » said:
There's great confusion between bit pattern and actual value here.
First, 252 = -4 + 256, needed to get a valid unsigned char.
Second, not all machines are two's complement, though many are
nowadays.

As Eric said, there are computational shortcuts, and you
should note they depend on the machine (hence, on the
implementation of the standard). It's pure coincidence
if a signed char holding "s" and an unsigned char holding
"256+s" actually have the same representation.

Just for reference:

ISO/IEC 9899:1999, section 6.2.6.2#2, p. 39
 
Me

Can anybody please explain this:
When a value with integer type is converted to another integer type
other than _Bool,
if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.

It's basically describing a mod operation. Let's say you want to convert
an arbitrary signed integer to an unsigned int; a table of those values
looks like:

....
UINT_MAX+2    ->  1
UINT_MAX+1    ->  0
UINT_MAX      ->  UINT_MAX
UINT_MAX-1    ->  UINT_MAX-1
UINT_MAX-2    ->  UINT_MAX-2
....
2             ->  2
1             ->  1
0             ->  0
-1            ->  UINT_MAX
-2            ->  UINT_MAX-1
....
-UINT_MAX+1   ->  2
-UINT_MAX     ->  1
-UINT_MAX-1   ->  0
-UINT_MAX-2   ->  UINT_MAX
-UINT_MAX-3   ->  UINT_MAX-1
....

Where the left side is the signed integer value and the right side is
the resulting value when converted to unsigned int. I'm sure you can
figure it out from there.
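
A minimal sketch that prints a few rows of this table (assuming long
long is wide enough to hold the sample values):

#include <stdio.h>

int main(void)
{
    long long samples[] = { 2, 1, 0, -1, -2 };
    int i;

    for (i = 0; i < 5; i++) {
        /* 6.3.1.3 conversion: reduce modulo UINT_MAX+1 */
        unsigned int u = (unsigned int)samples[i];
        printf("%lld -> %u\n", samples[i], u);
    }
    return 0;
}

On a machine with 32-bit unsigned int, the last two lines print
4294967295 and 4294967294, i.e. UINT_MAX and UINT_MAX-1.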
 
Mac

Eric said:
Can anybody please explain this:

[ N869 6.3.1.3 ]

When a value with integer type is converted to another integer type
other than _Bool,
if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.

unsigned short us; /* assume USHRT_MAX == 65535 */
us = -1000000;

The variable is unsigned, and not capable of expressing a
negative value. The assignment must set it to a non-negative
value, and 6.3.1.3 describes how that value is computed:

-1000000
+65536 (USHRT_MAX+1)
========
-934464
+65536
========
-868928
:
:
========
-16960
+65536
========
48576 (final answer, Regis)

Various computational shortcuts are available; all those
additions need not actually be performed to get to the answer.

Consider,
signed char sc = -4; /* binary = 11111100 */
unsigned char uc = sc;

Now, if I print the value of sc it is 252 (binary 11111100).

So, as you can see, no conversion has been done. The bit values in
"uc" are exactly the same as in "sc".

Then, what's the need of performing so many additions/subtractions?

junky,

Eric is pretty sharp. He dumbed down his answer a bit because he was
afraid of confusing you. Your first post made it look like you were prone
to confusion. Your second post has not changed that appearance.

But now you are coming back like a smart aleck. ;-)

Even so, nothing in Eric's post is incorrect, as far as I can see, and
nothing in your followup contradicts anything in Eric's post.

He specifically said that the additions and/or subtractions need not
actually be performed to get the answer.

In the case of typical architectures, converting from signed to unsigned
of the same size may well be a no-op. The conversion really just means
that the compiler will change how it thinks of the bit pattern, not the
bit pattern itself. As it turns out, this behavior satisfies the
mathematical rules laid out in the standard. This is probably not a
coincidence. I believe the intent of the rule was to force any non two's
complement architectures to emulate two's complement behavior. This is
convenient for programmers.
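
A minimal sketch of that observation; on a two's complement machine the
converted object typically ends up with the identical bit pattern (this
illustrates common practice, not a guarantee of the standard):

#include <stdio.h>
#include <string.h>

int main(void)
{
    int s = -1;
    unsigned int u = s; /* value becomes UINT_MAX */

    /* Compare the two representations byte for byte. */
    printf("same bit pattern: %s\n",
           memcmp(&s, &u, sizeof s) == 0 ? "yes" : "no");
    return 0;
}

On a two's complement machine this prints "yes", which is why the
conversion can compile to no code at all.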

Even in the cases where a conversion from signed to unsigned involves
types of different sizes, typical architectures will have minimal work to
do to perform the conversion as specified in the standard. Non-typical
architectures, if there really are any, might have to do some arithmetic.

--Mac
 
Jean-Claude Arbaut

On 17/06/2005 18:20, in (e-mail address removed), « Mac » wrote:
On Fri, 17 Jun 2005 06:35:09 -0700, junky_fellow wrote:

In the case of typical architectures, converting from signed to unsigned
of the same size may well be a no-op. The conversion really just means
that the compiler will change how it thinks of the bit pattern, not the
bit pattern itself.

Only on two's complement architectures; the standard envisages three
possibilities.

On other machines, a negative signed char and its unsigned char counterpart
cannot have the same bit pattern.
As it turns out, this behavior satisfies the
mathematical rules laid out in the standard. This is probably not a
coincidence. I believe the intent of the rule was to force any non two's
complement architectures to emulate two's complement behavior.

I don't see why. The additions required by the standard are on mathematical
values, not on registers. Sections 6.2.6.2 and 6.3.1.3 do not rely
particularly on (or emulate) two's complement behaviour.


This is
convenient for programmers.

Even in the cases where a conversion from signed to unsigned involves
types of different sizes, typical architectures will have minimal work to
do to perform the conversion as specified in the standard. Non-typical
architectures, if there really are any, might have to do some arithmetic.

At least there were. The IBM 704 had 36-bit words, with 1 sign bit and
35 bits of magnitude. There may be more modern machines with the same
kind of arithmetic; I just looked for one :) The reference is "IBM 704,
Manual of Operation, 1955" at www.bitsavers.org. Of course it has nothing
to do with the C language; it's just an example of a different machine.
If anybody knows of modern ones, I'm of course interested.
 
Lawrence Kirby

On Fri, 17 Jun 2005 06:35:09 -0700, junky_fellow wrote:

....
Consider,
signed char sc = -4; /* binary = 11111100 */

That is the representation on your implementation; it may be something
else on another one.
unsigned char uc = sc;

Now, if I print the value of sc it is 252 (binary 11111100).

So, as you can see, no conversion has been done.

Yes it has. You had a value of -4, now you have a value of 252. A very
real conversion has happened that has produced a different value.
The bit values in
"uc" are exactly the same as in "sc".

That's a happy coincidence; well, not entirely a coincidence. The
conversion rules are designed to be efficiently implementable on common
architectures as well as being useful.

The underlying representation is not important; the result of the
conversion is defined on VALUES. You started with a value and, following
the conversion rules, added (UCHAR_MAX+1), in this case 256, to produce
a result of 252. You didn't need to know anything about the underlying
representation to determine that. On a 1's complement implementation -4
would be represented in 8 bits as 11111011, but uc = sc would still
produce the result 252 because the conversion is defined in terms of
value. On such systems the implementation must change the representation
to produce the correct result. That's the price you pay for portability
and consistent results.
Then, what's the need of performing so many additions/subtractions?

The additions/subtractions are just a means in the standard to specify
what the correct result should be. In practice a compiler would not
perform lots of additions or subtractions to actually calculate the result.

As you've noted, in a lot of cases it doesn't have to do anything at all
except reinterpret a bit pattern according to a new type.
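
A minimal sketch of the value-based result (assuming UCHAR_MAX == 255):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    signed char sc = -4;
    unsigned char uc = sc; /* defined on values: -4 + (UCHAR_MAX + 1) */

    printf("uc = %d, expected = %d\n", uc, -4 + (UCHAR_MAX + 1));
    return 0;
}

This prints 252 twice on any conforming implementation with
UCHAR_MAX == 255, whatever the representation of sc.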

Lawrence
 
Mac

On 17/06/2005 18:20, in (e-mail address removed), « Mac » wrote:


Only on two's complement architectures; the standard envisages three
possibilities.

Right. As far as I'm concerned, all typical architectures use two's
complement representations. Also note I say "may well be," not "is" or
"must be."

[snip]
I don't see why. The additions required by the standard are on mathematical
values, not on registers. Sections 6.2.6.2 and 6.3.1.3 do not rely
particularly on (or emulate) two's complement behaviour.

I'm only talking about the conversion from signed to unsigned here. The
rule doesn't explicitly say that the result must be the same as if two's
complement representation is used, but that is the result. Why would this
be a coincidence?

Probably the folks writing the standard did not want to leave
signed-to-unsigned conversions implementation-defined, so they specified
the behavior to be the most natural thing for two's-complement machines.
This is just a guess on my part.

I did not mean to make any claims regarding any other arithmetic
issues.
At least there were. The IBM 704 had 36-bit words, with 1 sign bit and
35 bits of magnitude. There may be more modern machines with the same
kind of arithmetic; I just looked for one :) The reference is "IBM 704,
Manual of Operation, 1955" at www.bitsavers.org. Of course it has nothing
to do with the C language; it's just an example of a different machine.
If anybody knows of modern ones, I'm of course interested.

On a system which uses sign-magnitude representation, aren't all positive
integers represented the same way, regardless of whether the type is
signed or unsigned? Or is the sign convention that 1 is positive?

Anyway, I know there are lots of architectures out there, but I hesitate
to call most of them typical. And non two's-complement machines seem to be
getting rarer with every passing decade. Note that I am not advocating
ignoring the standard, or writing code which has undefined behavior.

--Mac
 
Jean-Claude Arbaut

On 17/06/2005 23:03, in (e-mail address removed), « Mac » wrote:
Only on two's complement architectures; the standard envisages three
possibilities.

Right. As far as I'm concerned, all typical architectures use two's
complement representations. Also note I say "may well be," not "is" or
"must be."

[snip]
I don't see why. The additions required by the standard are on mathematical
values, not on registers. Sections 6.2.6.2 and 6.3.1.3 do not rely
particularly on (or emulate) two's complement behaviour.

I'm only talking about the conversion from signed to unsigned here. The
rule doesn't explicitly say that the result must be the same as if two's
complement representation is used, but that is the result. Why would this
be a coincidence?

Probably the folks writing the standard did not want to leave
signed-to-unsigned conversions implementation-defined, so they specified
the behavior to be the most natural thing for two's-complement machines.
This is just a guess on my part.

I didn't understand it that way the first time. Your guess is quite
reasonable.
I did not mean to make any claims regarding any other arithmetic
issues.

That's what made me understand :)
On a system which uses sign-magnitude representation, aren't all positive
integers represented the same way, regardless of whether the type is
signed or unsigned? Or is the sign convention that 1 is positive?

Yes, but we were interested in the conversion -4 -> 252, so in *negative*
signed chars. In case you convert a positive signed char to an unsigned
char, section 6.3.1.3#1 says the value shall not change if it is
representable. Since signed/unsigned types have the same size by 6.2.5#6,
I assume a positive signed char is always representable as an unsigned
char. Hence the *value* won't change. Now for the representation:
section 6.2.6.2#2 says there is sign correction only when the sign bit
is one; this means a positive signed char always has sign bit 0,
hence there is *nothing* to do during conversion. I hope I got my
interpretation of the standard right. Otherwise, a guru will soon yell
at me, _again_ ;-)
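
A tiny illustration of the positive case (nothing here depends on the
representation):

#include <stdio.h>

int main(void)
{
    signed char sc = 100;
    unsigned char uc = sc; /* 100 is representable, so the value is unchanged */

    printf("sc = %d, uc = %u\n", sc, (unsigned)uc);
    return 0;
}

This prints 100 for both on any conforming implementation.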

Anyway, I know there are lots of architectures out there, but I hesitate
to call most of them typical.

Well, I do too, but when one says so, one is often accused of thinking
there are only Wintels in the world :)
And non two's-complement machines seem to be
getting rarer with every passing decade. Note that I am not advocating
ignoring the standard, or writing code which has undefined behavior.

You say that to ME !!! Well, if you read my recent posts, you'll
see I am not advocating enforcing the standard too strongly ;-)
I am merely discovering the standard, and I must admit it's a shame
I have programmed in C for years without knowing a line of it.
It's arid at first glance, but it deserves deeper reading.
Wow, I said *that* ? ;-)
 
pete

Consider,
signed char sc = -4; /* binary = 11111100 */
unsigned char uc = sc;

Now, if I print the value of sc it is 252 (binary 11111100).

I think you mean uc.
So, as you can see, no conversion has been done.
The bit values in "uc" are exactly the same as in "sc".

That doesn't matter.
252 isn't -4.
A conversion has been done.
Then, what's the need of performing so many additions/subtractions?

-4 could also be either 11111011 (1's complement) or 10000100
(sign-magnitude)

The additions/subtractions are a procedure that produces
the correct result regardless of representation.

When the representation is known, easier ways can be used,
like interpreting
((unsigned char)sc)
as
(*(unsigned char *)&sc),
as your two's complement system may.
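
A minimal sketch contrasting the two (the second form inspects raw
bits, so its output depends on the representation):

#include <stdio.h>

int main(void)
{
    signed char sc = -4;
    unsigned char by_value = (unsigned char)sc;     /* always 252 if UCHAR_MAX == 255 */
    unsigned char by_bits  = *(unsigned char *)&sc; /* whatever bits sc happens to hold */

    printf("by_value = %u, by_bits = %u\n",
           (unsigned)by_value, (unsigned)by_bits);
    return 0;
}

On a two's complement machine both print 252; on a 1's complement
machine by_bits would print 251 while by_value would still print 252.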
 
junky_fellow

Mac said:
On 17/06/2005 18:20, in (e-mail address removed), « Mac » wrote:


Only on two's complement architectures; the standard envisages three
possibilities.

Right. As far as I'm concerned, all typical architectures use two's
complement representations. Also note I say "may well be," not "is" or
"must be."

[snip]
I don't see why. The additions required by the standard are on mathematical
values, not on registers. Sections 6.2.6.2 and 6.3.1.3 do not rely
particularly on (or emulate) two's complement behaviour.

I'm only talking about the conversion from signed to unsigned here. The
rule doesn't explicitly say that the result must be the same as if two's
complement representation is used, but that is the result. Why would this
be a coincidence?

Probably the folks writing the standard did not want to leave
signed-to-unsigned conversions implementation-defined, so they specified
the behavior to be the most natural thing for two's-complement machines.
This is just a guess on my part.

I did not mean to make any claims regarding any other arithmetic
issues.
At least there were. The IBM 704 had 36-bit words, with 1 sign bit and
35 bits of magnitude. There may be more modern machines with the same
kind of arithmetic; I just looked for one :) The reference is "IBM 704,
Manual of Operation, 1955" at www.bitsavers.org. Of course it has nothing
to do with the C language; it's just an example of a different machine.
If anybody knows of modern ones, I'm of course interested.

On a system which uses sign-magnitude representation, aren't all positive
integers represented the same way, regardless of whether the type is
signed or unsigned? Or is the sign convention that 1 is positive?

Anyway, I know there are lots of architectures out there, but I hesitate
to call most of them typical. And non two's-complement machines seem to be
getting rarer with every passing decade. Note that I am not advocating
ignoring the standard, or writing code which has undefined behavior.

--Mac

OK. I got it. I had this doubt because I thought that two's complement
is the only way to represent a negative integer. That is why I was
wondering why we need so many operations to convert a signed int to an
unsigned int.

But, in practice, is there any significance to converting a signed int
to an unsigned int? Do we ever do this in the real world?
If we don't, and it doesn't have any practical significance, then why
not give an error at compile time?

Similarly, is there any rule for converting an unsigned char to a signed
char? For example, how will unsigned char 0xFF be converted to signed
char? And is there any significance to this?
 
Clark S. Cox III

OK. I got it. I had this doubt because I thought that two's complement
is the only way to represent a negative integer. That is why I was
wondering why we need so many operations to convert a signed int to an
unsigned int.

We don't actually *need* all of the operations, as long as we get the
same result as we would have had we performed them all.
But, in practice, is there any significance to converting a signed int
to an unsigned int? Do we ever do this in the real world?

Sure:

unsigned int u;
u = 1; //Converts the signed int (1) to an unsigned int

If we don't, and it doesn't have any practical significance, then why
not give an error at compile time?

Similarly, is there any rule for converting an unsigned char to a signed
char?

No, that is implementation-defined. From the standard:

"Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised."

For example: how will unsigned char 0xFF be converted to signed char?
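
(With 0xFF and SCHAR_MAX == 127, the value is not representable, so the
result is implementation-defined.) A portable approach is to check the
range first; a minimal sketch (to_schar is a hypothetical helper, and
-1 is an arbitrary sentinel):

#include <limits.h>

/* Convert only when the value is representable; otherwise the result
   would be implementation-defined (or an implementation-defined signal
   would be raised), per 6.3.1.3. */
signed char to_schar(unsigned char u)
{
    if (u <= SCHAR_MAX)
        return (signed char)u; /* value preserved */
    return -1;                 /* arbitrary sentinel: "not representable" */
}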
 
CBFalconer

.... snip ...

But, in practice, is there any significance to converting a signed
int to an unsigned int? Do we ever do this in the real world? If we
don't, and it doesn't have any practical significance, then why
not give an error at compile time?

Similarly, is there any rule for converting an unsigned char to a
signed char? For example, how will unsigned char 0xFF be converted
to signed char? And is there any significance to this?

If the unsigned entity's value doesn't fit in the signed type, the
result is implementation-defined (or an implementation-defined signal
is raised). So conversion in that direction is not portable.

Please fix your line length. I had to reformat your article.
 
