Three questions about signed/unsigned type representations

Rade

Following a discussion on another thread here... I have tried to understand
what is actually standardized in C++ regarding the representation of integers
(signed and unsigned) and their conversions. The relevant sections should be
3.9.1 (Fundamental types) and 4.7 (Integral conversions).

It seems to me that the Standard doesn't specify:

1) The "value representation" of any of these types, except that (3.9.1/3)
"... The range of nonnegative values of a signed integer type is a subrange
of the corresponding unsigned integer type, and the value representation of
each corresponding signed/unsigned type shall be the same."

2) The conversion from an unsigned type to a signed type when the signed
type can't represent the value.

I have three questions:

1) The cited sentence from 3.9.1/3 is not clear to me. How, for example, can
the value representation for int and unsigned int be the same when they have
different sets of values? Shouldn't this sentence be restricted just to
representations of those values that are common to both types (i.e.
nonnegative ints)?

2) Am I missing something, or is there really no word in the Standard about
the "value representations" of unsigned types? 3.9.1/4 obviously describes
just the arithmetic laws for unsigned types, but says nothing about their
"value representations". Imagine we use 2 bits to represent unsigned int,
with the set of values 0-3 and the set of bit patterns 00, 01, 10, 11. One
would expect that the following condition is always true:

n == (n & 1) + (n & (1 << 1));

(deliberately mixing arithmetic and bitwise operators). However, what if the
value 1 is represented as 11 and 2 is represented as 00 (does anything in
the Standard preclude such a representation?)? Then 1 << 1 is 2 (i.e. 00),
the first operand of + is equal to n, the second is equal to 00, i.e. 2, and
the result on the right side is actually n+2 (mod 4), not n.

(see also 5.8: operators << and >> are actually arithmetic operators, not
bitwise operators, as they are defined on values, not on representations.
This means that even replacing + with | wouldn't help in some cases).

If this is possible (at least on some weird but conforming implementation of
the Standard), then one could hardly find a portable nontrivial C++ program
in the world.
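
To make the concern concrete, here is the identity in runnable form,
generalized over the low bits (the loop bounds are arbitrary); whether the
assertion must hold on every conforming implementation is exactly what I
am asking:

#include <cassert>

int main()
{
    for (unsigned n = 0; n < 256; ++n) {
        unsigned sum = 0;
        for (unsigned bit = 0; bit < 8; ++bit)
            sum += n & (1u << bit);   // isolate one bit, then add
        assert(n == sum);             // the identity, generalized
    }
    return 0;
}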

3) I haven't seen any additional rules for conversions from unsigned to
signed types in the case where the signed type has the same (or greater -
can that happen with a conforming implementation?) number of bits as the
unsigned type, but still can't contain the value from the unsigned type.
For example, conversions from unsigned int to int when the unsigned int
value is greater than INT_MAX... (the opposite conversion is well-defined
by 4.7/2). If this is implementation-defined, that would mean, for example:

// -1 is int and is converted to 2^n-1 by 4.7/2
unsigned minusOneUnsigned = -1;

// implementation-defined, may be -0 on 1's complement machine
int minusOneInt = minusOneUnsigned;

// modulo 2^n conversion again, but of which value?
unsigned minusOneUnsignedAgain = minusOneInt;

- we can't tell what the value of minusOneUnsignedAgain will be after the
double conversion.

My question is: does the C++ standard specify anything in this case? I can
understand that our minusOneUnsignedAgain can be 0, after all, instead of
2^n-1, but this seems counterintuitive to me.

Regards,
Rade
 
Andrew Koenig

1) The cited sentence from 3.9.1/3 is not clear to me. How, for example,
the value representation for int and unsigned int can be the same as they
have different sets of values? Shouldn't this sentence be restricted just
to representations of these values that are common to both types (i.e.
nonnegative ints) ?

It implicitly is. How could it be saying anything about the representations
of different values?

2) Am I missing something, or is there really no word in the Standard about
the "value representations" of unsigned types? 3.9.1/4 obviously describes
just the arithmetic laws for unsigned types, but says nothing about their
"value representations". Imagine we use 2 bits to represent unsigned int,
with the set of values 0-3 and the set of bit patterns 00, 01, 10, 11. One
would expect that the following condition is always true:

n == (n & 1) + (n & (1 << 1));

(deliberately mixing arithmetic and bitwise operators). However, what if
the value 1 is represented as 11 and 2 is represented as 00 (does anything
in the Standard preclude such a representation?)? Then 1 << 1 is 2 (i.e.
00), the first operand of + is equal to n, the second is equal to 00, i.e.
2, and the result on the right side is actually n+2 (mod 4), not n.

The bitwise operators operate on the arithmetic values of their operands,
irrespective of their representations. 3|5 is 7 regardless of how the values
are represented internally.
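
A minimal illustration (the assertions are mine, not quotations from the
standard):

#include <cassert>

int main()
{
    assert((3 | 5) == 7);    // 011 | 101 == 111, stated purely in values
    assert((6 & 3) == 2);    // 110 & 011 == 010
    assert((1 << 1) == 2);   // 5.8: a left shift by 1 is a multiplication by 2
    return 0;
}
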
(see also 5.8: operators << and >> are actually arithmetic operators, not
bitwise operators, as they are defined on values, not on representations.
This means that even replacing + with | wouldn't help in some cases).

If this is possible (at least on some weird but conforming implementation
of the Standard), then one could hardly find a portable nontrivial C++
program in the world.

Why?

3) I haven't seen any additional rules for conversions from unsigned to
signed types in the case where the signed type has the same (or greater -
can that happen with a conforming implementation?) number of bits as the
unsigned type, but still can't contain the value from the unsigned type.
For example, conversions from unsigned int to int when the unsigned int
value is greater than INT_MAX... (the opposite conversion is well-defined
by 4.7/2). If this is implementation-defined, that would mean, for example:

// -1 is int and is converted to 2^n-1 by 4.7/2
unsigned minusOneUnsigned = -1;

Yes.

// implementation-defined, may be -0 on 1's complement machine
int minusOneInt = minusOneUnsigned;

If the range of integers includes 2^n-1 (which it probably doesn't), then
minusOneInt will be 2^n-1. Otherwise it's undefined.

// modulo 2^n conversion again, but of which value?
unsigned minusOneUnsignedAgain = minusOneInt;

Of whatever value minusOneInt has.

- we can't tell what the value of minusOneUnsignedAgain will be after the
double conversion.

Right.

My question is: does the C++ standard specify anything in this case? I can
understand that our minusOneUnsignedAgain can be 0, after all, instead of
2^n-1, but this seems counterintuitive to me.

It can probably be anything. In particular, implementations are allowed to
terminate a program that tries to do an unsigned->signed conversion to a
value that won't fit.
 
Rade

Thank you very much for your answers; points (1) and (3) are now
completely clear to me. I am sorry to say your answer to (2) has actually
raised more questions than I had before:

Andrew Koenig said:
The bitwise operators operate on the arithmetic values of their operands,
irrespective of their representations. 3|5 is 7 regardless of how the
values are represented internally.

I see... That was actually the source of my confusion for the question (2).

But... you can't actually perform bitwise operations on the values; you
must imply some representation. In your example, 3 is implicitly
represented as 011 and 5 as 101, and you have 111, which is in turn a
representation of 7.

If I understand you properly, the C++ Standard doesn't specify the
*internal* representation of values, but it specifies *this* (let me call
it "implicit" - sorry for introducing nonstandard terms) representation
that is necessary to perform bitwise operations. If this is so, can you
tell me which part of the Standard specifies this "implicit"
representation?

Also, is there an "implicit" representation specified for negative numbers
(as you can perform bitwise operations on signed numbers as well), or is it
"implementation dependent" (I would expect the latter)? If the latter is
the case, is the implementation allowed to terminate the program on
computing the result of a bitwise operation in some cases?

Particularly, will an implementation that performs bitwise operations on,
say, ints by first converting them (by 4.7/2) to unsigned ints, then
performing the same operations on the unsigned ints, and then converting
the result back to int (with a possibility of overflow which results in
terminating the program) conform to the C++ Standard?
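
For concreteness, I imagine such an implementation behaving as if the
operation had been written like this sketch (the helper name and the
throw are mine, purely for illustration):

#include <climits>
#include <stdexcept>

// Hypothetical: AND two ints by way of unsigned arithmetic.
int and_via_unsigned(int a, int b)
{
    unsigned ua = static_cast<unsigned>(a);  // modulo 2^n, well-defined (4.7/2)
    unsigned ub = static_cast<unsigned>(b);
    unsigned ur = ua & ub;
    if (ur > static_cast<unsigned>(INT_MAX))
        throw std::range_error("result not representable as int");
    return static_cast<int>(ur);             // in range, value preserved
}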

Regards,
Rade
 
Jack Klein

Thank you very much for your answers; points (1) and (3) are now
completely clear to me. I am sorry to say your answer to (2) has actually
raised more questions than I had before:

I see... That was actually the source of my confusion for the question (2).

But... you can't actually perform bitwise operations on the values; you
must imply some representation. In your example, 3 is implicitly
represented as 011 and 5 as 101, and you have 111, which is in turn a
representation of 7.

If I understand you properly, the C++ Standard doesn't specify the
*internal* representation of values, but it specifies *this* (let me call
it "implicit" - sorry for introducing nonstandard terms) representation
that is necessary to perform bitwise operations. If this is so, can you
tell me which part of the Standard specifies this "implicit"
representation?

Your statement above is not entirely true. The C++ standard does not
specify the internal representation of many types of objects, but it
does specifically impose some requirements on the representations of
all the integer types, signed and unsigned. Specifically, 3.9.1#7
says:

"Types bool, char, wchar_t, and the signed and unsigned integer types
are collectively called integral types.43) A synonym for integral type
is integer type. The representations of integral types shall define
values by use of a pure binary numeration system.44) [Example: this
International Standard permits 2’s complement, 1’s complement and
signed magnitude representations for integral types.]"

And the explanatory text in footnote 44:

"44) A positional representation for integers that uses the binary
digits 0 and 1, in which the values represented by successive bits are
additive, begin with 1, and are multiplied by successive integral
power of 2, except perhaps for the bit with the highest position.
(Adapted from the American National Dictionary for Information
Processing Systems.)"

The terms "2's complement", "1's complement", and "signed magnitude"
are not themselves defined in the standard. They are better defined
in the 1999 and later versions of the C standard, but C++ is based on
the 1995 C standard. Perhaps they are assumed to be either common
knowledge among programmers, or easily referenced from the source
cited in the footnote or others.

So all integer types must appear to a C++ program to behave this way,
even if the underlying hardware is somewhat different.
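
Footnote 44 can be restated directly in code: the value of an unsigned
object is the weighted sum of its value bits. A sketch of that reading
(the function is mine, for illustration only):

#include <cassert>

// Value of the low 'width' bits of n under a pure binary reading:
// successive bits are additive, weighted by successive powers of 2.
unsigned pure_binary_value(unsigned n, unsigned width)
{
    unsigned value = 0;
    unsigned weight = 1;                     // 2^0 for the lowest bit
    for (unsigned i = 0; i < width; ++i) {
        if (n & (1u << i))
            value += weight;                 // additive contribution
        weight *= 2;                         // next power of 2
    }
    return value;
}

int main()
{
    assert(pure_binary_value(13, 8) == 13);  // 1101 -> 8 + 4 + 1
    return 0;
}
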
Also, is there an "implicit" representation specified for negative numbers
(as you can perform bitwise operations on signed numbers as well), or is it
"implementation dependent" (I would expect the latter)? If the latter is
the case, is the implementation allowed to terminate the program on
computing the result of a bitwise operation in some cases?

The 1999 C standard (and later versions with TC1 and TC2 added, so I
suppose it is now actually C 2004) does a much better job of defining
some of these things than either C++ 98 or versions of the C standard
prior to 1999.

Both C and C++ allow for the fact that certain bit patterns in an
object might not represent a valid value for that type. C, from 1999
on, uses the term "trap representation" for such, although C++ has no
specific term.

Uninitialized values might be trap representations, as might be
objects stored via one lvalue (object type) and then accessed as
another type, unless the accessing type is unsigned char. This
applies to misuse of unions and to type punning. Any access through
an lvalue of a type for which the object does not hold a valid
value causes undefined behavior.
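
The unsigned char exception means the bytes of any object can always be
examined safely; a sketch (the output format is mine):

#include <cstdio>

int main()
{
    int x = -1;
    // Reading any object as an array of unsigned char is always valid:
    // unsigned char has no trap representations.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&x);
    for (unsigned i = 0; i < sizeof x; ++i)
        std::printf("%02x ", p[i]);          // object representation, byte by byte
    std::printf("\n");
    return 0;
}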

You could say that this "allows" the implementation to terminate the
program, in the sense that the C++ standard, like the C standard,
places no requirements at all on a program once it generates undefined
behavior.

There are, or more likely were, platforms where bit-wise operations on
signed integer types might produce an invalid representation, thus
causing undefined behavior.

Particularly, will an implementation that performs bitwise operations on,
say, ints by first converting them (by 4.7/2) to unsigned ints, then
performing the same operations on the unsigned ints, and then converting
the result back to int (with a possibility of overflow which results in
terminating the program) conform to the C++ Standard?

The issue here is covered in 4.7 Integral conversions, particularly
paragraph 3. When you store the value of one integer type, signed or
unsigned, into a different signed integer type, if the actual value is
within the range of the signed type, you get that value. If the
actual value is outside the range of the destination signed type, the
result is implementation-defined. This is not at all the same as
undefined behavior, and an implementation that terminated the program
here would be non-conforming.

All other cases of out-of-range results in signed integer types, whether
by calculation overflow or underflow, or by conversion from a floating
point or pointer value, are undefined. But you seem to think that
overflow or any other instance of undefined behavior must terminate a
program, and this is incorrect. Since the C++ standard places no
requirements on a program once undefined behavior has occurred, it is
not required to terminate or have any other specific action or result.
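
The distinction shows up in a few lines (the comments mark which paragraph
of 4.7 governs each conversion; the example values are mine):

#include <cassert>
#include <climits>

int main()
{
    // int -> unsigned is fully specified: modulo 2^n (4.7/2).
    unsigned u = static_cast<unsigned>(-1);
    assert(u == UINT_MAX);                   // -1 becomes 2^n - 1, always

    // unsigned -> int is specified only when the value fits (4.7/3).
    int ok = static_cast<int>(42u);          // fits: guaranteed to be 42
    assert(ok == 42);

    // int bad = static_cast<int>(u);       // > INT_MAX: the result is
                                             // implementation-defined, not UB
    return 0;
}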
 
Arijit

Jack said:
Your statement above is not entirely true. The C++ standard does not
specify the internal representation of many types of objects, but it
does specifically impose some requirements on the representations of
all the integer types, signed and unsigned. Specifically, 3.9.1#7
says:

"Types bool, char, wchar_t, and the signed and unsigned integer types
are collectively called integral types.43) A synonym for integral type
is integer type. The representations of integral types shall define
values by use of a pure binary numeration system.44) [Example: this
International Standard permits 2’s complement, 1’s complement and
signed magnitude representations for integral types.]"

And the explanatory text in footnote 44:

"44) A positional representation for integers that uses the binary
digits 0 and 1, in which the values represented by successive bits are
additive, begin with 1, and are multiplied by successive integral
power of 2, except perhaps for the bit with the highest position.
(Adapted from the American National Dictionary for Information
Processing Systems.)"

So it's not possible to write C++ on a system using BCD arithmetic? (Not
that I have ever seen a BCD system :) But you can do BCD arithmetic with
Intel processors if you want to.)

The terms "2's complement", "1's complement", and "signed magnitude"
are not themselves defined in the standard. They are better defined
in the 1999 and later versions of the C standard, but C++ is based on
the 1995 C standard. Perhaps they are assumed to be either common
knowledge among programmers, or easily referenced from the source
cited in the footnote or others.

So all integer types must appear to a C++ program to behave this way,
even if the underlying hardware is somewhat different.

What if I do ~, |, &, ^ on the numbers? Will 1's complement, 2's
complement and signed magnitude give the same values? I believe I read
somewhere that they always act as if 2's complement is used, but I
couldn't find the exact sections in the standard.

There are, or more likely were, platforms where bit-wise operations on
signed integer types might produce an invalid representation, thus
causing undefined behavior.

On such a system, is it possible that bitwise operations on unsigned
integers will result in an invalid representation, thus causing UB?
Also, am I safe in assuming that bitwise operations are inherently unsafe?

-Arijit
 
Jack Klein

Jack said:
Your statement above is not entirely true. The C++ standard does not
specify the internal representation of many types of objects, but it
does specifically impose some requirements on the representations of
all the integer types, signed and unsigned. Specifically, 3.9.1#7
says:

"Types bool, char, wchar_t, and the signed and unsigned integer types
are collectively called integral types.43) A synonym for integral type
is integer type. The representations of integral types shall define
values by use of a pure binary numeration system.44) [Example: this
International Standard permits 2’s complement, 1’s complement and
signed magnitude representations for integral types.]"

And the explanatory text in footnote 44:

"44) A positional representation for integers that uses the binary
digits 0 and 1, in which the values represented by successive bits are
additive, begin with 1, and are multiplied by successive integral
power of 2, except perhaps for the bit with the highest position.
(Adapted from the American National Dictionary for Information
Processing Systems.)"

So it's not possible to write C++ on a system using BCD arithmetic? (Not
that I have ever seen a BCD system :) But you can do BCD arithmetic with
Intel processors if you want to.)

One could provide a conforming implementation for such a system if
desired, provided that at least one of its BCD types held a range of
+/- 2^31. It would be quite inefficient when shifts or bit operations
were used, because the implementation would need to produce the same
result that a true binary representation would.

What if I do ~, |, &, ^ on the numbers? Will 1's complement, 2's
complement and signed magnitude give the same values? I believe I read
somewhere that they always act as if 2's complement is used, but I
couldn't find the exact sections in the standard.

Absolutely not true that they always act as if 2's complement is used.
The bits change according to pure binary rules, but the new bit
pattern produced has its value interpreted according to the
representation.
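
Concretely, for a 4-bit pattern (a sketch worked out by hand from the
three definitions; real ints are wider, but the arithmetic is the same):

#include <cstdio>

// Interpret the low 4 bits of 'bits' under each permitted representation.
int twos_complement(unsigned bits) { return (bits & 8) ? (int)(bits & 7) - 8 : (int)(bits & 7); }
int ones_complement(unsigned bits) { return (bits & 8) ? -(int)(~bits & 7) : (int)(bits & 7); }
int sign_magnitude(unsigned bits)  { return (bits & 8) ? -(int)(bits & 7) : (int)(bits & 7); }

int main()
{
    // The same bit pattern, 1111, names a different value on each system:
    std::printf("2's complement: %d\n", twos_complement(0xF)); // -1
    std::printf("1's complement: %d\n", ones_complement(0xF)); // -0, i.e. 0
    std::printf("sign-magnitude: %d\n", sign_magnitude(0xF));  // -7
    return 0;
}
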
On such a system, is it possible that bitwise operations on unsigned
integers will result in an invalid representation, thus causing UB?
Also, am I safe in assuming that bitwise operations are inherently unsafe?

The unsigned integer types are guaranteed safe under arithmetic, bitwise,
and shift operations, with the possible exception of the quotient of a
division being out of range, which is undefined behavior and which some
architectures trap in hardware.

All unsigned integer types contain only value bits, and no sign bit.
There are no invalid combinations of value bits in any unsigned
integer type. There are no invalid combinations of value bits in
signed integer types, so long as the value is positive (the sign bit
is not turned on).
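
A small demonstration of that guarantee (3.9.1/4 makes unsigned arithmetic
modular, so none of these lines can trap or overflow):

#include <cassert>
#include <climits>

int main()
{
    unsigned u = UINT_MAX;        // every bit is a value bit
    assert(u + 1u == 0u);         // wraps modulo 2^n, well-defined
    assert(0u - 1u == UINT_MAX);  // wraps the other way
    assert(~0u == UINT_MAX);      // complement of all-zeros is all-ones
    return 0;
}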
 
Rade

Jack, thank you for the clarification.

Your statement above is not entirely true. The C++ standard does not
specify the internal representation of many types of objects, but it
does specifically impose some requirements on the representations of
all the integer types, signed and unsigned. Specifically, 3.9.1#7
says:

It does, indeed, except for the highest bit. However, I guess that the
exception is valid only for signed types, i.e. for the unsigned types the
highest bit must bear the value 2^(n-1). This is somewhat imprecise.

But you seem to think that
overflow or any other instance of undefined behavior must terminate a
program, and this is incorrect.

Not at all. In fact, I would appreciate it if there were a compiler switch
that made my favorite compiler create programs that could detect overflows
(and warn somehow, at least in debug builds). As it is, my compiled programs
happily and silently perform all signed integer computations without any
overflow checks.

Regards,
Rade
 
Jack Klein

Jack, thank you for the clarification.

It does, indeed, except for the highest bit. However, I guess that the
exception is valid only for signed types, i.e. for the unsigned types the
highest bit must bear the value 2^(n-1). This is somewhat imprecise.

There is nothing at all imprecise about it. It is exactly specified.
How can anyone who understands binary values possibly be confused?

Not at all. In fact, I would appreciate it if there were a compiler switch
that made my favorite compiler create programs that could detect overflows
(and warn somehow, at least in debug builds). As it is, my compiled programs
happily and silently perform all signed integer computations without any
overflow checks.

It is extremely unlikely that a compiler will ever offer that option.
There are analysis tools, most of them quite expensive, that will
attempt to detect this sort of problem.

But you will never see it as part of the language. C++ inherits from
C the principle that you don't pay for what you don't use. If there
are places in your code where unexpected values might cause an
out-of-range problem, write the code to check them yourself. There,
and only there, does the program incur the overhead.
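
Where such a check matters, a pre-test along these lines (a sketch; the
helper name is mine) detects the overflow without ever evaluating the
overflowing expression:

#include <climits>

// Hypothetical helper: add two ints only if the mathematical result is
// representable. The test itself uses no operation that can overflow.
bool checked_add(int a, int b, int& out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;             // would overflow: report, don't compute
    out = a + b;                  // in range by the test above
    return true;
}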
 
Rade

There is nothing at all imprecise about it. It is exactly specified.
How can anyone who understands binary values possibly be confused?

I am sorry for being so formal. The Standard is obviously written by
engineers, not by lawyers, and I am satisfied even if some definition is
taken from common knowledge.

It is extremely unlikely that a compiler will ever offer that option.
There are analysis tools, most of them quite expensive, that will
attempt to detect this sort of problem.

This is slightly off topic, but just to mention: the compiler I was
talking about (VC++ 7.1) actually has a switch (/RTCc) that, if turned on,
checks conversions from a longer to a shorter type at runtime. That is not
the same as what I was asking for (and it works in a rather strange way
that I don't like), but it illustrates a similar idea.

But you will never see it as part of the language. C++ inherits from
C the principle that you don't pay for what you don't use. If there
are places in your code where unexpected values might cause an
out-of-range problem, write the code to check them yourself. There,
and only there, does the program incur the overhead.

I respect that principle. But I disagree that the sole responsibility for
checks should rest with the programmer. (Again OT...) I think that the
compiler is the right place for such an option, because only the machine
code it generates can benefit from the overflow flags and other similar
tools provided by the target processor. Any check at the language level is
rather cumbersome. Of course, I would like to be able to switch this
option off for the code I want to ship, and I would appreciate control at
a finer grain than the level of compilation units (actually I wouldn't
like a compiler option, I would like a #pragma, or even better - both!).

Rade
 
