Style question: Always use signed integers, or not?

Juha Nieminen

I was once taught that if some integral value can never have negative
values, it's a good style to use an 'unsigned' type for that: It's
informative, self-documenting, and you are not wasting half of the value
range for values which you will never be using.

I agreed with this, and started to always use 'unsigned' whenever
negative values wouldn't make any sense. I did this for years.

However, I slowly changed my mind: Doing this often causes more
problems than it's worth. A good example:

If you have an image class, its width and height properties can
*never* get negative values. They will always be positive. So it makes
sense to use 'unsigned' for the width and height, doesn't it? What
problems could that ever create?

Well, assume that you are drawing images on screen, by pixel
coordinates, and these images can be partially (or completely) outside
the screen. For example, the left edge coordinate of the image to be
drawn might have a negative x value (for example drawing a 100x100 image
at coordinates <-20, 50>). Since the coordinates are signed and the
dimensions of the image are unsigned, this may cause signed-unsigned
mixup. For example this:

if(x - width/2 < 1) ...

where 'x' is a signed integer, gives *different* results depending on
whether 'width' is signed or unsigned, with certain values of those
variables (for example x=2 and width=10). The intention is to treat
'width' here as a signed value, but if it isn't, the comparison will
malfunction unless 'width' is explicitly cast to a signed value. This
may well go completely unnoticed, because compilers don't necessarily
give any warning (gcc, for example, doesn't by default).
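
A minimal sketch of the mixup, assuming 32-bit unsigned int (the
variable names are only illustrative):

#include <iostream>

int main()
{
    int x = 2;
    int swidth = 10;        // signed width
    unsigned uwidth = 10;   // unsigned width

    // Signed: 2 - 10/2 == -3, and -3 < 1 is true.
    std::cout << (x - swidth/2 < 1) << '\n';   // prints 1

    // Unsigned: x is converted to unsigned, so 2u - 5u wraps to a
    // huge value, and the comparison is false.
    std::cout << (x - uwidth/2 < 1) << '\n';   // prints 0
}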

Thus at some point I started to *always* use signed integers unless
there was a very good reason not to. (Of course this sometimes causes
small annoyances because STL containers return an unsigned value for
their size() functions, but that's usually not a big problem.)

It would be interesting to hear other opinions on this subject.
 
Erik Wikström

Juha said:
[...] For example this:

if(x - width/2 < 1) ...

where 'x' is a signed integer, gives *different* results depending on
whether 'width' is signed or unsigned, with certain values of those
variables (for example x=2 and width=10).

Wow, that was certainly an eye-opener. I had assumed that in this case
both values would be promoted to some larger type (signed long) which
could accurately represent both values (the signed and the unsigned int)
but apparently not.

In my opinion this is a defect in the standard, since the correct
action *is* taken for types smaller than int (thanks to integral
promotion):

#include <iostream>

int main()
{
    unsigned short width = 10;
    short x = 2;
    std::cout << (x - width);  // both operands promoted to int
    return 0;
}
 
Chris Gordon-Smith

Juha said:
[...]
Thus at some point I started to *always* use signed integers unless
there was a very good reason not to. [...]

It would be interesting to hear other opinions on this subject.

Yes, I have had similar experiences / thoughts.

Some while back I thought it would be a good idea to use unsigned integers.
I don't think I read anywhere that this is good practice, and as a spare
time programmer I have never been on a C++ course; it just seemed 'obvious'
that using unsigned for values that can never be negative would be safer.

So I tried using unsigned integers in my simulation project. However,
I soon started getting compiler warnings about unsigned/signed
conversions. When I changed more variables to unsigned to silence the
warnings, I just got more warnings.

I pretty soon abandoned the whole thing on the basis that it was more
trouble than it was worth (but also feeling slightly guilty).

I supposed that part of the problem might have been that I was trying to
retro-fit existing code, rather than coding from the outset to use
unsigned. However, the example given by Juha above makes me think that
this is not just a retro-fitting problem. There seem to be scenarios
where using unsigned int is positively dangerous, particularly when
the compiler doesn't generate a warning (and I've just confirmed on my
own gcc setup that such scenarios exist).

Rather than using unsigned, I use assertions liberally, and having seen
the danger of signed/unsigned conversions, I think this is the right
approach.

Chris Gordon-Smith
www.simsoup.info
 
Chris Forone

Erik said:
In my opinion this is a defect in the standard, since the correct
action *is* taken for types smaller than int:

#include <iostream>

int main()
{
    unsigned short width = 10;
    short x = 2;
    std::cout << (x - width);  // both operands promoted to int
    return 0;
}

The result is -8 (gcc 3.4.5 / Adam Riese)...
 
Darío Griffo

Juha said:
[...] For example this:

if(x - width/2 < 1) ...

where 'x' is a signed integer, gives *different* results depending on
whether 'width' is signed or unsigned, with certain values of those
variables (for example x=2 and width=10). [...]

It would be interesting to hear other opinions on this subject.

It's a good moment to re-read Stroustrup's The C++ Programming
Language: he covers the usual conversions in binary operators.
Your example: if(x - width/2 < 1) ...
with x an int and width unsigned, x - width/2 yields an unsigned
value. There are a lot more basic conversion rules, but I think this
one applies here.
I had assumed (until today) that the compiler autocasts width to
signed before making the comparison, but that is not true.


About your question: I still think that if a value can never be
negative, it should be unsigned, for the reasons you gave. The image
example is valid as far as the properties go: width and height will
always be non-negative. But calculating where to draw them is another
matter. Where you draw is a coordinate, and coordinates can certainly
be negative, so they have to be signed values. We were mixing the
concept of a coordinate with that of a width and height. Both are
related at drawing time, but they are not the same thing.

Darío
 
Tomás Ó hÉilidhe

Juha said:
If you have an image class, its width and height properties can
*never* get negative values. [...] For example, the left edge
coordinate of the image to be drawn might have a negative x value (for
example drawing a 100x100 image at coordinates <-20, 50>).


I agree that this is a case where you'd consider using signed integer
types, but I'm still a steadfast unsigned man. My major peeve with
signed integer types is their undefined behaviour upon overflow.

(Of course the less pretty solution is to use casts in your code)
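
A minimal sketch of the difference (the increment on the unsigned side
is fully defined; the commented-out signed one is not):

#include <climits>
#include <cassert>

int main()
{
    unsigned int u = UINT_MAX;
    ++u;              // well-defined: wraps around to 0
    assert(u == 0);

    int i = INT_MAX;
    // ++i;           // undefined behaviour: signed overflow
    (void)i;
}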
 
Erik Wikström

Chris said:
The result is -8 (gcc 3.4.5 / Adam Riese)...

Exactly, but if you use int instead of short you get 4294967288, because
the unsigned int is not promoted to a signed long.
 
Daniel Pitts

Paavo said:
You mean that you are fond of the following code having no undefined
behavior? Sorry, I have not encountered a need for this kind of defined
behavior!

#include <iostream>

int main() {
    unsigned int a = 2150000000u;
    unsigned int b = 2160000000u;
    unsigned int c = (a + b) / 2;  // a + b wraps modulo 2^32

    std::cout << "a=" << a << "\nb=" << b <<
        "\nand their average\nc=" << c << "\n";
}

Output (32-bit unsigned ints):

a=2150000000
b=2160000000
and their average
c=7516352

The unsigned integers in C/C++ are very specific cyclic types with
strange overflow and wrapping rules. IMHO, these should be used primarily
in some very specific algorithms needing such cyclic types.

Regards
Paavo
I don't have a copy of the standard, but does the standard actually
define unsigned integral types as having that overflow behavior? Or is
that just the "most common case"?

I'm not questioning to make a point, I really would like to know the answer.

Thanks,
Daniel.
 
Kai-Uwe Bux

[snip]

The wrapping isn't all that strange.
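
For instance, once the mod-2^N rule is taken as given, Paavo's average
can be computed without wrapping; a minimal sketch, assuming a <= b:

#include <iostream>

int main()
{
    unsigned int a = 2150000000u;
    unsigned int b = 2160000000u;

    // a + b would wrap modulo 2^32; a + (b - a)/2 stays in range.
    unsigned int avg = a + (b - a) / 2;
    std::cout << avg << '\n';   // 2155000000
}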

Daniel said:
I don't have a copy of the standard, but does the standard actually
define unsigned integral types as having that overflow behavior? Or is
that just the "most common case"?

I'm not questioning to make a point, I really would like to know the
answer.

Overflow for signed arithmetic types is undefined behavior [5/5].
Unsigned integer types have arithmetic mod 2^N, where N is the bit
length [3.9.1/4].


Best

Kai-Uwe Bux
 
Kai-Uwe Bux

Paavo said:
Daniel Pitts said:
I don't have a copy of the standard, but does the standard actually
define unsigned integral types as having that overflow behavior? [...]

AFAIK unsigned arithmetic is specified exactly by the standard. This
means for example that a debug implementation cannot detect and assert
on overflow, but has to produce the wrapped result instead:

<quote>

3.9.1/4:

Unsigned integers, declared unsigned, shall obey the laws of arithmetic
modulo 2^n where n is the number of bits in the value representation of
that particular size of integer.

Footnote: This implies that unsigned arithmetic does not overflow because
a result that cannot be represented by the resulting unsigned integer type
is reduced modulo the number that is one greater than the largest value
that can be represented by the resulting unsigned integer
type.

</quote>

In the above example one should also consider the effects of integral
promotion - the internal calculations are done in unsigned int, which
means that a similar example with unsigned shorts or chars would appear to
work correctly. However, I am not sure where the standard says that
3.9.1/4 can be violated by promoting the operands to a larger type.
[5/9]

By considering 3.9.1/4 only it seems that one should have:

unsigned char a=128u, b=128u, c=(a+b)/2; assert(c==0);

Fortunately, this is not the case with my compilers ;-)

I am not so sure whether we are really fortunate to have integral
promotions.
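
A runnable version of that unsigned char example, as a minimal sketch:

#include <cassert>

int main()
{
    unsigned char a = 128u, b = 128u;
    // a and b are promoted to int before the addition, so a + b is
    // 256 rather than (128 + 128) % 256 == 0; hence c is 128, not 0.
    unsigned char c = (a + b) / 2;
    assert(c == 128);
}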


Best

Kai-Uwe Bux
 
Jerry Coffin

Juha said:
I was once taught that if some integral value can never have negative
values, it's a good style to use an 'unsigned' type for that. [...]

There was a rather long thread on this very subject a few years ago:

http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread
/840f37368aefda4e/85b1ac5149a6bd6f#85b1ac5149a6bd6f
 
James Kanze

Juha said:
I was once taught that if some integral value can never have
negative values, it's a good style to use an 'unsigned' type
for that. [...]

It would be good style to use a cardinal integral type, if C++
had such. For better or for worse, the unsigned types in C++
are not really a good abstraction of cardinals (they use modulo
arithmetic), and the implicit conversion rules between signed
and unsigned cause all sorts of problems.

The result is: the "standard" integral type in C++ is "int".
Any other type should only be used to fulfill a very specific
need. And the unsigned types should be avoided except where you
need the modulo arithmetic, or you are actually dealing with
bits.

Up to a point. Even more important is to avoid mixing signed
and unsigned (again, because of the conversion rules). Which
means that if you're stuck using a library (like the standard
library) which uses unsigned, you should usually use unsigned
when interfacing it. Which leads to a horrible mixture in your
own code, but the alternatives seem worse.
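
A sketch of what that looks like in practice (the function itself is
hypothetical): keep the index in the library's own unsigned type at
the interface, and convert once, explicitly, where signed arithmetic
is needed elsewhere.

#include <vector>
#include <cstddef>

void process(const std::vector<int>& v)
{
    // std::size_t matches what v.size() returns, so no mixing occurs.
    for (std::size_t i = 0; i < v.size(); ++i) {
        // ... use v[i] ...
    }
}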

[...]
It would be interesting to hear other opinions on this
subject.

The problem here is that there is a difference between theory
and practice, mainly because of all of the implicit and
unchecked conversions, but also because the unsigned types do
not really model a cardinal as well as they should.
 
Juha Nieminen

Erik said:
Exactly, but if you use int instead of short you get 4294967288, because
the unsigned int is not promoted to a signed long.

What would be the difference between promoting an unsigned int to a
signed int vs. to a signed long in an architecture where int and long
are the same thing (i.e. basically all 32-bit systems)?
 
Juha Nieminen

James said:
Up to a point. Even more important is to avoid mixing signed
and unsigned (again, because of the conversion rules). Which
means that if you're stuck using a library (like the standard
library) which uses unsigned, you should usually use unsigned
when interfacing it. Which leads to a horrible mixture in your
own code, but the alternatives seem worse.

I find myself constantly writing code like this:

if(size_t(amount) >= table.size())
    table.resize(amount + 1);

and:

int Class::elementsAmount() const
{
    return int(table.size());
}

I don't like those explicit casts. They are awkward and feel
dangerous, but sometimes there just isn't any way around them.
Especially in the latter case there is a danger of overflow if the table
is large enough, but...

It's a bit of a dilemma.
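
One way to make the casts less dangerous is to wrap them in a
range-checked conversion. A minimal sketch (checked_cast is a
hypothetical helper, not a standard facility):

#include <stdexcept>

// Hypothetical helper: a cast that throws instead of silently
// truncating or changing sign.
template <typename To, typename From>
To checked_cast(From value)
{
    To result = static_cast<To>(value);
    // Round-trip plus sign check catches out-of-range conversions.
    // (The sign comparison is tautological for unsigned From, which
    // some compilers will warn about in a sketch like this.)
    if (static_cast<From>(result) != value
        || (result < To()) != (value < From()))
        throw std::overflow_error("checked_cast: value out of range");
    return result;
}

// Usage: int n = checked_cast<int>(table.size());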
 
Erik Wikström

Juha said:
What would be the difference between promoting an unsigned int to a
signed int vs. to a signed long in an architecture where int and long
are the same thing (i.e. basically all 32-bit systems)?

None, but then the limitation would (partially) be in the platform and
not in the language. What I complain about is the fact that integer
promotion is specified for types "smaller" than int, but not for
"larger" types. Considering that many desktop machines can now easily
handle types larger than int (which is often 32 bits even on 64-bit
machines), this seems a bit short-sighted to me. I can see no reason to
allow integer promotion for all integer types.
 
Erik Wikström

I said:
[...] I can see no reason to allow integer promotion for all integer
types.

I meant: "I can see no reason to *not* allow integer promotion for all
integer types."
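
A minimal sketch of the difference, assuming 32-bit int: today the
mixed expression wraps, and you have to do the widening by hand to get
the arithmetic I'm asking for:

#include <iostream>

int main()
{
    unsigned int width = 10;
    int x = 2;
    // x is converted to unsigned int, so the subtraction wraps:
    std::cout << (x - width) << '\n';                  // 4294967288
    // Widening by hand gives the mathematically expected result:
    std::cout << (x - static_cast<long long>(width));  // -8
}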
 
Jerry Coffin

Paavo said:
[ ... ]
AFAIK unsigned arithmetic is specified exactly by the standard. This
means for example that a debug implementation cannot detect and assert
on overflow, but has to produce the wrapped result instead:

Unsigned arithmetic is defined in all but one respect: the sizes of the
integer types. IOW, you know how overflow is handled when it does
happen, but you don't (portably) know when it'll happen.
 
Kai-Uwe Bux

Jerry said:
[quoting Paavo] AFAIK unsigned arithmetic is specified exactly by the
standard. [...]

Unsigned arithmetic is defined in all but one respect: the sizes of the
integer types. IOW, you know how overflow is handled when it does
happen, but you don't (portably) know when it'll happen.

a) There is <limits>, which will tell you portably the bounds of built-in
types.

b) With unsigned integers, you can check for overflow easily:

unsigned int a = ...;
unsigned int b = ...;
unsigned int sum = a + b;
if ( sum < a ) {
    std::cout << "overflow happened.\n";
}

It's somewhat nice that you can check for the overflow _after_ you did the
addition (this does not necessarily work with signed types). Also, the
check is usually a bit cheaper than for signed types (in the case of
addition, subtraction, and division a single comparison is enough; I did
not think too hard about multiplication).
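
For multiplication, a pre-check with one division is the usual
approach; a minimal sketch:

#include <limits>

// Returns true if a * b would wrap around.
bool mul_overflows(unsigned int a, unsigned int b)
{
    return b != 0 && a > std::numeric_limits<unsigned int>::max() / b;
}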


Best

Kai-Uwe Bux
 
Jerry Coffin

Kai-Uwe Bux said:
a) There is <limits>, which will tell you portably the bounds of built-in
types.

Yes, but 1) it doesn't guarantee that the size you need will be present,
and 2) even if the size you need is present, it may not be represented
completely accurately.

For example, on one (admittedly old) compiler, <limits.h> said that
SCHAR_MIN was -127 -- but this was on a two's-complement machine where
the limit was really -128. At the time the committee discussed it, and
at least from what I heard, agreed that this didn't fit what they
wanted, but DID conform with the requirements of the standard.

Even when/if <limits.h> does contain the correct data, and the right
size is present, you can end up with a rather clumsy ladder to get the
right results.

#if (unsigned char is the right size)
typedef unsigned char typeIneed;
#elif (unsigned short is the right size)
typedef unsigned short typeIneed;
#elif (unsigned int is the right size)
typedef unsigned int typeIneed;
#elif (unsigned long is the right size)
typedef unsigned long typeIneed;
#else
#error correct size not present?
#endif

Fortunately, you most often care about sizes like 16 and 32 bits. C99
and C++0x let you get types of those sizes much more easily, using the
uintXX_t typedefs.
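
With those, the ladder above collapses to a line or two; a minimal
sketch, assuming the platform provides an exact-width 32-bit type:

#include <stdint.h>   // <cstdint> in C++0x

typedef uint32_t typeIneed;   // exactly 32 bits, if available
// uint_least32_t / uint_fast32_t are the always-present fallbacks.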
 
Kai-Uwe Bux

Jerry said:

Let's restore some important context: You said:
[ ... ]
AFAIK unsigned arithmetic is specified exactly by the standard. This
means for example that a debug implementation cannot detect and assert
on overflow, but has to produce the wrapped result instead:

Unsigned arithmetic is defined in all but one respect: the sizes of the
integer types. IOW, you know how overflow is handled when it does
happen, but you don't (portably) know when it'll happen.

Note the claim that one cannot know when overflow will happen.

To _that_, I answered with points (a) and (b) above.

And now, you say:
Yes, but 1) it doesn't guarantee that the size you need will be present,

which is completely unrelated to the question of whether you can
portably detect overflow, and:
and 2) even if the size you need is present, it may not be represented
completely accurately.

which is just FUD.
For example, on one (admittedly old) compiler, <limits.h> said that that
SCHAR_MIN was -127 -- but this was on a twos-complement machine where
the limit was really -128. At the time the committee discussed it, and
at least from what I heard, agreed that this didn't fit what they
wanted, but DID conform with the requirements of the standard.

Yes. And on that compiler, SCHAR_MIN _was_ -127. That _means_ that the
implementation made no guarantees about behavior when computations
reach -128, even though experiments or documentation about the computer
architecture suggest you could go there. The point is that SCHAR_MIN
_defines_ what counts as overflow for the signed char type.

In short: contrary to your claim, you can portably detect overflow (and it
is particularly easy for unsigned types, where issues like the one with
SCHAR_MIN cannot happen).


Best

Kai-Uwe Bux
 
