sorting 5 million minute data: choose std::list or std::vector


James Kanze

Eschewing unsigned integral types is sometimes irrational. *If* it
makes no sense to have negative days or minutes then why use a signed
integral type?

Two reasons, really. The first is simple: the *normal* integral
type in C++ is plain int. You don't use anything else unless
plain int won't do the job. You don't use unsigned for the same
reason you don't use long, or short, or whatever. The second is
more fundamental: C++ doesn't have an unsigned type with
semantics appropriate for days or minutes, at least in the
general case (where you might have to take the difference
between days or minutes). That's not to say that you should
never use unsigned---there are cases where you want its special
semantics. And there are doubtless cases
where it really doesn't matter---things like serial numbers, for
example. (But there again, unless there is some reason why int
isn't appropriate, then it should be int. Anything other than
int tells the reader that you need its special
features---smaller size, bigger range, etc.)
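
To make the "difference between minutes" point concrete, here is a
minimal sketch (invented values, and it assumes a 32-bit unsigned int):

#include <iostream>

int main()
{
    unsigned int start_minute = 90;   // e.g. 01:30
    unsigned int stop_minute  = 30;   // e.g. 00:30

    // Mathematically the difference is -60, but with unsigned arithmetic
    // the subtraction wraps modulo 2^32 and prints 4294967236 instead.
    std::cout << stop_minute - start_minute << '\n';
    return 0;
}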
 

James Kanze

On Feb 7, 1:06 pm, Leigh Johnston <[email protected]> wrote:

[...]
Stick to arguing with Paul, at least you *are* probably smarter than
him.

Ouch. Now that is hitting low.

(Seriously, there is one machine currently being sold today
where the compiler has an option to turn off the standard-compliant
behavior of unsigned integers, because of the runtime overhead.
But it's not common enough for it to be an issue for most
people. And from other postings in the past, I think Leigh's in
the camp that only Wintel matters.)
 

Juha Nieminen

Leigh Johnston said:
Wrong; it is not a sign of bad design at all. If a variable is used as
an unsigned value more often than as a signed value and it logically
makes sense for it to be unsigned (e.g. it is never negative) then
making the variable unsigned makes perfect sense.

What is "more often"? 50%? Do you count the individual instances where
it's being used in the program and change the type accordingly? What if
the percentage changes later?

The major problem with using unsigned types in variables which are
used in signed expressions is that it's easy to forget the cast, which
introduces bugs that might not be immediately obvious.

The most typical case where this introduces bugs is where the unsigned
type is mixed with signed types in an expression that ends up being of a
floating point type. If you forget the cast, a result which should be on
the order of tens might wrongly end up on the order of billions. The
problem is that this might not be apparent immediately, only with
certain values (e.g. if you are calculating coordinates based on some
input data or whatever).

You might think, for example, "the width and height of an image are
never negative, hence it makes sense to make them unsigned". However,
if the width and height are used to calculate screen coordinates of
objects (which is rather typical in many situations), you will end up
with useless casts and a high risk of forgetting one, potentially
putting your coordinates 2 billion pixels to the right and down.

In these cases it actually makes more sense to have the width and height
as signed integers.

And yes, I have experience on this precise problem.
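
For concreteness, the kind of expression that does the damage looks
something like this (invented numbers; assumes a 32-bit unsigned int):

#include <iostream>

int main()
{
    unsigned int width = 640;   // "never negative", so declared unsigned
    int center_x = 100;

    // Intended: 100 - 640/2 = -220.
    // Actual: center_x converts to unsigned, the subtraction wraps, and
    // the double ends up near 4.29e9 instead of -220.
    double left_edge = center_x - width / 2;
    std::cout << left_edge << '\n';
    return 0;
}
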
Perhaps Java is a more suitable language for you rather than C++.

Where did that come from?
 

Richard Kettlewell

Kaba said:
### Looping with unsigned integers

Looping over values from n to zero backwards demonstrates the zero
boundary problem with unsigned integers:

for (unsigned int i = n; i >= 0; --i)
{
    // Do something.
}

Since there are no negative values to fail the loop test,
this loop never finishes.

While agreeing that unsigned can be booby-trapped in some cases, I don't
think this is one of them: a decent compiler will spot the trivial
comparison and generate a warning.
 

Kaba

Richard said:
While agreeing that unsigned can be booby-trapped in some cases, I don't
think this is one of them: a decent compiler will spot the trivial
comparison and generate a warning.

Yep. So instead:

unsigned int a = f();

for (unsigned int i = n; i >= a; --i)
{
    // Do something.
}

The loop works except when a = 0.
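
(For what it's worth, a common way to write that countdown so it also
handles a == 0 is to decrement inside the condition; a sketch, with n and
a as above and assuming n < UINT_MAX:)

// Visits i = n, n-1, ..., a, including the case a == 0: the old value
// of i is compared against a, and the body sees the decremented value,
// so the always-true test i >= 0 is never needed.
for (unsigned int i = n + 1; i-- > a; )
{
    // Do something with i.
}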
 

Richard Kettlewell

Kaba said:
Richard Kettlewell wrote:

Yep. So instead:

unsigned int a = f();

for (unsigned int i = n; i >= a; --i)
{
    // Do something.
}

The loop works except when a = 0.

Much better l-)
 

Martin

Leigh Johnston wrote: [snip]

Zero boundary problem
---------------------

Most of the time we are assuming that, away from boundaries caused
by the finite representation, an integer object works like an element
of the integer ring ZZ. In addition, it seems plausible that most
integer calculations are concentrated around a neighborhood of zero.

Actually, I assume an unsigned integer object behaves more like N away
from the boundaries caused by the finite representation. And it's not
an assumption that it behaves like Z.
The _zero-boundary problem_ (of unsigned integers) is that the zero
is on the boundary beyond which the assumption of working with integers
falls apart. Thus, the probability of introducing errors in computations
increases greatly. Furthermore, those errors cannot be caught, since
every value is legal.

If you don't start with that assumption, its failure won't cause any
errors at all. And the method you advocate for catching errors with
signed integers can also be transformed to work with unsigned
integers, unless the unsigned integer's maximum is equal to the
signed integer's maximum. (E.g. UINT_MAX == INT_MAX would prevent
the transformation from working. Otherwise, reserve the range
(INT_MAX, UINT_MAX] as invalid. If those need to be valid values,
then signed int isn't an option in the first place.)
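
A minimal sketch of that transformation, assuming the usual case where
UINT_MAX > INT_MAX (the helper name is made up):

#include <cassert>
#include <climits>

// Treat (INT_MAX, UINT_MAX] as a reserved "impossible" range, so a value
// that wrapped around can be caught much like a negative signed value.
unsigned int checked(unsigned int value)
{
    assert(value <= static_cast<unsigned int>(INT_MAX));
    return value;
}
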
[example elided]

Extended positive range problem
-------------------------------

Conversion between an unsigned integer and a signed integer
is an information destroying process in either direction.
The only way to avoid this is to use one or the other
consistently.

Not necessarily. There's at least one common implementation where
that conversion is lossless.

And one can, of course, only convert in situations where the
conversion is lossless (e.g. [0, INT_MAX]). Using only one or the
other is hardly the *only* way.
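
E.g. a conversion restricted to the lossless range might look like this
(hypothetical helper):

#include <climits>
#include <stdexcept>

// Convert unsigned -> signed only when the value fits in [0, INT_MAX],
// i.e. only when no information is destroyed.
int to_int(unsigned int value)
{
    if (value > static_cast<unsigned int>(INT_MAX))
        throw std::out_of_range("value does not fit in int");
    return static_cast<int>(value);
}
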
If the normal arithmetic is the intention, then a signed integer
represents a more general concept than an unsigned integer: the former
covers both the negative and positive integers, whereas the latter
only covers non-negative integers.

Which normal arithmetic? On reals? On rationals? On Integers? On
Naturals? On Complex? On Quaternions? The most obvious of those
options is the one on the reals. Why aren't you advocating the use of
double everywhere unless otherwise indicated?
In programming terms, it is
possible to create a program using signed integers alone, however,
the same can't be said about the unsigned integers. Therefore, if
only one type is to be used consistently, the choice should be a
signed integer. However, let us do some more analysis.

This is not the previous argument in programming terms. That said,
the *only* reason it's not possible to create a program using only
unsigned integers is that main requires int. In fact, any
functionality that can be coded using only signed integers can be
coded using only unsigned integers.
Since any non-trivial program must use signed integers, the use of
unsigned integers eventually leads to an unsigned-signed
conversion. In particular, because `std::size_t` in the Standard
Library is an unsigned integer, there are few programs that can
escape the unsigned-signed conversions.

Yep, just about every non-trivial program that is sanely written will
have both signed and unsigned integers.
Despite these conversions, programs still seem to work. The reason
for this, I reflect, is that the unsigned integers are normally
not taken into the extended positive range they allow.
The _extended positive range problem_ is that if the unsigned integers
are taken to their extended positive range, then the signed-unsigned
conversions become a reality. Ensuring correctness under such a threat
is hard, if not practically impossible.

What problem? Let's assume that you have an unsigned integer with a
value in (INT_MAX, UINT_MAX]. Either this value is valid, in which
case the correctness is ensured by not converting. Or it's an invalid
value, in which case you've already failed to ensure correctness.
Your distinction makes no difference.
Conclusion

True enough, but I don't see anyone advocating such a use.
 

James Kanze

On 07/02/2011 17:19, Juha Nieminen wrote:
You might be in the "use int everywhere" camp but I am not.
Perhaps Java is a more suitable language for you rather than
C++.

Or perhaps he's just been using the language longer than you
have. Other people I know in the "use int everywhere" camp
include such people as Stroustrup, and in C, Kernighan and
Ritchie. (The fact that they use int everywhere isn't an
absolute argument for that policy, of course, but it does
signify that using int everywhere is very much in the C/C++
tradition.)
 

gwowen

On 10/02/2011 12:59, James Kanze wrote:

I am completely comfortable with using "unsigned int" where appropriate these days.  

unsigned is often fine, as long as you don't want to do subtraction.
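
The classic trap being something like this (hypothetical snippet):

#include <cstddef>
#include <vector>

void example(const std::vector<int>& v)   // imagine v is empty
{
    // Intended: "loop over all but the last element", i.e. zero times.
    // Actual: with v empty, v.size() - 1 wraps to SIZE_MAX, the condition
    // is true, and the loop tries to run SIZE_MAX times; v[i] would be out
    // of bounds from the very first iteration.
    for (std::size_t i = 0; i < v.size() - 1; ++i)
    {
        // ... use v[i] ...
    }
}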
 

Juha Nieminen

Leigh Johnston said:
You can create a bug using any language feature. Fix and recompile.

Do you know how much it costs to fix bugs vs. not having them in
the first place?

The problem is that the bug might not be found during development or
even in testing. And even if it's found in testing, it might be laborious
to track down the reason for the bug (because it often causes very
erratic behavior).

Besides, your argument can be used to defend sloppy programming. There's
nothing wrong with defensive programming that takes care of potential
bugs from the get-go.
In your opinion.

Based on actual experience in actual projects where using unsigned integers
to denote width and height of images has bitten me hard.
So do I. I don't have any problem with casts; they are a perfectly
acceptable part of a statically typed language such as C++.

As long as you *remember* to cast everything that needs to be cast.

Again, explicit casting is often a sign of bad design. There are
situations where explicit casting is ok and to be expected, but this
isn't one.
As you seem to be in the "use int everywhere" camp Java may be a more
appropriate language for you.

No. I'm of the camp "use signed integers when signed types are being used".

I still don't understand what Java has to do with this.
 

Juha Nieminen

James Kanze said:
Or perhaps he's just been using the language longer than you
have. Other people I know in the "use int everywhere" camp
include such people as Stroustrup, and in C, Kernighan and
Ritchie. (The fact that they use int everywhere isn't an
absolute argument for that policy, of course, but it does
signify that using int everywhere is very much in the C/C++
tradition.)

Usually the only situation where using unsigned integers (of any size)
is legitimate is when you are playing with bits, rather than with integers
(in other words, when your variable is more or less used as a bit container).

There might also be some situations where you need the full range of
unsigned integers, including the wrap-around behavior (many random number
generators come to mind as examples).
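
For example, a linear congruential generator is a two-liner precisely
because unsigned overflow is well defined (the constants below are just
the common Numerical Recipes ones):

#include <cstdint>

// Each call advances the state modulo 2^32; the wrap-around that would be
// undefined behaviour for a signed type is exactly what the LCG relies on.
std::uint32_t lcg_next(std::uint32_t& state)
{
    state = 1664525u * state + 1013904223u;
    return state;
}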

With things like array indices the question starts being much fuzzier,
without any clear definitive answer. (Well, unless you really, really need
to address a char array with the full range of an unsigned integer. It's
not inconceivable that this could happen in some apps.)
 

Bo Persson

Leigh said:
I have been using C++ for 18 years; I used to be (IMO) lazy and also
used int everywhere but I have since stopped this practice mainly
due to the advent of the C++ Standard Library and the size_type
typedef which by default is std::size_t which is an unsigned
integral type. I am completely comfortable with using "unsigned
int" where appropriate these days. We *can* argue about what is
appropriate but I would rather not as we would just be repeating
the same old arguments.

If you are in the "use int everywhere" camp, the unsigned size_type of
the standard containers is arguably a mistake (or non-optimal design, at
best). You only need the extra range if you have things like a char
array larger than half the addressable space. And when you do, *that*
is probably a mistake too.

Having size unsigned is a problem if you try to compute it by
subtracting two pointers, because that result is signed. You then
quickly end up mixing signed and unsigned arithmetic, which is
definitely *not* good.
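
A sketch of how that mixing creeps in (function and variable names
invented):

#include <cstddef>
#include <vector>

// Pointer subtraction yields a signed std::ptrdiff_t, while size() is an
// unsigned std::size_t, so the comparison below mixes the two.
bool range_fits(const int* first, const int* last, const std::vector<int>& v)
{
    std::ptrdiff_t needed = last - first;   // signed
    // 'needed' converts to unsigned here: a negative distance becomes a
    // huge value, and compilers rightly warn (-Wsign-compare).
    return needed <= v.size();
}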


Bo Persson
 

Paul

Juha Nieminen said:
Do you know how much it costs to fix bugs vs. not having them in
the first place?

The problem is that the bug might not be found during development or
even in testing. And even if it's found in testing, it might be laborious
to track down the reason for the bug (because it often causes very
erratic behavior).

Besides, your argument can be used to defend sloppy programming. There's
nothing wrong with defensive programming that takes care of potential
bugs from the get-go.

This seems to be an argument about style; nobody is right or wrong.
As for bug-free code, that is the *only* way to code, is it not?
Based on actual experience in actual projects where using unsigned
integers to denote width and height of images has bitten me hard.
This is arguable, as a rectangle in Cartesian coordinates can be negative,
but as a 3D object it's not going to be negative unless it is some kind of
anti-matter.
As long as you *remember* to cast everything that needs to be cast.

Again, explicit casting is often a sign of bad design. There are
situations where explicit casting is ok and to be expected, but this
isn't one.
You should not need to cast anything; a variable declared as unsigned
should be used for positive values only, except in very exceptional
circumstances IMO.
If you are often casting signed to unsigned or vice versa, it seems like
bad programming to me.
No. I'm of the camp "use signed integers when signed types are being
used".
Saying 'of the camp use signed for everything' is just a saying; it does
not imply that these people are totally ignorant of the benefits of using
unsigned when appropriate.
I think it's more a case of 'if in doubt, use signed'.
I still don't understand what Java has to do with this.
All Java integer types are signed.
 

gwowen

Subtraction of unsigned values is fine if the minuend (lhs) is greater
than the subtrahend (rhs).

If you're attempting to use the archaic (and almost entirely unused)
Latin to demonstrate how educated and/or intelligent you are, it helps
to be right. Your usage of minuend and subtrahend is fine (to the
extent that such self-indulgent archaisms-for-effect are ever fine),
but you need to learn what the LHS and RHS of an equation are.
Clue: They're relative to an '=', not to an operator.

For example in the subtraction: x = a - b

The minuend is 'a'. The subtrahend is 'b'. The LHS is 'x', the RHS
is 'a - b'.
Here you go:
http://en.wikipedia.org/wiki/Left-hand_side_and_right-hand_side_of_an_equation
 

Juha Nieminen

Leigh Johnston said:
Again, you can create a bug in the first place using any language feature.

This whole "bugs will happen no matter what, hence it doesn't matter
what you do to avoid them, just fix them when they happen" mentality is
something that I just must disagree with.

There is no reason nor advantage in using unsigned integers to represent
image sizes (at least not in any of the practical applications I have been
involved with, where image sizes never get even close to 2 billion x 2
billion pixels). On the contrary, it can only potentially cause bugs which
can be hard to trace and which might not manifest themselves immediately
nor obviously.
Because all Java's integral types are signed, ergo Java may suit you
better due to your irrational fear of a perfectly acceptable C++
language feature.

So if you want to, for example, manipulate the individual bits of a
32-bit register, you do what exactly in Java? Or how do you write an
efficient linear congruential generator in Java?

Where have I ever said that unsigned types must never be used?
 

Juha Nieminen

Leigh Johnston said:
What utter nonsense; when dealing with just *positive* integers the
unsigned integral types are fine.

If there is no clear advantage or need for unsigned types, choosing
a signed type is usually safer. This is especially so if the variable
will be used in signed expressions.

Of course this doesn't mean there aren't situations where unsigned types
are more advantageous if not outright mandatory (for example when handling
raw bytes you usually want to use unsigned chars rather than signed ones),
but those situations are, in fact, relatively rare.
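
For example (a hypothetical snippet showing the usual sign-extension
surprise):

#include <cstdio>

int main()
{
    char raw = '\xff';   // a raw byte pulled out of some buffer

    // With plain char (signed on many platforms) the byte 0xFF becomes the
    // value -1 when promoted to int; routed through unsigned char it stays 255.
    int as_char          = raw;
    int as_unsigned_char = static_cast<unsigned char>(raw);

    std::printf("%d %d\n", as_char, as_unsigned_char);   // typically: -1 255
    return 0;
}
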
More nonsense; it is not fuzzy at all; the C++ draft Standard provides a
clear and definitive answer: std::tr1::array::size_type is std::size_t
(array indices).

The standardization committee made the decision of defining the size_type
of containers as an unsigned integer. Not everybody agrees with this
decision, and there are good arguments on both sides. The issue *is* fuzzy
and a question of opinion.
 
