Unsigned types are DANGEROUS??

MikeP

If you investigate the tcmalloc code (by Google), you will find the
following warning:

// NOTE: unsigned types are DANGEROUS in loops and other arithmetical
// places. Use the signed types unless your variable represents a bit
// pattern (eg a hash value) or you really need the extra bit. Do NOT
// use 'unsigned' to express "this value should always be positive";
// use assertions for this.

Is it just their idiom? What's the problem with using unsigned ints in
loops (it seems natural to do so)? Are C++ unsigned ints "broken"
somehow?
 
Öö Tiib

If you investigate the tcmalloc code (by Google), you will find the
following warning:

// NOTE: unsigned types are DANGEROUS in loops and other arithmetical
// places. Use the signed types unless your variable represents a bit
// pattern (eg a hash value) or you really need the extra bit. Do NOT
// use 'unsigned' to express "this value should always be positive";
// use assertions for this.

Is it just their idiom? What's the problem with using unsigned ints in
loops (it seems natural to do so)? Are C++ unsigned ints "broken"
somehow?

Unsigned int is not broken. It is well-defined. It also needs one bit
less than a signed type and can use that bit for representing the value.
What is most annoying about it all is that some people take the issue
almost religiously, one way or the other.

One problem is that unsigned int is a type whose behavior most novices
intuitively misinterpret as that of a positive integer. It is neither
positive nor negative. It is a modular-arithmetic value that does not and
cannot have a sign. For example, if you subtract 8U from 4U you do not get
-4 but some platform-dependent large unsigned value that, in the majority
of cases, you did not want. When you multiply 4U by -4 you get some
platform-dependent large unsigned value ... and so on.
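
A minimal illustration (output assumes a 32-bit unsigned int):

#include <iostream>

int main()
{
    std::cout << 4U - 8U << '\n';   // 4294967292, not -4: the result wraps modulo 2^32
    std::cout << 4U * -4 << '\n';   // 4294967280: the -4 is converted to unsigned first
}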

Unsigned types are very good for algorithms that use bitwise
arithmetic. Unsigned int is also excellent for an algorithm that needs
modular arithmetic with modulus 2^32, for example in cryptography.
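
For instance, a 32-bit FNV-1a hash deliberately relies on that wraparound
(a sketch, again assuming a 32-bit unsigned int):

#include <cstddef>

// Every multiplication is supposed to wrap modulo 2^32.
unsigned fnv1a(const unsigned char* data, std::size_t len)
{
    unsigned h = 2166136261u;          // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;                // FNV prime
    }
    return h;
}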

In general ... a good software developer should be careful. C++ is a good
language for training careful developers. ;)
 
Alf P. Steinbach /Usenet

* MikeP, on 12.03.2011 22:35:
If you investigate the tcmalloc code (by Google), you will find the
following warning:

// NOTE: unsigned types are DANGEROUS in loops and other arithmetical
// places. Use the signed types unless your variable represents a bit
// pattern (eg a hash value) or you really need the extra bit. Do NOT
// use 'unsigned' to express "this value should always be positive";
// use assertions for this.

Is it just their idiom?
No.


What's the problem with using unsigned ints in
loops (it seems natural to do so)?

"and other arithmetical places". It's not about loops, it's about any place
where unsigned values might be treated as numbers (instead of as bitpatterns),
and implicit conversions can kick in. For example, ...

assert( std::string( "blah blah" ).length() < -5 );

... is technically unproblematic (the -5 is implicitly converted to a huge
unsigned value, so the assertion holds even though it looks like it never could),
but you have to think twice about it.

Having to think twice about it means that you can easily write something incorrect.

And Murphy then guarantees that you will.
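
A classic instance of that kind of mistake, sketched here with the usual
reverse-loop example:

#include <vector>
#include <cstddef>

void touch_reversed(const std::vector<int>& v)
{
    // for (std::size_t i = v.size() - 1; i >= 0; --i)   // BUG: i is unsigned,
    //     (void)v[i];                                    // so i >= 0 is always true
    for (std::size_t i = v.size(); i-- > 0; )             // one common correct form
        (void)v[i];
}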

Are C++ unsigned ints "broken" somehow?

Not in the sense that you're apparently asking about; that is, there is nothing
broken about e.g. 'unsigned' itself. But as part of a willy-nilly broken
type hodge-podge inherited from C, yes, it's broken. That's because implicit
conversions that lose information are all over the place.


Cheers & hth.,

- Alf
 
Andre Kaufmann

Using unsigned arithmetic means that underflows become overflows, which
means you only need to check for a single class of invalid values instead of
two when you use common vector coding patterns: base + length. It also means

Is there really always a difference between signed and unsigned?
I think it depends on the code/algorithm and how overflows are handled
by the CPU.

e.g. (only an illustrative sample, wouldn't write code like this)


// Assuming code runs under a
// >>> 32 bit CPU <<< !!!!

char buf[4000];

void foo1(unsigned int len, unsigned int appendLen, char* append)
{
    if ((len + appendLen) < sizeof(buf))
    {
        memcpy(buf + len, append, appendLen);
    }
}


foo1(2000, 0xFFFFFFFF - 0x2, ....)

Most of the CPUs and C++ compilers I know would simply copy (32-bit
CPU!) beyond the buffer's boundary, because:

2000 + 0xFFFFFFFD -> 1997 < sizeof(buf) == 4000

Effectively, due to the wraparound, adding the unsigned value amounts to
subtracting 3.


Using signed values wouldn't change that much:

void foo2(int len, int appendLen, char* append)
{
    if ((len + appendLen) < sizeof(buf))
    {
        memcpy(buf + len, append, appendLen);
    }
}

foo2(2000, 0xFFFFFFFF - 0x2, ....)

2000 + -3 -> 1997

What would help is either checking each value individually, or using 64-bit
integer arithmetic for 32-bit operands to prevent overflows.
Or using exceptions on overflow (some languages do that by default) or
saturating arithmetic, which does not overflow.
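
For instance, a sketch of the widened check (same illustrative names as the
example above, assuming 32-bit unsigned int operands):

#include <cstring>

char buf[4000];

// The addition is done in 64-bit arithmetic, so two 32-bit operands cannot
// wrap; a len that is itself >= sizeof(buf) also fails the test.
void foo1_checked(unsigned int len, unsigned int appendLen, const char* append)
{
    if (static_cast<unsigned long long>(len) + appendLen < sizeof(buf))
    {
        std::memcpy(buf + len, append, appendLen);
    }
}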
you only need to check the result, instead of each intermediate arithmetic
operation.

IMHO it depends on the CPU, the language and the algorithm used. In the
sample above, checking the result wouldn't help.

Andre
 
Lasse Reichstein Nielsen

Andre Kaufmann said:
Using unsigned arithmetic means that underflows become overflows, which
means you only need to check for a single class of invalid values instead of
two when you use common vector coding patterns: base + length. It also means

Is there really always a difference between signed and unsigned ?
I think it depends on the code/algorithm and how overflows are handled
by the CPU.

e.g. (only an illustrative sample, wouldn't write code like this)


// Assuming code runs under a
// >>> 32 bit CPU <<< !!!!

char buf[4000];

void foo1(unsigned int len, unsigned int appendLen, char* append)
{
    if ((len + appendLen) < sizeof(buf))
    {
        memcpy(buf + len, append, appendLen);
    }
} ....
What would help either check each value individually or use 64 bit
integer arithmetic for 32 bit operands to prevent overflows.
Or to use either exceptions on overflows (some languages do that by
default or use saturation registers which don't overflow).

Or write your code to do non-overflowing computations only. In this
case we are doing arithmetic on an untrusted value (appendLen) before
validating it, which means that we might mask an invalid value.

If we can assume that len is within sizeof(buf) (otherwise we have already
overflowed the buffer), then in this case it should be:
if (sizeof(buf) - len >= appendLen) {
    memcpy(buf + len, append, appendLen);
}
because sizeof(buf) - len is guaranteed to give a valid value, and
we compare that directly to the untrusted value.

Of course, this only works like this if the values are unsigned. If
signed, we should also bail out if appendLen is negative (or,
preferably, just cast both sides to size_t or unsigned).
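
A minimal self-contained sketch of that pattern (buffer size and names purely
illustrative):

#include <cstring>
#include <cstddef>

char buf[4000];
std::size_t used = 0;   // invariant: used <= sizeof(buf)

// Append only if it fits. The subtraction cannot wrap because of the
// invariant, so the untrusted srcLen is compared against a valid value.
bool append_bytes(const char* src, std::size_t srcLen)
{
    if (sizeof(buf) - used >= srcLen) {
        std::memcpy(buf + used, src, srcLen);
        used += srcLen;
        return true;
    }
    return false;
}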

If there is more than one untrusted value, then we will probably need
to do individual validation, because two wrongs might seem to make a
right :)

/L
 
Andre Kaufmann

Andre Kaufmann<akinet#remove#@t-online.de> writes:
[...]

Or write your code to do non-overflowing computations only. In this
case we are doing arithmetic on an untrusted value (appendLen) before
validating it, which means that we might mask an invalid value.

If we can assume that len is within sizeof(buf) (otherwise we have already
overflowed the buffer), then in this case it should be:
if (sizeof(buf) - len >= appendLen) {
    memcpy(buf + len, append, appendLen);
}

Agreed - a good and safer idea.
Ofcourse, this only works as this if the values are unsigned. If
signed, we should also bail out if appendLen is negative (or,
preferably, just cast both sides to size_t or unsigned).

If there is more than one untrusted value, then we will probably need
to do individual validation, because two wrongs might seem to make a
right :)

Yes ;-): Negative * Negative = Positive

The only problem is: which value passed as a parameter can be trusted?

C++ strings would be safer, but even there it depends on the
implementation of the string class itself.

Checking integer operations for correct values is quite complex.
E.g. the source of a safe integer class, Microsoft's SafeInt, is quite
long: 6000 lines of code.
But who wants, or can afford, such overhead for every integer :-/
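
As a taste of what such a class has to do, here is a sketch of a single
overflow-checked unsigned addition (an illustration only, not SafeInt's
actual code):

#include <limits>

// Returns false instead of silently wrapping; a SafeInt-style class needs a
// check like this for every operator and every combination of operand types.
bool checked_add(unsigned a, unsigned b, unsigned& result)
{
    if (b > std::numeric_limits<unsigned>::max() - a)
        return false;
    result = a + b;
    return true;
}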

Andre
 
Alf P. Steinbach /Usenet

* William Ahern, on 13.03.2011 03:01:
The danger here with implicit conversions occurs when you mix signed and
unsigned.
Yes.


If you don't mix, and you stick to (unsigned int) or wider, then
you're fine.

Those are two separate claims.

"If you don't mix ... you're fine" is generally true. There has to be some
mixing at some level because of the (with 20-20 hindsight) unfortunate choice of
unsigned sizes in the standard library. To *contain* this mixing it's a good
idea to define the triad of helper functions countOf, startOf and endOf.
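
One possible shape for those helpers (a sketch of the idea; the exact
signatures are an assumption, not quoted from anywhere). countOf returns a
signed count, which is what keeps the unsignedness contained:

#include <cstddef>

// Raw-array overloads only; container overloads would return
// static_cast<std::ptrdiff_t>(c.size()), c.begin() and c.end().
template<class T, std::size_t N>
std::ptrdiff_t countOf(T (&)[N]) { return static_cast<std::ptrdiff_t>(N); }

template<class T, std::size_t N>
T* startOf(T (&a)[N]) { return a; }

template<class T, std::size_t N>
T* endOf(T (&a)[N]) { return a + N; }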

"If you stick to (unsigned int) or wider, then you're fine" is generally false.
Use hammer for nails, screwdriver for screws. In short, use the right tool for
the job, or at least don't use a clearly inappropriate tool: don't use signed
types for bitlevel stuff, and don't use unsigned types for numerical stuff.

unsigned types can be safer because everything about the
arithmetic is well defined, including over- and underflows which occur
modulo 2^N; as opposed to signed, where those scenarios are undefined.

The well-definedness of operations that you're talking about is this: that the
language *guarantees* that range errors for unsigned types will not be caught.

A guarantee that errors won't be caught does not mean "safer".

That's plain idiocy, sorry.

If you don't need negative numbers, why use a type that can produce them?

To catch errors and to not inadvertently produce them in the first place.

If
they are produced (from an oversight or a bug), really unexpected things can
happen.
Right.


Using unsigned arithmetic means that underflows become overflows, which
means you only need to check for a single class of invalid values instead of
two when you use common vector coding patterns: base + length.

I am aware that a similar piece of reasoning is in the FAQ. It is technically
correct. But adopted as a guideline it's like throwing the baby out with the
bathwater: the single check has no real advantage, it is limited to a very
special context, and the cost of providing that null advantage is very high.

In short, basing your default selection of types on that is lunacy.

It also means
you only need to check the result, instead of each intermediate arithmetic
operation.

Sorry, that's also incorrect.


Cheers & hth.,

- Alf (replying because you replied to me, possibly not following up on this)
 
Öö Tiib

Andre Kaufmann said:
IMHO depends on the CPU and language and the algorithm used. In the
sample above checking the result wouldn't help.

Indeed, it would not. But the following is an example of what I had in mind.
[...]


#include <iostream>

int main(void) {
        const char s[] = "\x7f\xff\xff\xffSome object";
#if 0
        int limit = sizeof s;
        int offset = 0;
        int count;
#else
        unsigned limit = sizeof s;
        unsigned offset = 0;
        unsigned count;
#endif

        /* note that I do (limit - 4), not (offset + 4). and earlier i would
           need to ensure that limit >= 4 */
        while (offset < limit - 4) {
                count =  ((0x7fU & s[offset++]) << 24U);
                count |= ((0xffU & s[offset++]) << 16U);
                count |= ((0xffU & s[offset++]) << 8U);
                count |= ((0xffU & s[offset++]) << 0U);

                offset += count;

                std::cout << "count:" << count
                          << " limit:" << limit
                          << " offset:" << offset << "\n";
        }

        return 0;
}

That is exactly what everyone is agreeing on: unsigned is good as a
dummy full of bits for doing bitwise algorithms. If you are constantly
doing such bit-crunching then it is no wonder that you prefer unsigned.

BTW: since your example is platform-dependent code anyway ... why don't
you use memcpy? The whole cryptic bit-shift block would go away.
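
For what it's worth, a sketch of the memcpy variant (it assumes, as the
example above already does, that the stored byte order matches the host's
and that unsigned is 32 bits wide, which is exactly why the portable version
keeps the shifts):

#include <cstring>

unsigned read_count(const char* p)
{
    unsigned count;
    std::memcpy(&count, p, sizeof count);   // replaces the shift block
    return count & 0x7fffffffU;             // the shift version also clears the top bit
}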
 
Andre Kaufmann

You're deriving a new base, so of course this is bad code. There are an
infinite ways to write bad code. This would be equally bad as signed. My
contention wasn't that unsigned is always better; just as worse or at the

Yes, agreed.
Switching to a wider type is usually the wrong thing to do. 5 or 10 years
later they can easily become broken. When audio/video protocols switched to
unsigned 64-bit types, all of a sudden signed 64-bit integers became too
short because they weren't just capturing operations over 32-bit operands.

I think it's nearly impossible to generally switch to a larger type
without any problems. Either you lose binary compatibility (when data is
sent over IP to other systems or when old binary data is loaded) or you
have other incompatibilities. That is one of the reasons why most
compilers use a data model where an [int] has the same size on 32-bit and
64-bit systems. And to be honest, I don't care that much about 128-bit
systems today ;-)
Indeed, it would not. But the following is an example of what I had in mind.
You get completely different results depending on whether one is using
signed or unsigned. And, at least for the line of work I do (on-the-fly
media transcoding), this is extremely common.

I too write code for media transcoding. Generally I don't care much about
integer overflows either, apart from overflows caused by integer values
that are passed to our services (e.g. via http/tcp).
In the work I do, it doesn't
matter whether I detect that there has been an overflow, or I just fail
after performing the next jump and there's junk data.

Yes, but I had typical buffer overflows in mind, which are used to
attack and compromise systems.
What matters is that I
program in a way which mitigates my mistakes (because none of us can write
perfect code all the time) and which fails gracefully. The less error
checking I need to do, the less possibility of mistake. I could do
additional error checking using signed types, but that just means that I (a)
have to write more code more susceptible to mistake, and (b) mistakes have
worse consequences.

Yes, I don't think that signed values would help that much either - perhaps
they would make the situation even worse.
#include<iostream>
You get completely different results depending on whether one is using
signed or unsigned.

Yep, signed integers don't help. Unless you would use a 64-bit (long
long) signed integer on a 32-bit system.

But for most algorithms this would be overkill, and at least for most
media codecs it would result in decreased performance (if one generally
used 64-bit integers on 32-bit platforms).

Andre
 
Paul

Leigh Johnston said:
Bullshit. Using unsigned integral types to represent values that are
never negative is perfectly fine. std::size_t ensures that the C++ horse
has already bolted as far as trying to avoid them is concerned.


Plain idiocy is eschewing the unsigned integral types in C++. Perhaps you
would prefer being a Java programmer? Java has less types to "play with"
which perhaps would suit you better if you cannot cope with C++'s richer
set of types.
As Java, like C++, supports UDTs, I don't think it's correct to say that C++
supports a richer set of types.
class anyTypeYouLike{};

There is a reason Java doesn't bother with a built-in unsigned numeric type.
I think the people who created Java know more about programming than you do,
and it is not a case of Java being inadequate. This is just your misguided
interpretation in an attempt to reinforce your idea that std::size_t is the
only type people should use in many circumstances.
You obviously think std::size_t is the best thing since sliced bread and this
is the way forward in C++ and, as per usual, your opinion is wrong.

In the message you replied to, Alf said to use the correct tool for the job,
which seems like a reasonable opinion. You replied saying this was bullshit
and implied Alf had said something about never using unsigned; your post
looks like a deliberate attempt to start a flame war.
You also make the point that using unsigned for values that are never
negative is fine, but how do you know a value is never going to be negative?
Your idea of "never negative" is different from others': you think array
indexes cannot be negative, but most other people know they can be.
 
Johannes Schaub (litb)

MikeP said:
If you investigate the tcmalloc code (by Google), you will find the
following warning:

// NOTE: unsigned types are DANGEROUS in loops and other arithmetical
// places. Use the signed types unless your variable represents a bit
// pattern (eg a hash value) or you really need the extra bit. Do NOT
// use 'unsigned' to express "this value should always be positive";
// use assertions for this.

Is it just their idiom? What's the problem with using unsigned ints in
loops (it seems natural to do so)? Are C++ unsigned ints "broken"
somehow?

Exactly. They are soooo right. It's an anti-pattern that the Standard library
uses unsigned integral types for "only-positive" entities.

http://stackoverflow.com/questions/3729465/the-importance-of-declaring-a-variable-as-unsigned/3729815#3729815
 
Paul

Leigh Johnston said:
This is a troll. If is obvious I am referring to the set of built-in
types. A user defined type is often composed of one or more built-in
types.


C++'s std::size_t comes from C's size_t. It is the way forward because it
is also the way backward.


If array indexes can be negative then please explain why
std::array::size_type will be std::size_t and not std::ptrdiff_t.

Just because that particular array type cannot have a negative index doesn't
mean this applies to all arrays.
Array indexes *can* be negative; see:
http://www.boost.org/doc/libs/1_46_1/libs/multi_array/doc/user.html
 
Paul

Leigh Johnston said:
Just because that particular array type cannot have a negative index
doesn't mean this applies to all arrays.
Array indexes *can* be negative see:
http://www.boost.org/doc/libs/1_46_1/libs/multi_array/doc/user.html

From n3242:

"if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th
member of E1"

E2 cannot be negative if the above rule is not to be broken.
The above says E2 is an integer not an unsigned integer.

Also:
int arr[5] = {0};
arr[4] = 5;    /* The 5th element, not the 4th */
arr[0] = 1;    /* The 1st element, not the 0th */
int* p_arr = arr;   /* note: &arr would be an int(*)[5], not an int* */
++p_arr;
p_arr[-1] = 1; /* The 1st element, not the -1st */
p_arr[0] = 2;  /* The 2nd element, not the 0th */


I don't know what medicine you are on but it doesn't seem to be working. An
array can have a negative index, accept you're wrong, forget about size_t,
revisit your doctor and move on.
 
Paul

Leigh Johnston said:
[...]

In C++ an array index can't be negative; p_arr above is not an array it is
a pointer. Obviously "E2-th" is zero-based not one-based. The fact that
Boost provides an array-like container which accepts negative indices is
mostly irrelevant; my primary concern is C++ not Boost; again:
An array is always accessed via a pointer; that's just how it works.

From n3242:

"if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th
member of E1"

if E2 is negative then E1[E2] cannot possibly refer to a member of E1.
From common sense:

int arr[16]={0};
arr[0]=0; /*1st element not 0th element*/


I don't give a **** about your stupid misinterpretations from the ISO
documents you fuckin idiot, learn to accept when you are wrong.
You are the one who is wrong and refuse to accept what is correct.

std::size_t.
Oh yes, try to start a flame war with no reasonable argument as usual; well, I
have had enough of idiots like you, see if I care. Anyone with a brain can see
you are the one who is wrong.
 
Paul

Leigh Johnston said:
[...]

Also from n1336:

"if E1 is an array object (equivalently, a pointer to the initial element
of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero)."

This more closely matches my thinking on this subject; a sub-array
reference that allows you to give a negative array index on most
implementations is not an array object.

/Leigh

idiot.
 
Paul

Leigh Johnston said:
[...]

I don't give a **** about your stupid misinterpretations from the ISO
documents you fuckin idiot, learn to accept when you are wrong.

*My* misinterpretations? lulz. I don't think so; the standards are quite
clear; from n1336:

"if E1 is an array object (equivalently, a pointer to the initial element
of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero)."

This more closely matches my thinking on this subject; a sub-array
reference that allows you to give a negative array index on most
implementations is not an array object.

You are the fucking idiot.
Oh yes, try to start a flame war with no reasonable argument as usual; well,
I have had enough of idiots like you, see if I care. Anyone with a brain can
see you are the one who is wrong.

As I said above it is you who is the idiot; it is you who is wrong.

Using negative array indices is poor practice at best.

/Leigh

idiot
 
SG

Obviously one can do the following:

int main()
{
        int n[2][2] = { { 11, 12 }, { 21, 22} };
        std::cout << n[1][-1]; // outputs "12"
}
[...]
so IMO it is an open question as to whether the above code is UB.

Yes, I think this is a bit of a gray area. Avoid it if you can. If I
recall correctly, pointer arithmetic is restricted to stay inside some
array (including "one element past the end"). Here, n is an array of
arrays. n[1] selects the second array and the additional [-1] steps
out of that 2nd array. If I recall correctly, James Kanze talked about
a possible debugging implementation which would augment pointers to
store additional information to enable runtime checking of index-out-
of-bounds conditions w.r.t. pointer arithmetic and op[]. IIRC, he
claimed this implementation to be conforming. But it would obviously
"break" your example.
 
Michael Doubez

Obviously one can do the following:
  int main()
  {
        int n[2][2] = { { 11, 12 }, { 21, 22} };
        std::cout << n[1][-1]; // outputs "12"
  }
[...]
so IMO it is an open question as to whether the above code is UB.

Yes, I think this is a bit of a gray area. Avoid it if you can. If I
recall correctly, pointer arithmetic is restricted to stay inside some
array (including "one element past the end"). Here, n is an array of
arrays. n[1] selects the second array and the additional [-1] steps
out of that 2nd array. If I recall correctly, James Kanze talked about
a possible debugging implementation which would augment pointers to
store additional information to enable runtime checking of index-out-
of-bounds conditions w.r.t. pointer arithmetic and op[]. IIRC, he
claimed this implementation to be conforming. But it would obviously
"break" your example.

An implementation storing data before or after each array would be
problematic indeed.

It would also break n[1][1] == (&n[0][0])[3] which would be
surprising.
I could not find a guarantee of contiguity of the layout in the
standard to back up that claim and, to my untrained eye, it carefully
steps around the issue.

By the way, Leigh Johnston didn't quote the full phrase:
"/Because of the conversion rules that apply to +,/ if E1 is an array
and E2 an integer, then E1[E2] refers to the E2-th member of E1".

The C99 standard phrases it a bit differently:
"Because of the conversion rules that apply to the binary + operator,
if E1 is an array object (equivalently, a pointer to the initial
element of an array object) and E2 is an integer, E1[E2] designates
the E2-th element of E1 (counting from zero)."

The "counting from zero" is surprising, since arrays are zero-based
anyway, but not if you consider that when referencing a sub-object (a
sub-array), you are no longer zero-based with regard to the underlying
object.

Which tends to hint that E1 is converted into a pointer anyway and no
special treatment is performed because E1 is of array type. In that
case, I expect normal pointer arithmetic applies and a negative
integral operand of the subscript operator is acceptable provided you
stay within the underlying object.
 
Michael Doubez

On 14/03/2011 11:48, Leigh Johnston wrote:
[...]
Obviously one can do the following:
int main()
{
    int n[2][2] = { { 11, 12 }, { 21, 22 } };
    std::cout << n[1][-1]; // outputs "12"
}
This should work on most implementations (Comeau warns with "subscript
out of range"); however I think it does run contra to the following:
"if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th
member of E1"
as n[1][-1] is not a member of n[1]; it is a member of n[0].
so IMO it is an open question as to whether the above code is UB.

Interestingly g++ v4.1.2 (codepad.org) does not output "12"; this of
course either strengthens my position that using negative array indices
may be UB or that g++ is buggy; either way using negative array indices
is still poor practice.

It may be a bug or related to codepad.

The offset is right:

int main()
{
    int n[2][2] = { { 11, 12 }, { 21, 22 } };
    std::cout << &n[0][0] << " " << &n[1][0]; // outputs 0xbf6c1960 0xbf6c1968
}
 
SG

Yes, I think this is a bit of a gray area. Avoid it if you can. If I
recall correctly, pointer arithmetic is restricted to stay inside some
array (including "one element past the end"). Here, n is an array of
arrays. n[1] selects the second array and the additional [-1] steps
out of that 2nd array. If I recall correctly, James Kanze talked about
a possible debugging implementation which would augment pointers to
store additional information to enable runtime checking of index-out-
of-bounds conditions w.r.t. pointer arithmetic and op[]. IIRC, he
claimed this implementation to be conforming. But it would obviously
"break" your example.

An implementation storing data before or after each array would be
problematic indeed.
It would also break n[1][1] == (&n[0][0])[3] which would be
surprising.

I think you misunderstood. I was talking about the possibility of an
implementation that makes raw pointers more intelligent for debugging
purposes. Imagine a "raw pointer" that not only stores the address but
also a range of possibly valid "indices". A built-in version of

struct raw_pointer {
    char* address;
    ptrdiff_t begin_idx, end_idx;
};

so-to-speak, where an array-to-pointer decay in a case like

void foo(int* r) {
    r += 9; // <-- runtime check fails, program aborts
}

void bar() {
    int arr[5];
    int* p = arr;
    int* q = p + 1;
    foo(p);
}

would make p store the address of the first element along with 0 and 5
as an index pair, and q the address of the second array element with an
index pair -1 and 4.

Now, I'm not aware of any such implementation and I'm not claiming
this to be a good idea. But IIRC James Kanze hinted at something like
this and claimed it to be conforming to the C++ ISO standard.
I could not find a guarantee of continuity of the layout in the
standard to back up that claim and, to my untrained eye, it carefully
steps around the issue.

Well, it would surprise me if sizeof(T[N]) == N*sizeof(T) was not
satisfied. But I am not aware of any such guarantee, either.
[...]
Which tends to hint that E1 is however converted into a pointer and no
special treatment is performed because E1 is of array type. In this
case, I expect normal pointer arithmetic applies and a negative
integral part of the subscript operator is acceptable provided you
stay within the underlying object.

As I said, IIRC James Kanze claimed this to be "formally U.B.".

Cheers!
SG
 
