Decreasing order of address within main

Dik T. Winter

>
> At the CPU level, data is neither signed nor unsigned. It's typically the
> operations which treat their operands as signed or unsigned.

Right. But if I remember right, on the ARM, memory always started at an
address 'below' 0 and continued to an address 'above' 0.
 
Chris Dollin

Dik said:
Right. But if I remember right, on the ARM, memory always started at an
address 'below' 0 and continued to an address 'above' 0.

I think that's the Transputer, not the ARM.
 
Tim Rentsch

Keith Thompson said:
Nobody said:
I can't think of a situation where the CPU considers addresses as either
"signed" or "unsigned"; they are just "words".
[...]

Assuming, as before, 16-bit addresses, if a single 32-byte object
can cover the range of addresses from 0x7FF0 to 0x800F, then
addresses are being treated as unsigned. Similarly, if a single
32-byte object can cover the range of addresses from -16 to +15,
then addresses are being treated as signed (and a null pointer
is not all-bits-zero). If both are possible then it's a rather
odd architecture. ["If neither" snipped]

ISTM that the "signed-ness" of pointers is determined by how
comparisons work. If (char*)0x7FF0 > (char*)0x800F (and assuming
that the conversion just changes the type and not any of the
bits), then we'd probably call those pointers "signed";
similarly (char*)0xFFF0 > (char*)0x0010 would mean "unsigned".

/However/, it's easy to get both indications in a larger address
space. Suppose we have pointers, ints and unsigned ints all
having 32 bits[*], and pointer comparison is defined

p1 <= p2 IFF (unsigned) p2 - (unsigned) p1 < 0x80000000

This definition of comparison allows objects to be more than two
billion bytes, and they can straddle both 0x0 and 0x80000000
(well, not both at once, but either one by different objects).
Such an architecture would have neither "signed" nor "unsigned"
pointers; the address space is homogeneous and isotropic,
as my physics friends used to say.

[*] In this mythical architecture, CHAR_BIT is 11 bits,
pointers, ints and unsigned ints all have size 3 (with
ints and unsigned ints having one padding bit),
the type (uintptr_t) is (unsigned long), which has a width
of 33 bits, and a null pointer has the bit set which is
a padding bit in int/unsigned int.
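
A minimal sketch of that comparison rule in portable C, with uint32_t
standing in for the mythical architecture's pointer bits (the helper
name le_windowed is invented for illustration):

#include <stdint.h>
#include <stdio.h>

/* "p1 <= p2" under the rule above: the forward distance from p1 to p2,
   taken modulo 2^32, must be less than half the address space. */
static int le_windowed(uint32_t p1, uint32_t p2)
{
    return (uint32_t)(p2 - p1) < 0x80000000u;
}

int main(void)
{
    printf("%d\n", le_windowed(0x7FFFFFF0u, 0x8000000Fu)); /* 1: an object may straddle 0x80000000 */
    printf("%d\n", le_windowed(0xFFFFFFF0u, 0x0000000Fu)); /* 1: or straddle 0x0 */
    printf("%d\n", le_windowed(0x8000000Fu, 0x7FFFFFF0u)); /* 0: the reverse comparison fails */
    return 0;
}

Under this rule neither "end" of the address space is special, which is
what makes the space homogeneous in the sense described above.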
 
Tim Rentsch

Phil Carmody said:
[UNSNIP - "At the CPU level ... "]
My PoV is "double-".

In C

Nah, doesn't wash. We were at the CPU level, if you remember.
, int * int -> int, long * long -> long, and so on.

Once the types have been promoted, it makes no difference as to their
signedness. OTOH, the promotion is affected by the signedness.


True for x86's double-width multiply, but how many architectures have that
feature?

Well, the first processor I used that had a multiply instruction had both
signed and unsigned. The architecture I've used the most since then also
had this pair. The two other architectures I've used extensively in that
time also have them. Of the two other architectures I've used but not
extensively programmed for, one didn't have a multiply at all, the other
had both types. Only one architecture I've used that has a multiply
instruction at all fails to have the pair. [snip summary]

The point is that to supply C semantics the architecture
needs to provide only one multiply instruction (if 2's
complement representation is used). Presumably the
architectures you mentioned either weren't using
2's complement, or had separate instructions to
set overflow/carry flags differently (or for some
other multiple-precision arithmetic capability).
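
A small sketch of that point (assuming 32-bit int and two's complement,
and using names of my own): the low-order bits of the signed and the
unsigned product are identical, so one truncating multiply serves both.

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int32_t  sa = -3, sb = 7;
    uint32_t ua = (uint32_t)sa, ub = (uint32_t)sb;   /* same bit patterns */

    uint32_t sprod = (uint32_t)(sa * sb);   /* low 32 bits of the signed product */
    uint32_t uprod = ua * ub;               /* low 32 bits of the unsigned product */

    printf("%08" PRIX32 " %08" PRIX32 "\n", sprod, uprod);   /* both FFFFFFEB */
    return 0;
}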
 
Phil Carmody

Tim Rentsch said:
Keith Thompson said:
Nobody said:
I can't think of a situation where the CPU considers addresses as either
"signed" or "unsigned"; they are just "words".
[...]

Assuming, as before, 16-bit addresses, if a single 32-byte object
can cover the range of addresses from 0x7FF0 to 0x800F, then
addresses are being treated as unsigned. Similarly, if a single
32-byte object can cover the range of addresses from -16 to +15,
then addresses are being treated as signed (and a null pointer
is not all-bits-zero). If both are possible then it's a rather
odd architecture. ["If neither" snipped]

ISTM that the "signed-ness" of pointers is determined by how
comparisons work. If (char*)0x7FF0 > (char*)0x800F (and assuming
that the conversion just changes the type and not any of the
bits), then we'd probably call those pointers "signed";
similarly (char*)0xFFF0 > (char*)0x0010 would mean "unsigned".

I seem to remember that the standard does use the word "overflow"
regarding pointers without defining precisely what it means.
That might complicate matters, as one might view either of the
looks-like-a-wrap-but-isn't as an overflow, and the C standard can't
counter you.
/However/, it's easy to get both indications in a larger address
space. Suppose we have pointers, ints and unsigned ints all
having 32 bits[*], and pointer comparison is defined

p1 <= p2 IFF (unsigned) p2 - (unsigned) p1 < 0x80000000

This definition of comparison allows objects to be more than two
billion bytes, and they can straddle both 0x0 and 0x80000000
(well, not both at once, but either one by different objects).
Such an architecture would have neither "signed" nor "unsigned"
pointers; the address space is homogeneous and isotropic,
as my physics friends used to say.

It's not so far from imaginable.

Phil
 
Phil Carmody

Tim Rentsch said:
Phil Carmody said:
[UNSNIP - "At the CPU level ... "]
With two's complement, addition, subtraction and multiplication (but not
division) behave identically for signed or unsigned values.

Full- (or double-, depending on your PoV) width multiplies are different
too. ff*ff = 0001 or fe01.

My PoV is "double-".

In C

Nah, doesn't wash. We were at the CPU level, if you remember.
, int * int -> int, long * long -> long, and so on.

Once the types have been promoted, it makes no difference as to their
signedness. OTOH, the promotion is affected by the signedness.

Apart from division, the only common instruction which has signed and
unsigned variants is a right shift.

And multiply.

True for x86's double-width multiply, but how many architectures have that
feature?

Well, the first processor I used that had a multiply instruction had both
signed and unsigned. The architecture I've used the most since then also
had this pair. The two other architectures I've used extensively in that
time also have them. Of the two other architectures I've used but not
extensively programmed for, one didn't have a multiply at all, the other
had both types. Only one architecture I've used that has a multiply
instruction at all fails to have the pair. [snip summary]

The point is that to supply C semantics the architecture
needs to provide only one multiply instruction (if 2's
complement representation is used). Presumably the
architectures you mentioned either weren't using
2's complement, or had separate instructions to
set overflow/carry flags differently (or for some
other multiple-precision arithmetic capability).

Close. Your presumptions are true for 4 of the 5 archs.

Phil
 
Keith Thompson

Tim Rentsch said:
Keith Thompson said:
Nobody said:
I can't think of a situation where the CPU considers addresses as either
"signed" or "unsigned"; they are just "words".
[...]

Assuming, as before, 16-bit addresses, if a single 32-byte object
can cover the range of addresses from 0x7FF0 to 0x800F, then
addresses are being treated as unsigned. Similarly, if a single
32-byte object can cover the range of addresses from -16 to +15,
then addresses are being treated as signed (and a null pointer
is not all-bits-zero). If both are possible then it's a rather
odd architecture. ["If neither" snipped]

ISTM that the "signed-ness" of pointers is determined by how
comparisons work. If (char*)0x7FF0 > (char*)0x800F (and assuming
that the conversion just changes the type and not any of the
bits), then we'd probably call those pointers "signed";
similarly (char*)0xFFF0 > (char*)0x0010 would mean "unsigned".

I tend to agree, but strictly speaking it might still not be
meaningful.

If no object can span certain address boundaries, then no relational
operator (< <= > >=) on pointers *with defined behavior* can be
affected by whether the comparison is done using a signed or an
unsigned integer comparison. A conforming C implementation could
randomly choose one or the other.

The following program:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int main(void)
{
    void *p0;
    void *p1;
    if (sizeof (void*) == sizeof (int)) {
        p0 = (void*)INT_MAX;
        p1 = (void*)INT_MIN;
    }
    else if (sizeof (void*) == sizeof (long)) {
        p0 = (void*)LONG_MAX;
        p1 = (void*)LONG_MIN;
    }
    else {
        fputs("Neither int nor long is the same size as void*\n", stderr);
        exit(EXIT_FAILURE);
    }
    printf("p0 = %p\n", p0);
    printf("p1 = %p\n", p1);
    if (p0 < p1)  puts("p0 < p1");
    if (p0 <= p1) puts("p0 <= p1");
    if (p0 == p1) puts("p0 == p1");
    if (p0 != p1) puts("p0 != p1");
    if (p0 >= p1) puts("p0 >= p1");
    if (p0 > p1)  puts("p0 > p1");
    return 0;
}

exhibits about 42 metric tons of undefined behavior, but its output,
if any, might be moderately interesting. (On my system it implies
that the compiler treats addresses as unsigned.)

Of course the whole idea of addresses being either signed or unsigned
is completely unsupported by the C standard.
 
Tim Rentsch

Phil Carmody said:
Not sure how that follows from the quoted paragraph.


Any sufficiently aggressive optimiser won't even bother setting
any variable to such an undefined value. Nor will it set any
future values dependent on that variable.

Nonsense.

Any sufficiently aggressive optimizer can choose to make such
optimizations /if/ they're consistent with other decisions made
by the implementation, but not every implementation will make
decisions that allow such optimizations. Just because the
Standard declares a certain behavior as undefined doesn't mean an
implementation will choose not to define it. The optimizer is
subject to the whims of the implementation, not the other way
around.
 
Tim Rentsch

Nobody said:
Consider the following code (paraphrasing a bug which was recently
discovered in the linux kernel):

1 int x = p->x;
2 if (!p) return;
...

Some versions of gcc are sufficiently aggressive that they optimise line
2 out of existence.

The rationale is that because p->x had already been evaluated at line 1,
p being null leads to undefined behaviour. If p is not null, the return on
line 2 won't occur, but if p is null, the compiler is free to return, or
not return, or do whatever else it feels like doing.

Another choice, equally valid and for some purposes better, would
be to re-write these two statements as

if(!p) return;
int x = p->x;

In this particular case it seems better to steer in the direction
of greater safety (as this second form of re-writing does).
Personally, I'd rather get a warning than either "optimization".
But if I have to choose between one of the above approaches,
normally I'd choose Dr. Jekyll over Mr. Hyde.
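
A compilable sketch of the two rewrites, with an invented struct and
invented function names (the Jekyll/Hyde labels follow the remark above):

struct thing { int x; };            /* stand-in for the kernel structure */

int hyde(struct thing *p)           /* dereference first: gcc may delete the test */
{
    int x = p->x;
    if (!p) return -1;
    return x;
}

int jekyll(struct thing *p)         /* test first: no undefined behaviour */
{
    if (!p) return -1;
    return p->x;
}

For what it's worth, the kernel's eventual workaround for this class of
problem was to build with GCC's -fno-delete-null-pointer-checks option,
which tells the compiler not to assume that a dereferenced pointer must
be non-null.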
 
Tim Rentsch

Stephen Sprunk said:
GCC has a feature that tracks whether it's possible for a pointer to be
null; if you dereference a pointer, GCC then sets the "notnull"
attribute on it and any future checks for a null pointer are optimized
away.
[...]
I assume that this optimization is to remove redundant tests/branches
and therefore improve performance; presumably it wouldn't be there if it
didn't help in at least some cases.

As I've said before, I wish it would tell you when it's doing
this, as it traditionally has with simpler optimisations such as
always-true comparisons. Being able to remove a chunk of code
can be a sign of a mistake by the programmer, and just removing
it often makes the results of the error even more obscure.

I second this motion, at least that there ought to be
a flag to set to ask for the warning. And enabling
the optimization should set the warning flag ON by
default.
 
Keith Thompson

Tim Rentsch said:
Another choice, equally valid and for some purposes better, would
be to re-write these two statements as

if(!p) return;
int x = p->x;
[...]

I'm not sure whether the Linux kernel is normally compiled with
options that cause gcc to permit mixed declarations and statements.
If not, it would have to be:

int x;
if (!p) return;
x = p->x;
 
Tim Rentsch

Stephen Sprunk said:
Richard said:
GCC has a feature that tracks whether it's possible for a pointer to be
null; if you dereference a pointer, GCC then sets the "notnull"
attribute on it and any future checks for a null pointer are optimized
away.
[...]
I assume that this optimization is to remove redundant tests/branches
and therefore improve performance; presumably it wouldn't be there if it
didn't help in at least some cases.

As I've said before, I wish it would tell you when it's doing
this, as it traditionally has with simpler optimisations such as
always-true comparisons. Being able to remove a chunk of code
can be a sign of a mistake by the programmer, and just removing
it often makes the results of the error even more obscure.

One counter-example comes to mind:

inline void foo(void *x) {
    if (!x) return;
    /* do something that dereferences x */
}

void bar(void *x) {
    if (!x) return;
    /* do something that dereferences x */
    foo(x);
    /* do something more that dereferences x */
}

This is not a bug; foo() needs to be protected against dumb callers, of
which there might be many. However, I would expect that foo()'s test be
optimized away when it was inlined in smart callers, e.g. bar(), because
it's redundant. I would be annoyed if I saw a warning for that.

Ahh, that's a good example.

Surely though any compiler smart enough to do the optimization in
the first place can be made smart enough to distinguish between
the annoying cases and the non-annoying cases. Also, if the
compiler is smart enough to take out the later test, it ought
to be able to transform

int x = p->x;
if(!p) return;

into

ASSERT(p!=0);
int x = p->x;

(for some appropriate form of ASSERT), and provide that
as an option for situations when subsequent code blocks
would be removed.
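
A rough sketch of "some appropriate form of ASSERT" -- the macro name
and its abort-on-failure behaviour are just one plausible choice, not
something any compiler actually provides:

#include <stdlib.h>

/* Trap immediately rather than silently removing the later null test. */
#define ASSERT(cond)  ((cond) ? (void)0 : abort())

With that definition, the transformed code fails fast at the point of
the invalid dereference instead of sailing past a check the optimizer
has discarded.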
 
Nobody

Apart from division, the only common instruction which has signed and
unsigned variants is a right shift.

And multiply.

True for x86's double-width multiply, but how many architectures have that
feature?

Well, the first processor I used that had a multiply instruction had both
signed and unsigned. The architecture I've used the most since then also
had this pair. The two other architectures I've used extensively in that
time also have them. Of the two other architectures I've used but not
extensively programmed for, one didn't have a multiply at all, the other
had both types. Only one architecture I've used that has a multiply
instruction at all fails to have the pair. [snip summary]

The point is that to supply C semantics the architecture
needs to provide only one multiply instruction (if 2's
complement representation is used). Presumably the
architectures you mentioned either weren't using
2's complement, or had separate instructions to
set overflow/carry flags differently (or for some
other multiple-precision arithmetic capability).

The point is that the CPU can provide features which go beyond C
semantics, e.g. double-width multiply (i.e. multiplying 2 32-bit values
produces a 64-bit value).

If the CPU uses two's complement and only offers a C-style multiply
(where the result is truncated to the width of the operands) there is no
difference between signed and unsigned.
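
The difference is easy to see in C by widening before multiplying; a
small sketch of my own, using the ff*ff example from earlier in the
thread:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint8_t byte = 0xFF;

    /* Widen both interpretations of the same bit pattern before multiplying.
       (The conversion to int8_t is formally implementation-defined; it
       yields -1 on the usual two's-complement targets.) */
    unsigned up = (unsigned)byte * (unsigned)byte;        /* 255 * 255   */
    int      sp = (int)(int8_t)byte * (int)(int8_t)byte;  /* (-1) * (-1) */

    printf("unsigned widening: %04X\n", up);              /* FE01 */
    printf("signed   widening: %04X\n", (unsigned)sp);    /* 0001 */
    return 0;
}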
 
Phil Carmody

Tim Rentsch said:
Nonsense.

I realised later that that's an introit to warn me about what follows.
Any sufficiently aggressive optimizer can choose to make such
optimizations /if/ they're consistent with other decisions made
by the implementation, but not every implementation will make
decisions that allow such optimizations.

So I say it can conditionally do something (on condition that it's
aggressive enough), and you say it can conditionally do something
(on condition that it knows what it's doing). Right.
Just because the
Standard declares a certain behavior as undefined doesn't mean an
implementation will choose not to define it.

Then it's not sufficiently aggressive. And the direction of your logic
is entirely unsuitable for addressing the point I raise.
The optimizer is
subject to the whims of the implementation, not the other way
around.

The optimiser is part of the implementation.

Phil
 
Phil Carmody

Keith Thompson said:
I'm not sure whether the Linux kernel is normally compiled with
options that cause gcc to permit mixed declarations and statements.
If not, it would have to be:

int x;
if (!p) return;
x = p->x;

It gibbers about C90, and then continues to compile it. Given that
there are _tons_ of non-C90 things in the linux kernel that it
doesn't complain about, I've never understood it singling that one
out for the combination of warning and ignoring.

Phil
 
Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
Another choice, equally valid and for some purposes better, would
be to re-write these two statements as

if(!p) return;
int x = p->x;
[...]

I'm not sure whether the Linux kernel is normally compiled with
options that cause gcc to permit mixed declarations and statements.
If not, it would have to be:

int x;
if (!p) return;
x = p->x;

Sorry, my meaning wasn't clear enough. I wasn't talking
about whether the source code is written in C99, only trying
to indicate the transformation that could be applied. That
transformation is easier to express in C99 than C90, but I
wasn't talking about changing the actual program source --
only about what the revised semantics (under the proposed
"optimization") would be.
 
Tim Rentsch

Phil Carmody said:
I realised later that that's an introit to warn me about what follows.


So I say it can conditionally do something (on condition that it's
aggressive enough), and you say it can conditionally do something
(on condition that it knows what it's doing). Right.


Then it's not sufficiently aggressive. And the direction of your logic
is entirely unsuitable for addressing the point I raise.

Based on the first sentence in that paragraph, it sounds like
you're saying that any optimizer that's evolved to the point that
it makes such optimizations will make such optimizations. Are
you doing anything more than giving an implicit definition for
"sufficiently aggressive"? Or does "sufficiently agressive"
have an independent meaning so that your statement isn't just
a tautology? If so, what is it? Or do you really mean "any
sufficiently aggressive /implementation/ will ...."?

The optimiser is part of the implementation.

First, it's perfectly possible to write optimizers
that are independent of any particular implementation,
as I'm sure you must know.

Second, even ignoring that, your comment doesn't invalidate
my point. An implementation is defined by the choices
it makes about which representations to use, how various
behaviors are defined, etc. These choices constitute
the "interface" of the implementation". The optimizer
is part of the "implementation" of the implementation.
The optimizer used in an implementation is limited by the
choices made in defining how the implementation behaves;
otherwise, its optimizations aren't valid for that
implementation.
 
Tim Rentsch

Phil Carmody said:
Tim Rentsch said:
Keith Thompson said:
[...]
I can't think of a situation where the CPU considers addresses as either
"signed" or "unsigned"; they are just "words".
[...]

Assuming, as before, 16-bit addresses, if a single 32-byte object
can cover the range of addresses from 0x7FF0 to 0x800F, then
addresses are being treated as unsigned. Similarly, if a single
32-byte object can cover the range of addresses from -16 to +15,
then addresses are being treated as signed (and a null pointer
is not all-bits-zero). If both are possible then it's a rather
odd architecture. ["If neither" snipped]

ISTM that the "signed-ness" of pointers is determined by how
comparisons work. If (char*)0x7FF0 > (char*)0x800F (and assuming
that the conversion just changes the type and not any of the
bits), then we'd probably call those pointers "signed";
similarly (char*)0xFFF0 > (char*)0x0010 would mean "unsigned".

I seem to remember that the standard does use the word "overflow"
regarding pointers without defining precisely what it means.
That might complicate matters, as one might view either of the
looks-like-a-wrap-but-isn't as an overflow, and the C standard can't
counter you.

Probably what you're remembering is this sentence in 6.5.6p8:

If both the pointer operand and the result point to elements
of the same array object, or one past the last element of
the array object, the evaluation shall not produce an
overflow; otherwise, the behavior is undefined.

It's talking about adding a pointer and an integer, and
all it's saying is that adding a pointer and an integer
has to work provided the integer is within the range
of acceptable values for the pointer and array in question.
IOW overflow is meant in the "exceptional condition"
sense, not in the sense of what boundaries might be
crossed.
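
A minimal sketch of the boundary it describes (nothing here is from the
standard's text, just an illustration of 6.5.6p8):

int main(void)
{
    int a[10];
    int *p = a + 3;

    int *one_past = p + 7;   /* a + 10, one past the last element: must work */
    int *first    = p - 3;   /* &a[0]: also fine */

    /* p + 8 or p - 4 would step outside the array (and its one-past-the-end
       position), so merely evaluating them is undefined behaviour -- that is
       the "overflow" the paragraph rules out for the in-range cases. */

    (void)one_past; (void)first;
    return 0;
}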
 
Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
Keith Thompson said:
[...]
I can't think of a situation where the CPU considers addresses as either
"signed" or "unsigned"; they are just "words".
[...]

Assuming, as before, 16-bit addresses, if a single 32-byte object
can cover the range of addresses from 0x7FF0 to 0x800F, then
addresses are being treated as unsigned. Similarly, if a single
32-byte object can cover the range of addresses from -16 to +15,
then addresses are being treated as signed (and a null pointer
is not all-bits-zero). If both are possible then it's a rather
odd architecture. ["If neither" snipped]

ISTM that the "signed-ness" of pointers is determined by how
comparisons work. If (char*)0x7FF0 > (char*)0x800F (and assuming
that the conversion just changes the type and not any of the
bits), then we'd probably call those pointers "signed";
similarly (char*)0xFFF0 > (char*)0x0010 would mean "unsigned".

I tend to agree, but strictly speaking it might still not be
meaningful.

If no object can span certain address boundaries, then no relational
operator (< <= > >=) on pointers *with defined behavior* can be
affected by whether the comparison is done using a signed or an
unsigned integer comparison. A conforming C implementation could
randomly choose one or the other.

I believe in fact that your statement is true even without
the "if no object can span certain address boundaries," clause.
Talking about "signedness" of pointers normally means asking: if the
pointer bits are interpreted as integers, do pointer comparisons work
the same as signed comparisons or as unsigned comparisons?
The following program:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int main(void)
{
    void *p0;
    void *p1;
    if (sizeof (void*) == sizeof (int)) {
        p0 = (void*)INT_MAX;
        p1 = (void*)INT_MIN;
    }
    else if (sizeof (void*) == sizeof (long)) {
        p0 = (void*)LONG_MAX;
        p1 = (void*)LONG_MIN;
    }
    else {
        fputs("Neither int nor long is the same size as void*\n", stderr);
        exit(EXIT_FAILURE);
    }
    printf("p0 = %p\n", p0);
    printf("p1 = %p\n", p1);
    if (p0 < p1)  puts("p0 < p1");
    if (p0 <= p1) puts("p0 <= p1");
    if (p0 == p1) puts("p0 == p1");
    if (p0 != p1) puts("p0 != p1");
    if (p0 >= p1) puts("p0 >= p1");
    if (p0 > p1)  puts("p0 > p1");
    return 0;
}

exhibits about 42 metric tons of undefined behavior, but its output,
if any, might be moderately interesting. (On my system it implies
that the compiler treats addresses as unsigned.)

Of course, a program like this might get different results
if the pointer assignments were written

p0 = (void*)(unsigned int)INT_MAX;
p1 = (void*)(unsigned int)INT_MIN;

etc. It might be better (and adding to the gross tonnage
of undefined behavior) to use an approach like this:

p0 = *(void**)(int[1]){INT_MAX};
p1 = *(void**)(int[1]){INT_MIN};
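
Along the same lines, a sketch that does the reinterpretation with
memcpy instead of the compound-literal pun (same intent, still heavily
implementation-specific, and it assumes void* and long are the same
size):

#include <limits.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    void *p0, *p1;
    long  v0 = LONG_MAX, v1 = LONG_MIN;

    if (sizeof p0 != sizeof v0) {
        fputs("demo assumes sizeof(void*) == sizeof(long)\n", stderr);
        return 1;
    }
    memcpy(&p0, &v0, sizeof p0);   /* pointer whose bits are those of LONG_MAX */
    memcpy(&p1, &v1, sizeof p1);   /* pointer whose bits are those of LONG_MIN */

    printf("p0 %s p1\n", p0 > p1 ? ">" : "<=");   /* still undefined, but suggestive */
    return 0;
}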
 
Tim Rentsch

Phil Carmody said:
Tim Rentsch said:
Phil Carmody said:
On Sat, 26 Sep 2009 10:47:58 +0300, Phil Carmody wrote:

[UNSNIP - "At the CPU level ... "]

With two's complement, addition, subtraction and multiplication (but not
division) behave identically for signed or unsigned values.

Full- (or double-, depending on your PoV) width multiplies are different
too. ff*ff = 0001 or fe01.

My PoV is "double-".

In C

Nah, doesn't wash. We were at the CPU level, if you remember.

, int * int -> int, long * long -> long, and so on.

Once the types have been promoted, it makes no difference as to their
signedness. OTOH, the promotion is affected by the signedness.

Apart from division, the only common instruction which has signed and
unsigned variants is a right shift.

And multiply.

True for x86's double-width multiply, but how many architectures have that
feature?

Well, the first processor I used that had a multiply instruction had both
signed and unsigned. The architecture I've used the most since then also
had this pair. The two other architectures I've used extensively in that
time also have them. Of the two other architectures I've used but not
extensively programmed for, one didn't have a multiply at all, the other
had both types. Only one architecture I've used that has a multiply
instruction at all fails to have the pair. [snip summary]

The point is that to supply C semantics the architecture
needs to provide only one multiply instruction (if 2's
complement representation is used). Presumably the
architectures you mentioned either weren't using
2's complement, or had separate instructions to
set overflow/carry flags differently (or for some
other multiple-precision arithmetic capability).

Close. Your presumptions are true for 4 of the 5 archs.

I thought it was implied that I wasn't including the
case where the processor had no multiply instructions
at all.
 
