Decreasing order of address within main


Dik T. Winter

> Morris Keesan wrote: .... ....
> Using a relative comparison operator on pointers that do not point into
> the same object is UB.

Read again. There is no comparison of pointers.
 

Keith Thompson

Kenneth Brody said:
I would have to agree that there is no UB. However, I would also have
to say that the result of comparing aptr to bptr is "meaningless".

First, you are using the signed "intptr_t", meaning that b could be in
"higher" memory, yet bptr be "lower" because aptr is positive and bptr
is negative.

Ignoring that, and assuming you changed to uintptr_t,
[...]

Why do you assume that intptr_t provides a less meaningful mapping of
addresses than uintptr_t?

Imagine a system where addresses are treated as signed. You could
even have an object covering a range of addresses from, say, -10
to +10. (A null pointer would have to have a representation other
than all-bits-zero.)

I think most systems treat addresses as unsigned (assuming that
they're numerical at all), but I wouldn't be surprised if some treat
them as signed. On the other hand, some may just avoid having
anything cross the mid-range boundary, so addresses can be considered
either signed or unsigned.
 

Dik T. Winter

> Imagine a system where addresses are treated as signed. You could
> even have an object covering a range of addresses from, say, -10
> to +10. (A null pointer would have to have a representation other
> than all-bits-zero.)

Isn't the ARM a machine where some addresses were thought to be negative?
 

Nobody

Isn't the ARM a machine where some addresses were thought to be negative?

At the CPU level, data is neither signed nor unsigned. It's typically the
operations which treat their operands as signed or unsigned.

With two's complement, addition, subtraction and multiplication (but not
division) behave identically for signed or unsigned values. The main
difference is in comparisons.

A signed comparison subtracts two values then checks whether the overflow
flag is set, while an unsigned comparison would check the carry flag
instead.

Apart from division, the only common instruction which has signed and
unsigned variants is a right shift. An arithmetic (signed) right shift
duplicates the topmost bit (i.e. the sign bit) while a logical (unsigned)
shift fills with zeros.
 

Keith Thompson

Nobody said:
At the CPU level, data is neither signed nor unsigned. It's typically the
operations which treat their operands as signed or unsigned.

With two's complement, addition, subtraction and multiplication (but not
division) behave identically for signed or unsigned values. The main
difference is in comparisons.

A signed comparison subtracts two values then checks whether the overflow
flag is set, while an unsigned comparison would check the carry flag
instead.

Apart from division, the only common instruction which has signed and
unsigned variants is a right shift. An arithmetic (signed) right shift
duplicates the topmost bit (i.e. the sign bit) while a logical (unsigned)
shift fills with zeros.

Ok, but the issue is addresses.

Suppose a machine has, say, an auto-increment addressing mode (an
idea that goes back at least to the PDP-11), which is useful for
stepping through arrays. Thus something like:

*ptr++ = 0;

might be a single instruction. Assuming for concreteness and
simplicity that addresses are 16 bits, what happens on the machine
level when ptr==0x7FFF? What happens when ptr==0xFFFF? Can a
single object cover a range of addresses that includes 0x7FFF and
0x8000? What about 0xFFFF and 0x0000 (or, equivalently, -1 and 0)?
What instructions are used to compare addresses?

I don't know the answers to any of those questions for any specific
architecture, but certain sets of answers would imply that addresses
are signed, and certain other sets of answers would imply that
they're unsigned.

And yet other sets of answers might imply that the answer is
indeterminate; either signed or unsigned comparison could work
equally well if no object can span certain address boundaries.

This is approaching the edge of clc topicality, if it hasn't
already crossed it.
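For concreteness, the C-level idiom being discussed; a trivial sketch, with
no guarantee that any particular compiler turns the store-and-advance into a
single auto-increment instruction:

    #include <stddef.h>

    /* Zero a buffer by stepping a pointer through it. */
    void zero_bytes(unsigned char *ptr, size_t n)
    {
        unsigned char *end = ptr + n;  /* one past the end: valid to form and compare */

        while (ptr < end)
            *ptr++ = 0;                /* store, then advance: the "*ptr++ = 0" idiom */
    }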
 

Ben Pfaff

Keith Thompson said:
Imagine a system where addresses are treated as signed. You could
even have an object covering a range of addresses from, say, -10
to +10. (A null pointer would have to have a representation other
than all-bits-zero.)

x86-64 treats addresses as signed numbers. Usually, user
processes occupy positive addresses and the kernel occupies
negative addresses. I don't think that objects are allowed to
cross 0.
 

Phil Carmody

Nobody said:
At the CPU level, data is neither signed nor unsigned. It's typically the
operations which treat their operands as signed or unsigned.

With two's complement, addition, subtraction and multiplication (but not
division) behave identically for signed or unsigned values.

Full- (or double-, depending on your PoV) width multiplies are different
too. ff*ff = 0001 or fe01.
The main difference is in comparisons.

A signed comparison subtracts two values then checks whether the overflow
flag is set, while an unsigned comparison would check the carry flag
instead.

Apart from division, the only common instruction which has signed and
unsigned variants is a right shift.

And multiply.

Phil
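Phil's ff*ff example, spelled out in C with an explicitly widened result; a
small sketch assuming 8-bit bytes and two's complement:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t ua = 0xFF, ub = 0xFF;
        int8_t  sa = -1,   sb = -1;   /* same 0xFF bit pattern, read as signed */

        /* Widen before multiplying so the full 16-bit product is visible. */
        uint16_t uprod = (uint16_t)((unsigned)ua * ub);   /* 0xFE01 */
        int16_t  sprod = (int16_t)((int)sa * sb);         /* 0x0001 */

        printf("unsigned ff*ff = 0x%04X\n", (unsigned)uprod);
        printf("signed   ff*ff = 0x%04X\n", (unsigned)(uint16_t)sprod);

        return 0;
    }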
 

Phil Carmody

Eric Sosman said:
<topicality level="minimal">

Long ago I used a machine that treated all its CPU registers
as signed magnitude numbers, and did arithmetic accordingly.
Addresses were notionally unsigned; the machine just grabbed the
right number of low-order bits from the appropriate register and
ignored the rest, including the sign bit.

The fun part was that "all CPU registers" included the program
counter, and that "increment" meant "add one." I wasted a fair
amount of time trying to concoct a sequence of instructions that
would execute normally until encountering one that set the PC's sign
bit, then run again in reverse as the PC "incremented" to successively
lower addresses ...

</topicality>

If all low-topicality stuff was as fun as that, I'd be campaigning
for less topicality!

If C were gcc, you'd actually be bang on topic, with the perfect
counter-example for the not-Frequently-Asked-purely-as-C-isn't-gcc
Question which would no doubt appear! (Explanation in headers.)

Phil
 

Morris Keesan

the output might be

":red-segment: :blue-segment: :beige-segment:"

Indeed. But there's no "undefined behavior" there (as defined by
the C standard: "behavior, upon use of a nonportable or erroneous
program construct or of erroneous data, for which this International
Standard imposes no requirements". The standard clearly requires
the values of the three pointers to be converted "to a sequence of
printing characters, in an implementation-defined manner."

No, you're causing implementation-defined behavior, which is a totally
different thing.
You're comparing pointers to different objects, which is UB.
I like the idea that my mind can exhibit undefined behaviour...
Have I reformatted my hard drive just by thinking about this stuff?
:)

In my quoted code, there's no pointer comparison. The code below
compares two integers.
Hmm, well, that's unspecified behaviour. Though we know aptr
and bptr will end up with valid integers.


Interesting. The compiler would have to remember that they had been
pointers.

No. The compiler doesn't have to remember anything. The compiler
just has to generate code which converts the pointers to integers
(in the platforms I've used, where there is a suitable integer type,
this has always been a direct bit-for-bit copy), and then generate
code which compares those two integers.

My point, in this not-very-useful code sample, was not to suggest any
portable or particularly meaningful code. I was simply arguing with the
claim that it's undefined behavior "to even _try to find_ this
information."
(i.e. the relative addresses of unrelated variables).
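The code being argued about isn't reproduced in this excerpt; its shape was
presumably along these lines (a hypothetical reconstruction using the
aptr/bptr names from the discussion, and assuming the optional intptr_t type
exists):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int a, b;

        /* The conversions are implementation-defined (often a bit-for-bit
           copy in practice); no pointers are compared, only integers. */
        intptr_t aptr = (intptr_t)&a;
        intptr_t bptr = (intptr_t)&b;

        if (aptr < bptr)
            printf("a converted to the smaller integer\n");
        else
            printf("b converted to the smaller integer\n");

        return 0;
    }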
 

Nobody

Ok, but the issue is addresses.

Suppose a machine has, say, an auto-increment addressing mode (an
idea that goes back at least to the PDP-11), which is useful for
stepping through arrays. Thus something like:

*ptr++ = 0;

might be a single instruction. Assuming for concreteness and
simplicity that addresses are 16 bits, what happens on the machine
level when ptr==0x7FFF? What happens when ptr==0xFFFF?

"comparison between pointer and integer" == UB ;)

Seriously, the first case will result in ptr==0x8000 (or, if you prefer,
ptr==-0x8000; they're the same thing as far as the CPU is concerned),
while the second case will result in ptr==0x0000.

Simply using the representation "0xFFFF" for a 16-bit value is treating
the values as unsigned. A 16-bit signed integer cannot be 0xFFFF; that
bit pattern would be called -0x0001.
Can a
single object cover a range of addresses that includes 0x7FFF and
0x8000? What about 0xFFFF and 0x0000 (or, equivalently, -1 and 0)?
What instructions are used to compare addresses?

Whichever ones the compiler writer decides to use. C only defines pointer
comparison for elements of a common array. At the CPU level, you have the
same options for comparing pointers as for comparing anything else.

I can't think of a situation where the CPU considers addresses as either
"signed" or "unsigned"; they are just "words".

At the C level, on any platform with two's-complement arithmetic, most
integer operations use the same machine instructions regardless of whether
the values are signed or unsigned. The signedness only becomes relevant
for division, right shift, and comparisons.

At the machine level, there is no signed or unsigned, just words.
 

Nobody

Full- (or double-, depending on your PoV) width multiplies are different
too. ff*ff = 0001 or fe01.

My PoV is "double-".

In C, int * int -> int, long * long -> long, and so on.

Once the types have been promoted, it makes no difference as to their
signedness. OTOH, the promotion is affected by the signedness.
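A small sketch of that promotion point, assuming 8-bit char and a wider int:

    #include <stdio.h>

    int main(void)
    {
        signed char   sc = -1;    /* bit pattern 0xFF */
        unsigned char uc = 0xFF;  /* same bit pattern */

        /* Both operands are promoted to int before the arithmetic, but the
           promotion itself depends on signedness: sign extension for the
           signed char, zero extension for the unsigned char. */
        printf("sc * 2 = %d\n", sc * 2);   /* -2  */
        printf("uc * 2 = %d\n", uc * 2);   /* 510 */

        return 0;
    }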
And multiply.

True for x86's double-width multiply, but how many architectures have that
feature?
 

Nobody

At the CPU level, data is neither signed nor unsigned. It's typically the
operations which treat their operands as signed or unsigned.

Since posting this, it has occurred to me that there's one case where a
CPU might treat values as signed: if the PC (IP) doesn't use a full word,
it's possible that operations which copy the PC as data might use the
topmost valid bit to fill the unused bit (i.e. sign extension).

I don't think that the ARM does this, though.
 

Keith Thompson

Nobody said:
I can't think of a situation where the CPU considers addresses as either
"signed" or "unsigned"; they are just "words".
[...]

Assuming, as before, 16-bit addresses, if a single 32-byte object
can cover the range of addresses from 0x7FF0 to 0x800F, then
addresses are being treated as unsigned. Similarly, if a single
32-byte object can cover the range of addresses from -16 to +15,
then addresses are being treated as signed (and a null pointer
is not all-bits-zero). If both are possible then it's a rather
odd architecture. If neither situation can occur, then it probably
doesn't matter whether addresses are considered signed or unsigned.
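That 16-bit example, made concrete; a sketch using uint16_t/int16_t to stand
in for addresses read as unsigned or signed (the out-of-range signed
conversion below is itself implementation-defined):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Two byte addresses inside a 32-byte object spanning 0x7FF0..0x800F. */
        uint16_t first_u = 0x7FF0, last_u = 0x800F;
        int16_t  first_s = (int16_t)0x7FF0;            /*  32752                      */
        int16_t  last_s  = (int16_t)(uint16_t)0x800F;  /* typically -32753            */

        /* Read as unsigned, the object's first byte compares below its last... */
        printf("unsigned: 0x7FF0 < 0x800F ? %d\n", first_u < last_u);  /* 1 */

        /* ...read as signed, it does not, because 0x800F looks negative. */
        printf("signed:   0x7FF0 < 0x800F ? %d\n", first_s < last_s);  /* 0 */

        return 0;
    }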
 

Phil Carmody

[UNSNIP - "At the CPU level ... "]
My PoV is "double-".

In C

Nah, doesn't wash. We were at the CPU level, if you remember.
, int * int -> int, long * long -> long, and so on.

Once the types have been promoted, it makes no difference as to their
signedness. OTOH, the promotion is affected by the signedness.


True for x86's double-width multiply, but how many architectures have that
feature?

Well, the first processor I used that had a multiply instruction had both
signed and unsigned. The architecture I've used the most since then also
had this pair. The two other architectures I've used extensively in that
time also have them. Of the two other architectures I've used but not
extensively programmed for, one didn't have a multiply at all, the other
had both types. Only one architecture I've used that has a multiply
instruction at all fails to have the pair.

So that's 5/6 in my experience (plus 3 architectures without a multiply
at all).

Phil
 

Nobody

Nobody said:
I can't think of a situation where the CPU considers addresses as either
"signed" or "unsigned"; they are just "words".
[...]

Assuming, as before, 16-bit addresses, if a single 32-byte object
can cover the range of addresses from 0x7FF0 to 0x800F, then
addresses are being treated as unsigned. Similarly, if a single
32-byte object can cover the range of addresses from -16 to +15
,then addresses are being treated as signed

No, it just means that they wrap.
(and a null pointer is not all-bits-zero).

"null pointer" is a C concept; it doesn't mean anything to the CPU.
If both are possible then it's a rather odd architecture.

Actually, I think that most 16-bit CPUs will happily read a 16-bit value
from both 0x7FFF-0x8000 and 0xFFFF-0x0000. Most of them don't have any
alignment constraints and don't care about addresses wrapping.

The fact that all 2^16 addresses are valid doesn't preclude using 0x0000
(or any other value) as the null pointer. The implementation just needs to
ensure that it doesn't use that address for any allocation; as
dereferencing a null-pointer is UB, it doesn't have to explicitly check
for such.
 

Richard Bos

Mark McIntyre said:
Firstly, it's my understanding that n1256 is the final draft, not the
edited final version.

Yes, but for ordinary programmers, the differences between the two are
so small that they might as well not exist. However, it may be relevant
for legal reasons. Someone may be willing to pay money just so their
lawyers can say that they have a copy of the _official_ Standard.

Richard
 

James Kuyper

Richard said:

No. They started editing from the final officially approved C99
standard, applying all three officially approved Technical corrigenda.
... but for ordinary programmers, the differences between the two are
so small that they might as well not exist. However, it may be relevant
for legal reasons. Someone may be willing to pay money just so their
lawyers can say that they have a copy of the _official_ Standard.

To get the official standard, you need not only the C99 standard itself,
but also all three officially approved Technical Corrigenda; n1256.pdf
is less official than that set of four documents, but is a lot more
convenient for actual use (and much cheaper, too).
 

Keith Thompson

James Kuyper said:
No. They started editing from the final officially approved C99
standard, applying all three officially approved Technical corrigenda.

Whether Mark McIntyre's description is correct depends on how you
parse "edited final version". n1256 is a draft (at least that's what
it calls itself); it is not a final version.
To get the official standard, you need not only the C99 standard
itself, but also all three officially approved Technical Corrigenda;
n1256.pdf is less official than that set of four documents, but is a
lot more convenient for actual use (and much cheaper, too).

Note also that the three TCs are available at no charge from ansi.org.
 

Stephen Sprunk

Ben said:
x86-64 treats addresses as signed numbers. Usually, user
processes occupy positive addresses and the kernel occupies
negative addresses. I don't think that objects are allowed to
cross 0.

They're not, but not for exactly that reason. No object can occupy the
first page of memory (0 to +4095), to trap null pointer dereferences;
additionally, no object can exist in both user space and kernel space,
which also covers the wrap from positive to negative.

Most x86 systems could be viewed as having signed pointers as well, with
the same division and rules. Notable exceptions are a special mode in
Win32 that allows the user/kernel division to be 3GB/1GB and one Red Hat
Linux variant that makes it 4GB+4GB (minus some trampolines and bounce
buffers). Those oddities have mostly gone away, though, now that people
can simply use x64 and get as much space as they need.

S
 

Stephen Sprunk

Richard said:
GCC has a feature that tracks whether it's possible for a pointer to be
null; if you dereference a pointer, GCC then sets the "notnull"
attribute on it and any future checks for a null pointer are optimized
away.
[...]
I assume that this optimization is to remove redundant tests/branches
and therefore improve performance; presumably it wouldn't be there if it
didn't help in at least some cases.

As I've said before, I wish it would tell you when it's doing
this, as it traditionally has with simpler optimisations such as
always-true comparisons. Being able to remove a chunk of code
can be a sign of a mistake by the programmer, and just removing
it often makes the results of the error even more obscure.

One counter-example comes to mind:

inline void foo(void *x) {
    if (!x) return;
    /* do something that dereferences x */
}

void bar(void *x) {
    if (!x) return;
    /* do something that dereferences x */
    foo(x);
    /* do something more that dereferences x */
}

This is not a bug; foo() needs to be protected against dumb callers, of
which there might be many. However, I would expect that foo()'s test be
optimized away when it was inlined in smart callers, e.g. bar(), because
it's redundant. I would be annoyed if I saw a warning for that.

S
 
