Implementation-defined behavior

amit.codename13 · May 6, 2009

Will the following code have implementation-defined behavior?

int main()
{
    int *i, j;
    i = (int *)10;
    return 0;
}

Is it certain what value is stored in i?

Antoninus Twink · May 6, 2009

int main()
{
    int *i, j;
    i = (int *)10;
    return 0;
}

Is it certain what value is stored in i?

Yes it is. (The representation of that value is not certain: it may be
little or big endian depending on the architecture the code is compiled
for.)

Pointers are interchangeable with the signed integer type intptr_t, and
/any/ integer type is guaranteed to be large enough to store 10 without
overflow.

In practice, of course, intptr_t will be either int32_t or int64_t on a
modern non-embedded implementation.

Flash Gordon · May 7, 2009

Antoninus said:
Yes it is.

<snip>

Wrong. It could be a trap and if it isn't the conversion is
implementation defined. I can't be bothered to correct all the other
errors you made.

Antoninus Twink · May 7, 2009

Wrong. It could be a trap and if it isn't the conversion is
implementation defined.

Let's break it up into an extra step then, to be clear about what's
going on.

int *ip;
intptr_t i;
i = 10;
ip = (int *) i;
assert((intptr_t) i == 10);

Would you like to name me an implementation where this code compiles and
the assertion fails when it runs?

amit.codename13 · May 7, 2009

Let's break it up into an extra step then, to be clear about what's
going on.

int *ip;
intptr_t i;
i = 10;
ip = (int *) i;
assert((intptr_t) i == 10);

Would you like to name me an implementation where this code compiles and
the assertion fails when it runs?

That's exactly what I need the answer to.

Antoninus Twink · May 7, 2009

(Unfortunately, one careless respondent has given Question B's answer
for Question A.)

[snip]

I think we are in complete agreement here, Eric. Why not leave the
stupid polemics to the Heathfield-Thomson-Falconer trolling machine?

amit.codename13 · May 8, 2009

Note that you've changed your question. You began with
"Is it certain?" and now you've switched to something more like
"Is it likely?" You should not be surprised when different
questions get different answers. (Unfortunately, one careless
respondent has given Question B's answer for Question A.)

"Is it certain" that converting 10 to an int* gives a known
value? No, it is not. Ints can be converted to pointers (and
vice versa), but everything about the conversion is implementation-
defined. Even the validity of the converted value is up to the
implementation.

"Is it likely" that converting 10 to an int* gives a known
value? Yes, it is. On most machines, pointers behave very much
like some flavor of integer, and there is a natural correspondence
between pointer values and integer values. The value 10 is almost
certainly included in the range of the correspondence.

A question you didn't ask, but might have: "Is the converted
value a valid int* pointer value?" Possibly, but probably not.
On many systems, the int corresponding to an int* must be a multiple
of four. Even on those with more relaxed alignment requirements,
it is common to find that "low core" addresses are off-limits and
inaccessible.
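
The alignment point above can be checked with a small sketch. It assumes a C11 compiler (for _Alignof) and assumes that integer values map directly onto byte addresses, which is common but not guaranteed. The names is_aligned_for_int and alignment_demo are hypothetical helpers, not anything from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: would the integer n be suitably aligned for an int,
   ASSUMING integers map directly onto byte addresses? */
int is_aligned_for_int(uintptr_t n)
{
    return n % _Alignof(int) == 0;
}

void alignment_demo(void)
{
    /* 0 is a multiple of every alignment. */
    assert(is_aligned_for_int(0));

    /* Whether 10 passes depends on _Alignof(int): it fails when the
       alignment is 4 and passes when it is 1 or 2. */
    (void)is_aligned_for_int(10);
}
```

Even when the alignment test passes, nothing here says the address is accessible; that part stays up to the implementation.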

That was my mistake. You have answered all the questions I wanted
answered.

Thanks.

I misinterpreted what Antoninus said because I was biased toward believing
it is *guaranteed* that the assertion that the value 10 is stored in i
would never fail.

James Kuyper · May 8, 2009

I misinterpreted what Antoninus said because I was biased toward believing
it is *guaranteed* that the assertion that the value 10 is stored in i
would never fail.

A key point to understand is that a value of 10 cannot be stored in a
pointer. The value of a pointer is the location in memory that it points
at. That location might have an address of 10, and that address might be
stored in the representation of the pointer, but the actual value of the
pointer is not 10.

On a more complicated level, Antoninus's comments to the contrary
notwithstanding, converting a value of 10 into a pointer is not
guaranteed to produce a pointer that, when converted back to an integer
type, will have a value of 10. That's possible, and commonplace, but not
guaranteed.

It is guaranteed that converting a valid pointer to void to an intptr_t
(if available) and back to void * will produce a pointer
value that compares equal to the original. It might seem that this
guarantee implies the other guarantee, but it doesn't; multiple
different integer values might convert to the same pointer value,
without violating the guarantee, but the reverse conversion can produce
only one of those integer values, which need not be the same as the one
you started with.
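
The many-to-one argument can be made concrete with a toy model. This is a sketch of a hypothetical implementation, not a real ABI: toy_ptr, toy_from_int and toy_to_int are invented names, and the choice of 4 bytes per word is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not a real ABI: a "pointer" stores a word index, and the
   integer form of a pointer is a byte offset (4 bytes per word). */
typedef struct { uint32_t word_index; } toy_ptr;

toy_ptr toy_from_int(uint64_t n)
{
    toy_ptr p = { (uint32_t)(n / 4) };  /* 8, 9, 10 and 11 all land on word 2 */
    return p;
}

uint64_t toy_to_int(toy_ptr p)
{
    return (uint64_t)p.word_index * 4;
}

void toy_demo(void)
{
    toy_ptr p = toy_from_int(10);

    /* pointer -> integer -> pointer: the same pointer comes back */
    assert(toy_from_int(toy_to_int(p)).word_index == p.word_index);

    /* integer -> pointer -> integer: 10 comes back as 8 */
    assert(toy_to_int(toy_from_int(10)) == 8);
}
```

In this model the pointer round trip holds for every pointer, while the integer round trip fails for any integer that is not a multiple of 4.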

Antoninus Twink · May 8, 2009

Multiple different integer values might convert to the same pointer
value, without violating the guarantee, but the reverse conversion can
produce only one of those integer values, which need not be the same
as the one you started with.

I repeat my invitation to name an implementation for which this is the
case.

Ike Naar · May 8, 2009

Let's break it up into an extra step then, to be clear about what's
going on.

int *ip;
intptr_t i;
i = 10;
ip = (int *) i;
assert((intptr_t) i == 10);

You're converting i, an intptr_t that has the value 10, to type intptr_t,
and then asserting that the converted value equals 10. What's the point?

Or did you mean ``assert((intptr_t) ip == 10);'' ?
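
For comparison, here is a sketch of both versions side by side, assuming the implementation provides intptr_t. The function names are hypothetical. The first converts ip back, so its result depends on the implementation; the second is the assertion as posted, which never involves the pointer at all.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if 10 survives integer -> pointer -> integer here, else 0. */
int ten_round_trips(void)
{
    intptr_t i = 10;
    int *ip = (int *)i;          /* result is implementation-defined */
    return (intptr_t)ip == 10;   /* converts ip back, not i */
}

void assertion_as_posted(void)
{
    intptr_t i = 10;
    assert((intptr_t)i == 10);   /* converts i to its own type: trivially true */
}
```

On common flat-address implementations ten_round_trips() is likely to return 1, but the standard does not promise it.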

Richard Bos · May 9, 2009

amit.codename13 said:
I misinterpreted what Antoninus said

That might be because Antoninus is indeed very much like a twink:
creamy, and splurging, but when push comes to shove, not very full of
real value.

Richard

lawrence.jones · May 13, 2009

Antoninus Twink said:
I repeat my invitation to name an implementation for which this is the
case.

Any word addressed machine where 10 is not correctly aligned for an int
and the pointer to int mapping produces a byte offset from address 0. I
believe the Cray falls into that camp.

mfhaigh · May 14, 2009

Any word addressed machine where 10 is not correctly aligned for an int
and the pointer to int mapping produces a byte offset from address 0. I
believe the Cray falls into that camp.

<snip>

It's also fairly easy to run into problems with 64-bit CPUs running 32-bit
ABIs.

The 16-core MIPS64 NPU sitting here on my desk is one such example.
Any 32-bit code using the KSEG/SSEG kernel segments must take care to
construct sign-extended pointer values. As an example, take the KSEG0
base address. If you end up with:

0x00000000 80000000 (incorrect)

in a 64-bit register rather than:

0xFFFFFFFF 80000000 (correct)

... you'll crash upon a dereference. Depending on your toolchain and
your configuration of it, a statement such as:

int *p = (int *) 0x80000000;

... can generate either one! Perhaps even worse, the resulting code
may or may not work depending on the processor's current mode and
whether or not the SR(UX) bit is set in the CPU's status register.

Generally speaking, the newer and nicer toolchain setups will sign
extend for you (because it's what you probably want), and this is fine
because that's what everyone is talking about when they say
"implementation defined". The vanishingly few people that actually
want 0x00000000 80000000 can simply use a uint64_t / unsigned long
long / (u)intptr_t type before converting it to a pointer.
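
The two register contents described above can be reproduced with plain integer arithmetic. This is only a sketch of the two widening rules, not of what any particular MIPS toolchain emits; zero_extend32, sign_extend32 and extension_demo are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a 32-bit value by filling the upper 32 bits with zeros. */
uint64_t zero_extend32(uint32_t v)
{
    return (uint64_t)v;
}

/* Widen a 32-bit value by copying bit 31 into bits 32..63.
   Uses only unsigned arithmetic, so the result is fully defined. */
uint64_t sign_extend32(uint32_t v)
{
    return (v & 0x80000000u) ? (UINT64_C(0xFFFFFFFF00000000) | v) : v;
}

void extension_demo(void)
{
    assert(zero_extend32(0x80000000u) == UINT64_C(0x0000000080000000));
    assert(sign_extend32(0x80000000u) == UINT64_C(0xFFFFFFFF80000000));

    /* Same low 32 bits, but unequal as 64-bit values. */
    assert(zero_extend32(0x80000000u) != sign_extend32(0x80000000u));
}
```

Values below 0x80000000 widen to the same result under both rules, which is why ordinary user-space addresses never expose the difference.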

Something to look out for is crossing the sign extension boundary with
arithmetic. This is virtually certain to not be handled properly. If
you think about it, the reason why makes perfect sense: the region
that 32-bit code sees as continuous is in fact composed of the very
discontinuous bottom (0x0000000000000000-0x000000007fffffff) and top
(0xffffffff80000000-0xffffffffffffffff) of the 64 bit space.
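
A small sketch of why arithmetic across that boundary goes wrong, again using plain integers rather than real pointers. The helper names are hypothetical; step_as_32bit models what 32-bit code expects (wrap within 32 bits, then sign-extend), and step_as_64bit models a plain 64-bit add on the register.

```c
#include <assert.h>
#include <stdint.h>

/* What 32-bit code expects: add within 32 bits, then sign-extend. */
uint64_t step_as_32bit(uint64_t reg)
{
    uint32_t low = (uint32_t)reg + 1u;
    return (low & 0x80000000u) ? (UINT64_C(0xFFFFFFFF00000000) | low) : low;
}

/* What a plain 64-bit add does to the same register. */
uint64_t step_as_64bit(uint64_t reg)
{
    return reg + 1u;
}

void boundary_demo(void)
{
    uint64_t top_of_low_half = UINT64_C(0x000000007FFFFFFF);

    /* 32-bit view: the next address is the start of the upper region. */
    assert(step_as_32bit(top_of_low_half) == UINT64_C(0xFFFFFFFF80000000));

    /* 64-bit add: lands in the zero-extended form instead. */
    assert(step_as_64bit(top_of_low_half) == UINT64_C(0x0000000080000000));
}
```

The two results share their low 32 bits, so they look identical when viewed as 32-bit values even though the 64-bit registers differ.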

For normal 32-bit legacy userspace programs, 0x80000000 and above is
off limits and accesses to it "can't happen", so nobody really cares
except kernel / system software engineers, who are expected to be
aware of the situation.

I've seen an instance in the field of a bug where two pointers printf()
to the same value (in a 32-bit environment) but do not compare
equal with == due to defects in the way that one of them was
constructed. I almost lost my voice from having to tell them "STFU,
it's __not__ a compiler bug".

Mark F. Haigh

Dik T. Winter · May 14, 2009

> Any word addressed machine where 10 is not correctly aligned for an int
> and the pointer to int mapping produces a byte offset from address 0. I
> believe the Cray falls into that camp.

The Cray has many surprises in pointers, but this is not one of them. 10
is a perfect word address, as is 11. It is when you come to byte addresses
that things are different. 10 is a byte address, the next byte is at
281474976710666 ;-).
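
The number above is 10 + 2^48, which suggests a byte offset kept in the high-order bits with the word address left in the low-order bits. The sketch below models that reading only. The bit position (48) is inferred from the arithmetic, the 8 bytes per word is an assumption, and next_byte_address is a hypothetical helper, so none of this should be read as a description of the real Cray pointer format.

```c
#include <assert.h>
#include <stdint.h>

#define BYTE_OFFSET_SHIFT 48   /* assumption, inferred from 10 + 2^48 */
#define BYTES_PER_WORD    8u   /* assumption: 64-bit words */

/* Advance a modelled byte address by one byte. */
uint64_t next_byte_address(uint64_t addr)
{
    uint64_t word   = addr & ((UINT64_C(1) << BYTE_OFFSET_SHIFT) - 1);
    uint64_t offset = addr >> BYTE_OFFSET_SHIFT;

    if (++offset == BYTES_PER_WORD) {  /* past the last byte: move to the next word */
        offset = 0;
        ++word;
    }
    return (offset << BYTE_OFFSET_SHIFT) | word;
}

void byte_address_demo(void)
{
    /* Byte 0 of word 10 is followed by byte 1 of word 10. */
    assert(next_byte_address(10) == UINT64_C(281474976710666));
}
```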

lawrence.jones · May 14, 2009

Dik T. Winter said:
The Cray has many surprises in pointers, but this is not one of them. 10
is a perfect word address, as is 11. It is when you come to byte addresses
that things are different. 10 is a byte address, the next byte is at
281474976710666 ;-).

Ah, so the pointer/integer mapping leaves the bits alone. I thought
they got rotated to put the byte offset into the low-order bits where it
belongs.

Stephen Sprunk · Jun 6, 2009

Something to look out for is crossing the sign extension boundary with
arithmetic. This is virtually certain to not be handled properly. If
you think about it, the reason why makes perfect sense: the region
that 32-bit code sees as continuous is in fact composed of the very
discontinuous bottom (0x0000000000000000-0x000000007fffffff) and top
(0xffffffff80000000-0xffffffffffffffff) of the 64 bit space.

AMD faced this when defining their 64-bit extensions; they "solved" the
problem by declaring pointers to be signed and mandating sign-extension
when a 32-bit value was assigned to a register. The upper bits can (and
must) be ignored by 32-bit code, but they're still there and must have
the correct values in case some 64-bit code examines them (e.g. because
it's running on a 64-bit OS).

(In contrast, Intel mandated zero extension when a 16-bit value was
assigned to a 32-bit register; they also mandated no default extension
at all when assigning a value to an 8-bit half of a 16-bit register, but
both sign-extending and zero-extending instructions were available. Oh,
the fun compiler writers must have keeping track of all that...)

For normal 32-bit legacy userspace programs, 0x80000000 and above is
off limits and accesses to it "can't happen", so nobody really cares
except kernel / system software engineers, who are expected to be
aware of the situation.

This is where AMD's signed pointers become particularly useful. Today's
OS kernels can live comfortably in 2GB of RAM, whether in 32-bit or
64-bit modes. Therefore, rather than making them live in the "top" half
of memory, which might be at 2GB:4GB or 8,589,934,592:17,179,869,184GB,
they live at -2GB:0 in either mode. User space is 0:2GB in 32-bit mode
or 0:8,589,934,592GB in 64-bit mode -- but most kernel code doesn't need
to care about that.

S

Antoninus Twink · Jun 7, 2009

[good stuff]

Interesting and informative post - thanks!
