[posted only to comp.lang.c -- alt.comp.lang.c does not exist on
newsguy]
(e-mail address removed)-berlin.de wrote: [after]
int *x = malloc( 100 * sizeof *x );
....
free( x );
Both the following lines would invoke undefined behavior.
printf( "%d %d\n", x[ 11 ], *( x + 42 ) ); /* wrong! */
printf( "%p\n", ( void * ) x ); /* wrong! */
I have a question about this [second] line, printing the value of 'x'.
Why is this wrong? The pointer variable x itself is on the function
stack, and I don't believe we are accessing any invalid memory region.
Let me back things up a bit. Suppose instead of "int *x" or even
"int y", or "float z", I tell you only that I have four bytes in
memory that are set to 0x12, 0x34, 0x56, and 0x78 in sequence.
What is the value stored in this four-byte region? Is it 0x12345678?
Is it perhaps 0x78563412? Or could it even be something like
17378244361449504001963252426735616.0?
The answer is: it depends. Those four bytes are, as a 32-bit int,
the first value (0x12345678) on a SPARC or 680x0-based machine,
but the second (0x78563412) on an Intel x86-based machine. The
third (1.73782e34, or 7019017 * pow(2,91)) occurs if those four
bytes are meant to be interpreted as a 32-bit floating point number
on the x86.
Clearly, then, the value of some sequence of bytes depends on the
*interpretation* of that byte-sequence. The next question I would
like to ask, then, is this: How are the bytes making up a pointer
interpreted?
On many machines, they happen to be interpreted in precisely the
same way as some integer; but this is not the only possible
interpretation. Those who used the x86 in its early 80186 and
80286 incarnations should remember the "1-megabyte pointer", in
which the upper 16 bits represented the top 16 of the 20 bits of
the address, and the lower 16 bits represented the bottom 16 of
the 20 bits of the address, with those two values being summed:
real_20_bit_address = ((upper_16(ptr) << 4) + lower_16(ptr)) & 0xfffff;
(This means that any given physical address has lots of different
32-bit values that refer to it. This particular "feature" was the
source of a lot of problems and the term "pointer normalization".
It is also one reason that the C standards define "a < b" only for
pointers a and b into a single object, while "a == b" is defined
even if a and b point to different objects -- the equality operators
must normalize their pointers, while the relational operators are
allowed to compare only the offsets.)
Yet another interpretation was allowed on some varieties of the
x86, in which the upper 16 bits of the pointer were an index into
an (external) table, and the lower 16 bits were an offset to be
applied to the result of the table:
real_NN_bit_address = table[upper_16(ptr)] + lower_16(ptr);
/* more or less */
All of these interpretations -- and indeed almost any other
interpretation anyone can think of yesterday, today, or tomorrow
-- are allowed (but not required) by the C standard. The last of
the above, with the table-lookup step, happens to allow something
else the x86 did: the table need not contain just a "base address".
It can also contain a "valid" flag:
if (table[upper_16(ptr)].valid == 0)
        throw_runtime_error("invalid address");
real_NN_bit_address = table[upper_16(ptr)].base + lower_16(ptr);
Now, all free() needs to do is contain, as one of its steps:
table[upper_16(ptr)].valid = 0;
and suddenly a bit pattern that *was* valid, before the call to
free(), is no longer valid. An attempt to print it, which used
to work, may now cause a runtime "invalid address" error.
The bit pattern has not changed. What changed is the external
table. The C standard allows this, and tells you -- the C programmer
-- not to attempt to use the value in x after passing that value
to free(), just in case free() cleared the table's "valid" bit.
Of course, as Jens wrote, you *are* allowed to overwrite x with a
new value, or with NULL. Any C compiler must make sure this works,
even if it has this kind of "valid/invalid table entry" action that
goes on with malloc() and free().
This is all part of a more fundamental issue, which C programmers
in particular should consider, because C lets you inspect the raw
bytes of an object (e.g., through an unsigned char pointer) and thus
access hardware-level "representations" as well as language-level
"values". That issue is: the *representation* of a value, and the
value itself, are different things. Values arise
through *interpretation* of some bit pattern (a "representation"),
and the process of interpretation can be quite complex. We see
this now, today, on conventional architectures, only in floating-point
numbers -- but in the past, we saw such things elsewhere. There
were good reasons for complicated interpretations for pointers,
and those reasons may well recur in the future.
["Methods of interpretation" are also the reason we see byte-order
issues in integers. If some entity takes a long string of bits
and breaks it up into groups of, say, 8 at a time, it is that entity
that chooses which order to put out the groups, and then to
re-assemble them for later re-interpretation as a larger group.
Any given machine may have its own method(s) for splitting and
combining to get 8-bit groups, but if you, as a C programmer, deal
in terms of (integral) *values*, and do your own splitting and
combining, *you* can control the results, rather than being at the
mercy of your machine(s).]