A function is an address

  • Thread starter Julienne Walker

Wojtek Lerch

Douglas A. Gwyn said:
(Perhaps more importantly, pointers also have types.)

They do, but that's simple. The type can be easily tracked down at compile
time.
Those are restrictions, not a difference in the fundamental meaning.

They draw the line between defined behaviour and undefined behaviour.
That's a pretty fundamental thing in a language standard. When the standard
neglects to tell us where that line is, I'd consider that to be a pretty
fundamental defect.
Again, the model is of storage locations, and there are restrictions
(motivated by a desire for faster execution on some architectures).

No, the model is of arrays and their elements. That's how the standard
describes additive operators and equality operators when their operands are
pointers, and a lot of other operations are defined by reference to addition
and equality. The standard seems to assume that for every pointer that
points to an object, the object either is an element of one particular
array, or is not an array element at all. The restrictions are based on
that array. The problem is that for most non-trivial conversions that
produce a pointer value, the standard neglects to mention whether that value
points to an array element or what array that element belongs to; at best,
we're just told where the object it points to is located. When you try to
add an integer to such a pointer, the standard clearly says that the range
of integers that can be added to it is restricted, but fails to explain what
the restriction is.
There aren't all that many allowed operations for pointers.

There are a lot of differently defined conversions that produce pointers.
Very few of them bother to explain the restriction on the integers that can
be added to the resulting pointer. Nevertheless, the description of the
addition operator makes it clear that there always is a restriction. For
any pointer value, there are some integer values that invoke undefined
behaviour when added to the pointer; but when the pointer is the result of a
conversion, in most cases all the standard tells us about those integer
values is that zero is not one of them.

In some cases it's obvious what the standard really intended to promise us.
For instance, it says that converting a pointer to intptr_t and back
produces a pointer that compares equal to the original. But "everybody
knows" that the standard means to guarantee more than just that -- the
result not only compares equal to the original, but also points to an
element (or past the end) of the same array as the original. In other
words, has the same restrictions on the range of integers that can be added
to it. But the text never actually says that, because that part of the
standard happens to only care about the address ("location") that the
pointer points to, and forgets about the other information (the
"restrictions") that the standard associates with pointer values.
Assigning to the same type presumably loses no information.

The range of integers for which addition has defined behaviour is not the
kind of information that can be preserved or lost by an assignment or
conversion or other operations; it's the kind of information that the
standard should tell us, because without it, the line between defined
behaviour and undefined behaviour is missing. Imagine that the standard
neglected to define the type of certain expressions -- would you also say
that those expressions "lose" the information about the type, or would you
simply say that there's a defect in the standard that needs to be fixed?
 
Douglas A. Gwyn

Wojtek said:
No, the model is of arrays and their elements. That's how the standard
describes additive operators and equality operators when their operands are
pointers, and a lot of other operations are defined by reference to addition
and equality.

However, the array-of-objects model is overlaid on the byte-array
storage model, even though not much is specified about how pointer
values relate to storage-location indices.
The standard seems to assume that for every pointer that
points to an object, the object either is an element of one particular
array, or is not an array element at all. The restrictions are based on
that array. The problem is that for most non-trivial conversions that
produce a pointer value, the standard neglects to mention whether that value
points to an array element or what array that element belongs to; at best,
we're just told where the object it points to is located. When you try to
add an integer to such a pointer, the standard clearly says that the range
of integers that can be added to it is restricted, but fails to explain what
the restriction is.

Yes, the "where the object is located" is the fundamental semantic
content of the pointer value. The limitations on pointer arithmetic
are a different matter and in fact are virtually never encoded in
the pointer representation.

I'm not sure I understand what your issue is. Every object *can* be
treated as in effect an array of length 1 of objects of that type
(there is wording somewhere in the standard about this), which may
be useful for purposes of understanding validity of pointer arithmetic
for objects *not* declared with array type, but not for much else.
Storage locations correspond to array types only if so declared, or
if such a type is "impressed" upon the storage region as its
"effective type" (we need that for dynamic allocation, but it can
also occur through "punning"). The "type" says how much storage is
involved, how it is aligned, and how the storage content
(representation) is to be interpreted as a "value".

References to separately declared arrays (not "based on" each other)
may be used to create pointer values that are not permitted (in a strictly
conforming, "s.c.", program) to be combined using pointer arithmetic; the only s.c.
relationship between them is (in)equality, which in practice means
that equality comparison may require the C implementation to first
perform some sort of global "address normalization" to ensure correct
results. We didn't require similar normalization for pointer
differencing, because the run-time cost outweighs the extremely
rare utility of that operation for pointers from different segments.
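
A minimal sketch of the distinction drawn above, assuming two hypothetical file-scope arrays named first and second: equality may be applied to pointers into separate arrays, while subtraction is only used within one array. The function names are illustrative and do not come from the standard.

```c
#include <stddef.h>

static int first[4];
static int second[4];

/* Defined for any two valid pointers, even into different arrays. */
int points_to_same_element(const int *p, const int *q)
{
    return p == q;
}

/* Defined only when both pointers point into (or one past) the same array. */
ptrdiff_t distance_within(const int *from, const int *to)
{
    return to - from;
}

int demo_separate_arrays(void)
{
    /* Pointers to two distinct objects never compare equal. */
    int equal = points_to_same_element(&first[0], &second[0]);

    /* Same array: the difference is the difference of the subscripts. */
    ptrdiff_t d = distance_within(&first[0], &first[3]);

    /* distance_within(&first[0], &second[0]) would be undefined,
       so it is deliberately never called. */
    return equal == 0 && d == 3;
}
```

The undefined case is left as a comment only; nothing in the sketch evaluates it.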

If a pointer value is produced by some chain of conversions that
at some point explicitly involves an array type, then that array
type imposes restrictions on the subsequent allowed pointer
arithmetic. Otherwise (no array bounds explicitly involved),
the only restriction (apart from maximum supported object size) is
whatever is implied by the actual allocation of the referenced
object (declared array length or malloc-specified storage size).
In some cases it's obvious what the standard really intended to promise us.
For instance, it says that converting a pointer to intptr_t and back
produces a pointer that compares equal to the original. But "everybody
knows" that the standard means to guarantee more than just that -- the
result not only compares equal to the original, but also points to an
element (or past the end) of the same array as the original.

I think there is wording elsewhere in the standard that says
that, in connection with pointer equality in general.
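
The intptr_t round trip under discussion can be sketched as follows. This assumes the implementation provides the optional type intptr_t; the array arr and the function name are hypothetical. Only the "compares equal" guarantee is exercised; the disputed arithmetic on the converted pointer is not evaluated.

```c
#include <assert.h>
#include <stdint.h>

static int arr[4];

/* Returns 1 if the round-tripped pointer compares equal to the original. */
int roundtrip_compares_equal(void)
{
    void *orig = &arr[1];
    intptr_t as_int = (intptr_t)orig;   /* intptr_t is an optional type */
    void *back = (void *)as_int;

    /* This much is promised by the description of intptr_t. */
    assert(back == orig);

    /* Whether (int *)back + 2 is exactly as valid as &arr[1] + 2 is
       the part the text leaves implicit; it is not evaluated here. */
    return back == orig;
}
```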
 
Wojtek Lerch

Douglas A. Gwyn said:
However, the array-of-objects model is overlaid on the byte-array
storage model, even though not much is specified about how pointer
values relate to storage-location indices.

It's also overlaid on the union model, and on the
type-punning-by-pointer-conversion model, and might interact with the
effective-type model in an unclear way. That's the problem: one part of the
standard seems to believe that every object is an element of at most one
array, but the rest of the standard introduces pointer conversions, unions,
and other complications, without bothering to explain how they affect the
meaning of "the" array that pointer arithmetic is based on.
Yes, the "where the object is located" is the fundamental semantic
content of the pointer value. The limitations on pointer arithmetic
are a different matter and in fact are virtually never encoded in
the pointer representation.

It doesn't matter whether they're encoded or not. Pointer representations
never encode the distinction between a valid pointer and an indeterminate
one either; nevertheless, attempting to use a value that the standard says
is indeterminate produces undefined behaviour. It's important for
programmers to know which values are indeterminate and which ones are not,
and it's the standard's job to tell it to us, because otherwise we wouldn't
know how to avoid undefined behaviour. For exactly the same reason, the
standard needs to tell us the limitations on pointer arithmetic.
Programmers need it to avoid undefined behaviour; implementors need it to
detect opportunities for optimization.
I'm not sure I understand what your issue is. Every object *can* be
treated as in effect an array of length 1 of objects of that type
(there is wording somewhere in the standard about this), which may
be useful for purposes of understanding validity of pointer arithmetic
for objects *not* declared with array type, but not for much else.

Yes, that's pretty much what 6.5.6p7 says, except it talks about "an object
that is not an element of an array", rather than "every object". And, more
importantly, it doesn't clarify whether it's talking about the declared
type, the effective type, the type of the lvalue that was involved in
producing the pointer, or some other type.
Storage locations correspond to array types only if so declared, or

A storage location can correspond to an element of several different arrays
at the same time, if those arrays live inside an object declared as a union.
The standard doesn't seem to clarify anywhere which of those arrays should
be considered "the" array that the object is an element of, for the purpose
of pointer arithmetic.

union {
int v, a[3], b[10], c[2][2]; char d[ 1000 * sizeof(int) ];
} u;

int *p = &u.v;

Does p point to an array element? If it does, then what's the length of the
array? Is p+10 defined? Is p+1000?

What about

int *q = (int*) &u;

Does q point to an element of an array of 3, 10, or 1000 elements, or does
it not point to an array element at all? (My take: according to 6.7.2.1p14,
q points to u.v, because that's the only member of the union that q has the
appropriate type to point to; and since u.v is not an array element, it
should be considered an array of one. Therefore, q+2 is undefined, and a
compiler is free to assume that q and u.a[2] cannot possibly refer to the
same object.)
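
The uncontested part of the union example can be sketched like this. The tag U and the function name are assumptions added for illustration. The sketch relies only on the rule that a pointer to a union, suitably converted, points to each of its members, and on q + 1 being permitted even under the "array of one" reading.

```c
union U {
    int v, a[3], b[10], c[2][2];
    char d[1000 * sizeof(int)];
};

static union U u;

/* Returns 1 if p and q agree and q + 1 is one element past q. */
int q_matches_p(void)
{
    int *p = &u.v;
    int *q = (int *)&u;

    /* Defined under every reading discussed: even an "array of one"
       allows forming the one-past-the-end pointer. */
    int *q_end = q + 1;

    /* q + 2, p + 10 and p + 1000 are the contested expressions and
       are deliberately not evaluated. */
    return p == q && (q_end - q) == 1;
}
```
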
if such a type is "impressed" upon the storage region as its
"effective type" (we need that for dynamic allocation, but it can
also occur through "punning"). The "type" says how much storage is
involved, how it is aligned, and how the storage content
(representation) is to be interpreted as a "value".

Are you sure that 6.5.6p7 talks about the effective type? I thought the
effective type is about accessing the content of an object, not about
computing pointers. Consider:

struct {
int a[ sizeof(float) ];
int b[ sizeof(float) ];
}
*sptr = malloc( sizeof(*sptr) );

int *iptr = &sptr->b[1] ;
float *fptr = (float*) sptr;

for ( int i = 0; i < 2 * sizeof(int); ++i )
    fptr[i] = 0.0; // Impress "float" as the effective type

At this point, does iptr still point to the second element of an array of
ints, as far as 6.5.6p7 is concerned? Is iptr-1 still defined?

And what about iptr-2?

....
If a pointer value is produced by some chain of conversions that
at some point explicitly involves an array type, then that array
type imposes restrictions on the subsequent allowed pointer
arithmetic.

I don't think you can have it both ways. Either it's based on the effective
type of what the pointer points to, or on the history of conversions and
arithmetic that produced the pointer.
Otherwise (no array bounds explicitly involved),
the only restriction (apart from maximum supported object size) is
whatever is implied by the actual allocation of the referenced
object (declared array length or malloc-specified storage size).

That's the opposite of what the standard says. According to 6.5.6p7, if no
array is involved, then the pointer should be considered to point to the
element of a one-element array; therefore you can't decrement it, and you
can only increment it once.
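
A small sketch of the "array of one" rule from 6.5.6p7 as read above, using a hypothetical local variable x: the pointer may be advanced once, and the result may be compared or subtracted but not dereferenced.

```c
/* Returns the number of elements between &x and the pointer one past it. */
int single_object_span(void)
{
    int x = 42;
    int *p = &x;
    int *end = p + 1;   /* defined: one past the end of the "array of one" */

    /* p - 1 and p + 2 would be undefined, and *end must not be
       evaluated, so none of them appear in executable code. */
    return (int)(end - p);
}
```
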
I think there is wording elsewhere in the standard that says
that, in connection with pointer equality in general.

I don't, and believe me, I looked. As you must know, pointer equality
allows two pointers based on different arrays to compare equal. It
explicitly talks about the case of comparing a pointer to the first element
of an array with a pointer past the end of a different array. It also
implicitly covers the case where two pointers point to the same object but
consider that object to be an element of two different arrays (such as
&u.a[1] and &u.b[1] from my example).
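
To illustrate, here is a sketch with a reduced, hypothetical union V holding only the two int arrays. Two pointers can compare equal while the arrays they were derived from leave different amounts of room for further arithmetic. Every pointer below is formed from the array lvalue it stays within.

```c
union V {
    int a[3];
    int b[10];
};

static union V w;

/* Same storage location reached through two different arrays. */
int equal_pointers(void)
{
    return &w.a[1] == &w.b[1];
}

/* Elements from &w.a[1] up to one past the end of w.a. */
int room_after_via_a(void)
{
    return (int)((w.a + 3) - &w.a[1]);
}

/* Elements from &w.b[1] up to one past the end of w.b. */
int room_after_via_b(void)
{
    return (int)((w.b + 10) - &w.b[1]);
}
```

The two pointers are equal as values, yet the permitted increments differ (2 versus 9) depending on which array is taken to be "the" array.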
 
Douglas A. Gwyn

Wojtek said:
A storage location can correspond to an element of several different arrays
at the same time, if those arrays live inside an object declared as a union.
The standard doesn't seem to clarify anywhere which of those arrays should
be considered "the" array that the object is an element of, for the purpose
of pointer arithmetic.

Thanks for the additional discussion. I think we agree
about whether accesses (dereferencing) via a pointer are
valid (in a s.c. program), and the issue concerns only
pointer arithmetic.

It is evident that the rules for pointer arithmetic
cannot refer only to the detailed type of the pointer,
because (for example) then you couldn't use an int* to
walk through an array of ints.
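
The ordinary case referred to here, walking an array of ints with an int pointer, might look like the sketch below. The function names are hypothetical. The pointer type carries no length, so the bound is passed separately.

```c
#include <stddef.h>

/* Sums count ints starting at first; first must point into an array
   with at least count elements remaining. */
int sum_ints(const int *first, size_t count)
{
    const int *end = first + count;   /* may be one past the last element */
    int total = 0;

    for (const int *p = first; p != end; ++p)
        total += *p;
    return total;
}

int demo_sum(void)
{
    int values[4] = {1, 2, 3, 4};
    return sum_ints(values, 4);
}
```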

The intent of my previous discussion was that the possible
"array object pointed to" must be the "currently active"
object encompassing the target location, where "currently
active" is determined by referring to the object
declaration, or by the effective type for dynamic storage.
The largest containing array type (for which a member
type is pointed to by the pointer) must be the relevant one.
union {
int v, a[3], b[10], c[2][2]; char d[ 1000 * sizeof(int) ];
} u;
int *p = &u.v;
Does p point to an array element?
Is p+10 defined? Is p+1000?

Does p point to a member of an int-array object (such as
u.b[0])? I maintain that it potentially does, depending
on whether there is a valid object around that location,
which in turn would depend on the member used for the last
write into the union, which the standard doesn't specify
as being a factor for pointer arithmetic validity. So
there is no allowed reason for rejecting pointer
arithmetic within the declared bounds. (You could argue
that it is not an object if it isn't "currently active",
but I think the intent is that there is an object there
although the content might not represent a valid value
for such an object. We need that anyway for pointers
into declared but uninitialized auto arrays.)
What about
int *q = (int*) &u;
Does q point to an element of an array of 3, 10, or 1000 elements, or does
it not point to an array element at all?

It certainly points to a single anonymous non-array int object.
For purposes of address arithmetic validity (only), it *also*
points to an array object, by the same argument used above.
(My take: according to 6.7.2.1p14,
q points to u.v, because that's the only member of the union that q has the
appropriate type to point to; ...

It might as well point to u.v, since that has the same effect.
and since u.v is not an array element, it
should be considered an array of one. Therefore, q+2 is undefined, and a
compiler is free to assume that q and u.a[2] cannot possibly refer to the
same object.)


There are other potential objects to (members of) which the
pointer can be considered to point. What if the "v" member
had been omitted from the declaration?
struct {
int a[ sizeof(float) ];
int b[ sizeof(float) ];
}
*sptr = malloc( sizeof(*sptr) );

int *iptr = &sptr->b[1] ;
float *fptr = (float*) sptr;

for ( int i = 0; i < 2 * sizeof(int); ++i )
    fptr[i] = 0.0; // Impress "float" as the effective type

At this point, does iptr still point to the second element of an array of
ints, as far as 6.5.6p7 is concerned? Is iptr-1 still defined?


iptr still points to the same location and the same (now
corrupted) object. So
int *jptr = iptr-1;
would be valid (although accessing via jptr would not be valid).
And what about iptr-2?

The relevant array object is the one denoted by sptr->b; there
is no larger array object encompassing sptr->b. It is hard to
imagine an actual implementation using segmented structs, but
perhaps an implementation relies on the nonaliasing guarantee
in a manner that would be inconsistent with this usage.
I don't think you can have it both ways. Either it's based on the effective
type of what the pointer points to, or on the history of conversions and
arithmetic that produced the pointer.

I was thinking that such use of explicit array types might impose
further restrictions (i.e. an implementation could generate code
that depends on them), but maybe not.
That's the opposite of what the standard says. According to 6.5.6p7, if no
array is involved, then the pointer should be considered to point to the
element of a one-element array; therefore you can't decrement it, and you
can only increment it once.

It doesn't say "if no array is involved"; it talks about the
operand "pointing to an element of an array object". What
constitutes "an array object", and which one applies if there
is more than one logical candidate, has to be determined separately.
 
Wojtek Lerch

Douglas A. Gwyn said:
Thanks for the additional discussion. I think we agree
about whether accesses (dereferencing) via a pointer are
valid (in a s.c. program), and the issue concerns only
pointer arithmetic.

I don't know. If the object that a pointer points to is the first element
of a 2-element array but also of a 10-element array, doesn't 6.5.6p8 forbid
dereferencing P+2 as well as P+10? :)
It is evident that the rules for pointer arithmetic
cannot refer only to the detailed type of the pointer,
because (for example) then you couldn't use an int* to
walk through an array of ints.

That much is pretty obvious: the type of the pointer is a pointer to the
element type, and includes no information about the number of elements of
the array.

But conceivably, the *value* of the pointer could be considered to include
that information, either explicitly (encoded in the representation) or
implicitly (as a dependency on the array lvalue that decayed into the
pointer value at some point in the past). The committee's official response
to DR 260 seems to support this interpretation: pointers based on different
origins are distinct, even if they have identical representations (and
therefore point to the same "memory location"). A pointer past the end of
an array, a pointer to the first element of a 10-element array, and a
pointer to the first element of a 5-element array could all point to the
same location, but according to DR 260, they should be considered three
distinct pointers for the purpose of determining which operations have
undefined behaviour.
The intent of my previous discussion was that the possible
"array object pointed to" must be the "currently active"
object encompassing the target location, where "currently
active" is determined by referring to the object
declaration, or by the effective type for dynamic storage.
The largest containing array type (for which a member
type is pointed to by the pointer) must be the relevant one.
union {
int v, a[3], b[10], c[2][2]; char d[ 1000 * sizeof(int) ];
} u;
int *p = &u.v;
Does p point to an array element?
Is p+10 defined? Is p+1000?

Does p point to a member of an int-array object (such as
u.b[0])? I maintain that it potentially does, depending

Does that make the behaviour "potentially defined"? :)

If you mean that both p+10 and p+1000 are defined, do you mean that the
expressions &u.a[100] and &u.c[0][100] are defined as well? My
understanding was that they're not, to allow compilers to make aliasing
assumptions about pairs of lvalues such as u.c[0][i] and u.c[1][j].
on whether there is a valid object around that location,
which in turn would depend on the member used for the last
write into the union, which the standard doesn't specify
as being a factor for pointer arithmetic validity. So

I'm not sure what exactly you mean by "valid object" -- do you mean an
object whose effective type is int? Or whose value is not indeterminate?

Anyway, I agree that it would be bad if the validity of pointer arithmetic
depended on how the pointed-to memory location had been accessed in the
past. That's the main reason why I don't like the interpretation that
6.5.6p8 refers to the effective type when it talks about arrays. (Another
reason is that I find the text about effective types unclear and incomplete,
to put it gently.)
there is no allowed reason for rejecting pointer
arithmetic within the declared bounds. (You could argue
that it is not an object if it isn't "currently active",
but I think the intent is that there is an object there
although the content might not represent a valid value
for such an object. We need that anyway for pointers
into declared but uninitialized auto arrays.)

Or for the computation of pointers that are subsequently used for write
accesses that change what you refer to as the "currently active" object.
But yes, I meant the union in my example to be uninitialized. But I really
hoped that it doesn't matter.
It certainly points to a single anonymous non-array int object.
For purposes of address arithmetic validity (only), it *also*
points to an array object, by the same argument used above.

It does not point to an array object. It points to an int object. That
object can be considered to be an element of several different array
objects. If your interpretation is that it actually should be considered an
element of all those arrays at the same time, then I don't know how to avoid
the conclusion that 6.5.6p8 forbids using the pointer to access the
one-past-the-end element of any of those arrays. The interpretation that
the array that the object should be considered an element of depends on the
origins of the pointer seems much more sensible to me, and more consistent
with the committee's official interpretation of DR 260.
It might as well point to u.v, since that has the same effect.

The effect is not the same, because u.v is not an array element, and that
makes accessing q[1] undefined.
and since u.v is not an array element, it
should be considered an array of one. Therefore, q+2 is undefined, and a
compiler is free to assume that q and u.a[2] cannot possibly refer to the
same object.)


There are other potential objects to (members of) which the
pointer can be considered to point. What if the "v" member
had been omitted from the declaration?


Then the union wouldn't have a member whose type is int, and the conversion
of &u to int* would not be guaranteed to produce a pointer pointing to any
of the members. As far as I can tell, the standard doesn't say anything
about what the result of such a conversion points to, only that it can be
safely converted back to a pointer to the union type, producing a value that
compares equal to &u. But if there's a spot in the standard that I missed
that promises (int*)&u to point to the memory location occupied by u.a[0]
when v is absent from the union (or if it's so obvious that it doesn't even
deserve to be mentioned), I hope there also is a spot that I missed that
explains something about the range of pointer math that can be performed on
that pointer.
struct {
int a[ sizeof(float) ];
int b[ sizeof(float) ];
}
*sptr = malloc( sizeof(*sptr) );

int *iptr = &sptr->b[1] ;
float *fptr = (float*) sptr;

for ( int i = 0; i < 2 * sizeof(int); ++i )
    fptr[i] = 0.0; // Impress "float" as the effective type

At this point, does iptr still point to the second element of an array of
ints, as far as 6.5.6p7 is concerned? Is iptr-1 still defined?


iptr still points to the same location and the same (now
corrupted) object. So


The object isn't any more corrupted than it was before the assignments, is
it? Or do you mean something other than "indeterminate" or "not having the
effective type int"?
int *jptr = iptr-1;
would be valid (although accessing via jptr would not be valid).

I hope you mean reading but not writing, correct?
The relevant array object is the one denoted by sptr->b; there
is no larger array object encompassing sptr->b.

Let's go back one step. Is

int *aptr = &sptr->a[ sizeof(float) ], *bptr = aptr + 1;

defined or not?
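
A sketch of the parts of this example that are not in dispute, assuming a hypothetical struct tag pair and function name: forming the one-past-the-end pointer of a is defined, and b[0] can be reached from a pointer derived from b itself. The contested expression aptr + 1 is left unevaluated.

```c
#include <stddef.h>
#include <stdlib.h>

struct pair {
    int a[sizeof(float)];
    int b[sizeof(float)];
};

/* Returns 1 on success, -1 if the allocation fails. */
int reach_b0_without_crossing(void)
{
    struct pair *sptr = malloc(sizeof *sptr);
    if (sptr == NULL)
        return -1;

    int *aptr = &sptr->a[sizeof(float)];   /* one past the end of a: defined */
    int *b1 = &sptr->b[1];                 /* derived from b itself */
    int *b0 = b1 - 1;                      /* defined: still inside b */

    int ok = (aptr - sptr->a) == (ptrdiff_t)sizeof(float)
          && b0 == sptr->b;

    /* aptr + 1 is the expression whose status is being asked about;
       it is not evaluated, and nothing is read from the storage. */
    free(sptr);
    return ok;
}
```
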
I was thinking that such use of explicit array types might impose
further restrictions (i.e. an implementation could generate code
that depends on them), but maybe not.

I'm not sure if I understand you correctly. If the array is really big, but
a conversion to a smaller array type happened in the process of computing
the pointer, which array type determines the validity of pointer addition --
the small one or the big one?
It doesn't say "if no array is involved"; it talks about the
operand "pointing to an element of an array object". What

It doesn't, but it follows. If no array is involved, the object can't
possibly be an element of an array, can it?
constitutes "an array object", and which one applies if there
is more than one logical candidate, has to be determined separately.

Yes, and that's exactly what I complain is missing from the standard.
 
