all-bits-zero pointer-to-object representation

Ersek, Laszlo · Apr 26, 2010

Hi,

with reference to [0] and [1], please consider the following:

 1  #include <string.h>
 2  #include <stdlib.h>
 3
 4  struct x {
 5    double *y;
 6  };
 7
 8  int
 9  main(void)
10  {
11    struct x *x = malloc(sizeof *x);
12
13    /* suppose the allocation succeeds */
14    (void)memset(x, 0, sizeof *x);
15    (void)(0 == x->y);
16    return 0;
17  }

In my understanding, the evaluation of x->y on line 15 is undefined
behavior in C99.

Consider the following (fictitious) extension:

"The all-bits-zero object representation is valid for any
pointer-to-object type. Any pointer-to-object object with the
all-bits-zero representation is a null pointer of the corresponding
pointer-to-object type."

Would this extension make the above program well defined?

In particular, would the following program still break aliasing rules, as
per C99 6.5 p 6-7?

#include <stdlib.h>

int
main(void)
{
  double **d = malloc(sizeof *d);
  size_t pos;

  /* suppose the allocation succeeded */

  for (pos = 0u; pos < sizeof *d; ++pos) {
    ((char unsigned *)d)[pos] = 0u;
  }

  (void)*d;
  return 0;
}

(I hope my question corresponds precisely to the austin-group-l topic.)

Thank you very much,
lacos

[0] https://www.opengroup.org/sophocles...tpl&source=L&listname=austin-group-l&id=13687
[1] https://www.opengroup.org/sophocles...tpl&source=L&listname=austin-group-l&id=13690

James Kuyper · Apr 26, 2010

Consider the following (fictitious) extension:

"The all-bits-zero object representation is valid for any
pointer-to-object type. Any pointer-to-object object with the
all-bits-zero representation is a null pointer of the corresponding
pointer-to-object type."

That depends upon what you mean by valid. The standard distinguishes
several cases. It talks about pointer objects containing representations
which can be dereferenced, incremented, decremented, compared for
order, compared for equality, or simply copied as a pointer value. For
each of those operations, the set of pointer representations valid for
that operation is different.

A null pointer value must compare equal to any other null pointer value,
and it must compare unequal to any pointer to an object. A pointer is
not valid for the purpose of dereferencing it, unless it points at an
object. Therefore, in principle, an implementation cannot choose
all-bits-0 to be both a null pointer and a pointer which is valid for
the purpose of dereferencing it. However, offhand I can't come up with
any code with defined behavior which demonstrates the non-conformance of
such an implementation, so it might be permitted, under the "as-if" rule.

Would this extension make the above program well defined?

Yes. The behavior would be well-defined, by the implementor. The
behavior would, of course, still be undefined as far as the C standard
is concerned, because "undefined behavior" is a specialized piece of
jargon in the C standard. It doesn't carry the apparently obvious
meaning of "behavior that has no definition". Instead, it means
"behavior which is not defined by this standard" (I've paraphrased the
actual wording, for the sake of improved clarity in this context).
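
For contrast with the memset() version, here is a minimal sketch of an initialization whose result the C standard itself defines, without leaning on any all-bits-zero promise. The helper name new_x() is made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct x {
    double *y;
};

/* Hypothetical helper: allocate a struct x whose member is made a
 * null pointer by assignment, not by zeroing bytes. */
static struct x *
new_x(void)
{
    struct x *x = malloc(sizeof *x);

    if (0 != x) {
        /* The constant 0 is converted to a null pointer of type
         * double *, whatever its object representation may be. */
        x->y = 0;
        assert(0 == x->y);
    }
    return x;
}
```

The store through x->y uses an lvalue of type double *, so the comparison afterwards reads a value that was written with that same type.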

In particular, would the following program still break aliasing rules,
as per C99 6.5 p 6-7?

#include <stdlib.h>

int
main(void)
{
  double **d = malloc(sizeof *d);
  size_t pos;

  /* suppose the allocation succeeded */

It's almost as easy to handle the possibility that the allocation
failed, as it is to write a comment explaining that you've decided to
ignore that possibility.

  for (pos = 0u; pos < sizeof *d; ++pos) {
    ((char unsigned *)d)[pos] = 0u;
  }

  (void)*d;
  return 0;
}

(I hope my question corresponds precisely to the austin-group-l topic.)

I'm not sure that it does. Aliasing is something that is inherently
impossible for null pointers, for they do not point at an object.

Ersek, Laszlo · Apr 26, 2010

Consider the following (fictitious) extension:

"The all-bits-zero object representation is valid for any
pointer-to-object type. Any pointer-to-object object with the
all-bits-zero representation is a null pointer of the corresponding
pointer-to-object type."

That depends upon what you mean by valid.

My apologies. I tried (and failed) to formulate "all-bits-zero implies a
null pointer value" in standardese. Thus I meant all those uses that are
otherwise valid for any given lvalue evaluating to a null pointer value.

The standard distinguishes several cases. It talks about pointer objects
containing representations which can be dereferenced, incremented,
decremented, compared for order, compared for equality, or simply copied
as a pointer value. For each of those operations, the set of pointer
representations valid for that operation is different.

A null pointer value must compare equal to any other null pointer value,
and it must compare unequal to any pointer to an object. A pointer is
not valid for the purpose of dereferencing it, unless it points at an
object. Therefore, in principle, an implementation cannot choose
all-bits-0 to be both a null pointer and a pointer which is valid for
the purpose of dereferencing it. However, offhand I can't come up with
any code with defined behavior which demonstrates the non-conformance of
such an implementation, so it might be permitted, under the "as-if"
rule.

Yes. The behavior would be well-defined, by the implementor. The
behavior would, of course, still be undefined as far as the C standard
is concerned, because "undefined behavior" is a specialized piece of
jargon in the C standard. It doesn't carry the apparently obvious
meaning of "behavior that has no definition". Instead, it means
"behavior which is not defined by this standard" (I've paraphrased the
actual wording, for the sake of improved clarity in this context).

Thank you.

As I understand it, the question is: after adding this extension to
POSIX(R), would further extensions be necessary, so that the code above
becomes defined? Most probably, this could be answered completely only by
considering all other extensions introduced by POSIX. Assuming, however,
that POSIX only defined otherwise undefined (or unspecified) behavior, and
that it didn't redefine (or weaken/reclassify) already defined behavior, I
think the suggestion ought to be eligible for consideration in isolation
as well.

It's almost as easy to handle the possibility that the allocation failed, as
it is to write a comment explaining that you've decided to ignore that
possibility.

I agree absolutely. I had to force myself to omit the error checking and
write a comment instead. I favor examples with complete error checking. I
only wanted to sidestep dead-ends like "if malloc() fails, there is no
undefined behavior, because the first substatement of your *if* statement
is not executed then".

I'm not sure that it does. Aliasing is something that is inherently
impossible for null pointers, for they do not point at an object.

I believe it isn't about an object hypothetically aliased by some other
(valid) pointer and a pointer with all-bits-zero representation. It is
about the pointer object with all-bits-zero representation itself, aliased
by differently typed pointers (pointer rvalues); like *d vs. *(char
unsigned *)d in the above.

I think that storing a valid (double*)0 null pointer value representation
in the space allocated by malloc() through either ((char unsigned
*)d)[...] or memset() doesn't force any effective type (eg. a character
type) on the allocated object. That is, the conclusion of the second
sentence of C99 6.5 p6 does not hold, because its premise is false.

--o--

I've downloaded ISO/IEC 9899:1999/Cor.2:2004(E) from
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/9899-1999_cor_2-2004.pdf>.
Entry 9 seems to imply that

{
  int *ip;

  ip = malloc(sizeof *ip);
  if (0 != ip) {
    (void)memset(ip, 0, sizeof *ip);
    (void)*ip;
  }
}

can invoke no undefined behavior, even though TC2 doesn't appear to extend
C99 6.5 with any requirement on memset(). Reformulating the original
question: extending C99 (with all TC's applied) with a requirement on
pointers-to-objects, similar to TC2 entry 9, would the code at the top
*instantly* become defined?
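
Assuming the TC2 change meant here is the sentence added to 6.2.6.2p5 (for any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type), a minimal sketch of what it guarantees for integers could look like this. The function name is invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if a zero-filled int reads back as 0, and -1 if the
 * allocation failed. */
static int
zeroed_int_is_zero(void)
{
    int *ip = malloc(sizeof *ip);
    int ok;

    if (0 == ip) {
        return -1;
    }
    (void)memset(ip, 0, sizeof *ip);
    /* With the TC2 sentence, all bits zero represents the value 0. */
    ok = (0 == *ip);
    assert(ok);
    free(ip);
    return ok;
}
```

Nothing comparable is promised for pointers, which is exactly the gap the proposed extension would fill.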

Thank you very much for your answer.
lacos

James Kuyper · Apr 27, 2010

Ersek, Laszlo wrote: ....

I believe it isn't about an object hypothetically aliased by some other
(valid) pointer and a pointer with all-bits-zero representation. It is
about the pointer object with all-bits-zero representation itself,
aliased by differently typed pointers (pointer rvalues); like *d vs.
*(char unsigned *)d in the above.

The anti-aliasing rules in 6.5p7 explicitly allow for the representation
of any object to be accessed through an lvalue of character type, in
addition to several other possibilities.

I think that storing a valid (double*)0 null pointer value
representation in the space allocated by malloc() through either ((char
unsigned *)d)[...] or memset() doesn't force any effective type (eg. a
character type) on the allocated object. That is, the conclusion of the
second sentence of C99 6.5 p6 does not hold, because its premise is false.

Sorry, I didn't realize that it was 'd' itself, rather than *d, that you
were thinking about in terms of aliasing.

C99 describes how the effective type is determined, and I agree that
neither the call to memset() nor writing to that memory as an array of
char gives it an effective type, since that writing did not take the
form of copying from an existing object. However, accessing it through
*d gives it an effective type of 'double*'; and 6.5p6 makes no
distinction between whether the access was for the purpose of writing
the memory, or reading it. If that memory has all bits zero at the time
of the access, then a POSIX-specific promise that such a representation
represents a null pointer seems to me to be sufficient to render the
behavior of that code defined - by POSIX, not by the C standard.

This is really an issue that you should raise in a forum devoted to the
POSIX standard, though I'm not sure what the appropriate one would be.
comp.std.unix has been quiet for so long that someone is starting the
formal process for removing it.

Ersek, Laszlo · Apr 27, 2010

C99 describes how the effective type is determined, and I agree that
neither the call to memset() nor writing to that memory as an array of
char gives it an effective type, since that writing did not take the
form of copying from an existing object. However, accessing it through
*d gives it an effective type of 'double*'; and 6.5p6 makes no
distinction between whether the access was for the purpose of writing
the memory, or reading it. If that memory has all bits zero at the time
of the access, then a POSIX-specific promise that such a representation
represents a null pointer seems to me to be sufficient to render the
behavior of that code defined - by POSIX, not by the C standard.

Thank you very much for your invaluable input. Much obliged.

This is really an issue that you should raise in a forum devoted to the
POSIX standard, though I'm not sure what the appropriate one would be.

As I understand it, austin-group-l is *that* forum. (See [0]/Q0, and [1].)
The issue was raised there and I thought comp.lang.c and comp.std.c
subscribers could contribute authoritatively. Thankfully, you proved the
hunch right.

If you don't mind, I'll forward your message to the corresponding
austin-group-l thread.

Cheers,
lacos

[0] http://www.opengroup.org/austin/faq.html
[1] http://www.opengroup.org/austin/lists.html

Ersek, Laszlo · Apr 27, 2010

The issue was raised there and I thought comp.lang.c and comp.std.c
subscribers could contribute authoritatively.

Small fix, with apologies: the comp.std.c idea came from Vincent Lefevre.

Thanks,
lacos

Vincent Lefevre · Apr 27, 2010

In comp.std.c, article <[email protected]>,

James Kuyper said:
C99 describes how the effective type is determined, and I agree that
neither the call to memset() nor writing to that memory as an array of
char gives it an effective type, since that writing did not take the
form of copying from an existing object.

There's a problem with this sentence ("since ... not" while
the C standard uses positive causality - see below). Here's
what I said in the austin-group mailing-list about memset()
used on a dynamically allocated region (but again, the
standard is not clear enough, IMHO):

6.5#6 says:

The effective type of an object for an access to its stored value is
the declared type of the object, if any.75)

Here there is no declared type (I recall the context: the memory
was allocated dynamically). So, this doesn't apply.

If a value is stored into an object having no declared type through
an lvalue having a type that is not a character type, then the type
of the lvalue becomes the effective type of the object for that
access and for subsequent accesses that do not modify the stored
value.

Here the type of the lvalue is a character type, so that this doesn't
apply. Another interpretation is that memset is its own way to store
data (just like memcpy and memmove below); still, the above sentence
doesn't apply here.

If a value is copied into an object having no declared type using
memcpy or memmove, or is copied as an array of character type, then
the effective type of the modified object for that access and for
subsequent accesses that do not modify the value is the effective
type of the object from which the value is copied, if it has one.

Here this is memset, not memcpy or memmove. I don't know what "is
copied as an array of character type" intends to mean. Anyway,
memset doesn't copy an object. So, this doesn't apply.

For all other accesses to an object having no declared type, the
effective type of the object is simply the type of the lvalue used
for the access.

This is an "else" case. This is how I deduce that the effective type
is a character type.
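
The four sentences of 6.5p6 can be lined up against code roughly as follows. This is only an illustration of the reading given above: the function name is invented, and the comment on the memset() call states that reading, not a settled interpretation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, -1 if an allocation failed. */
static int
effective_type_cases(void)
{
    double declared = 1.0;          /* sentence 1: declared type double */
    double *a = malloc(sizeof *a);
    double *b = malloc(sizeof *b);
    void *c = malloc(sizeof(double));
    int rc = -1;

    if (0 != a && 0 != b && 0 != c) {
        /* sentence 2: store through a non-character lvalue */
        *a = 2.0;
        /* sentence 3: copy from an object whose effective type is double */
        (void)memcpy(b, &declared, sizeof declared);
        /* no declared type, no non-character store, no source object:
         * under the reading above only sentence 4 is left */
        (void)memset(c, 0, sizeof(double));

        assert(2.0 == *a && 1.0 == *b);
        rc = 0;
    }
    free(a);
    free(b);
    free(c);
    return rc;
}
```

Only a and b are read back as double here; the zero-filled region c is deliberately left unread, since that read is the point under dispute.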

Vincent Lefevre · Apr 27, 2010

In comp.std.c, article <[email protected]>,

Small fix, with apologies: the comp.std.c idea came from Vincent Lefevre.

Actually, my remark in the austin-group list was just about the
effective type due to a memset() on a dynamically allocated region
(not about the representation of a null pointer). This question is
covered by the C standard, not by POSIX. That's why I suggested
comp.std.c.

Ersek, Laszlo · Apr 27, 2010

Here's what I said in the austin-group mailing-list about memset() used
on a dynamically allocated region (but again, the standard is not clear
enough, IMHO):

6.5#6 says:

(a)

The effective type of an object for an access to its stored value is
the declared type of the object, if any.75)

Here there is no declared type (I recall the context: the memory was
allocated dynamically). So, this doesn't apply.

(b)

If a value is stored into an object having no declared type through
an lvalue having a type that is not a character type, then the type
of the lvalue becomes the effective type of the object for that
access and for subsequent accesses that do not modify the stored
value.

Here the type of the lvalue is a character type, so that this doesn't
apply. Another interpretation is that memset is its own way to store
data (just like memcpy and memmove below); still, the above sentence
doesn't apply here.

(c)

If a value is copied into an object having no declared type using
memcpy or memmove, or is copied as an array of character type, then
the effective type of the modified object for that access and for
subsequent accesses that do not modify the value is the effective
type of the object from which the value is copied, if it has one.

Here this is memset, not memcpy or memmove. I don't know what "is copied
as an array of character type" intends to mean. Anyway, memset doesn't
copy an object. So, this doesn't apply.

(d)

For all other accesses to an object having no declared type, the
effective type of the object is simply the type of the lvalue used
for the access.

This is an "else" case. This is how I deduce that the effective type is
a character type.

Ah, okay, now I think I see what you mean. Sorry for being dense.

We seem to agree that neither (a) nor (b) applies. You say that (c) doesn't
apply either, and thus (d) -- the "else branch" -- must apply. I didn't
understand this previously: I said (or rather, I think I said) "since none
of a-b-c applies, the access establishes no effective type at all". This
is probably a misinterpretation. (Euphemism for "I was wrong".)

But what if we assume for a moment that the all-bits-zero representation
carries a valid null pointer value for all pointer-to-object types? In
that case, wouldn't zeroing out the individual bytes of a
pointer-to-object object through a (char unsigned *) make (c) applicable?

----v----
If a value is copied into an object having no declared type [...] as an
array of character type, then the effective type of the modified object
for that access and for subsequent accesses that do not modify the value
is the effective type of the object from which the value is copied, if it
has one.
----^----

#include <assert.h>
#include <stddef.h>
#include <string.h>

static double *dp; /* suppose all-bits-zero */
static char unsigned zeroes[sizeof dp];

static void
z1(void **dpp)
{
  (void)memcpy(dpp, &dp, sizeof dp);
}

static void
z2(void **dpp)
{
  size_t pos;

  assert(0 == memcmp(&dp, zeroes, sizeof zeroes));
  for (pos = 0u; pos < sizeof zeroes; ++pos) {
    ((char unsigned *)dpp)[pos] = zeroes[pos];
  }
}

static void
z3(void **dpp)
{
  (void)memset(dpp, 0, sizeof(double *));
}

(c) applies to z1(). z2() copies the exact same bit pattern (object
representation) from a character array to "*dpp". z3() establishes the
exact same bit pattern (object representation) in "*dpp" without a source
object.

Insomuch as TC2 entry 9 has rendered z2() and z3() equivalent to z1() wrt.
integers, without touching 6.5 at all, I think it would only be consistent
if a similar all-bits-zero requirement on object pointers (added as an
extension) made both z2() and z3() equivalent to z1(), wrt. object
pointers, necessitating no change to 6.5 either.

In my opinion, the reason why the standard doesn't explicitly include
memset() in (c), and the char-wise storing of a pattern, is only because
it couldn't do that without restricting the object representations
themselves. Adding a constraint on object representation sufficed to allow
z3() for integers. So should it for pointers-to-objects.

Cheers,
lacos

Vincent Lefevre · Apr 28, 2010

In comp.std.c, article <[email protected]>,

Ersek said:
But what if we assume for a moment that the all-bits-zero representation
carries a valid null pointer value for all pointer-to-object types? In
that case, wouldn't zeroing out the individual bytes of a
pointer-to-object object through a (char unsigned *) make (c) applicable?

It depends on how this is done. You could apply (c) under "If a value
[...] is copied as an array of character type". But note the words
"value" and "copied". This means that there is a source in memory
(with an effective type). While memcpy and memmove use such a source,
memset doesn't. I'm not sure that even a "for" loop falls under this
condition because I don't see how an implementation could recognize
every form of such loops (in the most complicated cases).

(c) applies to z1().

and the effective type of the object in &dp (that is, double *)
is used.

z2() copies the exact same bit pattern (object representation) from
a character array to "*dpp".

But the object in zeroes has no effective type (except the individual
unsigned char), thus no value. You first need to force an effective
type (and a value), e.g. with

*((double **) &zeroes) = NULL;

Then I'm not sure that the for loop counts as a copy of such an
object.

z3() establishes the exact same bit pattern (object representation)
in "*dpp" without a source object.

Since there is no source, there is no effective type and no value.

Insomuch as TC2 entry 9 has rendered z2() and z3() equivalent to z1() wrt.
integers, without touching 6.5 at all,

??? Could you explain? I don't see such a thing on

http://www.open-std.org/jtc1/sc22/wg14/www/docs/tc2.htm

James Kuyper · Apr 28, 2010

Vincent said:
In comp.std.c, article <[email protected]>,

Here this is memset, not memcpy or memmove. I don't know what "is
copied as an array of character type" intends to mean.

It means that the fact that memcpy() causes the memory to acquire that
effective type is not a magical feature of memcpy() or memmove(), but is
merely a consequence of the defined behavior of those functions. It
means that any other code that has the same defined behavior as either
of those two functions must therefore also have the same effect, of
establishing the effective type of that piece of memory. In particular,

double *d = NULL;
unsigned char *pin;
double **dp = malloc(sizeof din);
unsigned char *pout = (char*)dp;

if(pout)
{
for(pin = &d; pin < (char*)(&d + 1); pin++)
*pout++ = *pin;
}

must cause the memory pointed at by dp to acquire the effective type of
'double *', and to contain a valid representation of a null pointer to
double. The relevant consequence of this fact is that, for an
implementation which provides the guarantee (not provided by the
standard itself) that *pin has a value of 0 at each point where that
value is copied to *pout in the above loop, then that loop must be
replaceable by

for(size_t i=0; i < sizeof(**dout); i++)
pout = 0;

without any change to the resulting behavior. That's because, when such
an implementation-defined guarantee applies, the behavior defined by the
standard is identical for those two loops. That second loop has the same
standard-defined behavior as a call to memset(), which must therefore
also have that same effect.
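
Reading din as d, dout as dp, the (char*) casts as (unsigned char *), and the second loop's store as an indexed store of zero, the two loops might be sketched as below. The function names are invented here, and the comparison uses memcmp() only, so it does not depend on how the effective-type question is settled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* First loop: copy the bytes of a null double * into the allocation. */
static void
fill_by_copy(double **dp)
{
    double *d = NULL;
    const unsigned char *pin = (const unsigned char *)&d;
    unsigned char *pout = (unsigned char *)dp;
    size_t i;

    for (i = 0u; i < sizeof d; i++) {
        pout[i] = pin[i];
    }
}

/* Second loop: store zero bytes, with no source object involved. */
static void
fill_by_zero(double **dp)
{
    unsigned char *pout = (unsigned char *)dp;
    size_t i;

    for (i = 0u; i < sizeof *dp; i++) {
        pout[i] = 0u;
    }
}

/* Returns 1 if both fills leave the same bytes behind, 0 if they do
 * not, and -1 if an allocation failed. */
static int
fills_agree(void)
{
    double **a = malloc(sizeof *a);
    double **b = malloc(sizeof *b);
    int rc = -1;

    if (0 != a && 0 != b) {
        fill_by_copy(a);
        fill_by_zero(b);
        rc = (0 == memcmp(a, b, sizeof *a));
    }
    free(a);
    free(b);
    return rc;
}
```

fills_agree() returns 1 exactly on those implementations where a null double * happens to be all-bits-zero, which is the guarantee the argument presupposes.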

... Anyway,
memset doesn't copy an object. So, this doesn't apply.

For all other accesses to an object having no declared type, the
effective type of the object is simply the type of the lvalue used
for the access.

This is an "else" case. This is how I deduce that the effective type
is a character type.

The first access to *d, as a whole, in the code provided, is through a
pointer to a pointer to double, not through a pointer to unsigned char.
That access sets the effective type for the object as a whole to be
'double *', even though the previous access to the individual bytes of
the object set the effective type for those bytes to unsigned char. All
objects can also be accessed as arrays of unsigned char, so there is no
conflict between those two effective types.

Vincent Lefevre · Apr 28, 2010

In comp.std.c, article <[email protected]>,

It means that the fact that memcpy() causes the memory to acquire that
effective type is not a magical feature of memcpy() or memmove(), but is
merely a consequence of the defined behavior of those functions. It
means that any other code that has the same defined behavior as either
of those two functions must therefore also have the same effect, of
establishing the effective type of that piece of memory.

Perhaps, but I don't see how you can deduce all of this:

In particular,

double *d = NULL;
unsigned char *pin;
double **dp = malloc(sizeof din);

^^^

I think you mean d or double *.

unsigned char *pout = (char*)dp;

^^^^

I think you mean unsigned char.

if(pout)
{
for(pin = &d; pin < (char*)(&d + 1); pin++)

^^^^
Ditto.

*pout++ = *pin;
}

must cause the memory pointed at by dp to acquire the effective type of
'double *',

Yes, because an object of type double * is copied.

However, how far does this go? What if the char's are set in
some arbitrary order, possibly with other statements between
the stores?

and to contain a valid representation of a null pointer to double.
The relevant consequence of this fact is that, for an implementation
which provides the guarantee (not provided by the standard itself)
that *pin has a value of 0 at each point where that value is copied
to *pout in the above loop, then that loop must be replaceable by

for(size_t i=0; i < sizeof(**dout); i++)
pout = 0;

without any change to the resulting behavior.

The behavior doesn't change at *this* point, but while you could say
that the effective type in the former case was double *, there's no
source of double * here, so that the effective type of the allocated
memory cannot be double *, and...

That's because, when such an implementation-defined guarantee
applies, the behavior defined by the standard is identical for those
two loops. That second loop has the same standard-defined behavior
as a call to memset(), which must therefore also have that same
effect.

ditto for memset.

The first access to *d, as a whole, in the code provided, is through a
pointer to a pointer to double, not through a pointer to unsigned char.

We are talking about dynamically allocated memory. *d is not in this
case.

That access sets the effective type for the object as a whole to be
'double *', even though the previous access to the individual bytes of
the object set the effective type for those bytes to unsigned char. All
objects can also be accessed as arrays of unsigned char, so there is no
conflict between those two effective types.

This is off-topic: your code does not use memset.

James Kuyper · Apr 28, 2010

Vincent said:
In comp.std.c, article <[email protected]>,

Perhaps, but I don't see how you can deduce all of this:

I don't know how to explain that deduction to you; to me, it all seems
clearly implied by the phrase "copied as an array of character type".

^^^

I think you mean d or double *.

^^^^

I think you mean unsigned char.

^^^^
Ditto.

Yes - that single post contained far too many typos. Perhaps I should go
back to bed for a while before going to work.

Yes, because an object of type double * is copied.

However, how far does this go? What if the char's are set in
some arbitrary order, possibly with other statements between
the stores?

The order in which the chars are copied does not matter, because the
standard does not specify that order for memcpy (and, in fact, the
typical implementation of memmove() will sometimes copy the bytes in
reverse order). The only way that other statements, between the stores,
can be relevant, is if they change either the object being copied, or
the object being copied to. All other statements are irrelevant to the
equivalence with memcpy().
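
A minimal sketch of the point about ordering: copying the bytes last-to-first leaves the same object representation as memcpy(). The helper names are invented for the example, and the destinations are declared objects, so the effective-type question does not arise here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes from src to dst, last byte first. The two regions
 * must not overlap. */
static void
copy_backwards(void *dst, const void *src, size_t n)
{
    unsigned char *out = dst;
    const unsigned char *in = src;

    while (n > 0u) {
        --n;
        out[n] = in[n];
    }
}

/* Returns 1 if a backward copy and a memcpy() copy of a double *
 * agree byte for byte and both still point at the same double. */
static int
order_is_irrelevant(void)
{
    double value = 3.5;
    double *src = &value;
    double *fwd;
    double *bwd;

    (void)memcpy(&fwd, &src, sizeof src);
    copy_backwards(&bwd, &src, sizeof src);
    return 0 == memcmp(&fwd, &bwd, sizeof fwd)
        && fwd == bwd
        && 3.5 == *bwd;
}
```

Statements placed between the individual byte stores would not change the outcome either, as long as they modify neither src nor the destination.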

and to contain a valid representation of a null pointer to double.
The relevant consequence of this fact is that, for an implementation
which provides the guarantee (not provided by the standard itself)
that *pin has a value of 0 at each point where that value is copied
to *pout in the above loop, then that loop must be replaceable by

for(size_t i=0; i < sizeof(**dout); i++)
pout = 0;

without any change to the resulting behavior.

The behavior doesn't change at *this* point,

But the difference between the two versions is complete at this point.
If the behavior has not yet differed, it has no license to change from
this point onward.

....

We are talking about dynamically allocated memory. *d is not in this
case.

Sorry - another typo. That should have been *dp.

This is off-topic: your code does not use memset.

memset() accesses objects as arrays of unsigned char. My example
contains code equivalent to memset(), and cites that equivalence as
justification for the usability of memset() to achieve the desired
effect - how is that off-topic? You might believe the connection to be
incorrect, but I don't see how it's off-topic.

Vincent Lefevre · Apr 28, 2010

In comp.std.c, article <[email protected]>,

I don't know how to explain that deduction to you; to me, it all seems
clearly implied by the phrase "copied as an array of character type".

Clearly not. There's no "double *" in "copied as an array of character
type". I don't see why a memset would set the effective type to
"double *". And why "double *" and not something else?

[...]

and to contain a valid representation of a null pointer to double.
The relevant consequence of this fact is that, for an implementation
which provides the guarantee (not provided by the standard itself)
that *pin has a value of 0 at each point where that value is copied
to *pout in the above loop, then that loop must be replaceable by

    for(size_t i=0; i < sizeof(**dout); i++)
        pout[i] = 0;

without any change to the resulting behavior.

The behavior doesn't change at *this* point,

But the difference between the two versions is complete at this point.
If the behavior has not yet differed, it has no license to change from
this point onward.

Here's a counter-example. Assume unsigned long and void * have
the same size 8 (no padding bits), and that the null pointer
is represented by a sequence of null bytes, and consider the
following two cases:

void foo (void)
{
    void *p = malloc(8);
    *(unsigned long *)p = 0;
    printf ("%ld\n", *(unsigned long *)p);
}

void foo (void)
{
    void *p = malloc(8);
    *(void **)p = 0;
    printf ("%ld\n", *(unsigned long *)p);
}

Until the "... = 0;", the behavior has not changed. However, though
the following line is the same in both cases, the first case has
well-specified behavior, while the second case has undefined behavior
(since it breaks the aliasing rules).

[...]

memset() accesses objects as arrays of unsigned char. My example
contains code equivalent to memset(), and cites that equivalence as
justification for the usability of memset() to achieve the desired
effect - how is that off-topic? You might believe the connection to
be incorrect, but I don't see how it's off-topic.

Your code doesn't use memset. Please show a code that uses memset
(without typos). And reasoning based on "the behavior has not yet
differed" is flawed, as shown above.

James Kuyper · Apr 28, 2010

Vincent said:
In comp.std.c, article <[email protected]>,

Clearly not. There's no "double *" in "copied as an array of character
type".

"is copied as an array of character type" is a fragment of a much
longer, complicated statement. That complete statement talks about the
case when "a value is copied into an object", and following sentences
refer to "the object from which the value is copied" - all without
constraining the type of either object, or of the value copied, in any
way. An object of type 'double *' clearly qualifies as "an object",
despite the fact that neither 'double' nor '*' appear anywhere in that
entire paragraph.

... I don't see why a memset would set the effective type to
"double *". And why "double *" and not something else?

The relevant issue is the fact that it has a known representation.
If you access memory with no declared type through lvalues of character
type in order to create something known to be a valid representation of
an object of a given type, and the first access to that memory is
through an lvalue of that type, that access gives that memory its
effective type, and will not, in itself, run afoul of any requirement
specified by the C standard.

There are only a few types for which the standard guarantees that any
particular representation is valid, but any one of those types could have
been used in this example.

Representations of most types are unspecified by the standard; there's
no requirement that they be documented by the implementation. However,
most implementations do document the representations of many types, and
code that is not intended to be portable can use this approach for any
of those types; there's nothing specific to double* about this.

and to contain a valid representation of a null pointer to double.
The relevant consequence of this fact is that, for an implementation
which provides the guarantee (not provided by the standard itself)
that *pin has a value of 0 at each point where that value is copied
to *pout in the above loop, then that loop must be replaceable by

    for(size_t i=0; i < sizeof(**dout); i++)
        pout[i] = 0;

without any change to the resulting behavior.

The behavior doesn't change at *this* point,

But the difference between the two versions is complete at this point.
If the behavior has not yet differed, it has no license to change from
this point onward.

Here's a counter-example. Assume unsigned long and void * have
the same size 8 (no padding bits), and that the null pointer
is represented by a sequence of null bytes, and consider the
following two cases:

void foo (void)
{
    void *p = malloc(8);
    *(unsigned long *)p = 0;
    printf ("%ld\n", *(unsigned long *)p);
}

void foo (void)
{
    void *p = malloc(8);
    *(void **)p = 0;
    printf ("%ld\n", *(unsigned long *)p);
}

Neither unsigned long nor void** are character types; the license to do
something like this is restricted to accessing the objects as arrays of
character type. As soon as you accessed the memory pointed at by p
through an lvalue of type void**, it acquired that as its effective
type. Because of the anti-aliasing rules, a conforming compiler is not
required to consider the possibility that *(void**)p and *(unsigned
long*)p refer to the same location in memory (even though, in this case,
that "possibility" can trivially be determined to be a certainty). It
could therefore have evaluated *(unsigned long*)p prior to execution of
the assignment expression, and used that saved result as an argument for
the printf() call. In a different, more complicated example, this could
actually be a reasonable optimization.

Such an optimization would not be permitted when a character type is
involved, because the anti-aliasing rules give special status to those
types.
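
For contrast, here is a sketch of how the bytes of the stored pointer could be inspected without running into the aliasing problem: copy them with memcpy() into a declared unsigned long, instead of reading the allocated memory through an unsigned long lvalue. The helper name null_pointer_bits is hypothetical. Like the example above, the code assumes that unsigned long and void * have the same size and that unsigned long has no padding bits; it gives up if the sizes differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: store a null pointer in allocated memory, then copy
 * its bytes into *out. Returns 1 on success, 0 if the sizes differ or the
 * allocation fails.
 */
static int null_pointer_bits(unsigned long *out)
{
    void **slot;

    assert(out != NULL);
    if (sizeof *out != sizeof *slot)
        return 0;                   /* the size assumption does not hold here */

    slot = malloc(sizeof *slot);
    if (slot == NULL)
        return 0;

    *slot = NULL;                   /* effective type of *slot is now void * */
    memcpy(out, slot, sizeof *out); /* byte copy into a declared unsigned long */
    free(slot);
    return 1;
}
```

Because *out has a declared type, the memcpy() does not change its effective type, and the allocated memory is only ever read as bytes.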

Your code doesn't use memset. Please show a code that uses memset
(without typos).

I can't guarantee "no typos", but I'll do my best to avoid them. Where I
wrote that the second loop was equivalent to a call to memset(), please
consider that statement to have been replaced by an otherwise identical
statement that specifies the precise syntax of the equivalent call:

memset(dp, 0, sizeof *dp);

I didn't think I'd have to spell out details like that, and I don't
understand why you think the absence of those details is a problem.
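
Spelled out as a complete fragment, the two ways of getting a null double * into allocated memory might look like the sketch below. The function names are hypothetical. The memset() variant is only meaningful on an implementation that documents all-bits-zero as a null pointer representation, which is an assumption the standard does not back. The assignment variant needs no such assumption.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, non-portable variant: relies on the implementation
 * documenting all-bits-zero as a null pointer representation.
 */
static double **alloc_slot_memset(void)
{
    double **dp = malloc(sizeof *dp);

    if (dp != NULL)
        memset(dp, 0, sizeof *dp);
    return dp;
}

/* Hypothetical helper, portable variant: store a null pointer by assignment. */
static double **alloc_slot_assign(void)
{
    double **dp = malloc(sizeof *dp);

    if (dp != NULL)
        *dp = NULL;
    return dp;
}
```

Whether *dp may be read after alloc_slot_memset() is exactly the point argued in this thread; after alloc_slot_assign() nobody disputes it.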

Ersek, Laszlo · Apr 28, 2010

In comp.std.c, article
<[email protected]>, Ersek,

static double *dp; /* suppose all-bits-zero */
static char unsigned zeroes[sizeof dp];

static void
z1(void **dpp)
{
    (void)memcpy(dpp, &dp, sizeof dp);
}

static void
z2(void **dpp)
{
    size_t pos;

    assert(0 == memcmp(&dp, zeroes, sizeof zeroes));
    for (pos = 0u; pos < sizeof zeroes; ++pos) {
        ((char unsigned *)dpp)[pos] = zeroes[pos];
    }
}

static void
z3(void **dpp)
{
    (void)memset(dpp, 0, sizeof(double *));
}

Insomuch as TC2 entry 9 has rendered z2() and z3() equivalent to z1() wrt.
integers, without touching 6.5 at all,

??? Could you explain? I don't see such a thing on

http://www.open-std.org/jtc1/sc22/wg14/www/docs/tc2.htm

That's TC2 to C90 (ISO/IEC 9899:1990). I meant TC2 to C99 (ISO/IEC
9899:1999):

http://www.open-std.org/jtc1/sc22/wg14/www/docs/9899-1999_cor_2-2004.pdf

Cheers,
lacos

Vincent Lefevre · Apr 29, 2010

In comp.std.c, article <[email protected]>,

Ersek said:
In comp.std.c, article
<[email protected]>, Ersek,

static double *dp; /* suppose all-bits-zero */
static char unsigned zeroes[sizeof dp];

static void
z1(void **dpp)
{
    (void)memcpy(dpp, &dp, sizeof dp);
}

static void
z2(void **dpp)
{
    size_t pos;

    assert(0 == memcmp(&dp, zeroes, sizeof zeroes));
    for (pos = 0u; pos < sizeof zeroes; ++pos) {
        ((char unsigned *)dpp)[pos] = zeroes[pos];
    }
}

static void
z3(void **dpp)
{
    (void)memset(dpp, 0, sizeof(double *));
}

Insomuch as TC2 entry 9 has rendered z2() and z3() equivalent to z1() wrt.
integers, without touching 6.5 at all,

??? Could you explain? I don't see such a thing on

http://www.open-std.org/jtc1/sc22/wg14/www/docs/tc2.htm

That's TC2 to C90 (ISO/IEC 9899:1990).

OK, I didn't notice that.

I meant TC2 to C99 (ISO/IEC 9899:1999):

http://www.open-std.org/jtc1/sc22/wg14/www/docs/9899-1999_cor_2-2004.pdf

This entry 9 is about the representation of integers, not about
aliasing rules. See 6.5 p6 and p7.
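
For reference, as far as I can tell the sentence TC2 added to 6.2.6.2 p5 says that, for any integer type, the object representation with all bits zero is a representation of the value zero in that type. A minimal sketch of what that buys for a declared integer array follows. zeroed_count is a hypothetical helper; since the array has a declared type, the effective-type question from 6.5 p6 does not arise here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: zero an array of unsigned long with memset() and
 * count the elements that then compare equal to 0.
 */
static size_t zeroed_count(unsigned long *arr, size_t n)
{
    size_t i, count = 0;

    assert(arr != NULL);
    memset(arr, 0, n * sizeof *arr);
    for (i = 0; i < n; ++i)
        if (arr[i] == 0)
            ++count;
    return count;
}
```

This says nothing about pointers or about allocated memory, which is where the disagreement in this thread lies.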

Vincent Lefevre · Apr 29, 2010

In comp.std.c, article <[email protected]>,

"is copied as an array of character type" is a fragment of a much
longer, complicated statement.

Yes, but you only said "copied as an array of character type" a few
lines above. I would be interested in the *whole* reasoning.

That complete statement talks about the case when "a value is copied
into an object", and following sentences refer to "the object from
which the value is copied"

The point with memset() is that no objects are involved, or possibly
only an array of unsigned char.

- all without constraining the type of either object, or of the
value copied, in any way. An object of type 'double *' clearly
qualifies as "an object", despite the fact that neither 'double' nor
'*' appear anywhere in that entire paragraph.

I recall the code (based on what was posted in the austin-group list):

double **dp = malloc(sizeof(double *));
memset (dp, 0, sizeof(double *));

There's a type double *, but no object of this type here. The question
was: what is the effective type of the object stored at dp just after
the memset()?

The relevant issue is the fact that it has a known representation.
If you access memory with no declared type through lvalues of character
type in order to create something known to be a valid representation of
an object of a given type, and the first access to that memory is
through an lvalue of that type, that access gives that memory its
effective type, and will not, in itself, run afoul of any requirement
specified by the C standard.

Where does the C standard say that? Please, quote it!

Neither unsigned long nor void** are character types;

You never said that this was a requirement. Again, I want your full
reasoning. But I think our disagreement is on your paragraph above,
for which I am asking for an explanation.

Ersek, Laszlo · Apr 29, 2010

I recall the code (based on what was posted in the austin-group list):

double **dp = malloc(sizeof(double *));
memset (dp, 0, sizeof(double *));

There's a type double *, but no object of this type here. The question
was: what is the effective type of the object stored at dp just after
the memset()?

I would say either "none", which would be okay for the purpose in
question, or perhaps even "double *", if the memset() established a valid
value representation, which would be again okay. IIRC your opinion is
"char".

I believe the *intent* of the standard justifies my opinion (at least the
"none" option), but I'm sort of forced to agree that the strict *wording*
of the standard justifies yours.

I really have no more arguments in this discussion. I was under the vague
impression that many clc contributors share the opinion that TC2 entry 9
rendered memset()-to-\0 defined for zeroing out integers.

Cheers,
lacos

James Kuyper · Apr 30, 2010

Vincent said:
In comp.std.c, article <[email protected]>,
....
The point with memset() is that no objects are involved, or possibly
only an array of unsigned char.

Which means that it does not establish an effective type for the memory
affected by the memset() call. However, that memory doesn't need to have
an effective type at this point; it will acquire an effective type at
the point where the value of that object is read. The key points are that:

a) The anti-aliasing rules allow any object to be accessed and (if
modifiable) to be modified as if it were an array of unsigned char.
Ensuring that the resulting object contains a valid representation for
its type is, in general, tricky - unless that representation is
obtained by copying from another object of that same type already
containing a valid value. But we're not addressing the general case;
we're addressing a specific case where a valid representation is known.

b) An unsigned char with a value of 0 has all-bits-0. This is relevant,
because the behavior of memset() is defined in terms of unsigned char.

c) For the specific implementation under discussion, all-bits-0 is also
a valid representation of a null pointer.

That's all that's needed to ensure that, when this block of memory
acquires an effective type by being accessed through an lvalue of type
double*, it will contain a valid representation of a null pointer to
double, and therefore can safely be read.
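
Premise (c) is implementation-specific, but the representation that a null pointer assignment produces can at least be inspected portably, since reading an object's bytes through unsigned char is always allowed. A minimal sketch, with a hypothetical function name. Note that it only shows what one null pointer representation looks like; it does not prove that all-bits-zero is also a valid one on an implementation where the two differ.

```c
#include <stddef.h>

/*
 * Hypothetical check: returns 1 if the representation produced by assigning
 * a null pointer constant to a double * consists only of zero bytes.
 */
static int null_is_all_bits_zero(void)
{
    double *p = NULL;
    const unsigned char *bytes = (const unsigned char *)&p;
    size_t i;

    for (i = 0; i < sizeof p; ++i)
        if (bytes[i] != 0)
            return 0;
    return 1;
}
```

The implementation's documentation remains the authority on whether memset() to zero yields a null pointer; a check like this is at most a sanity test.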

....

I recall the code (based on what was posted in the austin-group list):

double **dp = malloc(sizeof(double *));
memset (dp, 0, sizeof(double *));

There's a type double *, but no object of this type here. The question
was: what is the effective type of the object stored at dp just after
the memset()?

It has none. It acquires an effective type only when *dp is evaluated.
However, by that time, on such an implementation, it contains a valid
representation of a null pointer.

Because dp has the type double**. If it had the type unsigned long*, and
sizeof(double*) were replaced with (or happened to be identical to)
sizeof *dp (which is a good idea, anyway), then the exact same code
would cause *dp to have an effective type of unsigned long. In either
case, it would not acquire that effective type until the *dp was
actually evaluated.

Where does the C standard say that? Please, quote it!

I can't tell you where it says that this does not "run afoul of any
requirement specified by the C standard", because the C standard does
not in fact contain words to that effect. What it does have, is a
complete absence of requirements violated by such code. I can prove that
by citation, but only by citation of every single clause of the
standard, and then pointing out that none of them specifies a
requirement violated by such code. I'm sorry, but I don't feel inclined
to honor your request that I quote that.

If you think that there is a requirement that it violates, you should be
able to cite that requirement, which seems a more appropriate way to
address our disagreement on this point.

You never said that this was a requirement.

I thought that you were already aware of the fact that the anti-aliasing
rules make special allowances for character types, and that my argument
was based upon those special allowances. Well, if you didn't know that
before, you know it now.
