union of structs: common variable stored in same address?

Tim Rentsch · Oct 9, 2009

jameskuyper said:
Tim said:

It is widely understood that a permissible conversion of a pointer to
a given object results in a new pointer value that points at an object
whose initial byte is the same as the initial byte of the object
pointed at by the original pointer - but the standard never actually
says so, [except for character types].

What you mean is the Standard never says this directly.
Essentially everyone other than you believes it's implied by
other statements in the Standard. What makes you think your
interpretation is right and all those other people are wrong?

I have been unable to identify a valid argument derived from the
actual requirements of the standard to demonstrate that this
conclusion is implied by those requirements. I've discussed that
opinion not just once, but many times, and no one who has disagreed
has ever presented such an argument, either (though not for want of
trying). Do I need anything more than that to justify my conclusion
that no such argument is possible?

Yes, no one has presented an argument that convinces you, I get
that. Have you considered the idea that your convictions rest
on some assumptions that other people generally don't agree
with? Have you ever tried to identify such (possible) assumptions?

What does the number of people who disagree with me have to do with
that?

There's a saying which I expect you've heard, "The battle is
not always to the strong, nor the race to the swift. But
that's the way to bet." One person's opinion (or conclusion,
if you prefer that) being different from most other people's
doesn't mean that opinion/conclusion is wrong necessarily,
but it does raise the probability that something is askew.

...

If I had never discussed this issue publicly before, and never seen
the opposing "arguments" before, I might be more willing to consider
that possibility. However, when people with a great deal of knowledge
of the standard, who believe strongly that I'm wrong about this, are
unable to articulate valid arguments based upon correct premises to
support their belief, I think I'm entitled to a little more confidence
in my understanding of this issue than you think I should have.

Wouldn't other people say the same thing about you? Doesn't
it seem more likely that the two sets of reasonings are based
on different underlying assumptions than that the reasoning
abilities of all those other people are worse than yours?

If you consider that arrogance, so be it.

Please don't read things into my statements that aren't there.
I didn't use the word arrogance, and in fact the word never
entered my mind. My question was meant only as a question, not
to imply a subtext.

Tim Rentsch · Oct 9, 2009

Ben Bacarisse said:
Tim Rentsch said:

Ben Bacarisse said:

(e-mail address removed) (Alan Curry) writes:
[snip]
The definition above looks a lot like an XEvent from X11/Xlib.h, which
starts out:

typedef union _XEvent {
int type; /* must not be changed; first element */
XAnyEvent xany;
XKeyEvent xkey;
XButtonEvent xbutton;
...

The code using this union often looks like

/* get a pointer-to-union ev from somewhere */
switch(ev->type) {
case KeyPress:
case KeyRelease:
    /* do something with ev->xkey */
    break;
case ButtonPress:
case ButtonRelease:
    /* do something with ev->xbutton */
    break;
}
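For readers who want to try the idiom, here is a minimal sketch of the same dispatch pattern using hypothetical types (union event, event_detail, members_start_at_union, EV_KEY and EV_BUTTON are made up for illustration and are not the Xlib definitions). The second function only reports what the compiler at hand does; it cannot settle what the standard guarantees, which is the question being debated.

```c
/* Hypothetical event codes and structs, loosely modelled on the XEvent
   layout; these are NOT the real Xlib definitions. */
enum { EV_KEY = 1, EV_BUTTON = 2 };

struct key_event    { int type; int keycode; };
struct button_event { int type; int button; int x, y; };

union event {
    int type;                   /* shared first element, like XEvent.type */
    struct key_event key;
    struct button_event button;
};

/* Dispatch on the shared type field, then read the matching member.
   Returns the key code or button number, or -1 for an unknown type. */
int event_detail(const union event *ev)
{
    switch (ev->type) {
    case EV_KEY:
        return ev->key.keycode;
    case EV_BUTTON:
        return ev->button.button;
    default:
        return -1;
    }
}

/* Reports whether every member starts at the union's first byte on this
   implementation; whether the standard requires it is what is disputed. */
int members_start_at_union(const union event *ev)
{
    const char *base = (const char *)ev;
    return (const char *)&ev->type == base
        && (const char *)&ev->key == base
        && (const char *)&ev->button == base;
}
```

On mainstream compilers members_start_at_union returns 1, which is why the XEvent idiom works in practice.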

This is likely to work, yes, but I don't think it is guaranteed to do
what the programmer expects. I think it is permissible for an
implementation to align structs and size_ts in such a way that the
type members don't coincide. That is a practical matter. More
formally, I can't see any text in the standard that assures me that
it will work as expected.

What Alan said -- all members of a union are aligned at the
beginning of the union, and the first member of a struct is
aligned at the beginning of the struct.

Honestly I am not yet persuaded. I've been busy so I have not had
time to think this through but the trouble is I am not 100% sure that
the wording in the standard is watertight.

I don't think it makes the assurance you state about alignment, at
least not directly. What it does is say that a pointer, suitably
converted, points to all union members.

I can't quite shake the fear that some peculiar addressing system
allows size_t and structs (even ones that start with a size_t member)
to be differently aligned whilst permitting the pointers to work as
required due to the conversion. However, off and on over the last few
days I've tried to come up with a set of mappings between the various
address types that give the effect I am thinking of and I can't!
Every time, the mappings fall foul of some requirement or other or
they simply put the size_t members in the same place (as one would
expect). It is only a nagging doubt that prevents me from saying,
"no, I fold".

Different people read the Standard in different ways.
I'm interested to learn more about your views here
but first I would like to ask how you read the Standard.
Let me broadly put different ways of reading here into
two categories, (a) "how I think the committee expects
the Standard should be read", and (b) "what the Standard
literally says". The quotes there are not meant as
"scare quotes" but to indicate that these phrases too
have different meanings for different people. (Also,
a point of clarification -- "how I think the committee
expects the Standard should be read" is about the reading
process, not about the conclusions reached.)

Under this somewhat broad classification, which of
the following would you say most closely represents
your position:

(a) "I read the Standard as I think the committee expects
it to be read, and using that reading it isn't clear
that the text mandates the proposed conclusion [that unions
can't have padding at the beginning]";

(b) "It isn't clear how the committee expects the Standard
to be read (perhaps only in some phrasings), and looking
at what the text literally says hasn't convinced me of
the proposed conclusion"; or,

(c) "I don't know (or don't care) how the committee expects
the Standard should be read; what the text literally says
doesn't support the proposed conclusion (or at least I
haven't found passages that provide that support)."

(Or feel free to state an alternative (d) if you can
explain what it is.)

Besides indicating a, b, or c, if you could give section/paragraph
numbers for text you think is relevant to the question at hand,
that would be helpful. More followup after your response.

jameskuyper · Oct 9, 2009

Tim said:
jameskuyper said:

Tim said:

It is widely understood that a permissible conversion of a pointer to
a given object results in a new pointer value that points at an object
whose initial byte is the same as the initial byte of the object
pointed at by the original pointer - but the standard never actually
says so, [except for character types].

What you mean is the Standard never says this directly.
Essentially everyone other than you believes it's implied by
other statements in the Standard. What makes you think your
interpretation is right and all those other people are wrong?

I have been unable to identify a valid argument derived from the
actual requirements of the standard to demonstrate that this
conclusion is implied by those requirements. I've discussed that
opinion not just once, but many times, and no one who has disagreed
has ever presented such an argument, either (though not for want of
trying). Do I need anything more than that to justify my conclusion
that no such argument is possible?

Yes, no one has presented an argument that convinces you, I get
that. Have you considered the idea that your convictions rest
on some assumptions that other people generally don't agree
with?

Yes, but I only considered it briefly; it's seriously inconsistent
with the content of the many discussions I've had on this issue. The
issue seems not to be assumptions that I have made, but assumptions
that others make that I refuse to make. I have considered very
carefully the possibility that other people were right to make those
assumptions; and after careful consideration, rejected that
possibility. Those assumptions are accurate with respect to how real-
world implementations actually implement pointer conversions, and I
quite readily make those assumptions when reasoning about how real-
world implementations work. However, when answering questions about
what the standard actually requires, those assumptions have no basis
in fact. The fundamental problem seems to be getting people to
correctly apply the distinction between what actually happens, and
what the standard requires to happen.

They know how pointer conversions actually work; they read the
standard with that knowledge in place and see that it's entirely
consistent with their assumptions; they realize that it was clearly
written by authors who made the same assumptions; and they miss the
fact that the standard falls short of actually requiring that those
assumptions be true.

It could have been the case that, despite not explicitly requiring the
assumptions to be true, by making those assumptions sufficiently
frequently, the authors might have accidentally written requirements
that could be combined to imply those assumptions. However, that
hasn't actually happened.

Wouldn't other people say the same thing about you? ...

That's quite likely. But that's always the case when two people
disagree, pretty much independent of which one is right, or how good
either person's reasons are for believing what they believe.

... Doesn't
it seem more likely that the two sets of reasonings are based
on different underlying assumptions than that the reasoning
abilities of all those other people are worse than yours?

If I were unaware of what those assumptions were and how the reasoning
was performed, that would be a very reasonable possibility to
consider. However, I've discussed this many times with many different
people, some of them undoubtedly among the best informed of the people
disagreeing with me on this issue. They have told me what their
assumptions were, and they have told me how they reasoned to reach
that conclusion. I no longer need to guess about those matters; I can
evaluate the accuracy of their assumptions and the quality of their
reasoning - and in every case I've found either one or the other
lacking.

Please don't read things into my statements that aren't there.
I didn't use the word arrogance, and in fact the word never
entered my mind. My question was meant only as a question, not
to imply a subtext.

I wasn't reading things into your statements, I was anticipating
possible responses. However, they were only possible, not certain,
which is why I prefaced that comment with "If".

jameskuyper · Oct 9, 2009

Tim said:
It is widely understood that a permissible conversion of a pointer to
a given object results in a new pointer value that points at an object
whose initial byte is the same as the initial byte of the object
pointed at by the original pointer - but the standard never actually
says so, [except for character types].

....
Essentially everyone other than you believes it's implied by
other statements in the Standard. What makes you think your
interpretation is right and all those other people are wrong? ....
Isn't it more likely that you've misunderstood some
other part of the Standard than that this point has been
missed by everyone else who's looked at it?

You're implicitly committing the fallacy of an argument from authority
here, with the authority being "everyone other than you", "all those
other people", and "everyone else who's looked at it". However, if I'm
wrong, there should be an actual argument demonstrating that point.
Instead of repeatedly asking me whether I've given adequate
consideration to the possibility that I'm wrong, why don't you just
present the relevant argument?

If "everyone other than you", "all of those other people" and
"everyone else who's looked at it" all know that the assumption I
referred to is correct, then the reasons they have for knowing it
should be fairly well known, too. Please let me know what you think
those reasons are.

Let me give you a little bit of help. The strongest argument I've seen
so far, I've already presented (though my presentation contained a
number of typos no one's bothered commenting on). Given

typedef union {
size_t type; // new, extra-struct type
struct S1 s1;
struct S2 s2;
// one member for each struct S? that has been declared
struct Sn sn;
} cursor_t;
cursor_t ct;

Sections 6.3.2.3p7 and 6.7.2.1p14 can be used to demonstrate the
point, which I'll quite happily concede, that

(char*)(size_t*)&ct == (char*)&ct.type

The next step of that argument simplifies this to:

(char*)&ct == (char*)&ct.type

but the person proposing that "proof" could never give me a
justification for that last step which was based upon actual
requirements of the standard. The closest he came was to assume as
true the conclusion that he was trying to prove, and use that assumption
to justify the step.
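A sketch of the two steps as C functions, assuming concrete stand-ins for struct S1 and struct S2 (their member layouts and the names step_one and step_two are hypothetical). On common implementations both functions return 1, but a test like this only observes one implementation; it cannot show that the second step is required by the standard, which is exactly the point in dispute.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the struct S1 ... Sn of the example. */
struct S1 { size_t type; int a; };
struct S2 { size_t type; double b; };

typedef union {
    size_t type;
    struct S1 s1;
    struct S2 s2;
} cursor_t;

/* The step conceded above (via 6.3.2.3p7 and 6.7.2.1p14). */
int step_one(cursor_t *ct)
{
    return (char *)(size_t *)ct == (char *)&ct->type;
}

/* The disputed simplification: drops the intermediate (size_t *). */
int step_two(cursor_t *ct)
{
    return (char *)ct == (char *)&ct->type;
}
```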

Care to give it a try? Fix up that argument. Justify the last step in
a non-circular way based upon actual requirements of the standard. Or
continue from that point and go a different way to prove the point.
Alternatively, use your own argument, if you wish.

But please stop questioning whether I've given adequate consideration
to other people's arguments.

Seebs · Oct 9, 2009

typedef union {
size_t type; // new, extra-struct type
struct S1 s1;
struct S2 s2;
// one member for each struct S? that has been declared
struct Sn sn;
} cursor_t;
cursor_t ct;

Sections 6.3.2.3p7 and 6.7.2.1p14 can be used to demonstrate the
point, which I'll quite happily concede, that

(char*)(size_t*)&ct == (char*)&ct.type

Okay.

The next step of that argument simplifies this to:

(char*)&ct == (char*)&ct.type

I think this was intended to be so obvious that no one bothered to specify it.
See, e.g., the language in 6.7.2.1, p13: "There may be unnamed padding within a
structure object, but not at its beginning."

Hmm. Okay, let's mess with this. For this NOT to be true, what has to be the
case? It must be that (char *)(size_t *)&ct != (char *)&ct. Which, in turn,
means that there must be some actual fiddling occurring when &ct is cast to
(size_t *). We know that (char *)&ct points to the lowest byte of ct (6.3.2.3,
paragraph 7). We know that (size_t *)&ct points to ct.type (6.7.2.1, p.14).
We know that (char *)(size_t *)&ct points to ct.type.

For (char *)&ct not to be the same as (char *)&ct.type, then, there must be
some magic which occurs when &ct is converted to (size_t *), which corresponds
to an internal allocation which places ct.type other than at the beginning
of the union.

I'd actually feel comfortable asserting that the similarity between p13 and
p14 in 6.7.2.1 is intended to guarantee that neither has initial padding.
However, you could argue that the "There may be padding..." sentence being in
p13 and not p14 is an intentional difference, rather than the omission of
a redundant explanation of the meaning of the "suitably converted" language.

Hmm.

Okay, here's a puzzler for you:

(char *)(size_t *)(void *)&ct;

A conversion to (void *) can't itself invoke undefined behavior. Because
of this, we can pass a function in another translation unit a (void *)
which happens to point to ct, and another (void *) which happens to point
to ct.type.

We then convert them both via casts to (size_t *). Since each is a
suitably-converted pointer, they must match. However, there is no way
for the compiler, observing that translation unit, to know that one of
them was a pointer to ct, and one to ct.type. Hmm. Unless (void *) is
sufficiently complicated as to include the pedigree of the pointer, so
that it can continue to transparently shuffle things behind the scenes.
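The puzzler can be sketched in a single file, assuming the function below stands in for code in a separate translation unit (same_after_cast is a hypothetical name). It receives only two void * values, so nothing inside it can tell whether a pointer originally came from &ct or from &ct.type.

```c
#include <stddef.h>

typedef union {
    size_t type;
    struct { size_t type; int a; } s1;   /* hypothetical member */
} cursor_t;

/* Imagine this lives in another translation unit: it sees only two
   void * values and has no record of which object each came from. */
int same_after_cast(void *a, void *b)
{
    return (size_t *)a == (size_t *)b;
}
```

Calling same_after_cast(&ct, &ct.type) returns 1 on common implementations; whether the standard requires that is what the replies below argue about.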

.... I think I'll go with: I am pretty sure that it is not possible for
the pointers to differ, and I would have no qualms reporting it as a bug
if they did, and I'd expect it to get fixed. I'm willing to grant that
I'm not totally sure that the standard is explicit, but I think it falls
under the same rule as "all struct pointers smell the same"; we know it's
intended even if it's not explicitly stated.

-s

jameskuyper · Oct 9, 2009

Seebs said:
I think this was intended to be so obvious that no one bothered to specify it.
See, e.g., the language in 6.7.2.1, p13: "There may be unnamed padding within a
structure object, but not at its beginning."

Hmm. Okay, let's mess with this. For this NOT to be true, what has to be the
case? It must be that (char *)(size_t *)&ct != (char *)&ct. Which, in turn,
means that there must be some actual fiddling occurring when &ct is cast to
(size_t *). We know that (char *)&ct points to the lowest byte of ct (6.3.2.3,
paragraph 7). We know that (size_t *)&ct points to ct.type (6.7.2.1, p.14).
We know that (char *)(size_t *)&ct points to ct.type.

For (char *)&ct not to be the same as (char *)&ct.type, then, there must be
some magic which occurs when &ct is converted to (size_t *), which corresponds
to an internal allocation which places ct.type other than at the beginning
of the union.

I'd actually feel comfortable asserting that the similarity between p13 and
p14 in 6.7.2.1 is intended to guarantee that neither has initial padding.
However, you could argue that the "There may be padding..." sentence being in
p13 and not p14 is an intentional difference, rather than the omission of
a redundant explanation of the meaning of the "suitably converted" language.

It is precisely my assertion that "There may be unnamed padding within
a structure object, but not at its beginning." cannot be derived from
the previous sentence of the paragraph, and is therefore not
redundant, though I'm quite sure it was intended to be.

....

Hmm.

Okay, here's a puzzler for you:

(char *)(size_t *)(void *)&ct;

A conversion to (void *) can't itself invoke undefined behavior. ...

True. But the behavior, while not unspecified, is underspecified. The
only thing we know for certain about the value resulting from that
conversion is that if it is converted back to cursor_t*, the result of
that conversion would compare equal to &ct. The standard says nothing
about where (void*)&ct points, and it says nothing about what happens
when (void*)&ct is converted to any type other than (cursor_t*).
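To make the distinction concrete, a sketch with a hypothetical two-member cursor_t: round_trip relies only on the void * round-trip guarantee (6.3.2.3p1), while via_void performs the conversion whose target is in dispute. Any check of via_void's result reflects one implementation's behavior, not a stated requirement.

```c
#include <stddef.h>

typedef union {
    size_t type;
    double d;                    /* hypothetical second member */
} cursor_t;

/* Guaranteed: converting to void * and back yields a pointer that
   compares equal to the original. */
cursor_t *round_trip(cursor_t *p)
{
    void *v = p;
    return (cursor_t *)v;
}

/* Disputed: where (size_t *)(void *)p points. */
size_t *via_void(cursor_t *p)
{
    return (size_t *)(void *)p;
}
```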

... Because
of this, we can pass a function in another translation unit a (void *)
which happens to point to ct, and another (void *) which happens to point
to ct.type.

We then convert them both via casts to (size_t *). Since each is a
suitably-converted pointer, they must match.

It's precisely at that point that I must disagree. The standard does
not say where (size_t*)(void*)&ct points.

Seebs · Oct 10, 2009

It is precisely my assertion that "There may be unnamed padding within
a structure object, but not at its beginning." cannot be derived from
the previous sentence of the paragraph, and is therefore not
redundant, though I'm quite sure it was intended to be.

I think it can, but I'm not totally sure. The key is that I don't think
you can get the described behavior (suitably converted pointers point to
first member) without there being no padding.

True. But the behavior, while not unspecified, is underspecified. The
only thing we know for certain about the value resulting from that
conversion is that if it is converted back to cursor_t*, the result of
that conversion would compare equal to &ct. The standard says nothing
about where (void*)&ct points, and it says nothing about what happens
when (void*)&ct is converted to any type other than (cursor_t*).

Hmmmm.

Okay, change that to "(char *)&ct", then. That's guaranteed and is
specified explicitly -- it points to the first character of ct.

It's precisely at that point that I must disagree. The standard does
not say where (size_t*)(void*)&ct points.

Sure it does. It's a suitably converted pointer to ct, and points to
ct.type, because it's a suitably converted pointer to the union.

As long as there's not an alignment problem, a series of conversions
is well-defined and intermediate conversions don't matter.

So (char *)&ct has to point to the first byte of ct, (char *)&ct.type
has to point to the first byte of ct.type, and (size_t *) of either
has to point to ct.type. Chains of non-undefined conversions have
to work, so far as I can tell. At least, I don't see any exceptions.

-s

James Kuyper · Oct 10, 2009

Seebs said:
I think it can, but I'm not totally sure. The key is that I don't think
you can get the described behavior (suitably converted pointers point to
first member) without there being no padding.

If there were padding, the compiler would know what the offset was at
any point where there was a conversion involving the union type. As long
as the type of the union member is not a character type, a conforming
implementation could add that offset when converting from the union type
to the member's type, and subtract it when converting in the opposite
direction. As far as I can see, such an implementation would not violate
any requirement of the standard.

An exception must be made if the member has a character type, because
conversion to a character type is guaranteed to return a pointer to the
first byte of the object. However, there can be padding before any
member of the union that does not have a character type, so long as the
union is big enough to allow that to happen.

Hmmmm.

Actually, that was a bad argument on my part. Section 6.2.6.1p4 says
"Values stored in non-bit-field objects of any other object type consist
of n × CHAR_BIT bits, where n is the size of an object of that type, in
bytes. The value may be copied into an object of type unsigned char [n]
(e.g., by memcpy);"

Given the interface of memcpy(), that only makes sense if converting a
void* to a char* is guaranteed to give the same result as a direct
conversion to char*. The standard contains no such guarantee explicitly,
but I think that 6.2.6.1p4 implicitly guarantees it.

The key to that conclusion is the fact that memcpy() is not described as
having magical properties that allow it to copy the bytes of an object,
it's merely given as an example of one way to do it; which means that
my_memcpy(), a function with precisely the same interface and the
obvious pure-C implementation of the required semantics for memcpy(),
must serve the same purpose. Therefore, any guarantees that you need to
make my_memcpy() serve that purpose, that aren't already provided
elsewhere in the standard, are arguably implicit in 6.2.6.1p4. In
particular, converting a void* to a char* has to give you the first byte
of the object. This implies that in some sense (void*)&ct must point at
the location in memory of ct, and not be some arbitrary location.
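A minimal sketch of the my_memcpy described here, assuming "obvious pure-C implementation" means a byte-at-a-time copy through unsigned char pointers (my_memcpy is the hypothetical name from the post, not a library function). The argument is that this only does its job if converting each void * parameter to unsigned char * yields a pointer to the first byte of the object passed in.

```c
#include <stddef.h>

/* Hypothetical my_memcpy: memcpy's interface, plain byte copy. It relies
   on void * -> unsigned char * giving the first byte of each object. */
void *my_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}
```

Copying a size_t into an unsigned char array and back with this function should reproduce the original value, as 6.2.6.1p4 describes for memcpy.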

I should have concentrated my attention, not on the (void*)&ct
conversion, but on the (size_t*) conversion. See below for more details.

....

Sure it does. It's a suitably converted pointer to ct, and points to
ct.type, because it's a suitably converted pointer to the union.

The standard unfortunately fails to define what "suitably converted"
means. If (size_t*)&ct were not suitable, that clause would be too obscure
to be meaningful, but I don't think it's clear that any conversion that
can't be derived from that one qualifies as suitable. If (size_t*)&ct is
suitable, you can easily derive that (size_t*)(cursor_t*)(void*)&ct must
be suitable, too. However, it's not clear to me that (size_t*)(void*)&ct is.

As long as there's not an alignment problem, a series of conversions
is well-defined and intermediate conversions don't matter.

That's a common assumption - but is it supported by requirements stated
in the standard? It's certainly not the case for arithmetic conversions:
(float)(int)3.5 is not the same as (float)3.5. Does the standard say
anything explicitly about pointer conversions, that it doesn't say about
arithmetic conversions, that allows you to drop intermediate steps in a
string of pointer conversions? I think it says far less about pointer
conversions than arithmetic ones, and that's precisely the problem.
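The arithmetic counterexample is easy to check; via_int and direct are hypothetical helper names. Converting 3.5 to int truncates toward zero, so the intermediate step changes the final float value.

```c
/* (float)(int)d truncates toward zero before converting to float. */
float via_int(double d)
{
    return (float)(int)d;
}

/* (float)d converts directly, keeping the fractional part. */
float direct(double d)
{
    return (float)d;
}
```

Here via_int(3.5) yields 3.0f and direct(3.5) yields 3.5f.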

So (char *)&ct has to point to the first byte of ct, (char *)&ct.type
has to point to the first byte of ct.type, and (size_t *) of either
has to point to ct.type. Chains of non-undefined conversions have
to work, so far as I can tell. At least, I don't see any exceptions.

I don't know of any real-world exceptions, because every real world
implementation I'm aware of obeys certain common assumptions for which I
can find no support in the standard. That doesn't mean that the standard
actually requires it to work.

Seebs · Oct 10, 2009

If there were padding, the compiler would know what the offset was at
any point where there was a conversion involving the union type. As long
as the type of the union member is not a character type, a conforming
implementation could add that offset when converting from the union type
to the member's type, and subtract it when converting in the opposite
direction. As far as I can see, such an implementation would not violate
any requirement of the standard.

That would depend on what "suitably" means.

I am pretty sure that it just means "to the right type, without any
undefined behavior in the meantime".

An exception must be made if the member has a character type, because
conversion to a character type is guaranteed to return a pointer to the
first byte of the object. However, there can be padding before any
member of the union that does not have a character type, so long as the
union is big enough to allow that to happen.

Hmm.

(char *)&ct has to point to the first byte of ct.
(char *)&ct.type has to point to the first byte of ct.type.

(size_t *)(char *)&ct has to point, I believe, to the first byte of ct,
as long as (size_t) does not have stricter alignment requirements than
ct.

(size_t *)(char *)&ct.type has to point, I believe, to the first byte
of ct.type.

But there doesn't seem to be any sane way, given the intermediate
(char *), to tell which of the two you have -- so if the (size_t *)
conversion has to be different, there's something implausible
going on.

Given the interface of memcpy(), that only makes sense if converting a
void* to a char* is guaranteed to give the same result as a direct
conversion to char*. The standard contains no such guarantee explicitly,
but I think that 6.2.6.1p4 implicitly guarantees it.

It's certainly intended, so far as I know.

6.2.5p27 says that (char *) and (void *) have the same representation and
alignment requirements. Hmm. 6.5.4, paragraph 4, "A cast that specifies no
conversion has no effect on the type or value of an expression." I would
argue that if two types have the same representation, a cast between them
specifies no conversion.

.... oh, hey.

I think I'm gonna argue this based on 6.7.2.1p14.

A pointer to a union object, suitably converted, points to *each*
of its members (or if a member is a bit-field, then to the unit in
which it resides), and vice versa.

Emphasis mine.

That same pointer points to each of the members.

Consider, then:

union ct_type {
unsigned char u;
size_t s;
} ct;

Clearly, (unsigned char *) &ct == &ct.u.

(unsigned char *) &ct.s... what about it? Hmm. It clearly, suitably
converted, points to ct. So...

(unsigned char *)(union ct_type *)(unsigned char *)&ct.s == &ct.u

I don't think this allows (unsigned char *) &ct.s to be different
from (unsigned char *) &ct.u.

And since we know that (unsigned char *) &ct has to point to the first
character of ct, and has to compare equal to &ct.u as well, I think that
guarantees that there is no initial padding.
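The three comparisons in this argument can be written as functions of &ct, using the union ct_type from the post; the function names are hypothetical. As with any such check, running them shows what one compiler does rather than what the standard requires.

```c
#include <stddef.h>

union ct_type {
    unsigned char u;
    size_t s;
};

/* (unsigned char *)&ct == &ct.u */
int char_view_is_u(union ct_type *ct)
{
    return (unsigned char *)ct == &ct->u;
}

/* (unsigned char *)&ct.s == &ct.u */
int s_bytes_are_u(union ct_type *ct)
{
    return (unsigned char *)&ct->s == &ct->u;
}

/* Round trip through the union pointer type, back to unsigned char *. */
int round_trip_is_u(union ct_type *ct)
{
    return (unsigned char *)(union ct_type *)(unsigned char *)&ct->s == &ct->u;
}
```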

The standard unfortunately fails to define what "suitably converted"
means. If (size_t*)&ct were not suitable, that clause would be too obscure
to be meaningful, but I don't think it's clear that any conversion that
can't be derived from that one qualifies as suitable. If (size_t*)&ct is
suitable, you can easily derive that (size_t*)(cursor_t*)(void*)&ct must
be suitable, too. However, it's not clear to me that (size_t*)(void*)&ct is.

Hmm. I think malloc'd memory doesn't work if you can't count on that -- you
need to be able to safely copy objects into typeless memory and copy them
out later. Thus, you have to be able to have an (unsigned char *) which
points to a block of data in which you have stashed an arbitrary object,
and no matter what you stashed there, if you cast to that type, you get
the object. So the (unsigned char *) pointer clearly has those properties,
and (void *) has the same representation.
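A sketch of the malloc case, assuming a hypothetical struct point and helper stash_and_read: the object is copied into untyped storage through byte pointers and read back through a pointer of its own type. malloc returns storage suitably aligned for any object type, so the cast back is safe with respect to alignment.

```c
#include <stdlib.h>
#include <string.h>

struct point { int x, y; };   /* hypothetical object type */

/* Copies *p into malloc'd storage, then reads it back through a pointer
   of the original type. Returns 0 on allocation failure, 1 otherwise. */
int stash_and_read(const struct point *p, struct point *out)
{
    unsigned char *raw = malloc(sizeof *p);
    if (raw == NULL)
        return 0;
    memcpy(raw, p, sizeof *p);
    *out = *(struct point *)(void *)raw;
    free(raw);
    return 1;
}
```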

That's a common assumption - but is it supported by requirements stated
in the standard? It's certainly not the case for arithmetic conversions:
(float)(int)3.5 is not the same as (float)3.5. Does the standard say
anything explicitly about pointer conversions, that it doesn't say about
arithmetic conversions, that allows you to drop intermediate steps in a
string of pointer conversions? I think it says far less about pointer
conversions than arithmetic ones, and that's precisely the problem.

True, it does say less about them, but I think the intent is that pointer
conversions are simply not as complicated.

-s

Barry Schwarz · Oct 10, 2009

If there were padding, the compiler would know what the offset was at
any point where there was a conversion involving the union type. As long
as the type of the union member is not a character type, a conforming
implementation could add that offset when converting from the union type
to the member's type, and subtract it when converting in the opposite
direction. As far as I can see, such an implementation would not violate
any requirement of the standard.

An exception must be made if the member has a character type, because
conversion to a character type is guaranteed to return a pointer to the
first byte of the object. However, there can be padding before any
member of the union that does not have a character type, so long as the
union is big enough to allow that to happen.

I don't think so. 6.5.8p5 says "All pointers to members of the same
union object compare equal." Since pointers always point to the
beginning of the designated object and padding is not part of the
member, it appears that everything must be left aligned at the
beginning of the union.

Seebs · Oct 10, 2009

I don't think so. 6.5.8p5 says "All pointers to members of the same
union object compare equal." Since pointers always point to the
beginning of the designated object and padding is not part of the
member, it appears that everything must be left aligned at the
beginning of the union.

Ah-hah! There you go. I shoulda thought to look at the equality check.

Actually, I found another piece of evidence: "The value of at most one of the
members can be stored in a union object at any time."

Assume, for the sake of argument, that (void *)(size_t *)&ct != (void *)&ct.

Now, put an unsigned char member in the union, store to the size_t, and
then memcpy a single byte over the unsigned char. It is obvious that both
values still exist, because they're non-overlapping, and that contradicts
the standard's description of how unions work.
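The thought experiment can be run as a sketch, assuming the two-member union ct_type from the earlier example (first_byte_overlaps is a hypothetical name). If the unsigned char member and the first byte of the size_t member share storage, copying one byte over u changes the first byte of s's representation.

```c
#include <stddef.h>
#include <string.h>

union ct_type {
    unsigned char u;
    size_t s;
};

/* Stores v in s, then copies byte b over u. Returns 1 if s's first byte
   changed to b, i.e. the two members share the union's first byte. */
int first_byte_overlaps(size_t v, unsigned char b)
{
    union ct_type ct;
    unsigned char before, after;

    ct.s = v;
    memcpy(&before, &ct.s, 1);
    memcpy(&ct.u, &b, 1);
    memcpy(&after, &ct.s, 1);
    return before != b && after == b;
}
```

With v == 0 and b == 0xAB, a result of 1 means the store to u overwrote part of s, which is what the "at most one member" rule leads one to expect.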

So clearly, the assumption was wrong.

-s

James Kuyper · Oct 10, 2009

Seebs said:
That would depend on what "suitably" means.

I am pretty sure that it just means "to the right type, without any
undefined behavior in the meantime".

The standard does not define the behavior of (T*)(void*)&x, unless T is
either a character type, or the actual type of 'x'. Therefore, I don't
think conversions of this form qualify as 'suitable' (with those two
exceptions).

Hmm.

(char *)&ct has to point to the first byte of ct.
(char *)&ct.type has to point to the first byte of ct.type.

(size_t *)(char *)&ct has to point, I believe, to the first byte of ct,
as long as (size_t) does not have stricter alignment requirements than
ct.

It's not clear to me that the standard actually requires that; but I'll
reserve judgment for now.

But there doesn't seem to be any sane way, given the intermediate
(char *), to tell which of the two you have -- so if the (size_t *)
conversion has to be different, there's something implausible
going on.

Anything other than the obvious result that we all expect would be
extremely implausible - but my point is, would it be non-conforming?

It's certainly intended, so far as I know.

6.2.5p27 says that (char *) and (void *) have the same representation and
alignment requirements. Hmm. 6.5.4, paragraph 4, "A cast that specifies no
conversion has no effect on the type or value of an expression." I would
argue that if two types have the same representation, a cast between them
specifies no conversion.
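Two consequences of 6.2.5p27 can be checked without entering the disputed territory: `void *` and `char *` have the same size, and a `char *` converted to `void *` and back compares equal to the original (6.3.2.3p1). This is a sketch with a hypothetical function name. It deliberately does not test whether `(void *)&x` and `(char *)&x` designate the same byte, since that is the very point under debate.

```c
#include <stddef.h>  /* size_t */

/* Returns 1 if void * and char * have the same size and a
   char * -> void * -> char * round trip preserves the pointer value. */
int void_char_round_trip(void)
{
    size_t x = 42;
    char *cp = (char *)&x;  /* guaranteed: the first byte of x */
    void *vp = cp;          /* char * -> void *, no cast needed */
    char *back = vp;        /* void * -> char * */

    return sizeof(void *) == sizeof(char *) && back == cp;
}
```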

That seems reasonable, but I don't think we can actually apply 6.5.4p4
to anything other than conversion to a type compatible with the
operand's type.

... oh, hey.

I think I'm gonna argue this based on 6.7.2.1p14.

A pointer to a union object, suitably converted, points to *each*
of its members (or if a member is a bit-field, then to the unit in
which it resides), and vice versa.

Emphasis mine.

That same pointer points to each of the members.
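A sketch of both directions of 6.7.2.1p14, assuming the `ct_type` union with members `u` and `s` and a hypothetical function name. Here "suitably converted" is read as a plain cast to the target pointer type. That reading is an assumption, which is exactly what the following replies question.

```c
#include <stddef.h>  /* size_t */

union ct_type {
    unsigned char u;
    size_t s;
};

/* Returns 1 if casting the union pointer yields each member's address and
   casting a member's address yields the union pointer again. */
int union_member_round_trip(void)
{
    union ct_type ct;
    union ct_type *up = &ct;

    size_t *sp = (size_t *)up;                     /* union -> member s */
    unsigned char *cp = (unsigned char *)up;       /* union -> member u */
    union ct_type *back = (union ct_type *)&ct.s;  /* member -> union */

    return sp == &ct.s && cp == &ct.u && back == up;
}
```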

I think that "suitably converted" necessarily means something different
for members of different types, which opens the possibility that, after
suitable conversion, the same pointer may point to different locations
in memory.

Consider, then:

    union ct_type {
        unsigned char u;
        size_t s;
    } ct;

Clearly, (unsigned char *) &ct == &ct.u.

(unsigned char *) &ct.s... what about it? Hmm. It clearly, suitably
converted, points to ct. So...
(unsigned char *)(union ct_type *)(unsigned char *)&ct.s == &ct.u
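Both claims above can be written out as a compilable check. This assumes the union just declared and a hypothetical function name. Forming the intermediate `union ct_type *` is only safe if the first byte of `s` is suitably aligned for the union, which holds on the usual layout where `s` starts at the beginning of `ct`.

```c
#include <stddef.h>  /* size_t */

union ct_type {
    unsigned char u;
    size_t s;
};

int char_chain_reaches_u(void)
{
    union ct_type ct;

    /* Conversion to a character pointer yields the first byte of the
       union, and u occupies that byte. */
    int direct = (unsigned char *)&ct == &ct.u;

    /* The longer chain: first byte of s, viewed as the union, then
       viewed as bytes again. */
    int chain = (unsigned char *)(union ct_type *)(unsigned char *)&ct.s
                == &ct.u;

    return direct && chain;
}
```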

The requirement that conversion to a pointer to a char type points at
the first byte of the object means that my arguments do not apply to
members which have a character type; I'll even concede that this
argument might extend that exemption to other members of the same union,
though it depends upon the poorly defined concept "suitably
converted". Your argument doesn't work, however, if none of the members
has character type.

Hmm. I think malloc'd memory doesn't work if you can't count on that -- you
need to be able to safely copy objects into typeless memory and copy them
out later. Thus, you have to be able to have an (unsigned char *) which
points to a block of data in which you have stashed an arbitrary object,
and no matter what you stashed there, if you cast to that type, you get
the object. So the (unsigned char *) pointer clearly has those properties,
and (void *) has the same representation.
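The stash-and-retrieve pattern described here looks roughly like the following sketch. `struct point` and `stash_and_retrieve` are hypothetical stand-ins for "an arbitrary object". The object is copied into `malloc`'d storage as bytes and read back through a cast pointer. That read is valid because `memcpy` gives the storage the source's effective type (6.5p6) and `malloc` returns suitably aligned memory. Finally, the bytes are copied out again.

```c
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* memcpy */

/* Hypothetical payload standing in for "an arbitrary object". */
struct point {
    double x;
    double y;
};

/* Returns 1 if the object survives the trip through typeless storage. */
int stash_and_retrieve(void)
{
    struct point in = { 1.5, -2.0 };
    struct point out;
    const struct point *view;
    unsigned char *buf = malloc(sizeof in);
    int ok;

    if (buf == NULL)
        return 0;

    memcpy(buf, &in, sizeof in);        /* stash the object's bytes */
    view = (const struct point *)buf;   /* cast the byte pointer back */
    ok = view->x == in.x && view->y == in.y;

    memcpy(&out, buf, sizeof out);      /* copy the bytes back out */
    ok = ok && out.x == in.x && out.y == in.y;

    free(buf);
    return ok;
}
```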

I don't see a problem for malloc(). I believe that it's permissible for
(void*)&ct to point to one location in memory, and (size_t*)&ct to
point into a different location, which is the actual location where
ct.type is stored. (size_t*)(void*)&ct, in this case, would point to a
size_t-sized piece of memory at the start of ct, but not at the actual
location of ct.type. I don't see this as causing a problem for malloc.

James Kuyper · Oct 10, 2009

Barry Schwarz wrote:
....

I don't think so. 6.5.8p5 says "All pointers to members of the same
union object compare equal."

OK, that's what I was looking for; I didn't find it because I was
expecting the relevant clause to be part of the section describing
unions, or types (6.7.2.1), not in the section describing the relational
operators (had I had thoughts in that direction, I would have expected
it to be in the equality operators section).

Unlike the similar guarantee for members and the union they are part of,
this rule can apply to pointers to two completely unrelated types, which
makes it infeasible to assume that the type conversion needed to allow
them to be comparable might include a shift in location.

Point conceded.

Seebs · Oct 10, 2009

I think that "suitably converted" necessarily means something different
for members of different types, which opens the possibility that, after
suitable conversion, the same pointer may point to different locations
in memory.

I don't think so, because the possibility of converting them all to
(unsigned char *) and then converting them back means that, if there is a
conversion from (unsigned char *) to (size_t *), it has to be the same
conversion in ALL cases.

And that means that if you do it on the address of a size_t, it has to do the
same thing that it would do on the address of a union which contains a size_t.

So the first byte of the size_t has to be the first byte of the union.
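The point that one conversion cannot tell where its argument came from can be made concrete with a single function (hypothetical name `bytes_to_size_ptr`). The function is applied to the byte address of a lone `size_t` and to the byte address of a union containing one. This is a sketch of the expected behavior on an implementation that lays unions out in the obvious way. It illustrates the argument and does not prove what the standard requires.

```c
#include <stddef.h>  /* size_t */

union ct_type {
    unsigned char u;
    size_t s;
};

/* The same conversion, whatever object p happens to point into. */
size_t *bytes_to_size_ptr(unsigned char *p)
{
    return (size_t *)p;
}

int same_conversion_everywhere(void)
{
    size_t lone = 0;
    union ct_type ct;

    return bytes_to_size_ptr((unsigned char *)&lone) == &lone
        && bytes_to_size_ptr((unsigned char *)&ct) == &ct.s;
}
```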

But someone else got us with the relational operator rules, which don't
impose any requirement of "suitably converted".

The requirement that conversion to a pointer to a char type points at
the first byte of the object means that my arguments do not apply to
members which have a character type; I'll even concede that this
argument might extend that exemption to other members of the same union,
though it depends upon the poorly defined concept "suitably
converted". Your argument doesn't work, however, if none of the members
has character type.

Yeah, I see the issue with 'suitably converted'. I am pretty sure that
that language exists only to handle the "but how can it point to a size_t
if it's not a size_t *" question.

I don't see a problem for malloc(). I believe that it's permissible for
(void*)&ct to point to one location in memory, and (size_t*)&ct to
point into a different location, which is the actual location where
ct.type is stored. (size_t*)(void*)&ct, in this case, would point to a
size_t-sized piece of memory at the start of ct, but not at the actual
location of ct.type. I don't see this as causing a problem for malloc.

Consider:

    unsigned char *p = malloc(sizeof(union ct_type));
    union ct_type *ctp = (union ct_type *)p;

    (unsigned char *)ctp == p
    (size_t *)p == (size_t *)ctp

At this point, p is a pointer without any magic attachments saying that
it points to a union, so converting it to (size_t *) has to yield the
beginning of the malloc'd space (which is suitably aligned). And that
has to be the same as the address of ctp->s.

Basically, the purpose of using malloc here is to demonstrate that there
can't be anything magical about the pointer. We could argue that the address
of an object of type (union ct_type) had magical properties controlling
how it is converted when converted to (size_t *), but that can't be true
for the address that came back from malloc.
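The malloc example, made compilable as a sketch. The function name is hypothetical. The `union ct_type *` initialization uses an explicit cast, since only `void *` converts implicitly to other object pointer types. The member is called `s` to match the union declared earlier in the thread.

```c
#include <stddef.h>  /* size_t */
#include <stdlib.h>  /* malloc, free */

union ct_type {
    unsigned char u;
    size_t s;
};

/* Returns 1 if the checks from the post hold, 0 otherwise (including when
   malloc fails). */
int malloc_union_check(void)
{
    unsigned char *p = malloc(sizeof(union ct_type));
    union ct_type *ctp;
    int ok;

    if (p == NULL)
        return 0;
    ctp = (union ct_type *)p;  /* explicit cast: p is not a void * */

    ok = (unsigned char *)ctp == p
      && (size_t *)p == (size_t *)ctp
      && (size_t *)p == &ctp->s;  /* start of the block is where s lives */

    free(p);
    return ok;
}
```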

-s

Seebs · Oct 10, 2009

Barry Schwarz wrote:
...

OK, that's what I was looking for; I didn't find it because I was
expecting the relevant clause to be part of the section describing
unions, or types (6.7.2.1), not in the section describing the relational
operators (had I had thoughts in that direction, I would have expected
it to be in the equality operators section).

Me too!

Point conceded.

I am still semi-convinced that the case can be made without this, but I
will certainly grant that you are right -- it is not nearly as unambiguous
as it is obvious.

-s

Tim Rentsch · Oct 12, 2009

jameskuyper said:
Tim said:

[snip]

[snip]

I did have some explanations and other responses to offer in
reply to this posting (and also a related nephew posting).
However, since later in the thread James took a different
position on the question, it seems best not to put these
up and just leave things here. Anyone who is still
interested in those comments is welcome to email me.
