C11, const, and aliasing

ais523

(This post is about C11.)

I'm trying to maintain some existing code; in particular, there are some
struct fields that I suspect are being written directly rather than via
functions designed for the purpose. I'm trying to handle this with the
following idiom:

struct some_struct {
    union {
        const int field;
        int field_writable;
    };
};

Here, the struct used to contain just "int field;"; I've changed it so
that reads from the struct still work, but writes to "field" will cause
a compiler diagnostic (with writes to "field_writable" working fine).

I'd like to make sure that this works correctly; some people I've been
talking to (as well as me, to some extent) have been concerned that this
might potentially cause undefined behaviour in situations like this:

int function(struct some_struct *s)
{
    int x = s->field;
    s->field_writable--;
    return x - s->field;
}

Based on the draft of the C11 standard I have access to (N1570.pdf; is
that the right one to be using?), this seems to not run afoul of 6.5p7
(which describes aliasing restrictions):

"An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:

[...]

- a qualified version of a type compatible with the effective type of
the object,

[...]

- an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union)

[...]"


However, I'm having problems determining exactly what the object that is
accessed is. Is it the struct as a whole? The individual union? One of
the fields of the union, and if so, which?

Other parts of the standard which might disallow this sort of thing are
6.5.16.1 (which governs the rules for what can be assigned to by what,
but does not seem by itself to disallow assignment even to a const field
of a union); and 6.7.3p6, which defines what "const" means, and
disallows assigning to a const object through a non-const type, but
again leaves it unclear precisely which object it is that's being
modified.

I feel like I'm missing something, especially because nothing I can
see in the rules I've mentioned disallows the following code:

int main(void)
{
    const int c;
    c = 0;
    return c;
}

which is clearly not intended to be strictly conforming. Thus, I feel
I've missed a rule or two about what I can do with const types.

So, in short, what I'm wondering about is, does this union idiom for
making a partially const struct field do what I want? And why/why not?
 
James Kuyper

(This post is about C11.)

I'm trying to maintain some existing code; in particular, there are some
struct fields that I suspect are being written directly rather than via
functions designed for the purpose. I'm trying to handle this with the
following idiom:

struct some_struct {
    union {
        const int field;
        int field_writable;
    };
};

Here, the struct used to contain just "int field;"; I've changed it so
that reads from the struct still work, but writes to "field" will cause
a compiler diagnostic (with writes to "field_writable" working fine).

I'd like to make sure that this works correctly; some people I've been
talking to (as well as me, to some extent) have been concerned that this
might potentially cause undefined behaviour in situations like this:

int function(struct some_struct *s)
{
    int x = s->field;
    s->field_writable--;
    return x - s->field;
}

Based on the draft of the C11 standard I have access to (N1570.pdf; is
that the right one to be using?), ...

It's the most current draft, and is almost identical to the actual
standard.
... this seems to not run afoul of 6.5p7
(which describes aliasing restrictions):

"An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:

[...]

- a qualified version of a type compatible with the effective type of
the object,

[...]

- an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union)

[...]"


However, I'm having problems determining exactly what the object that is
accessed is. Is it the struct as a whole? The individual union? One of
the fields of the union, and if so, which?

Other parts of the standard which might disallow this sort of thing are
6.5.16.1 (which governs the rules for what can be assigned to by what,
but does not seem by itself to disallow assignment even to a const field
of a union); ...

Correct; it's 6.5.16p2 ("Constraints") which covers that:
"An assignment operator shall have a modifiable lvalue as its left
operand."
"A modifiable lvalue is an lvalue that ... does not have a
const-qualified type ..." (6.3.2.1p1)

... and 6.7.3p6, which defines what "const" means, and
disallows assigning to a const object through a non-const type, but
again leaves it unclear precisely which object it is that's being
modified.

It doesn't disallow it, it just says that the behavior is undefined.
That's not relevant, because your code contains no examples of such a
thing. If it did, the relevant object would be the const-qualified
object designated by the lvalue with non-const type.
I feel like I'm missing something, especially because nothing I can
see in the rules I've mentioned disallows the following code:

int main(void)
{
    const int c;
    c = 0;
    return c;
}

The standard never disallows code; it only specifies what is and is not
guaranteed about the result when the code is translated by a conforming
implementation of C.

'c' has a const-qualified type, therefore its appearance as the left
operand of an assignment operator is a constraint violation. The
standard guarantees one and only one thing about what will happen when
such code is translated by a conforming implementation of C: at least
one diagnostic message will be generated (5.1.1.3p1). The existence of
that constraint violation allows, but does not require, that the code be
rejected. If the implementation chooses to go ahead and complete
translation despite that issue, and if you decide to go ahead with
executing it despite the diagnostic, the behavior of the code is undefined.

s->field is const-qualified, and therefore not modifiable, so attempting
to assign to it would be a constraint violation. s->field_writable is
not const-qualified, so writing to it would therefore not be a
constraint violation.
 
ais523

James said:
struct some_struct {
    union {
        const int field;
        int field_writable;
    };
} [snip]
int function(struct some_struct *s)
{
    int x = s->field;
    s->field_writable--;
    return x - s->field;
}
[snip]
s->field is const-qualified, and therefore not modifiable, so attempting
to assign to it would be a constraint violation. s->field_writable is
not const-qualified, so writing to it would therefore not be a
constraint violation.

I guess what's confusing me is whether I can write to both of them
without defeating the aliasing rules.

Something I missed at the time of the previous post: 6.2.5p20 implies
that the separate fields of a union are all different objects, which is
one of the main things I was confused about. I guess I can boil my
question down to two specific questions about the standard, then:

1) Is it a constraint violation, under 6.5p7, to access a "const int"
object that is a member of a union via an lvalue expression that refers
to an "int" object that is a member of the same union (which has type
"int")?

My reading seems to indicate "yes"; the effective type of the object is
its declared type of "const int", which is not compatible with "int"
(6.7.3p10); and "int" is not a qualified version of "const int", either.

2) Does assigning to a member of a union access the objects that are the
other members of the union?

3.1p1 appears to imply yes; they share storage, so modifying the value
of one will also modify the values of the others, which constitutes an
access.

I'm definitely still missing something, though. My current reading
implies that the following program contains a constraint violation of
6.5p7 (6.2.7p1 implies that "int" and "float" are incompatible, and
3.1p1 that assigning to a union field accesses all the objects that
are members of that union, thus 6.5p7 makes it a constraint violation
to access the object x.f via an object x.i whose declared type is
"int"):

union { int i; float f; } x;

int main(void) { x.i = 1; }

Obviously, though, this program would be expected to be strictly
conforming, and compilers do in fact give no diagnostics on it (implying
that there is no constraint violation after all).

Clearly, one of my assumptions is incorrect; either 6.5p7 allows
accessing a union member object via union members with incompatible
declared types, or else assigning to a union member object does not
access the other members of that union. What do I have wrong here?
 
Ben Bacarisse

ais523 said:
James said:
struct some_struct {
    union {
        const int field;
        int field_writable;
    };
} [snip]
int function(struct some_struct *s)
{
    int x = s->field;
    s->field_writable--;
    return x - s->field;
}
[snip]
s->field is const-qualified, and therefore not modifiable, so attempting
to assign to it would be a constraint violation. s->field_writable is
not const-qualified, so writing to it would therefore not be a
constraint violation.

I guess what's confusing me is whether I can write to both of them
without defeating the aliasing rules.

Something I missed at the time of the previous post: 6.2.5p20 implies
that the separate fields of a union are all different objects, which is
one of the main things I was confused about. I guess I can boil my
question down to two specific questions about the standard, then:

Yes, that's easy to miss.
1) Is it a constraint violation, under 6.5p7, to access a "const int"
object that is a member of a union via an lvalue expression that refers
to an "int" object that is a member of the same union (which has type
"int")?

6.5p7 does not specify any constraints, so breaking the rules is not a
constraint violation (which requires a diagnostic), it's "just"
undefined behaviour.

That technicality aside, it would break the rules, were you to do it,
but there is no indication that you do. You access each of the two
objects in the union with an lvalue expression of the object's declared
type. To break the rule you have to do something like this:

const void *p = &s->field;  /* plain "void *p" would discard const */
*(int *)p = 42;             /* int lvalue used on the const int object */

My reading seems to indicate "yes"; the effective type of the object is
its declared type of "const int", which is not compatible with "int"
(6.7.3p10); and "int" is not a qualified version of "const int",
either.

But each of your expressions above accesses an object via an lvalue
expression of the object's declared type.
2) Does assigning to a member of a union access the objects that are the
other members of the union?

3.1p1 appears to imply yes; they share storage, so modifying the value
of one will also modify the values of the others, which constitutes an
access.

I don't think so. As you say below, this would make almost all union
modifications undefined. I'll leave others to debate the standard's
wording.
I'm definitely still missing something, though. My current reading
implies that the following program contains a constraint violation of
6.5p7 (6.2.7p1 implies that "int" and "float" are incompatible, and
3.1p1 that assigning to a union field accesses all the objects that
are members of that union, thus 6.5p7 makes it a constraint violation
to access the object x.f via an object x.i whose declared type is
"int"):

union { int i; float f; } x;

int main(void) { x.i = 1; }

Obviously, though, this program would be expected to be strictly
conforming, and compilers do in fact give no diagnostics on it (implying
that there is no constraint violation after all).

Clearly, one of my assumptions is incorrect; either 6.5p7 allows
accessing a union member object via union members with incompatible
declared types, or else assigning to a union member object does not
access the other members of that union. What do I have wrong here?

If your questions are really about the standard, I'll leave that for
others, but if you are concerned about your code, don't be. Provided
every modification is via an lvalue expression of the correct type,
accessing the other member will re-interpret the bits as being a value
of that other type. And since int and const int are defined to have the
same representation, all is well.
 
Tim Rentsch

ais523 said:
James said:
struct some_struct {
    union {
        const int field;
        int field_writable;
    };
} [snip]
int function(struct some_struct *s)
{
    int x = s->field;
    s->field_writable--;
    return x - s->field;
}
[snip]
s->field is const-qualified, and therefore not modifiable, so attempting
to assign to it would be a constraint violation. s->field_writable is
not const-qualified, so writing to it would therefore not be a
constraint violation.

I guess what's confusing me is whether I can write to both of them
without defeating the aliasing rules.

Something I missed at the time of the previous post: 6.2.5p20 implies
that the separate fields of a union are all different objects, which is
one of the main things I was confused about. I guess I can boil my
question down to two specific questions about the standard, then:

1) Is it a constraint violation, under 6.5p7, to access a "const int"
object that is a member of a union via an lvalue expression that refers
to an "int" object that is a member of the same union (which has type
"int")?

No, because it is the "int" object that is being accessed, not
the "const int" object.
My reading seems to indicate "yes"; the effective type of the object is
its declared type of "const int", which is not compatible with "int"
(6.7.3p10); and "int" is not a qualified version of "const int", either.

There are two objects. Only the "int" object is being accessed,
not the "const int" object.
2) Does assigning to a member of a union access the objects that are the
other members of the union?

No. It does update the same region of memory, but does not
access the other objects.
3.1p1 appears to imply yes; they share storage, so modifying the value
of one will also modify the values of the others, which constitutes an
access.

The underlying storage is affected, but only the object
corresponding to the selected member is accessed.
I'm definitely still missing something, though. My current reading
implies that the following program contains a constraint violation of
6.5p7 (6.2.7p1 implies that "int" and "float" are incompatible, and
3.1p1 that assigning to a union field accesses all the objects that
are members of that union, thus 6.5p7 makes it a constraint violation
to access the object x.f via an object x.i whose declared type is
"int"):

union { int i; float f; } x;

int main(void) { x.i = 1; }

Obviously, though, this program would be expected to be strictly
conforming, and compilers do in fact give no diagnostics on it (implying
that there is no constraint violation after all).

Clearly, one of my assumptions is incorrect; either 6.5p7 allows
accessing a union member object via union members with incompatible
declared types, or else assigning to a union member object does not
access the other members of that union. What do I have wrong here?

I think your confusion arises from the assumption that "an
object" is synonymous with the area of memory it occupies. In
the Standard's view of the world, these two things are different.
(To further confuse the issue, sometimes the Standard uses the
two notions almost interchangeably, but I believe that's just the
language being a bit careless, not anything more significant.)
The objects corresponding to members in a union overlap. They
may overlap exactly or only partially, but they all have at least
one byte in common (possibly excepting bitfields but let's ignore
that). However, the objects are still distinct objects - when
one member is stored into, the storing affects (some of) the
memory occupied by the objects corresponding to other members,
but those objects /are not accessed/ in the sense the Standard
uses the term "access". Storing into one member does affect
what happens when other members are read, but it does not
modify those objects -- rather, what it does is change the
storage underlying those objects, so that when those members
are accessed different values are produced.

I understand your being confused here, because the Standard
does a poor job of explaining its worldview as to how objects
and memory relate. But if you go back and re-read the various
passages again while keeping in mind the view outlined above, I
think you'll agree it fits fairly well. (Personally this area
is one I would like to see addressed in future revisions of the
Standard, but that is a topic for comp.std.c and one which I
will leave for another day.)
 
Keith Thompson

Tim Rentsch said:
I think your confusion arises from the assumption that "an
object" is synonymous with the area of memory it occupies. In
the Standard's view of the world, these two things are different.
[...]

The Standard defines an "object" as a "region of data storage in the
execution environment, the contents of which can represent values".
Sounds pretty synonymous to me.
 
Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
I think your confusion arises from the assumption that "an
object" is synonymous with the area of memory it occupies. In
the Standard's view of the world, these two things are different.
[...]

The Standard defines an "object" as a "region of data storage in the
execution environment, the contents of which can represent values".
Sounds pretty synonymous to me.

They can't be synonymous, at the very least because of unions:

union { signed s; unsigned u; } it;

One region of memory, but two objects (or three if we count the
enclosing union when it has no padding).
 
Keith Thompson

Tim Rentsch said:
Keith Thompson said:
Tim Rentsch said:
I think your confusion arises from the assumption that "an
object" is synonymous with the area of memory it occupies. In
the Standard's view of the world, these two things are different.
[...]

The Standard defines an "object" as a "region of data storage in the
execution environment, the contents of which can represent values".
Sounds pretty synonymous to me.

They can't be synonymous, at the very least because of unions:

union { signed s; unsigned u; } it;

One region of memory, but two objects (or three if we count the
enclosing union when it has no padding).

it.s, it.u, and it are all the same "region of data storage",
which is (the relevant part of) the definition of "object". Is the
definition incorrect? (I would not be shocked to find that the
standard is not entirely consistent.)
 
James Kuyper

it.s, it.u, and it are all the same "region of data storage",
which is (the relevant part of) the definition of "object". Is the
definition incorrect? (I would not be shocked to find that the
standard is not entirely consistent.)

6.2.5p20 seems inconsistent with that interpretation: "— A union type
describes an overlapping nonempty set of member objects,"

The original question involved:

struct some_struct {
    union {
        const int field;
        int field_writable;
    };
};

If the field and field_writable members of an object of some_struct type
both designate the same object, then, for purposes of applying 6.7.3p6,
does that object have const-qualified type, or not?
 
Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
Keith Thompson said:
[...]
I think your confusion arises from the assumption that "an
object" is synonymous with the area of memory it occupies. In
the Standard's view of the world, these two things are different.
[...]

The Standard defines an "object" as a "region of data storage in the
execution environment, the contents of which can represent values".
Sounds pretty synonymous to me.

They can't be synonymous, at the very least because of unions:

union { signed s; unsigned u; } it;

One region of memory, but two objects (or three if we count the
enclosing union when it has no padding).

it.s, it.u, and it are all the same "region of data storage",
which is (the relevant part of) the definition of "object". Is the
definition incorrect? (I would not be shocked to find that the
standard is not entirely consistent.)

I wouldn't say the definition is incorrect but rather incomplete.
Surely you are familiar with other parts of the Standard that
reinforce this view. For example, 6.5p6 talks about "the
declared type of an object". A region of data storage does not of
itself have a declared type. In effect what happens is an area of
memory "acquires" a declared type by virtue of being associated
with some declaration. In the union example though there are
multiple associations - one for each of it.s, it.u, and it. The
region of memory is the same, but we get different types depending
on which identifier is at the other end of the association. To
handle the ambiguity an "object" must be something like an ordered
pair of { name, memory area }, where "name" includes at least a
declared variable name but may also include further selection
information, e.g., it.s. This association scheme is necessary for
some uses of "object" in the Standard. In other uses though it
isn't - referring to memory allocated by malloc(), for example,
there still are "objects" but they have no associated name. The
idea that an "object" is just a region of data storage is in
some cases too simplistic for how the Standard uses the term.

Incidentally, the term "object" is sometimes used in still other
ways that don't fit either of the above patterns. For example,
in 6.2.4, talking about storage durations, the Standard says "If
the scope is entered recursively, a new instance of the object is
created each time." This doesn't make sense if an object is a
region of data storage: we don't make a new /instance/ of some
memory area, rather we allocate a new area. The wording here is
using "object" in the sense of a "template" for one or more
regions of memory. That doesn't fit with either the definition
given for 'object' or most uses in other parts of the Standard.
 
Keith Thompson

Tim Rentsch said:
I wouldn't say the definition is incorrect but rather incomplete.
Surely you are familiar with other parts of the Standard that
reinforce this view. For example, 6.5p6 talks about "the
declared type of an object". A region of data storage does not of
itself have a declared type. In effect what happens is an area of
memory "acquires" a declared type by virtue of being associated
with some declaration. In the union example though there are
multiple associations - one for each of it.s, it.u, and it. The
region of memory is the same, but we get different types depending
on which identifier is at the other end of the association. To
handle the ambiguity an "object" must be something like an ordered
pair of { name, memory area }, where "name" includes at least a
declared variable name but may also include further selection
information, e.g., it.s. This association scheme is necessary for
some uses of "object" in the Standard. In other uses though it
isn't - referring to memory allocated by malloc(), for example,
there still are "objects" but they have no associated name. The
idea that an "object" is just a region of data storage is in
some cases too simplistic for how the Standard uses the term.

I am, as I predicted, not shocked by the inconsistency.
Incidentally, the term "object" is sometimes used in still other
ways that don't fit either of the above patterns. For example,
in 6.2.4, talking about storage durations, the Standard says "If
the scope is entered recursively, a new instance of the object is
created each time." This doesn't make sense if an object is a
region of data storage: we don't make a new /instance/ of some
memory area, rather we allocate a new area. The wording here is
using "object" in the sense of a "template" for one or more
regions of memory. That doesn't fit with either the definition
given for 'object' or most uses in other parts of the Standard.

I'd call that an entirely new object (that happens to have the
same name created by the same definition). I think the phrase
"new instance of the object" is just a mistake.
 
Tim Rentsch

Keith Thompson said:
I am, as I predicted, not shocked by the inconsistency.


I'd call that an entirely new object (that happens to have the
same name created by the same definition). I think the phrase
"new instance of the object" is just a mistake.

So you agree with me that how the Standard uses the term "object"
is not always consistent with how it defines it? :)

btw I agree on your last point, this was a poor choice of
phrasing. At some level however this issue is exactly the one
under discussion - how the term "object" is used in the Standard
is not uniform from place to place, nor always consistent with
its definition.
 
Keith Thompson

Tim Rentsch said:
So you agree with me that how the Standard uses the term "object"
is not always consistent with how it defines it? :)
Certainly.

btw I agree on your last point, this was a poor choice of
phrasing. At some level however this issue is exactly the one
under discussion - how the term "object" is used in the Standard
is not uniform from place to place, nor always consistent with
its definition.

Yup.

Ideally there would be a single definition of "object", and the
standard's usage would be consistent with that definition. I'm not
sure whether the best way to get there would be to modify the
current definition, or to keep it and fix up the inconsistent usage.
 
Tim Rentsch

Richard said:
Tim Rentsch said:
Keith Thompson said:
[...]
I think your confusion arises from the assumption that "an
object" is synonymous with the area of memory it occupies. In
the Standard's view of the world, these two things are different.
[...]

The Standard defines an "object" as a "region of data storage in the
execution environment, the contents of which can represent values".
Sounds pretty synonymous to me.

They can't be synonymous, at the very least because of unions:

union { signed s; unsigned u; } it;

One region of memory, but two objects (or three if we count the
enclosing union when it has no padding).

it.s, it.u, and it are all the same "region of data storage",
which is (the relevant part of) the definition of "object". Is the
definition incorrect? (I would not be shocked to find that the
standard is not entirely consistent.)

I wouldn't say the definition is incorrect but rather incomplete.
Surely you are familiar with other parts of the Standard that
reinforce this view.

You are joking, right? Keith could replace Denzel in The Book of
Eli if it was a copy of the standard to be carried across an arid
wasteland.

Your somewhat affected questioning of something so obvious to
anyone who has spent more than half a day here is very reminiscent
of Keith's approach - it appears to be catching.

Apparently you misunderstood the sense of my comments. I
wasn't expressing doubt or calling anything into question.
 
