Yet another binary search tree library


Francis Moreau

Tim Rentsch said:
[...]
No, the item of 6.5p7 which I pointed out doesn't deal with:

struct object {
int a;
};

[...]

struct object an_object;
int *p = &an_object.a;

As you said, this is clearly defined, since we're accessing the
(member) object "an_object.a" with a pointer whose type is compatible.

Right, but this case isn't what I was asking about.

But this is the case for all examples you gave below, I think.
However doing this:

struct object *obj;
int a = 5;
obj = (struct object *)&a;

is defined by the C standard AFAIK (by what you call the 'effective type
rules', I believe), but the 'strict aliasing rules' defined by GCC break
these rules. [snip]

For starters this is (usually) undefined behavior anyway, because
of pointer alignments and size mismatches if nothing else.

However, suppose we have this:

int *p;
struct { int a; int b; } *x, *y;
x = malloc( sizeof *x );
y = malloc( sizeof *y );
/* presume the mallocs succeed */

y->a = y->b = 0;
*x = *y;
p = &x->a;

So 'p' is an alias to the object 'x->a'. They both have the same type.
*p = 5;
*x = *y;
/* under strict aliasing must '*p == 0' still be true? */

*x = *y;
*p = 7;
*y = *x;
/* under strict aliasing must 'y->a == 7' still be true? */

y->a = y->b = 0;
*x = *y;
*p = 9;
x = (void*) p;
*x = *y;
/* under strict aliasing must '*p == 0' still be true? */

Do you believe, even under gcc strict aliasing, that any of the
comment questions can have an answer other than "Yes"? If so,
what leads you to that conclusion?

I don't think that GCC's strict aliasing rule deals with these cases.
All of these 3 cases are about a (member) object (whose type is 'int')
that can be accessed by a pointer with the _same_ type.

However, GCC's strict aliasing rule deals with the case where an object
is accessed through a pointer which has a _different_ type. For example:

int i;
struct A { int a; } *x;

x = (struct A *)&i;
x->a = 0;

where we're accessing an object (of type 'int') through a pointer to a
structure. So we're accessing an object through a pointer which has a
_different_ (not even /almost/ the same) type. Note that this example
is perfectly legal C AFAIK.

GCC's strict aliasing rule claims that this can't happen, for the sake of
optimization, despite the C standard.

Of course, all of this is to the best of my understanding. I found, as you
said previously, the GCC man page quite badly written regarding the
strict aliasing stuff.
 

Seebs

I find it useful to read the Standard in two distinct modes. One, what I
might call the "comp.std.c mode", is a more literal and less assuming
reading (or interpretation, some might say). The other, what I might
call the "comp.lang.c mode", is less literal and allows more reading
between the lines, in an effort to discover what the committee expects
for how the words will be read. Mostly these two modes produce the same
conclusions, but there are some obvious counterexamples [...]
It's great to see these "modes" "formalized"; being aware of them will
help me read the Standard.

It's a useful tool, and there's a lot of very similar mode-changes going on
in conversations. I spend most of my time talking about C in terms of a
pure abstract machine, but if I need to write target-specific code, I
switch to talking about what actually happens on that implementation or
family of implementations, which may be quite a bit different.

The big problem is that people tend to unconsciously translate statements
from another mode into the mode they're in. The most obvious example is a
bit off-topic, but it's a very informative one. There are several
fundamentally different approaches to describing ethics, and among the most
obvious are:
1. Deontological ("duty-based") ethics; actions are defined to be
right and wrong in and of themselves, and you accept whatever
outcomes result from behaving in right manners.
2. Teleological ("outcome-based") ethics; actions are defined to be
right or wrong in terms of the desirability of their outcomes, and
you accept whatever actions result in the best outcomes.

Both of these systems are in moderately widespread use, and they tend to
come up with similar answers upwards of 90% of the time. However, when you
present people with a carefully-crafted "moral dilemma" problem, they tend
to resolve to one of these or the other (or one of the other systems...)

If you have a debate between two people, each of whom presupposes one of
these modes of ethical debate, they usually end up massively failing to
communicate. If the outcome-based ethicist says that, in a particular
situation, the best available course of action is to kill a given person
(usually Hitler), the duty-based ethicist tends to hear the claim "because
we have found this weird case, it is morally right to kill people". And
then they start arguing against the general claim that it is a moral duty
to kill, or at least, not a violation of a moral duty to do so. In general.
You get similar misunderstandings the other way.

The key is to learn to recognize when something is coming from a different
mode, and to know that the translation between modes *changes* what's been
said.

To bring this back to C examples, consider the vast gap you get in responses
to "what happens if I write ++i*++i?" in comp.lang.c. Some people are going
to attempt to describe the likely range of actual outputs from instructions
compilers might generate for the expression, other people are going to draw
the line at "it's undefined, and anything is all right".

In practice, no one probably really expects that it'll result in, say,
reloading inetd.conf -- but they may well use that kind of example as a way
to express that you really don't know, and can't know, and that attempting
to know is likely to result in you making even worse mistakes.

YMMV.

-s
 

Franck Bui-Huu

Tim Rentsch said:
[...]
So to sum up, I think:

struct my_struct obj;
char *p = (char *)&obj;
p += offsetof(struct my_struct, my_node);
p -= offsetof(struct my_struct, my_node);
((struct my_struct *)p)->a;

is a defined case of type punning.

Actually there isn't any type punning going on here, since the
types used to reference objects are the same in each case that
the respective object actually holds. (And ignoring that member
'a' hasn't been initialized, which is orthogonal to the question
of type punning.) Type punning refers to the case when a region
of memory has been stored (or perhaps declared) using one type,
then accessed using another. That hasn't happened here. The
pointer conversions are legal and portable, but the converted
pointers are accessing objects whose types match those of the
access in each case -- ergo, no type punning.

Correct.

Since no reinterpretation of the value happened, there's actually no
type punning involved.
 

Franck Bui-Huu

[snip]
I probably misused the term "type punning" (see above what I meant --
nothing more than manipulating a pointer-to-char and then casting it
to a pointer-to-struct).

My real problem was, again, that the compiler sees two pointers to
different types, and the areas occupied by the pointed-to objects
overlap. I was worried whether this breaks "strict aliasing rules".

Of course, it doesn't; the outer structure declaration ("struct
object_type") is visible, so if the compiler sees a (struct
object_type *) and a (struct node_type *), it cannot assume the latter
doesn't point into the target of the former -- unless "restrict" is in
effect.

From the point of view of the library API, (struct node_type *)
shouldn't be dereferenced from the user code.
 

Francis Moreau

Tim Rentsch said:
Francis Moreau said:
Tim Rentsch said:
[...]


No, the item of 6.5p7 which I pointed out doesn't deal with:

struct object {
int a;
};

[...]

struct object an_object;
int *p = &an_object.a;

As you said, this is clearly defined, since we're accessing the
(member) object "an_object.a" with a pointer whose type is compatible.

Right, but this case isn't what I was asking about.

But this is the case for all examples you gave below, I think.

It isn't. The cases below access the same memory using two
distinct types, one type 'int' and the other type a structure
type.

However doing this:

struct object *obj;
int a = 5;
obj = (struct object *)&a;

is defined by the C standard AFAIK (by what you call the 'effective type
rules', I believe), but the 'strict aliasing rules' defined by GCC break
these rules. [snip]

For starters this is (usually) undefined behavior anyway, because
of pointer alignments and size mismatches if nothing else.

However, suppose we have this:

int *p;
struct { int a; int b; } *x, *y;
x = malloc( sizeof *x );
y = malloc( sizeof *y );
/* presume the mallocs succeed */

y->a = y->b = 0;
*x = *y;
p = &x->a;

So 'p' is an alias to the object 'x->a'. They both have the same type.

The expression 'x->a' is never used in the example code to access
an object;

Isn't

*x = *y;

equivalent to

x->a = y->a;
x->b = y->b;

?
it's used only to take its address to assign to 'p'. The other
accesses to this object are done using a structure type, which is
different from int.



Please look again. They are about accessing an object using two
very different types.


I'm afraid your understanding of C is somewhat lacking.

Yes probably...
First, the conversion of '&i' to '(struct A*)' may fail (ie, yield
undefined behavior) because of alignment requirements.

How did you come to that conclusion ?

Could you show me the 'path' across the standard that leads you to
this conclusion ?

Just for reminder the definition of 'struct A' is:

struct A { int a; };
Second, even if it succeeds, trying to access 'i' using 'x->a' is
always undefined behavior, because in such cases '*x' must refer to a
structure object, and 'i' isn't in any structure.

First, doing 'x->a' (done by my previous example) is accessing the
member object 'a', not the structure object (ie the whole collection
of the member objects of the structure).

Second, 6.5p7 which lists all possible ways to access an object value,
tells me that 'x->a' is a defined way to access _an_ 'int' object
since "typeof(x->a) == int".

So could you tell me (again) how did you come to this conclusion ?
That theory doesn't hold up since the behavior in this case is
already undefined under the Standard.


The writing in the ISO document is much better than that in
the GCC man page. It might be better to read (or re-read)
the C Standard more carefully, and only after that try to
divine what is meant by the GCC man page.

Yes, also asking the GCC team to clarify this point might be a
good idea.
 

Tim Rentsch

Francis Moreau said:
Tim Rentsch said:
Francis Moreau said:
[...]


No, the item of 6.5p7 which I pointed out doesn't deal with:

struct object {
int a;
};

[...]

struct object an_object;
int *p = &an_object.a;

As you said, this is clearly defined, since we're accessing the
(member) object "an_object.a" with a pointer whose type is compatible.

Right, but this case isn't what I was asking about.

But this is the case for all examples you gave below, I think.

It isn't. The cases below access the same memory using two
distinct types, one type 'int' and the other type a structure
type.

However doing this:

struct object *obj;
int a = 5;
obj = (struct object *)&a;

is defined by the C standard AFAIK (by what you call the 'effective type
rules', I believe), but the 'strict aliasing rules' defined by GCC break
these rules. [snip]

For starters this is (usually) undefined behavior anyway, because
of pointer alignments and size mismatches if nothing else.

However, suppose we have this:

int *p;
struct { int a; int b; } *x, *y;
x = malloc( sizeof *x );
y = malloc( sizeof *y );
/* presume the mallocs succeed */

y->a = y->b = 0;
*x = *y;
p = &x->a;

So 'p' is an alias to the object 'x->a'. They both have the same type.

The expression 'x->a' is never used in the example code to access
an object;

Isn't

*x = *y;

equivalent to

x->a = y->a;
x->b = y->b;

?

No. Similar, yes, but not equivalent.

Yes probably...


How did you come to that conclusion ?

Could you show me the 'path' across the standard that leads you to
this conclusion ?

Just for reminder the definition of 'struct A' is:

struct A { int a; };

Suppose that 'sizeof(int) == 4'. The alignment of int's in
such an implementation is 1, 2, or 4. The structure, on
the other hand, can have an alignment that is any multiple
of the alignment of int's; so for example, if the alignment
of 'int' is 2, the alignment of the structure could be 14.
(Obviously 14 isn't very likely, but 8 would serve the
example just as well.) So if the address of 'i' is not
suitably aligned for the 'struct A' type, converting the
pointer is undefined behavior. 6.3.2.3 p7.

First, doing 'x->a' (done by my previous example) is accessing the
member object 'a', not the structure object (ie the whole collection
of the member objects of the structure).

The description of the '->' operator says it designates a member of
a structure or union _object_ (my emphasis). Since in this example
there is no structure or union object, and there is no description
of what happens when there isn't such an object, the behavior is
undefined.
Second, 6.5p7 which lists all possible ways to access an object value,
tells me that 'x->a' is a defined way to access _an_ 'int' object
since "typeof(x->a) == int".

No it doesn't say that. What it /does/ say is that if you do _not_
access an int object using one of a specified set of types, then the
behavior is undefined. It does /not/ say that any access using an
int type is defined and/or legal, and indeed there are many that
aren't.
So could you tell me (again) how did you come to this conclusion ?

Did you even bother reading the description of the '->' operator?
There's nothing mysterious about the reasoning.

Yes, also asking the GCC team to clarify this point might be a
good idea.

Maybe so. Please let the group know how they respond.
 

Francis Moreau

Francis Moreau said:
[...]
No, the item of 6.5p7 which I pointed out doesn't deal with:
    struct object {
        int a;
    };
    [...]
    struct object an_object;
    int *p = &an_object.a;
As you said, this is clearly defined, since we're accessing the
(member) object "an_object.a" with a pointer whose type is compatible.
Right, but this case isn't what I was asking about.
But this is the case for all examples you gave below, I think.
It isn't.  The cases below access the same memory using two
distinct types, one type 'int' and the other type a structure
type.
However doing this:
    struct object *obj;
    int a = 5;
    obj = (struct object *)&a;
is defined by the C standard AFAIK (by what you call the 'effective type
rules', I believe), but the 'strict aliasing rules' defined by GCC break
these rules. [snip]
For starters this is (usually) undefined behavior anyway, because
of pointer alignments and size mismatches if nothing else.
However, suppose we have this:
    int *p;
    struct { int a; int b; } *x, *y;
    x = malloc( sizeof *x );
    y = malloc( sizeof *y );
    /* presume the mallocs succeed */
    y->a = y->b = 0;
    *x = *y;
    p = &x->a;
So 'p' is an alias to the object 'x->a'. They both have the same type.
The expression 'x->a' is never used in the example code to access
an object;

Isn't
  *x = *y;
equivalent to
  x->a = y->a;
  x->b = y->b;
?

No.  Similar, yes, but not equivalent.




Yes probably...
How did you come to that conclusion ?
Could you show me the 'path' across the standard that leads you to
this conclusion ?
Just for reminder the definition of 'struct A' is:
   struct A { int a; };

Suppose that 'sizeof(int) == 4'.  The alignment of int's in
such an implementation is 1, 2, or 4.  The structure, on
the other hand, can have an alignment that is any multiple
of the alignment of int's;  so for example, if the alignment
of 'int' is 2, the alignment of the structure could be 14.

Ok, so all of this is up to the implementation.

Let's assume that the 'struct A' and 'int' types have the same alignment.
The description of the '->' operator says it designates a member of
a structure or union _object_ (my emphasis).  Since in this example
there is no structure or union object, and there is no description
of what happens when there isn't such an object, the behavior is
undefined.

please see below...
No it doesn't say that.  What it /does/ say is that if you do _not_
access an int object using one of a specified set of types, then the
behavior is undefined.

And what is the specified set of types?

Answer:

[...]
an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union)

Note that this is one defined way to access _an_ object. This doesn't
say a word about the fact that for this case it must be a _member_
object.

Did you even bother reading the description of the '->' operator?
There's nothing mysterious about the reasoning.

Please stop asking me to read the standard.

As you probably know, this document is far from trivial to understand.

And if you're skilled enough to understand such a paper after just one
reading, then you have probably noticed that comp.lang.c is full of
questions asking to clarify the spec.
 

Tim Rentsch

Francis Moreau said:
Francis Moreau said:
Tim Rentsch writes:
Francis Moreau writes:
Tim Rentsch writes:

No, the item of 6.5p7 which I pointed out doesn't deal with:
struct object {
int a;
};

struct object an_object;
int *p = &an_object.a;
As you said, this is clearly defined, since we're accessing the
(member) object "an_object.a" with a pointer whose type is compatible.
Right, but this case isn't what I was asking about.
But this is the case for all examples you gave below, I think.
It isn't. The cases below access the same memory using two
distinct types, one type 'int' and the other type a structure
type.
However doing this:
struct object *obj;
int a = 5;
obj = (struct object *)&a;
is defined by the C standard AFAIK (by what you call the 'effective type
rules', I believe), but the 'strict aliasing rules' defined by GCC break
these rules. [snip]
For starters this is (usually) undefined behavior anyway, because
of pointer alignments and size mismatches if nothing else.
However, suppose we have this:
int *p;
struct { int a; int b; } *x, *y;
x = malloc( sizeof *x );
y = malloc( sizeof *y );
/* presume the mallocs succeed */
y->a = y->b = 0;
*x = *y;
p = &x->a;
So 'p' is an alias to the object 'x->a'. They both have the same type.
The expression 'x->a' is never used in the example code to access
an object;

Isn't
*x = *y;
equivalent to
x->a = y->a;
x->b = y->b;
?

No. Similar, yes, but not equivalent.




it's used only to take its address to assign to 'p'. The other
accesses to this object are done using a structure type, which is
different from int.
*p = 5;
*x = *y;
/* under strict aliasing must '*p == 0' still be true? */
*x = *y;
*p = 7;
*y = *x;
/* under strict aliasing must 'y->a == 7' still be true? */
y->a = y->b = 0;
*x = *y;
*p = 9;
x = (void*) p;
*x = *y;
/* under strict aliasing must '*p == 0' still be true? */
Do you believe, even under gcc strict aliasing, that any of the
comment questions can have an answer other than "Yes"? If so,
what leads you to that conclusion?
I don't think that GCC's strict aliasing rule deals with these cases.
All of these 3 cases are about a (member) object (whose type is 'int')
that can be accessed by a pointer with the _same_ type.
Please look again. They are about accessing an object using two
very different types.
However, GCC's strict aliasing rule deals with the case where an object
is accessed through a pointer which has a _different_ type. For example:
int i;
struct A { int a; } *x;
x = (struct A *)&i;
x->a = 0;
where we're accessing an object (type of 'int') through a pointer to a
structure. So we're accessing an object through a pointer which has a
_different_ (not even /almost/ the same) type. Note that this example
is perfectly legal C AFAIK.
I'm afraid your understanding of C is somewhat lacking.
Yes probably...
First, the conversion of '&i' to '(struct A*)' may fail (ie, yield
undefined behavior) because of alignment requirements.
How did you come to that conclusion ?
Could you show me the 'path' across the standard that leads you to
this conclusion ?
Just for reminder the definition of 'struct A' is:
struct A { int a; };

Suppose that 'sizeof(int) == 4'. The alignment of int's in
such an implementation is 1, 2, or 4. The structure, on
the other hand, can have an alignment that is any multiple
of the alignment of int's; so for example, if the alignment
of 'int' is 2, the alignment of the structure could be 14.

Ok, so all of this is up to the implementation.

Let's assume that the 'struct A' and 'int' types have the same alignment.
The description of the '->' operator says it designates a member of
a structure or union _object_ (my emphasis). Since in this example
there is no structure or union object, and there is no description
of what happens when there isn't such an object, the behavior is
undefined.

please see below...
No it doesn't say that. What it /does/ say is that if you do _not_
access an int object using one of a specified set of types, then the
behavior is undefined.

And what is the specified set of types?

Answer:

[...]
an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union)

Note that this is one defined way to access _an_ object. This doesn't
say a word about the fact that for this case it must be a _member_
object.

Did you understand the main point I was making? That 6.5p7
doesn't ever guarantee legality, but only excludes cases besides
the ones it lists? 6.5p7 is a /necessary/ condition for an
access to be defined behavior, but it is not a /sufficient/
condition. That's why reading other sections of the Standard
is needed here.

Please stop asking me to read the standard.

As you probably know, this document is far from trivial to understand.

And if you're skilled enough to understand such a paper after just one
reading, then you have probably noticed that comp.lang.c is full of
questions asking to clarify the spec.

I apologize for the tone of my earlier comment. It was
unnecessarily harsh.

However, if someone is making assertions about what the Standard
requires or allows, I don't think it's unreasonable to expect
that they should and will read the Standard to try to understand
what it says. I don't mind answering questions when the
reasoning necessary is convoluted or otherwise hard to find. In
this particular case I thought the reasoning involved would be
obvious to someone who had read the description of the operators
used. I explained that reasoning in the previous posting, and
also explained why 6.5p7 doesn't affect the outcome; 6.5p7
can only make otherwise defined behavior be allowed, it cannot
make undefined behavior be defined. So the semantic provisions
of the '->' operator, having failed to be met, mean this case
is undefined behavior.
 

Francis Moreau

Francis Moreau said:
[...]
No, the item of 6.5p7 which I pointed out doesn't deal with:
    struct object {
        int a;
    };
    [...]
    struct object an_object;
    int *p = &an_object.a;
As you said, this is clearly defined, since we're accessing the
(member) object "an_object.a" with a pointer whose type is compatible.
Right, but this case isn't what I was asking about.
But this is the case for all examples you gave below, I think.
It isn't.  The cases below access the same memory using two
distinct types, one type 'int' and the other type a structure
type.
However doing this:
    struct object *obj;
    int a = 5;
    obj = (struct object *)&a;
is defined by the C standard AFAIK (by what you call the 'effective type
rules', I believe), but the 'strict aliasing rules' defined by GCC break
these rules. [snip]
For starters this is (usually) undefined behavior anyway, because
of pointer alignments and size mismatches if nothing else.
However, suppose we have this:
    int *p;
    struct { int a; int b; } *x, *y;
    x = malloc( sizeof *x );
    y = malloc( sizeof *y );
    /* presume the mallocs succeed */
    y->a = y->b = 0;
    *x = *y;
    p = &x->a;
So 'p' is an alias to the object 'x->a'. They both have the same type.
The expression 'x->a' is never used in the example code to access
an object;
Isn't
  *x = *y;
equivalent to
  x->a = y->a;
  x->b = y->b;
?
No.  Similar, yes, but not equivalent.
it's used only to take its address to assign to 'p'.  The other
accesses to this object are done using a structure type, which is
different from int.
    *p = 5;
    *x = *y;
    /* under strict aliasing must '*p == 0' still be true? */
    *x = *y;
    *p = 7;
    *y = *x;
    /* under strict aliasing must 'y->a == 7' still be true? */
    y->a = y->b = 0;
    *x = *y;
    *p = 9;
    x = (void*) p;
    *x = *y;
    /* under strict aliasing must '*p == 0' still be true? */
Do you believe, even under gcc strict aliasing, that any of the
comment questions can have an answer other than "Yes"? If so,
what leads you to that conclusion?
I don't think that GCC's strict aliasing rule deals with these cases.
All of these 3 cases are about a (member) object (whose type is 'int')
that can be accessed by a pointer with the _same_ type.
Please look again.  They are about accessing an object using two
very different types.
However, GCC's strict aliasing rule deals with the case where an object
is accessed through a pointer which has a _different_ type. For example:
  int i;
  struct A { int a; } *x;
  x = (struct A *)&i;
  x->a = 0;
where we're accessing an object (type of 'int') through a pointer to a
structure. So we're accessing an object through a pointer which has a
_different_ (not even /almost/ the same) type. Note that this example
is perfectly legal C AFAIK.
I'm afraid your understanding of C is somewhat lacking.
Yes probably...
First, the conversion of '&i' to '(struct A*)' may fail (ie, yield
undefined behavior) because of alignment requirements.
How did you come to that conclusion ?
Could you show me the 'path' across the standard that leads you to
this conclusion ?
Just for reminder the definition of 'struct A' is:
   struct A { int a; };
Suppose that 'sizeof(int) == 4'.  The alignment of int's in
such an implementation is 1, 2, or 4.  The structure, on
the other hand, can have an alignment that is any multiple
of the alignment of int's;  so for example, if the alignment
of 'int' is 2, the alignment of the structure could be 14.
Ok, so all of this is up to the implementation.
Let's assume that the 'struct A' and 'int' types have the same alignment.
please see below...
And what is the specified set of types?

 [...]
 an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union)
Note that this is one defined way to access _an_ object. This doesn't
say a word about the fact that for this case it must be a _member_
object.

Did you understand the main point I was making?  That 6.5p7
doesn't ever guarantee legality, but only excludes cases besides
the ones it lists?  6.5p7 is a /necessary/ condition for an
access to be defined behavior, but it is not a /sufficient/
condition.  That's why reading other sections of the Standard
is needed here.
Please stop asking me to read the standard.
As you probably know, this document is far from trivial to understand.
And if you're skilled enough to understand such a paper after just one
reading, then you have probably noticed that comp.lang.c is full of
questions asking to clarify the spec.

I apologize for the tone of my earlier comment.  It was
unnecessarily harsh.

no problem.
However, if someone is making assertions about what the Standard
requires or allows, I don't think it's unreasonable to expect
that they should and will read the Standard to try to understand
what it says.  I don't mind answering questions when the
reasoning necessary is convoluted or otherwise hard to find.

thank you, your answers are valuable to decipher the Standard.
In
this particular case I thought the reasoning involved would be
obvious to someone who had read the description of the operators
used.  I explained that reasoning in the previous posting, and
also explained why 6.5p7 doesn't affect the outcome; 6.5p7
can only make otherwise defined behavior be allowed, it cannot
make undefined behavior be defined.  So the semantic provisions
of the '->' operator, having failed to be met, mean this case
is undefined behavior.

Ok, I see now.

It is really sad to see that just adding a couple of words would have
clarified this point a lot. Instead I have to sail across the whole
document to get scattered information.

How about this new example, which should make you happier regarding
the '->' operator:

struct A { int a; };
struct B { int b; };

struct A *p;
struct B b_object;

p = (struct A *)&b_object;
p->a = 1;

As you can see, there're no more alignment issues since (in my
understanding) all pointers to structure types have the same
representation and alignment requirements as each other.

And (still in my understanding), there's no more undefined behaviour
since the '->' operator requirement is satisfied now: p now points to
a structure object.

What do you think ?
 

Tim Rentsch

Francis Moreau said:
[snip] I don't mind answering questions when the
reasoning necessary is convoluted or otherwise hard to find.

thank you, your answers are valuable to decipher the Standard.

Glad to hear it.
Ok, I see now.

It is really sad to see that just adding a couple of words would have
clarified this point a lot. Instead I have to sail across the whole
document to get scattered information.

Yes, for better or worse, some choices of writing style in the
Standard sometimes make it hard to find the information one wants
to find.
How about this new example, which should make you happier regarding
the '->' operator:

struct A { int a; };
struct B { int b; };

struct A *p;
struct B b_object;

p = (struct A *)&b_object;
p->a = 1;

As you can see, there're no more alignment issues since (in my
understanding) all pointers to structure types have the same
representation and alignment requirements as each other.

Pointers to structures have the same alignment requirements as
each other, but the structures themselves may have different
alignments, and it's the alignments of the structures that is
relevant here.

In principle it's possible for these two structs to have
different alignment requirements, but practically speaking it's
pretty unlikely, so for the sake of discussion let's ignore that
aspect.
And (still in my understanding), there's no more undefined behaviour
since the '->' operator requirement is satisfied now: p now points to
a structure object.

What do you think ?

This question is very good. It seems like a natural thing to
want to do, and finding an answer isn't especially easy or
obvious.

In fact, I believe the Standard means for this to be undefined
behavior. Unfortunately the Standard does not (IMO) address this
question as directly or as clearly as it should. The general
principle is that it's illegal to access members of one kind of
struct by means of member access (ie, using '.' or '->') using
a designator that is a different kind of struct. This principle
does have an exception, and that exception is spelled out
specifically in 6.5.2.3p5:

One special guarantee is made in order to simplify the use
of unions: if a union contains several structures that
share a common initial sequence (see below), and if the
union object currently contains one of these structures, it
is permitted to inspect the common initial part of any of
them anywhere that a declaration of the complete type of the
union is visible. Two structures share a common initial
sequence if corresponding members have compatible types
(and, for bit-fields, the same widths) for a sequence of one
or more initial members.

The argument is that, since the Standard goes to the trouble of
pointing out a case where accessing one struct through a different
type of struct is allowed, in other cases doing that is not allowed
(ie, is undefined behavior). This question may be worth debating
over in comp.std.c, because at the very least it would be nice if
the language were more clear. However, as far as a practical
reading of the Standard goes (ie, for comp.lang.c) I would say this
point settles the question. And, luckily, there is an example that
is pretty much right on point to your question, namely 6.5.2.3p8:

EXAMPLE 3 The following is a valid fragment:

    union {
        struct {
            int alltypes;
        } n;
        struct {
            int type;
            int intnode;
        } ni;
        struct {
            int type;
            double doublenode;
        } nf;
    } u;
    u.nf.type = 1;
    u.nf.doublenode = 3.14;
    /* ... */
    if (u.n.alltypes == 1)
        if (sin(u.nf.doublenode) == 0.0)
            /* ... */

The following is not a valid fragment (because the union type is
not visible within function f):

    struct t1 { int m; };
    struct t2 { int m; };
    int f(struct t1 *p1, struct t2 *p2)
    {
        if (p1->m < 0)
            p2->m = -p2->m;
        return p1->m;
    }
    int g()
    {
        union {
            struct t1 s1;
            struct t2 s2;
        } u;
        /* ... */
        return f(&u.s1, &u.s2);
    }

The second code fragment is very similar to the question you ask,
and is clearly identified as not a valid fragment. Because of
the similarity it seems safe to conclude that the example code
you gave (and other similar sorts of code) is also not valid.
Make sense?
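
For contrast with the cast in the question, here is a minimal sketch of the pattern that the 6.5.2.3p5 guarantee does cover: the structs live in a union, and the union's complete type is visible wherever the common initial member is inspected. The names `int_node`, `double_node`, `node`, `node_type` and `node_as_double` are made up for illustration and do not come from the thread or the Standard.

```c
#include <assert.h>

/* Hypothetical node types that share a common initial member, 'type'. */
struct int_node    { int type; int    value; };
struct double_node { int type; double value; };

/* The complete union type is visible to every function below. */
union node {
    struct int_node    ni;
    struct double_node nf;
};

enum { NODE_INT = 1, NODE_DOUBLE = 2 };

/* Inspect the common initial member through the union, whichever
   of the two structs was stored last. */
int node_type(const union node *n)
{
    return n->ni.type;
}

/* Pick the member that matches the stored tag before reading the payload. */
double node_as_double(const union node *n)
{
    assert(node_type(n) == NODE_INT || node_type(n) == NODE_DOUBLE);
    return node_type(n) == NODE_INT ? (double)n->ni.value : n->nf.value;
}
```

A caller would store through `u.nf` or `u.ni` and then pass `&u`, so every access goes through the union type rather than through a pointer cast from one struct type to another.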
 
Francis Moreau

Pointers to structures have the same alignment requirements as
each other, but the structures themselves may have different
alignments, and it's the alignments of the structures that is
relevant here.
OK.


In principle it's possible for these two structs to have
different alignment requirements, but practically speaking it's
pretty unlikely, so for the sake of discussion let's ignore that
aspect.
OK.



This question is very good.

Glad to hear it
 It seems like a natural thing to
want to do, and finding an answer isn't especially easy or
obvious.

In fact, I believe the Standard means for this to be undefined
behavior.  Unfortunately the Standard does not (IMO) address this
question as directly or as clearly as it should.  The general
principle is that it's illegal to access members of one kind of
struct by means of member access (ie, using '.' or '->') using
a designator that is a different kind of struct.  This principle
does have an exception, and that exception is spelled out
specifically in 6.5.2.3p5:

    One special guarantee is made in order to simplify the use
    of unions:  if a union contains several structures that
    share a common initial sequence (see below), and if the
    union object currently contains one of these structures, it
    is permitted to inspect the common initial part of any of
    them anywhere that a declaration of the complete type of the
    union is visible.  Two structures share a common initial
    sequence if corresponding members have compatible types
    (and, for bit-fields, the same widths) for a sequence of one
    or more initial members.

The argument is that, since the Standard goes to the trouble of
pointing out a case where accessing one struct through a different
type of struct is allowed, in other cases doing that is not allowed
(ie, is undefined behavior).  This question may be worth debating
over in comp.std.c, because at the very least it would be nice if
the language were more clear.  However, as far as a practical
reading of the Standard goes (ie, for comp.lang.c) I would say this
point settles the question.  And, luckily, there is an example that
is pretty much right on point to your question, namely 6.5.2.3p8:

    EXAMPLE 3 The following is a valid fragment:

        union {
            struct {
                int alltypes;
            } n;
            struct {
                int type;
                int intnode;
            } ni;
            struct {
                int type;
                double doublenode;
            } nf;
        } u;
        u.nf.type = 1;
        u.nf.doublenode = 3.14;
        /* ... */
        if (u.n.alltypes == 1)
            if (sin(u.nf.doublenode) == 0.0)
                /* ... */

    The following is not a valid fragment (because the union type is
    not visible within function f):

        struct t1 { int m; };
        struct t2 { int m; };
        int f(struct t1 *p1, struct t2 *p2)
        {
            if (p1->m < 0)
                p2->m = -p2->m;
            return p1->m;
        }
        int g()
        {
            union {
                struct t1 s1;
                struct t2 s2;
            } u;
            /* ... */
            return f(&u.s1, &u.s2);
        }

The second code fragment is very similar to the question you ask,
and is clearly identified as not a valid fragment.  Because of
the similarity it seems safe to conclude that the example code
you gave (and other similar sorts of code) is also not valid.
Make sense?

Yes it does.

What no longer makes sense is the description of:

- type punning described by wikipedia
- the 'strict-aliasing' option in GCC man page

The Berkeley sockets interface, which is given as example of type
punning by: http://en.wikipedia.org/wiki/Type_punning, relies on
undefined behaviours:

struct sockaddr_in sa = {0};
[...]
bind(sockfd, (struct sockaddr *)&sa, sizeof sa);

Regarding the 'strict-aliasing' rule, as you said before (and
showed me), 'strict aliasing rules' and 'standard aliasing rules' (the
ones defined by the C standards) are the same.

Passing '-fno-strict-aliasing' to GCC allows it to generate non-conforming
programs because it seems from the man page that GCC allows more
aliasing cases than the ones defined by the standards. Which ones
exactly, I don't know. Furthermore the 'strict-aliasing' option is either
ON or OFF depending on the optimisation level. So passing '-O0' to GCC
can generate a program which behaves differently from one compiled with
'-O2'.
 
Tim Rentsch

Francis Moreau said:
Glad to hear it


Yes it does.

What no longer makes sense is the description of:

- type punning described by wikipedia

Some information in this wikipedia page is wrong. More
specifically, the reference about using union types uses a
non-normative statement from Annex J, which is misleading about
exactly when unspecified values come into play. The actual
normative text is in 6.2.6.1p6:

When a value is stored in an object of structure or union
type, including in a member object, the bytes of the object
representation that correspond to any padding bytes take
unspecified values.

So it's only reading another union member _larger_ than the last
one written that unspecified values enter the picture. Reading a
union member that is smaller or the same size as the last one
written must re-interpret the previously written bytes under the
read member's type. Near the bottom of the Wikipedia page there is
a link for DR 257 -- if someone had followed up reading this DR and
the other DR's it mentions, they would see the text (now present in
a footnote in n1256) saying that this re-interpretation (and also
explaining it as "type punning") is what will happen. Maybe I
should put a note on the Wikipedia page about that and see if
anyone follows up on it... (No promises!)
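
The re-interpretation described above can be sketched for two members of the same size. This is an illustration only: the union `float_bits` and both function names are invented here, and the code assumes `float` and `uint32_t` have the same size (checked at compile time with C11 `static_assert`). The `memcpy` version is included as the usual way to get the same bytes without a union.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: both members occupy the same number of bytes. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes float and uint32_t have the same size");

/* Hypothetical union used only to illustrate the point. */
union float_bits {
    float    f;
    uint32_t u;
};

/* Write one member, read the other: the stored bytes are
   re-interpreted under the type of the member being read. */
uint32_t bits_via_union(float x)
{
    union float_bits fb;
    fb.f = x;
    return fb.u;
}

/* Same bytes, obtained by copying the object representation. */
uint32_t bits_via_memcpy(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}
```

Both functions should return the same value for any given argument, since each one simply reads the bytes that were stored for the `float`.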
- the 'strict-aliasing' option in GCC man page

I'll have more to say about that in a moment.
The Berkeley sockets interface, which is given as example of type
punning by: http://en.wikipedia.org/wiki/Type_punning, relies on
undefined behaviours:

struct sockaddr_in sa = {0};
[...]
bind(sockfd, (struct sockaddr *)&sa, sizeof sa);

The discussion of this on the Wikipedia page is somewhat shallow.
In fact it's impossible to tell, based on information in the
Wikipedia page, if there is undefined behavior or not (not counting
the question of alignment of the two structs). Certainly the
'bind()' function could be written in a way so that there is
no undefined behavior. The reason is, casting one pointer type
to another (assuming no alignment problems) isn't by itself
undefined behavior; the question is what happens with that other
pointer, /and that isn't shown in the example/. It's quite
straightforward to write 'bind()' so that no undefined behavior
will occur. So the example is, IMO, misleadingly incomplete. Or
maybe both misleading and incomplete.

Regarding the 'strict-aliasing' rule, as you said before (and
showed me), 'strict aliasing rules' and 'standard aliasing rules' (the
ones defined by the C standards) are the same.

I have since revised my opinion on this. I don't know whether
gcc's -fstrict-aliasing is meant actually to invalidate some cases
of effective type rules, but I believe it does mean at least to
push into some gray areas of the effective type rules. In other
words there are some plausible interpretations of what effective
type rules require that using -fstrict-aliasing will fail to meet
in some cases. This makes it all the more important that exactly
what rules are followed by -fstrict-aliasing be documented precisely
so people know just what to expect.

Passing '-fno-strict-aliasing' to GCC allows it to generate non-conforming
programs because it seems from the man page that GCC allows more
aliasing cases than the ones defined by the standards. Which ones
exactly, I don't know. Furthermore the 'strict-aliasing' option is either
ON or OFF depending on the optimisation level. So passing '-O0' to GCC
can generate a program which behaves differently from one compiled with
'-O2'.

Actually it's the other way around. Using -fno-strict-aliasing
means _more_ cases alias, which would bring the compiler more
into conformance, not less. Using -fstrict-aliasing means some
cases that would satisfy the more tolerant aliasing rules now do
not, so those cases would be assumed to be independent, which means
more code motion, and a greater chance for non-conformance.
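
A small sketch of what "assumed to be independent" can mean in practice. The function `store_both` is hypothetical and written only for this illustration. Under type-based alias analysis a compiler may assume that a store through a `float *` cannot modify an `int` object, so it is free to return the constant 1 without reloading `*ip`. With distinct objects, as intended, the behavior is well defined either way.

```c
#include <assert.h>

/* Store through two pointers to incompatible types, then re-read the first. */
int store_both(int *ip, float *fp)
{
    assert(ip && fp);
    *ip = 1;
    *fp = 2.0f;   /* with strict aliasing, assumed not to touch *ip */
    return *ip;   /* so this may be compiled as "return 1" */
}
```

Calling it as `store_both(&i, &f)` with separate `int i` and `float f` objects returns 1. Passing the same storage for both arguments (for example via a cast) is the kind of code whose result can differ between optimization levels.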

The point about strict-aliasing depending on optimization level
is a good one. On hearing about it I'm not really surprised,
but it's exactly the sort of thing that might bite someone if
they weren't tuned in to the possibility. Yow!
 
Francis Moreau

[snip]
What no longer makes sense is the description of:
   - type punning described by wikipedia

Some information in this wikipedia page is wrong.  More
specifically, the reference about using union types uses a
non-normative statement from Annex J, which is misleading about
exactly when unspecified values come into play.  The actual
normative text is in 6.2.6.1p6:

    When a value is stored in an object of structure or union
    type, including in a member object, the bytes of the object
    representation that correspond to any padding bytes take
    unspecified values.

So it's only reading another union member _larger_ than the last
one written that unspecified values enter the picture.  Reading a
union member that is smaller or the same size as the last one
written must re-interpret the previously written bytes under the
read member's type.  Near the bottom of the Wikipedia page there is
a link for DR 257 -- if someone had followed up reading this DR and
the other DR's it mentions, they would see the text (now present in
a footnote in n1256) saying that this re-interpretation (and also
explaining it as "type punning") is what will happen.  Maybe I
should put a note on the Wikipedia page about that and see if
anyone follows up on it...  (No promises!)
   - the 'strict-aliasing' option in GCC man page

I'll have more to say about that in a moment.
The Berkeley sockets interface, which is given as example of type
punning by:http://en.wikipedia.org/wiki/Type_punning, relies on
undefined behaviours:
   struct sockaddr_in sa = {0};
   [...]
   bind(sockfd, (struct sockaddr *)&sa, sizeof sa);

The discussion of this on the Wikipedia page is somewhat shallow.
In fact it's impossible to tell, based on information in the
Wikipedia page, if there is undefined behavior or not (not counting
the question of alignment of the two structs).

BTW, one question I'm wondering is: does a structure have the same
alignment requirement as its first member (since there's no padding at
the beginning of the structure) ?
 Certainly the
'bind()' function could be written in a way so that there is
no undefined behavior.

Well, actually looking at it more closely, there's no alias issue at
all as long as 'bind()' is not implemented as a macro.
 The reason is, casting one pointer type
to another (assuming no alignment problems) isn't by itself
undefined behavior;  the question is what happens with that other
pointer, /and that isn't shown in the example/.  It's quite
straightforward to write 'bind()' so that no undefined behavior
will occur.

Let's assume that 'bind()' will dereference the passed pointer...
 So the example is, IMO, misleadingly incomplete.  Or
maybe both misleading and incomplete.

Yes but you told me:

"""
The general principle is that it's illegal to access members of one
kind of struct by means of member access (ie, using '.' or '->') using
a designator that is a different kind of struct. This principle does
have an exception, and that exception is spelled out specifically in
6.5.2.3p5:
"""

and on the other hand, wikipedia claims:

"""
The Berkeley sockets library fundamentally relies on the fact that in
C, a pointer to struct sockaddr_in is freely convertible to a pointer
to struct sockaddr; and, in addition, that the two structure types
share the same memory layout. Therefore, a reference to the structure
field my_addr->sin_family (where my_addr is of type struct sockaddr*)
will actually refer to the field sa.sin_family (where sa is of type
struct sockaddr_in).
"""

and as I assumed before, it's most likely that 'bind()' will
dereference 'my_addr', hence whatever the implementation of 'bind()',
it will invoke undefined behavior.
I have since revised my opinion on this.
Interesting,

 I don't know whether
gcc's -fstrict-aliasing is meant actually to invalidate some cases
of effective type rules, but I believe it does mean at least to
push into some gray areas of the effective type rules.  In other
words there are some plausible interpretations of what effective
type rules require that using -fstrict-aliasing will fail to meet
in some cases.  This makes it all the more important that exactly
what rules are followed by -fstrict-aliasing be documented precisely
so people know just what to expect.

Which newsgroup/mailing-list is best for asking clarifications, in
your opinion ?
Actually it's the other way around.  Using -fno-strict-aliasing
means _more_ cases alias, which would bring the compiler more
into conformance, not less.

So you mean that these additional aliasing cases have defined behavior
according to the standard, right ?
 Using -fstrict-aliasing means some
cases that would satisfy the more tolerant aliasing rules now do
not, so those cases would be assumed to be independent, which means
more code motion, and a greater chance for non-conformance.

Well, that's what I thought before we started our discussion.

I tried to find alias cases which would be only allowed by
'-fno-strict-aliasing' but you showed me that all of them have undefined
behavior.
 
Tim Rentsch

Francis Moreau said:
[snip]
What no longer makes sense is the description of:
- type punning described by wikipedia

Some information in this wikipedia page is wrong. More
specifically, the reference about using union types uses a
non-normative statement from Annex J, which is misleading about
exactly when unspecified values come into play. The actual
normative text is in 6.2.6.1p6:

When a value is stored in an object of structure or union
type, including in a member object, the bytes of the object
representation that correspond to any padding bytes take
unspecified values.

So it's only reading another union member _larger_ than the last
one written that unspecified values enter the picture. Reading a
union member that is smaller or the same size as the last one
written must re-interpret the previously written bytes under the
read member's type. Near the bottom of the Wikipedia page there is
a link for DR 257 -- if someone had followed up reading this DR and
the other DR's it mentions, they would see the text (now present in
a footnote in n1256) saying that this re-interpretation (and also
explaining it as "type punning") is what will happen. Maybe I
should put a note on the Wikipedia page about that and see if
anyone follows up on it... (No promises!)
- the 'strict-aliasing' option in GCC man page

I'll have more to say about that in a moment.
The Berkeley sockets interface, which is given as example of type
punning by:http://en.wikipedia.org/wiki/Type_punning, relies on
undefined behaviours:
struct sockaddr_in sa = {0};
[...]
bind(sockfd, (struct sockaddr *)&sa, sizeof sa);

The discussion of this on the Wikipedia page is somewhat shallow.
In fact it's impossible to tell, based on information in the
Wikipedia page, if there is undefined behavior or not (not counting
the question of alignment of the two structs).

BTW, one question I'm wondering is: does a structure have the same
alignment requirement as its first member (since there's no padding at
the beginning of the structure) ?

The alignment requirement of a structure must be at least as
restrictive as the alignment requirements of all of its members,
including the first. So the alignment of a structure will be
some integer multiple (probably 1, but it can be larger than 1)
of the GCD of the alignments of its members.

Well, actually looking at it more closely, there's no alias issue at
all as long as 'bind()' is not implemented as a macro.

As far as the Standard is concerned, it doesn't matter whether
'bind()' is called or expanded as a macro; in either case if it
uses an undefined construct the result is undefined behavior.

Let's assume that 'bind()' will dereference the passed pointer...

Okay but there is more to the question... see below.
Yes but you told me:

"""
The general principle is that it's illegal to access members of one
kind of struct by means of member access (ie, using '.' or '->') using
a designator that is a different kind of struct. This principle does
have an exception, and that exception is spelled out specifically in
6.5.2.3p5:
"""

and on the other hand, wikipedia claims:

"""
The Berkeley sockets library fundamentally relies on the fact that in
C, a pointer to struct sockaddr_in is freely convertible to a pointer
to struct sockaddr; and, in addition, that the two structure types
share the same memory layout. Therefore, a reference to the structure
field my_addr->sin_family (where my_addr is of type struct sockaddr*)
will actually refer to the field sa.sin_family (where sa is of type
struct sockaddr_in).
"""

and as I assumed before, it's most likely that 'bind()' will
dereference 'my_addr', hence whatever the implementation of 'bind()',
it will invoke undefined behavior.

Consider two possible ways this could be done (both in bind):

my_addr->sin_family

or

(unsigned short *)my_addr /* or fancier, using offsetof, etc */

The first of these is (technically) undefined behavior -- one
kind of struct is being accessed as another. The second is not
undefined behavior, because the access is done directly which is
allowed under effective type rules (this assumes that the struct
alignment requirements are satisfied, and that the respective
offsets match up, but these conditions are implementation-defined
at worst; in fact the sin_family field is the first member in
these two structs so the offsets are guaranteed to match).

This is why I say that whether there is undefined behavior
depends on the definition of bind().
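
The two variants above can be sketched side by side. The structs `generic_addr` and `ipv4_like` are simplified stand-ins invented for this illustration; they are not the real `struct sockaddr` and `struct sockaddr_in` definitions, and the sketch assumes (as the discussion does) that alignment is not a problem.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two address structs. */
struct generic_addr { unsigned short family; char data[14]; };
struct ipv4_like    { unsigned short family; unsigned short port;
                      unsigned char addr[4]; char pad[8]; };

/* Variant 1: member access through the generic struct type.  This is
   the form described above as (technically) undefined when the object
   passed in is really a struct ipv4_like. */
unsigned short family_via_member(const struct generic_addr *a)
{
    assert(a);
    return a->family;
}

/* Variant 2: read the first member directly as an unsigned short.
   The access uses the member's own type, not a different struct type. */
unsigned short family_direct(const struct generic_addr *a)
{
    assert(a);
    return *(const unsigned short *)a;
}
```

With a `struct ipv4_like sa` whose `family` is set, `family_direct((const struct generic_addr *)&sa)` reads that member, while `family_via_member` is only unproblematic when given a real `struct generic_addr`.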

Which newsgroup/mailing-list is best for asking clarifications, in
your opinion ?

I don't really know. A good place to start is probably going
to google groups and searching for 'gcc' or 'gnu gcc' under
the 'search for a group' box. There may be a developers or
development-related mailing list for gcc, maybe that could
be found by a 'gcc mail list' query on google proper? I see
there is a 'gnu.gcc.help', that's probably a good group to
ask this question again. These are all just guesses, of course.

So you mean that these additional aliasing cases have defined behavior
according to the standard, right ?

Not exactly. The Standard requires some set of cases to alias
each other in a well-defined way. If the set of cases that
an implementation interprets as allowed aliasing is a superset
of the Standard-specified set, the implementation is conforming
as far as aliasing goes. But if the implementation-allowed set is
only a subset of the Standard-specified set, then there will be
some cases where behavior is defined as far as the Standard is
concerned, but the implementation might rearrange code in a way
that invalidates the Standard's specified semantics.

Getting back to your question, it isn't that more cases have
defined behavior, it's that more cases will behave in the
same way that a simple implementation would. The same behaviors
are defined in either case; what changes is which behaviors
that the Standard considers non-defined will become predictable
under the more tolerant set of aliasing cases.

Well, that's what I thought before we started our discussion.

I tried to find alias cases which would be only allowed by
'-fno-strict-aliasing' but you showed me that all of them have undefined
behavior.

The gcc man page gives an example involving unions. I'm not
sure if the Standard means for this case to be undefined
behavior or not, but it is at least a gray area. So that may
be a starting point for you.

I have a question for you. Is your interest in strict
aliasing just academic, or do you expect it to make a
difference in some actual development you're involved
with? There may be a better way to solve the underlying
problem if there is a more specific problem to solve.
 
Keith Thompson

Tim Rentsch said:
The alignment requirement of a structure must be at least as
restrictive as the alignment requirements of all of its members,
including the first. So the alignment of a structure will be
some integer multiple (probably 1, but it can be larger than 1)
of the GCD of the alignments of its members.
[...]

I think you meant LCM (least common multiple), not GCD (greatest
common divisor). For example, if two members of a struct have
alignments 4 and 6, the alignment of the struct is at least 12.

In practice, on most systems, all alignments are powers of two,
so the alignment of a struct is merely at least the largest of any
of its members' alignments.
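
On a compiler that supports C11 (newer than the Standard text quoted in this thread), the relationship can be checked directly with `alignof`. The struct `mixed` and the helper `struct_align_covers` are made up for this sketch; the actual alignment values are implementation-defined, so the code checks only the relationship and not specific numbers.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>

/* Hypothetical struct mixing members with different alignment needs. */
struct mixed {
    char   c;
    double d;
    short  s;
};

/* Compile-time check: the struct is at least as strictly aligned
   as its most demanding member in this example. */
static_assert(alignof(struct mixed) % alignof(double) == 0,
              "struct alignment should be a multiple of alignof(double)");

/* Returns 1 if the struct's alignment is a multiple of member_align. */
int struct_align_covers(size_t member_align)
{
    return alignof(struct mixed) % member_align == 0;
}
```

Calling `struct_align_covers(alignof(short))` or `struct_align_covers(alignof(char))` is expected to return 1 on typical implementations, where all alignments are powers of two.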
 
Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
The alignment requirement of a structure must be at least as
restrictive as the alignment requirements of all of its members,
including the first. So the alignment of a structure will be
some integer multiple (probably 1, but it can be larger than 1)
of the GCD of the alignments of its members.
[...]

I think you meant LCM (least common multiple), not GCD (greatest
common divisor). For example, if two members of a struct have
alignments 4 and 6, the alignment of the struct is at least 12.

Sorry, you are right of course. Thank you.
 
Francis Moreau

[ sorry for the late answer, I was off during these last days ]

Francis Moreau <[email protected]> writes:

[ snip]
The alignment requirement of a structure must be at least as
restrictive as the alignment requirements of all of its members,
including the first.

Could you point out the relevant section in the standard ?
As far as the Standard is concerned, it doesn't matter whether
'bind()' is called or expanded as a macro;  in either case if it
uses an undefined construct the result is undefined behavior.

I was thinking about the alias thing only, but you're right.
Okay but there is more to the question...  see below.

Consider two possible ways this could be done (both in bind):

    my_addr->sin_family

or

    (unsigned short *)my_addr  /* or fancier, using offsetof, etc */

The first of these is (technically) undefined behavior -- one
kind of struct is being accessed as another.  The second is not
undefined behavior, because the access is done directly which is
allowed under effective type rules (this assumes that the struct
alignment requirements are satisfied, and that the respective
offsets match up, but these conditions are implementation-defined
at worst;  in fact the sin_family field is the first member in
these two structs so the offsets are guaranteed to match).
okay.



Not exactly.  The Standard requires some set of cases to alias
each other in a well-defined way.  If the set of cases that
an implementation interprets as allowed aliasing is a superset
of the Standard-specified set, the implementation is conforming
as far as aliasing goes.  But if the implementation-allowed set is
only a subset of the Standard-specified set, then there will be
some cases where behavior is defined as far as the Standard is
concerned, but the implementation might rearrange code in a way
that invalidates the Standard's specified semantics.

Getting back to your question, it isn't that more cases have
defined behavior, it's that more cases will behave in the
same way that a simple implementation would.  The same behaviors
are defined in either case;  what changes is which behaviors
that the Standard considers non-defined will become predictable
under the more tolerant set of aliasing cases.

But does that bring the compiler more into conformance ?
The gcc man page gives an example involving unions.  I'm not
sure if the Standard means for this case to be undefined
behavior or not, but it is at least a gray area.  So that may
be a starting point for you.

Why are you not sure ?

I used structures in my examples but I think the arguments you gave me
still hold for unions.

Regarding this example given by the man page:

int f() {
double d = 3.0;
return ((union a_union *) &d)->i;
}

is not defined by the standard since '&d' doesn't point to a union
object.

Regarding this one:

int f() {
union a_union t;
int *ip;
t.d = 3.0;
ip = &t.i;
return *ip;
}

well I don't know, but I'm pretty sure you can come up with arguments
that make it undefined behavior ;)
I have a question for you.  Is your interest in strict
aliasing just academic, or do you expect it to make a
difference in some actual development you're involved
with?  There may be a better way to solve the underlying
problem if there is a more specific problem to solve.

No, my interest is just academic; I try to avoid using aliasing stuff
as far as I'm concerned.
 
Tim Rentsch

Francis Moreau said:
[ sorry for the late answer, I was off during these last days ]

Francis Moreau <[email protected]> writes:

[ snip]
The alignment requirement of a structure must be at least as
restrictive as the alignment requirements of all of its members,
including the first.

Could you point out the relevant section in the standard ?

There is nothing in the Standard that says this directly. It's a
logical consequence of other requirements; more specifically,
since we can take the address of any non-bitfield member (of
either a struct or a union), the alignment requirements for
each member must translate into alignment requirements that
are at least as restrictive for the struct/union as a whole.
All the members' requirements must be met simultaneously,
hence my statement.

But does that bring the compiler more into conformance ?

Wider set of alias types allowed ==> more constraints on
code motion ==> code more in keeping with "naive" abstract
machine ==> smaller chance that code will violate expectations
of simple reading of standard. That isn't really "more into
conformance", since once it reaches the point of conforming
an implementation doesn't become "more conforming", but
I think you have the general idea. Make sense?

Why are you not sure ?

Because text in the Standard unfortunately is not clear (or
arguably is definitely ambiguous) on this point.

I used structures in my examples but I think the arguments you gave me
still hold for unions.

Generally that's true.
Regarding this example given by the man page:

int f() {
double d = 3.0;
return ((union a_union *) &d)->i;
}

is not defined by the standard since '&d' doesn't point to a union
object.
Right.

Regarding this one:

int f() {
union a_union t;
int *ip;
t.d = 3.0;
ip = &t.i;
return *ip;
}

well I don't know, but I'm pretty sure you can come up with arguments
that make it undefined behavior ;)

I think what you mean is that someone else can come up with
arguments that it's undefined behavior (and I believe that's
right, someone else can). Speaking pragmatically I'm content
to just label it a gray area.

No, my interest is just academic; I try to avoid using aliasing stuff
as far as I'm concerned.

Okay, well I hope my answers have been interesting as well as
informative.
 
Francis Moreau

Francis Moreau said:
[ sorry for the late answer, I was off during these last days ]
[ snip]
The alignment requirement of a structure must be at least as
restrictive as the alignment requirements of all of its members,
including the first.
Could you point out the relevant section in the standard ?

There is nothing in the Standard that says this directly.  It's a
logical consequence of other requirements;  more specifically,
since we can take the address of any non-bitfield member (of
either a struct or a union), the alignment requirements for
each member must translate into alignment requirements that
are at least as restrictive for the struct/union as a whole.
All the members' requirements must be met simultaneously,
hence my statement.
Ok.
But does that bring the compiler more into conformance ?

Wider set of alias types allowed ==> more constraints on
code motion ==> code more in keeping with "naive" abstract
machine ==> smaller chance that code will violate expectations
of simple reading of standard.  That isn't really "more into
conformance", since once it reaches the point of conforming
an implementation doesn't become "more conforming", but
I think you have the general idea.  Make sense?

yes, but I was confused by the "more into conformance" term you used
previously.
Because text in the Standard unfortunately is not clear (or
arguably is definitely ambiguous) on this point.


Generally that's true.

I think what you mean is that someone else can come up with
arguments that it's undefined behavior (and I believe that's
right, someone else can).  Speaking pragmatically I'm content
to just label it a gray area.



Okay, well I hope my answers have been interesting as well as
informative.

as I said previously, definitely yes.

So I'll try to clarify which alias cases (that the standard considers
non defined) will become predictable with the '-fno-strict-aliasing'
switch.

But that's going to happen after a long rest :)

Thanks.
 
Tim Rentsch

Francis Moreau said:
Francis Moreau said:
[ sorry for the late answer, I was off during these last days ]
Actually it's the other way around. Using -fno-strict-aliasing
means _more_ cases alias, which would bring the compiler more
into conformance, not less.
So you mean that these additional aliasing cases have defined behavior
according to the standard, right ?
Not exactly. The Standard requires some set of cases to alias
each other in a well-defined way. If the set of cases that
an implementation interprets as allowed aliasing is a superset
of the Standard-specified set, the implementation is conforming
as far as aliasing goes. But if the implementation-allowed set is
only a subset of the Standard-specified set, then there will be
some cases where behavior is defined as far as the Standard is
concerned, but the implementation might rearrange code in a way
that invalidates the Standard's specified semantics.
Getting back to your question, it isn't that more cases have
defined behavior, it's that more cases will behave in the
same way that a simple implementation would. The same behaviors
are defined in either case; what changes is which behaviors
that the Standard considers non-defined will become predictable
under the more tolerant set of aliasing cases.
But does that bring the compiler more into conformance ?

Wider set of alias types allowed ==> more constraints on
code motion ==> code more in keeping with "naive" abstract
machine ==> smaller chance that code will violate expectations
of simple reading of standard. That isn't really "more into
conformance", since once it reaches the point of conforming
an implementation doesn't become "more conforming", but
I think you have the general idea. Make sense?

yes, but I was confused by the "more into conformance" term you used
previously.

Not the best choice of wording on my part. Perhaps "further
in the direction of being conforming" would be better, or
"more directly aligned with semantics on the abstract machine".
Oh well, at least we finally got there. :)
 
