Aliasing in assignment

Lauri Alanko · Mar 22, 2007

The following code crashes on Solaris 10 when compiled without
optimization:

typedef struct Node Node;

struct Node {
    int val;
    Node *next;
};

int main(void)
{
    Node b = { 2, 0 };
    Node a = { 1, &b };

    a = *(a.next);

    return a.val;
}

What happens is that after a has been written to, and a.next has been
set to null, a.next is dereferenced again (for some obscure reason).

Whose fault is this, the programmer's or the compiler's? Initially I
thought that it's the programmer's fault since there are no sequence
points between the access to a.next and the writing to a, but if that
made the code illegal, how about the following ubiquitous idiom:

Node* p = &a;
p = p->next;

Here, too, we both access p and write to p without sequence points in
between. What's the difference, or is there any?

Thanks in advance.

Lauri

Dave Vandervies · Mar 22, 2007

Lauri Alanko said:
The following code crashes on Solaris 10 when compiled without
optimization:

[snip bits that make it compileable]

Node b = { 2, 0 };
Node a = { 1, &b };

a = *(a.next);

What happens is that after a has been written to, and a.next has been
set to null, a.next is dereferenced again (for some obscure reason).

Whose fault is this, the programmer's or the compiler's? Initially I
thought that it's the programmer's fault since there are no sequence
points between the access to a.next and the writing to a,

As far as I can tell, the old value is accessed only to determine the
new value to be stored (access pointer in a.next to determine where to
find value to store, dereference pointer to get value to store), which
means it's perfectly acceptable according to 6.5#2 of n869, which is
the paragraph that would make it undefined if that were the problem.

So unless I'm missing something, this looks like a compiler bug.
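
To make that reading of 6.5#2 concrete, here is a minimal sketch. The helper names step_direct and step_via_temp are hypothetical and not from the original post. Both functions read a->next only to determine the value that is then stored, so under this reading they should leave *a in the same state:

```c
#include <stddef.h>

typedef struct Node Node;

struct Node {
    int val;
    Node *next;
};

/* Hypothetical helper: the assignment from the original post, done
   through a pointer. a->next is read only to find the value that is
   then stored in *a. */
void step_direct(Node *a)
{
    *a = *(a->next);
}

/* Hypothetical helper: the same copy spelled out with an explicit
   temporary, so the read of a->next visibly precedes the write to *a. */
void step_via_temp(Node *a)
{
    Node tmp = *(a->next);
    *a = tmp;
}
```

Called on a node whose next points at { 2, 0 }, either helper should leave val equal to 2 and next null. Nothing in either version requires a->next to be read again after the store.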

dave

Peter Nilsson · Mar 22, 2007

Lauri Alanko said:
The following code crashes on Solaris 10 when compiled without
optimization:

typedef struct Node Node;

struct Node {
int val;
Node *next;
};

int main(void)
{
Node b = { 2, 0 };
Node a = { 1, &b };

This violates constraint 6.7.8p4, which states that an
initialiser has to be a constant.

Try...

static Node b = { 2, 0 };
static Node a = { 1, &b };

a = *(a.next);
return a.val;

Note that neither 1 nor 2 is a portable value for main to
return to the host. Use 0, EXIT_SUCCESS or EXIT_FAILURE,
the latter two from <stdlib.h>.
}
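
As a rough sketch of the portable-return-value advice, the result can be checked inside the program and mapped to one of the <stdlib.h> macros rather than returned raw. The helper name status_from_check is hypothetical:

```c
#include <stdlib.h>

/* Hypothetical helper: turns "did we observe the expected value?" into
   a portable exit status instead of returning the raw value itself. */
int status_from_check(int observed, int expected)
{
    return observed == expected ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

In the test program this would replace "return a.val;" with something like "return status_from_check(a.val, 2);".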

What happens is that after a has been written to, and a.next
has been set to null, a.next is dereferenced again (for some
obscure reason).

Whose fault is this, the programmer's or the compiler's?

One fault is the programmer's. If fixing that doesn't fix the
problem, then it appears to be the compiler's.

Eric Sosman · Mar 22, 2007

Peter said:
This violates constraint 6.7.8p4, which states that an
initialiser has to be a constant.

No; the cited paragraph says

All the expressions in an initializer for an object that
^^^^^^^^^^^^^^^^^^
has static storage duration shall be constant expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^^
or string literals.

(Emphasis mine.)

One fault is the programmer's. If fixing that doesn't fix the
problem, then it appears to be the compiler's.

I think (mind you, I say I "think") it's the compiler's fault.

Ben Pfaff · Mar 22, 2007

Eric Sosman said:
No; the cited paragraph says

All the expressions in an initializer for an object that
^^^^^^^^^^^^^^^^^^
has static storage duration shall be constant expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^^
or string literals.

But in C89, the corresponding paragraph said:

All the expressions in an initializer for an object that has static
storage duration or in an initializer list for an object that has
aggregate or union type shall be constant expressions.

(or something similar; I'm quoting from a draft.)

pete · Mar 23, 2007

Ben said:
But in C89, the corresponding paragraph said:

All the expressions in an initializer
for an object that has static
storage duration or in an initializer list for an object that has
aggregate or union type shall be constant expressions.

(or something similar; I'm quoting from a draft.)

The same words are in ISO/IEC 9899: 1990.

Old Wolf · Mar 23, 2007

Lauri Alanko said:
The following code crashes on Solaris 10 when compiled without
optimization:

typedef struct Node Node;

struct Node {
int val;
Node *next;

};

int main(void)
{
Node b = { 2, 0 };
Node a = { 1, &b };

a = *(a.next);

return a.val;

}

What happens is that after a has been written to, and a.next has been
set to null, a.next is dereferenced again (for some obscure reason).

How do you know?

Whose fault is this, the programmer's or the compiler's? Initially I
thought that it's the programmer's fault since there are no sequence
points between the access to a.next and the writing to a.

There's an ugly clause in the standard that defines what's legal.
Informally, it is legal to both read from and write to a variable
without a sequence point, if and only if you must perform the read
in order to compute the value to be written -- i.e. there is a
temporal relationship.

In this case, it is OK because you cannot dereference a.next without
first evaluating a.next, which in turn requires that you have already
evaluated 'a'.
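
The same reasoning covers the "p = p->next" idiom from the question. A minimal sketch, with sum_list as a hypothetical helper name: the old value of p has to be read before the new value can even be computed, so the read and the write cannot be unordered.

```c
#include <stddef.h>

typedef struct Node Node;

struct Node {
    int val;
    Node *next;
};

/* Hypothetical helper: sums the val fields of a null-terminated list. */
int sum_list(const Node *p)
{
    int total = 0;

    while (p != NULL) {
        total += p->val;
        p = p->next;    /* old p is read only to compute the new p */
    }
    return total;
}
```

For a three-node list holding 1, 2 and 3, this should return 6, and 0 for an empty list.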

Mark F. Haigh · Mar 23, 2007

Lauri Alanko said:
The following code crashes on Solaris 10 when compiled without
optimization:

typedef struct Node Node;

struct Node {
int val;
Node *next;

};

int main(void)
{
Node b = { 2, 0 };
Node a = { 1, &b };

a = *(a.next);

return a.val;

}

What happens is that after a has been written to, and a.next has been
set to null, a.next is dereferenced again (for some obscure reason).

Whose fault is this, the programmer's or the compiler's? [...]

<snip>

If you're using C89, it's the programmer's fault. Quoting from my
BS/EN 29899:1993 A4 hardcopy:

6.5.7 Initialization

Constraints [...]

All the expressions in an initializer for an object that has static
storage duration or in an initializer list for an object that has
aggregate or union type shall be constant expressions.

Obviously the code "Node a = { 1, &b };" does not satisfy this
constraint. In fact, my compiler complains about it when invoked in
strict C89 mode:

[mark@icepick ~]$ gcc -Wall -O2 foo.c -o foo -ansi -pedantic -std=c89
foo.c: In function 'main':
foo.c:14: warning: initializer element is not computable at load time

With the advent of GNU C and C99, the rules have changed. Quoting
9899:1999 TC2 draft N1124:

6.7.8 Initialization

Constraints [...]

4. All the expressions in an initializer for an object that has
static storage duration shall be constant expressions or string
literals.

Notice that the constraints have been loosened for aggregate and union
types.

Although I certainly cannot speak for the WG, it appears that one of
the reasons this feature has been adopted in C99 is the prevalent use
of the following GNU C extension (which pre-dates C99):

[GCC 2.95.3 Manual, Extensions to the C Language Family]

4.18 Non-Constant Initializers

As in standard C++, the elements of an aggregate initializer for an
automatic variable are not required to be constant expressions in GNU
C.
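
For anyone who wants the initializer form to satisfy the C89 wording as well, one possible sketch is to give only b static storage duration. As far as I can tell, &b is then an address constant, so every expression in a's initializer list is a constant expression. The function name val_after_step is hypothetical:

```c
typedef struct Node Node;

struct Node {
    int val;
    Node *next;
};

/* Hypothetical helper wrapping the test case from the original post. */
int val_after_step(void)
{
    static Node b = { 2, 0 };   /* static storage: &b is an address constant */
    Node a = { 1, &b };         /* only constant expressions in the list */

    a = *(a.next);              /* copies b into a; b itself is not modified */

    return a.val;
}
```

This only changes how the objects are initialized. The assignment under discussion is untouched, so it says nothing about whether a given compiler handles that assignment correctly.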

Initially I
thought that it's the programmer's fault since there are no sequence
points between the access to a.next and the writing to a, but if that
made the code illegal, how about the following ubiquitous idiom:

Node* p = &a;
p = p->next;

Here, too, we both access p and write to p without sequence points in
between. What's the difference, or is there any?

The difference is that the undefined behavior is invoked in the
initialization of the type. This undefined behavior is apparently
causing the assignment to crash, which is one of the things that
undefined behavior often does.

In contrast, the pointer object Node* p is not an aggregate type, and
not subject to the same initialization rules that an aggregate or
union type is.

Mark F. Haigh

christian.bau · Mar 24, 2007

Lauri Alanko said:
The following code crashes on Solaris 10 when compiled without
optimization:

typedef struct Node Node;

struct Node {
int val;
Node *next;

};

int main(void)
{
Node b = { 2, 0 };
Node a = { 1, &b };

a = *(a.next);

return a.val;

}

What happens is that after a has been written to, and a.next has been
set to null, a.next is dereferenced again (for some obscure reason).

Whose fault is this, the programmer's or the compiler's? Initially I
thought that it's the programmer's fault since there are no sequence
points between the access to a.next and the writing to a, but if that
made the code illegal, how about the following ubiquitous idiom:

Node* p = &a;
p = p->next;

Here, too, we both access p and write to p without sequence points in
between. What's the difference, or is there any?

Thanks in advance.

Lauri

Since there are complaints about the initialisers (why the hell would
a compiler accept an initialisation if it invokes undefined
behavior?), could you tell us what happens if you write

int main(void)
{
    Node b, a;
    b.val = 2; b.next = 0;
    a.val = 1; a.next = &b;

    a = *(a.next);

    return a.val;
}

Flash Gordon · Mar 24, 2007

christian.bau wrote, On 24/03/07 00:39:

Since there are complaints about the initialisers (why the hell would
a compiler accept an initialisation if it invokes undefined
behavior?), could you tell us what happens if you write

<snip>

Look up "undefined" in a dictionary or the C standard. It means the
behavior is not defined, and part of not being defined is that no
diagnostic is required.

Mark F. Haigh · Mar 24, 2007

Flash Gordon said:
christian.bau wrote, On 24/03/07 00:39:

<snip>

Look up "undefined" in a dictionary or the C standard. It means the
behavior is not defined, and part of not being defined is that no
diagnostic is required.

The C Standard Rationale has some interesting things to say:

3 Terms and Definitions

The terms unspecified behavior, undefined behavior, and
implementation-defined behavior are used to categorize the result of
writing programs whose properties the Standard does not, or cannot,
completely describe. The goal of adopting this categorization is to
allow a certain variety among implementations which permits quality of
implementation to be an active force in the marketplace as well as to
allow certain popular extensions, without removing the cachet of
conformance to the Standard.
[...]

Ah, yes. "Quality of implementation". Good-quality implementations
warn the user and try to do something reasonable. Poor-quality
implementations silently produce broken code.

I'd wager that Christian understands the definition of 'undefined'.
His point is that an implementation that cannot warn the user over
such a simple and minor transgression is a bit too DeathStation-ish on
the QoI scale to be allowed to roam free in the wild.

Mark F. Haigh

James Dow Allen · Mar 24, 2007

Lauri Alanko said:
The following code crashes on Solaris 10 when compiled without
optimization:
... int main(void)
{
Node b = { 2, 0 };
Node a = { 1, &b };
a = *(a.next);
return a.val;
}

Lawyerly types are debating what the compiler *may*
or *must* do, but I'm very curious about what it *did*
do. Please let us see the compiler output
(e.g., output of ``cc -S'').

IIRC, Sun's compiler for Sparc would sometimes
(because of pipelining and to save space
in branches) allow an unwilled statement to execute,
but only if it were harmless, and (I thought) only
with optimization. Anyway that shouldn't arise in
your unbranching non-inlined function.

James

Flash Gordon · Mar 24, 2007

Mark F. Haigh wrote, On 24/03/07 07:24:

christian.bau wrote, On 24/03/07 00:39:

<snip>

Look up "undefined" in a dictionary or the C standard. It means the
behavior is not defined, and part of not being defined is that no
diagnostic is required.

The C Standard Rationale has some interesting things to say:

3 Terms and Definitions

The terms unspecified behavior, undefined behavior, and
implementation-defined behavior are used to categorize the result of
writing programs whose properties the Standard does not, or cannot,
completely describe. The goal of adopting this categorization is to
allow a certain variety among implementations which permits quality of
implementation to be an active force in the marketplace as well as to
allow certain popular extensions, without removing the cachet of
conformance to the Standard.
[...]

Ah, yes. "Quality of implementation". Good-quality implementations
warn the user and try to do something reasonable. Poor-quality
implementations silently produce broken code.

I'd wager that Christian understands the definition of 'undefined'.
His point is that an implementation that cannot warn the user over
such a simple and minor transgression is a bit too DeathStation-ish on
the QoI scale to be allowed to roam free in the wild.

In this particular case it could be that it does not warn because it
allows it as an extension which is allowed by what you quote above. So
there might be a very good reason for not producing a warning in default
mode.

Lauri Alanko · Mar 26, 2007

Thanks to Dave and Wolf for informative answers: 6.5#2 indeed seems to
justify both "p = p->next" and "a = *(a.next)", so I can conclude that
this is a compiler bug.

To those interested in the details:

typedef struct Node Node;

struct Node {
    int val;
    Node *next;
};

int main(void)
{
    Node a, b;
    b.val = 2;
    b.next = 0;
    a.val = 1;
    a.next = &b;

    a = *(a.next);

    return a.val;
}

$ uname -a
SunOS xxxxxxxx 5.10 Generic sun4u sparc SUNW,Sun-Fire-V210 Solaris
$ /opt/SUNWspro/bin/cc -g -V -S t.c -o t.s
cc: Sun C 5.8 2005/10/13
acomp: Sun C 5.8 2005/10/13
$ /opt/SUNWspro/bin/cc -g -V -o t t.s
cc: Sun C 5.8 2005/10/13
ld: Software Generation Utilities - Solaris Link Editors: 5.10-1.479
$ ./t
Segmentation Fault

Here's the relevant part from t.s:

! 14 a.next = &b;

add %fp,-20,%l0
st %l0,[%fp-8]

! block 5
..L21:

! 16 a = *(a.next);

ld [%fp-8],%l2
add %fp,-12,%l0
..L_y0:
ld [%l2+0],%l1
st %l1,[%l0+0]
..L_y1:
ld [%l2+4],%l1
st %l1,[%l0+4]
ld [%fp-8],%l0
or %g0,4,%g1
1:
subcc %g1,4,%g1
..L_y2:
ld [%l0+%g1],%l2
bg 1b+4
subcc %g1,4,%g1

The segfault happens in the last ld instruction, since %l0 is zero.
("How do I know?" I use dbx, doh.) The last six instructions don't seem
to make any sense in any case. It's as if there were a dummy *(a.next)
dereference after the assignment was completed. This happens both with
and without -g, but not with -O.

Finally, to the numerous would-be language lawyers who responded: please
try to get your act together. Comp.lang.c must be in a sorry state
nowadays, if you can't find better remarks than "All right, maybe it's
legal _now_, but it's only been legal for seven years. If you'd tried
pulling that trick before then, you'd be in _real_ trouble now!" Somehow
that seems to lack the desired punch...

For what it's worth, Sun cc's man page explicitly says that C99 language
features are supported by default.

Lauri

CBFalconer · Mar 27, 2007

Lauri said:
Thanks to Dave and Wolf for informative answers: 6.5#2 indeed
seems to justify both "p = p->next" and "a = *(a.next)", so I can
conclude that this is a compiler bug.

No you can't.

.... snip ...

typedef struct Node Node;

struct Node {
int val;
Node *next;
};

int main(void)
{
Node a, b;
b.val = 2;
b.next = 0;
a.val = 1;
a.next = &b;

a = *(a.next);

return a.val;
}

If you follow the action, you will find you are dereferencing a
NULL pointer. Boom.

Dave Vandervies · Mar 27, 2007

CBFalconer said:
No you can't.

... snip ...

If you follow the action, you will find you are dereferencing a
NULL pointer. Boom.

Where?

dave

CBFalconer · Mar 27, 2007

Dave said:
Where?

Now I don't see it myself. 6 sets a = b, so a.next is NULL. Yet
a.val is 2. Now it looks like a bug to me.
