Aliasing in assignment

Lauri Alanko

The following code crashes on Solaris 10 when compiled without
optimization:

typedef struct Node Node;

struct Node {
int val;
Node *next;
};

int main(void)
{
Node b = { 2, 0 };
Node a = { 1, &b };

a = *(a.next);

return a.val;
}

What happens is that after a has been written to, and a.next has been
set to null, a.next is dereferenced again (for some obscure reason).

Whose fault is this, the programmer's or the compiler's? Initially I
thought that it's the programmer's fault since there are no sequence
points between the access to a.next and the writing to a, but if that
made the code illegal, how about the following ubiquitous idiom:

Node* p = &a;
p = p->next;

Here, too, we both access p and write to p without sequence points in
between. What's the difference, or is there any?

Thanks in advance.


Lauri
 
Dave Vandervies

The following code crashes on Solaris 10 when compiled without
optimization:

[snip bits that make it compileable]
Node b = { 2, 0 };
Node a = { 1, &b };

a = *(a.next);

What happens is that after a has been written to, and a.next has been
set to null, a.next is dereferenced again (for some obscure reason).

Whose fault is this, the programmer's or the compiler's? Initially I
thought that it's the programmer's fault since there are no sequence
points between the access to a.next and the writing to a,

As far as I can tell, the old value is accessed only to determine the
new value to be stored (access pointer in a.next to determine where to
find value to store, dereference pointer to get value to store), which
means it's perfectly acceptable according to 6.5#2 of n869, which is
the paragraph that would make it undefined if that were the problem.

So unless I'm missing something, this looks like a compiler bug.
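A minimal sketch of the distinction 6.5#2 draws, written as two helper functions. The names `step_in_place` and `step_via_temp` and the `assert` preconditions are illustrative additions, not part of the original program.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

struct Node {
    int val;
    Node *next;
};

/* Overwrites *a with the node it points to.  The old value of *a is
   read only to find the value being stored, which is the case 6.5#2
   allows.  (Contrast with "i = i++", which modifies i twice between
   sequence points and is undefined.) */
static void step_in_place(Node *a)
{
    assert(a != NULL && a->next != NULL);  /* precondition, my addition */
    *a = *(a->next);
}

/* Same effect with an explicit temporary, so the read and the write
   sit in separate full expressions. */
static void step_via_temp(Node *a)
{
    Node tmp;

    assert(a != NULL && a->next != NULL);  /* precondition, my addition */
    tmp = *(a->next);
    *a = tmp;
}
```

Both functions should leave the node in the same state; if a compiler treats them differently, that points at the compiler rather than at the source.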


dave
 
Peter Nilsson

Lauri Alanko said:
The following code crashes on Solaris 10 when compiled without
optimization:

typedef struct Node Node;

struct Node {
int val;
Node *next;
};

int main(void)
{
Node b = { 2, 0 };
Node a = { 1, &b };

This violates constraint 6.7.8p4 which states that an
initialiser has to be a constant.

Try...

static Node b = { 2, 0 };
static Node a = { 1, &b };
a = *(a.next);
return a.val;

Note that neither 1 nor 2 is a portable value for main to
return to the host. Use 0, EXIT_SUCCESS or EXIT_FAILURE;
the latter two are from <stdlib.h>.
}
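A small sketch of the portable-status advice, assuming a helper called `run_example` (a made-up name) that performs the assignment from the original post and reports the outcome through the `<stdlib.h>` macros rather than through `a.val`.

```c
#include <stdlib.h>

typedef struct Node Node;

struct Node {
    int val;
    Node *next;
};

/* Performs "a = *(a.next)" and maps the result onto the two status
   values that are portable as a return value from main. */
static int run_example(void)
{
    Node a, b;

    b.val = 2;
    b.next = NULL;
    a.val = 1;
    a.next = &b;

    a = *(a.next);

    return (a.val == 2 && a.next == NULL) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A `main` that ends with `return run_example();` then hands only portable values back to the host.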

What happens is that after a has been written to, and a.next
has been set to null, a.next is dereferenced again (for some
obscure reason).

Whose fault is this, the programmer's or the compiler's?

One fault is the programmer's. If fixing that doesn't fix the
problem, then it appears to be the compiler's.
 
Eric Sosman

Peter said:
This violates constraint 6.7.8p4 which states that an
initialiser has to be a constant.

No; the cited paragraph says

All the expressions in an initializer for an object that
^^^^^^^^^^^^^^^^^^
has static storage duration shall be constant expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^^
or string literals.

(Emphasis mine.)

One fault is the programmer's. If fixing that doesn't fix the
problem, then it appears to be the compiler's.

I think (mind you, I say I "think") it's the compiler's fault.
 
Ben Pfaff

Eric Sosman said:
No; the cited paragraph says

All the expressions in an initializer for an object that
^^^^^^^^^^^^^^^^^^
has static storage duration shall be constant expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^^
or string literals.

But in C89, the corresponding paragraph said:

All the expressions in an initializer for an object that has static
storage duration or in an initializer list for an object that has
aggregate or union type shall be constant expressions.

(or something similar; I'm quoting from a draft.)
 
pete

Ben said:
But in C89, the corresponding paragraph said:

All the expressions in an initializer
for an object that has static
storage duration or in an initializer list for an object that has
aggregate or union type shall be constant expressions.

(or something similar; I'm quoting from a draft.)

The same words are in ISO/IEC 9899: 1990.
 
Old Wolf

The following code crashes on Solaris 10 when compiled without
optimization:

typedef struct Node Node;

struct Node {
int val;
Node *next;

};

int main(void)
{
Node b = { 2, 0 };
Node a = { 1, &b };

a = *(a.next);

return a.val;

}

What happens is that after a has been written to, and a.next has been
set to null, a.next is dereferenced again (for some obscure reason).

How do you know?

Whose fault is this, the programmer's or the compiler's? Initially I
thought that it's the programmer's fault since there are no sequence
points between the access to a.next and the writing to a.

There's an ugly clause in the standard that defines what's legal.
Informally, it is legal to both read from and write to a variable
without a sequence point, if and only if you must perform the read
in order to compute the value to be written -- ie. there is a
temporal relationship.

In this case, it is OK because you cannot dereference a.next without
first evaluating a.next, which in turn requires that you have already
evaluated 'a'.
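The same reasoning covers the `p = p->next` idiom from the original question. A minimal sketch, with `list_length` as an illustrative name that does not appear in the thread:

```c
#include <stddef.h>

typedef struct Node Node;

struct Node {
    int val;
    Node *next;
};

/* Counts the nodes reachable from head.  In "p = p->next" the old
   value of p is read only to compute the new value stored into p,
   so the read necessarily comes before the write. */
static size_t list_length(const Node *head)
{
    size_t n = 0;
    const Node *p = head;

    while (p != NULL) {
        n++;
        p = p->next;
    }
    return n;
}
```

For the two-node list in the original post (a pointing to b), this walks a, then b, then stops at the null pointer.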
 
Mark F. Haigh

The following code crashes on Solaris 10 when compiled without
optimization:

typedef struct Node Node;

struct Node {
int val;
Node *next;

};

int main(void)
{
Node b = { 2, 0 };
Node a = { 1, &b };

a = *(a.next);

return a.val;

}

What happens is that after a has been written to, and a.next has been
set to null, a.next is dereferenced again (for some obscure reason).

Whose fault is this, the programmer's or the compiler's? [...]
<snip>

If you're using C89, it's the programmer's fault. Quoting from my
BS/EN 29899:1993 A4 hardcopy:

6.5.7 Initialization

Constraints [...]

All the expressions in an initializer for an object that has static
storage duration or in an initializer list for an object that has
aggregate or union type shall be constant expressions.


Obviously the code "Node a = { 1, &b };" does not satisfy this
constraint. In fact, my compiler complains about it when invoked in
strict C89 mode:

[mark@icepick ~]$ gcc -Wall -O2 foo.c -o foo -ansi -pedantic -std=c89
foo.c: In function 'main':
foo.c:14: warning: initializer element is not computable at load time


With the advent of GNU C and C99, the rules have changed. Quoting
9899:1999 TC2 draft N1124:

6.7.8 Initialization

Constraints [...]

4. All the expressions in an initializer for an object that has
static storage duration shall be constant expressions or string
literals.


Notice that the constraints have been loosened for aggregate and union
types.
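To make the difference concrete, here is a sketch of the two styles side by side. The function names `c99_style` and `c89_style` are made up for illustration; the first relies on the C99 wording quoted above, the second stays within the C89 constraint by assigning members after the declaration.

```c
typedef struct Node Node;

struct Node {
    int val;
    Node *next;
};

/* C99 and later: b is automatic, so &b is not a constant expression,
   but the constraint only applies to objects with static storage
   duration, and a is automatic. */
static int c99_style(void)
{
    Node b = { 2, 0 };
    Node a = { 1, &b };

    return a.next->val;
}

/* C89-conforming alternative: no initializer list at all, just
   member-wise assignment. */
static int c89_style(void)
{
    Node a, b;

    b.val = 2;
    b.next = 0;
    a.val = 1;
    a.next = &b;

    return a.next->val;
}
```

Both functions build the same two-node list, so any difference in behaviour after that point cannot be blamed on the initializers.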

Although I certainly cannot speak for the WG, it appears that one of
the reasons this feature has been adopted in C99 is the prevalent use
of the following GNU C extension (which pre-dates C99):

[GCC 2.95.3 Manual, Extensions to the C Language Family]

4.18 Non-Constant Initializers

As in standard C++, the elements of an aggregate initializer for an
automatic variable are not required to be constant expressions in GNU
C.

Initially I
thought that it's the programmer's fault since there are no sequence
points between the access to a.next and the writing to a, but if that
made the code illegal, how about the following ubiquitous idiom:

Node* p = &a;
p = p->next;

Here, too, we both access p and write to p without sequence points in
between. What's the difference, or is there any?

The difference is that the undefined behavior is invoked in the
initialization of the type. This undefined behavior is apparently
causing the assignment to crash, which is one of the things that
undefined behavior often does.

In contrast, the pointer object Node* p is not an aggregate type, and
not subject to the same initialization rules that an aggregate or
union type is.

Mark F. Haigh
(e-mail address removed)
 
christian.bau

The following code crashes on Solaris 10 when compiled without
optimization:

typedef struct Node Node;

struct Node {
int val;
Node *next;

};

int main(void)
{
Node b = { 2, 0 };
Node a = { 1, &b };

a = *(a.next);

return a.val;

}

What happens is that after a has been written to, and a.next has been
set to null, a.next is dereferenced again (for some obscure reason).

Whose fault is this, the programmer's or the compiler's? Initially I
thought that it's the programmer's fault since there are no sequence
points between the access to a.next and the writing to a, but if that
made the code illegal, how about the following ubiquitous idiom:

Node* p = &a;
p = p->next;

Here, too, we both access p and write to p without sequence points in
between. What's the difference, or is there any?

Thanks in advance.

Lauri

Since there are complaints about the initialisers (why the hell would
a compiler accept an initialisation if it invokes undefined
behavior?), could you tell us what happens if you write

int main(void)
{
Node b, a;
b.val = 2; b.next = 0;
a.val = 1; a.next = &b;

a = *(a.next);

return a.val;

}
 
Flash Gordon

christian.bau wrote, On 24/03/07 00:39:

Since there are complaints about the initialisers (why the hell would
a compiler accept an initialisation if it invokes undefined
behavior?), could you tell us what happens if you write

<snip>

Look up undefined in a dictionary or the C standard. It means it is not
defined; part of not being defined is that it does not define that a
diagnostic should be produced.
 
Mark F. Haigh

christian.bau wrote, On 24/03/07 00:39:



<snip>

Look up undefined in a dictionary or the C standard. It means it is not
defined; part of not being defined is that it does not define that a
diagnostic should be produced.
--

The C Standard Rationale has some interesting things to say:

3 Terms and Definitions

25 The terms unspecified behavior, undefined behavior, and
implementation-defined behavior are used to categorize the result of
writing programs whose properties the Standard does not, or cannot,
completely describe. The goal of adopting this categorization is to
allow a certain variety among implementations which permits quality of
implementation to be an active force in the marketplace as well as to
allow certain popular extensions, without removing the cachet of
conformance to the Standard.
[...]

Ah, yes. "Quality of implementation". Good-quality implementations
warn the user and try to do something reasonable. Poor-quality
implementations silently produce broken code.

I'd wager that Christian understands the definition of 'undefined'.
His point is that an implementation that cannot warn the user over
such a simple and minor transgression is a bit too DeathStation-ish on
the QoI scale to be allowed to roam free in the wild.


Mark F. Haigh
(e-mail address removed)
 
James Dow Allen

The following code crashes on Solaris 10 when compiled without
optimization:
... int main(void)
{
Node b = { 2, 0 };
Node a = { 1, &b };
a = *(a.next);
return a.val;
}

Lawyerly types are debating what the compiler *may*
or *must* do, but I'm very curious about what it *did*
do. Please let us see the compiler output
(eg, output of ``cc -S'').

IIRC, Sun's compiler for Sparc would sometimes
(because of pipelining and to save space
in branches) allow an unwilled statement to execute,
but only if it were harmless, and (I thought) only
with optimization. Anyway that shouldn't arise in
your unbranching non-inlined function.

James
 
Flash Gordon

Mark F. Haigh wrote, On 24/03/07 07:24:
christian.bau wrote, On 24/03/07 00:39:


<snip>

Look up undefined in a dictionary or the C standard. It means it is not
defined; part of not being defined is that it does not define that a
diagnostic should be produced.
--

The C Standard Rationale has some interesting things to say:

3 Terms and Definitions

25 The terms unspecified behavior, undefined behavior, and
implementation-defined behavior are used to categorize the result of
writing programs whose properties the Standard does not, or cannot,
completely describe. The goal of adopting this categorization is to
allow a certain variety among implementations which permits quality of
implementation to be an active force in the marketplace as well as to
allow certain popular extensions, without removing the cachet of
conformance to the Standard.
[...]

Ah, yes. "Quality of implementation". Good-quality implementations
warn the user and try to do something reasonable. Poor-quality
implementations silently produce broken code.

I'd wager that Christian understands the definition of 'undefined'.
His point is that an implementation that cannot warn the user over
such a simple and minor transgression is a bit too DeathStation-ish on
the QoI scale to be allowed to roam free in the wild.

In this particular case it could be that it does not warn because it
allows it as an extension which is allowed by what you quote above. So
there might be a very good reason for not producing a warning in default
mode.
 
Lauri Alanko

Thanks to Dave and Wolf for informative answers: 6.5#2 indeed seems to
justify both "p = p->next" and "a = *(a.next)", so I can conclude that
this is a compiler bug.

To those interested in the details:

typedef struct Node Node;

struct Node {
int val;
Node *next;
};

int main(void)
{
Node a, b;
b.val = 2;
b.next = 0;
a.val = 1;
a.next = &b;

a = *(a.next);

return a.val;
}

$ uname -a
SunOS xxxxxxxx 5.10 Generic sun4u sparc SUNW,Sun-Fire-V210 Solaris
$ /opt/SUNWspro/bin/cc -g -V -S t.c -o t.s
cc: Sun C 5.8 2005/10/13
acomp: Sun C 5.8 2005/10/13
$ /opt/SUNWspro/bin/cc -g -V -o t t.s
cc: Sun C 5.8 2005/10/13
ld: Software Generation Utilities - Solaris Link Editors: 5.10-1.479
$ ./t
Segmentation Fault

Here's the relevant part from t.s:

! 14 a.next = &b;

add %fp,-20,%l0
st %l0,[%fp-8]

! block 5
..L21:

! 16 a = *(a.next);

ld [%fp-8],%l2
add %fp,-12,%l0
..L_y0:
ld [%l2+0],%l1
st %l1,[%l0+0]
..L_y1:
ld [%l2+4],%l1
st %l1,[%l0+4]
ld [%fp-8],%l0
or %g0,4,%g1
1:
subcc %g1,4,%g1
..L_y2:
ld [%l0+%g1],%l2
bg 1b+4
subcc %g1,4,%g1

The segfault happens in the last ld instruction, since %l0 is zero.
("How do I know?" I use dbx, doh.) The last six instructions don't seem
to make any sense in any case. It's as if there were a dummy *(a.next)
dereference after the assignment was completed. This happens both with
and without -g, but not with -O.
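For readers who don't read SPARC assembly, here is a rough C model of one reading of the listing above. It is a sketch, not compiler output: `models_fault` is a made-up name, the `assert` is an added precondition, and the final load is only tested for, never performed, so the model itself cannot crash.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

struct Node {
    int val;
    Node *next;
};

/* Models the instruction sequence emitted for "a = *(a.next)".
   Returns 1 if the trailing load would go through a null pointer. */
static int models_fault(Node *a)
{
    Node *src;
    Node *reread;

    assert(a != NULL && a->next != NULL);  /* precondition, my addition */

    src = a->next;          /* ld [%fp-8],%l2 */
    a->val = src->val;      /* ld [%l2+0],%l1 ; st %l1,[%l0+0] */
    a->next = src->next;    /* ld [%l2+4],%l1 ; st %l1,[%l0+4] */

    reread = a->next;       /* ld [%fp-8],%l0 : a.next read again, after the store */
    return reread == NULL;  /* ld [%l0+%g1],%l2 faults when this holds */
}
```

With the two-node list from the post the function returns 1, which matches the reported segfault. Note that the struct copy itself has already completed correctly by the time the extra load is reached.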

Finally, to the numerous would-be language lawyers who responded: please
try to get your act together. Comp.lang.c must be in a sorry state
nowadays, if you can't find better remarks than "All right, maybe it's
legal _now_, but it's only been legal for seven years. If you'd tried
pulling that trick before then, you'd be in _real_ trouble now!" Somehow
that seems to lack the desired punch...

For what it's worth, Sun cc's man page explicitly says that C99 language
features are supported by default.


Lauri
 
CBFalconer

Lauri said:
Thanks to Dave and Wolf for informative answers: 6.5#2 indeed
seems to justify both "p = p->next" and "a = *(a.next)", so I can
conclude that this is a compiler bug.

No you can't.

.... snip ...
typedef struct Node Node;

struct Node {
int val;
Node *next;
};

int main(void)
{
Node a, b;
b.val = 2;
b.next = 0;
a.val = 1;
a.next = &b;

a = *(a.next);

return a.val;
}

If you follow the action, you will find you are dereferencing a
NULL pointer. Boom.
 
