compiler optimizing wrong?

Dirk Zabel · Jun 10, 2011

Hi,
I get unexpected behaviour from some code which was part of a
manufacturer-supplied library for hardware-related functions
(programming internal flash memory). I would like to get your opinion if
compiler or the code is the culprit.

The compiler is gcc-arm 4.5.1. When no optimization is selected, the
generated code looks as expected (surprise!). If size optimization is
selected with argument "-Os", the getstatus() call inside the loop is
removed.
I stripped down the code as far as possible, so it is more or less
abstract now.

--------------- snip --------------------------------
typedef enum {
st_busy, st_complete, st_timeout
} status_t;

typedef struct
{
volatile uint32_t CR;
volatile uint32_t SR;
} hw_reg_t;

extern hw_reg_t statusreg;
/*
* Get status (busy or complete)
* from inspection of hardware status register
*/
status_t getstatus(void)
{
status_t rval;

if ((statusreg.SR & 1) == 1) {
rval = st_busy;
} else {
rval = st_complete;
}
return rval;
}

/*
* wait until hardware operation is complete or
* timeout occurred
*/
status_t wait_for_operation(unsigned long Timeout)
{
status_t status;

status = getstatus();
while((status == st_busy) && (Timeout != 0)) {
status = getstatus(); /* <- optimized out! */
Timeout--;
}
if(Timeout == 0 && status == st_busy ) {
status = st_timeout;
}
return status;
}
--------------- snip --------------------------------

If I remove the volatile qualifiers from the struct members and instead
put volatile in front of the variable declaration, the problem vanishes:

--------------- snip --------------------------------
typedef struct
{
uint32_t CR;
uint32_t SR;
} hw_reg_t;

extern volatile hw_reg_t statusreg;
--------------- snip --------------------------------

Btw, it also does not occur if I access the status register inline in
wait_for_operation() instead of calling getstatus().

Is this a compiler error or are the volatile declarations inside the
typedef unsufficient?

Regards
Dirk

Noob · Jun 10, 2011

Dirk said:
I get unexpected behaviour from some code which was part of a
manufacturer-supplied library for hardware-related functions
(programming internal flash memory). I would like to get your opinion if
compiler or the code is the culprit.

The compiler is gcc-arm 4.5.1. When no optimization is selected, the
generated code looks as expected (surprise!). If size optimization is
selected with argument "-Os", the getstatus() call inside the loop is
removed.
I stripped down the code as far as possible, so it is more or less
abstract now.

On a related note, you might be interested in the following
article by John Regehr.

Volatile Bugs, Three Years Later
http://blog.regehr.org/archives/503

Regards.

Leo Havmøller · Jun 10, 2011

extern volatile hw_reg_t statusreg;

Leo Havmøller.

Thad Smith · Jun 11, 2011

Hi,
I get unexpected behaviour from some code which was part of a
manufacturer-supplied library for hardware-related functions (programming
internal flash memory). I would like to get your opinion if compiler or the code
is the culprit. ....
status = getstatus();
while((status == st_busy) && (Timeout != 0)) {
status = getstatus(); /* <- optimized out! */
Timeout--;

This is a compiler bug. Since getstatus() accesses a volatile object, it must
be done each time through the loop.

Dirk Zabel · Jun 14, 2011

Am 10.06.2011 14:08, schrieb Dirk Zabel:
Thanks for the answers.

jstrap42 · Jun 14, 2011

On a related note, you might be interested in the following
article by John Regehr.

Volatile Bugs, Three Years Laterhttp://blog.regehr.org/archives/503

Regards.

The example given in section 2.1 of that paper is this (I've had to
retype it so there could be mistakes):

volatile int buffer_ready;
char buffer[BUF_SIZE];

void buffer_init() {
int i;
for(i=0; i<BUF_SIZE; i++)
buffer=0;
buffer_ready=1;
}

Can anyone point out exactly where the sequence points are in that
code snippet?

Chris H · Jun 14, 2011

On a related note, you might be interested in the following
article by John Regehr.

Volatile Bugs, Three Years Later
http://blog.regehr.org/archives/503

The article is misleading as it talks about "embedded compilers" by
which they mean GCC. 11 of the 14 were GCC and two of the other three
were hardly mainstream embedded.

On the other hand I know a lot of embedded compilers they did not look
at that work correctly. (I know about two of them myself we tested for
safety critical use)

Noob · Jun 14, 2011

Chris said:
The article is misleading as it talks about "embedded compilers" by
which they mean GCC.

You're confusing the article and the paper.
I counted 0 occurence of the string "embedded compiler"
in Regehr's article.

11 of the 14 were GCC and two of the other three
were hardly mainstream embedded.

IIUC, you dispute the conclusion of their 2008 paper, because
their study didn't include several widely-used compilers?

Can you name these compilers? Are they available at no cost
for academic research?

On the other hand I know a lot of embedded compilers they did not look
at that work correctly. (I know about two of them myself we tested for
safety critical use)

What point are you're trying to make.

Shao Miller · Jun 14, 2011

On a related note, you might be interested in the following
article by John Regehr.

Volatile Bugs, Three Years Laterhttp://blog.regehr.org/archives/503

Regards.

Click to expand...

The example given in section 2.1 of that paper is this (I've had to
retype it so there could be mistakes):

volatile int buffer_ready;
char buffer[BUF_SIZE];

void buffer_init() {
int i;
for(i=0; i<BUF_SIZE; i++)
buffer=0;
buffer_ready=1;
}

Can anyone point out exactly where the sequence points are in that
code snippet?

I think at each semi-colon, except for the first three. Also in the
evaluation of the size specification in the declaration of 'buffer'.
Also after 'i++'.

Chris H · Jun 14, 2011

You're confusing the article and the paper.
I counted 0 occurence of the string "embedded compiler"
in Regehr's article.

The introduction says
//////////////////////////
We tested thirteen production-quality C compilers
and, for each, found situations in which the compiler generated
incorrect code for accessing volatile variables. This result is
disturbing because it implies that embedded software and operating
systems both typically coded in C, both being bases for many
mission-critical and safety-critical applications, and both relying
on the correct translation of volatilesâ€”may be being miscompiled.
///////////////

and the targets were IA32, Coldfire, MSP430 and AVR

Though you are not likely to be using GCC for safety critical or mission
critical code. Certainly not on the higher SIL levels.

But I do continually hear, due to this report, that "embedded compilers
don't handle volatile correctly" normally when referring to compilers
that were not referenced in the report and do handle volatile correctly.

IIUC, you dispute the conclusion of their 2008 paper, because
their study didn't include several widely-used compilers?

Well.... To be blunt it only really tested one. Variants of GCC.

Can you name these compilers?

No I was/am under NDA when we did the testing but it was for safety
critical use.

Are they available at no cost
for academic research?

So it is research on compilers they "can get for free?" Though I do
take your point that it is difficult to test compilers when you have to
buy them as they are not cheap.

On the other hand as most of the commercial compilers are heavily and
rigorously tested internally what is the advantage in letting a bunch of
academics have them?

In which case the report should have been entitled "GCC Volatiles Are
Miscompiled, and What to Do about It"

Tim Rentsch · Jun 15, 2011

Please excuse my following up in the wrong place in the thread
and with no attributions. Newsreader glitch.

The example given in section 2.1 of that paper is this (I've had to
retype it so there could be mistakes):

volatile int buffer_ready;
char buffer[BUF_SIZE];

void buffer_init() {
int i;
for(i=0; i<BUF_SIZE; i++)
buffer=0;
buffer_ready=1;
}

Can anyone point out exactly where the sequence points are in that
code snippet?

After each full declarator:

buffer_ready in 'volatile int buffer_ready;'
buffer[BUF_SIZE] in 'char buffer[BUF_SIZE];'
i in 'int i;'

After each full expression:

i=0
i<BUF_SIZE
i++
buffer=0
buffer_ready=1

(Of course the middle three of these potentially occur multiple
times because of the 'for' loop.)

It doesn't make much sense to talk about sequence points
in (or after) declarators at file scope, but these points
are what's specified in the Standard.

jstrap42 · Jun 15, 2011

Please excuse my following up in the wrong place in the thread
and with no attributions. Newsreader glitch.

The example given in section 2.1 of that paper is this (I've had to
retype it so there could be mistakes):

Click to expand...

volatile int buffer_ready;
char buffer[BUF_SIZE];

Click to expand...

void buffer_init() {
int i;
for(i=0; i<BUF_SIZE; i++)
buffer=0;
buffer_ready=1;
}

Click to expand...

Can anyone point out exactly where the sequence points are in that
code snippet?

Click to expand...

After each full declarator:

buffer_ready in 'volatile int buffer_ready;'
buffer[BUF_SIZE] in 'char buffer[BUF_SIZE];'
i in 'int i;'

After each full expression:

i=0
i<BUF_SIZE
i++
buffer=0
buffer_ready=1

(Of course the middle three of these potentially occur multiple
times because of the 'for' loop.)

It doesn't make much sense to talk about sequence points
in (or after) declarators at file scope, but these points
are what's specified in the Standard.

Thanks. Again with reference to section 2.1 in this paper:

http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf

The author states that "a compiler may not move acceses to volatile
variables across sequence points", yet he also states that "the
compiler is free to move the loop [to] below the store to
buffer_ready". I can't reconcile these two statements - it seems to me
that moving the loop to a position below the store to buffer_ready is
the same as moving the store to buffer_ready to a position above the
loop. The effect is to move the store across sequence points.

Shao Miller · Jun 15, 2011

Please excuse my following up in the wrong place in the thread
and with no attributions. Newsreader glitch.

The example given in section 2.1 of that paper is this (I've had to
retype it so there could be mistakes):

Click to expand...

volatile int buffer_ready;
char buffer[BUF_SIZE];

Click to expand...

void buffer_init() {
int i;
for(i=0; i<BUF_SIZE; i++)
buffer=0;
buffer_ready=1;
}

Click to expand...

Can anyone point out exactly where the sequence points are in that
code snippet?

Click to expand...

After each full declarator:

buffer_ready in 'volatile int buffer_ready;'
buffer[BUF_SIZE] in 'char buffer[BUF_SIZE];'
i in 'int i;'

After each full expression:

i=0
i<BUF_SIZE
i++
buffer=0
buffer_ready=1

(Of course the middle three of these potentially occur multiple
times because of the 'for' loop.)

It doesn't make much sense to talk about sequence points
in (or after) declarators at file scope, but these points
are what's specified in the Standard.

Click to expand...

Thanks. Again with reference to section 2.1 in this paper:

http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf

The author states that "a compiler may not move acceses to volatile
variables across sequence points", yet he also states that "the
compiler is free to move the loop [to] below the store to
buffer_ready". I can't reconcile these two statements - it seems to me
that moving the loop to a position below the store to buffer_ready is
the same as moving the store to buffer_ready to a position above the
loop. The effect is to move the store across sequence points.

If you treat sequence points as linear, then I can understand your
confusion.

Please consider:

int i, j;

i = 5;
j = 6;

Since it's known that these objects are distinct and that the side
effects of setting their stored values are independent, I belive that
the compiler is free to choose any order, including "simultaneously."

But sequence points aren't exactly linear... C1X introduced a more
formal sequencing model include "before," "after," "unsequenced,"
"indeterminately sequenced."

Please consider:

int x, y = 3, z = 5;

x = (y, 7) + (z, 11);

There is a sequence point after 'y' is accessed and a sequence point
after 'z' is accessed, but those accesses are "unsequenced" relative to
one another; no order is imposed whatsoever (even "simultaneously").
But both accesses are "sequenced before" the computation of 7 + 11 and,
of course, before the access to 'x'.

So your example's 'for' loop contains side effects which are independent
of 'buffer_ready'. The compiler can move the loop the same as the 'i,
j' example, above. The access to the volatile 'buffer_ready' is still
bounded by the sequence point before (whatever that might be) and by the
sequence point at its semi-colon (at the end of the expression statement).

Phil Carmody · Jun 18, 2011

Noob said:
On a related note, you might be interested in the following
article by John Regehr.

Volatile Bugs, Three Years Later
http://blog.regehr.org/archives/503

I disagreed with some fairly fundamental parts of his paper 3 years ago,
and I still disagree with them now.

Phil

Tim Rentsch · Jun 19, 2011

Please excuse my following up in the wrong place in the thread
and with no attributions. Newsreader glitch.

The example given in section 2.1 of that paper is this (I've had to
retype it so there could be mistakes):

Click to expand...

volatile int buffer_ready;
char buffer[BUF_SIZE];

Click to expand...

void buffer_init() {
int i;
for(i=0; i<BUF_SIZE; i++)
buffer=0;
buffer_ready=1;
}

Click to expand...

Can anyone point out exactly where the sequence points are in that
code snippet?

Click to expand...

After each full declarator:

buffer_ready in 'volatile int buffer_ready;'
buffer[BUF_SIZE] in 'char buffer[BUF_SIZE];'
i in 'int i;'

After each full expression:

i=0
i<BUF_SIZE
i++
buffer=0
buffer_ready=1

(Of course the middle three of these potentially occur multiple
times because of the 'for' loop.)

It doesn't make much sense to talk about sequence points
in (or after) declarators at file scope, but these points
are what's specified in the Standard.

Click to expand...

Thanks. Again with reference to section 2.1 in this paper:

http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf

I know this paper! I've read this paper. In fact a couple
of years ago I seem to remember posting a series of articles
in response to this paper. I think it's a good paper,
although it does have some stuff in it that is (take your
pick) somewhere between controversial and wrong.

The author states that "a compiler may not move acceses to volatile
variables across sequence points", yet he also states that "the
compiler is free to move the loop [to] below the store to
buffer_ready". I can't reconcile these two statements - it seems to me
that moving the loop to a position below the store to buffer_ready is
the same as moving the store to buffer_ready to a position above the
loop. The effect is to move the store across sequence points.

Click to expand...

IMO the comment about moving the loop after the volatile store
is, let me put it diplomatically, not exactly right. In fairness
I should add that it isn't absolutely wrong either. There is a
subtlety in the Standard because what constitutes an access to a
volatile object is implementation-defined. If you do a google
groups search for "volatile" "memory" "regime" I think you'll
be able to find my earlier articles on all this. Might also
try google groups search on the url, that might help pin down
the exact time frame.

Noob · Jun 20, 2011

Tim said:
http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf

I know this paper! I've read this paper. In fact a couple
of years ago I seem to remember posting a series of articles
in response to this paper.

Did you have the following thread in mind?

From: Tim Rentsch
Newsgroups: comp.lang.c, comp.arch.embedded
Subject: The Semantics of 'volatile'
Date: 01 Jun 2009 16:43:11 -0700

http://groups.google.com/group/comp.lang.c/browse_frm/thread/9c05dfae38459f2c/556cd6a41357b79f

Tim Rentsch · Jun 20, 2011

Noob said:
Did you have the following thread in mind?

From: Tim Rentsch
Newsgroups: comp.lang.c, comp.arch.embedded
Subject: The Semantics of 'volatile'
Date: 01 Jun 2009 16:43:11 -0700

http://groups.google.com/group/comp.lang.c/browse_frm/thread/9c05dfae38459f2c/556cd6a41357b79f

I believe that was one of them, yes.

jstrap42 · Jun 29, 2011

Please excuse my following up in the wrong place in the thread
and with no attributions. Newsreader glitch.
The example given in section 2.1 of that paper is this (I've had to
retype it so there could be mistakes):
volatile int buffer_ready;
char buffer[BUF_SIZE];
void buffer_init() {
int i;
for(i=0; i<BUF_SIZE; i++)
buffer=0;
buffer_ready=1;
}
Can anyone point out exactly where the sequence points are in that
code snippet?
After each full declarator:
buffer_ready in 'volatile int buffer_ready;'
buffer[BUF_SIZE] in 'char buffer[BUF_SIZE];'
i in 'int i;'
After each full expression:
i=0
i<BUF_SIZE
i++
buffer=0
buffer_ready=1
(Of course the middle three of these potentially occur multiple
times because of the 'for' loop.)
It doesn't make much sense to talk about sequence points
in (or after) declarators at file scope, but these points
are what's specified in the Standard.

Click to expand...

Click to expand...

Thanks. Again with reference to section 2.1 in this paper:

http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf

Click to expand...

The author states that "a compiler may not move acceses to volatile
variables across sequence points", yet he also states that "the
compiler is free to move the loop [to] below the store to
buffer_ready". I can't reconcile these two statements - it seems to me
that moving the loop to a position below the store to buffer_ready is
the same as moving the store to buffer_ready to a position above the
loop. The effect is to move the store across sequence points.

Click to expand...

If you treat sequence points as linear, then I can understand your
confusion.

Please consider:

int i, j;

i = 5;
j = 6;

Since it's known that these objects are distinct and that the side
effects of setting their stored values are independent, I belive that
the compiler is free to choose any order, including "simultaneously."

But sequence points aren't exactly linear... C1X introduced a more
formal sequencing model include "before," "after," "unsequenced,"
"indeterminately sequenced."

Please consider:

int x, y = 3, z = 5;

x = (y, 7) + (z, 11);

There is a sequence point after 'y' is accessed and a sequence point
after 'z' is accessed, but those accesses are "unsequenced" relative to
one another; no order is imposed whatsoever (even "simultaneously").
But both accesses are "sequenced before" the computation of 7 + 11 and,
of course, before the access to 'x'.

So your example's 'for' loop contains side effects which are independent
of 'buffer_ready'. The compiler can move the loop the same as the 'i,
j' example, above. The access to the volatile 'buffer_ready' is still
bounded by the sequence point before (whatever that might be) and by the
sequence point at its semi-colon (at the end of the expression statement)..- Hide quoted text -

- Show quoted text -

I think I have to view sequence points as linear. If they weren't then
how would one interpret the words 'previous' and 'subsequent' in this:

N869 5.1.2.3 paragraph 2

"Accessing a volatile object, modifying an object,
modifying a file, or calling a function that does any of

those operations are all side effects, which are changes
in the state of the execution environment. Evaluation of an
expression may produce side effects. At certain specified
points in the execution sequence called sequence points, all
side effects of previous evaluations shall be complete and
no side effects of subsequent evaluations shall have taken
place."

Shao Miller · Jun 29, 2011

Please excuse my following up in the wrong place in the thread
and with no attributions. Newsreader glitch.

Click to expand...

The example given in section 2.1 of that paper is this (I've had to
retype it so there could be mistakes):

Click to expand...

volatile int buffer_ready;
char buffer[BUF_SIZE];

Click to expand...

void buffer_init() {
int i;
for(i=0; i<BUF_SIZE; i++)
buffer=0;
buffer_ready=1;
}

Click to expand...

Can anyone point out exactly where the sequence points are in that
code snippet?

Click to expand...

After each full declarator:

Click to expand...

buffer_ready in 'volatile int buffer_ready;'
buffer[BUF_SIZE] in 'char buffer[BUF_SIZE];'
i in 'int i;'

Click to expand...

After each full expression:

i=0
i<BUF_SIZE
i++
buffer=0
buffer_ready=1

Click to expand...

(Of course the middle three of these potentially occur multiple
times because of the 'for' loop.)

Click to expand...

It doesn't make much sense to talk about sequence points
in (or after) declarators at file scope, but these points
are what's specified in the Standard.

Click to expand...

Thanks. Again with reference to section 2.1 in this paper:

http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf

Click to expand...

The author states that "a compiler may not move acceses to volatile
variables across sequence points", yet he also states that "the
compiler is free to move the loop [to] below the store to
buffer_ready". I can't reconcile these two statements - it seems to me
that moving the loop to a position below the store to buffer_ready is
the same as moving the store to buffer_ready to a position above the
loop. The effect is to move the store across sequence points.

Click to expand...

If you treat sequence points as linear, then I can understand your
confusion.

Please consider:

int i, j;

i = 5;
j = 6;

Since it's known that these objects are distinct and that the side
effects of setting their stored values are independent, I belive that
the compiler is free to choose any order, including "simultaneously."

But sequence points aren't exactly linear... C1X introduced a more
formal sequencing model include "before," "after," "unsequenced,"
"indeterminately sequenced."

Please consider:

int x, y = 3, z = 5;

x = (y, 7) + (z, 11);

There is a sequence point after 'y' is accessed and a sequence point
after 'z' is accessed, but those accesses are "unsequenced" relative to
one another; no order is imposed whatsoever (even "simultaneously").
But both accesses are "sequenced before" the computation of 7 + 11 and,
of course, before the access to 'x'.

So your example's 'for' loop contains side effects which are independent
of 'buffer_ready'. The compiler can move the loop the same as the 'i,
j' example, above. The access to the volatile 'buffer_ready' is still
bounded by the sequence point before (whatever that might be) and by the
sequence point at its semi-colon (at the end of the expression statement).

Click to expand...

I think I have to view sequence points as linear. If they weren't then
how would one interpret the words 'previous' and 'subsequent' in this:

N869 5.1.2.3 paragraph 2

"Accessing a volatile object, modifying an object,
modifying a file, or calling a function that does any of
those operations are all side effects, which are changes
in the state of the execution environment. Evaluation of an
expression may produce side effects. At certain specified
points in the execution sequence called sequence points, all
side effects of previous evaluations shall be complete and
no side effects of subsequent evaluations shall have taken
place."

"Previous" and "subsequent" relative to data dependencies, perhaps?
Further to your point, we have:

"A statement specifies an action to be performed. Except as
indicated, statements are executed in sequence."

But that's the abstract machine and not the actual machine.

My understanding is that sequence points are not _strictly_ linear. In
the 'x = ...' example above, which of the two comma operator sequence
points comes first? Which is the subsequent sequence point?

Underneath "The least requirements on a conforming implementation are",
we can note a change in wording, from the C89:

"At sequence points, volatile objects are stable in the sense that
previous evaluations are complete and subsequent evaluations have not
yet occurred."

to the N1256 (C99):

"At sequence points, volatile objects are stable in the sense that
previous accesses are complete and subsequent accesses have not yet
occurred."

The former _might_ be interpreted to mean "previous evaluations
involving the object". The latter certainly seems clearer about that.

Then it's changed again in N1494 (C1X) to:

"Accesses to volatile objects are evaluated strictly according to the
rules of the abstract machine."

How about that.

Seebs · Jun 29, 2011

I think I have to view sequence points as linear. If they weren't then
how would one interpret the words 'previous' and 'subsequent' in this:

Shao was apparently talking about C1X, which introduced a new and fancier
model for this.

That said... In plain C, you can end up with several sequence points
which have no obvious ordering with respect to each other, at which point
I think "previous" and "subsequent" are perhaps best understood in terms of
execution time.

Consider something like:
static int x, y, z;

(x++ || x++) | (y++ || y++) | (z++ || z++);
or
(x++ || x++) | (x++ || x++) | (x++ || x++);

It doesn't matter WHICH of the sequence points introduced by the ||s
happens first or second or third. What matters is that those sequence
points definitely occur in *some* order. Unfortunately for the second
line of code, there's nothing that prevents the compiler from picking:
x++
x++
x++
sequence-point
sequence-point
sequence-point
[...]

At which point it's pretty clear that we have undefined behavior. But
there is still a middle sequence point, compared to which one of the others
is previous and the other is next. But the first one, so far as I can
tell, is fine, and sets everything to 2. No matter how you shuffle things
around, the two modifications of any *specific* object are always separated
by a sequence point.

-s

Optimization correct?	7	Oct 27, 2010
Optimizing Away	26	Nov 15, 2006
Question about C semantics/optimizing C compilers	9	Jul 1, 2009
Non-volatile compiler optimizations	6	May 12, 2010
Tips on optimizing these functions	2	Sep 27, 2008
Is there a way to flush registers and tell the C compiler to refill?	8	Mar 19, 2009
Win32 and modules	1	Sep 21, 2009
compiler reaction to wrong struct member name	2	Sep 3, 2009

compiler optimizing wrong?

Dirk Zabel

Noob

Leo Havmøller

Thad Smith

Dirk Zabel

jstrap42

Chris H

Noob

Shao Miller

Chris H

Tim Rentsch

jstrap42

Shao Miller

Phil Carmody

Tim Rentsch

Noob

Tim Rentsch

jstrap42

Shao Miller

Seebs

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads