Non-volatile compiler optimizations

Noob

Hello,

I'm trying to understand volatile. I have written trivial code
where a variable is tested to decide whether to break early out
of a loop.

I've considered 4 different cases:
1) auto variable
2) static with function scope
3) static with file scope
4) external linkage

AFAIU, since the abstract machine model is single threaded,
if a variable is not volatile, then its value cannot change
inside the loop. (Signals might be an exception?)

Thus, in every case, the compiler is allowed to remove the
test, because it may assume that the variable cannot change.

I tested with gcc -O3.
In autovar, staticfuncvar, and filescopevar gcc removed the test.
In globalvar, gcc did not remove the test.

Is it unsafe for the compiler to assume that global's value
cannot change between two iterations of the loop?

(Signals might play a role here. A different "module" might
catch a signal, and change the value of the variable inside
the signal handler?)

If so, does that mean that volatile is not needed in the case
of a variable with external linkage?

Regards.


void foo(void);

/* Case 1: auto variable */
void autovar(void)
{
    int i;
    int local = 0;
    for (i = 0; i < 1000; ++i)
    {
        if (local) break;
        foo();
    }
}

/* Case 2: static with function scope */
void staticfuncvar(void)
{
    int i;
    static int local = 0;
    for (i = 0; i < 1000; ++i)
    {
        if (local) break;
        foo();
    }
}

/* Case 3: static with file scope */
static int flocal = 0;
void filescopevar(void)
{
    int i;
    for (i = 0; i < 1000; ++i)
    {
        if (flocal) break;
        foo();
    }
}

/* Case 4: external linkage */
int global = 0;
void globalvar(void)
{
    int i;
    for (i = 0; i < 1000; ++i)
    {
        if (global) break;
        foo();
    }
}
 
Tom St Denis

Noob said:
AFAIU, since the abstract machine model is single threaded,
if a variable is not volatile, then its value cannot change
inside the loop. (Signals might be an exception?)

The C spec doesn't talk about threads, so the compiler only needs to read
the object when it's logically required from a single-threaded point of
view. 'volatile' forces the compiler to read the object [or write it]
whenever such expressions occur in the program.

e.g.

int a = 4;
if (a == 4) { ... }

The compiler doesn't have to read 'a' a second time (during the
test). It could if it wanted to, but it isn't required to. So the fact
that you see different behaviour is not surprising.

Tom
 
John Regehr

Noob said:
int global = 0;
void globalvar(void)
{
    int i;
    for (i = 0; i < 1000; ++i)
    {
        if (global) break;
        foo();
    }
}

The issue here is that global could be changed after it is initialized
but before the function executes. That is why the loop is not
optimized away. It has nothing to do with signals. If you want to
use a variable for communication between signals/threads it must be
volatile, global is not enough.
 
Ben Bacarisse

John Regehr said:
The issue here is that global could be changed after it is initialized
but before the function executes. That is why the loop is not
optimized away. It has nothing to do with signals. If you want to
use a variable for communication between signals/threads it must be
volatile, global is not enough.

Even volatile is probably not enough. In the presence of interrupts, C
only makes sufficient guarantees about objects of type volatile
sig_atomic_t. As for threads, I think the combined wisdom over on
comp.programming.threads is that, on modern hardware, the semantics of
volatile are not enough to do most things that you might want it to do
(obviously it does something, just not enough to be useful).

A volatile shared object *may* be enough, but that will be because of
the way some particular system works.

The new standard (C1x) intends to address these deficiencies.
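
To illustrate the one case standard C does cover, here is a minimal sketch of a loop that breaks early on a flag of type volatile sig_atomic_t set from a signal handler. The names stop_requested, on_signal and run_until_signal are made up for this example, and the signal is delivered synchronously with raise() so the behaviour is deterministic. It says nothing about threads.

```c
#include <signal.h>

/* The only kind of object a handler may portably write to. */
static volatile sig_atomic_t stop_requested = 0;

/* Hypothetical handler: it only sets the flag. */
static void on_signal(int sig)
{
    (void)sig;
    stop_requested = 1;
}

/*
 * Hypothetical helper: runs the loop from the thread, raises SIGINT
 * during iteration raise_at, and returns the value of i when the loop
 * ends. Returns -1 if the handler could not be installed.
 */
int run_until_signal(int raise_at)
{
    int i;

    stop_requested = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;

    for (i = 0; i < 1000; ++i)
    {
        if (stop_requested) break;   /* re-read on every iteration */
        if (i == raise_at)
            raise(SIGINT);           /* handler runs before raise() returns */
    }

    signal(SIGINT, SIG_DFL);         /* restore the default disposition */
    return i;
}
```

With raise_at set to 5, the flag is set during iteration 5 and the test at the top of iteration 6 breaks out of the loop. With a raise_at that is never reached, the loop runs all 1000 iterations.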
 
Noob

John said:
The issue here is that global could be changed after it is initialized
but before the function executes. That is why the loop is not
optimized away.

Thanks to all for nudging me in the right direction.

AFAIU, gcc does not optimize the if-statement because foo
might modify global.

If foo did not modify global, IPA might enable gcc to
optimize the if-statement away.

http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flto-812
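
A minimal sketch of the point above, assuming a hypothetical foo that sets global on its third call (the thread never shows foo's body, and the names calls and globalvar_count are made up for this example). When the compiler cannot see foo, it has to allow for exactly this kind of definition, so global must be reloaded after every call.

```c
int global = 0;          /* external linkage, as in the thread */
static int calls = 0;    /* hypothetical call counter */

/* Hypothetical foo(): sets the flag on its third call. */
void foo(void)
{
    if (++calls == 3)
        global = 1;
}

/* Same loop as globalvar(), but returns how many times foo() ran. */
int globalvar_count(void)
{
    int i;
    int n = 0;

    global = 0;
    calls = 0;
    for (i = 0; i < 1000; ++i)
    {
        if (global) break;   /* must observe the store made inside foo() */
        foo();
        ++n;
    }
    return n;
}
```

Here the loop stops after three calls to foo. If the test were removed on the assumption that global cannot change, foo would be called 1000 times instead.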

$ cat global.c
extern void foo(void);
extern int global;
void globalvar(void)
{
    int i;
    global = 0;
    for (i = 0; i < 1000; ++i)
    {
        if (global) break;
        foo();
    }
}

NB: global is initialized just before entering the loop.

$ gcc -O3 -fomit-frame-pointer -S global.c

_globalvar:
        pushl %ebx
        xorl %ebx, %ebx
        subl $8, %esp
        movl $0, _global
L2:
        call _foo
        cmpl $999, %ebx
        je L5
        movl _global, %eax
        addl $1, %ebx
        testl %eax, %eax
        je L2
L5:
        addl $8, %esp
        popl %ebx
        ret

Two unrelated comments.

1) The register allocation seems sub-optimal. I would have
used ecx (scratch register) instead of ebx.

2) Why is gcc allocating 8 octets on the stack?

John said:
It has nothing to do with signals.

It seems the compiler would have to consider this possibility?
An (asynchronous) signal handler might modify global at any
time during the execution of the loop?

John said:
If you want to use a variable for communication between
signals/threads it must be volatile, global is not enough.

volatile, yes.

And sig_atomic_t (which is an int on my platform).

Regards.
 
Alan Curry

Noob said:
1) The register allocation seems sub-optimal. I would have
used ecx (scratch register) instead of ebx.

But the function foo is allowed to clobber %ecx, so you'd have to save %ecx
before calling foo, and restore it afterward. That would be 1000 saves and
restores instead of 1.

Noob said:
2) Why is gcc allocating 8 octets on the stack?

Stack alignment. Together with the 4 bytes of %ebx and the 4-byte return address
already on the stack, that makes 16 bytes, which is a nice round number. It's
considered polite (a.k.a. "an ABI requirement") to make sure the offset from
the stack pointer you received at the start of your function to the stack
pointer you give to a called function is a nice round number.

If all functions in a program obey that rule, and the initial stack pointer
was a nice round number, then every function gets an aligned stack pointer,
which is easier than making each function responsible for checking the
incoming stack pointer's low bits before using it as a base for local
variables.
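
The arithmetic can be sketched as a small helper. This is only an illustration: the function name stack_padding is made up, and the 16-byte figure is the one from this post, not something checked against a particular ABI document.

```c
#include <assert.h>

/*
 * Bytes of padding to add so that used + padding is a multiple of align.
 * "used" is what is already on the stack since the caller's aligned
 * stack pointer (here: return address plus saved registers).
 */
unsigned stack_padding(unsigned used, unsigned align)
{
    assert(align != 0);
    return (align - used % align) % align;
}
```

For globalvar, used is 4 (return address) + 4 (saved %ebx) = 8, so with a 16-byte boundary the helper yields 8, which matches the subl $8, %esp in the listing.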
 
Noob

Alan said:
But the function foo is allowed to clobber %ecx, so you'd have to save %ecx
before calling foo, and restore it afterward. That would be 1000 saves and
restores instead of 1.

Your explanation makes perfect sense.

Alan said:
Stack alignment. Together with the 4 bytes of %ebx and the 4-byte return address
already on the stack, that makes 16 bytes, which is a nice round number. It's
considered polite (a.k.a. "an ABI requirement") to make sure the offset from
the stack pointer you received at the start of your function to the stack
pointer you give to a called function is a nice round number.

I had forgotten about stack alignment.

Thanks for the insight.
 
