volatile and sequence points in (GASP) multithreaded programs


Dave Rahardja

I'm trying to understand when exactly the compiler is allowed to elide access
to a variable.

Here is a common mistake made by a program waiting for a value that is
modified by an external entity:

bool flag;
while (!flag)
{
// Idle
}


In this case, the compiler is free to elide the access to flag, and replace
the while loop with an infinite loop.

Turning flag into a volatile variable, however, turns all accesses to flag
into observable behavior (§1.9-6), and prevents the compiler from eliding
access to the variable.
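
For concreteness, the volatile version I have in mind is something like this
sketch (the function name is just for illustration; assume an ISR or another
thread eventually sets the flag):

volatile bool flag = false;   // set to true by an ISR or another thread

void waitForFlag()
{
    while (!flag)             // each read of the volatile is observable
    {                         // behavior (§1.9-6), so it cannot be elided
        // Idle
    }
}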

However, I'm confused about the relationship between observable behavior and
sequence points (§1.9-7). What does it mean that all side effects before a
sequence point are completed before the next evaluation is performed? Does it
mean that inserting a function call such as:


void foo(); // Defined in another compilation unit
bool flag;
while (!flag)
{
foo();
}


Forces the compiler to re-read flag at the head of each loop iteration?

My thought is that the compiler is /technically/ still allowed to elide the
access to flag, because its behavior is still not observable. However, for
practical purposes, a compiler will not know what side effect foo() may have
(including modifying flag), so unless it performs whole-program optimization,
it cannot elide access to flag.

This line of questioning comes from my experiments with multithreading, which
is not defined by the standard. I'm trying to determine exactly when I need to
use a volatile keyword, say when accessing data members in one object via
multiple threads. Even after synchronizing access to member variables (another
off-topic operation), I'm still not sure if my reads and writes to member
variables are liable to get optimized away. However, §1.9-7 seems to imply
that modifying an object defines a sequence point, and in all likelihood a
compiler will never optimize such operations away.

Can someone point me to a reliable guide on how to correctly manage
optimizations for multithreaded programs? Or do I simply have to declare all
variables accessed by multiple threads as volatile?

-dr
 

Old Wolf

However, I'm confused about the relationship between observable behavior and
sequence points (§1.9-7). What does it mean that all side effects before a
sequence point are completed before the next evaluation is performed?

How is that unclear? (the meaning of "side effects" is
precisely defined by the standard).
Does it mean that inserting a function call such as:

void foo(); // Defined in another compilation unit
bool flag;
while (!flag)
{
foo();
}

Forces the compiler to re-read flag at the head of each loop iteration?

I guess you mean, does it prevent the compiler from
optimising away the read of 'flag'. Supposing that
the compiler doesn't do optimization involving the
definition of foo() then the answer is 'yes'. foo()
might contain code such as:
extern bool flag;
flag = 1;

If flag were static then the loop condition check
could again be optimised away.
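
To illustrate that last point with a sketch of my own (assuming the address
of 'flag' never escapes the translation unit):

static bool flag = false;   // internal linkage: no other TU can name it

void foo();                 // defined elsewhere; it cannot refer to 'flag'

void wait()
{
    while (!flag)           // with internal linkage and no escaping address,
    {                       // the compiler may prove foo() cannot modify
        foo();              // 'flag' and hoist the test out of the loop
    }
}
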
My thought is that the compiler is /technically/ still allowed to elide the
access to flag, because its behavior is still not observable.

'flag' doesn't have behaviour. I guess you mean
that changes of the value of 'flag' do not
constitute "observable behaviour", which is correct.

The compiler can generate whatever code it likes,
as long as the observable behaviour produced
matches what is specified by the C standard.
"observable behaviour" is strictly defined by
a section of the standard.
Even after synchronizing access to member variables
(another off-topic operation), I'm still not sure if
my reads and writes to member variables are liable
to get optimized away.

Well, if you have code such as:
LockFoo();
foo = 4;
UnlockFoo();

even if foo is volatile, the compiler can re-order so
that the modification is outside the locks, if it knows
that the locks don't access foo.

Which is why if you want to do multithreaded
programming you need to have support from the
compiler vendor, which may provide synchronization
primitives and more importantly, documentation about
how to do such programming. You should probably post
followup questions on the newsgroup of the compiler
or thread package that you are using.
 

James Kanze

I'm trying to understand when exactly the compiler is allowed to elide access
to a variable.
Here is a common mistake made by a program waiting for a value
that is modified by an external entity:
bool flag;
while (!flag)
{
// Idle
}
In this case, the compiler is free to elide the access to
flag, and replace the while loop with an infinite loop.
Turning flag into a volatile variable, however, turns all
accesses to flag into observable behavior (§1.9-6), and
prevents the compiler from eliding access to the variable.
However, I'm confused about the relationship between
observable behavior and sequence points (§1.9-7). What does it
mean that all side effects before a sequence point are
completed before the next evaluation is performed?

Just what it says. But without some additional guarantees, the
above can still be an infinite loop. The standard doesn't
define what constitutes an "access", but rather leaves it
implementation defined. And at least on the most common
systems, the compilers I'm familiar with don't define access in
a way that will ensure that a thread sees modifications made to
the variable in another thread.

Of course, in your example, there is no observable behavior, so
the compiler can pretty much do whatever it wants.
Does it
mean that inserting a function call such as:
void foo(); // Defined in another compilation unit
bool flag;
while (!flag)
{
foo();
}
Forces the compiler to re-read flag at the head of each loop
iteration?
No.

My thought is that the compiler is /technically/ still allowed
to elide the access to flag, because its behavior is still not
observable. However, for practical purposes, a compiler will
not know what side effect foo() may have (including modifying
flag), so unless it performs whole-program optimization, it
cannot elide access to flag.

Typically, the compiler will generate different code depending
on the level of optimization, yes. That's about all you can say
here.
This line of questioning comes from my experiments with
multithreading, which is not defined by the standard. I'm
trying to determine exactly when I need to use a volatile
keyword, say when accessing data members in one object via
multiple threads.

Volatile has no defined meaning in a multi-threaded environment.
At least with the compilers I use (Sun CC, g++ and VC++), you
can forget it. It makes no difference with regards to the
guarantees you have. (That's not quite true; the guarantees are
adequate for code running on a single core machine. But do any
of those still exist?)
Even after synchronizing access to member variables (another
off-topic operation), I'm still not sure if my reads and
writes to member variables are liable to get optimized away.
However, §1.9-7 seems to imply that modifying an object
defines a sequence point, and in all likelihood a compiler
will never optimize such operations away.

The end of a full expression is a sequence point. Accessing a
volatile variable is observable behavior.
Can someone point me to a reliable guide on how to correctly manage
optimizations for multithreaded programs?

Your compiler and OS documentation. If the compiler is Posix
conformant, the Posix standards. (I don't know where there is a
similar specification for Windows, but one probably exists
somewhere.)
Or do I simply have to declare all variables accessed by
multiple threads as volatile?

As I said above, volatile has no meaning in a multithreaded
environment. Posix does define a set of rules for C, which, by
extension, I would expect a Posix compliant C++ compiler to
respect. Presumably, Windows does the same---maybe they even
define C++ directly, I don't know. The standards committee is
currently working on a set of rules for C++, largely based on
the Posix rules (I think). In no case do you *have* to declare
anything volatile. You do have to obey other synchronization
rules, however---Posix defines a small set of functions which
ensure synchronization, and if any thread modifies an object,
the behavior is undefined unless all accesses in all threads are
protected by these functions. Anything else probably requires
inline assembler.
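
Concretely, "protected by these functions" means something along the lines of
this sketch (the variable and function names are just for illustration):

#include <pthread.h>

int sharedCount = 0;                         // shared data; no volatile needed
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

void increment()                             // may be called from any thread
{
    pthread_mutex_lock(&mtx);                // Posix guarantees that writes made
    ++sharedCount;                           // under the lock are visible to the
    pthread_mutex_unlock(&mtx);              // next thread that takes the lock
}

int currentCount()
{
    pthread_mutex_lock(&mtx);
    int value = sharedCount;
    pthread_mutex_unlock(&mtx);
    return value;
}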
 

Old Wolf

On Jul 2, 1:03 am, Dave Rahardja

No.

Well, actually it causes undefined behaviour by
reading an uninitialized variable, so technically
the compiler is not forced to do anything :)

But supposing that 'flag' were initialized to false,
I think it does force the compiler to re-evaluate
flag because foo may change the value of flag, which
has external linkage.

Note, it might not do a physical read of flag if
it optimises across translation units, but I think
the thrust of the question was: is the above code
always an infinite loop, or could the condition
fail eventually?
 

Dave Rahardja

Well, actually it causes undefined behaviour by
reading an uninitialized variable, so technically
the compiler is not forced to do anything :)

But supposing that 'flag' were initialized to false,
I think it does force the compiler to re-evaluate
flag because foo may change the value of flag, which
has external linkage.

Note, it might not do a physical read of flag if
it optimises across translation units, but I think
the thrust of the question was: is the above code
always an infinite loop, or could the condition
fail eventually?

The fact that there is no standards-driven way to suppress optimization of
access to a variable is unfortunate. How do people even begin to write
multithreaded code with C++? Especially embedded code, where OS
synchronization is relatively expensive, and often unnecessary, given a
well-designed sequence of reads and writes?

So here begins my musings and observations based on compilers I have used:

<offtopic>
I suspect "most" compilers (and certainly all of the ones that I've worked
with) will guarantee that reads and writes to a volatile POD variable will
result in an actual machine-code access to that variable's memory location,
and in the order shown in the source code (we'll ignore pipelined memory
access for now).

Access to a volatile non-POD object does nothing special, except that member
function calls through it bind to member functions marked volatile.

I suspect "most" compilers will not reorder variable access across non-inlined
function calls, because "most" compilers do not perform multi-unit
optimization (or such optimizations can be turned off), and have no idea what
the side effects of those functions are.

I suspect "most" compilers will cause all side-effects of a non-inlined
function to be actually performed (memory actually changed) by the time the
function returns.

In summary, it seems that "most" compilers treat volatile POD access and
non-inlined function calls as observable behavior relative to the thread it's
running on.
</offtopic>

Of course all of these suspicions are purely figments of my imagination, which
any compiler vendor is free to ignore.

I hope they address this issue in the next standard. It certainly would make
C++ a multithreaded (and multiprocessor) language. Besides, most compilers
already support some sort of multithreading facility.

-dr
 

James Kanze

Well, actually it causes undefined behaviour by
reading an uninitialized variable, so technically
the compiler is not forced to do anything :)
:)

But supposing that 'flag' were initialized to false,
I think it does force the compiler to re-evaluate
flag because foo may change the value of flag, which
has external linkage.

First, of course, I see nothing in the above to suggest that
flag may have external linkage. But my point was really only
that the compiler is not required to re-evaluate the expression
per se; all that is required is that the observable behavior be
respected. In practice, a lot of compilers today optimize
beyond compilation units, and the compiler will be able to see
whether foo modifies flag or not. If foo doesn't modify flag,
the compiler is free to turn the loop into an endless loop, and
even if foo does modify flag, the compiler must only ensure that
the loop ends when it sets flag to true. Conceivably, for
example (although I don't know of any compiler which does so),
it could somehow patch up the return address on the stack.
Note, it might not do a physical read of flag if
it optimises across translation units, but I think
the thrust of the question was: is the above code
always an infinite loop, or could the condition
fail eventually?

I'm not really sure what the thrust of the question was. The
subject line spoke of volatile and sequence points. At one
point, too, the poster asked about threads. So I'm not too sure
what information he's really looking for: if threads are
involved, for example: volatile normally has no thread relevant
semantics, unless the compiler chooses to give it some, and
sequence points aren't defined with respect to threads.
 

James Kanze

On Jul 3, 5:55 am, Dave Rahardja

[...]
The fact that there is no standards-driven way to suppress
optimization of access to a variable is unfortunate.

That's certainly the intent of "volatile". But it doesn't help
much where threads are concerned. The standard doesn't (and for
various reasons, can't) define what it means by access---for
that matter, you don't say what you mean either. It leaves it
up to the implementation to provide a useful definition (which
many don't) of access, and the expression of intent.
How do people even begin to write
multithreaded code with C++?

By using the guarantees of the system. The C++ compilers I use
under Solaris claim (guarantee) Posix conformance. Under Linux,
more or less, too. (It's difficult to claim Posix conformance
when the underlying system isn't Posix conformant, but recent
versions of Linux, with recent versions of g++, are Posix
conformant with regards to threading and the thread primitives.)

And optimization really doesn't have much to do with the issue.
You can break thread safety with some optimizations, but that's
a question of specification: Posix forbids certain optimizations
in certain circumstances, not per se, but by giving a certain
number of guarantees which a Posix conformant system must
respect. And turning off all optimization doesn't guarantee
thread safety either. Given something like:

// Global variables...
int a = 0 ;
int b = 1 ;

// In some thread...
a = 2 ;
b = 3 ;

even without the slightest optimization (the compiler
strictly implementing the semantics of the abstract machine,
with no application of the "as if" rule), another thread can see
a == 0 then b == 3.

IMHO, the intent of volatile would be that if both variables
were declared volatile, then this could not happen. In
practice, however, that's not how modern compilers define an
"access"---at least for Sun CC and g++ (and the current version
of VC++), an access only means the emission of a machine
instruction to read or write the object, and says nothing about
when or even whether the cycles occur on the bus, or when they
become visible to another thread.
Especially embedded code, where OS
synchronization is relatively expensive, and often unnecessary, given a
well-designed sequence of reads and writes?

Compilers for simple embedded processors will typically
(hopefully?) define volatile so that it does something useful.
Simple embedded processors also usually have simple hardware
memory models---not the five or six levels of
pipeline/cache/etc. that a general purpose processor has, so
that a store instruction directly generates a write cycle on the
bus (and thus, that simply not optimizing is sufficient).
Nevertheless: when communicating between interrupt routines and the
main thread, or between processors, you need to study the
documentation of both the compiler and the hardware very, very
carefully.
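
The pattern being relied on is usually something like the following sketch
(readAdcRegister and process are made-up names; whether volatile alone makes
this correct is exactly what the compiler and hardware documentation have to
tell you):

int  readAdcRegister();              // hypothetical: read the sample register
void process(int sample);            // hypothetical: consume a sample

volatile bool dataReady = false;     // written by the ISR, read by the main loop
volatile int  latestSample = 0;

void adcIsr()                        // interrupt handler (vector setup omitted)
{
    latestSample = readAdcRegister();
    dataReady = true;                // volatile: the store is actually emitted
}

int main()
{
    for (;;)
    {
        if (dataReady)               // volatile: re-read on every iteration
        {
            dataReady = false;
            process(latestSample);
        }
    }
}
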
So here begins my musings and observations based on compilers I have used:
<offtopic>
I suspect "most" compilers (and certainly all of the ones that I've worked
with) will guarantee that reads and writes to a volatile POD variable will
result in an actual machine-code access to that variable's memory location,
and in the order shown in the source code (we'll ignore pipelined memory
access for now).

You still haven't defined "access"---does an access have to
access "main memory" (whatever that is), or are bus cycles to
the processor's local cache sufficient? And how can you ignore
something as universal as pipelined memory?
Access of a volatile non-POD object does nothing special,
except to bind their calls to member functions marked
volatile.
I suspect "most" compilers will not reorder variable access
across non-inlined function calls, because "most" compilers do
not perform multi-unit optimization (or such optimizations can
be turned off), and have no idea what the side effects of
those functions are.

I think you're wrong here. Off hand, I don't know of any
compiler that doesn't do multi-unit optimization when requested.
I think even g++ supports this; Sun CC and VC++ certainly do.
I suspect "most" compilers will cause all side-effects of a
non-inlined function to be actually performed (memory actually
changed) by the time the function returns.

That's certainly not the case for any of the Sparc compilers I
use, even without cross-module optimization. As soon as they
can prove that the variable can't be accessed from another
translation unit, they leave it in a register. Intel compilers
are less aggressive here, because they have fewer registers to
play with, and all functions share all of the registers.
In summary, it seems that "most" compilers treat volatile POD
access and non-inlined function calls as observable behavior
relative to the thread it's running on.

That's the key. Relative to the thread it's running on. That's
required by the standard.
</offtopic>
Of course all of these suspicions are purely figments of my
imagination, which any compiler vendor is free to ignore.
I hope they address this issue in the next standard.

Threading and atomic access are being studied, and it's a pretty
safe bet that the next version of the standard will define
program behavior in a multithreaded environment.
It certainly would make C++ a multithreaded (and
multiprocessor) language. Besides, most compilers already
support some sort of multithreading facility.

Yes. But not necessarily at the granularity you seem to be
requesting. The ones I know are Posix compliant, but as I said,
Posix says that behavior is undefined unless either 1) only one
thread ever accesses the object, 2) no thread ever modifies the
object, or 3) the correct synchronization primitives (e.g.
pthread_mutex_lock) are used. None of the compilers I know have
any support for lower level synchronization.
 

Kirit Sælensminde

<offtopic>
I suspect "most" compilers (and certainly all of the ones that I've worked
with) will guarantee that reads and writes to a volatile POD variable will
result in an actual machine-code access to that variable's memory location,
and in the order shown in the source code (we'll ignore pipelined memory
access for now).

I don't know what the compilers may or may not guarantee, but certainly
for MSVC 7.1 on a dual core machine you cannot expect a flag set in
one thread to be immediately readable with the same value in another,
whether you use volatile or not.

You need to use a synchronisation primitive (mutex, critical section
etc.) or use an interlocked counter or a semaphore or write using
assembly instructions that are guaranteed by the chip vendor.
Basically volatile is pretty useless for multi-threaded code.
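
For example, something along these lines with the Win32 interlocked functions,
which are documented to issue a full memory barrier (a sketch; the names are
mine):

#include <windows.h>

LONG volatile flag = 0;                  // the Interlocked* functions take a LONG

void setFlag()                           // runs in one thread
{
    InterlockedExchange(&flag, 1);       // atomic store with barrier semantics
}

bool flagIsSet()                         // runs in another thread
{
    // an atomic compare-exchange used purely as a synchronised read
    return InterlockedCompareExchange(&flag, 1, 1) == 1;
}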


K
 

Dave Rahardja

First of all, thank you for all the responses I received on this probably
offtopic rant of mine.


On Jul 3, 5:55 am, Dave Rahardja

[...]
The fact that there is no standards-driven way to suppress
optimization of access to a variable is unfortunate.

That's certainly the intent of "volatile". But it doesn't help
much where threads are concerned. The standard doesn't (and for
various reasons, can't) define what it means by access---for
that matter, you don't say what you mean either. It leaves it
up to the implementation to provide a useful definition (which
many don't) of access, and the expression of intent.

For threading purposes, my definition of access depends on whether I am on a
multiprocessor system. On a uniprocessor, it is sufficient to hit the cache.
On a multiprocessor system, it is necessary to hit shared main memory for
communications to occur.
By using the guarantees of the system. The C++ compilers I use
under Solaris claim (guarantee) Posix conformance. Under Linux,
more or less, too. (It's difficult to claim Posix conformance
when the underlying system isn't Posix conformant, but recent
versions of Linux, with recent versions of g++, are Posix
conformant with regards to threading and the thread primitives.)

This is unfortunate, because many embedded compiler vendors don't guarantee
anything of this nature. Perhaps a call to tech support will help.
And optimization really doesn't have much to do with the issue.
You can break thread safety with some optimizations, but that's
a question of specification: Posix forbids certain optimizations
in certain circumstances, not per se, but by giving a certain
number of guarantees which a Posix conformant system must
respect. And turning off all optimization doesn't guarantee
thread safety either. Given something like:

// Global variables...
int a = 0 ;
int b = 1 ;

// In some thread...
a = 2 ;
b = 3 ;

even without the slightest optimization (the compiler
strictly implementing the semantics of the abstract machine,
with no application of the "as if" rule), another thread can see
a == 0 then b == 3.

Yes. Obviously no compiler guarantee can eliminate logic errors or race
conditions built into the code. However, in certain embedded systems it is
often possible to implement race-free, non-interlocked interthread
communications by carefully sequencing shared memory access. For instance, a
high-speed ISR or thread can push values onto a queue while a slower polling
thread pops values off the queue without requiring synchronization, as long
as we are careful about pointer access and make certain operations atomic
(usually by momentarily disabling interrupts).
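
A sketch of the kind of queue I mean (single producer in the ISR, single
consumer in the polling thread, one core; it assumes that loads and stores of
the index type are atomic on the target, which is exactly the sort of
guarantee that has to come from the compiler and hardware documentation):

static const unsigned SIZE = 16;            // ring size, a power of two

static volatile unsigned char head = 0;     // written only by the ISR
static volatile unsigned char tail = 0;     // written only by the consumer
static volatile int buffer[SIZE];

// Called from the ISR (the single producer).
bool push(int value)
{
    unsigned char next = (head + 1) % SIZE;
    if (next == tail)                       // full: caller handles overflow
        return false;
    buffer[head] = value;                   // write the data first...
    head = next;                            // ...then publish it (volatile
    return true;                            // accesses stay in source order)
}

// Called from the polling thread (the single consumer).
bool pop(int& value)
{
    if (tail == head)                       // empty
        return false;
    value = buffer[tail];
    tail = (tail + 1) % SIZE;
    return true;
}
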
Compilers for simple embedded processors will typically
(hopefully?) define volatile so that it does something useful.
Simple embedded processors also usually have simple hardware
memory models---not the five or six levels of
pipeline/cache/etc. that a general purpose processor has, so
that a store instruction directly generates a write cycle on the
bus (and thus, that simply not optimizing is sufficient).
Nevertheless: when communicating between interrupt routines and the
main thread, or between processors, you need to study the
documentation of both the compiler and the hardware very, very
carefully.

I figured that out in a hurry! However, some embedded compiler vendors
notoriously omit the guarantees that are required for such fine-grained
control of access.
You still haven't defined "access"---does an access have to
access "main memory" (whatever that is), or are bus cycles to
the processor's local cache sufficient? And how can you ignore
something as universal as pipelined memory?

Pipelined memory introduces latency that must be managed only for physical
access. For instance, on certain PowerPC implementations the order in which
read and write instructions are executed can be reversed in hardware due to
the separate read and write pipelines. Such behavior will usually ruin device
drivers that hit hardware registers, and will require an explicit
synchronization (barrier) instruction between operations that depend on sequenced access
to physical hardware. However, for non-hardware access, pipelined memory
access becomes a nonissue, as the CPU hardware will usually guarantee that the
core will see access "as written", even when the actual hardware operations
are delayed.
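
On those PowerPC parts the extra operation is typically an explicit barrier
instruction between the two accesses, roughly like this sketch (GCC-style
inline assembly assumed; the register addresses are invented):

// Hypothetical memory-mapped device registers (addresses invented).
#define DEVICE_CTRL (*(volatile unsigned int*)0xF0001000)
#define DEVICE_DATA (*(volatile unsigned int*)0xF0001004)

void startTransfer(unsigned int value)
{
    DEVICE_CTRL = 0x1;                           // program the control register
    __asm__ __volatile__("eieio" ::: "memory");  // PowerPC "enforce in-order
                                                 // execution of I/O" barrier
    DEVICE_DATA = value;                         // reaches the bus after CTRL
}
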
I think you're wrong here. Off hand, I don't know of any
compiler that doesn't do multi-unit optimization when requested.
I think even g++ supports this; Sun CC and VC++ certainly do.

You're right. I'd have to defeat multi-unit optimization to guarantee my
assumption.
Yes. But not necessarily at the granularity you seem to be
requesting. The ones I know are Posix compliant, but as I said,
Posix says that behavior is undefined unless either 1) only one
thread ever accesses the object, 2) no thread ever modifies the
object, or 3) the correct synchronization primitives (e.g.
pthread_mutex_lock) are used. None of the compilers I know have
any support for lower level synchronization.

However, consider this code:

#include <pthread.h>
#include <unistd.h>   // for sleep()

volatile int sharedData = 0;

extern pthread_mutex_t* mutex;

struct MutexLock
{
    MutexLock(pthread_mutex_t* mutex):
        mutex(mutex)
    {
        pthread_mutex_lock(mutex);
    }

    ~MutexLock()
    {
        pthread_mutex_unlock(mutex);
    }

private:
    pthread_mutex_t* const mutex;
};

void threadA()
{
    for (;;)
    {
        {
            MutexLock lock(mutex);
            sharedData = 1;
        }
        sleep(10); // OS call to cause thread to sleep
    }
}

void threadB()
{
    for (;;)
    {
        {
            MutexLock lock(mutex);
            if (sharedData == 1)
            {
                break;
            }
        }
        sleep(10);
    }
    // Code continues...
}

Let's say threadA runs in one thread while threadB runs in another. How do
Posix semantics guarantee that the compiler does not elide the reads or writes
to sharedData? After all, relative to each thread, sharedData cannot possibly
affect the flow of control.

-dr
 
