The Semantics of 'volatile'


MikeWhy

Tim Rentsch said:
Chris M. Thomasson said:
John Devereux said:
John Devereux wrote:

FreeRTOS.org wrote:

I once wrote an article on compiler validation for safety critical
systems. In return somebody sent me a paper they had published
regarding different compilers implementation of volatile. I forget
the numbers now, but the conclusion of their paper was that most
compilers don't implement it correctly anyway!

If it was the same one posted here a few months ago, it started out
with a very basic false assumption about what volatile *means*,
casting the rest of the paper into doubt as far as I can see.

Are you both referring to the following paper?

"Volatiles Are Miscompiled, and What to Do about It"
http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf


That is the one I meant. Their first example (2.1) is wrong I think:

======================================================================

volatile int buffer_ready;
char buffer[BUF_SIZE];
void buffer_init() {
    int i;
    for (i=0; i<BUF_SIZE; i++)
        buffer[i] = 0;
    buffer_ready = 1;
}

"The for-loop does not access any volatile locations, nor does it
perform any side-effecting operations. Therefore, the compiler is free
to move the loop below the store to buffer_ready, defeating the
developer's intent."

======================================================================

The problem is that the compiler is *not* free to do this (as far as I
can see). Surely clearing the buffer *is* a side effect?

The example is meant to illustrate "what does volatile mean". If it
does not mean what they think it does, the other claims seem suspect.


The code is totally busted if you're on a compiler that does not
automatically insert a store-release memory barrier before volatile
stores, and load-acquire membars after volatile loads. I assume another
thread will eventually try to do something like:


int check_and_process_buffer() {
    if (buffer_ready) {
        /* use buffer */
        return 1;
    }
    return 0;
}



AFAICT, MSVC 8 and above is the only compiler I know about that
automatically inserts membars on volatile accesses:


http://groups.google.com/group/comp.lang.c/msg/54d730b2650c996c



Otherwise, you would need to manually insert the correct barriers for a
particular architecture. Here is a portable version for Solaris that will
work on all architectures supported by said OS:


#include <atomic.h>


volatile int buffer_ready;
char buffer[BUF_SIZE];
void buffer_init() {
    int i;
    for (i=0; i<BUF_SIZE; i++)
        buffer[i] = 0;
    membar_producer();
    buffer_ready = 1;
}


int check_and_process_buffer() {
    if (buffer_ready) {
        membar_consumer();
        /* use buffer */
        return 1;
    }
    return 0;
}
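
As a side note for later readers: C11's <stdatomic.h> (standardized well
after this exchange) expresses the same release/acquire pairing portably,
so the compiler emits whatever barriers the target needs. A minimal
sketch, not part of the original thread:

```c
#include <stdatomic.h>
#include <string.h>

#define BUF_SIZE 64

static atomic_int buffer_ready;       /* zero-initialized, like the flag above */
static char buffer[BUF_SIZE];

void buffer_init(void) {
    memset(buffer, 0, sizeof buffer);
    /* release store: all earlier stores become visible before the flag */
    atomic_store_explicit(&buffer_ready, 1, memory_order_release);
}

int check_and_process_buffer(void) {
    /* acquire load: pairs with the release store in buffer_init() */
    if (atomic_load_explicit(&buffer_ready, memory_order_acquire)) {
        /* use buffer */
        return 1;
    }
    return 0;
}
```

This replaces both the volatile qualifier and the Solaris-specific
membar_producer()/membar_consumer() calls with standard C.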


Again, thank you for posting some excellent specific examples.

I would like to add one comment. Despite the differences, both
the MSVC 8 implementation and the Solaris implementations can
be conforming. The reason is the last sentence in 6.7.3 p 6,

What constitutes an access to an object that has
volatile-qualified type is implementation-defined.

Presumably the MSVC implementors and the Solaris implementors
reached different conclusions about how to define what
constitutes an access to a volatile-qualified object. Or, to put
that in the language I used earlier, what memory regime will be
aligned to under 'volatile'. It's possible, for example, that
the Solaris notion of volatile makes it work with some thread
implementations but not inter-process communication (or other,
differently implemented thread packages). (I'm only guessing
here; certainly I wouldn't call myself a Solaris expert.) In
any case, whichever choice is "better", both are allowed under
6.7.3 (provided of course the implementation-defined choice is
documented with the implementation).


I think you still missed the relevance and the point. Set aside the
implementation-defined part and its language for the moment. This specific
example points out that optimization takes place in both the hardware and
the compiler. The processor re-orders memory access, just as the compiler's
optimizations can as well. MEMBAR before and after volatile access enforces
at the hardware level that the specified operation order is maintained. In
other words -- that is, in the language of the standard -- it maintains the
state of the abstract machine to what the developer wrote. There doesn't seem
to me to be much room for interpretation. It would be instructive to review
the standard with this in mind as a specific, concrete example of what
implementation-defined might mean in context of volatile.
 

Chris M. Thomasson

Chris M. Thomasson said:
[...]
Otherwise, you would need to manually insert the correct barriers for a
particular architecture. Here is a portable version for Solaris that will
work on all architectures supported by said OS:


#include <atomic.h>


volatile int buffer_ready;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


volatile int buffer_ready = 0;


of course!


;^)

char buffer[BUF_SIZE];
void buffer_init() {
    int i;
    for (i=0; i<BUF_SIZE; i++)
        buffer[i] = 0;
    membar_producer();
    buffer_ready = 1;
}


int check_and_process_buffer() {
    if (buffer_ready) {
        membar_consumer();
        /* use buffer */
        return 1;
    }
    return 0;
}




Now, there is another issue. On NUMA systems with a non-cache-coherent
network of processing nodes, the code above might not work. One may need to
issue a special instruction in order to force the store issued on
`buffer_ready' to propagate from the intra-node level, up to the inter-node
level. Think if `check_and_process_buffer()' was running on
`Node1-CpuA-Core2-Thread-3', and `buffer_init()' was running on
`Node4-CpuD-Core3-Thread-1', and the memory which makes up `buffer_ready'
and `buffer' was local to `Node4'. There is no guarantee that the store to
`buffer_ready' will become visible to the CPU's on `Node1'. You may need to
use special instructions, such as message passing via channel interface or
something. Think of the PPC wrt communication between the memory which
belongs to the main PowerPC's, and the local private memory that belongs to
each SPU. volatile alone is not going to help here, in any way shape or
form...
 

Tim Rentsch

MikeWhy said:
Tim Rentsch said:
Chris M. Thomasson said:
John Devereux wrote:

FreeRTOS.org wrote:

I once wrote an article on compiler validation for safety critical
systems. In return somebody sent me a paper they had published
regarding different compilers implementation of volatile. I forget
the numbers now, but the conclusion of their paper was that most
compilers don't implement it correctly anyway!

If it was the same one posted here a few months ago, it started out
with a very basic false assumption about what volatile *means*,
casting the rest of the paper into doubt as far as I can see.

Are you both referring to the following paper?

"Volatiles Are Miscompiled, and What to Do about It"
http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf


That is the one I meant. Their first example (2.1) is wrong I think:

======================================================================

volatile int buffer_ready;
char buffer[BUF_SIZE];
void buffer_init() {
    int i;
    for (i=0; i<BUF_SIZE; i++)
        buffer[i] = 0;
    buffer_ready = 1;
}

"The for-loop does not access any volatile locations, nor does it
perform any side-effecting operations. Therefore, the compiler is free
to move the loop below the store to buffer_ready, defeating the
developer's intent."

======================================================================

The problem is that the compiler is *not* free to do this (as far as I
can see). Surely clearing the buffer *is* a side effect?

The example is meant to illustrate "what does volatile mean". If it
does not mean what they think it does, the other claims seem suspect.

The code is totally busted if you're on a compiler that does not
automatically insert a store-release memory barrier before volatile
stores, and load-acquire membars after volatile loads. I assume another
thread will eventually try to do something like:


int check_and_process_buffer() {
    if (buffer_ready) {
        /* use buffer */
        return 1;
    }
    return 0;
}



AFAICT, MSVC 8 and above is the only compiler I know about that
automatically inserts membars on volatile accesses:


http://groups.google.com/group/comp.lang.c/msg/54d730b2650c996c



Otherwise, you would need to manually insert the correct barriers for a
particular architecture. Here is a portable version for Solaris that will
work on all architectures supported by said OS:


#include <atomic.h>


volatile int buffer_ready;
char buffer[BUF_SIZE];
void buffer_init() {
    int i;
    for (i=0; i<BUF_SIZE; i++)
        buffer[i] = 0;
    membar_producer();
    buffer_ready = 1;
}


int check_and_process_buffer() {
    if (buffer_ready) {
        membar_consumer();
        /* use buffer */
        return 1;
    }
    return 0;
}


Again, thank you for posting some excellent specific examples.

I would like to add one comment. Despite the differences, both
the MSVC 8 implementation and the Solaris implementations can
be conforming. The reason is the last sentence in 6.7.3 p 6,

What constitutes an access to an object that has
volatile-qualified type is implementation-defined.

Presumably the MSVC implementors and the Solaris implementors
reached different conclusions about how to define what
constitutes an access to a volatile-qualified object. Or, to put
that in the language I used earlier, what memory regime will be
aligned to under 'volatile'. It's possible, for example, that
the Solaris notion of volatile makes it work with some thread
implementations but not inter-process communication (or other,
differently implemented thread packages). (I'm only guessing
here; certainly I wouldn't call myself a Solaris expert.) In
any case, whichever choice is "better", both are allowed under
6.7.3 (provided of course the implementation-defined choice is
documented with the implementation).


I think you still missed the relevance and the point. Set aside the
implementation-defined part and its language for the moment. This specific
example points out that optimization takes place in both the hardware and
the compiler. The processor re-orders memory access, just as the compiler's
optimizations can as well. MEMBAR before and after volatile access enforces
at the hardware level that the specified operation order is maintained. In
other words -- that is, in the language of the standard -- it maintains the
state of the abstract machine to what the developer wrote. There doesn't seem
to me to be much room for interpretation. It would be instructive to review
the standard with this in mind as a specific, concrete example of what
implementation-defined might mean in context of volatile.


Hmmmm... how can I say this gently? I think you may be confusing
the notions of abstract machine and physical machine.

You say, in part, "[MEMBAR] maintains the state of the abstract
machine to what the developer wrote." In fact MEMBAR is not
necessary for correct functioning of the abstract machine. If
MEMBAR is necessary at all, it's necessary only for producing
appropriate physical machine semantics for use of volatile. If all
MEMBAR's were taken out, and no variables were accessed externally,
the program would still execute correctly. In other words the
abstract machine would still get a faithful mapping -- it's only
external accesses that might be affected, and such accesses are not
part of the abstract machine.

Also, you talk about "the hardware level". There isn't a single
hardware level. There are at least two, namely, the hardware state
as seen by execution of a single instruction stream (where MEMBAR
isn't needed), and the hardware state as seen by execution of
another thread or process, perhaps on another CPU (where MEMBAR may
be necessary to preserve some sort of partial ordering as seen by
the single instruction stream "virtual machine"). It isn't required
that volatile take the latter perspective -- it could just as well
take the first perspective, under the provision that what
constitutes a volatile-qualified access is implementation-defined.
Depending on what environments the implementation is intended to
support, either choice might be a good one.

I fully understand that hardware plays a role in "optimization"
(used in a slightly different sense here) -- I mentioned store
reordering in another posting, and there is also out-of-order
execution, and even speculative execution, etc. However these
"optimizations" are irrelevant as far as the implementation is
concerned (for non-volatile access), because the state as viewed by
the single executing instruction stream is carefully maintained to
appear exactly as though storage ordering is preserved, instruction
ordering is preserved, speculative branches that end up not being
taken are suppressed, etc[*]. It's only when 'volatile' is involved
that these effects might matter, because the program execution
state is being viewed by an agent (process, thread, device logic,
etc) external to this program's execution state.

Finally, to repeat/restate my earlier comment, it's only true that
these effects /might/ matter, and not that they /must/ matter,
because an implementation isn't obligated to take the other-process
perspective as to what constitutes a volatile-qualified access.
Depending on what choice is made for this, the hardware-level
"optimizations" might or might not need to be taken into
account for what volatile does.

Does that make a little more sense now?


[*] It's a different story for machines like MIPS where the
underlying pipeline stages are exposed at the architectural
level. But that's not important for this discussion.
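
To make the compiler-level half of Tim's point concrete, here is a small
sketch (not from the thread) of the one thing volatile uncontroversially
does: it forces every access to actually happen, with no hardware barrier
implied either way. The names `done`, `set_done`, and `spin_count` are
illustrative only:

```c
/* `done` is volatile, so each loop iteration performs a real load; a
   plain int could legally be read once and kept in a register, turning
   the wait loop into an infinite (or vacuous) one. */
static volatile int done = 0;

void set_done(void) { done = 1; }

/* Spin until `done` is set or `limit` iterations pass; returns how many
   iterations were spent waiting. */
int spin_count(int limit) {
    int n = 0;
    while (!done && n < limit)
        n++;
    return n;
}
```

Nothing here orders `done` relative to other memory as seen by another
CPU; that is exactly the part volatile leaves to the physical machine.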
 

Tim Rentsch

Chris M. Thomasson said:
Chris M. Thomasson said:
[...]
Otherwise, you would need to manually insert the correct barriers for a
particular architecture. Here is a portable version for Solaris that will
work on all architectures supported by said OS:


#include <atomic.h>


volatile int buffer_ready;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


volatile int buffer_ready = 0;


of course!


;^)

char buffer[BUF_SIZE];
void buffer_init() {
    int i;
    for (i=0; i<BUF_SIZE; i++)
        buffer[i] = 0;
    membar_producer();
    buffer_ready = 1;
}


int check_and_process_buffer() {
    if (buffer_ready) {
        membar_consumer();
        /* use buffer */
        return 1;
    }
    return 0;
}




Now, there is another issue. On NUMA systems with a non-cache-coherent
network of processing nodes, the code above might not work. One may need to
issue a special instruction in order to force the store issued on
`buffer_ready' to propagate from the intra-node level, up to the inter-node
level.


Right, the different types of memory actions correspond to
different memory regimes -- the intra-node level is one memory
regime, and the inter-node level is another memory regime.

Think if `check_and_process_buffer()' was running on
`Node1-CpuA-Core2-Thread-3', and `buffer_init()' was running on
`Node4-CpuD-Core3-Thread-1', and the memory which makes up `buffer_ready'
and `buffer' was local to `Node4'. There is no guarantee that the store to
`buffer_ready' will become visible to the CPU's on `Node1'.

I take what you're saying here to mean that the implementation
shown above, with MEMBAR's but not special instructions for
inter-node stores, will not guarantee that the store to
'buffer_ready' will be visible, because so much depends on
the specific memory architectures and how they interact.

You may need to
use special instructions, such as message passing via channel interface or
something. Think of the PPC wrt communication between the memory which
belongs to the main PowerPC's, and the local private memory that belongs to
each SPU.

Yes -- arbitrarily diverse memory architectures mean potentially
arbitrarily complicated memory coherence mechanisms.

volatile alone is not going to help here, in any way shape or
form...

Most likely it won't, but in principle it could. Assuming first
that the necessary memory linkage could be established, so memory
in the 'buffer_init()' process could be accessed by code in the
'check_and_process_buffer()' process (such linkage could be), an
implementation could choose to implement volatile so it
synchronized the two memories appropriately when the volatile
accesses are done.

In practical terms I agree this sort of implementation isn't
likely, but the Standard allows it -- in particular, as to
how 'volatile' would behave in this respect, because that
choice is implementation-defined.
 

Chris M. Thomasson

Tim Rentsch said:
Chris M. Thomasson said:
Chris M. Thomasson said:
[...]
Otherwise, you would need to manually insert the correct barriers for a
particular architecture. Here is a portable version for Solaris that will
work on all architectures supported by said OS:


#include <atomic.h>


volatile int buffer_ready;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


volatile int buffer_ready = 0;


of course!


;^)

char buffer[BUF_SIZE];
void buffer_init() {
    int i;
    for (i=0; i<BUF_SIZE; i++)
        buffer[i] = 0;
    membar_producer();
    buffer_ready = 1;
}


int check_and_process_buffer() {
    if (buffer_ready) {
        membar_consumer();
        /* use buffer */
        return 1;
    }
    return 0;
}




Now, there is another issue. On NUMA systems with a non-cache-coherent
network of processing nodes, the code above might not work. One may need to
issue a special instruction in order to force the store issued on
`buffer_ready' to propagate from the intra-node level, up to the inter-node
level.


Right, the different types of memory actions correspond to
different memory regimes -- the intra-node level is one memory
regime, and the inter-node level is another memory regime.

Think if `check_and_process_buffer()' was running on
`Node1-CpuA-Core2-Thread-3', and `buffer_init()' was running on
`Node4-CpuD-Core3-Thread-1', and the memory which makes up `buffer_ready'
and `buffer' was local to `Node4'. There is no guarantee that the store to
`buffer_ready' will become visible to the CPU's on `Node1'.

I take what you're saying here to mean that the implementation
shown above, with MEMBAR's but not special instructions for
inter-node stores, will not guarantee that the store to
'buffer_ready' will be visible, because so much depends on
the specific memory architectures and how they interact.


Well, I ___expect___ MEMBAR to behave well within an intra-node point of
view. AFAICT, for inter-node communications, on NON ccNUMA (e.g., real cache
incoherent NUMA), if you can find MEMBAR pushing out coherency "pings"
across inter-node boundaries, well, that would not be good, IMVVVHO at
least!

However, what does ANY of that have to do with volatile?


A: NOTHING!

;^o



Yes -- arbitrarily diverse memory architectures mean potentially
arbitrarily complicated memory coherence mechanisms.

Indeed. Well, I totally disagree on a vibe I am getting from your statement.
I get the vibe that you seem to think diverse highly specific memory models
seem to potentially require complicated coherence... Well, the term
`complicated' is in the eye/ear of the individual beholder, or perhaps
softened across a plurality of a specific local group of beholders...
Statistics are so precise!

Jesting of course... Perhaps? ;^D

Anyway, you're 100% correct. Sometimes a parallelization of an algorithm might
simply require so many rendezvous of some, perhaps "dubious", sort that
they simply cannot ever be made to scale in their present form.



Most likely it won't, but in principle it could.

Yes. Absolutely.



Assuming first
that the necessary memory linkage could be established, so memory
in the 'buffer_init()' process could be accessed by code in the
'check_and_process_buffer()' process (such linkage could), an
implemenation could choose to implement volatile so it
synchronized the two memories appropriately when the volatile
accesses are done.

An implementation can do its thing and define volatile accordingly.



In practical terms I agree this sort of implementation isn't
likely, but the Standard allows it -- in particular, as to
how 'volatile' would behave in this respect, because that
choice is implementation-defined.

I PERSONALLY WANT volatile to be restricted to compiler optimizations wrt
the context of the abstract virtual machine. Of course the abstract machine
is single-threaded. Great! That means a physical machine can implement a
million threads that each implement a single local abstract C machine. They
never communicate until the end of computation. Let's say that takes a month.
That whole month is governed by the local abstract machines. Let's say they
have the ability to network and cleverly rendezvous in a NUMA system after
they were finished? I say yes... No volatile needed; well, volatile can
probably be efficiently used by node-local only code...





As for optimizations on loop conditions, well, that's newbie stuff...

;^o
 

Tim Rentsch

Chris M. Thomasson said:
Tim Rentsch said:
Chris M. Thomasson said:
[...]
Otherwise, you would need to manually insert the correct barriers for a
particular architecture. Here is a portable version for Solaris that will
work on all architectures supported by said OS:


#include <atomic.h>


volatile int buffer_ready;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


volatile int buffer_ready = 0;


of course!


;^)


char buffer[BUF_SIZE];
void buffer_init() {
    int i;
    for (i=0; i<BUF_SIZE; i++)
        buffer[i] = 0;
    membar_producer();
    buffer_ready = 1;
}


int check_and_process_buffer() {
    if (buffer_ready) {
        membar_consumer();
        /* use buffer */
        return 1;
    }
    return 0;
}



Now, there is another issue. On NUMA systems with a non-cache-coherent
network of processing nodes, the code above might not work. One may need to
issue a special instruction in order to force the store issued on
`buffer_ready' to propagate from the intra-node level, up to the inter-node
level.


Right, the different types of memory actions correspond to
different memory regimes -- the intra-node level is one memory
regime, and the inter-node level is another memory regime.

Think if `check_and_process_buffer()' was running on
`Node1-CpuA-Core2-Thread-3', and `buffer_init()' was running on
`Node4-CpuD-Core3-Thread-1', and the memory which makes up `buffer_ready'
and `buffer' was local to `Node4'. There is no guarantee that the store to
`buffer_ready' will become visible to the CPU's on `Node1'.

I take what you're saying here to mean that the implementation
shown above, with MEMBAR's but not special instructions for
inter-node stores, will not guarantee that the store to
'buffer_ready' will be visible, because so much depends on
the specific memory architectures and how they interact.


Well, I ___expect___ MEMBAR to behave well within an intra-node point of
view. AFAICT, for inter-node communications, on NON ccNUMA (e.g., real cache
incoherent NUMA), if you can find MEMBAR pushing out coherency "pings"
across inter-node boundaries, well, that would not be good, IMVVVHO at
least!


Sorry, I guess I wasn't quite clear enough. My comment was meant
basically as an implicit question, trying to clarify your intended
meaning. Rephrasing, the two salient features are, one, using
MEMBAR is enough to guarantee intra-node access consistency, and
two, using MEMBAR (and nothing else) is not enough to guarantee
inter-node access consistency. That's what I thought you meant
before, and I read this response as confirming that.
However, what does ANY of that have to do with volatile?

A: NOTHING!

;^o

I think it's relevant to the discussion (ie, of volatile) because
there are two clearly distinct memory regimes (intra-node and
inter-node), and it's perfectly reasonable to consider an
implementation's volatile supporting one but not the other.
Indeed, I take what you're saying to mean it's reasonable to /expect/
volatile to support intra-node access but not inter-node access
in this case. And I think that's right, in the sense that many
people experienced in such architectures would expect the same
thing. (And I wouldn't presume to contradict them, even if I
expected something else, which actually I don't.)

Indeed. Well, I totally disagree on a vibe I am getting from your statement.
I get the vibe that you seem to think diverse highly specific memory models
seem to potentially require complicated coherence... Well, the term
`complicated' is in the eye/ear of the individual beholder, or perhaps
softened across a plurality of a specific local group of beholders...
Statistics are so precise!

My statement was more in the nature of an abstract, "mathematical"
conclusion than a comment on what architectures are actually out
there. I think we're actually pretty much on the same page here.
(OOPS! No pun intended...)
Jesting of course... Perhaps? ;^D

Anyway, you're 100% correct. Sometimes a parallelization of an algorithm might
simply require so many rendezvous of some, perhaps "dubious", sort that
they simply cannot ever be made to scale in their present form.

To say this another way, parallelizing an algorithm in a particular
way might work well for one kind of synchronization (eg, intra-node
coherence) but not for another kind of synchronization (eg, inter-node
coherence).

Yes. Absolutely.





An implementation can do its thing and define volatile accordingly.





I PERSONALLY WANT volatile to be restricted to compiler optimizations wrt
the context of the abstract virtual machine.

I had to read this sentence over several times to try to make sense of
it. I think I understand what you're saying; let me try saying it a
different way and see if we're in sync. Optimizations don't happen in
the abstract machine -- it's a single thread, one-step-at-a-time model,
exactly faithful to the original program source. However, in the
course of running a program on an actual computer, there needs to be a
degree of coherence between the abstract machine's "memory system" and
the computer's memory system. (The abstract machine's "memory system"
doesn't really exist except in some sort of conceptual sense, but it
seems useful to pretend it exists, to talk about coherence between it
and the actual computer memory). The coherence between the abstract
machine's memory system and the actual computer's memory doesn't have
to be exact, it only has to match up to the point where the "as if"
rule holds. Does that make sense?

Under this model, I think you're saying that you would like volatile
to impose coherence between the abstract machine memory system and
the "most local" physical machine memory system (ie, the same thread
executing on the same CPU), and not more than that. This coherence
is stronger than the non-volatile coherence, because the two memory
systems must be completely in sync (and not just "as if" in sync)
at points of volatile access.

In other words, the memory regime you're identifying (that volatile
would or should align with) is the same thread, same CPU memory
regime. Anything more than that, including inter-core (but still
intra-CPU), or even inter-thread (but still intra-core and intra-CPU)
would not be covered just by volatile. Is that what you mean, or
do you mean to say something different?

To come at this a different way, let me ask it this way: which level
of communication/coherence (do you mean to say that) volatile should
support

(a) only same-thread, same core, same CPU, same node
(b) inter-thread, intra-core, intra-CPU, intra-node
(c) inter-thread, inter-core, intra-CPU, intra-node
(d) inter-thread, inter-core, inter-CPU, intra-node
(e) inter-thread, inter-core, inter-CPU, inter-node
(f) something else? (I didn't even mention intra/inter-process...)

I first thought you meant (a), but now I'm not so sure.

Of course the abstract machine
is single-threaded. Great! That means a physical machine can implement a
million threads that each implement a single local abstract C machine. They
never communicate until the end of computation. Let's say that takes a month.
That whole month is governed by the local abstract machines. Let's say they
have the ability to network and cleverly rendezvous in a NUMA system after
they were finished? I say yes... No volatile needed; well, volatile can
probably be efficiently used by node-local only code...

I'm not sure what a same-thread/same-core/same-CPU/same-node
definition of volatile buys us, except some sort of guarantee
for variable access in intra-thread signal handlers. (There
is also setjmp()/longjmp(), but I think that's incidental
since whatever guarantees there are there will be true no
matter what memory regime volatile identifies.)
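
The signal-handler case Tim mentions is the one guarantee the Standard
itself spells out (C99 7.14.1.1p5): a handler may store to an object of
type volatile sig_atomic_t, and the interrupted flow may read it. A
minimal sketch; the names `on_signal` and `wait_for_signal` are just for
illustration:

```c
#include <signal.h>

static volatile sig_atomic_t got_signal = 0;

/* About the only portable action for a handler: store to a
   volatile sig_atomic_t object. */
static void on_signal(int sig) {
    (void)sig;
    got_signal = 1;
}

/* Install the handler, raise the signal at ourselves, then poll the
   flag; volatile keeps the load from being hoisted out of the loop. */
int wait_for_signal(void) {
    signal(SIGINT, on_signal);
    raise(SIGINT);
    while (!got_signal)
        ;
    return (int)got_signal;
}
```

No membars, no threads: the handler runs in the same thread, which is
exactly the same-thread/same-core memory regime of choice (a).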

Certainly it's possible to do inter-thread or inter-process
communication/synchronization using extra-linguistic mechanisms and
not using volatile, even under model (a) above. Ideally an
implementation would support several different choices of which
volatile model it follows (eg, selected by a compiler flag), and
developers could choose the model appropriate to the needs of the
program being developed. Before that can happen, however, we
have to have a language to talk about what the different choices
mean. My intention and hope in this thread has been to start
to develop that language, so that different choices can be
identified, discussed, compared, and ideally selected -- easily.
 

Chris M. Thomasson

[...]
To come at this a different way, let me ask it this way: which level
of communication/coherence (do you mean to say that) volatile should
support

(a) only same-thread, same core, same CPU, same node
(b) inter-thread, intra-core, intra-CPU, intra-node
(c) inter-thread, inter-core, intra-CPU, intra-node
(d) inter-thread, inter-core, inter-CPU, intra-node
(e) inter-thread, inter-core, inter-CPU, inter-node
(f) something else? (I didn't even mention intra/inter-process...)

I first thought you meant (a), but now I'm not so sure.
[...]

First of all I need to read your entire detailed response carefully in order
to give a complete response. However, I can answer the question above:

I choose `a'

I do not agree with the fact that MSVC automatically inserts memory barriers
on volatile accesses because it can create unnecessary overheads. What
happens if I don't need to use any membars at all, but still need to use
volatile? Well, the damn MSVC compiler will insert the membars right under
my nose. Also, what if I need a membar, but something not as strict as
store-release and load-acquire? Again, the MSVC compiler will force the more
expensive membars down my neck. Or, what if I need the membar, but in a
different place than the compiler automatically inserts them at? I am
screwed and have to code custom synchronization primitives in assembly
language, turn link time optimizations off, and use external function
declarations so they are accessible to a C program.

So, I want volatile to only inhibit certain compiler optimizations. I do not
want volatile to automatically stick in any membars!

;^o
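
For what it's worth, the knob Chris is asking for is roughly what C11's
<stdatomic.h> later provided: the programmer, not the compiler, picks the
ordering paid for at each access. A sketch, outside the thread's
C90/C99 context:

```c
#include <stdatomic.h>

static atomic_int flag;

/* No membar wanted: relaxed gives atomicity and visibility only,
   with no ordering cost. */
void set_relaxed(void) {
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* A membar wanted, but only release strength, and only at this store
   (what MSVC 8 imposes on every volatile store, here made opt-in). */
void set_release(void) {
    atomic_store_explicit(&flag, 1, memory_order_release);
}

int get_relaxed(void) {
    return atomic_load_explicit(&flag, memory_order_relaxed);
}
```

This addresses all three of Chris's complaints: no barrier when none is
needed, weaker barriers when they suffice, and barriers placed exactly
where the programmer wants them.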
 

Tim Rentsch

Chris M. Thomasson said:
[...]
To come at this a different way, let me ask it this way: which level
of communication/coherence (do you mean to say that) volatile should
support

(a) only same-thread, same core, same CPU, same node
(b) inter-thread, intra-core, intra-CPU, intra-node
(c) inter-thread, inter-core, intra-CPU, intra-node
(d) inter-thread, inter-core, inter-CPU, intra-node
(e) inter-thread, inter-core, inter-CPU, inter-node
(f) something else? (I didn't even mention intra/inter-process...)

I first thought you meant (a), but now I'm not so sure.
[...]

First of all I need to read your entire detailed response carefully in order
to give a complete response. However, I can answer the question above:

I choose `a'

I do not agree with the fact that MSVC automatically inserts memory barriers
on volatile accesses because it can create unnecessary overheads. What
happens if I don't need to use any membars at all, but still need to use
volatile? Well, the damn MSVC compiler will insert the membars right under
my nose. Also, what if I need a membar, but something not as strict as
store-release and load-acquire? Again, the MSVC compiler will force the more
expensive membars down my neck. Or, what if I need the membar, but in a
different place than the compiler automatically inserts them at? I am
screwed and have to code custom synchronization primitives in assembly
language, turn link time optimizations off, and use external function
declarations so they are accessible to a C program.

So, I want volatile to only inhibit certain compiler optimizations. I do not
want volatile to automatically stick in any membars!

;^o

Good, this makes clear (or at least mostly clear) what you want.
I also think I understand why you want it; not that that's
important necessarily, but to some degree the why clarifies the
what in this case.

At the same time, I think many other developers would prefer
other choices, including most of b-f above. It would be good to
support other choices also, perhaps through compiler options or
by using #pragma's. There isn't a single "right" choice for what
volatile should do -- it depends a lot on what kind of program is
being developed and on what assumptions hold for the environments
in which the program, or programs, will run. Ideally both the
development community and the implementation community will start
to realize this (or, realize it more fully). After that happens,
there needs to be a common language describing different possible
meanings for volatile -- more specifically, language more precise
than the kind of informal prose that's been used in the past --
so that developers and implementors can talk about the different
choices, and identify which choices are available in which
implementations.
 

karthikbalaguru

The Semantics of 'volatile'
===========================

I've been meaning to get to this for a while, finally there's a
suitable chunk of free time available to do so.

To explain the semantics of 'volatile', we consider several
questions about the concept and how volatile variables behave,
etc. The questions are:

1. What does volatile do?
2. What guarantees does using volatile provide? (What memory
regimes must be affected by using volatile?)
3. What limits does the Standard set on how using volatile
can affect program behavior?
4. When is it necessary to use volatile?

We will take up each question in the order above. The comments
are intended to address both developers (those who write C code)
and implementors (those who write C compilers and libraries).

What does volatile do?
----------------------

This question is easy to answer if we're willing to accept an
answer that may seem somewhat nebulous. Volatile allows contact
between execution internals, which are completely under control
of the implementation, and external regimes (processes or other
agents) not under control of the implementation. To provide such
contact, and provide it in a well-defined way, using volatile
must ensure a common model for how memory is accessed by the
implementation and by the external regime(s) in question.

Subsequent answers will fill in the details around this
high-level one.

What guarantees does using volatile provide?
--------------------------------------------

The short answer is "None." That deserves some elaboration.

Another way of asking this question is, "What memory regimes must
be affected by using volatile?" Let's consider some possibilities.
One: accesses occur not just to registers but to process virtual
memory (which might be just cache); threads running in the same
process affect and are affected by these accesses. Two: accesses
occur not just to cache but are forced out into the inter-process
memory (or "RAM"); other processes running on the same CPU core
affect and are affected by these accesses. Three: accesses occur
not just to memory belonging to the one core but to memory shared
by all the cores on a die; other processes running on the same CPU
(but not necessarily the same core) affect and are affected by
these accesses. Four: accesses occur not just to memory belonging
to one CPU but to memory shared by all the CPUs on the motherboard;
processes running on the same motherboard (even if on another CPU
on that motherboard) affect and are affected by these accesses.
Five: accesses occur not just to fast memory but also to some slow
more permanent memory (such as a "swap file"); other agents that
access the "swap file" affect and are affected by these accesses.

The different examples are intended informally, and in many cases
there is no distinction between several of the different layers.
The point is that different choices of regime are possible (and
I'm sure many readers can provide others, such as not only which
memory is affected but what ordering guarantees are provided).
Now the question again: which (if any) of these different
regimes are /guaranteed/ to be included by a 'volatile' access?

The answer is none of the above. More specifically, the Standard
leaves the choice completely up to the implementation. This
specification is given in one sentence in 6.7.3 p 6, namely:

What constitutes an access to an object that has
volatile-qualified type is implementation-defined.

So a volatile access could be defined as coordinating with any of
the different memory regime alternatives listed above, or other,
more exotic, memory regimes, or even (in the claims of some ISO
committee participants) no particular other memory regimes at all
(so a compiler would be free to ignore volatile completely)[*].
How extreme this range is may be open to debate, but I note that
Larry Jones, for one, has stated unequivocally that the possibility
of ignoring volatile completely is allowed under the proviso given
above. The key point is that the Standard does not identify which
memory regimes must be affected by using volatile, but leaves that
decision to the implementation.

A corollary to the above is that any volatile-qualified access
automatically introduces an implementation-defined aspect to a
program.

[*] Possibly not counting the specific uses of 'volatile' as it
pertains to setjmp/longjmp and signals that the Standard
identifies, but these are side issues.

What limits are there on how volatile access can affect program behavior?
-------------------------------------------------------------------------

More properly this question is "What limits does the Standard
impose on how volatile access can affect program behavior?".

Again the short answer is None. The first sentence in 6.7.3 p 6
says:

An object that has volatile-qualified type may be modified
in ways unknown to the implementation or have other unknown
side effects.

Nowhere in the Standard are any limitations stated as to what
such side effects might be. Since they aren't defined, the
rules of the Standard identify the consequences as "undefined
behavior". Any volatile-qualified access results in undefined
behavior (in the sense that the Standard uses the term).

Some people are bothered by the idea that using volatile produces
undefined behavior, but there really isn't any reason to be. At
some level any C statement (or variable access) might behave in
ways we don't expect or want. Program execution can always be
affected by peculiar hardware, or a buggy OS, or cosmic rays, or
anything else outside the realm of what the implementation knows
about. It's always possible that there will be unexpected
changes or side effects, in the sense that they are unexpected by
the implementation, whether volatile is used or not. The
difference is, using volatile interacts with these external
forces in a more well-defined way; if volatile is omitted, there
is no guarantee as to how external forces on particular parts
of the physical machine might affect (or be affected by) changes
in the abstract machine.

Somewhat more succinctly: using volatile doesn't affect the
semantics of the abstract machine; it admits undefined behavior
by unknown external forces, which isn't any different from the
non-volatile case, except that using volatile adds some
(implementation-defined) requirements about how the abstract
machine maps onto the physical machine in the external forces'
universe. However, since the Standard mentions unknown side
effects explicitly, such things seem more "expectable" when
volatile is used. (volatile == Expect the unexpected?)

When is it necessary to use volatile?
-------------------------------------

In terms of pragmatics this question is the most interesting of
the four. Of course, as phrased the question asked is more of a
developer question; for implementors, the phrasing would be
something more like "What requirements must my implementation
meet to satisfy developers who are using 'volatile' as the
Standard expects?"

To get some details out of the way, there are two specific cases
where it's necessary to use volatile, called out explicitly in
the Standard, namely setjmp/longjmp (in 7.13.2.1 p 3) and
accessing static objects in a signal handler (in 7.14.1.1 p 5).
If you're a developer writing code for one of these situations,
either use volatile, code around it so volatile isn't needed
(this can be done for setjmp), or be sure that the particular
code you're writing is covered by some implementation-defined
guarantees (extensions or whatever). Similarly, if you're an
implementor, be sure that using volatile in the specific cases
mentioned produces code that works; what this means is that the
volatile-using code should behave just like it would under
regular, non-exotic control structures. Of course, it's even
better if the implementation can do more than the minimum, such
as: define and document some additional cases for signal
handling code; make variable access in setjmp functions work
without having to use volatile, or give warnings for potential
transgressions (or both).

The two specific cases are easy to identify, but of course the
interesting cases are everything else! This area is one of the
murkiest in C programming, and it's useful to take a moment to
understand why. For implementors, there is a tension between
code generation and what semantic interpretation the Standard
requires, mostly because of optimization concerns. Nowhere is
this tension felt more keenly than in translating 'volatile'
references faithfully, because volatile exists to make actions in
the abstract machine align with those occurring in the physical
machine, and such alignment prevents many kinds of optimization.
To appreciate the delicacy of the question, let's look at some
different models for how implementations might behave.

The first model is given as an Example in 5.1.2.3 p 8:

EXAMPLE 1 An implementation might define a one-to-one
correspondence between abstract and actual semantics: at
every sequence point, the values of the actual objects would
agree with those specified by the abstract semantics.

We call this the "White Box model". When using implementations
that follow the White Box model, it's never necessary to use
volatile (as the Standard itself points out: "The keyword
volatile would then be redundant.").

At the other end of the spectrum, a "Black Box model" can be
inferred based on the statements in 5.1.2.3 p 5. Consider an
implementation that secretly maintains "shadow memory" for all
objects in a program execution. Regular memory addresses are
used for address-taking or index calculation, but any actual
memory accesses would access only the shadow memory (which is at
a different location), except for volatile-qualified accesses
which would load or store objects in the regular object memory
(ie, at the machine addresses ...


One of the biggest analysis of 'volatile' i have come across :):)

Karthik Balaguru
 

Keith Thompson

karthikbalaguru said:
The Semantics of 'volatile'
===========================

I've been meaning to get to this for a while, finally there's a
suitable chunk of free time available to do so.
[199 lines deleted]

One of the biggest analysis of 'volatile' i have come across :):)

Why did you feel the need to re-post the whole thing just to add a
fairly meaningless one-line comment?
 

Richard Bos

One of the biggest analysis of 'volatile' i have come across :):)

And you _had_ to quote it in its entirety, adding nothing but that
trivial remark, for _what_ reason?

Furrfu.

Richard
 

CBFalconer

Keith said:
.... snip ...
I've been meaning to get to this for a while, finally there's a
suitable chunk of free time available to do so.
[199 lines deleted]

One of the biggest analysis of 'volatile' i have come across :)

Why did you feel the need to re-post the whole thing just to add
a fairly meaningless one-line comment?

Agreed. Why do people do these silly things? Who said 'They do it
only to annoy' in Alice in Wonderland?
 

David Brown

CBFalconer said:
Keith said:
... snip ...
I've been meaning to get to this for a while, finally there's a
suitable chunk of free time available to do so. [199 lines deleted]
One of the biggest analysis of 'volatile' i have come across :)
Why did you feel the need to re-post the whole thing just to add
a fairly meaningless one-line comment?

Agreed. Why do people do these silly things? Who said 'They do it
only to annoy' in Alice in Wonderland?

I believe it was the cook, in her song about sneezing children (from
memory, so it might not be word-perfect):

Speak roughly to your little boy,
And beat him when he sneezes,
He only does it to annoy
Because he knows it teases.
 

Boudewijn Dijkstra

Op Tue, 23 Jun 2009 11:28:33 +0200 schreef David Brown
CBFalconer said:
Keith said:
... snip ...
I've been meaning to get to this for a while, finally there's a
suitable chunk of free time available to do so.
[199 lines deleted]
One of the biggest analysis of 'volatile' i have come across :)
Why did you feel the need to re-post the whole thing just to add
a fairly meaningless one-line comment?
Agreed. Why do people do these silly things? Who said 'They do it
only to annoy' in Alice in Wonderland?

I believe it was the cook, in her song about sneezing children (from
memory, so it might not be word-perfect):

Speak roughly to your little boy,
And beat him when he sneezes,
He only does it to annoy
Because he knows it teases.

Word-perfect, but failed on punctuation.

http://en.wikisource.org/wiki/Alice's_Adventures_in_Wonderland/Chapter_6
 
