Volatile variables


srinivas reddy

Hi,
Is there any chance that a program doesn't work properly even after a
variable is declared as volatile? I remember somebody mentioning a
scenario involving L1, L2 caches. Could anybody throw some light on
this?

Thanks,
Srinivas
 

Martin Dickopp

Hi,
Is there any chance that a program doesn't work properly even after a
variable is declared as volatile? I remember somebody mentioning a
scenario involving L1, L2 caches. Could anybody throw some light on
this?

If the program is incorrect to start with, there is a large (usually
close to one) chance that it will continue not to work properly if it
is changed so that some variables are volatile-qualified. OTOH, if a
correct program is modified in this way, it will always continue to
work, although it might run slower.

Martin
 

The Real OS/2 Guy

Hi,
Is there any chance that a program doesn't work properly even after a
variable is declared as volatile? I remember somebody mentioning a
scenario involving L1, L2 caches. Could anybody throw some light on
this?

Sure. The programmer can produce bugs in his code. There is no
guarantee that a program is 100% error-free.
 

Dan Pop

srinivas reddy said:
Is there any chance that a program doesn't work properly even after a
variable is declared as volatile?

If a program was correct before declaring anything as volatile, it will
keep being correct after and will produce the same output, unless the
output depends on unspecified behaviour. The only purpose of volatile is
to dumb down the compiler, WRT certain optimisations otherwise allowed by
the language.

OTOH, there are programs that aren't correct in the absence of the
volatile qualifier and that are fixed this way. Here's an example:

#include <stdio.h>
#include <signal.h>

sig_atomic_t gotsig = 0;

void handler(int signo)
{
    gotsig = signo;
}

int main()
{
    signal(SIGINT, handler);
    puts("Press the interrupt key to exit.");
    while (gotsig == 0) ;
    printf("The program received signal %d.\n", (int)gotsig);
    return 0;
}

Because gotsig is not volatile, the compiler is free to "optimise" the

while (gotsig == 0) ;

loop to:

if (gotsig == 0) while (1) ;

since it sees no way the value of gotsig can change inside the loop.

If gotsig is volatile, the compiler must assume that its value can change
behind its back and keep testing its value.
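
In other words, the fix is a one-word change to the declaration:

    volatile sig_atomic_t gotsig = 0;  /* the compiler must now re-read gotsig on every iteration */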

But this doesn't mean that it's worth trying to fix broken programs by
randomly throwing in volatile qualifiers. As usual, there is no
substitute for knowing what you're doing.

I remember somebody mentioning a
scenario involving L1, L2 caches. Could anybody throw some light on
this?

Whoever mentioned such a scenario was heavily confused and in dire need
of a clue. The abstract C machine has no caching, therefore caching is
irrelevant to the correct behaviour of a C program.

Another typical example of using volatile is when writing memory testing
programs. Such programs often write values in memory and then read them
back and compare with the original. To the compiler, it is obvious what
the result of the comparison should be, so it can optimise all the memory
testing away and declare that the memory works correctly. To force the
writing to and the reading from memory, you *have* to use a pointer to
volatile data.
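
For illustration only, here is a minimal sketch of such a test. The base
address and size are made-up placeholders; a real test would take them from
the hardware description.

#include <stddef.h>

/* Hypothetical region to test -- base address and length are assumptions. */
#define TEST_BASE ((volatile unsigned char *)0x100000)
#define TEST_SIZE 4096

int test_memory(void)
{
    volatile unsigned char *p = TEST_BASE;
    size_t i;

    for (i = 0; i < TEST_SIZE; i++) {
        p[i] = (unsigned char)(i & 0xFF);        /* forced write through volatile pointer */
        if (p[i] != (unsigned char)(i & 0xFF))   /* forced re-read; cannot be optimised away */
            return 0;                            /* mismatch: memory is bad */
    }
    return 1;                                    /* every location read back correctly */
}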

Dan
 

Chris Torek

Dan Pop said:
srinivas reddy writes: [snippage]
I remember somebody mentioning a scenario involving L1, L2 caches.
Could anybody throw some light on this?

Whoever mentioned such a scenario was heavily confused and in dire need
of a clue. The abstract C machine has no caching, therefore caching is
irrelevant to the correct behaviour of a C program.

I would not go so far as to say *this*. The second sentence is
true but does not imply the first, because the program might not
be written in Standard C after all.

In particular, one place one commonly abandons Standard C in order
to get actual work done :) on real machines has to do with device
drivers, where the "volatile" keyword is also heavily used. Device
drivers tend to "do I/O" (reading input and generating output is
often required to get work done), and some machines provide fast
I/O methods ("DMA" and the like) that completely bypass the CPU.

If a CPU has an on-chip cache[1], and if DMA bypasses the CPU
entirely[2], then DMA bypasses the on-chip cache. As it happens,
on-chip CPU caches generally come in one of two flavors, called
"write-through" and "write-back". In the case of a write-through
cache, DMA *output* (from memory to device) does not require any
special action, because data in the CPU cache is always also in
memory (this is the property that makes the cache "write-through").
When the device obtains the output-data from memory, it gets the
desired values. DMA *input*, however, has a problem; and with
write-back caches, even DMA output has the same problem: the data
in the CPU cache can differ from that in memory, before and/or
after the device's DMA transaction. To obtain correct co-operation
between the device and the CPU we use steps called "cache flushing".
(In the abstract model I implemented for BSD, we always do this
twice for every DMA transaction: one "pre-op" and one "post-op",
supplying flags as to whether the op is read, write, or both.)
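
As a rough sketch only (the helper names below are invented placeholders,
not an actual driver API), a DMA input transfer might be bracketed like this:

#include <stddef.h>

/* Hypothetical helpers -- names and signatures are placeholders for illustration. */
void dma_cache_sync_preop(void *buf, size_t len, int is_read, int is_write);
void dma_cache_sync_postop(void *buf, size_t len, int is_read, int is_write);
void device_start_dma_read(void *buf, size_t len);
void device_wait_dma_done(void);

void dma_read_example(void *buf, size_t len)
{
    dma_cache_sync_preop(buf, len, 1, 0);   /* "pre-op": prepare the cache for the transfer */
    device_start_dma_read(buf, len);        /* device writes into memory, bypassing the CPU cache */
    device_wait_dma_done();
    dma_cache_sync_postop(buf, len, 1, 0);  /* "post-op": discard stale cached copies before the CPU reads buf */
}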

Again, just as Dan Pop said, all of this is outside the model we
use in ANSI/ISO C (the "abstract machine") -- but it does occur in
"real world" C programming, in a place where the "volatile" keyword
is used quite a lot.

[1] Most do these days; some have multiple levels of on-chip cache.

[2] Some do, some do not; some CPUs even have bugs in the DMA
snooping hardware. Sometimes some DMA goes through some caches
and bypasses others. Some of the more byzantine architectures have
multiple levels of I/O adapters, which have their own memory-interaction
issues. Making devices "upcall" to their adapters to announce
"intent to do I/O" and "finished doing I/O" removes all a lot of
"hair" from the drivers; the adapters do any setup or teardown
required and continue to push the call up their own chain until it
reaches a level that is "all-knowing".
 

Carsten Hansen

srinivas reddy said:
Hi,
Is there any chance that a program doesn't work properly even after a
variable is declared as volatile? I remember somebody mentioning a
scenario involving L1, L2 caches. Could anybody throw some light on
this?

Thanks,
Srinivas

Well, your post is a little vague.

One issue I have seen in the past is the following.

volatile char * p = addr1;
volatile char * q = addr2;

*p = 0x01;
*q = 0x00;

or

x = *p;
y = *q;

If addr1 and addr2 are addresses on some hardware device, the code may only
work correctly if the accesses happen in the specified order.
Is the compiler required to insert code (eieio on some architectures) to
ensure that the two memory locations are accessed in the order required by
the C standard (there is a sequence point between the two statements)?
Modern architectures with multiple caches typically allow the above reads or
writes to happen in any order (more accurately, they don't guarantee the
order in which they happen).

The C99 Standard says in 5.1.2.3: "At sequence points, volatile objects are
stable in the sense that previous accesses are complete and subsequent
accesses have not yet occurred."

Personally, I believe a compiler is required to insert an eieio. But I have
seen some that do not.
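
For illustration only, here is a sketch of forcing the ordering by hand when
the compiler does not, assuming a PowerPC target and a GCC-style compiler;
the addresses are made up:

#include <stdint.h>

/* Hypothetical device register addresses -- placeholders for illustration. */
volatile uint8_t *p = (volatile uint8_t *)0xF0000000;
volatile uint8_t *q = (volatile uint8_t *)0xF0000004;

void write_in_order(void)
{
    *p = 0x01;
    __asm__ __volatile__("eieio" ::: "memory");  /* enforce in-order execution of I/O */
    *q = 0x00;
}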


Carsten Hansen
 

Chris Torek

Personally, I believe a compiler is required to insert an eieio. But I have
seen some that do not.

While EIEIO instructions are specific to PowerPC architectures, in
general I am not sure I would *want* such instructions for all
"volatile" accesses. In particular, compare the loops in:

struct software_state *ss;
struct hardware_register *reg;

ss->polling = 1;
while (ss->ready == 0)
    continue;

reg->cmd = RESET;
while (reg->csr & BUSY)
    continue;

(and assume a single CPU). In the first case, the same CPU will be
reading and writing the "polling" and "ready" fields, and there is
no need for an I/O synchronization instruction. In the second case,
the hardware will be controlling the BUSY bit based on the command,
and there is a need for an I/O synchronization instruction.

I do not remember offhand whether EIEIO in particular is a privileged
instruction, but if it is -- and some architectures do have protected
"I/O flush" instructions -- and the compiler always generated it
for every "volatile" access, one could not even write proper ANSI
C "volatile sig_atomic_t" code. If the compiler-writer makes
"volatile" do everything, it often does too much; if not, it often
does too little. "Too little", however, can be augmented, while
"too much" is hard to undo afterward. :)
 

srinivas reddy

I think I didn't phrase my question properly. My program needs a volatile
variable to work correctly, and there are no bugs in the code. Now, even if a
variable is declared as volatile, could the program fail to read the modified
value of that volatile variable? Would the compiler keep the volatile variable
in a cache? If so, then two processes cache the same volatile variable at two
different locations; wouldn't one process be unable to see the other process's
modification unless the variable is flushed from cache to memory (the cache
uses a write-back policy, so memory doesn't have the updated value)?
 

Carsten Hansen

Chris Torek said:
While EIEIO instructions are specific to PowerPC architectures, in
general I am not sure I would *want* such instructions for all
"volatile" accesses. In particular, compare the loops in:

struct software_state *ss;
struct hardware_register *reg;

ss->polling = 1;
while (ss->ready == 0)
continue;

reg->cmd = RESET;
while (reg->csr & BUSY)
continue;

(and assume a single CPU). In the first case, the same CPU will be
reading and writing the "polling" and "ready" fields, and there is
no need for an I/O synchronization instruction. In the second case,
the hardware will be controlling the BUSY bit based on the command,
and there is a need for an I/O synchronization instruction.

I do not remember offhand whether EIEIO in particular is a privileged
instruction, but if it is -- and some architectures do have protected
"I/O flush" instructions -- and the compiler always generated it
for every "volatile" access, one could not even write proper ANSI
C "volatile sig_atomic_t" code. If the compiler-writer makes
"volatile" do everything, it often does too much; if not, it often
does too little. "Too little", however, can be augmented, while
"too much" is hard to undo afterward. :)


In the code

    reg->cmd = RESET;
    while (reg->csr & BUSY)
        continue;

how do you guarantee that the read from reg->csr happens after the write to
reg->cmd at the physical level without some I/O synchronization?

It is not the CPU's load and store instructions that do the actual read and
write to physical memory.
The way I see it, you issue a write to one memory location, then you do a read
from a different memory location. Since the two locations are different, those
accesses can happen in any order on a modern architecture.
But if you are actually controlling hardware, the order can be essential, as
I'm sure you are aware.

I agree that you don't need it in all instances. And since it causes a
performance penalty, it is kind of contrary to the spirit of C.


Carsten Hansen
 

Jack Klein

On 27 Feb 2004 18:16:45 -0800, srinivas reddy wrote in comp.lang.c:

Do not top post, it makes discussions in technical groups very
difficult to follow and is considered rude. New material you add goes
after the relevant material you are quoting.
I think I didn't phrase my question properly. My program needs a volatile
variable to work correctly, and there are no bugs in the code. Now, even if a
variable is declared as volatile, could the program fail to read the modified
value of that volatile variable? Would the compiler keep the volatile variable
in a cache? If so, then two processes cache the same volatile variable at two
different locations; wouldn't one process be unable to see the other process's
modification unless the variable is flushed from cache to memory (the cache
uses a write-back policy, so memory doesn't have the updated value)?

The following is a list of things that are neither defined nor
supported by the C standard:

- cache

- process

If your system provides these extensions and they cause a C program to
behave in a non-conforming manner, file a defect report with the
compiler vendor. Otherwise ask about these issues in a
compiler-specific support group. This is not a language question.
 

Chris Torek

(this is getting off-topic and probably belongs somewhere like
comp.arch.embedded...)

In the code
reg->cmd = RESET;
while (reg->csr & BUSY)
continue;
how do you guarantee that the read from reg->csr happens after the write to
reg->cmd at the physical level without some I/O synchronization? ...
But if you are actually controlling hardware, the order can be essential as
I'm sure you are aware of.

Indeed.

On the V9 SPARC, rather than a single EIEIO instruction, the machine
offers a generalized MEMBAR ("memory barrier") instruction, which
takes operands. Memory barriers come in four "memory" flavors --
load/load, load/store, store/load, and store/store -- and several
additional forms, named "memissue" and "sync". For the above case,
one needs only a single "store/load" barrier between the write to
the command register and the read from the status register.

The effect of a "store/load" barrier is, roughly, "any loads
following the barrier may not be moved above any stores that preceded
the barrier." Stores before the barrier may be rearranged and
write-combined, however. Thus, *before* writing the command
register, we also need one *more* barrier if the device might refer
to memory (most likely a "memissue").

The one instruction that "always works" is "membar #sync", which
does a full CPU pipeline flush and empties the write aggregation
machinery entirely. This is, however, a hugely expensive instruction,
to be used only when absolutely necessary. Less-expensive barriers
("membar #StoreLoad|StoreStore", if I remember right offhand)
suffice for most cases. (They do not suffice for cases that change the
CPU mode, i.e., you need a #sync for fussing about with certain
internal CPU registers. Then again, much of this is either entirely
assembly-coded, or requires heavy use of inlined assembly, and one
can insert the membar directly there.) In so-called "total store
order" -- TSO -- the CPU implicitly does such a memory barrier for
you after each instruction. In "partial store order" and "relaxed
memory order" models -- which are selectable with those internal
CPU registers -- the CPU will (or is supposed to) run faster, and
most existing (barrierless) device drivers will run fine in TSO.
As one converts device drivers, one can allow them to run in PSO
or RMO provided they have the correct "membar"s inserted.

We never got around to doing this for BSD/OS, but the general plan
was to abstract away the details by having device drivers use macros
when "talking to" device registers. (The hardware also provides
special bits in the MMUs to mark "device register" pages as having
"abnormal" memory semantics, so that even less code would have to
change -- but as the V9 architecture appendices note, these only
help in certain situations; some devices will still run into
write-combiner problems.)
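
A rough sketch of what such macros might look like (the names are invented
for illustration; this assumes a GCC-style compiler on V9 SPARC, and the
right barrier mask depends on the driver):

/* Hypothetical device-register access macros -- names are placeholders. */
#define REG_READ(regp)        (*(volatile unsigned int *)(regp))

#define REG_WRITE(regp, val)                                          \
    do {                                                              \
        *(volatile unsigned int *)(regp) = (val);                     \
        __asm__ __volatile__("membar #StoreLoad | #StoreStore"        \
                             ::: "memory");                           \
    } while (0)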

(Much of the above hair, with different kinds of synchronization
instructions, comes about today because gigahertz CPU clock speeds
produce sub-nanosecond instruction timings, while I/O devices,
including on-board registers, often have response times best measured
in milliseconds. Five years ago it might take several hundred CPU
clock cycles to talk to a floppy device register; today it may take
thousands. In this race between the tortoise and the hare, the
hare must constantly stop and wait for the tortoise to catch up.
Main memory is also something of a "tortoise", with 150 ns cycle
times being more than 150 instruction times -- as many as 1200 on
a 4 GHz CPU running at two instructions per cycle. Main memory
speeds are more important, even though device speeds are more
shocking, because main memory is used so often, relatively speaking.)

(Incidentally, mainframes dealt with similar problems in the 1960s
and 1970s. Given that Computing Science folks never seem to study
their own history, I expect these same ideas will all be reinvented
soon. :) )
 

Dan Pop

Chris Torek said:
srinivas reddy writes: [snippage]
I remember somebody mentioning a scenario involving L1, L2 caches.
Could anybody throw some light on this?

Whoever mentioned such a scenario was heavily confused and in dire need
of a clue. The abstract C machine has no caching, therefore caching is
irrelevant to the correct behaviour of a C program.

I would not go so far as to say *this*. The second sentence is
true but does not imply the first, because the program might not
be written in Standard C after all.

Then how do we know what its correct behaviour is?

In particular, one place one commonly abandons Standard C in order
to get actual work done :) on real machines has to do with device
drivers, where the "volatile" keyword is also heavily used. Device
drivers tend to "do I/O" (reading input and generating output is
often required to get work done), and some machines provide fast
I/O methods ("DMA" and the like) that completely bypass the CPU.

And, by the time we're dealing with these issues, our code might
look superficially like C, but it ain't. All kinds of "function" calls
are actually invocations of macros that expand into inline assembly or
even (more) obscure compiler features.

Yeah, we have to use such things to get real work done, but there is
little point in pretending that we're programming in C while doing it.

Dan
 
