Newbie question: accessing global variable on multiprocessor

A

amit

Hello friends,

If there's a global variable - to be accessed (read/written) by multiple
threads (on a multiprocessor), then any correctly implemented access (of
that variable) will cause a complete cache reload on all CPUs - is that
true or not? Anyway, what would be the cost (as compared to a single read/
write instruction)?

I'm not talking about locking here - I'm talking about all threads seeing
the most recent value of that variable.

Thanks,
 
A

amit

amit said:
Hello friends,

If there's a global variable - to be accessed (read/written) by multiple
threads (on a multiprocessor), then any correctly implemented access (of
that variable) will cause a complete cache reload on all CPUs - is that
true or not? Anyway, what would be the cost (as compared to a single read/
write instruction)?

I'm not talking about locking here - I'm talking about all threads
seeing the most recent value of that variable.

Thanks,

*bump*

anyone in this chatroom?
 
K

Keith Thompson

amit said:
*bump*

anyone in this chatroom?

This is not a chatroom, it's a newsgroup. If you got a response
within 19 minutes, you'd be very lucky. If you don't see anything
within a day or two, you can start to wonder. You can think of it,
very loosely, as a kind of distributed e-mail; people will see your
posts when they get around to checking the newsgroup, not as soon
as you send them. There are also some delays imposed by propagation
from one server to another.

Standard C doesn't support threads. (The draft of the new
standard adds threading support, but it won't be relevant to
programmers for quite a few years.) You'll get better answers
in comp.programming.threads. I suggest browsing that newsgroup's
archives and/or checking its FAQ first.
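
For anyone curious, the draft's threading interface (<threads.h>) looks
roughly like this -- a sketch only, assuming the thrd_create/thrd_join and
mtx_* names from the committee draft, which essentially no compiler ships
yet (the POSIX equivalents, pthread_create and friends, are the
comp.programming.threads territory):

#include <threads.h>   /* draft C1X header; its availability is assumed here */
#include <stdio.h>

static mtx_t lock;
static long counter;

static int worker(void *arg)          /* thrd_start_t is int (*)(void *) */
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        mtx_lock(&lock);              /* every access to counter is synchronized */
        counter++;
        mtx_unlock(&lock);
    }
    return 0;
}

int main(void)
{
    thrd_t t1, t2;
    mtx_init(&lock, mtx_plain);
    thrd_create(&t1, worker, NULL);
    thrd_create(&t2, worker, NULL);
    thrd_join(t1, NULL);
    thrd_join(t2, NULL);
    mtx_destroy(&lock);
    printf("counter = %ld\n", counter);
    return 0;
}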
 
R

Rui Maciel

amit said:
Hello friends,

If there's a global variable - to be accessed (read/written) by multiple
threads (on a multiprocessor), then any correctly implemented access (of
that variable) will cause a complete cache reload on all CPUs - is that
true or not? Anyway, what would be the cost (as compared to a single read/
write instruction)?

I'm not talking about locking here - I'm talking about all threads seeing
the most recent value of that variable.

You will get better replies if you post your question on a newsgroup dedicated to
parallel programming, such as comp.programming.threads.


Hope this helps,
Rui Maciel
 
B

BGB / cr88192

amit said:
Hello friends,

If there's a global variable - to be accessed (read/written) by multiple
threads (on a multiprocessor), then any correctly implemented access (of
that variable) will cause a complete cache reload on all CPUs - is that
true or not? Anyway, what would be the cost (as compared to a single read/
write instruction)?

I'm not talking about locking here - I'm talking about all threads seeing
the most recent value of that variable.

well, this is not strictly standard C, but a few things apply here:
the CPU is smart enough that it will not flush "all caches" on access, but will
instead only flush those cache lines which are relevant, and typically only on
write (for the other processors);
the functionality for this is built into the CPUs and the bus, so nothing
particularly special is needed.

note that, for shared variables, you would want to mark them 'volatile'
(this is a keyword which serves this purpose, among others). basically, this
just tells the compiler to read from and write changes directly to memory,
rather than letting them sit around in a register somewhere.

as for the cost, in itself it is usually fairly small.


there are also atomic/bus-locking operations, which are usually used for
implementing mutexes, but are not usually needed for most data structures.
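
(for example, with GCC one such operation is the non-standard
__sync_fetch_and_add builtin, which compiles to a lock-prefixed add on x86;
a sketch only, with an invented counter name:)

/* sketch: atomic increment via GCC's non-standard __sync builtins */
static volatile long shared_counter;

void bump_counter(void)
{
    __sync_fetch_and_add(&shared_counter, 1);   /* lock add / lock xadd on x86 */
}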
 
L

lacos

well, this is not strictly standard C, but a few things apply here:
the CPU is smart enough that it will not flush "all caches" on access, but will
instead only flush those cache lines which are relevant, and typically only on
write (for the other processors);
the functionality for this is built into the CPUs and the bus, so nothing
particularly special is needed.

note that, for shared variables, you would want to mark them 'volatile'
(this is a keyword which serves this purpose, among others). basically, this
just tells the compiler to read from and write changes directly to memory,
rather than letting them sit around in a register somewhere.

I sincerely believe that you're wrong. This is a very frequent fallacy
(I hope I'm using the right word). volatile in C has nothing to do with
threads. Volatile is what the standard defines it to be. See

http://www.open-std.org/JTC1/sc22/wg21/docs/papers/2006/n2016.html

The question is interesting and relevant (if perhaps not topical in this
newsgroup). I was waiting for somebody to give "amit" an answer. Off the
top of my head:

- The new C++ standard will have atomic<type> which seems to be exactly
what amit needs.

- The CPU is absolutely not smart enough to find out what you need. The
compiler and the CPU may jointly and aggressively reorder the machine
level loads and stores that one would naively think to be the direct
derivation of his/her C code. Memory barriers are essential and the
POSIX threads implementations do utilize them.

To substantiate (or fix) these claims, a few links:

http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/threadsintro.html
http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
http://bartoszmilewski.wordpress.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
http://bartoszmilewski.wordpress.com/2008/11/11/who-ordered-sequential-consistency/
http://bartoszmilewski.wordpress.com/2008/12/01/c-atomics-and-memory-ordering/

I'm obviously not in the position to give advice to anybody here.
Nonetheless, my humble suggestion for the interested is to read all of
the writings linked to above. For me personally, the conclusion was to
avoid both unsynchronized and not explicitly synchronized access like
the plague.
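
To make "explicitly synchronized" concrete: with POSIX threads the usual
pattern is simply to protect the global with a mutex. A minimal sketch
(g_value and g_lock are invented names):

#include <pthread.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_value;                    /* the shared global */

void set_value(int v)
{
    pthread_mutex_lock(&g_lock);       /* acquire */
    g_value = v;
    pthread_mutex_unlock(&g_lock);     /* release: the write becomes visible */
}

int get_value(void)
{
    int v;
    pthread_mutex_lock(&g_lock);
    v = g_value;
    pthread_mutex_unlock(&g_lock);
    return v;
}

POSIX requires memory to be synchronized by the lock and unlock calls
themselves, so g_value needs neither volatile nor explicit fences.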

Cheers,
lacos
 
N

Nobody

I sincerely believe that you're wrong. This is a very frequent fallacy
(I hope I'm using the right word). volatile in C has nothing to do with
threads. Volatile is what the standard defines it to be. See

Notably, the standard states that reading from a "volatile" variable is a
sequence point, while reading from non-volatile variables isn't.

The more significant issue is that a sequence point isn't necessarily what
people expect. The specification only describes the *abstract* semantics,
which doesn't have to match what actually occurs at the hardware level.

AFAIK, there are only two situations where you can say "if this variable
is declared "volatile", this code will behave in this way; if you omit the
qualifier, it's undefined or implementation-defined behaviour". One
case relates to setjmp()/longjmp(), the other to signal().

And even if the compiler provides the "assumed" semantics for "volatile"
(i.e. it emits object code in which read/write of volatile variables
occurs in the "expected" order), that doesn't guarantee that the processor
itself won't re-order the accesses.
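
A concrete instance of that last point is the classic "store buffer" test;
here is a sketch under pthreads, with invented names (x, y, r1, r2). Even
with volatile keeping the compiler honest, on x86 both r1 and r2 can
occasionally come out 0, because each CPU may let its store drift past its
subsequent load of the other variable:

#include <pthread.h>
#include <stdio.h>

static volatile int x, y, r1, r2;

static void *t1(void *arg) { (void)arg; x = 1; r1 = y; return NULL; }
static void *t2(void *arg) { (void)arg; y = 1; r2 = x; return NULL; }

int main(void)
{
    for (long i = 1; i <= 100000; i++) {
        pthread_t a, b;
        x = y = r1 = r2 = 0;
        pthread_create(&a, NULL, t1, NULL);
        pthread_create(&b, NULL, t2, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0) {        /* impossible under any interleaving
                                            of the source lines */
            printf("store/load reordering observed on run %ld\n", i);
            return 0;
        }
    }
    puts("not observed this time (it is allowed, not required)");
    return 0;
}

Preventing that outcome takes a full memory barrier (mfence, a lock-prefixed
instruction, or a pthread primitive), which plain volatile never inserts.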
 
F

Flash Gordon

Nobody said:
Notably, the standard states that reading from a "volatile" variable is a
sequence point, while reading from non-volatile variables isn't.

C&V? I don't think reading from a volatile is a sequence point.
The more significant issue is that a sequence point isn't necessarily what
people expect. The specification only describes the *abstract* semantics,
which doesn't have to match what actually occurs at the hardware level.

At this point, it is worth noting that there is a relationship between
volatile and sequence points. I believe the language for this is being
tidied up in the next version of the C standard, but since
reading/writing a volatile object is a side effect it has to be complete
before the sequence point.
AFAIK, there are only two situations where you can say "if this variable
is declared "volatile", this code will behave in this way; if you omit the
qualifier, it's undefined or implementation-defined behaviour". One
case relates to setjmp()/longjmp(), the other to signal().

For signal it needs to be volatile sig_atomic_t.
And even if the compiler provides the "assumed" semantics for "volatile"
(i.e. it emits object code in which read/write of volatile variables
occurs in the "expected" order), that doesn't guarantee that the processor
itself won't re-order the accesses.

However, it does have to document what it means by accessing a volatile,
and it should be possible to identify from this whether it prevents the
processor from reordering further down, whether it bypasses the cache etc.

In short, volatile seems like a sensible thing to specify on objects
accessed by multiple threads, but definitely is NOT guaranteed to be
sufficient, and may not be necessary. It's something where you need to
read the documentation for your implementation, and it may depend on
whether you have multiple cores on one processor, multiple separate
processors, and how the HW is designed.
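
For the signal() case mentioned above, the one guaranteed idiom is a flag of
type volatile sig_atomic_t -- a minimal sketch:

#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int sig)
{
    (void)sig;
    got_signal = 1;        /* assigning to a volatile sig_atomic_t is one of
                              the few things a handler may portably do */
}

int main(void)
{
    signal(SIGINT, handler);
    while (!got_signal) {
        /* normal work would go here */
    }
    puts("caught the signal");
    return 0;
}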
 
N

Nobody

C&V? I don't think reading from a volatile is a sequence point.

Ugh; sorry. Reading from a volatile is a *side-effect*, which must not
occur before the preceding sequence point and must have occurred by the
following sequence point. 5.1.2.3 p2 and p6.
However, it does have to document what it means by accessing a volatile,
and it should be possible to identify from this whether it prevents the
processor from reordering further down, whether it bypasses the cache etc.

Easier said than done. The object code produced by a compiler may
subsequently be run on a wide range of CPUs, including those not invented
yet. The latest x86 chips will still run code which was generated for a
386.
 
F

Flash Gordon

Nobody said:
On Sun, 10 Jan 2010 11:06:45 +0000, Flash Gordon wrote:


Easier said than done. The object code produced by a compiler may
subsequently be run on a wide range of CPUs, including those not invented
yet. The latest x86 chips will still run code which was generated for a
386.

If the compiler does not claim to support processors not yet invented
then that is not a problem. You can't blame a compiler (or program) if
it fails for processors which are not supported even if the processor is
theoretically backwards compatible.
 
G

gwowen

I sincerely believe that you're wrong. This is a very frequent fallacy
(I hope I'm using the right word). volatile in C has nothing to do with
threads.

Well, nothing in C has anything to do with threads. However, since a
C compiler may assume a piece of code is single-threaded, it's often
the case that the compiler will optimize away operations on a non-
volatile global variable that another thread changes behind its back.
As such it's often necessary (but NOT sufficient) to declare such
variables as volatile.

Suppose the following two bits of code are running concurrently:

------------------------
/* volatile */ unsigned int flag = 0;

void function_wait_for_flag(void)
{
    while (flag == 0) { /* spin until the other thread sets flag */ }
    do_some_parallel_processing();
    return;
}
--------------------------
extern unsigned int flag;

void do_processing(void)
{
    do_non_parallel_processing();
    flag = 1;
    do_some_other_parallel_processing();
}
--------------------------

You can see that that's a very simplistic way to parallelize a bit of
processing. Note that, since flag is not declared volatile, the
compiler may happily decide that flag is always zero and turn your
function into:

void function_wait_for_flag(void)
{
    while (1) { }   /* the read of flag has been optimized away */
}

Yoinks!

Of course, it's a busy-wait, and it's terrible style, and there are
better ways to implement it, and it's nearly always better to use real
threading primitives, like pthread supplies, rather than faking them
with volatile variables.

But with volatile it works, and without, it may not.

Threads change variables behind the compiler's back -- volatile can act
as a warning that that might happen.
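
For comparison, the same hand-off done with the real primitives -- a sketch
using a pthread mutex and condition variable; the do_*_processing functions
are the ones from the example above:

#include <pthread.h>

void do_some_parallel_processing(void);
void do_non_parallel_processing(void);
void do_some_other_parallel_processing(void);

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static unsigned int flag = 0;

void function_wait_for_flag(void)
{
    pthread_mutex_lock(&lock);
    while (flag == 0)                  /* loop also guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    do_some_parallel_processing();
}

void do_processing(void)
{
    do_non_parallel_processing();
    pthread_mutex_lock(&lock);
    flag = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    do_some_other_parallel_processing();
}

Note that flag needs no volatile here: the mutex supplies both the
visibility and the ordering.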
 
B

BGB / cr88192

I sincerely believe that you're wrong. This is a very frequent fallacy
(I hope I'm using the right word). volatile in C has nothing to do with
threads. Volatile is what the standard defines it to be. See

http://www.open-std.org/JTC1/sc22/wg21/docs/papers/2006/n2016.html

The question is interesting and relevant (if perhaps not topical in this
newsgroup). I was waiting for somebody to give "amit" an answer. Off the
top of my head:

- The new C++ standard will have atomic<type> which seems to be exactly
what amit needs.

- The CPU is absolutely not smart enough to find out what you need. The
compiler and the CPU may jointly and aggressively reorder the machine
level loads and stores that one would naively think to be the direct
derivation of his/her C code. Memory barriers are essential and the
POSIX threads implementations do utilize them.

To substantiate (or fix) these claims, a few links:

http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/threadsintro.html
http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/
http://bartoszmilewski.wordpress.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
http://bartoszmilewski.wordpress.com/2008/11/11/who-ordered-sequential-consistency/
http://bartoszmilewski.wordpress.com/2008/12/01/c-atomics-and-memory-ordering/

I'm obviously not in the position to give advice to anybody here.
Nonetheless, my humble suggestion for the interested is to read all of
the writings linked to above. For me personally, the conclusion was to
avoid both unsynchronized and not explicitly synchronized access like
the plague.

what something is defined as and how it is used are not always strictly the
same...

AFAIK, it is a common understanding among compiler implementors that
volatile should also be made to behave in a thread-friendly way, even though
it is not specified for this purpose.


similarly, as for load/store ordering with different variables:
how often does this actually matter in practice?...

granted, I can't say about non-x86 CPUs, but in general, on x86, everything
tends to work just fine simply using volatile for most variables which may
be involved in multi-thread activity.

a relative rarity, as in my case threads most often act independently and on
different data (and in the cases where they do share data, it is either
fully synchronized, or almost entirely non-synchronized, with one thread not
having any real assurance WRT data being handled in other threads).


granted, fully-synchronous/fenced operations are generally used in special
conditions, such as for locking and unlocking mutexes, ...
 
B

BGB / cr88192

I sincerely believe that you're wrong. This is a very frequent fallacy
(I hope I'm using the right word). volatile in C has nothing to do with
threads.

snip...


<--
But with volatile it works, and without, it may not.

Threads change variables behind the compilers back -- volatile can act
as a warning that that might happen.
-->

and I think most compiler writers already know this one implicitly...

beyond threading, volatile has little use in user-mode applications, so it
is essentially "re-dubbed" as an implicit "make variable safe for threads"
operation (possibly inserting memory fences, ... if needed).

all this is because, sometimes, we compiler writers don't exactly care what
the standards say, and so may re-interpret things in some subtle
ways to make them useful.


this may mean:
volatile synchronizes memory accesses and may insert fences (although, as
noted, the x86/x86-64 ISA is usually smart enough to make this unneeded);
non-volatile variables are safe for all sorts of thread-unsafe trickery (as,
after all, if thread synchronization mattered for them they would have been
volatile);
....

as well as other subtleties:
pointer arithmetic on 'void *' working without complaint;
free casting between function pointers and data pointers;
....

as well, there may be restrictions for an arch above the level of the
standard:
for example, given structure definitions must be laid out in particular
ways, and apps may depend on the specific size and byte-level layout of
structures;
apps may depend on underlying details of the calling convention, stack
layout, register-allocation behavior, ...
....


a standards head will be like "no, code may not depend on this behavior",
"the compiler may do whatever it wants", ...

in reality, it is usually much more confined than this:
if the compiler varies on much of any of these little subtle details,
existing legacy code may break, ...


of course, this may lead to code getting "stuck" for a while, and when a
major change finally happens, it breaks a lot of code...

it is notable how much DOS-era C code doesn't work on Windows, or for that
matter, how lots of Win32 code will not work on Win64 even despite some of
the ugliness MS went through to try to make the transition go smoothly...


or such...
 
B

BGB / cr88192

Flash Gordon said:
If the compiler does not claim to support processors not yet invented then
that is not a problem. You can't blame a compiler (or program) if it fails
for processors which are not supported even if the processor is
theoretically backwards compatible.

if a processor claims to be "backwards compatible" yet old code often breaks
on it, who takes the blame?...
that is right, it is the manufacturer...

it is worth noting the rather large numbers of hoops Intel, MS, ... have
gone through over the decades to make all this stuff work, and keep
working...

it is only the great sudden turn of events that MS dropped Win16 and MS-DOS
support from Win64, even though technically there was little "real" reason
for doing so (lacking v86 and segments in long mode to me seems more like an
excuse, as MS does demonstrably have the technology to just use an
interpreter...).

AMD could partly be blamed for their design decisions, but I guess they
figured "well, probably the OS will include an emulator for this old
stuff...".


the end result is that it is then forced on the user to go get and use an
emulator for their older SW, which works, but from what I have heard, there
are probably at least a few other unhappy users around from the recent turn
of events...

it doesn't help that even lots of 32-bit SW has broken on newer Windows, due
I suspect to MS no longer really caring so much anymore about legacy
support...

 
C

Chris M. Thomasson

amit said:
Hello friends,

If there's a global variable - to be accessed (read/written) by multiple
threads (on a multiprocessor), then any correctly implemented access (of
that variable) will cause a complete cache reload on all CPUs - is that
true or not? Anyway, what would be the cost (as compared to a single read/
write instruction)?

I'm not talking about locking here - I'm talking about all threads seeing
the most recent value of that variable.

http://groups.google.com/group/comp.arch/browse_frm/thread/df6f520f7af13ea5
(read all...)
 
F

Flash Gordon

BGB said:
if a processor claims to be "backwards compatible" yet old code often breaks
on it, who takes the blame?...
that is right, it is the manufacturer...

it is worth noting the rather large numbers of hoops Intel, MS, ... have
gone through over the decades to make all this stuff work, and keep
working...

<snip>

Not successfully. I used programs that worked on a 286 PC but failed on
a 386 unless you switched "Turbo mode" off. This was nothing to do with
the OS.
it doesn't help that even lots of 32-bit SW has broken on newer Windows, due
I suspect to MS no longer really caring so much anymore about legacy
support...

It ain't all Microsoft's fault. Also, there are good technical reasons
for dropping support of ancient interfaces.
 
B

BGB / cr88192

Flash Gordon said:
<snip>

Not successfully. I used programs that worked on a 286 PC but failed on a
386 unless you switched "Turbo mode" off. This was nothing to do with the
OS.

on DOS, yes, it is the HW in this case...

It ain't all Microsoft's fault. Also, there are good technical reasons for
dropping support of ancient interfaces.

yeah, AMD prompted it with a few of their changes...


but, MS could have avoided the problem by essentially migrating both NTVDM
and DOS support into an interpreter (which would itself provide segmentation
and v86).

a lot of the rest of what was needed (to glue the interpreter to Win64) was
likely already implemented in getting WoW64 working, ...


this way, we wouldn't have been stuck needing DOSBox for software from
decades-past...

if DOSBox can do it, MS doesn't have "that" much excuse, apart from maybe
that they can no longer "sell" all this old software, so for them there is
not as much market incentive to keep it working...
 
N

Nobody

if a processor claims to be "backwards compatible" yet old code often breaks
on it, who takes the blame?...
that is right, it is the manufacturer...

it is worth noting the rather large numbers of hoops Intel, MS, ... have
gone through over the decades to make all this stuff work, and keep
working...

It's also worth noting that they know when to give up.

If maintaining compatibility just requires effort (on the part of MS or
Intel), then usually they make the effort. If it would require a
substantial performance sacrifice (i.e. complete software emulation),
then tough luck.
 
B

BGB / cr88192

Nobody said:
It's also worth noting that they know when to give up.

If maintaining compatibility just requires effort (on the part of MS or
Intel), then usually they make the effort. If it would require a
substantial performance sacrifice (i.e. complete software emulation),
then tough luck.

for the DOS or Win 3.x apps, few will notice the slowdown, as these apps
still run much faster in the emulator than on the original HW...

an emulator would also not slow down things not running in it:
DOS and Win 3.x apps would be "slowed" (still much faster than original),
whereas 32-bit apps run at full speed directly on the HW.

(I have also written the same sort of interpreter, and it is not exactly a
huge or difficult feat).


so, as I see it, there was little reason for them not to do this, apart from
maybe a lack of economic payoff (they can't make lots of money off of
having peoples' Win 3.x era stuff keep working natively...).

DOSBox works plenty well for DOS, but I have found DOSBox running Win3.11 to
be kind of lame (the primary reason being that DOSBox+Win3.11 means
poor-FS-sharing, one usually ends up having to exit Win3.11 to sync files,
....).

I had partly considered doing my own Win3.x on Win64 emulator, but I figured
this would be more effort than it is probably worth (I don't need good
integration that badly, but it would be nice...).

unsurprisingly, there does not seem to be a Windows port of Wine (but Wine
itself has the ability to make use of emulation...).


although, FWIW, Win 3.11 on DOSBox does seem a bit like a little toy OS,
almost like one of those little gimmick OS's that people put within some
games... (content simulated, and only a small number of things to look at,
....). except, this was the OS...
 
