I have a guess (it would be good if someone on the standards committee
confirmed or denied it):
Let's say thread 1 calculates an object o1.
Then thread 1 calculates an object o2 using object o1 (that is, o1
"carries a dependency" to o2).
Then thread 1 releases o2.
Then (in the sense of "afterwards") thread 2 either acquires or
consumes o2.
The difference is that if thread 2 acquires o2, it is only guaranteed
to see the value of o2 as calculated by thread 1, but not o1.
Otherwise, if thread 2 consumes o2, it is guaranteed to see both o1
and o2 as calculated by thread 1.
In hardware terms this may mean that an implementation is required to
execute a read memory barrier for o2 only if an "acquire" operation is
used, and for o2 and all of its "dependency carriers" if "consume" is
used.
-Pavel
PS. Again, this was just a not-so-educated guess (the lock-free
approach has not worked too well for me so far; "hybrid" primitives
like modern implementations of POSIX mutexes or Linux futexes have
seemed to do better. They relieve me from thinking too hard when I
need to switch from polling atomics to waiting on a lock and back...
atomically (sounds like a catch-22, which it probably is). I am not
sure how or whether such hybrids can be programmed in the C++0x
threading model and would be interested to learn people's thoughts on
this).
I'm not sure I have it quite correct either, but I think that Pavel is
mistaken.
From my understanding, here is a simple example which should prove
enlightening:
// Forgive my pseudo-code-like stuff; I don't have access to an actual
// compiler at the moment.

// initial conditions
int x = 0;
int y = 0;
std::atomic<int*> z(0);

// all threads started concurrently from the initial conditions

// thread 1
x = 1;
y = 2;
z.store(&y, std::memory_order_release);

// thread 2
int* p = z.load(std::memory_order_acquire);
if (p != 0)
    std::cout << x << " " << *p << std::endl;

// thread 3
int* p = z.load(std::memory_order_consume);
if (p != 0)
    std::cout << x << " " << *p << std::endl;

// end example
Now, if I understand this correctly: if thread 2 prints something, it
will print "1 2". The acquire read of z in thread 2 reads a release
write of thread 1, so there exists a happens-before relationship
between those two memory actions; thus the subsequent read of x in
thread 2 must see the earlier write to x in thread 1.
With thread 3, I'm not sure of the exact particulars - it might have a
race condition. Let's suppose that the consume read of z in thread 3
reads the release write of z in thread 1. This creates what the
standard calls a "dependency-ordered before" relationship. Unlike the
acquire-release pairing, which guarantees a full happens-before, the
consume-release pairing orders only those reads that are
data-dependent, directly or indirectly, on the value returned by the
consume load. The read of x in thread 3 does not depend on the consume
read (of p), so it has no ordering guarantee, and it may be a race
condition. (Anyone more knowledgeable help me out?) However, the read
of the object *p is an indirection through the value obtained by the
consume read of p, so given
- the consume read on p in thread 3 read the release write on p in
thread 1,
then
- the subsequent (atomic or non-atomic) read on *p is guaranteed to
see the previous (atomic or non-atomic) write in thread 1.
Or something like that. This is generalizing a bit, but it's intended
to get the point across.
To be clear, std::memory_order_consume provides guarantees which are a
strict subset of the guarantees of std::memory_order_acquire - you can
always replace a correct std::memory_order_consume with
std::memory_order_acquire and the program remains correct, since
acquire only adds ordering guarantees.