C++0x: memory_order_acq_rel vs memory_order_seq_cst


Dmitriy V'jukov

I am trying to figure out the difference between the memory_order_acq_rel and
memory_order_seq_cst memory orders in C++0x.

As I understand it, memory_order_seq_cst is similar to the volatile keyword in
Java, i.e.:

#include <atomic>

std::atomic_bool b1, b2;
std::atomic_store_explicit(&b1, true, std::memory_order_seq_cst);
// a #StoreLoad barrier is emitted by the compiler here
if (std::atomic_load_explicit(&b2, std::memory_order_seq_cst))
    // ...

Right?

But what is memory_order_acq_rel? And when do I need it? Can you provide
an example?

Dmitriy V'jukov
 

Anthony Williams

Dmitriy V'jukov said:
I am trying to figure out the difference between the memory_order_acq_rel and
memory_order_seq_cst memory orders in C++0x.

seq_cst implies Sequential Consistency: *all* seq_cst accesses to *all*
variables form a single total order for any given run of the application; it
is as if the threads were reduced to a direct interleaving of instructions.

acq_rel provides *pairwise* ordering between two threads. It is possible for
everything to follow acq_rel semantics and yet for two threads to see
modifications to two variables in different orders.
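
For example, here is a minimal sketch of the classic "independent reads of
independent writes" scenario (my own illustration, not code from the slides):

std::atomic<int> x(0), y(0);
int r1, r2, r3, r4;

// Thread A:  x.store(1, std::memory_order_release);
// Thread B:  y.store(1, std::memory_order_release);
// Thread C:  r1 = x.load(std::memory_order_acquire);
//            r2 = y.load(std::memory_order_acquire);
// Thread D:  r3 = y.load(std::memory_order_acquire);
//            r4 = x.load(std::memory_order_acquire);
//
// With acquire/release only, r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0 is
// allowed: thread C sees the write to x first, thread D sees the write to y
// first. If all six operations use memory_order_seq_cst instead, that outcome
// is forbidden, because every seq_cst access falls into the single total order.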

See slides 8-12 in the presentation I did on C++0x threads at ACCU 2008:

Dmitriy V'jukov said:
But what is memory_order_acq_rel? And when do I need it? Can you provide
an example?

Sequential Consistency can impose high synchronization costs on machines with
many CPUs due to the requirement for a single total order. The pairwise
synchronization of acq_rel can be a significant performance optimization in
those cases.

Anthony
 

Dmitriy V'jukov

Anthony Williams said:
seq_cst implies Sequential Consistency: *all* seq_cst accesses to *all*
variables form a single total order for any given run of the application; it
is as if the threads were reduced to a direct interleaving of instructions.

acq_rel provides *pairwise* ordering between two threads. It is possible for
everything to follow acq_rel semantics and yet for two threads to see
modifications to two variables in different orders.

See slides 8-12 in the presentation I did on C++0x threads at ACCU 2008:



Sequential Consistency can impose high synchronization costs on machines with
many CPUs due to the requirement for a single total order. The pairwise
synchronization of acq_rel can be a significant performance optimization in
those cases.


OK, thank you. So memory_order_acq_rel is what is usually called a "full
fence" (mfence on x86). And memory_order_seq_cst also enforces a total
order between *all* modifications in the system (a locked instruction on
x86).

Hmmm... Now I am trying to figure out when memory_order_acq_rel is
insufficient and one needs memory_order_seq_cst. I can't come up with any
example straight off. But there must be some substantial reasons for the
inclusion of memory_order_seq_cst in the standard. Can you provide an
example that needs memory_order_seq_cst?

Dmitriy V'jukov
 

Anthony Williams

Dmitriy V'jukov said:
OK, thank you. So memory_order_acq_rel is what is usually called a "full
fence" (mfence on x86). And memory_order_seq_cst also enforces a total
order between *all* modifications in the system (a locked instruction on
x86).

x86 is a bad architecture to use for examples, since there are too many
implicit fences, but I think you understand. I believe that the differences
are particularly apparent on architectures like PPC or alpha where
synchronization is explicit.

It is also worth noting that seq_cst only enforces a total order of seq_cst
operations: relaxed operations are still unordered, and a seq_cst operation is
treated as acq_rel by any other acquire/release/acq_rel operations.

Dmitriy V'jukov said:
Hmmm... Now I am trying to figure out when memory_order_acq_rel is
insufficient and one needs memory_order_seq_cst. I can't come up with any
example straight off. But there must be some substantial reasons for the
inclusion of memory_order_seq_cst in the standard. Can you provide an
example that needs memory_order_seq_cst?
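
As an illustration, here is a minimal sketch of the classic store-buffering
pattern, the one behind Dekker/Peterson-style mutual exclusion (again my own
sketch, not the slide content):

std::atomic<int> flag0(0), flag1(0);
int r0, r1;

// Thread 0:  flag0.store(1, std::memory_order_seq_cst);
//            r0 = flag1.load(std::memory_order_seq_cst);
// Thread 1:  flag1.store(1, std::memory_order_seq_cst);
//            r1 = flag0.load(std::memory_order_seq_cst);
//
// With seq_cst on all four operations, r0 == 0 && r1 == 0 is impossible: one
// of the stores comes first in the single total order, so at least one load
// sees the other thread's store. That is what Dekker/Peterson-style mutual
// exclusion relies on. With release stores and acquire loads, both loads may
// return 0, i.e. both threads could enter their critical sections at once.
// And, as noted above, weakening even one of the four operations to
// acquire/release already loses the guarantee, because the total order covers
// seq_cst operations only.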

seq_cst is included, and is the default, because it is easiest to reason
about. Many people will find slide 11 of my presentation hard to deal
with: the two reader threads see different orders of events.

Atomic ops are hard anyway, but non-SC atomics are an order of magnitude
harder.

Anthony
 

Dmitriy V'jukov

Anthony Williams said:
x86 is a bad architecture to use for examples, since there are too many
implicit fences, but I think you understand.

I hope :)

Anthony Williams said:
It is also worth noting that seq_cst only enforces a total order of seq_cst
operations: relaxed operations are still unordered, and a seq_cst operation is
treated as acq_rel by any other acquire/release/acq_rel operations.


seq_cst is included, and is the default, because it is easiest to reason
about. Many people will find slide 11 of my presentation hard to deal
with: the two reader threads see different orders of events.

Atomic ops are hard anyway, but non-SC atomics are an order of magnitude
harder.


I agree that SC is much easier to reason about because it's like an
interleaving of all operations. But, in my opinion, that's a... strange
reason for inclusion in the standard... there must be at least some use
cases for SC...

I think that the situation on slide 11 is quite... unrealistic. Well, it's
not the situation that is unrealistic; what's unrealistic is that the
mentioned output can break some real algorithm. Or am I wrong? I'm trying
to figure out some use cases for SC...

So the reasoning of WG21 is "Use seq_cst as long as you don't understand
slide 11 of Anthony Williams' presentation. Once you do, forget about
seq_cst and use acq_rel" :)

Dmitriy V'jukov
 

Dmitriy V'jukov

Dmitriy V'jukov said:
I agree that SC is much easier to reason about because it's like an
interleaving of all operations. But, in my opinion, that's a... strange
reason for inclusion in the standard... there must be at least some use
cases for SC...

I think that the situation on slide 11 is quite... unrealistic. Well, it's
not the situation that is unrealistic; what's unrealistic is that the
mentioned output can break some real algorithm. Or am I wrong? I'm trying
to figure out some use cases for SC...


Put it this way. If you are implementing your own atomics library for
those who understand slide 11, will you include memory_order_seq_cst
or not? If yes, what is your reasoning?


Dmitriy V'jukov
 

Anthony Williams

Dmitriy V'jukov said:
I hope :)



I agree that SC is much easier to reason about because it's like an
interleaving of all operations. But, in my opinion, that's a... strange
reason for inclusion in the standard... there must be at least some use
cases for SC...

I think that providing a layer that is easier to reason about *is* a good use
case. It is also the same level of guarantee provided by the Java 1.5 memory
model in the absence of data races, if I understand correctly.

http://java.sun.com/docs/books/jls/third_edition/html/memory.html

Dmitriy V'jukov said:
I think that the situation on slide 11 is quite... unrealistic. Well, it's
not the situation that is unrealistic; what's unrealistic is that the
mentioned output can break some real algorithm. Or am I wrong? I'm trying
to figure out some use cases for SC...

Non-SC atomics can be very hard to use correctly. The members of the memory
model group (of which I'm only a peripheral member) have had long discussions
about the meaning of several examples. If the experts have to have long
discussions, we need a simpler layer.

Dmitriy V'jukov said:
So the reasoning of WG21 is "Use seq_cst as long as you don't understand
slide 11 of Anthony Williams' presentation. Once you do, forget about
seq_cst and use acq_rel" :)

;-)

That's the simplest example I could think of to demonstrate the point. The
consequences can be quite far-reaching, though, especially if you have other
threads in the mix. You're currently working on lock-free stuff, so you should
be well-aware of that.

Anthony
 

Anthony Williams

Dmitriy V'jukov said:
Put it this way. If you are implementing your own atomics library for
those who understand slide 11, will you include memory_order_seq_cst
or not? If yes, what is your reasoning?

Yes.

Non-SC atomics are hard, even for experts. IIRC, at one of the memory model
group meetings, Paul McKenney said that he might use SC atomics for
prototyping code, and then selectively change some of them to acquire/release
semantics or even relaxed operations in order to optimise the code, once it
was working.
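
A minimal sketch of that workflow (a hypothetical illustration, not McKenney's
actual code): a simple publish/consume pair, first written with the seq_cst
default and only later weakened once it is known to work.

std::atomic<bool> ready(false);
int payload = 0;

// Prototype: rely on the seq_cst default everywhere.
// producer:  payload = 42;
//            ready.store(true);
// consumer:  while (!ready.load()) { /* spin */ }
//            int r = payload;   // guaranteed to see 42

// Optimised later: this particular pair only needs release/acquire, because
// the only requirement is that the write to payload becomes visible before
// ready is seen to be true.
// producer:  payload = 42;
//            ready.store(true, std::memory_order_release);
// consumer:  while (!ready.load(std::memory_order_acquire)) { /* spin */ }
//            int r = payload;   // still guaranteed to see 42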

Anthony
 

Dmitriy V'jukov

Anthony Williams said:
Yes.

Non-SC atomics are hard, even for experts. IIRC, at one of the memory model
group meetings, Paul McKenney said that he might use SC atomics for
prototyping code, and then selectively change some of them to acquire/release
semantics or even relaxed operations in order to optimise the code, once it
was working.


That's a very good point. And a good use case.

I think that the inverse process can take place too. If the user defines a
FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS macro, then I will ignore all
memory order parameters and use memory_order_seq_cst instead.
So if some unit test starts failing, the user can try defining
FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS first, just to see whether it's a
problem with the fine-grained memory ordering or a problem in the algorithm
itself.
What do you think?


Dmitriy V'jukov
 

Anthony Williams

Dmitriy V'jukov said:
I think that the inverse process can take place too. If the user defines a
FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS macro, then I will ignore all
memory order parameters and use memory_order_seq_cst instead.
So if some unit test starts failing, the user can try defining
FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS first, just to see whether it's a
problem with the fine-grained memory ordering or a problem in the algorithm
itself.
What do you think?

That's a good idea. The standard doesn't require that the fine-grained
ordering actually be more fine-grained, so it would still be a compliant
library.
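
One possible shape for such a switch (a hypothetical sketch: the macro name is
yours, the effective_order() helper is made up):

#include <atomic>

inline std::memory_order effective_order(std::memory_order mo)
{
#ifdef FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS
    (void)mo;                          // debugging mode: ignore the request
    return std::memory_order_seq_cst;  // and promote everything to seq_cst
#else
    return mo;                         // normal mode: honour the caller's order
#endif
}

// Every operation inside the library then routes through it, e.g.:
// x.store(value, effective_order(std::memory_order_release));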

Anthony
 
