C++0x: memory_order_acq_rel vs memory_order_seq_cst


Dmitriy V'jukov

I am trying to figure out the difference between the memory_order_acq_rel and
memory_order_seq_cst memory orders in C++0x.

As I understand it, memory_order_seq_cst is similar to the volatile keyword in
Java, i.e.:

#include <atomic>

std::atomic_bool b1, b2;
std::atomic_store_explicit(&b1, true, std::memory_order_seq_cst);
// a #StoreLoad barrier is emitted by the compiler here
if (std::atomic_load_explicit(&b2, std::memory_order_seq_cst))
    // ...

Right?

But what is memory_order_acq_rel? And when do I need it? Can you provide
an example?

Dmitriy V'jukov
 

Anthony Williams

Dmitriy V'jukov said:
I am trying to figure out the difference between the memory_order_acq_rel and
memory_order_seq_cst memory orders in C++0x.

seq_cst implies Sequential Consistency: *all* seq_cst accesses to *all*
variables form a single total order for any given run of the application; it
is as if the threads were reduced to a direct interleaving of instructions.

acq_rel provides *pairwise* ordering between two threads. It is possible for
everything to follow acq_rel semantics and yet for two threads to see
modifications to two variables in different orders.
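
For example, here is a minimal sketch of the classic "independent reads of
independent writes" scenario (my own illustration, not code from the slides):

std::atomic<int> x(0), y(0);
int r1, r2, r3, r4;

// Thread A:  x.store(1, std::memory_order_release);
// Thread B:  y.store(1, std::memory_order_release);
// Thread C:  r1 = x.load(std::memory_order_acquire);
//            r2 = y.load(std::memory_order_acquire);
// Thread D:  r3 = y.load(std::memory_order_acquire);
//            r4 = x.load(std::memory_order_acquire);
//
// With acquire/release only, r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0 is
// allowed: thread C sees the write to x first, thread D sees the write to y
// first. If all six operations use memory_order_seq_cst instead, that outcome
// is forbidden, because every seq_cst access falls into the single total order.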

See slides 8-12 in the presentation I did on C++0x threads at ACCU 2008:

Dmitriy V'jukov said:
But what is memory_order_acq_rel? And when do I need it? Can you provide
an example?

Sequential Consistency can impose high synchronization costs on machines with
many CPUs due to the requirement for a single total order. The pairwise
synchronization of acq_rel can be a significant performance optimization in
those cases.

Anthony
 

Dmitriy V'jukov

Anthony Williams said:
seq_cst implies Sequential Consistency: *all* seq_cst accesses to *all*
variables form a single total order for any given run of the application; it
is as if the threads were reduced to a direct interleaving of instructions.

acq_rel provides *pairwise* ordering between two threads. It is possible for
everything to follow acq_rel semantics and yet for two threads to see
modifications to two variables in different orders.

See slides 8-12 in the presentation I did on C++0x threads at ACCU 2008:



Sequential Consistency can impose high synchronization costs on machines with
many CPUs due to the requirement for a single total order. The pairwise
synchronization of acq_rel can be a significant performance optimization in
those cases.


OK, thank you. So memory_order_acq_rel is what is usually called a "full
fence" (mfence on x86). And memory_order_seq_cst also enforces a total
order between *all* modifications in the system (a locked instruction on
x86).

Hmmm... Now I am trying to figure out when memory_order_acq_rel is
insufficient and one needs memory_order_seq_cst. I can't come up with any
example straight off. But there must be some substantial reasons for the
inclusion of memory_order_seq_cst in the standard. Can you provide an
example that needs memory_order_seq_cst?

Dmitriy V'jukov
 

Anthony Williams

Dmitriy V'jukov said:
OK, thank you. So memory_order_acq_rel is what is usually called a "full
fence" (mfence on x86). And memory_order_seq_cst also enforces a total
order between *all* modifications in the system (a locked instruction on
x86).

x86 is a bad architecture to use for examples, since there are too many
implicit fences, but I think you understand. I believe that the differences
are particularly apparent on architectures like PPC or alpha where
synchronization is explicit.

It is also worth noting that seq_cst only enforces a total order of seq_cst
operations: relaxed operations are still unordered, and a seq_cst operation is
treated as acq_rel by any other acquire/release/acq_rel operations.

Dmitriy V'jukov said:
Hmmm... Now I am trying to figure out when memory_order_acq_rel is
insufficient and one needs memory_order_seq_cst. I can't come up with any
example straight off. But there must be some substantial reasons for the
inclusion of memory_order_seq_cst in the standard. Can you provide an
example that needs memory_order_seq_cst?
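
As an illustration, here is a minimal sketch of the classic store-buffering
pattern, the one behind Dekker/Peterson-style mutual exclusion (again my own
sketch, not the slide content):

std::atomic<int> flag0(0), flag1(0);
int r0, r1;

// Thread 0:  flag0.store(1, std::memory_order_seq_cst);
//            r0 = flag1.load(std::memory_order_seq_cst);
// Thread 1:  flag1.store(1, std::memory_order_seq_cst);
//            r1 = flag0.load(std::memory_order_seq_cst);
//
// With seq_cst on all four operations, r0 == 0 && r1 == 0 is impossible: one
// of the stores comes first in the single total order, so at least one load
// sees the other thread's store. That is what Dekker/Peterson-style mutual
// exclusion relies on. With release stores and acquire loads, both loads may
// return 0, i.e. both threads could enter their critical sections at once.
// And, as noted above, weakening even one of the four operations to
// acquire/release already loses the guarantee, because the total order covers
// seq_cst operations only.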

seq_cst is included, and is the default, because it is easiest to reason
about. Many people will find slide 11 of my presentation hard to deal
with: the two reader threads see different orders of events.

Atomic ops are hard anyway, but non-SC atomics are an order of magnitude
harder.

Anthony
 

Dmitriy V'jukov

Anthony Williams said:
x86 is a bad architecture to use for examples, since there are too many
implicit fences, but I think you understand.

I hope :)

Anthony Williams said:
It is also worth noting that seq_cst only enforces a total order of seq_cst
operations: relaxed operations are still unordered, and a seq_cst operation is
treated as acq_rel by any other acquire/release/acq_rel operations.


seq_cst is included, and is the default, because it is easiest to reason
about. Many people will find slide 11 of my presentation hard to deal
with: the two reader threads see different orders of events.

Atomic ops are hard anyway, but non-SC atomics are an order of magnitude
harder.


I agree that SC is much easier to reason about because it's like an
interleaving of all operations. But, in my opinion, that's a... strange
reason for inclusion in the standard... there must be at least some use
cases for SC...

I think that the situation on slide 11 is quite... unrealistic. Well, it's
not the situation that is unrealistic; what's unrealistic is that the
mentioned output can break some real algorithm. Or am I wrong? I'm trying
to figure out some use cases for SC...

So the reasoning of WG21 is "Use seq_cst as long as you don't understand
slide 11 of Anthony Williams' presentation. Once you do, forget about
seq_cst and use acq_rel" :)

Dmitriy V'jukov
 

Dmitriy V'jukov

Dmitriy V'jukov said:
I agree that SC is much easier to reason about because it's like an
interleaving of all operations. But, in my opinion, that's a... strange
reason for inclusion in the standard... there must be at least some use
cases for SC...

I think that the situation on slide 11 is quite... unrealistic. Well, it's
not the situation that is unrealistic; what's unrealistic is that the
mentioned output can break some real algorithm. Or am I wrong? I'm trying
to figure out some use cases for SC...


Put it this way. If you are implementing your own atomics library for
those who understand slide 11, will you include memory_order_seq_cst
or not? If yes, what is your reasoning?


Dmitriy V'jukov
 

Anthony Williams

Dmitriy V'jukov said:
I hope :)



I agree that SC is much easier to reason about because it's like an
interleaving of all operations. But, in my opinion, that's a... strange
reason for inclusion in the standard... there must be at least some use
cases for SC...

I think that providing a layer that is easier to reason about *is* a good use
case. It is also the same level of guarantee provided by the Java 1.5 memory
model in the absence of data races, if I understand correctly.

http://java.sun.com/docs/books/jls/third_edition/html/memory.html

Dmitriy V'jukov said:
I think that the situation on slide 11 is quite... unrealistic. Well, it's
not the situation that is unrealistic; what's unrealistic is that the
mentioned output can break some real algorithm. Or am I wrong? I'm trying
to figure out some use cases for SC...

Non-SC atomics can be very hard to use correctly. The members of the memory
model group (of which I'm only a peripheral member) have had long discussions
about the meaning of several examples. If the experts have to have long
discussions, we need a simpler layer.

Dmitriy V'jukov said:
So the reasoning of WG21 is "Use seq_cst as long as you don't understand
slide 11 of Anthony Williams' presentation. Once you do, forget about
seq_cst and use acq_rel" :)

;-)

That's the simplest example I could think of to demonstrate the point. The
consequences can be quite far-reaching, though, especially if you have other
threads in the mix. You're currently working on lock-free stuff, so you should
be well-aware of that.

Anthony
 

Anthony Williams

Dmitriy V'jukov said:
Put it this way. If you are implementing your own atomics library for
those who understand slide 11, will you include memory_order_seq_cst
or not? If yes, what is your reasoning?

Yes.

Non-SC atomics are hard, even for experts. IIRC, at one of the memory model
group meetings, Paul McKenney said that he might use SC atomics for
prototyping code, and then selectively change some of them to acquire/release
semantics or even relaxed operations in order to optimise the code, once it
was working.
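
A minimal sketch of that workflow (a hypothetical illustration, not McKenney's
actual code): a simple publish/consume pair, first written with the seq_cst
default and only later weakened once it is known to work.

std::atomic<bool> ready(false);
int payload = 0;

// Prototype: rely on the seq_cst default everywhere.
// producer:  payload = 42;
//            ready.store(true);
// consumer:  while (!ready.load()) { /* spin */ }
//            int r = payload;   // guaranteed to see 42

// Optimised later: this particular pair only needs release/acquire, because
// the only requirement is that the write to payload becomes visible before
// ready is seen to be true.
// producer:  payload = 42;
//            ready.store(true, std::memory_order_release);
// consumer:  while (!ready.load(std::memory_order_acquire)) { /* spin */ }
//            int r = payload;   // still guaranteed to see 42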

Anthony
 

Dmitriy V'jukov

Anthony Williams said:
Yes.

Non-SC atomics are hard, even for experts. IIRC, at one of the memory model
group meetings, Paul McKenney said that he might use SC atomics for
prototyping code, and then selectively change some of them to acquire/release
semantics or even relaxed operations in order to optimise the code, once it
was working.


That's a very good point. And a good use case.

I think that the inverse process can take place too. If the user defines a
FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS macro, then I will ignore all
memory order parameters and use memory_order_seq_cst instead.
So if some unit test starts failing, the user can try defining
FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS first, just to see whether it's a
problem with the fine-grained memory ordering or a problem in the algorithm
itself.
What do you think?


Dmitriy V'jukov
 

Anthony Williams

Dmitriy V'jukov said:
I think that the inverse process can take place too. If the user defines a
FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS macro, then I will ignore all
memory order parameters and use memory_order_seq_cst instead.
So if some unit test starts failing, the user can try defining
FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS first, just to see whether it's a
problem with the fine-grained memory ordering or a problem in the algorithm
itself.
What do you think?

That's a good idea. The standard doesn't require that the fine-grained
ordering actually be more fine-grained, so it would still be a compliant
library.
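
One possible shape for such a switch (a hypothetical sketch: the macro name is
yours, the effective_order() helper is made up):

#include <atomic>

inline std::memory_order effective_order(std::memory_order mo)
{
#ifdef FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS
    (void)mo;                          // debugging mode: ignore the request
    return std::memory_order_seq_cst;  // and promote everything to seq_cst
#else
    return mo;                         // normal mode: honour the caller's order
#endif
}

// Every operation inside the library then routes through it, e.g.:
// x.store(value, effective_order(std::memory_order_release));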

Anthony
 
