What in C++11 prohibits mutex operations from being reordered?

Discussion in 'C++' started by Michael Podolsky, Apr 2, 2013.

  1. Hi Everyone,

    My question is about the memory model in the context of C++11
    standard.

    1. The standard does not demand mutex operations to be all totally
    ordered (memory_order_seq_cst), nor demand them to have
    memory_order_acq_rel semantics for lock() operation. From various
    parts of the standard it may be inferred that lock() operation should
    have memory_order_acquire semantics and unlock() operation -
    memory_order_release semantics.

    2. I then should ask the question: which part of the standard
    prevents lock() and unlock() operations on DIFFERENT mutexes from
    being reordered, i.e. what prevents the following sequence:

    m1.lock();
    m1.unlock();
    m2.lock();
    m2.unlock();

    to be compiled into:

    m1.lock();
    m2.lock();
    m1.unlock();
    m2.unlock();

    which obviously must not be allowed because of potential deadlocks.
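    A minimal sketch of the danger (an editorial illustration, not from the
    original post; the thread bodies and iteration count are assumptions): a
    second thread that nests the locks m2-then-m1 is deadlock-free against
    the well-nested sequence above, because that sequence never holds two
    locks at once, but it could deadlock (classic ABBA) against the
    hypothetical "compiled into" version that acquires m2 while holding m1.

    ```cpp
    #include <iostream>
    #include <mutex>
    #include <thread>

    std::mutex m1, m2;

    // The well-nested pattern from the question: the critical sections
    // never overlap, so this thread never holds two locks at once.
    void t1() {
        for (int i = 0; i < 10000; ++i) {
            m1.lock();
            m1.unlock();
            m2.lock();
            m2.unlock();
        }
    }

    // Nests the locks as m2, then m1. Safe against t1 as written, but it
    // would deadlock against a compilation of t1 that holds m1 while
    // acquiring m2.
    void t2() {
        for (int i = 0; i < 10000; ++i) {
            m2.lock();
            m1.lock();
            m1.unlock();
            m2.unlock();
        }
    }

    int main() {
        std::thread a(t1), b(t2);
        a.join();
        b.join();
        std::cout << "done\n"; // reached only because no deadlock occurred
    }
    ```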

    3. If the answer to the previous question is some special knowledge
    the compiler has of mutex objects, I should then ask the same question
    for a hypothetical hand-made synchronization object, like a spin-lock,
    implemented via atomic operations with memory_order_acquire for
    lock() and memory_order_release for unlock(). The same question
    applies here: does the standard prohibit (and if so, by exactly which
    paragraph, if you can give the reference) reordering of an unlock()
    and a following lock() operation for such a hand-made object, which
    may internally use

    atomic<int>::store(0, memory_order_release) for unlock
    and then
    atomic<int>::exchange(1, memory_order_acquire) for lock?

    Note again that these operations are running on DIFFERENT memory
    locations, i.e. the question can now be reformulated as: what
    prohibits the following reordering

    from

    while( atomic1.exchange(1, memory_order_acquire) ){;}
    // critical section protected by atomic1
    atomic1.store(0, memory_order_release);

    while( atomic2.exchange(1, memory_order_acquire) ){;}
    // critical section protected by atomic2
    atomic2.store(0, memory_order_release);

    to

    while( atomic1.exchange(1, memory_order_acquire) ){;}
    while( atomic2.exchange(1, memory_order_acquire) ){;}

    atomic1.store(0, memory_order_release);
    atomic2.store(0, memory_order_release);
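    The hand-made spin-lock described above can be sketched as a complete
    program (a minimal illustration; the SpinLock wrapper, worker function
    and counts are assumptions, not part of the original question):

    ```cpp
    #include <atomic>
    #include <iostream>
    #include <thread>

    // Minimal spin-lock with the acquire/release pairing from the question.
    struct SpinLock {
        std::atomic<int> flag{0};
        void lock()   { while (flag.exchange(1, std::memory_order_acquire)) {} }
        void unlock() { flag.store(0, std::memory_order_release); }
    };

    SpinLock s1, s2;
    long counter = 0; // plain data, protected only by the spin-locks

    void worker() {
        for (int i = 0; i < 100000; ++i) {
            s1.lock();   // critical section protected by s1
            ++counter;
            s1.unlock();
            s2.lock();   // separate critical section protected by s2
            ++counter;
            s2.unlock();
        }
    }

    int main() {
        std::thread a(worker), b(worker);
        a.join();
        b.join();
        // 2 threads * 100000 iterations * 2 increments each
        std::cout << counter << "\n";
    }
    ```

    If the locks provide mutual exclusion as intended, no increment is lost
    and the program prints 400000.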

    Regards,
    Michael
    Michael Podolsky, Apr 2, 2013
    #1

  2. On Tuesday, 2 April 2013 15:16:25 UTC+3, Michael Podolsky wrote:
    > My question is about the memory model in the context of C++11
    > standard.
    >
    > 1. The standard does not demand mutex operations to be all totally
    > ordered (memory_order_seq_cst), nor demand them to have
    > memory_order_acq_rel semantics for lock() operation. From various
    > parts of the standard it may be inferred that lock() operation should
    > have memory_order_acquire semantics and unlock() operation -
    > memory_order_release semantics.


    So it is not 'memory_order_relaxed', which could result in data
    reaching the other thread in a strange order.

    > 2. I then should ask the question: which part of the standard
    > prevents lock() and unlock() operations on DIFFERENT mutexes from
    > being reordered, i.e. what prevents the following sequence:
    >
    > m1.lock();
    > m1.unlock();
    > m2.lock();
    > m2.unlock();
    >
    > to be compiled into:
    >
    > m1.lock();
    > m2.lock();
    > m1.unlock();
    > m2.unlock();


    How is the compiler allowed to compile it like that? The locks and
    unlocks have side effects, and there is § 1.9/14:
    "Every value computation and side effect associated with a full-expression
    is sequenced before every value computation and side effect associated with
    the next full-expression to be evaluated."

    Perhaps the compilers have gone too far with optimizations somewhere?
    Öö Tiib, Apr 2, 2013
    #2

  3. On Apr 2, 10:10 am, Öö Tiib <> wrote:
    > On Tuesday, 2 April 2013 15:16:25 UTC+3, Michael Podolsky  wrote:
    > > 2. I then should ask the question: which part of the standard
    > > prevents lock() and unlock() operations on DIFFERENT mutexes from
    > > being reordered, i.e. what prevents the following sequence:

    >
    > > m1.lock();
    > > m1.unlock();
    > > m2.lock();
    > > m2.unlock();

    >
    > > to be compiled into:

    >
    > > m1.lock();
    > > m2.lock();
    > > m1.unlock();
    > > m2.unlock();

    >
    > How is the compiler allowed to compile it like that?


    I hope it is not, just wanted to understand what exact part of
    standard prevents it from doing so.


    > The locks and unlocks
    > have side effects and there is § 1.9/14:
    >  "Every value computation and side effect associated with a full-expression
    > is sequenced before every value computation and side effect associated with
    > the next full-expression to be evaluated."


    Even a simple assignment has a side effect (§1.9/12, "modifying an
    object"). So if you were right, the following two assignments to
    integer variables could not be reordered:

    i1=5; // has side effect, so
    i2=7; // can't be reordered???

    The standard does not allow us to observe the reordering of such
    assignments, but we can look at relaxed atomics:

    a1.store(5, memory_order_relaxed);
    a2.store(7, memory_order_relaxed); // can't be reordered by
    // compiler???



    There cannot be a direct relation between the "sequenced before"
    relation and the assembly output; otherwise the compiler would not be
    allowed to reorder the previous two relaxed atomic operations. And it
    is allowed to.

    Regards,
    Michael
    Michael Podolsky, Apr 3, 2013
    #3
  4. On Wednesday, 3 April 2013 05:30:45 UTC+3, Michael Podolsky wrote:
    > On Apr 2, 10:10 am, Öö Tiib <> wrote:
    > > On Tuesday, 2 April 2013 15:16:25 UTC+3, Michael Podolsky wrote:
    > > > 2. I then should ask the question: which part of the standard
    > > > prevents lock() and unlock() operations on DIFFERENT mutexes from
    > > > being reordered, i.e. what prevents the following sequence:

    > >
    > > > m1.lock();
    > > > m1.unlock();
    > > > m2.lock();
    > > > m2.unlock();

    > >
    > > > to be compiled into:

    > >
    > > > m1.lock();
    > > > m2.lock();
    > > > m1.unlock();
    > > > m2.unlock();

    > >
    > > How is the compiler allowed to compile it like that?

    >
    > I hope it is not, just wanted to understand what exact part of
    > standard prevents it from doing so.


    Mutex operations make things visible to other threads, and so they are potentially observable behavior.

    > > The locks and unlocks
    > > have side effects and there is § 1.9/14:
    > > "Every value computation and side effect associated with a full-expression
    > > is sequenced before every value computation and side effect associated with
    > > the next full-expression to be evaluated."


    I forgot to say here that "sequenced before" only pins down the
    abstract machine's expected behavior within one thread. If the
    compiler can prove that a reordering is not observable, then it can
    reorder.

    > Even a simple assignment has a side effect (§1.9/12, "modifying an
    > object"). So if you were right, the following two assignments to
    > integer variables could not be reordered:
    >
    > i1=5; // has side effect, so
    > i2=7; // can't be reordered???


    They can be reordered if i1 and i2 are not volatile. Volatile means
    strictly following the abstract machine (§1.9/8), not merely the
    observable behavior of the abstract machine (§1.9/1 and /5).

    > We are not allowed by standard to look at the reordering of such
    > assignments, but we can look at relaxed atomics:
    >
    > a1.store(5, memory_order_relaxed);
    > a2.store(7, memory_order_relaxed); // can't be reordered by
    > // compiler???


    They can be. Those are relaxed! Relaxed means "no data races" and
    that's it; no "fences" (hardware memory-ordering instructions) are
    required here. Without fences the operations may be performed
    concurrently or reordered by modern processors, so it does not even
    matter what assembly the compiler made of them. Several places in the
    standard explain this.

    > There cannot be a direct relation between the "sequenced before"
    > relation and the assembly output; otherwise the compiler would not be
    > allowed to reorder the previous two relaxed atomic operations. And it
    > is allowed to.


    You originally asked about mutexes, and not about "relaxed" atomic
    operations. For example § 1.10/5 elaborates the difference.
    Öö Tiib, Apr 3, 2013
    #4
  5. On Apr 3, 7:17 am, Öö Tiib <> wrote:
    > On Wednesday, 3 April 2013 05:30:45 UTC+3, Michael Podolsky  wrote:


    > > There cannot be a direct relation between the "sequenced before"
    > > relation and the assembly output; otherwise the compiler would not
    > > be allowed to reorder the previous two relaxed atomic operations.
    > > And it is allowed to.

    >
    > You originally asked about mutexes, and not about "relaxed" atomic
    > operations. For example § 1.10/5 elaborates the difference.


    I thought you claimed that "sequenced before" is what solely prohibits
    the compiler from reordering its output instructions in my example
    with mutexes, so I brought an example with integers (non-volatile, of
    course) and then an example with relaxed atomics to show that
    "sequenced before" is not directly equivalent to the order of
    instructions produced by the compiler; the compiler may still reorder
    instructions even when one is "sequenced before" the other.

    > For example § 1.10/5 elaborates the difference.


    It is still not clear to me what in this paragraph prevents mutex
    operations from being reordered in my example. Besides, take into
    account that in my original example (at the start of the thread) I
    have an implementation of spin-locks with atomic variables and ask the
    same question: what prevents these atomic operations from being
    reordered by the compiler?

    Personally, I now have a feeling that it is ONLY the 'for' operator
    which makes the compiler decide it cannot reorder (at least for my
    custom-made spinlock) -- but I am not at all sure that I am correct,
    and even for this idea I can't refer to any specific paragraph of the
    C++11 standard.

    Regards, Michael
    Michael Podolsky, Apr 3, 2013
    #5
  6. "Personally, I now have a feeling that it is ONLY a 'for' operator "

    I meant the 'while' operator
    Michael Podolsky, Apr 3, 2013
    #6
  7. Michael Podolsky wrote:
    >
    > Hi Everyone,
    >
    > My question is about the memory model in the context of C++11
    > standard.
    >
    > 1. The standard does not demand mutex operations to be all totally
    > ordered (memory_order_seq_cst), nor demand them to have
    > memory_order_acq_rel semantics for lock() operation. From various
    > parts of the standard it may be inferred that lock() operation should
    > have memory_order_acquire semantics and unlock() operation -
    > memory_order_release semantics.
    >
    > 2. I then should ask the question: which part of the standard
    > prevents lock() and unlock() operations on DIFFERENT mutexes from
    > being reordered, i.e. what prevents the following sequence:
    >
    > m1.lock();
    > m1.unlock();
    > m2.lock();
    > m2.unlock();
    >
    > to be compiled into:
    >
    > m1.lock();
    > m2.lock();
    > m1.unlock();
    > m2.unlock();
    >
    > which obviously must not be allowed because of potential deadlocks.


    Injecting deadlocks is a no-no but I hope the intent is that the
    following

    T1:

    m1.lock();
    .... start T2 ...
    m1.unlock();
    m2.lock();

    T2:

    if (m2.trylock() == failure) assert(m1.trylock() == success);

    may still abort (in effect allowing deadlock-free reordering above).

    regards,
    alexander.
    Alexander Terekhov, Apr 3, 2013
    #7
  8. On Wednesday, 3 April 2013 15:23:18 UTC+3, Michael Podolsky wrote:
    > On Apr 3, 7:17 am, Öö Tiib <> wrote:
    > > On Wednesday, 3 April 2013 05:30:45 UTC+3, Michael Podolsky wrote:

    >
    > > > There cannot be a direct relation between the "sequenced before"
    > > > relation and the assembly output; otherwise the compiler would
    > > > not be allowed to reorder the previous two relaxed atomic
    > > > operations. And it is allowed to.

    > >
    > > You originally asked about mutexes, and not about "relaxed" atomic
    > > operations. For example § 1.10/5 elaborates the difference.

    >
    > I thought you claimed that "sequenced before" is what solely
    > prohibits the compiler from reordering its output instructions in my
    > example with mutexes, so I brought an example with integers
    > (non-volatile, of course) and then an example with relaxed atomics to
    > show that "sequenced before" is not directly equivalent to the order
    > of instructions produced by the compiler; the compiler may still
    > reorder instructions even when one is "sequenced before" the other.


    Yeah, sorry for being unclear. Then I do not get what exact part of
    the puzzle it is that you miss. If it was not "sequenced before", was
    it maybe "synchronizes with" or "happens before"? Mutex access cannot
    be reordered.

    > > For example § 1.10/5 elaborates the difference.

    >
    > It is not clear for me still what in this paragraph prevents mutex
    > operations from being reordered in my example.


    Yes, §1.10/5 only says that atomic operations, mutexes and fences
    are synchronization operations, and that relaxed atomic operations
    *are* *not* synchronization operations.

    Maybe you should first find out how synchronization operations work?
    It is tricky logic sprinkled all over the standard ... I think most
    of §1.10, §29.3 and §29.8 is relevant. Let me try to give a *short*,
    *simple* and *logical* explanation: if A is sequenced before B on the
    same thread, and B synchronizes with C on another thread, and C is
    sequenced before D on that second thread, then A happens before D,
    and A and D can access the same data even if they are non-atomic
    operations. So even an ordinary 'int' access like 'i = 42' cannot be
    moved to the other side of m2.lock(), to say nothing of m1.unlock()
    or some other synchronization operation like that.

    > Besides take into
    > account that in my original example (start of the thread) I have an
    > implementation of spin-locks with atomic variables and ask the same
    > question - what prevents these atomic operations from being reordered
    > by compiler.


    I did not understand what your example does, sorry. Let me try to
    write an example that achieves some synchronization ...

    // bool that tells that thread 1 is done with the data. It is
    // atomic just because the standard does not guarantee that accesses
    // to bool are atomic.
    std::atomic<bool> ready(false);
    // the ordinary data
    int data = 0;

    void thread_1()
    {
        // these lines can't be reordered in any way
        data = 42;
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    }

    void thread_2()
    {
        if (ready.load(std::memory_order_relaxed))
        {
            // these lines can't be reordered
            std::atomic_thread_fence(std::memory_order_acquire);
            std::cout << "data=" << data << std::endl;
        }
    }

    The effect is that thread 2 does not touch data if thread 1
    is not ready with it. So access to non-atomic "data" is safe, synchronized
    and no data race is possible.
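    The fence example above can be turned into a runnable sketch by having
    the second thread spin until the flag is set (the spin loop and main()
    driver are additions for the illustration, not part of the original
    post):

    ```cpp
    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<bool> ready(false);
    int data = 0; // ordinary, non-atomic data

    void thread_1() {
        data = 42;                                           // plain store
        std::atomic_thread_fence(std::memory_order_release); // orders the store above
        ready.store(true, std::memory_order_relaxed);        // publish
    }

    void thread_2() {
        while (!ready.load(std::memory_order_relaxed)) {}    // spin until published
        std::atomic_thread_fence(std::memory_order_acquire); // orders the load below
        std::cout << "data=" << data << std::endl;
    }

    int main() {
        std::thread t2(thread_2), t1(thread_1);
        t1.join();
        t2.join();
    }
    ```

    The release fence before the relaxed store, paired with the acquire
    fence after the relaxed load that observes it, gives a synchronizes-with
    edge, so the program always prints data=42.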
    Öö Tiib, Apr 3, 2013
    #8
  9. Öö Tiib wrote:
    [...]
    > Yeah, sorry for being unclear. Then I do not get what exact part of
    > the puzzle it is that you miss. If it was not "sequenced before", was
    > it maybe "synchronizes with" or "happens before"? Mutex access cannot
    > be reordered.


    I always thought that e.g.

    a.unlock() "sequenced before" b.lock()

    can be reordered in a deadlock-free manner, as seen by another thread
    using trylock().

    That's the cornerstone of efficient locking as far as memory model is
    concerned.

    regards,
    alexander.
    Alexander Terekhov, Apr 3, 2013
    #9
  10. On Wednesday, 3 April 2013 20:10:02 UTC+3, Alexander Terekhov wrote:
    > Öö Tiib wrote:
    >
    > [...]
    > > Yeah, sorry for being unclear. I do not get what exact part of
    > > puzzle it is then that you miss? "sequenced before" it was not
    > > maybe "synchronizes with" or "happens before"? Mutex access can
    > > not be reordered.

    >
    > I always thought that e.g.
    >
    > a.unlock() "sequenced before" b.lock()
    >
    > can be reordered in deadlock-free manner as seen by another thread using
    > trylock().


    Did you mean try_lock() failing? That function is explicitly and
    specially allowed by the standard to fail (see §30.4.1.2/16) even
    when the lock is not held by any other thread.
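    A sketch of the practical consequence for portable code (an editorial
    illustration, assuming std::mutex): since try_lock() may fail
    spuriously, a false return must be treated as inconclusive, never as
    proof that the mutex is held; here the caller simply falls back to a
    blocking lock().

    ```cpp
    #include <iostream>
    #include <mutex>

    int main() {
        std::mutex m;
        // try_lock() may fail spuriously even on an uncontended mutex,
        // so treat `false` as "maybe contended" and fall back to lock().
        if (!m.try_lock())
            m.lock();
        std::cout << "locked\n"; // either path leaves the mutex held
        m.unlock();
    }
    ```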

    > That's the cornerstone of efficient locking as far as memory model is
    > concerned.


    It is just so that 'try_lock()' can have a cheap implementation.
    Makes sense. It is good to have cheap but sometimes failing tools in
    the mix with expensive and reliable ones.
    Öö Tiib, Apr 3, 2013
    #10
  11. Öö Tiib wrote:
    >
    > On Wednesday, 3 April 2013 20:10:02 UTC+3, Alexander Terekhov wrote:
    > > Öö Tiib wrote:
    > >
    > > [...]
    > > > Yeah, sorry for being unclear. I do not get what exact part of
    > > > puzzle it is then that you miss? "sequenced before" it was not
    > > > maybe "synchronizes with" or "happens before"? Mutex access can
    > > > not be reordered.

    > >
    > > I always thought that e.g.
    > >
    > > a.unlock() "sequenced before" b.lock()
    > >
    > > can be reordered in deadlock-free manner as seen by another thread using
    > > trylock().

    >
    > Did you mean try_lock() failing? That function is explicitly and
    > specially allowed by the standard to fail (see §30.4.1.2/16) even
    > when the lock is not held by any other thread.


    I did not mean spurious failures of C++ try_lock()... actually I never
    understood why one would need that, because to me trylock() is
    basically timedlock(yesterday), and POSIX clearly states

    http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_timedlock.html

    "Under no circumstance shall the function fail with a timeout if the
    mutex can be locked immediately."

    whereas C++ draft (N3242 which I have) rather confusingly elaborates
    that

    "An implementation should ensure that try_lock() does not consistently
    return false in the absence of contending mutex acquisitions."

    (No idea what "consistently" supposed to mean here... well I suspect
    that they mean lost reservations due to contention (or context switch)
    without retry even if mutex is seen to be unlocked but I'd rather want
    the retry in this case)

    >
    > > That's the cornerstone of efficient locking as far as memory model is
    > > concerned.

    >
    > It is just so that 'try_lock()' can have a cheap implementation.
    > Makes sense. It is good to have cheap but sometimes failing tools in
    > the mix with expensive and reliable ones.


    No, it's about 'release' store for mutex unlock() and 'acquire' RMW for
    lock()/trylock() without expensive store-load fencing for locks.

    regards,
    alexander.
    Alexander Terekhov, Apr 3, 2013
    #11
  12. On Wednesday, 3 April 2013 23:55:16 UTC+3, Alexander Terekhov wrote:
    > Öö Tiib wrote:
    > >
    > > On Wednesday, 3 April 2013 20:10:02 UTC+3, Alexander Terekhov wrote:
    > > > Öö Tiib wrote:
    > > >
    > > > [...]
    > > > > Yeah, sorry for being unclear. I do not get what exact part of
    > > > > puzzle it is then that you miss? "sequenced before" it was not
    > > > > maybe "synchronizes with" or "happens before"? Mutex access can
    > > > > not be reordered.
    > > >
    > > > I always thought that e.g.
    > > >
    > > > a.unlock() "sequenced before" b.lock()
    > > >
    > > > can be reordered in deadlock-free manner as seen by another thread using
    > > > trylock().

    > >
    > > Did you mean try_lock() failing? That function is explicitly and
    > > specially allowed by the standard to fail (see §30.4.1.2/16) even
    > > when the lock is not held by any other thread.

    >
    > I did not mean spurious failures of C++ try_lock()... actually I never
    > understood why one would need that because to me trylock() is basically
    > timedlock(yesterday) and POSIX clearly states


    Oh. I did not understand that you meant 'pthread_mutex_trylock()'
    by 'trylock()'. m.try_lock() is perhaps *not* meant to be implemented
    in terms of m.try_lock_until(yesterday). I do not see that the C++
    thread support library is so tightly bound to the POSIX pthread
    design.

    > [...]
    > whereas C++ draft (N3242 which I have) rather confusingly elaborates
    > that
    >
    > "An implementation should ensure that try_lock() does not consistently
    > return false in the absence of contending mutex acquisitions."
    >
    > (No idea what "consistently" supposed to mean here... well I suspect
    > that they mean lost reservations due to contention (or context switch)
    > without retry even if mutex is seen to be unlocked but I'd rather want
    > the retry in this case)


    Probably some naive "C++0x thread lib" implementation had a defect
    where, when several threads were busily calling m.try_lock(), all of
    them kept failing, and so the committee decided to forbid such
    "consistency".

    > > > That's the cornerstone of efficient locking as far as memory model is
    > > > concerned.

    > >
    > > It is just for to have cheap 'try_lock()' implementation. Makes sense.
    > > It is good to have both cheap and sometimes failing tools in mix with
    > > expensive and reliable tools.

    >
    > No, it's about 'release' store for mutex unlock() and 'acquire' RMW for
    > lock()/trylock() without expensive store-load fencing for locks.


    Ah, whatever the exact optimization may be. It is anyway not true
    that 'a.unlock()' "sequenced before" 'b.lock()' may be observably
    reordered. a.try_lock() may spuriously fail, so it is a bad
    observation tool, and there is no a.trylock() in C++; so how do you
    observe that reordering?
    Öö Tiib, Apr 3, 2013
    #12
  13. On Apr 3, 12:53 pm, Alexander Terekhov <> wrote:
    > Michael Podolsky wrote:


    > > 2. I then should ask the question: which part of the standard
    > > prevents lock() and unlock() operations on DIFFERENT mutexes from
    > > being reordered, i.e. what prevents the following sequence:

    >
    > > m1.lock();
    > > m1.unlock();
    > > m2.lock();
    > > m2.unlock();

    >
    > > to be compiled into:

    >
    > > m1.lock();
    > > m2.lock();
    > > m1.unlock();
    > > m2.unlock();

    >
    > > which obviously must not be allowed because of potential deadlocks.

    >
    > Injecting deadlocks is a no-no but I hope the intent is that the
    > following
    >
    > T1:
    >
    > m1.lock();
    > ... start T2 ...
    > m1.unlock();
    > m2.lock();
    >
    > T2:
    >
    > if (m2.trylock() == failure) assert(m1.trylock() == success);
    >
    > may still abort (in effect allowing deadlock-free reordering above).


    The "intent" was merely to understand better and in details the memory
    model of C++11 standard.

    trylock() in C++11 may spuriously fail, we all know about it, yet
    pretending it is not, i.e. supposing your testing assertions are
    meaningful, I wonder how to compile this code (hardware reordering
    optimizations apart) to allow it to release m1 late and still not to
    deadlock. Do you mean something in the following style:


    m1.lock();
    // m1 critical section
    bool succeeded = m2.try_lock();
    m1.unlock();

    if (!succeeded)
        m2.lock();
    // m2 critical section
    m2.unlock();

    BTW, do you have by chance any idea about my original question? Say,
    about reordering in:

    atomic1.store(0, memory_order_release);

    while( atomic2.exchange(1, memory_order_acquire) ){;}

    I currently have a feeling that it is merely the 'while' operator
    which prevents the previous two lines from being reordered by the
    compiler, so that the following two lines MAY be reordered:

    atomic1.store(0, memory_order_release);

    if( atomic2.exchange(1, memory_order_acquire) ){printf("Hello\n");}

    And I also have a feeling that C++11 might be a little vague about
    this point -- of course, I may be completely wrong here, with no
    intention of casting any doubt on the C++11 standard.

    Regards, Michael
    Michael Podolsky, Apr 4, 2013
    #13
  14. On Apr 3, 12:44 pm, "christian.bau" <>
    wrote:
    > On Apr 3, 3:30 am, Michael Podolsky <>
    > wrote:
    >
    > > Even a simple assignment has a side effect (§1.9/12, "modifying
    > > an object"). So if you were right, the following two assignments
    > > to integer variables could not be reordered:

    >
    > > i1=5; // has side effect, so
    > > i2=7; // can't be reordered???

    >
    > "As if" rule: The compiler can do anything it likes as long as a
    > conforming program cannot figure out the difference.


    It is the difference between the actual execution and the observable
    behavior of the abstract machine (§1.9/1), not between the actual
    execution and a sequential execution of the code.

    > the program can behave differently depending on the order of
    > mutexes, so it's not allowed to reorder them.



    I am really serious, but what I am going to say may sound offensive
    or like nonsense.

    What in the C++ standard defines that different mutexes should be
    taken in order, i.e. why do we think that

    mutex1.unlock();
    mutex2.lock();

    are defined by the rules of C++ to be executed in this sequence?

    I'll now elaborate my argument. Consider Ex1:

    thread 1:
    // atomic1 and atomic2 are initially 0 (zero)
    atomic1.store(1, memory_order_relaxed);
    atomic2.store(2, memory_order_relaxed);

    thread2:

    int a2 = atomic2.load(memory_order_seq_cst);
    int a1 = atomic1.load(memory_order_seq_cst);

    if(a2==2 && a1==0) printf("Stores were reordered!");

    The behavior of this program may be DIFFERENT (it may print or it may
    not), and yet we tolerate that as both behaviors are in compliance
    with the standard.

    Now consider Ex2 (note the change of ordering arguments in the first
    thread):

    thread 1:
    // atomic1 and atomic2 are initially 0 (zero)
    atomic1.store(1, memory_order_release);
    atomic2.store(2, memory_order_acquire);

    thread2:

    int a2 = atomic2.load(memory_order_seq_cst);
    int a1 = atomic1.load(memory_order_seq_cst);
    if(a2==2 && a1==0) printf("Stores were reordered!");

    Again, the behaviors may be different, yet we tolerate that.

    Now consider Ex3:

    mutex1.unlock(); // has release semantics
    mutex2.lock(); // has acquire semantics

    These operations have release and acquire semantics exactly as in
    Ex2, so what exactly prevents them from being reordered, or how
    exactly would their reordering violate the spec of the abstract C++11
    machine? It is not a different execution, not a danger of deadlock;
    formally it must be a difference from the abstract C++11 machine's
    behavior.

    Regards, Michael
    Michael Podolsky, Apr 4, 2013
    #14
  15. I should correct a mistake.

    Ex2 should rather be something like:

    thread 1:
    // atomic1 and atomic2 are initially 0 (zero)
    atomic1.store(1, memory_order_release);
    atomic2.exchange(2, memory_order_acquire); // correction!


    thread2:
    int a2 = atomic2.load(memory_order_seq_cst);
    int a1 = atomic1.load(memory_order_seq_cst);
    if(a2==2 && a1==0) printf("Stores were reordered!");

    as there cannot be an acquire barrier on a simple store operation


    Regards, Michael
    Michael Podolsky, Apr 4, 2013
    #15
  16. On Apr 3, 4:55 pm, Alexander Terekhov <> wrote:
    > Öö Tiib wrote:


    > > Did you mean try_lock() failing? That function is explicitly and
    > > specially allowed by the standard to fail (see §30.4.1.2/16) even
    > > when the lock is not held by any other thread.


    > actually I never
    > understood why one would need that because to me trylock() is basically
    > timedlock(yesterday) and POSIX clearly states
    >
    > http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mute...
    >
    > "Under no circumstance shall the function fail with a timeout if the
    > mutex can be locked immediately."


    No way. It's C++11, not POSIX. In C++11 both

    * mutex::try_lock() and
    * timed_mutex::try_lock_for(const chrono::duration<Rep, Period>&
    rel_time)

    are allowed to fail even if the mutex is immediately available. This
    is all the price of introducing the ultimate SC semantics into C++,
    as far as I understand. The details on this subject with try_lock are
    in Hans Boehm, "Reordering Constraints for Pthread-Style Locks", p. 7.

    Regards, Michael
    Michael Podolsky, Apr 4, 2013
    #16
  17. On Apr 3, 12:59 pm, Öö Tiib <> wrote:

    > Yeah, sorry for being unclear. Then I do not get what exact part of
    > the puzzle it is that you miss. If it was not "sequenced before", was
    > it maybe "synchronizes with" or "happens before"?


    Inside one thread it is "sequenced before" and the ordering
    constraints of atomics that should mostly govern compilation. I think
    that "happens before" and "synchronizes with" are not related, as
    they govern inter-thread synchronization (inside one thread "happens
    before" is mostly trivially "sequenced before"). Anyway, I do not see
    you making any clear argument here.

    > Mutex access can not be reordered.


    I hope so. Yet this is not written in the standard.


    >
    > > > For example § 1.10/5 elaborates the difference.

    >
    > > It is not clear for me still what in this paragraph prevents mutex
    > > operations from being reordered in my example.

    >
    > Yes, §1.10/5 only says that atomic operations, mutexes and fences
    > are synchronization operations, and that relaxed atomic operations
    > *are* *not* synchronization operations.
    >
    > Maybe you should first find out how synchronization operations
    > work? It is tricky logic sprinkled all over the standard ... I think
    > most of §1.10, §29.3 and §29.8 is relevant. Let me try to give a
    > *short*, *simple* and *logical* explanation: if A is sequenced
    > before B on the same thread, and B synchronizes with C on another
    > thread, and C is sequenced before D on that second thread, then A
    > happens before D, and A and D can access the same data even if they
    > are non-atomic operations. So even an ordinary 'int' access like
    > 'i = 42' cannot be moved to the other side of m2.lock(), to say
    > nothing of m1.unlock() or some other synchronization operation like
    > that.


    Yes, this is clear. I spent some time studying the standard and some
    articles before making this post. I just do not see how the scenario
    you present relates to my example.

    >
    > > Besides take into
    > > account that in my original example (start of the thread) I have an
    > > implementation of spin-locks with atomic variables and ask the same
    > > question - what prevents these atomic operations from being reordered
    > > by compiler.

    >
    > I did not understand what your example does, sorry.


    while( atomic1.exchange(1, memory_order_acquire) ){;}
    // critical section protected by atomic1
    atomic1.store(0, memory_order_release);

    this was just a very simple (and inefficient) implementation of a
    spinlock. The first line grabs the lock, the last releases it. And my
    question is then what prevents the following lines from being reordered:

    atomic1.store(0, memory_order_release); // unlocking previous critical section
    while( atomic2.exchange(1, memory_order_acquire) ){;} // locking next critical section
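    For clarity, here is the same spinlock wrapped up as a compilable class
    (my own sketch, nothing more; same acquire exchange for lock() and
    release store for unlock() as in the lines above):

```cpp
#include <atomic>

// Minimal (and inefficient) spinlock: the question in this thread is
// whether unlock() on one such object and a following lock() on a
// *different* one may be reordered by the compiler.
class SpinLock {
    std::atomic<int> flag{0};
public:
    void lock() {
        // spin until we atomically swap 0 -> 1; acquire ordering
        while (flag.exchange(1, std::memory_order_acquire)) { /* spin */ }
    }
    void unlock() {
        // release ordering publishes the critical section's writes
        flag.store(0, std::memory_order_release);
    }
};
```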



    > Let me try to write an example that achieves some synchronization ...
    >
    >   // bool that tells that thread 1 is done with data. It is
    >   // atomic just because the standard does not guarantee that accesses
    >   // to bool are atomic.
    >   std::atomic<bool> ready(false);
    >   // the ordinary data
    >   int data=0;
    >
    >   void thread_1()
    >   {
    >       // these lines can't be reordered in any way
    >       data=42;
    >       std::atomic_thread_fence(std::memory_order_release);
    >       ready.store(true,std::memory_order_relaxed);
    >   }
    >
    >   void thread_2()
    >   {
    >       if(ready.load(std::memory_order_relaxed))
    >       {
    >           // these lines can't be reordered
    >           std::atomic_thread_fence(std::memory_order_acquire);
    >           std::cout<<"data="<<data<<std::endl;
    >       }
    >   }
    >
    > The effect is that thread 2 does not touch data if thread 1
    > is not ready with it. So access to non-atomic "data" is safe, synchronized
    > and no data race is possible.


    Yep, this is very basic stuff as far as memory ordering in C++ goes.
    Sorry, I do not see a relation to the problem being discussed.

    Regards, Michael
    Michael Podolsky, Apr 4, 2013
    #17
  18. Michael Podolsky

    Luca Risolia Guest

    On 04/04/2013 03:49, Michael Podolsky wrote:
    > On Apr 3, 12:53 pm, Alexander Terekhov <> wrote:
    >> Michael Podolsky wrote:

    >
    >>> 2. I then should ask the question: which part of the standard prevents
    >>> lock() and unlock() operations on DIFFERENT mutexes to be reorered,
    >>> i.e. what prevents the following sequence:

    >>
    >>> m1.lock();
    >>> m1.unlock();
    >>> m2.lock();
    >>> m2.unlock();

    >>
    >>> to be compiled into:

    >>
    >>> m1.lock();
    >>> m2.lock();
    >>> m1.unlock();
    >>> m2.unlock();


    > try_lock() in C++11 may spuriously fail, we all know about it; yet
    > pretending it does not, i.e. supposing your testing assertions are
    > meaningful, I wonder how this code could be compiled (hardware
    > reordering optimizations apart) so that it releases m1 late and
    > still does not deadlock. Do you mean something in the following style:
    >
    >
    > m1.lock();
    > // m1 critical section
    > bool succeeded = m2.try_lock();
    > m1.unlock();
    >
    > if(!succeeded)
    > m2.lock();
    > // m2 critical section
    > m2.unlock();
    >
    > BTW, do you have by chance any idea about my original question? Say,
    > about reordering in:
    >
    > atomic1.store(0, memory_order_release);
    >
    > while( atomic2.exchange(1, memory_order_acquire) ){;}


    The statement after the store can be placed before the store itself and
    the statement before the while loop can be placed after the loop:
    basically, the compiler is allowed to swap the statements. The only
    memory order which guarantees sequential consistency is
    std::memory_order_seq_cst, which is the default for atomic types.

    > I currently have a feeling that it is merely the 'while' operator which
    > prevents the following two lines from being reordered by the compiler.
    > So that the following two lines MAY be reordered:


    No. Merely considering a while loop does not prevent any statement
    reordering.

    > And I also have a feeling that C++11 might be a little vague about
    > this moment -- of course, I may be completely wrong here, no intention
    > to cast any doubt on C++11 standard.


    To avoid deadlocks when acquiring two or more locks together, the
    standard offers std::lock(). If you cannot acquire (nested) locks at the
    same time for some reason, it's enough to acquire them in a fixed order.
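    For example (a usage sketch of my own; std::adopt_lock tells each guard
    that its mutex is already locked, so the guards only take care of the
    unlocking):

```cpp
#include <mutex>

std::mutex m1, m2;

// std::lock acquires both mutexes with a deadlock-avoidance algorithm,
// so two threads calling it with the mutexes in opposite orders cannot
// deadlock the way hand-written m1.lock(); m2.lock(); sequences can.
void with_both_locks() {
    std::lock(m1, m2);                              // take both, deadlock-free
    std::lock_guard<std::mutex> g1(m1, std::adopt_lock);
    std::lock_guard<std::mutex> g2(m2, std::adopt_lock);
    // ... critical section holding both locks ...
}   // g2 and g1 release m2 and m1 here
```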
    Luca Risolia, Apr 4, 2013
    #18
  19. On Apr 3, 11:12 pm, Luca Risolia <> wrote:

    > The statement after the store can be placed before the store itself and
    > the statement before the while loop can be placed after the loop:
    > basically, the compiler is allowed to swap the statements. The only
    > memory order which guarantees sequential consistency is
    > std::memory_order_seq_cst, which is the default for atomic types.
    >
    > > I currently have a feeling that it is merely a 'while' operator which
    > > prevents the following two lines to be reordered by compiler. So that
    > > the following two lines MAY be reordered:

    >
    > No. Merely considering a while loop does not prevent any statement
    > reordering.


    So you agree that my implementation of spin-locks may lead to
    deadlocks just because the compiler may completely reorder my
    implementations of unlock() and a following lock(). I personally think
    that my implementation is pretty cool ))) and correct, and may not be
    reordered by a C++11 compiler in a way that leads to deadlocks, but I
    may be wrong here.

    Then what about the original unlock() and lock() on standard C++11
    mutexes? I believe they (mutexes) are implemented without
    std::memory_order_seq_cst, but with the same release (for unlock) and
    acquire (for lock) ordering I used for my spinlocks. Nowhere in the
    C++11 standard does it say that operations on mutexes are all
    std::memory_order_seq_cst.

    >
    > > And I also have a feeling that C++11 might be a little vague about
    > > this moment -- of course, I may be completely wrong here, no intention
    > > to cast any doubt on C++11 standard.

    >
    > To avoid deadlocks when acquiring two or more locks together the
    > standard offers std::lock(). If you cannot acquire (nested) locks at the
    > same time for some reasons, it's enough to acquire them in a fixed order.


    Well, I did not mean to acquire the mutexes together, it is just the
    compiler that transformed my code into doing so :))) Seriously, I do
    not see that std::lock() makes much sense for multithreaded
    programming; deadlocks usually happen because you grab different
    mutexes in different places, not all in one place.

    Regards, Michael
    Michael Podolsky, Apr 4, 2013
    #19
  20. Michael Podolsky

    Luca Risolia Guest

    On 04/04/2013 05:29, Michael Podolsky wrote:
    > On Apr 3, 11:12 pm, Luca Risolia <> wrote:
    >
    >> The statement after the store can be placed before the store itself and
    >> the statement before the while loop can be placed after the loop:
    >> basically, the compiler is allowed to swap the statements. The only
    >> memory order which guarantees sequential consistency is
    >> std::memory_order_seq_cst, which is the default for atomic types.


    > So you agreed that my implementation of spin-locks may lead to
    > deadlocks just because compiler may reorder completely my
    > implementations of unlock() and a following lock(). I personally think
    > that my implementation is pretty cool ))) and correct and may not be
    > reordered by C++11 compiler and so lead to deadlocks, but I may be
    > wrong here.


    Probably your implementation is not correct. I think you should pass
    std::memory_order_acq_rel to exchange() (which is a read-modify-write
    operation) instead of just std::memory_order_acquire. Also, a simple
    std::atomic_flag would be more appropriate for implementing a "spinlock".
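    Something like this (a sketch only, not a reference implementation;
    test_and_set with acquire ordering takes the lock, clear with release
    ordering drops it, which is the usual idiom):

```cpp
#include <atomic>

// Spinlock built on std::atomic_flag, the only type the standard
// guarantees to be lock-free.
class FlagSpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // test_and_set returns the previous value; spin while it was set
        while (flag.test_and_set(std::memory_order_acquire)) { /* spin */ }
    }
    void unlock() {
        flag.clear(std::memory_order_release);
    }
};
```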
    Luca Risolia, Apr 4, 2013
    #20