What in C++11 prohibits mutex operations from being reordered?

  • Thread starter Michael Podolsky

Michael Podolsky

That is why I snipped it as irrelevant. In cases of unspecified behavior
(something may or may not happen), the standard tends to state the
limits of allowable behavior (not so in the case of undefined behavior).


Is that a quote of §1.9/3? My copy reads: "Certain other aspects
and operations of the abstract machine are described in this
International Standard as unspecified (for example, order of evaluation
of arguments to a function). Where possible, this International Standard
defines a set of allowable behaviors. These define the
nondeterministic aspects of the abstract machine. An instance of the
abstract machine can thus have more than one possible execution for a
given program and a given input."

I think it's quite the same. My idea was that a change in observable
behavior does not by itself prove any violation of the C++ standard,
since the abstract machine may have a set of allowed behaviors,
specifically because of the unspecified aspects of the standard.

What set of allowable behaviors, for which unspecified behavior, are you
talking about? Which paragraph?

I generally meant that lock and unlock have acquire and release
semantics respectively, and that the constraints on acquire and release
semantics still leave room for reordering a release operation with a
following acquire operation. I think I can prove, by reference to
particular paragraphs, that a release operation may be reordered with a
following acquire without violating the standard. But I think it is
generally obvious. (It may still not be obvious, or not true at all,
for lock-unlock.)

But... I have now come to the conclusion (with the help of this forum)
that I was not right from the start and had some misconceptions about
the C++ execution model. So what I am saying in this post is just in
the context of the thread, while I generally tend to agree that an
unlock and a following lock can't be reordered (not completely, at
least) under C++11 rules.

Regards, Michael
 

Michael Podolsky

Michael Podolsky wrote:

[...]
That is what you see in my last example. The ordered execution does not
deadlock; the reordered execution deadlocks. Both executions are allowed
as observable behavior of the abstract machine.
Now, if you agree with this, then probably your argument is not enough
to explain why mutex operations cannot be reordered.

With mutexes in the absence of trylock() the program is guaranteed to
behave SC (and obviously without injected deadlocks).

With trylock() in the game, reordering m1.unlock() -> m2.lock()
(overlapping of subsequent-as-coded critical regions) can be observed as
far as the transition of mutex states is concerned, but deadlocks still
won't be injected.

What is so hard to understand here?

That a program with mutexes behaves SC is not an axiom or rule of C++11;
it is a theorem that should be proven from the C++ memory-model rules,
such as the acquire-release semantics of lock-unlock, the total order of
operations on one mutex, etc. My feeling (and claim) was that a program
protected by an "acquire-release" mutex is not SC, because an x.unlock()
may be reordered with a y.lock().

I now think that I was not right here. But at least putting forward this
claim is not completely insane and makes some sense (though in the
context of some misconceptions I had about the C++ execution model).

Thanks, Michael
 

Michael Podolsky

The C++ abstract machine executes the code faithfully as written without
any form of optimization being applied. This means that the abstract
machine can't ever deadlock on the code above because at no time does a
single thread hold both locks at the same time.

This is a very important point in the context of what drove my
understanding of this topic, and it looks like you are right here and I
was wrong. Thank you very much for pointing out my mistake.


You are wrong here. It does NOT take a reordering of the two stores in
thread 1 to have thread 2 deadlock. It is sufficient that the results
from those writes become visible to thread 2 out of order. This is a
possibility, even for the abstract machine.
Clause 1.10/5 notes that relaxed atomic operations do not provide
synchronization and clause 1.10/6 notes that modifications on different
atomic objects can become visible to other threads in inconsistent
orders. Together, this allows the deadlock to occur in the abstract
machine even without reordering the stores.

I did not mean that a different order of visibility may happen ONLY as a
result of reordering.

But as for

I am now nearly sure that you are right about this point: the
abstract machine executes its code without reordering.
I don't agree here, because non-reordered execution can also deadlock.

You are right. Again.

Many thanks,
Michael
 

Bart van Ingen Schenau

Yes, that mostly relies on the fact that a non-atomic side effect in one
thread is not observable in another thread without a data race (undefined
behavior) unless the side effect "happens before" the observation.
Synchronization objects like mutexes are meant to achieve that
"happens before", i.e. synchronization.

I understand that if such a "happens before" is achieved, then accesses
to atomic objects, or acquiring and releasing ownership of other mutexes
(not the one used to achieve the "happens before" above), are also
observable side effects (there cannot even be data races on them). That
is why I asked what unspecified behavior the OP meant.

You make it sound as if, without data races, there must be one single
total order in which observable side-effects happen. This is not true.

For example, if two threads access different (unrelated) volatile objects
without any form of synchronization, you have two unsequenced observable
side effects (access to a volatile object) without there being a data
race or other kind of undefined behavior.
Even within the confines of the abstract machine, these accesses can
occur in either order, or even simultaneously, and the order can differ
from execution to execution.

Bart v Ingen Schenau
 

Öö Tiib

You make it sound as if, without data races, there must be one single
total order in which observable side-effects happen. This is not true.

I meant that possible data races are a major factor limiting the
conditions under which side effects can be observed from another thread,
even with synchronization by mutexes. That keeps a lot of the
optimizations that were available without threads still available
with threads; synchronization with mutexes only adds barriers.
For example, if two threads access different (unrelated) volatile objects
without any form of synchronization, you have two unsequenced observable
side effects (access to a volatile object) without there being a data
race or other kind of undefined behavior.

Even within the confines of the abstract machine, these accesses can
occur in either order, or even simultaneously, and the order can differ
from execution to execution.

Certainly, but does this not go a bit outside of your "even with
synchronization by mutexes" above? Mutexes and volatile together do
limit the freedom of the compiler quite severely.
 

Bart van Ingen Schenau

I meant that possible data races are a major factor limiting the
conditions under which side effects can be observed from another thread,
even with synchronization by mutexes. That keeps a lot of the
optimizations that were available without threads still available with
threads; synchronization with mutexes only adds barriers.
OK.



Certainly, but does this not go a bit outside of your "even with
synchronization by mutexes" above? Mutexes and volatile together do
limit the freedom of the compiler quite severely.

Here is another example of what I mean, now with the use of a mutex.
Consider there are two threads and one mutex m.
If the threads execute the following code:

//thread 1:
m.lock();
std::cout << "hello ";
m.unlock();

//thread 2:
m.lock();
std::cout << "world!";
m.unlock();

then the abstract machine allows the standard output stream to contain
either "hello world!" or "world!hello "; which output you get is
determined by the exact order in which the threads reach the critical
section.

But perhaps we are talking at cross purposes. All I am trying to show is
that mutexes don't magically force a single global order on the
observable side-effects of a program.

Bart v Ingen Schenau
 

Alexander Terekhov

Michael Podolsky wrote:
[...]
Could they not have provided two flavors of trylock - one for CS and one
(sane) for people who want a real trylock and not something
nondeterministic?

I suspect they didn't want to teach the world that with sane
trylock/timedlock the following

T1 (owns m1 before T3 and T4 are launched): m1.unlock();
T2 (owns m2 before T3 and T4 are launched): m2.unlock();
T3: a = m1.trylock(), b = m2.trylock(); // or with timeouts
T4: c = m2.trylock(), d = m1.trylock(); // or with timeouts

may result in a == true, b == false, c == true, d == false, thus exposing
non-SC behaviour even if the program is apparently "fully locked". (And
it cannot be made SC under the C++ model even with an SC fence added to
T3 and T4.)

So I gather that the solution (in the spirit of Dr. Evil, so to speak)
was to relax the specification of trylock/timedlock beyond sanity and
allow the program to yield a non-SC result even under the SC model, due
to the C++-invented spuriousness of trylock/timedlock failures (in
direct contradiction to POSIX, which makes it even more grotesque).

regards,
alexander.
 

Alexander Terekhov

(corrections)

Alexander said:
Michael Podolsky wrote:
[...]
Could they not have provided two flavors of trylock - one for CS and one
(sane) for people who want a real trylock and not something
nondeterministic?

I suspect they didn't want to teach the world that with sane
trylock/timedlock the following

T1 (owns m1 before T3 and T4 are launched): m1.unlock();
T2 (owns m2 before T3 and T4 are launched): m2.unlock();
T3: a = m1.trylock(), b = m2.trylock(); // or with timeouts
T4: c = m2.trylock(), d = m1.trylock(); // or with timeouts

Err, bad example (due to modification by a successful trylock). Here's
another one with even fewer threads:

T1 (initially nobody owns m1): m1.lock(); // w/o subsequent unlock
T2 (owns m2 before T3 is launched): if (!m1.trylock()) m2.unlock();
T3: if (m2.trylock()) assert(!m1.trylock());
may result in abort, thus exposing non-SC behaviour even if the program
is apparently "fully locked". (And it cannot be made SC under the C++
model even with an SC fence added to T2 and T3.)
So I gather that the solution (in the spirit of Dr. Evil, so to speak)
was to relax the specification of trylock/timedlock beyond sanity and
allow the program to yield a non-SC result even under the SC model, due
to the C++-invented spuriousness of trylock/timedlock failures (in
direct contradiction to POSIX, which makes it even more grotesque).

No err here. ;-)

regards,
alexander.
 

Joshua Maurice

C++-invented spuriousness of trylock/timedlock failures (in direct
contradiction to POSIX which makes it even more grotesque).

I'd suggest googling this. This is not something that C++ just invented;
it is a straight-up bug in the POSIX standard. Almost no implementation
of POSIX actually obeys the formal specification of POSIX trylock. The
naive, intuitive implementation of lock, unlock, and trylock does not
obey the formal specification, and obeying the POSIX trylock
specification requires an implementation to add overhead to all regular
locking and unlocking (IIRC). All C++11 did was standardize existing
practice.
 
