Synchronization Algorithm Verificator for C++0x

kwikius · Aug 8, 2008

It might be interesting to time the actual mfence overhead when the
preceding locked RMW instruction has already drained the store queue.

You could , I guess try to extract the individual elements of soup, or
you could just call it soup.

regards
Andy Little

kwikius · Aug 8, 2008

You could , I guess try to extract the individual elements of soup, or
you could just call it soup.

hmm. Irespective of my feelings re threads, the above remark was
seriously out of order.

Sincere apologies to anyone offended.

regards
Andy Little

Chris M. Thomasson · Aug 9, 2008

Dmitriy V'jukov said:
Consider implementation of number 2 on x86.
Producer:
1. Interlocked RMW to enqueue element
2. mfence in signal (even more expensive than Interlocked RMW, and
absolutely useless)

Consumer:
3. Interlocked RMW in get
4. mfence in get (even more expensive than Interlocked RMW, and
absolutely useless)

Maybe compiler can optimize 4 away, because 3 and 4 will be in one
function. But I am not sure that first versions of C++0x compilers
will be so smart. And it's questionable that compiler will be able to
optimize 2 away, because 1 and 2 will be in different functions and
probably in different source files.

Do you like this?
No.

Don't hasten forgetting your asm skills

Ouch.

Dmitriy V'jukov · Aug 11, 2008

Did you develop Relacy in part so that you can totally "deconstruct" your
work-stealing algorithm into a form in which its "random" run-time behavior
could be studied at another level and therefore be _reliably_ debugged?

;^D

Initially, yes, it was the preposition for relacy creation. But now
it's developed into independent tool.

Dmitriy V'jukov

Dmitriy V'jukov · Aug 15, 2008

I want to announce release 1.1 of Relacy Race Detector.

First of all, now you can freely DOWNLOAD latest version of Relacy
Race Detector DIRECTLY FROM WEB:
http://groups.google.com/group/relacy/files

Main change in release 1.1 is support for standard synchronization
primitives:
1. mutex (std::mutex, pthread_mutex_init, InitializeCriticalSection)
2. rw_mutex (pthread_rwlock_init, InitializeSRWLock)
3. condition variable (std::condition_variable,
std::condition_variable_any, pthread_cond_init,
InitializeConditionVariable)
4. semaphore (sem_init, CreateSemaphore)
5. event (CreateEvent)

Now you can test more traditional "mutex-based" algorithms (i.e. no
lock-free), and Relacy still will detect data races, deadlocks,
livelocks, resource leaks, incorrect usage of API etc.
Also you can test mixed lock-based/lock-free algorithms. For example
fast-path is lock-free, and slow-path is mutex-based.
For examples see 'spsc_queue' example in distributive, it uses
std::mutex and std::condition_variable as well as lock-free fast-path.
And following manual implementation of condition_variable via Win
API's CreateEvent and CreateSemaphore:
http://groups.google.com/group/comp.programming.threads/msg/30c2ec41c4d498a2

Also I add RL_DEBUGBREAK_ON_ASSERT and RL_MSVC_OUTPUT options. And
initial_state/final_state parameters. See details here:
http://groups.google.com/group/relacy/web/advanced-moments

Dmitriy V'jukov

Dmitriy V'jukov · Aug 26, 2008

I have uploaded release 1.2 of Relacy Race Detector:
http://groups.google.com/group/relacy/files

The main feature of the release is support for Java/CLI (aka .NET)
algorithm verification (but don't get confused - it's still C++
library).
The support includes following things:
- Emulation of GC: just allocate-and-forget. GC eliminates important
problems associated with lock-free algorithms - safe memory
reclamation and ABA problem. You can use GC and in C++ mode - just
define RL_GC.
- CLI memory model. Stronger atomic RMW operations and stronger
seq_cst fence.
- Java memory model. Stronger atomic RMW operations and stronger
seq_cst fence and seq_cst fence emitted automatically between volatile
store and volatile load.
- Basic CLI API associated with synchronization. rl::nvolatile<>
class emulates CLI volatiles. rl::nvar<> emulates plain CLI variables.
Interlocked operations available in rl::Interlocked namespace (i.e.
rl::Interlocked::CompareExchange()). And also
rl::Thread::MemoryBarrier(), rl::Thread::VolatileRead(),
rl::Thread::VolatileWrite() and rl::Thread::SpinWait().
- Basic Java API associated with sycnronization. rl::jvolatile<>,
rl::jvar<>, rl::AtomicInteger and rl::AtomicLong. Note that extensive
Java/CLI support libraries are not emulated (various ConcurrentQueue's
and so on). But you can use mutexes, condition_variables, semaphores
and events from C++ API, POSIX API or Win API.

Example of CLI algorithm verification you can see here:
http://groups.google.com/group/relacy/browse_frm/thread/f91c913c298a20ab

Java example:
http://groups.google.com/group/relacy/browse_frm/thread/257150359b8fb57d

Also release includes a bunch of bug fixes.

Looking forward to your comments.

Dmitriy V'jukov

Dmitriy V'jukov · Aug 26, 2008

I have uploaded release 1.2 of Relacy Race Detector:
http://groups.google.com/group/relacy/files

Dmitriy V'jukov

Dmitriy V'jukov · Aug 26, 2008

The main feature of the release is support for Java/CLI (aka .NET)
algorithm verification (but don't get confused - it's still C++
library).
The support includes following things:
- Emulation of GC: just allocate-and-forget. GC eliminates important
problems associated with lock-free algorithms - safe memory
reclamation and ABA problem. You can use GC and in C++ mode - just
define RL_GC.
- CLI memory model. Stronger atomic RMW operations and stronger
seq_cst fence.
- Java memory model. Stronger atomic RMW operations and stronger
seq_cst fence and seq_cst fence emitted automatically between volatile
store and volatile load.
- Basic CLI API associated with synchronization. rl::nvolatile<>
class emulates CLI volatiles. rl::nvar<> emulates plain CLI variables.
Interlocked operations available in rl::Interlocked namespace (i.e.
rl::Interlocked::CompareExchange()). And also
rl::Thread::MemoryBarrier(), rl::Thread::VolatileRead(),
rl::Thread::VolatileWrite() and rl::Thread::SpinWait().
- Basic Java API associated with sycnronization. rl::jvolatile<>,
rl::jvar<>, rl::AtomicInteger and rl::AtomicLong. Note that extensive
Java/CLI support libraries are not emulated (various ConcurrentQueue's
and so on). But you can use mutexes, condition_variables, semaphores
and events from C++ API, POSIX API or Win API.

I forgot to mention following moment. There are no such things as data
races and uninitialized variables in Java/CLI. This basically forces
me to disable some types of automatically detectable errors (data
races and accesses to uninitialized variables) in Java/CLI mode. And
from verification point of view this is bad. But in Java/CLI mode you
still can use rl::var<>'s, and this will effectively re-enable
detection of mentioned types of errors. I.e. if you know that some
variable must be accesses only in 'single-threaded way' (for example,
all accesses are protected by mutex), then you can use rl::var<> for
this variable.

Dmitriy V'jukov

Peter Dimov · Aug 27, 2008

- Java memory model. Stronger atomic RMW operations and stronger
seq_cst fence and seq_cst fence emitted automatically between volatile
store and volatile load.

What are the goals of this mode?

If it's intended to analyze Java code against the formal Java memory
model, the above description is not correct. Consider again

T1: x = 1;
T2: y = 1;
T3: r1 = x; r2 = y;
T4: r3 = y; r4 = x;

(x, y volatile, initial value 0)

(r1,r2,r3,r4) = (1, 0, 1, 0) is disallowed by the Java memory model,
but allowed by your description.

This mode may be considered correct if it's intended to analyze Java
code running on a typical x86 JVM that follows the cookbook, but these
JVMs do not conform to the Java memory model and will likely be fixed
at some point.

Why not just model volatile loads and stores as volatile loads and
stores, instead of transforming the code and analyzing the transformed
result?

Dmitriy V'jukov · Aug 27, 2008

What are the goals of this mode?

If it's intended to analyze Java code against the formal Java memory
model, the above description is not correct.

Yes, goal of this mode is to analyze Java code (translated to C++)
against formal Java memory model.

Consider again

T1: x = 1;
T2: y = 1;
T3: r1 = x; r2 = y;
T4: r3 = y; r4 = x;

(x, y volatile, initial value 0)

(r1,r2,r3,r4) = (1, 0, 1, 0) is disallowed by the Java memory model,
but allowed by your description.

This mode may be considered correct if it's intended to analyze Java
code running on a typical x86 JVM that follows the cookbook, but these
JVMs do not conform to the Java memory model and will likely be fixed
at some point.

Yes, you are right.

17.4.4 Synchronization Order
Every execution has a synchronization order. A synchronization order
is a
total order over all of the synchronization actions of an execution.
....
Synchronization actions induce the synchronized-with relation on
actions,
defined as follows:
....
- A write to a volatile variable v synchronizes-with all subsequent
reads of v by any thread (where subsequent is defined according to the
synchronization
order).

This basically means that there must be total order over all accesses
to volatile variables.

Then what is the most precise way to model Java volatiles in terms of C
++0x?
On the one hand, binding must not allow false positives (i.e. Relacy
falsely detects error). On the other hand, binding must minimize
(ideally eliminate) false negatives (i.e. Relacy fails to detect real
error).

Does modeling of Java volatile stores/loads as C++0x seq_cst stores/
loads satisfy those requirements?

Why not just model volatile loads and stores as volatile loads and
stores, instead of transforming the code and analyzing the transformed
result?

I don't get you here. There are no such synchronization actions as
volatile loads and stores in C++0x. Please elaborate.

Thank you for pointing to the issue! I will fix it!

Dmitriy V'jukov

Peter Dimov · Aug 27, 2008

Does modeling of Java volatile stores/loads as C++0x seq_cst stores/
loads satisfy those requirements?

I believe that it does.

I don't get you here. There are no such synchronization actions as
volatile loads and stores in C++0x.

Yes, I used the Java meaning of volatile. I meant that Relacy can
directly implement the JMM meaning of volatile for rl::jvolatile,
instead of translating it to C++ terms first. This would allow you to
run the same algorithm expressed in Java and in C++ seq_cst and verify
whether the results match. So Relacy can give us a direct answer to
your first question above.

Dmitriy V'jukov · Aug 27, 2008

I believe that it does.

Good! Than it will be simple.

Yes, I used the Java meaning of volatile. I meant that Relacy can
directly implement the JMM meaning of volatile for rl::jvolatile,
instead of translating it to C++ terms first. This would allow you to
run the same algorithm expressed in Java and in C++ seq_cst and verify
whether the results match. So Relacy can give us a direct answer to
your first question above.

Hmmm... It looks like vicious circle

Direct modeling of Java and CLI synchronization primitives I consider
as last resort. I hope that I will be able to easily model Java/CLI
primitives via C++0x primitives. Currently I add only 2 patches. First
I've described here:
http://groups.google.com/group/comp.programming.threads/msg/07c810b38be80bbb
And second is that every atomic RMW is followed by seq_cst fence:
T java_cli_atomic_rmw(...)
{
T r = cpp0x_atomic_rmw(..., memory_order_seq_cst);
cpp0x_atomic_thread_fence(memory_order_seq_cst);
return r;
}
I think that this is intended behavior of CLI Interlocked operations,
because they based on Win32 Interlocked operations, and they are based
on x86 locked instructions

I am not sure about Java here. I can't find answer in language
specification and description of atomic package...

Dmitriy V'jukov

C++0x memory model and atomics, some questions	5	Sep 1, 2010
Synchronization problem	2	Nov 12, 2008
On alignment (final committee draft for C++0x and n1425 for C1X)	2	Aug 21, 2010
On alignment (final committee draft for C++0x and n1425 for C1X)	6	Aug 20, 2010
Threading question - read-read coherency almost, sort of, withseq_cst fences	8	Jun 16, 2011
working draft C++, program model	0	Oct 26, 2008
New interesting application of Transactional Memory forsingle-threading	3	Oct 23, 2008
string combination algorithm for a "word descrambler"	4	Jan 19, 2005

Synchronization Algorithm Verificator for C++0x

kwikius

kwikius

Chris M. Thomasson

Dmitriy V'jukov

Dmitriy V'jukov

Dmitriy V'jukov

Dmitriy V'jukov

Dmitriy V'jukov

Peter Dimov

Dmitriy V'jukov

Peter Dimov

Dmitriy V'jukov

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads