C++0x: release sequence

Discussion in 'C++' started by Dmitriy V'jukov, Jun 15, 2008.

  1. Current C++0x draft (N2606):
    1.10/6
    A release sequence on an atomic object M is a maximal contiguous
    sub-sequence of side effects in the modification order of M, where
    the first operation is a release, and every subsequent operation
    — is performed by the same thread that performed the release, or
    — is a non-relaxed atomic read-modify-write operation.

    Why does the second clause say *non-relaxed* atomic
    read-modify-write operation? Why non-relaxed?
    On what hardware architecture would relaxed atomic read-modify-write
    operations in a release sequence interfere with an efficient
    implementation?

    Dmitriy V'jukov
     
    Dmitriy V'jukov, Jun 15, 2008
    #1

  2. "Dmitriy V'jukov" <> writes:

    > Current C++0x draft (N2606):
    > 1.10/6
    > A release sequence on an atomic object M is a maximal contiguous
    > sub-sequence of side effects in the modification order of M, where
    > the first operation is a release, and every subsequent operation
    > — is performed by the same thread that performed the release, or
    > — is a non-relaxed atomic read-modify-write operation.
    >
    > Why does the second clause say *non-relaxed* atomic
    > read-modify-write operation? Why non-relaxed?
    > On what hardware architecture would relaxed atomic read-modify-write
    > operations in a release sequence interfere with an efficient
    > implementation?


    Relaxed operations can read values from other threads
    out-of-order. Consider the following:

    atomic_int x=0;
    atomic_int y=0;

    Processor 1 does store-release:

    A: x.store(1,memory_order_relaxed);
    B: y.store(1,memory_order_release);

    Processor 2 does relaxed RMW op:
    int expected=1;
    C: while(!y.compare_swap(expected,2,memory_order_relaxed));

    Processor 3 does load-acquire:
    D: a=y.load(memory_order_acquire);
    E: b=x.load(memory_order_relaxed);

    If a is 2, what is b?

    On most common systems (e.g. x86, PowerPC, Sparc), b will be 1.
    This is not guaranteed by the standard, though, since it may not
    hold on NUMA systems.
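
    For concreteness, here is a minimal self-contained sketch of the
    scenario above in later C++11 syntax (std::thread and
    compare_exchange_strong are assumptions of this sketch; the draft
    spells the latter compare_swap):

    #include <atomic>
    #include <functional>
    #include <thread>

    std::atomic<int> x(0), y(0);

    void p1() {                                 // store-release
        x.store(1, std::memory_order_relaxed);  // A
        y.store(1, std::memory_order_release);  // B
    }

    void p2() {                                 // relaxed RMW (C)
        int expected = 1;
        while (!y.compare_exchange_strong(expected, 2,
                                          std::memory_order_relaxed))
            expected = 1;                       // CAS wrote back the
                                                // observed value
    }

    void p3(int& a, int& b) {                   // load-acquire
        a = y.load(std::memory_order_acquire);  // D
        b = x.load(std::memory_order_relaxed);  // E
    }

    int main() {
        int a = 0, b = 0;
        std::thread t1(p1), t2(p2), t3(p3, std::ref(a), std::ref(b));
        t1.join(); t2.join(); t3.join();
        // If a == 2, D read a value written by the relaxed RMW at C,
        // which is outside the release sequence headed by B, so the
        // model allows b == 0.
    }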

    Anthony
    --
    Anthony Williams | Just Software Solutions Ltd
    Custom Software Development | http://www.justsoftwaresolutions.co.uk
    Registered in England, Company Number 5478976.
    Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
     
    Anthony Williams, Jun 16, 2008
    #2

  3. On Jun 16, 2:13 pm, Anthony Williams <> wrote:

    > Relaxed operations can read values from other threads
    > out-of-order. Consider the following:
    >
    > atomic_int x=0;
    > atomic_int y=0;
    >
    > Processor 1 does store-release:
    >
    > A: x.store(1,memory_order_relaxed);
    > B: y.store(1,memory_order_release);
    >
    > Processor 2 does relaxed RMW op:
    > int expected=1;
    > C: while(!y.compare_swap(expected,2,memory_order_relaxed));
    >
    > Processor 3 does load-acquire:
    > D: a=y.load(memory_order_acquire);
    > E: b=x.load(memory_order_relaxed);
    >
    > If a is 2, what is b?
    >
    > On most common systems (e.g. x86, PowerPC, Sparc), b will be 1.
    > This is not guaranteed by the standard, though, since it may not
    > hold on NUMA systems.



    The problem is that it prohibits the use of a relaxed fetch_add in
    the acquire operation of reference counting with basic thread-safety:

    struct rc_t
    {
        std::atomic<int> rc;
    };

    void acquire(rc_t* obj)
    {
        obj->rc.fetch_add(1, std::memory_order_relaxed);
    }

    According to C++0x, this implementation can lead to data races in
    some usage patterns. Is that intended?

    Dmitriy V'jukov
     
    Dmitriy V'jukov, Jun 16, 2008
    #3
  4. "Dmitriy V'jukov" <> writes:

    > On Jun 16, 2:13 pm, Anthony Williams <> wrote:
    >
    >> Relaxed operations can read values from other threads
    >> out-of-order. Consider the following:
    >>
    >> atomic_int x=0;
    >> atomic_int y=0;
    >>
    >> Processor 1 does store-release:
    >>
    >> A: x.store(1,memory_order_relaxed);
    >> B: y.store(1,memory_order_release);
    >>
    >> Processor 2 does relaxed RMW op:
    >> int expected=1;
    >> C: while(!y.compare_swap(expected,2,memory_order_relaxed));
    >>
    >> Processor 3 does load-acquire:
    >> D: a=y.load(memory_order_acquire);
    >> E: b=x.load(memory_order_relaxed);
    >>
    >> If a is 2, what is b?
    >>
    >> On most common systems (e.g. x86, PowerPC, Sparc), b will be 1.
    >> This is not guaranteed by the standard, though, since it may not
    >> hold on NUMA systems.

    >
    >
    > The problem is that it prohibits the use of a relaxed fetch_add in
    > the acquire operation of reference counting with basic thread-safety:
    >
    > struct rc_t
    > {
    >     std::atomic<int> rc;
    > };
    >
    > void acquire(rc_t* obj)
    > {
    >     obj->rc.fetch_add(1, std::memory_order_relaxed);
    > }
    >
    > According to C++0x, this implementation can lead to data races in
    > some usage patterns. Is that intended?


    I'm fairly sure it was intentional. If you don't want data races,
    specify a non-relaxed ordering: I'd guess that
    std::memory_order_acquire would be good for your example. If you don't
    want the sync on the fetch_add, you could use a fence. Note that the
    use of fences in the C++0x WP has changed this week from
    object-specific fences to global fences. See Peter Dimov's paper
    N2633:
    http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/n2633.html
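
    For concreteness, a sketch of the two alternatives just mentioned,
    reusing the rc_t from upthread (the spelling
    std::atomic_thread_fence is the post-rename name, an assumption of
    this sketch):

    void acquire_rmw(rc_t* obj)
    {
        // A non-relaxed RMW stays in the release sequence under the
        // draft wording.
        obj->rc.fetch_add(1, std::memory_order_acquire);
    }

    void acquire_fenced(rc_t* obj)
    {
        // The fence orders this thread's own subsequent accesses, but
        // the RMW itself is still relaxed, so it still terminates the
        // release sequence as far as other observers are concerned.
        obj->rc.fetch_add(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
    }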

    Relaxed ordering is intended to be minimal overhead on all systems, so
    it provides no ordering guarantees. On systems that always provide the
    ordering guarantees, putting memory_order_acquire on the fetch_add is
    probably minimal overhead. On systems that truly exhibit relaxed
    ordering, requiring that the relaxed fetch_add participate in the
    release sequence could add considerable overhead.

    Consider my example above on a distributed system where the processors
    are conceptually "a long way" apart, and data synchronization is
    explicit.

    With the current WP, processor 2 only needs to synchronize access to
    y. If the relaxed op featured in the release sequence, it would need
    to also handle the synchronization data for x, so that processor 3 got
    the "right" values for x and y.

    Anthony
    --
    Anthony Williams | Just Software Solutions Ltd
    Custom Software Development | http://www.justsoftwaresolutions.co.uk
    Registered in England, Company Number 5478976.
    Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
     
    Anthony Williams, Jun 16, 2008
    #4
  5. On Jun 16, 3:09 pm, Anthony Williams <> wrote:

    > Relaxed ordering is intended to be minimal overhead on all systems, so
    > it provides no ordering guarantees. On systems that always provide the
    > ordering guarantees, putting memory_order_acquire on the fetch_add is
    > probably minimal overhead. On systems that truly exhibit relaxed
    > ordering, requiring that the relaxed fetch_add participate in the
    > release sequence could add considerable overhead.
    >
    > Consider my example above on a distributed system where the processors
    > are conceptually "a long way" apart, and data synchronization is
    > explicit.
    >
    > With the current WP, processor 2 only needs to synchronize access to
    > y. If the relaxed op featured in the release sequence, it would need
    > to also handle the synchronization data for x, so that processor 3 got
    > the "right" values for x and y.



    In your example, yes, one has to use a non-relaxed RMW. But
    consider the following example:

    struct object
    {
        std::atomic<int> rc;
        int data;

        void acquire()
        {
            rc.fetch_add(1, std::memory_order_relaxed);
        }

        void release()
        {
            if (1 == rc.fetch_sub(1, std::memory_order_release))
            {
                std::atomic_fence(std::memory_order_acquire);
                data = 0;
                delete this;
            }
        }
    };

    object* g_obj;

    void thread1();
    void thread2();
    void thread3();

    int main()
    {
        g_obj = new object;
        g_obj->data = 1;
        g_obj->rc = 3;

        thread th1 = start_thread(&thread1);
        thread th2 = start_thread(&thread2);
        thread th3 = start_thread(&thread3);

        join_thread(th1);
        join_thread(th2);
        join_thread(th3);
    }

    void thread1()
    {
        volatile int data = g_obj->data;
        g_obj->release(); // T1-1
    }

    void thread2()
    {
        g_obj->acquire(); // T2-1
        g_obj->release(); // T2-2
        g_obj->release(); // T2-3
    }

    void thread3()
    {
        g_obj->release(); // T3-1
    }

    From the point of view of the current C++0x draft, this code
    contains a race on g_obj->data. But I think this code is perfectly
    legal from the hardware point of view.

    Consider the following order of execution:
    T1-1
    T2-1 - here the release sequence is broken, because of the relaxed rmw
    T2-2 - but here the release sequence is effectively "resurrected
    from the dead", because the thread which executed the relaxed rmw
    now executes a non-relaxed rmw
    T2-3
    T3-1

    So I think that T1-1 must 'synchronize-with' T3-1.

    The formal definition would be something like this:

    A release sequence on an atomic object M is a maximal contiguous
    sub-sequence of side effects in the modification order of M, where
    the first operation is a release, and every subsequent operation
    — is performed by the same thread that performed the release, or
    — is a non-relaxed atomic read-modify-write operation, or
    — is a *relaxed* atomic read-modify-write operation.

    A loaded release sequence on an atomic object M wrt evaluation A is
    the part of the release sequence from the beginning up to (and
    including) the value loaded by evaluation A.

    An evaluation A that performs a release operation on an object M
    synchronizes with an evaluation B that performs an acquire operation
    on M and reads a value written by any side effect in the release
    sequence headed by A, *if* for every relaxed rmw operation in the
    loaded release sequence there is a subsequent non-relaxed rmw
    operation in the loaded release sequence executed by the same thread.

    More precisely: *if* for every relaxed rmw operation (executed not
    by the thread which executed the release)...

    I'm trying to make the definitions more "permissive", thus making
    more usage patterns that are correct (from the hardware point of
    view) legal (from the C++0x point of view).

    What do you think?

    Dmitriy V'jukov
     
    Dmitriy V'jukov, Jun 16, 2008
    #5
  6. On Jun 16, 3:09 pm, Anthony Williams <> wrote:

    > Note that the
    > use of fences in the C++0x WP has changed this week from
    > object-specific fences to global fences. See Peter Dimov's paper
    > N2633: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/n2633.html



    Yes, I've already read this. It's just GREAT! It's far more useful
    and intuitive.
    And it contains a clear and simple binding to the memory model,
    i.e. the relations between acquire/release fences, and between
    acquire/release fences and acquire/release operations.

    Is it already generally approved by the memory model working group?
    I'm not worried about atomic_fence() :) But what about compiler_fence()?

    Btw, I see some problems in Peter Dimov's proposal.
    First, it's possible to write:

    x.store(1, std::memory_order_relaxed);
    std::atomic_compiler_fence(std::memory_order_release);
    y.store(1, std::memory_order_relaxed);

    But it's not possible to write:

    x.store(1, std::memory_order_relaxed);
    y.store(1, std::memory_order_relaxed_but_compiler_order_release);
    // or just y.store(1, std::compiler_order_release);

    I.e. it's not possible to use compiler ordering when using acquire/
    release operations. It's a bit inconsistent, especially taking into
    account that acquire/release operations are primary and standalone
    bidirectional fences are supplementary.

    Second, a more important point. It's possible to write:

    //thread 1:
    data = 1;
    std::atomic_memory_fence(std::memory_order_release);
    x.store(1, std::memory_order_relaxed);

    //thread 2:
    if (x.load(std::memory_order_acquire))
    assert(1 == data);

    But it's not possible to write:

    //thread 1:
    data = 1;
    z.store(1, std::memory_order_release);
    x.store(1, std::memory_order_relaxed);

    //thread 2:
    if (x.load(std::memory_order_acquire))
    assert(1 == data);

    From the point of view of Peter Dimov's proposal, this code
    contains a race on 'data'.

    I think the following statements must hold:

    - a release operation *is a* release fence
    - an acquire operation *is an* acquire fence

    So this:
    z.store(1, std::memory_order_release);
    basically transforms to:
    std::atomic_memory_fence(std::memory_order_release);
    z.store(1, std::memory_order_release);

    Then the second example will be legal. What do you think?

    Dmitriy V'jukov
     
    Dmitriy V'jukov, Jun 16, 2008
    #6
  7. "Dmitriy V'jukov" <> writes:

    > On Jun 16, 3:09 pm, Anthony Williams <> wrote:
    >
    >> Relaxed ordering is intended to be minimal overhead on all systems, so
    >> it provides no ordering guarantees. On systems that always provide the
    >> ordering guarantees, putting memory_order_acquire on the fetch_add is
    >> probably minimal overhead. On systems that truly exhibit relaxed
    >> ordering, requiring that the relaxed fetch_add participate in the
    >> release sequence could add considerable overhead.
    >>
    >> Consider my example above on a distributed system where the processors
    >> are conceptually "a long way" apart, and data synchronization is
    >> explicit.
    >>
    >> With the current WP, processor 2 only needs to synchronize access to
    >> y. If the relaxed op featured in the release sequence, it would need
    >> to also handle the synchronization data for x, so that processor 3 got
    >> the "right" values for x and y.

    >
    >
    > In your example, yes, one has to use a non-relaxed RMW. But
    > consider the following example:
    >
    > struct object
    > {
    >     std::atomic<int> rc;
    >     int data;
    >
    >     void acquire()
    >     {
    >         rc.fetch_add(1, std::memory_order_relaxed);
    >     }
    >
    >     void release()
    >     {
    >         if (1 == rc.fetch_sub(1, std::memory_order_release))
    >         {
    >             std::atomic_fence(std::memory_order_acquire);
    >             data = 0;
    >             delete this;
    >         }
    >     }
    > };
    >
    > object* g_obj;
    >
    > void thread1();
    > void thread2();
    > void thread3();
    >
    > int main()
    > {
    >     g_obj = new object;
    >     g_obj->data = 1;
    >     g_obj->rc = 3;
    >
    >     thread th1 = start_thread(&thread1);
    >     thread th2 = start_thread(&thread2);
    >     thread th3 = start_thread(&thread3);
    >
    >     join_thread(th1);
    >     join_thread(th2);
    >     join_thread(th3);
    > }
    >
    > void thread1()
    > {
    >     volatile int data = g_obj->data;
    >     g_obj->release(); // T1-1
    > }
    >
    > void thread2()
    > {
    >     g_obj->acquire(); // T2-1
    >     g_obj->release(); // T2-2
    >     g_obj->release(); // T2-3
    > }
    >
    > void thread3()
    > {
    >     g_obj->release(); // T3-1
    > }
    >
    > From the point of view of the current C++0x draft, this code
    > contains a race on g_obj->data. But I think this code is perfectly
    > legal from the hardware point of view.


    I guess it depends on your hardware. The relaxed fetch_add says "I
    don't care about ordering", yet your code blatantly does care about
    the ordering. I can't help thinking it should be
    fetch_add(1,memory_order_acquire).

    > Consider the following order of execution:
    > T1-1
    > T2-1 - here the release sequence is broken, because of the relaxed rmw
    > T2-2 - but here the release sequence is effectively "resurrected
    > from the dead", because the thread which executed the relaxed rmw
    > now executes a non-relaxed rmw
    > T2-3
    > T3-1


    And there's the rub: I don't think this is sensible. You explicitly
    broke the release sequence with the relaxed fetch_add, so you can't
    resurrect it.

    T2-1 is not ordered wrt the read of g_obj->data in thread1. If it
    needs to be ordered, it should say so.

    Anthony
    --
    Anthony Williams | Just Software Solutions Ltd
    Custom Software Development | http://www.justsoftwaresolutions.co.uk
    Registered in England, Company Number 5478976.
    Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
     
    Anthony Williams, Jun 16, 2008
    #7
  8. "Dmitriy V'jukov" <> writes:

    > On Jun 16, 3:09 pm, Anthony Williams <> wrote:
    >
    >> Note that the
    >> use of fences in the C++0x WP has changed this week from
    >> object-specific fences to global fences. See Peter Dimov's paper
    >> N2633: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/n2633.html

    >
    >
    > Yes, I've already read this. It's just GREAT! It's far more useful
    > and intuitive.
    > And it contains a clear and simple binding to the memory model,
    > i.e. the relations between acquire/release fences, and between
    > acquire/release fences and acquire/release operations.
    >
    > Is it already generally approved by the memory model working group?
    > I'm not worried about atomic_fence() :) But what about compiler_fence()?


    Yes. It's been approved to be applied to the WP with minor renamings
    (atomic_memory_fence -> atomic_thread_fence, atomic_compiler_fence ->
    atomic_signal_fence).

    > Btw, I see some problems in Peter Dimov's proposal.
    > First, it's possible to write:
    >
    > x.store(1, std::memory_order_relaxed);
    > std::atomic_compiler_fence(std::memory_order_release);
    > y.store(1, std::memory_order_relaxed);
    >
    > But it's not possible to write:
    >
    > x.store(1, std::memory_order_relaxed);
    > y.store(1, std::memory_order_relaxed_but_compiler_order_release);
    > // or just y.store(1, std::compiler_order_release);
    >
    > I.e. it's not possible to use compiler ordering when using acquire/
    > release operations. It's a bit inconsistent, especially taking into
    > account that acquire/release operations are primary and standalone
    > bidirectional fences are supplementary.


    You're right that you can't do this. I don't think it's a problem as
    compiler orderings are not really the same as the inter-thread
    orderings.

    > Second, a more important point. It's possible to write:
    >
    > //thread 1:
    > data = 1;
    > std::atomic_memory_fence(std::memory_order_release);
    > x.store(1, std::memory_order_relaxed);
    >
    > //thread 2:
    > if (x.load(std::memory_order_acquire))
    > assert(1 == data);
    >
    > But it's not possible to write:
    >
    > //thread 1:
    > data = 1;
    > z.store(1, std::memory_order_release);
    > x.store(1, std::memory_order_relaxed);
    >
    > //thread 2:
    > if (x.load(std::memory_order_acquire))
    > assert(1 == data);
    >
    > From the point of view of Peter Dimov's proposal, this code
    > contains a race on 'data'.


    Yes. Fences are global, whereas ordering on individual objects is
    specific. The fence version is equivalent to:

    // thread 1
    data = 1;
    x.store(1,std::memory_order_release);

    > I think the following statements must hold:
    >
    > - a release operation *is a* release fence
    > - an acquire operation *is an* acquire fence
    >
    > So this:
    > z.store(1, std::memory_order_release);
    > basically transforms to:
    > std::atomic_memory_fence(std::memory_order_release);
    > z.store(1, std::memory_order_release);
    >
    > Then the second example will be legal. What do you think?


    I think that compromises the model, because it makes release
    operations contagious. The fence transformation is precisely the
    reverse of this, which I think is correct.
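
    To make the direction of the valid transformation concrete, a
    sketch (the spelling std::atomic_thread_fence is an assumption of
    this sketch):

    // A store-release may always be implemented as the stronger fence
    // form, which orders *all* preceding writes:
    data = 1;
    std::atomic_thread_fence(std::memory_order_release);
    x.store(1, std::memory_order_relaxed);

    // The reverse rewrite, treating every store-release as if it
    // carried a standalone release fence, is the "contagious" reading
    // being argued against:
    data = 1;
    x.store(1, std::memory_order_release); // publishes only via x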

    Anthony
    --
    Anthony Williams | Just Software Solutions Ltd
    Custom Software Development | http://www.justsoftwaresolutions.co.uk
    Registered in England, Company Number 5478976.
    Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
     
    Anthony Williams, Jun 16, 2008
    #8
  9. On Jun 16, 11:36 pm, Anthony Williams <> wrote:

    > > From the point of view of the current C++0x draft, this code
    > > contains a race on g_obj->data. But I think this code is perfectly
    > > legal from the hardware point of view.

    >
    > I guess it depends on your hardware. The relaxed fetch_add says "I
    > don't care about ordering", yet your code blatantly does care about
    > the ordering. I can't help thinking it should be
    > fetch_add(1,memory_order_acquire).


    Ok. Let's put it this way. I'll change your initial example a bit:

    atomic_int x=0;
    atomic_int y=0;

    Processor 1 does store-release:

    A: x.store(1,memory_order_relaxed);
    B: y.store(1,memory_order_release);

    Processor 2 does relaxed RMW op and then non-relaxed RMW op:
    int expected=1;
    C: while(!y.compare_swap(expected,2,memory_order_relaxed));
    D: y.fetch_add(1,memory_order_acq_rel);

    Processor 3 does load-acquire:
    E: a=y.load(memory_order_acquire);
    F: b=x.load(memory_order_relaxed);
    if (3 == a) assert(1 == b);

    If a is 3, what is b?

    I believe that b==1, and there is no race on x.

    And in this particular example the following lines:
    C: while(!y.compare_swap(expected,2,memory_order_relaxed));
    D: y.fetch_add(1,memory_order_acq_rel);
    synchronize memory exactly like this single line:
    C: while(!y.compare_swap(expected,3,memory_order_acq_rel));

    Or am I wrong even here?

    Dmitriy V'jukov
     
    Dmitriy V'jukov, Jun 16, 2008
    #9
  10. On Jun 16, 11:47 pm, Anthony Williams <> wrote:

    > > Yes, I've already read this. It's just GREAT! It's far more useful
    > > and intuitive.
    > > And it contains a clear and simple binding to the memory model,
    > > i.e. the relations between acquire/release fences, and between
    > > acquire/release fences and acquire/release operations.

    >
    > > Is it already generally approved by the memory model working group?
    > > I'm not worried about atomic_fence() :) But what about compiler_fence()?

    >
    > Yes. It's been approved to be applied to the WP with minor renamings
    > (atomic_memory_fence -> atomic_thread_fence, atomic_compiler_fence ->
    > atomic_signal_fence)


    COOL!

    Looking forward to the next draft. Btw, what about dependent memory
    ordering (memory_order_consume)? Is it going to be accepted?




    > > Btw, I see some problems in Peter Dimov's proposal.
    > > First, it's possible to write:

    >
    > > x.store(1, std::memory_order_relaxed);
    > > std::atomic_compiler_fence(std::memory_order_release);
    > > y.store(1, std::memory_order_relaxed);

    >
    > > But it's not possible to write:

    >
    > > x.store(1, std::memory_order_relaxed);
    > > y.store(1, std::memory_order_relaxed_but_compiler_order_release);
    > > // or just y.store(1, std::compiler_order_release);

    >
    > > I.e. it's not possible to use compiler ordering when using acquire/
    > > release operations. It's a bit inconsistent, especially taking into
    > > account that acquire/release operations are primary and standalone
    > > bidirectional fences are supplementary.

    >
    > You're right that you can't do this. I don't think it's a problem as
    > compiler orderings are not really the same as the inter-thread
    > orderings.


    Yes, but why can I do both inter-thread orderings and compiler
    orderings with stand-alone fences, but only inter-thread orderings
    with operations? Why are stand-alone fences more 'powerful'?


    > > Second, a more important point. It's possible to write:

    >
    > > //thread 1:
    > > data = 1;
    > > std::atomic_memory_fence(std::memory_order_release);
    > > x.store(1, std::memory_order_relaxed);

    >
    > > //thread 2:
    > > if (x.load(std::memory_order_acquire))
    > > assert(1 == data);

    >
    > > But it's not possible to write:

    >
    > > //thread 1:
    > > data = 1;
    > > z.store(1, std::memory_order_release);
    > > x.store(1, std::memory_order_relaxed);

    >
    > > //thread 2:
    > > if (x.load(std::memory_order_acquire))
    > > assert(1 == data);

    >
    > > From the point of view of Peter Dimov's proposal, this code
    > > contains a race on 'data'.

    >
    > Yes. Fences are global, whereas ordering on individual objects is
    > specific.


    Hmmm... need to think some more on this...


    > The fence version is equivalent to:
    >
    > // thread 1
    > data = 1;
    > x.store(1,std::memory_order_release);
    >
    > > I think the following statements must hold:

    >
    > > - a release operation *is a* release fence
    > > - an acquire operation *is an* acquire fence

    >
    > > So this:
    > > z.store(1, std::memory_order_release);
    > > basically transforms to:
    > > std::atomic_memory_fence(std::memory_order_release);
    > > z.store(1, std::memory_order_release);

    >
    > > Then the second example will be legal. What do you think?

    >
    > I think that compromises the model, because it makes release
    > operations contagious...


    ... and this will interfere with efficient implementation on some
    hardware. Right? Or are there some 'logical' reasons for this (why
    you don't want to make release operations contagious)?

    Dmitriy V'jukov
     
    Dmitriy V'jukov, Jun 16, 2008
    #10
  11. "Dmitriy V'jukov" <> writes:

    > On Jun 16, 11:47 pm, Anthony Williams <> wrote:
    >
    >> > Yes, I've already read this. It's just GREAT! It's far more useful
    >> > and intuitive.
    >> > And it contains a clear and simple binding to the memory model,
    >> > i.e. the relations between acquire/release fences, and between
    >> > acquire/release fences and acquire/release operations.

    >>
    >> > Is it already generally approved by the memory model working group?
    >> > I'm not worried about atomic_fence() :) But what about compiler_fence()?

    >>
    >> Yes. It's been approved to be applied to the WP with minor renamings
    >> (atomic_memory_fence -> atomic_thread_fence, atomic_compiler_fence ->
    >> atomic_signal_fence)

    >
    > COOL!
    >
    > Looking forward to the next draft. Btw, what about dependent memory
    > ordering (memory_order_consume)? Is it going to be accepted?


    Yes. That's been voted in too.

    >> > Btw, I see some problems in Peter Dimov's proposal.
    >> > First, it's possible to write:

    >>
    >> > x.store(1, std::memory_order_relaxed);
    >> > std::atomic_compiler_fence(std::memory_order_release);
    >> > y.store(1, std::memory_order_relaxed);

    >>
    >> > But it's not possible to write:

    >>
    >> > x.store(1, std::memory_order_relaxed);
    >> > y.store(1, std::memory_order_relaxed_but_compiler_order_release);
    >> > // or just y.store(1, std::compiler_order_release);

    >>
    >> > I.e. it's not possible to use compiler ordering when using acquire/
    >> > release operations. It's a bit inconsistent, especially taking into
    >> > account that acquire/release operations are primary and standalone
    >> > bidirectional fences are supplementary.

    >>
    >> You're right that you can't do this. I don't think it's a problem as
    >> compiler orderings are not really the same as the inter-thread
    >> orderings.

    >
    > Yes, but why can I do both inter-thread orderings and compiler
    > orderings with stand-alone fences, but only inter-thread orderings
    > with operations? Why are stand-alone fences more 'powerful'?


    Stand-alone fences affect all data touched by the executing thread, so
    they are inherently more 'powerful'.
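
    For concreteness, a sketch of that difference in scope (the
    spelling std::atomic_thread_fence is an assumption of this sketch):

    data = 1;
    std::atomic_thread_fence(std::memory_order_release);
    x.store(1, std::memory_order_relaxed); // publishes data
    y.store(1, std::memory_order_relaxed); // also publishes data

    // versus the object-specific form:
    data = 1;
    x.store(1, std::memory_order_release); // publishes data via x only
    y.store(1, std::memory_order_relaxed); // y carries no guarantee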

    >> The fence version is equivalent to:
    >>
    >> // thread 1
    >> data = 1;
    >> x.store(1,std::memory_order_release);
    >>
    >> > I think the following statements must hold:

    >>
    >> > - a release operation *is a* release fence
    >> > - an acquire operation *is an* acquire fence

    >>
    >> > So this:
    >> > z.store(1, std::memory_order_release);
    >> > basically transforms to:
    >> > std::atomic_memory_fence(std::memory_order_release);
    >> > z.store(1, std::memory_order_release);

    >>
    >> > Then the second example will be legal. What do you think?

    >>
    >> I think that compromises the model, because it makes release
    >> operations contagious...

    >
    > ... and this will interfere with efficient implementation on some
    > hardware. Right? Or are there some 'logical' reasons for this (why
    > you don't want to make release operations contagious)?


    It affects where you put the memory barrier instruction. The whole
    point of relaxed operations is that they don't have memory barriers,
    but if you make the release contagious the compiler might have to add
    extra memory barriers in some cases. N2633 shows how you can
    accidentally end up having to get full barriers all over the place.

    Anthony
    --
    Anthony Williams | Just Software Solutions Ltd
    Custom Software Development | http://www.justsoftwaresolutions.co.uk
    Registered in England, Company Number 5478976.
    Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
     
    Anthony Williams, Jun 16, 2008
    #11
  12. "Dmitriy V'jukov" <> writes:

    > On Jun 16, 11:36 pm, Anthony Williams <> wrote:
    >
    >> > From the point of view of the current C++0x draft, this code
    >> > contains a race on g_obj->data. But I think this code is perfectly
    >> > legal from the hardware point of view.

    >>
    >> I guess it depends on your hardware. The relaxed fetch_add says "I
    >> don't care about ordering", yet your code blatantly does care about
    >> the ordering. I can't help thinking it should be
    >> fetch_add(1,memory_order_acquire).

    >
    > Ok. Let's put it this way. I'll change your initial example a bit:
    >
    > atomic_int x=0;
    > atomic_int y=0;
    >
    > Processor 1 does store-release:
    >
    > A: x.store(1,memory_order_relaxed);
    > B: y.store(1,memory_order_release);
    >
    > Processor 2 does relaxed RMW op and then non-relaxed RMW op:
    > int expected=1;
    > C: while(!y.compare_swap(expected,2,memory_order_relaxed));
    > D: y.fetch_add(1,memory_order_acq_rel);
    >
    > Processor 3 does load-acquire:
    > E: a=y.load(memory_order_acquire);
    > F: b=x.load(memory_order_relaxed);
    > if (3 == a) assert(1 == b);
    >
    > If a is 3, what is b?
    >
    > I believe that b==1, and there is no race on x.


    Not by the current memory model. On common desktop hardware, I believe
    you are right.

    > And in this particular example the following lines:
    > C: while(!y.compare_swap(expected,2,memory_order_relaxed));
    > D: y.fetch_add(1,memory_order_acq_rel);
    > synchronize memory exactly like this single line:
    > C: while(!y.compare_swap(expected,3,memory_order_acq_rel));
    >
    > Or am I wrong even here?


    Under the current memory model, I believe you are wrong.

    There are two atomic operations, so they can have another operation
    interleaved between them. The relaxed operation cannot be reordered
    with the fetch_add, since it's on the same variable, but it can be
    reordered with respect to operations on other threads.
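
    To illustrate the interleaving concretely, a sketch (this fourth
    processor is an assumption of the sketch, not part of the example
    above):

    // Processor 4: a plain relaxed store that can slip between C and D
    // in the modification order of y.
    y.store(3, std::memory_order_relaxed);

    // The modification order of y can then be 1 (B), 2 (C), 3 (P4),
    // 4 (D). Processor 3's acquire load may read the 3 written by this
    // plain store, which is neither an RMW nor performed by the
    // releasing thread, so a == 3 no longer implies synchronization
    // with B, and b may be 0. The single acq_rel compare_swap of 1 to
    // 3 never admits that execution.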

    Anthony
    --
    Anthony Williams | Just Software Solutions Ltd
    Custom Software Development | http://www.justsoftwaresolutions.co.uk
    Registered in England, Company Number 5478976.
    Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
     
    Anthony Williams, Jun 16, 2008
    #12
  13. On Jun 17, 00:30, Anthony Williams <> wrote:

    > > And in this particular example the following lines:
    > > C: while(!y.compare_swap(expected,2,memory_order_relaxed));
    > > D: y.fetch_add(1,memory_order_acq_rel);
    > > synchronize memory exactly like this single line:
    > > C: while(!y.compare_swap(expected,3,memory_order_acq_rel));

    >
    > > Or am I wrong even here?

    >
    > Under the current memory model, I believe you are wrong.


    I understand this. But I am talking about the memory model itself.
    Maybe it's not... correct... ok, precise. Maybe it's better to
    change the memory model to allow such code...

    > There are two atomic operations, so they can have another operation
    > interleaved between them. The relaxed operation cannot be reordered
    > with the fetch_add, since it's on the same variable, but it can be
    > reordered with respect to operations on other threads.


    But processor 3 checks whether (3 == a); if (3 == a), then
    processor 2 executed not only the relaxed compare_swap but also the
    subsequent *non-relaxed* fetch_add. This means that processor 2
    nevertheless synchronized memory. So how can it be that processor 3
    will fail the assert?

    Dmitriy V'jukov
     
    Dmitriy V'jukov, Jun 16, 2008
    #13
  14. On Jun 17, 00:27, Anthony Williams <> wrote:
    > > Looking forward to the next draft. Btw, what about dependent memory
    > > ordering (memory_order_consume)? Is it going to be accepted?

    >
    > Yes. That's been voted in too.


    Oooo, that's bad news. I've only started understanding the current
    "1.10", and they change it almost completely! :)

    The latest proposal about dependent ordering is:
    http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2556.html
    Right?

    And what about the syntax with double square brackets,
    [[carries_dependency]]? It's quite an unusual syntax addition for
    C/C++...


    > >> > Btw, I see some problems in Peter Dimov's proposal.
    > >> > First, it's possible to write:

    >
    > >> > x.store(1, std::memory_order_relaxed);
    > >> > std::atomic_compiler_fence(std::memory_order_release);
    > >> > y.store(1, std::memory_order_relaxed);

    >
    > >> > But it's not possible to write:

    >
    > >> > x.store(1, std::memory_order_relaxed);
    > >> > y.store(1, std::memory_order_relaxed_but_compiler_order_release);
    > >> > // or just y.store(1, std::compiler_order_release);

    >
    > >> > I.e. it's not possible to use compiler ordering when using acquire/
    > >> > release operations. It's a bit inconsistent, especially taking into
    > >> > account that acquire/release operations are primary and standalone
    > >> > bidirectional fences are supplementary.

    >
    > >> You're right that you can't do this. I don't think it's a problem as
    > >> compiler orderings are not really the same as the inter-thread
    > >> orderings.

    >
    > > Yes, but why can I do both inter-thread orderings and compiler
    > > orderings with stand-alone fences, but only inter-thread orderings
    > > with operations? Why are stand-alone fences more 'powerful'?

    >
    > Stand-alone fences affect all data touched by the executing thread, so
    > they are inherently more 'powerful'.


    I'm starting to understand. Initially I thought these were just two
    forms of saying the same thing (a stand-alone fence and an acquire/
    release operation). That turns out not to be true. Ok.

    Dmitriy V'jukov
     
    Dmitriy V'jukov, Jun 17, 2008
    #14
  15. "Dmitriy V'jukov" <> writes:

    > On Jun 17, 00:27, Anthony Williams <> wrote:
    >> > Looking forward to the next draft. Btw, what about dependent memory
    >> > ordering (memory_order_consume)? Is it going to be accepted?

    >>
    >> Yes. That's been voted in too.

    >
    > Oooo, that's bad news. I've only started understanding the current
    > "1.10", and they change it almost completely! :)


    It's all additions, so it's not too bad. The key thing is that the
    paper adds memory_order_consume and dependency ordering, which
    provides an additional mechanism for introducing a happens-before
    relationship between threads.
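
    For concreteness, a minimal sketch of the mechanism (my
    illustration, not an example from the paper):

    #include <atomic>

    struct node { int value; };
    std::atomic<node*> head(nullptr);

    void publisher()
    {
        node* n = new node;
        n->value = 42;
        head.store(n, std::memory_order_release);
    }

    void consumer()
    {
        node* n = head.load(std::memory_order_consume);
        if (n)
        {
            // The dereference is data-dependent on the consume load,
            // so the write of 42 happens-before this read without a
            // full acquire.
            int v = n->value;
            (void)v;
        }
    }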

    > The latest proposal about dependent ordering is:
    > http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2556.html
    > Right?


    That's the latest pre-meeting paper. The latest (which is what was
    voted on) is N2664 which is currently only available on the committee
    site. It should be in the post-meeting mailing.

    > And what about the syntax with double square brackets,
    > [[carries_dependency]]? It's quite an unusual syntax addition for
    > C/C++...


    That's the new attribute syntax. This part of the proposal has not
    been included for now.

    Anthony
    --
    Anthony Williams | Just Software Solutions Ltd
    Custom Software Development | http://www.justsoftwaresolutions.co.uk
    Registered in England, Company Number 5478976.
    Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
     
    Anthony Williams, Jun 17, 2008
    #15
  16. "Dmitriy V'jukov" <> writes:

    > On Jun 17, 00:30, Anthony Williams <> wrote:
    >
    >> > And in this particular example the following lines:
    >> > C: while(!y.compare_swap(expected,2,memory_order_relaxed));
    >> > D: y.fetch_add(1,memory_order_acq_rel);
    >> > synchronize memory exactly like this single line:
    >> > C: while(!y.compare_swap(expected,3,memory_order_acq_rel));

    >>
    >> > Or am I wrong even here?

    >>
    >> Under the current memory model, I believe you are wrong.

    >
    > I understand this. But I am talking about the memory model itself.
    > Maybe it's not... correct... ok, precise. Maybe it's better to
    > change the memory model to allow such code...


    I think that the current memory model actually "works" in the sense
    that you can reason about what the results could be, and that it
    reflects what processors do in the circumstances it describes. The HPC
    community were very keen that the memory model not preclude machines
    with a more relaxed model than current Power and Sparc architectures
    allow.

    >> There are two atomic operations, so they can have another operation
    >> interleaved between them. The relaxed operation cannot be reordered
    >> with the fetch_add, since it's on the same variable, but it can be
    >> reordered with respect to operations on other threads.

    >
    > But processor 3 checks whether (3 == a); if (3 == a), then
    > processor 2 executed not only the relaxed compare_swap but also the
    > subsequent *non-relaxed* fetch_add. This means that processor 2
    > nevertheless synchronized memory. So how can it be that processor 3
    > will fail the assert?


    Processor 2 synchronizes with Processor 3. However, processor 1
    doesn't synchronize with processor 2, and processor 2 doesn't touch x,
    so there is no happens-before relationship on x between P1 and P2 or
    P1/P2 and P3.

    Anthony
    --
    Anthony Williams | Just Software Solutions Ltd
    Custom Software Development | http://www.justsoftwaresolutions.co.uk
    Registered in England, Company Number 5478976.
    Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
     
    Anthony Williams, Jun 17, 2008
    #16
  17. On Jun 17, 10:48 am, Anthony Williams <> wrote:
    > "Dmitriy V'jukov" <> writes:
    > > On Jun 17, 00:27, Anthony Williams <> wrote:
    > >> > Looking forward to next draft. Btw, what about dependent memory
    > >> > ordering (memory_order_consume)? Is it going to be accepted?

    >
    > >> Yes. That's been voted in too.

    >
    > > Oooo, that's bad news. I've only started understanding the current
    > > "1.10", and they change it almost completely! :)

    >
    > It's all additions, so it's not too bad. The key thing is that the
    > paper adds memory_order_consume and dependency ordering, which
    > provides an additional mechanism for introducing a happens-before
    > relationship between threads.
    >
    > > The latest proposal about dependent ordering is:
    > > http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2556.html
    > > Right?

    >
    > That's the latest pre-meeting paper. The latest (which is what was
    > voted on) is N2664 which is currently only available on the committee
    > site. It should be in the post-meeting mailing.



    I hope that in N2664 the 'happens before' definition has changed,
    because right now I can't understand it.
    For example, in the following code:

    int data;
    std::atomic<int> x;

    thread 1:
    data = 1;
    x.store(1, std::memory_order_release); (A)

    thread 2:
    if (x.load(std::memory_order_consume)) (B)
        assert(1 == data); (C)

    A is dependency-ordered before B, and B is sequenced before C.
    So according to the definition of 'happens before' in N2556, A
    happens-before C.
    According to my understanding, this is simply wrong. There is no
    data dependency between B and C, so A must not happen-before C.
    (There is a control dependency, but currently C++0x doesn't respect
    control dependencies.)

    ------------------------------------
    Another point:

    An evaluation A carries a dependency to an evaluation B if
    * the value of A is used as an operand of B, and:
      o B is not an invocation of any specialization of
        std::kill_dependency, and
      o A is not the left operand to the comma (',') operator,

    I think the last clause must say 'built-in comma (',') operator'.
    Consider the following example:

    struct X
    {
        int data;
    };

    void operator , (int y, X& x)
    {
        x.data = y;
    }

    std::atomic<int> a;

    int main()
    {
        int y = a.load(std::memory_order_consume);
        X x;
        y, x;           // here 'carries a dependency' is broken, because
                        // 'y' is the left operand of the comma operator
        int z = x.data; // but I think 'z' still must be in the
                        // 'dependency tree' rooted at 'y'
    }


    Where am I wrong this time? :)


    Dmitriy V'jukov
     
    Dmitriy V'jukov, Jun 19, 2008
    #17
  18. "Dmitriy V'jukov" <> writes:

    > On Jun 17, 10:48 am, Anthony Williams <> wrote:
    >> "Dmitriy V'jukov" <> writes:
    >> > On Jun 17, 00:27, Anthony Williams <> wrote:
    >> >> > Looking forward to next draft. Btw, what about dependent memory
    >> >> > ordering (memory_order_consume)? Is it going to be accepted?

    >>
    >> >> Yes. That's been voted in too.

    >>
    >> > Oooo, that's bad news. I've only started understanding the current
    >> > "1.10", and they change it almost completely! :)

    >>
    >> It's all additions, so it's not too bad. The key thing is that the
    >> paper adds memory_order_consume and dependency ordering, which
    >> provides an additional mechanism for introducing a happens-before
    >> relationship between threads.
    >>
    >> > The latest proposal about dependent ordering is:
    >> > http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2556.html
    >> > Right?

    >>
    >> That's the latest pre-meeting paper. The latest (which is what was
    >> voted on) is N2664 which is currently only available on the committee
    >> site. It should be in the post-meeting mailing.

    >
    >
    > I hope that in N2664 the 'happens before' definition has changed,
    > because right now I can't understand it.


    N2664 is almost the same as N2556.

    > For example, in the following code:
    >
    > int data;
    > std::atomic<int> x;
    >
    > thread 1:
    > data = 1;
    > x.store(1, std::memory_order_release); (A)
    >
    > thread 2:
    > if (x.load(std::memory_order_consume)) (B)
    >     assert(1 == data); (C)
    >
    > A is dependency-ordered before B, and B is sequenced before C.


    Yes.

    > So according to the definition of 'happens before' in N2556, A
    > happens-before C.


    No. happens-before is no longer transitive if one of the legs is a
    dependency ordering.

    N2664 says:

    "An evaluation A inter-thread happens before an evaluation B if,

    * A synchronizes with B, or
    * A is dependency-ordered before B, or
    * for some evaluation X,
      o A synchronizes with X and X is sequenced before B, or
      o A is sequenced before X and X inter-thread happens before B, or
      o A inter-thread happens before X and X inter-thread happens before B."

    "An evaluation A happens before an evaluation B if:

    * A is sequenced before B, or
    * A inter-thread happens before B."

    A is dependency-ordered before B, so A inter-thread happens-before B,
    and A happens-before B.

    However, A neither synchronizes with B nor with C, nor is it
    sequenced before B, so the only way A could
    inter-thread-happen-before C is if B inter-thread-happens-before C.
    Since C is not atomic, B cannot synchronize with C or be
    dependency-ordered before C. Thus A does not
    inter-thread-happen-before C, and A does not happen-before C.

    > According to my understanding, this is simply wrong. There is no
    > data dependency between B and C, so A must not happen-before C.
    > (There is a control dependency, but currently C++0x doesn't respect
    > control dependencies.)


    You're right in your analysis, but N2664 agrees with you.

    > ------------------------------------
    > Another point:
    >
    > An evaluation A carries a dependency to an evaluation B if
    > * the value of A is used as an operand of B, and:
    >   o B is not an invocation of any specialization of
    >     std::kill_dependency, and
    >   o A is not the left operand to the comma (',') operator,
    >
    > I think the last clause must say 'built-in comma (',') operator'.
    > Consider the following example:


    Yes. That's fixed in N2664:

    "An evaluation A carries a dependency to an evaluation B if

    * the value of A is used as an operand of B, unless:
    o B is an invocation of any specialization of std::kill_dependency (29.1), or
    o A is the left operand of a built-in logical AND ('&&', see 5.14) or logical OR ('||', see 5.15) operator, or
    o A is the left operand of a conditional ('?:') operator (5.16), or
    o A is the left operand of the built-in comma (',') operator (5.18);
    or
    * A writes a scalar object or bit-field M, B reads the value written by A from M, and A is sequenced before B, or
    * for some evaluation X, A carries a dependency to X, and X carries a dependency to B."
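
    For concreteness, a sketch of the first exception (my illustration;
    std::kill_dependency is the name from the paper):

    #include <atomic>

    std::atomic<int*> p;

    int read_without_dependency()
    {
        int* q = p.load(std::memory_order_consume);
        // *q carries a dependency from the consume load, but
        // kill_dependency returns its argument without carrying that
        // dependency further, so later uses of the result are not
        // dependency-ordered.
        return std::kill_dependency(*q);
    }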

    Anthony
    --
    Anthony Williams | Just Software Solutions Ltd
    Custom Software Development | http://www.justsoftwaresolutions.co.uk
    Registered in England, Company Number 5478976.
    Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
     
    Anthony Williams, Jun 19, 2008
    #18
