C++0x memory model and atomics, some questions

Discussion in 'C++' started by Joshua Maurice, Sep 1, 2010.

  1. So, I'm trying to learn more about threading. Which papers are the
    most current? Which are the most readable breakdowns that I might
    comprehend without delving too deep into standardese?

    I've thus far found these links to be of great use:

    http://www.hpl.hp.com/techreports/2008/HPL-2008-56.pdf
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2664.htm
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2338.html

    Here are some questions.

    1-
    So, we now have the stringent requirement that an implementation
    cannot "introduce races", including specifically that an
    implementation cannot introduce a write to a memory location on a code
    path where there was none. This has big implications. An
    implementation is no longer allowed to store an 8 bit unit by reading
    the current value of the whole 32 or 64 bit word, using bit shifting
    and bit masking, and then writing the whole 32 or 64 bit word back to
    modify the 8 bit memory location. This means that writing to certain
    smaller types might be a lot slower now (though it's my understanding
    that most sane architectures provide native byte-sized stores anyway).
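
    For concreteness, here is a minimal sketch of the kind of code the
    rule protects (my own example, not taken from the papers; the names
    are made up):

    #include <thread>

    struct Packed {
        char a;   // written only by thread 1
        char b;   // written only by thread 2
    };

    Packed p;

    int main() {
        // a and b are distinct memory locations, so this program is
        // race-free under the C++0x model. A compiler that implemented
        // "p.a = 1" as load-word / mask / store-word could clobber a
        // concurrent store to p.b, which is exactly the invented write
        // the model forbids.
        std::thread t1([] { p.a = 1; });
        std::thread t2([] { p.b = 2; });
        t1.join();
        t2.join();
    }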

    It's also my understanding that the "int" type was meant at some point
    to be the "fastest" integer type for the implementation. However, is
    this the case nowadays when "int" on a lot of 64 bit systems is 32 bit
    for "compatibility reasons"? Also specifically for the atomic compare
    and exchange, how can I know which ones will be implemented via a
    hardware compare and swap, and which ones need to be emulated?
    Offhand, I would figure out on each system which is the "efficient"
    base integer type, and then I would provide a typedef, and I would use
    that typedef in portable code.
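
    One way to probe that, instead of hard-coding a choice per platform,
    is the lock-free query the atomics interface provides. A rough
    sketch, assuming the interface keeps roughly its current draft shape
    (is_lock_free() and compare_exchange_strong()):

    #include <atomic>
    #include <cstdio>

    int main() {
        // Ask, per platform, which atomic integer widths the
        // implementation handles without a lock; is_lock_free() answers
        // exactly the "hardware CAS or emulated?" question.
        std::atomic<int>       ai(0);
        std::atomic<long long> all(0);
        std::printf("atomic<int>       lock-free: %d\n", ai.is_lock_free() ? 1 : 0);
        std::printf("atomic<long long> lock-free: %d\n", all.is_lock_free() ? 1 : 0);

        // compare_exchange_strong maps to a single hardware
        // compare-and-swap only when the object is lock-free; otherwise
        // the implementation has to emulate it (e.g. with a lock).
        int expected = 0;
        ai.compare_exchange_strong(expected, 1);
        return 0;
    }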

    2-
    Related: Is anyone going to use std::atomic for anything but integer
    types? Some of the implementation notes from
    http://gcc.gnu.org/wiki/Atomic/GCCMM
    give me pause. How efficient would a std::atomic<some_complex_type> be
    vs an explicit lock guarding the relevant pieces of code? What's the
    expectation of use of std::atomic on complex user types and the
    expected implementation? A (static) lock per type? A lock hidden in
    each atomic object which is big enough or complex enough?
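
    A sketch of what I mean (the type is hypothetical, and whether this
    beats a plain mutex is exactly my question):

    #include <atomic>
    #include <cstdio>

    // A "complex" but trivially copyable type, wider than any common
    // hardware compare-and-swap.
    struct Blob {
        char data[64];
    };

    int main() {
        Blob init = {};
        std::atomic<Blob> b(init);   // allowed: Blob is trivially copyable
        std::printf("atomic<Blob> lock-free: %d\n", b.is_lock_free() ? 1 : 0);

        // On typical implementations this prints 0: loads and stores go
        // through an implementation-internal lock (per object, or hashed
        // from the address), so an explicit mutex around the data may be
        // just as fast and easier to reason about. (With GCC this may
        // need -latomic to link.)
        Blob x = b.load();
        b.store(x);
        return 0;
    }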

    3-
    http://gcc.gnu.org/wiki/Atomic/GCCMM/ImplemSumm
    > Optimizer effects


    > It is worth noting that the impact on the optimizer is slightly different with the different models. Although the existence of an atomic load or store operations will act as an optimization barrier, some optimizations can be performed on the atomic operations themselves if they are not in sequentially consistent mode.


    > For instance, dead store removal can be performed:


    > x.store (a, memory_order_release);
    > x.store (b, memory_order_release);


    > It is perfectly safe to note that storing 'a' into 'x' is in fact a dead store, and remove it. Its also possible to move statements around, subject to the various dependencies imposed by the memory model mode.


    This is incorrect, right? Specifically,
    x.store (a);
    x.store (b);
    aka
    x.store (a, memory_order_seq_cst);
    x.store (b, memory_order_seq_cst);
    can be optimized to
    x.store (b, memory_order_seq_cst);
    right? An implementation does not need to faithfully do every single
    atomic load and store with memory_order_seq_cst, right?

    4-
    Let's consider cooperative thread canceling. Suppose each thread has a
    stop flag, a std::atomic<bool>. When you want to cancel a thread, you
    set that thread's stop flag. The thread periodically checks its stop
    flag's value, and if it's set, it starts returning error codes (or
    throwing exceptions) to break out and finish.

    As I understand it, with the default memory_order_seq_cst, there is a
    total order across all memory_order_seq_cst operations on all memory
    locations. So, if the write and reads to the stop flag all used
    memory_order_seq_cst, then the thread will notice the stop request
    quite quickly, basically as soon as the scheduler schedules it.
    However, with any of the other memory_orders, the update may take
    arbitrarily long to become visible (though QoI should make it visible
    "soon"). As I don't need this stop flag to give any additional
    guarantees, I only need to decide between memory_order_relaxed and
    memory_order_seq_cst, right? Should I favor one over the other? Why?
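
    For concreteness, the pattern I have in mind is roughly this sketch
    (relaxed version shown; switching to the default seq_cst operations
    is a one-line change per access):

    #include <atomic>
    #include <chrono>
    #include <thread>

    std::atomic<bool> stop_flag(false);

    void worker() {
        // relaxed is enough here because the flag carries no payload:
        // nothing else is published through it, we only care that the
        // store eventually becomes visible.
        while (!stop_flag.load(std::memory_order_relaxed)) {
            // ... do a bounded chunk of work ...
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }

    int main() {
        std::thread t(worker);
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        stop_flag.store(true, std::memory_order_relaxed);  // request stop
        t.join();   // join() synchronizes, so nothing is lost at shutdown
    }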

    5-
    I remember this one thread from a while ago on
    comp.lang.c++.moderated.
    http://groups.google.com.ng/group/comp.lang.c++.moderated/msg/2a9e0dc41f977db7

    It was a large discussion over the usefulness and portability of
    volatile as a threading primitive. I recall that I very strongly said
    don't sprinkle your code with it, though it's perhaps acceptable to
    use volatile to implement a more standard primitive on whatever
    machines actually guarantee useful threading semantics (like some
    windows ones).

    I specifically remembered that post by Andy Venikov as a potentially
    legitimate use of volatile. I replied to it saying that as long as the
    compiler didn't do anything silly like speculatively loading some
    things, and as long as it reloaded everything, it would work. I also said that it
    seemed reasonable, but I'm not sure any machine actually guarantees
    any of that. I felt that something was wrong, but I was not and am not
    the most knowledgeable about assembly so I couldn't put my finger on
    it. Now I think I can. He has a data dependency.

    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2664.htm

    What Andy Venikov was trying to do seems to be exactly the use case
    for std::memory_order_consume. Right? Without that, his volatile hack
    is just that, a hack.
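
    If I'm reading N2664 right, his code has the shape of the usual
    pointer-publication pattern, which is what consume is meant to cover.
    A sketch of how I'd expect it to be written:

    #include <atomic>

    struct Node {
        int payload;
    };

    std::atomic<Node*> published(nullptr);

    // Writer: initialize the object, then publish the pointer with a
    // release store.
    void publish() {
        Node* n = new Node;
        n->payload = 42;
        published.store(n, std::memory_order_release);
    }

    // Reader: the dereference is data-dependent on the loaded pointer,
    // so a consume load promises it sees the writer's initialization,
    // without the cost of a full acquire barrier on weakly ordered
    // hardware.
    int read() {
        Node* n = published.load(std::memory_order_consume);
        if (n)
            return n->payload;   // carries a dependency from the load
        return -1;
    }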

    6-
    I definitely do not get all of the nuances involved with
    std::memory_order_consume. The paper
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2664.htm
    goes into some details, but it's not quite formal enough, and/or the
    standardese is too standardesy for my liking. Any
    better sources on this?
     
    Joshua Maurice, Sep 1, 2010
    #1

  2. On Sep 1, 2:38 pm, Joshua Maurice <> wrote:
    > 3-
    > http://gcc.gnu.org/wiki/Atomic/GCCMM/ImplemSumm
    >
    > > Optimizer effects
    > > It is worth noting that the impact on the optimizer is slightly different with the different models. Although the existence of an atomic load or store operations will act as an optimization barrier, some optimizations can be performed on the atomic operations themselves if they are not in sequentially consistent mode.
    > > For instance, dead store removal can be performed:
    > >  x.store (a, memory_order_release);
    > >  x.store (b, memory_order_release);
    > > It is perfectly safe to note that storing 'a' into 'x' is in fact a dead store, and remove it. Its also possible to move statements around, subject to the various dependencies imposed by the memory model mode.

    >
    > This is incorrect, right? Specifically,
    >   x.store (a);
    >   x.store (b);
    > aka
    >   x.store (a, memory_order_seq_cst);
    >   x.store (b, memory_order_seq_cst);
    > can be optimized to
    >   x.store (b, memory_order_seq_cst);
    > right? An implementation does not need to faithfully do every single
    > atomic load and store with memory_order_seq_cst, right?


    My reading of the draft standard
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3035.pdf
    confirms this, though it strongly suggests that an implementation
    should not coalesce the stores of an unbounded loop in this way.
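
    To make the distinction concrete, here is a sketch (my own example)
    of what I take that to mean:

    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<int> progress(0);

    // A finite run of back-to-back stores to the same atomic, with
    // nothing in between, is indistinguishable from just the last one,
    // so coalescing them is allowed. But an implementation shouldn't
    // defer the stores of a long loop like this indefinitely, or the
    // polling thread below would never see 'progress' advance.
    int main() {
        std::thread poller([] {
            int last = 0;
            while (last < 1000)
                last = progress.load(std::memory_order_seq_cst);
            std::printf("observed %d\n", last);
        });
        for (int i = 1; i <= 1000; ++i)
            progress.store(i, std::memory_order_seq_cst);
        poller.join();
    }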

    > 4-
    > Let's consider cooperative thread canceling. Suppose each thread has a
    > stop flag, a std::atomic<bool>. When you want to cancel a thread, you
    > set that thread's stop flag. The thread periodically checks its stop
    > flag's value, and if it's set, it starts returning error codes (or
    > throwing exceptions) to break out and finish.
    >
    > As I understand it, with the default memory_order_seq_cst, there is a
    > total order across all memory_order_seq_cst operations on all memory
    > locations. So, if the write and reads to the stop flag all used
    > memory_order_seq_cst, then the thread will notice the stop request
    > quite quickly, basically as soon as the scheduler schedules it.
    > However, with any of the other memory_order's, the update may take
    > arbitrary long to become visible (though QoI should make it visible
    > "soon"). As I don't need this stop flag to give any other additional
    > guarantees, I only need to decide between memory_order_relaxed and
    > memory_order_seq_cst, right? Should I favor one over the other? Why?


    Again, the draft standard
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3035.pdf
    says that an implementation "should" make atomic stores visible to
    other threads in a "reasonable" amount of time. If the implementation
    follows this note, then I guess memory_order_relaxed is the way to go.
    I'm just concerned that the implementation might not.
     
    Joshua Maurice, Sep 3, 2010
    #2

  3. Alexander Terekhov, Sep 3, 2010
    #3
  4. On Sep 3, 4:50 am, Alexander Terekhov <> wrote:
    > Joshua Maurice wrote:
    >
    > > So, I'm trying to learn more about threading. Which papers are the
    > > most current?

    >
    > Try
    >
    > http://www.cl.cam.ac.uk/~pes20/cpp/tech.pdf
    >
    > regards,
    > alexander.


    Thank you very much. This is definitely one of the things I was
    looking for.
     
    Joshua Maurice, Sep 3, 2010
    #4
  5. One other question.

    7-
    // Initial conditions
    (not atomic) int x = 0, y = 0;

    // Thread 1
    if (x == 1)
        y = 1;

    // Thread 2
    if (y == 1)
        x = 1;

    // ----
    So, given those initial conditions, for those two threads started
    concurrently, can you end up with x == 1 and y == 1? Does this
    (pseudo) C++ program have formally undefined behavior?

    Is there real hardware which can end with x == 1 and y == 1 for a
    simplistic naive translation to assembly? Or is there hypothesized
    useful hardware which can do that? My naive understanding says no, but
    I'm learning to question such simplistic sequentially consistent
    reasoning.
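
    For what it's worth, the same shape written with atomics (so the
    program is unambiguously race-free and the question is purely about
    ordering) is, as I understand it, how this puzzle is usually posed:

    #include <atomic>
    #include <thread>

    std::atomic<int> x(0), y(0);

    void thread1() {
        if (x.load(std::memory_order_relaxed) == 1)
            y.store(1, std::memory_order_relaxed);
    }

    void thread2() {
        if (y.load(std::memory_order_relaxed) == 1)
            x.store(1, std::memory_order_relaxed);
    }

    int main() {
        std::thread t1(thread1), t2(thread2);
        t1.join();
        t2.join();
        // As I read the draft, the intent is that x == 1 && y == 1
        // ("out of thin air") is not a permitted outcome here either.
    }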
     
    Joshua Maurice, Sep 4, 2010
    #5
  6. Joshua Maurice wrote:
    >
    > One other question.
    >
    > -7
    > // Initial conditions
    > (not atomic) int x = 0, y = 0
    >
    > // Thread 1
    > if (x == 1)
    > y = 1;
    >
    > //Thread 2
    > if (y == 1)
    > x = 1;
    >
    > // ----
    > So, given those initial conditions, for those two threads started
    > concurrently, can you end up with x == 1 and y == 1? Does this
    > (pseudo) C++ program have formally undefined behavior?


    No and no.

    >
    > Is there real hardware which can end with x == 1 and y == 1 for a
    > simplistic naive translation to assembly?


    No.

    > Or is there hypothesized
    > useful hardware which can do that? My naive understanding says no, but
    > I'm learning to question such simplistic sequentially consistent
    > reasoning.


    Hypothesized hardware could speculate that x/y is 1 and proceed with an
    uncommitted modification (preserving the old value until commit), only
    to roll it back (undo the modification) a bit later when the
    speculation turns out to be false. If such hypothesized hardware made
    the uncommitted modification visible to other threads, then there would
    be a problem.

    regards,
    alexander.
     
    Alexander Terekhov, Sep 4, 2010
    #6
