Please disprove this Double-Checked Locking "fix"

Discussion in 'C++' started by jl_post@hotmail.com, Apr 26, 2011.

  1. Guest

    Hi,

    Recently I've been reading up on "Double-Checked Locking" in C++
    and how it's often implemented imperfectly. The Article "C++ and the
    Perils of Double-Checked Locking" by Scott Meyers and Andrei
    Alexandrescu ( http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
    ) provides a good overview of how it's usually done and why it's often
    inadequate.

    I won't go into details here, but the article states how this code:

    Singleton* Singleton::instance() {
        if (pInstance == 0) {
            Lock lock;
            if (pInstance == 0) {
                pInstance = new Singleton;
            }
        }
        return pInstance;
    }

    and its variants aren't completely thread safe.

    Now, I've been thinking about how to make the code thread-safe, and
    a few days ago I came up with a couple of solutions. As I implemented
    them, I realized that these variants of mine also had problems.

    One of the "solutions" I came up with was this:

    // Assign the singleton object to a temp pointer:
    Singleton* Singleton::instance() {
        if (pInstance == 0) {
            Lock lock;
            if (pInstance == 0) {
                Singleton* temp = new Singleton; // initialize to temp
                pInstance = temp;                // assign temp to pInstance
            }
        }
        return pInstance;
    }

    Now, this approach isn't new (and is in fact covered in Meyers and
    Alexandrescu's article). The immediate problem is that the compiler
    can optimize away the temporary variable, so there's no guarantee that
    pInstance won't receive the memory address before the object is
    properly initialized.

    So I thought about making sure pInstance was unset in the
    constructor, like this:

    Singleton::Singleton() {
        // initializing member data
        assert( pInstance == 0 );
    }

    This way, the assert() statement ensures that pInstance will stay NULL
    until the constructor is finished.

    BUT! I realized that this solution was flawed, because the
    compiler can inline the constructor into the ::instance() method, like
    this:

    Singleton* Singleton::instance() {
        if (pInstance == 0) {
            Lock lock;
            if (pInstance == 0) {
                Singleton* temp = static_cast<Singleton*>(
                    operator new(sizeof(Singleton))); // initialize to temp
                new (temp) Singleton; // constructor inlined here:
                // initializing member data
                assert( pInstance == 0 );
                pInstance = temp; // assign temp to pInstance
            }
        }
        return pInstance;
    }

    and further rearrange the lines like this:


    Singleton* Singleton::instance() {
        if (pInstance == 0) {
            Lock lock;
            if (pInstance == 0) {
                Singleton* temp = static_cast<Singleton*>(
                    operator new(sizeof(Singleton))); // initialize to temp
                new (temp) Singleton;
                assert( pInstance == 0 );
                pInstance = temp; // assign temp to pInstance
                // initializing member data (moved below the assignment!)
            }
        }
        return pInstance;
    }

    (The compiler is allowed to do this, I think, because the compiler can
    reorder code as long as it's not detectable in a single-threaded
    program.)

    So I realized pInstance still has the possibility of being assigned
    memory before it is properly initialized.

    Then I thought: What if I included a second lock to make sure that
    the constructor is finished before pInstance is set? So I tried this:

    Singleton* Singleton::instance() {
        if (pInstance == 0) {
            Lock lock;
            if (pInstance == 0) {
                Singleton* temp = new Singleton; // initialize to temp
                secondLock.lock();
                pInstance = temp; // assign temp to pInstance
                secondLock.unlock();
            }
        }
        return pInstance;
    }

    Singleton::Singleton() {
        secondLock.lock();
        // initializing member data
        secondLock.unlock();
    }

    But eventually I realized that the order of the locks is not
    guaranteed, that pInstance could still get assigned before the
    constructor gets entered!

    So I shot down my own two solutions. However, I kept thinking of
    variants, and I started to wonder: What if I combined the two
    variants? In other words, I'd take advantage of both assert() and a
    second lock, like this:


    Singleton* Singleton::instance() {
        if (pInstance == 0) {
            Lock lock;
            if (pInstance == 0) {
                Singleton* temp = new Singleton; // initialize to temp
                secondLock.lock();
                pInstance = temp; // assign temp to pInstance
                secondLock.unlock();
            }
        }
        return pInstance;
    }

    Singleton::Singleton() {
        secondLock.lock();
        // initializing member data
        assert( pInstance == 0 );
        secondLock.unlock();
    }

    In this way, the second lock guarantees that pInstance is not assigned
    to inside the constructor code, and the assert() statement ensures
    that the constructor code executes before the code that assigns to
    pInstance.

    This looks good as far as I can tell, but I'm thinking it's too
    good to be true.

    Since I'm a bit skeptical about this last solution, could someone
    poke holes in it? (I'm eager to see if I really did find a solid
    solution, or if it's just another pipe dream.)

    Thanks!
     
    , Apr 26, 2011
    #1

  2. On Apr 26, 9:58 am, wrote:
    > Hi,
    >
    >    Recently I've been reading up on "Double-Checked Locking" in C++
    > and how it's often implemented imperfectly.  The Article "C++ and the
    > Perils of Double-Checked Locking" by Scott Meyers and Andrei
    > Alexandrescu (http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
    > ) provides a good overview of how it's usually done and why it's often
    > inadequate.

    [Snipped discussion of double checked locking]
    >    Since I'm a bit skeptical about this last solution, could someone
    > poke holes in it?  (I'm eager to see if I really did find a solid
    > solution, or if it's just another pipe dream.)


    I'm sorry for being so gruff in the following (not really), but it's
    evident that you have not even fully read the paper which you cited,
    the only paper, and that is highly irritating.

    Look, follow this, and understand just how royally screwed your
    approach is. Given the initial conditions:

        int x = 0;
        int y = 0;

    thread 1 executing:

        x = 1;
        y = 2;

    and threads 2-5 each executing the following, concurrently with
    thread 1 and with each other:

        cout << x << ' ' << y << endl;

    On non-obscure hardware and compilers, that can print, /on a single
    execution/, all 4 combinations:

        0 0
        0 2
        1 0
        1 2
    Repeat, /on a single execution/. Different threads can see some writes
    done by different threads in different orders! Thread 2 could see the
    write to x without seeing the write to y, and thread 3 can see the
    write to y without seeing the write to x.

    Technically, the standard guarantees even worse - undefined behavior,
    but the above is an example that /really happens/ on /real/ hardware
    and /real/ compilers. This is largely due to hardware "reorderings",
    but let me quote the paper for the relevant bit:

    To emphasize, it might be the compiler reordering it, it might be the
    hardware reordering it, and it could even be some new kind of thing
    which hasn't been invented yet! In practice the compiler writers and
    hardware makers give guarantees for C++03 which mean that your code is
    fundamentally broken. There is no standard or given guarantee of any
    kind that any code written without proper synchronization will work.
    None. Your code, give it to any compiler writer, and they will say
    "won't work", which means while it might incidentally work today, but
    tomorrow they might put in a new optimization (software or hardware),
    and your code breaks. **This is the fundamental problem which you
    cannot dodge!!**

    Here's how you can break your code. Note again the general problem
    that there are no guarantees that could even make it work, so I want
    you to focus on the above basic problem, and do not spend too much
    time on this, but I present it for completeness. For example:

    Singleton* Singleton::instance() {
        if (pInstance == 0) {
            Lock lock;
            if (pInstance == 0) {
                Singleton* temp = new Singleton; // initialize to temp
                secondLock.lock();
                pInstance = temp; // assign temp to pInstance
                secondLock.unlock();
            }
        }
        return pInstance;
    }

    A sufficiently smart compiler is allowed to move things from before a
    lock to after the lock, and from after an unlock to before a lock. It
    can transform the above to:

    Singleton* Singleton::instance() {
        if (pInstance == 0) {
            Lock lock;
            if (pInstance == 0) {
                secondLock.lock();
                Singleton* temp = new Singleton; // initialize to temp
                pInstance = temp; // assign temp to pInstance
                secondLock.unlock();
            }
        }
        return pInstance;
    }

    And once we get that, it's trivial to change it to:

    Singleton* Singleton::instance() {
        if (pInstance == 0) {
            Lock lock;
            if (pInstance == 0) {
                secondLock.lock();
                pInstance = new Singleton;
                secondLock.unlock();
            }
        }
        return pInstance;
    }

    Which means we're back to screwed for the reasons known to you. I
    didn't even need to resort to the wonderful DEC Alpha and its split
    cache, but I could have.

    You need to reread the paper which you cited. Here's the link again
    for your benefit.
    http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
    And pay attention this time.
     
    Joshua Maurice, Apr 26, 2011
    #2

  3. On Apr 26, 11:16 am, Joshua Maurice <> wrote:
    > A sufficiently smart compiler is allowed to move things from before a
    > lock to after the lock, and from after an unlock to before a lock.


    Ack, I meant:

    > A sufficiently smart compiler is allowed to move things from before a
    > lock to after the lock, and from after an unlock to before
    > **an unlock**.


    Hopefully it's clear from context. My mistake. My apologies.
     
    Joshua Maurice, Apr 26, 2011
    #3
  4. On Apr 26, 11:30 am, Pete Becker <> wrote:
    > On 2011-04-26 14:16:33 -0400, Joshua Maurice said:
    >
    > > This is largely due to hardware "reorderings",
    > > but let me quote the paper for the relevant bit:

    > > To emphasize, it might be the compiler reordering it, it might be the
    > > hardware reordering it,

    >
    > Let me underscore the problem, as the quotation above misses part of
    > the issue. Even when the instructions are executed in exactly the order
    > that you want them to be executed, different threads can see results in
    > a different order from the order in which the thread doing the stores
    > actually did them.
    >
    > When data is shared between threads and at least one thread is writing
    > that data, all accesses to that data must be synchronized.
    >
    > C++0x says that the behavior of a program that does not synchronize
    > such accesses is undefined.


    I fully agree. I also stated that /very explicitly/ just before the
    snipped quote with my 1+4 thread example. I did intend to clearly
    state exactly that. At least I think I did. For future reference, was
    this unclear? Or did you merely mean to emphasize this point? That is
    a good point to emphasize.

    I'm just a little confused.
     
    Joshua Maurice, Apr 26, 2011
    #4
  5. James Kanze Guest

    On Apr 26, 5:58 pm, wrote:

    > Recently I've been reading up on "Double-Checked Locking" in
    > C++ and how it's often implemented imperfectly.


    You mean, how it is impossible to implement in standard C++
    (03).

    > The Article "C++ and the Perils of Double-Checked Locking" by
    > Scott Meyers and Andrei Alexandrescu
    > (http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf)
    > provides a good overview of how it's usually done and why it's
    > often inadequate.


    The standard solution for implementing a singleton works just
    fine:

    Singleton* Singleton::instance()
    {
        ScopedLock lock(mutex);
        if ( pInstance == NULL )
            pInstance = new Singleton;
        return pInstance;
    }

    If the lock is a performance bottleneck (a case I've yet to
    encounter), then you can simply ensure that the singleton is
    constructed before threads are started, and skip the lock.

    Anything else is simply added complexity, for no reason.

    --
    James Kanze
     
    James Kanze, Apr 30, 2011
    #5
  6. Pavel Guest

    Leigh Johnston wrote:
    > On 30/04/2011 23:54, James Kanze wrote:
    >> On Apr 26, 5:58 pm, wrote:
    >>
    >>> Recently I've been reading up on "Double-Checked Locking" in
    >>> C++ and how it's often implemented imperfectly.

    >>
    >> You mean, how it is impossible to implement in standard C++
    >> (03).
    >>
    >>> The Article "C++ and the Perils of Double-Checked Locking" by
    >>> Scott Meyers and Andrei Alexandrescu
    >>> (http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf)
    >>> provides a good overview of how it's usually done and why it's
    >>> often inadequate.

    >>
    >> The standard solution for implementing a singleton works just
    >> fine:
    >>
    >> Singleton* Singleton::instance()
    >> {
    >>     ScopedLock lock(mutex);
    >>     if ( pInstance == NULL )
    >>         pInstance = new Singleton;
    >>     return pInstance;
    >> }
    >>

    >
    > How does this prevent CPU reordering of stores to the pInstance pointer
    > and the object pInstance points to?

    Any system-dependent mutex has to behave like a double-sided memory
    barrier (this follows from the mutual exclusion requirement).

    This does not prevent unnecessary overhead, though (grabbing a mutex in
    a highly contended case can be expensive, and it is unnecessary when all
    the contenders want is to read a pointer rather than change it -- which
    is the case after the Singleton has been initialized).


    >
    > /Leigh


    -Pavel
     
    Pavel, May 1, 2011
    #6
  7. James Kanze Guest

    On May 1, 9:49 pm, Leigh Johnston <> wrote:
    > On 30/04/2011 23:54, James Kanze wrote:
    > > On Apr 26, 5:58 pm, wrote:


    > >> Recently I've been reading up on "Double-Checked Locking" in
    > >> C++ and how it's often implemented imperfectly.


    > > You mean, how it is impossible to implement in standard C++
    > > (03).


    > >> The Article "C++ and the Perils of Double-Checked Locking" by
    > >> Scott Meyers and Andrei Alexandrescu
    > >> (http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf)
    > >> provides a good overview of how it's usually done and why it's
    > >> often inadequate.


    > > The standard solution for implementing a singleton works just
    > > fine:


    > > Singleton* Singleton::instance()
    > > {
    > >     ScopedLock lock(mutex);
    > >     if ( pInstance == NULL )
    > >         pInstance = new Singleton;
    > >     return pInstance;
    > > }


    > How does this prevent CPU reordering of stores to the
    > pInstance pointer and the object pInstance points to?


    Compiler magic:).

    Presumably, somewhere in the constructor of ScopedLock, you end
    up calling pthread_mutex_lock, or something similar. And if the
    compiler is Posix compliant, it knows that it cannot move code
    across a call to pthread_mutex_lock. (And of course, the code
    in pthread_mutex_lock contains the necessary machine instructions
    to ensure that the hardware respects the order. Posix requires
    it.) (Replace Posix with Windows, and pthread_mutex_lock with the
    corresponding Windows function, if that's your platform.)

    Of course, if the compiler doesn't support multithreading (or
    you didn't specify the options it requires for it to support
    multithreading), then there is no solution; you can't use
    multithreading, period.

    --
    James Kanze
     
    James Kanze, May 1, 2011
    #7
  8. James Kanze Guest

    On May 1, 10:26 pm, Pavel
    <> wrote:

    [...]
    > This does not prevent from unnecessary overhead though
    > (grabbing a mutex in a highly contentious case can be
    > expensive and it is unnecessary when all contenders want is to
    > read a pointer rather than change it -- which is the case
    > after the Singleton has been initialized).


    Contention becomes relevant when two conditions are met: many
    threads want to acquire the mutex, and having acquired the
    mutex, a thread holds it for a certain time. In the case of the
    singleton, once the object has been created, the thread holds
    the mutex the time it takes for a comparison with null. That
    is, a very, very short time. I have yet to see a case where
    there was actual contention for the mutex in a singleton once
    the singleton has been created. And if you do end up in such a
    rare case, as I pointed out, calling Singleton::instance() once
    before threading starts is sufficient to ensure that the mutex
    isn't needed at all. Double check locking is a solution in
    search of a problem, and to top it off, a solution which doesn't
    work (at least in standard C++).

    --
    James Kanze
     
    James Kanze, May 1, 2011
    #8
  9. Pavel Guest

    James Kanze wrote:
    > On May 1, 10:26 pm, Pavel
    > <> wrote:
    >
    > [...]
    >> This does not prevent from unnecessary overhead though
    >> (grabbing a mutex in a highly contentious case can be
    >> expensive and it is unnecessary when all contenders want is to
    >> read a pointer rather than change it -- which is the case
    >> after the Singleton has been initialized).

    >
    > Contention becomes relevant when two conditions are met: many
    > threads want to acquire the mutex, and having acquired the
    > mutex, a thread holds it for a certain time. In the case of the
    > singleton, once the object has been created, the thread holds
    > the mutex the time it takes for a comparison with null. That
    > is, a very, very short time. I have yet to see a case where
    > there was actual contention for the mutex in a singleton once
    > the singleton has been created.

    I agree that most of the time it does not happen. However, I have met
    situations where several threads start calling a short method on a
    single instance (like sending/reading a short but frequent message
    to/from a singleton queue or similar) and, after they clash once, the
    situation called a "lock convoy" emerges, in which the threads spend
    most of their time switching from the runnable to the blocked state.

    In some old implementations of mutexes this condition was made worse by
    bursts of CPU cache traffic occurring when the holding thread released
    the mutex -- all blocked threads were awoken to see which would grab it
    first; some current implementations have solved this aggravating issue
    (e.g. in the Linux 2.6 kernel it is solved via a futex-based
    implementation).

    Regardless, as soon as a lock convoy has formed, it is largely
    self-sustaining until all contending threads but one receive a
    relatively long message to process -- all at the same time -- for which
    you might have to wait a while.

    > And if you do end up in such a
    > rare case, as I pointed out, calling Singleton::instance() once
    > before threading starts is sufficient to ensure that the mutex
    > isn't needed at all. Double check locking is a solution in
    > search of a problem, and to top it off, a solution which doesn't
    > work (at least in standard C++).

    I have never defended DCL. In fact, I have never defended the Singleton
    pattern. IMHO it is the most overused, the least understood, and the
    most vaguely defined of all GoF patterns; and few people seem to have
    read the end of the "Consequences" sub-section of the "Singleton"
    section of their book (acting on Consequences #4 and #5 there may
    render the whole discussion of DCL for Singleton irrelevant).

    I can't recall when I last used it -- most of the time you are much
    better off without one. When I have to use one written by someone else,
    I try to acquire it before starting any threads and cache it, as per
    your advice or per the article referred to by the OP.

    >
    > --
    > James Kanze
     
    Pavel, May 2, 2011
    #9
  10. Scott Meyers Guest

    On 4/26/2011 10:50 AM, Pete Becker wrote:
    > pinstance points to. The C++0x solution is:
    >
    > #include <atomic>
    > std::atomic<Singleton*> pinstance;
    >
    > if (pinstance == 0) {
    >     Lock lock;
    >     if (pinstance == 0)
    >         pinstance = new Singleton;
    > }
    > return pinstance;
    >
    > Making pinstance an atomic variable ensures that all writes that "happen
    > before" the assignment to pinstance in one thread are seen before the
    > result of that assignment is seen in any other thread.


    True, but that's not enough to make this code work. You also need to
    know that assignment to pinstance can't take place until the Singleton
    constructor has run to completion. You have that guarantee in C++0x,
    too, so the code is correct, but it's not just the use of atomics that
    gives you that correctness. You also need the additional guarantees
    that C++0x offers regarding the behavior of new expressions (compared to
    the weaker guarantees that C++03 gives you).

    Scott

    --
    * C++ and Beyond: Meyers, Sutter, & Alexandrescu, Aug 7-10 in Banff
    (http://cppandbeyond.com/)
     
    Scott Meyers, May 2, 2011
    #10
  11. Pavel Guest

    Leigh Johnston wrote:
    > On 01/05/2011 22:44, Leigh Johnston wrote:
    >> On 01/05/2011 22:26, Pavel wrote:
    >>> Leigh Johnston wrote:
    >>>> On 30/04/2011 23:54, James Kanze wrote:
    >>>>> On Apr 26, 5:58 pm, wrote:
    >>>>>
    >>>>>> Recently I've been reading up on "Double-Checked Locking" in
    >>>>>> C++ and how it's often implemented imperfectly.
    >>>>>
    >>>>> You mean, how it is impossible to implement in standard C++
    >>>>> (03).
    >>>>>
    >>>>>> The Article "C++ and the Perils of Double-Checked Locking" by
    >>>>>> Scott Meyers and Andrei Alexandrescu
    >>>>>> (http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf)
    >>>>>> provides a good overview of how it's usually done and why it's
    >>>>>> often inadequate.
    >>>>>
    >>>>> The standard solution for implementing a singleton works just
    >>>>> fine:
    >>>>>
    >>>>> Singleton* Singleton::instance()
    >>>>> {
    >>>>>     ScopedLock lock(mutex);
    >>>>>     if ( pInstance == NULL )
    >>>>>         pInstance = new Singleton;
    >>>>>     return pInstance;
    >>>>> }
    >>>>>
    >>>>
    >>>> How does this prevent CPU reordering of stores to the pInstance pointer
    >>>> and the object pInstance points to?
    >>> Any system-dependent mutex has to behave like a double-sided memory
    >>> barrier (this follows from the mutual exclusion requirement).

    >>
    >> I didn't realize this was guaranteed for all systems...
    >>

    >
    > The ellipsis was an indication that I find your claim that all mutex
    > implementations include memory barriers slightly dubious. :) From
    > Wikipedia:
    >
    > "These primitives (mutexes) are *usually* implemented with the memory
    > barriers required to provide the expected memory visibility semantics.
    > In such environments explicit use of memory barriers is not *generally*
    > necessary."
    >
    > I guess a mutex implementation that didn't include memory barriers on a
    > system where memory barriers are needed would not be very useful.

    Yes, that's what I meant. Wikipedia is a great resource to find an explanation
    of a concept but it does not have to tie all ends together with the strictness
    of a specification. In our case, the article on Mutual Exclusion says:

    "Mutual exclusion .. algorithms are used in concurrent programming to avoid the
    simultaneous use of a common resource, such as a global variable, by pieces of
    computer code called critical sections. A critical section is a piece of code in
    which a process or thread accesses a common resource".

    In our case, we want mutual exclusion for threads, and our "common
    resource" is the memory containing pInstance. A mutex that in fact does
    *not* let a thread safely access the common resource (e.g. due to not
    executing a memory barrier) does not seem to satisfy the above
    definition.

    Then, of course, another article (on memory barriers) written by other
    people does not agree... I do not see an issue with that. Probably they
    saw something called a "mutex" that required the mutex client code to
    additionally execute a memory barrier; in my view, and that of the
    first Wikipedia article, it would not be a complete mutex anyway. I am
    fully content with being in this sort of disagreement with one or more
    articles at Wikipedia.

    >
    > /Leigh


    -Pavel
     
    Pavel, May 2, 2011
    #11
  12. James Kanze Guest

    On May 2, 12:23 am, Pavel
    <> wrote:
    > James Kanze wrote:


    [...]
    > In fact, I have never defended a Singleton pattern. IMHO it
    > is the most-overused, the least-understood, the
    > vaguest-defined of all GoF patterns; and few people seem to
    > have read the end of the "Consequences" sub-section of
    > the "Singleton" section of their book (acting on Consequences #4 and
    > #5 there may render the whole discussion of DCL for Singleton
    > irrelevant).


    I don't think that the presentation in the GoF was actually that
    vague. The problem is that as presented in the GoF, it is a
    form of contract enforcement: the design says that there may
    only be one instance, so we design the class to enforce the
    design. Which per se is probably overkill: the semantics of
    most singletons are such that no reasonable programmer would
    create more than one instance anyway, so it's not worth the
    extra effort to forbid it. In the case of C++, however, the
    singleton idiom is most often used primarily to solve order of
    initialization issues; once you've created the code necessary to
    solve those, making it a singleton (enforcing only one instance)
    is merely a question of where you put the "public:".

    --
    James Kanze
     
    James Kanze, May 2, 2011
    #12
  13. On May 1, 2:44 pm, Leigh Johnston <> wrote:
    > On 01/05/2011 22:26, Pavel wrote:
    > > Leigh Johnston wrote:
    > >> How does this prevent CPU reordering of stores to the pInstance pointer
    > >> and the object pInstance points to?

    > > Any system-dependent mutex has to behave like a double-sided memory
    > > barrier (this follows from the mutual exclusion requirement).

    >
    > I didn't realize this was guaranteed for all systems...


    Anything that calls itself a mutex or lock in the C++0x or Java memory
    model, or any sane memory model for that matter, must guarantee that.
    That's what the terms "mutex", "lock", and "critical section" mean. It
    guarantees that properly synchronized code will execute according to
    the guarantees of the language, and if some hardware needs a "double-
    sided memory barrier", then it needs a "double-sided memory barrier".
     
    Joshua Maurice, May 2, 2011
    #13
  14. On May 1, 5:14 pm, Scott Meyers <> wrote:
    > On 4/26/2011 10:50 AM, Pete Becker wrote:
    >
    > > pinstance points to. The C++0x solution is:

    >
    > > #include <atomic>
    > > std::atomic<Singleton*> pinstance;

    >
    > > if (pinstance == 0) {
    > >     Lock lock;
    > >     if (pinstance == 0)
    > >         pinstance = new Singleton;
    > > }
    > > return pinstance;

    >
    > > Making pinstance an atomic variable ensures that all writes that "happen
    > > before" the assignment to pinstance in one thread are seen before the
    > > result of that assignment is seen in any other thread.

    >
    > True, but that's not enough to make this code work.  You also need to
    > know that assignment to pinstance can't take place until the Singleton
    > constructor has run to completion.  You have that guarantee in C++0x,
    > too, so the code is correct, but it's not just the use of atomics that
    > gives you that correctness.  You also need the additional guarantees
    > that C++0x offers regarding the behavior of new expressions (compared to
    > the weaker guarantees that C++03 gives you).


    I don't think I'd describe it that way.

    Consider:
    pinstance = new Singleton;

    For a simple pointer in C++03, there are no threading guarantees
    given, so of course the compiler can implement it however it wants,
    which is apparently frequently enough something like the following
    pseudo-code:

        pinstance = malloc(sizeof(Singleton));
        new(pinstance) Singleton();

    Even under POSIX, to detect the difference between the above and the
    following pseudo-code

        temp = malloc(sizeof(Singleton));
        new(temp) Singleton();
        pinstance = temp;

    would require code that has a race condition, which means undefined
    behavior, which means that the compiler can do whatever it wants.

    In C++0x, the situation is unchanged for a simple pointer. There are
    no guarantees specific to "new expressions" which change this. The
    compiler is allowed to implement it in exactly the same way. If you
    have some code that could detect this, then you still have a race
    condition, and thus undefined behavior, and thus all bets are off, so
    the compiler free to do as it wants.

    What does change is when we throw std::atomic into the mix. When
    pinstance is of type std::atomic, the following:

        pinstance = new Singleton;

    is equivalent, via operator overloading, to:

        pinstance.store(new Singleton);

    And C++0x does have very specific guarantees concerning std::atomic.
    It doesn't matter that the argument is a new expression. It could be
    any sort of expression or function call. There is no guarantee
    specific to new expressions. There are specific guarantees relating to
    std::atomic::store and std::atomic::load, and there are guarantees
    that describe all expressions, but not new expressions in particular.

    Perhaps you were talking about the guarantees specific to function
    local static variables? C++0x does guarantee that their initialization
    is properly synchronized so that the first access will initialize it,
    and all other threads will wait until it's done, and all other threads
    will properly see the initialized function local static variable.
    However, Pete Becker's code does not use a function local static, and
    even then the guarantees about function local statics are not specific
    to new expressions. You could initialize one with a normal function
    call or some other expression which isn't a new expression, and the
    same guarantees apply.
     
    Joshua Maurice, May 2, 2011
    #14
  15. "Leigh Johnston" <> wrote in message
    news:...
    [...]
    >> On 01/05/2011 22:26, Pavel wrote:

    [...]
    >>> Any system-dependent mutex has to behave like a double-sided memory
    >>> barrier (this follows from the mutual exclusion requirement).

    >>
    >> I didn't realize this was guaranteed for all systems...
    >>

    >
    > The ellipsis was an indication that I find your claim that all mutex
    > implementations include memory barriers slightly dubious. :) From
    > Wikipedia:
    >
    > "These primitives (mutexes) are *usually* implemented with the memory
    > barriers required to provide the expected memory visibility semantics. In
    > such environments explicit use of memory barriers is not *generally*
    > necessary."
    >
    > I guess a mutex implementation that didn't include memory barriers on a
    > system where memory barriers are needed would not be very useful.


    FWIW, there are mutex implementations that can "skip" explicit memory
    barrier operations. You basically need to think in terms of "Asymmetric
    Synchronization". Here is some more information on the subject:

    http://blogs.sun.com/dave/resource/Asymmetric-Dekker-Synchronization.txt

    http://www.trl.ibm.com/people/kawatiya/Kawachiya05phd.pdf
    (chapter 6)
     
    Chris M. Thomasson, May 4, 2011
    #15
  16. Pavel Guest

    James Kanze wrote:
    > On May 2, 12:23 am, Pavel
    > <> wrote:
    >> James Kanze wrote:

    >
    > [...]
    >> In fact, I have never defended a Singleton pattern. IMHO it
    >> is the most-overused, the least-understood, the
    >> vaguest-defined of all GoF patterns; and few people seem to
    >> have read the end of the "Consequences" sub-section of
    >> "Singleton" section of their book (acting on Consequences #4 and #5
    >> there may render the whole discussion of DCL for Singleton
    >> irrelevant).

    >
    > I don't think that the presentation in the GoF was actually that
    > vague. The problem is that as presented in the GoF, it is a
    > form of contract enforcement: the design says that there may
    > only be one instance,

    That's where the vagueness started (in definition of the problem, not the
    analysis and especially consequences which most people don't read to the end).

    "Only one" -- but in what context?
    One per process?
    Per thread?
    Per the whole world?
    Per CPU?
    Per computer?
    Per corporate network?
    Per region of the corporate network?

    All of the above make sense and are actively used. Half the singletons that
    were defined in single-threaded programs as one-per-process (and, hence,
    per-thread) should be moved to TLS and made "one-per-thread" (and hence
    "many-per-process") when converting to a multi-threaded program. The other half
    should not (but of those, some would gain from changing the requirement to
    "only one per machine" and becoming servers, and yet others could perfectly well
    become a pool with a many-to-many relationship to the available threads or CPUs --
    or cores, or who knows what). The techniques for enforcing the "singleness"
    contracts above are meaningfully different.

    The other vagueness of "only one" is illustrated by the question "during which
    period of time?" Should it be the only instance for the life of a process or a
    thread? Or should it outlive some processes or threads? Or should there be
    exactly one (or no more than one?) instance "at a time", which can be
    re-created several times during the life of a process (or a thread)? All of the
    above are useful in particular circumstances, and the enforcement techniques
    differ significantly depending on the actual requirements. (Andrei Alexandrescu
    illustrated the latter requirement with his "Phoenix singleton", composed of a
    set of different generic policies. Still, he did not cover many of the above
    variances, and I can't imagine a useful implementation that does. Even his
    singleton is more proof-of-concept than production-ready code, despite its
    relatively high quality; and of course it does not even try to address the
    performance issue of the thread-safe singleton that DCL tried to solve. The
    Phoenix singleton does not work too well with DCL -- or fixed-with-atomic-DCL.)

    so we design the class to enforce the
    > design. Which per se is probably overkill: the semantics of
    > most singletons are such that no reasonable programmer would
    > create more than one instance anyway, so it's not worth the
    > extra effort to forbid it. In the case of C++, however, the
    > singleton idiom is most often used primarily to solve order of
    > initialization issues; once you've created the code necessary to
    > solve those, making it a singleton (enforcing only one instance)
    > is merely a question of where you put the "public:".

    Yeah, that's another bunch of good questions (and don't even let me start on
    order-of-destruction issues). I prefer to explicitly give one object to the
    constructor of another when one depends on the other, or to create them within a
    manager object that ensures the sequence. Then you end up with one manager for
    all interdependent "singletons" in a particular program. Take the next step and
    create that manager in "main()" (or, if you are a library writer, make the user
    create it in his "main()"; or, if your user is a library writer too, make him
    include your manager in his own manager, which he has to make his user create
    in main(), etc., recursively) and there is no need for a singleton. BUT.. this only
    works if no one in the chain of library writers decided to put singleness inside
    his or her class (that is, if s/he gave his or her users the choice to create
    an object of the class defined in the library at will). I am trying not to be
    such a nuisance; that's why I can't remember the last time I decided to
    encapsulate singleness in one of my classes.
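    A minimal sketch of that manager arrangement (all class names are
    hypothetical, purely for illustration):

```cpp
#include <string>

// Two formerly-"singleton" services; neither enforces singleness itself.
class Logger {
public:
    void log(const std::string& msg) { last_ = msg; }
    const std::string& last() const { return last_; }
private:
    std::string last_;
};

class Database {
public:
    explicit Database(Logger& log) : log_(log) {}  // dependency passed in
    void open() { log_.log("db opened"); }
private:
    Logger& log_;
};

// The manager owns the former singletons and fixes their construction
// order: members initialize in declaration order, so logger_ exists
// before db_'s constructor runs.
class AppManager {
public:
    AppManager() : db_(logger_) {}
    Database& db() { return db_; }
    Logger& logger() { return logger_; }
private:
    Logger logger_;
    Database db_;
};
```

    Create one AppManager at the top of main() and pass references down;
    nothing in the chain needs to enforce "only one".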

    >
    > --
    > James Kanze


    -Pavel
     
    Pavel, May 6, 2011
    #16
  17. Pavel Guest

    Chris M. Thomasson wrote:
    > "Leigh Johnston"<> wrote in message
    > news:...
    > [...]
    >>> On 01/05/2011 22:26, Pavel wrote:

    > [...]
    >>>> Any system-dependent mutex has to behave like a double-side memory
    >>>> barrier (it's a follow-up from the mutual exclusion requirement).
    >>>
    >>> I didn't realize this was guaranteed for all systems...
    >>>

    >>
    >> The ellipsis was an indication that I find your claim that all mutex
    >> implementations include memory barriers slightly dubious. :) From
    >> Wikipedia:
    >>
    >> "These primitives (mutexes) are *usually* implemented with the memory
    >> barriers required to provide the expected memory visibility semantics. In
    >> such environments explicit use of memory barriers is not *generally*
    >> necessary."
    >>
    >> I guess a mutex implementation that didn't include memory barriers on a
    >> system where memory barriers are needed would not be very useful.

    >
    > FWIW, there are mutex implementations that can "skip" explicit memory
    > barrier operations. You basically need to think in terms of "Asymmetric
    > Synchronization". Here is some more information on the subject:
    >
    > http://blogs.sun.com/dave/resource/Asymmetric-Dekker-Synchronization.txt
    >
    > http://www.trl.ibm.com/people/kawatiya/Kawachiya05phd.pdf
    > (chapter 6)

    Thanks, Chris, for the interesting resources! The techniques described there
    do show how to avoid executing memory barriers in both threads and still
    achieve mutual exclusion. My statement above still holds: these mutexes
    "behave like a double-side memory barrier" (quoting myself) without actually
    executing them. They ensure the other threads see the changes in the correct
    order, which answers Leigh's original question ("How does this prevent CPU
    reordering of stores to the pInstance pointer and the object pInstance
    points to?").

    All that damned C++ "as-if" legalese.. I never imagined myself using such
    language by accident.. but that's exactly what happened.

    -Pavel
     
    Pavel, May 6, 2011
    #17
  18. On Apr 26, 11:16 am, Joshua Maurice <> wrote:
    > Here's how you can break your code. Note again the general problem
    > that there are no guarantees that could even make it work, so I want
    > you to focus on the above basic problem, and do not spend too much
    > time on this, but I present it for completeness. For example:
    >
    >     Singleton* Singleton::instance() {
    >       if (pInstance == 0) {
    >         Lock lock;
    >         if (pInstance == 0) {
    >           Singleton* temp = new Singleton;  // initialize to temp
    >           secondLock.lock();
    >           pInstance = temp;  // assign temp to pInstance
    >           secondLock.unlock();
    >         }
    >       }
    >       return pInstance;
    >     }
    >
    > A sufficiently smart compiler is allowed to move things from before a
    > lock to after the lock, and from after an unlock to before a lock. It
    > can transform the above to:
    >
    >     Singleton* Singleton::instance() {
    >       if (pInstance == 0) {
    >         Lock lock;
    >         if (pInstance == 0) {
    >           secondLock.lock();
    >           Singleton* temp = new Singleton;  // initialize to temp
    >           pInstance = temp;  // assign temp to pInstance
    >           secondLock.unlock();
    >         }
    >       }
    >       return pInstance;
    >     }
    >
    > And once we get that, it's trivial to change it to:
    >
    >     Singleton* Singleton::instance() {
    >       if (pInstance == 0) {
    >         Lock lock;
    >         if (pInstance == 0) {
    >           secondLock.lock();
    >           pInstance = new Singleton;
    >           secondLock.unlock();
    >         }
    >       }
    >       return pInstance;
    >     }
    >
    > Which means we're back to screwed for the reasons known to you. I
    > didn't even need to resort to the wonderful DEC Alpha and its split
    > cache, but I could have.
    >
    > You need to reread the paper which you cited. Here's the link again
    > for your benefit: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
    > And pay attention this time.


    I was looking over this while searching for interview questions, and I
    realized I made a mistake. You cannot make the first reordering so
    easily due to the lock nonsense inside the constructor. However, a
    sufficiently smart compiler could notice your clever ruse, optimize
    away the assert as always true, see a lock and unlock pair guarding
    nothing, optimize that away, and then move the assignment to temp past
    the mutex acquire, as demonstrated above.

    Of course, as I tried to emphasize, and as I will emphasize again now,
    that's not the only kind of thing which can break it. Cache coherency
    problems can lead to quite bizarre situations, especially with a
    processor with a weak memory model. Of course, the most important
    detail is that you are not given guarantees that it will work. Thus
    implementers, hardware and compiler and linker, and anything new which
    hasn't been invented yet, are free to break your code, because it has
    undefined behavior, because your program breaks the rules.
     
    Joshua Maurice, May 12, 2011
    #18
  19. Joshua Maurice wrote:

    > However, a sufficiently smart compiler could notice your clever ruse,
    > optimize away the assert as always true, see a lock and unlock pair
    > guarding nothing, optimize that away, and then move the assignment to
    > temp past the mutex acquire, as demonstrated above.


    Regarding the compiler optimizing away a lock/unlock pair guarding
    "nothing": AIUI both lock and unlock need to provide certain fences.
    Therefore, again AIUI, they can't be optimized away by the compiler even
    if there's nothing in between, because that would remove the fences and
    alter the behavior.

    Am I missing something here?

    Gerhard
     
    Gerhard Fiedler, May 13, 2011
    #19
  20. On May 13, 3:56 pm, Gerhard Fiedler <> wrote:
    > Joshua Maurice wrote:
    > > However, a sufficiently smart compiler could notice your clever ruse,
    > > optimize away the assert as always true, see a lock and unlock pair
    > > guarding nothing, optimize that away, and then move the assignment to
    > > temp past the mutex acquire, as demonstrated above.

    >
    > Regarding the compiler optimizing away a lock/unlock pair guarding
    > "nothing": AIUI both lock and unlock need to provide certain fences.
    > Therefore, again AIUI, they can't be optimized away by the compiler even
    > if there's nothing in between, because that would remove the fences and
    > alter the behavior.
    >
    > Am I missing something here?


    That may be how they're commonly implemented, but those aren't the
    guaranteed semantics. On a given implementation, two different mutexes
    may in fact give "happens-before" effects between each other, but
    nothing guarantees it.

    I was ambiguously describing the situation, possibly to the extent of
    being wrong. Allow me to correct myself.

    Remember the code:

    Singleton* Singleton::instance() {
      if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
          Singleton* temp = new Singleton; // initialize to temp
          secondLock.lock();
          pInstance = temp; // assign temp to pInstance
          secondLock.unlock();
        }
      }
      return pInstance;
    }

    Singleton::Singleton() {
      secondLock.lock();
      assert( pInstance == 0 );
      secondLock.unlock();
    }

    The compiler can expand inline as follows (pseudo-code):

    Singleton* Singleton::instance() {
      if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
          Singleton* temp =
              (Singleton*) ::operator new(sizeof(Singleton));

          secondLock.lock();
          assert( pInstance == 0 );
          secondLock.unlock();

          secondLock.lock();
          pInstance = temp; // assign temp to pInstance
          secondLock.unlock();
        }
      }
      return pInstance;
    }

    With that, it sees:
    someMutex.lock();
    <blah1>
    someMutex.unlock();
    <blah2>
    someMutex.lock();
    <blah3>
    someMutex.unlock();

    The compiler sees a lock, unlock, lock, unlock, in straightline code,
    without branching (or exceptions, or volatile (to keep signal handlers
    correct)). The compiler is totally free to replace that with:
    someMutex.lock();
    <blah1>
    <blah2>
    <blah3>
    someMutex.unlock();

    It may be bad from a QoI standpoint to do that. It depends heavily on
    the situation. I see it as quite reasonable that the compiler could
    employ some heuristics to deduce when it's a savings to combine
    critical sections. In this case, <blah2> is empty and <blah1> is just
    the assert (which can be optimized away as always true), so it's
    always a savings to combine the two adjacent critical sections.

    So, it can transform it to:

    Singleton* Singleton::instance() {
      if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
          Singleton* temp =
              (Singleton*) ::operator new(sizeof(Singleton));

          secondLock.lock();
          assert( pInstance == 0 );
          pInstance = temp; // assign temp to pInstance
          secondLock.unlock();
        }
      }
      return pInstance;
    }

    With some simple data flow analysis, and allowed motions past locks,
    it can transform it to:
    Singleton* Singleton::instance() {
      if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
          secondLock.lock();
          pInstance = (Singleton*) ::operator new(sizeof(Singleton));
          secondLock.unlock();
        }
      }
      return pInstance;
    }

    ----

    Now, back to my original much more controversial statement - can a
    compiler simply remove a lock unlock pair? Ex:
    mutex.lock();
    mutex.unlock();
    Maybe. I mentioned "clever ruse" with whole program optimization in
    mind. (However, upon thinking about it, I just showed that you don't
    even need whole program optimization.) Without whole program
    optimization, I think not. Could someone more educated please weigh
    in?

    Thus far, after 10 minutes of attempts just now to write a conforming
    race-free program where you could tell the difference if a compiler
    simply removed an "empty" mutex acquire release pair, the only
    programs I can find are ones that would deadlock before the change,
    and not deadlock after the change. A deadlock is observable behavior,
    so a compiler cannot remove it for that reason.
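    A sketch of what I mean, with the would-be deadlock made observable as a
    timeout instead of a hang (std::timed_mutex assumed; names are
    illustrative, not from anyone's post):

```cpp
#include <chrono>
#include <mutex>
#include <thread>

std::timed_mutex m;

// One thread holds the mutex and never releases it while another runs an
// "empty" critical section. The only observable effect of that empty
// lock/unlock pair is that it blocks -- i.e., the would-be deadlock.
bool empty_pair_blocks() {
    m.lock();  // simulate a thread that holds m indefinitely
    bool timed_out = false;
    std::thread t([&timed_out] {
        // The "empty" acquire/release pair, with a timeout so this demo
        // terminates instead of actually deadlocking:
        if (m.try_lock_for(std::chrono::milliseconds(100))) {
            m.unlock();
        } else {
            timed_out = true;  // the empty pair never completed
        }
    });
    t.join();
    m.unlock();
    return timed_out;  // true: removing the pair would remove the block
}
```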
     
    Joshua Maurice, May 14, 2011
    #20
