Thread-safe reference counts.

Discussion in 'C++' started by jason.cipriani@gmail.com, Mar 27, 2008.

  1. Guest

    I have some code where objects are dynamically allocated by some
    thread, and then used by multiple threads and freed when they are no
    longer needed. I think that reference counting is the most appropriate
    way to handle this in my situation.

    However, my code currently has a very obvious problem in it. Here is
    how I have reference counting implemented for my objects, basically
    (sorry about any errors I am typing this here in this message):

    === BEGIN CODE ===

#include <pthread.h>

class A {
public:
    A ();
    void AddRef ();
    void Release ();
private:
    ~A ();
    int refs_;            // reference count
    pthread_mutex_t mtx_; // protects refs_
};

// constructor inits reference count to 1
A::A ()
    : refs_(1)
{
    pthread_mutex_init(&mtx_, NULL);
}

A::~A () {
    pthread_mutex_destroy(&mtx_);
}

// increment count
void A::AddRef () {
    pthread_mutex_lock(&mtx_);
    ++refs_;
    pthread_mutex_unlock(&mtx_);
}

// decrement count, destroying object at 0.
void A::Release () {
    bool deleteme;
    pthread_mutex_lock(&mtx_);
    --refs_;
    deleteme = (refs_ == 0);
    pthread_mutex_unlock(&mtx_);
    // <--- and here is the problem!
    if (deleteme)
        delete this;
}

    === END CODE ===

    The problem is in Release(). There is a short period of time where the
    mutex is unlocked but the object is already doomed for destruction; if
    another thread calls AddRef() during that time, that other thread now
    has a pointer to garbage and doesn't know it.

    I can't delete this before unlocking the mutex. There is also a second
    problem, where even if, hypothetically speaking, I could do the delete
    inside the mutex lock in release, there's still the problem where
    another thread may have called AddRef() in the mean time, and it is
    blocking on that mutex, and by the time Release() returns if the
    reference count had decreased to 0, the object is deleted, the thread
    blocking in AddRef() continues, and operates on the object that's
    already been deleted.

    So I have two questions:

    1) I have no specific reason for implementing this all by hand. I am
    not completely familiar with boost, though. Is there some boost thread-
    safe reference counting thing that I can use to take care of this all
    for me?

    2) In any case, how can I fix the above problems in my own code? I
    can't quite get my head around it, it seems like it's not possible to
    do reference counting from "within" the object; that I need some kind
    of external "wrapper" around it instead. Also I am unsure about the
    logic that I'd need to use to protect against the case where AddRef
    blocks on the mutex, Release destroys the object, then AddRef
    continues on a deleted object. Really I guess what I said in #1 was
    kind of a lie: I certainly wouldn't *mind* not using boost, as I'm
    currently not using boost for anything else in this project, and
    installing boost on all the development machines is a *minor* hassle
    (nothing I can't overcome, but I don't mind implementing this by hand
    at this stage in this particular project, at least).

    Thanks,
    Jason
     
    , Mar 27, 2008
    #1

  2. Guest

    On Mar 26, 8:50 pm, ""
    <> wrote:
    > Is there some boost thread-
    > safe reference counting thing that I can use to take care of this all
    > for me?


    I see boost::shared_ptr, which appears to do exactly what I want. I'll
    just accept the fact that it works by magic and move on with my life.
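    For reference, the basic usage pattern (a minimal sketch; the type names
    are just placeholders):

    === BEGIN CODE ===

    #include <boost/shared_ptr.hpp>

    class Object { /* ... */ };

    typedef boost::shared_ptr<Object> ObjectPtr;

    ObjectPtr make_object() {
        return ObjectPtr(new Object);     // count starts at 1
    }

    void use_in_thread(ObjectPtr p) {     // passing by value bumps the count atomically
        // ... use *p ...
    }                                     // copy destroyed here; the last copy deletes the Object

    === END CODE ===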

    Jason
     
    , Mar 27, 2008
    #2

  3. Sam Guest

    writes:

    > So I have two questions:
    >
    > 1) I have no specific reason for implementing this all by hand. I am
    > not completely familiar with boost, though. Is there some boost thread-
    > safe reference counting thing that I can use to take care of this all
    > for me?


    Boost's shared_ptr class does this. The flaw in your logic is that you
    require explicit action by a thread to register and deregister a reference
    to an instance of a class.

    The correct approach is to incorporate registration and deregistration into
    any instance of a pointer to an instance of the class, so this is handled
    automatically by the pointer implementation. Any time the pointer is passed,
    the copy constructor registers another reference to the class. Any time a
    pointer goes out of scope, the reference gets deregistered. When the
    reference count goes to 0, by definition that means the last pointer to the
    class instance just went out of scope, and the object can be safely
    destructed.

    That's what Boost's shared_ptr does. Although -- much like the rest of
    Boost's code -- it is horribly inefficient, and suffers from certain design
    flaws -- it is often a good starting point for your own reference-counted
    pointers.

    > of external "wrapper" around it instead. Also I am unsure about the
    > logic that I'd need to use to protect against the case where AddRef
    > blocks on the mutex, Release destroys the object, then AddRef
    > continues on a deleted object.


    You need to wrap your brain around the concept that /every/ time a pointer
    to the object gets instantiated, the reference count gets increased, and
    /every/ time a pointer to the object goes out of scope, the reference count
    gets decreased, so when the reference count goes to 0, by definition, no
    more pointers to the object exist and it's safe to delete it.

    You don't want to do it manually, by hand, so that's why you want to use an
    explicit pointer object class that handles the reference counting as part
    of its constructor, copy constructor, assignment operator, and destructor.
    That's the only way to do it reliably.
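
    For illustration, a minimal sketch of such a pointer class, keeping the
    mutex-based count from the original post (all the names here are made up;
    a production version would use atomic operations):

    === BEGIN CODE ===

    #include <pthread.h>

    // Hypothetical intrusive base class: the count lives inside the object.
    class refcounted {
    public:
        refcounted() : refs_(1) { pthread_mutex_init(&mtx_, NULL); }
        void add_ref() {
            pthread_mutex_lock(&mtx_);
            ++refs_;
            pthread_mutex_unlock(&mtx_);
        }
        void release() {
            pthread_mutex_lock(&mtx_);
            bool last = (--refs_ == 0);
            pthread_mutex_unlock(&mtx_);
            if (last)
                delete this;   // safe: nobody else can still hold a reference here
        }
    protected:
        virtual ~refcounted() { pthread_mutex_destroy(&mtx_); }
    private:
        int refs_;
        pthread_mutex_t mtx_;
    };

    // Hypothetical smart pointer: the count is only touched in the
    // constructor, copy constructor, assignment operator, and destructor.
    template <class T>
    class ref_ptr {
    public:
        explicit ref_ptr(T *p = 0) : p_(p) {}                          // adopts the initial reference
        ref_ptr(const ref_ptr &o) : p_(o.p_) { if (p_) p_->add_ref(); }
        ~ref_ptr() { if (p_) p_->release(); }
        ref_ptr &operator=(ref_ptr o) {                                // copy-and-swap
            T *tmp = p_; p_ = o.p_; o.p_ = tmp;
            return *this;
        }
        T *operator->() const { return p_; }
        T &operator*() const  { return *p_; }
    private:
        T *p_;
    };

    // usage:  class A : public refcounted { ... };
    //         ref_ptr<A> a(new A);   // copies of 'a' bump and drop the count

    === END CODE ===

    With this in place the AddRef race can't occur, because a new reference is
    only ever created by copying a ref_ptr that already owns one, so the count
    can never be observed at zero by a thread that is about to increment it.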

    > Really I guess what I said in #1 was
    > kind of a lie: I certainly wouldn't *mind* not using boost, as I'm
    > currently not using boost for anything else in this project, and


    Then don't. Resist the temptation to use Boost as long as you can. Many
    years from now, you'll be thankful for that. Define and implement your own
    pointer class, and use this as a learning experience. True, your reference
    counting implementation will be much slower than Boost's. Boost does not use
    mutexes for this, rather it uses CPU-specific atomic increment/decrement
    instructions (and gets it partially wrong, by the way). But this is a good
    way to learn some important principles of thread-safe programming.


     
    Sam, Mar 27, 2008
    #3
  4. Guest

    Thanks for the detailed reply, Sam. I appreciate it and it confirms a
    lot of my suspicions.

    On Mar 26, 10:00 pm, Sam <> wrote:
    > That's what Boost's shared_ptr does. Although -- much like the rest of
    > Boost's code -- it is horribly inefficient, and suffers from certain design
    > flaws -- it is often a good starting point for your own reference-counted
    > pointers.


    I can't comment on shared_ptr's efficiency since I have no experience
    with it. Blinding efficiency is not important for this particular
    application anyway, but as far as "certain design flaws" go... is there
    anything in particular that I should watch out for? I'm not doing
    anything weird, I have STL containers of shared_ptr<Object>'s, I
    sometimes return shared_ptr<Object>'s from functions, other than that
    it's pretty basic stuff. So far in my shared_ptr experiments it seems
    to be well-behaved; although using "ptr.reset(new Object)" is a little
    strange when you are used to "ptr = new Object" -- but it makes sense
    (similarly, "thelist.push_back(MyPtr(new Object))" as opposed to
    "thelist.push_back(new Object)" -- but I'm thankful for the explicit
    constructor, it's helping me catch some mistakes).


    > You need to wrap your brain around the concept that /every/ time a pointer
    > to the object gets instantiated, the reference count gets increased, and
    > /every/ time a pointer to the object goes out of scope, the reference count
    > gets decreased, so when the reference count goes to 0, by definition, no
    > more pointers to the object exist and it's safe to delete it.


    Thanks for explaining it this way; I had been thinking about it on a
    wider scope, like "when this thread gets a pointer, the reference
    count is incremented, and when this thread is done with it, decrement
    it" -- which was confusing the heck out of me.

    > [snip] Define and implement your own
    > pointer class, and use this as a learning experience. True, your reference
    > counting implementation will be much slower than Boost's.[snip]
    > But this is a good
    > way to learn some important principles of thread-safe programming.


    Definitely; at least for a learning experience, I was planning on
    doing this at the end of this project. I appreciate the shared_ptr's
    magic but it's always good to know what's going on under the hood.

    Thanks again,
    Jason
     
    , Mar 27, 2008
    #4
  5. [comp.programming.threads added]

    <> wrote in message
    news:...
    >I have some code where objects are dynamically allocated by some
    > thread, and then used by multiple threads and freed when they are no
    > longer needed. I think that reference counting is the most appropriate
    > way to handle this in my situation.
    >
    > However, my code currently has a very obvious problem in it. Here is
    > how I have reference counting implemented for my objects, basically
    > (sorry about any errors I am typing this here in this message):

    [...]

    Your question about another thread calling AddRef while one is calling
    Release and dropping to zero means that you need strong thread-safety which
    shared_ptr does not provide. Here are some implementations you can take a
    look at:


    Joe Seighs work:

    http://atomic-ptr-plus.sourceforge.net/

    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/3c763698da537d8f



    My work:


    http://appcore.home.comcast.net/~appcore/vzoom/refcount


    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/177609a5eff4a466
    (100% portable version, e.g., POSIX Threads)



    Here is some information on strong thread-safety:

    http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e5167941d32340c6



    If you have any questions, please feel free to ask. I know that we can work
    out a solution.
     
    Chris Thomasson, Mar 27, 2008
    #5
  6. "Chris Thomasson" <> wrote in message
    news:...
    > [comp.programming.threads added]
    >
    > <> wrote in message
    > news:...
    >>I have some code where objects are dynamically allocated by some
    >> thread, and then used by multiple threads and freed when they are no
    >> longer needed. I think that reference counting is the most appropriate
    >> way to handle this in my situation.
    >>
    >> However, my code currently has a very obvious problem in it. Here is
    >> how I have reference counting implemented for my objects, basically
    >> (sorry about any errors I am typing this here in this message):

    > [...]
    >
    > Your question about another thread calling AddRef while one is calling
    > Release and dropping to zero means that you need strong thread-safety
    > which shared_ptr does not provide. Here are some implementations you can
    > take a look at:

    [...]

    Here is context that I accidentally snipped:


    <> wrote in message
    news:...
    {
    The problem is in Release(). There is a short period of time where the
    mutex is unlocked but the object is already doomed for destruction; if
    another thread calls AddRef() during that time, that other thread now
    has a pointer to garbage and doesn't know it.

    I can't delete this before unlocking the mutex. There is also a second
    problem, where even if, hypothetically speaking, I could do the delete
    inside the mutex lock in release, there's still the problem where
    another thread may have called AddRef() in the mean time, and it is
    blocking on that mutex, and by the time Release() returns if the
    reference count had decreased to 0, the object is deleted, the thread
    blocking in AddRef() continues, and operates on the object that's
    already been deleted.
    }


    According to the text above, you need strong thread-safety; period. You are not
    following the rule of basic thread-safety, which says that a thread _cannot_
    acquire a reference to an object that it does not already own a reference
    to. If your application does not follow that rule, then you need something
    like Joe's excellent atomic_ptr.
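
    To make the distinction concrete, a sketch (the names are placeholders):

    struct A { /* ... */ };

    boost::shared_ptr<A> g_obj;   // ONE pointer object visible to several threads

    // Basic thread-safety -- what shared_ptr gives you: each thread manipulates
    // its OWN shared_ptr instance; only the reference count is shared, and it
    // is updated atomically.
    void worker(boost::shared_ptr<A> mine) {   // copied while the caller still owned one
        // ... use *mine ...
    }

    // Strong thread-safety -- what shared_ptr does NOT give you: two threads
    // touch the SAME shared_ptr instance concurrently, e.g.
    //   thread 1:  boost::shared_ptr<A> local = g_obj;   // reads g_obj
    //   thread 2:  g_obj.reset(new A);                   // writes g_obj
    // That is a data race; it needs atomic_ptr-style pointers or a lock
    // around g_obj.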
     
    Chris Thomasson, Mar 27, 2008
    #6
  7. wrote:
    > On Mar 26, 8:50 pm, ""
    > <> wrote:
    >> Is there some boost thread-
    >> safe reference counting thing that I can use to take care of this all
    >> for me?

    >
    > I see boost::shared_ptr, which appears to do exactly what I want. I'll
    > just accept the fact that it works by magic and move on with my life.


    A lot of libraries do this. I implemented it in Austria C++.

    at::ptrTarget_MT provides the definition of the thread safe reference
    counted base class.
    (defined on line 210)
    http://austria.svn.sourceforge.net/...stria/code/at_lifetime_mt.h?view=markup#l_210

    It's also portable across Windows and Linux (x86 and amd64).
     
    Gianni Mariani, Mar 27, 2008
    #7
  8. Chris Thomasson wrote:
    > [comp.programming.threads added]
    >
    > <> wrote in message
    > news:...
    >
    >> I have some code where objects are dynamically allocated by some
    >> thread, and then used by multiple threads and freed when they are no
    >> longer needed. I think that reference counting is the most appropriate
    >> way to handle this in my situation.
    >>
    >> However, my code currently has a very obvious problem in it. Here is
    >> how I have reference counting implemented for my objects, basically
    >> (sorry about any errors I am typing this here in this message):

    >
    > [...]
    >
    > Your question about another thread calling AddRef while one is calling
    > Release and dropping to zero means that you need strong thread-safety
    > which shared_ptr does not provide. Here are some implementations you can
    > take a look at:


    This just means that you are making a copy of an object whose
    destructor is in progress. That's simply a bug and should be avoided ;-)

    best regards,
    Torsten

    --
    kostenlose Wirtschaftssimulation: http://www.financial-rumors.de
     
    Torsten Robitzki, Mar 27, 2008
    #8
  9. In article <-scan.com>,
    Sam <> wrote:
    >
    >That's what Boost's shared_ptr does. Although -- much like the rest of
    >Boost's code -- it is horribly inefficient, and suffers from certain design
    >flaws -- it is often a good starting point for your own reference-counted
    >pointers.


    [...]

    >Then don't. Resist the temptation to use Boost as long as you can. Many
    >years from now, you'll be thankful for that. Define and implement your own
    >pointer class, and use this as a learning experience. True, your reference
    >counting implementation will be much slower than Boost's. Boost does not use
    >mutexes for this, rather it uses CPU-specific atomic increment/decrement
    >instructions (and gets it partially wrong, by the way). But this is a good
    >way to learn some important principles of thread-safe programming.


    Care to clarify?

    And does this also apply to std::tr1::shared_ptr, which is based on the
    boost implementation on many platforms?

    Yan
     
    Yannick Tremblay, Mar 27, 2008
    #9
  10. Sam Guest

    Yannick Tremblay writes:

    > In article <-scan.com>,
    > Sam <> wrote:
    >>
    >>That's what Boost's shared_ptr does. Although -- much like the rest of
    >>Boost's code -- it is horribly inefficient, and suffers from certain design
    >>flaws -- it is often a good starting point for your own reference-counted
    >>pointers.

    >
    > [...]
    >
    >>Then don't. Resist the temptation to use Boost as long as you can. Many
    >>years from now, you'll be thankful for that. Define and implement your own
    >>pointer class, and use this as a learning experience. True, your reference
    >>counting implementation will be much slower than Boost's. Boost does not use
    >>mutexes for this, rather it uses CPU-specific atomic increment/decrement
    >>instructions (and gets it partially wrong, by the way). But this is a good
    >>way to learn some important principles of thread-safe programming.

    >
    > Care to clarify?


    The entire design is completely braindead. A separate object, a tiny
    object that holds the reference count, gets instantiated for every object
    tracked by shared_ptr. This is insane. Heavy use of shared_ptr is just going
    to hammer your heap like there's no tomorrow. This is horrible design. The
    correct approach is to store the reference count in superclass and derive
    from it. You can't then attach shared_ptr to an arbitrary libstdc++ object,
    of course, so you just multiply-inherit from it. How hard is that?

    Then you have the shared_ptr itself, an object that holds two pointers, one
    to the original object, the other to the reference count.

    Which means that most compilers won't be able to hold a shared_ptr in CPU
    registers, like ordinary pointers. Instead, things are going to get tossed
    around in memory.

    There are also several other design flaws as well. Like the fact that each
    time a reference count gets updated, it involves a call to a library
    function. Reference count increment/decrement cannot be inlined, due to a
    real first class fookup in the templates. After I read through the class
    definition, I just shook my head, laughed, then quickly coded my own
    reference-counted object implementation, and ended up benchmarking it 15%
    faster than Boost's disgusting implementation.

    > And does this also apply to std::tr1::shared_ptr which is based on
    > boost impelmentation on many platfroms.


    Yes.


     
    Sam, Mar 27, 2008
    #10
  11. * Sam:
    > Yannick Tremblay writes:
    >
    >> In article <-scan.com>,
    >> Sam <> wrote:
    >>>
    >>> That's what Boost's shared_ptr does. Although -- much like the rest
    >>> of Boost's code -- it is horribly inefficient, and suffers from
    >>> certain design flaws -- it is often a good starting point for your
    >>> own reference-counted pointers.

    >>
    >> [...]
    >>
    >>> Then don't. Resist the temptation to use Boost as long as you can.
    >>> Many years from now, you'll be thankful for that. Define and
    >>> implement your own pointer class, and use this as a learning
    >>> experience. True, your reference counting implementation will be much
    >>> slower than Boost's. Boost does not use mutexes for this, rather it
    >>> uses CPU-specific atomic increment/decrement instructions (and gets
    >>> it partially wrong, by the way). But this is a good way to learn some
    >>> important principles of thread-safe programming.

    >>
    >> Care to clarify?

    >
    > The entire design is completely braindead. A separate object, a tiny
    > object that holds the reference count, gets instantiated for every
    > object tracked by shared_ptr. This is insane. Heavy use of shared_ptr is
    > just going to hammer your heap like there's no tomorrow. This is
    > horrible design. The correct approach is to store the reference count in
    > superclass and derive from it. You can't then attach shared_ptr to an
    > arbitrary libstdc++ object, of course, so you just multiply-inherit from
    > it. How hard is that?
    >
    > Then you have the shared_ptr itself, an object that holds two pointers,
    > one to the original object, the other to the reference count.
    >
    > Which means that most compilers won't be able to hold a shared_ptr in
    > CPU registers, like ordinary pointers. Instead, things are going to get
    > tossed around in memory.
    >
    > There are also several other design flaws as well. Like the fact that
    > each time a reference count gets updated, it involves a call to a
    > library function. Reference count increment/decrement cannot be inlined,
    > due to a real first class fookup in the templates. After I read through
    > the class definition, I just shook my head, laughed, then quickly coded
    > my own reference-counted object implementation, and ended up
    > benchmarking it 15% faster than Boost's disgusting implementation.


    Have you looketh at boost::intrusive_ptr?


    >> And does this also apply to std::tr1::shared_ptr which is based on
    >> boost impelmentation on many platfroms.



    Cheers,

    - Alf

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
     
    Alf P. Steinbach, Mar 27, 2008
    #11
  12. gpderetta Guest

    On Mar 27, 12:13 pm, Sam <> wrote:
    > Yannick Tremblay writes:
    > > Sam <> wrote:

    >
    > >>That's what Boost's shared_ptr does. Although -- much like the rest of
    > >>Boost's code -- it is horribly inefficient, and suffers from certain design
    > >>flaws -- it is often a good starting point for your own reference-counted
    > >>pointers.


    Like the rest of Boost's code? That's not my experience.

    >
    > > [...]

    >
    > >>Then don't. Resist the temptation to use Boost as long as you can. Many
    > >>years from now, you'll be thankful for that. Define and implement your own
    > >>pointer class, and use this as a learning experience. True, your reference
    > >>counting implementation will be much slower than Boost's. Boost does not use
    > >>mutexes for this, rather it uses CPU-specific atomic increment/decrement
    > >>instructions (and gets it partially wrong, by the way). But this is a good
    > >>way to learn some important principles of thread-safe programming.

    >
    > > Care to clarify?

    >
    > The entire design is completely braindead. A separate object, a tiny
    > object that holds the reference count, gets instantiated for every object
    > tracked by shared_ptr. This is insane. Heavy use of shared_ptr is just going
    > to hammer your heap like there's no tomorrow.


    Use a custom allocator. Or use a platform with a better default
    allocator.
    Also the draft standard (and I think the next boost release) has
    make_shared or something like that, which will encapsulate 'new' and
    will allocate the reference count and your object with a single
    allocation; as a plus it will be binary compatible with the default
    separate reference counted shared_ptrs.
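
    For instance, assuming a Boost release (or TR1/C++0x library) that ships
    make_shared:

    #include <boost/make_shared.hpp>
    #include <boost/shared_ptr.hpp>

    struct widget {
        explicit widget(int n) : n_(n) {}
        int n_;
    };

    int main() {
        // Two heap allocations: one for the widget, one for the count block.
        boost::shared_ptr<widget> a(new widget(1));

        // One heap allocation holding both the widget and its count.
        boost::shared_ptr<widget> b = boost::make_shared<widget>(2);
    }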

    > This is horrible design. The
    > correct approach is to store the reference count in superclass and derive
    > from it. You can't then attach shared_ptr to an arbitrary libstdc++ object,
    > of course, so you just multiply-inherit from it. How hard is that?
    >


    If you do not want to pay for a separate heap allocated reference
    count, use boost::intrusive_ptr. Which doesn't force you to derive
    from some superclass.

    Not requiring changes in user classes is good design, not horrible
    design.

    Also note that using a separate reference count gives you weak_ptr
    (technically you can have it with an intrusive reference counted smart
    pointer, but you would have to defer deallocation of the pointed-to
    object until the last weak pointer dies, which is Not What You Want
    (TM)).
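
    A minimal intrusive_ptr sketch for comparison (the two free functions
    below are the only hooks boost::intrusive_ptr needs; a thread-safe
    version would make the counter atomic):

    #include <boost/intrusive_ptr.hpp>

    class node {
    public:
        node() : refs_(0) {}
    private:
        friend void intrusive_ptr_add_ref(node *p) { ++p->refs_; }
        friend void intrusive_ptr_release(node *p) {
            if (--p->refs_ == 0)
                delete p;
        }
        int refs_;   // plain int here; use an atomic counter for multi-threaded use
    };

    // boost::intrusive_ptr<node> n(new node);   // the count lives inside the object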

    > Then you have the shared_ptr itself, an object that holds two pointers, one
    > to the original object, the other to the reference count.
    >
    > Which means that most compilers won't be able to hold a shared_ptr in CPU
    > registers, like ordinary pointers.


    Most compilers? Maybe from the last century. I would be surprised if
    any half-recent compiler did that.

    > Instead, things are going to get tossed
    > around in memory.
    >
    > There are also several other design flaws as well. Like the fact that each
    > time a reference count gets updated, it involves a call to a library
    > function.


    On most platforms boost shared_ptr uses inline asm or compiler
    intrinsics for reference count updates. Calls to platform-specific
    mutex lock/unlock are only a fallback. And you know that, as you said
    the same thing in the previous paragraph. Also I would like to know
    how it "gets it partially wrong".

    > Reference count increment/decrement cannot be inlined, due to a
    > real first class fookup in the templates.


    What is that supposed to mean?

    > After I read through the class
    > definition, I just shook my head, laughed, then quickly coded my own
    > reference-counted object implementation, and ended up benchmarking it 15%
    > faster than Boost's disgusting implementation.
    >


    15% is hardly impressive. It is unlikely that shared_ptr operations
    are responsible for more than 1% of your application's run time (if they
    are, you are likely doing something wrong). Improving 15% on that
    is neither a great accomplishment nor a good use of your time.

    With a custom tailored shared ptr you can do much, much better (as,
    I'm sure, Chris Thomasson is going to tell you ;) ). But of course you
    lose most of the nice properties of shared_ptr: weak_ptr, support for
    incomplete types, custom deleter, aliasing constructor etc, etc... and
    of course a peer reviewed, widely tested design (even blessed by the
    standard committee).

    > > And does this also apply to std::tr1::shared_ptr which is based on
    > > boost impelmentation on many platfroms.

    >
    > Yes.


    No.

    --
    gpd
     
    gpderetta, Mar 27, 2008
    #12
  13. gpderetta wrote:

    >> After I read through the class
    >> definition, I just shook my head, laughed, then quickly coded my own
    >> reference-counted object implementation, and ended up benchmarking it 15%
    >> faster than Boost's disgusting implementation.
    >>

    >
    > 15% is hardly impressive. It is unlikely that shared_ptr operations
    > are responsible of more than 1% of your application run time (if they
    > are, you are likely doing something wrong). Improving 15% on that
    > isn't a great accomplishment nor a good use of your time.


    I once made a comparison of boost::shared_ptr and loki::SmartPtr, which uses
    a custom allocator (both with non-intrusive reference counting). In debug
    mode, loki was slightly slower because of massive policy templates, but
    when compiled with -O2 it needed only 50% of the time of boost::shared_ptr.

    Of course, their design is rather complementary. Boost::shared_ptr is a more
    general and broad approach, while Loki provides possibilities to configure
    every tiny bit with policies. Each has its strengths and weaknesses. I have
    used both and didn't run into problems so far.

    lg,
    Michael
     
    Michael Oswald, Mar 27, 2008
    #13

  14. > This just means, that you are making a copy of an object where the
    > destructor is in progress. That's simply a bug and should be avoided ;-)
    >
    > best regards,
    > Torsten


    I agree. I would put it simply -- you cannot call 'AddRef' unless you
    have an explicit or implicit reference to an object. The 'AddRef'
    function is not special, it must be called with a reference just like
    every other function.

    The puzzle is this -- how did you get a pointer to the object to call
    'AddRef' on anyway?

    I've heard a lot of talk about strong thread safety and the like, but
    I have to admit, I don't get it. In order to call 'AddRef' on an
    object, you need a pointer to it, and how could you possibly have
    gotten that pointer without something that already had a reference?

    The existence of a pointer should mean the existence of a reference --
    otherwise how can you know that pointer remains valid, whether a call
    for AddRef or for any other purpose?

    DS
     
    David Schwartz, Mar 27, 2008
    #14
  15. Guest

    On Mar 27, 8:12 am, gpderetta <> wrote:
    > On Mar 27, 12:13 pm, Sam <> wrote:
    >
    > > Yannick Tremblay writes:
    > > > Sam  <> wrote:

    >
    > > >>That's what Boost's shared_ptr does. Although -- much like the rest of
    > > >>Boost's code -- it is horribly inefficient, and suffers from certain design
    > > >>flaws -- it is often a good starting point for your own reference-counted
    > > >>pointers.

    >
    > Like the rest of Boost's code? That's not my experience.


    I think some Boost libraries are better than others though.
    http://www.webEbenezer.net/comparison.html

    Brian
     
    , Mar 27, 2008
    #15
  16. David Schwartz wrote:
    >
    > > This just means, that you are making a copy of an object where the
    > > destructor is in progress. That's simply a bug and should be avoided ;-)
    > >
    > > best regards,
    > > Torsten

    >
    > I agree. I would put it simply -- you cannot call 'AddRef' unless you
    > have an explicit or implicit reference to an object. The 'AddRef'
    > function is not special, it must be called with a reference just like
    > every other function.
    >
    > The puzzle is this -- how did you get a pointer to object to call
    > 'AddRef' on anyway?
    >
    > I've heard a lot of talk about strong thread safety and the like, but
    > I have to admit, I don't get it. In order to call 'AddRef' on an
    > object, you need a pointer to it, and how could you possibly have
    > gotten that pointer without something that already had a reference?
    >
    > The existence of a pointer should mean the existence of a reference --
    > otherwise how can you know that pointer remains valid, whether a call
    > for AddRef or for any other purpose?


    See

    http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2297.html#atomic
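
    The atomic access functions described there eventually became the
    std::atomic_load / std::atomic_store overloads for shared_ptr in C++11;
    a minimal sketch of how they give atomic access to a single, shared
    pointer object (assumes a C++11 library):

    #include <memory>

    std::shared_ptr<int> g_ptr;   // ONE shared_ptr instance touched by several threads

    void writer() {
        // atomically replace the pointer held by g_ptr itself
        std::atomic_store(&g_ptr, std::make_shared<int>(42));
    }

    std::shared_ptr<int> reader() {
        // atomically grab a copy; safe even while writer() is running
        return std::atomic_load(&g_ptr);
    }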

    regards,
    alexander.
     
    Alexander Terekhov, Mar 27, 2008
    #16
  17. gpderetta Guest

    On Mar 27, 9:42 pm, wrote:
    > On Mar 27, 8:12 am, gpderetta <> wrote:
    >
    > > On Mar 27, 12:13 pm, Sam <> wrote:

    >
    > > > Yannick Tremblay writes:
    > > > > Sam <> wrote:

    >
    > > > >>That's what Boost's shared_ptr does. Although -- much like the rest of
    > > > >>Boost's code -- it is horribly inefficient, and suffers from certain design
    > > > >>flaws -- it is often a good starting point for your own reference-counted
    > > > >>pointers.

    >
    > > Like the rest of Boost's code? That's not my experience.

    >
    > I think some Boost libraries are better than others though.
    > http://www.webEbenezer.net/comparison.html
    >


    Use an std::vector (or boost::array) instead of std::deque or
    std::list for applications where serialization performance is
    critical.

    --
    gpd
     
    gpderetta, Mar 27, 2008
    #17
  18. Tim H Guest

    On Mar 27, 1:26 pm, David Schwartz <> wrote:
    > > This just means, that you are making a copy of an object where the
    > > destructor is in progress. That's simply a bug and should be avoided ;-)

    >
    > > best regards,
    > > Torsten

    >
    > I agree. I would put it simply -- you cannot call 'AddRef' unless you
    > have an explicit or implicit reference to an object. The 'AddRef'
    > function is not special, it must be called with a reference just like
    > every other function.
    >
    > The puzzle is this -- how did you get a pointer to object to call
    > 'AddRef' on anyway?
    >
    > I've heard a lot of talk about strong thread safety and the like, but
    > I have to admit, I don't get it. In order to call 'AddRef' on an
    > object, you need a pointer to it, and how could you possibly have
    > gotten that pointer without something that already had a reference?
    >
    > The existence of a pointer should mean the existence of a reference --
    > otherwise how can you know that pointer remains valid, whether a call
    > for AddRef or for any other purpose?
    >
    > DS


    The owner calls AddRef before it hands you a pointer.
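
    For example, with the AddRef/Release scheme and class A from the original
    post, the hand-off might look like this (a sketch; the thread-start code
    is hypothetical):

    // Consumer thread: it starts life already holding a reference, handed to
    // it by the owner below, so it never calls AddRef() on an object it does
    // not yet own.
    void *consumer(void *arg) {
        A *obj = static_cast<A *>(arg);
        // ... use obj ...
        obj->Release();                 // drop the reference we were given
        return 0;
    }

    // Owner side: bump the count BEFORE publishing the pointer.
    void hand_off(A *obj) {
        obj->AddRef();                  // this reference belongs to the new thread
        pthread_t t;
        if (pthread_create(&t, NULL, consumer, obj) != 0)
            obj->Release();             // thread never started: take the reference back
        else
            pthread_detach(t);
    }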
     
    Tim H, Mar 27, 2008
    #18
  19. Sam Guest

    gpderetta writes:

    > On Mar 27, 12:13 pm, Sam <> wrote:
    >> Yannick Tremblay writes:
    >> > Care to clarify?

    >>
    >> The entire design is completely braindead. A separate object, a tiny
    >> object that holds the reference count, gets instantiated for every object
    >> tracked by shared_ptr. This is insane. Heavy use of shared_ptr is just going
    >> to hammer your heap like there's no tomorrow.

    >
    > Use a custom allocator. Or use a platform with a better default
    > allocator.


    What a great idea. The implementation is braindead? No problem, just pile on
    a heap of workarounds that adds another layer of overhead, that'll fix it!

    > Also the draft standard (and I think the next boost release) has
    > make_shared or something like that that will encapsulate 'new' and
    > will allocate the reference count and your object with a single
    > allocation; as a plus it will be binary compatible with the default
    > separate referece counted shared_ptrs.


    Ditto!

    >> This is horrible design. The
    >> correct approach is to store the reference count in superclass and derive
    >> from it. You can't then attach shared_ptr to an arbitrary libstdc++ object,
    >> of course, so you just multiply-inherit from it. How hard is that?
    >>

    >
    > If you do not want to pay for a separate heap allocated reference
    > count, use boost::intrusive_ptr. Which doesn't force you to derive
    > from some superclass.


    I'm almost afraid to look.

    > Not requiring changes in user classes is good desing, not horrible
    > design.


    "User classes" are, by definition, designed by the user. Now that's an
    interesting approach: rather than designing a class hierarchy to properly
    implement reference-counted objects, nooooooo!!! That makes too much sense.
    Instead, forcibly rip out all the reference-counting logic, and use a pile
    of duct tape and superglue to attach it to your class hierarchy.

    That's a guaranteed recipe for efficiently getting things done, isn't it?

    > Also note that using a separate reference count gives you weak_ptr
    > (technically you can have it with an intrusive reference counted smart
    > pointer, but you would have to defer deallocation of the pointed
    > object untill the last weak pointer dies, which is Not What You Want
    > (TM) ).


    /me looks at my reference-counted class hierarchy, that supports weak
    references and virtual inheritance, refrains from any cockamamie logic like
    deferred allocation, and benchmarks way ahead of Boost.

    I must be imagining things. Hold on while I pop an aspirin, or two, and
    reorient myself.

    > Most compilers? Maybe from the last century. I would be surprised if
    > any half-recent compiler did that.


    Try gcc 4.1. CPU registers are at a premium. It's very unlikely that the
    compiler will put anything into a register that's not a single, native type.
    If there's enough registers for everything, then maybe, but this rarely
    occurs in practice. A compiler is more likely to keep a pointer or a native
    type, rather than clear out a bunch of registers to hold an entire struct.

    Using a reference to a single object that's implemented as a single pointer,
    versus two pointers, has a far better chance of resulting in faster code.
    That's a no-brainer.

    >> There are also several other design flaws as well. Like the fact that each
    >> time a reference count gets updated, it involves a call to a library
    >> function.

    >
    > On most plattforms boost shared_ptr uses inline asm or compiler
    > intrinsics for reference count updates. Calls to platfrom specific
    > mutex lock/unlock are only a fall back. And you know that, as you said
    > the same thing in the previous paragraph. Also I would like to know
    > how it "gets it partially wrong".


    Except that on x86 it uses a wrong builtin, for which there's no
    corresponding CPU instruction, resulting in:

    x86_64:
    movq %rax, %rdi
    movl $-1, %esi
    call _ZN9__gnu_cxx18__exchange_and_addEPVii

    i386:
    movl $-1, 4(%esp)
    movl %eax, (%esp)
    call _ZN9__gnu_cxx18__exchange_and_addEPVii

    That's for every reference decrement. Lovely. Ditto for increment.

    Seriously, has anyone actually LOOKED at the code that comes out of Boost,
    and checked it for sanity?

    >> Reference count increment/decrement cannot be inlined, due to a
    >> real first class fookup in the templates.

    >
    > What is that supposed to mean?


    Wrong gcc builtin!!!!!!

    __exchange_and_add() does not get compiled by gcc to an equivalent CPU
    instruction on either i386 or x86_64, for the simple reason that there is no
    such CPU instruction, and it has to be emulated by libgcc! shared_ptr forces
    a function call to libgcc every time you bounce shared_ptr from place A to
    place B. Whoever did this, without verifying that the code gets compiled to
    an actual atomic x86 instruction, should wear a paper bag on his head for at
    least a month. All they had to do is use the correct builtin, and eliminate
    all that overhead of setting up a stack frame, invoking a remote library
    function, likely resulting in a trashed cacheline, and replace all of that
    useless crap with one or two native CPU instructions!

    What a joke.
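
    For comparison, a sketch of a count done with GCC's __sync builtins
    (available since GCC 4.1); on x86/x86_64 these compile down to single
    lock-prefixed instructions rather than a call into a support library
    (names here are made up):

    // Hypothetical intrusive counter.
    struct counted {
        counted() : refs(1) {}
        volatile int refs;
    };

    inline void add_ref(counted *p) {
        __sync_add_and_fetch(&p->refs, 1);        // atomic increment, no function call
    }

    inline bool release(counted *p) {             // returns true when the count hits zero
        return __sync_sub_and_fetch(&p->refs, 1) == 0;
    }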

    >> After I read through the class
    >> definition, I just shook my head, laughed, then quickly coded my own
    >> reference-counted object implementation, and ended up benchmarking it 15%
    >> faster than Boost's disgusting implementation.
    >>

    >
    > 15% is hardly impressive. It is unlikely that shared_ptr operations
    > are responsible of more than 1% of your application run time (if they
    > are, you are likely doing something wrong). Improving 15% on that
    > isn't a great accomplishment nor a good use of your time.


    You must not be familiar with developing real world applications for the
    financial industry, where every nanosecond of latency counts. Even a 1%
    improvement translates to a non-trivial competitive advantage. I'll take it
    any day.

    Furthermore, this was a rather crude benchmark, which probably ended up
    hitting the CPU cache most of the time. A more realistic mock-up will likely
    show a greater difference due to doubling of heap allocation activity that
    shared_ptr demands, not to mention all the unnecessary calls to libgcc.
    Especially since, as a result of deferred allocation by shared_ptr, the
    extra allocation won't have a 1:1 relationship with heap allocation of the
    referenced objects themselves.

    > With a custom tailored shared ptr you can do much, much better (as,


    I see no need for something heavily customized. Just basic operations,
    typical for reference-counted objects, will suffice, as long as you do them
    correctly. Really, this is not rocket science.

    My impression is that many Boost classes are what you get in response to a
    homework assignment for a junior-level college course on programming
    algorithms and concepts, with only a minimum of consideration paid to
    practical, real-world scenarios.

    > I'm sure, Chris Thomasson is going to tell you ;) ). But of course you
    > lose most of the nice properties of shared_ptr: weak_ptr, support for
    > incomplete types, custom deleter, aliasing constructor etc, etc... and
    > of course a peer reviewed, widely tested design (even blessed by the
    > standard committee).


    Gee, then I must've imagined knocking off all this, including weak
    references, templated default allocators/deallocators, and supporting
    multiply-inherited objects, in just a day or two, last year.

    If I was on a committee that apparently accepted and blessed Boost's
    shared_ptr, without even bothering to check what code it generates, and
    how, I'd probably be embarrassed.

    >> > And does this also apply to std::tr1::shared_ptr which is based on
    >> > boost impelmentation on many platfroms.

    >>
    >> Yes.

    >
    > No.


    I'm afraid the answer is still yes.


     
    Sam, Mar 27, 2008
    #19
  20. gpderetta Guest

    On Mar 28, 12:17 am, Sam <> wrote:
    > gpderetta writes:
    > > On Mar 27, 12:13 pm, Sam <> wrote:
    > >> Yannick Tremblay writes:
    > >> > Care to clarify?

    >
    > >> The entire design is completely braindead. A separate object, a tiny
    > >> object that holds the reference count, gets instantiated for every object
    > >> tracked by shared_ptr. This is insane. Heavy use of shared_ptr is just going
    > >> to hammer your heap like there's no tomorrow.

    >
    > > Use a custom allocator. Or use a platform with a better default
    > > allocator.

    >
    > What a great idea. The implementation is braindead? No problem, just pile on
    > a heap of workarounds that adds another layer of overhead, that'll fix it!
    >


    If your platform allocator is slow, that is not shared_ptr's fault.

    > [...]
    > >> This is horrible design. The
    > >> correct approach is to store the reference count in superclass and derive
    > >> from it. You can't then attach shared_ptr to an arbitrary libstdc++ object,
    > >> of course, so you just multiply-inherit from it. How hard is that?

    >
    > > If you do not want to pay for a separate heap allocated reference
    > > count, use boost::intrusive_ptr. Which doesn't force you to derive
    > > from some superclass.

    >
    > I'm almost afraid to look.


    Nobody is forcing you to.

    >
    > > Not requiring changes in user classes is good desing, not horrible
    > > design.

    >
    > "User classes" are, by definition, designed by the user. Now that's an
    > interesting approach: rather than designing a class hierarchy to properly
    > implement reference-counted objects, nooooooo!!! That makes too much sense.
    > Instead, forcibly rip out all the reference-counting logic, and use a pile
    > of duct tape and superglue to attach it to your class hierarchy.


    No duct tape or superglue, just common C++ idioms.
    I prefer to adapt my classes non-intrusively for use with third-party
    libraries rather than design them around those libraries.

    >
    > That's a guaranteed recipe for efficiently getting things done, isn't it?


    Actually yes.

    >
    > > Also note that using a separate reference count gives you weak_ptr
    > > (technically you can have it with an intrusive reference counted smart
    > > pointer, but you would have to defer deallocation of the pointed
    > > object untill the last weak pointer dies, which is Not What You Want
    > > (TM) ).

    >
    > /me looks at my reference-counted class hierarchy, that supports weak
    > references and virtual inheritance, refrains from any cockamamie logic like
    > deferred allocation, and benchmarks way ahead of Boost.
    >


    15% faster doesn't sound like 'way ahead'.

    > I must be imagining things. Hold on while I pop an aspirin, or two, and
    > reorient myself.


    Certainly not, but I would be curious to see how you implemented
    weak pointers.

    >
    > > Most compilers? Maybe from the last century. I would be surprised if
    > > any half-recent compiler did that.

    >
    > Try gcc 4.1.


    That's what I'm using.

    > CPU registers are at a premium. It's very unlikely that the
    > compiler will put anything into a register that's not a single, native type.
    > If there's enough registers for everything, then maybe, but this rarely
    > occurs in practice. A compiler is more likely to keep a pointer or a native
    > type, rather than clear out a bunch of registers to hold an entire struct.
    >


    It will allocate registers according to what is most useful at a
    specific point.

    Of course small objects mean fewer register spills, but a compiler can
    still keep the most used part of an object in a register (in shared_ptr's
    case, the pointer to the held object) and spill the less used part to the
    stack (the pointer to the shared count).

    > [...]
    > >> There are also several other design flaws as well. Like the fact that each
    > >> time a reference count gets updated, it involves a call to a library
    > >> function.

    >
    > > On most plattforms boost shared_ptr uses inline asm or compiler
    > > intrinsics for reference count updates. Calls to platfrom specific
    > > mutex lock/unlock are only a fall back. And you know that, as you said
    > > the same thing in the previous paragraph. Also I would like to know
    > > how it "gets it partially wrong".

    >
    > Except that on x86 it uses a wrong builtin, for which there's no
    > corresponding CPU instruction, resulting in:
    >
    > x86_64:
    > movq %rax, %rdi
    > movl $-1, %esi
    > call _ZN9__gnu_cxx18__exchange_and_addEPVii
    >
    > i386:
    > movl $-1, 4(%esp)
    > movl %eax, (%esp)
    > call _ZN9__gnu_cxx18__exchange_and_addEPVii
    >
    > That's for every reference decrement. Lovely. Ditto for increment.
    >


    What I see with gcc-4.1.2 and boost 1.33.1 is:

    #APP
    lock
    xadd %eax, 8(%rbx)
    #NO_APP

    It doesn't use gcc builtins but custom inline assembler.
    Maybe you are using a very old boost version?

    >
    > >> Reference count increment/decrement cannot be inlined, due to a
    > >> real first class fookup in the templates.

    >
    > > What is that supposed to mean?

    >
    > Wrong gcc builtin!!!!!!
    >
    > __exchange_and_add() does not get compiled by gcc to an equivalent CPU
    > instruction on either i386 or x86_64, for the simple reason that there is no
    > such CPU instruction,


    So what? It can be implemented with CAS. It is not unreasonable to
    expect that the compiler would do that for you. Anyway, recent boost
    releases (at least since December 2005) do not use the intrinsic.

    > and it has to be emulated by libgcc! shared_ptr forces
    > a function call to libgcc every time you bounce shared_ptr from place A to
    > place B.


    Not in the disassembled code I'm looking at.

    > [...]
    > >> After I read through the class
    > >> definition, I just shook my head, laughed, then quickly coded my own
    > >> reference-counted object implementation, and ended up benchmarking it 15%
    > >> faster than Boost's disgusting implementation.

    >
    > > 15% is hardly impressive. It is unlikely that shared_ptr operations
    > > are responsible of more than 1% of your application run time (if they
    > > are, you are likely doing something wrong). Improving 15% on that
    > > isn't a great accomplishment nor a good use of your time.

    >
    > You must not be familiar with developing real world applications for the
    > financial industry, where every nanosecond of latency counts. Even a 1%
    > improvement translates to a non-trivial competitive advantage. I'll take it
    > any day.


    I develop applications where latency is important, but 1% is hardly
    significant. Anyway, in your case it is more like 15% of 1% ;).

    If shared_ptr construction is the bottleneck, you are likely going to
    get a much bigger speedup by not using reference counting at all (use a
    real garbage collector or maybe an arena allocator).

    >
    > Furthermore, this was a rather crude benchmark, which probably ended up
    > hitting the CPU cache most of the time. A more realistic mock-up will likely
    > show a greater difference due to doubling of heap allocation activity that
    > shared_ptr demands,


    In a more realistic application you hardly do any shared_ptr
    construction at all, at least not in the tightest loop of your
    program.

    > not to mention all the unnecessary calls to libgcc.
    > Especially since, as a result of deferred allocation by shared_ptr, the
    > extra allocation won't have a 1:1 relationship with heap allocation of the
    > referenced objects themselves.
    >
    > > With a custom tailored shared ptr you can do much, much better (as,

    >
    > I see no need for something heavily customized. Just basic operations,
    > typical for reference-counted objects, will suffice, as long as you do them
    > correctly. Really, this is not rocket science.
    >


    Then do not complain about performance!

    --
    gpd
     
    gpderetta, Mar 28, 2008
    #20