Why is boost::shared_ptr so much slower?


Chris M. Thomasson

Juha Nieminen said:
I'm saying that sizeof(NonintrusiveSmartPointer) == sizeof(void*),
which is what the original poster seemed to be concerned about.

Of course the reference counter will still take its own memory, but it
will not make instances of the non-intrusive smart pointer any larger
than one single raw pointer.


The solution is much simpler than that. Rather than storing the
pointer to the managed object in the smart pointer, store it alongside
the reference counter. The smart pointer will only have a pointer to
this structure as member.

Yes, there will be an additional level of indirection when accessing
the managed object, but at least the size of the smart pointer will be
the size of one raw pointer. This might in some cases be advantageous if
the level of sharing is significant.

Okay. I totally understand what you're getting at. There are some more
advantages when you do it this way. For instance, one can atomically mutate
the pointer to the underlying reference count object using normal word-sized
atomic operations. IIRC from a past conversation with
David Abrahams and Peter Dimov, Boost does not allow one to do this because
its size is greater than sizeof(void*). You have to use double-width atomic
operations which are not portable at all.
 

Alf P. Steinbach

* Chris M. Thomasson:
Okay. I totally understand what you're getting at. There are some more
advantages when you do it this way. For instance, one can atomically
mutate the pointer to the underlying reference count object using normal
word-sized atomic operations. IIRC from a past conversation with
David Abrahams and Peter Dimov, Boost does not allow one to do this
because its size is greater than sizeof(void*). You have to use
double-width atomic operations which are not portable at all.

Actually using the pointer is the most frequent operation, where an extra
indirection can be costly.


Cheers,

- Alf
 

Chris M. Thomasson

Alf P. Steinbach said:
* Chris M. Thomasson: [...]
Okay. I totally understand what you're getting at. There are some more
advantages when you do it this way. For instance, one can atomically
mutate the pointer to the underlying reference count object using normal
word-sized atomic operations. IIRC from a past conversation with
David Abrahams and Peter Dimov, Boost does not allow one to do this
because its size is greater than sizeof(void*). You have to use
double-width atomic operations which are not portable at all.

Actually using the pointer is the most frequent operation, where an extra
indirection can be costly.

Agreed. However, everything has its trade-offs...

;^)
 

Chris M. Thomasson

Howard Hinnant said:
Replying to no one in particular:

For those concerned about needing two heap allocations to create a
shared_ptr, you might want to check out boost:: (soon to be std::)
make_shared:

struct A
{
    A(int i, char c);
    ...
};

shared_ptr<A> p = make_shared<A>(i, c);

The A and the refcount are allocated within one heap allocation (just
like an intrusive implementation). If you still need more control
than that (e.g. allocate out of static buffer) there is:

shared_ptr<A> p = allocate_shared<A>(my_allocator<A>(), i, c);

That's pretty cool. I did not know about it because I am not a user of
Boost.
 

Thomas J. Gritzan

joseph said:
OK, my intent isn't to beat up boost::shared_ptr because I use it
frequently, and find it very useful. Unfortunately, I could use it a
lot more if it didn't utilize heap allocations when being constructed,
or I could control those heap allocations better. Probably I can and
just am not aware of how.

I was mistaken about the two heap allocations. Perhaps this was true
in an older boost version?

What I meant about the intermediate objects is this. If I had a
singleton as follows, which self-destructs as soon as all users of it
go out of scope:

template<typename T>
class Single
{
    static shared_ptr<T> instance()
    {
        if(!m_instance)
        {
            m_instance = shared_ptr<T>(new T);
            return m_instance;
        }

Move the "return" here.

    }
    static shared_ptr<T> m_instance;
};

Every time I call "instance()", a shared_ptr is being created (slow
heap allocation).

Huh? The shared_ptr is copied but there's no heap allocation. It's just
the reference count that is incremented. There's only one heap
allocation for the shared_ptr state when you create the first shared_ptr
from a raw pointer.
If instead I had used plain pointers:
template<typename T>
class Single
{
    static T* instance()
    {
        if(!m_instance)
        {
            m_instance = new T;
        }
        return m_instance;
    }
    static T* m_instance; // initialize to 0
};

I would have the initial allocation, but could call "instance" an
unlimited amount of times without additional allocations. (I'm
obviously not getting the reference counting either here)

I don't see a reason for a shared_ptr for a singleton either.
 

Keith H Duggar

So what do you think one "commonly thinks of" when one says a
construct is "thread safe".

I mean that the entire type interface is "as thread-safe as a
POSIX-thread-safe function".
As far as I can tell,
boost::smart_ptr meets the definitions of Posix, and offers
the maximum thread safety which one can reasonably expect.

I disagree that boost::smart_ptr meets the definition of POSIX
"thread-safe". Why? Because only a subset of, not all of, the
interface can be safely invoked concurrently by multiple threads
(example reset).
In what way?

The basic definition of "thread safe", at least as I see it used
by the experts in the field, more or less corresponds to the
Posix definition: there is no problem using different instances
of the object in different threads, regardless of what you're
doing with them, there is no problem using the same instance in
different threads as long as none of the threads are modifying
the instance (from a logical point of view). Anything else

First off (please correct me if I'm wrong) POSIX only defines
"thread-safe" for functions and that definition is:

3.396 Thread-Safe
A function that may be safely invoked concurrently by multiple
threads. Each function defined in the System Interfaces volume
of IEEE Std 1003.1-2001 is thread-safe unless explicitly stated
otherwise. Examples are any "pure" function, a function which
holds a mutex locked while it is accessing static storage, or
objects shared among threads.

So first the POSIX definition must be extended to the notion of
type (ie class in C++). One way to do that is to take the simple
and common view of all member functions being free functions that
take a hidden this pointer. Then we apply the POSIX definition to
that set of functions and if *every* function (minus destructor)
is POSIX thread-safe then we can say the class is "thread-safe".
boost::shared_ptr fails this criterion; for example, "reset" fails
when supplied with identical "this" pointers.

In other words, it is what N2410

http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2410.html

proposed to call "strong thread-safe", as Chris MT has also called
it in some other posts. However, note that since "strong thread-safe"
is simply the most natural extension of POSIX "thread-safe" to
C++ types, then "thread-safe" without qualification should mean
"strong thread-safe" and that is consistent with your claim that
"the experts in the field, more or less corresponds to the Posix
definition". It's just I don't know where you got your definition
of POSIX "thread-safe" because that's not what I recall from the
POSIX document?

Finally, I will note N2519

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2519.html

which claims that the level of thread-safe that boost::shared_ptr
provides is "not widely recognized or named in the literature" and
that is consistent with my experience as well.

KHD
 

James Kanze

I mean that the entire type interface is "as thread-safe as a
POSIX-thread-safe function".

In other words (quoting the Posix standard): "A function that
may be safely invoked concurrently by multiple threads." All of
the member functions of boost::shared_ptr meet that requirement.

In this case, of course, we're not talking about just functions,
but about an in memory object. And the Posix definitions for
memory access say that "Applications shall ensure that access to
any memory location by more than one thread of control (threads
or processes) is restricted such that no thread of control can
read or modify a memory location while another thread of control
may be modifying it." Which is exactly what boost::shared_ptr
requires.
I disagree that boost::smart_ptr meets the definition of POSIX
"thread-safe". Why? Because only a subset of, not all of, the
interface can be safely invoked concurrently by multiple
threads (example reset).

I don't think so. Could you post a scenario where reset (or any
other function) fails when invoked from different threads. Or
for that matter, when any combination of functions fails when
called from different threads, providing the basic requirement
is met: if you're modifying the object (and reset() modifies
it), then if any other thread accesses the same object (the same
instance of boost::shared_ptr---client code shouldn't have to be
concerned with what different instances of the object share),
access must be synchronized. This is exactly the same
requirement as for any Posix function. (There are a very few IO
functions which give an even stronger guarantee, i.e. read and
write.) In other words, given:

boost::shared_ptr< T > p1( new T ) ;
boost::shared_ptr< T > p2( p1 ) ;

you should be able to do p1.reset() in a thread, regardless of
what other threads might be doing with p2, but if you do
p1.reset() in one thread, all accesses to p1 must be
synchronized if p1 is accessed from any other thread. If I
understand correctly, boost::shared_ptr guarantees this, and
that's all that Posix guarantees for its thread-safe functions.
First off (please correct me if I'm wrong) POSIX only defines
"thread-safe" for functions and that definition is:
3.396 Thread-Safe
A function that may be safely invoked concurrently by multiple
threads. Each function defined in the System Interfaces volume
of IEEE Std 1003.1-2001 is thread-safe unless explicitly stated
otherwise. Examples are any "pure" function, a function which
holds a mutex locked while it is accessing static storage, or
objects shared among threads.

It also defines guarantees concerning "memory" accesses, see
above. These are perhaps even more relevant than the function
guarantees when an object is involved---a boost::shared_ptr,
after all, is a replacement for a raw pointer, not for some
function.
So first the POSIX definition must be extended to the notion of
type (ie class in C++). One way to do that is to take the simple
and common view of all member functions being free functions that
take a hidden this pointer. Then we apply the POSIX definition to
that set of functions and if *every* function (minus destructor)
is POSIX thread-safe then we can say the class is "thread-safe".
boost::shared_ptr fails this criterion; for example, "reset" fails
when supplied with identical "this" pointers.

And localtime_r fails when supplied with identical buffers.
That's a foregone conclusion. Posix doesn't require thread-safe functions
to be able to be called on the same objects from different
threads. Because that's not reasonable; if you want an extreme
example, think of setjmp (which Posix defines as thread safe).
In other words, it is what N2410

proposed to call "strong thread-safe", as Chris MT has also
called it in some other posts. However, note that since "strong
thread-safe" is simply the most natural extension of POSIX
"thread-safe" to C++ types, then "thread-safe" without
qualification should mean "strong thread-safe" and that is
consistent with your claim that "the experts in the field,
more or less corresponds to the Posix definition". It's just I
don't know where you got your definition of POSIX
"thread-safe" because that's not what I recall from the POSIX
document?

If you'd read the document you cite, it points out quite
clearly that the so-called "strong thread-safety" is a very
naïve meaning for thread safety. As pointed out above, Posix
doesn't require it, and in fact, no expert that I know defines
thread-safety in that way.
Finally, I will note N2519

which claims that the level of thread-safe that
boost::shared_ptr provides is "not widely recognized or named
in the literature" and that is consistent with my experience
as well.

It's curious that in the sentence immediately following this
statement, he cites a document that does "recognize" this level
of thread safety. And if this level of thread safety has no
special name, it's probably because it is what is generally
assumed by "thread-safety" by the experts in the field; I've
never seen any article by an expert in threading that spoke of
"strong thread-safety" other than to explain that this is not
what is meant by "thread-safety".
 

James Kanze

I don't even understand the difference between "basic/normal"
and "strong".

"basic/normal" thread-safety is what is normally meant by thread
safety (e.g. Posix guarantees). "Strong" thread safety is
something more: it means that the object state can be modified
(from the client point of view) from several threads
simultaneously. It can be useful in a few specific cases
(message queues, etc.), but is expensive, and not generally
useful enough to warrant the expense.

As you might imagine, when used without further qualification,
"thread-safety" means normal thread-safety.
boost::shared_ptr is thread-safe because instances of it can be
used in different threads even if these instances point to the
same object. boost::shared_ptr doesn't malfunction if two
threads manipulate these instances at the same time.

Exactly. Also, you can use the same instance in many threads
provided no thread mutates it (again, from a client point of
view).
 

James Kanze

Do you mean that the overhead of a non-intrusive
reference-counting smart pointer will be sizeof(void*)?
Yes.

Are you stealing some bits in the pointer to keep the
reference count or something?

No, you introduce an additional level of indirection. (Isn't
that the solution for every problem:)?) Something like:

template< typename T >
class SharedPtr
{
    struct Impl
    {
        T*  ptr ;
        int cnt ;

        Impl( T* ptr ) : ptr( ptr ), cnt( 0 ) {}
    } ;
    Impl* myImpl ;
public:
    SharedPtr( T* newedPtr )
        : myImpl( new Impl( newedPtr ) )
    {
        ++ myImpl->cnt ;
    }
    // ...
} ;

(It's obviously a bit more complicated than that if you want to
support conversions of SharedPtr< Derived > to SharedPtr< Base >,
but you get the idea.)
I must be totally misunderstanding you. I can think of a way
in which the counters are located in a static array and
objects are queued for destruction when that count goes to
zero. But that might mean non-deterministic destruction
properties.

You could always keep the counters in map, indexed by the object
address. I rather think that the performance of copying a
pointer would be unacceptable in such cases (since it implies a
map lookup in order to increment), but I've not actually
measured it to be sure---perhaps with a very clever
implementation of the map.
 

Chris M. Thomasson

Juha Nieminen said:
Care to show me how you subclass a basic type?


Care to show me how you make an intrusive smart pointer work with
incomplete types?

For example, when the intrusive smart pointer needs to increment the
reference counter, how do you do it when the object type is incomplete?

Perhaps something along the lines of:
____________________________________________________________________
struct ref_base {
    unsigned m_count;
    virtual ~ref_base() = 0;
};


ref_base::~ref_base() {}


template<typename T>
class ptr {
    ref_base* m_ptr;

    void prv_init() const {
        if (m_ptr) m_ptr->m_count = 1;
    }

    void prv_inc() const {
        if (m_ptr) ++m_ptr->m_count;
    }

    void prv_dec() const {
        if (m_ptr) {
            if (! --m_ptr->m_count) {
                delete m_ptr;
            }
        }
    }

public:
    ptr(T* ptr = NULL) : m_ptr(static_cast<ref_base*>(ptr)) {
        prv_init();
    }

    ptr(ptr const& rhs) : m_ptr(rhs.m_ptr) {
        prv_inc();
    }

    ~ptr() {
        prv_dec();
    }

    ptr& operator = (ptr const& rhs) {
        rhs.prv_inc();
        prv_dec();
        m_ptr = rhs.m_ptr;
        return *this;
    }

    T* operator ->() {
        return static_cast<T*>(m_ptr);
    }
};
____________________________________________________________________
 

Chris M. Thomasson

No, you introduce an additional level of indirection. (Isn't
that the solution for every problem:)?)
:^D


Something like:
template< typename T >
class SharedPtr
{
    struct Impl
    {
        T*  ptr ;
        int cnt ;

        Impl( T* ptr ) : ptr( ptr ), cnt( 0 ) {}
    } ;
    Impl* myImpl ;
public:
    SharedPtr( T* newedPtr )
        : myImpl( new Impl( newedPtr ) )
    {
        ++ myImpl->cnt ;
    }
    // ...
} ;

Okay. See, when I used the term overhead, I meant the total overhead
including the pointer to the private counter object `myImpl' and the amount
of memory it takes to create said object. So, that's 1 pointer + 1 pointer +
1 int. Perhaps I am in error thinking about it that way.

[...]
 

Juha Nieminen

James said:
You could always keep the counters in map, indexed by the object
address. I rather think that the performance of copying a
pointer would be unacceptable in such cases (since it implies a
map lookup in order to increment), but I've not actually
measured it to be sure---perhaps with a very clever
implementation of the map.

There is also a third possibility, which is somewhere between an
intrusive and non-intrusive solution: Every time you allocate an object
to be managed, make the allocation larger by sizeof(size_t), and then
use that extra space for the "semi-intrusive" reference counter.

The advantage of this is that the smart pointer will behave basically
in the exact same way as an intrusive smart pointer, and the memory
consumption will be exactly the same.

This solution even has additional advantages over classic intrusive
smart pointers: The "semi-intrusive" smart pointer can support existing
types and even builtin types, and can work with incomplete types at no cost.

Of course there are some disadvantages over non-intrusive smart
pointers. For instance, you obviously cannot manage any object which has
been allocated in the regular way, without that extra space. Depending
on how you implement the allocation, a base class type smart pointer
might or might not support derived class objects.
 

Juha Nieminen

Sam said:
That's the general idea -- all reference-counted objects are then
derived from this superclass, virtually, and this is a rough outline of
how I implemented it, except that the virtual superclass's destructor is
not abstract, of course, nor does it need to be.

You do realize that casting from a virtual base class to the actual
object's type can incur a penalty which does not happen with regular
non-intrusive smart pointers?
The actual
increment/decrement operations are the same as with shared_ptr --
compiler-specific instructions that compile down to atomic CPU
operations, so that they are thread-safe.

Increments and decrements are in no way guaranteed to be atomic, and
in some architectures they may well not be. Even if they were, there's
still a huge mutual exclusion problem here:

    if (! --m_ptr->m_count) {
        delete m_ptr;
    }

Guess what happens if another thread executes this same code in
between the decrement and the comparison to null in this thread, and the
counter happened to be 2 to begin with.
I think that shared_ptr could've been a much better implementation, only
with a little bit additional forethought.

Except that shared_ptr was designed to work with any existing type
(including builtin types) without the need to modify that type.
 

SG

Juha said:
  There is also a third possibility, which is somewhere between an
intrusive and non-intrusive solution: Every time you allocate an object
to be managed, make the allocation larger by sizeof(size_t), and then
use that extra space for the "semi-intrusive" reference counter.

The advantage of this is that the smart pointer will behave basically
in the exact same way as an intrusive smart pointer, and the memory
consumption will be exactly the same.

It's my understanding that the std::make_shared<T>(...) function
template does something very similar (as Howard Hinnant pointed out
already).
Also, consider another major design flaw with shared_ptr: a class
method has no way of obtaining a reference to its own instance.

I think that's what N2914 section 20.8.10.5 is about:

" Class template enable_shared_from_this

A class T can inherit from enable_shared_from_this<T> to
inherit the shared_from_this member functions that obtain
a shared_ptr instance pointing to *this. "


It looks like std::shared_ptr will be just as good as any intrusive
pointer w.r.t. allocation counts and "shared_from_this". A shared_ptr
implementation will probably store two pointers but I'm fine with
that. The flipside is that types don't have to derive from some
special base class for reference counting. This is IMHO a big
advantage.


Cheers!
SG
 

Juha Nieminen

SG said:
It's my understanding that the std::make_shared<T>(...) function
template does something very similar (as Howard Hinnant pointed out
already).

I think you mean boost::make_shared? Although I hope it will some day
be in the std namespace as well... :)
 

Juha Nieminen

Sam said:
Casting from any superclass to a subclass incurs a penalty

I don't think that's true. If the compiler can see both class
declarations and there is a simple inheritance relation between them
(with no virtual functions in the derived class), the pointer doesn't
have to be modified in any way and this decision can be done at compile
time. The cast will effectively not create any machine code whatsoever.
Which, of course, is exactly what I stated in my first message in this
thread, and explained what the trade-offs are. Besides, you cannot
assume that you can arbitrarily attach a shared_ptr to some arbitrary
instance of a builtin or a basic type. For all you know, it was
allocated on the stack, and not the heap; or whichever library
allocated it on the heap might decide to deallocate it, since it is a
basic or builtin type that the library owns, despite the fact that you
have a shared_ptr pointing to it.

If you want to use a shared_ptr which points to an object which must
not be destroyed by that shared_ptr, you can tell it. If the object must
be destroyed in some special way (eg. by using some custom allocator),
you can also tell it that.
 

Juha Nieminen

Pete said:
If the decrement is atomic (not an atomic CPU instruction, but atomic in
the sense of not tearing and producing a result that's visible to all
threads that use the variable) then this works just fine. Of course, all
the other manipulations of this variable must also be similarly atomic.

I don't understand how that can work if the result of the decrement is
not immediately visible to all threads.

If, for example, the code waits for the next sequence point to
actually write the new value of the variable into shared RAM, you will
obviously have another problem: Two threads might see the counter as
having the value 2, they then may both decrement it to 1 at the same
time, after which both decide that the object should not be deleted,
after which both write the 1 into the shared RAM. Then they both go out
of scope and the object is never deleted, and thus leaked.

And as I said, if the decrement is immediately visible to all threads,
you end up having a mutual exclusion problem which may cause the object
to be deleted twice (and possibly other even more obscure problems). In
the best case scenario your program will simply crash.
 

Chris M. Thomasson

Juha Nieminen said:
I don't understand how that can work if the result of the decrement is
not immediately visible to all threads.

If, for example, the code waits for the next sequence point to
actually write the new value of the variable into shared RAM, you will
obviously have another problem: Two threads might see the counter as
having the value 2, they then may both decrement it to 1 at the same
time, after which both decide that the object should not be deleted,
after which both write the 1 into the shared RAM. Then they both go out
of scope and the object is never deleted, and thus leaked.

And as I said, if the decrement is immediately visible to all threads,
you end up having a mutual exclusion problem which may cause the object
to be deleted twice (and possibly other even more obscure problems). In
the best case scenario your program will simply crash.


For a reference counting algorithm with basic thread-safety:
___________________________________________________________
void refcount_decrement()
{
    MEMBAR_RELEASE();

    if (ATOMIC_FAA(&m_ptr->m_count, -1) == 1)
    {
        MEMBAR_ACQUIRE();
        delete m_ptr;
    }
}
___________________________________________________________




is 100% perfectly safe. BTW, ATOMIC_FAA is Fetch-and-Add.
 

Chris M. Thomasson

Pete Becker said:
If the decrement is atomic (not an atomic CPU instruction, but atomic in
the sense of not tearing and producing a result that's visible to all
threads that use the variable) then this works just fine. Of course, all
the other manipulations of this variable must also be similarly atomic.

You would also need to ensure that the read-modify-write sequence was a full
atomic operation. No compiler I have ever seen would ensure that the
following RMW was uninterruptible in the presence of multiple threads:


int i = 3;

--i;



Think if thread A read a value of 3, thread B read a value of 3 and thread C
read a value of 3. Then thread A wrote a value of 2, thread B wrote a value
of 2 and finally thread C wrote a value of 2. The end result would be 2 when
it should have been 0.
 

Keith H Duggar

In other words (quoting the Posix standard): "A function that
may be safely invoked concurrently by multiple threads." All of
the member functions of boost::shared_ptr meet that requirement.

[snip same old arguments, i.e. that some functions are only
safe when called on *different* objects and that it is this
conditional safety that "all the experts" mean when they say
"thread-safe"]
If you'd read the document you cite, it points out quite
clearly that the so-called "strong thread-safety" is a very
naïve meaning for thread safety.

It points out nothing about the notion being "naive". That is
your coloration. It simply points out the likely costs.
As pointed out above, Posix
doesn't require it, and in fact, no expert that I know defines
thread-safety in that way.


It's curious that in the sentence immediately following this
statement, he cites a document that does "recognize" this level
of thread safety. And if this level of thread safety has no
special name, it's probably because it is what is generally
assumed by "thread-safety" by the experts in the field; I've
never seen any article by an expert in threading that spoke of
"strong thread-safety" other than to explain that this is not
what is meant by "thread-safety".

The crux of our disagreement is twofold: 1) you hold that
a function can be "thread-safe" even if it conditionally
imposes some extra requirements, such as different objects,
buffer pointers, etc.; 2) you hold that "all the experts"
agree with you.

As to 1), I simply disagree. I think something is "thread-safe"
only in the naive sense: if the class works when there is only a
single thread, then it works when there are multiple threads with
*no additional synchronization coding required*. (Note this is
just a *toy* way of putting it; read the article I'm about to
link for a more useful wording and discussion.)

As to 2), there are at least three experts who hold my view:
Brian Goetz, Joshua Bloch, and Chris Thomasson. Here is an
article by Brian Goetz that lays out the issues very nicely:

http://www.ibm.com/developerworks/java/library/j-jtp09263.html

Finally, as to 1) ultimately it is a matter of definition. If
you are right and we all agree to call the "as thread-safe as
a built-in type" just "thread-safe" that would be fine too.
However, as you can see, there is disagreement and my point
here was simply that one should be a bit more careful than just
saying boost::shared_ptr is "thread-safe". Indeed, one should
call it exactly what the Boost document calls it "as thread-safe
as a built-in type" or perhaps "conditionally thread-safe" so
as to be careful and avoid confusion.

And by the way, that last point is not just some pointless
nitpick. Even in the last year and a half at work, I caught
(during review) three cases of thread-unsafe code that resulted
from a boost::shared_ptr instance being shared unsafely (all were
cases of one thread writing and one reading). When I discussed the review
with the coders all three said exactly the same thing "But I thought
boost::shared_ptr was thread-safe?". Posts like Juha's that say
unconditionally "boost::shared_ptr is thread-safe" continue to
help perpetuate this common (as naive as you might say it is)
misunderstanding.

KHD
 
