Thread-safe reference counts.

J

James Kanze

Chris said:
[...]
My only point is that, IMVHO of course, GC can be a wonderful
tool for all sorts of lifetime management schemes. This due to
the fact that it can inform the management protocol when an
object is in a quiescent state.
But you still haven't explained what you mean by "an object in a
quiescent state".
[...]
http://dictionary.reference.com/browse/quiescent
An object is at rest and has nothing to do.
Including being terminated?
Calling the objects destructor is fine in this state.

That's not what I asked.

I think we're talking at cross purposes. I am using the word
terminated intentionally, to try to break the link with the C++
concept of destructor. An object which has indefinite lifetime
doesn't need termination, so there's no question there that the
object has nothing to do. If an object has definite lifetime,
however, terminating its lifetime will normally result in it
doing something.

There's also a sense that most objects I deal with are at rest
and have nothing to do most of the time; they only wake up and
have something to do in response to some specific external
event.
When the counter has dropped to zero, the object can be
destroyed, reused, cached, ect.

You're missing the point. If the object has a determinate
lifetime, and something occurs to make that lifetime end, then
it must be terminated. Whether it is still accessible or not.
Later accesses are a programming error, but delaying termination
would also be an error. Depending on other design
considerations, this may be handled by putting the object in a
specific, invalid state, and signaling the error on next use, or
by some form of the observer pattern, notifying all clients of
the objects demise.
No dynamic object should be able to acquire a reference to an
object whose reference count is zero.

And if I don't use reference counting?

An object may or may not be reachable. If the object has an
indefinite lifetime, the fact that it is no longer reachable
means that we can reuse its memory. If the object has a
definitely lifetime, and that lifetime has not ended, the fact
that it is no longer reachable is an error in the program; if
that lifetime has ended, the fact that it is reachable is
irrelevant, since it cannot be used.
A Proxy GC will call an in-quiescent state callback function
for an object when it determines that said object has quiesced
(e.g., unreachable). This is analogous to a reference counting
algorithm dropping the count to zero and subsequently
notifying the application via. callback. Imagine if
shared_ptr did not call dtor, but called a function that
allowed an application to decide what to do.

Boost::shared_ptr does. I use it to release locks, for example.

Of course, in such cases, there are other pointers to the object
as well.
It can call the objects dtor, or cache it, or immediately
reuse it, whatever, the object is quiescent.

But those are all implementation details. Generally speaking,
if we use garbage collection:

-- if the object has an indeterminate lifetime, garbage
collection handles it perfectly---we don't need any
additional code, and

-- if the object has a determinate lifetime, garbage collection
allows protection against errors:

o since the memory won't be reused as long as it is
reachable, we can set a flag when the object is
terminated, and test the flag in each use of the object,
triggering an error in case the flag is set, and

o if the garbage collector supports finalization (i.e.
calls a function when the object ceases to be reachable,
before recycling the memory), we can verify that the
object has been correctly terminated before it became
unreachable.
It can be reused, cached, the dtor can be called, ect.

Nominally, except for memory management issues, there's no
reason for such an object to have a destructor. If we consider
conceptually infinite memory, and that allocation and
deallocation are not observable behavior, then the fact that an
object with indeterminate lifetime is no longer reachable has no
logical effect on the program. The object just "disappears".
The programmer who creates the logic can decide. E.g:
void object_quiescent(object* const _this) {
// you can call dtor; delete _this
// you can cache; object_cache_push(_this);
// the object is unreachable indeed.
}

The whole point is that for some objects, such lifetime issues
are irrelevant, at least from the design point of view, and
doing anything is extra work for the programmer. And for other
objects, such lifetime issues are very relevant, but they are
totally independent of reachability; the object (or some owning
object) is reacting to an external consideration which
determines the lifetime, and the object's lifetime must be
terminated, immediately, regardless of reachability. The end of
its lifetime is part of the observable behavior of the object.
The fact that the GC or reference-count can infrom the program
logic that an object is not able to be reached is valuable
information to any lifetime management scheme which deals with
dynamic objects.

I disagree (but the problem may be with regards to what we mean
by "lifetime"). If lifetime is relevant, then its termination
is observable behavior, which must occur at a specific instance,
independently of whether the object can be reached or not
(except that if it cannot be reached, there is no way to inform
it that it must terminate). If lifetime is not relevant (i.e.
termination has no observable behavior), then there's no point
in informing the object about it.
A quiescent-state is like a GC determining that an object can
be reaped, but informing the application and letting it decide
what to do. It can call the dtor, or reuse, ect.

But I really don't see the need, conceptually, except in some
very special cases.
Check this out:

Does that make any sense?

I did enough web search to determine that it mean
read-copy-update. The concept seems related to some compiler
optimization techniques I've seen, in which the compiler
considers "values", rather than the variables that they contain,
but I didn't (and don't) have time now to study it in detail.
My first impression, however, is that we really are talking
about different things when we talk about object lifetime---I'm
talking about the actual design level objects, and you're
talking about specific "generations" or "values" of those
objects---a much lower level concept. That feedback from the
garbage collector can help in such low level optimization, I
have no doubt, but I tend to view such things as happening "under
the hood" for the application level programmer; he just sees his
design object.
 
D

Dmitriy V'jukov

I don't understand your requirement 2. It doesn't matter who or what
makes the copy of the pointer, so long as *something* has a reference.


And what if *something* decide to release reference concurrently?

I know what you will answer - concurrent modifications must be
suppressed by means of mutex. Ok, you are right. But consider
following simple functions:

void acquire(T* x)
{
atomic_inc(x->rc);
}

It's simple, well-known, fast, can be executed concurrently by any
number of threads *without* any additional protection. In short it's
good.

So the whole point of 'basic thread-safety vs. strong thread-safety'
is whether you can use this function *as-is* or not.

You are trying to call both things 'correct synchronization fitting
the situation'. Well it's legitimate. But sometimes it's helpful to
differentiate those situations. So some situations I call 'requiring
only basic thread safety' meaning only that I can use above-mentioned
acquire() function as-is. Some situations I call 'requiring strong
thread safety' meaning only that I can't use above-mentioned acquire()
function as-is. Accordingly some primitives I call 'supporting only
basic thread safety' or 'supporting strong thread safety'. That is
all. Not more and not less.
You always can call both things 'correct synchronization fitting the
situation'. Just sometimes use some additional mutexes and sometimes
not.



The owner of a reference is notional, not enforced. For example, if a
global variable contains a pointer, it also has a [notional]
reference. Any thread that access that global variable can 'borrow'
the reference.

The only potential problem is if the owner of the 'notional' reference
releases its reference at the same time it is using it. But that race
is in the owner of the reference, not the atomic pointer
implementation or whatever.

A global variable can have a reference. A thread can have a reference.
A collection can have a reference. The owner is just conceptual.

If, for example, I have a hash table that has a bunch of objects in
it, it can also have a reference to each of those objects to ensure
they aren't removed while they're still in the table. But that
reference can be used by any thread that calls into the hash table's
functions.


I agree that owner of the reference is notional. But when it's
notional of one type you can use above-mentioned acquire() function as-
is. And when it's notional of another type you can't. It's legitimate
to not differentiate types of owner (it's what you are trying to do).
But sometimes it's helpful just to simplify communications.


Dmitriy V'jukov
 
D

Dmitriy V'jukov

Let me phrase my point more clearly: 2 is always satisfied. Anything
that is using a reference is, for all intents and purposes, the owner
of that reference (while it is using it). Of course, the use of a
reference must be properly synchronized. Even strong thread safety
won't let you use and release a reference at the same time.


Well, you can consider that condition 2 is always satisfied. But then
the point is whether (1) it's satisfied without any additional efforts
or (2) you must make some additional efforts to satisfy it.

This division is notional too. The whole purpose is to simplify
communication.


Dmitriy V'jukov
 
D

David Schwartz

And what if *something* decide to release reference concurrently?

Then nothing will save you. Not even strong thread safety can
guarantee correct results if you concurrently release and use the same
reference.
I know what you will answer - concurrent modifications must be
suppressed by means of mutex.

That's certainly one way, but there are other ways. But it should go
without saying that you cannot concurrently use and free a resource,
and you must prevent this somehow. A mutex is one way.
Ok, you are right. But consider
following simple functions:

void acquire(T* x)
{
atomic_inc(x->rc);
}

It's simple, well-known, fast, can be executed concurrently by any
number of threads *without* any additional protection. In short it's
good.

I agree.
So the whole point of 'basic thread-safety vs. strong thread-safety'
is whether you can use this function *as-is* or not.

Here's where you lose me. I have no idea what that's supposed to mean.
That will always be safe provided you have a reference and never be
safe if you don't. What does basic thread safety versus strong thread
safety have to do with it?
You are trying to call both things 'correct synchronization fitting
the situation'. Well it's legitimate. But sometimes it's helpful to
differentiate those situations. So some situations I call 'requiring
only basic thread safety' meaning only that I can use above-mentioned
acquire() function as-is. Some situations I call 'requiring strong
thread safety' meaning only that I can't use above-mentioned acquire()
function as-is. Accordingly some primitives I call 'supporting only
basic thread safety' or 'supporting strong thread safety'. That is
all. Not more and not less.

So there is some cases where it is safe to manipulate a reference
counted object without a reference to it? Or there is some case where
it is safe to concurrently use and free a resource?
You always can call both things 'correct synchronization fitting the
situation'. Just sometimes use some additional mutexes and sometimes
not.

That's a completely backwards approach to synchronization. If you are
going to use reference-counting smart pointers, a reference should
always accompany a pointer. That's the whole point of that approach.

Now if there's some case where some optimization that sometimes
disassociates pointers with references is known safe, sure, go ahead
and use it. Make sure you use it properly and make sure you use it
where there's demonstrated need.

But the basic idea of smart reference-counting pointers is that
pointers and references travel together and you can only get a
reference from something that already has one.
The owner of a reference is notional, not enforced. For example, if a
global variable contains a pointer, it also has a [notional]
reference. Any thread that access that global variable can 'borrow'
the reference.
The only potential problem is if the owner of the 'notional' reference
releases its reference at the same time it is using it. But that race
is in the owner of the reference, not the atomic pointer
implementation or whatever.
A global variable can have a reference. A thread can have a reference.
A collection can have a reference. The owner is just conceptual.
If, for example, I have a hash table that has a bunch of objects in
it, it can also have a reference to each of those objects to ensure
they aren't removed while they're still in the table. But that
reference can be used by any thread that calls into the hash table's
functions.
I agree that owner of the reference is notional. But when it's
notional of one type you can use above-mentioned acquire() function as-
is. And when it's notional of another type you can't. It's legitimate
to not differentiate types of owner (it's what you are trying to do).
But sometimes it's helpful just to simplify communications.

You can always use that acquire function as-is, provided you have a
reference. You can never use it, as is or any other way, without one,
as the object might go away. You cannot perform any operation without
a reference.

DS
 
D

Dmitriy V'jukov

Then nothing will save you. Not even strong thread safety can
guarantee correct results if you concurrently release and use the same
reference.


Strong thread safety guarantees correct results in this situation.
But note that type of input parameter of acquire function with strong
thread safety is not T*, it's T**.

T* g_object;
T* strong_acquire(T** dest);

T* obj = strong_acquire(&g_object);

This pattern is legal. Even provided concurrent modification of
g_object's value.
Note that thread doesn't hold reference to object which it want to
acquire in any shape or form before calling strong_acquire() (actually
thread even doesn't know which particular object it will acquire).
Postcondition: thread has pointer to object which was stored in
g_object at some point of time during execution of strong_acquire(),
and thread has 'private' reference to that object.

Does this clarify things?


Dmitriy V'jukov
 
D

David Schwartz

Strong thread safety guarantees correct results in this situation.
But note that type of input parameter of acquire function with strong
thread safety is not T*, it's T**.

T* g_object;
T* strong_acquire(T** dest);

T* obj = strong_acquire(&g_object);
This pattern is legal. Even provided concurrent modification of
g_object's value.

I agree.
Note that thread doesn't hold reference to object which it want to
acquire in any shape or form before calling strong_acquire()

The thread doesn't, but 'g_object' does. The thread is using
'g_object' and gains the benefit of its reference.
(actually
thread even doesn't know which particular object it will acquire).
Postcondition: thread has pointer to object which was stored in
g_object at some point of time during execution of strong_acquire(),
and thread has 'private' reference to that object.

Does this clarify things?

It clarifies to me that you are completely missing my point. Why
should anybody care about any of this? Just always keep pointers and
references together -- the whole point of having smart reference-
counting pointers in the first place -- and it will just work.

This whole nonsense about strong thread safety versus normal thread
safety is a pointless exercise in creating a problem, creating a
solution to it, and then trying to convince people that there is a
problem. There is no problem. Keep references and pointers together
always -- which is, again, the whole point of having these smart
reference-counting pointers.

DS
 
D

Dmitriy V'jukov


One post above you said that you disagree, and that this won't work...
The thread doesn't, but 'g_object' does. The thread is using
'g_object' and gains the benefit of its reference.

But value of 'g_object' can be concurrently modified.
It clarifies to me that you are completely missing my point. Why
should anybody care about any of this? Just always keep pointers and
references together -- the whole point of having smart reference-
counting pointers in the first place -- and it will just work.


This will not work (with basic thread safety) if smart pointer can be
concurrently modified. With strong thread safety this will work. Don't
you see the difference?


Dmitriy V'jukov
 
D

David Schwartz

This will not work (with basic thread safety) if smart pointer can be
concurrently modified. With strong thread safety this will work. Don't
you see the difference?

You are missing my point and we are talking past each other. I
understand your argument and don't disagree with it technically.
Perhaps an analogy would help.

My argument is like this:

The whole point of a car is to be able to drive whenever you want. You
shouldn't need something special in your car just to drive at 3AM on a
Thursday. If it's cheaper or easier to make a car that doesn't work at
3AM on a Thursday, then maybe people who know they don't need to drive
then can buy such a car. But there's no rational reason ordinary car
users should worry. A car should work any time you want to use it.

Your argument is like this:

Most people don't use their cars on Thursdays at 3AM. People who want
to do that can get an "anytime car". But normally there's no reason
for a car to come with this ability, since it's more complicated to
make cars like that.

My point is that atomic reference-counting pointers should just work.
A pointer should always come with a reference. If that's expensive to
implement, oh well, that's life. You can always optimize when and
where you know for a fact that it is safe to do so.

So-called "strong thread-safety" is just ordinary thread safety. So
called "basic thread safety" is just an implementation with known race
conditions that someone might particularly chose if they know it is
safe to do so.

My point is that this whole issue has been turned on its head in the
name of premature optimization.

DS
 
D

Dmitriy V'jukov

You are missing my point and we are talking past each other. I
understand your argument and don't disagree with it technically.
Perhaps an analogy would help.

My argument is like this:

The whole point of a car is to be able to drive whenever you want. You
shouldn't need something special in your car just to drive at 3AM on a
Thursday. If it's cheaper or easier to make a car that doesn't work at
3AM on a Thursday, then maybe people who know they don't need to drive
then can buy such a car. But there's no rational reason ordinary car
users should worry. A car should work any time you want to use it.

Your argument is like this:

Most people don't use their cars on Thursdays at 3AM. People who want
to do that can get an "anytime car". But normally there's no reason
for a car to come with this ability, since it's more complicated to
make cars like that.

My point is that atomic reference-counting pointers should just work.
A pointer should always come with a reference.


99.9% of produced by computer industry cars not able to drive on
Thursdays at 3AM. It's boost::shared_ptr, it's all home-made multi-
threaded smart pointer, I'm sure it's smart pointers which you made.
Computer industry produces such cars from beginning. Programmers get
used to such cars. Such car are very useful.

Now one propose to make new type of cars. I don't think that it's a
good idea to used the same term. Because everybody will be confused.

Sometimes quantity is transformed into quality. It's like saying "All
computers must have at least 100000 GFlops of processing power. I
don't care that it's ineffective. It's the right way. Producing
computers which have only 1 GFlops you just creating problems. Don't
try to make premature economic optimization producing such silly
computers. Always produce computers with at least 100000 GFlops."


If that's expensive to
implement, oh well, that's life. You can always optimize when and
where you know for a fact that it is safe to do so.

So-called "strong thread-safety" is just ordinary thread safety. So
called "basic thread safety" is just an implementation with known race
conditions that someone might particularly chose if they know it is
safe to do so.

My point is that this whole issue has been turned on its head in the
name of premature optimization.


Term 'premature optimization' is very stylish today. But there is also
term 'premature pessimization'. Multi-threaded smart pointer with
basic thread safety is just the right tool for many tasks.
boost::shared_ptr is used for years by thousands of developers, and
nobody complains. Why std::string in C++ is not multi-threaded? Is it
premature optimization?


Dmitriy V'jukov
 
D

Dmitriy V'jukov

My argument is like this:

The whole point of a car is to be able to drive whenever you want. You
shouldn't need something special in your car just to drive at 3AM on a
Thursday. If it's cheaper or easier to make a car that doesn't work at
3AM on a Thursday, then maybe people who know they don't need to drive
then can buy such a car. But there's no rational reason ordinary car
users should worry. A car should work any time you want to use it.


Your argument is like this:

The whole point of a car is to be able to drive, fly and make coffee
whenever you want.

My argument is like this:

Most people don't use their cars to fly and make coffee.



Smart pointer with basic thread safety (boost::shared_ptr) is not just
a buggy version of full-fledged smart pointer with strong thread
safety. It's independent non-buggy thing suitable for different
situations.

Yes, basic thread-safety is subset of strong thread-safety. But it's
not the reason to eliminate basic thread-safety at all. It's like
supercomputer vs. usual computer. It's like F1 racing car vs. usual
car.


Dmitriy V'jukov
 
D

Dmitriy V'jukov

strong thread-safety is a _requirement_ for several different algorithms,
NOT an optimization.


David Schwartz asserts that basic thread safety is just an
optimization of strong thread safety - just make *all* smart pointer
strongly thread-safe, and forgot about basic thread-safety. If I
understand him correctly.


Dmitriy V'jukov
 
J

James Kanze

"James Kanze" <[email protected]> wrote in message

[...]
My only point was that, IMHO, GC can help out with certain
types of lifetime management schemes because it can detect
when an object can be "safely" destroyed. That can be a fairly
important piece of information.

Yes. For a specific definition of "object lifetime". When I
speak of object lifetime, I'm normally thinking of the OO design
concept. I agree that with a more generalized definition, if
you're thinking in terms of object as a lower level concept (a
bit of memory containing a value---i.e. like the definition of
the word in the C and C++ standards), then there's a sense in
which garbage collection integrates, or can or should integrate
with object lifetime. Thus, my safety feature, of having the
terminated object mark that it is terminated, and verify on each
use that it isn't so marked, depends on the lifetime of the
underlying memory (the low level object) lasting as long as
there are any pointers to it; the lifetime of the OO object is
not the same as that of the physical (for lack of a better word)
object.
 
G

Gil Hamilton

David Schwartz asserts that basic thread safety is just an
optimization of strong thread safety - just make *all* smart pointer
strongly thread-safe, and forgot about basic thread-safety. If I
understand him correctly.

Yes. You and Chris keep trying to make a distinction between "strong
thread safety" and "ordinary thread safety". However, it sounds to me (and
I believe this is David's point as well) that what you are defining as
"ordinary thread safety" is NOT thread-safe at all. If it isn't safe to
use among multiple threads then how can it be called thread-safe?

Now, assuming that you have some "ordinary thread safety" that actually IS
thread-safe, then you would have no need for any notion of *strong* thread
safety because your "ordinary" thread safety already has all the properties
of the "strong" version.

GH
 
D

Dilip

Correct but irrelevant. I was not referring to the properties of a GC but,
rather, to the properties of the heap during run-time.




In the example I gave:

{
Bar bar;
f(bar);
// *
g();
}

At * there are zero references to "bar" but its reference count is one.

So what? In a GC-scenario bar might be unreachable at * if its a
reference type -- ok Good. that doesn't mean its about to be
collected. so in the ref-counting case, bar is alive till the end of
scope, in the GC case bar is alive until a GC can be triggered. in
both cases its just sitting there twiddling its thumbs. you are just
making a distinction without difference.
Incidentally, the need to update reference counts whilst unwinding the stack
as an exception propagates is another performance cost not present in
modern alternatives. For example, exceptions are ~6x faster in OCaml than
C++ (with g++).

Why are we suddenly talking about exceptions?
 
G

Gil Hamilton

shared_ptr honors the ordinary/basic thread-safety level.

OK. I think we're all violently agreeing except perhaps as to terminology.
If we are discussing "thread-safe accesses and manipulations of reference-
counted objects", which seems to have been the chief topic of this thread,
then certainly shared_ptr does not begin to provide that. And hence, for
the purposes of this discussion -- IMO -- it cannot be considered thread-
safe in any sense.

GH
 
P

Peter

The problem is in Release(). There is a short period of time where the
mutex is unlocked but the object is already doomed for destruction; if
another thread calls AddRef() during that time, that other thread now
has a pointer to garbage and doesn't know it.


I guess you have somewhere a weak pointer (in contrary to a hard
reference counted pointer)
which is going to be destroyed by the destructor of the object, am I
right?
E.g. you have some object factory which contains pointers to all
created objects in a map.
In this case you need to lock this map when calling AddRef() or
Release().
And of course looking up an entry in this map must also be locked.
 
D

David Schwartz

The quick answer is to use external locks. Solving this with locks is
trivial. So, how would you do it without using locks? Think about this for a
moment. thread_a cannot use 'refcount_acquire' because it could run into the
following race-condition:

Your question is nonsensical. A perfectly good answer to the
nonsensical question is "I would use ATOMIC_MAGIC(g_shared)" which
somehow makes it all work.

You get to invent magic ATOMIC operations that make everything work
and then say all I have are locks. Then you say I can't use locks?!
Wtf?
That is the race-condition which basic thread-safety simply cannot
address... Also, read this post:

That's why "basic thread-safety" is an optimization to be used when
appropriate and when there is proven need for such optimization. It is
like a car that cannot be driven at 3AM on Thursdays. A car should
work any time, and one that doesn't work in unusual but perfectly
reasonable situations should be used only in the exceptional case
where the benefits are known to outweigh the costs".

DS
 
T

Thomas J. Gritzan

Jon said:
We do use Linux but I'm not about to take Linus' advice about garbage
collection because he is a project coordinator for part of one of the OSs
we use.


Irrelevant. This was about garbage collection.

Irrelevant. This was about GC in kernel mode.

Tell us an OS thats written in OCaml or Haskell, or even uses GC in kernel
mode.
The quote was in context of OS kernel mode code. Not even Microsoft did that.
You realise Linux isn't "very popular" by any stretch of the imagination?

Huh?

fup2: c.p.t
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,776
Messages
2,569,603
Members
45,189
Latest member
CryptoTaxSoftware

Latest Threads

Top