Thread problem (constructor+assignement in one step?)

Y

yuraukar

I am trying to create a garbage collection class in C++ to collect
instances of a specific class (so no general gc). The approach is to
use smart pointers and a mark and a simple sweep gc.

smartptr<B> pB;
pB = new B(args);

The constructor of B has been adjusted to register new instances with
the garbage collector. The constructor of smartptr has also been
adjusted to register new instances of smart pointers with the garbage
collector.
To collect, the gc now iterates all known smart pointers and marks the
referenced objects as active. After this, it iterates all known objects
and deletes all thhose that have not been marked active. The gc will be
called explicitly.

This works fine so far when using a single thread. However, when using
multiple threads, there is a problem.
The assignement above is essentially two steps. First create the object,
then do the assignement. What if between these two steps another thread
runs the gc? It will find the new created object not (yet) referenced
and delete it.

What would be the proper solution to prevent this? At a first thought,
you might come to
gc.lock();
pB = new B(args);
gc.unlock();

But then consider
B* f(...)
{
...
return new B(args);
}

...
pB = f(...);

Here, construction and assignement are not on one line. Thus using the
lock strategy will not work. Wrapping the whole function call with
lock/unlock will result in bad performance.

(In fact, the reason of using a gc like this is a little more
complicated. At the bottom line: no I cannot move to reference counting
or one of the smart pointer implementations).

I am looking for any suggestions.
 
D

David Hopwood

yuraukar said:
I am trying to create a garbage collection class in C++ to collect
instances of a specific class (so no general gc). The approach is to
use smart pointers and a mark and a simple sweep gc.

smartptr<B> pB;
pB = new B(args);

The constructor of B has been adjusted to register new instances with
the garbage collector. The constructor of smartptr has also been
adjusted to register new instances of smart pointers with the garbage
collector.

Perhaps put a flag in instances of B that indicates whether the instance has
been assigned to any smartptr yet, and change smartptr<B>::eek:perator= to set
this flag. Then don't collect instances of B until they have been assigned.

Of course real GCs avoid this problem by scanning the stack and registers,
and by using GC safe-points.
 
T

Tom Widmer

I am trying to create a garbage collection class in C++ to collect
instances of a specific class (so no general gc). The approach is to
use smart pointers and a mark and a simple sweep gc.

smartptr<B> pB;
pB = new B(args);

Are you sure it is sensible to offer direct assignment of a pointer to
a smart pointer? You probably should make the constructor explicit.
The constructor of B has been adjusted to register new instances with
the garbage collector.

Isn't that the problem? Why does B have to know about the garbage
collector?

The constructor of smartptr has also been
adjusted to register new instances of smart pointers with the garbage
collector.

Why doesn't it also register the passed pointer?
I am looking for any suggestions.

Seems like if you do the above, your problem is solved. Alternatively,
you could create a factory function to create B objects that returns a
smartptr<B>.

Tom
 
J

Joe Seigh

David said:
Perhaps put a flag in instances of B that indicates whether the instance has
been assigned to any smartptr yet, and change smartptr<B>::eek:perator= to set
this flag. Then don't collect instances of B until they have been assigned.

Of course real GCs avoid this problem by scanning the stack and registers,
and by using GC safe-points.

What's a GC safe-point (I sort of have an idea) and how do they interact with
the GC? I know how to do concurrent GC but I'm wondering what the conventional
approach is.

Joe Seigh
 
P

Peter van Merkerk

yuraukar said:
I am trying to create a garbage collection class in C++ to collect
instances of a specific class (so no general gc). The approach is to
use smart pointers and a mark and a simple sweep gc.

smartptr<B> pB;
pB = new B(args);

The constructor of B has been adjusted to register new instances with
the garbage collector. The constructor of smartptr has also been
adjusted to register new instances of smart pointers with the garbage
collector.
To collect, the gc now iterates all known smart pointers and marks the
referenced objects as active. After this, it iterates all known objects
and deletes all thhose that have not been marked active. The gc will be
called explicitly.

This works fine so far when using a single thread. However, when using
multiple threads, there is a problem.
The assignement above is essentially two steps. First create the object,
then do the assignement. What if between these two steps another thread
runs the gc? It will find the new created object not (yet) referenced
and delete it.

What would be the proper solution to prevent this? At a first thought,
you might come to
gc.lock();
pB = new B(args);
gc.unlock();

But then consider
B* f(...)
{
...
return new B(args);
}

...
pB = f(...);

Here, construction and assignement are not on one line.

Even if it were on a single line that does not guarantee that everything
on that line will be executed atomically. In fact there is no way in C++
to insure that (even simple) statements are not interrupted during their
execution. Even a simple assignment can be translated into several
machine code instructions, which means another thread can interrupt even
halfway during the assignment. You will need to you the synchronisation
mechanisms offered by the OS you are using.

Maybe you are better of by marking inactive objects as opposed to
marking active objects.
 
Y

yuraukar

Even if it were on a single line that does not guarantee that everything
on that line will be executed atomically. In fact there is no way in C++
to insure that (even simple) statements are not interrupted during their
execution. Even a simple assignment can be translated into several
machine code instructions, which means another thread can interrupt even
halfway during the assignment. You will need to you the synchronisation
mechanisms offered by the OS you are using.

I am aware of that. My point was that in the example above, there is no
way of wrapping the line with a gc.lock()/gc.unlock() call.
Maybe you are better of by marking inactive objects as opposed to
marking active objects.

And how would this solve my problem? As far as I see, the issue is that
the new created instance is not referenced at the time the gc is run
because it is in some intermediate state (not yet assigned).
 
Y

yuraukar

David Hopwood wrote:

Perhaps put a flag in instances of B that indicates whether the instance
has
been assigned to any smartptr yet, and change smartptr<B>::eek:perator= to set
this flag. Then don't collect instances of B until they have been assigned.
And if for some reason, the instance is never assigned, the gc will
never collect it...
 
Y

yuraukar

Tom Widmer wrote:

Are you sure it is sensible to offer direct assignment of a pointer to
a smart pointer? You probably should make the constructor explicit.
Not sure what you mean here - what would this look like?
Isn't that the problem? Why does B have to know about the garbage
collector?
Reason is that I want to do gc for a single class only. As far as I see
it any mark-and-sweep gc needs to know or be able to find out (a) all
instances and (b) all references to these.

General purpose GCs try to find out about (a) and (b) by scanning heap,
stack, registers, etc. This seems complicated, OS and probably
machine-dependent.
Given my constraints, I could as well actively maintain the two lists.
This is what I have done.
The constructor of smartptr has also been



Why doesn't it also register the passed pointer?
The smart pointer registers itself, it contains a pointer to the
instance. When the GC is run, it can ask the smart pointer to which
object it points, so no need to change the registration with each
assignement.
Seems like if you do the above, your problem is solved. Alternatively,
you could create a factory function to create B objects that returns a
smartptr<B>.
Factory function seems like a solution to me. I didn't understand the
other solution.
 
T

Tom Widmer

Tom Widmer wrote:


Not sure what you mean here - what would this look like?

explicit smartptr(T* ptr) //explicit added to signature
:m_ptr(ptr)
{
registerMe();
registerPtr();
}

Now you can't write:

smartptr<B> ptr = new B();

but instead you might write
smartptr said:
Reason is that I want to do gc for a single class only. As far as I see
it any mark-and-sweep gc needs to know or be able to find out (a) all
instances and (b) all references to these.

Yes, but the B class doesn't necessarily need to inform the GC of its
existence. Instead, the smart pointer constructor can register the B
instances that it controls. My "registerPtr()" call above is meant to
do that.
General purpose GCs try to find out about (a) and (b) by scanning heap,
stack, registers, etc. This seems complicated, OS and probably
machine-dependent.
Given my constraints, I could as well actively maintain the two lists.
This is what I have done.
Ok.

The smart pointer registers itself, it contains a pointer to the
instance. When the GC is run, it can ask the smart pointer to which
object it points, so no need to change the registration with each
assignement.

But the first time a "raw" pointer is put under the control of a smart
pointer, isn't that the best time to register that "raw" pointer with
the GC system?
Factory function seems like a solution to me. I didn't understand the
other solution.

Basically, "B" doesn't attempt to register with the GC. Instead, the
constructor of smartptr that takes a B* will register that B* with the
GC system. This suggests that your usage should always be:

smartptr<B> ptr(new B); //ptr constructor registers ptr *and* new B.
and you should generally avoid:
B* ptr = new B;

This may not be appropriate, depending on whether Bs are always
created and immediate used to initialize a smartptr<B>.

Tom
 
T

Tom Widmer

David Hopwood wrote:


And if for some reason, the instance is never assigned, the gc will
never collect it...

Is this ever going to happen (assuming you are writing the code)?
Isn't it what you want in any case? e.g.

B b; //don't want this to be collected!
smartptr<B> ptr(new B); //do want this to be collected.

Why can't you use reference counting? A summary of your usage of both
the raw B pointers and the smartptr<B> classes would help point to a
solution...

Tom
 
T

Tom Widmer

David Hopwood wrote:


And if for some reason, the instance is never assigned, the gc will
never collect it...

Is this ever going to happen (assuming you are writing the code)?
Isn't it what you want in any case? e.g.

B b; //don't want this to be collected!
smartptr<B> ptr(new B); //do want this to be collected.

Why can't you use reference counting? A summary of your usage of both
the raw B pointers and the smartptr<B> classes would help point to a
solution...

Tom
 
T

Tom Widmer

Why is the latter preferable?

That particular syntax isn't particularly preferable, rather it is the
prevention of implicit conversions from pointers to smartpointers in
general that is preferable, since you can end up accidentally
assigning a pointer to a smart pointer that you didn't intend to. e.g.

void f(smartptr<B> p);

B b;
f(&b); //whoops.

Tom
 
J

Joe Seigh

Tom said:
That particular syntax isn't particularly preferable, rather it is the
prevention of implicit conversions from pointers to smartpointers in
general that is preferable, since you can end up accidentally
assigning a pointer to a smart pointer that you didn't intend to. e.g.

void f(smartptr<B> p);

B b;
f(&b); //whoops.

Tom
 
J

Joe Seigh

Tom said:
That particular syntax isn't particularly preferable, rather it is the
prevention of implicit conversions from pointers to smartpointers in
general that is preferable, since you can end up accidentally
assigning a pointer to a smart pointer that you didn't intend to. e.g.

void f(smartptr<B> p);

B b;
f(&b); //whoops.

You mean assigning the same object to two separate smart pointers. That
doesn't really solve the problem. You have to resort to the factory pattern
which makes the ctor private and sort of makes templates somewhat less useful
since you have to do some custom coding for each class you want to use with
smart pointers. It's a C++ problem which won't ever get fixed since you can
sort of do pointer abstraction in C++, just not very well. You can contrive
all you want but it won't fix it and it won't get fixed.

Joe Seigh
 
Y

yuraukar

Tom said:
Is this ever going to happen (assuming you are writing the code)?
Isn't it what you want in any case? e.g.

B b; //don't want this to be collected!
smartptr<B> ptr(new B); //do want this to be collected.

Why can't you use reference counting? A summary of your usage of both
the raw B pointers and the smartptr<B> classes would help point to a
solution...

In fact my current implementation has been using ref counting and ran
into serious problems when circular references had to be supported. So
using a GC should resolve the problem.

In fact, instances of B could have references to other instanced of B
and so forth. Circular references are allowed. Look at this as a group.
Now as long as any thread holds a pointer to an instance in a group, the
group should not be deleted.
I tried introducing the group concept as well. Did ref counting on the
group, but then... merging groups would be an issue as soon as you
construct a reference from an instance in one group to another.
At the bottom line: GC seems to be like the solution, provided I can
find a good GC implementation.

The GC holds a list of B's and smart pointers. Each B can reference
other B's with normal pointers. The list of referenced B's can be
obtained with something like B->getReferencedBs().
The m&s GC interates the smart pointers, marks the B's, marks B's
referenced by these B's etc. At the end it deletes all not marked B's.

So I have two pointers in my code: smart pointers and normal pointers
(weak pointers).

Your other post recommending that constructor of B does not register the
instance with the GC, but the smart pointer does sounds good. The
disadvantage would be that B's never associated with smart pointers
would never be seen by the GC.

I will think of that.
 
T

Tom Widmer

In fact my current implementation has been using ref counting and ran
into serious problems when circular references had to be supported. So
using a GC should resolve the problem.

You can use a weak_ptr style object to break cycles. Using
boost::shared_ptr and boost::weak_ptr may to be less error prone and a
lot more efficient than writing your own GC implementation. However, I
agree that reference counting isn't always appropriate.
In fact, instances of B could have references to other instanced of B
and so forth. Circular references are allowed. Look at this as a group.
Now as long as any thread holds a pointer to an instance in a group, the
group should not be deleted.
I tried introducing the group concept as well. Did ref counting on the
group, but then... merging groups would be an issue as soon as you
construct a reference from an instance in one group to another.
At the bottom line: GC seems to be like the solution, provided I can
find a good GC implementation.

The GC holds a list of B's and smart pointers. Each B can reference
other B's with normal pointers. The list of referenced B's can be
obtained with something like B->getReferencedBs().

Ahh, this is why B must be an integral part of the GC mechanism.
The m&s GC interates the smart pointers, marks the B's, marks B's
referenced by these B's etc. At the end it deletes all not marked B's.

Sounds reasonable.
So I have two pointers in my code: smart pointers and normal pointers
(weak pointers).

Your other post recommending that constructor of B does not register the
instance with the GC, but the smart pointer does sounds good. The
disadvantage would be that B's never associated with smart pointers
would never be seen by the GC.

I will think of that.

It does solve your original problem at least. I suppose you could also
provide an explicit method to register a B with the gc. Obviously you
would only call that when a B was safely referenceable by the GC, due
to be being referenced by another B. Alternatively, it could be used
as a kind of delayed delete call.

Tom
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,140
Latest member
SweetcalmCBDreview
Top