An accurate, pauseless & deterministic GC for C++


Mingnan G.

- I notice that the _hnxgc_assign_lptr function consists of more than an
interlocked update or a simple distributed reference-count increment. I see
multiple lines of code that make calls into other object functions, themselves
made up of multiple lines of code.

Yes. Basically, _hnxgc_assign_lptr handles assigning a new reference to a
CLockedPtr pointer. This involves two operations (which is why you see more
than an interlocked update): the first decrements the RC of the object the
pointer originally pointed to; the second increments the RC of the object
the pointer will point to after the assignment.

Note: assigning one CLockedPtr to another CLockedPtr does not cause any
RC operation, so it does not execute the _hnxgc_assign_lptr function.

An example: suppose two pointers, 'p1' pointing to 'A' and 'p2' pointing
to 'B', where 'p1' is a CLockedPtr pointer. Consider the following C++
statement:

p1 = p2;

(1) If 'p2' is also a CLockedPtr pointer, there is no RC operation and no
call to the _hnxgc_assign_lptr function.

(2) If 'p2' is another type of pointer, such as CWeakPtr or CMemberPtr,
then _hnxgc_assign_lptr is called. It completes the following work:
a. increment the RC of object 'B';
b. decrement the RC of object 'A';
c. set the pointer 'p1' pointing to 'B'.
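For intuition, here is a minimal sketch of the a/b/c sequence such an assignment helper performs. This is not HnxGC's actual code; `Object` and `assign_strong` are hypothetical names standing in for the real machinery:

```cpp
#include <atomic>

// Hypothetical sketch only -- not HnxGC's implementation.
struct Object {
    std::atomic<long> rc{1};   // reference count; starts at 1 for the creator
};

// Mirrors the described behavior of _hnxgc_assign_lptr:
// increment the new target's RC, retarget the pointer, then
// decrement the old target's RC (reclaiming it on zero).
void assign_strong(Object*& dst, Object* src) {
    if (src)
        src->rc.fetch_add(1, std::memory_order_relaxed);       // (a) RC++ on 'B'
    Object* old = dst;
    dst = src;                                                 // (c) 'p1' -> 'B'
    if (old && old->rc.fetch_sub(1, std::memory_order_acq_rel) == 1)
        delete old;                                            // (b) RC-- on 'A'; last reference gone
}
```

Incrementing the new target before decrementing the old one is what keeps self-assignment (`p1 = p1`) safe in a scheme like this.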
- I notice that the _hnxgc_assign_mptr function has more lines than
_hnxgc_assign_lptr, and it too makes calls to functions made up of multiple
lines of code.

The situation with _hnxgc_assign_mptr is like _hnxgc_assign_lptr: it also
includes an RC decrement and an RC increment operation.

- Mingnan Guo
 

Chris Thomasson

Mingnan G. said:
- I notice that the _hnxgc_assign_lptr function consists of more than an
interlocked update or a simple distributed reference-count increment.
[...]
Note: assigning one CLockedPtr to another CLockedPtr does not cause any
RC operation, so it does not execute the _hnxgc_assign_lptr function.
[...]

Thank you for that information, Mingnan; it helps me understand what's going
on much better. Okay, so I think what you are saying is that CLockedPtr
provides the strong thread-safety level and the following pseudo-code would
be legal if the refcount object were replaced with CLockedPtr:
_____________
static refcount<foo> g_foo(new foo);

threads_a_x() {
    for (;;) {
        refcount<foo> l_foo(g_foo);
        if (l_foo) {
            l_foo->foo();
        }
    }
}

threads_y_z() {
    for (;;) {
        refcount<foo> l_foo(new foo);
        g_foo = l_foo;
    }
}
______________


This example will not work if the implementation of refcount does not have
the strong thread-safety guarantee. Refer to the following thread for far
more details of what I am getting at here:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e5167941d32340c6


Can I do lock-free programming with your smart-pointers? Refer to the
following post for more info:

http://groups.google.com/group/comp.lang.c++/msg/eada4a93d932430e



Is that a legal programming style within the scope of your GC?
 

Mingnan G.

Chris Thomasson said:
Okay, so I think what you are saying is that CLockedPtr provides the strong
thread-safety level and the following pseudo-code would be legal if the
refcount object were replaced with CLockedPtr:
[...]
Is that a legal programming style within the scope of your GC?

Yes, I think this example is okay. But lock-free programming in an HnxGC
application should be done with great caution. The lock-free (non-blocking)
feature of HnxGC just means that pointer assignment in application mutator
threads is lock-free with respect to the background concurrently running
collector thread. A correct lock-free mechanism *between application
threads*, as in this example, is the responsibility of the application
program, not the HnxGC collector. We don't provide an extra lock-free
mechanism above traditional C/C++ programming for applications. Access to a
shared resource should normally be protected by some synchronization
mechanism, e.g. a mutex, unless you deliberately act without a lock for
performance.

For illustration, I modified your example a little, as follows:

--------------------------------------------------

// atomically shared with multiple threads (it should be at a naturally
// aligned address, e.g. 4 bytes for a 32-bit platform)
static void * volatile g_foo = NULL;

threads_a_x() {
    for (;;) {
        lp<CFoo> l_foo;

        l_foo.attach(g_foo); // read atomically from 'g_foo'

        if (l_foo) {
            l_foo->foo();
        }
    }
}

threads_y_z() {
    for (;;) {
        lp<CFoo> l_foo = gcnew CFoo;
        void *p = l_foo.detach();

        // write to 'g_foo' with racing
        if (atomicCompareAndSwap(g_foo, 0, p) != 0) {
            // failed; restore to 'l_foo' to reclaim this object
            l_foo.attach(p);
        }
    }
}

-----------------------------------------

Actually, if an application maintains consistency of the reference
relationship before and after a reference operation, the operation can be
invisible to the HnxGC collector, and the application is free to use any
multi-threading mechanism it likes. The "moving reference" mechanism in
the HnxGC smart pointer is also an example; refer to the following link
for more details:

http://hnxgc.harnixworld.com/algo_refcount.htm

and

"Assignment between 2 CLockedPtr" in http://hnxgc.harnixworld.com/prog_b03.htm
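The "moving reference" idea, transferring a counted reference from one pointer to another with no RC traffic at all, is essentially what C++ move semantics express. A hypothetical sketch (`counted_ptr` is an illustrative name, not HnxGC's class):

```cpp
#include <utility>

// Illustrative sketch of a "moving reference": the single counted
// reference migrates from one pointer to the other, so neither an
// increment nor a decrement is needed. Not HnxGC's actual class.
template <class T>
struct counted_ptr {
    T*    p  = nullptr;
    long* rc = nullptr;                 // shared reference count

    counted_ptr() = default;
    explicit counted_ptr(T* obj) : p(obj), rc(new long(1)) {}

    // Move-construct: steal the reference -- zero RC operations.
    counted_ptr(counted_ptr&& o) noexcept : p(o.p), rc(o.rc) {
        o.p  = nullptr;
        o.rc = nullptr;
    }

    counted_ptr(const counted_ptr&) = delete;
    counted_ptr& operator=(const counted_ptr&) = delete;

    ~counted_ptr() {
        if (rc && --*rc == 0) { delete p; delete rc; }
    }
};
```

Because the count is never touched, the transfer needs no synchronization with a concurrent collector; only the consistency of the overall reference relationship matters.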
 

Dmitriy Vyukov

(sorry for the repost but it probably would help if I actually included c.p.t)
[...]
CC'ing c.p.t
I don't know what the GlobalMemoryFence is yet since that part is not
filled in yet. Generally with shared pointers you need acquire/release
semantics which can't be elided.
[...]
I took a quick look at the code. AFAICT, he creates a thread per processor
and binds it to that processor. So if you're running on a 64-processor
system, he creates:
Thread 1 - Bound To CPU 1
Thread 2 - Bound To CPU 2
Thread 3 - Bound To CPU 3
Thread 4 - Bound To CPU 4
Thread 5 - Bound To CPU 5
Thread 6 - Bound To CPU 6
[and on and on to 64]
Those threads basically sit on an event and wait. When they are awoken,
they execute a membar and atomically decrement a counter, checking whether
it dropped to zero, which means that every "CPU" thread has executed it.
AFAICT, this is exactly the same as the synchronize_rcu() function
implemented with a daemon per CPU. So there is really nothing new here at
all. I think he is going to have a problem with prior art, for sure.
As for the smart pointer reference counting, I haven't looked yet, as I
was mainly interested in GlobalMemoryFence (a.k.a. synchronize_rcu()).
Any thoughts?
synchronize_rcu is for locking (lock-free),

synchronize_rcu() produces the exact same overall effect as
GlobalMemoryFence. When the daemons run on each CPU, it constitutes a global
memory barrier. The well-known technique is not tied to just the writer side
of a non-blocking reader pattern. You can use this for other things. For
instance, I could implement the reference counting algorithm you use with
synchronize_rcu(), no problem.
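The per-CPU daemon scheme described above (threads parked on an event; each wakes, issues a membar, and decrements a counter until it hits zero) can be sketched in portable C++. Everything here is illustrative, not HnxGC's actual code, and the per-CPU binding is omitted for portability:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a "global memory fence": one worker stands in for each CPU.
// A caller wakes every worker; each executes a full memory fence and
// decrements a counter; the caller returns once the counter reaches zero.
class GlobalFence {
    std::mutex mtx;
    std::condition_variable cv_workers, cv_done;
    std::atomic<int> pending{0};
    std::atomic<int> fences{0};   // total fences executed, for observation
    int generation = 0;           // bumped once per synchronize() round
    bool stop = false;
    std::vector<std::thread> workers;

    void worker() {
        int seen = 0;
        std::unique_lock<std::mutex> lk(mtx);
        while (true) {
            cv_workers.wait(lk, [&] { return stop || generation != seen; });
            if (stop) return;
            seen = generation;
            std::atomic_thread_fence(std::memory_order_seq_cst); // the membar
            fences.fetch_add(1);
            if (pending.fetch_sub(1) == 1)
                cv_done.notify_one();    // last "CPU" reports completion
        }
    }

public:
    explicit GlobalFence(int ncpus) {
        for (int i = 0; i < ncpus; ++i)
            workers.emplace_back(&GlobalFence::worker, this);
    }
    ~GlobalFence() {
        { std::lock_guard<std::mutex> lk(mtx); stop = true; }
        cv_workers.notify_all();
        for (auto& t : workers) t.join();
    }
    int fence_count() { return fences.load(); }

    // Returns only after every worker has executed its fence.
    void synchronize() {
        std::unique_lock<std::mutex> lk(mtx);
        pending.store(static_cast<int>(workers.size()));
        ++generation;
        cv_workers.notify_all();
        cv_done.wait(lk, [&] { return pending.load() == 0; });
    }
};
```

Note that without the real CPU binding, this only demonstrates the rendezvous protocol; the actual scheme relies on one bound thread per processor so that each physical CPU really does execute the barrier.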
GlobalMemoryFence is for memory semantics for MP. Totally two different
things.

I believe that they do the same thing. BTW, have you looked at Joe Seigh's
prior art? He has a more scalable method of extracting "global" quiescent
states:

http://atomic-ptr-plus.sourceforge.net

I had done some in-house work on this in an experimental setting, so luckily
I knew how to sketch out a solution to Joe Seigh's challenge to show a
Windows implementation of a "global" memory fence:

http://groups.google.com/group/comp.programming.threads/browse_frm/th...

I say that those are more scalable because you don't have to create a
thread per CPU and issue a global broadcast. I don't think that would go
over well on a distributed NUMA system with a shitload of processors. I
could make your system work on NUMA, but you would have to change your
API to allow for thread grouping.


BTW, did you see this function:
FlushProcessWriteBuffers()
http://msdn2.microsoft.com/en-us/library/ms683148.aspx

MSDN contains only the following indistinct description:
"The function generates an interprocessor interrupt (IPI) to all
processors that are part of the current process affinity. It
guarantees the visibility of write operations performed on one
processor to the other processors."

I can't understand what it means, and one will find nothing about this
function on the whole web - just a few links to MSDN. But I think this
is very promising: just generate an IPI to all other processors, and
they execute a membar straight away.


Dmitriy V'jukov
 

Chris Thomasson

[...]
BTW, did you see this function:
FlushProcessWriteBuffers()
http://msdn2.microsoft.com/en-us/library/ms683148.aspx
[...]

Humm. Well, it seems as if it would work. I think it might be an analog of
the GlobalMemoryFence (aka synchronize_rcu) used in Mingnan's GC.
 

Joe Seigh

Dmitriy said:
BTW, did you see this function:
FlushProcessWriteBuffers()
http://msdn2.microsoft.com/en-us/library/ms683148.aspx
[...]

McKenney has a patent application for essentially the same thing. Also,
one of the patents I did on an early form of something similar to RCU
cites a patent issued in 1976 which discloses a method used by MVS to cause
all processors to purge their TLBs by issuing signal-processor instructions.
 
