An accurate, pauseless & deterministic GC for C++


Mingnan G.

- I notice that the _hnxgc_assign_lptr function consists of more than an
interlocked update or a simple distributed reference-count increment. I see
multiple lines of code that make calls into other object functions, themselves
made up of multiple lines of code.

Yes. Basically, _hnxgc_assign_lptr handles assigning a new reference to a
CLockedPtr pointer. This involves two operations (which is why you see more
than an interlocked update): the first decrements the RC of the object the
pointer originally pointed to; the second increments the RC of the object
the pointer will point to after the assignment.

Note: assigning one CLockedPtr to another CLockedPtr does not cause any
RC operation, so it does not execute the _hnxgc_assign_lptr function.

An example: suppose two pointers, 'p1' pointing to 'A' and 'p2' pointing
to 'B', where 'p1' is a CLockedPtr pointer. Consider the following C++
statement:

p1 = p2;

(1) If 'p2' is also a CLockedPtr pointer, there is no RC operation and no
call to the _hnxgc_assign_lptr function.

(2) If 'p2' is another type of pointer, such as CWeakPtr or CMemberPtr,
then _hnxgc_assign_lptr is called. It completes the following work:
a. increment the RC of object 'B';
b. decrement the RC of object 'A';
c. set the pointer 'p1' pointing to 'B'.
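For intuition, here is a minimal sketch of the a/b/c sequence such an assignment helper performs. This is not HnxGC's actual code; `Object` and `assign_strong` are hypothetical names standing in for the real machinery:

```cpp
#include <atomic>

// Hypothetical sketch only -- not HnxGC's implementation.
struct Object {
    std::atomic<long> rc{1};   // reference count; starts at 1 for the creator
};

// Mirrors the described behavior of _hnxgc_assign_lptr:
// increment the new target's RC, retarget the pointer, then
// decrement the old target's RC (reclaiming it on zero).
void assign_strong(Object*& dst, Object* src) {
    if (src)
        src->rc.fetch_add(1, std::memory_order_relaxed);       // (a) RC++ on 'B'
    Object* old = dst;
    dst = src;                                                 // (c) 'p1' -> 'B'
    if (old && old->rc.fetch_sub(1, std::memory_order_acq_rel) == 1)
        delete old;                                            // (b) RC-- on 'A'; last reference gone
}
```

Incrementing the new target before decrementing the old one is what keeps self-assignment (`p1 = p1`) safe in a scheme like this.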
- I notice that the _hnxgc_assign_mptr function has more lines than
_hnxgc_assign_lptr, and it too makes calls to functions made up of multiple
lines of code.

The situation with _hnxgc_assign_mptr is like _hnxgc_assign_lptr: it also
includes an RC decrement and an RC increment operation.

- Mingnan Guo
 

Chris Thomasson

Mingnan G. said:
- I notice that the _hnxgc_assign_lptr function consists of more than an
interlocked update or a simple distributed reference-count increment.
[...]
Note: assigning one CLockedPtr to another CLockedPtr does not cause any
RC operation, so it does not execute the _hnxgc_assign_lptr function.
[...]

Thank you for that information, Mingnan; it helps me understand what's going
on much better. Okay, so I think what you are saying is that CLockedPtr
provides the strong thread-safety level and the following pseudo-code would
be legal if the refcount object were replaced with CLockedPtr:
_____________
static refcount<foo> g_foo(new foo);

threads_a_x() {
    for (;;) {
        refcount<foo> l_foo(g_foo);
        if (l_foo) {
            l_foo->foo();
        }
    }
}

threads_y_z() {
    for (;;) {
        refcount<foo> l_foo(new foo);
        g_foo = l_foo;
    }
}
______________


This example will not work if the implementation of refcount does not have
the strong thread-safety guarantee. Refer to the following thread for far
more details of what I am getting at here:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e5167941d32340c6


Can I do lock-free programming with your smart-pointers? Refer to the
following post for more info:

http://groups.google.com/group/comp.lang.c++/msg/eada4a93d932430e



Is that a legal programming style within the scope of your GC?
 

Mingnan G.

Chris Thomasson said:
Okay, so I think what you are saying is that CLockedPtr provides the strong
thread-safety level and the following pseudo-code would be legal if the
refcount object were replaced with CLockedPtr:
[...]
Is that a legal programming style within the scope of your GC?

Yes, I think this example is okay. But lock-free programming in an HnxGC
application should be done with great caution. The lock-free (non-blocking)
feature of HnxGC just means that pointer assignment in application mutator
threads is lock-free with respect to the background concurrently running
collector thread. A correct lock-free mechanism *between application
threads*, as in this example, is the responsibility of the application
program, not the HnxGC collector. We don't provide an extra lock-free
mechanism above traditional C/C++ programming for applications. Access to a
shared resource should normally be protected by some synchronization
mechanism, e.g. a mutex, unless you deliberately act without a lock for
performance.

For illustration, I modified your example a little, as follows:

--------------------------------------------------

// atomically shared with multiple threads (it should be at a naturally
// aligned address, e.g. 4 bytes for a 32-bit platform)
static void * volatile g_foo = NULL;

threads_a_x() {
    for (;;) {
        lp<CFoo> l_foo;

        l_foo.attach(g_foo); // read atomically from 'g_foo'

        if (l_foo) {
            l_foo->foo();
        }
    }
}

threads_y_z() {
    for (;;) {
        lp<CFoo> l_foo = gcnew CFoo;
        void *p = l_foo.detach();

        // write to 'g_foo' with racing
        if (atomicCompareAndSwap(g_foo, 0, p) != 0) {
            // failed; restore to 'l_foo' to reclaim this object
            l_foo.attach(p);
        }
    }
}

-----------------------------------------

Actually, if an application maintains consistency of the reference
relationship before and after a reference operation, the operation can be
invisible to the HnxGC collector, and the application is free to use any
multi-threading mechanism it likes. The "moving reference" mechanism in
the HnxGC smart pointer is also an example; refer to the following link
for more details:

http://hnxgc.harnixworld.com/algo_refcount.htm

and

"Assignment between 2 CLockedPtr" in http://hnxgc.harnixworld.com/prog_b03.htm
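The "moving reference" idea, transferring a counted reference from one pointer to another with no RC traffic at all, is essentially what C++ move semantics express. A hypothetical sketch (`counted_ptr` is an illustrative name, not HnxGC's class):

```cpp
#include <utility>

// Illustrative sketch of a "moving reference": the single counted
// reference migrates from one pointer to the other, so neither an
// increment nor a decrement is needed. Not HnxGC's actual class.
template <class T>
struct counted_ptr {
    T*    p  = nullptr;
    long* rc = nullptr;                 // shared reference count

    counted_ptr() = default;
    explicit counted_ptr(T* obj) : p(obj), rc(new long(1)) {}

    // Move-construct: steal the reference -- zero RC operations.
    counted_ptr(counted_ptr&& o) noexcept : p(o.p), rc(o.rc) {
        o.p  = nullptr;
        o.rc = nullptr;
    }

    counted_ptr(const counted_ptr&) = delete;
    counted_ptr& operator=(const counted_ptr&) = delete;

    ~counted_ptr() {
        if (rc && --*rc == 0) { delete p; delete rc; }
    }
};
```

Because the count is never touched, the transfer needs no synchronization with a concurrent collector; only the consistency of the overall reference relationship matters.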
 

Dmitriy Vyukov

(sorry for the repost but it probably would help if I actually included c.p.t)
[...]
CC'ing c.p.t
I don't know what the GlobalMemoryFence is yet since that part is not
filled in yet. Generally with shared pointers you need acquire/release
semantics which can't be elided.
[...]
I took a quick look at the code. AFAICT, he creates a thread per processor
and binds it to that processor. So if you're running on a 64-processor
system, he creates:
Thread 1 - Bound To CPU 1
Thread 2 - Bound To CPU 2
Thread 3 - Bound To CPU 3
Thread 4 - Bound To CPU 4
Thread 5 - Bound To CPU 5
Thread 6 - Bound To CPU 6
[and on and on to 64]
Those threads basically sit on an event and wait. When they are awoken,
they execute a membar and atomically decrement a counter, checking whether
it dropped to zero, which means that every "CPU" thread has executed it.
AFAICT, this is exactly the same as the synchronize_rcu() function
implemented with a daemon per CPU. So there is really nothing new here at
all. I think he is going to have a problem with prior art, for sure.
As for the smart pointer reference counting, I haven't looked yet, as I
was mainly interested in GlobalMemoryFence (a.k.a. synchronize_rcu()).
Any thoughts?
synchronize_rcu is for locking (lock-free),

synchronize_rcu() produces the exact same overall effect as
GlobalMemoryFence. When the daemons run on each CPU, it constitutes a global
memory barrier. The well-known technique is not tied to just the writer side
of a non-blocking reader pattern. You can use this for other things. For
instance, I could implement the reference counting algorithm you use with
synchronize_rcu(), no problem.
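The per-CPU daemon scheme described above (threads parked on an event; each wakes, issues a membar, and decrements a counter until it hits zero) can be sketched in portable C++. Everything here is illustrative, not HnxGC's actual code, and the per-CPU binding is omitted for portability:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a "global memory fence": one worker stands in for each CPU.
// A caller wakes every worker; each executes a full memory fence and
// decrements a counter; the caller returns once the counter reaches zero.
class GlobalFence {
    std::mutex mtx;
    std::condition_variable cv_workers, cv_done;
    std::atomic<int> pending{0};
    std::atomic<int> fences{0};   // total fences executed, for observation
    int generation = 0;           // bumped once per synchronize() round
    bool stop = false;
    std::vector<std::thread> workers;

    void worker() {
        int seen = 0;
        std::unique_lock<std::mutex> lk(mtx);
        while (true) {
            cv_workers.wait(lk, [&] { return stop || generation != seen; });
            if (stop) return;
            seen = generation;
            std::atomic_thread_fence(std::memory_order_seq_cst); // the membar
            fences.fetch_add(1);
            if (pending.fetch_sub(1) == 1)
                cv_done.notify_one();    // last "CPU" reports completion
        }
    }

public:
    explicit GlobalFence(int ncpus) {
        for (int i = 0; i < ncpus; ++i)
            workers.emplace_back(&GlobalFence::worker, this);
    }
    ~GlobalFence() {
        { std::lock_guard<std::mutex> lk(mtx); stop = true; }
        cv_workers.notify_all();
        for (auto& t : workers) t.join();
    }
    int fence_count() { return fences.load(); }

    // Returns only after every worker has executed its fence.
    void synchronize() {
        std::unique_lock<std::mutex> lk(mtx);
        pending.store(static_cast<int>(workers.size()));
        ++generation;
        cv_workers.notify_all();
        cv_done.wait(lk, [&] { return pending.load() == 0; });
    }
};
```

Note that without the real CPU binding, this only demonstrates the rendezvous protocol; the actual scheme relies on one bound thread per processor so that each physical CPU really does execute the barrier.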
GlobalMemoryFence is for memory semantics for MP. Totally two different
things.

I believe that they do the same thing. BTW, have you looked at Joe Seigh's
prior art? He has a more scalable method of extracting "global" quiescent
states:

http://atomic-ptr-plus.sourceforge.net

I had done some in-house work on this in an experimental setting, so luckily
I knew how to sketch out a solution to Joe Seigh's challenge to show a
Windows implementation of a "global" memory fence:

http://groups.google.com/group/comp.programming.threads/browse_frm/th...

I say that those are more scalable because you don't have to create a
thread per CPU and issue a global broadcast. I don't think that would go
over well on a distributed NUMA system with a shitload of processors. I
could make your system work on NUMA, but you would have to change your
API to allow for thread grouping.


BTW, did you see this function:
FlushProcessWriteBuffers()
http://msdn2.microsoft.com/en-us/library/ms683148.aspx

MSDN contains only the following indistinct description:
"The function generates an interprocessor interrupt (IPI) to all
processors that are part of the current process affinity. It
guarantees the visibility of write operations performed on one
processor to the other processors."

I can't understand what it means, and one will find nothing about this
function on the whole web - just a few links to MSDN. But I think this
is very promising: just generate an IPI to all other processors, and
they execute a membar straight away.


Dmitriy V'jukov
 

Chris Thomasson

[...]
BTW, did you see this function:
FlushProcessWriteBuffers()
http://msdn2.microsoft.com/en-us/library/ms683148.aspx
[...]

Humm. Well, it seems as if it would work. I think it might be an analog of
the GlobalMemoryFence (aka synchronize_rcu) used in Mingnan's GC.
 

Joe Seigh

Dmitriy said:
BTW, did you see this function:
FlushProcessWriteBuffers()
http://msdn2.microsoft.com/en-us/library/ms683148.aspx
[...]

McKenney has a patent application for essentially the same thing. Also,
one of the patents I did on an early form of something similar to RCU
cites a patent issued in 1976 which discloses a method used by MVS to cause
all processors to purge their TLBs by issuing signal-processor instructions.
 
