A garbage collector for C++

Discussion in 'C++' started by Mingnan G., Apr 22, 2006.

  1. Mingnan G.

    Mingnan G. Guest

    Hello everyone.

    I have written a garbage collector for standard C++ application. It has
    following main features.

    1) Deterministic Finalization
    Providing deterministic finalization, the system can manage resources
    as well as objects. The programming style is clear and easy, conforming
    to RAII (Resource Acquisition Is Initialization) idiom of C++
    programmers. The memory usage is very efficient, acyclic garbage is
    reclaimed when the last reference to it is removed. A well-designed
    application, which eliminates cyclic data structure, does not need
    expensive garbage collection and is always running with minimum memory
    usage.

    2) Accurate GC for C++
    It is a fully accurate tracing garbage collector. All garbage objects
    are identified by the system, no conservative stack frame guessing.
    Fully C++ optimization compiler support.

    3) No Pause (less than 1us)
    In this system, all application codes automatically become fully
    interruptible and GC-Safe. Therefore, scavenge can start at any place
    without rendezvous requirement. A special concurrent tracing garbage
    collector successfully evades root-set scanning, and does not cause
    suspension of any thread at all. In the worst racing case, the latency
    is less than one microsecond (not millisecond). It is very satisfied
    for real-time systems.

    4) Small Overhead
    The runtime cost of application threads is far less than a normal
    reference counting. If there is no concurrently running scavenging
    action, there is no write-barrier overhead, no strong memory ordering
    requirement, no synchronization overhead. There is no extra code or
    data structure injected for GC safe point. The whole system does not
    require strong memory ordering, it is suitable for most modern
    processor architecture. Multi-processor concurrency can be further
    exploited by the multi-threading property of mutator and collector.

    5) Compatible Object Model
    As well as conventional C++, the system supports multiple-inheritance,
    object as member variables, and object arrays. Support C++ raw pointer,
    unions, bit-fields and hidden pointers. Support C++ templates. Support
    native object and tracing.

    6) Widely Portable
    Even conducting an accurate tracing, the system does not require any
    special information from compiler. Application can use any standard C++
    compiler, such as Visual C++ 8 and GCC. There is no special platform
    requirement, such as Win32 system call: SuspendThread, GetWriteWatch,
    etc. Even virtual memory support is not necessary. Thus, it can be
    ported to a wider area.

    Does anyone have any interest in this system? Any comments are welcome!

    Mingnan G.
     
    Mingnan G., Apr 22, 2006
    #1
    1. Advertising

  2. On 21 Apr 2006 17:45:03 -0700, "Mingnan G." <>
    wrote:

    >Hello everyone.
    >
    >I have written a garbage collector for standard C++ application. It has
    >following main features.
    >
    >1) Deterministic Finalization
    >Providing deterministic finalization, the system can manage resources
    >as well as objects. The programming style is clear and easy, conforming
    >to RAII (Resource Acquisition Is Initialization) idiom of C++
    >programmers. The memory usage is very efficient, acyclic garbage is
    >reclaimed when the last reference to it is removed.


    This probably means some kind of ref-counted "smart" pointer. With
    real pointers and new-ed objects you can hardly get RAII??

    >Does anyone have any interest in this system? Any comments are welcome!


    Unless you want to sell your GC as a commercial product you could just
    post links to download and documentation.

    Best regards,
    Roland Pibinger
     
    Roland Pibinger, Apr 22, 2006
    #2
    1. Advertising

  3. "Mingnan G." <> wrote in message
    news:...
    > Hello everyone.
    >
    > I have written a garbage collector for standard C++ application. It has
    > following main features.


    [...]

    > 4) Small Overhead
    > The runtime cost of application threads is far less than a normal
    > reference counting.


    I posted something on amortizing the cost of reference counting down to
    virtually zero by using by using something called "proxy" garbage
    collection. Unfortunately, I can't seem to find the exact message on Google,
    so I appended my *copy to the end of this message...




    > If there is no concurrently running scavenging
    > action, there is no write-barrier overhead, no strong memory ordering
    > requirement, no synchronization overhead.


    What sort of transactions occur between your collector and its mutators
    during a "scavenge" operation? Do your mutators have any #StoreLoad
    dependencies during a scavenge operation? I take it that the collector can
    indeed interrupt mutators during scavenges, correct?




    > There is no extra code or
    > data structure injected for GC safe point. The whole system does not
    > require strong memory ordering, it is suitable for most modern
    > processor architecture. Multi-processor concurrency can be further
    > exploited by the multi-threading property of mutator and collector.


    I am guessing, but from this, it sounds like your design allows for loads to
    pass stores (e.g., #LoadStore dependency ) right? If so, how does your stuff
    differ from the following scheme:

    http://citeseer.ist.psu.edu/domani00implementing.html

    http://groups.google.com/group/comp.programming.threads/msg/16b31df060411bcd?hl=en



    [...]


    > Any comments are welcome!


    FWIW, I use a more relaxed version of the synchronization pattern I briefly
    described here:

    http://groups.google.com/group/comp.programming.threads/msg/e59c2dd1afb8e0a3?hl=en

    For my proxy gc that I briefly introduced here:

    http://groups.google.com/group/comp.programming.threads/msg/72684aa070964e6a

    As you can see, the periodic/episodic synchronization usage pattern
    amortizes the cost of memory barriers over successive epochs in time. This
    allows mutators to make synchronization-free local reference count
    adjustments; no atomic operations or memory barriers are required. The same
    goes for collector<->mutator communications. So, a mutator can literally
    adjust a reference count by doing this:


    ++this_mutator->refs[whatever];


    Both VZOOM and Joe Seighs RCU-SMR can be implemented in a way that can that
    can dissect quiescent-states out of operating system provided
    information... Except, as I stated before, the stuff that I invented can be
    based on a more relaxed sync pattern I created.


    BTW, speaking of inventions, If you think you have something truly novel,
    and sets you apart from prior art, I suggest that you think about doing a
    patent feasibility study, and possibly going thought the patent application
    process. I suggest that you read the following thread:

    http://groups.google.com/group/comp...fb64e?q=atomic patent&rnum=1#0b9072fea09fb64e

    :O




    I would also suggest that you include comp.programming.threads into the
    discussion...










    * ----------------
    "David Hopwood" <> wrote in message
    news:...
    > [posting via Google because of news server problems -- sorry if this is
    > a duplicate]
    >
    > Chris Thomasson wrote:
    >> Sergey P. Derevyago wrote:
    >>>Joe Seigh wrote:
    >>>
    >>>>> The question is: Can you take XXX proxy GC and plug it just like
    >>>>>Boehm GC?
    >>>>
    >>>>No, things like RCU, etc... aren't used for storage management. They're
    >>>>used for read lock-free solutions to the reader/writer problem in a
    >>>>multi-threaded environment.
    >>>
    >>> So I was right: there is no GC/C++ that doesn't "stop the world".
    >>> And it seems like it's not possible for any GC/C++ not to "stop the
    >>> world".

    >>
    >> You don't need to garbage collect everything. Proxy collection can allow
    >> an application to get away from traditional GC, and all of the overhead
    >> that comes along with them.

    >
    > I always find it odd when people talk about the "overhead of GC" as
    > though
    > other memory management techniques had no overheads.
    >
    > You shouldn't assume that a hybrid of GC and some other memory
    > management
    > technique will have lower overhead than applying GC to all objects. If
    > the
    > other technique is some form of reference counting (as is commonly the
    > case
    > when using proxy GC), then it definitely won't; refcounting is very
    > inefficient.


    I agree, atomic lock-free refcounting can be very expensive. However, you
    can simply amortize the cost of reference counting, if your using
    ref-counting to begin with, when your implementing a proxy collector. You
    don't reference count everything, "just" the collector structures. IMO,
    using lock-free atomic reference counting to protect each node of a
    data-structure would generate massive overheads. However, as you probably
    know, there are other methods that can amortize the cost of actually
    reference counting the collector structures. I created a straightforward
    usage pattern for my VZOOM collector that is based on per-thread
    periodic/episodic synchronizations...




    - You don't need to activate the collector every time you access a shared
    data-structure... You can setup a runtime configurable dynamic ratio of
    accesses:syncs... Unfortunately, algorithms like RCU are not compatible with
    this setup because it simply requires an arbitrary number of references to
    persist over quiescent states. Your application will drive right off a cliff
    if your try to do this with RCU! IMO, even RCU-SMR can't really fit this
    bill because it was not really designed to allow the user to grab an
    arbitrary number of references...


    - Also, the collector I came up can be implemented with POSIX and dependant
    load-ordering. So, an application can use it to implement lock-free reader
    patterns on many different operating systems and processor combinations. Can
    a traditional GC compete with the flexibility of a portable user-space proxy
    collector wrt solving the reader/writer problem in lock-free algorithms?
     
    Chris Thomasson, Apr 23, 2006
    #3
  4. Mingnan G.

    Mingnan G. Guest

    >With real pointers and new-ed objects you can hardly get RAII??
    I am not sure I understand what you mean. In this system, C++ raw
    pointers are treated as weak references as Java. They will not retain
    an object alive. Only smart pointers will keep an object alive. New
    objects are created similar to C++ new operator. For example:
    class Foo { ... };
    CLockedPtr<Foo> p = hgc_new(Foo, (arguments...));

    The system will invoke the destructor of class Foo at proper time. When
    the object is acyclic, the system will reclaim immediately. When the
    object is circular referenced, the system will reclaim it by tracing
    garbage collection. Programmers need not care about these thing, they
    should only puts their native-resource management code in destructor
    method, and let the system to do the else things.

    > ... you could just post links to download and documentation.

    Thanks for your kind advice. I am doing that currently. I need a place
    to put my documentation and source code, and I need some time.
     
    Mingnan G., Apr 23, 2006
    #4
  5. "Mingnan G." <> wrote in message
    news:...
    > >With real pointers and new-ed objects you can hardly get RAII??

    > I am not sure I understand what you mean. In this system, C++ raw
    > pointers are treated as weak references as Java. They will not retain
    > an object alive. Only smart pointers will keep an object alive.


    You have to use smart pointers to keep a "specific object alive"? What kind
    of "per-object overhead" are we talking about here? How does a mutator
    thread use the smart pointer to interact with your collectors "scavenging"
    operations?




    FWIW, this was one of the reasons why proxy gc was created, mutator threads
    can use raw pointers and dependant load-ordering to access shared
    data-structures; no per-object smart-pointer logic needed. As you probably
    know, dependant load-ordering is supported on every existing arch, except
    DEC alpha; it happens to require a rmb(). On Linux this type of "special"
    barrier is expressed as read_barrier_depends(); its a no-op on everything
    but alpha.

    As I stated in an earlier post, your mutator threads do not need to track
    each object individually. You can convert your per-object smart pointers,
    into per-epoch smart pointers and follow a simple RCU pattern:

    http://www.rdrop.com/users/paulmck/RCU/

    enter_non_quiescent();

    // access shared data-structures

    leave_non_quiescent();


    This could be expressed in C++ as:

    {
    gc::scoped_non_quiescent collected_scope;
    // access shared data-structures
    }


    The overhead behind enter_non_quiescent() can be as simple as a normal load,
    and leave_non_quiescent() can be as simple as a normal store. No atomic ops
    and no #StoreLoad or #LoadStore memory barriers required. No kidding.
     
    Chris Thomasson, Apr 23, 2006
    #5
  6. "Mingnan G." <> wrote in message
    news:...
    > >With real pointers and new-ed objects you can hardly get RAII??

    > I am not sure I understand what you mean. In this system, C++ raw
    > pointers are treated as weak references as Java. They will not retain
    > an object alive. Only smart pointers will keep an object alive.


    Do your smart pointers have an atomic CAS or XCHG operation? I mean, can I
    do something like this pseudo-code:


    static your_strongptr< Foo > p_Foo = your_new( Foo ... );


    {
    your_weakptr< Foo > c_Foo, x_Foo = your_new( Foo ... );
    do {
    c_Foo = p_Foo;
    if ( c_Foo ) {
    x_Foo->LocalModify( c_Foo );
    }
    } while ( p_Foo.cas( c_Foo, x_Foo ) );

    if ( c_Foo ) {
    c_Foo->ProcessExchangedWith( x_Foo );
    }
    }




    Can I do 100% lock-free programming with your smart pointers?
     
    Chris Thomasson, Apr 23, 2006
    #6
  7. Mingnan G.

    Mingnan G. Guest

    I am glad to see your kindly messages and suggestions, thanks very
    much.

    I am still carefully reading your posts and links. Followings are
    questions I know to answer.

    > What sort of transactions occur between your collector and its mutators during a "scavenge" operation?

    During a "scavenge" operation, the system uses multi-threading
    synchronization mechanism between collector and mutators. There is some
    overhead. However, since most objects are acyclic, scavenge operation
    need not run as frequently as prior-art garbage collector. I recommend
    tracing garbage collection runs with higher priority than normal
    application threads, but lower than real-time threads. So that,
    expensive tracing GC can finished sooner, and real-time threads are not
    affected.

    > Do your mutators have any #StoreLoad dependencies during a scavenge operation? I take it that the collector can indeed interrupt mutators during scavenges, correct?

    Mutators have #StoreLoad dependencies during a scavenge operation. The
    collector can interrupt any application threads at any place, and start
    a tracing collection without delay.

    > Do your smart pointers have an atomic CAS or XCHG operation?

    Yes, but not always. During the "scavenge" operation, there is some
    complicate synchronization overhead. These synchronization will use
    atomic CAS or XCHG.

    > your_weakptr< Foo > c_Foo, x_Foo = your_new( Foo ... );

    In this system, your_weakptr< Foo > will not keep the newly created Foo
    object alive. Therefore, when the execution of this statement is
    finished, the newly created object is reclaimed immediately. That maybe
    not you wanted.

    Mingnan G.
     
    Mingnan G., Apr 23, 2006
    #7
  8. Mingnan G.

    Mingnan G. Guest

    > Can I do 100% lock-free programming with your smart pointers?
    Maybe, I am studying those links you posted. The problem is that I want
    to provide strict deterministic finalization, all acyclic garbage
    objects are reclaimed in a predictable ordering. I am not sure
    per-epoch smart pointers may work.
     
    Mingnan G., Apr 23, 2006
    #8
  9. "Mingnan G." <> wrote in message
    news:...
    >I am glad to see your kindly messages and suggestions, thanks very
    > much.
    >
    > I am still carefully reading your posts and links. Followings are
    > questions I know to answer.
    >
    >> What sort of transactions occur between your collector and its mutators
    >> during a "scavenge" operation?

    > During a "scavenge" operation, the system uses multi-threading
    > synchronization mechanism between collector and mutators. There is some
    > overhead. However, since most objects are acyclic, scavenge operation
    > need not run as frequently as prior-art garbage collector.
    > I recommend
    > tracing garbage collection runs with higher priority than normal
    > application threads, but lower than real-time threads. So that,
    > expensive tracing GC can finished sooner, and real-time threads are not
    > affected.
    >
    >> Do your mutators have any #StoreLoad dependencies during a scavenge
    >> operation? I take it that the collector can indeed interrupt mutators
    >> during scavenges, correct?

    > Mutators have #StoreLoad dependencies during a scavenge operation. The
    > collector can interrupt any application threads at any place, and start
    > a tracing collection without delay.
    >
    >> Do your smart pointers have an atomic CAS or XCHG operation?

    > Yes, but not always. During the "scavenge" operation, there is some
    > complicate synchronization overhead. These synchronization will use
    > atomic CAS or XCHG.


    Okay. So you are using atomic operations for the scavenge operations...
    Humm...




    >> your_weakptr< Foo > c_Foo, x_Foo = your_new( Foo ... );

    > In this system, your_weakptr< Foo > will not keep the newly created Foo
    > object alive. Therefore, when the execution of this statement is
    > finished, the newly created object is reclaimed immediately. That maybe
    > not you wanted.


    Your correct. I wanted the newly created Foo object (x_Foo) to be CAS'd into
    the global "smart-pointer" (p_Foo), therefore keeping it alive. The "old"
    object (c_Foo) is the one that can be garbage collected. Humm, I wonder if I
    can do atomic lock-free COW or PCOW based algorithms with your setup...
    Humm... How about this:


    static your_strongptr< Foo > p_Foo = your_new( Foo ... );


    {
    your_strongptr< Foo > c_Foo, x_Foo = your_new( Foo ... );
    do {
    c_Foo = p_Foo;
    if ( c_Foo ) {
    x_Foo->LocalModify( c_Foo );
    }
    } while ( p_Foo.cas( c_Foo, x_Foo ) );

    if ( c_Foo ) {
    c_Foo->ProcessExchangedWith( x_Foo );
    }
    }


    Now there is no weak pointers, could this work in your setup?
     
    Chris Thomasson, Apr 23, 2006
    #9
  10. "Mingnan G." <> wrote in message
    news:...
    >> Can I do 100% lock-free programming with your smart pointers?

    > Maybe, I am studying those links you posted. The problem is that I want
    > to provide strict deterministic finalization, all acyclic garbage
    > objects are reclaimed in a predictable ordering. I am not sure
    > per-epoch smart pointers may work.


    Well, one of the good things about per-epoch smart pointers is that they are
    very friendly to large shared linked data-structures. Reader threads can
    walk through a such a data-structure by using normal raw pointers, and
    normal loads and stores to access the structures nodes:


    {
    gc::scoped_non_quiescent collected_scope;

    node_t *nx, *n = collection::head;
    while ( n ) {
    nx = n->nx;
    whatever( n );
    n = nx;
    }
    }




    When you are using per-object smart pointers, reader threads have to access
    the smart pointer logic every time they access a node. The larger the linked
    data-structure, the more overheadgets; negative scalability:


    {
    your_strongptr< node_t > nx, n = collection::head;
    while ( n ) {
    nx = n->nx;
    whatever( n );
    n = nx;
    }
    }



    This is a simple example of why per-object stuff can be "very" expensive...
     
    Chris Thomasson, Apr 23, 2006
    #10
  11. Mingnan G. wrote:
    > I am glad to see your kindly messages and suggestions, thanks very
    > much.
    >
    > I am still carefully reading your posts and links. Followings are
    > questions I know to answer.
    >
    >
    >>What sort of transactions occur between your collector and its mutators during a "scavenge" operation?

    >
    > During a "scavenge" operation, the system uses multi-threading
    > synchronization mechanism between collector and mutators. There is some
    > overhead. However, since most objects are acyclic, scavenge operation
    > need not run as frequently as prior-art garbage collector. I recommend
    > tracing garbage collection runs with higher priority than normal
    > application threads, but lower than real-time threads. So that,
    > expensive tracing GC can finished sooner, and real-time threads are not
    > affected.
    >
    >
    >>Do your mutators have any #StoreLoad dependencies during a scavenge operation? I take it that the collector can indeed interrupt mutators during scavenges, correct?

    >
    > Mutators have #StoreLoad dependencies during a scavenge operation. The
    > collector can interrupt any application threads at any place, and start
    > a tracing collection without delay.


    Humm... I am curious as to how you are implementing the interrupt logic???
     
    Chris Thomasson, Apr 23, 2006
    #11
  12. Mingnan G.

    Mingnan G. Guest

    Interrupt logic in this system is quite straight. All application code
    is fully interruptible, so the mutator can be interrupted at any place.
    If an application thread invoke the garbage collection, the thread
    immediately enter the scavenge stage and starts traversals. Other
    mutators keep running concurrently with the thread, which become the
    collector. During the scavenge stage you can do anything, including
    creating new threads, creating new GC objects, etc. Higher priority
    threads might preempt the collector at any place, but this will cause
    the collector running longer to finish a collection.

    Mingnan G.
     
    Mingnan G., Apr 24, 2006
    #12
  13. Mingnan G.

    Mingnan G. Guest

    I do not quite understand your "per-epoch" smart pointers. In this
    system, smart pointers physically is just a word of object address. It
    traps assignment to pointers. Converting a strong smart pointer to a
    raw pointer has no overhead through C++ compiler's optimization, just
    like normal C/C++ pointer casting. On the contrary, converting a raw
    pointer to strong smart pointer will indead cause a little overhead.
    The overhead is very small and in most cases programmers should not
    convert a raw pointer to a strong pointer. A strong pointer normally
    should derived from another strong pointer. Convertion between strong
    pointers is faster than converting from raw pointers.

    Mingnan G.
     
    Mingnan G., Apr 24, 2006
    #13
  14. Mingnan G.

    Rui Maciel Guest

    Mingnan G. wrote:

    >> ... you could just post links to download and documentation.

    > Thanks for your kind advice. I am doing that currently. I need a place
    > to put my documentation and source code, and I need some time.


    Why not release it as a sourceforge project? Sourceforge was created for
    that exact purpose. It offers web hosting, CVS/subversion, project forums,
    bug tracking and some other stuff I don't remember right now.

    Here is the URL: http://sourceforge.net/


    Hope this helps
    Rui maciel
    --
    Running Kubuntu 5.10 with KDE 3.5.2 and proud of it.
    jabber:
     
    Rui Maciel, Apr 24, 2006
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Rob Tillie

    Garbage Collector Debugging

    Rob Tillie, Aug 15, 2003, in forum: ASP .Net
    Replies:
    11
    Views:
    1,765
    JerryK
    Aug 18, 2003
  2. Pyramis
    Replies:
    0
    Views:
    403
    Pyramis
    Jan 25, 2004
  3. Colt

    Garbage collector problem

    Colt, Nov 15, 2003, in forum: Java
    Replies:
    9
    Views:
    689
    Tim Ward
    Nov 18, 2003
  4. bart59
    Replies:
    0
    Views:
    503
    bart59
    Jun 17, 2004
  5. Will Hartung

    Garbage Collector Tuning?

    Will Hartung, Sep 8, 2004, in forum: Java
    Replies:
    3
    Views:
    2,097
    Kevin McMurtrie
    Sep 11, 2004
Loading...

Share This Page