Your thoughts/philosophies on manual garbage collection

Discussion in 'Ruby' started by dkmd_nielsen, Mar 8, 2007.

  1. dkmd_nielsen

    dkmd_nielsen Guest

    The process that prompted my earlier message (about deleting array
    elements) is a rather long-running process of rebuilding and
    reconfiguring parameter files. There are hundreds of files, each with as
    many as 22,000 parameters to be processed. For example, four small test
    files ran in about two minutes. There is a ton of string manipulation
    going on, which probably translates into lots of trailing string parts
    and pointers lying around in RAM...clogging it up. I was thinking of
    manually initiating garbage collection after every five or ten files
    processed. Is that a smart thing?

    What are your thoughts on manually initiated garbage collection?
    What kinds of practices result in bits and pieces of objects and
    pointers being left lying around in the ether of RAM? Are there
    tools that help see what happens to RAM while a process runs, like a
    debugger does with variables?

    Thanks for everything
    dvn
     
    dkmd_nielsen, Mar 8, 2007
    #1

  2. dkmd_nielsen

    Guest

    On Fri, 9 Mar 2007, dkmd_nielsen wrote:

    > The process that initiated my message earlier (about deleting array
    > elements) is a rather long running process of rebuilding and
    > reconfiguring parameter files. There hundreds of files, each with as
    > many as 22,000 parameters to processed. For example, four small test
    > files ran in about two minutes. There is a ton of string manipulation
    > going on, which probably translated into lots of trailing string parts
    > and pointer laying around RAM...clogging it up. I was thinking of
    > manually initiating garbage collection after every five or ten files
    > processed. Is that a smart thing?
    >
    > What are yours thoughts on manually initiated garbage collection?
    > What kinds of practices result in bits and pieces of objects and
    > pointers being left laying around in the ether of RAM? Are there
    > tools that help see what happens to RAM while a process runs, like a
    > debugger does with variables?
    >
    > Thanks for everything
    > dvn


    if you can fork - that's the best - then you just let each child's death clean
    up that sub-segment of work's memory.
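
    A minimal sketch of the fork-per-file idea, assuming a Unix-like
    platform where Kernel#fork is available. The helper name
    process_each_in_child is hypothetical, and the block stands in for
    whatever per-file work the real script does:

```ruby
# Hypothetical helper: run the given block for each path in its own
# child process, one child at a time. Everything the block allocates
# lives in the child and is returned to the OS when the child exits.
def process_each_in_child(paths)
  paths.map do |path|
    pid = fork do
      yield path   # per-file work happens only in the child
      exit!(0)     # leave immediately, skipping inherited at_exit handlers
    end
    Process.wait(pid)   # block until this child has finished
    $?.success?         # true if the child exited with status 0
  end
end
```

    Called as process_each_in_child(files) { |f| rebuild(f) }, where
    rebuild is a placeholder, it returns one true/false per file, so a
    failed child can be spotted without the parent accumulating garbage.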

    -a
    --
    be kind whenever possible... it is always possible.
    - the dalai lama
     
    , Mar 8, 2007
    #2

  3. On 08.03.2007 23:18, wrote:
    > On Fri, 9 Mar 2007, dkmd_nielsen wrote:
    >
    >> The process that initiated my message earlier (about deleting array
    >> elements) is a rather long running process of rebuilding and
    >> reconfiguring parameter files. There hundreds of files, each with as
    >> many as 22,000 parameters to processed. For example, four small test
    >> files ran in about two minutes. There is a ton of string manipulation
    >> going on, which probably translated into lots of trailing string parts
    >> and pointer laying around RAM...clogging it up. I was thinking of
    >> manually initiating garbage collection after every five or ten files
    >> processed. Is that a smart thing?


    To OP: generally "manual" GC is considered bad since it interferes with
    the automatic mechanism.

    >> What are yours thoughts on manually initiated garbage collection?
    >> What kinds of practices result in bits and pieces of objects and
    >> pointers being left laying around in the ether of RAM? Are there
    >> tools that help see what happens to RAM while a process runs, like a
    >> debugger does with variables?
    >>
    >> Thanks for everything
    >> dvn

    >
    > if you can fork - that's the best - then you just let each child's death
    > clean
    > up that sub-segment of work's memory.


    Also, forking has the added advantage of better utilizing multi-core CPUs.

    If you do encounter excessive memory usage then you should

    a) make sure you do not hold onto stuff longer than needed

    b) check your algorithms for inefficient handling of objects; since you
    mention string processing, this is a typical gotcha:

    s += "foo" # creates a new string
    s << "foo" # just appends to s

    Another one

    a=[]
    a += ["foo", "bar"] # creates another array
    a << "foo" << "bar" # just appends
    a.concat ["foo", "bar"] # just appends

    c) If files you are processing are large then you might also try to do
    some kind of stream processing where you do not have to keep the whole
    file's content in memory (if that's applicable to your problem domain).
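
    To make points b) and c) concrete, here is a small sketch. Both method
    names are made up for illustration. The first checks object identity
    to show which operator reuses the string; the second reads a file line
    by line with File.foreach, so the whole file is never held in memory:

```ruby
# Returns [reused_by_shovel, reused_by_plus_equals].
def append_identity_check
  a = String.new("foo")
  a_before = a
  a << "bar"            # appends to the existing String object

  b = String.new("foo")
  b_before = b
  b += "bar"            # allocates a new String and rebinds b

  [a.equal?(a_before), b.equal?(b_before)]
end

# Hypothetical helper: count the lines matching a pattern while holding
# only one line of the file in memory at a time.
def count_matching_lines(path, pattern)
  count = 0
  File.foreach(path) { |line| count += 1 if line =~ pattern }
  count
end
```

    append_identity_check returns [true, false]: only << keeps working on
    the same object, while += leaves the old string behind for the GC.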

    Kind regards

    robert
     
    Robert Klemme, Mar 9, 2007
    #3
  4. wrote:
    > On Fri, 9 Mar 2007, dkmd_nielsen wrote:
    >
    >> The process that initiated my message earlier (about deleting array
    >> elements) is a rather long running process of rebuilding and
    >> reconfiguring parameter files. There hundreds of files, each with as
    >> many as 22,000 parameters to processed. For example, four small test
    >> files ran in about two minutes. There is a ton of string manipulation
    >> going on, which probably translated into lots of trailing string parts
    >> and pointer laying around RAM...clogging it up. I was thinking of
    >> manually initiating garbage collection after every five or ten files
    >> processed. Is that a smart thing?
    >>
    >> What are yours thoughts on manually initiated garbage collection?
    >> What kinds of practices result in bits and pieces of objects and
    >> pointers being left laying around in the ether of RAM? Are there
    >> tools that help see what happens to RAM while a process runs, like a
    >> debugger does with variables?
    >>
    >> Thanks for everything
    >> dvn

    >
    > if you can fork - that's the best - then you just let each child's death
    > clean
    > up that sub-segment of work's memory.


    One caution: mark-and-sweep GC and fork don't always play well together,
    in terms of sharing memory pages. The mark algorithm needs to touch all
    live objects in the heap. The child inherits the parent's heap, with
    copy on write. If the parent has a large heap, and the child does a GC,
    all those pages are copied into the child's address space. Memory
    usage will scale badly as the number of child processes grows. (Perhaps
    you factor your process into one child for each of the hundreds of files?)

    It can be a good idea to GC.disable in the child, in some cases:

    - parent has large heap, and

    - child lifespan and allocation rate are such that it does not need to GC
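
    A rough sketch of that pattern, assuming a Unix-like platform with
    Kernel#fork. The helper name run_without_gc is hypothetical; GC.disable
    is the only GC call used:

```ruby
# Hypothetical helper: run a short-lived job in a child process with
# garbage collection switched off, so the child never runs a mark phase
# over the heap pages it shares with the parent.
def run_without_gc
  pid = fork do
    GC.disable   # affects only this child process
    yield
    exit!(0)     # leave without running inherited at_exit handlers
  end
  Process.wait(pid)
  $?.success?    # true if the child exited with status 0
end
```

    This only makes sense under the two conditions above: with GC off,
    everything the child allocates stays allocated until it exits.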

    Some benchmarks:

    http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/186561

    --
    vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
     
    Joel VanderWerf, Mar 11, 2007
    #4
  5. Joel VanderWerf wrote:
    > One caution: mark-and-sweep GC and fork don't always play well together,
    > in terms of sharing memory pages. The mark algorithm needs to touch all
    > live objects in the heap. The child inherits the parent's heap, with
    > copy on write. If the parent has a large heap, and the child does a GC,
    > all those pages are copied into the child's address space. Memory usage
    > will scale badly as the number of child processes grows. (Perhaps you
    > factor your process into one child for each of the hundreds of files?)
    >
    > It can be a good idea to GC.disable in the child, in some cases:
    >
    > - parent has large heap, and
    >
    > - child lifespan and allocation rate are such that is does not need to GC
    >
    > Some benchmarks:
    >
    > http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/186561


    I looked for some extra information on this topic and found:
    http://blog.beaver.net/2005/03/ruby_gc_and_copyonwrite.html

    That's pretty disheartening news to me. I had plans to make a fcgi-like
    process manager that would take advantage of copy-on-write to reduce the
    memory footprint of a webapp by pre-loading all libraries in the parent
    process. But if ruby's GC renders COW useless... there's not much point
    anymore.

    Are there any plans to optimize ruby to make it fork-friendly?

    Daniel
     
    Daniel DeLorme, Mar 13, 2007
    #5
  6. Gary Wright

    Gary Wright Guest

    On Mar 11, 2007, at 3:26 PM, Joel VanderWerf wrote:

    > wrote:
    >> if you can fork - that's the best - then you just let each child's
    >> death clean
    >> up that sub-segment of work's memory.

    >
    > One caution: mark-and-sweep GC and fork don't always play well
    > together, in terms of sharing memory pages. The mark algorithm
    > needs to touch all live objects in the heap. The child inherits the
    > parent's heap, with copy on write.


    I think you are describing a different situation than the OP and Ara.

    If you've got hundreds of files to process and the processing is
    sufficiently complex to justify forking for each file then the parent
    just iterates over the file list forking and waiting for each child to
    process each file. The parent's address space won't have all the stale
    objects generated by the child's processing so each new child starts
    with a reasonable memory footprint.

    One fork per file is the easiest to program but if that is problematic
    for some reason you could batch things up pretty easily.
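
    One way the batching could look, as a sketch only. It assumes a
    Unix-like platform with Kernel#fork; process_in_batches and the batch
    size are illustrative choices, not something from the original script:

```ruby
# Hypothetical helper: fork one child per batch of files instead of one
# per file. The parent only slices the list and waits, so the garbage
# produced by each batch goes away when that batch's child exits.
def process_in_batches(paths, batch_size)
  paths.each_slice(batch_size).map do |batch|
    pid = fork do
      batch.each { |path| yield path }   # per-file work, in the child
      exit!(0)                           # skip inherited at_exit handlers
    end
    Process.wait(pid)   # finish this batch before starting the next
    $?.success?         # true if the whole batch exited cleanly
  end
end
```

    A larger batch_size means fewer forks but more garbage per child, so
    the right value depends on how heavy the per-file work is.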


    Gary Wright
     
    Gary Wright, Mar 13, 2007
    #6