GC and low file performance when large array is allocated

Discussion in 'Ruby' started by Geert Fannes, Oct 13, 2004.

  1. Geert Fannes

    Geert Fannes Guest

    Hello,

    I noticed that Ruby's file-reading performance drops drastically while a
    large array is allocated. I think it has to do with garbage collection,
    since performance recovers when I disable the garbage collector. I
    created a small test program to illustrate the problem:

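    #
    # begin of program
    #
    allocateBefore = true
    useFileLoop = true
    disableGC = false

    GC.disable if disableGC

    # create a file containing 100000 lines of 'test'
    File.open('testfile', 'w') { |fo| 100000.times { fo.puts 'test' } }

    # allocate the large array before the loop (the slow case)
    largeArray = Array.new(20000000) if allocateBefore

    if useFileLoop
      # read the file line by line; every line becomes a new String
      File.open('testfile') do |fi|
        fi.each { |line| }
      end
    else
      # control loop without any file access
      1000000.times { |i| }
    end

    # allocate the large array after the loop (the fast case)
    largeArray = Array.new(20000000) if !allocateBefore
    #
    # end of program
    #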

    On my home PC, the above program takes 3.225 sec. If I allocate the
    large array AFTER the fi.each loop by setting allocateBefore=false, it
    takes only 0.467 sec. I get the same good performance when I disable
    garbage collection by setting disableGC=true. Unfortunately, disabling
    GC is not an option in my real application: my file is a lot larger
    and memory gets used up very quickly.

    If I toggle allocateBefore and disableGC while the 1000000.times loop
    is enabled (by setting useFileLoop=false), the difference disappears.

    Any idea what is going on here? How can I get good file performance
    with large arrays in memory?

    Greets,
    Geert Fannes.
     
    Geert Fannes, Oct 13, 2004
    #1
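
A minimal sketch for reproducing the measurement above on a current Ruby, assuming Ruby 2.1 or later (required keyword arguments, `GC.count`). The helper name `time_file_loop` and its parameters are made up for illustration and are not from the thread. It reports the wall-clock time of the read loop and how many GC runs happened during it, so the "array alive" and "no array" cases can be compared directly. The collector has changed a lot since 2004, so the numbers may look quite different from the ones quoted in the post.

```ruby
require 'benchmark'
require 'tempfile'

# Hypothetical helper (not from the thread): times reading `lines` lines
# from a temporary file while an array of `array_size` slots stays alive.
def time_file_loop(array_size:, lines: 100_000)
  Tempfile.create('gc_test') do |file|
    lines.times { file.puts 'test' }
    file.flush
    file.rewind

    large_array = Array.new(array_size)   # kept alive during the loop
    gc_runs_before = GC.count

    seconds = Benchmark.realtime do
      file.each_line { |line| }           # allocates one String per line
    end

    { seconds: seconds,
      gc_runs: GC.count - gc_runs_before,
      array_size: large_array.size }
  end
end
```

Calling `time_file_loop(array_size: 20_000_000)` and `time_file_loop(array_size: 0)` one after the other gives the two cases from the post; if the number of GC runs is similar but the time is not, the extra cost is per GC run rather than per line read.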

  2. Kent Sibilev

    Kent Sibilev Guest

    If you are on Unix, maybe you should consider using the mmap module?

    Cheers,
    Kent.
     
    Kent Sibilev, Oct 14, 2004
    #2

  3. Charles Mills

    Here is the Shark (profiler) output for allocateBefore=false, with
    everything else the same.
    Took about 1.3 seconds.
    9.9% 9.9% mach_kernel ml_restore
    6.0% 6.0% ruby memfill
    5.8% 5.8% ruby rb_yield_0
    4.7% 4.7% mach_kernel ml_set_interrupts_enabled
    4.6% 4.6% ruby saveFP
    4.5% 4.5% ruby rb_call0
    4.3% 4.3% ruby rb_eval
    3.5% 3.5% libSystem.B.dylib szone_malloc
    3.3% 3.3% libSystem.B.dylib _setjmp
    2.8% 2.8% libSystem.B.dylib szone_free
    2.6% 2.6% libSystem.B.dylib __error
    2.0% 2.0% ruby rb_newobj
    2.0% 2.0% mach_kernel hw_add_map
    1.6% 1.6% ruby rb_call
    1.6% 1.6% ruby new_dvar
    1.3% 1.3% ruby rb_funcall
    1.2% 1.2% mach_kernel tws_traverse_address_hash_list
    1.2% 1.2% ruby st_lookup
    1.2% 1.2% ruby restFP
    1.1% 1.1% ruby obj_free
    1.1% 1.1% ruby call_cfunc
    1.1% 1.1% commpage __memcpy
    1.0% 1.0% ruby io_write
    1.0% 1.0% libSystem.B.dylib fwrite
    1.0% 1.0% libSystem.B.dylib __sfvwrite
    0.9% 0.9% ruby rb_io_puts
    0.9% 0.9% ruby rb_io_fwrite
    0.8% 0.8% mach_kernel vm_fault


    Output for allocateBefore=true and everything else the same.
    Took about 9.7 seconds.
    45.9% 45.9% ruby gc_mark
    42.1% 42.1% ruby gc_mark_children
    1.1% 1.1% mach_kernel ml_restore
    1.1% 1.1% mach_kernel ml_set_interrupts_enabled
    0.6% 0.6% ruby memfill
    0.6% 0.6% ruby rb_eval
    0.5% 0.5% ruby rb_yield_0
    0.5% 0.5% ruby rb_call0
    0.4% 0.4% ruby saveFP
    0.4% 0.4% libSystem.B.dylib szone_malloc
    0.3% 0.3% libSystem.B.dylib szone_free
    0.3% 0.3% libSystem.B.dylib _setjmp
    0.3% 0.3% ruby rb_call
    0.3% 0.3% ruby call_cfunc
    0.2% 0.2% commpage __memcpy
    0.2% 0.2% mach_kernel hw_add_map
    0.2% 0.2% ruby rb_newobj
    0.2% 0.2% libSystem.B.dylib __error
    0.2% 0.2% ruby restFP
    0.2% 0.2% ruby io_write
    0.2% 0.2% ruby obj_free
    0.2% 0.2% libSystem.B.dylib __sfvwrite
    0.1% 0.1% mach_kernel vm_page_grab
    0.1% 0.1% ruby st_foreach
    0.1% 0.1% ruby rb_funcall
    0.1% 0.1% mach_kernel vm_fault
    0.1% 0.1% ruby st_lookup

    Definitely a performance hit. Pretty interesting.
    -Charlie
     
    Charles Mills, Oct 14, 2004
    #3
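
In the second profile, gc_mark and gc_mark_children together account for 45.9% + 42.1% = 88.0% of the samples, which is roughly 8.5 of the 9.7 seconds. For readers without Shark, here is a rough sketch of getting a similar split from inside Ruby with `GC::Profiler`, assuming Ruby 2.1 or later. The helper `gc_time_share` and its parameters are hypothetical, and the result only says how much time went to GC overall, not which GC phase.

```ruby
# Hypothetical helper: reports total loop time and the part spent in GC
# while an array of `array_size` slots is alive.
def gc_time_share(array_size:, strings: 100_000)
  large_array = Array.new(array_size)

  GC::Profiler.enable
  GC::Profiler.clear
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  strings.times { String.new('t') }   # one short-lived String per iteration

  total = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  gc = GC::Profiler.total_time        # seconds spent in GC since `clear`

  { total_seconds: total, gc_seconds: gc, array_size: large_array.size }
ensure
  GC::Profiler.disable
end
```

Comparing `gc_time_share(array_size: 20_000_000)` with `gc_time_share(array_size: 0)` should show whether the GC portion grows with the size of the live array on the Ruby version at hand.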
  4. Geert Fannes

    Geert Fannes Guest

    Hello,

    I played some more with the test program, and apparently the problem has
    nothing to do with file access. The program below is a simplified
    version that is more to the point. If I replace the string allocation
    t="t" with t=1, the performance drop disappears.

    #
    # begin of program
    #
    allocateBefore = true
    disableGC = false

    GC.disable if disableGC

    largeArray = Array.new(20000000) if allocateBefore

    # each iteration creates a new String object
    100000.times { |i| t = "t" }

    largeArray = Array.new(20000000) if !allocateBefore
    #
    # end of program
    #

    Any idea why string allocation (and possibly deallocation) takes so
    much more time when there is a large array in memory? Can I destroy
    an object manually? That could be helpful in combination with
    disabling garbage collection.
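
A likely reason for the t="t" versus t=1 difference: small integers are immediate values that live inside the reference itself, so assigning one creates no heap object, while every string creates a new object and enough of those eventually trigger a collection. Each collection then has to mark everything that is still reachable, including the large array, which fits the gc_mark numbers in post #3. The sketch below only demonstrates the allocation half of that argument. It assumes Ruby 2.2 or later for `GC.stat(:total_allocated_objects)`, and `allocations_during` is a hypothetical helper name.

```ruby
# Hypothetical helper: counts objects allocated while the block runs.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# String.new('t') stands in for the thread's t="t": one new String per pass.
string_allocs  = allocations_during { 100_000.times { String.new('t') } }
# The integer 1 is an immediate value, so this loop should allocate
# (almost) nothing.
integer_allocs = allocations_during { 100_000.times { 1 } }

puts "String loop:  #{string_allocs} objects allocated"
puts "Integer loop: #{integer_allocs} objects allocated"
```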

    Greets,
    Geert Fannes.
     
    Geert Fannes, Oct 14, 2004
    #4
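
On the last question in post #4: Ruby has no call that destroys a single object, so the closest thing is to drop all references to it and let the collector reclaim it. One possible compromise between "GC always on" and "GC always off" is to switch GC off only for a bounded batch of work and collect once between batches. This is a sketch under assumptions, not a tested fix for the original problem: `without_gc`, `each_line_in_batches` and `batch_size` are hypothetical names, memory still grows within each batch, and whether it helps at all depends on the Ruby version.

```ruby
# Hypothetical helper: runs the block with GC switched off, then restores
# the previous GC state and forces one collection.
def without_gc
  was_disabled = GC.disable   # returns true if GC was already disabled
  yield
ensure
  unless was_disabled
    GC.enable
    GC.start                  # one full collection per batch
  end
end

# Hypothetical helper: yields each line of `io`, with GC disabled while
# up to `batch_size` lines are read and processed.
def each_line_in_batches(io, batch_size: 10_000)
  until io.eof?
    without_gc do
      batch_size.times do
        line = io.gets or break   # stop this batch at end of file
        yield line
      end
    end
  end
end
```

With the test file from the first post this would be used as `File.open('testfile') { |fi| each_line_in_batches(fi) { |line| } }`. A larger `batch_size` means fewer collections but more memory held between them.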