Memory Question

Discussion in 'Ruby' started by zdennis, Feb 26, 2006.

  1. zdennis

    zdennis Guest

    I am doing some large queries with Mysql and the memory that gets allocated never seems to go back
    to the system. In this test I am querying 47,000 records.

    My setup for my test is Ubuntu Breezy, Rails 1.0, MySQL 5.0.18 and a compiled mysql 2.7 Ruby driver
    running in production mode. I have written a helper for my program which monitors object counts and
    memory utilization, using a 10% threshold to determine how many objects are sticking around and how
    many are being garbage collected.
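    On Linux, the resident set size can also be read straight from /proc instead of parsing ps
    output. A minimal sketch (Linux-specific; the helper name rss_mb is my own):

```ruby
# Read this process's resident set size from /proc and report it in Mb.
# Linux-specific: parses the VmRSS line of /proc/<pid>/status.
def rss_mb
  status = File.read("/proc/#{Process.pid}/status")
  status =~ /^VmRSS:\s+(\d+)\s+kB/ ? ($1.to_i / 1024.0).round : nil
end

puts "#{rss_mb}Mb"
```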

    On first run...

    Loaded suite test/unit/table1_test
    Started
    String count 73685
    Building query 21Mb
    String count 73745

    Received query results 89Mb
    String count 901034
    Threshold breaker String (827236) started w/ 73743 ended w/ 900979
    Threshold breaker Table1 47000 started w/ 0 ended w/ 47000

    Starting GC 89Mb
    String count 25041
    Done with GC 82Mb
    Threshold breaker String -48704 started w/ 73743 ended w/ 25039

    The first block of text shows that when I started my test Ruby was utilizing 21Mb of memory and over
    70,000 Strings were in existence. After the query results were constructed there were over 900,000
    Strings in existence and my Ruby process had grown to 89Mb. The threshold output shows a gain of
    47,000 Table1 objects and 827,236 Strings since the first object count was captured (which
    occurred right before the original String count).

    After garbage collecting, the String count is down 48,704 from when it was first captured, and there
    is no threshold break for Table1 because when the program first started there were 0 in existence,
    meaning that 0 Table1 objects remain. However, the process's memory size never drops below 82Mb.

    If I query again, memory goes up to 133Mb; after all Strings and ActiveRecord models have been
    garbage collected, memory goes down ever so slightly again. And this continues as long as I keep
    querying.

    The test is broken out into three methods:
    - test_build_mem_usage (count objects, perform query, store results in local variable )
    - test_gc (count objects, GC.start)
    - test_z (done, recount objects since GC is done)

    I guess my biggest unknown at the moment is: as I do large queries, is Ruby just hanging onto that
    space? Why would it keep growing the next time I do a 47,000-item query, if it already had unused
    space available from my last query?
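    One way to keep the working set small while scanning a big table is to page through it in
    fixed-size batches, so only one batch of objects is live at a time. A sketch (the `fetch` helper
    and row count are stand-ins; in the real test it would be Table1.find :all, :limit => batch,
    :offset => offset):

```ruby
# Stand-in for the 4,700-row table; in the real app each element
# would be an ActiveRecord model instance.
ALL_ROWS = (1..4700).to_a

# Stand-in for Table1.find(:all, :limit => limit, :offset => offset)
def fetch(limit, offset)
  ALL_ROWS[offset, limit] || []
end

processed = 0
offset = 0
batch = 1000
loop do
  rows = fetch(batch, offset)
  break if rows.empty?
  rows.each { |row| processed += 1 }  # work on each row here
  offset += batch
  GC.start  # give the collector a chance between batches
end
puts processed  # → 4700
```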

    At the bottom of this post is the actual test schema and test code I was using.

    Zach

    ---- start schema ----
    create table table1 (
      id int unsigned not null auto_increment,
      description varchar(255),
      store_name varchar(255),
      address1 varchar(40),
      address2 varchar(40),
      city varchar(40),
      state varchar(15),
      zip_code varchar(5),
      primary key( id )
    ) TYPE=MyISAM;


    ---- start test code ----
    # hook into the Rails environment
    require File.dirname(__FILE__) + '/../test_helper'

    require 'table1'

    class Object
      def count_objects
        objects = Hash.new{ |h,k| h[k] = 0 }
        ObjectSpace.each_object{ |obj| objects[obj.class] += 1 }
        objects
      end

      def print_threshold_breakers hsh1, hsh2, threshold
        # threshold is in percentages
        threshold = 1.0 + 1.0 / threshold

        hsh2.each_key do |key|
          max_num = hsh1[key] * threshold
          min_num = hsh1[key] / threshold

          if hsh2[key] > max_num or hsh2[key] < min_num
            putsf "Threshold breaker #{key.to_s}",
                  "(#{hsh2[key]-hsh1[key]}) started w/ #{hsh1[key]} ended w/ #{hsh2[key]}"
          end
        end
      end

      def count_objects_for clazz
        c = 0
        ObjectSpace.each_object{ |o| c += 1 if o.is_a? clazz }
        c
      end

      def mem_usage
        # get the header and data lines from ps for this process
        line_arr = `ps -p #{Process.pid} -F`.split( /\n/ )

        # split the line array into columns of headers and data
        arr1, arr2 = line_arr.map{ |line| line.split( /\s+/ ) }

        # force the same number of elements in arr2 as there are in arr1
        # by joining any leftover elements
        column_arr = [ arr1 ]
        column_arr << arr2[0 .. arr1.size-2] + arr2[arr1.size-1 .. arr2.size-1].join( ' ' ).to_a

        # get column/data key pair array
        keypair_arr = column_arr.transpose

        # create hash
        hsh = {}
        keypair_arr.each{ |e| hsh[e[0]] = e[1] }

        # grab results from RSS, which are stored in Kb
        (hsh[ 'RSS' ].to_i / 1024.0).round.to_s << "Mb"
      end

      def putsf label, *args
        printf( "%-40.40s %-40s\n", label.to_s, args.join( ' ' ) )
      end

      def print_class_count clazz
        putsf "#{clazz.name} count", count_objects_for( clazz )
      end
    end


    class TableTest < Test::Unit::TestCase

      def test_build_mem_usage
        print_class_count String
        putsf 'Building mem usage', mem_usage
        h1 = @@h1 = count_objects
        print_class_count String

        records = Table1.find :all, :limit => 47000
        @@oid = records.object_id
        h2 = count_objects

        putsf 'Done building mem usage', mem_usage
        print_class_count String

        print_threshold_breakers h1, h2, 10
        sleep 2
        puts
      end

      def test_starting_gc
        putsf 'Starting GC', mem_usage

        h1 = count_objects
        GC.start

        h2 = @@h2 = count_objects

        print_class_count String
        putsf 'Done with GC', mem_usage
        print_threshold_breakers h1, h2, 10
        puts
      end

      def test_z
        test_build_mem_usage
        test_starting_gc

        test_build_mem_usage
        test_starting_gc

        test_build_mem_usage
        test_starting_gc

        print_class_count String
        print_class_count Table1
        putsf 'Done', mem_usage
        print_threshold_breakers @@h1, @@h2, 10
        puts

        ObjectSpace.each_object{ |obj| puts "FOUND THE RECORD ARRAY" if obj.object_id == @@oid }
        # ObjectSpace.each_object{ |obj| puts obj if obj.is_a? String }
      end
    end
    zdennis, Feb 26, 2006
    #1

  2. Eric Hodel

    Eric Hodel Guest

    On Feb 26, 2006, at 1:51 AM, zdennis wrote:

    > I am doing some large queries with Mysql and the memory that gets
    > allocated never seems to go back to the system. In this test I am
    > querying 47,000 records.
    >
    > [...]
    >
    > If I query again, memory goes up to 133Mb, after all Strings and
    > ActiveRecord models have been garbage collected, memory goes down
    > ever so slightly again. And this continues as long as I keep querying.


    Ruby does not feature a compacting garbage collector so you may
    simply have a lot of nearly-empty pages of memory that cannot be
    returned to the operating system.

    If you still suspect a leak I suggest you first see if you can
    achieve the same results by simply generating objects without any DB
    and also with just the mysql driver, no Rails.

    --
    Eric Hodel - - http://blog.segment7.net
    This implementation is HODEL-HASH-9600 compliant

    http://trackmap.robotcoop.com
    Eric Hodel, Feb 26, 2006
    #2

  3. Minkoo Seo

    Minkoo Seo Guest

    Eric Hodel wrote:
    > On Feb 26, 2006, at 1:51 AM, zdennis wrote:
    >
    > Ruby does not feature a compacting garbage collector so you may
    > simply have a lot of nearly-empty pages of memory that cannot be
    > returned to the operating system.
    >


    What? Is this true? It's quite disappointing. Is there any master plan
    in the ruby language for supporting compacting GC?
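    For reference, a compacting pass did eventually arrive, but only in much later Ruby releases
    (GC.compact, Ruby 2.7+); nothing like it existed in the 1.8 line discussed here. A sketch:

```ruby
# GC.compact (Ruby 2.7+) moves live objects together so that sparse
# heap pages can be released back to the allocator; guard the call so
# the script still runs on older interpreters.
GC.start
GC.compact if GC.respond_to?(:compact)
puts GC.respond_to?(:compact)
```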

    - Minkoo Seo
    Minkoo Seo, Feb 26, 2006
    #3
  4. zdennis

    zdennis Guest

    Eric Hodel wrote:
    > Ruby does not feature a compacting garbage collector so you may simply
    > have a lot of nearly-empty pages of memory that cannot be returned to
    > the operating system.
    >
    > If you still suspect a leak I suggest you first see if you can achieve
    > the same results by simply generating objects without any DB and also
    > with just the mysql driver, no Rails.
    >


    Well I did some digging, and I'm left with more questions. The following script does NOT collect the
    800,000 Strings created (it doesn't use MySQL, Rails or anything else, just core Ruby). I ran it
    alongside a couple of other scripts which utilize 150Mb+ of memory, to see if the OS would
    somehow force Ruby to GC more aggressively; it didn't, and swap was used. On my system this script
    consistently used 42Mb of memory, and the String count never went below 800,000. Why would the
    Strings not get GC'd when they are out of scope?

    ----------test1.rb----------------
    def count_objects_for clazz
      c = 0
      ObjectSpace.each_object{ |o| c += 1 if o.is_a? clazz }
      c
    end

    class A
      def run
        arr = []
        800000.times { arr << "d" * 7 }

        puts "check now"
        sleep 10
      end
    end

    puts count_objects_for( String )
    A.new.run
    puts count_objects_for( String )
    GC.start
    puts count_objects_for( String )

    Thread.new do
      loop do
        puts count_objects_for( String )
        GC.start
        sleep 5
      end
    end.join
    -------end test1.rb-------------


    Second test: the same script as above, with a slight modification to call Array#clear. This one
    actually collects the Strings. In the code below I am explicitly removing the reference by clearing
    the array arr. But in the code above I don't reference arr or its value anywhere outside the method
    it is in, so after that call returns shouldn't it be fair game for the garbage collector?

    -------- test2.rb-----------
    def count_objects_for clazz
      c = 0
      ObjectSpace.each_object{ |o| c += 1 if o.is_a? clazz }
      c
    end

    class A
      def run
        arr = []
        800000.times { arr << "d" * 7 }
        puts "check now"
        sleep 10
        arr.clear
      end
    end

    puts count_objects_for( String )
    A.new.run
    puts count_objects_for( String )
    GC.start
    puts count_objects_for( String )

    Thread.new do
      loop do
        puts count_objects_for( String )
        GC.start
        sleep 5
      end
    end.join
    -------end test2.rb-------------

    I modified this second script to run with 8 million Strings as well. When the Strings are GC'd the
    Ruby process's memory usage goes down, but not as much as I'd expect. With 8 million Strings I see
    330+Mb of memory in use, but when I GC them memory only goes down to 191Mb, where I would think it
    would fall back to single or double digits, perhaps 15Mb or 20Mb. Any reason why Ruby doesn't let
    go? Is this a problem? So far I have run this with ruby 1.8.3 (2005-06-23) [i486-linux] and
    ruby 1.8.4 (2005-12-24) [powerpc-linux] with similar results.
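    A likely reason the first script holds onto arr: Ruby 1.8's collector is conservative and scans
    the C stack, so a stale copy of the array's address left on the stack can pin it even after the
    method returns. A sketch of the usual workaround (the helper names are mine; whether the Strings
    are actually collected varies by Ruby version and platform):

```ruby
def build
  arr = []
  10_000.times { arr << "d" * 7 }
  nil  # arr goes out of scope here, but its address may linger on the C stack
end

def clobber_stack
  # allocate a handful of locals to overwrite stale stack slots that the
  # conservative collector might otherwise treat as live references
  a = b = c = d = e = f = g = h = nil
  nil
end

build
clobber_stack  # scrub the stack before collecting
GC.start
```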

    Zach
    zdennis, Feb 26, 2006
    #4
  5. gregarican

    gregarican Guest

    Minkoo Seo wrote:
    > Eric Hodel wrote:
    > > On Feb 26, 2006, at 1:51 AM, zdennis wrote:
    > >
    > > Ruby does not feature a compacting garbage collector so you may
    > > simply have a lot of nearly-empty pages of memory that cannot be
    > > returned to the operating system.
    >
    > What? Is this true? It's quite disappointing. Is there any master plan
    > in the ruby language for supporting compacting GC?
    >
    > - Minkoo Seo

    I have personally seen similar behavior. I have a Win32 Ruby CTI
    listener process (based on the library of my http://tsapi.rubyforge.org
    project) that I leave running on my desktop 24x7; it serves as a caller ID
    box for incoming phone calls to my extension. The looping script
    sleeps 1 second each iteration to help keep the CPU from maxing out.
    All I am doing is polling an event queue on the CTI server every
    second. I'm not sure what GC'ing goes on behind the scenes, since the
    objects stay in memory due to the looping construct. But after about a
    day or so the memory utilization gets to about 24 MB. Then after a few
    more days it tops 40 MB. If I don't restart the script I see problems
    with other NTVDM processes running alongside it. Perhaps I need to
    explicitly open/close the sockets on each pass?
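    If the sockets are the suspect, one pattern is to open and close the connection on every poll so
    per-pass buffers can't accumulate. A sketch (poll_events, the "POLL" command, and the host/port
    are hypothetical stand-ins for the CTI server's actual protocol):

```ruby
require 'socket'

# Open a fresh connection, send one poll request, read one reply line,
# and always close the socket so its descriptor and buffers are released.
def poll_events(host, port)
  sock = TCPSocket.new(host, port)
  begin
    sock.write("POLL\n")
    sock.gets
  ensure
    sock.close
  end
end
```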
    gregarican, Feb 27, 2006
    #5
  6. H.Yamamoto

    H.Yamamoto Guest

    Hello.

    Ruby's GC is conservative, so there is no guarantee an object is freed even if
    it is not reachable from the GC's roots.

    But anyway, I have probably found a problem with ELTS_SHARED.

    /////////////////////////////////

    def pause
    GC.start
    $stdout.puts "measure memory and hit any key..."
    $stdout.flush
    $stdin.getc
    end

    pause

    a = Array.new(10000){ "." * 1000 } # huge memory

    pause

    a.map!{|s| s[-100..-1]} # memory stays large

    pause

    a.map!{|s| s[-3..-1]} # reduces memory

    pause


    /////////////////////////////////

    This is because rb_str_substr (string.c)'s

    else if (len > sizeof(struct RString)/2 &&
             beg + len == RSTRING(str)->len && !FL_TEST(str, STR_ASSOC)) {
        str2 = rb_str_new3(rb_str_new4(str));
        RSTRING(str2)->ptr += RSTRING(str2)->len - len;
        RSTRING(str2)->len = len;
    }

    is executed at

    a.map!{|s| s[-100..-1]} # memory stays large

    rb_str_new3 generates an ELTS_SHARED RString which holds the original RString.

    When the original string becomes unreachable, it should be garbage collected.
    But the ELTS_SHARED substring references it (RString#aux->shared), so it is not
    collected until the substring itself becomes unreachable.

    I haven't confirmed this is really the cause of your problem, but there is a possibility
    that this hidden huge string eats memory. (Maybe the same thing happens with Array.)
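    If that is what's happening, a workaround on the Ruby side is to force a private copy of each
    slice, so the substring cannot hold the big original alive through a shared buffer. A sketch
    (I haven't verified this against 1.8's string.c, but String#+ always allocates a fresh buffer):

```ruby
a = Array.new(100) { "." * 1000 }   # large originals
# Copy each 100-char slice into a brand-new string; the 1000-byte
# originals are then free to be collected instead of being pinned
# by ELTS_SHARED substrings.
a.map! { |s| s[-100..-1] + "" }
puts a.first.length   # → 100
```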
    H.Yamamoto, Feb 27, 2006
    #6
  7. zdennis

    zdennis Guest

    H. Yamamoto,

    I posted my post and your reply to ruby-core. Hopefully this can get clarified or, as you suspect,
    the problem you potentially found can be fixed. Thank you for your reply.

    Zach

    H.Yamamoto wrote:
    > Hello.
    >
    > [...]
    >
    zdennis, Feb 27, 2006
    #7
