How do I reduce the memory usage of a script?

Discussion in 'Ruby' started by Scott Ellsworth, Jul 13, 2005.

  1. Hi, all.

    Please find attached a simple Ruby script that rummages through my
    iTunes files, reads the first megabyte or so, finds the encoder, and
    then prints the encoder and filename. This lets me know which tracks
    need re-ripping.

    This script blows through half a gig of RAM while running, and I really
    do not see why. It should only have perhaps a few megabytes at max in
    RAM.

    FWIW, the output looks like:
    iTunes v4.9, QuickTime 7.0.1 /Users/work/Music/iTunes/iTunes
    Music/Yellowcard/Ocean Avenue Song1.m4a
    iTunes v4.9, QuickTime 7.0.1 /Users/work/Music/iTunes/iTunes
    Music/Yellowcard/Ocean Avenue Song2.m4a
    iTunes v4.9, QuickTime 7.0.1 /Users/work/Music/iTunes/iTunes
    Music/Yellowcard/Ocean Avenue Song3.m4a

    Style and speed optimizations are accepted, but the runtime is under a
    minute now for the 5500 files I have in my library, so memory usage is
    my real problem.

    Help?

    #!/usr/bin/env ruby
    require 'find'
    def procpath(f)
      if File.file?(f) then
        if File.fnmatch("*.m4a",f) then
          found = false
          data = IO.read(f, 65536*8)
          re = /[[:alnum:]_., ]{9,}/
          data.scan(re) do |string|
            if (string =~ /QuickTime/) then
              filename = File.basename(f)
              dirname = File.dirname(f)
              # puts "#{string} #{dirname}"
              puts "#{string} #{dirname} #{filename}"
              found = true
              break
            end
          end
          if (!found) then
            puts "Unknown #{f}"
          end
        end
      elsif File.directory?(f) && !File.fnmatch(".", f) &&
            !File.fnmatch("..", f) then
        Dir.foreach(f) { |subf| procpath(subf) }
      end
    end

    Find.find("/Users/work/Music/iTunes/iTunes Music/") do |f|
      procpath(f)
    end

    Scott
    --

    Java, Cocoa, and Database consulting for the life sciences

    --
    Scott Ellsworth

    Java and database consulting for the life sciences
    Scott Ellsworth, Jul 13, 2005
    #1

  2. John Carter Guest

    On Thu, 14 Jul 2005, Scott Ellsworth wrote:

    > elsif File.directory?(f) && !File.fnmatch(".", f) &&
    > !File.fnmatch("..", f) then
    > Dir.foreach(f) { |subf| procpath(subf) }


    Why are you recursing here? Find.find does this stuff for you!

    > end
    > end
    >
    > Find.find("/Users/work/Music/iTunes/iTunes Music/") do |f|
    > procpath(f)
    > end
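
A minimal sketch of the point above, assuming a throwaway directory tree with made-up names: Find.find yields the full path of every entry and descends into subdirectories on its own, so the block only has to filter. Dir.foreach, by contrast, yields bare entry names rather than full paths. The helper name m4a_paths is hypothetical and not part of the original script.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Collect every .m4a file under root in a single traversal.
def m4a_paths(root)
  paths = []
  Find.find(root) do |path|
    # Find.find hands over full paths and recurses by itself.
    paths << path if File.file?(path) && File.fnmatch("*.m4a", path)
  end
  paths.sort
end

# Usage on a temporary tree (directory and file names are illustrative).
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "Artist", "Album"))
  FileUtils.touch(File.join(root, "Artist", "Album", "Song1.m4a"))
  FileUtils.touch(File.join(root, "Artist", "notes.txt"))
  puts m4a_paths(root)
end
```

Each file is visited once, which is why no extra recursion is needed inside the block.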



    John Carter Phone : (64)(3) 358 6639
    Tait Electronics Fax : (64)(3) 359 4632
    PO Box 1645 Christchurch Email :
    New Zealand

    Carter's Clarification of Murphy's Law.

    "Things only ever go right so that they may go more spectacularly wrong later."

    From this principle, all of life and physics may be deduced.
    John Carter, Jul 13, 2005
    #2

  3. On Jul 13, 2005, at 5:30 PM, Scott Ellsworth wrote:

    > This script blows through half a gig of RAM while running, and I
    > really
    > do not see why. It should only have perhaps a few megabytes at max in
    > RAM.


    Well, mileage may vary and all that jazz, but on my box it took up
    about 30 MB virtual according to top and about 1.5 to 2 MB physical.
    Have you tried explicitly invoking the GC?
    Logan Capaldo, Jul 13, 2005
    #3
  4. daz Guest

    Scott Ellsworth wrote:
    > Hi, all.
    >

    [...]
    >
    > This script blows through half a gig of RAM while running, and I really
    > do not see why. It should only have perhaps a few megabytes at max in
    > RAM.
    >

    [...]

    > if (!found) then
    > puts "Unknown #{f}"

    else
    data = nil
    GC.start # garbage collect
    > end



    Any better with that addition ?

    daz
    daz, Jul 13, 2005
    #4
  5. daz Guest

    (Called away from keyboard)

    Compare last with:

    if (!found) then
    puts "Unknown #{f}"
    end
    data = nil
    GC.start # garbage collect

    .... which will garbage collect more often.

    Best,

    daz
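
A small sketch of what the suggestion does, under the assumption that a plain string stands in for the chunk read from a track: dropping the last reference makes the buffer eligible for collection, and GC.start asks the interpreter to collect right away instead of waiting. GC.count is used here only to show that a collection ran; the method name release_and_collect is hypothetical.

```ruby
# Allocate a buffer, drop the reference, and force a collection.
# Returns how many GC runs happened during the call.
def release_and_collect
  runs_before = GC.count
  buffer = "x" * (65536 * 8)  # stand-in for IO.read(f, 65536*8)
  buffer = nil                # the buffer is now unreachable
  GC.start                    # request a collection immediately
  GC.count - runs_before
end

puts "GC runs during the call: #{release_and_collect}"
```

A forced collection has a cost of its own, so it may be worth timing the script with and without the GC.start call.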
    daz, Jul 13, 2005
    #5
  6. In article <>,
    "daz" <> wrote:

    > if (!found) then
    > puts "Unknown #{f}"
    > end
    > data = nil
    > GC.start # garbage collect


    This did seem to drop the memory usage on my MacOS X 10.4.2 system.

    I will investigate the Find.find command next to see if I can get rid of
    some recursion. An array of 5500 paths should not be _that_ big, at
    least in comparison with four or five levels of directory depth.

    Scott

    --
    Scott Ellsworth

    Java and database consulting for the life sciences
    Scott Ellsworth, Jul 14, 2005
    #6
  7. Scott Ellsworth wrote:
    > In article <>,
    > "daz" <> wrote:
    >
    >> if (!found) then
    >> puts "Unknown #{f}"
    >> end
    >> data = nil
    >> GC.start # garbage collect

    >
    > This did seem to drop the memory usage on my MacOS X 10.4.2 system.
    >
    > I will investigate the Find.find command next to see if I can get rid
    > of some recursion. An array of 5500 paths should not be _that_ big,
    > at least in comparison with four or five levels of directory depth.


    The problem might be that the data is still around while you enter the
    recursion. If you want to verify that this is the case you can simply do
    data = nil after processing. But: You definitely need to throw out the
    recursion from procpath() - otherwise you'll be processing directories over
    and over again (I smell something like O(n*n) here)!

    Kind regards

    robert
    Robert Klemme, Jul 14, 2005
    #7
  8. In article <>,
    "Robert Klemme" <> wrote:

    > Scott Ellsworth wrote:
    > > In article <>,
    > > "daz" <> wrote:
    > >
    > >> if (!found) then
    > >> puts "Unknown #{f}"
    > >> end
    > >> data = nil
    > >> GC.start # garbage collect

    > >
    > > This did seem to drop the memory usage on my MacOS X 10.4.2 system.
    > >
    > > I will investigate the Find.find command next to see if I can get rid
    > > of some recursion. An array of 5500 paths should not be _that_ big,
    > > at least in comparison with four or five levels of directory depth.

    >
    > The problem might be that the data is still around while you enter the
    > recursion. If you want to verify that this is the case you can simply do
    > data = nil after processing. But: You definitely need to throw out the
    > recursion from procpath() - otherwise you'll be processing directories over
    > and over again (I smell something like O(n*n) here)!


    I have removed the recursion - see below.

    A question, though: is the String.scan method I used the best way to
    scan this block of data? Every file is going to contain the string
    'QuickTime' somewhere in the first few MB, and I want everything from
    the last nonprintable character before it to the next nonprintable
    character after. I only need to read from disk until I find that string,
    and once I find it, I need only the bytes before, plus a version number
    afterwards. I certainly do not need to manipulate more than a few
    hundred characters around that magic string, and once I have read, I do
    not need to go back.

    NB - printable here is defined as [[:alnum:]_., ]; anything else
    counts as nonprintable.

    work@boggle:Desktop$ time ./detectEncoding.rb > songs.txt

    real 3m30.563s
    user 0m26.229s
    sys 0m23.746s

    New code:

    #!/usr/bin/env ruby
    require 'find'
    re = /[[:alnum:]_., ]{9,}/
    Find.find("/Users/work/Music/iTunes/iTunes Music/") do |f|
      if File.file?(f) && File.fnmatch("*.m4a",f) then
        found = false
        data = IO.read(f, 65536*8)
        data.scan(re) do |string|
          if (string =~ /QuickTime/) then
            filename = File.basename(f)
            dirname = File.dirname(f)
            puts "#{string} #{dirname}"
            # puts "#{string} #{dirname} #{filename}"
            found = true
            break
          end
        end
        if (!found) then
          puts "Unknown #{f}"
        end
        data = nil
        GC.start # garbage collect
      end
    end

    Scott

    --
    Scott Ellsworth

    Java and database consulting for the life sciences
    Scott Ellsworth, Jul 19, 2005
    #8
  9. Scott Ellsworth <> wrote:
    > In article <>,
    > "Robert Klemme" <> wrote:
    >
    >> Scott Ellsworth wrote:
    >>> In article <>,
    >>> "daz" <> wrote:
    >>>
    >>>> if (!found) then
    >>>> puts "Unknown #{f}"
    >>>> end
    >>>> data = nil
    >>>> GC.start # garbage collect
    >>>
    >>> This did seem to drop the memory usage on my MacOS X 10.4.2 system.
    >>>
    >>> I will investigate the Find.find command next to see if I can get
    >>> rid of some recursion. An array of 5500 paths should not be _that_
    >>> big, at least in comparison with four or five levels of directory
    >>> depth.

    >>
    >> The problem might be that the data is still around while you enter
    >> the recursion. If you want to verify that this is the case you can
    >> simply do data = nil after processing. But: You definitely need to
    >> throw out the recursion from procpath() - otherwise you'll be
    >> processing directories over and over again (I smell something like
    >> O(n*n) here)!

    >
    > I have removed the recursion - see below.
    >
    > A question, though: is the String.scan method I used the best way to
    > scan this block of data? Every file is going to contain the
    > string 'QuickTime' somewhere in the first few MB, and I want
    > everything from the last nonprintable character before it to the next
    > nonprintable character after. I only need to read from disk until I
    > find that string, and once I find it, I need only the bytes before,
    > plus a version number afterwards. I certainly do not need to
    > manipulate more than a few hundred characters around that magic
    > string, and once I have read, I do not need to go back.
    >
    > NB - printable here is defined as [[:alnum:]_., ]; anything else
    > counts as nonprintable.


    The problem with your script is that it does not find "QuickTime" if your
    chunk reading cuts it in half (or "Q" and "uickTime" - whatever). It might
    be easier to just slurp in the complete file (depending on size - a few MB
    are no problem) and then do the scan on the single string. Also, I don't
    understand why you don't put QuickTime into your search RE.

    Kind regards

    robert
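
One possible way to combine both remarks, offered as a sketch rather than a drop-in replacement: put QuickTime into the pattern itself, and if the file is read in chunks, carry a short tail of each chunk into the next so a match that straddles a boundary is still found. The names ENCODER_RE and find_encoder, the chunk size, and the 256-byte overlap are all assumptions; a printable run longer than the overlap could still lose its beginning.

```ruby
require 'stringio'

# Printable run that contains the literal "QuickTime".
ENCODER_RE = /[[:alnum:]_., ]*QuickTime[[:alnum:]_., ]*/

# Read io in chunks, up to limit bytes, and return the first encoder string.
def find_encoder(io, chunk_size: 65536, limit: 65536 * 8, overlap: 256)
  window = "".b
  consumed = 0
  while consumed < limit && (chunk = io.read(chunk_size))
    consumed += chunk.bytesize
    window << chunk.b
    m = ENCODER_RE.match(window)
    # Accept a match only if it ends before the window does; otherwise the
    # version number might continue in the next chunk.
    return m[0] if m && m.end(0) < window.bytesize
    # Keep the unfinished match, or a short tail, for the next round.
    keep = m ? window.bytesize - m.begin(0) : [overlap, window.bytesize].min
    window = window.byteslice(-keep, keep)
  end
  # End of input or limit reached: whatever still matches is final.
  m = ENCODER_RE.match(window)
  m && m[0]
end

# Usage: the literal is split across the first chunk boundary on purpose.
sample = "\0" * (65536 - 17) + "iTunes v4.9, QuickTime 7.0.1" + "\0" * 100
puts find_encoder(StringIO.new(sample))
```

For a real track, something like File.open(path, "rb") { |io| find_encoder(io) } would stop reading as soon as the string is found.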


    Robert Klemme, Aug 3, 2005
    #9