How do I reduce the memory usage of a script?

Scott Ellsworth

Hi, all.

Please find attached a simple Ruby script that rummages through my
iTunes files, reads the first 512 KB (65536*8 bytes) of each, finds the
encoder, and then prints the encoder and filename. This lets me know
which tracks need re-ripping.

This script blows through half a gig of RAM while running, and I really
do not see why. It should only have perhaps a few megabytes at max in
RAM.

FWIW, the output looks like:
iTunes v4.9, QuickTime 7.0.1 /Users/work/Music/iTunes/iTunes
Music/Yellowcard/Ocean Avenue Song1.m4a
iTunes v4.9, QuickTime 7.0.1 /Users/work/Music/iTunes/iTunes
Music/Yellowcard/Ocean Avenue Song2.m4a
iTunes v4.9, QuickTime 7.0.1 /Users/work/Music/iTunes/iTunes
Music/Yellowcard/Ocean Avenue Song3.m4a

Style and speed optimizations are accepted, but the runtime is under a
minute now for the 5500 files I have in my library, so memory usage is
my real problem.

Help?

#!/usr/bin/env ruby
require 'find'
def procpath(f)
  if File.file?(f) then
    if File.fnmatch("*.m4a",f) then
      found = false
      data = IO.read(f, 65536*8)
      re = /[[:alnum:]_., ]{9,}/
      data.scan(re) do |string|
        if (string =~ /QuickTime/) then
          filename = File.basename(f)
          dirname = File.dirname(f)
          # puts "#{string} #{dirname}"
          puts "#{string} #{dirname} #{filename}"
          found = true
          break
        end
      end
      if (!found) then
        puts "Unknown #{f}"
      end
    end
  elsif File.directory?(f) && !File.fnmatch(".", f) &&
        !File.fnmatch("..", f) then
    Dir.foreach(f) { |subf| procpath(subf) }
  end
end

Find.find("/Users/work/Music/iTunes/iTunes Music/") do |f|
  procpath(f)
end

Scott
 
John Carter

Scott said:
  elsif File.directory?(f) && !File.fnmatch(".", f) &&
        !File.fnmatch("..", f) then
    Dir.foreach(f) { |subf| procpath(subf) }

Why are you recursing here? Find.find does this stuff for you!
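
The point about Find.find can be sketched as follows. Find.find yields the starting path and then every entry beneath it, at any depth, so the block only has to filter. Dir.foreach, by contrast, yields bare entry names rather than full paths, so the recursive call in the script above receives names relative to the current directory. The helper name m4a_files is hypothetical and only here for illustration.

```ruby
require 'find'

# Hypothetical helper: collect every .m4a file under root.
# Find.find walks the whole tree itself, so no manual recursion is needed.
def m4a_files(root)
  paths = []
  Find.find(root) do |path|
    # Keep regular files whose extension is .m4a (case-insensitive).
    paths << path if File.file?(path) && File.extname(path).downcase == ".m4a"
  end
  paths
end
```

Each path that comes back is a full path built from root, so it can be passed straight to File.basename, File.dirname, or IO.read.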


John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : (e-mail address removed)
New Zealand

Carter's Clarification of Murphy's Law.

"Things only ever go right so that they may go more spectacularly wrong later."

From this principle, all of life and physics may be deduced.
 
Logan Capaldo

Well, mileage may vary and all that jazz, but on my box it took up
about 30 MB virtual according to top, and about 1.5 to 2 MB physical.
Have you tried explicitly invoking the GC?
 
daz

Scott said:
Hi, all.
[...]

This script blows through half a gig of RAM while running, and I really
do not see why. It should only have perhaps a few megabytes at max in
RAM.
[...]

if (!found) then
  puts "Unknown #{f}"
else
  data = nil
  GC.start # garbage collect
end


Any better with that addition ?

daz
 
daz

(Called away from keyboard)

Compare last with:

if (!found) then
  puts "Unknown #{f}"
end
data = nil
GC.start # garbage collect

.... which will garbage collect more often.
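
A different way to attack the same symptom, not proposed in this thread, is to avoid allocating a fresh 512 KB string per file in the first place. The sketch below refills one buffer through the optional second argument of IO#read. The names HEAD_BYTES and each_file_head are hypothetical, and whether this lowers peak memory on a given Ruby version has not been measured here.

```ruby
HEAD_BYTES = 65536 * 8  # same read size as the script in this thread

# Hypothetical helper: yields each path with the first HEAD_BYTES of the file.
# The same String object is refilled on every iteration, so the caller must
# copy anything it wants to keep after the block returns.
def each_file_head(paths)
  buffer = String.new
  paths.each do |path|
    File.open(path, "rb") do |io|
      io.read(HEAD_BYTES, buffer)  # IO#read(length, outbuf) fills outbuf in place
      yield path, buffer
    end
  end
end
```

With this shape there is only one large string alive at a time, so an explicit GC.start per file should matter less.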

Best,

daz
 
Scott Ellsworth

daz said:
if (!found) then
  puts "Unknown #{f}"
end
data = nil
GC.start # garbage collect

This did seem to drop the memory usage on my MacOS X 10.4.2 system.

I will investigate the Find.find command next to see if I can get rid of
some recursion. An array of 5500 paths should not be _that_ big, at
least in comparison with four or five levels of directory depth.

Scott
 
Robert Klemme

Scott said:
This did seem to drop the memory usage on my MacOS X 10.4.2 system.

I will investigate the Find.find command next to see if I can get rid
of some recursion. An array of 5500 paths should not be _that_ big,
at least in comparison with four or five levels of directory depth.

The problem might be that the data is still around while you enter the
recursion. If you want to verify that this is the case you can simply do
data = nil after processing. But: You definitely need to throw out the
recursion from procpath() - otherwise you'll be processing directories over
and over again (I smell something like O(n*n) here)!

Kind regards

robert
 
Scott Ellsworth

Robert Klemme said:
The problem might be that the data is still around while you enter the
recursion. If you want to verify that this is the case you can simply do
data = nil after processing. But: You definitely need to throw out the
recursion from procpath() - otherwise you'll be processing directories over
and over again (I smell something like O(n*n) here)!

I have removed the recursion - see below.

A question, though: is the String#scan method I used the best way to
scan this block of data? Every file is going to contain the string
'QuickTime' somewhere in the first few MB, and I want from the last
nonprintable character before it to the next nonprintable character
after. I only need to read from disk until I find that string, and once
I find it, I need only the bytes before, plus a version number
afterwards. I certainly do not need to manipulate more than a few
hundred characters around that magic string, and once I have read, I do
not need to go back.

NB - printable here is defined as [[:alnum:]_., ]; anything else counts
as nonprintable.
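
One way to read only until the string turns up is to pull the file in smaller chunks and carry a short tail from one chunk to the next, so a tag split across a chunk boundary is still seen. This is a minimal sketch under stated assumptions: every name and size below (TAG_RE, CHUNK, OVERLAP, LIMIT, find_encoder) is hypothetical, and it assumes the printable run in front of 'QuickTime' is shorter than OVERLAP bytes.

```ruby
# All names and sizes below are assumptions for illustration.
TAG_RE  = /[[:alnum:]_., ]*QuickTime[[:alnum:]_., ]*/
CHUNK   = 64 * 1024   # bytes read per step
OVERLAP = 256         # tail kept between steps; assumes the tag prefix is shorter
LIMIT   = 65536 * 8   # stop after as many bytes as the original script read

# Returns the printable run around "QuickTime", or nil if none is found
# within the first LIMIT bytes of the file.
def find_encoder(path)
  File.open(path, "rb") do |io|
    window = String.new   # empty binary string
    total = 0
    while total < LIMIT && (chunk = io.read(CHUNK))
      total += chunk.bytesize
      window << chunk
      m = TAG_RE.match(window)
      if m
        # A match touching the end of the window may continue in the next
        # chunk, so accept it only when no more data will be read.
        done = io.eof? || total >= LIMIT
        return m[0] if m.end(0) < window.bytesize || done
      end
      # Keep the tail (and any unfinished match) for the next round.
      keep_from = [window.bytesize - OVERLAP, 0].max
      keep_from = [keep_from, m.begin(0)].min if m
      window = window.byteslice(keep_from, window.bytesize - keep_from)
    end
  end
  nil
end
```

For files where the tag sits near the start, this reads a single 64 KB chunk and holds at most one chunk plus the overlap in memory at a time.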

work@boggle:Desktop$ time ./detectEncoding.rb > songs.txt

real 3m30.563s
user 0m26.229s
sys 0m23.746s

New code:

#!/usr/bin/env ruby
require 'find'
re = /[[:alnum:]_., ]{9,}/
Find.find("/Users/work/Music/iTunes/iTunes Music/") do |f|
  if File.file?(f) && File.fnmatch("*.m4a",f) then
    found = false
    data = IO.read(f, 65536*8)
    data.scan(re) do |string|
      if (string =~ /QuickTime/) then
        filename = File.basename(f)
        dirname = File.dirname(f)
        puts "#{string} #{dirname}"
        # puts "#{string} #{dirname} #{filename}"
        found = true
        break
      end
    end
    if (!found) then
      puts "Unknown #{f}"
    end
    data = nil
    GC.start # garbage collect
  end
end

Scott
 
Robert Klemme

Scott Ellsworth said:
Robert Klemme said:
The problem might be that the data is still around while you enter
the recursion. If you want to verify that this is the case you can
simply do data = nil after processing. But: You definitely need to
throw out the recursion from procpath() - otherwise you'll be
processing directories over and over again (I smell something like
O(n*n) here)!

I have removed the recursion - see below.

A question, though: is the String#scan method I used the best way to
scan this block of data? Every file is going to contain the
string 'QuickTime' somewhere in the first few MB, and I want from the
last nonprintable character before it to the next nonprintable
character after. I only need to read from disk until I find that
string, and once I find it, I need only the bytes before, plus a
version number afterwards. I certainly do not need to manipulate
more than a few hundred characters around that magic string, and once
I have read, I do not need to go back.

NB - printable here is defined as [[:alnum:]_., ]; anything else
counts as nonprintable.

The problem with your script is that it does not find "QuickTime" if your
chunk reading cuts it in half (or "Q" and "uickTime" - whatever). It might
be easier to just slurp in the complete file (depending on size - a few MB
are no problem) and then do the scan on the single string. Also, I don't
understand why you don't put QuickTime into your search RE.
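
Putting 'QuickTime' into the search RE might look like the sketch below. It reuses the character class from the script and replaces the scan plus the inner =~ test with a single match. The names ENCODER_RE and encoder_tag are hypothetical. Since 'QuickTime' is itself nine characters long, any match already satisfies the {9,} length requirement of the original pattern.

```ruby
# Hypothetical constant and method names; the character class is the one
# used in the script from this thread.
ENCODER_RE = /[[:alnum:]_., ]*QuickTime[[:alnum:]_., ]*/

# Returns the printable run surrounding "QuickTime", or nil if the data
# does not contain it.
def encoder_tag(data)
  data[ENCODER_RE]
end
```

In the script this could replace the whole scan block, for example by calling encoder_tag on the string returned by IO.read and printing "Unknown" when the result is nil.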

Kind regards

robert

 
