Ruby, memory and speed

Guillaume Marcais

I have a script that aggregates data from multiple files, stores it all
in a hash, and then emits a summary on standard output. The input files
(text files) are fairly big: 4 of about 50 MB and 4 of about 350 MB.
The hash will grow to about 500,000 keys. The memory footprint of the
ruby process as reported by top is above 2 GB.

When the script starts, it processes the files at a speed of 10K/s or
so. Not lightning fast, but it will get the job done. As time goes on,
the speed drops down to 100 bytes/s or less, while still taking 100%
CPU time. Unbearable. The machine it is running on is pretty good:
4x AMD Opteron 64-bit, 32 GB memory, local SCSI RAIDed drive.

Does the performance of Ruby collapse past a certain memory usage? Like
the GC kicks in all the time.

Any clue on how to speed this up? Any help appreciated.

Guillaume.


The code is as follows:

delta and snps are IOs. reads is a hash. max is an integer (4 in my
case).
It expects a line starting with a '>' on delta. Then it reads some
information from delta (and discards the rest) and some more information
from snps (if present). All this is then recorded in the reads hash.
Each entry in the hash is an array of the 4 best matches found so far.

def delta_reorder(delta, snps, reads, max = nil)
  l = delta.gets or return
  snps_a = nil
  loop do
    l =~ /^>(\S+)\s+(\S+)/ or break
    contig_name, read_name = $1, $2
    read = (reads[read_name] ||= [])
    loop do
      l = delta.gets or break
      l[0] == ?> and break
      cs, ce, rs, re, er = l.scan(/\d+/)
      er_i = er.to_i
      cs && ce && rs && re && er or break
      l = delta.gets while l && l != "0\n"
      if snps
        snps_a = []
        er_i.times { l << snps.gets or break; snps_a << l.split[-1] }
      end
      score = (re.to_i - rs.to_i).abs - 6 * er_i
      if max
        # i = read.bsearch_upper_boundary { |x| score <=> x[1] }
        # read.insert(i, [contig_name, score, cs, ce, rs, re, er, snps_a])
        # read.slice!(max..-1) if read.size > max
        if read.size >= max
          min = read.min { |x, y| x[1] <=> y[1] }
          if score > min[1]
            min.replace([contig_name, score, cs, ce, rs, re, er, snps_a])
          end
        else
          read << [contig_name, score, cs, ce, rs, re, er, snps_a]
        end
      else
        if !read[0] || score > read[0][1]
          read[0] = [contig_name, score, cs, ce, rs, re, er, snps_a]
        end
      end
    end
  end
end
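
It gets called roughly like this (the file names are made up, just to
show the calling convention):

reads = {}
File.open("contigs_vs_reads.delta") do |delta|
  File.open("contigs_vs_reads.snps") do |snps|
    delta_reorder(delta, snps, reads, 4)
  end
end
# reads now maps each read name to (up to) its 4 best matches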


Example of data (comments after # are mine, not present in file):

# read_name (hash key) is gnl|ti|379331986
gi|56411835|ref|NC_004353.2| gnl|ti|379331986 1281640 769
246697 246940 722 479 22 22 0 # Keep this info. Collect 22 lines from snps IO
0 # Skip
440272 440723 156 617 41 41 0 # Keep this info. Collect 41 lines from snps IO
147 # Skip 'til 0
-22
-206
-1
-1
-1
-1
-1
-1
-1
-1
-1
0
441263 441492 384 152 17 17 0 # Keep. Collect lines from snps.
-44 # Skip 'til 0
-1
-1
-1
37
0
gi|56411835|ref|NC_004353.2| gnl|ti|379331989 1281640 745 # and so forth...
453805 453934 130 1 8 8 0
0
 
Robert Klemme


def delta_reorder(delta, snps, reads, max = nil)
  l = delta.gets or return
  snps_a = nil
  loop do
    l =~ /^>(\S+)\s+(\S+)/ or break
    contig_name, read_name = $1, $2

Small optimization, which will help only if delta_reorder is called often:

read = (reads[read_name.freeze] ||= [])

Background: a Hash will dup a non-frozen string used as a key, to avoid
nasty effects if the original changes.
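
For illustration, a throwaway example (keys borrowed from your sample
data) showing the dup that freezing avoids:

a = {}
k = "gnl|ti|379331986"
a[k] = []
a.keys.first.equal?(k)    # => false: the Hash stored its own frozen copy

b = {}
kf = "gnl|ti|379331989".freeze
b[kf] = []
b.keys.first.equal?(kf)   # => true: a frozen key is stored as is, no dup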

<snip/>

To make life easier for people who want to play with this, you could
provide a complete test set (original script + data files).

I don't fully understand your processing, but maybe there's a way to
improve this algorithm-wise.

Kind regards

robert
 
Bill Kelly

From: "Francis Cianfrocca said:
This is a perfect example of what I've noticed many times: Ruby's
performance is perfectly fast and acceptable until the working set gets a
certain (not terribly large) size, then it falls off a cliff. GC perhaps has
something to do with it, but I suspect that's only a small part of the
problem.

Can anyone speculate as to what other subsystems or facets of Ruby's
architecture, besides GC, might possibly cause this?


Regards,

Bill
 
Guillaume Marcais

On Aug 17, 06, at 10:45, Robert Klemme wrote:

Small optimization, which will help only if delta_reorder is called
often:

read = (reads[read_name.freeze] ||= [])

Background: a Hash will dup a non-frozen string used as a key, to avoid
nasty effects if the original changes.

<snip/>

To make life easier for people who want to play with this, you could
provide a complete test set (original script + data files).

Will do, when I get to my office.

Guillaume.
 
Francis Cianfrocca

Can anyone speculate as to what other subsystems or facets of Ruby's
architecture, besides GC, might possibly cause this?

Total, rank speculation here, but I don't suspect GC because I
consistently see this kind of problem even when a lot of objects are
being created and not many are becoming garbage. My wild
unsubstantiated guess is that Ruby creates data structures in memory
that are not designed to maximize processor- and L2-cache
utilization. Perhaps data objects overflow cache lines, or they are
heavily pointer-driven (requiring more memory-bus cycles than is
optimal), or they exhibit poor locality of reference.
 
Guillaume Marcais

On Aug 17, 06, at 11:16, Francis Cianfrocca wrote:
This is a perfect example of what I've noticed many times: Ruby's
performance is perfectly fast and acceptable until the working set gets
a certain (not terribly large) size, then it falls off a cliff. GC
perhaps has something to do with it, but I suspect that's only a small
part of the problem.

Is there any tuning of the GC so it kicks in less frequently when the
memory consumption is large? Or could it be that the Hash algorithm
chokes when the number of keys gets large (like > 100_000)?

I guess I could re-implement in C or C++, but it saddens me because it
is a quick script for (probably) a one-time task and doing string
processing is so much easier in Ruby. I'll try in Perl first.

Before I spend a lot of time understanding your algorithm: is there a
specific reason why you need to keep the entire set in memory? You say
you're generating summary output at the end of the run, but can you
accumulate your summary in an object of fixed size?

No, I can't (at least I don't see how). I have n reads split into ns
files and m contigs split into ms files. Then all ns read files are
matched against all ms contig files, which leads to ns * ms match (or
delta) files. And each read may have one or many matches with any of
the contigs.

I want to extract for each read the 4 "best" matches. (The score is
defined as the length of the match minus 6 times the number of errors
in this match; we are talking about DNA alignment, so a match can have
a small number of mismatches, due to sequencing errors, mutations,
etc.) The snps here are just added information, not crucial, but they
probably add to the problem because they increase the size of the data
set. They give the exact position of all the single mismatches inside a
match. The number of lines (one mismatch per line) is given by the
sub-header in the delta file.

So for every set of reads, I parse the corresponding ms delta files and
extract the best 4 matches for each read. The routine shown before
parses only one file. The same hash is used for all ms files.

Or does your summary depend on some kind of transformation on the
entire set (like a numeric sort)? If so, there are things you can do to
improve the algorithm so you're not keeping a large working set in
memory.

Maybe. I haven't seen it yet. And I went first with the dumb algorithm,
as it is an ad-hoc script, not part of some great piece of software.

Thanks,
Guillaume.
 
Mauricio Fernandez

When the script starts, it processes the files at a speed of 10K/s or
so. Not lightning fast, but it will get the job done. As time goes on,
the speed drops down to 100 bytes/s or less, while still taking 100%
CPU time. Unbearable. The machine it is running on is pretty good:
4x AMD Opteron 64-bit, 32 GB memory, local SCSI RAIDed drive.

Does the performance of Ruby collapse past a certain memory usage? Like
the GC kicks in all the time.

Any clue on how to speed this up? Any help appreciated.
[...]

Have you tried to increment GC_MALLOC_LIMIT in gc.c?
 
David Vallner

Is there any tuning of GCC so it kicks in less frequently when the
memory consumption is large? Or could it be that the Hash algorithm
chokes when the number of keys gets large (like > 100_000)?

If I remember a past thread correctly, the Ruby GC is set to keep memory
usage below 8 MB by default. This is determined by a #define in the
interpreter source code, and I don't think it's really an adaptive
algorithm. If GC is the performance problem, you could try changing that
constant and rebuilding the interpreter.

As for the Hash problem, profiling could probably tell you that. Of
course, profiling with such a huge dataset when you already know the code
gets anomalously slow might take really, really long.
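
For example, something along these lines with the stdlib profiler on a
small input; the delta_reorder call here is just a placeholder for
whatever part you suspect:

require 'profiler'

Profiler__.start_profile
delta_reorder(delta, snps, reads, 4)   # placeholder for the hot code path
Profiler__.stop_profile
Profiler__.print_profile($stderr)

# or, to profile the whole script:  ruby -rprofile script.rb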

David Vallner
 
guillaume.marcais

Mauricio said:
[...]

Have you tried to increment GC_MALLOC_LIMIT in gc.c?

No. Any hint on what value I can raise it to? Any upper limit that I
should not cross?

Guillaume.
 
guillaume.marcais

David said:

As for the Hash problem, profiling could probably tell you that. Of
course, profiling with such a huge dataset when you already know the code
gets anomalously slow might take really, really long.

Yeah, that's my problem. I did some profiling on a smaller subset,
and that's why I removed the binary search (see posted code). A binary
search on a sorted array of only 4 elements doesn't buy you anything; it
was even slower. But I never tried with a dataset that would push the
memory consumption, so I got no insight on that.

Guillaume.
 
John Carter

Small optimization, which will help only if delta_reorder is called often:

read = (reads[read_name.freeze] ||= [])

Background: a Hash will dup a non-frozen string used as a key, to avoid
nasty effects if the original changes.

Stole that for
http://rubygarden.org:3000/Ruby/page/show/RubyOptimization


Suggestions for the Original Poster...

* Browse that Wiki page, it may have something for you. (Alternatively,
once you solve your problem add the solution to that page!)

* If you are on Linux, use "vmstat 5"
eg.
vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 127628 14308 4548 198096 1 1 28 7 27 20 8 1 86 5
0 0 127628 14308 4548 198096 0 0 0 0 421 835 0 0 100 0


Watch the "si" and "so" (Swap In / Swap Out) columns. If you are doing 2 or
more swaps every 5 seconds, then you don't have Ruby GC problems, you
have memory problems, i.e. tweaking the GC won't help. You have to store
less in RAM, full stop. Remember to set any dangling references that you
won't use again to nil, especially class variables and globals.
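
Something like this (names invented), so the collector can actually
reclaim the data:

$all_matches = load_everything()   # hypothetical big structure held by a global
emit_summary($all_matches)
$all_matches = nil                 # drop the last reference...
GC.start                           # ...so the GC can reclaim it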





John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : (e-mail address removed)
New Zealand

Carter's Clarification of Murphy's Law.

"Things only ever go right so that they may go more spectacularly wrong later."

From this principle, all of life and physics may be deduced.
 
M. Edward (Ed) Borasky

Bill said:
Can anyone speculate as to what other subsystems or facets of Ruby's
architecture, besides GC, might possibly cause this?


Regards,

Bill

I've posted a script for building a "gprof-enabled" Ruby on GNU/Linux
systems at

http://rubyforge.org/cgi-bin/viewvc.cgi/MatrixBenchmark/?root=cougar

It's a pretty simple process to dig into at least the CPU portion of the
equation -- where is the Ruby interpreter spending cycles interpreting
your script? Some more sophisticated tools are required to profile
memory usage, but it's more or less impossible for a normally-designed
program to abuse memory without spending CPU cycles doing it. :)

Another simple thing you can do is run "top" while your Ruby script is
executing. On Windows, the Task Manager will do the same thing in the
"Process" tab. Sort in descending order on CPU and if your script is at
the "top", it's spending CPU time. Sort in descending order on memory
size and if your script is at the top without a lot of CPU usage, it's a
memory management bottleneck.
 
Guillaume Marcais

On Aug 18, 06, at 00:07, John Carter wrote:
Small optimization, which will help only if delta_reorder is called
often:

read = (reads[read_name.freeze] ||= [])

Background: a Hash will dup a non-frozen string used as a key, to avoid
nasty effects if the original changes.

Stole that for
http://rubygarden.org:3000/Ruby/page/show/RubyOptimization


Suggestions for the Original Poster...

* Browse that Wiki page, it may have something for you. (Alternatively,
once you solve your problem add the solution to that page!)

Thanks for the pointer.

I also used Mmap#scan. It is pretty elegant compared to the usual:

io.each { |l|
  l.chomp!
  next unless l =~ /blah/
  ...
}

I would think it is faster too (no formal testing done).

* If you are on Linux, use "vmstat 5" and watch the "si" and "so"
(Swap In / Swap Out) columns. If you are doing 2 or more swaps every 5
seconds, then you don't have Ruby GC problems, you have memory
problems, i.e. tweaking the GC won't help. You have to store less in
RAM, full stop. Remember to set any dangling references that you won't
use again to nil, especially class variables and globals.

Checked that. But no, the machine has plenty of memory and si/so was
always 0.

I finally went back to doing stream parsing. Instead of aggregating the
information from many files in one big hash, I read and write one
record at a time in a format suitable for sort (the UNIX command). Then
I pipe it to sort. Finally, I merge all relevant files together with
'sort -m'. This led to the result in a few hours only.
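
The last pass then only has to keep the best 4 per read while
streaming. A simplified sketch (not the exact script; it assumes each
record is one line starting with the read name, already sorted by read
name and then by descending score):

current = nil
kept = 0
ARGF.each do |line|
  read_name = line[/\A\S+/]
  if read_name != current
    current = read_name
    kept = 0
  end
  next if kept >= 4
  kept += 1
  print line
end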

Thank you all for your suggestions,
Guillaume.
 
Mauricio Fernandez

Mauricio said:
Have you tried to increment GC_MALLOC_LIMIT in gc.c?

No. Any hint on what value I can raise it to? Any upper limit that I
should not cross?

You have so much RAM you could probably afford to set it to a few hundred
MBs... If malloc_limit grows quickly enough it won't really make a
difference, though. It'd be nice to know if it actually does in your case,
for future reference.
 
