REXML memory consumption

Ray Chen

My process memory usage has been increasing steadily, and some probing
pointed me to REXML. I created a test that feeds 10 XML files, ranging
in size from 15 kB to 270 kB, to REXML::Document.new(), smallest to
largest. I would expect memory usage to return to ~8 MB, since the
REXML::Document should go out of scope and everything should get
garbage-collected.

Is there something wrong with my understanding of Ruby, or does REXML
hold onto memory?

===Memory Usage===
USER    PID %CPU %MEM   VSZ   RSS TTY   STAT START TIME COMMAND
ray   26354  0.0  0.2 20724  7920 pts/9 S+   22:46 0:00 ruby mem_test2.rb
ray   26354  0.0  0.2 20912  7988 pts/9 S+   22:46 0:00 ruby mem_test2.rb
ray   26354 18.0  0.3 24000 11044 pts/9 S+   22:46 0:00 ruby mem_test2.rb
ray   26354 31.0  0.4 27696 14772 pts/9 S+   22:46 0:00 ruby mem_test2.rb
ray   26354 44.0  0.5 28752 15812 pts/9 S+   22:46 0:00 ruby mem_test2.rb
ray   26354 62.0  0.5 28916 15944 pts/9 S+   22:46 0:00 ruby mem_test2.rb
ray   26354 88.0  0.6 34204 21144 pts/9 R+   22:46 0:00 ruby mem_test2.rb
ray   26354 57.5  0.7 37236 24272 pts/9 R+   22:46 0:01 ruby mem_test2.rb
ray   26354 73.5  0.8 38816 25920 pts/9 R+   22:46 0:01 ruby mem_test2.rb
ray   26354 96.0  0.9 42900 29720 pts/9 R+   22:46 0:01 ruby mem_test2.rb


===Test Code===
  require 'rexml/document'

  def construct(i)
    # create the string
    f = File.open("/tmp/#{i}.xml", 'r')
    str = ''

    while line = f.gets
      str << line
    end
    f.close

    # construct the xml
    xml = REXML::Document.new(str)
    xml = nil

    return nil
  end

  puts 'USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND'

  for j in 1..10
    construct(j)
    GC.start
    puts `ps aux | grep 'ruby mem_test2' | grep -v grep`
  end

Robert Klemme

On 2007/11/17, Ray Chen wrote:
> My process memory usage has been increasing steadily, and some probing
> pointed me to REXML. I created a test that feeds 10 XML files, ranging
> in size from 15 kB to 270 kB, to REXML::Document.new(), smallest to
> largest. I would expect memory usage to return to ~8 MB, since the
> REXML::Document should go out of scope and everything should get
> garbage-collected.
>
> Is there something wrong with my understanding of Ruby, or does REXML
> hold onto memory?

You probably just overlooked that there are two levels of memory
management (MM): Ruby's internal MM and the operating system's MM. You
only looked at the OS side. Even when memory is released internally,
that does not mean Ruby will give it back to the OS.
There have been more exhaustive discussions of this topic here (meaning
ruby-talk); you'll find them in the archives.

Cheers

robert

Jano Svitok

You can get marginally better by replacing
  # create the string
  f = File.open("/tmp/#{i}.xml", 'r')
  str = ''

  while line = f.gets
    str << line
  end
  f.close

with

  str = File.read("/tmp/#{i}.xml")

NB: Your version would be better written (with regard to
exception safety etc.) as:

  def construct(i)
    # create the string
    str = ''
    File.open("/tmp/#{i}.xml", 'r') do |f|
      while line = f.gets
        str << line
      end
    end

    # construct the xml
    xml = REXML::Document.new(str)
    xml = nil

    return nil
  end

As Robert said, there is more going on. One factor is that
Ruby allocates memory in heap blocks of increasing size.
If any object still in use lives inside a block, that block won't be
released back to the system.

I tried reusing one string as a buffer for the file, but it didn't
help [see IO#read(length, buffer)]. Another thing I tried was
passing the file itself to REXML::Document.new, but that was even worse
[File.open(...) {|f| REXML::Document.new(f) }].
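For reference, the buffer-reuse idea looks roughly like this. `IO#read(length, outbuf)` refills the same String object on each call instead of allocating a new one per chunk. The helper name `read_with_buffer`, the chunk size and the temporary test file are assumptions for illustration, and `String.new(capacity:)` needs Ruby 2.4 or later. As reported above, this alone may not change the process footprint much, since the parser still builds its own objects.

```ruby
require 'tempfile'

# Hypothetical helper: read a whole file in fixed-size chunks, reusing a
# single String as the read buffer.
def read_with_buffer(path, chunk_size = 16 * 1024)
  result = String.new                       # binary (ASCII-8BIT) accumulator
  buffer = String.new(capacity: chunk_size) # reused for every chunk
  File.open(path, 'rb') do |f|
    # IO#read returns nil at end of file; otherwise buffer holds the new chunk.
    while f.read(chunk_size, buffer)
      result << buffer
    end
  end
  result
end

Tempfile.create(['sample', '.xml']) do |tmp|
  tmp.write('<root>' + '<item/>' * 10_000 + '</root>')
  tmp.flush
  puts read_with_buffer(tmp.path, 4096).bytesize   # → 70013
end
```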

This is on Windows XP SP2.

There are some tools on the net for finding out what consumes the
memory, but most of them fall into the hacks category (no offense!).
On Windows, Ruby Memory Validator does a similar job.

Robert Klemme

> You can get marginally better by replacing
>
> with
>
>   str = File.read("/tmp/#{i}.xml")

There is an even better method for reading XML documents:

  doc = File.open("/tmp/#{i}.xml", 'rb') {|io| REXML::Document.new io}

No need to read the whole file into a large string before it is parsed
as XML.
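The one-liner above, expanded into a self-contained sketch. The temporary file, its contents and the helper name `parse_file` are assumptions for illustration. The parser reads directly from the open File, so the calling code never builds its own String copy of the whole document (REXML may still buffer input internally); binary mode (`'rb'`) also avoids newline translation on Windows.

```ruby
require 'rexml/document'
require 'tempfile'

# Hypothetical helper: hand the open IO to REXML instead of a String.
# Document.new parses everything before the block closes the file.
def parse_file(path)
  File.open(path, 'rb') { |io| REXML::Document.new(io) }
end

Tempfile.create(['doc', '.xml']) do |tmp|
  tmp.write('<?xml version="1.0"?><root><item id="1"/><item id="2"/></root>')
  tmp.flush

  doc = parse_file(tmp.path)
  puts doc.root.name                        # → root
  puts doc.root.get_elements('item').size   # → 2
end
```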
> NB: Your version would be better written (with regard to
> exception safety etc.) as:
>
>   str = ''
>   File.open("/tmp/#{i}.xml", 'r') do |f|
>     while line = f.gets
>       str << line
>     end
>   end

If I were doing the reading myself I'd choose #read over #gets,
because line reading is a form of parsing the input, and that
should be left to the XML parser.
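The difference between the two slurping styles can be sketched as follows: the `#gets` loop splits the input at every newline only to glue the pieces back together, while a single `#read` hands the bytes over unchanged and leaves all parsing to the XML parser. Both produce the same string. The helper names `slurp_with_gets` and `slurp_with_read` and the temporary file are hypothetical.

```ruby
require 'tempfile'

# Hypothetical helper: line-by-line slurp, as in the original test script
# (block form, so the file is closed even if an exception is raised).
def slurp_with_gets(path)
  str = +''
  File.open(path, 'r') do |f|
    while (line = f.gets)   # splits the input at each newline...
      str << line           # ...only to join the pieces again
    end
  end
  str
end

# Hypothetical helper: one #read call, no line splitting.
def slurp_with_read(path)
  File.open(path, 'r') { |f| f.read }
end

Tempfile.create(['lines', '.xml']) do |tmp|
  tmp.write("<root>\n" + "  <item/>\n" * 3 + "</root>\n")
  tmp.flush
  p slurp_with_gets(tmp.path) == slurp_with_read(tmp.path)   # → true
end
```

`File.read(path)` is the standard shorthand for the second helper.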
>   # construct the xml
>   xml = REXML::Document.new(str)
>   xml = nil
>
>   return nil
>   end
>
> As Robert said, there is more going on. One factor is that
> Ruby allocates memory in heap blocks of increasing size.
> If any object still in use lives inside a block, that block won't be
> released back to the system.
>
> I tried reusing one string as a buffer for the file, but it didn't
> help [see IO#read(length, buffer)]. Another thing I tried was
> passing the file itself to REXML::Document.new, but that was even worse
> [File.open(...) {|f| REXML::Document.new(f) }].

Really? Interesting. This is the form I would prefer, for the simple
reason that at no point are there two copies of the file in
memory. A quick test does show that the total memory of a process using
this idiom is higher than with the other idiom:

  $ ruby -r rexml/document -e 'd=File.open("Anwendungsdaten/Skype/shared.xml","rb") {|io| REXML::Document.new io};sleep 10'

  -> 4924 kB

  $ ruby -r rexml/document -e 'd=REXML::Document.new(File.read("Anwendungsdaten/Skype/shared.xml"));sleep 10'

  -> 4876 kB

  $ du -k Anwendungsdaten/Skype/shared.xml
  28      Anwendungsdaten/Skype/shared.xml

This is Ruby 1.8.5 on Cygwin on Windows XP SP2. I'd probably still stick
with the former approach: it seems more reasonable to let the parser
read from the IO rather than from a string, and the difference is
small.
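One way to repeat this comparison without relying on external process listings is to count the objects Ruby allocates for each idiom. This is a different measurement from the one above: it uses `GC.stat(:total_allocated_objects)` (Ruby 2.1 or later) to count Ruby-side allocations rather than process size, the input document is synthetic, and `allocated_during` is a hypothetical helper. Which idiom allocates more may depend on the REXML version, so the sketch prints the numbers rather than predicting them.

```ruby
require 'rexml/document'
require 'tempfile'

# Hypothetical helper: number of objects Ruby allocates while the block runs.
def allocated_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

Tempfile.create(['sample', '.xml']) do |tmp|
  tmp.write('<root>' + (1..2_000).map { |n| "<item n=\"#{n}\"/>" }.join + '</root>')
  tmp.flush

  # Parse once first, so one-time setup cost is not charged to whichever
  # idiom happens to be measured first.
  REXML::Document.new(File.read(tmp.path))

  from_io = allocated_during do
    File.open(tmp.path, 'rb') { |io| REXML::Document.new(io) }
  end

  from_string = allocated_during do
    REXML::Document.new(File.read(tmp.path))
  end

  puts "parse from IO:     #{from_io} objects"
  puts "parse from String: #{from_string} objects"
end
```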
> This is on Windows XP SP2.
>
> There are some tools on the net for finding out what consumes the
> memory, but most of them fall into the hacks category (no offense!).
> On Windows, Ruby Memory Validator does a similar job.

Thanks for the hint.

Cheers

robert

Ray Chen

Thanks for the hints about memory management. I started reading other
threads on the issue, and I now understand that profiling from the OS
side isn't entirely accurate.

My actual code parses from a string, so I wanted to keep
constructing the REXML::Document from a string, even if that part was
poorly written in the test script.

Thanks again.

Ray