ANN: zozo 1.0.0 Released

Jeremy Evans

= What?

zozo is a tool that makes it easy to reduce the memory footprint of your
applications by having them not load rubygems/bundler at runtime:

$ unicorn -c unicorn.conf -D
$ ps ux
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
jeremy 18226 0.0 0.5 17196 4496 ?? S 1:34PM 0:00.01 ruby: unicorn master -c unicorn.conf -D (ruby)
jeremy 8473 31.3 3.3 27180 30172 ?? S 1:34PM 0:00.62 ruby: unicorn worker[0] -c unicorn.conf -D (ruby)

$ zozo -R config.ru unicorn
$ ruby -I lib bin/unicorn -c unicorn.conf -D
$ ps ux
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
jeremy 17561 0.0 0.4 5548 3908 ?? S 1:35PM 0:00.01 ruby: unicorn master -c unicorn.conf -D (ruby)
jeremy 22626 4.2 2.0 15016 17904 ?? S 1:35PM 0:00.25 ruby: unicorn worker[0] -c unicorn.conf -D (ruby)

As you can see, the memory footprint is reduced dramatically:

master process:
VSZ: 5548/17196 => 68% reduction
RSS: 3908/4496 => 13% reduction
worker process:
VSZ: 15016/27180 => 45% reduction
RSS: 17904/30172 => 41% reduction

That's a major difference, as a 41% reduction in memory footprint means
you can host 68% more workers in the same amount of memory. It also
makes your applications faster. They will start faster because zozo
collects all necessary library files into a single directory tree, so
ruby has fewer load path entries to search on each require. They will
run faster as there will be fewer ruby objects to check every time the
garbage collector is run.
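
The percentages above follow directly from the VSZ and RSS columns of the two ps listings. As a quick check, here is a small sketch; the helper names are made up for illustration and are not part of zozo:

```ruby
# Hypothetical helpers, not part of zozo: recompute the percentages quoted
# above from the VSZ/RSS columns of the two ps listings.

# Percent reduction going from `before` to `after`, rounded to a whole number.
def percent_reduction(before, after)
  ((1 - after.to_f / before) * 100).round
end

# Percent more processes that fit in the same memory, to one decimal place.
def percent_more_workers(before, after)
  ((before.to_f / after - 1) * 100).round(1)
end

puts "master VSZ: #{percent_reduction(17196, 5548)}% reduction"
puts "master RSS: #{percent_reduction(4496, 3908)}% reduction"
puts "worker VSZ: #{percent_reduction(27180, 15016)}% reduction"
puts "worker RSS: #{percent_reduction(30172, 17904)}% reduction"
puts "extra workers: #{percent_more_workers(30172, 17904)}%"
```

The last figure comes from dividing the old worker RSS by the new one: 30172/17904 is roughly 1.685, which is where the "68% more workers" estimate comes from.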

= Why?

Rubygems is a fine package distribution system, but it is not very
efficient from a runtime memory standpoint. If your application uses
rubygems in production, every time it starts, rubygems needs to figure
out which packages to load. zozo makes it so that this calculation
is only done once, and the result is cached into a local directory.

= How?

zozo works by starting ruby and checking the current load path. It then
requires all of the command line arguments given, and checks the load
path again. Any new entries in the load path are checked and their
contents are loaded into a local directory (lib by default). By
default, zozo uses symlinks, but it can use hard links (-H) or make
copies (-c) via a command line option.
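
The load path comparison described above can be sketched in a few lines of Ruby. This is an illustration only, not zozo's actual code: the helper names (new_load_path_entries, link_load_path_entries) and the fake_gem directory are made up, only top-level entries are symlinked, and name collisions are resolved by keeping the first entry seen, which may differ from what zozo does.

```ruby
require "fileutils"
require "tmpdir"

# Return the load path entries that appeared while the block ran.
def new_load_path_entries
  before = $LOAD_PATH.dup
  yield
  $LOAD_PATH - before
end

# Symlink everything found directly under each new load path directory into
# `target`. Existing names are left alone, so the first directory wins.
def link_load_path_entries(new_dirs, target)
  FileUtils.mkdir_p(target)
  new_dirs.each do |dir|
    next unless File.directory?(dir)
    Dir.children(dir).sort.each do |name|
      dest = File.join(target, name)
      next if File.exist?(dest) || File.symlink?(dest)
      File.symlink(File.join(dir, name), dest)
    end
  end
end

added_count, linked = Dir.mktmpdir do |tmp|
  # A made-up gem directory standing in for an installed gem.
  gem_lib = File.join(tmp, "fake_gem-1.0", "lib")
  FileUtils.mkdir_p(gem_lib)
  File.write(File.join(gem_lib, "fake_gem.rb"), "FAKE_GEM_LOADED = true\n")

  added = new_load_path_entries do
    $LOAD_PATH.unshift(gem_lib) # stand-in for rubygems activating the gem
    require "fake_gem"
  end

  local_lib = File.join(tmp, "lib")
  link_load_path_entries(added, local_lib)
  $LOAD_PATH.delete(gem_lib) # the temporary directory is about to be removed

  [added.length, Dir.children(local_lib)]
end

puts "new load path entries: #{added_count}"
puts "linked into lib: #{linked.inspect}"
```

Swapping File.symlink for File.link or FileUtils.cp_r would give the hard link and copy variants, under the same assumptions.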

In addition, new entries in the load path that end in /bin are loaded
into a separate local directory (bin by default). This allows you to
run them without loading rubygems via:

ruby -I lib bin/$program

zozo adds replacement rubygems.rb, ubygems.rb, and bundler.rb files to
the lib directory it creates, so it works transparently if your program
requires rubygems and/or bundler. If you run your program without
adding the lib directory zozo creates to the load path, rubygems/bundler
will be used as before. If you run your program with the lib directory
zozo creates in the load path, then rubygems/bundler will not be loaded,
and it won't need to be because all other libraries your program uses
will already be in the load path.

= Where?

http://github.com/jeremyevans/zozo

= Who?

Jeremy Evans / (e-mail address removed)

= When?

Now:

sudo gem install zozo

= Does not work with Rails!

The replacement rubygems.rb and bundler.rb files only do the bare
minimum. The rubygems.rb file adds Kernel#gem, and the bundler.rb file
adds Bundler.setup, both of which are defined to do nothing and return
nil. No other features are mocked out. This means that frameworks that
rely on introspecting the running Gem/Bundler configuration (notably
Rails) will not work.
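
To make the limitation concrete, here is roughly what such replacement files amount to. This is a sketch based only on the description above, not a copy of the files zozo generates; the argument lists are an assumption. In practice the two definitions would live in separate rubygems.rb and bundler.rb files inside the lib directory.

```ruby
# Sketch of a replacement rubygems.rb: Kernel#gem accepts any arguments,
# does nothing, and returns nil.
module Kernel
  def gem(*args)
    nil
  end
end

# Sketch of a replacement bundler.rb: Bundler.setup accepts any arguments,
# does nothing, and returns nil.
module Bundler
  def self.setup(*args)
    nil
  end
end

# Typical application code keeps working, because these calls are now no-ops.
gem_result = gem("sinatra", ">= 1.0")
setup_result = Bundler.setup

puts gem_result.inspect
puts setup_result.inspect
```

With only these stubs on the load path, nothing else from the Gem or Bundler namespaces exists, so code that inspects the gem configuration has nothing to inspect.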

This is probably fixable, and I'll accept patches that allow zozo to
work with Rails, but I don't plan on working on the issue myself. As
Rails uses a substantial amount of memory by itself, it benefits less
from zozo than more memory friendly frameworks such as Sinatra.
 
John Barnette

Jeremy said:
Rubygems is a fine package distribution system, but it is not very
efficient from a runtime memory standpoint. If your application uses
rubygems in production, every time it starts, rubygems needs to figure
out which packages to load. zozo makes it so that this calculation
is only done once, and the result is cached into a local directory.

I appreciate the work you've done here, but I'd also be delighted to
hear some comments or patches to help improve RubyGems' memory
footprint. Did you know we're up on GitHub now? If you notice any
particularly stupid/wasteful memory stuff in RG I'd love to hear about
it.

http://github.com/rubygems


~ j.
 
Jeremy Evans

John said:
I appreciate the work you've done here, but I'd also be delighted to
hear some comments or patches to help improve RubyGems' memory
footprint. Did you know we're up on GitHub now? If you notice any
particularly stupid/wasteful memory stuff in RG I'd love to hear about
it.

http://github.com/rubygems

I'm sorry if I implied that rubygems is wasteful with memory. By "not
very efficient" I meant that it uses a lot of memory compared to other
lightweight libraries such as sequel, sinatra, and unicorn. There
probably is a good reason for rubygems' memory use.

If it's possible to save ~10MB per process by doing rubygems'
calculation once and caching the result, I definitely think it's worth
it, especially if 10MB is a good portion of the process's memory
footprint.

In terms of analyzing rubygems' memory use, I'd probably start with tmm1
and ice799's memprof: http://github.com/ice799/memprof. I haven't
actually used it, but I've seen the presentations and I'm pretty sure it
could tell you where rubygems is using memory. If I had to guess, it
has mostly to do with how much code rubygems is loading, even without
doing anything:

$ ruby -e "system('ps ux | fgrep ruby')"
jeremy 4207 0.0 0.1 1064 2700 p8 S+ 9:13PM 0:00.01 ruby -e syste
$ ruby -rubygems -e "system('ps ux | fgrep ruby')"
jeremy 5489 0.0 0.3 9200 11300 p8 S+ 9:13PM 0:00.15 ruby -rubygems -e system('ps ux | fgrep ruby')

For comparison, here is how much Sequel adds:

$ ruby -I lib -r sequel -e "system('ps ux | fgrep ruby')"
jeremy 2488 0.0 0.3 6952 8952 pe S+ 9:25PM 0:00.12 ruby -I lib -r sequel -e system('ps ux | fgrep ruby')

When you consider that rubygems' codebase is larger than Sequel's (9648
LOC for rubygems and 5548 for Sequel), it's not surprising that rubygems
takes more memory. If code size is truly the reason, the only thing you
can do is try to reduce the amount of code you load at once, if
possible. Sequel does this by not loading adapters, connection pools,
plugins, and extensions that aren't being used. Rubygems might be able
to do something similar, by only loading code necessary for the purpose
(i.e. only load the code for installing gems when the user uses gem
install). That may cause some backwards compatibility issues, though.
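
One way Ruby supports this kind of on-demand loading is Module#autoload, which registers a file to be required the first time a constant is referenced. The sketch below is only an illustration of that idea; PackageToolSketch and its Installer class are made-up names, and this is not a description of how Sequel or rubygems actually organize their loading.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical library: the core is always loaded, the installer is not.
module PackageToolSketch
  def self.version
    "0.0.0"
  end
end

# Write the rarely used part of the library to its own file.
dir = Dir.mktmpdir("lazy_load_sketch")
installer_path = File.join(dir, "installer.rb")
File.write(installer_path, <<~'RUBY')
  module PackageToolSketch
    class Installer
      def install(name)
        "installing #{name}"
      end
    end
  end
RUBY

# Register the file; nothing is loaded until Installer is first referenced.
PackageToolSketch.autoload(:Installer, installer_path)

pending_before = PackageToolSketch.autoload?(:Installer) # path: not loaded yet
message = PackageToolSketch::Installer.new.install("zozo") # triggers the load
pending_after = PackageToolSketch.autoload?(:Installer) # nil: already loaded

puts "pending before use: #{!pending_before.nil?}"
puts message
puts "pending after use: #{!pending_after.nil?}"

FileUtils.remove_entry(dir)
```

A process that never touches the installer never pays for its code, which is the effect described above.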

Jeremy
 
Roger Pack

Jeremy said:
zozo is a tool that makes it easy to reduce the memory footprint of your
applications by having them not load rubygems/bundler at runtime:

Fascinating.

I've been working on a rubygems replacement as well:

http://github.com/rdp/faster_rubygems

Mine replaces loading of full rubygems (+specs) with loading a cache
file listing known lib files. Zozo looks most excellent, and you'd
think you could rip out the guts of Rails' gem loading and it would
work fine with Rails, though that might be hard :)

The only drawback I see to zozo is that it doesn't appear to catch gem
updates. But it would work splendidly for those ok with those
restrictions, like servers :)

-r
 
Roger Pack

John said:
I appreciate the work you've done here, but I'd also be delighted to
hear some comments or patches to help improve RubyGems' memory
footprint. Did you know we're up on GitHub now? If you notice any
particularly stupid/wasteful memory stuff in RG I'd love to hear about
it.

I do have some thoughts on that, as I've been experimenting lately
with speeding up rubygems.

The first thing that comes to mind is that currently rubygems always
loads *full* rubygems when all you typically need is its require
capabilities:

ex:
=> ["c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/defaults.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/exceptions.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/version.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/requirement.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/dependency.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/gem_path_searcher.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/user_interaction.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/platform.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/specification.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/source_index.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/builder.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/config_file.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/command.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/command_manager.rb",
"c:/Ruby19/lib/ruby/gems/1.9.1/gems/gemcutter-0.5.0/lib/rubygems/commands/migrate.rb",
"c:/Ruby19/lib/ruby/gems/1.9.1/gems/gemcutter-0.5.0/lib/rubygems/commands/tumble.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/local_remote_options.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/remote_fetcher.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/gemcutter_utilities.rb",
"c:/Ruby19/lib/ruby/gems/1.9.1/gems/gemcutter-0.5.0/lib/rubygems/commands/webhook.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/version_option.rb",
"c:/Ruby19/lib/ruby/gems/1.9.1/gems/gemcutter-0.5.0/lib/rubygems/commands/yank.rb",
"c:/Ruby19/lib/ruby/gems/1.9.1/gems/specific_install-0.2.3/lib/rubygems/commands/specific_install_command.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems.rb"]

There's lots of unused stuff in there.

Suggestion: load a skeleton of files by default, until needed.

Beyond that, rubygems currently takes a relatively long time to load
all the spec files (especially rdoc-data gem), when in reality all it
needs to know is the gem version and which requireable files are in
the require_paths of each gem. It can lazy load full specs when
necessary.

Suggestion: keep a cache file with all vital information in the root
of each gem path, ex:

.../gems/1.9.1/.rubygems_cache

A further optimization is to use marshal for the cache, to avoid
having to load YAML by default, etc.
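
A minimal sketch of such a cache, to make the idea concrete. Everything here is an assumption for illustration: the hash layout, the helper names, treating "lib" as the only require path, and splitting the directory name at the last dash (which would mishandle platform-specific gem names). It is not the format faster_rubygems or rubygems uses.

```ruby
require "fileutils"
require "tmpdir"

# Scan a gems directory and record, per gem, its version, its lib directory,
# and the files that can be required from it.
def build_cache(gems_dir)
  cache = {}
  Dir.glob(File.join(gems_dir, "*", "lib")).sort.each do |lib_dir|
    full_name = File.basename(File.dirname(lib_dir)) # e.g. "fake_gem-1.0"
    name, _, version = full_name.rpartition("-")
    files = Dir.glob("**/*.rb", base: lib_dir).sort
    cache[name] = { version: version, lib: lib_dir, files: files }
  end
  cache
end

# Marshal keeps the cache readable without loading YAML.
def write_cache(cache, path)
  File.binwrite(path, Marshal.dump(cache))
end

def read_cache(path)
  Marshal.load(File.binread(path))
end

# Look up which lib directory provides a requirable feature, if any.
def find_lib_dir(cache, feature)
  entry = cache.values.find { |info| info[:files].include?("#{feature}.rb") }
  entry && entry[:lib]
end

cached = Dir.mktmpdir do |gems_dir|
  # A made-up installed gem.
  lib = File.join(gems_dir, "fake_gem-1.0", "lib")
  FileUtils.mkdir_p(lib)
  File.write(File.join(lib, "fake_gem.rb"), "")

  cache_path = File.join(gems_dir, ".rubygems_cache")
  write_cache(build_cache(gems_dir), cache_path)
  read_cache(cache_path)
end

puts cached["fake_gem"][:version]
puts cached["fake_gem"][:files].inspect
```

At require time, a lookup like find_lib_dir(cached, "fake_gem") is enough to decide which directory to add to the load path, without reading any full spec. Marshal.load should only ever be pointed at a cache file the same user wrote.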

I've been experimenting with this in faster_rubygems [1] and it
speeds up startup for jruby from 1.6s to 0.3s.

This avoids mounds of file stats and require's, and actually makes
ruby fast to start on windows.

I'd be happy to integrate something like this into normal rubygems, if
there's any interest...

Thanks!

-roger

[1] http://github.com/rdp/faster_rubygems
 
Jeremy Evans

Roger said:
Fascinating.

I've been working on a rubygems replacement as well:

http://github.com/rdp/faster_rubygems

Mine replaces loading of full rubygems (+specs) with loading a cache
file listing known lib files. Zozo looks most excellent, and you'd
think you could rip out the guts of Rails' gem loading and it would
work fine with Rails, though that might be hard :)

I tried to get Rails to work with zozo for a few hours and gave up.
Rails is pretty tied to rubygems, at least in 2.3.x. It may be easier
on Rails 3, but I haven't tried.

Roger said:
The only drawback I see to zozo is that it doesn't appear to catch gem
updates. But it would work splendidly for those ok with those
restrictions, like servers :)

That's correct. My recommendation is that whenever the gems change, "rm
-r lib bin" (or any custom lib and bin directory names), and then
regenerate them with zozo.

Jeremy
 
