Automatically Determining "Requires" and "Provides" information for a Ruby script or library

Scott Parkerson · Jan 19, 2007

I'm looking to write a script that examines one or more scripts
written in Ruby and programmatically determines the following:

* What does each script provide? Specifically, what classes are
defined, whether they are in a module namespace, etc.

* What does the script require to operate (i.e. dependencies on other
Ruby scripts)?

I need this because I am looking into writing Ruby support into the
Conary packaging system (see http://wiki.rpath.com/wiki/Conary).
Conary already automatically generates provides/requires information
when packaging scripts for other popular languages (e.g. Perl, Python,
Java), so I think that having Ruby support would be A Good Thing.

Ideally, I should be able to 100% accurately identify what the script
provides, and *mostly* identify requires for dependencies. I say
mostly because many scripts use tricks to load plugins that involve
using the module's filename and path examined at runtime. The main
thing is that there are no "false provides/requires" returned by my
script.

I would like the solution to not require examining gemspecs, or even
require gems at all.

If anyone has any ideas about how to go about doing this, please
respond here. Let's talk!

Thanks in advance,
Scott

gwtmp01 · Jan 19, 2007

> Ideally, I should be able to 100% accurately identify what the script
> provides, and *mostly* identify requires for dependencies.

I don't think this is technically possible because of the dynamic
nature of Ruby. That is to say that I think the task you describe
is equivalent to the halting problem as it requires the ability to
divine the intent of executable code by simply examining the code
as opposed to running it and seeing what happens.

There are certainly heuristics you could use to get pretty close to
100% for 'normal' code (searching for 'class X' and for 'require...'),
but you'll never get to 100% and you'll probably have to re-implement
a good portion of the Ruby parser in the process (i.e. to deal with
nested classes/modules). That may be good enough for your needs though.
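
As a rough illustration of the heuristic approach, here is a sketch that scans source text line by line for require calls with a literal string argument and for class/module keywords. The name scan_source and the layout of its result are made up for this example. It ignores nesting, heredocs, and =begin/=end blocks, and it reports non-literal requires as dynamic instead of guessing at them.

```ruby
# Heuristic scan of Ruby source text. Not a parser: it ignores nesting,
# heredocs, =begin/=end blocks, and anything built at runtime.
# scan_source is a hypothetical helper name.
REQUIRE_LITERAL = /^\s*require\s*\(?\s*(['"])([^'"#]+)\1\s*\)?/
REQUIRE_ANY     = /^\s*require\b/
DEFINITION      = /^\s*(?:class|module)\s+([A-Z]\w*(?:::[A-Z]\w*)*)/

def scan_source(source)
  result = { requires: [], dynamic_requires: [], defines: [] }
  source.each_line do |line|
    next if line =~ /^\s*#/            # skip full-line comments
    if (m = REQUIRE_LITERAL.match(line))
      result[:requires] << m[2]        # literal argument, safe to report
    elsif line =~ REQUIRE_ANY
      result[:dynamic_requires] << line.strip  # computed argument, unresolved
    elsif (m = DEFINITION.match(line))
      result[:defines] << m[1]         # name as written, nesting not applied
    end
  end
  result
end
```

Keeping the dynamic requires in a separate list means nothing computed at runtime is reported as a dependency, which fits the "no false requires" goal at the cost of missing some real ones.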

Maybe there is something that already does that out there? Googling...

Check out
http://www.zenspider.com/ZSS/Products/ParseTree/index.html and
http://www.zenspider.com/ZSS/Products/ParseTree/Examples/Dependencies.html
Maybe that will give you some ideas.
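
For the nested classes/modules case, a parser-based sketch is shown below. It does not use ParseTree; it assumes a Ruby that ships the Ripper library in its standard distribution, and the helper names (const_parts, collect_definitions, defined_constants) are invented for the example. Names are qualified purely by lexical nesting, so the result is an approximation of what the file defines when loaded.

```ruby
require 'ripper'

# Collect the constant names inside a class/module name node, e.g. the
# node for `A::B` yields ["A", "B"].
def const_parts(node)
  return [] unless node.is_a?(Array)
  return [node[1]] if node[0] == :@const
  node.flat_map { |child| const_parts(child) }
end

# Walk the S-expression, recording lexically qualified names on the way down.
def collect_definitions(node, nesting = [], found = [])
  return found unless node.is_a?(Array)
  if node[0] == :class || node[0] == :module
    scope = nesting + const_parts(node[1])
    found << scope.join('::')
    # Remaining children hold the superclass (if any) and the body.
    node.drop(2).each { |child| collect_definitions(child, scope, found) }
  else
    node.each { |child| collect_definitions(child, nesting, found) }
  end
  found
end

# Returns an empty list when the source does not parse.
def defined_constants(source)
  sexp = Ripper.sexp(source)
  sexp ? collect_definitions(sexp) : []
end
```

Classes created at runtime (Class.new, const_set, eval) are still invisible to this kind of scan, which is the limitation described above.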

Gary Wright

Scott Parkerson · Jan 19, 2007

> I don't think this is technically possible because of the dynamic
> nature of Ruby. That is to say that I think the task you describe
> is equivalent to the halting problem as it requires the ability to
> divine the intent of executable code by simply examining the code
> as opposed to running it and seeing what happens.

I think you are right, so let me amend my request a bit.

* I don't need to get the exact signature of every method provided or
required. File-based provides should be enough for what we need.
Anything more complex can be synthesized manually. In most cases, a Ruby
provide would essentially be the file path, with the matching $LOAD_PATH
entry chopped off the front and the extension removed. Thus,
/usr/lib/ruby/1.8/yaml.rb provides 'yaml'.

* Requires could be whatever was passed to require at load time. I wrote a
quick and dirty requires generator that overrides Kernel#require to
stuff each argument to require into a Set. Thus, for yaml:

/usr/lib/ruby/1.8/yaml.rb requires the following modules:
"yaml/constants"
"yaml/ypath"
"yaml/error"
"yaml/rubytypes"
"stringio"
"date"
"rational"
"syck"
"yaml/syck"
"yaml/basenode"
"date/format"
"yaml/tag"
"yaml/stream"
"yaml/types"
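
The file-based provide rule from the first bullet (load path chopped off the front, extension removed) could be sketched roughly like this. provide_for is a hypothetical helper name, and picking the longest matching load-path entry is my own assumption, made so that a file under a versioned subdirectory is not named relative to its parent directory.

```ruby
# Derive a file-based provide from an installed path by stripping the
# longest matching load-path prefix and the file extension.
# provide_for is a hypothetical helper; load_path defaults to $LOAD_PATH.
def provide_for(path, load_path = $LOAD_PATH)
  path = File.expand_path(path)
  prefix = load_path.map { |dir| File.expand_path(dir.to_s) }
                    .select { |dir| path.start_with?(dir + File::SEPARATOR) }
                    .max_by(&:length)
  return nil unless prefix            # not under any load-path entry
  relative = path[(prefix.length + 1)..-1]
  relative.sub(/#{Regexp.escape(File.extname(relative))}\z/, '')
end

provide_for('/usr/lib/ruby/1.8/yaml.rb', ['/usr/lib/ruby/1.8'])  # => "yaml"
```

A file that is not under any load-path entry yields nil, so nothing is reported as a provide unless it could actually be reached by require.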

The big question is whether Ruby C extensions are always required by
filename (i.e. if you have a C extension called big/foo.so, require
'big/foo' will load it). In Python, this is tricky, as the shared
library name may not be the thing you use with import at all.
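
To check the filename question on a given install, one could map a feature name back to a file in roughly the way require looks it up. The sketch below is an approximation: resolve_feature is a hypothetical helper, it only tries the .rb extension and the platform's extension suffix from RbConfig::CONFIG['DLEXT'], and the interpreter's exact search order may differ.

```ruby
require 'rbconfig'

# Approximate how `require 'big/foo'` finds a file: try each load-path
# directory with the .rb extension and the platform's extension suffix.
# resolve_feature is a hypothetical helper, not part of Ruby itself.
def resolve_feature(name, load_path = $LOAD_PATH)
  extensions = ['.rb', '.' + RbConfig::CONFIG['DLEXT']]
  load_path.each do |dir|
    extensions.each do |ext|
      candidate = File.join(dir.to_s, name + ext)
      return candidate if File.file?(candidate)
    end
  end
  nil   # nothing on the load path matches this feature name
end
```

Running this over the names collected by a requires generator would show which of them resolve to shared libraries and which to plain .rb files.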

The bottom line is to have a "good enough" provides/requires mechanism
that automates packaging information. It's obviously not perfect, as
two foo.rb's might do wildly different things.

Here's the code snippet, so far (quick and dirty is probably a vast
understatement):

require 'set'

# Every argument passed to require while the target file loads.
$required = Set.new

module Kernel
  alias_method :old_require, :require

  def require(m)
    begin
      result = old_require(m)
    rescue LoadError => blargh
      # Report the missing file but keep tracing the remaining requires.
      print "warning: #{blargh}\n"
    rescue NameError
      # Ignore constants the loaded file could not resolve.
    end
    $required.add(m)
    result
  end
end

require(ARGV[0])

print "#{ARGV[0]} requires the following modules:\n"
$required.each { |file| p file }

Zev Blut · Jan 22, 2007

Hello,

> I'm looking to write a script that examines one or more scripts
> written in Ruby and programmatically determines the following:
>
> * What does the script require to operate (i.e. dependencies on other
> Ruby scripts)

I have a tool in the kwala project on rubyforge that attempts to
determine this. It uses the Java prefuse library to display a dynamic
graph for inspection, or if you don't want the Java dependency you can
have it output a static graphviz graph. It also does a few other
things like find require cycles. If you are interested take a look in
the cycle_detector.rb file.
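
For readers who only want the cycle check, here is a minimal sketch of finding a require cycle with a depth-first search. It is not taken from kwala or its cycle_detector.rb; the helper names and the graph layout (a Hash from feature name to the list of features it requires) are assumptions for the example.

```ruby
# Find one require cycle in a dependency graph given as
# { feature => [features it requires] }. Returns the cycle as an array
# of names (first and last element equal), or nil if there is none.
# Both helpers are hypothetical and not taken from kwala.
def find_require_cycle(graph)
  state = {}                           # nil = unseen, :active, :done
  graph.keys.each do |start|
    cycle = visit_feature(graph, start, [], state)
    return cycle if cycle
  end
  nil
end

def visit_feature(graph, node, path, state)
  return nil if state[node] == :done
  # An :active node is on the current path, so we have closed a loop.
  return path[path.index(node)..-1] + [node] if state[node] == :active
  state[node] = :active
  graph.fetch(node, []).each do |dep|
    cycle = visit_feature(graph, dep, path + [node], state)
    return cycle if cycle
  end
  state[node] = :done
  nil
end
```

The graph could be filled from whatever a requires generator records, one entry per file examined.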

You can find it here:
http://kwala.rubyforge.org/

I hope that helps,
Zev
