Compiling Ruby code

Guest

Hi,

Once in a while the question pops up whether it is possible to compile Ruby
code to native machine code. The answer has always been no. But I keep
wondering how hard it would really be to make this possible.

Ruby is written in C. And when Ruby parses a Ruby script it converts
each statement to a C call, probably the same calls you can use on your
own in a Ruby C extension. So why wouldn't it be possible to parse a
Ruby script, convert all statements to Ruby C code, and put it in a
*.c file (instead of calling the Ruby C functions directly)? This *.c
file could then be compiled into machine code with a C compiler like gcc.
If each *.rb file were converted to a C file, it could be compiled to a
dynamically loadable library, which could then be loaded by require
statements (just like regular Ruby C extensions).

What I mean is, this...

    class Example
      def example
        puts "Hello World!"
      end
    end

... can also be written in C using the Ruby C API, am I right? So why
wouldn't it be possible to convert all Ruby code to C code using the
Ruby C API?
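
As a rough illustration of the idea (written in Ruby rather than C so it can be run directly), the class above can be built purely through runtime calls. The comments name the C API functions a generated C file would presumably use; rb_define_class and rb_define_method are real C API functions, but the mapping shown is only a sketch under that assumption, not what the interpreter actually does internally.

```ruby
# Build the Example class through runtime calls instead of class/def syntax.
# A generated C file would do something comparable via the Ruby C API.

# Roughly what rb_define_class("Example", rb_cObject) does.
Object.const_set(:Example, Class.new(Object))

# Roughly what rb_define_method(cExample, "example", fn, 0) does; the block
# stands in for the compiled C function.
Example.send(:define_method, :example) do
  # In C this would be a dynamic call such as
  # rb_funcall(self, rb_intern("puts"), 1, str).
  puts "Hello World!"
end

Example.new.example
```

Note that even in this form every call still goes through dynamic method lookup, which is where the rest of the thread ends up.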

This would probably result in some performance gain (no need to parse
the code anymore at run-time), but, more importantly for some people, you
could distribute your Ruby applications closed-source. In the future the
performance gain could perhaps be increased by performing special
optimizations during the conversion process.

Am I right on this, or am I forgetting something important which makes the
above quite hard to do?

With kind regards,

Peter
 
Robert Klemme

Nospam said:
> What I mean is, this...
>
> class Example
>   def example
>     puts "Hello World!"
>   end
> end
>
> ... can also be written in C using the Ruby C API, am I right? So why
> wouldn't it be possible to convert all Ruby code to C code using the
> Ruby C API?

I guess it *is* possible, but then you would have to bundle gcc with the
Ruby interpreter, because you must deal with dynamic redefinition of
methods and eval.
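
A minimal sketch of the problem, assuming a hypothetical Greeter class: the new method body below exists only as a string at run time, so an ahead-of-time compiler never gets to see it, and something must still be able to parse and install it while the program runs.

```ruby
class Greeter
  def greet
    "hello"
  end
end

greeter = Greeter.new
before = greeter.greet

# Source text that only exists at run time (it could come from a file,
# a socket, or user input).
source = "class Greeter\n  def greet\n    \"goodbye\"\n  end\nend\n"
eval(source)

# The same object now answers with the redefined method.
after = greeter.greet
puts "#{before} -> #{after}"
```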

> This would probably result in some performance gain (no need to parse
> the code anymore at run-time),

AFAIK at runtime there is a parse phase and an execute phase. Each line
of code is only parsed once and then converted into some kind of byte code
(not the Java flavour), so you probably don't gain much time.

> but for some people more important, you
> can distribute your Ruby applications closed-source.

That would impose certain restrictions; in particular, you would have to
forbid this:

    # just a stupid example that loads different files
    # depending on some runtime state
    case some_var
    when "foo"
      require "x-as-foo-impl.rb"
    when "bar"
      require "x-as-bar-impl.rb"
    when /^(\w+)-\w+$/
      require "x-as-#{$1}-impl.rb"
    else
      require "x-default-impl.rb"
    end

> In the future the
> performance gain maybe could be increased by performing special
> optimizations during the conversion process.
>
> Am I right on this, or do I forget something important which makes the
> above quite hard to do?

I think so, as noted above.

Regards

robert
 
Michael Neumann

Nospam said:
This would probably result in some performance gain (no need to parse
the code anymore at run-time), but for some people more important, you
can distribute your Ruby applications closed-source. In the future the
performance gain maybe could be increased by performing special
optimizations during the conversion process.

I remember that a long time ago, there was a ruby-to-c compiler (was it
called r2c?). But IIRC, there was only little performance gain. Remember
that you still need a Ruby parser, due to "eval". It would be nice, but
I'd even more like to see a bytecode compiler (written in pure Ruby
running on top of the bytecode interpreter).

Regards,

Michael
 
Scott Rubin

Michael said:
I remember that a long time ago, there was a ruby-to-c compiler (was it
called r2c?). But IIRC, there was only little performance gain. Remember
that you still need a Ruby parser, due to "eval". It would be nice, but
I'd even more like to see a bytecode compiler (written in pure Ruby
running on top of the bytecode interpreter).

Regards,

Michael

Yes, I have to vote for a bytecode compiler. Right now I'm using Ruby
to develop some software on an embedded arm-linux device. I've found
that for this application, which is not very demanding of the system,
Ruby performs pretty comparably to an equivalent C program. And
the development time saved by writing in a higher-level language is
worth its weight in gold. But every time I run a Ruby program it takes
a significant and noticeable amount of time to start up. This is
obviously due to the Ruby parser compiling the text into bytecode every
time. If it were possible to precompile this bytecode and put that on the
target machine, it would have significant advantages. Also, it would make
it possible to distribute the application without distributing its
source code. That isn't so important for us, but it may be for
others.
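
For readers on much later interpreters: CRuby 2.3 and up expose an experimental way to do roughly this through RubyVM::InstructionSequence. The sketch below assumes such a CRuby (RubyVM does not exist on other implementations, and none of this was available when this thread was written). The serialized form is tied to the exact Ruby version and platform that produced it, so treat it as a cache rather than a distribution format.

```ruby
# Assumes CRuby 2.3 or later; RubyVM is specific to that implementation.
source = "def add(a, b)\n  a + b\nend\nadd(2, 3)\n"

# Compile once (the step one would do ahead of time) ...
iseq   = RubyVM::InstructionSequence.compile(source)
binary = iseq.to_binary   # a String that could be written to disk

# ... and later load the bytecode without parsing the source text again.
loaded = RubyVM::InstructionSequence.load_from_binary(binary)
result = loaded.eval
puts result
```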

2 cents. Scott
 
gabriele renzi

On Mon, 19 Jul 2004 13:54:53 +0200, Nospam wrote:

I agree that this could be done by linking in the runtime for dynamic
behaviour.
But the performance hit would still be in the method lookup phase, as
projects like Python-to-C converters pointed out.
You need to do some clever type inference and runtime optimization,
and that won't be so easy.
Look at Starkiller (Python compiler + type inference) for a
discussion of this.
 
Kristof Bastiaensen

> Hi,

Hi

> <snip>
>
> What I mean is, this...
>
> class Example
>   def example
>     puts "Hello World!"
>   end
> end
>
> ... can also be written in C using the Ruby C API, am I right? So why
> wouldn't it be possible to convert all Ruby code to C code using the
> Ruby C API?

Yes, that's possible, but if you just convert the Ruby code literally
to C, there isn't much performance gain. For example, when you write
    arr[1] = 2   # arr is an Array
you could use rb_ary_store(arr, 1, INT2FIX(2)), but that wouldn't be
equivalent to the Ruby code. If the Array#[]= method were redefined, the
Ruby code would see the change, but the C code wouldn't. The solution
would be to use something like
rb_funcall(arr, rb_intern("[]="), 2, INT2FIX(1), INT2FIX(2)), but
then there would be no performance benefit, because Ruby still has to look
up the method. The point is that it is possible to write more efficient
code in C, but it isn't really Ruby code anymore.
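
The difference can be seen from Ruby itself. In this sketch, LoggedArray is a hypothetical subclass that redefines []=. Ordinary assignment and send both go through method lookup (like rb_funcall), while binding Array's own implementation directly plays the role of a hard-wired rb_ary_store call and silently bypasses the override.

```ruby
class LoggedArray < Array
  def log
    @log ||= []
  end

  # Redefine element assignment, as any Ruby program is free to do.
  def []=(index, value)
    log << [index, value]
    super
  end
end

arr = LoggedArray.new([0, 0, 0])

arr[1] = 2              # normal syntax: dynamic dispatch, sees the override
arr.send(:[]=, 2, 5)    # explicit dispatch by name: also sees it

# Calling Array's own implementation directly, analogous to hard-wiring
# rb_ary_store in generated C: the override is skipped.
Array.instance_method(:[]=).bind(arr).call(0, 9)

p arr.to_a
p arr.log
```

All three stores change the array, but only the first two show up in the log, which is exactly the behavioural difference a literal Ruby-to-C translation would have to avoid.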

> This would probably result in some performance gain (no need to parse
> the code anymore at run-time), but for some people more important, you
> can distribute your Ruby applications closed-source. In the future the
> performance gain maybe could be increased by performing special
> optimizations during the conversion process.

The parsing phase doesn't cost much performance, since it is only done
at load time.

> Am I right on this, or do I forget something important which makes the
> above quite hard to do?
>
> With kind regards,
>
> Peter

Well, I do think it is possible to compile Ruby, but it would be too hard.
Firstly, eval and module_eval would have to be thrown away, because they
need to be able to parse code at runtime.
Continuations make compiling very messy, since the stack needs to be
copied. This can also cause trouble when interfacing with native
C calls. (I wonder how the current Ruby interpreter manages this...)

The best way is IMO to have a different language that resembles Ruby
as much as possible, but that can be compiled easily. (I am sure
Matz wouldn't allow Ruby to be crippled to allow compilation.)
It would have limitations that Ruby doesn't have (no eval or continuations),
but it would be a nice alternative to coding in C. In fact, code blocks
and closures aren't that difficult to compile, and I think any decent
compilable language should have them.

As for Ruby, the best performance gain would come from using a Just-In-Time
compiler. This way, the compiler can make assumptions about the code, and
recompile when these assumptions turn out to be wrong. Take for example
the following code:
    5.times { |i| puts i }
Here the compiler could inline the code for times and produce very
efficient code. The same goes for other standard library methods like
Array#[], etc. However, it is still possible to redefine Integer#times
(though I don't see a good reason to!). If the method were redefined,
then the just-in-time compiler could recompile it. The same goes for
method arguments. When an argument to a method is always of the same
class, the compiler can call the methods on that object directly (without
lookup), or even inline them. However, when different kinds of objects get
passed to the method, the JIT compiler could recompile the method so
that it works with all objects.
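
The "assume, guard, and respecialize" idea can be modelled in plain Ruby. This is only a toy sketch: CallSite and Speaker are hypothetical names, the cache holds a single class (a monomorphic inline cache), singleton methods are ignored, and invalidation is done by hand, whereas a real JIT would be notified of redefinitions by the runtime.

```ruby
# Toy model of one call site with a single-entry inline cache.
class CallSite
  attr_reader :hits, :misses

  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil
    @cached_method = nil
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    if klass.equal?(@cached_class)
      @hits += 1      # guard passed: reuse the cached method, no lookup
    else
      @misses += 1    # guard failed: full lookup, then respecialize
      @cached_class = klass
      @cached_method = klass.instance_method(@method_name)
    end
    @cached_method.bind(receiver).call(*args)
  end

  # Forget the specialization, e.g. after a method was redefined.
  def invalidate
    @cached_class = nil
    @cached_method = nil
  end
end

class Speaker
  def hello
    "hello"
  end
end

site = CallSite.new(:hello)
speaker = Speaker.new
first  = site.call(speaker)   # miss: first lookup
second = site.call(speaker)   # hit: same class as before

class Speaker                 # redefinition at run time
  def hello
    "hi"
  end
end

stale = site.call(speaker)    # hit, but runs the old body: cache not invalidated
site.invalidate
fresh = site.call(speaker)    # miss again, picks up the new definition
```

The stale call is the interesting one: it shows why the fast path is only safe if every redefinition reliably triggers invalidation.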

In this way the JIT can create well-performing code and at the same time
keep the dynamic nature of Ruby. When methods are redefined or added,
the JIT compiler makes the changes.
It would be nice to have such an interpreter, but it would be a
considerable amount of work to implement, as it would need a
complete rewrite of the Ruby interpreter.

Regards,
Kristof
 
Mikael Brockman

Kristof Bastiaensen said:
Well, I do think it is possible to compile Ruby, but it would be to hard.
Firstly eval and module_eval should be thrown away, because they need to
be able to parse code at runtime.
Continuations make compiling very messy, since the stack needs to be
copied.

That's not strictly true. If you convert the code to
continuation-passing style, reifying continuations doesn't require
copying the stack. CHICKEN and many other Scheme compilers choose this
approach.

mikael
 
Kristof Bastiaensen

On Tue, 20 Jul 2004 00:00:22 +0900, Mikael Brockman wrote:

> That's not strictly true. If you convert the code to
> continuation-passing style, reifying continuations doesn't require
> copying the stack. CHICKEN and many other Scheme compilers choose this
> approach.
>
> mikael

Hi,

That's interesting. Could you explain how that works?
My idea was that when a continuation is saved, it needs
to keep the information about where it is going to (return
addresses) and all local bindings. I would think the best
way is to save the stack, at least for compiled (machine)
code.

Kristof
 
David Ross

> Once in a while the question pops up if it is possible to compile Ruby
> code to native machine code. The answer has always been no. But I keep
> wondering how hard it would really be to make this possible.

The answer is really yes. People say no because of how the
implementation is supposed to work. It is very much possible, but it
would take time.

> Ruby is written in C. And when Ruby parses a Ruby script it converts
> each statement to a C call. Probably the same calls you can use on your
> own in a Ruby C extension. So why wouldn't it be possible to parse a
> Ruby script and convert all statements to Ruby C code and put it in a
> *.c file (instead of calling the Ruby C statements directly). This *.c
> file can then be compiled into machine code with a C compiler like gcc.

Code evaluated at runtime needs to be taken into account. There needs to
be a small runtime running on top of the compiled program.

> Am I right on this, or do I forget something important which makes the
> above quite hard to do?

It is just very hard to do, and no one wants to bother with
creating it. Which is why I emailed "Ruby
specification" to the mailing list about creating
rubycc. --David Ross



 
Gavin Sinclair

> Just very hard to do and no one wants to bother with
> creating it. Which is why I emailed "Ruby
> specification" on the mailing list about creating
> rubycc.

My fearless prediction: no-one will write a Ruby specification if you
don't. (Unless someone else is working hard on a compiler that we
don't know about, or that I've forgotten.)

What is your rationale (in detail) for not looking at the Ruby source?

Lots of people would like to see a Ruby compiler, so any attempt by
yourself to create an English specification would almost certainly be
supported by several people testing it thoroughly for correctness.
Why not start now?

Cheers,
Gavin
 
David Ross

> What is your rationale (in detail) for not looking
> at the Ruby source?

I do not know if I can look at the source to make a
specification, then start working on a compiler that
is under a BSD license. Most of the code is copyright
by Matz and the companies that hired him. I do not
want any legal complications. --David





 
Mikael Brockman

Kristof Bastiaensen said:
> That's interesting. Could you explain how that works?
> My idea was that when a continuation is saved, it needs
> to keep the information, where it is going to (return
> addresses), and all local bindings. I would think the best
> way is to save the stack, at least for compiled (machine)
> code.

In continuation-passing style, no expression ever returns. When you
rewrite an expression into CPS, the value that it would return is
instead passed to a function that the expression receives as an
argument. For example,

    f (g x)

is rewritten as

    (\k -> g x (\v -> f v k))

where \v -> e denotes the function from v to e. Every program can be
converted to this style. In this style, call/cc is a trivial operation,
since every expression receives its reified continuation as an argument.
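
The same rewrite can be tried out in Ruby with lambdas. This is a sketch only: the names g_cps, f_cps, f_of_g and call_cc are made up for the illustration, and since Ruby does not eliminate tail calls, real CPS code written this way would eventually overflow the stack (which is the problem the Baker paper cited below addresses for C).

```ruby
# Direct style: g returns to its caller, which then calls f.
g = ->(x) { x + 1 }
f = ->(v) { v * 2 }
direct = f.(g.(3))

# CPS: every function takes an extra argument k (its continuation) and
# passes its result to k instead of returning it.
g_cps = ->(x, k) { k.(x + 1) }
f_cps = ->(v, k) { k.(v * 2) }

# f (g x)  becomes  \k -> g x (\v -> f v k)
f_of_g = ->(x, k) { g_cps.(x, ->(v) { f_cps.(v, k) }) }

cps_result = nil
f_of_g.(3, ->(v) { cps_result = v })

# call/cc in CPS: hand the function its own continuation k as a value.
# The escape procedure ignores whatever continuation it is called with
# and resumes k instead. No stack copying is involved.
call_cc = ->(fn, k) { fn.(->(v, _ignored) { k.(v) }, k) }

escaped = nil
call_cc.(
  ->(escape, _k) { escape.(:early_exit, ->(_v) { raise "never reached" }) },
  ->(v) { escaped = v }
)

puts direct, cps_result, escaped
```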

You might want to read Danvy's ``Three Steps for the CPS
Transformation''[1] and Baker's ``CONS Should Not CONS Its Arguments,
Part II: Cheney on the MTA''[2].

[1] http://www.daimi.au.dk/~danvy/Papers/3steps.ps.gz
[2] http://home.pipeline.com/~hbaker1/CheneyMTA.html

mikael
 
Lothar Scholz

Hello Michael,


MN> I remember that a long time ago, there was a ruby-to-c compiler (was it
MN> called r2c?). But IIRC, there was only little performance gain. Remember

AFAIK it only packed the ruby code as a C string and called eval. Very
clever compiler.
 
Lothar Scholz

Hello Scott,


SR> Yes, I have to vote for a bytecode compiler. Right now I'm using ruby
SR> to develop some software on an embedded arm-linux device. I've found
SR> that for this application, which is not very demanding of the system,
SR> that ruby can perform pretty comparably to an equivalent C program. And
SR> the development time it saves to write in a higher level language is
SR> worth its weight in gold. But every time I run a ruby program it takes
SR> it a significant and noticable amount of time to start up. This is
SR> obviously due to the ruby parser compiling the text into bytecode every
SR> time. If it was possible to precompile this bytecode and put that on the
SR> target machine it would have significant advantages. Also, it would make
SR> it possible to distribute the application without distributing the
SR> source code to it. That isn't so important for us, but it may be for
SR> others.

Use ExErb; in version 3.2 it stores the node trees, so this isn't true
anymore. But from my experience I'm not sure that parsing is the time
killer. I think it is building up the whole method universe, filling
the method lookup caches and doing other housekeeping.

If you want to check this, do a "cat * > out.rb" and feed the huge
"out.rb" into the "yy_compile" function, which only generates a node
tree but does not do the method building.
 
Scott Rubin

Lothar,

I would use ExErb, except for one thing. I'm a Linux user. I have 4
computers and the target development board at my disposal. All of them
run Linux. I won't lecture you on the usual Linux vs. Windows stuff;
that's a waste of your time and mine. But yeah, ExErb really won't help
me in this case.

Every time a ruby program is launched there is a lot of up-front
processing that gets done. I just need something that can alleviate some
or all of this up-front processing by storing the results on disk. I
don't care how it works or what it does as long as it makes programs
launch faster. And it has to work on all ruby supported operating
systems. Python does something like this, but python is no good for my
embedded development since it requires too much disk space and it
requires gcc. Ruby I can fit in a very small package (1.8MB IIRC) and
it cross-compiles for arm nicely.

Thanks for your help though.

-Scott
 
Ara.T.Howard

> I would use ExErb, except for one thing. I'm a Linux user. I have 4
> computers and the target development board at my disposal. All of them
> run Linux. I wont lecture you on the usual Linux vs. Windows stuff,
> that's a waste of your time and mine. But yeah, exerb really wont help
> me in this case.
>
> Every time a ruby program is launched there is a lot of up-front processing
> that gets done. I just need something that can alleviate some or all of this
> up-front processing by storing the results on disk. I don't care how it works
> or what it does as long as it makes programs launch faster. And it has to
> work on all ruby supported operating systems. Python does something like
> this, but python is no good for my embedded development since it requires too
> much disk space and it requires gcc. Ruby I can fit in a very small package
> (1.8MB IIRC) and it cross-compiles for arm nicely.
>
> Thanks for your help though.
>
> -Scott


how about a simple 'ruby-interpreter' server written as a drb object. the
server could simply start ruby via a fork'd process or via a pipe, and run the
code in question; something similar in spirit to mod_ruby. you would
eliminate only the start up costs with this method, but your post seems to
suggest this might be o.k./sufficient

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
| --Dogen
===============================================================================
 
Joel VanderWerf

Ara.T.Howard said:
how about a simple 'ruby-interpreter' server written as a drb object. the
server could simply start ruby via a fork'd process or via a pipe, and run the
code in question; something similar in spirit to mod_ruby. you would
eliminate the start up costs only with this method, but your post seem to
suggest this might be o.k./sufficient

Cool idea. It's been in the back of my mind to do something like that.
The ruby daemon doesn't even have to use DRb--it could just listen on a
socket for the name of a .rb file to load. A ruby daemon would make it
more efficient to use ruby programs as "commands" in shell scripts (not
that it's really inefficient now). Also, it would reduce memory usage if
you've got a lot of scripts running at the same time.

One way to use this would be to send the server a .rb file that loads up
a bunch of libs and then daemonizes and goes into the server code again,
creating a specialized ruby daemon that has already loaded the stuff
that takes some time. I use a similar approach for development already
by having a "puts; gets; fork" loop after each major stage in loading
libs and data. But the daemon approach would make it easier to start
separate forked apps from different command lines.
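
A minimal sketch of the preload-then-fork part, assuming a platform where fork is available (so not Windows). The run_in_fork helper is a made-up name, "set" merely stands in for libraries that are expensive to load, and the socket or DRb front end is left out.

```ruby
require "set"   # stand-in for the libraries that take a long time to load

# Run the given block in a forked child that inherits everything the
# parent has already loaded, and return what the block produced (as text).
def run_in_fork
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield.to_s)
    writer.close
    exit!(0)            # leave the child without running at_exit handlers
  end
  writer.close
  output = reader.read  # returns once the child has closed its end
  reader.close
  Process.wait(pid)
  output
end

# Each job starts from the preloaded state; nothing is required again.
result = run_in_fork { Set.new([1, 2, 2]).size }
puts result
```

A daemon built this way would sit in a loop, accept a script name from a client, and call something like run_in_fork { load script_name } for each request.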
 
Peter C. Verhage

Thanks all for the interesting read. I don't need my Ruby code to be
compiled to machine code; I was just interested in whether it would be
possible and what it would mean. I agree with others that a JIT compiler
would be the way to go for Ruby. Java has already shown that JIT-compiled
code can sometimes outperform statically compiled C code. And I don't
really mind that my Ruby program might theoretically not be the best
performing program: you can always buy more processing power, but there
aren't many other languages which are as nice as Ruby. :)

Regards,

Peter (Nospam)
 
Lothar Scholz

Hello Scott,

SR> Lothar,

SR> I would
SR> use ExErb, except for one thing. I'm a Linux user. I have 4 computers
SR> and the target development board at my disposal. All of them run Linux.
SR> I wont lecture you on the usual Linux vs. Windows stuff, that's a
SR> waste of your time and mine. But yeah, exerb really wont help me in
SR> this case.

Okay, then we need to port the stuff to Linux. It's not so difficult to
generate an ELF header and store the node tree of preparsed source
code. Why not drop a line to the maintainer?

Of course the real pain with deployment on Linux is the library issue, which
is the same for C programs as for interpreters (which are C programs).
As long as the major distributions keep killing the ideas of the file
hierarchy standard by adding version tags to libc, there will be no
good solution, other than using MacOS X.

But of course this will not help you very much in reducing the
startup time. Here some other clever techniques may help, like a
"require" on demand instead of putting all of them at the top of a
source file.
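
A small sketch of what "require on demand" can look like. The parse_config method is a hypothetical example and "json" is just a convenient standard library to defer; the point is only that the load cost moves from program start to the first call.

```ruby
# The library is required inside the method, so its load cost is paid on
# the first call rather than at program start.
def parse_config(text)
  require "json"   # does nothing after the first successful load
  JSON.parse(text)
end

config = parse_config('{"port": 8080}')
puts config["port"]
```

Programs that never reach parse_config never pay for loading the library at all.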
 
Scott Rubin

Joel said:
Cool idea. It's been in the back of my mind to do something like that.
The ruby daemon doesn't even have to use DRb--it could just listen on a
socket for the name of a .rb file to load. A ruby daemon would make it
more efficient to use ruby programs as "commands" in shell scripts (not
that its really inefficient now). Also, it would reduce memory usage if
you've got a lot of scripts running at the same time.

One way to use this would be to send the server a .rb file that loads up
a bunch of libs and then daemonizes goes into the server code again,
creating a specialized ruby daemon that has already loaded the stuff
that takes some time. I use a similar approach for development already
by having a "puts; gets; fork" loop after each major stage in loading
libs and data. But the daemon approach would make it easier to start
separate forked apps from different command lines.

While this is a good idea, and may actually prove useful for me in the
future, it will not help me in this situation, since all the Ruby
programs start when the system boots and never restart. The
time saved when starting the Ruby programs would be spent starting the
daemon, so it would even out, and possibly even take longer. It's not a
big deal really. The normal speed is adequate for the application. It
would just be nice to make it a little faster.

-Scott
 
