Making Ruby faster

  • Thread starter Tomasz Wegrzanowski

Giles Bowkett

I think you need to post that to ruby-lang, actually. Sounds very
interesting though.
 

Robert Dober

Here's a patch that makes the Ruby interpreter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):
* http://zabor.org/taw/ruby-performance-patch-2007-02-21.patch.gz

And here's a blog post with details:
* http://t-a-w.blogspot.com/2007/02/making-ruby-faster.html
Definitely a great read. I cannot judge the technical quality in depth,
but from what I understand this is reasonable stuff.
I have a stupid question too: did you try the debugger after compiling with -O3?

However, you should post this to ruby-core if you hope for your patches
to be accepted.
It might not be worth doing, though, with Ruby 2 on the horizon and the
improvement being small.
But I am ignorant about this, so just try your luck there ;)

Again this is good stuff even if it might not be adopted.


Cheers
Robert
 
T

Tomasz Wegrzanowski

I have a stupid question too: did you try the debugger after compiling with -O3?

I never use a C debugger for single-stepping. If I want to know what's
happening inside a program, I usually use printf or strace or efence or
ltrace or fenris or objdump or LD_PRELOAD or something like that.
The only use I have for a C debugger is printing backtraces on segfaults.
That works fine with -O3; you just need to pass the -g option instead of
-fomit-frame-pointer (on some platforms -O* implies -fomit-frame-pointer,
but not on x86).
In any case, there's not much difference between -O and -O3 when it comes
to debugging - any optimization makes single-stepping difficult.
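
For example, a toy program like this (nothing to do with the patch, just
to illustrate the flags) still gives me a usable backtrace at -O3:

/* crash.c - toy example for backtraces under -O3.
 * Build:  gcc -O3 -g crash.c -o crash
 * Then:   gdb ./crash ; run ; bt
 * -g keeps enough debug info for a sensible backtrace even at -O3;
 * it is -fomit-frame-pointer that really hurts backtraces. */
#include <stdio.h>

int *p;                      /* global pointer, never assigned, so NULL */

static void boom(void)
{
    *p = 42;                 /* deliberate segfault */
}

int main(void)
{
    boom();                  /* may get inlined at -O3; gdb still copes */
    printf("never reached\n");
    return 0;
}

After the crash, bt points at the faulting line, which is all I need.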
 

William James

Here's a patch that makes the Ruby interpreter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):
* http://zabor.org/taw/ruby-performance-patch-2007-02-21.patch.gz

And here's a blog post with details:
* http://t-a-w.blogspot.com/2007/02/making-ruby-faster.html

3% is a modest improvement. Let's see what can be
achieved by switching to a language that's designed
to be fast.

                Ruby     Lua    LuaJIT
--------------------------------------
tarai             0%    408%     1918%
fib               0%    497%     2874%
mandelbrot        0%    920%     1347%

The source:

local function tarai( x, y, z )
  if x <= y then
    return y
  else
    return tarai( tarai(x-1, y, z),
                  tarai(y-1, z, x),
                  tarai(z-1, x, y) )
  end
end

tarai(12, 6, 0)

#####

local function fib( n )
  if n < 3 then
    return 1
  else
    return fib(n-1) + fib(n-2)
  end
end

#####

local sqrt = math.sqrt
local insert = table.insert

local function c_mul( c1, c2 )
  local a, b = unpack(c1)
  local c, d = unpack(c2)
  return { a*c - b*d, a*d + c*b }
end

local function c_abs( c )
  local a, b = unpack( c )
  return sqrt( a*a + b*b )
end

local function is_mandelbrot( z )
  for i = 1, 100 do
    z = c_mul( z, z )
    if c_abs( z ) > 2 then
      return false
    end
  end
  return true
end

ary = {}

for xx = 0, 100 do
  for yy = 0, 100 do
    local x, y = xx / 50.0, yy / 50.0
    local c = {x, y}
    if is_mandelbrot( c ) then
      insert( ary, c )
    end
  end
end
 

gga

Here's a patch that makes the Ruby interpreter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):
* http://zabor.org/taw/ruby-performance-patch-2007-02-21.patch.gz

And here's a blog post with details:
* http://t-a-w.blogspot.com/2007/02/making-ruby-faster.html

Looks cool. A couple of comments...

1) The if-then-else switch... It's not clear to me why it would be faster
than the plain switch. Also, most of that stuff is gone in ruby 1.9. It
seems you were using ruby 1.8, right? You should probably redo this patch
against yarv (see vm.c, for example).

2) Fixnum improvements. A good idea, but maybe a problematic
implementation. FIX2LONG() and similar macros exist to hide the
implementation details. That could allow Ruby, in the future, to treat
Fixnums as true objects (like Python does) without changing the source
code. Currently, yes, they are pretty inefficient, as they do a shift
each time, even when it is not needed.
Removing the macro may not be such a good idea, though, as fix_equal as
you have written it could break as soon as Fixnums are no longer
represented as longs but as a class. You might work around that by
creating your own pseudo FIX2LONG_SANS_SHIFT() macro (see the sketch
after these comments). That way, if things change, the breakage becomes
obvious, and redefining the macro to be equivalent to FIX2LONG() is all
it should take.
Also, if I read your code correctly, fix_equal() as you presented it is
slightly different in that it does not check that the pointers are the
same when the argument is not a Fixnum (which means something like a == a
may perform slower, though I'll admit that is a weird case). I would
probably ask on the dev list why that check was there in the first place.

3) Judy is in general not faster than a hash lookup. It is just more
memory efficient.
Ruby might benefit from a better hash, though, like Google's
dense_hash_map (albeit that is C++ code, which Matz is against).

4) You should probably make a separate patch for each improvement you
made, instead of bundling them all together. That will make it easier to
merge (and evaluate) them in case one of them is rejected but the others
are accepted.
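
To make point 2 concrete, here is roughly what I have in mind. This is
only a sketch: FIX2LONG_SANS_SHIFT is a name I just made up, and the
surrounding code is from memory rather than copied from numeric.c, so
treat the details as assumptions.

/* Sketch, not the actual 1.8 source.  In ruby 1.8 a Fixnum is a tagged
 * VALUE (the integer shifted left by one with the low bit set), so two
 * Fixnums are equal exactly when their raw VALUEs are equal -- no shift
 * is needed for an equality test. */
#include "ruby.h"                     /* VALUE, FIXNUM_P, Qtrue, Qfalse */

#define FIX2LONG_SANS_SHIFT(x) ((long)(x))   /* hypothetical macro */

static VALUE num_equal(VALUE x, VALUE y);    /* helper defined in numeric.c */

static VALUE
fix_equal(VALUE x, VALUE y)
{
    if (FIXNUM_P(y)) {
        /* compare the tagged words directly instead of
         * FIX2LONG(x) == FIX2LONG(y), which shifts both sides */
        return FIX2LONG_SANS_SHIFT(x) == FIX2LONG_SANS_SHIFT(y) ? Qtrue : Qfalse;
    }
    return num_equal(x, y);
}

If the Fixnum representation ever changes, redefining FIX2LONG_SANS_SHIFT
in one place (or making it equal to FIX2LONG again) is all it should
take, and the dependency on the tagging scheme stays visible.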


Other than that, pretty cool. You should keep working on it and join
the ruby dev list to start merging your changes.
Here's also the docs for sending patches:
http://www.ruby-lang.org/en/community/ruby-core/
 

Tomasz Wegrzanowski

3) Judy is in general not faster than a hash lookup. It is just more
memory efficient.

I cannot fully agree with that characterization, even if it is correct
when you consider only the operation count.

It's not something you see on typical benchmarks [1], as they tend to
access all memory with the same frequency, but real programs' memory
access patterns more or less follow a power law: very few data structures
are accessed all the time, gradually bigger groups of data structures are
accessed less frequently, and most of the program's memory is accessed
very rarely.

For typical programs, computers are pretty good at keeping the more
commonly accessed things in the faster caches. This means even a modest
decrease in data structure size will pull frequently accessed data
structures into faster caches. In most benchmarks all items are equally
probable, so the items pulled into the freed cache space tend to be
uninteresting.

I'd like to see Judy vs hash tables performance comparison on real programs,
or at least on more realistic benchmarks.

One more thing - Ruby uses numeric hash tables for symbol lookup.
Symbol ids are highly non-random:

irb> ids = ObjectSpace.enum_for(:each_object, Module).inject([]){|ms,md|
  ms + md.instance_methods}.uniq.map{|x| x.to_sym.object_id}.sort;
  [ids.min, ids.max, ids.size]
=> [378, 269218, 1061]

That's exactly the situation Judy is optimized for.
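
For reference, the JudyL calling pattern looks roughly like this. It's
only a sketch (it assumes libJudy is installed and is not part of the
patch); the keys are just made-up stand-ins for symbol ids:

/* judy_demo.c - build with: gcc judy_demo.c -lJudy
 * JudyL maps word-sized keys to word-sized values and is particularly
 * compact when the keys are small and clustered, which is what the
 * symbol-id distribution above looks like. */
#include <stdio.h>
#include <Judy.h>

int main(void)
{
    Pvoid_t judy = (Pvoid_t) NULL;   /* an empty JudyL array (word -> word) */
    PWord_t pvalue;
    Word_t  key, count, bytes;

    /* pretend these are symbol ids: small, dense, highly non-random,
     * roughly mirroring the id range from the irb session above */
    for (key = 378; key <= 269218; key += 254) {
        JLI(pvalue, judy, key);      /* insert/find the slot for this key */
        *pvalue = key * 2;           /* store an associated value */
    }

    JLG(pvalue, judy, 378 + 254);    /* look up a single key */
    if (pvalue) printf("lookup: %lu\n", (unsigned long)*pvalue);

    JLC(count, judy, 0, -1);         /* how many keys are stored */
    JLMU(bytes, judy);               /* how much memory the array uses */
    printf("%lu keys in %lu bytes\n",
           (unsigned long)count, (unsigned long)bytes);

    JLFA(bytes, judy);               /* free the whole array */
    return 0;
}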

[1] - http://www.nothings.org/computer/judy/
 
