Any tricks to speed up Ruby?

John Carter

It's an interesting thought. However, I wasn't able to get gcc 4.2.2 to do
some simpler things, like profile-based optimization, on the Ruby source, so
I wouldn't expect something that complex to work out of the box. There are a
lot of great things in gcc, but not many of them are as well tested as, say,
the standard modular C library or program, and, of course, the Linux kernel.

I'm not sure that's simpler... I also looked at that once and decided
there were some distinctly unsimple things going on.

If I'm right about KDE, then yes, it's a very well tested feature. Also
there would be major impetus from embedded systems users to use whole
program optimization features.

As far as I know, "-O3 -march=<your processor type>" is about the best you
can get out of gcc without a *lot* of work.

In the "medium work" category I suspect there are things relating to
function attributes and builtins that could get 5% or so more juice
(but tend to clutter the code with unportable improvements).

And there are a lot more things you can do at the Ruby source level
that have a bigger payoff than that does.

As always, my 100-10-1 rule of thumb is to expect speedup factors of up
to about 100x from a much better algorithm, up to about 10x from code
tweaks, and up to 2x (but usually near 1x) from compiler optimization
tweaks.
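
To make the first tier of that rule concrete, here is a minimal sketch (not from the thread) that compares a quadratic and a linear way of finding duplicate values. The method names `duplicates_slow` and `duplicates_fast` and the data size are assumptions chosen for illustration; the actual ratio depends on the input and on the Ruby build.

```ruby
require "benchmark"
require "set"

# Quadratic: for each element, rescan everything that came before it.
def duplicates_slow(items)
  dups = []
  items.each_with_index do |item, i|
    dups << item if items[0...i].include?(item) && !dups.include?(item)
  end
  dups
end

# Linear: remember the elements already seen in a Set.
def duplicates_fast(items)
  seen = Set.new
  dups = Set.new
  # Set#add? returns nil when the item was already present.
  items.each { |item| dups << item unless seen.add?(item) }
  dups.to_a
end

# Illustrative data: every value appears exactly twice.
data = Array.new(3_000) { |i| i % 1_500 }

slow = Benchmark.realtime { duplicates_slow(data) }
fast = Benchmark.realtime { duplicates_fast(data) }
puts format("quadratic: %.4fs  linear: %.4fs", slow, fast)
```

Both methods return the same values; only the amount of work differs, which is the kind of gap no compiler flag is likely to close.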


John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : (e-mail address removed)
New Zealand
 
s.ross

[Note: parts of this message were removed to make it a legal post.]

factors up to 2x but usually near 1x for compiler
optimization tweaks

This is getting off the Ruby topic, but I think you're being unfair to
optimizers (or rather, those who write them). The great thing about
optimizers is that we can write clear and expressive code and not
worry so much about strength reduction or loop unrolling or all those
other things optimizers do for us. Honestly, does it make sense to you
to read code like this:

a = (a << 1) + a;

-or-

a *= 3;

The performance difference in a tight loop would be noticeable, but
the next person to pick up the code would probably get a quizzical
look on his or her face reading it. So the benchmark improvement may
be between 1 and 2 percent, but the maintainability improvement could
be of far greater benefit.
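
For reference, the shift-and-add form of multiplying by 3 is `(a << 1) + a`. Here is a quick Ruby check of that identity. The helper name `times_three_shifted` is hypothetical and stands in for what an optimizer's strength reduction would produce.

```ruby
# Shifting left by one doubles the value; adding the original gives 3 * a.
def times_three_shifted(a)
  (a << 1) + a
end

# Collect every integer in the range where the two forms disagree.
mismatches = (-1000..1000).reject { |a| times_three_shifted(a) == a * 3 }
puts "mismatches: #{mismatches.size}"  # prints "mismatches: 0"
```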

I'm not too familiar with gcc's global optimizations, but these are the
most dangerous and, at the same time, the least obvious ones. If the
optimizer gets it wrong, the code can break in the oddest ways; however,
when the optimizer nails a global optimization, it can make a difference
you might never have predicted.

I am truly a fan of compiler optimizations because they fall into the
"geez, that really is smart stuff" category. Still, you are absolutely
correct that a better algorithm will win almost every time.

Just my $.02 :)
 
Roger Pack

So ... I assume you tried
$ export CFLAGS='-march=ppc -O3' # -march doesn't actually work on Macs
$ ./configure
$ make

Wow, that does indeed help (once figured out). For me, on my G4, it was
(after I figured out I had a 7450 processor):
export CFLAGS='-mtune=7450 -mcpu=7450 -fast -fPIC'

and compile with --disable-pthread

and voila, a faster Ruby (not sure if the real reason was the compiler
flags, the pthread, or the updated version from p110 to p114, but
something helped it--I'm guessing it was the compiler options).

new ruby with compiler options:
time ./ruby -e "10000000.times {}"
real 0m4.049s


old ruby (the mac osx port one):
time ruby -e "10000000.times {}"
real 0m5.400s

Wow, it's a wonder these optimizations aren't set up automatically at
compile time, since they're so helpful.
Unfortunately gcc on Macs doesn't seem to have -fwhole-program (at
least mine doesn't), so I'll have to wait to try that one until I'm back
in x86 land. Thanks for your help!
-R
 
William James

James said:
Before this goes too far, the answer "Use C" would be considered:

A) Helpful
B) Trolling
C) Flame-bait
D) Laughable
E) None of the above

Use Pascal.

A recent thread showed how slow Ruby is at generating
a string in which each character in the original string is
duplicated.

// Pascal (FreePascal)
uses sysutils { for timing }, strutils { for DupeString };

function dup_chars( var s: ansistring ): ansistring;
var
  out: ansistring;
  i: longint;
  c: char;
begin
  setlength( out, length(s) * 2 );
  for i := 1 to length(s) do
  begin
    c := s[i];
    out[2*i-1] := c;
    out[2*i] := c;
  end;
  exit( out )
end;

var
  s : ansistring;
  when : tDateTime;
begin
  s := dupeString( 'abc', 1000000 );
  when := Time;
  dup_chars( s );
  writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.
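
For comparison, here is a sketch of the same character-doubling operation in Ruby. This is not the code from the earlier thread mentioned above, which is not reproduced here. The method names `dup_chars_gsub` and `dup_chars_buffer` are illustrative, and the input is shorter than the Pascal one to keep the run quick. No timings are claimed, since they vary by interpreter version and build.

```ruby
require "benchmark"

# Double every character with a single regex substitution.
def dup_chars_gsub(str)
  str.gsub(/./m) { |c| c * 2 }
end

# Append each character twice into one growing buffer, like the Pascal loop.
def dup_chars_buffer(str)
  out = +""  # unfrozen empty string
  str.each_char { |c| out << c << c }
  out
end

s = "abc" * 100_000

gsub_time   = Benchmark.realtime { dup_chars_gsub(s) }
buffer_time = Benchmark.realtime { dup_chars_buffer(s) }
puts format("gsub: %.3f seconds  buffer: %.3f seconds", gsub_time, buffer_time)
```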
 
Paul Brannan

Wow that does indeed help (once figured out). for me on my G4 it was
(after I figured out I had a 7450 processor)
export CFLAGS='-mtune=7450 -mcpu=7450 -fast -fPIC'

and compile with --disable-pthread

and voila, a faster Ruby (not sure if the real reason was the compiler
flags, the pthread, or the updated version from p110 to p114, but
something helped it--I'm guessing it was the compiler options).

I suspect --disable-pthread had the largest impact. The cost of memory
allocations can be high when linking with the threading library. Re-run
your tests without the other options if you want to verify.

I'm also surprised you got improved performance with -fPIC. I thought
position-independent code was supposed to run slower, usually.

Paul
 
Michal Suchanek

I suspect --disable-pthread had the largest impact. The cost of memory
allocations can be high when linking with the threading library. Re-run
your tests without the other options if you want to verify.

I'm also surprised you got improved performance with -fPIC. I thought
position-independent code was supposed to run slower, usually.

I suspect this is pretty much a no-op, as most of the code is in libruby
anyway and it usually has to be compiled with -fPIC to link at all.

Thanks

Michal
 
Paul Brannan

I suspect this is pretty much a no-op, as most of the code is in libruby
anyway and it usually has to be compiled with -fPIC to link at all.

Perhaps it depends on the platform, but on x86 Linux, libruby is a
static library by default unless --enable-shared is used.

Paul
 
Roger Pack

and voila, a faster Ruby (not sure if the real reason was the compiler
flags, the pthread, or the updated version from p110 to p114, but
something helped it--I'm guessing it was the compiler options).

new ruby with compiler options:
time ./ruby -e "10000000.times {}"
real 0m4.049s


old ruby (the mac osx port one):
time ruby -e "10000000.times {}"
real 0m5.400s


So it turns out that the difference seems to be the one between p110 and
p114. Compiler options made maybe 0.2s of difference.

Some interesting benchmarks:

p110: 5.4s
p111: 5.23s
p114: 4.1s
latest stable snapshot from the ruby-lang page: 5.8s

[latest snapshot build doesn't run since it appears to be based on 1.9
(?) ]

Anybody have any idea what might be going on here? All were compiled
similarly, yet p114 seems to smoke the rest.

Thanks.
-R


[Fri Mar 21 15:39:11 ~/Downloads/ruby-1.8.6-p114 ]$ time ./ruby -e "10000000.times {}"

real 0m4.143s
user 0m3.601s
sys 0m0.026s
[Fri Mar 21 15:39:18 ~/Downloads/ruby-1.8.6-p114 ]$ time ruby -e "10000000.times {}" # 'normal' ruby p111

real 0m5.742s
user 0m4.616s
sys 0m0.063s
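
A single `time` run also counts interpreter startup and whatever else the machine is doing. A sketch of a slightly more repeatable measurement follows, assuming a Ruby that ships the `benchmark` library. The helper name `timings` and the reduced loop count are illustrative, not from the thread.

```ruby
require "benchmark"

# Run the given block `runs` times and return the wall-clock times, sorted.
def timings(runs)
  Array.new(runs) { Benchmark.realtime { yield } }.sort
end

# Loop count reduced from the thread's 10_000_000 to keep the sketch quick.
results = timings(5) { 1_000_000.times {} }

puts format("ruby %s-p%s: best %.3fs, median %.3fs",
            RUBY_VERSION, RUBY_PATCHLEVEL,
            results.first, results[results.size / 2])
```

Running the same file under each interpreter being compared (for example `./ruby` from the build directory and the system `ruby`) keeps the workload identical, so any remaining difference comes from the builds themselves.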
 
Paul Brannan

Anybody have any idea what might be going on here? All were compiled
similarly, yet p114 seems to smoke the rest.

I'm guessing it's the way you built it; these are the only changes
listed in the ChangeLog between p111 and p114:

Mon Mar 3 23:34:13 2008 GOTOU Yuuzou <[email protected]>

* lib/webrick/httpservlet/filehandler.rb: should normalize path
separators in path_info to prevent directory traversal attacks
on DOSISH platforms.
reported by Digital Security Research Group [DSECRG-08-026].

* lib/webrick/httpservlet/filehandler.rb: pathnames which have
not to be published should be checked case-insensitively.

Mon Dec 3 08:13:52 2007 Kouhei Sutou <[email protected]>

* test/rss/test_taxonomy.rb, test/rss/test_parser_1.0.rb,
test/rss/test_image.rb, test/rss/rss-testcase.rb: ensured
declaring XML namespaces.

Paul
 
Roger Pack

Paul said:
I'm guessing it's the way you built it; these are the only changes
listed in the ChangeLog between p111 and p114:

Yep you were right on.

stable branch is fast

[Mon Mar 24 15:09:40 ~/Downloads/ruby_stable ]$ time ./ruby -e "10000000.times {}"

real 0m4.222s


p111 is fast
time /usr/bin/ruby_old -e "10000000.times {}"

real 0m4.276s

and the rest in between are similar.

The only truly slow one appears to be the MacPorts version. I don't know
what compile flags they are using, but it is clearly slower.


time ruby -e "10000000.times {}"

real 0m5.710s
(consistently)
Despite that, they both have similar startup speeds.

Thanks for pointing that out!
 
Paul Brannan

The only truly slow one appears to be the macPort version. I don't know
what compile flags they are using but it appears truly slower.

ruby -rrbconfig -e 'puts Config::CONFIG["configure_args"]'

Paul
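
On current Ruby versions the `Config` constant used in the one-liner above is no longer available; the module is named `RbConfig`. A sketch of the same lookup that also prints the compiler flags is below. The `optflags` key is an assumption and may be absent on some builds, in which case its value prints as empty.

```ruby
require "rbconfig"

# Keys that record how this interpreter was configured and compiled.
keys = %w[configure_args CFLAGS optflags]

# Build a small hash of key => recorded value (nil if the key is missing).
build_info = keys.map { |k| [k, RbConfig::CONFIG[k]] }.to_h

build_info.each { |k, v| puts "#{k}: #{v}" }
```

Running this under two interpreters and comparing the output side by side is one way to see whether they were built with different options.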
 
