Is Ruby's regex slower?

Ruby Newbee

Hi,

I'm writing this with no other purpose than to show a result for
comparison. :)

First I fetched the page to be analyzed (the scripts below extract all
domain names from it):

wget http://www.265.com/Kexue_Jishu/

This saves the page as index.html.

Then I ran this Ruby script:

#!/usr/bin/ruby

f = File.open("index.html")

f.each_line do |c|
  puts $1 if /href="http:\/\/(.*?)\/.*" target="_blank"/ =~ c
end

f.close


And this perl script:

#!/usr/bin/perl

open HD,"index.html" or die $!;
while(<HD>) {
    print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
}
close HD;


Timing both with the "time" command, I saw that Ruby is slower than
Perl (maybe due to the regex?).

Ruby's:

real 0m0.013s
user 0m0.012s
sys 0m0.000s

Perl's:

real 0m0.004s
user 0m0.004s
sys 0m0.000s

Both versions:

# ruby -v
ruby 1.9.1p243 (2009-07-16 revision 24175) [i686-linux]

# perl -v
This is perl, v5.8.8 built for i486-linux-thread-multi


Yes, that's the result, but it doesn't affect my love for Ruby.


Thanks.
Jenn.
 
Ayumu Aizawa

Hi Jenn.

That's interesting ;)

How about this?

#!/usr/bin/ruby

regex = /href="http:\/\/(.*?)\/.*" target="_blank"/

File.open("index.html") do |f|
  f.each_line do |c|
    puts $1 if c =~ regex
  end
end
 
Josh Cheek

[Note: parts of this message were removed to make it a legal post.]

It seems like most of the time would be spent loading the environment and
printing the output, making it difficult to compare regexp speeds.

Anyway, just wanted to say the Ruby one can be done in a more succinct
syntax:

File.open("index.html").each do |c|
  puts $1 if /href="http:\/\/(.*?)\/.*" target="_blank"/ =~ c
end
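
To illustrate the point about startup and output cost, here is a minimal sketch that times only the matching loop, on lines already held in memory and without printing. The sample lines, the constant names, and the `extract_hosts` helper are assumptions made for illustration; they stand in for the real index.html.

```ruby
require "benchmark"

# Pattern taken from the original post.
ORIGINAL_PATTERN = /href="http:\/\/(.*?)\/.*" target="_blank"/

# Hypothetical stand-in for the lines of index.html: 1000 matching
# lines and 1000 non-matching lines.
SAMPLE_LINES = [
  %(<a href="http://www.example.com/a/b.html" target="_blank">A</a>\n),
  %(<p>no link here</p>\n)
] * 1_000

# Returns the captured host names without printing anything, so that
# output cost is kept out of the measurement.
def extract_hosts(lines)
  lines.map { |line| line[ORIGINAL_PATTERN, 1] }.compact
end

hosts = nil
# Interpreter startup and file I/O happen outside the timed block.
elapsed = Benchmark.realtime { hosts = extract_hosts(SAMPLE_LINES) }
puts format("%d matches in %.4f s", hosts.size, elapsed)
```

Measured this way, the number reflects regex matching plus a little array handling, which is closer to what the thread is trying to compare.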
 
Ruby Newbee

Thanks for the reminder; I see what you mean.
This time I used a compiled regex for both Ruby and Perl, and the speed
still differs:


# cat regex_compile.rb
#!/usr/bin/ruby

regex = /href="http:\/\/(.*?)\/.*" target="_blank"/

File.open("index.html") do |f|
  f.each_line do |c|
    puts $1 if c =~ regex
  end
end


# cat regex_compile.pl
#!/usr/bin/perl

open HD,"index.html" or die $!;
while(<HD>) {
    print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/o;
}
close HD;


# time ruby regex_compile.rb > /dev/null

real 0m0.011s
user 0m0.008s
sys 0m0.004s


# time perl regex_compile.pl > /dev/null

real 0m0.003s
user 0m0.000s
sys 0m0.000s
 
Wybo Dekker

Josh said:
It seems like most of the time would be spent loading the environment and
printing the output, making it difficult to compare regexp speeds.

Sure; so why not do it 1000 times:

#!/usr/bin/ruby
1000.times do
  File.open("index.html").each do |c|
    puts $1 if /href="http:\/\/(.*?)\/.*" target="_blank"/ =~ c
  end
end

time ./test.rb >/tmp/t
elap 6.511 user 6.336 syst 0.136 CPU 99.40%


#!/usr/bin/perl
for ($i=0; $i<1000; $i+=1) {
    open HD,"index.html" or die $!;
    while(<HD>) {
        print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
    }
    close HD;
}

time ./test.pl >/tmp/t
elap 0.864 user 0.844 syst 0.020 CPU 100.04%

So perl is 7 or 8 times faster here.
 
W. James

Ayumu said:
regex = /href="http:\/\/(.*?)\/.*" target="_blank"/

File.open("index.html") do |f|
  f.each_line do |c|
    puts $1 if c =~ regex
  end
end

regex = /href="http:\/\/(.*?)\/.*" target="_blank"/
IO.foreach("index.html"){|line| puts $1 if line =~ regex }



With no looping:

regex = /href="http:\/\/(.*?)\/.*" target="_blank"/
puts IO.readlines("index.html").map{|s| s[ regex, 1 ] }.compact

--
 
Robert Klemme

"Compiling" regular expression does not bring any advantages. In fact,
usually it's slower than using the regular expression inline as you did
in your first example. The Ruby interpreter optimizes this already.
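
A small sketch of what "optimizes this already" looks like in practice, assuming the standard MRI interpreter (other implementations may behave differently). The method names are made up for illustration. A literal without interpolation yields the same Regexp object on every evaluation, so there is nothing left to gain by storing it in a variable first.

```ruby
# Literal without interpolation: built once, then reused.
def static_regex
  /href="http:\/\/(.*?)\//
end

# Literal with interpolation: rebuilt each time the method runs.
def dynamic_regex(scheme)
  /href="#{scheme}:\/\/(.*?)\//
end

# With the /o flag the interpolation is performed only the first time,
# so later calls get the first Regexp back, whatever the argument.
def once_regex(scheme)
  /href="#{scheme}:\/\/(.*?)\//o
end

# equal? compares object identity, not pattern contents.
puts static_regex.equal?(static_regex)
puts dynamic_regex("http").equal?(dynamic_regex("http"))
puts once_regex("http").equal?(once_regex("ftp"))
```

On MRI I would expect the three lines to report reuse, no reuse, and reuse respectively. Only the interpolated form pays a compilation cost per evaluation.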

If the speed difference does not bother you, why bother discussing it?

Btw, I'd probably formulate the regexp differently in order to avoid
".*?" which could be slow. Also, if you have a lot of slashes in the
regexp the %r form comes in handy because you do not need all the escapes:

File.foreach "index.html" do |line|
  puts $1 if %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"} =~ line
end
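
The two patterns are not only different in speed; they can also pick different links. The sketch below runs both against a made-up line (the `TWO_LINKS` string and the `host` helper are assumptions for illustration) in which only the second link carries target="_blank".

```ruby
# Pattern from the original post: lazy host, then a greedy ".*".
LAZY_PATTERN   = /href="http:\/\/(.*?)\/.*" target="_blank"/
# Pattern suggested above: negated character classes, %r literal.
STRICT_PATTERN = %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"}

# Hypothetical input: two links on one line, only the second one has
# target="_blank".
TWO_LINKS = '<a href="http://a.example/x">A</a> ' \
            '<a href="http://b.example/y" target="_blank">B</a>'

# Returns the first capture group, or nil when nothing matches.
def host(pattern, line)
  line[pattern, 1]
end

puts host(LAZY_PATTERN, TWO_LINKS)    # a.example
puts host(STRICT_PATTERN, TWO_LINKS)  # b.example
```

The greedy ".*" in the first pattern runs past the end of the first link until it finds a target="_blank" further along the line, so the host of the wrong link is captured. The negated classes cannot cross a double quote, which keeps each match inside a single attribute value.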

Kind regards

robert
 
Rilindo Foster

Wait, you are parsing HTML with regex?

I need to post this, then:

http://www.codinghorror.com/blog/archives/001311.html

:)

 
Roger Pack

Wybo said:
time ./test.rb >/tmp/t
elap 6.511 user 6.336 syst 0.136 CPU 99.40%
time ./test.pl >/tmp/t
elap 0.864 user 0.844 syst 0.020 CPU 100.04%

So perl is 7 or 8 times faster here.

You could try ruby 1.9 and see if it helps the speed.

If slow regex is a big problem for you, I could probably hack up a gem
that wraps PCRE or whatnot.
-r
 
Kyle Schmitt

Rilindo said:
Wait, you are parsing HTML with regex?

I need to post this, then:

http://www.codinghorror.com/blog/archives/001311.html

:)


Thank you. It's been far too long since I've read Coding Horror.

Although it reminds me, I should bug one of my PhD candidate friends
for some Perl code I counseled him to fix. He was parsing a 500MB+
CSV file with getlines and string compares and splits in Perl... I
think he literally banged his head on the table when I introduced him
to CPAN and showed him CSV libraries...
 
Kornelius Kalnbach

Roger said:
You could try ruby 1.9 and see if it helps the speed.

Not by much.

I get best results in Ruby with:

regexp = %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"}
1000.times do
  puts File.read('index.html').scan(regexp)
end

~/ruby/bench time ruby19 regex.rb > /dev/null
real 0m1.428s
user 0m1.359s
sys 0m0.056s

~/ruby/bench time perl5.10.0 regex.pl > /dev/null
real 0m1.189s
user 0m1.095s
sys 0m0.084s

It's still slower. Perl has regular expression magic beyond my
imagination, though. I heard they take the most "rare" character in the
literal part of the regex (let's say, the colon) and search for it using
machine code, and then work their way backwards to the beginning of the
regexp...
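
The general idea in that rumour, a cheap search for a required literal before the full regex engine is involved, can be imitated by hand. The sketch below says nothing about how Perl actually implements it; the constant names and the `hosts_with_prefilter` helper are assumptions for illustration only.

```ruby
LINK_PATTERN = %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"}

# A fixed substring that every matching line must contain.
REQUIRED_LITERAL = 'target="_blank"'

# Runs the regex only on lines that pass the plain substring test.
def hosts_with_prefilter(lines)
  lines.select { |line| line.include?(REQUIRED_LITERAL) }
       .map    { |line| line[LINK_PATTERN, 1] }
       .compact
end

lines = [
  '<p>plain text, never reaches the regex</p>',
  '<a href="http://a.example/x" target="_blank">A</a>'
]
puts hosts_with_prefilter(lines)
```

Whether this helps in Ruby depends on the input and on what the regex engine already does internally, so it is worth benchmarking before relying on it.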

Say what you want, but Perl rocks when it comes to text processing
speed.

Python is even faster:

import re
regexp = re.compile(r'href="http://([^"/]*)/[^"]*"\s+target="_blank"')
for i in xrange(1000):
    with open("index.html") as f:
        for m in regexp.finditer(f.read()):
            print m.group(1)

time python2.6 regex.py > /dev/null
real 0m0.943s
user 0m0.880s
sys 0m0.053s
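
One detail of the Ruby version above that is easy to miss: with a capture group in the pattern, String#scan returns an array of one-element arrays rather than a flat list of strings. The sketch uses a made-up page (`PAGE_HTML`) in place of File.read('index.html'), and `link_hosts` is a hypothetical helper.

```ruby
LINK_HOST = %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"}

# Hypothetical stand-in for File.read('index.html').
PAGE_HTML = <<~PAGE
  <a href="http://a.example/x" target="_blank">A</a>
  <a href="http://b.example/y/z" target="_blank">B</a>
  <a href="http://c.example/">no target</a>
PAGE

# scan walks the whole string at once; flatten turns the per-match
# capture arrays into a flat list of host names.
def link_hosts(html)
  html.scan(LINK_HOST).flatten
end

p PAGE_HTML.scan(LINK_HOST)  # [["a.example"], ["b.example"]]
p link_hosts(PAGE_HTML)      # ["a.example", "b.example"]
```

puts flattens nested arrays when printing, which is why the benchmark script prints one host per line without an explicit flatten.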
 
Marnen Laibow-Koser

Kornelius Kalnbach wrote:
[...]
It's still slower. Perl has regular expression magic beyond my
imagination, though. I heard they take the most "rare" character in the
literal part of the regex (let's say, the colon) and search for it using
machine code, and then work their way backwards to the beginning of the
regexp...

I think that's only done when study is called, but I could be wrong.

Kornelius Kalnbach wrote:
Say what you want, but Perl rocks when it comes to text processing
speed.

Python is even faster:
[...]

Yeah. I love Ruby, but I'm getting a bit annoyed by the fact that it's
so much slower than Python...

Best,
-- 
Marnen Laibow-Koser
http://www.marnen.org
(e-mail address removed)
 
Kornelius Kalnbach

Marnen said:
Yeah. I love Ruby, but I'm getting a bit annoyed by the fact that it's
so much slower than Python...

You can improve it then :)

[murphy]
 
Robert Klemme

Marnen said:
Yeah. I love Ruby, but I'm getting a bit annoyed by the fact that it's
so much slower than Python...

The question is: does it matter for most practical purposes - and: do
you want to sacrifice a clean and simple program and the fun of creating
it for a few cycles of CPU time? I wouldn't - especially since 1.9 is
so much faster than 1.8 was. My 0.02EUR.

Kind regards

robert
 
Marnen Laibow-Koser

Robert said:
The question is: does it matter for most practical purposes - and: do
you want to sacrifice a clean and simple program and the fun of creating
it for a few cycles of CPU time?

No. That's why I haven't learned Python yet, although between the speed
increase and GAE, it's sometimes tempting. But I'd really miss the
beautiful design of Ruby.

But my point was a bit different. Python and Ruby are basically similar
languages, and what annoys me is that there seems not to have been the
will in the Ruby community to steal some speed tricks from Python. (I'd
be working on this if I knew anything practical about language
implementation, but I don't.)

Robert said:
I wouldn't - especially since 1.9 is
so much faster than 1.8 was. My 0.02EUR.

Unfortunately, I don't quite trust 1.9 for use with Rails yet...

Best,
-- 
Marnen Laibow-Koser
http://www.marnen.org
(e-mail address removed)
 
Roger Pack

Marnen said:
But my point was a bit different. Python and Ruby are basically similar
languages, and what annoys me is that there seems not to have been the
will in the Ruby community to steal some speed tricks from Python. (I'd
be working on this if I knew anything practical about language
implementation, but I don't.)

Yeah no kidding. Somehow speed just hasn't "felt" like the ruby
community's thing, until 1.9 at least.

I am working on a few projects to make it faster (and I suppose the
MacRuby, Rubinius and JRuby guys are as well).

Marnen said:
Unfortunately, I don't quite trust 1.9 for use with Rails yet...

Come to the dark side...

-r
 
Robert Klemme

Marnen said:
But my point was a bit different. Python and Ruby are basically similar
languages, and what annoys me is that there seems not to have been the
will in the Ruby community to steal some speed tricks from Python. (I'd
be working on this if I knew anything practical about language
implementation, but I don't.)

Well, 1.9 *has been* improved dramatically in the area of performance
(among others). Whether this is because of "tricks stolen from Python"
I cannot judge. Your statement seems to imply that the Ruby community
is negligent of performance which is not true.

Kind regards

robert
 
Albert Schlef

Robert said:
The question is: does it matter for most practical purposes - and: do
you want to sacrifice a clean and simple program and the fun of creating
it for a few cycles of CPU time? I wouldn't - especially since 1.9 is
so much faster than 1.8 was. My 0.02EUR.

Why does everybody say that CPUs are fast nowadays and that "it doesn't
matter that language XYZ is slow"?

It does matter: web applications. If your applications can't serve all
the visitors, then you're going to lose your customer or you'll have to
learn some other language with better performance.
 
Kirk Haines

Albert said:
Why does everybody say that CPUs are fast nowadays and that "it doesn't
matter that language XYZ is slow"?

It does matter: web applications. If your applications can't serve all
the visitors, then you're going to lose your customer or you'll have to
learn some other language with better performance.

That's such a red herring that I'm not even sure how to address it.

How about this:

A dynamic web site that's been running a couple years, with all of the
content pulled from a database, and navigation generated dynamically
from db contents, running on a shared server that is a few years old
(i.e. not cutting edge hardware), running on Ruby 1.8.6 (i.e. not a
speedy version of Ruby).

Requests per second: 137.45 [#/sec] (mean)

Make it more complex by pulling a page that renders a big table of
itty bitty numbers for mutual fund performance:

Requests per second: 82.48 [#/sec] (mean)

However, mitigate even those slow speeds by running it behind a load
balancer, implemented with Ruby, that caches the generated pages and
serves them from cache (while the LB is also managing requests for 70
other sites):

Requests per second: 6107.65 [#/sec] (mean)
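
The jump in throughput comes from not rendering the page at all on most requests. A minimal sketch of that idea follows; it is not the load balancer described above, and the `PageCache` class, its methods, and the "/funds" path are all made up for illustration.

```ruby
# Keeps rendered pages in memory, keyed by request path.
class PageCache
  attr_reader :renders

  def initialize
    @pages = {}
    @renders = 0
  end

  # Returns the cached page for path. The block passed to fetch is
  # called only on a cache miss, and its result is stored.
  def fetch(path)
    @pages.fetch(path) do
      @renders += 1
      @pages[path] = yield
    end
  end
end

cache = PageCache.new
# The block stands in for an expensive database-backed render.
3.times { cache.fetch("/funds") { "<table>...</table>" } }
puts cache.renders  # 1
```

A real cache would also need expiry and invalidation when the database contents change, which this sketch leaves out.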

Sure, performance improvements in the language implementations are
great -- it's always good when our stuff runs faster. But it's absurd
to argue that Ruby, even in the older versions of MRI, can't provide
far more than sufficient performance for _really_ fast web
applications. I made my living writing web software with Ruby for
several years before most anyone else was doing it, and even then, in
that particular application arena, Ruby always proved to be more than
fast enough to render complex web pages in a few milliseconds.

And these days, the landscape is nowhere near that simple. Look at
Ruby 1.9.x, or JRuby, or Rubinius, and see where the trends are
heading. The future looks good, IMHO.


Kirk Haines
 
