Timeout and Exponetial Regexes

E

eden li

Side-stepping the debate about running exponential regular expressions
first place, I'd like to catch a long running regex using
Timeout.timeout():

$ ruby -v && irb -r timeout
ruby 1.8.4 (2005-12-24) [i486-linux]
irb(main):001:0> Timeout.timeout 5 do ("a"*300)+"b" =~ /^(a*)*$/ end

Except that it appears the mechanism the Timeout module uses can't seem
to do this. After looking at the source of Timeout.timeout(), it seems
that the regex is blocking all threads and disallowing the Timeout
mechanism from detecting the runaway block. Indeed, the following
never returns:

irb(main):002:0> t = Thread.new do ("a"*300)+"b" =~ /^(a*)*$/ end

A few questions:
- Why do regexes seem to stop up all threads? Is this fixed in ruby
1.9?
- Is there a way to offload this processing into another thread
without resorting to DRb?
- Is there a better way to detect this sort of thing?
 
M

MonkeeSage

eden said:
irb(main):001:0> Timeout.timeout 5 do ("a"*300)+"b" =~ /^(a*)*$/ end

Hmm...is that a valid expression? Capture "a" zero or more times, and
then repeat that capture zero or more times?! What exactly does that
mean?

perl5 runs it but finds no match:

("a" x 300) . "b" =~ /^(a*)*$/

python spits out an error:

import re
s = ("a"*300)+"b"
re.search(r'^(a*)*$', s)

# ...
# raise error, v # invalid expression
# sre_constants.error: nothing to repeat

And ruby 1.9. runs it but finds no match (same as perl).

So I think that the expresion is broken and ruby 1.8 has a bug.

Regards,
Jordan
 
A

ara.t.howard

Hmm...is that a valid expression? Capture "a" zero or more times, and
then repeat that capture zero or more times?! What exactly does that
mean?

perl5 runs it but finds no match:

("a" x 300) . "b" =~ /^(a*)*$/

python spits out an error:

import re
s = ("a"*300)+"b"
re.search(r'^(a*)*$', s)

# ...
# raise error, v # invalid expression
# sre_constants.error: nothing to repeat

And ruby 1.9. runs it but finds no match (same as perl).

So I think that the expresion is broken and ruby 1.8 has a bug.

Regards,
Jordan

i've been running it for hours - still hasn't exited... but it does seem
valid:

irb(main):003:0> 'aaaa'[ %r/^(a*)*/ ]
=> "aaaa"

interesting...


-a
 
E

eden

It's a perfectly valid regex that happens to run in exponential time if
the regex engine that tries to run it isn't smart about backtracking
(it appears, in this case, Perl and Ruby 1.9 are, Ruby 1.8 is not, and
Python is smart enough to reject it because it knows its regex engine
can't handle it).

However, the point of my post is not the regex. What I was trying to
find out was why Ruby seems to block threads whenever a regex is
running. I just gave an example that would fit into one line.

I run a batch of regexes on a batch of input on a regular basis, and I
want some way to catch an exponential regex if I happen to accidentally
write one. Right now it appears there's no way to do this in Ruby 1.8.

... or is there?
 
R

Rick DeNatale

However, the point of my post is not the regex. What I was trying to
find out was why Ruby seems to block threads whenever a regex is
running.

I'm not entirely sure, but I suspect that it's because of the use of
green threads, rather than native threads.

Native threads can get scheduled whenever there's a hardware interrupt.

Thread switching for green threads only happens when certain internal
functions are called e.g. rb_thread_schedule. I suspect that the
regex engine isn't calling this, or if it does it doesn't call it very
often.

If this is the case it's not really blocking threads explicitly, but
the effect is the same.

Just a theory, I haven't slogged through the code thoroughly enough to prove it.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top