Regexp and string halts ruby

R

Ricardo Ramalho

I've tried the following regexp/string match:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

On a Debian/Linux system, with ruby version 1.8.4(-7sarge5) and even on
the more recent 1.8.5(-4) ruby enters an infinite active loop, and never
returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.
 
A

ara.t.howard

I've tried the following regexp/string match:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

On a Debian/Linux system, with ruby version 1.8.4(-7sarge5) and even on
the more recent 1.8.5(-4) ruby enters an infinite active loop, and never
returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.

it's just a bad regex.

harp:~ > cat a.rb
#! /usr/bin/env ruby

re = /^((?:\s*[^,;=]+[;=][^,;=]+,?)+)$/

s = " pid=934, time=478611, command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i
4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q
AW02,AW02S TMQFORWARD -C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02, "

m = re.match s

puts m.to_a.first


harp:~ > time ruby a.rb
pid=934, time=478611, command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i
4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q
AW02,AW02S TMQFORWARD -C dom=ccaApp -g 650 -i 4278 -u pthp68 -U

real 0m0.008s
user 0m0.010s
sys 0m0.000s



google 'exponential backtracking'

regards.

-a
 
R

Robert Klemme

I've tried the following regexp/string match:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

On a Debian/Linux system, with ruby version 1.8.4(-7sarge5) and even on
the more recent 1.8.5(-4) ruby enters an infinite active loop, and never
returns.

Are you *sure* that it enters an infinite loop? I see you nest two star
operators and a ".+" inside a + operator - this calls for trouble (i.e.
exponential run time).

I don't know what you are actually trying to achieve with your regexp
but you should consider changing it. What strikes me odd:

- you have a group that covers the whole RX, this is superfluous as you
can get the same more easily

- You are grouping but not extracting, maybe you can show more code and
more of the text you match against (if this is just a small part).

Kind regards

robert
 
W

Wolfgang Nádasi-Donner

Ricardo said:
I've tried the following regexp/string match:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

On a Debian/Linux system, with ruby version 1.8.4(-7sarge5) and even on
the more recent 1.8.5(-4) ruby enters an infinite active loop, and never
returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.

There is no infinite loop - it it takes a long time, because your regular
expression is possibly too complex. These "(...*...+...*...)+" constructs are
dangerous. See "Mastering Regular Expressions - Friedl" for details.

An example with your regular expression and your string, starting with a short
part an taken longer ones each step. The whole string may take some hundred
years on my small machine ;-)

regex = /^((?:\s*.+[;=][^,]*,?)+)$/

string = " pid=934, time=478611,command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i
4278 -u pthp68, "
t = Time.new
string.match(regex)
puts "Match need appr. #{Time.now-t} seconds"
STDOUT.flush

string = " pid=934, time=478611,command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i
4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG, "
t = Time.new
string.match(regex)
puts "Match need appr. #{Time.now-t} seconds"
STDOUT.flush

string = " pid=934, time=478611,command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i
4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S
TMQFORWARD, "
t = Time.new
string.match(regex)
puts "Match need appr. #{Time.now-t} seconds"

Match need appr. 7.0 seconds
Match need appr. 21.38 seconds
Match need appr. 60.658 seconds

Wolfgang Nádasi-Donner
 
D

Daniel Martin

Ricardo Ramalho said:
/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

It is possible to write a regular expression that takes exponential
time to confirm that there's no match. This is not really anything
too new. You can do similar things to Python and to Perl too,
although with Perl you have to try significantly harder since 5.6.
(There's some code in there that detects certain exponential-time
matching patterns and adjusts behavior)

You can simplify your example to:

/(?:.+=[^,]*,?)+$/ =~ "x=x,x" * 12

Which will complete on my machine, albeit very slowly. Increasing the
number much more will get you what appears to be a hang.

Note that if the regular expression you gave were to match the string,
then this regular expression should also match:

/(?:\s*.+[;=][^,]*,?)$/

Consider this a "prerequisite" to your match - if this doesn't match,
then there's no way your full expression could match your input. In
your code, one thing you could do is check against this short
expression first, and only if that matches then check against the long
expression.

However, it may be much more productive simply to rewrite your
expression as:

/^((?:\s*.+[;=][^,]*,)*(?:\s*.+[;=][^,]*,?))$/

I've expanded out your (?:blah)+ into a (?:blah)*(?:blah), and (most
importantly!) I've removed the "?" after the "," in the first clause,
on the assumption that the comma is optional only on the last piece in
the line. With that change, ruby no longer goes nuts trying to match
this regular expression.
 
W

William James

I've tried the following regexp/string match:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

On a Debian/Linux system, with ruby version 1.8.4(-7sarge5) and even on
the more recent 1.8.5(-4) ruby enters an infinite active loop, and never
returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.

Perhaps you should have done it this way:

s = " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 -- -i 10 -t 600 -q AW02,AW02S
TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 -- -i 10 -t 600 -q AW02, "

re = /[^\s;=]+[;=][^;=,\s]+/

p s.scan( re )

--- output -----
["pid=934", "time=478611", "command=TMQFORWARD",
"args=-C", "dom=ccaApp", "dom=ccaApp"]
 
R

Ricardo Ramalho

Thanks guys, i guess that was it, the regexp was too complex. Daniel
Martin's simplification (?:blah,?)+ into a (?:blah,)*(?:blah,?) really
did the trick! What (mis)led me to believe it was a bug was the fact
that i tried the original regexp/string in perl, and it returned rather
quickly. I didn't know that, also according to Martin, perl has code in
that detects certain exponential-time
matching patterns and adjusts behavior.

I think i'll go and get Friedl's Mastering Regular Expressions - i'm
curious and i like O'Reilly's books anyway... :)

Best regards,
Ricardo Ramalho
 
J

James Edward Gray II

I think i'll go and get Friedl's Mastering Regular Expressions - i'm
curious and i like O'Reilly's books anyway... :)

You wont be disappointed. It is a tome of black magic. ;)

James Edward Gray II
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top