Rubys Regular Expression Engine

  • Thread starter Wolfgang Nádasi-Donner
  • Start date
W

Wolfgang Nádasi-Donner

Some weeks ago I sent a message here, "Onigurama - Problem with Subexpression Call", sent on Montag, 27. Juni
2005 23:22 - unfortunately there was no comment from someone. So I try to ask the major part again.

If I'm correcly informed, the Regular Expression Engine used by Ruby will be Onigurama, starting in Ruby 1.9.
Where is the right place to sent Problem Reports or Change Recommendations against Onigurama? The Engine
itself is not only used in the Ruby environment, so I don't know what to do with Problems that have to be
addressed to Onigurama in the Ruby environment.

***** Problem Report *****

Onigurama does not recognize left recursion correctly.

As I wrote in my earlier message

-----------------------------------------------------------
orgstring= "5-((3+4)*5)+6+xx"
pattern = /^(?<bal>[^()]*(\(\g<bal>\)[^()]*)*)$/

puts "\nMuster: #{pattern.inspect}\nString: '#{orgstring}'"
if (res = orgstring.match(pattern)) then
puts "Match: '#{res}'"
else
puts "+++++ No Match"
end
-----------------------------------------------------------

produces the Error Message

-----------------------------------------------------------
never ending recursion: /^(?<bal>[^()]*(\(\g<bal>\)[^()]*)*)$/
-----------------------------------------------------------

which is definitely wrong. After using a useless alternative in this expression it works fine

-----------------------------------------------------------
orgstring= "5-((3+4)*5)+6+xx"
pattern = /^(?<bal>[^()]*((x|(\(\g<bal>\)))[^()]*)*)$/

puts "\nMuster: #{pattern.inspect}\nString: '#{orgstring}'"
if (res = orgstring.match(pattern)) then
puts "Match: '#{res}'"
else
puts "+++++ No Match"
end
 
N

Nikolai Weibull

Wolfgang N=EF=BF=BDdasi-Donner wrote:
^-- Sorry, but your missing proper encoding information.
Some weeks ago I sent a message here, "Onigurama - Problem with
Subexpression Call", sent on Montag, 27. Juni 2005 23:22 -
unfortunately there was no comment from someone. So I try to ask the
major part again.

It's easier to track if you point to the number of the message instead.
It can be found at http://www.ruby-talk.org/.
Onigurama does not recognize left recursion correctly.

Why should it? (See below.)
As I wrote in my earlier message
=20

You make my eyes bleed... Please don't use regular expressions for
matching parentheses, it simply can't be done using regular expressions
(well, .NET-"regular expressions" can, and perhaps there's some support
for it in Oniguruma as well, but definitely not in the way you're doing
it.).

Use a context free grammar for that instead. You can use a tool like
racc to do that.
puts "\nMuster: #{pattern.inspect}\nString: '#{orgstring}'"
if (res =3D orgstring.match(pattern)) then
puts "Match: '#{res}'"
else
puts "+++++ No Match"
end

No, it's definitely right. Your trying to match a <bal> inside a <bal>,
right? That sounds like infinite recursion to me,
nikolai

--=20
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
 
W

Wolfgang Nádasi-Donner

snip >>>>>
Onigurama does not recognize left recursion correctly. Why should it? (See below.)

Because it is described in the Onigurama documentation
As I wrote in my earlier message

You make my eyes bleed... Please don't use regular expressions for
matching parentheses, it simply can't be done using regular expressions
(well, .NET-"regular expressions" can, and perhaps there's some support
for it in Oniguruma as well, but definitely not in the way you're doing
it.).
It is normal usage like the bal-Pattern in Snobol4. I agree that this is not good for fixed programms, but my
usage of Ruby is mainly interactive data analysis (now done using IRB, later using a special shell). It is
very helpful in this case,



Sorry, strong disagreement. The part "(\(\g<bal>\)[^()]*)*" will either recognize nothing or it must consume a
"("-Character before coming to the recursive "\b<bal>"-Part.

Best regards, Wolfgang Nadasi-Donner
 
N

Nikolai Weibull

Because it is described in the Onigurama documentation

Hm, OK, sorry. It works in 3.8.5
(http://www.geocities.jp/kosako3/oniguruma/) it seems. Btw, can't that
regular expression be simplified a bit? The two * could be turned into
a single ? and the [^()] can be turned into [^(] and [^)], or?,
nikolai


--=20
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
 
W

Wolfgang Nádasi-Donner

snip >>>>>
Hm, OK, sorry. It works in 3.8.5
(http://www.geocities.jp/kosako3/oniguruma/) it seems. Btw, can't that
regular expression be simplified a bit? The two * could be turned into
a single ? and the [^()] can be turned into [^(] and [^)], or?,
nikolai
I really do think so - but I'm starting experiments with Ruby/Onigurama now. I try to find some basic building
blocks that can be used interactively by '#{patternvar}'.

The above Regex now works (it is still unreadable), but a lot has to be done. Some things related to the
produced MatchData object are still unclear for me if one is using named groups. The Example
orgstring= <<EOT
firstproc(p1,p2,p3(p31,p32),p4(p41,p42(p421,p422),p43),p5)
nochneproc ( einfach, nur , ein, paar, parameter )
dieletzte ( auch, mit ( fast ) , allem )
EOT

pattern1 = /(?<n>\w+)\s*(?<p>\((?<bal>[^()]*?((x|(\(\g<bal>\)))[^()]*?)*?)\))/
pattern2 = /[(,]\s*(?<bal>[^()]*?((x|(\(\g<bal>\)))[^()]*?)*?)(?=\s*[,)])/

orgstring.scan(pattern1) do
puts "\n---- Name: '#{$~[1]}'"
$~[2].scan(pattern2) do
puts "Parameter: '#{$~[1]}'"
end
end
---- Name: 'firstproc'
Parameter: 'p1'
Parameter: 'p2'
Parameter: 'p3(p31,p32)'
Parameter: 'p4(p41,p42(p421,p422),p43)'
Parameter: 'p5'

---- Name: 'nochneproc'
Parameter: 'einfach'
Parameter: 'nur'
Parameter: 'ein'
Parameter: 'paar'
Parameter: 'parameter'

---- Name: 'dieletzte'
Parameter: 'auch'
Parameter: 'mit ( fast )'
Parameter: 'allem'
It's still an early step.

Best regards, Wolfgang Nadasi-Donner
 
W

Wolfgang Nádasi-Donner

snip >>>>>
.... Btw, can't that
regular expression be simplified a bit? The two * could be turned into
a single ? and the [^()] can be turned into [^(] and [^)], or?
It depends on the definition of 'balanced' needed. If [^(] is used the string 'abcde(fgh' will be recognized
as balanced.

The two * are necessary if one needs to match balanced strings, independant of its structure, e.g.
'abvv(d(jhj)kjj)hd(jhb(j))hdjh'. This form has an advantage too. it is related to Friedl's loop enrolling
structure "<normal>(<special><normal>*)*' and avoids endless matching.

Much more complicated is the question "greedy or not greedy" in the recursion.

Best regards, Wolfgang Nadasi-Donner
 
W

Wolfgang Nádasi-Donner

snip >>>>>
Can anyone give me a hint where to place things related to Ruby/Onigurama?

Btw, this major question is still open.

Best regards, Wolfgang Nadasi-Donner
 
N

Nikolai Weibull

Wolfgang said:
.... Btw, can't that regular expression be simplified a bit? The
two * could be turned into a single ? and the [^()] can be turned
into [^(] and [^)], or?
It depends on the definition of 'balanced' needed. If [^(] is used the
string 'abcde(fgh' will be recognized as balanced.

I assume you mean 'abcde)fgh'.
The two * are necessary if one needs to match balanced strings,
independant of its structure, e.g. 'abvv(d(jhj)kjj)hd(jhb(j))hdjh'.
This form has an advantage too. it is related to Friedl's loop
enrolling structure "<normal>(<special><normal>*)*' and avoids endless
matching.

Hm, yes that's true,
nikolai

--=20
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top