regexp question - look for parentheses then remove them

Max Williams

I'm struggling with a regular expression problem. Can anyone help?

I want to take a string, look for anything in parentheses, and if I find
anything, put it into an array, minus the parentheses.

Currently I'm doing this:

parentheses = /\(.*\)/
array = string.scan(parentheses)

This gives me, e.g.:

"3 * (1 + 2)" => ["(1 + 2)"]

But is there an easy way to strip the parentheses off before putting
the result into the array? E.g.:

"3 * (1 + 2)" => ["1 + 2"]

In addition, if I have nested parentheses inside the outer parentheses,
I want to keep them, e.g.:

"3 * (1 + (4 / 2))" => ["1 + (4 / 2)"]

Can anyone show me how to do this?

thanks
max
 
Jesús Gabriel y Galán

x = "3 * (1 + 2)".match(/\((.*)\)/)
x.captures
=> ["1 + 2"]
x = "3 * (2 + (1 + 3))".match(/\((.*)\)/)
x.captures
=> ["2 + (1 + 3)"]
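
Since the original question used String#scan, the same capture group should also work there. A minimal sketch (the variable names are only for illustration): when the pattern contains a group, scan returns one array of captures per match, so flatten gives a flat list.

```ruby
# Greedy pattern: captures everything between the first "(" and the last ")".
paren_contents = /\((.*)\)/

# scan returns an array of capture arrays, e.g. [["1 + 2"]]; flatten unwraps it.
simple = "3 * (1 + 2)".scan(paren_contents).flatten
nested = "3 * (1 + (4 / 2))".scan(paren_contents).flatten

p simple  # => ["1 + 2"]
p nested  # => ["1 + (4 / 2)"]
```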

Hope this helps,

Jesus.
 
Max Williams

ah, "captures" - that's the same as MatchData#to_a, right? Perfect,
thanks!
 
Jesús Gabriel y Galán

Max Williams said:
ah, "captures" - that's the same as MatchData#to_a, right?

Not exactly. MatchData#to_a returns the whole matched string as the
first element of the array, followed by the captured groups from index 1
onward. MatchData#captures contains only the captured groups. See the
difference:

irb(main):001:0> a = "123456".match(/(.)(.)\d\d/)
=> #<MatchData:0xb7c97a04>
irb(main):002:0> a.to_a
=> ["1234", "1", "2"]
irb(main):003:0> a.captures
=> ["1", "2"]

Jesus.
 
Jesús Gabriel y Galán

x = "3 * (2 + (1 + 3))".match(/\((.*)\)/)
x.captures
=> ["2 + (1 + 3)"]

That can fail if you have more than one bracket pair on the lowest level:

irb(main):002:0> "3 * (2 + (1 + 3)) + (1 * 4)".match(/\((.*)\)/).to_a
=> ["(2 + (1 + 3)) + (1 * 4)", "2 + (1 + 3)) + (1 * 4"]

True, what would be the expected result for this?

["2 + (1 + 3)", "1 * 4"] ???

I agree that for complex cases a regexp is not the solution. A
solution like yours, counting parens (or using a stack), should be the
preferred way.
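
For reference, the paren-counting idea could look something like the sketch below. The helper name top_level_groups is hypothetical and this is not the code from the post being replied to; it simply tracks nesting depth and collects each outermost group.

```ruby
# Hypothetical helper: return the contents of every top-level (...) group.
# A stray ")" is ignored and an unclosed "(" yields no group.
def top_level_groups(str)
  groups = []
  depth = 0
  start = nil
  str.each_char.with_index do |ch, i|
    if ch == "("
      start = i + 1 if depth.zero?              # an outermost group begins here
      depth += 1
    elsif ch == ")" && depth > 0
      depth -= 1
      groups << str[start...i] if depth.zero?   # the outermost group just closed
    end
  end
  groups
end

p top_level_groups("3 * (2 + (1 + 3)) + (1 * 4)")  # => ["2 + (1 + 3)", "1 * 4"]
```

Because only depth 0 triggers a new group, nested parentheses stay inside the text of their enclosing group, which matches the result suggested above.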

Cheers,

Jesus.
 
Jesús Gabriel y Galán

Yep, parsing something with arbitrarily nested parentheses is the
classic example of something that can't be done with a classical regex.
(Well, assuming you actually care about the nested parens.)

I've read that the .NET regex engine has some constructs to recognize
balanced constructs like parens:

http://puzzleware.net/blogs/archive/2005/08/13/22.aspx

Interesting !!

Jesus.
 
tho_mica_l

ah, "captures"

You can access the match data right away:

x = /\((.*)\)/.match("3 * (1 + 2)")
x[1]
or $1

I'd also make the * non-greedy -> *?

/\((.*?)\)/.match("3 * (1 + 2) * (3 + 4)")[1]
=> "1 + 2"

but:
/\((.*)\)/.match("3 * (1 + 2) * (3 + 4)")[1]
=> "1 + 2) * (3 + 4"
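
A short sketch of how the non-greedy version behaves with scan (variable names are illustrative). It separates sibling groups cleanly, but it stops at the first ")" it sees, so it cuts nested groups short:

```ruby
# Non-greedy pattern: stops at the first ")" after each "(".
lazy = /\((.*?)\)/

# Sibling groups are separated correctly.
flat = "3 * (1 + 2) * (3 + 4)".scan(lazy).flatten
p flat    # => ["1 + 2", "3 + 4"]

# Nested groups are truncated at the inner closing parenthesis.
nested = "3 * (1 + (4 / 2))".scan(lazy).flatten
p nested  # => ["1 + (4 / 2"]
```

So the greedy form suits a single, possibly nested group, and the non-greedy form suits several flat groups; neither handles both at once.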
 
Wolfgang Nádasi-Donner

Jesús Gabriel y Galán said:
I've read that the .NET regex engine has some constructs to recognize
balanced constructs like parens...

It's possible in Ruby 1.9, or in Ruby 1.8 with the Oniguruma library, too:

module Matchelements
  # Builds the source of a recursive pattern that matches text with
  # balanced lpar/rpar pairs, captured in the named group "bal".
  def bal(lpar='(', rpar=')')
    raise RegexpError,
      "wrong length of left bracket '#{lpar}' in bal" unless lpar.length == 1
    raise RegexpError,
      "wrong length of right bracket '#{rpar}' in bal" unless rpar.length == 1
    raise RegexpError,
      "identical left and right bracket '#{lpar}' in bal" if lpar.eql?(rpar)
    # Escape characters that are special inside a character class.
    lclass, rclass = lpar, rpar
    lclass = '\\' + lclass if lclass.match(/[\-\[\]]/)
    rclass = '\\' + rclass if rclass.match(/[\-\[\]]/)
    # \g<bal> calls the named group recursively for each nested pair.
    return "(?<bal>" +
      "[^#{lclass}#{rclass}]*?" +
      "(?:\\#{lpar}\\g<bal>\\#{rpar}" +
      "[^#{lclass}#{rclass}]*?" +
      ")*?" +
      ")"
  end
end
include Matchelements

result = "3 * (2 + (1 + 3)) + (1 * 4)".scan(/\(#{bal()}\)/)

p result # => [["2 + (1 + 3)"], ["1 * 4"]]

Wolfgang Nádasi-Donner
 
Max Williams

tho_mica_l said:
I'd also make the * non-greedy -> *?

Excellent tip, cheers!
 
