requiring balanced parens in a regexp?

Peter Michaux · Nov 10, 2006

Hi,

In the following string I would like to find the word that comes after
"test", as long as "test" is not inside parentheses. In this example the
match would be "two".

"the (test one) test two"

I found some indication that Perl regexps can do this with some sort of
recursive regexp. Can JavaScript regular expressions ensure that all
parentheses to the right of "test" are closed before proclaiming a
match? If so, how? If not, must I walk through the string, counting how
deeply nested each character is?
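
For reference, the "walk through the string" fallback mentioned above can be sketched roughly as follows. This is only a minimal sketch and not code from the thread: the function name wordAfterOutsideParens is hypothetical, a "word" is assumed to mean a run of \w characters, and stray closing parentheses are simply ignored.

```javascript
// Hypothetical helper: returns the word that follows the first occurrence
// of `keyword` found at nesting depth 0, or null if there is none.
function wordAfterOutsideParens(text, keyword) {
  var depth = 0;
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      if (depth > 0) depth--; // assumption: ignore stray closing parens
    } else if (depth === 0 &&
               text.slice(i, i + keyword.length) === keyword) {
      // The keyword must not be the tail of a longer word ("contest").
      var startsWord = i === 0 || !/\w/.test(text.charAt(i - 1));
      // The keyword must be followed by whitespace and then a word.
      var m = /^\s+(\w+)/.exec(text.slice(i + keyword.length));
      if (startsWord && m) return m[1];
    }
  }
  return null;
}

var flat = wordAfterOutsideParens("the (test one) test two", "test");
var nested = wordAfterOutsideParens(
  "((test one) test two (test three)) test four", "test");
```

Because the depth counter is updated for every character, nested parentheses need no special treatment.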

Thank you,
Peter

shimmyshack · Nov 10, 2006

Could you just use

\)[\w ]*test (\w+)[^()]*\(

Note that this automatically searches only what is not enclosed and
grabs the word after "test"; if a repeated open bracket is found, no
match occurs.

I haven't tested it, and I should be asleep, but that's where I would
start. (This assumes that you don't have a string like the one you gave
above, but in fact have an arbitrarily long string with many such
matches possible.) You must make sure you take care of the start and end
of the string if you are grabbing a substring of a larger string,
before you attempt matches.

Peter Michaux · Nov 10, 2006

Hi Shimmyshack,

Thanks for the suggestion. I think that if I allow nested parens then
this won't work. In the following, "test two" looks like it is outside
the brackets, even though it is nested inside the outer pair.

((test one) test two (test three)) test four

I think that my whole idea has gone down the tubes. I need to write a
tokenizer and a real parser.

Thanks again,
Peter

Dr J R Stockton · Nov 10, 2006

Consider repeatedly replacing every "(" that is followed by any number
of characters that are neither "(" nor ")" and then by ")" with a
space, until the string no longer changes, then doing a simple search
on the residue. Undertested :-

var St = "((test one) test two (test three)) test four";

// Replace innermost ( ... ) groups by a space until nothing changes.
function RP(S) {
  var T = S.replace(/\([^()]*\)/g, " ");
  return T === S ? S : RP(T);
}

var Answer = RP(St).match(/\btest\s+(\S+)/)[1]; // "four"

The g flag is optional, because the loop repeats until nothing changes.
The \b may need more thought if "()test a" and other odd cases are
to be matched.

The code needs error-detection for the case where "test word" is not
found.
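
One way the missing-match check might look is sketched below. It is only an illustration under stated assumptions: the function name findWordAfterTest is hypothetical, the stripping loop is written iteratively rather than recursively, and unbalanced parentheses are not detected.

```javascript
// Hypothetical wrapper: returns the word after "test" outside all
// parentheses, or null when no such word exists.
function findWordAfterTest(text) {
  var previous;
  // Blank out innermost ( ... ) groups until the string stops changing.
  do {
    previous = text;
    text = text.replace(/\([^()]*\)/g, " ");
  } while (text !== previous);

  // match() returns null when nothing is found, so test before indexing.
  var found = text.match(/\btest\s+(\S+)/);
  return found ? found[1] : null;
}

var hit = findWordAfterTest("((test one) test two (test three)) test four");
var miss = findWordAfterTest("(test one) nothing here");
```

Here hit is "four" and miss is null, so the caller can test for null instead of catching a TypeError from indexing a failed match.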

To be really clever, adapt it to work with matching () {} []
using only a single RegExp and .replace .
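
One possible reading of that exercise is sketched below; it is an assumption, not the poster's own solution. The name stripBalanced is hypothetical, and mismatched pairs such as "(a]" are deliberately left in place rather than reported.

```javascript
// Hypothetical helper: strips balanced (), [] and {} groups from the
// inside out and returns what is left.
function stripBalanced(text) {
  // One alternative per bracket type. Each matches only an innermost
  // group, i.e. one that contains no bracket character of any kind.
  var innermost = /\([^()[\]{}]*\)|\[[^()[\]{}]*\]|\{[^()[\]{}]*\}/g;
  var previous;
  do {
    previous = text;
    text = text.replace(innermost, " ");
  } while (text !== previous);
  return text;
}

var residue = stripBalanced(
  "{a [test one] (x)} test two [test (three)] test four");
var found = residue.match(/\btest\s+(\S+)/);
var word = found ? found[1] : null;
```

With this input, word ends up as "two", the first word after a "test" that sits outside every bracket pair.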

It's a good idea to read the newsgroup and its FAQ. See below.
