requiring balanced parens in a regexp?

Peter Michaux

Hi,

In the following string I would like to find the word that comes after
"test", as long as "test" is not inside parentheses. In this example the
match would be "two".

"the (test one) test two"

I found some indication that Perl regexps can do this with some sort of
recursive regexp. Can JavaScript regular expressions ensure that all
parentheses to the right of "test" are closed before proclaiming a
match? If so, how? If not, must I walk through the string counting how
deeply nested each character is?

Thank you,
Peter
 
shimmyshack

Could you just use

\)[\w ]*[test ]([\w]+)[ \w^\)]*\(

Note that this automatically searches what is not enclosed and grabs
the word after "test", and if a repeated open bracket is found, then no
match occurs.

I mean, I haven't tested it, and I should be asleep, but that's where I
would start off. (This assumes that you don't have a string as you gave
above, but in fact have an arbitrarily long string with many such
matches possible.) You must make sure you take care of the start and end
of the string if you are grabbing a substring of a larger string,
before you attempt matches.
 
Peter Michaux

Hi Shimmyshack,

Thanks for the suggestion. I think that if I allow nested parens then
this won't work. In the following, "test two" looks like it is outside
the brackets, even though it is nested inside the outer pair.

((test one) test two (test three)) test four

I think that my whole idea has gone down the tubes. I need to write a
tokenizer and a real parser.

Thanks again,
Peter
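
For this particular task the tokenizer can stay very small: split the string into parentheses and words, then keep a running nesting depth while walking the tokens. The sketch below is only an illustration, not code from the thread. The function name wordAfterOutside and its keyword parameter are hypothetical, and it assumes that words are separated by whitespace and that round parentheses are the only nesting characters.

```javascript
// Hypothetical helper: returns the word following the first occurrence of
// `keyword` at nesting depth 0, or null if there is none.
function wordAfterOutside(text, keyword) {
  var depth = 0;
  // Tokens are "(", ")" or runs of characters that are neither
  // whitespace nor parentheses.
  var tokens = text.match(/[()]|[^\s()]+/g) || [];
  for (var i = 0; i < tokens.length; i++) {
    var tok = tokens[i];
    if (tok === "(") {
      depth++;
    } else if (tok === ")") {
      if (depth > 0) depth--; // a stray ")" is ignored
    } else if (depth === 0 && tok === keyword) {
      var next = tokens[i + 1];
      // Accept only a plain word as the follower, not a parenthesis.
      if (next !== undefined && next !== "(" && next !== ")") {
        return next;
      }
    }
  }
  return null;
}
```

Called as wordAfterOutside("the (test one) test two", "test"), this should pick the word after the second "test", since the first one is seen at depth 1.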
 
Dr J R Stockton

Consider repeatedly replacing each innermost group, that is "(" followed
by any number of characters that are neither "(" nor ")" and then ")",
by a space until there is no change, then doing a simple search on the
residue. Lightly tested:

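var St = "((test one) test two (test three)) test four";

// Replace innermost (...) groups by a space until nothing changes.
function RP(S) {
  var T = S.replace(/\([^()]*\)/g, " ");
  return T == S ? S : RP(T);
}

// Word after the first "test" left in the residue.
var Answer = RP(St).match(/\btest\s+(\S+)/)[1];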

The g is optional.
The \b may need more thought, if "()test a" and other odd cases are
to be matched.

The code needs error-detection for the case where "test word" is not
found.
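
One way to add that check is to test the result of .match for null before indexing it. This is a minimal sketch rather than the poster's own code: the wrapper name findAfterTest is hypothetical, and RP is repeated so that the block stands on its own.

```javascript
// Strip innermost (...) groups until the string stops changing.
function RP(S) {
  var T = S.replace(/\([^()]*\)/g, " ");
  return T === S ? S : RP(T);
}

// Hypothetical wrapper: returns the word after a "test" that survives
// the stripping, or null when no such word exists.
function findAfterTest(S) {
  var m = RP(S).match(/\btest\s+(\S+)/);
  return m ? m[1] : null; // .match returns null when nothing matches
}
```

Unbalanced input is not detected by this sketch: a leftover "(" or ")" simply stays in the residue, so a "test" after an unclosed "(" would still be matched.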


To be really clever, adapt it to work with matching () {} []
using only a single RegExp and .replace.
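
One possible reading of that challenge, offered as a sketch only: use a single RegExp with one alternative per bracket type, where each alternative matches an innermost pair containing no bracket characters of any kind. The names INNERMOST and stripBrackets are hypothetical and do not come from the thread.

```javascript
// One alternative per bracket type. Each matches an opening bracket, any
// run of non-bracket characters, and the closing bracket of the same type.
var INNERMOST = /\([^()[\]{}]*\)|\[[^()[\]{}]*\]|\{[^()[\]{}]*\}/g;

// Replace innermost pairs by a space until the string stops changing.
function stripBrackets(S) {
  var T = S.replace(INNERMOST, " ");
  return T === S ? S : stripBrackets(T);
}
```

Because the opening and closing brackets in each alternative must be of the same type, a mismatched pair such as "(a]" is never removed and stays in the residue, where it can be detected afterwards if needed.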

It's a good idea to read the newsgroup and its FAQ. See below.
 
