Regexp for parsing?

Wayne Magor · Nov 19, 2007

I understand regular expressions, but can someone please explain this:

re = %r/((?<pg>\((?:\\[()]|[^()]|\g<pg>)*\)))/

By the way, this only works with the Oniguruma engine (Ruby 1.9).

So, now that there is the capability to match balanced parens and so
forth, does this mean that the new regular expression engine can be
used to construct simple parsers (matching language constructs)?

Noah Easterly · Nov 19, 2007

%r/ ... /
-- regexp delimiter (why they didn't just use / ... /, I don't know)
( ... )
-- non-capturing group here (plain parentheses normally capture, but not
once a named group appears in the pattern; see
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt, part 10, case 3)
-- seems rather useless, given that the only contained item is a
capturing group
(?<pg> ... )
-- capturing named group (see http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt,
part 7)
\( .. \)
-- literal parentheses surrounding pattern
(?: ... | ... | ... )*
-- non-capturing group of 3 alternatives, repeated 0 or more times
\\[()]
-- escaped literal parens
[^()]
-- anything except parens
\g<pg>
-- match the pg-named pattern here (recursive sub-exp - see
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt, part 9)
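
To see the pieces above working together, here is a minimal sketch, assuming Ruby 1.9 or later. The pattern is the one from the question rewritten with the /x flag so each part can carry a comment; the redundant outer unnamed group is dropped. The names BALANCED and balanced_groups are made up for this illustration.

```ruby
# The pattern from the question, spelled out with /x.
BALANCED = %r{
  (?<pg>          # named group pg
    \(            # a literal open paren
    (?:
        \\[()]    # a backslash followed by a paren
      | [^()]     # any one character that is not a paren
      | \g<pg>    # the whole pg pattern again, called recursively
    )*
    \)            # a literal close paren
  )
}x

# Hypothetical helper: collect every top-level balanced group in a string.
def balanced_groups(text)
  groups = []
  pos = 0
  while (m = BALANCED.match(text, pos))
    groups << m[0]      # the full text of this match
    pos = m.end(0)      # resume scanning after it; a match is never empty
  end
  groups
end

puts balanced_groups("f(a, (b + c) * d) + g(e)").inspect
```

The helper walks the string with Regexp#match and a start position rather than String#scan, so it only relies on the full match text and not on how the engine reports the capture of a recursively called group.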

Wayne Magor · Nov 20, 2007

Thanks, I understand nearly everything now. It really shows the power
of the Oniguruma engine for regular expressions. By the way, the
trailing comma after the link caused a Japanese site to come up. For
people's reference, the manual for Oniguruma is at:

http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

That's a great reference, but I still didn't understand this:

Noah said:
\\[()]
-- escaped literal parens

What is that pattern? I've never seen that before. What does it match?
Where can I read about that?

Why would that be there since a new open paren should start another
instance of <pg>, shouldn't it?

So, obviously, I'm still a little confused.

Eric Hodel · Nov 21, 2007

Wayne said:
By the way, this only works with the Oniguruma engine (Ruby 1.9).

No, translated to 1.8, the regex would be:

%r/((\((?:\\[()]|[^()]|\2)*\)))/

Eric Hodel · Nov 21, 2007

Wayne said:
Noah said:
\\[()]
-- escaped literal parens

What is that pattern? I've never seen that before. What does it match?
Where can I read about that?

"\(" or "\)"
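
A quick way to check this in irb, as a small sketch: inside a regexp, \\ is a literal backslash and [()] is a character class holding the two parens. The constant name ESCAPED_PAREN and the anchors \A and \z are additions for this test only, so that nothing but the whole string counts.

```ruby
# Anchored copy of the first alternative from the pattern.
ESCAPED_PAREN = /\A\\[()]\z/

# In a double-quoted Ruby string, "\\(" is two characters: a backslash and a paren.
candidates = ["\\(", "\\)", "(", ")", "\\", "\\x"]

candidates.each do |s|
  verdict = ESCAPED_PAREN.match(s) ? "matches" : "no match"
  puts "#{s.inspect} -> #{verdict}"
end
```

Only the first two candidates, a backslash followed by an open or close paren, should be reported as matches.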

Wayne Magor · Nov 21, 2007

Noah said:
snip
(?: ... | ... | ... )*
-- non-capturing group of 3 alternatives, repeated 0 or more times
\\[()]
-- escaped literal parens
[^()]
-- anything except parens
\g<pg>
-- match the pg-named pattern here

Ok, so there are 3 alternatives in the non-capturing group:

1. An open or close parenthesis
2. Any character except a paren
3. A pattern that starts with an open paren

Am I the only one that finds this strange?

Noah Easterly · Dec 3, 2007

Wayne said:
Ok, so there are 3 alternatives in the non-capturing group:

1. An open or close parenthesis

Correction: as Eric said above, an escaped parenthesis (that is, one
with a leading backslash).

Wayne said:
2. Any character except a paren

Yup.

Wayne said:
3. A pattern that starts with an open paren

AND ends in a close paren, and contains only non-parens, escaped
parens, and balanced pairs of parens.

Wayne said:
Am I the only one that finds this strange?

Doubtful. You may be one of the ones to whom this is new, though.

I find it strange that it only recognizes parenthesis escapes, and not
escaped backslashes. So you can do something like:
( \( )
and match correctly, but there's no way to do a balanced pair of
parentheses containing just a backslash:
(\) -- no
(\\) -- no
(\\\) -- no
(\ ) -- matches, but has an extra space.

I would have replaced '\\[()]' by '\\[()\\]' so that '(\\)' would
match.
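
The cases above can be tried directly. The sketch below, assuming Ruby 1.9 or later, runs them against the pattern from the thread (minus the redundant outer group) and against a stricter variant. STRICT is an assumption of this example, not something proposed in the thread as written: it adds the backslash to the escape class as suggested above, and also removes it from the second alternative. That second change seems to matter, because [^()] accepts a lone backslash too, so the engine can backtrack and treat the backslash in (\) as an ordinary character.

```ruby
BS = "\\"  # one backslash character

# Pattern from the thread, without the outer unnamed group.
ORIGINAL = %r/(?<pg>\((?:\\[()]|[^()]|\g<pg>)*\))/

# Hypothetical stricter variant: "\\" is an escape as well, and a bare
# backslash is excluded from the "any other character" alternative.
STRICT = %r/(?<pg>\((?:\\[()\\]|[^()\\]|\g<pg>)*\))/

# True when the regexp matches and the match covers the entire string.
def whole_match?(re, str)
  m = re.match(str)
  !m.nil? && m[0] == str
end

samples = ["(#{BS})", "(#{BS * 2})", "(#{BS * 3})", "( #{BS}( )"]

samples.each do |s|
  puts format("%-8s original=%-5s strict=%s",
              s, whole_match?(ORIGINAL, s), whole_match?(STRICT, s))
end
```

With these definitions, ORIGINAL accepts all four samples, while STRICT accepts only the doubled backslash and the spaced-out escaped paren, which is the escape-aware reading described in the post.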
