Regexp Arity

trans. (T. Onoma) · Sep 22, 2004

Just ran into a need to know how many parenthetical groupings a Regexp has.
Would #arity for Regexp be a good idea?

T.
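
For readers who land on this question: a minimal sketch of one way to get the number without a new core method, assuming a reasonably current Ruby. `capture_count` is a hypothetical helper name, not an existing Regexp method. It alternates the pattern with an empty one so that a match against "" always succeeds, then counts the capture slots in the resulting MatchData.

```ruby
# Hypothetical helper, not part of core Ruby.
def capture_count(regexp)
  # Regexp.union(regexp, //) builds "regexp OR empty", which always
  # matches the empty string. MatchData#captures holds one slot per
  # capturing group, whether or not that group took part in the match.
  Regexp.union(regexp, //).match("").captures.size
end

puts capture_count(/a(b(?:c(x)c)d)/)  # 2
puts capture_count(/abc/)             # 0
puts capture_count(/(a)|(b)|(c)/)     # 3
```

Non-capturing groups such as (?:...) do not add a slot, so they are not counted.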

Yukihiro Matsumoto · Sep 22, 2004

Hi,

In message "Re: Regexp Arity"

|Just ran into a need to know how many parenthetical groupings a Regexp has.
|Would #arity for Regexp be a good idea?

I'm not sure if it should be named 'arity'.

matz.

Austin Ziegler · Sep 22, 2004

In message "Re: Regexp Arity"
on Thu, 23 Sep 2004 03:26:44 +0900, "trans. (T. Onoma)" wrote:

|Just ran into a need to know how many parenthetical groupings a Regexp has.
|Would #arity for Regexp be a good idea?

I'm not sure if it should be named 'arity'.

Regexp#groups ?
Regexp#groupcount?

-austin

Simon Strandgaard · Sep 22, 2004

Just ran into a need to know how many parenthetical groupings a Regexp has.
Would #arity for Regexp be a good idea?

hmm...

irb(main):006:0> parser = Parser.new('a(b(?:c(x)c)d)')
[snip]
irb(main):007:0> class << parser
irb(main):008:1> attr_reader :number_of_captures
irb(main):009:1> end
=> nil
irb(main):010:0> parser.number_of_captures
=> 2
irb(main):011:0>

why have I never thought of this?

trans. (T. Onoma) · Sep 22, 2004

Hi,

In message "Re: Regexp Arity"

|Just ran into a need to know how many parenthetical groupings a Regexp
| has. Would #arity for Regexp be a good idea?

I'm not sure if it should be named 'arity'.

Perhaps not, but I really can't think of a better word. Everything else seems
long or odd sounding:

#group_count
#parentheticals_count
#number_of_subexpressions

Hmm... that brings up a good point. Would zero-width positive lookaheads and/or
non-backreferencing groups be counted?

T.

--
( o _ カラム
// trans.
/ \ (e-mail address removed)

I don't give a damn for a man that can only spell a word one way.
-Mark Twain

David A. Black · Sep 22, 2004

Hi --

Hi,

In message "Re: Regexp Arity"

Perhaps not, but I really can't think of a better word. Everything else seems
long or odd sounding:

#group_count
#parentheticals_count
#number_of_subexpressions

Hmm... that brings up a good point. Would zero-width positive lookaheads and/or
non-backreferencing groups be counted?

To back up slightly: I can't help wondering under what conditions you
would need to know this. Can you present the problem you were trying
to solve? Maybe there's a simpler way.

David

Simon Strandgaard · Sep 22, 2004

|Just ran into a need to know how many parenthetical groupings a Regexp
| has. Would #arity for Regexp be a good idea?

[snip]
Hmm... that brings up a good point. Would zero-width positive lookaheads
and/or non-backreferencing groups be counted?

To back up slightly: I can't help wondering under what conditions you
would need to know this. Can you present the problem you were trying
to solve? Maybe there's a simpler way.

trans. (T. Onoma) · Sep 22, 2004

To back up slightly: I can't help wondering under what conditions you
would need to know this. Can you present the problem you were trying
to solve? Maybe there's a simpler way.

I'm pattern matching a document. When a match is found the match/submatches
are passed to various procs. If the regexp has groupings I want to pass the
group matches, if not just the whole match. I could just go by the matchdata
length, but the regexp is stored in a dedicated "token" class and that class
may need to be initialized differently based on that "arity".

But granted, I did just work out a different approach, so as of this moment I
won't need it. Although I see no good reason that information shouldn't be
available. As I think Simon pointed out with his engine, "it is in there".

Thanks,
T.
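
The dispatch rule described here (pass the group matches if the regexp has groups, otherwise the whole match) can be decided after the match from the MatchData alone. A minimal sketch, where `handler_args` and `report` are hypothetical names chosen for illustration:

```ruby
# Hypothetical helper: choose what a handler proc should receive.
def handler_args(match)
  # captures is empty only when the regexp has no capturing groups;
  # groups that did not participate still show up as nil entries.
  match.captures.empty? ? [match[0]] : match.captures
end

# A proc with a splat accepts however many arguments it is given.
report = proc { |*args| args.inspect }

with_groups    = /(\w+)@(\w+)/.match("user@host")
without_groups = /\d+/.match("abc 123")

puts report.call(*handler_args(with_groups))     # ["user", "host"]
puts report.call(*handler_args(without_groups))  # ["123"]
```

String#scan already follows the same convention: it yields whole matches for a pattern without groups and arrays of group values for a pattern with them.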

Gavin Sinclair · Sep 23, 2004

Regexp#groups ?
Regexp#groupcount?

Regexp#ngroups

Gavin

Nikolai Weibull · Sep 23, 2004

* Gavin Sinclair said:
Regexp#ngroups

Best by far, so far,
nikolai

Florian Frank · Sep 23, 2004

Best by far, so far,

Which groups are counted here? There are capturing groups and
non-capturing groups. Or is this the sum of those two numbers?

At the moment I don't see any obvious reason to count those
groups in a Regexp at all. I should know everything about them
before I created the regexp. If I want to find out how many
captures I got after matching the regexp I can easily get that
info from MatchData#captures.size.

On the other hand the grouping is a tree structure and if I simply
compute one number from a regexp I lose a lot of information. They can
be quite complicated like /(((a))(?:b|c)|d))/. From the group
size alone I cannot infer very much about the values that will be
captured after a match. Simon's tree construction example makes
much more sense if one wants to examine the structure of
a regexp.
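
To illustrate the point about a single number hiding the structure, here is a small sketch. The pattern is a balanced one made up for this example, not the one quoted in the post. Both matches report the same number of captures, yet which slots are filled depends on the branch taken.

```ruby
# Illustrative pattern only: group 1 wraps the whole alternation,
# group 2 sits inside the first branch.
re = /((a)(?:b|c)|d)/

first  = re.match("ab")
second = re.match("d")

p first.captures        # ["ab", "a"]
p second.captures       # ["d", nil]
p first.captures.size   # 2
p second.captures.size  # 2
```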

Nikolai Weibull · Sep 23, 2004

Which groups are counted here? There are capturing groups and
non-capturing groups. Or is this the sum of those two numbers?

Precisely why it isn't great.

At the moment I don't see any obvious reason to count those
groups in a Regexp at all. I should know everything about them
before I created the regexp. If I want to find out how many
captures I got after matching the regexp I can easily get that
info from MatchData#captures.size.

On the other hand the grouping is a tree structure and if I simply
compute one number from a regexp I lose a lot of information. They can
be quite complicated like /(((a))(?:b|c)|d))/. From the group
size alone I cannot infer very much about the values that will be
captured after a match. Simon's tree construction example makes
much more sense if one wants to examine the structure of
a regexp.

The only viable thing to do is return exactly the number of capturing
groups given, not excluding anything. If you wish to write a regex such
as yours above, then you are in it for the trouble. I agree with you,
however, that designing based on the knowledge of how many capturing
groups are in a regex isn't very good design at all.
nikolai

David A. Black · Sep 23, 2004

Hi --

Which groups are counted here? There are capturing groups and
non-capturing groups. Or is this the sum of those two numbers?

At the moment I don't see any obvious reason to count those
groups in a Regexp at all. I should know everything about them
before I created the regexp. If I want to find out how many
captures I got after matching the regexp I can easily get that
info from MatchData#captures.size.

I agree. It's sounding like this is being informally proposed as a
new core method (rather than just discussed as something one might
write ad hoc), and I haven't seen a case being made for it at all.

David

trans. (T. Onoma) · Sep 23, 2004

I agree. It's sounding like this is being informally proposed as a
new core method (rather than just discussed as something one might
write ad hoc), and I haven't seen a case being made for it at all.

Well, you may be right. Most scenarios can be adjusted for *after the match*.
The only *necessary* use case (since that's what you're after) would come from
something that could only be done before matching happens. Given just how
dynamic Ruby is, that something will of course be hard to find. (BTW --Ruby's
dynamic method arguments, *args, are exactly how I resolved my potential use
case.)
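
A rough sketch of what such a *args-based approach might look like. The post does not show the actual code, so the `Token` class, its members, and the `apply` method are all assumptions made for illustration. The action always receives the whole match first and any groups after it, so the token never needs to know the group count up front.

```ruby
# Hypothetical Token class; every name here is illustrative only.
Token = Struct.new(:regexp, :action) do
  def apply(text)
    match = regexp.match(text)
    return nil unless match
    # to_a is [whole_match, group1, group2, ...]; the splat spreads it
    # across the action's parameters.
    action.call(*match.to_a)
  end
end

collect = proc { |whole, *groups| [whole, groups] }

word   = Token.new(/\w+/, collect)
assign = Token.new(/(\w+)=(\w+)/, collect)

p word.apply("hello world")  # ["hello", []]
p assign.apply("x=1")        # ["x=1", ["x", "1"]]
```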

On the other hand, if this information is already "near the surface" in the
Regexp engine, it certainly shouldn't be a big deal to add a method to access
it. It's funny how people are more likely to find uses for things when they
can actually use them.

Anyway, it's no big deal. I was just wondering if
anyone else thought they might be of use.

FYI --Concerning which type of captures to count, only groups that can
actually capture are of any external use. So only they need counting really.
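
As a quick check of which constructs actually capture, a small sketch with an illustrative pattern: the non-capturing group and the two lookaheads take part in matching but leave no entry in the captures.

```ruby
# (a) captures; (?:b) groups without capturing; (?=c) and (?!d) are
# zero-width lookaheads and capture nothing.
re = /(a)(?:b)(?=c)(?!d)/
m  = re.match("abc")

p m[0]        # "ab"
p m.captures  # ["a"]
```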

trans. (T. Onoma) · Sep 23, 2004

BTW --What's the use case of #casefold?

T.

Robert Klemme · Sep 23, 2004

trans. (T. Onoma) said:
BTW --What's the use case of #casefold?

T.

You can query whether the regexp at hand ignores case or not. (That's the
/i flag)

robert
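
A short sketch of that query, using illustrative patterns. Regexp#options exposes the same flag as a bit that can be tested against Regexp::IGNORECASE.

```ruby
p(/ruby/i.casefold?)  # true
p(/ruby/.casefold?)   # false

# The same information via the option bits.
p((/ruby/i.options & Regexp::IGNORECASE) != 0)  # true
```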
