Regexp Arity


trans. (T. Onoma)

Just ran into a need to know how many parenthetical groupings a Regexp has.
Would #arity for Regexp be a good idea?

T.
 

Yukihiro Matsumoto

Hi,

In message "Re: Regexp Arity"

|Just ran into a need to know how many parenthetical groupings a Regexp has.
|Would #arity for Regexp be a good idea?

I'm not sure if it should be named 'arity'.

matz.
 

Austin Ziegler

In message "Re: Regexp Arity"
on Thu, 23 Sep 2004 03:26:44 +0900, "trans. (T. Onoma)" <[email protected]>
|Just ran into a need to know how many parenthetical groupings a Regexp has.
|Would #arity for Regexp be a good idea?

I'm not sure if it should be named 'arity'.

Regexp#groups ?
Regexp#groupcount?

-austin
 

Simon Strandgaard

Just ran into a need to know how many parenthetical groupings a Regexp has.
Would #arity for Regexp be a good idea?


hmm...

irb(main):006:0> parser = Parser.new('a(b(?:c(x)c)d)')
[snip]
irb(main):007:0> class << parser
irb(main):008:1> attr_reader :number_of_captures
irb(main):009:1> end
=> nil
irb(main):010:0> parser.number_of_captures
=> 2
irb(main):011:0>


why have I never thought of this?
 

trans. (T. Onoma)

Hi,

In message "Re: Regexp Arity"

|Just ran into a need to know how many parenthetical groupings a Regexp
| has. Would #arity for Regexp be a good idea?

I'm not sure if it should be named 'arity'.

Perhaps not, but I really can't think of a better word. Everything else seems
long or odd sounding:

#group_count
#parentheticals_count
#number_of_subexpressions

Hmm... that brings up a good point. Would zero-width positive lookaheads and/or
non-backreferencing groups be counted?

T.

--
( o _
// trans.
/ \ (e-mail address removed)

I don't give a damn for a man that can only spell a word one way.
-Mark Twain
 

David A. Black

Hi --


Perhaps not, but I really can't think of a better word. Everything else seems
long or odd sounding:

#group_count
#parentheticals_count
#number_of_subexpressions

Hmm... that brings up a good point. Would zero-width positive lookaheads and/or
non-backreferencing groups be counted?

To back up slightly: I can't help wondering under what conditions you
would need to know this. Can you present the problem you were trying
to solve? Maybe there's a simpler way.


David
 

Simon Strandgaard

|Just ran into a need to know how many parenthetical groupings a Regexp
| has. Would #arity for Regexp be a good idea?
[snip]
Hmm... that brings up a good point. Would zero-width positive lookaheads
and/or non-backreferencing groups be counted?

To back up slightly: I can't help wondering under what conditions you
would need to know this. Can you present the problem you were trying
to solve? Maybe there's a simpler way.


Maybe I should advertise some more for my regexp package

irb(main):001:0> require 'regexp'
=> true
irb(main):002:0> puts /this(feels(like)lisp|scheme)/.tree
+-Sequence
+-Inside set="t"
+-Inside set="h"
+-Inside set="i"
+-Inside set="s"
+-Group capture=1
+-Alternation
+-Sequence
| +-Inside set="f"
| +-Inside set="e"
| +-Inside set="e"
| +-Inside set="l"
| +-Inside set="s"
| +-Group capture=2
| | +-Sequence
| | +-Inside set="l"
| | +-Inside set="i"
| | +-Inside set="k"
| | +-Inside set="e"
| +-Inside set="l"
| +-Inside set="i"
| +-Inside set="s"
| +-Inside set="p"
+-Sequence
+-Inside set="s"
+-Inside set="c"
+-Inside set="h"
+-Inside set="e"
+-Inside set="m"
+-Inside set="e"
=> nil
irb(main):003:0>

to install just type

rpa install re
 

trans. (T. Onoma)

To back up slightly: I can't help wondering under what conditions you
would need to know this. Can you present the problem you were trying
to solve? Maybe there's a simpler way.

I'm pattern matching a document. When a match is found the match/submatches
are passed to various procs. If the regexp has groupings I want to pass the
group matches, if not just the whole match. I could just go by the matchdata
length, but the regexp is stored in a dedicated "token" class and that class
may need to be initialized differently based on that "arity".
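
The post-match version of that decision can be sketched roughly as follows. The helper name `payload_for` and the sample handler are hypothetical, made up for illustration; only `MatchData#captures` and `MatchData#[]` are core Ruby.

```ruby
# Hypothetical helper: decide what to hand to a handler proc after a match.
# If the regexp has capturing groups, pass them; otherwise pass the whole match.
def payload_for(match)
  groups = match.captures
  groups.empty? ? [match[0]] : groups
end

# Sample handler (an assumption, not from the thread).
handler = proc { |*parts| parts.join("|") }

with_groups    = /(\w+)@(\w+)/.match("user@host")
without_groups = /\w+@\w+/.match("user@host")

puts handler.call(*payload_for(with_groups))     # receives the two groups
puts handler.call(*payload_for(without_groups))  # receives the whole match
```

This only works once a MatchData exists, which is the limitation described above: the token class cannot make the same decision at initialization time.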

But granted, I did just work out a different approach, so as of this moment I
won't need it. Although I see no good reason that information shouldn't be
available. As I think Simon pointed out with his engine, "it is in there".

Thanks,
T.
 

Florian Frank

Best by far, so far,

Which groups are counted here? There are capturing groups and
non-capturing groups. Or is this the sum of those two numbers?

At the moment I don't see any obvious reason to count those
groups in a Regexp at all. I should know everything about them
before I created the regexp. If I want to find out how many
captures I got after matching the regexp I can easily get that
info from MatchData#captures.size.
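
For reference, a small sketch of that after-the-match lookup (the sample pattern and input are made up):

```ruby
md = /(\d+)-(\d+)/.match("10-20")

p md.captures        # the values of the capturing groups
p md.captures.size   # how many capturing groups the regexp has
p md.size - 1        # same number: MatchData#size also counts the whole match
```

Note that `captures` has one slot per capturing group in the regexp, including groups that did not take part in the match (those slots are nil).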

On the other hand the grouping is a tree structure and if I simply
compute one number from a regexp I lose a lot of information. They can
be quite complicated like /(((a))(?:(b|c)|d))/. From the group
size alone I cannot infer very much about the values that will be
captured after a match. Simon's tree construction example makes
much more sense if one wants to examine the structure of
a regexp.
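
To illustrate the point with the regexp above (the two input strings are made up): the count is 4 either way, but which slots are filled depends on the branch taken.

```ruby
re = /(((a))(?:(b|c)|d))/

ab = re.match("ab").captures
ad = re.match("ad").captures

p ab        # group 4, (b|c), took part in the match
p ad        # group 4 did not take part, so its slot is nil
p ab.size   # 4
p ad.size   # 4
```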
 

Nikolai Weibull

Which groups are counted here? There are capturing groups and
non-capturing groups. Or is this the sum of those two numbers?

Precisely why it isn't great.
At the moment I don't see any obvious reason to count those
groups in a Regexp at all. I should know everything about them
before I created the regexp. If I want to find out how many
captures I got after matching the regexp I can easily get that
info from MatchData#captures.size.
On the other hand the grouping is a tree structure and if I simply
compute one number from a regexp I lose a lot of information. They can
be quite complicated like /(((a))(?:(b|c)|d))/. From the group
size alone I cannot infer very much about the values that will be
captured after a match. Simon's tree construction example makes
much more sense if one wants to examine the structure of
a regexp.

The only viable thing to do is return exactly the number of capturing
groups given, not excluding anything. If you wish to write a regex such
as yours above, then you are in it for the trouble. I agree with you,
however, that designing based on the knowledge of how many capturing
groups are in a regex isn't very good design at all.
nikolai
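
A count of exactly the capturing groups can be obtained ad hoc, without a new core method and without a real input string. The sketch below is one workaround, not anything proposed in this thread; the method name `capture_count` is hypothetical. It relies only on `Regexp.union` and `MatchData#captures`.

```ruby
# Hypothetical helper, not part of Ruby's Regexp API.
def capture_count(regexp)
  # Adding an empty alternative guarantees a match against "", so a
  # MatchData always exists and has one slot per capturing group.
  Regexp.union(regexp, Regexp.new("")).match("").captures.size
end

p capture_count(/a(b(?:c(x)c)d)/)        # pattern from Simon's transcript
p capture_count(/(((a))(?:(b|c)|d))/)    # Florian's nested example
p capture_count(/no groups here/)
```

Non-capturing groups and lookaheads do not add slots, so they are left out of the count.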
 

David A. Black

Hi --

Which groups are counted here? There are capturing groups and
non-capturing groups. Or is this the sum of those two numbers?

At the moment I don't see any obvious reason to count those
groups in a Regexp at all. I should know everything about them
before I created the regexp. If I want to find out how many
captures I got after matching the regexp I can easily get that
info from MatchData#captures.size.

I agree. It's sounding like this is being informally proposed as a
new core method (rather than just discussed as something one might
write ad hoc), and I haven't seen a case being made for it at all.


David
 

trans. (T. Onoma)

I agree. It's sounding like this is being informally proposed as a
new core method (rather than just discussed as something one might
write ad hoc), and I haven't seen a case being made for it at all.

Well, you may be right. Most scenarios can be adjusted for *after the match*.
The only *necessary* use case (since that's what you're after) would come from
something that could only be done before matching happens. Given just how
dynamic Ruby is, that something of course will be hard to find. (BTW --Ruby's
dynamic method arguments *args is exactly how I resolved my potential use
case.)
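
A minimal sketch of what such a *args resolution might look like. The handler below is an assumption for illustration, not the actual code: the splat absorbs however many captures the regexp produces, so the handler never needs the group count in advance.

```ruby
# Hypothetical handler: `whole` is the full match, *groups takes the rest.
report = lambda do |whole, *groups|
  groups.empty? ? "matched #{whole}" : "matched #{whole} with #{groups.inspect}"
end

[/(\d+)-(\d+)/, /\d+-\d+/].each do |re|
  md = re.match("10-20")
  # MatchData#to_a is [whole_match, *captures]
  puts report.call(*md.to_a)
end
```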

On the other hand, if this information is already "near the surface" in the
Regexp engine, it certainly shouldn't be a big deal to add a method to access
it. It's funny how people are more likely to find uses for things when they
can actually use them. ;) Anyway, it's no big deal. I was just wondering if
anyone else thought they might be of use.

FYI --Concerning which type of captures to count, only groups that can
actually capture are of any external use. So only they need counting really.
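
A quick check of that distinction (pattern and input are made up): only the capturing group shows up in the match data, while the non-capturing group and the lookahead leave no slot.

```ruby
re = /(a)(?:b)(?=c)/
md = re.match("abc")

p md.captures   # only the (a) group is reported
p md[0]         # the lookahead is zero-width, so "c" is not consumed
```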

 

Robert Klemme

trans. (T. Onoma) said:
BTW --What's the use case of #casefold?

T.
You can query whether the regexp at hand ignores case or not. (That's the
/i flag)

robert
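
For example (the patterns are made up; `Regexp#casefold?`, `Regexp#options` and `Regexp::IGNORECASE` are core Ruby):

```ruby
insensitive = /ruby/i
sensitive   = /ruby/

p insensitive.casefold?   # true, the /i flag is set
p sensitive.casefold?     # false

# The same information is available from the option bits.
p (insensitive.options & Regexp::IGNORECASE) != 0
```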
 
