Regexp Arity

Discussion in 'Ruby' started by trans. (T. Onoma), Sep 22, 2004.

  1. Just ran into a need to know how many parenthetical groupings a Regexp has.
    Would #arity for Regexp be a good idea?

    T.
    trans. (T. Onoma), Sep 22, 2004
    #1

  2. Hi,

    In message "Re: Regexp Arity"
    on Thu, 23 Sep 2004 03:26:44 +0900, "trans. (T. Onoma)" <> writes:

    |Just ran into a need to know how many parenthetical groupings a Regexp has.
    |Would #arity for Regexp be a good idea?

    I'm not sure if it should be named 'arity'.

    matz.
    Yukihiro Matsumoto, Sep 22, 2004
    #2

  3. On Thu, 23 Sep 2004 03:54:31 +0900, Yukihiro Matsumoto
    <> wrote:
    > In message "Re: Regexp Arity"
    > on Thu, 23 Sep 2004 03:26:44 +0900, "trans. (T. Onoma)" <> writes:
    > |Just ran into a need to know how many parenthetical groupings a Regexp has.
    > |Would #arity for Regexp be a good idea?
    > I'm not sure if it should be named 'arity'.


    Regexp#groups ?
    Regexp#groupcount?

    -austin
    --
    Austin Ziegler *
    * Alternate:
    : as of this email, I have [ 6 ] Gmail invitations
    Austin Ziegler, Sep 22, 2004
    #3
  4. On Wednesday 22 September 2004 20:26, trans. (T. Onoma) wrote:
    > Just ran into a need to know how many parenthetical groupings a Regexp has.
    > Would #arity for Regexp be a good idea?



    hmm...

    irb(main):006:0> parser = Parser.new('a(b(?:c(x)c)d)')
    [snip]
    irb(main):007:0> class << parser
    irb(main):008:1> attr_reader :number_of_captures
    irb(main):009:1> end
    => nil
    irb(main):010:0> parser.number_of_captures
    => 2
    irb(main):011:0>


    why have I never thought of this?

    --
    Simon Strandgaard
    Simon Strandgaard, Sep 22, 2004
    #4
  5. On Wednesday 22 September 2004 02:54 pm, Yukihiro Matsumoto wrote:
    > Hi,
    >
    > In message "Re: Regexp Arity"
    >
    > on Thu, 23 Sep 2004 03:26:44 +0900, "trans. (T. Onoma)" <> writes:
    > |Just ran into a need to know how many parenthetical groupings a Regexp
    > | has. Would #arity for Regexp be a good idea?
    >
    > I'm not sure if it should be named 'arity'.


    Perhaps not, but I really can't think of a better word. Everything else seems
    long or odd sounding:

    #group_count
    #parentheticals_count
    #number_of_subexpressions

    Hmm... that brings up a good point. Would zero-width positive lookaheads and/or
    non-backreferencing groups be counted?
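
    For reference, a minimal sketch of how Ruby itself treats these cases once a match has been made, assuming `MatchData#captures` is taken as the definition of "counted". The helper name `capture_count_after_match` is made up for illustration and is not part of Ruby.

    ```ruby
    # Hypothetical helper: how many capture slots does a successful match expose?
    def capture_count_after_match(re, str)
      m = re.match(str)
      m ? m.captures.size : nil   # nil when the pattern does not match at all
    end

    # (a) captures; (?:b) is non-capturing; (?=c) is a zero-width lookahead.
    puts capture_count_after_match(/(a)(?:b)(?=c)/, "abc")
    # No plain (...) group at all.
    puts capture_count_after_match(/a(?=b)/, "ab")
    ```

    Only the plain `(a)` group takes up a slot in `captures`; the non-capturing group and the lookahead do not.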

    T.

    --
    ( o _ カラチ
    // trans.
    / \

    I don't give a damn for a man that can only spell a word one way.
    -Mark Twain
    trans. (T. Onoma), Sep 22, 2004
    #5
  6. Hi --

    On Thu, 23 Sep 2004, trans. (T. Onoma) wrote:

    > On Wednesday 22 September 2004 02:54 pm, Yukihiro Matsumoto wrote:
    > > Hi,
    > >
    > > In message "Re: Regexp Arity"
    > >
    > > on Thu, 23 Sep 2004 03:26:44 +0900, "trans. (T. Onoma)" <> writes:
    > > |Just ran into a need to know how many parenthetical groupings a Regexp
    > > | has. Would #arity for Regexp be a good idea?
    > >
    > > I'm not sure if it should be named 'arity'.

    >
    > Perhaps not, but I really can't think of a better word. Everything else seems
    > long or odd sounding:
    >
    > #group_count
    > #parentheticals_count
    > #number_of_subexpressions
    >
    > Hmm... that brings up a good point. Would zero-width positive lookaheads and/or
    > non-backreferencing groups be counted?


    To back up slightly: I can't help wondering under what conditions you
    would need to know this. Can you present the problem you were trying
    to solve? Maybe there's a simpler way.


    David

    --
    David A. Black
    David A. Black, Sep 22, 2004
    #6
  7. On Wednesday 22 September 2004 21:47, David A. Black wrote:
    > > <> writes:
    > > > |Just ran into a need to know how many parenthetical groupings a Regexp
    > > > | has. Would #arity for Regexp be a good idea?

    [snip]
    > > Hmm... that brings up a good point. Would zero-width positive lookaheads
    > > and/or non-backreferencing groups be counted?

    >
    > To back up slightly: I can't help wondering under what conditions you
    > would need to know this. Can you present the problem you were trying
    > to solve? Maybe there's a simpler way.



    Maybe I should advertise some more for my regexp package

    irb(main):001:0> require 'regexp'
    => true
    irb(main):002:0> puts /this(feels(like)lisp|scheme)/.tree
    +-Sequence
    +-Inside set="t"
    +-Inside set="h"
    +-Inside set="i"
    +-Inside set="s"
    +-Group capture=1
    +-Alternation
    +-Sequence
    | +-Inside set="f"
    | +-Inside set="e"
    | +-Inside set="e"
    | +-Inside set="l"
    | +-Inside set="s"
    | +-Group capture=2
    | | +-Sequence
    | | +-Inside set="l"
    | | +-Inside set="i"
    | | +-Inside set="k"
    | | +-Inside set="e"
    | +-Inside set="l"
    | +-Inside set="i"
    | +-Inside set="s"
    | +-Inside set="p"
    +-Sequence
    +-Inside set="s"
    +-Inside set="c"
    +-Inside set="h"
    +-Inside set="e"
    +-Inside set="m"
    +-Inside set="e"
    => nil
    irb(main):003:0>

    to install just type

    rpa install re

    --
    Simon Strandgaard
    Simon Strandgaard, Sep 22, 2004
    #7
  8. On Wednesday 22 September 2004 03:47 pm, David A. Black wrote:
    > To back up slightly: I can't help wondering under what conditions you
    > would need to know this. Can you present the problem you were trying
    > to solve? Maybe there's a simpler way.


    I'm pattern matching a document. When a match is found the match/submatches
    are passed to various procs. If the regexp has groupings I want to pass the
    group matches, if not just the whole match. I could just go by the matchdata
    length, but the regexp is stored in a dedicated "token" class and that class
    may need to be initialized differently based on that "arity".
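
    A rough sketch of the dispatch described above, decided after the match rather than from the regexp itself. The names `dispatch` and `collect` are hypothetical, and the "token" class is left out.

    ```ruby
    # Hypothetical helper: pass the group matches to the handler when the
    # pattern has capturing groups, otherwise pass the whole match.
    def dispatch(re, text, handler)
      m = re.match(text)
      return nil unless m
      args = m.captures.empty? ? [m[0]] : m.captures
      handler.call(*args)
    end

    collect = proc { |*args| args }   # *args accepts either shape

    p dispatch(/(\d+)-(\d+)/, "10-20", collect)   # group matches
    p dispatch(/\d+/, "10-20", collect)           # whole match only
    ```

    This only works once a match exists; it says nothing about the regexp before matching, which is the case the question is really about.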

    But granted, I did just work out a different approach, so as of this moment I
    won't need it. Although I see no good reason that information shouldn't be
    available. As I think Simon pointed out with his engine, "it is in there".

    Thanks,
    T.
    trans. (T. Onoma), Sep 22, 2004
    #8
  9. On Thursday, September 23, 2004, 4:56:23 AM, Austin wrote:

    > On Thu, 23 Sep 2004 03:54:31 +0900, Yukihiro Matsumoto
    > <> wrote:
    >> In message "Re: Regexp Arity"
    >> on Thu, 23 Sep 2004 03:26:44 +0900, "trans. (T. Onoma)" <> writes:
    >> |Just ran into a need to know how many parenthetical groupings a Regexp has.
    >> |Would #arity for Regexp be a good idea?
    >> I'm not sure if it should be named 'arity'.


    > Regexp#groups ?
    > Regexp#groupcount?


    Regexp#ngroups

    Gavin
    Gavin Sinclair, Sep 23, 2004
    #9
  10. * Gavin Sinclair <> [Sep 23, 2004 01:50]:
    > > Regexp#groups ?
    > > Regexp#groupcount?

    >
    > Regexp#ngroups


    Best by far, so far,
    nikolai

    --
    ::: name: Nikolai Weibull :: aliases: pcp / lone-star / aka :::
    ::: born: Chicago, IL USA :: loc atm: Gothenburg, Sweden :::
    ::: page: www.pcppopper.org :: fun atm: gf,lps,ruby,lisp,war3 :::
    main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
    Nikolai Weibull, Sep 23, 2004
    #10
  11. On Thu, 2004-09-23 at 02:03, Nikolai Weibull wrote:
    > * Gavin Sinclair <> [Sep 23, 2004 01:50]:
    > > > Regexp#groups ?
    > > > Regexp#groupcount?

    > >
    > > Regexp#ngroups

    >
    > Best by far, so far,


    Which groups are counted here? There are capturing groups and
    non-capturing groups. Or is this the sum of those two numbers?

    At the moment I don't see any obvious reason to count those
    groups in a Regexp at all. I should know everything about them
    before I created the regexp. If I want to find out how many
    captures I got after matching the regexp I can easily get that
    info from MatchData#captures.size.

    On the other hand the grouping is a tree structure and if I simply
    compute one number from a regexp I lose a lot of information. They can
    be quite complicated like /(((a))(?:(b|c)|d))/. From the group
    size alone I cannot infer very much about the values that will be
    captured after a match. Simon's tree construction example makes
    much more sense if one wants to examine the structure of
    a regexp.
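
    To illustrate with the regexp above (a small sketch; the constant name `NESTED` is only for this example): the number of captures is the same for every successful match, but which slots are filled depends on the input.

    ```ruby
    NESTED = /(((a))(?:(b|c)|d))/

    # Same pattern, two inputs: the captures array always has four slots,
    # but the last one is nil when the "d" branch is taken.
    p NESTED.match("ab").captures
    p NESTED.match("ad").captures
    p NESTED.match("ab").captures.size
    ```

    So the count alone does not say which groups will actually hold a value.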

    --
    Florian Frank <>
    Private Linux Site
    Florian Frank, Sep 23, 2004
    #11
  12. * Florian Frank <> [Sep 23, 2004 03:00]:
    > > > > Regexp#groups ?
    > > > > Regexp#groupcount?


    > > > Regexp#ngroups


    > > Best by far, so far,


    > Which groups are counted here? There are capturing groups and
    > non-capturing groups. Or is this the sum of those two numbers?


    Precisely why it isn't great.

    > At the moment I don't see any obvious reason to count those
    > groups in a Regexp at all. I should know everything about them
    > before I created the regexp. If I want to find out how many
    > captures I got after matching the regexp I can easily get that
    > info from MatchData#captures.size.


    > On the other hand the grouping is a tree structure and if I simply
    > compute one number from a regexp I lose a lot of information. They can
    > be quite complicated like /(((a))(?:(b|c)|d))/. From the group
    > size alone I cannot infer very much about the values that will be
    > captured after a match. Simon's tree construction example makes
    > much more sense if one wants to examine the structure of
    > a regexp.


    The only viable thing to do is return exactly the number of capturing
    groups given, not excluding anything. If you wish to write a regex such
    as yours above, then you are in for trouble. I agree with you,
    however, that designing based on the knowledge of how many capturing
    groups are in a regex isn't very good design at all.
    nikolai


    --
    ::: name: Nikolai Weibull :: aliases: pcp / lone-star / aka :::
    ::: born: Chicago, IL USA :: loc atm: Gothenburg, Sweden :::
    ::: page: www.pcppopper.org :: fun atm: gf,lps,ruby,lisp,war3 :::
    main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
    Nikolai Weibull, Sep 23, 2004
    #12
  13. Hi --

    On Thu, 23 Sep 2004, Florian Frank wrote:

    > On Thu, 2004-09-23 at 02:03, Nikolai Weibull wrote:
    > > * Gavin Sinclair <> [Sep 23, 2004 01:50]:
    > > > > Regexp#groups ?
    > > > > Regexp#groupcount?
    > > >
    > > > Regexp#ngroups

    > >
    > > Best by far, so far,

    >
    > Which groups are counted here? There are capturing groups and
    > non-capturing groups. Or is this the sum of those two numbers?
    >
    > At the moment I don't see any obvious reason to count those
    > groups in a Regexp at all. I should know everything about them
    > before I created the regexp. If I want to find out how many
    > captures I got after matching the regexp I can easily get that
    > info from MatchData#captures.size.


    I agree. It's sounding like this is being informally proposed as a
    new core method (rather than just discussed as something one might
    write ad hoc), and I haven't seen a case being made for it at all.


    David

    --
    David A. Black
    David A. Black, Sep 23, 2004
    #13
  14. On Wednesday 22 September 2004 09:13 pm, David A. Black wrote:

    > I agree. It's sounding like this is being informally proposed as a
    > new core method (rather than just discussed as something one might
    > write ad hoc), and I haven't seen a case being made for it at all.


    Well, you may be right. Most scenarios can be adjusted for *after the match*.
    The only *necessary* use case (since that's what you're after) would come from
    something that could only be done before matching happens. Given just how
    dynamic Ruby is, that something will of course be hard to find. (BTW, Ruby's
    dynamic method arguments, *args, are exactly how I resolved my potential use
    case.)

    On the other hand, if this information is already "near the surface" in the
    Regexp engine, it certainly shouldn't be a big deal to add a method to access
    it. It's funny how people are more likely to find uses for things when they
    can actually use them. ;) Anyway, it's no big deal. I was just wondering if
    anyone else thought it might be of use.

    FYI --Concerning which type of captures to count, only groups that can
    actually capture are of any external use. So only they need counting really.
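
    In the absence of a core method, one possible workaround is sketched below. It assumes that `MatchData#captures` always has one slot per capturing group whether or not that group took part in the match. `capture_group_count` is a made-up name, not an existing Regexp method.

    ```ruby
    # Hypothetical helper: count capturing groups before any real matching.
    def capture_group_count(re)
      # Adding an empty alternative guarantees a match against "", so a
      # MatchData is always available; its captures are simply all nil
      # (or empty strings) but there is one entry per capturing group.
      always_matches = Regexp.union(re, Regexp.new(""))
      always_matches.match("").captures.size
    end

    puts capture_group_count(/a(b(?:c(x)c)d)/)   # the pattern from Simon's parser example
    puts capture_group_count(/abc/)
    ```

    Non-capturing groups and lookaheads are not included, which matches the "only groups that can actually capture" rule.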

    --
    ( o _ カラチ
    // trans.
    / \

    I don't give a damn for a man that can only spell a word one way.
    -Mark Twain
    trans. (T. Onoma), Sep 23, 2004
    #14
  15. On Wednesday 22 September 2004 09:13 pm, David A. Black wrote:
    > > I agree. It's sounding like this is being informally proposed as a
    > > new core method (rather than just discussed as something one might
    > > write ad hoc), and I haven't seen a case being made for it at all.


    BTW --What's the use case of #casefold?

    T.
    trans. (T. Onoma), Sep 23, 2004
    #15
  16. "trans. (T. Onoma)" <> schrieb im Newsbeitrag
    news:...
    > On Wednesday 22 September 2004 09:13 pm, David A. Black wrote:
    > > > I agree. It's sounding like this is being informally proposed as a
    > > > new core method (rather than just discussed as something one might
    > > > write ad hoc), and I haven't seen a case being made for it at all.

    >
    > BTW --What's the use case of #casefold?
    >
    > T.
    >
    >

    You can query whether the regexp at hand ignores case or not. (That's the
    /i flag)
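
    A short illustration of that query, using two throwaway patterns:

    ```ruby
    # Regexp#casefold? reports whether the /i (ignore case) flag is set.
    puts(/ruby/i.casefold?)
    puts(/ruby/.casefold?)

    # The same information is available from the option bits.
    puts((/ruby/i.options & Regexp::IGNORECASE) != 0)
    ```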

    robert
    Robert Klemme, Sep 23, 2004
    #16
