[Brainstorming Input] Ruby-Oniguruma interoperability on Named Groups

Discussion in 'Ruby' started by Wolfgang Nádasi-Donner, Jul 30, 2005.

  1. Let me first explain the reason for and the kind of this message.

    I have a vague idea for making regular expressions more readable and for making it possible to build
    libraries of regular expressions. The hook is the named groups ('(?<name>...)') that are part of Oniguruma.
    The idea was influenced by the ancient Snobol4 language and its '*' operator for unevaluated expressions.

    This input is brainstorming material and not a change proposal, because it is not mature enough. I hope that
    something like this will appear in Ruby sometime in the future.

    Now the idea.

    Ruby and Oniguruma should be extended somehow to allow Ruby objects (usually regular expressions) to be
    registered with the class Regexp, so that they can be referenced later in regular expressions.

    In detail, a regular expression that consists only of a named group definition (starts with '(?<name>') can be
    registered by something like 'Regexp.register(/(?<example>a|b|c|d)/)' and deleted by
    'Regexp.remove('<example>')'. If the regular expression is assigned to a variable, the variable can be used
    instead; how to handle this in the 'remove' case still has to be clarified. I used class methods for this
    example, but it may be better to introduce named Regexp objects, created by something like
    '/(?<example>a|b|c|d)/.create'. There should be some way to delete them explicitly, because the regex engine
    Oniguruma must know which objects it has to keep track of.

    These objects can later be referenced in regular expressions by '\k<name>' or '\g<name>' as if they were
    defined there.

    This could make regular expressions much more readable, because one can build them from smaller parts,
    one can build special libraries of regular expression parts that are usable in applications, and one can
    use regular expression parts that were built by others without completely understanding their details.

    I think this is worth thinking about.

    Best regards, Wolfgang

    --
    Wolfgang Nádasi-Donner
     
    Wolfgang Nádasi-Donner, Jul 30, 2005
    #1

  2. On 30/07/05, Wolfgang Nádasi-Donner <> wrote:
    > Let me first explain the reason for and the kind of this message.
    >
    > I have a vague idea for making regular expressions more readable and for making it possible to build
    > libraries of regular expressions. The hook is the named groups ('(?<name>...)') that are part of Oniguruma.
    > The idea was influenced by the ancient Snobol4 language and its '*' operator for unevaluated expressions.
    >
    > This input is brainstorming material and not a change proposal, because it is not mature enough. I hope that
    > something like this will appear in Ruby sometime in the future.
    >
    > Now the idea.
    >
    > Ruby and Oniguruma should be extended somehow to allow Ruby objects (usually regular expressions) to be
    > registered with the class Regexp, so that they can be referenced later in regular expressions.
    >
    > In detail, a regular expression that consists only of a named group definition (starts with '(?<name>') can
    > be registered by something like 'Regexp.register(/(?<example>a|b|c|d)/)' and deleted by
    > 'Regexp.remove('<example>')'. If the regular expression is assigned to a variable, the variable can be used
    > instead; how to handle this in the 'remove' case still has to be clarified. I used class methods for this
    > example, but it may be better to introduce named Regexp objects, created by something like
    > '/(?<example>a|b|c|d)/.create'. There should be some way to delete them explicitly, because the regex engine
    > Oniguruma must know which objects it has to keep track of.
    >
    > These objects can later be referenced in regular expressions by '\k<name>' or '\g<name>' as if they were
    > defined there.
    >
    > This could make regular expressions much more readable, because one can build them from smaller parts,
    > one can build special libraries of regular expression parts that are usable in applications, and one can
    > use regular expression parts that were built by others without completely understanding their details.
    >
    > I think this is worth thinking about.
    >
    > Best regards, Wolfgang


    Hello Wolfgang,

    how is that different from

    example = "(?<example>a|b|c)"
    regex = /#{example}|nothing/

    except that you make Regexp hold the example variable and get a
    parse test on the regexp? You can get both with something like
    this:

    bschroed@black:~/svn/projekte/ruby-things$ cat regexp.rb

    class Regexp
      def self.register(name, regexp)
        self.new(regexp.to_s)  # parse test: raises RegexpError if the pattern is invalid
        (@registered_res ||= {})[name] = regexp.to_s
      end

      def self.[](name)
        @registered_res[name]  # look up the stored pattern source by name
      end
    end

    Regexp.register(:example, 'a|b|c')

    if /#{Regexp[:example]}|nothing/ =~ 'Well, that was just nothing'
      puts "Contains an example or nothing"
    end

    Regexp.register(:invalid, '(invalid(')
    bschroed@black:~/svn/projekte/ruby-things$ ruby regexp.rb
    Contains an example or nothing
    regexp.rb:4:in `initialize': premature end of regular expression:
    /(invalid(/ (RegexpError)
    from regexp.rb:4:in `new'
    from regexp.rb:4:in `register'
    from regexp.rb:19


    So it seems a very specialized wish to me.

    Regards,

    Brian

    --
    http://ruby.brian-schroeder.de/

    Stringed instrument chords: http://chordlist.brian-schroeder.de/
     
    Brian Schröder, Jul 30, 2005
    #2

  3. >>>>> snip >>>>>
    Can't we do that already?

    example = /a|b|c|d/
    mybigregex = /#{example}|foo/

    If you need more scope, use constants.
    >>>>> snap >>>>>


    It is not the same, because you include the textual data (it is somewhat like using the C preprocessor). There
    are two disadvantages:

    1) During debugging and the like you don't see the structure you constructed; you have to work with the
    final regular expression.

    2) You cannot handle recursive constructs, which are possible using '\g<name>'. This is a standard part of
    Oniguruma.
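
    To illustrate point 2, here is a minimal sketch of a recursive construct. It assumes a Ruby whose regexp
    engine supports subexpression calls (Ruby 1.9 and later) and 'match?' (Ruby 2.4 and later); the helper name
    'balanced_parens?' is made up for this example.

```ruby
# Hypothetical helper name; the pattern itself only uses the named-group
# and \g<name> syntax discussed in this thread.
def balanced_parens?(text)
  # (?<paren>...) defines the group; \g<paren> calls the group again from
  # inside its own body, so parentheses may nest to any depth.
  pattern = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/
  pattern.match?(text)
end

puts balanced_parens?("(a(b)c)")  # nested and properly closed
puts balanced_parens?("(a(b)c")   # one closing parenthesis is missing
```

    Interpolating a string with '#{...}' cannot express this, because the text to insert would have to contain
    itself.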

    --
    Wolfgang Nádasi-Donner
     
    Wolfgang Nádasi-Donner, Jul 30, 2005
    #3
  4. On Sat, 30 Jul 2005 20:31:03 +0200, Wolfgang Nádasi-Donner <> wrote:

    > Now the idea.
    >
    > Ruby and Oniguruma should be extended somehow to allow Ruby objects
    > (usually regular expressions) to be
    > registered with the class Regexp, so that they can be referenced
    > later in regular expressions.


    I think I generally like the idea to compose regular expressions that way
    ...

    > In detail, a regular expression that consists only of a named group
    > definition (starts with '(?<name>') can be
    > registered by something like 'Regexp.register(/(?<example>a|b|c|d)/)'
    > and deleted by
    > 'Regexp.remove('<example>')'. If the regular expression is assigned to a
    > variable, the variable can be used instead; how to handle
    > this in the 'remove' case still has to be clarified. I used class methods
    > for this example, but it may be better to
    > introduce named Regexp objects, created by something like
    > '/(?<example>a|b|c|d)/.create'. There should be
    > some way to delete them explicitly, because the regex
    > engine Oniguruma must know which
    > objects it has to keep track of.


    ... but I think registering all named groups in one global place is not a
    good idea (even if you can unregister): what if two libraries use the same
    group names? I think there would be many name clashes.

    So here is another idea: Let the caller manage the named groups himself.
    Maybe in arrays or hashes. Something like:

    groups = [/(?<example>a|b|c|d)/, /(?<example2>e|f|g)/]

    or with hashes:

    groups = { "example" => /a|b|c|d/, "example2" => /e|f|g/ }

    or maybe in some specialized named groups library class.

    > These objects can later be referenced in regular expressions by
    > '\k<name>' or '\g<name>' as if they were
    > defined there.


    To use those groups I would suggest something like:

    /\k<example>/.with(groups)

    Regexp#with would return the "composed" Regexp that can be used like any
    other Regexp.
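
    For readers who want to try the idea, here is a rough emulation that needs no engine changes. It is only a
    sketch under assumptions: 'regexp_with' is a made-up stand-in for the proposed 'with' method, it uses
    '\g<name>' calls rather than '\k<name>', and it takes the pattern as a string because a regexp literal that
    calls an undefined group is rejected when it is parsed. It relies on the '(?<name>...){0}' idiom, which
    defines a group without matching anything.

```ruby
# Hypothetical stand-in for the proposed Regexp#with; not an existing API.
def regexp_with(pattern_source, groups, options = 0)
  definitions = groups.map do |name, regexp|
    # {0} repeats the group zero times: it is defined (and can be called
    # via \g<name>) but consumes no input at this position.
    # Interpolating the Regexp uses Regexp#to_s, which keeps its options.
    "(?<#{name}>#{regexp}){0}"
  end
  Regexp.new(definitions.join + pattern_source, options)
end

groups = { "example" => /a|b|c|d/, "example2" => /e|f|g/ }
composed = regexp_with('\A\g<example>\g<example2>+\z', groups)

puts composed.match?("aeg")  # one "example" character, then "example2" characters
puts composed.match?("ea")   # wrong order
```

    The composed pattern is an ordinary Regexp, so the group definitions stay with the caller and no global
    registry is involved.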

    What do you think?

    Dominik

    Disclaimer: I do not really know how named groups work in Oniguruma, just
    wanted to point out that one global registry might be a bad idea.
     
    Dominik Bathon, Jul 31, 2005
    #4
  5. Wolfgang Nádasi-Donner wrote:

    [blurb about named groups in regular expressions]

    I hope that the people responsible for the regular-expression code in
    Ruby 2.0 read http://www.perl.com/pub/a/2002/06/04/apo5.html before
    going along with a Perl-5-inspired syntax with hopelessly ugly
    extensions (I'm sorry, but \k<name> and \g<name> are just horrendous).
    Perl 6's way of defining grammars is quite neat and simple to
    understand. I also have some ideas for a better syntax, which is
    inspired by the aforementioned document, but I have yet to release
    anything (it was part of my master's thesis),
    nikolai

    --
    Nikolai Weibull: now available free of charge at http://bitwi.se/!
    Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
    main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
     
    Nikolai Weibull, Jul 31, 2005
    #5
  6. "Dominik Bathon" <> wrote in message news:op.sur8hdg2d62ajc@localhost...
    On Sat, 30 Jul 2005 20:31:03 +0200, Wolfgang Nádasi-Donner

    >>>>> snip >>>>>

    Disclaimer: I do not really know how named groups work in Oniguruma, ...
    >>>>> snap >>>>>


    I think Oniguruma is fairly stable and is used by other projects too, but I may be misinformed. I
    took the Oniguruma syntax 'as given'.

    --
    Wolfgang Nádasi-Donner
     
    Wolfgang Nádasi-Donner, Jul 31, 2005
    #6
  7. "Nikolai Weibull" <> wrote in message
    news:...

    >>>>> snip >>>>>

    I hope that the people responsible for the regular-expression code in
    Ruby 2.0 read http://www.perl.com/pub/a/2002/06/04/apo5.html before
    going along with a Perl-5-inspired syntax with hopelessly ugly
    extensions (I'm sorry, but \k<name> and \g<name> are just horrendous).
    Perl 6's way of defining grammars is quite neat and simple to
    understand. ...
    >>>>> snap >>>>>


    Is it realistic to submit change proposals against Oniguruma? As I understand it, it is a project in its
    own right and is used in different projects, not only Ruby.

    --
    Wolfgang Nádasi-Donner
     
    Wolfgang Nádasi-Donner, Jul 31, 2005
    #7
  8. On 7/31/05, Nikolai Weibull
    <> wrote:
    [snip]
    > Perl 6's way of defining grammars is quite neat and simple to
    > understand. I also have some ideas for a better syntax, which is
    > inspired by the aforementioned document, but I have yet to release
    > anything (it was part of my master's thesis),


    Indeed.. perl6's new regexp/grammar syntax is sweet :)

    --
    Simon Strandgaard
     
    Simon Strandgaard, Jul 31, 2005
    #8
  9. >>>>> snip >>>>>
    "Dominik Bathon" <> wrote in message news:op.sur8hdg2d62ajc@localhost...
    On Sat, 30 Jul 2005 20:31:03 +0200, Wolfgang Nádasi-Donner
    ..
    ..
    ..
    ... but I think registering all named groups in one global place is not a
    good idea (even if you can unregister): what if two libraries use the same
    group names? I think there would be many name clashes.

    So here is another idea: Let the caller manage the named groups himself.
    Maybe in arrays or hashes. Something like:

    groups = [/(?<example>a|b|c|d)/, /(?<example2>e|f|g)/]

    or with hashes:

    groups = { "example" => /a|b|c|d/, "example2" => /e|f|g/ }

    or maybe in some specialized named groups library class.

    > These objects can later be referenced in regular expressions by
    > '\k<name>' or '\g<name>' as if they were
    > defined there.


    To use those groups I would suggest something like:

    /\k<example>/.with(groups)

    Regexp#with would return the "composed" Regexp that can be used like any
    other Regexp.

    What do you think?
    >>>>> snap >>>>>


    First of all: I made a mistake. Please forget all the '\k<name>' stuff. That is the same as '\1', '\2', ...,
    i.e. a reference to the text matched by that group in the current matching process. We are talking here only
    about the '\g<name>' reference, which is a call to the group at match time. For simple replacement before
    matching, the Ruby '#{...}' construct is still usable.
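
    The difference can be seen in a small sketch. It assumes a Ruby with named groups and subexpression calls
    (1.9 and later) and 'match?' (2.4 and later); the two helper names are made up for the illustration.

```ruby
# Hypothetical helper names, used only to contrast the two references.

# \k<digit> is a backreference: it must match the same TEXT that the
# group <digit> matched earlier in this match attempt.
def same_digit_twice?(text)
  /\A(?<digit>\d)\k<digit>\z/.match?(text)
end

# \g<digit> is a subexpression call: it runs the PATTERN of the group
# <digit> again, so any digit is accepted in the second position.
def two_digits?(text)
  /\A(?<digit>\d)\g<digit>\z/.match?(text)
end

puts same_digit_twice?("11")  # second character repeats the first
puts same_digit_twice?("12")  # second character differs from the first
puts two_digits?("12")        # two digits, not necessarily equal
```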

    As I understand it, in the Ruby environment the class 'Regexp' must be changed, as well as 'Oniguruma'
    itself, because it must be able to find the predefined patterns during a match.

    My suggestion was based on the prerequisite of minimal changes to Oniguruma and Ruby's Regexp class,
    making such changes acceptable and possible ;-) This implies not changing existing things in Ruby and
    Oniguruma. That is why I prefer using '\g<name>' over some other notation, but those are only my
    thoughts on it.

    The idea of using hashes in Ruby and extending class Regexp with a 'with' method sounds very good.
    This method is a candidate for building the connection to Oniguruma, which then knows where to search for a
    '(?<paul>...)' expression if it is referenced via '\g<paul>' but not defined in the regular expression
    itself.

    The 'with' method could accept a list of hashes as its parameter (or even multiple hashes as parameters),
    because one may use more than one set of predefined patterns (for example a general pattern library
    and a special one for the application).
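
    A sketch of how several hashes could be combined, again without engine support. Everything here is an
    assumption made for illustration: the function name 'regexp_with_libraries', the choice to raise on
    conflicting group names, and the emulation through '(?<name>...){0}' definitions placed in front of the
    pattern.

```ruby
# Hypothetical helper, not an existing API: accepts any number of hashes
# that map group names to Regexp objects.
def regexp_with_libraries(pattern_source, *libraries)
  merged = {}
  libraries.each do |library|
    library.each do |name, regexp|
      # Assumed policy: the same name with a different pattern is an error,
      # so clashes between libraries are reported instead of hidden.
      if merged.key?(name) && merged[name] != regexp
        raise ArgumentError, "group <#{name}> is defined twice"
      end
      merged[name] = regexp
    end
  end
  # Define every group with {0} so that it matches nothing by itself
  # and is only used where the pattern calls it via \g<name>.
  definitions = merged.map { |name, regexp| "(?<#{name}>#{regexp}){0}" }.join
  Regexp.new(definitions + pattern_source)
end

general = { "digit" => /[0-9]/ }  # stands in for a general pattern library
special = { "sign" => /[+-]/ }    # stands in for an application-specific one

number = regexp_with_libraries('\A\g<sign>?\g<digit>+\z', general, special)
puts number.match?("-42")
puts number.match?("4-2")
```

    Raising on a clash is only one possible policy; letting later hashes override earlier ones would be
    another.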

    --
    Wolfgang Nádasi-Donner
     
    Wolfgang Nádasi-Donner, Jul 31, 2005
    #9