[Brainstorming Input] Ruby-Oniguruma interoperability on Named Groups

  • Thread starter Wolfgang Nádasi-Donner

Wolfgang Nádasi-Donner

Let me first explain the reason for and the kind of this message.

I have a vague idea for making regular expressions more readable, and for making it possible to build libraries
of regular expressions. The hook is the named groups ('(?<name>...)') that are part of Oniguruma. The idea was
influenced by the ancient Snobol4 language and its '*' operator for unevaluated expressions.

This input is brainstorming material, not a change proposal, because it is not mature enough. I hope that
something like this will appear in Ruby at some point in the future.

Now the idea.

Ruby and Oniguruma should be extended somehow to allow Ruby objects (usually regular expressions) to be
registered with the class Regexp, so that they can be referenced later in regular expressions.

In detail: a regular expression that consists only of a named group definition (it starts with '(?<name>') can
be registered by something like 'Regexp.register(/(?<example>a|b|c|d)/)' and deleted by
Regexp.remove('<example>'). If the regular expression is assigned to a variable, that variable can be used
instead; how to manage this in the 'remove' case still has to be clarified. I used class methods for this
example, but it may be better to introduce named Regexp objects, created by something like
'/(?<example>a|b|c|d)/.create'. Some possibility for explicit deletion should exist, because the regex engine
Oniguruma must know which objects it has to take care of.

These objects can later be referenced in regular expressions by '\k<name>' or '\g<name>' as if they were
defined there.

This could make regular expressions much more readable, because one can build them from smaller parts,
one can build special libraries of regular expression parts for use in applications, and one can
use regular expression parts that were built by others without completely understanding their details.

I think this is worth thinking about.

Best regards, Wolfgang
 

Brian Schröder

Hello Wolfgang,

how is that different from

example = "(?<example>a|b|c)"
regex = /#{example}|nothing/

except that you make Regexp hold the example variable and run a parse test on the regexp? You can get both
with something like this:

bschroed@black:~/svn/projekte/ruby-things$ cat regexp.rb

class Regexp
  def self.register(name, regexp)
    self.new(regexp.to_s)  # parse test: raises RegexpError for an invalid pattern
    (@registered_res ||= {})[name] = regexp.to_s  # store the pattern text under the given name
  end

  def self.[](name)
    @registered_res[name]
  end
end

Regexp.register(:example, 'a|b|c')

if /#{Regexp[:example]}|nothing/ =~ 'Well, that was just nothing'
  puts "Contains an example or nothing"
end

Regexp.register(:invalid, '(invalid(')
bschroed@black:~/svn/projekte/ruby-things$ ruby regexp.rb
Contains an example or nothing
regexp.rb:4:in `initialize': premature end of regular expression: /(invalid(/ (RegexpError)
        from regexp.rb:4:in `new'
        from regexp.rb:4:in `register'
        from regexp.rb:19


So it seems a very specialized wish to me.

Regards,

Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/
 

Wolfgang Nádasi-Donner

snip >>>>>
Can't we do that already?

example = /a|b|c|d/
mybigregex = /#{example}|foo/

If you need more scope, use constants.

It is not the same, because you include the textual data (it is somewhat like using the C preprocessor). There
are two disadvantages:

1) During debugging and similar work you do not see the structure you constructed; you have to work with the
final regular expression.

2) You cannot handle recursive constructs, which are possible using '\g<name>'. This is a standard part of
Oniguruma.
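
Both points can be seen in a small sketch. It assumes a Ruby whose Regexp engine supports '\g<name>' (1.9 or later; 'match?' needs 2.4 or later). The names 'flat' and 'balanced' are only illustrative.

```ruby
# Interpolation pastes text: the result is one flat pattern, and the
# original variable name is gone from it.
example = /a|b|c|d/
flat = /#{example}|foo/
puts flat.source   # the interpolated part appears inline as (?-mix:a|b|c|d)

# A subexpression call may refer to the group that contains it, which is
# what makes nested structures matchable. Interpolation cannot express this.
balanced = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/

puts balanced.match?("(a(b)c)")   # true: the parentheses nest correctly
puts balanced.match?("(a(b)c")    # false: one parenthesis is left open
```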
 

Dominik Bathon

Now the idea.

Ruby and Oniguruma should be extended somehow to allow Ruby-objects (usually regular expressions) to be
registered somehow to the class Regexp, so that they can be referenced later in regular expressions.

I think I generally like the idea to compose regular expressions that way ...

In detail a regular expression that consists only of a named group definition (starts with '(?<name>') can be
registered by something like 'Regexp.register(/(?<example>a|b|c|d)/)', and be deleted by
Regexp.remove('<example>'). If the regular expression is assigned to a variable this can be used, how to manage
this in the 'remove' case has to be clarified. I used class methods for this example, but it may be better to
introduce a named Regexp objects which will be created by something like '/(?<example>a|b|c|d)/.create'. Some
possibility for explicit deletion should be there, because the regex engine Oniguruma must know about the
object to take care about.

... but I think registering all named groups in one global place is not a good idea (even if you can
unregister): what if two libraries use the same group names? I think there would be many name clashes.
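
A minimal sketch of such a clash, using a plain hash as a stand-in for the proposed global registry. 'REGISTRY' and 'register' are hypothetical names, not an existing Ruby API.

```ruby
# Hypothetical global registry, standing in for the proposed Regexp.register.
REGISTRY = {}

def register(name, regexp)
  REGISTRY[name] = regexp   # a second registration silently replaces the first
end

register("number", /\d+/)         # library A: decimal digits
register("number", /[0-9a-f]+/)   # library B: hexadecimal digits

# Library A now finds library B's definition under its own name.
puts REGISTRY["number"].source
```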

So here is another idea: Let the caller manage the named groups himself. Maybe in arrays or hashes.
Something like:

groups = [/(?<example>a|b|c|d)/, /(?<example2>e|f|g)/]

or with hashes:

groups = { "example" => /a|b|c|d/, "example2" => /e|f|g/ }

or maybe in some specialized named groups library class.

These objects can later on be referenced in regular expressions by '\k<name>' or '\g<name>' as if they were
defined there.

To use those groups I would suggest something like:

/\k<example>/.with(groups)

Regexp#with would return the "composed" Regexp that can be used like any other Regexp.
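
A rough approximation of this is possible without engine changes. The sketch below is an assumption-laden illustration, not the proposed API: 'compose' is a hypothetical helper, it uses '\g<name>' calls rather than '\k<name>', and it takes the pattern as a string because a regexp literal that calls an undefined group does not compile. It relies on the '(?<name>...){0}' idiom, which defines a group without matching anything at that position.

```ruby
# Hypothetical helper, not part of Ruby: builds one Regexp from a pattern
# string plus a hash of named parts supplied by the caller.
def compose(pattern, groups)
  # "(?<name>...){0}" defines the group but matches it zero times here,
  # so it only takes effect when called through \g<name>.
  definitions = groups.map { |name, part| "(?<#{name}>#{part}){0}" }.join
  Regexp.new(definitions + pattern)
end

groups = { "example" => /a|b|c|d/, "example2" => /e|f|g/ }
composed = compose('\A\g<example>\g<example2>\z', groups)

puts composed.match?("ae")   # true: "a" is an example, "e" is an example2
puts composed.match?("ax")   # false: "x" is not an example2
```

Because the groups travel with the caller's hash, two libraries can use the same group name without interfering with each other.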

What do you think?

Dominik

Disclaimer: I do not really know how named groups work in Oniguruma, just wanted to point out that one
global registry might be a bad idea.
 

Nikolai Weibull

Wolfgang Nádasi-Donner wrote:

[blurb about named groups in regular expressions]

I hope that the people responsible for the regular-expression code in
Ruby 2.0 read http://www.perl.com/pub/a/2002/06/04/apo5.html before
going along with a Perl-5-inspired syntax with hopelessly ugly
extensions (I'm sorry, but \k<name> and \g<name> are just horrendous).
Perl 6’s way of defining grammars is quite neat and simple to
understand. I also have some ideas for a better syntax, which is
inspired by the aforementioned document, but I have yet to release
anything (it was part of my master’s thesis),
nikolai

--
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
 

Wolfgang Nádasi-Donner

I think Oniguruma is fairly stable and is used by other projects too, but I may be wrong about this. I
took the Oniguruma syntax 'as given'.
 

Wolfgang Nádasi-Donner

I hope that the people responsible for the regular-expression code in
Ruby 2.0 read http://www.perl.com/pub/a/2002/06/04/apo5.html before
going along with a Perl-5-inspired syntax with hopelessly ugly
extensions (I'm sorry, but \k<name> and \g<name> are just horrendous).
Perl 6's way of defining grammars is quite neat and simple to
understand. ...

Is it realistic to produce change proposals against Oniguruma? As I understand it, it is a project in its
own right and is used in different projects, not only Ruby.
 

Simon Strandgaard

On 7/31/05, Nikolai Weibull wrote:
Perl 6's way of defining grammars is quite neat and simple to
understand. I also have some ideas for a better syntax, which is
inspired by the aforementioned document, but I have yet to release
anything (it was part of my master's thesis),

Indeed.. perl6's new regexp/grammar syntax is sweet :)
 

Wolfgang Nádasi-Donner

snip >>>>>
On Sat, 30 Jul 2005 20:31:03 +0200, Wolfgang Nádasi-Donner
..
..
..
... but I think registering all named groups in one global place is not a
good idea (even if you can unregister): what if two libraries use the same
group names? I think there would be many name clashes.

So here is another idea: Let the caller manage the named groups himself.
Maybe in arrays or hashes. Something like:

groups = [/(?<example>a|b|c|d)/, /(?<example2>e|f|g)/]

or with hashes:

groups = { "example" => /a|b|c|d/, "example2" => /e|f|g/ }

or maybe in some specialized named groups library class.
These Object can later on be referenced in regular expressions by
'\k<name>' or '\g<name>' as if they were
defined there.

To use those groups I would suggest something like:

/\k<example>/.with(groups)

Regexp#with would return the "composed" Regexp that can be used like any
other Regexp.

What do you think?

First of all: I made a mistake. Please forget all the '\k<name>' stuff. It is the same as '\1', '\2', ...,
which means it is a reference to the text matched by this group during the current match. We are
talking here only about the '\g<name>' reference, which is a call to the group at match time. For
simple replacement before matching, the Ruby '#{...}' construct is still usable.
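
The difference can be checked directly in a Ruby whose Regexp engine supports both forms (1.9 or later; 'match?' needs 2.4 or later). The variable names in this sketch are only illustrative.

```ruby
# \k<digit> is a backreference: it must match the same text the group captured.
backref = /\A(?<digit>\d)\k<digit>\z/

# \g<digit> is a call: it runs the group's pattern again at that position.
call = /\A(?<digit>\d)\g<digit>\z/

puts backref.match?("11")   # true: the second character repeats the first
puts backref.match?("12")   # false: "2" is not the captured text "1"
puts call.match?("11")      # true
puts call.match?("12")      # true: "2" is another digit
```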

As I understand it, in the Ruby environment the class 'Regexp' must be changed, as well as 'Oniguruma'
itself, because the engine must be able to find the predefined patterns during a match.

My suggestion was based on the prerequisite of minimal changes to Oniguruma and Ruby's Regexp class, which
would make such changes acceptable and possible ;-) This implies not changing existing things in Ruby and
Oniguruma. For that reason I prefer '\g<name>' to some other notation, but those are only my thoughts on it.

The idea of using hashes in Ruby and an extension of class Regexp with a 'with' method sounds very good.
This method is a candidate for building the connection to Oniguruma, which then knows where to search for a
'(?<paul>...)' expression if it is referenced via '\g<paul>' but not defined in the regular expression itself.

The 'with' method could take a list of hashes as its parameter (or even multiple hashes as parameters),
because one may use more than one group of predefined patterns (for example a general pattern library
and a special one for the application).
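
As a sketch of how several pattern libraries might be combined, under the assumption that parts are given as strings or regexps and inlined as uncalled group definitions. 'compose_with' is a hypothetical helper, not an existing method, and the rule that a later hash overrides an earlier one is a design choice made here for illustration.

```ruby
# Hypothetical helper, not part of Ruby: accepts any number of group
# hashes; when two hashes define the same name, the later one wins.
def compose_with(pattern, *group_sets)
  groups = group_sets.reduce({}) { |merged, set| merged.merge(set) }
  # "(?<name>...){0}" defines each group without matching it in place.
  definitions = groups.map { |name, part| "(?<#{name}>#{part}){0}" }.join
  Regexp.new(definitions + pattern)
end

# A general library and an application-specific one. The application part
# is a string because it calls groups that are defined elsewhere.
general  = { "digit" => /\d/, "sign" => /[+-]/ }
specific = { "amount" => '\g<sign>?\g<digit>+' }

amount = compose_with('\A\g<amount>\z', general, specific)

puts amount.match?("-42")   # true: optional sign followed by digits
puts amount.match?("4-2")   # false: a sign is only allowed at the start
```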
 
