attaching code to run on regular expression match

Eyal Oren

Hi,

I am parsing query expressions, using a regular expression with
multiple matches in it, e.g. /(\w+):(\w+)/.

I would like some code to execute on the first match (e.g.
constructing some object out of it) and some other code on the second
match (e.g. constructing some other object).

I can of course check the array of matches and find the non-nil
element, and decide which code to execute. But that becomes very
cumbersome with a large regex (with say 10 different matches).

So I would rather like to attach some code in a match directly, as one
does in parsing generators, e.g.
/(\w+:do_method):(\w+:do_other_method)/.

Would something like that be possible in Ruby? I tried searching but
I'm not sure what such a feature would be called.
 
Brian Schröder

Maybe you can refactor your regexp to be used with scan.

irb(main):001:0> "some words to change".scan(/\w+/) do |w| puts w.upcase end
SOME
WORDS
TO
CHANGE
=> "some words to change"

hth,
Brian
 
Robert Klemme

No, I don't think it's possible. You can do this

string.scan(/(\w+):(\w+)/) do |match|
  # inject yields the 1-based position of the first non-nil group
  case match.inject(1) { |pos, x| break pos if x; pos + 1 }
  when 1
    # code for group 1
  when 2
    # ...
  end
end

Kind regards

robert
 
Eyal Oren

Brian said:
Maybe you can refactor your regexp to be used with scan.

irb(main):001:0> "some words to change".scan(/\w+/) do |w| puts w.upcase end
SOME
WORDS
TO
CHANGE
=> "some words to change"

I am not sure that would help. I need to know which of the matches
occurred, because the actions are different for different matches (you
just 'put' all matches).

In your example, "Some words To change", say I want to print the
capitalised words normally and print the others reversed. I can make
a regex that captures both kinds of word in two groups, but scan
wouldn't work because I wouldn't know if a match was from group one or
group two.

But AFAIK I cannot ask the resulting match which regex it was matched
by, so I still do not know what to do. I could of course test each
regex on the matched word again, but that is not efficient.
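
A minimal sketch of that capitalised/reversed case, assuming the two kinds of word are written as two alternated groups (the WORDS pattern and the render helper are made up for illustration). When the regex contains groups, scan yields one array per match, with nil in the slots of the groups that did not take part, so the non-nil slot shows which alternative fired:

```ruby
# Group 1: capitalised words; group 2: any other word.
# (Hypothetical pattern, chosen only to mirror the example above.)
WORDS = /([A-Z]\w*)|(\w+)/

def render(text)
  # Each match arrives as [group1, group2]; exactly one of them is non-nil.
  text.scan(WORDS).map do |capitalised, other|
    capitalised ? capitalised : other.reverse
  end
end

puts render("Some words To change").join(" ")
# prints: Some sdrow To egnahc
```

This still decides what to do after the match, but the decision is a nil check on the block arguments rather than a second regex test.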
 
Pit Capitain

Eyal said:
So I would rather like to attach some code in a match directly, as one
does in parsing generators, e.g.
/(\w+:do_method):(\w+:do_other_method)/.

Would something like that be possible in Ruby? I tried searching but
I'm not sure what such a feature would be called.

I'm sure I'm missing something, but wouldn't this work:

string.scan(/(\w+):(\w+)/) do |m1, m2|
  do_method(m1)
  do_other_method(m2)
end

Maybe you can show us one of your complex regex?

Regards,
Pit
 
Eyal Oren

Thanks, that might work, but I think the problem is in the unions of
the regexps that I use (see the example below).

Because of the unions, I don't really want to decide after the match
what to do with it, but rather state it in the constituent regexps
(e.g., I would like to say in the ImplicitWiki regexp what should
happen if it is encountered).


ExplicitWiki = /\[\[([^\]]+)\]\]/

# CamelCase followed by some non-word character, e.g. 'CamelCase.'
ImplicitWiki = /([A-Z]+[a-z]+[A-Z]+\w*)\W/

# <...>, no space inside brackets
Uri = /<([^<>]+)>/

# dc:title
Prefix = /(\w*):(\w+)/

# "hello"
Literal = /"([^"]*)"/

Wiki = Regexp.union ExplicitWiki, ImplicitWiki
Pred = Regexp.union Wiki, Uri, Prefix
Obj = Regexp.union Pred, Literal
Annotation = /(#{Pred})\s*(#{Obj})\s*\./

Variable = /(\?\w+)/
UriPattern = Regexp.union Variable, Pred
LiteralPattern = Regexp.union Variable, Obj
Query = /\[\?\s+#{UriPattern}\s+#{UriPattern}\s+#{LiteralPattern}\]/
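
One way to keep each action next to its constituent regexp is a rule table walked with StringScanner from the standard library. This is a different technique from attaching code inside the regex itself, and only a sketch: the RULES table, the lex method and the token shapes are hypothetical, and only four of the patterns above are reused.

```ruby
require "strscan"

# Each rule pairs a pattern with the code to run when it matches.
# The lambdas receive the scanner, so scanner[1] is capture group 1.
RULES = [
  [/\[\[([^\]]+)\]\]/, ->(m) { [:explicit_wiki, m[1]] }],
  [/<([^<>]+)>/,       ->(m) { [:uri, m[1]] }],
  [/"([^"]*)"/,        ->(m) { [:literal, m[1]] }],
  [/(\w*):(\w+)/,      ->(m) { [:prefix, m[1], m[2]] }],
]

def lex(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)
    # StringScanner#scan only matches at the current position, so the
    # first rule that succeeds is the one describing the next token.
    rule = RULES.find { |pattern, _action| scanner.scan(pattern) }
    raise ArgumentError, "unexpected input at #{scanner.pos}" unless rule
    tokens << rule[1].call(scanner)
  end
  tokens
end

p lex('[[Some Page]] dc:title "hello"')
```

Because the rules are tried in order, the order of the table plays the role that the order of arguments plays in Regexp.union.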
 
Pit Capitain

OK, thanks for your example. I think the regexp engine of Ruby 1.9
called Oniguruma supports something like named sub-expressions, which
might be what you need.

Regards,
Pit
 
David Holroyd

I wrote the following a long time ago when I was new to Ruby. Maybe you
could use a similar pattern,

----------------------------------------------------------------------
# Perform (possibly) multiple global substitutions on a string.
# The regexps given as keys must not use capturing subexpressions '(...)'.
class MultiSub
  # hash has regular expression fragments (as strings) as keys, mapped to
  # Procs that will generate replacement text, given the matched value.
  def initialize(hash)
    @mash = Array.new
    expr = nil
    hash.each do |key, val|
      if expr == nil
        expr = "("
      else
        expr << "|("
      end
      expr << key << ")"
      @mash << val
    end
    @re = Regexp.new(expr)
  end

  # Perform a global multi-sub on the given text, modifying the passed
  # string 'in place'.
  def gsub!(text)
    text.gsub!(@re) { |match|
      # idx ends up as the 0-based index of the first group that matched
      idx = -1
      $~.to_a.each { |subexp|
        break unless idx == -1 || subexp == nil
        idx += 1
      }
      idx == -1 ? match : @mash[idx].call(match)
    }
  end
end

# Example:

mailSub = proc { |match| "<a href=\"mailto:#{match}\">#{match}</a>" }
urlSub = proc { |match| "<a href=\"#{match}\">#{match}</a>" }

sub = MultiSub.new({
  '(?:mailto:)?[\w\.\-\+\=]+\@[\w\-]+(?:\.[\w\-]+)+\b' => mailSub,
  '\b(?:http|https|ftp):[^ \t\n<>"]+[\w/]' => urlSub
})

test = "...."
sub.gsub!(test)
puts test
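
For a concrete run, here is a condensed variant of the same idea with toy rules. It is a sketch only: TinyMultiSub, the two patterns and the sample text are made up for illustration, and it returns a new string instead of modifying its argument. As above, the fragments must not contain capturing groups of their own.

```ruby
class TinyMultiSub
  # rules maps regexp fragments (strings) to callables that build the
  # replacement text from the matched value.
  def initialize(rules)
    @procs = rules.values
    # Wrap every fragment in its own group: (frag1)|(frag2)|...
    @re = Regexp.new(rules.keys.map { |frag| "(#{frag})" }.join("|"))
  end

  def gsub(text)
    text.gsub(@re) do |match|
      # The index of the first non-nil capture identifies the fragment.
      idx = Regexp.last_match.captures.index { |c| !c.nil? }
      idx ? @procs[idx].call(match) : match
    end
  end
end

sub = TinyMultiSub.new({
  '\d+'    => ->(m) { "<#{m}>" },
  '[A-Z]+' => ->(m) { m.downcase }
})

puts sub.gsub("ABC 123 x")
# prints: abc <123> x
```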
 
Kevin Ballard

Pit said:
OK, thanks for your example. I think the regexp engine of Ruby 1.9
called Oniguruma supports something like named sub-expressions, which
might be what you need.

Oniguruma is indeed the regexp engine of Ruby, but are you sure named
subexpressions aren't already in Ruby? I thought they were, but I've
only actually used them in TextMate (an OS X text editor that uses
Oniguruma as its regex engine).

Hrm, I just tested and it does appear that named subexpressions aren't
in Ruby 1.8. That's interesting, because I thought Oniguruma supported
them quite a while ago.
 
Christophe Grandsire

Kevin Ballard said:
Hrm, I just tested and it does appear that named subexpressions aren't
in Ruby 1.8. That's interesting, because I thought Oniguruma supported
them quite a while ago.

I thought Oniguruma was not yet the regex engine of Ruby, but would become it
from Ruby 2 on (is it already the engine in Ruby 1.9?), i.e. it is not the
regex engine of Ruby 1.8.
--
Christophe Grandsire.

http://rainbow.conlang.free.fr

It takes a straight mind to create a twisted conlang.
 
James Edward Gray II

Kevin said:
Oniguruma is indeed the regexp engine of Ruby

Ruby 1.9, you mean.

Kevin said:
but are you sure named subexpressions aren't already in Ruby?

If you just download and build Ruby 1.8, you don't get Oniguruma yet.

Kevin said:
Hrm, I just tested and it does appear that named subexpressions aren't
in Ruby 1.8. That's interesting, because I thought Oniguruma supported
them quite a while ago.

You can build 1.8 to use it, but you must purposefully do so.

James Edward Gray II
 
Gavin Kistner

Kevin said:
Oniguruma is indeed the regexp engine of Ruby, but are you sure named
subexpressions aren't already in Ruby? [snip]
Hrm, I just tested and it does appear that named subexpressions aren't
in Ruby 1.8. That's interesting, because I thought Oniguruma supported
them quite a while ago.

Oniguruma is only the engine in versions 1.9+; versions 1.8- use a
different regexp engine.
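
On Ruby 1.9 or later, named groups give each alternative a label that can be looked up after the match, which gets close to the original request. A minimal sketch, assuming a simplified token grammar: the TOKEN pattern, the ACTIONS table and the tokenize method are hypothetical and not taken from the regexps posted earlier.

```ruby
# One named group per alternative.
TOKEN = /(?<variable>\?\w+)|(?<uri><[^<>]+>)|(?<literal>"[^"]*")/

# The code "attached" to each group, keyed by the group's name.
ACTIONS = {
  "variable" => ->(s) { [:variable, s[1..-1]] }, # drop the leading "?"
  "uri"      => ->(s) { [:uri, s[1..-2]] },      # drop the angle brackets
  "literal"  => ->(s) { [:literal, s[1..-2]] },  # drop the quotes
}

def tokenize(text)
  tokens = []
  text.scan(TOKEN) do
    md = Regexp.last_match
    # MatchData#names lists the group names; the one with a non-nil
    # value is the alternative that matched.
    name = md.names.find { |n| md[n] }
    tokens << ACTIONS.fetch(name).call(md[name])
  end
  tokens
end

p tokenize('?x <http://example.org/a> "hi"')
```

The regexp still cannot run code by itself; the names just make the dispatch a hash lookup instead of counting group positions.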
 
