How can I monitor a Regexp?


Daniel DeLorme

Because a regular expression can behave differently depending on its kcode
(e.g. the behavior of \w), I decided that all my code should specify the kcode
explicitly (e.g. /\w+/n instead of /\w+/). So I tried to set up a hook that
monitors the creation of each Regexp and raises an exception if the kcode is
missing, like this:

class Regexp
  alias old_initialize initialize
  def initialize(*args)
    old_initialize(*args)
    # Regexp#kcode (Ruby 1.8) returns nil when no kcode flag was given
    raise "NO KCODE!" if kcode.nil?
  end
end

It works fine if I use Regexp.new, but in most cases the regexp is written
as a literal, and then initialize is NOT EXECUTED:
Regexp.new("foobar")
RuntimeError: NO KCODE!
/foobar/
=> /foobar/
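Regexp#kcode was dropped in Ruby 1.9 (and $KCODE no longer has any effect), so the hook above only runs on 1.8. Below is a rough sketch of the same idea for Ruby 3.x, assuming that an explicit encoding flag (/u, /e, /s or /n) shows up in Regexp#options as the Regexp::FIXEDENCODING or Regexp::NOENCODING bit; the module name RequireEncodingFlag is hypothetical. It runs into the same wall: Regexp.new goes through initialize, a literal never does.

```ruby
# Sketch for Ruby 3.x; RequireEncodingFlag is a hypothetical name.
# Assumption: /u, /e, /s set Regexp::FIXEDENCODING and /n sets
# Regexp::NOENCODING in Regexp#options.
module RequireEncodingFlag
  EXPLICIT = Regexp::FIXEDENCODING | Regexp::NOENCODING

  def initialize(*args, **kwargs)
    super(*args, **kwargs)
    # Same role as the 1.8 `kcode.nil?` check: reject a Regexp that was
    # created without an explicit encoding flag.
    raise ArgumentError, "no encoding flag: #{inspect}" if (options & EXPLICIT).zero?
  end
end

Regexp.prepend(RequireEncodingFlag)

begin
  Regexp.new("foobar")   # goes through initialize, so the check runs
rescue ArgumentError => e
  puts e.message         # prints "no encoding flag: /foobar/"
end

p(/foobar/)              # a literal never calls initialize, so no error
```

Module#prepend is used instead of alias so that `super` reaches the original C-level initialize without renaming it.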

So I tried an alternative approach and put the hook into the =~ operator instead,
but ran into the same problem. The method override is completely ignored:
class String; def =~(o); raise "S"; end; end
class Regexp; def =~(o); raise "R"; end; end
"bar" =~ /bar/ #=> 0
/foo/ =~ "foo" #=> 0

So... does anyone have any idea how I can tackle this problem?

Robert Dober

> So... does anyone have any idea how I can tackle that problem?

Yes... well, no. I had one, but the prospects look bleak now. Look at this:

robert@swserver:/home/svn 11:49:44
555/56 > ruby -r profile -e 'puts /a/'
(?-mix:a)
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  0.00     0.00      0.00        2     0.00     0.00  IO#write
  0.00     0.00      0.00        1     0.00     0.00  Regexp#to_s
  0.00     0.00      0.00        1     0.00     0.00  Kernel.puts
  0.00     0.01      0.00        1     0.00    10.00  #toplevel
robert@swserver:/home/svn 11:49:50
556/57 > ruby -r profile -e 'puts Regexp.new("a")'
(?-mix:a)
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  0.00     0.00      0.00        2     0.00     0.00  IO#write
  0.00     0.00      0.00        1     0.00     0.00  Kernel.puts
  0.00     0.00      0.00        1     0.00     0.00  Regexp#initialize
  0.00     0.00      0.00        1     0.00     0.00  Class#new
  0.00     0.00      0.00        1     0.00     0.00  Regexp#to_s
  0.00     0.01      0.00        1     0.00    10.00  #toplevel

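As the two profiles show, the literal /a/ never goes through Class#new or
Regexp#initialize; only Regexp.new does. I just don't see any way to intercept
this at the Ruby level; you would need to hack Ruby itself.
Maybe someone cleverer than me has an idea?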

Cheers
Robert
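Since literals are built by the parser and never pass through Regexp#initialize, one alternative to a runtime hook is a static check over the source files. A minimal sketch, assuming Ruby 2.3+ (for the standard Ripper library and the `<<~` heredoc); the helper name regexps_without_kcode is hypothetical. It lists the lines of regexp literals whose closing token carries no n, e, s or u flag.

```ruby
require 'ripper'

# Hypothetical helper: return the line numbers of regexp literals in +src+
# that lack an explicit encoding flag (n, e, s or u). The reported line is
# the one where each literal ends.
def regexps_without_kcode(src)
  Ripper.lex(src).each_with_object([]) do |(pos, type, tok), lines|
    next unless type == :on_regexp_end
    # The end token is the closing delimiter followed by any flags,
    # e.g. "/", "/n" or "}mu".
    flags = tok[1..-1]
    lines << pos[0] unless flags =~ /[nesu]/
  end
end

source = <<~'RUBY'
  a = /\w+/n
  b = /\w+/
  c = %r{x}u
  d = %r{y}i
RUBY

p regexps_without_kcode(source)   # => [2, 4]
```

Run over each file in a project (for example from a test or a pre-commit script), this catches the literals that no runtime hook can see, at the cost of missing regexps built from strings, which the initialize hook already covers.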

Jan Friedrich

ruby -v
# ==> ruby 1.8.4 (2005-12-24) [i486-linux]

class String; def =~(o); raise "S"; end; end
class Regexp; def =~(o); raise "R"; end; end

r = /x/
r =~ 'a'
# ==> RuntimeError: R
#       from (irb):2:in `=~'
#       from (irb):4
'a' =~ r
# ==> RuntimeError: S
#       from (irb):1:in `=~'
#       from (irb):5

Daniel DeLorme

Very interesting. If you assign the regexp to a variable you get
the overridden methods. I guess there's some voodoo optimization
at work when you use =~ on a regexp literal?

Daniel

Daniel DeLorme

Daniel said:
> Because a regular expression can have different behaviors depending on
> its kcode (e.g. behavior of \w) I decided that all my code should
> specify the kcode explicitly (e.g. /\w+/n instead of /\w+/).

As an addendum, I was wondering why \w matches extended characters in utf8.
If extended characters are considered "word" characters, does that mean they
are valid in identifiers? So I tried:
$KCODE='u'
=> "u"
def 日本語
  "nihongo"
end
=> nil
日本語
=> "nihongo"

wow. O_O

Daniel
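For comparison, a small check of the same experiment on a modern Ruby, assuming Ruby 2.0 or later, where source files default to UTF-8 and $KCODE no longer has any effect. As far as I know, since Ruby 1.9 \w only matches [a-zA-Z0-9_], so it no longer mirrors what the parser accepts as an identifier, while non-ASCII method names still work; the POSIX class [[:word:]] is the Unicode-aware alternative.

```ruby
# Assumes Ruby 2.0+ with UTF-8 source (the default); no $KCODE needed.

# Non-ASCII method names are still accepted by the parser.
def 日本語
  "nihongo"
end

p 日本語                      # => "nihongo"
p("日本語" =~ /\w/)           # => nil (\w is ASCII-only since Ruby 1.9)
p("日本語" =~ /[[:word:]]/)   # => 0   (POSIX class, Unicode-aware)
```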