attaching code to run on regular expression match

Discussion in 'Ruby' started by Eyal Oren, Oct 19, 2005.

  1. Eyal Oren

    Eyal Oren Guest

    Hi,

    I am parsing query expressions, using a regular expression with
    multiple matches in it, e.g. /(\w+):(\w+)/.

    I would like some code to execute on the first match (e.g.
    constructing some object out of it) and some other code on the second
    match (e.g. constructing some other object).

    I can of course check the array of matches and find the non-nil
    element, and decide which code to execute. But that becomes very
    cumbersome with a large regex (with say 10 different matches).

    So I would rather like to attach some code in a match directly, as one
    does in parsing generators, e.g.
    /(\w+:do_method):(\w+:do_other_method)/.

    Would something like that be possible in Ruby? I tried searching but
    I'm not sure how such a feature would be called.
     
    Eyal Oren, Oct 19, 2005
    #1

  2. On 19/10/05, Eyal Oren wrote:
    > Hi,
    >
    > I am parsing query expressions, using a regular expression with
    > multiple matches in it, e.g. /(\w+):(\w+)/.
    >
    > I would like some code to execute on the first match (e.g.
    > constructing some object out of it) and some other code on the second
    > match (e.g. constructing some other object).
    >
    > I can of course check the array of matches and find the non-nil
    > element, and decide which code to execute. But that becomes very
    > cumbersome with a large regex (with say 10 different matches).
    >
    > So I would rather like to attach some code in a match directly, as one
    > does in parsing generators, e.g.
    > /(\w+:do_method):(\w+:do_other_method)/.
    >
    > Would something like that be possible in Ruby? I tried searching but
    > I'm not sure how such a feature would be called.
    >
    >


    Maybe you can refactor your regexp to be used with scan.

    irb(main):001:0> "some words to change".scan(/\w+/) do | w | puts w.upcase end
    SOME
    WORDS
    TO
    CHANGE
    => "some words to change"

    hth,
    Brian

    --
    http://ruby.brian-schroeder.de/

    Stringed instrument chords: http://chordlist.brian-schroeder.de/
     
    Brian Schröder, Oct 19, 2005
    #2

  3. Eyal Oren wrote:
    > Hi,
    >
    > I am parsing query expressions, using a regular expression with
    > multiple matches in it, e.g. /(\w+):(\w+)/.
    >
    > I would like some code to execute on the first match (e.g.
    > constructing some object out of it) and some other code on the second
    > match (e.g. constructing some other object).
    >
    > I can of course check the array of matches and find the non-nil
    > element, and decide which code to execute. But that becomes very
    > cumbersome with a large regex (with say 10 different matches).
    >
    > So I would rather like to attach some code in a match directly, as one
    > does in parsing generators, e.g.
    > /(\w+:do_method):(\w+:do_other_method)/.
    >
    > Would something like that be possible in Ruby? I tried searching but
    > I'm not sure how such a feature would be called.


    No, I don't think it's possible. You can do this:

    string.scan(/(\w+):(\w+)/) do |match|
      # 1-based position of the first non-nil group
      case match.inject(1) { |pos, x| break pos if x; pos + 1 }
      when 1
        # code for group 1
      when 2
        # ...
      end
    end

    Kind regards

    robert
     
    Robert Klemme, Oct 19, 2005
    #3
  4. Eyal Oren

    Eyal Oren Guest

    On 19/10/05, Brian Schröder wrote:
    > On 19/10/05, Eyal Oren wrote:
    > > Hi,
    > >
    > > I am parsing query expressions, using a regular expression with
    > > multiple matches in it, e.g. /(\w+):(\w+)/.
    > >
    > > I would like some code to execute on the first match (e.g.
    > > constructing some object out of it) and some other code on the second
    > > match (e.g. constructing some other object).
    > >
    > > I can of course check the array of matches and find the non-nil
    > > element, and decide which code to execute. But that becomes very
    > > cumbersome with a large regex (with say 10 different matches).
    > >
    > > So I would rather like to attach some code in a match directly, as one
    > > does in parsing generators, e.g.
    > > /(\w+:do_method):(\w+:do_other_method)/.
    > >
    > > Would something like that be possible in Ruby? I tried searching but
    > > I'm not sure how such a feature would be called.

    >
    > Maybe you can refactor your regexp to be used with scan.
    >
    > irb(main):001:0> "some words to change".scan(/\w+/) do | w | puts w.upcase end
    > SOME
    > WORDS
    > TO
    > CHANGE
    > => "some words to change"

    I am not sure that would help. I need to know which of the groups
    matched, because the actions differ per group (your example just
    calls puts on every match).

    Take your example with the string "Some words To change": say I want
    to print the capitalised words unchanged and the others reversed. I
    can write a regex that captures the two kinds of word in two groups,
    but scan wouldn't work because I wouldn't know whether a match came
    from group one or group two.

    But AFAIK I cannot ask the resulting match which group it was matched
    by, so I still do not know what to do. I could of course test each
    regex against the matched word again, but that is not efficient.
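
For reference, when a pattern contains groups, scan yields one array of captures per match, and any group that did not take part in that match is nil, so the block arguments themselves show which group fired. A minimal sketch of the capitalised/reversed case, assuming a hypothetical two-group pattern WORDS and a hypothetical helper transform (neither name comes from the thread):

```ruby
# Hypothetical pattern: group 1 = capitalised word, group 2 = any other word.
WORDS = /\b([A-Z]\w*)\b|\b(\w+)\b/

# Returns the words of text, keeping capitalised words as they are
# and reversing all others.
def transform(text)
  out = []
  text.scan(WORDS) do |capitalised, other|
    if capitalised
      out << capitalised      # group 1 matched, group 2 is nil
    else
      out << other.reverse    # group 2 matched, group 1 is nil
    end
  end
  out
end

result = transform("Some words To change")
puts result.join(" ")
```

This only scales to a handful of groups; with ten alternatives the block signature becomes unwieldy, which is the cumbersome case described in the original question.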
     
    Eyal Oren, Oct 19, 2005
    #4
  5. Pit Capitain

    Pit Capitain Guest

    Eyal Oren wrote:
    > So I would rather like to attach some code in a match directly, as one
    > does in parsing generators, e.g.
    > /(\w+:do_method):(\w+:do_other_method)/.
    >
    > Would something like that be possible in Ruby? I tried searching but
    > I'm not sure how such a feature would be called.


    I'm sure I'm missing something, but wouldn't this work:

    string.scan(/(\w+):(\w+)/) do |m1, m2|
      do_method(m1)
      do_other_method(m2)
    end

    Maybe you can show us one of your complex regexes?

    Regards,
    Pit
     
    Pit Capitain, Oct 19, 2005
    #5
  6. Eyal Oren

    Eyal Oren Guest

    Thanks, that might work, but I think the problem lies in the unions
    of regexps that I use (see the example below).

    Because of the unions, I don't really want to decide after the match
    what to do with it; I would rather state it in the constituent
    regexps (e.g., I would like to say in the ImplicitWiki regexp what
    should happen when it is encountered).


    ExplicitWiki = /\[\[([^\]]+)\]\]/

    # CamelCase followed by some non-word character, e.g. 'CamelCase.'
    ImplicitWiki = /([A-Z]+[a-z]+[A-Z]+\w*)\W/

    # <...>, no space inside brackets
    Uri = /<([^<>]+)>/

    # dc:title
    Prefix = /(\w*):(\w+)/

    # "hello"
    Literal = /"([^"]*)"/

    Wiki = Regexp.union ExplicitWiki, ImplicitWiki
    Pred = Regexp.union Wiki, Uri, Prefix
    Obj = Regexp.union Pred, Literal
    Annotation = /(#{Pred})\s*(#{Obj})\s*\./

    Variable = /(\?\w+)/
    UriPattern = Regexp.union Variable, Pred
    LiteralPattern = Regexp.union Variable, Obj
    Query = /\[\?\s+#{UriPattern}\s+#{UriPattern}\s+#{LiteralPattern}\]/
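
To make the difficulty with unions concrete, here is a small sketch using two of the constituent regexps above. Regexp.union keeps every group of every member, numbered left to right, so the caller has to look for the non-nil capture to learn which member matched. The variable names and the input string are assumptions chosen for illustration:

```ruby
# Two of the constituent regexps from the post above, one group each.
explicit_wiki = /\[\[([^\]]+)\]\]/
uri           = /<([^<>]+)>/

# The union keeps both groups, numbered left to right.
wiki_or_uri = Regexp.union(explicit_wiki, uri)

match = wiki_or_uri.match("<http://example.org/page>")
captures = match.captures

# 0-based position of the group that actually took part in the match.
matched_group = captures.index { |c| !c.nil? }

p captures
p matched_group
```

With nested unions such as Pred and Obj the capture list grows with every member, so the position of the non-nil capture depends on how the unions were assembled.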
     
    Eyal Oren, Oct 19, 2005
    #6
  7. Pit Capitain

    Pit Capitain Guest

    Eyal Oren wrote:
    > thanks. that might work, but the problem is I think in the unions of
    > the regexps that I use, see example:
    >
    > because of the unions, I don't really want to decide after the match
    > what to do with it, but rather state it in the constituent regexp's
    > (e.g., I would like to say in the ImplicitWiki regexp what should
    > happen if it is encountered)
    >
    >
    > ExplicitWiki = /\[\[([^\]]+)\]\]/
    >
    > # CamelCase followed by some non-word character, e.g. 'CamelCase.'
    > ImplicitWiki = /([A-Z]+[a-z]+[A-Z]+\w*)\W/
    >
    > # <...>, no space inside brackets
    > Uri = /<([^<>]+)>/
    >
    > # dc:title
    > Prefix = /(\w*):(\w+)/
    >
    > # "hello"
    > Literal = /"([^"]*)"/
    >
    > Wiki = Regexp.union ExplicitWiki, ImplicitWiki
    > Pred = Regexp.union Wiki, Uri, Prefix
    > Obj = Regexp.union Pred, Literal
    > Annotation = /(#{Pred})\s*(#{Obj})\s*\./
    >
    > Variable = /(\?\w+)/
    > UriPattern = Regexp.union Variable, Pred
    > LiteralPattern = Regexp.union Variable, Obj
    > Query = /\[\?\s+#{UriPattern}\s+#{UriPattern}\s+#{LiteralPattern}\]/


    OK, thanks for your example. I think the regexp engine of Ruby 1.9
    called Oniguruma supports something like named sub-expressions, which
    might be what you need.
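
As a rough illustration of how named groups could be used for this, assuming Ruby 1.9 or later (where the (?<name>...) syntax and MatchData#names are available): each alternative gets a name, and a hash maps that name to the code to run. The TOKEN pattern, the HANDLERS table and the tokenize helper are hypothetical and simplified from the regexps quoted above:

```ruby
# Hypothetical pattern: one named group per kind of token.
TOKEN = /(?<uri><[^<>]+>)|(?<literal>"[^"]*")|(?<variable>\?\w+)/

# Hypothetical handlers, keyed by group name; each strips the delimiters.
HANDLERS = {
  "uri"      => ->(text) { [:uri, text[1..-2]] },
  "literal"  => ->(text) { [:literal, text[1..-2]] },
  "variable" => ->(text) { [:variable, text[1..-1]] },
}

# Scans input and runs the handler attached to whichever group matched.
def tokenize(input)
  tokens = []
  input.scan(TOKEN) do
    match = Regexp.last_match
    # The only non-nil named group identifies the alternative that matched.
    name = match.names.find { |n| match[n] }
    tokens << HANDLERS.fetch(name).call(match[name])
  end
  tokens
end

tokens = tokenize('?x <http://example.org/p> "hello"')
p tokens
```

The action still lives outside the regexp, but it is attached to a name rather than to a group position, so reordering or nesting the alternatives does not change the dispatch.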

    Regards,
    Pit
     
    Pit Capitain, Oct 19, 2005
    #7
  8. On Wed, Oct 19, 2005 at 08:16:58PM +0900, Eyal Oren wrote:
    > thanks. that might work, but the problem is I think in the unions of
    > the regexps that I use, see example:
    >
    > because of the unions, I don't really want to decide after the match
    > what to do with it, but rather state it in the constituent regexp's
    > (e.g., I would like to say in the ImplicitWiki regexp what should
    > happen if it is encountered)
    >
    >
    > ExplicitWiki = /\[\[([^\]]+)\]\]/
    >
    > # CamelCase followed by some non-word character, e.g. 'CamelCase.'
    > ImplicitWiki = /([A-Z]+[a-z]+[A-Z]+\w*)\W/
    >
    > # <...>, no space inside brackets
    > Uri = /<([^<>]+)>/
    >
    > # dc:title
    > Prefix = /(\w*):(\w+)/
    >
    > # "hello"
    > Literal = /"([^"]*)"/
    >
    > Wiki = Regexp.union ExplicitWiki, ImplicitWiki
    > Pred = Regexp.union Wiki, Uri, Prefix
    > Obj = Regexp.union Pred, Literal
    > Annotation = /(#{Pred})\s*(#{Obj})\s*\./
    >
    > Variable = /(\?\w+)/
    > UriPattern = Regexp.union Variable, Pred
    > LiteralPattern = Regexp.union Variable, Obj
    > Query = /\[\?\s+#{UriPattern}\s+#{UriPattern}\s+#{LiteralPattern}\]/


    I wrote the following a long time ago when I was new to Ruby. Maybe you
    could use a similar pattern,

    ----------------------------------------------------------------------
    # Perform (possibly) multiple global substitutions on a string.
    # The regexp fragments given as keys must not use capturing
    # subexpressions '(...)'.
    class MultiSub
      # hash maps regular expression fragments (as strings) to Procs
      # that generate replacement text, given the matched value.
      def initialize(hash)
        @mash = hash.values
        # Wrap each fragment in its own capturing group: "(a)|(b)|..."
        @re = Regexp.new(hash.keys.map { |key| "(#{key})" }.join("|"))
      end

      # Perform a global multi-sub on the given text, modifying the
      # passed string in place.
      def gsub!(text)
        text.gsub!(@re) do |match|
          # The first non-nil capture tells us which fragment matched.
          idx = $~.captures.index { |subexp| !subexp.nil? }
          idx ? @mash[idx].call(match) : match
        end
      end
    end

    # example

    mailSub = proc { |match| "<a href=\"mailto:#{match}\">#{match}</a>" }
    urlSub = proc { |match| "<a href=\"#{match}\">#{match}</a>" }

    sub = MultiSub.new({
      '(?:mailto:)?[\w\.\-\+\=]+\@[\w\-]+(?:\.[\w\-]+)+\b' => mailSub,
      '\b(?:http|https|ftp):[^ \t\n<>"]+[\w/]' => urlSub
    })

    test = "...."
    sub.gsub!(test)
    puts test
    ----------------------------------------------------------------------

    ta,
    dave

    --
    http://david.holroyd.me.uk/
     
    David Holroyd, Oct 19, 2005
    #8
  9. Pit Capitain wrote:
    > OK, thanks for your example. I think the regexp engine of Ruby 1.9
    > called Oniguruma supports something like named sub-expressions, which
    > might be what you need.


    Oniguruma is indeed the regexp engine of Ruby, but are you sure named
    subexpressions aren't already in Ruby? I thought they were, but I've
    only actually used them in TextMate (an OS X text editor that uses
    Oniguruma as its regex engine).

    Hrm, I just tested and it does appear that named subexpressions aren't
    in Ruby 1.8. That's interesting, because I thought Oniguruma supported
    them quite a while ago.
     
    Kevin Ballard, Oct 19, 2005
    #9
  10. Kevin Ballard wrote:

    >
    > Hrm, I just tested and it does appear that named subexpressions aren't
    > in Ruby 1.8. That's interesting, because I thought Oniguruma supported
    > them quite a while ago.
    >


    I thought Oniguruma was not yet the regex engine of Ruby, but would
    become it from Ruby 2 on (is it already the engine in Ruby 1.9?),
    i.e. it is not the regex engine of Ruby 1.8.
    --
    Christophe Grandsire.

    http://rainbow.conlang.free.fr

    It takes a straight mind to create a twisted conlang.
     
    Christophe Grandsire, Oct 19, 2005
    #10
  11. On Oct 19, 2005, at 8:51 AM, Kevin Ballard wrote:

    >
    > Pit Capitain wrote:
    >
    >> OK, thanks for your example. I think the regexp engine of Ruby 1.9
    >> called Oniguruma supports something like named sub-expressions, which
    >> might be what you need.
    >>

    >
    > Oniguruma is indeed the regexp engine of Ruby


    Ruby 1.9 you mean.

    > but are you sure named subexpressions aren't already in Ruby?


    If you just download and build Ruby 1.8, you don't get Oniguruma yet.

    > Hrm, I just tested and it does appear that named subexpressions aren't
    > in Ruby 1.8. That's interesting, because I thought Oniguruma supported
    > them quite a while ago.


    You can build 1.8 to use it, but you must purposefully do so.

    James Edward Gray II
     
    James Edward Gray II, Oct 19, 2005
    #11
  12. On Oct 19, 2005, at 7:51 AM, Kevin Ballard wrote:
    > Oniguruma is indeed the regexp engine of Ruby, but are you sure named
    > subexpressions aren't already in Ruby?

    [snip]
    > Hrm, I just tested and it does appear that named subexpressions aren't
    > in Ruby 1.8. That's interesting, because I thought Oniguruma supported
    > them quite a while ago.


    Oniguruma is the engine only in versions 1.9 and later; versions 1.8
    and earlier use a different regexp engine.
     
    Gavin Kistner, Oct 19, 2005
    #12