Ruby regex engine behavior question

Discussion in 'Ruby' started by Daniel Berger, Sep 13, 2004.

  1. I read this in a journal entry:

    "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
    (it's 'start of current match' rather than 'end of last match'), which
    makes it relatively useless to write complex parsers with."

    Can anyone comment on this? I'm not quite certain what he means. And
    is it still the same in 1.8?

    Regards,

    Dan
     
    Daniel Berger, Sep 13, 2004
    #1

  2. ts (Guest)

    >>>>> "D" == Daniel Berger <> writes:

    D> "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
    ^^^^^^^

    are you sure of this?

    D> (it's 'start of current match' rather than 'end of last match'), which
    D> makes it relatively useless to write complex parsers with."


    Guy Decoux
     
    ts, Sep 13, 2004
    #2

  3. ts <> wrote in message news:<>...
    > >>>>> "D" == Daniel Berger <> writes:
    >
    > D> "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
    > ^^^^^^^
    >
    > are you sure of this?
    >
    > D> (it's 'start of current match' rather than 'end of last match'), which
    > D> makes it relatively useless to write complex parsers with."
    >
    >
    > Guy Decoux


    No. That's why I'm asking. I'm merely quoting the entry I saw. Thoughts?

    Dan
     
    Daniel Berger, Sep 14, 2004
    #3
  4. Nobu Nakada (Guest)

    Hi,

    At Tue, 14 Sep 2004 01:04:58 +0900,
    Daniel Berger wrote in [ruby-talk:112395]:
    > "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
    > (it's 'start of current match' rather than 'end of last match'), which
    > makes it relatively useless to write complex parsers with."


    I don't understand what he means either. The 'start' and the 'end'
    should be the same, since a global match starts matching from the end
    of the last match.
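Nobu's point can be checked for the ordinary case, where every match consumes at least one character. A minimal check, assuming a Ruby in which \G inside String#scan and String#gsub anchors at the position each search step starts from (the behaviour described in this thread). With non-empty matches that position is exactly the end of the previous match, so \G stops the engine from bumping along:

```ruby
# Every match below is non-empty, so "start of current match" and
# "end of last match" are the same position.
p "aab".scan(/\Ga/)             # => ["a", "a"]
p "a a".scan(/\Ga/)             # => ["a"]   (\G forbids skipping the space)
p "a a".scan(/a/)               # => ["a", "a"] (no \G: the engine bumps along)
p "    a b c".gsub(/\G /, "_")  # => "____a b c"
p "    a b c".gsub(/ /, "_")    # => "____a_b_c"
```

The two sides only come apart once a zero-length match is involved, which is the case the journal entry is about.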

    --
    Nobu Nakada
     
    Nobu Nakada, Sep 14, 2004
    #4
  5. ts <> wrote in message news:<>...
    > >>>>> "D" == Daniel Berger <> writes:
    >
    > D> "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
    > ^^^^^^^
    >
    > are you sure of this?
    >
    > D> (it's 'start of current match' rather than 'end of last match'), which
    > D> makes it relatively useless to write complex parsers with."
    >
    >
    > Guy Decoux


    The OP has further clarified. To quote:

    When trying to match abcde with /\Gx?/g, the first match is
    successful, because no x is found but the question mark allows zero
    characters to be consumed. This match ends after zero characters into
    the string, at start-of-string. In order to avoid infinite loops on
    zero-length matches, the engine then retries the match one position
    down the string.

    In Perl, \G means end-of-last-match, and since end-of-last-match was
    at start-of-string, \G can't possibly match at one character into the
    string:

    $ perl -le'$_="abcde"; s/\Gx?/!/g; print'
    !abcde

    In Ruby (both 1.6 and 1.8, I found), \G merely means
    start-of-current-match, which, of course, is satisfiable at that
    point:

    $ ruby1.6 -e'puts "abcde".gsub(/\Gx?/,"!")'
    !a!b!c!d!e!
    $ ruby1.8 -e'puts "abcde".gsub(/\Gx?/,"!")'
    !a!b!c!d!e!
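Both outputs can be reproduced with a small model of each loop. This is a sketch, not either interpreter's real implementation: the helper names `ruby_style_replace` and `perl_style_replace` are hypothetical, and Ruby's standard `strscan` library stands in for the regex engine because `StringScanner#scan` only matches at the scan pointer, which gives every pattern an implicit \G. The Perl model is simplified: real Perl first retries for a non-empty match at the same position, which this sketch skips.

```ruby
require 'strscan'

# Hypothetical model of the Ruby 1.6/1.8 loop: after a zero-length match,
# move the cursor one character forward and try the pattern again there.
def ruby_style_replace(str, pattern, replacement)
  ss  = StringScanner.new(str)
  out = +""
  loop do
    m = ss.scan(pattern)          # anchored at ss.pos, like a leading \G
    if m.nil?
      out << ss.rest              # no match at the cursor: copy the tail, stop
      break
    end
    out << replacement
    next unless m.empty?
    break if ss.eos?
    out << ss.getch               # zero-length match: bump the cursor one char
  end
  out
end

# Hypothetical model of Perl's rule: a zero-length match may not directly
# follow another zero-length match at the same position. The pattern is
# anchored at the cursor, so the engine cannot bump along and simply stops.
def perl_style_replace(str, pattern, replacement)
  ss       = StringScanner.new(str)
  out      = +""
  empty_at = nil                  # position of the previous zero-length match
  loop do
    start = ss.pos
    m = ss.scan(pattern)
    break if m.nil? || (m.empty? && empty_at == start)
    out << replacement
    empty_at = m.empty? ? start : nil
  end
  out << ss.rest                  # whatever was not matched is copied as-is
end

puts ruby_style_replace("abcde", /x?/, "!")  # => !a!b!c!d!e!
puts perl_style_replace("abcde", /x?/, "!")  # => !abcde
```

The two helpers differ only in what happens after an empty match; the pattern, the anchoring and the copying are identical.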

    Perl's \G is a powerful tool to write parsers because the regex engine
    is prohibited from skipping characters to find a match — you can work
    your way through a string with a multitude of patterns using /c (to
    avoid resetting the end-of-last-match on match failure) applied
    against the same string in turn, without them sabotaging each other.
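In Ruby, this style of parsing is usually done with the standard `strscan` library rather than \G. A minimal sketch, assuming a made-up token set (`TOKEN_RULES` and `tokenize` are hypothetical names): each rule is tried in turn at the scan pointer, no rule can skip ahead to find a match, and a failed attempt leaves the pointer where it was, which is roughly what Perl's /gc gives you.

```ruby
require 'strscan'

# Hypothetical token set for illustration; rules are tried in this order.
TOKEN_RULES = [
  [:number, /\d+/],
  [:ident,  /[A-Za-z_]\w*/],
  [:op,     %r{[-+*/=]}],
  [:space,  /\s+/]
].freeze

def tokenize(src)
  ss = StringScanner.new(src)
  tokens = []
  until ss.eos?
    matched = false
    TOKEN_RULES.each do |type, re|
      # scan succeeds only at ss.pos; on failure ss.pos is left unchanged
      text = ss.scan(re)
      next unless text
      tokens << [type, text] unless type == :space
      matched = true
      break
    end
    raise ArgumentError, "unexpected #{ss.peek(1).inspect} at offset #{ss.pos}" unless matched
  end
  tokens
end

p tokenize("x = 42 + y1")
# => [[:ident, "x"], [:op, "="], [:number, "42"], [:op, "+"], [:ident, "y1"]]
```

None of the rules can match an empty string, so the loop always makes progress and the zero-length question never arises.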

    End quote.

    Thoughts?

    Dan
     
    Daniel Berger, Sep 14, 2004
    #5
  6. ts (Guest)

    >>>>> "D" == Daniel Berger <> writes:

    D> In Perl, \G means end-of-last-match, and since end-of-last-match was
    D> at start-of-string, \G can't possibly match at one character into the
    D> string:

    This is one way to say it, another is

    * on a zero-length match, perl prohibits a second zero-length match
      at the same position

    * on a zero-length match, ruby moves its internal cursor one
      character forward
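Guy's framing puts the difference in the zero-length-match rule rather than in \G itself. A quick check of that reading: without \G, Perl also bumps along after a refused empty match, so both languages end up trying the same positions, and Perl's `s/x?/!/g` on `abc` likewise gives `!a!b!c!`. Only the Ruby side is shown here:

```ruby
# No \G: after each empty match Ruby moves its cursor one character on,
# and Perl's bump-along lands on the same positions, so results agree.
p "abc".gsub(/x?/, "!")  # => "!a!b!c!"
p "abc".scan(/x?/)       # => ["", "", "", ""]
```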


    Guy Decoux
     
    ts, Sep 14, 2004
    #6

Similar Threads
  1. Is ASP Validator Regex Engine Same As VS2003 Find Regex Engine?
     JebBushell, Oct 22, 2005, in forum: ASP .Net
     Replies: 2, Views: 715; JebBushell, Oct 22, 2005
  2. Replies: 1, Views: 379; Sybren Stuvel, Apr 10, 2006
  3. Sasha
     Replies: 3, Views: 594; Sasha, May 22, 2007
  4. Replies: 3, Views: 775; Reedick, Andrew, Jul 1, 2008
  5. Which Regex-Engine will be used in Ruby 1.8.3 Release?
     Wolfgang Nádasi-Donner, Jul 29, 2005, in forum: Ruby
     Replies: 3, Views: 130; Hal Fulton, Jul 30, 2005