ts said:
D> "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
^^^^^^^
are you sure of this?
D> (it's 'start of current match' rather than 'end of last match'), which
D> makes it relatively useless to write complex parsers with."
Guy Decoux
The OP has further clarified. To quote:
When trying to match abcde with /\Gx?/g, the first match is
successful, because no x is found but the question mark allows zero
characters to be consumed. This match ends zero characters into
the string, i.e. at start-of-string. To avoid an infinite loop on
zero-length matches, the engine then retries the match one position
further along the string.
In Perl, \G means end-of-last-match, and since end-of-last-match was
at start-of-string, \G can't possibly match at one character into the
string:
$ perl -le'$_="abcde"; s/\Gx?/!/g; print'
!abcde
In Ruby (both 1.6 and 1.8, I found), \G merely means
start-of-current-match, which, of course, is satisfiable at that
point:
$ ruby1.6 -e'puts "abcde".gsub(/\Gx?/,"!")'
!a!b!c!d!e!
$ ruby1.8 -e'puts "abcde".gsub(/\Gx?/,"!")'
!a!b!c!d!e!
Perl's \G is a powerful tool for writing parsers because the regex
engine is prohibited from skipping characters to find a match: you can
work your way through a string with a multitude of patterns, each
using /gc (the /c avoids resetting the end-of-last-match on match
failure), applied against the same string in turn, without them
sabotaging each other.
End quote.
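For anyone who hasn't seen the style of parser being described, here
is a rough Ruby sketch of the same idea. Note that it does not use \G
at all: it swaps in StringScanner from the standard strscan library,
whose scan method only matches at the current position and leaves that
position alone on failure, which is roughly what \G plus /gc gives you
in Perl. The token table and the tokenize method are made up for
illustration and are not from the OP.

```ruby
require 'strscan'

# Hypothetical token table, for illustration only.
TOKENS = [
  [:number, /\d+/],
  [:ident,  /[a-z]+/],
  [:space,  /\s+/]
]

# Walk the string with several patterns in turn. Each scan is anchored
# at the scanner's current position, so no pattern can skip characters
# to find a match further along.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    found = false
    TOKENS.each do |type, pattern|
      text = scanner.scan(pattern)   # nil on failure; position unchanged
      next unless text
      tokens << [type, text] unless type == :space
      found = true
      break
    end
    unless found
      raise ArgumentError, "no token matches at offset #{scanner.pos}"
    end
  end
  tokens
end

p tokenize("abc 123")   # => [[:ident, "abc"], [:number, "123"]]
```

The point to notice is that tokenize("abc!def") raises at offset 3
rather than quietly skipping the "!" and matching "def", which is the
no-bump-along guarantee the OP wants from \G.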
Thoughts?
Dan