Ruby regex engine behavior question

Daniel Berger

I read this in a journal entry:

"[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
(it's 'start of current match' rather than 'end of last match'), which
makes relatively useless to write complex parsers with."

Can anyone comment on this? I'm not quite certain what he means. And
is it still the same in 1.8?

Regards,

Dan
 
ts

D> "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
^^^^^^^

are you sure of this ?

D> (it's 'start of current match' rather than 'end of last match'), which
D> makes relatively useless to write complex parsers with."


Guy Decoux
 
Daniel Berger

ts said:
D> "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
^^^^^^^

are you sure of this ?

D> (it's 'start of current match' rather than 'end of last match'), which
D> makes relatively useless to write complex parsers with."


Guy Decoux

No. That's why I'm asking. I'm merely quoting the entry I saw. Thoughts?

Dan
 
nobu.nokada

Hi,

At Tue, 14 Sep 2004 01:04:58 +0900,
Daniel Berger wrote in [ruby-talk:112395]:
"[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
(it's 'start of current match' rather than 'end of last match'), which
makes relatively useless to write complex parsers with."

I don't understand what he means either. The 'start' and the 'end'
should be the same, since a global match starts matching from the end
of the last match.
 
Daniel Berger

ts said:
D> "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
^^^^^^^

are you sure of this ?

D> (it's 'start of current match' rather than 'end of last match'), which
D> makes relatively useless to write complex parsers with."


Guy Decoux

The OP has further clarified. To quote:

When trying to match abcde with /\Gx?/g, the first match is
successful, because no x is found but the question mark allows zero
characters to be consumed. This match ends after zero characters into
the string, at start-of-string. In order to avoid infinite loops on
zero-length matches, the engine then retries the match one position
down the string.

In Perl, \G means end-of-last-match, and since end-of-last-match was
at start-of-string, \G can't possibly match at one character into the
string:

$ perl -le'$_="abcde"; s/\Gx?/!/g; print'
!abcde

In Ruby (both 1.6 and 1.8, I found), \G merely means
start-of-current-match, which, of course, is satisfiable at that
point:

$ ruby1.6 -e'puts "abcde".gsub(/\Gx?/,"!")'
!a!b!c!d!e!
$ ruby1.8 -e'puts "abcde".gsub(/\Gx?/,"!")'
!a!b!c!d!e!

Perl's \G is a powerful tool to write parsers because the regex engine
is prohibited from skipping characters to find a match — you can work
your way through a string with a multitude of patterns using /c (to
avoid resetting the end-of-last-match on match failure) applied
against the same string in turn, without them sabotaging each other.

End quote.
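
For comparison, the parser style described in the quote (several patterns applied in turn at the same position, with no skipping ahead) can be approximated in Ruby with StringScanner from the standard library, whose scan method only matches at the current scan pointer. This is a minimal sketch, not anything from the quoted entry: the tokenize method and the TOKEN_PATTERNS table are hypothetical names made up for illustration.

```ruby
require "strscan"

# Hypothetical token table, for illustration only.
TOKEN_PATTERNS = [
  [:number, /\d+/],
  [:ident,  /[a-z_]\w*/i],
  [:op,     /[-+*\/=]/],
  [:space,  /\s+/]
].freeze

# Splits input into [type, text] pairs. StringScanner#scan only matches
# at the current scan pointer, so no pattern can skip over characters.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    # Try each pattern in turn at the same position; a failed scan
    # leaves the scan pointer where it was.
    type, = TOKEN_PATTERNS.find { |_, pattern| scanner.scan(pattern) }
    raise ArgumentError, "unexpected input at offset #{scanner.pos}" unless type
    tokens << [type, scanner.matched] unless type == :space
  end
  tokens
end
```

With this table, tokenize("x = 42 + y1") yields the ident, op, number, op, ident tokens in order, and input such as "a ? b" raises at the offset of the unmatched character instead of silently skipping it.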

Thoughts?

Dan
 
ts

D> In Perl, \G means end-of-last-match, and since end-of-last-match was
D> at start-of-string, \G can't possibly match at one character into the
D> string:

This is one way to say it; another is:

* on a zero-length match, Perl prohibits a second zero-length match at the same position

* on a zero-length match, Ruby moves its internal cursor forward
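
The Ruby side of this can be imitated by hand. The sketch below assumes a Ruby version where Regexp#match accepts a start position (1.9 or later, so not the 1.6/1.8 interpreters discussed in this thread) and where \G anchors at that start position. The method name empty_match_positions is made up for illustration, and the loop only mimics the cursor behaviour described above; it is not the actual gsub implementation.

```ruby
# Collects the offsets at which pattern matches when the search cursor
# is moved forward by one character after every zero-length match.
def empty_match_positions(pattern, text)
  positions = []
  pos = 0
  while pos <= text.length
    m = pattern.match(text, pos)   # \G anchors at pos, the search start
    break unless m
    positions << m.begin(0)
    # Zero-length match: move the cursor one character forward,
    # otherwise continue from the end of the match.
    pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
  positions
end

positions = empty_match_positions(/\Gx?/, "abcde")
puts positions.inspect

# \G itself still forbids skipping ahead within a single search:
# "b" is not at offset 0, so this search finds nothing.
puts (/\Gb/.match("abc", 0)).inspect
```

Because \G is re-anchored at each new start position, the empty match succeeds at every offset, which lines up with the six "!" in the gsub output quoted earlier.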


Guy Decoux
 
