Regex compiler not showing backtracking for *

K

Krishna Chaitanya

Hi, am trying out the "Smeagol" example from "Programming
Perl" (section 5.9.3) :

#!/usr/bin/perl
use re "debug";
"Smeagol" =~ /^Sm(.*)g[aeiou]l$/;

Am using perl 5.10.0 on 32-bit Windows XP. Following is the output:

====================================

Compiling REx "^Sm(.*)g[aeiou]l$"
Final program:
1: BOL (2)
2: EXACT <Sm> (4)
4: OPEN1 (6)
6: STAR (8)
7: REG_ANY (0)
8: CLOSE1 (10)
10: EXACT <g> (12)
12: ANYOF[aeiou] (23)
23: EXACT <l> (25)
25: EOL (26)
26: END (0)
anchored "Sm" at 0 floating "l"$ at 4..2147483647 (checking anchored)
anchored(BOL) minlen 5
Guessing start of match in sv for REx "^Sm(.*)g[aeiou]l$" against
"Smeagol"
Guessed: match at offset 0
Matching REx "^Sm(.*)g[aeiou]l$" against "Smeagol"
0 <> <Smeagol> | 1:BOL(2)
0 <> <Smeagol> | 2:EXACT <Sm>(4)
2 <Sm> <eagol> | 4:OPEN1(6)
2 <Sm> <eagol> | 6:STAR(8)
REG_ANY can match 5 times out of
2147483647...
4 <Smea> <gol> | 8: CLOSE1(10)
4 <Smea> <gol> | 10: EXACT <g>(12)
5 <Smeag> <ol> | 12: ANYOF[aeiou](23)
6 <Smeago> <l> | 23: EXACT <l>(25)
7 <Smeagol> <> | 25: EOL(26)
7 <Smeagol> <> | 26: END(0)
Match successful!
Freeing REx: "^Sm(.*)g[aeiou]l$"

====================================

Why is there no backtracking for the maximal match quantifier " * "?
Or backtracking is actually happening but the pragma re "debug" output
is different (not showing backtracking)? Could someone please point
out if I'm missing anything?

Thanks
 
S

sln

Hi, am trying out the "Smeagol" example from "Programming
Perl" (section 5.9.3) :

#!/usr/bin/perl
use re "debug";
"Smeagol" =~ /^Sm(.*)g[aeiou]l$/;

Am using perl 5.10.0 on 32-bit Windows XP. Following is the output:
[snip]

Why is there no backtracking for the maximal match quantifier " * "?
Or backtracking is actually happening but the pragma re "debug" output
is different (not showing backtracking)? Could someone please point
out if I'm missing anything?

Thanks

I don't think the results show back tracking, it shows a differential
of eval stacks that directly matched or did not match the pattern.

But if you want to see backtracking, put in more than one quantifier and
set up back track condition. The one below demonstrates that, and exhausts
all possibillities, then fails.

-sln

-------------------------------------------
use re "debug";
"Smeaeagol2" =~ /^Sm(.*)g*[aeiou]l$/;
-------------------------------------------

Compiling REx `^Sm(.*)g*[aeiou]l$'
size 27 Got 220 bytes for offset annotations.
first at 2
1: BOL(2)
2: EXACT <Sm>(4)
4: OPEN1(6)
6: STAR(8)
7: REG_ANY(0)
8: CLOSE1(10)
10: STAR(13)
11: EXACT <g>(0)
13: ANYOF[aeiou](24)
24: EXACT <l>(26)
26: EOL(27)
27: END(0)
anchored `Sm' at 0 floating `l'$ at 3..2147483647 (checking anchored) anchored(B
OL) minlen 4
Offsets: [27]
1[1] 2[2] 0[0] 4[1] 0[0] 6[1] 5[1] 7[1] 0[0] 9[1] 8[1] 0[0] 10[7] 0[0] 0
[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 17[1] 0[0] 18[1] 19[0]
Guessing start of match, REx `^Sm(.*)g*[aeiou]l$' against `Smeaeagol2'...
Guessed: match at offset 0
Matching REx `^Sm(.*)g*[aeiou]l$' against `Smeaeagol2'
Setting an EVAL scope, savestack=3
0 <> <Smeaeagol2> | 1: BOL
0 <> <Smeaeagol2> | 2: EXACT <Sm>
2 <Sm> <eaeagol2> | 4: OPEN1
2 <Sm> <eaeagol2> | 6: STAR
REG_ANY can match 8 times out of 2147483647...
Setting an EVAL scope, savestack=3
10 <Smeaeagol2> <> | 8: CLOSE1
10 <Smeaeagol2> <> | 10: STAR
EXACT <g> can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=3
10 <Smeaeagol2> <> | 13: ANYOF[aeiou]
failed...
failed...
9 <Smeaeagol> <2> | 8: CLOSE1
9 <Smeaeagol> <2> | 10: STAR
EXACT <g> can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=3
9 <Smeaeagol> <2> | 13: ANYOF[aeiou]
failed...
failed...
8 <Smeaeago> <l2> | 8: CLOSE1
8 <Smeaeago> <l2> | 10: STAR
EXACT <g> can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=3
8 <Smeaeago> <l2> | 13: ANYOF[aeiou]
failed...
failed...
7 <Smeaeag> <ol2> | 8: CLOSE1
7 <Smeaeag> <ol2> | 10: STAR
EXACT <g> can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=3
7 <Smeaeag> <ol2> | 13: ANYOF[aeiou]
8 <Smeaeago> <l2> | 24: EXACT <l>
9 <Smeaeagol> <2> | 26: EOL
failed...
failed...
6 <Smeaea> <gol2> | 8: CLOSE1
6 <Smeaea> <gol2> | 10: STAR
EXACT <g> can match 1 times out of 2147483647...
Setting an EVAL scope, savestack=3
7 <Smeaeag> <ol2> | 13: ANYOF[aeiou]
8 <Smeaeago> <l2> | 24: EXACT <l>
9 <Smeaeagol> <2> | 26: EOL
failed...
6 <Smeaea> <gol2> | 13: ANYOF[aeiou]
failed...
failed...
5 <Smeae> <agol2> | 8: CLOSE1
5 <Smeae> <agol2> | 10: STAR
EXACT <g> can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=3
5 <Smeae> <agol2> | 13: ANYOF[aeiou]
6 <Smeaea> <gol2> | 24: EXACT <l>
failed...
failed...
4 <Smea> <eagol2> | 8: CLOSE1
4 <Smea> <eagol2> | 10: STAR
EXACT <g> can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=3
4 <Smea> <eagol2> | 13: ANYOF[aeiou]
5 <Smeae> <agol2> | 24: EXACT <l>
failed...
failed...
3 <Sme> <aeagol2> | 8: CLOSE1
3 <Sme> <aeagol2> | 10: STAR
EXACT <g> can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=3
3 <Sme> <aeagol2> | 13: ANYOF[aeiou]
4 <Smea> <eagol2> | 24: EXACT <l>
failed...
failed...
2 <Sm> <eaeagol2> | 8: CLOSE1
2 <Sm> <eaeagol2> | 10: STAR
EXACT <g> can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=3
2 <Sm> <eaeagol2> | 13: ANYOF[aeiou]
3 <Sme> <aeagol2> | 24: EXACT <l>
failed...
failed...
failed...
Match failed
Freeing REx: `"^Sm(.*)g*[aeiou]l$"'
 
S

sln

Hi, am trying out the "Smeagol" example from "Programming
Perl" (section 5.9.3) :

#!/usr/bin/perl
use re "debug";
"Smeagol" =~ /^Sm(.*)g[aeiou]l$/;

Am using perl 5.10.0 on 32-bit Windows XP. Following is the output:

====================================

Compiling REx "^Sm(.*)g[aeiou]l$"
Final program:
1: BOL (2)
2: EXACT <Sm> (4)
4: OPEN1 (6)
6: STAR (8)
7: REG_ANY (0)
8: CLOSE1 (10)
10: EXACT <g> (12)
12: ANYOF[aeiou] (23)
23: EXACT <l> (25)
25: EOL (26)
26: END (0)
anchored "Sm" at 0 floating "l"$ at 4..2147483647 (checking anchored)
anchored(BOL) minlen 5
Guessing start of match in sv for REx "^Sm(.*)g[aeiou]l$" against
"Smeagol"
Guessed: match at offset 0
Matching REx "^Sm(.*)g[aeiou]l$" against "Smeagol"
0 <> <Smeagol> | 1:BOL(2)
0 <> <Smeagol> | 2:EXACT <Sm>(4)
^^
found 'Sm', take off 'Sm'
2 <Sm> <eagol> | 4:OPEN1(6)
2 <Sm> <eagol> | 6:STAR(8)
^
found 'g', take off 'ea' (end of quantifier *)
REG_ANY can match 5 times out of
2147483647...
4 <Smea> <gol> | 8: CLOSE1(10)
4 <Smea> <gol> | 10: EXACT <g>(12)
^
found 'g', take off 'g'
5 <Smeag> <ol> | 12: ANYOF[aeiou](23)
^
found 'o', take off 'o'
6 <Smeago> <l> | 23: EXACT <l>(25)
^
found 'l', take off 'l

I goes all the way to the end without backtracking.
That is becuase all the other characters after 'ea' match and
they are no other quantifiers. Even if the characters
after Sm did not match the constants after '.*' in the regex,
there would be no backtracking, there is no other variability in the
regex.

-sln


Match failed, still no backtracking:
---------------------------------------------
use re "debug";
"Smeagol2" =~ /^Sm(.*)g[aeiou]l$/;
-----------------------------------------

Compiling REx `^Sm(.*)g[aeiou]l$'
size 26 Got 212 bytes for offset annotations.
first at 2
1: BOL(2)
2: EXACT <Sm>(4)
4: OPEN1(6)
6: STAR(8)
7: REG_ANY(0)
8: CLOSE1(10)
10: EXACT <g>(12)
12: ANYOF[aeiou](23)
23: EXACT <l>(25)
25: EOL(26)
26: END(0)
anchored `Sm' at 0 floating `l'$ at 4..2147483647 (checking anchored) anchored(B
OL) minlen 5
Offsets: [26]
1[1] 2[2] 0[0] 4[1] 0[0] 6[1] 5[1] 7[1] 0[0] 8[1] 0[0] 9[7] 0[0] 0[0] 0[
0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 16[1] 0[0] 17[1] 18[0]
Guessing start of match, REx `^Sm(.*)g[aeiou]l$' against `Smeagol2'...
Guessed: match at offset 0
Matching REx `^Sm(.*)g[aeiou]l$' against `Smeagol2'
Setting an EVAL scope, savestack=3
0 <> <Smeagol2> | 1: BOL
0 <> <Smeagol2> | 2: EXACT <Sm>
2 <Sm> <eagol2> | 4: OPEN1
2 <Sm> <eagol2> | 6: STAR
REG_ANY can match 6 times out of 2147483647...
Setting an EVAL scope, savestack=3
4 <Smea> <gol2> | 8: CLOSE1
4 <Smea> <gol2> | 10: EXACT <g>
5 <Smeag> <ol2> | 12: ANYOF[aeiou]
6 <Smeago> <l2> | 23: EXACT <l>
7 <Smeagol> <2> | 25: EOL
failed...
failed...
Match failed
Freeing REx: `"^Sm(.*)g[aeiou]l$"'



-sln
 
I

Ilya Zakharevich

I do not remember details, and did not look into the source today, so
it might be all bogus...
Compiling REx "^Sm(.*)g[aeiou]l$" ....
6: STAR (8)
7: REG_ANY (0)
....

I suspect it is a lie...
2 <Sm> <eagol> | 6:STAR(8)
REG_ANY can match 5 times out of INF...
4 <Smea> <gol> | 8: CLOSE1(10)
4 <Smea> <gol> | 10: EXACT <g>(12)

My conjecture is that the actual node in the compiled REx is not
STAR(REG_ANY), but OPTIMIZED_STAR_ANCHOREDBY(REG_ANY, <g>) (invented
name ;-). This would mean that what follows the node "7:" is checked
ONLY IF it starts with "g".

This is not contradicted by the effect of having an extra g after Sm

perl -Mre=debug -wle "q(Smeagogl) =~ /^Sm(.*)g[aeiou]l$/"

2 <Sm> <eagogl> | 6: STAR
REG_ANY can match 6 times out of INF...
Setting an EVAL scope, savestack=3
6 <Smeago> <gl> | 8: CLOSE1
6 <Smeago> <gl> | 10: EXACT <g>
7 <Smeagog> <l> | 12: ANYOF[aeiou]
failed...
4 <Smea> <gogl> | 8: CLOSE1
4 <Smea> <gogl> | 10: EXACT <g>

Hope this helps,
Ilya
 
K

Krishna Chaitanya

What surprises me is the example in the camel book actually shows
backtracking for this example. But executing the same here did not.
How is this difference arising?
 
A

A. Sinan Unur

What surprises me is the example in the camel book actually shows
backtracking for this example. But executing the same here did not.
How is this difference arising?

You are going to have to bite the bullet and learn how to quote.

Please read the posting guidelines. Understand them. Apply them.

The answer to your question: Maybe, just maybe, the authors used a
different version of perl when they were writing the book.

Have you considered that possibility?

Maybe what they wrote back then no longer applies.

I don't have the book with me but I have a vague recollection of that
section.

The authors were just describing behavior rather than explaining a
specification.

Therefore, I find your expectation that pathalogical behavior that was
exhibited by the regex engine back then to have remained with us at
least mildly perturbing.

Now, please, before you go back to studying typeglobs, read the posting
guidelines.

Sinan

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 
S

sln

I do not remember details, and did not look into the source today, so
it might be all bogus...
Compiling REx "^Sm(.*)g[aeiou]l$" ...
6: STAR (8)
7: REG_ANY (0)
...

I suspect it is a lie...
2 <Sm> <eagol> | 6:STAR(8)
REG_ANY can match 5 times out of INF...
4 <Smea> <gol> | 8: CLOSE1(10)
4 <Smea> <gol> | 10: EXACT <g>(12)

My conjecture is that the actual node in the compiled REx is not
STAR(REG_ANY), but OPTIMIZED_STAR_ANCHOREDBY(REG_ANY, <g>) (invented
name ;-). This would mean that what follows the node "7:" is checked
ONLY IF it starts with "g".

This is not contradicted by the effect of having an extra g after Sm

perl -Mre=debug -wle "q(Smeagogl) =~ /^Sm(.*)g[aeiou]l$/"

2 <Sm> <eagogl> | 6: STAR
REG_ANY can match 6 times out of INF...
Setting an EVAL scope, savestack=3
6 <Smeago> <gl> | 8: CLOSE1
6 <Smeago> <gl> | 10: EXACT <g>
7 <Smeagog> <l> | 12: ANYOF[aeiou]
failed...
4 <Smea> <gogl> | 8: CLOSE1
4 <Smea> <gogl> | 10: EXACT <g>

Hope this helps,
Ilya

Yes, but the point is that backtracking only occurs on a failed match
where there are other possibilities given classes and quantifiers,
and the available target string.

None of those conditions applied in her case.

"this is the one" =~ /.*r/

doesen't cut it. No backtracking at all.

-sln
 
K

Krishna Chaitanya

The answer to your question: Maybe, just maybe, the authors used a
different version of perl when they were writing the book.

Have you considered that possibility?

Yes. I couldn't have missed it. I stated I am using 5.10.0 and I could
see the book was written during (or covering) 5.6.
Maybe what they wrote back then no longer applies.

I don't have the book with me but I have a vague recollection of that
section.

The authors were just describing behavior rather than explaining a
specification.

But, the example's output definitely showed the (.*) eating up the
rest of the string and yielding a character at a time till a 'g' was
found.
Therefore, I find your expectation that pathalogical behavior that was
exhibited by the regex engine back then to have remained with us at
least mildly perturbing.

Let's agree with you here. What must have changed? The regex rules?
The pragma's output? If so, what is the authoritative text now for me?
I'll just go back and read my way through. My ecstatic feeling was
that using the pragma and the rules from the book, I can learn to
write better regexes.
 
S

sln

Let's agree with you here. What must have changed? The regex rules?
The pragma's output? If so, what is the authoritative text now for me?
I'll just go back and read my way through. My ecstatic feeling was
that using the pragma and the rules from the book, I can learn to
write better regexes.

Yeah, thats cool an all. So far you've learned what the * quantifier is.
Your well on your way to learning Regular Expressions. Another 20 years
and you might be able to parse something.

-sln
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,140
Latest member
SweetcalmCBDreview
Top