ddtl
Hello everybody,
I have some difficulty understanding why the qr// operator needs the
'o' modifier. There seems to be a disagreement between the perlop manpage
and "Programming Perl" (3rd edition; I will use PP for short from now
on).
From PP (chapter 5.9.2.2), it is clear that qr// is needed for the cases
where it is impossible to use the usual /o modifier and the programmer wants
to avoid recompilation every time the RE is evaluated. Here is a quote:
-----------------------------------------------------------------------
Variables that interpolate into patterns necessarily do so at run time,
not compile time. This slows down execution because Perl has to check
whether you've changed the contents of the variable; if so, it would
have to recompile the regular expression. As mentioned in
"Pattern-Matching Operators", if you promise never to change the pattern,
you can use the /o option to interpolate and compile only once:
print if /$pattern/o;
Although that works fine in our pgrep program, in the general case,
it doesn't. Imagine you have a slew of patterns, and you want to match
each of them in a loop, perhaps like this:
    foreach $item (@data) {
        foreach $pat (@patterns) {
            if ($item =~ /$pat/) { ... }
        }
    }
You couldn't write /$pat/o because the meaning of $pat varies each time
through the inner loop.
The solution to this is the qr/PATTERN/imosx operator. This operator
quotes--and compiles--its PATTERN as a regular expression. PATTERN is
interpolated the same way as in m/PATTERN/. If ' is used as the delimiter,
no interpolation of variables (or the six translation escapes) is done.
The operator returns a Perl value that may be used instead of the equivalent
literal in a corresponding pattern match or substitute.
-----------------------------------------------------------------------
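To check that I read this passage correctly, here is a minimal sketch of how I understand it. The data, the patterns and the variable names (@frozen, @compiled, @hits) are my own made-up examples, not from the book. The first loop shows why /o cannot be used when the pattern changes, the second precompiles each pattern once with qr// and reuses it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the book.
my @data     = ('apple pie', 'banana split', 'cherry tart');
my @patterns = ('^a', 'split$', 'rr');

# With /o the pattern is interpolated only the first time this match
# is evaluated, so every later value of $pat is silently ignored.
my @frozen;
foreach my $pat (@patterns) {
    push @frozen, $pat if 'banana split' =~ /$pat/o;
}

# Precompile each pattern once with qr//, then reuse the objects.
my @compiled = map { qr/$_/ } @patterns;

my @hits;
foreach my $item (@data) {
    foreach my $re (@compiled) {
        push @hits, $item if $item =~ $re;
    }
}

print "patterns matched with /o: ", scalar(@frozen), "\n";
print "matched with qr//: $_\n" for @hits;
```

In the first loop the match stays frozen on '^a', so 'split$' never gets a chance against 'banana split'. In the second loop every item finds its own pattern.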
But the perlop manpage says something different:
-----------------------------------------------------------------------
qr/STRING/imosx
This operator quotes (and possibly compiles) its STRING as a regular
expression. STRING is interpolated the same way as PATTERN in m/PATTERN/.
If "'" is used as the delimiter, no interpolation is done. Returns a
Perl value which may be used instead of the corresponding /STRING/imosx
expression.
-----------------------------------------------------------------------
The manpage's "possibly compiles" implies that qr// often does not compile
the RE. Apart from contradicting the book, that doesn't really make
sense: what would anybody need qr// for otherwise?
Also, the book doesn't even mention the existence of the /o modifier for
qr// when it discusses the modifiers (in the same section, though the
modifier does appear in the previous quote):
-----------------------------------------------------------------------
....
....
The reason this works is because the qr// operator returns a special kind
of object that has a stringification overload as described in Chapter 13,
"Overloading". If you print out the return value, you'll see the equivalent
string:
$re = qr/my.STRING/is;
print $re; # prints (?si-xm:my.STRING)
The /s and /i modifiers were enabled in the pattern because they were
supplied to qr//. The /x and /m, however, are disabled because they were not.
-----------------------------------------------------------------------
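A small sketch of the behaviour the book describes here, as far as I understand it. The sample text in $text is my own invention. I deliberately do not show the printed form, because the exact flag notation in the stringified pattern seems to depend on the Perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $re = qr/my.STRING/is;

print ref($re), "\n";   # class of the object returned by qr//
print "$re\n";          # stringified pattern; flag notation varies by Perl version

# The object can be interpolated into a larger pattern and keeps its
# own /i and /s settings there. $text is made-up sample input.
my $text = "say MY\nstring now";
print "matched\n" if $text =~ /say\s+$re/;
```

The outer pattern has no modifiers of its own, yet the embedded part still matches case-insensitively and lets the dot match a newline.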
Additionally, using the 'use re "debug";' pragma, I checked what difference
it makes whether or not the /o modifier is added to qr//. As far as I could
tell, there is no difference: the expression was compiled only once, when
the qr// operator was reached (whereas it was compiled every time the RE
was evaluated when an ordinary double-quoted string was used).
Here is my test case:
---------------------------------------
#!/usr/bin/perl
use strict;
use re "debug";
my $re = qr/world/;
"hello world" =~ /$re/;
"hello world" =~ /$re/;
---------------------------------------
The output I get when running the program (it is the same with or without
the /o modifier):
---------------------------------------
Compiling REx `world'
size 4 Got 36 bytes for offset annotations.
first at 1
1: EXACT <world>(4)
4: END(0)
anchored `world' at 0 (checking anchored isall) minlen 5
Offsets: [4]
1[5] 0[0] 0[0] 6[0]
Guessing start of match, REx `world' against `hello world'...
Found anchored substr `world' at offset 6...
Starting position does not contradict /^/m...
Guessed: match at offset 6
Guessing start of match, REx `world' against `hello world'...
Found anchored substr `world' at offset 6...
Starting position does not contradict /^/m...
Guessed: match at offset 6
Freeing REx: `"world"'
---------------------------------------
Is there an error in the manpage? If not, how can the difference between
the book and the manpage be explained, and especially, why is there a
need for qr// according to the manpage's version?
ddtl.