Why does qr// need the /o modifier, or is it a bug in the documentation?


ddtl

Hello everybody,

I have some difficulty understanding why the qr// operator needs the
'o' modifier. There seems to be a disagreement between the perlop
manpage and "Programming Perl" (3rd edition; I will call it PP from
now on).

From PP (section 5.9.2.2), it is clear that qr// is needed for cases
where the usual /o modifier cannot be used and the programmer wants to
avoid recompiling the RE every time it is evaluated. Here is a quote:

-----------------------------------------------------------------------
Variables that interpolate into patterns necessarily do so at run time,
not compile time. This slows down execution because Perl has to check
whether you've changed the contents of the variable; if so, it would
have to recompile the regular expression. As mentioned in
"Pattern-Matching Operators", if you promise never to change the pattern,
you can use the /o option to interpolate and compile only once:

print if /$pattern/o;

Although that works fine in our pgrep program, in the general case,
it doesn't. Imagine you have a slew of patterns, and you want to match
each of them in a loop, perhaps like this:

foreach $item (@data) {
    foreach $pat (@patterns) {
        if ($item =~ /$pat/) { ... }
    }
}

You couldn't write /$pat/o because the meaning of $pat varies each time
through the inner loop.

The solution to this is the qr/PATTERN/imosx operator. This operator
quotes--and compiles--its PATTERN as a regular expression. PATTERN is
interpolated the same way as in m/PATTERN/. If ' is used as the delimiter,
no interpolation of variables (or the six translation escapes) is done.
The operator returns a Perl value that may be used instead of the equivalent
literal in a corresponding pattern match or substitute.
-----------------------------------------------------------------------
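To make the quoted loop concrete, here is a minimal sketch (the sample @data and @patterns values are made up for illustration). Writing /$pat/o inside the inner loop freezes that match to the first pattern it sees, while compiling each pattern once with qr// before the loop gives the intended result.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my @data     = ('apple pie', 'banana split', 'cherry tart');
my @patterns = ('apple', 'banana', 'cherry');
my (@hits_o, @hits_qr);

# Broken: /o interpolates $pat only the first time this match op runs,
# so every later iteration silently keeps matching against 'apple'.
ITEM_O: foreach my $item (@data) {
    foreach my $pat (@patterns) {
        if ($item =~ /$pat/o) { push @hits_o, $item; next ITEM_O }
    }
}

# Fixed: compile each pattern once, before the loop, with qr//,
# then apply the compiled objects inside the loop.
my @compiled = map { qr/$_/ } @patterns;
ITEM_QR: foreach my $item (@data) {
    foreach my $re (@compiled) {
        if ($item =~ $re) { push @hits_qr, $item; next ITEM_QR }
    }
}

print "with /o:   @hits_o\n";     # with /o:   apple pie
print "with qr//: @hits_qr\n";    # with qr//: apple pie banana split cherry tart
```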

But the perlop manpage says something different:


-----------------------------------------------------------------------
qr/STRING/imosx

This operator quotes (and possibly compiles) its STRING as a regular
expression. STRING is interpolated the same way as PATTERN in m/PATTERN/.
If "'" is used as the delimiter, no interpolation is done. Returns a
Perl value which may be used instead of the corresponding /STRING/imosx
expression.
-----------------------------------------------------------------------

According to the manpage, it is quite possible that qr// does not
compile the RE at all. Apart from contradicting the book, that doesn't
really make sense: why would anybody need qr// otherwise?

Also, the book doesn't even mention the existence of an /o modifier for
qr// when it talks about modifiers (in the same section, though it does
mention it in the previous quote):

-----------------------------------------------------------------------
....
....
The reason this works is because the qr// operator returns a special kind
of object that has a stringification overload as described in Chapter 13,
"Overloading". If you print out the return value, you'll see the equivalent
string:

$re = qr/my.STRING/is;
print $re; # prints (?si-xm:my.STRING)

The /s and /i modifiers were enabled in the pattern because they were
supplied to qr//. The /x and /m, however, are disabled because they were not.
-----------------------------------------------------------------------
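A small sketch of the stringification described above. The exact text printed varies between perl versions (older perls give the (?si-xm:...) form the book shows; 5.14 and later use a (?^...) form instead), so the sketch checks behaviour rather than the exact string: because the flags are embedded in the stringified form, interpolating it into another match keeps /i and /s in effect.

```perl
use strict;
use warnings;

my $re = qr/my.STRING/is;

# The exact text is version-dependent, e.g. (?si-xm:my.STRING) on old
# perls or a (?^...:my.STRING) form on 5.14 and later.
my $as_string = "$re";
print "$as_string\n";

# The flags travel inside the string, so a match built from it is still
# case-insensitive (/i) and lets '.' match a newline (/s).
my $text = "MY\nstring";
my $ok   = ($text =~ /$as_string/) ? 1 : 0;
print "matched: $ok\n";    # matched: 1
```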



Additionally, using the "use re "debug";" pragma, I checked the
difference between adding the /o modifier to qr// and leaving it off.
As it turns out, there is no difference: the expression was compiled
only once, when the qr// operator was reached (whereas it was compiled
every time the RE was evaluated when an ordinary double-quoted string
was used). Here is my test case:

---------------------------------------
#!/usr/bin/perl
use strict;
use re "debug";

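my $re = qr/world/;       # also try qr/world/o: the debug output is the same
"hello world" =~ /$re/;   # first match
"hello world" =~ /$re/;   # second match: no new "Compiling REx" line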
---------------------------------------


The output I get when running the program (which is the same with or
without the /o modifier):



---------------------------------------
Compiling REx `world'
size 4 Got 36 bytes for offset annotations.
first at 1
1: EXACT <world>(4)
4: END(0)
anchored `world' at 0 (checking anchored isall) minlen 5
Offsets: [4]
1[5] 0[0] 0[0] 6[0]
Guessing start of match, REx `world' against `hello world'...
Found anchored substr `world' at offset 6...
Starting position does not contradict /^/m...
Guessed: match at offset 6
Guessing start of match, REx `world' against `hello world'...
Found anchored substr `world' at offset 6...
Starting position does not contradict /^/m...
Guessed: match at offset 6
Freeing REx: `"world"'
---------------------------------------




Is there an error in the manpage? If not, how can the difference
between the book and the manpage be explained, and, especially, why is
qr// needed at all according to the manpage's version?


ddtl.
 

Anno Siegel

ddtl said:
> Hello everybody,
>
> I have some difficulty understanding why the qr// operator needs the
> 'o' modifier. There seems to be a disagreement between the perlop
> manpage and "Programming Perl" (3rd edition; I will call it PP from
> now on).

The qr// operator doesn't need the /o modifier; you must have
misunderstood what the documentation is saying. The relation is that
qr// can be used to achieve, without /o, what /o would otherwise be
needed for.

[snip]

Anno
 

ddtl

> The qr// operator doesn't need the /o modifier; you must have
> misunderstood what the documentation is saying. The relation is that
> qr// can be used to achieve, without /o, what /o would otherwise be
> needed for.

Maybe I chose the wrong wording. qr// certainly does not *need* the /o
modifier in the sense that it is syntactically correct to write a
statement containing the qr// operator without an /o modifier, but if
you don't add the /o modifier, the RE will be compiled every time it is
evaluated. That is what the manpage says.

First of all, the operator's signature is (everything between double
quotes is quoted from the perlop manpage):

"qr/STRING/imosx"

that is, there obviously *is* an /o modifier for qr//.

"Options are:

i Do case-insensitive pattern matching.
m Treat string as multiple lines.
o Compile pattern only once.
s Treat string as single line.
x Use extended regular expressions.
"

As the quote above says, if you use /o, the pattern is compiled only
once, which obviously means that if you don't use /o, the pattern will
be compiled more than once. That is exactly what happens when you use a
plain variable in m// or s// instead of a qr//'ed expression.

So, obviously, the qr// operator *does* need the /o modifier in order
to be equivalent to a usual RE with the /o modifier, which means that
the question is still valid.

Maybe you have another explanation for the quotes from the manpage
above? For the time being, I don't see any other way to understand
them...


ddtl.
 

Jeff 'japhy' Pinyan

> The qr// operator doesn't need the /o modifier; you must have
> misunderstood what the documentation is saying. The relation is that
> qr// can be used to achieve, without /o, what /o would otherwise be
> needed for.

But the qr// operator can take the /o modifier. Perhaps it's a super
rare case, but you can use it.
 

Sam Holden

> Maybe I chose the wrong wording. qr// certainly does not *need* the /o
> modifier in the sense that it is syntactically correct to write a
> statement containing the qr// operator without an /o modifier, but if
> you don't add the /o modifier, the RE will be compiled every time it
> is evaluated. That is what the manpage says.

The manpage does not say that. In fact, the manpage says almost the opposite:

Since Perl may compile the pattern at the moment of execution of qr()
operator, using qr() may have speed advantages in some situations,
notably if the result of qr() is used standalone:

[snip example that doesn't use /o]

Precompilation of the pattern into an internal representation at the
moment of qr() avoids a need to recompile the pattern every time a
match "/$pat/" is attempted.

- perldoc perlop
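A minimal sketch of the standalone use that perlop describes, with made-up sample lines: the qr// object is built once, ref() reports its class as Regexp, and the object is then used directly on the right-hand side of =~. Per the perlop text quoted above, this is the case where no recompilation is needed at match time.

```perl
use strict;
use warnings;

my $pat = 'w.rld';
my $re  = qr/$pat/;    # interpolated and compiled here, when qr// runs

# ref() reports the class of a qr// object.
print ref($re), "\n";  # Regexp

# Used standalone, the compiled object is applied directly to each line.
my @lines = ('hello world', 'hello there', 'word play');
my @hits  = grep { $_ =~ $re } @lines;
print "@hits\n";       # hello world
```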


> First of all, the operator's signature is (everything between double
> quotes is quoted from the perlop manpage):
>
> "qr/STRING/imosx"
>
> that is, there obviously *is* an /o modifier for qr//.
>
> "Options are:
>
> i Do case-insensitive pattern matching.
> m Treat string as multiple lines.
> o Compile pattern only once.
> s Treat string as single line.
> x Use extended regular expressions.
> "
>
> As the quote above says, if you use /o, the pattern is compiled only
> once, which obviously means that if you don't use /o, the pattern will
> be compiled more than once. That is exactly what happens when you use a
> plain variable in m// or s// instead of a qr//'ed expression.

Just because A -> B does not mean that !A -> !B.

Just because the pattern is compiled once with /o does not mean that
it isn't also compiled only once without /o.

> So, obviously, the qr// operator *does* need the /o modifier in order
> to be equivalent to a usual RE with the /o modifier, which means that
> the question is still valid.
>
> Maybe you have another explanation for the quotes from the manpage
> above? For the time being, I don't see any other way to understand
> them...

The o on qr//o doesn't do anything, since qr// precompiles already.

For example:

$needle = 'foo';
$re  = qr/$needle/;
$reo = qr/$needle/o;

sub check {
    print "\$needle = $needle\n";
    for (@_) {
        print ' /$needle/ matches ',  "$_\n" if /$needle/;
        print ' /$needle/o matches ', "$_\n" if /$needle/o;
        print ' /$re/ matches ',      "$_\n" if /$re/;
        print ' /$reo/ matches ',     "$_\n" if /$reo/;
    }
}

check('barbaz');
$needle = 'bar';
check('barbaz');

Obviously qr// *does not* need the /o modifier in order to be
equivalent to the usual RE with the /o modifier, as evidenced
by the fact that /$needle/o, /$re/, and /$reo/ all fail to match
'barbaz' even though $needle is set to 'bar' in the above code.
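The same experiment can be made self-checking. This sketch (the hash keys plain, o, re and reo are just illustrative labels) records which of the four matches succeed before and after $needle changes, instead of printing them; per the explanation above, only the plain /$needle/ should pick up 'bar'.

```perl
use strict;
use warnings;

my $needle = 'foo';
my $re  = qr/$needle/;    # interpolated and compiled now, with 'foo'
my $reo = qr/$needle/o;   # same result: this line only runs once

# Report which of the four matches succeed against $str.
sub check {
    my ($str) = @_;
    return {
        plain => ($str =~ /$needle/)  ? 1 : 0,  # re-reads $needle every time
        o     => ($str =~ /$needle/o) ? 1 : 0,  # frozen at its first execution
        re    => ($str =~ /$re/)      ? 1 : 0,  # compiled when qr// ran
        reo   => ($str =~ /$reo/)     ? 1 : 0,
    };
}

my $before = check('barbaz');   # $needle is 'foo': nothing matches
$needle = 'bar';
my $after  = check('barbaz');   # only the plain /$needle/ sees 'bar'

printf "%-5s before=%d after=%d\n", $_, $before->{$_}, $after->{$_}
    for qw(plain o re reo);
```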
 

Anno Siegel

ddtl said:
> Maybe I chose the wrong wording. qr// certainly does not *need* the /o
> modifier in the sense that it is syntactically correct to write a
> statement containing the qr// operator without an /o modifier, but if
> you don't add the /o modifier, the RE will be compiled every time it
> is evaluated. That is what the manpage says.
>
> First of all, the operator's signature is (everything between double
> quotes is quoted from the perlop manpage):
>
> "qr/STRING/imosx"
>
> that is, there obviously *is* an /o modifier for qr//.
>
> "Options are:
>
> i Do case-insensitive pattern matching.
> m Treat string as multiple lines.
> o Compile pattern only once.
> s Treat string as single line.
> x Use extended regular expressions.
> "
>
> As the quote above says, if you use /o, the pattern is compiled only
> once, which obviously means that if you don't use /o, the pattern will
> be compiled more than once. That is exactly what happens when you use a
> plain variable in m// or s// instead of a qr//'ed expression.
>
> So, obviously, the qr// operator *does* need the /o modifier in order
> to be equivalent to a usual RE with the /o modifier, which means that
> the question is still valid.

Your question seems to be: if qr// is supposed to be used when /o can't
be (because the pattern changes occasionally), why is it allowed to
recompile each time (without /o)?

The answer is that you are not supposed to literally replace /.../o
with qr/.../. qr// allows you to compile a regex in one place and apply
it in another. So you recompile (using qr//) when needed, and, where you
want to apply the regex, replace the /.../o with a variable that holds
the compiled value.
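A minimal sketch of that arrangement, using two hypothetical helpers (set_pattern and count_matches are not from the thread) and made-up log lines: the regex is recompiled with qr// only when the pattern actually changes, and the frequently run matching code just applies whatever compiled object is current, with no /o anywhere.

```perl
use strict;
use warnings;

# Hypothetical scenario: the pattern changes only occasionally,
# but matching happens all the time.
my $current_re = qr/error/;

# Hypothetical helper: recompile only when the pattern changes.
sub set_pattern {
    my ($new) = @_;
    $current_re = qr/$new/;
}

# Hypothetical helper: the hot path uses the compiled object directly,
# with no pattern literal and no /o.
sub count_matches {
    my @lines = @_;
    return scalar grep { $_ =~ $current_re } @lines;
}

my @log = ('error: disk full', 'warning: low memory', 'error: timeout');
my $n1 = count_matches(@log);   # 2
set_pattern('warning');
my $n2 = count_matches(@log);   # 1
print "$n1 $n2\n";
```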

Look again at the examples you quoted in your first post. They make it
pretty clear how qr// is supposed to solve the //o problem.

> Maybe you have another explanation for the quotes from the manpage
> above? For the time being, I don't see any other way to understand
> them...

You have made a wrong assumption: that qr// goes where the regex used
to be.

Anno
 

Jeff 'japhy' Pinyan

[posted & mailed]

> $needle = 'foo';
> $re = qr/$needle/;
> $reo = qr/$needle/o;

This is a bad example. These lines are only RUN once.

Compare:

sub make_qr {
    my $pat = shift;
    return qr/$pat/o;
}

print make_qr('foo'), "\n";
print make_qr('bar'), "\n";

It prints the foo regex both times.
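To compare both forms side by side, here is a sketch with two hypothetical functions: make_qr_once mirrors make_qr above (with /o), and make_qr_plain is the same function without /o (both names are made up for this sketch). Each is called with 'foo' and then 'bar', and each resulting regex is tested against the string 'bar'.

```perl
use strict;
use warnings;

# Hypothetical helpers: identical except for the /o modifier.
sub make_qr_plain { my $pat = shift; return qr/$pat/;  }
sub make_qr_once  { my $pat = shift; return qr/$pat/o; }   # like make_qr above

my @plain = map { make_qr_plain($_) } qw(foo bar);
my @once  = map { make_qr_once($_)  } qw(foo bar);

# Without /o each call compiles its own argument; with /o the pattern
# interpolated on the first call ('foo') is returned every time.
my @r_plain = map { ('bar' =~ $_) ? 1 : 0 } @plain;   # 0 1
my @r_once  = map { ('bar' =~ $_) ? 1 : 0 } @once;    # 0 0
print "plain: @r_plain  once: @r_once\n";
```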
 

Sam Holden

[posted & mailed]

>> $needle = 'foo';
>> $re = qr/$needle/;
>> $reo = qr/$needle/o;
>
> This is a bad example. These lines are only RUN once.

I thought the post I was replying to was referring to that case...

Of course, I'm often wrong.
 

ddtl

> Your question seems to be: if qr// is supposed to be used when /o
> can't be (because the pattern changes occasionally), why is it allowed
> to recompile each time (without /o)?
>
> The answer is that you are not supposed to literally replace /.../o
> with qr/.../. qr// allows you to compile a regex in one place and
> apply it in another. So you recompile (using qr//) when needed, and,
> where you want to apply the regex, replace the /.../o with a variable
> that holds the compiled value.

But that does not explain why /o is needed! Yes, qr// allows you to
compile a regex in one place and apply it in another; if that weren't
possible, your RE would be recompiled every time it is evaluated, and
we want to be able to compile the RE only once, so we use qr//. But if
we use qr// with /o, the RE would be compiled every time it is
evaluated, which defeats the whole purpose of using qr//o.

Maybe you could give an example where it makes a difference whether
you use qr//o or a plain quoted string, that is, between:

-----------------
my $re = qr/hello/o;
....
....
/$re/;
....
....
/$re/;
-----------------

and:

-----------------
my $re = /hello/;
....
....
/$re/;
....
....
/$re/;
-----------------

According to the manpage, whenever "/$re/;" is evaluated, the RE would
be recompiled in both examples, so why would I use qr//o at all?
Exactly the same thing happens when qr//o is not used!

And that is leaving aside the fact that in the first example the RE is
actually compiled only once (when "my $re = qr/hello/o;" is evaluated,
and it makes no difference whether you use /o or not, which means that
/o has no effect *at all* on compiling the RE, contrary to what the
documentation says), while in the second example the RE is compiled
every time it is evaluated (that is, every time "/$re/;" is executed),
even though according to the manpage qr//o is supposed to be recompiled
every time the RE is evaluated.

> Look again at the examples you quoted in your first post. They make it
> pretty clear how qr// is supposed to solve the //o problem.

It is clear to me what problem qr// is supposed to solve. What is not
clear is:

1) why somebody would use the /o modifier,

and

2) what the difference between using qr// and qr//o is. According to
the debugger messages there is none at all, and the /o modifier has no
effect (try running the examples with "use re "debug";").


ddtl.
 

ddtl

> Just because A -> B does not mean that !A -> !B.
> Just because the pattern is compiled once with /o does not mean that
> it isn't also compiled only once without /o.

So what does that mean? Do you mean that when it says:

"o Compile pattern only once."

it means that when you *do not* use 'o', the pattern is also compiled
only once?? If A -> B does not mean that !A -> !B (and what you want to
say is that even when !A there is still B), then A is not the only
reason for B. If so, why do you need A at all? It is surely not because
you want B, since B exists even without A. And that is just a
rephrasing of my question: *why* do you need /o???

ddtl.
 

Anno Siegel

ddtl said:
> But that does not explain why /o is needed! Yes, qr// allows you to
> compile a regex in one place and apply it in another; if that weren't
> possible, your RE would be recompiled every time it is evaluated, and
> we want to be able to compile the RE only once, so we use qr//. But if
> we use qr// with /o, the RE would be compiled every time it is
> evaluated, which defeats the whole purpose of using qr//o.
>
> Maybe you could give an example where it makes a difference whether
> you use qr//o or a plain quoted string, that is, between:
>
> -----------------
> my $re = qr/hello/o;
> ...
> ...
> /$re/;
> ...
> ...
> /$re/;
> -----------------

Well, as the documentation assures us, no re-compilation happens in this
case. Why are you assuming the opposite? The /o is irrelevant here,
because the statement is executed only once anyway. To be sure that
no re-compilation *can* happen, rewrite it as

my $re = qr/hello/;
# ...
# ...
$_ =~ $re;
# ...
# ...
$_ =~ $re;

Now we don't have a regex literal in the match, so no compilation happens.

> and:
>
> -----------------
> my $re = /hello/;

You mean 'hello', not /hello/.

> ...
> ...
> /$re/;
> ...
> ...
> /$re/;
> -----------------

In fact, even this may not re-compile the regex, if $re hasn't been
changed. The regex compiler has become rather clever about these things.
But that's beside the point. Originally, the regex would be recompiled,
and that's one of the reasons why qr// has been invented.

It is perhaps unfortunate that the Camel doesn't show an example of a
non-literal (bare-variable) pattern match; it could make things clearer.

> According to the manpage, whenever "/$re/;" is evaluated, the RE would
> be recompiled in both examples, so why would I use qr//o at all?
> Exactly the same thing happens when qr//o is not used!

According to what manpage? Quoting "perldoc perlop":

Since Perl may compile the pattern at the moment
of execution of qr() operator, using qr() may have
speed advantages in some situations, notably if
the result of qr() is used standalone:

sub match {
    my $patterns = shift;
    my @compiled = map qr/$_/i, @$patterns;
    grep {
        my $success = 0;
        foreach my $pat (@compiled) {
            $success = 1, last if /$pat/;
        }
        $success;
    } @_;
}

Precompilation of the pattern into an internal
representation at the moment of qr() avoids a need
to recompile the pattern every time a match ...

This clearly states that a qr// pattern, used stand-alone in m//, does
*not* cause re-compilation.

The use of /o with qr// is a red herring. It rarely makes sense.
Either the qr// is run only once, then it doesn't matter. Or you
run over it again, but then you usually do so because you want another
regex compiled, and /o would defeat the purpose.

[snip argument about /o]

Anno
 

Sam Holden

> So what does that mean? Do you mean that when it says:
>
> "o Compile pattern only once."
>
> it means that when you *do not* use 'o', the pattern is also compiled
> only once?? If A -> B does not mean that !A -> !B (and what you want to
> say is that even when !A there is still B), then A is not the only
> reason for B. If so, why do you need A at all? It is surely not because
> you want B, since B exists even without A. And that is just a
> rephrasing of my question: *why* do you need /o???

In this example:

$foo = "foo";
$re  = qr/$foo/;
$reo = qr/$foo/o;
for ("foo", "bar") {
    $foo = $_;
    print "$_ matches re\n"  if $_ =~ /$re/;
    print "$_ matches reo\n" if $_ =~ /$reo/;
}

The regex is compiled once, as evidenced by the non-matching of "bar" when
$foo is changed. This occurs both with and without /o.

qr//o means that re-evaluating the qr//o expression won't recompile the
regex (a previous post pointed this out). For example:

for ("foo", "bar") {
    $foo = $_;
    $re  = qr/$foo/;
    $reo = qr/$foo/o;
    print "$_ matches re\n"  if $_ =~ /$re/;
    print "$_ matches reo\n" if $_ =~ /$reo/;
}

If you need to do something like that, then you need qr//o.

However, qr// is only compiled when the qr// is executed, not when the
resulting regex is used. I was interpreting "recompiled" to mean at the
point of use. I can't see why, if qr//o were wanted, the qr// wouldn't
simply be moved out of the loop or function where it was being
evaluated and the resulting value used instead. It seems more like a
feature for consistency with normal regexes to me. Of course, I am often
wrong.
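A minimal sketch of the alternative suggested above, with made-up sample data: the qr// is hoisted out of the loop and the resulting object reused on every pass, which gives compile-once behaviour without relying on /o.

```perl
use strict;
use warnings;

my $needle = 'foo';
my @items  = qw(food bar foolish baz);

# Build the regex once, outside the loop, and reuse the object on every
# pass, instead of keeping qr/$needle/o inside the loop.
my $re = qr/$needle/;
my @hits;
for my $item (@items) {
    push @hits, $item if $item =~ $re;
}
print "@hits\n";   # food foolish
```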
 
