Question about perlreref - are {n} and {n}? different?

usenet

perlreref::QUANTIFIERS says:

Quantifiers are greedy by default -- match the longest leftmost.
Maximal   Minimal   Allowed range
-------   -------   -------------
{n,m}     {n,m}?    Must occur at least n times but no more than m times
{n,}      {n,}?     Must occur at least n times
{n}       {n}?      Must occur exactly n times
[etc, snip]

Aren't {n} and {n}? really the same thing? You can't have greediness
if you stipulate an exact count, and you can't negate greediness if you
can't _have_ greediness, so {n}? (though it may be syntactically correct)
makes no sense to me.
 
Anno Siegel

Yes, it's a degenerate case: the "?" has no effect. No big deal.
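
A minimal Perl sketch (not from the thread) of Anno's point: with an exact count there is only one possible repetition count, so the greedy /(a{3})/ and the non-greedy /(a{3}?)/ should capture exactly the same text. The sample strings are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare what the greedy and the non-greedy exact-count quantifier
# capture on the same inputs. With {3} there is only one possible
# repetition count, so the trailing "?" cannot change the result.
my @inputs = ('aaa', 'aaaaa', 'baaab', 'aa');

for my $str (@inputs) {
    my ($greedy) = $str =~ /(a{3})/;     # undef if there is no match
    my ($lazy)   = $str =~ /(a{3}?)/;
    printf "%-6s greedy=%-5s lazy=%-5s\n",
        $str,
        defined $greedy ? $greedy : '-',
        defined $lazy   ? $lazy   : '-';
}
```

Both columns should agree on every line, including the 'aa' case where neither form matches.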

Anno
 
Dr.Ruud

(e-mail address removed) wrote:
perlreref::QUANTIFIERS says:

Quantifiers are greedy by default -- match the longest leftmost.
Maximal   Minimal   Allowed range
-------   -------   -------------
{n,m}     {n,m}?    Must occur at least n times but no more than m times


The 'Must occur ... no more than m times' is not accurate.

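#!/usr/bin/perl -w
use strict;

my $s = 'a' x 100;    # 100 a's: more than the upper bound of 50

# Apply s/$re/$1/ to a copy of the string, then print the length of
# what is left and the length of capture group 1.
sub run {
    local ($,, $\) = (' ', "\n");    # space between fields, newline after print
    my $re;
    ($re, $_) = @_;
    s/$re/$1/;
    print length, length($1);
}

run 'a{10,50}?(.*)' , $s;
run 'a{10,50}?(.*?)a', $s;
run 'a{10,50}?(.*?)' , $s;
run 'a{10,50}(.*?)' , $s;
run 'a{10,50}(.*)' , $s;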

output:
90 90
89 0
90 0
50 0
50 50
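
A short sketch of the behaviour behind this output, assuming a smaller illustrative range of {2,4} instead of {10,50}: the greedy form stops at the upper bound when it can, while the lazy form stops at the lower bound and only takes more when the rest of the pattern needs it. The patterns and strings are chosen for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each case pairs an input string with a pattern (as a string, so it can
# be printed as written). Capture group 1 shows how many a's were taken.
my @cases = (
    [ 'aaaaaa', '(a{2,4})'   ],   # greedy: takes up to the upper bound
    [ 'aaaaaa', '(a{2,4}?)'  ],   # lazy: stops at the lower bound
    [ 'aaab',   '(a{2,4}?)b' ],   # lazy, but must grow to reach the 'b'
);

for my $case (@cases) {
    my ($str, $pat) = @$case;
    if ($str =~ /$pat/) {
        printf "%-8s =~ /%s/ captured '%s' (length %d)\n",
            $str, $pat, $1, length $1;
    }
}
```

In every case the match respects both bounds; the "?" only decides which end of the allowed range is tried first.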
 
usenet

Dr.Ruud said:
The 'Must occur ... no more than m times' is not accurate.
[code snip]

I'm confused - the output seems consistent with the perlreref
statement. Can you explain which one(s) of these outputs you feel is
inconsistent, and what you would have expected it to be?
 
Dr.Ruud

(e-mail address removed) wrote:
Dr.Ruud:

I'm confused - the output seems consistent with the perlreref
statement. Can you explain which one(s) of these outputs you feel is
inconsistent, and what you would have expected it to be?

I did not expect any of the outputs to be different.

It's just nitpicking. The description says that if m=50, my 'a' must not
occur more than 50 times, so I gave it 100 a's.

"Must occur at least n times but no more than m times" -->
"Must match at least n times, and will try to match up to m times".
 
Juha Laiho

Dr.Ruud said:
(e-mail address removed) wrote:


The 'Must occur ... no more than m times' is not accurate.

Hmm... given a little more context, it is accurate:

AX{n,m}B

Here, to make the whole RE match, you must have an A, followed by at least
n, but no more than m Xs, followed by a B.
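
To make this concrete, here is a small Perl sketch with n=2 and m=3 (values picked only for illustration): with A and B pinning both ends, the pattern rejects runs of X that are shorter than n or longer than m.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# With literal delimiters on both sides, X{2,3} cannot leave extra X's
# outside the match: the whole pattern succeeds only when the run of
# X's between A and B is 2 or 3 long.
for my $len (1 .. 5) {
    my $str = 'A' . ('X' x $len) . 'B';
    my $ok  = $str =~ /AX{2,3}B/ ? 'match' : 'no match';
    print "$str: $ok\n";
}
```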
 
Dr.Ruud

Juha Laiho wrote:
Dr.Ruud:

Hmm.. given a little bit more context, it is accurate:

AX{n,m}B

Yes, but not in general.

echo 'AXXXXXXXXB' |
perl -ne 'chomp; print "$_:OK\n" if /AX{2,3}/;'
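
A sketch of what actually happens in the one-liner above: the match succeeds, but it covers only the A plus three X's. It uses Perl's @- and @+ match-offset arrays to print the matched span; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Without a right-hand delimiter, /AX{2,3}/ succeeds on a string with
# eight X's: the match stops after the third X and the remaining X's
# are simply left outside the matched text.
my $str = 'AXXXXXXXXB';
if ($str =~ /AX{2,3}/) {
    my ($start, $end) = ($-[0], $+[0]);    # offsets of the whole match
    my $matched = substr($str, $start, $end - $start);
    my $after   = substr($str, $end);
    print "matched '$matched' at [$start, $end), left over '$after'\n";
}
```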


Compare perlre(1)

{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times

Compare grep(1)

{n}    The preceding item is matched exactly n times.
{n,}   The preceding item is matched n or more times.
{n,m}  The preceding item is matched at least n times,
       but not more than m times.

So perlreref(1) could use one of those.
 
Joe Smith

Dr.Ruud said:
(e-mail address removed) wrote:


The 'Must occur ... no more than m times' is not accurate.

It is accurate, when you realize that it is talking about the
characters that were actually part of the matched string.
Characters outside the matched string are irrelevant when
the match succeeds.
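
A minimal sketch of this distinction, using the same {10,50} range as the thread (the string lengths 5, 30 and 100 are arbitrary): unanchored, a{10,50} limits only the characters it consumes; anchored with ^ and $, every character must be part of the match, so the upper bound rejects the 100-character string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

for my $len (5, 30, 100) {
    my $str = 'a' x $len;

    # Unanchored: report how many characters the match itself covers.
    my $unanchored = 'no match';
    if ($str =~ /a{10,50}/) {
        # @- and @+ hold the start and end offsets of the last match
        $unanchored = 'matched ' . ($+[0] - $-[0]) . ' chars';
    }

    # Anchored at both ends: the whole string must satisfy {10,50}.
    my $anchored = $str =~ /^a{10,50}$/ ? 'match' : 'no match';

    printf "%3d a's: /a{10,50}/ %s, /^a{10,50}\$/ %s\n",
        $len, $unanchored, $anchored;
}
```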

run 'a{10,50}?(.*)' , $s;
First part matches minimum; 'a'x10.
Second part matches rest of string; 'a'x90.
Replacing first+second with just second = 'a'x90.
Expected result of "90 90" = yes.

run 'a{10,50}?(.*?)a', $s;
First part matches minimum; 'a'x10.
Second part matches the null string.
Third part matches 11th a.
Replacing first+second+third with just second leaves the
89 characters that were not part of the overall match = 'a'x89.
Expected result of "89 0" = yes.

run 'a{10,50}?(.*?)' , $s;
First part matches minimum; 'a'x10.
Second part matches the null string.
Replacing first+second with just second leaves the
90 characters that were not part of the overall match = 'a'x90.
Expected result of "90 0" = yes.

run 'a{10,50}(.*?)' , $s;
First part matches maximum; 'a'x50.
Second part matches the null string.
Replacing first+second with just second leaves the
50 characters that were not part of the overall match = 'a'x50.
Expected result of "50 0" = yes.

run 'a{10,50}(.*)' , $s;
First part matches maximum; 'a'x50.
Second part matches rest of string; 'a'x50.
Replacing first+second with just second = 'a'x50.
Expected result of "50 50" = yes.

The s/$re/$1/ just confuses things. This is better:

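#!/usr/bin/perl -w
use strict;

my $s = 'a' x 100;    # 100 a's: more than the upper bound of 50

# Match (without substituting) and show both capture groups plus
# everything after the match ($').
sub run {
    my ($re, $str) = @_;
    $str =~ /$re/ or die "no match for /$re/\n";
    print "\$1='$1' \$2='$2' rest=|$'|\n";
}

run '(a{10,50}?)(.*)' , $s;
run '(a{10,50}?)(.*?)a', $s;
run '(a{10,50}?)(.*?)' , $s;
run '(a{10,50})(.*?)' , $s;
run '(a{10,50})(.*)' , $s;

output: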
$1='aaaaaaaaaa'
$2='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
rest=||
$1='aaaaaaaaaa' $2=''
rest=|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa|
$1='aaaaaaaaaa' $2=''
rest=|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa|
$1='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' $2=''
rest=|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa|
$1='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
$2='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' rest=||

This shows that /a{10,50}?/ matches the first 10 characters of the
string and /a{10,50}/ matches the first 50 characters of the string.

-Joe
 
Dr.Ruud

Joe Smith:
Dr.Ruud:

It is accurate, when you realize that it is talking about the
characters that were actually part of the matched string.

s/were/are/

The wording is simply part of the inaccuracy. Both perlre and grep
describe it accurately, so there is no reason for perlreref to do otherwise.
 
