Question about perlreref - are {n} and {n}? different?

usenet · Oct 26, 2005

perlreref::QUANTIFIERS says:

Quantifiers are greedy by default -- match the longest leftmost.
Maximal Minimal Allowed range
------- ------- -------------
{n,m}   {n,m}?  Must occur at least n times but no more than m times
{n,}    {n,}?   Must occur at least n times
{n}     {n}?    Must occur exactly n times
[etc, snip]

Aren't {n} and {n}? really the same thing? You can't have greediness
if you stipulate an exact count, so you can't negate greediness if you
can't _have_ greediness, so {n}? (though it may be syntactically correct)
makes no sense to me.

Anno Siegel · Oct 26, 2005

perlreref::QUANTIFIERS says:

Quantifiers are greedy by default -- match the longest leftmost.
Maximal Minimal Allowed range
------- ------- -------------
{n,m}   {n,m}?  Must occur at least n times but no more than m times
{n,}    {n,}?   Must occur at least n times
{n}     {n}?    Must occur exactly n times
[etc, snip]

Aren't {n} and {n}? really the same thing? You can't have greediness
if you stipulate an exact count, so you can't negate greediness if you
can't _have_ greediness, so {n}? (though it may be syntactically correct)
makes no sense to me.

Yes, it's a degenerate case, the "?" has no effect. No big deal.
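
One way to check this is to run both forms over the same strings and compare where each one matches and what it matches. The following is only a sketch: the helpers first_match and same_match and the sample strings are made up for illustration and are not part of the thread. It compares a handful of inputs, so it illustrates the claim rather than proving it.

```perl
use strict;
use warnings;

# Hypothetical helper: "offset:text" of the first match of $re in $str,
# or 'no match' when the pattern does not match at all.
sub first_match {
    my ($re, $str) = @_;
    return 'no match' unless $str =~ /($re)/;
    return join ':', $-[0], $1;
}

# Hypothetical helper: true when both patterns match the same text
# at the same offset (or both fail to match).
sub same_match {
    my ($re1, $re2, $str) = @_;
    return first_match($re1, $str) eq first_match($re2, $str);
}

# Exact count: greedy and lazy forms should agree on every sample.
for my $str ('aaa', 'aaaaaaa', 'xxaaaaay', 'aa') {
    my $verdict = same_match('a{3}', 'a{3}?', $str) ? 'same' : 'different';
    print "$str: $verdict\n";
}

# For contrast, an open-ended range does change when made lazy.
my $open = same_match('a{3,}', 'a{3,}?', 'aaaaaaa') ? 'same' : 'different';
print "a{3,} vs a{3,}? on aaaaaaa: $open\n";
```

With an exact count there is only one possible repetition count to try, so the "?" has nothing to choose between; the open-ended range is included only to show that the comparison can detect a difference when there is one.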

Anno

Dr.Ruud · Oct 26, 2005

(e-mail address removed) wrote:

perlreref::QUANTIFIERS says:

Quantifiers are greedy by default -- match the longest leftmost.
Maximal Minimal Allowed range
------- ------- -------------
{n,m}   {n,m}?  Must occur at least n times but no more than m times

The 'Must occur ... no more than m times' is not accurate.

#!/usr/bin/perl -w
use strict;

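my $s = 'a' x 100; # 100 characters, more than the upper bound of 50

sub run {
    # Print list items separated by a space, followed by a newline.
    local ($,, $\) = (' ', "\n");
    my $re; ($re, $_) = @_;
    # Replace the matched text with whatever the (...) group captured.
    s/$re/$1/;
    # Length of the resulting string, then length of the capture.
    print length, length($1);
}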

run 'a{10,50}?(.*)' , $s;
run 'a{10,50}?(.*?)a', $s;
run 'a{10,50}?(.*?)' , $s;
run 'a{10,50}(.*?)' , $s;
run 'a{10,50}(.*)' , $s;

output:
90 90
89 0
90 0
50 0
50 50

usenet · Oct 26, 2005

Dr.Ruud said:
The 'Must occur ... no more than m times' is not accurate.
[code snip]
output:
90 90
89 0
90 0
50 0
50 50

I'm confused - the output seems consistent with the perlreref
statement. Can you explain which one(s) of these outputs you feel is
inconsistent, and what you would have expected it to be?

Dr.Ruud · Oct 27, 2005

(e-mail address removed) wrote:

Dr.Ruud:

I'm confused - the output seems consistent with the perlreref
statement. Can you explain which one(s) of these outputs you feel is
inconsistent, and what you would have expected it to be?

I did not expect any of the outputs to be different.

It's just nitpicking. The description says that if m=50, my 'a' must not
occur more than 50 times, so I gave it 100 'a'-s.

"Must occur at least n times but no more than m times" -->
"Must match at least n times, and will try to match up to m times".

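The "will try to match up to m times" wording can be illustrated with patterns whose tail forces the quantifier to settle on a count somewhere between n and m. This is a minimal sketch, not code from the thread: the helper first_group_length and the 30-character sample string are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the first capture group,
# or -1 when the pattern does not match.
sub first_group_length {
    my ($re, $str) = @_;
    return $str =~ /$re/ ? length($1) : -1;
}

my $s = 'a' x 30;

# Greedy: aims for up to 50 repetitions, but only 30 are available.
print first_group_length('^(a{10,50})', $s), "\n";        # 30

# Greedy, but the tail needs 5 characters, so the quantifier gives 5 back.
print first_group_length('^(a{10,50})a{5}$', $s), "\n";   # 25

# Lazy: stops at the minimum because the tail can absorb the rest.
print first_group_length('^(a{10,50}?)a*$', $s), "\n";    # 10

# Lazy, but the tail takes exactly 5, so the quantifier must grow to 25.
print first_group_length('^(a{10,50}?)a{5}$', $s), "\n";  # 25

# Fewer than n repetitions available: no match.
print first_group_length('^(a{10,50})', 'a' x 5), "\n";   # -1
```

In every matching case the count lands between n and m; greedy and lazy only differ in which end of that range is tried first.
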
Juha Laiho · Oct 29, 2005

Dr.Ruud said:
(e-mail address removed) wrote:

The 'Must occur ... no more than m times' is not accurate.

Hmm.. given a little bit more context, it is accurate:

AX{n,m}B

Here, to make the whole RE match, you must have an A, followed by at least
n, but no more than m Xs, followed by a B.
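
A small test makes the bracketed case concrete. This is a sketch under assumptions: the helper name matches and the generated sample strings are invented here, with n = 2 and m = 3 picked arbitrarily.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $re matches somewhere in $str, otherwise 0.
sub matches {
    my ($re, $str) = @_;
    return $str =~ /$re/ ? 1 : 0;
}

# Try one to five Xs between the A and the B.
for my $count (1 .. 5) {
    my $str = 'A' . ('X' x $count) . 'B';
    my $verdict = matches('AX{2,3}B', $str) ? 'match' : 'no match';
    printf "%-8s %s\n", $str, $verdict;
}
```

Only the strings with two or three Xs should match, because the A and the B pin down both ends of the run.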

Dr.Ruud · Oct 29, 2005

Juha Laiho wrote:

Dr.Ruud:

Hmm.. given a little bit more context, it is accurate:

AX{n,m}B

Yes, but not in general.

echo 'AXXXXXXXXB' |
perl -ne 'chomp; print "$_:OK\n" if /AX{2,3}/;'

Compare perlre(1)

{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times

Compare grep(1)

{n}    The preceding item is matched exactly n times.
{n,}   The preceding item is matched n or more times.
{n,m}  The preceding item is matched at least n times,
       but not more than m times.

So perlreref(1) could use one of those.

Joe Smith · Oct 31, 2005

Dr.Ruud said:
(e-mail address removed) wrote:

The 'Must occur ... no more than m times' is not accurate.

It is accurate, when you realize that it is talking about the
characters that were actually part of the matched string.
Characters outside the matched string are irrelevant when
the match succeeds.

#!/usr/bin/perl -w
use strict;

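my $s = 'a' x 100; # 100 characters, more than the upper bound of 50

sub run {
    # Print list items separated by a space, followed by a newline.
    local ($,, $\) = (' ', "\n");
    my $re; ($re, $_) = @_;
    # Replace the matched text with whatever the (...) group captured.
    s/$re/$1/;
    # Length of the resulting string, then length of the capture.
    print length, length($1);
}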

run 'a{10,50}?(.*)' , $s;
run 'a{10,50}?(.*?)a', $s;
run 'a{10,50}?(.*?)' , $s;
run 'a{10,50}(.*?)' , $s;
run 'a{10,50}(.*)' , $s;

output:
90 90
89 0
90 0
50 0
50 50

run 'a{10,50}?(.*)' , $s;
First part matches minimum; 'a'x10.
Second part matches rest of string; 'a'x90.
Replacing first+second with just second = 'a'x90.
Expected result of "90 90" = yes.

run 'a{10,50}?(.*?)a', $s;
First part matches minimum; 'a'x10.
Second part matches the null string.
Third part matches 11th a.
Replacing first+second+third with just second leaves the
89 characters that were not part of the overall match = 'a'x89.
Expected result of "89 0" = yes.

run 'a{10,50}?(.*?)' , $s;
First part matches minimum; 'a'x10.
Second part matches the null string.
Replacing first+second with just second leaves the
90 characters that were not part of the overall match = 'a'x90.
Expected result of "90 0" = yes.

run 'a{10,50}(.*?)' , $s;
First part matches maximum; 'a'x50.
Second part matches the null string.
Replacing first+second with just second leaves the
50 characters that were not part of the overall match = 'a'x50.
Expected result of "50 0" = yes.

run 'a{10,50}(.*)' , $s;
First part matches maximum; 'a'x50.
Second part matches rest of string; 'a'x50.
Replacing first+second with just second = 'a'x50.
Expected result of "50 50" = yes.

The s/$re/$1/ just confuses things. This is better:

#!/usr/bin/perl -w
use strict;

my $s = 'a' x 100; # 100 characters, more than the upper bound of 50

sub run {
    my $re; ($re, $_) = @_;
    /$re/;
    # $' holds whatever follows the matched text.
    print "\$1='$1' \$2='$2' rest=|$'|\n";
}

run '(a{10,50}?)(.*)' , $s;
run '(a{10,50}?)(.*?)a', $s;
run '(a{10,50}?)(.*?)' , $s;
run '(a{10,50})(.*?)' , $s;
run '(a{10,50})(.*)' , $s;

output (each run prints a single line, wrapped here):
$1='aaaaaaaaaa'
$2='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
rest=||
$1='aaaaaaaaaa' $2=''
rest=|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa|
$1='aaaaaaaaaa' $2=''
rest=|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa|
$1='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' $2=''
rest=|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa|
$1='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
$2='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' rest=||

This shows that /a{10,50}?/ matches the first 10 characters of the
string and /a{10,50}/ matches the first 50 characters of the string.

-Joe

Dr.Ruud · Oct 31, 2005

Joe Smith:

Dr.Ruud:

It is accurate, when you realize that it is talking about the
characters that were actually part of the matched string.

s/were/are/

The wording itself is part of the inaccuracy. Both perlre and grep
describe it accurately, so there is no reason for perlreref to do otherwise.
