giampiero said:
Yes, the above is correct. Both will match any string of
characters (with a caveat around newlines: without the /s
modifier, . does not match a newline, but that behavior is the
same for both patterns). As to why (?:.{0,})+ is the same as
.* : {0,} is a longhand way of writing *, so .{0,} is the same
as .* . Wrapping it in a non-capturing group changes nothing, so
(?:.{0,}) is also the same as .* . Now, (?:.{0,}) will match any
character string (see caveat above), hence (?:.{0,})+ will also,
with the + interpreted as "once". Depending on the character
string, it might also match, say, half of the string followed by
the other half, or a quarter followed by the other
three-fourths, and so on. Note that there are a whole bunch of
ways (?:.{0,})+ can match a character string -- but also note
that the resulting match does in fact cover the entire character
string, just as .* would have.
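A quick way to convince yourself of this is to compare what the
two patterns actually match. This is only a minimal sketch: the
helper name whole_match and the sample strings are made up for
illustration, and it covers the newline caveat by trying each
string with and without /s.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text a pattern matches in $str,
# or undef if the pattern does not match at all.
sub whole_match {
    my ($str, $re) = @_;
    return $str =~ $re ? $& : undef;
}

for my $str ('', 'abc', "ab\ncd") {
    for my $mode ('plain', '/s') {
        # Same two patterns, compiled with or without the /s modifier.
        my ($long, $short) = $mode eq '/s'
            ? (qr/(?:.{0,})+/s, qr/.*/s)
            : (qr/(?:.{0,})+/,  qr/.*/);
        my $same = whole_match($str, $long) eq whole_match($str, $short);
        (my $shown = $str) =~ s/\n/\\n/g;    # make the newline visible
        printf "%-8s %-5s %s\n", "'$shown'", $mode,
            $same ? 'same' : 'DIFFERENT';
    }
}
```

Nothing after either pattern can fail here, so no backtracking is
forced and both run quickly.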
This is incorrect. (.*)+ contains capturing parentheses, which
will cause the last string matched by .* to be returned in $1 and
other side effects to occur in the various other capture-related
variables. (?:.{0,})+ does not contain any capturing parentheses;
(?: ... ) groups without capturing. Hence these two, while they
will match the same strings (namely, all of them, subject to my
caveat above), are not the same because they do not have the same
side effects.
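The difference is easy to observe after a match. The following is
a rough sketch, not anything from the original question: the
helper capture_info is a made-up name, and it only looks at $1 and
at $#+ (the number of capture groups in the last successful
match).

```perl
use strict;
use warnings;

# Hypothetical helper: after a successful match, report how many
# capture groups the pattern had and whether $1 was set.
sub capture_info {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    my $groups = $#+;    # index of the last capture group
    return +{ groups => $groups, has_1 => defined($1) ? 1 : 0 };
}

for my $re (qr/(.*)+/, qr/(?:.{0,})+/) {
    my $info = capture_info('abc', $re);
    printf "%-16s groups=%d  \$1 is %s\n",
        $re, $info->{groups}, $info->{has_1} ? 'set' : 'not set';
}
```

Both patterns match 'abc', but only the first one leaves anything
in the capture variables.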
You seem to be totally missing the idea of why one *never* wants
to do something like (?:.*)+ . It is not just that it takes more
time to type and to think about; it is that such an expression
causes an extreme amount of backtracking when something
subsequent to it fails to match in a regexp. That translates
into computer time -- potentially *years* of it -- spent doing
absolutely nothing worthwhile. Here is an example program that
shows the backtracking I'm talking about as the execution of the
regexps proceeds:
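use warnings;
use strict;

my $s='aaaaaaaaaaaaaaaaaaaaaaaaa';

# re1: print $1 each time (.*) hands a candidate to \1.
print "Matching re1:\n";
$s=~/(.*)(??{print "$1\n";''})\1/;
print "Done matching re1. Push return to continue.\n";
<STDIN>;    # wait for Enter before starting the slow case

# re2: same match, but (?:.*)+ can split each candidate many ways.
print "Matching re2:\n";
$s=~/((?:.*)+)(??{print "$1\n";''})\1/;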
The result of running this should be most instructive as to why
one should avoid unneeded backtracking in regexps. Note that the
same result is achieved with both "re1" and "re2" above, but at
substantially higher computational cost in the case of "re2".
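If you would rather count the attempts than watch them scroll by,
something like the following should do. It is only a sketch under
a few assumptions of mine: the string is shortened to 12
characters so that re2 finishes quickly, the %tries hash is a
made-up name, and a (?{ }) code block replaces the (??{ }) used
above because nothing needs to be printed.

```perl
use strict;
use warnings;

# Package variable (not a lexical) so the regex code blocks can
# update it on any Perl version that supports (?{ }).
our %tries = (re1 => 0, re2 => 0);

my $s = 'a' x 12;    # assumption: short string to keep re2 cheap

# Each code block runs once per candidate handed to \1.
my ($got1) = $s =~ /(.*)(?{ $tries{re1}++ })\1/;
my ($got2) = $s =~ /((?:.*)+)(?{ $tries{re2}++ })\1/;

print "re1: \$1='$got1' after $tries{re1} attempts\n";
print "re2: \$1='$got2' after $tries{re2} attempts\n";
```

Try lengthening the string a few characters at a time to see how
quickly the second count grows compared with the first.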