strange results using m//g in while loop...

Damian

The following works as expected (examples):


my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while ($s =~ m|$re_d|g) {
print "[$1 $2 $3]\n";
}

And prints:

[1 2 3]
[4 5 6]
[7 8 9]



However, if I try to capture the $1, $2, ... matches to an array, it
turns into an infinite loop:

my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

my @r;
while (@r = $s =~ m|$re_d|g) {
print "[@r]\n";
}

And keeps printing:

[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
 
ko

Damian wrote:

[snip]
However, if I try to capture the $1, $2, ... matches to an array, it
turns into an infinite loop:

my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

my @r;
while (@r = $s =~ m|$re_d|g) {
print "[@r]\n";
}

And keeps printing:

[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
.
.
.

Why is adding the array to suck up the results causing the behavior to
change?
And for that matter, how can I capture the $1, $2... in an array like
this?

(That @& talked about a couple years ago would really come in handy.)

Thanks.

When the /g modifier is used in list context, it gets *all* matches.
That's normal behavior. So in this case @r is assigned nine elements.
Since this is evaluated in scalar context, basically what you get is:

while (9) {
print "[@r]\n";
}

which is an infinite loop.
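
A minimal sketch of that point, reusing the data from the original post. The variable $count is introduced here only for illustration; it holds the value that while() ends up testing, because a list assignment evaluated in scalar context yields the number of elements on its right-hand side.

```perl
use strict;
use warnings;

my $re_d = qr|(\d),(\d),(\d)|;
my $s    = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

# List-context //g collects the captures of every match in one go.
# Evaluating the assignment itself in scalar context then gives the
# number of elements that were assigned.
my @r;
my $count = ( @r = $s =~ m|$re_d|g );

print "count = $count\n";
print "[@r]\n";
```

Since the count is the same on every pass, the loop condition never becomes false.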

HTH - keith
 
Anno Siegel

Damian said:
The following works as expected (examples):


my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while ($s =~ m|$re_d|g) {
print "[$1 $2 $3]\n";
}

And prints:

[1 2 3]
[4 5 6]
[7 8 9]



However, if I try to capture the $1, $2, ... matches to an array, it
turns into an infinite loop:

Yup, that's expected.

There's a whole lot of magic going on with //g in scalar context that
doesn't happen in list context. In particular, in scalar context, each
//g matches only once (despite the /g), but sets the position (pos())
of the string to the place past the last match. The next match implicitly
starts matching at pos(), not at the start of the string. That mechanism
steers you safely through the loop when you match in scalar context.

In list context, a global match is just a global match: It starts at
the beginning and matches everything it can to the end of the string.
The next match does exactly the same thing, hence the endless loop.
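
A small sketch of the difference, using the same pattern and string. The arrays @positions, @first and @second are names made up for this illustration.

```perl
use strict;
use warnings;

my $re_d = qr|(\d),(\d),(\d)|;
my $s    = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

# Scalar context: one match per evaluation; pos($s) records where each
# match ended, and the next evaluation resumes from there.
my @positions;
while ( $s =~ m|$re_d|g ) {
    push @positions, pos($s);
}
print "pos() after each scalar-context match: @positions\n";

# List context: one evaluation consumes every match, and a second
# evaluation starts from the beginning again and returns the same list.
my @first  = $s =~ m|$re_d|g;
my @second = $s =~ m|$re_d|g;
print "first:  @first\n";
print "second: @second\n";
```

The scalar-context loop ends because the match eventually fails past the last triple; the list-context form has nothing comparable to stop it.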
my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

my @r;
while (@r = $s =~ m|$re_d|g) {
print "[@r]\n";
}

And keeps printing:

[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]

Anno
 
ko

ko wrote:

[snip]
Since this is evaluated in scalar context, basically what you get is:

while (9) {
print "[@r]\n";
}

which is an infinite loop.

Whoa, *what* was I thinking? A true value, of course, is also returned
when the regex matches in scalar context. As Anno explained, pos()
allows you to break out of the loop.
 
Damian

Anno said:
Damian said:
The following works as expected (examples):


my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while ($s =~ m|$re_d|g) {
print "[$1 $2 $3]\n";
}

And prints:

[1 2 3]
[4 5 6]
[7 8 9]



However, if I try to capture the $1, $2, ... matches to an array, it
turns into an infinite loop:

Yup, that's expected.

There's a whole lot of magic going on with //g in scalar context that
doesn't happen in list context. In particular, in scalar context,
each //g matches only once (despite the /g), but sets the position
(pos())
of the string to the place past the last match. The next match
implicitly
starts matching at pos(), not at the start of the string. That
mechanism
steers you safely through the loop when you match in scalar context.

In list context, a global match is just a global match: It starts at
the beginning and matches everything it can to the end of the string.
The next match does exactly the same thing, hence the endless loop.
my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

my @r;
while (@r = $s =~ m|$re_d|g) {
print "[@r]\n";
}

And keeps printing:

[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]

Anno

Thank you.

So is there any way around this? To be able to capture a variable
(unknown) amount of $1, $2, ... matches to an array (without using an
eval)?
 
Damian

Anno said:
Damian said:
The following works as expected (examples):


my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while ($s =~ m|$re_d|g) {
print "[$1 $2 $3]\n";
}

And one more question if I may, why does this not work?

while ($s =~ m|$re_d|g) {
my @r = @_; # $_ is empty too...
print "[@r]\n";
}


[]
[]
[]

I thought the default loop variable was $_? Which is why you do something
like
while (<FILE>) { print; }
 
Brian McCauley

Damian doesn't trim:

[ Please quote only what is relevant ]
So is there any way around this? To be able to capture a variable
(unknown) amount of $1, $2, ... matches to an array (without using an
eval)?

After a match against some variable $var, $1 is the same as
substr($var, $-[1], $+[1] - $-[1])

The number of the last successful capture in the regex is $#- and the
total number of captures in the regex is in $#+.

This is all explained in a rather obscure place in the manuals.
Namely, perlvar for "@-" and "@+".
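
A rough sketch of how that could be used to collect however many captures the pattern has, inside the ordinary scalar-context loop. It assumes every group takes part in each match, as is the case for this pattern; a group that did not participate would have undefined offsets. The array @rows is only there to keep the results for inspection.

```perl
use strict;
use warnings;

my $re_d = qr|(\d),(\d),(\d)|;
my $s    = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

my @rows;
while ( $s =~ m|$re_d|g ) {
    # Rebuild $1 .. $N from the start (@-) and end (@+) offsets of
    # each group; index 0 would be the whole match, so start at 1.
    my @r = map { substr( $s, $-[$_], $+[$_] - $-[$_] ) } 1 .. $#+;
    push @rows, "[@r]";
    print "[@r]\n";
}
```

Nothing here needs to know in advance how many groups the pattern contains, and no eval is involved.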

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 
Uri Guttman

Damian said:
Anno said:
Damian said:
The following works as expected (examples):


my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while ($s =~ m|$re_d|g) {
print "[$1 $2 $3]\n";
}
And one more question if I may, why does this not work?
while ($s =~ m|$re_d|g) {
my @r = @_; # $_ is empty too...

why are you using @_? it has nothing to do with regexes. where did you
get this false concept?

uri
 
Brian McCauley

Damian said:
And one more question if I may, why does this not work?

while ($s =~ m|$re_d|g) {
my @r = @_; # $_ is empty too...
print "[@r]\n";
}

Because, as MJD would say, "you can't just make shit up and expect the
computer to know what you mean, retardo!"

$_ and @_ and $1,$2,$3 etc are all different variables.
I thought the default loop variable was $_ ?

It is in "for". ( In "map", it's the only loop variable!).
Which is why you do something like
while (<FILE>) { print; }

No, this is a different bit of DWIM.

The <handle> operator acts in a special way when it notices that it
is the only thing in the condition of a while.

while (<FILE>) { print; }

Is actually interpreted as if it said:

while ( defined ( $_ = <FILE>) ) { print; }

Note: $_ is not local()ized here - this can lead to some odd effects.
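
A quick sketch comparing the two spellings. An in-memory filehandle stands in for FILE, purely so the example is self-contained; the last "line" is a bare 0 with no newline, which is false as a string but still defined.

```perl
use strict;
use warnings;

my $data = "one\ntwo\n0";

# Implicit form: the readline result is assigned to $_ and tested
# for definedness behind the scenes.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @implicit;
while (<$fh>) { push @implicit, $_ }
close $fh;

# Explicit form, spelled out as described above.
open $fh, '<', \$data or die "cannot open in-memory file: $!";
my @explicit;
while ( defined( $_ = <$fh> ) ) { push @explicit, $_ }
close $fh;

print "lines read: ", scalar(@implicit), "\n";
print "same result: ", ( "@implicit" eq "@explicit" ? "yes" : "no" ), "\n";
```

Both loops see the trailing 0 because the test is on definedness, not truth.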

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 
Uri Guttman

Damian said:
my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";
my @r;
while (@r = $s =~ m|$re_d|g) {
print "[@r]\n";
}
And keeps printing:
[1 2 3 4 5 6 7 8 9]
Why is adding the array to suck up the results causing the behavior to
change?

as others have pointed out, a regex in a list context with /g will keep
running until it fails. sort of like while/g in scalar context but
all at one time.
And for that matter, how can I capture the $1, $2... in an array like
this?

you have captured them in the array. you just did it 3 times in each
call to the regex so you got all 9 digits.

so work with this and change your loop to grab 3 values at a time:

my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

# get all the digits at once

my @r = $s =~ m|$re_d|g ;

# loop over them 3 at a time
while (my @s = splice( @r, 0, 3 )) {
print "[@s]\n";
}

uri
 
Anno Siegel

Damian said:
Anno said:
Damian said:
The following works as expected (examples):


my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while ($s =~ m|$re_d|g) {
print "[$1 $2 $3]\n";
}

And prints:

[1 2 3]
[4 5 6]
[7 8 9]



However, if I try to capture the $1, $2, ... matches to an array, it
turns into an infinite loop:

Yup, that's expected.

There's a whole lot of magic going on with //g in scalar context that
doesn't happen in list context. In particular, in scalar context,
each //g matches only once (despite the /g), but sets the position
(pos())
of the string to the place past the last match. The next match
implicitly
starts matching at pos(), not at the start of the string. That
mechanism
steers you safely through the loop when you match in scalar context.

In list context, a global match is just a global match: It starts at
the beginning and matches everything it can to the end of the string.
The next match does exactly the same thing, hence the endless loop.
my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

my @r;
while (@r = $s =~ m|$re_d|g) {
print "[@r]\n";
}

And keeps printing:

[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]

Anno

Thank you.

So is there any way around this? To be able to capture a variable
(unknown) amount of $1, $2, ... matches to an array (without using an
eval)?

You can analyze the @- and @+ arrays after the match. Something like
(untested, may be off by one (or more))

map substr( $str, $-[ $_], $+[ $_] - $-[ $_]), 0 .. $#-;

should list the whole match first, then $1, $2, etc. I'm not sure how
well that works with /g.

Another way is to provide the behavior of /g in scalar context explicitly
while calling it in list context:

while ( my @matches = /\G<something>/ ) {
pos = $+[ 0];
# ...
}

Anno
 
Anno Siegel

Damian said:
Anno said:
Damian said:
The following works as expected (examples):


my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while ($s =~ m|$re_d|g) {
print "[$1 $2 $3]\n";
}

And one more question if I may, why does this not work?

while ($s =~ m|$re_d|g) {
my @r = @_; # $_ is empty too...
print "[@r]\n";
}

Well, it doesn't work because it's neither implemented nor documented
that way. It could be so, granted, but it isn't.
I thought the default loop variable was $_? Which is why you do something
like
while (<FILE>) { print; }

Even if this were entirely true (scalar $_ is *the* default variable in
Perl, not only loop variable), it would do nothing to explain the behavior
of the array @_. Though they share the name "_", they are entirely
different variables. That goes for all names and is a fundamental fact
in Perl. The array @_ is important as the variable space in subroutine
calls, but functions and operators don't deposit their results there
(except in one case, which happened by accident).

Anno
 
Damian

Anno said:
Another way is to provide the behavior of /g in scalar context
explicitly
while calling it in list context:

while ( my @matches = /\G<something>/ ) {
pos = $+[ 0];
# ...
}

Thank you Anno, I tried using the above, as it seems to be the solution
I'm looking for, though it never seems to enter the loop:

my $re_d = qr|(\d),(\d),(\d)|;

my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while (my @r = $s =~ m/\G$re_d/) {
pos($s) = $+[0];
print "[@r]\n";
}

(I tried it with the /g modifier too, though I didn't think it should be
there in this case.)

Is there something missing?
Thanks again.
 
Brian McCauley

Damian said:
Anno said:
Another way is to provide the behavior of /g in scalar context
explicitly
while calling it in list context:

while ( my @matches = /\G<something>/ ) {
pos = $+[ 0];
# ...
}

Thank you Anno, I tried using the above, as it seems to be the solution
I'm looking for, though it never seems to enter the loop:

my $re_d = qr|(\d),(\d),(\d)|;

my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while (my @r = $s =~ m/\G$re_d/) {
pos($s) = $+[0];
print "[@r]\n";
}

(I tried it with the /g modifier too, though I didn't think it should be
there in this case.)

Is there something missing?

Yes, Anno missed out .*? which is needed if the matches are
non-contiguous.

my $re_d = qr|(\d),(\d),(\d)|;

my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while (my @r = $s =~ m/\G.*?$re_d/) {
pos($s) = $+[0];
print "[@r]\n";
}

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 
Uri Guttman

Damian said:
Thank you Anno, I tried using the above, as it seems to be the solution
I'm looking for, though it never seems to enter the loop:
while (my @r = $s =~ m/\G$re_d/) {

from perlre:

In scalar context, each execution of "m//g" finds the next
match, returning true if it matches, and false if there is no
further match. The position after the last match can be read or
set using the pos() function; see the pos entry in the perlfunc
manpage. A failed match normally resets the search position to
the beginning of the string, but you can avoid that by adding
the "/c" modifier (e.g. "m//gc"). Modifying the target string
also resets the search position.

You can intermix "m//g" matches with "m/\G.../g", where "\G" is
a zero-width assertion that matches the exact position where the
previous "m//g", if any, left off. Without the "/g" modifier,
the "\G" assertion still anchors at pos(), but the match is of
course only attempted once. Using "\G" without "/g" on a target
string that has not previously had a "/g" match applied to it is
the same as using the "\A" assertion to match the beginning of
the string.

so \G is meaningless in list context and your regex fails and the loop
isn't entered. \G is only useful with /g in a scalar context.

see my other post for a working solution with splice.

uri
 
Brian McCauley

Uri Guttman said:
from perlre:

In scalar context, each execution of "m//g" finds the next
match, returning true if it matches, and false if there is no
further match. The position after the last match can be read or
set using the pos() function; see the pos entry in the perlfunc
manpage. A failed match normally resets the search position to
the beginning of the string, but you can avoid that by adding
the "/c" modifier (e.g. "m//gc"). Modifying the target string
also resets the search position.

You can intermix "m//g" matches with "m/\G.../g", where "\G" is
a zero-width assertion that matches the exact position where the
previous "m//g", if any, left off. Without the "/g" modifier,
the "\G" assertion still anchors at pos(), but the match is of
course only attempted once. Using "\G" without "/g" on a target
string that has not previously had a "/g" match applied to it is
the same as using the "\A" assertion to match the beginning of
the string.

so \G is meaningless in list context

No it isn't.
and your regex fails and the loop isn't entered.

The regex fails because pos($s)=0 initially and hence the first \G
behaves like \A. The OP's pattern $re_d didn't match at the start
of $s so the loop isn't entered.
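
A short sketch of that failure and of what changes once pos($s) points at a digit. The offset 5 is simply where the first digit sits in this particular string, and the array names are made up for the illustration.

```perl
use strict;
use warnings;

my $re_d = qr|(\d),(\d),(\d)|;
my $s    = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

# No //g match has run on $s yet, so \G anchors at offset 0, where
# the string has "a" rather than a digit: the match returns nothing.
my @anchored = $s =~ m/\G$re_d/;
print "anchored at start: ", scalar(@anchored), " captures\n";

# Setting pos($s) by hand moves the anchor; the same pattern, still
# without /g and still in list context, now matches.
pos($s) = 5;
my @from_5 = $s =~ m/\G$re_d/;
print "anchored at 5: [@from_5]\n";
```

So \G is doing its job in list context; it is only the starting position that defeats the unmodified pattern.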
\G is only useful with /g in a scalar context.

No it isn't.
see my other post for a working solution with splice.

See my other post for a working solution with \G without /g and with
m// in a list context.

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 
Trent Curry

Uri said:
so \G is meaningless in list context and your regex fails and the loop
isn't entered. \G is only useful with /g in a scalar context.

Apparently it isn't:


<code>
my $re_d = qr|(\d),(\d),(\d)|;

my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while (my @r = $s =~ m/\G.*?$re_d/) {
pos($s) = $+[0];
print "[@r]\n";
}
</code>


<output>
[1 2 3]
[4 5 6]
[7 8 9]

</output>

Remove the \G and it keeps iterating ad infinitum
[1 2 3]
[1 2 3]
[1 2 3]
[1 2 3]
[1 2 3]
[1 2 3]
....

Conclusion: it indeed makes a difference. Without /g, the list-context
match gives @r the captures for that iteration only, with \G anchoring it
at pos($s) and the pos assignment manually moving the matching along.

--
Trent Curry

perl -e
'($s=qq/e29716770256864702379602c6275605/)=~s!([0-9a-f]{2})!pack("h2",$1
)!eg;print(reverse("$s")."\n");'
 
Uri Guttman

Brian McCauley said:
Yes, Anno missed out .*? which is needed if the matches are
non-contiguous.


my $re_d = qr|(\d),(\d),(\d)|;
my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";
while (my @r = $s =~ m/\G.*?$re_d/) {

ok, it is the .*? that makes it work. i tried it without that and i see
why it failed (as you pointed out). the pos (\G) starts at 0 or the end
of the last match and you need the .*? to eat chars before the match. i
don't use \G much and i never used it in a list context before. the docs
i quoted seem to imply that it was only for scalar context but it just
uses the pos() value either way.

uri
 
Anno Siegel

Damian said:
Anno said:
Another way is to provide the behavior of /g in scalar context
explicitly
while calling it in list context:

while ( my @matches = /\G<something>/ ) {
pos = $+[ 0];
# ...
}

Thank you Anno, I tried using the above, as it seems to be the solution
I'm looking for, though it never seems to enter the loop:

my $re_d = qr|(\d),(\d),(\d)|;

my $s = "abcde1,2,3hk4,5,6hkgk7,8,9dfdfdfd";

while (my @r = $s =~ m/\G$re_d/) {
pos($s) = $+[0];
print "[@r]\n";
}

(I tried it with the /g modifier too, though I didn't think it should be
there in this case.)

No, I was assuming something that wasn't given, namely that your regex
matched everything from the last match (or start of string) to, well,
where the next match should begin. Make it so and it works. One could
use \w* to eat the leading alphabetics:

my $re = qr|\w*(\d),(\d),(\d)|;

Anno
 
Ben Morrow

Brian McCauley said:
The number of the last successful capture in the regex is $#- and the
total number of captures in the regex is in $#+.

Ouch! These *really* ought to be the same.

I would prefer to write $#+ as @+-1, because $#+ is an ordinal
while scalar(@+) is a cardinal. This would bite if you were daft
enough to change $[... :)
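
For what it's worth, a tiny sketch of a case where the two differ: a pattern whose last group is optional and takes no part in the match. The pattern and variable names are made up for the illustration.

```perl
use strict;
use warnings;

# The second group is optional and does not participate here.
"abc" =~ /(a)(x)?/ or die "no match";

my $last_matched = $#-;    # index of the last group that matched
my $group_count  = $#+;    # number of groups in the pattern

print "last matched group: $last_matched\n";
print "groups in pattern:  $group_count\n";
```

With a pattern where every group participates, such as the OP's, the two values coincide.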

Ben
 
