Subroutines and $_[0]

George

Dear All,

I am parsing a web page with the LWP module and then doing some regular
expression matching to print out specific chunks of code. The way I
organised it is:

while (regular expression matches) {
process_text($1);
}

So far, everything is fine. However, I noticed the following with the
subroutine: if I write the code as follows using $_[0], then the next time
I use it within the same subroutine to check against another
regular expression, it is empty.

$_[0] =~ m{<span class="listing_default">(.*?)</span></a>\s*</div>}si;
$title=$1;
$mtc1 = $_[0] =~ m{<div class="listing_results_rating">(.*?)</div>}si;
if ($mtc1) {
$rating=0;
}
else {
$rating=$1;
}

But if I use shift and save the value in a variable, it works fine, i.e.:

my $ocontent = shift;
$ocontent =~ m{<span class="listing_default">(.*?)</span></a>\s*</div>}si;
$title=$1;
$mtc1 = $ocontent =~ m{<div class="listing_results_rating">(.*?)</div>}si;
if ($mtc1) {
$rating=0;
}
else {
$rating=$1;
}

If anyone could shed some light on why this is the case, I would be
grateful.

Regards,
George
 
Jochen Lehmeier

So far, everything is fine - however I noticed the following with the
subroutine: if I write code as following using $_[0],
But if use shift and save the value in a variable, it works fine

The @_ array in a sub is basically just an alias for the parameters you
called the sub with. If you modify one of its elements, you are actually
modifying the argument from the caller's point of view. If that argument
is $1, you get funny side effects, it looks like.

Look:

~> perl -e '$a="first"; fn($a); print $a,"\n"; sub fn { $_[0]="second" }'
second

~> perl -e '"a"=~m/(.*)/; print "before: $1\n"; fn($1); print "after: $1\n"; exit;
sub fn { print "inside 1: @_\n"; "b" =~ m//; print "inside 2: @_\n" }'
before: a
inside 1: a
inside 2: b
after: b

What you usually do in a sub is to somehow extract the values from @_,
because usually you do not want to accidentally modify the arguments (and it
is easier to see what arguments you expect):

sub f
{
my ($arg1,$arg2,$arg3)=@_;
...
}

or

sub f
{
my $args=shift @_;
...
}
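
To make the alias-versus-copy point concrete with an ordinary variable rather than $1, here is a minimal sketch. The sub names upcase_in_place and upcase_copy are made up for this illustration and do not come from the original code.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub upcase_in_place {
    $_[0] = uc $_[0];      # writes through the alias into the caller's variable
    return;
}

sub upcase_copy {
    my ($s) = @_;          # private copy of the first argument
    $s = uc $s;            # only the copy changes
    return $s;
}

my $name = "george";
my $copy = upcase_copy($name);
print "$name $copy\n";     # george GEORGE

upcase_in_place($name);
print "$name\n";           # GEORGE
```

The copying form is the one you normally want; the in-place form is shown only to make the aliasing visible.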
 
C.DeRykus

...
while (regular expression matches) {
        process_text($1);

}

So far, everything is fine - however I noticed the following with the
subroutine: if I write code as following using $_[0], then the next time
within the same subroutine that I use it to check against another
regular expression it is empty.

$_[0] =~ m{<span class="listing_default">(.*?)</span></a>\s*</div>}si;
  $title=$1;

It's good form to check that the match succeeds before
assigning to backreferences. At the very least, your
control logic becomes much easier to follow:

if ( $_[0] =~ m{....} ) {
$title = $1;
...
}

Note: $title isn't defined if the match fails

$mtc1 = $_[0] =~ m{<div class="listing_results_rating">(.*?)</div>}si;

The match operator =~ will bind more tightly than =
so that'll be parsed as:

$mtc1 = ( $_[0] =~ m{...} );

That means $mtc1 will be either false (the empty string) if the match
fails or 1 if the match succeeds. Evidently you know that
but now the code becomes a bit tricky and isn't as
clear.
  if ($mtc1) {
        $rating=0;
        }
  else {
        $rating=$1;
        }

As noted, $mtc1 becomes a boolean test of the match
success as written so I suspect the above should be
flipped to read:

if ( $mtc1 ) { # succeeds
$rating = $1;
} else {
$rating = 0; # fails
}


But see how much clearer the following is and now you
don't need $mtc1 (unless $mtc1 is used later in your
code for other purposes):


if ( $_[0] =~ m{...} ) { # match succeeds
$rating = $1;
...

} else { # match fails
$rating = 0;
...
}

But if use shift and save the value in a variable, it works fine, i.e.:
$mtc1 = $ocontent =~ m{<div class="listing_results_rating">(.*?)</div>}si;
if ($mtc1) {
$rating=0;
}
else {
$rating=$1;
}

Did you get tangled in the web of your logic...?
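
Putting these suggestions together, a sketch of the whole subroutine might look like the following. The name parse_listing, the defaults, and the sample HTML string (including the rating value 4) are assumptions made up for the illustration; only the two patterns are taken from the original post.

```perl
use strict;
use warnings;

# parse_listing() is a hypothetical name; the two patterns are the ones
# from the original post.
sub parse_listing {
    my ($content) = @_;    # private copy, so later matches cannot disturb it

    my $title = '';        # default when no title is found
    if ( $content =~ m{<span class="listing_default">(.*?)</span></a>\s*</div>}si ) {
        $title = $1;
    }

    my $rating = 0;        # default when no rating block is found
    if ( $content =~ m{<div class="listing_results_rating">(.*?)</div>}si ) {
        $rating = $1;
    }

    return ( $title, $rating );
}

# Made-up sample input, loosely modelled on the page structure in the thread.
my $html = '<div class="listing_results_rating">4</div>'
         . '<div class="listing_results_title"><a href="#">'
         . '<span class="listing_default">Hayles Accountants</span></a> </div>';

my ( $title, $rating ) = parse_listing($html);
print "$title / $rating\n";    # Hayles Accountants / 4
```

Each capture is read only inside the branch where its match succeeded, so a stale $1 from an earlier match can never leak into $title or $rating.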
 
Uri Guttman

BM> Quoth "C.DeRykus":
while (regular expression matches) {
        process_text($1);

}

So far, everything is fine - however I noticed the following with the
subroutine: if I write code as following using $_[0], then the next time
within the same subroutine that I use it to check against another
regular expression it is empty.

$_[0] =~ m{<span class="listing_default">(.*?)</span></a>\s*</div>}si;
  $title=$1;

It's good form to check that the match succeeds before
assigning to backreferences. At the very least, your
control logic becomes much easier to follow:

if ( $_[0] =~ m{....} ) {
$title = $1;
...
}

BM> if (my ($title) = $_[0] =~ m{...}) {
BM> ...
BM> }

BM> is clearer.

or even this so you can declare $title and make sure it is set to
something useful

my $title = ( $_[0] =~ m{...} ) ? $1 : '' ;
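
For comparison, here is a small runnable sketch of the list-context form with a default. The sub name extract_title and the <b>...</b> pattern are made up for the example, and the ternary is used instead of the // operator so it also runs on older perls.

```perl
use strict;
use warnings;

# extract_title() and the <b>...</b> pattern are made up for this sketch.
sub extract_title {
    my ($text) = @_;

    # In list context a successful match returns its captures; a failed
    # match returns the empty list, which leaves $title undefined.
    my ($title) = $text =~ m{<b>(.*?)</b>};

    return defined $title ? $title : '';
}

print "[", extract_title("<b>Hayles</b>"), "]\n";    # [Hayles]
print "[", extract_title("no markup"),     "]\n";    # []
```

Because the capture is copied straight into a lexical, nothing in the rest of the sub ever needs to read $1.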

but i still can't see why he had to save the value in a lexical to make
it work. i think there is unpasted code that affects things.

uri
 
Willem

Uri Guttman wrote:
) or even this so you can declare $title and make sure it is set to
) something useful
)
) my $title = ( $_[0] =~ m{...} ) ? $1 : '' ;
)
) but i still can't see why he had to save the value in a lexical to make
) it work. i think there is unpasted code that affects things.

Because otherwise $_[0] is an alias for $1, and the next regular
expression will change the value of $1, and therefore the value of $_[0] ?

This is a pretty basic perl gotcha, you know.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Uri Guttman

W> Uri Guttman wrote:
W> ) or even this so you can declare $title and make sure it is set to
W> ) something useful
W> )
W> ) my $title = ( $_[0] =~ m{...} ) ? $1 : '' ;
W> )
W> ) but i still can't see why he had to save the value in a lexical to make
W> ) it work. i think there is unpasted code that affects things.

W> Because otherwise $_[0] is an alias for $1, and the next regular
W> expression will change the value of $1, and therefore the value of $_[0] ?

W> This is a pretty basic perl gotcha, you know.

he is doing m// ops which don't modify their arg. maybe he was doing
s/// ops but he didn't show any that i saw.

uri
 
George

Willem said:
Uri Guttman wrote:
) or even this so you can declare $title and make sure it is set to
) something useful
)
) my $title = ( $_[0] =~ m{...} ) ? $1 : '' ;
)
) but i still can't see why he had to save the value in a lexical to make
) it work. i think there is unpasted code that affects things.

Because otherwise $_[0] is an alias for $1, and the next regular
expression will change the value of $1, and therefore the value of $_[0] ?

This is a pretty basic perl gotcha, you know.


SaSW, Willem

I am actually not doing any substitutions, simply checking whether there
is a pattern match or not. I cut down the subroutine to:

sub check_url {

print "Orginal 1: ",$_[0],"\n";
my $test = "foobar12";
print "Test before: ",$test,"\n";
$test =~ m/([a-z]{1,})[0-9]{1,}/;
print "Test 2: ",$test,"\n";
print "Original argument: ",$_[0], "\n";

}

Now, if I run it the original argument (first print statement) prints as
<!--
-->
<div class="listing_results_logo"><a
href="http://www.test.com/hayles-accountants.html"><img border="0"
src="http://www.test.com/template/default-siva/images/noimage.gif"
alt="Hayles Accountants" /></a></div>
<div class="listing_results_listing">
<div class="listing_results_rating"></div>
<div class="listing_results_title"><a
href="http://www.test.com/hayles-accountants.html"><span
class="listing_default">Hayles Accountants</span></a> </div>
<div class="listing_results_address">

<!-- new code added by PMD GFX -->
Boston Rd,<br /> Hanwell,<br />
Middlesex W7 3TT <!-- end -->
<br /><br />

</div>

<div class="listing_results_description">
</div>
</div>

which is correct as well as the value of $test which is foobar12. My
understanding is that the next line will only check if the test variable
matches the pattern and will not make any changes to it, so the next
print statement is correct as well (am I right here in thinking that it
will just check if there is a match and not change the original variable)?

But, the print statement on $_[0] prints foobar - could you please
explain why this is the case?

Regards,
George
 
Jochen Lehmeier


Look what happened when I played around with the OP's case yesterday. I
thought I'd ignore what happened then, but I find it interesting, actually.

# perl -v

This is perl, v5.8.8 built for i486-linux-gnu-thread-multi

# perl -e '
"a" =~ m/(.*)/;
print "before: $1\n";
fn($1);
print "after: $1\n";

sub fn
{
print "inside 1: @_\n";
"b" =~ m//; # !!!!!
print "inside 2: @_\n"
}'

before: a
inside 1: a
inside 2: b
after: b # !!!!!

# perl -e '
"a" =~ m/(.*)/;
print "before: $1\n";
fn($1);
print "after: $1\n";

sub fn
{
print "inside 1: @_\n";
"b" =~ m/(.*)/; # !!!!!
print "inside 2: @_\n"
}'

before: a
inside 1: a
inside 2: b
after: a # !!!!!

First, some pertinent lines from the documentation:

perlvar on $1...: "These variables are all read-only and dynamically
scoped to the current BLOCK."
perlop: m// uses "the last successfully matched regular expression".
perlsub: "The array @_ is a local array, but its elements are aliases for
the actual scalar parameters."

So, in the first test, m// matches and uses the old regexp m/(.*)/. It
sets $1 from the point of view of the sub *and* of the caller. ***This
contradicts the scoping to the current BLOCK*** if I'm not mistaken.

In the second test, m/(.*)/ matches. It sets $1 from the point of view of
the sub (it's not in the code but we can assume that ;-) ) ****and also
$_[0]**** which is aliased to $1 (in the caller). ***But it does not set
$1*** from the point of view of the caller. ***How does it know that $_[0]
is $1?***

There is something very funny going on, which I would definitely not have
expected.

I think one could explain the first effect (that m// overwrites the
original $1) by assuming that the $1 is actually linked to the regexp, and
by re-using the old regexp, is overwritten. Though this is not documented.
The second escapes me.
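
For reference, the dynamic scoping quoted from perlvar can be seen in isolation with a non-empty pattern, as in the second test. This is only a sketch of that one documented behaviour; the sub name inner_match is made up here, and it says nothing about the m// case from the first test.

```perl
use strict;
use warnings;

# inner_match() is a hypothetical name used only for this sketch.
sub inner_match {
    "b" =~ m/(.*)/;    # a successful match in the sub's own scope
    return $1;         # "b" while we are still inside the sub
}

"a" =~ m/(.*)/;
my $inside = inner_match();
print "inside: $inside, after: $1\n";    # inside: b, after: a
```

Once the sub returns, $1 in the caller again refers to the caller's own last successful match, which is the "after: a" line of the second test.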
 
John Bokma

George said:
sub check_url {

print "Orginal 1: ",$_[0],"\n";
my $test = "foobar12";
print "Test before: ",$test,"\n";
$test =~ m/([a-z]{1,})[0-9]{1,}/;
print "Test 2: ",$test,"\n";
print "Original argument: ",$_[0], "\n";

}

perl -e '
use strict;
use warnings;

sub check_url {

print "Orginal 1: ",$_[0],"\n";
my $test = "foobar12";
print "Test before: ",$test,"\n";
$test =~ m/([a-z]{1,})[0-9]{1,}/;
print "Test 2: ",$test,"\n";
print "Original argument: ",$_[0], "\n";
}

my $url = "http://example.com/";
check_url( $url );
print "\n\nAnd here it goes wrong...\n";
$url =~ /(.*)/;
check_url( $1 );
'

Orginal 1: http://example.com/
Test before: foobar12
Test 2: foobar12
Original argument: http://example.com/


And here it goes wrong...
Orginal 1: http://example.com/
Test before: foobar12
Test 2: foobar12
Original argument: foobar

Since $_[ 0 ] is an alias for $1, you modify $_[ 0 ] if you modify $1 in
your regexp.

Solution: use:

my $url = shift;

at the start of your sub, and replace $_[0] with $url in the rest of
your code.

Another tip:

print "Original 1: ", $_[0], "\n";

Can be written as:

print "Original 1: $_[0]\n";

etc.
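
If the sub itself cannot be changed, the same effect can be had at the call site by passing a copy instead of $1. A minimal sketch, assuming a made-up sub match_then_read_arg that runs a match of its own before it looks at its argument:

```perl
use strict;
use warnings;

# match_then_read_arg() is a hypothetical stand-in for a sub that runs a
# regexp of its own and only afterwards reads $_[0].
sub match_then_read_arg {
    "foobar12" =~ m/([a-z]+)[0-9]+/;    # successful match inside the sub
    return $_[0];
}

"http://example.com/" =~ m/(.*)/;

my $url = $1;                            # copy taken before the call
print match_then_read_arg($url),  "\n";  # http://example.com/
print match_then_read_arg("$1"), "\n";   # http://example.com/ (interpolation makes a copy)
```

Passing $1 itself would be the situation shown in the output above, where the last line came out as foobar.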
 
John Bokma

Uri Guttman said:
W> Uri Guttman wrote:
W> ) or even this so you can declare $title and make sure it is set to
W> ) something useful
W> )
W> ) my $title = ( $_[0] =~ m{...} ) ? $1 : '' ;
W> )
W> ) but i still can't see why he had to save the value in a lexical to make
W> ) it work. i think there is unpasted code that affects things.

W> Because otherwise $_[0] is an alias for $1, and the next regular
W> expression will change the value of $1, and therefore the value of $_[0] ?

W> This is a pretty basic perl gotcha, you know.

he is doing m// ops which don't modify their arg. maybe he was doing
s/// ops but he didn't show any that i saw.

He passes $1 as an argument ;-)
 
Willem

George wrote:
) Willem wrote:
)> Uri Guttman wrote:
)> ) or even this so you can declare $title and make sure it is set to
)> ) something useful
)> )
)> ) my $title = ( $_[0] =~ m{...} ) ? $1 : '' ;
)> )
)> ) but i still can't see why he had to save the value in a lexical to make
)> ) it work. i think there is unpasted code that affects things.
)>
)> Because otherwise $_[0] is an alias for $1, and the next regular
)> expression will change the value of $1, and therefore the value of $_[0] ?
)>
)> This is a pretty basic perl gotcha, you know.
)>
)>
)> SaSW, Willem
) I am actually not doing any substitutions, simply checking whether there
) is a pattern match or not. I cut down the subroutine to:

<snip>
)
) But, the print statement on $_[0] prints foobar - could you please
) explain why this is the case?

You're passing $1 as an argument.
That makes $_[0] an alias for $1.
On the next successful regexp match, $1 gets the value of that match's first capture group.
The value of $_[0], being an alias for $1, therefore also gets this value.

I hope I explained it clearly enough this time.

Code:

perl -e 'my $x = "foobarbaz"; my $y = "fefifofum";
$x =~ /(......)/; foofun($1);
sub foofun { print "$_[0]\n"; $y =~ /(......)/; print "$_[0]\n" }'

Result:

foobar
fefifo

Clear now ?


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
sln

Willem said:
Uri Guttman wrote:
) or even this so you can declare $title and make sure it is set to
) something useful
)
) my $title = ( $_[0] =~ m{...} ) ? $1 : '' ;
)
) but i still can't see why he had to save the value in a lexical to make
) it work. i think there is unpasted code that affects things.

Because otherwise $_[0] is an alias for $1, and the next regular
expression will change the value of $1, and therefore the value of $_[0] ?

This is a pretty basic perl gotcha, you know.


SaSW, Willem
I am actually not doing any substitutions, simply checking whether there
is a pattern match or not. I cut down the subroutine to:

sub check_url {

print "Orginal 1: ",$_[0],"\n";
my $test = "foobar12";
print "Test before: ",$test,"\n";
$test =~ m/([a-z]{1,})[0-9]{1,}/;
print "Test 2: ",$test,"\n";
print "Original argument: ",$_[0], "\n";

}

Now, if I run it the original argument (first print statement) prints as
<!--
-->
<div class="listing_results_logo"><a
href="http://www.test.com/hayles-accountants.html"><img border="0"
src="http://www.test.com/template/default-siva/images/noimage.gif"
alt="Hayles Accountants" /></a></div>
<div class="listing_results_listing">
<div class="listing_results_rating"></div>
<div class="listing_results_title"><a
href="http://www.test.com/hayles-accountants.html"><span
class="listing_default">Hayles Accountants</span></a> </div>
<div class="listing_results_address">

<!-- new code added by PMD GFX -->
Boston Rd,<br /> Hanwell,<br />
Middlesex W7 3TT <!-- end -->
<br /><br />

</div>

<div class="listing_results_description">
</div>
</div>

which is correct as well as the value of $test which is foobar12. My
understanding is that the next line will only check if the test variable
matches the pattern and will not make any changes to it, so the next
print statement is correct as well (am I right here in thinking that it
will just check if there is a match and not change the original variable)?

But, the print statement on $_[0] prints foobar - could you please
explain why this is the case?

It does NOT print 'foobar'. In your code $_[0] does not alias the $test
variable! Even if it did alias $test, you did not do any substitution
in your regex so it would still be foobar12 if $_[0] did alias $test,
which it doesn't.

The elements of @_ do not alias the $1 vars directly unless $1 is passed in as a parameter:

myfunc($1);

sub myfunc {
# when $1 is passed in, $_[0] becomes an alias for $1
print $_[0];
$_[0] =~ /asd(fas)df/;
print $_[0];
}

$1 aliased in myfunc() is read-only and is subject to change
upon the next successful regular expression match.

So, passing in the $1, $2, ... variables when using a sub's @_ elements
directly is not a good idea. The same holds true if passing in a temporary
like myfunc("this").

But none of this is your problem. I think in your desperation you
are typing/changing test lines so fast you are not even sure
of what you are seeing.

You can alias all you want but it's something you should read
up on a little more.

-sln
--------------
use strict;
use warnings;

my $string = "this is 999 a string";

check_match ($string);

my $str = "howdy all";
check_substitution ($str);
check_substitution ("howdy all ");

exit 0;

###
sub check_match
{
print "\nOrginal 1: ",$_[0],"\n";
my $test = "foobar12";
print "Test before: ",$test,"\n";
$test =~ /([a-z]{1,})[0-9]{1,}/;
print "\$test = $test ,, \$1 = $1\n";
print "Original argument: ",$_[0], "\n";
check_match2 ($1);
}

sub check_match2
{
print "\nOrginal 1: ",$_[0],"\n";
my ($test) = $_[0] =~ /([a-z]{1,3})[0-9]*/;
print "\$test = $test ,, \$1 = $1\n";
print "Original argument: ",$_[0], "\n";
}

sub check_substitution
{
print "\nOrginal 1: ",$_[0],"\n";
$_[0] =~ s/[a-z]{1,3}//;
print "Original argument: ",$_[0], "\n";
}
__END__

Orginal 1: this is 999 a string
Test before: foobar12
$test = foobar12 ,, $1 = foobar
Original argument: this is 999 a string

Orginal 1: foobar
$test = foo ,, $1 = foo
Original argument: foo

Orginal 1: howdy all
Original argument: dy all

Orginal 1: howdy all
Modification of a read-only value attempted at hh.pl line 38.
 
Uri Guttman

JB> Since $_[ 0 ] is an alias for $1, you modify $_[ 0 ] if you modify $1 in
JB> your regexp.

i see it now. i was thinking about it backwards as in $1 is readonly and
if you pass it and then modify it, you get errors. this is the case
where you alias $1 and it get set (not modified) by the grab and so the
alias is also set to its new value.

yes, and the solution is to not do that! :)

uri
 
sln


Look what happened when I played around with the OP's case yesterday. I
thought I'd ignore what happened then, but I find it interesting, actually.

# perl -v

This is perl, v5.8.8 built for i486-linux-gnu-thread-multi

# perl -e '
"a" =~ m/(.*)/;
print "before: $1\n";
fn($1);
print "after: $1\n";

sub fn
{
print "inside 1: @_\n";
"b" =~ m//; # !!!!!
^^
I thought this used to be an error since it
matches nothing. Apparently, the construct of
target/regex operator/pattern parses into
a function call where @_ is loaded with the target/regex and
passed to the engine. Upon seeing m//, it doesn't seem to restore
(unwind) the caller's @_.

"b" =~ /sdfds/;

works, even though it doesn't match.
Try this:

sub fn
{
print "inside 1: $_[0]\n";
# "b" =~ m//; # !!!!!
my $jj = 'c';
$jj =~ s///; # !!!!!
print "inside 2: $_[0]\n";
print "inside 3: $jj\n";
}
before: a
inside 1: a
inside 2: c
inside 3:
after: c

sub fn
{
print "inside 1: $_[0]\n";
# "b" =~ m//; # !!!!!
my $jj = 'c';
$jj =~ s/g//; # !!!!!
print "inside 2: $_[0]\n";
print "inside 3: $jj\n";
}
before: a
inside 1: a
inside 2: a
inside 3: c
after: a

print "inside 2: @_\n"
}'

before: a
inside 1: a
inside 2: b
after: b # !!!!!

# perl -e '
"a" =~ m/(.*)/;
print "before: $1\n";
fn($1);
print "after: $1\n";

sub fn
{
print "inside 1: @_\n";
"b" =~ m/(.*)/; # !!!!!
^^^^
Works correctly even on a non-match like
"b" =~ /pkj/;
print "inside 2: @_\n"
}'

before: a
inside 1: a
inside 2: b
after: a # !!!!!

-sln
 
