Question on a 'perlsub' statement.

SomeDeveloper · Jul 7, 2009

Hello,

The following is an excerpt from the 'perlsub' man page (line
numbering mine):

1. | So
2. | my ($foo) = <STDIN>; # WRONG?
3. | my @FOO = <STDIN>;
4. |
5. | both supply a list context to the right-hand side, while
6. |
7. | my $foo = <STDIN>;
8. |
9. | supplies a scalar context.

If on Line 2 a list context is being supplied to the rhs, the I/O
operator would read multiple lines. (I verified that it indeed is
reading multiple lines.) So far so good.

Now, once the input (consisting of multiple lines) stands read,
why wouldn't (or rather: shouldn't) this text get assigned in its
entirety (complete with '\n' characters) to $foo? Just like the result
of join('', <STDIN>) ?

It appears to me that the list context or scalar context by itself is
not sufficient in determining the behavior. But rather "who or what is
providing this context" also matters.

As a user, I was expecting that since parentheses with the 'my'
modifier are used only to declare multiple lexicals in one-shot, Line
2 SHOULD essentially HAVE behaved like Line 7.

Are such concepts covered in a single online resource somewhere...
complete with rationale and examples?

Many thanks,
/SD

John W. Krahn · Jul 7, 2009

SomeDeveloper said:
The following is an excerpt from the 'perlsub' man page (line
numbering mine):

1. | So
2. | my ($foo) = <STDIN>; # WRONG?
3. | my @FOO = <STDIN>;
4. |
5. | both supply a list context to the right-hand side, while
6. |
7. | my $foo = <STDIN>;
8. |
9. | supplies a scalar context.

If on Line 2 a list context is being supplied to the rhs, the I/O
operator would read multiple lines. (I verified that it indeed is
reading multiple lines.) So far so good.

Now, once the input (consisting of multiple lines) stands read,
why wouldn't (or rather: shouldn't) this text get assigned in its
entirety (complete with '\n' characters) to $foo?

Because ($foo) is a list with only one element and the readline operator
<> is defined by default to read lines. If you had used
($foo,$bar,$baz) = <STDIN> then the first three lines would be stored in
their respective variables and the remaining lines discarded.
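A minimal sketch of John's three-variable case. It assumes a string opened as an in-memory filehandle stands in for STDIN, so the example runs without typing at a terminal:

```perl
use strict;
use warnings;

# Assumption: a string opened as a read-only in-memory file replaces STDIN.
my $input = "line 1\nline 2\nline 3\nline 4\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";

# The parenthesized LHS makes this a list assignment, so <$fh> runs in
# list context and returns all four lines; the first three are stored
# and "line 4\n" is discarded by the assignment.
my ($foo, $bar, $baz) = <$fh>;
close $fh;

print "foo=$foo", "bar=$bar", "baz=$baz";
```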

Just like the result of join('', <STDIN>) ?

There <STDIN> returns a list of lines and join concatenates them all
together into one scalar value.
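For comparison, a sketch of the two usual ways to get the entire input into one scalar, which is what the original poster expected `my ($foo) = <STDIN>` to do. Again a hypothetical in-memory handle stands in for STDIN; the second form localizes the input record separator `$/` to undef so a single scalar-context read returns everything:

```perl
use strict;
use warnings;

my $input = "line 1\nline 2\nline 3\n";

# join: its arguments are a list, so <$fh> runs in list context and
# join glues every line, newlines included, into one scalar.
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
my $joined = join '', <$fh>;
close $fh;

# Slurp: with $/ undefined, one scalar-context readline returns the
# whole remaining input.
open $fh, '<', \$input or die "cannot open in-memory handle: $!";
my $slurped = do { local $/; <$fh> };
close $fh;

print $joined  eq $input ? "join: whole input\n"  : "join: mismatch\n";
print $slurped eq $input ? "slurp: whole input\n" : "slurp: mismatch\n";
```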

It appears to me that the list context or scalar context by itself is
not sufficient in determining the behavior. But rather "who or what is
providing this context" also matters.

As a user, I was expecting that since parentheses with the 'my'
modifier are used only to declare multiple lexicals in one-shot, Line
2 SHOULD essentially HAVE behaved like Line 7.

Are such concepts covered in a single online resource somewhere...
complete with rationale and examples?

John

Alan Curry · Jul 8, 2009

The use of my() is a red herring. my() has no effect on context.

($foo) = <STDIN>;
$foo = <STDIN>;

readline() has exactly the same context as when you put a my in front:

my($foo) = <STDIN>;
my $foo = <STDIN>;
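One way to see that `my` has no effect on context is a small sketch with a hypothetical helper, `context_of`, which uses Perl's built-in `wantarray` to report how it was called:

```perl
use strict;
use warnings;

# Hypothetical helper: reports the context it was called in.
sub context_of { return wantarray ? 'list' : 'scalar' }

my ($with_my_list) = context_of();   # my (...) = ...  -> list
my $with_my_scalar = context_of();   # my $x = ...     -> scalar

my ($plain_list, $plain_scalar);
($plain_list) = context_of();        # (...) = ...     -> list
$plain_scalar = context_of();        # $x = ...        -> scalar

print "$with_my_list $with_my_scalar $plain_list $plain_scalar\n";
# prints "list scalar list scalar"
```

The parentheses on the left-hand side decide the context in both pairs; adding or removing `my` changes nothing.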

Although I've seen this enough times that it's no longer surprising, I still
find it annoyingly counter-intuitive, and sympathize with those who don't get
it. Let's see if I can explain why.

To understand the comparison made above, you have to see the similarity
between this strange pair:
($foo) = <STDIN>;
my($foo) = <STDIN>;

In the first, the left operand of the assignment operator is a parenthesized
expression. In the second, the left operand of the assignment operator is a
function call (by the rule "if it looks like a function, it is a function")
which is not parenthesized. If it was parenthesized, it would look like one
of these:
(my($foo)) = <STDIN>;
(my $foo) = <STDIN>;

It's much harder to accept that the parentheses have an effect reaching all
the way to the other side of the assignment operator, when they are not in
contact with it, and are busy performing an unrelated duty: making the my()
function call look like a function call.

SomeDeveloper · Jul 8, 2009

Because the readline (<>) does not know, nor does it need to know,
where its result is going.

readline() does exactly the same thing (returns a list of all lines
on the handle) for any of these:

my($foo) = <STDIN>;
my($foo, $bar) = <STDIN>;
my @foo = <STDIN>;
join('', <STDIN>);
push @lines, <STDIN>;

It _is_ sufficient.

If it behaved as you suggested, _then_ that would matter.

That is, readline() would need to know who/what is providing the
context before it would know what behavior it should exhibit. It
would need to know "am I being assigned to a one-item list or not?".

And that is almost surely the reason why it behaves the way it does
rather than the way you want it to.

The use of my() is a red herring. my() has no effect on context.

($foo) = <STDIN>;
$foo = <STDIN>;

readline() has exactly the same context as when you put a my in front:

my($foo) = <STDIN>;
my $foo = <STDIN>;

Consider this example.

#!/usr/bin/perl
use warnings;
use strict 'vars';

my $x;
print "list context...\n";
($x) = <STDIN>; # Here, I must press Ctrl-D explicitly.
print "value = '$x'\n";

print "----\n";
print "scalar context...\n";
$x = <STDIN>; # Here, control returns after my first <ENTER>.
print "value = '$x'\n";

Here's the run...
$ ./misc.pl
list context...
This is line 1 of input.
This is line 2 of input!!!!
value = 'This is line 1 of input.
'
----
scalar context...
This is line 1 of input.
value = 'This is line 1 of input.
'

I fully understand that 'my' has no effect on scalar vs list context
being provided to the rhs. What I did (and do) not understand is the
semantics. Here's how I thought (and still think) Perl should've
worked:
1. Perl encounters the statement:
($x) = <STDIN>
2. Perl provides a list context to the rhs.
3. The <> operator becomes aware of its list context, and begins
expecting/reading (possibly) multiple lines of input until Ctrl-D is
seen.
4. Perl holds the input read so far in some internal temporary.
5. Perl sees that there is only 1 variable -- $x -- in the list on
the lhs, and so assigns the entire input text to $x, much like the
behavior of join('', <STDIN>).

But, apparently, what is happening in step 5 above is that Perl picks
up only the first line from the text input and discards the rest!

This appears to me to be more than the semantics of an assignment
operator: after the r-value (a multi-line text input) is ready to be
assigned to the l-value (the $x var), Perl kind of intervenes and
parses the first line out of the r-value.

I can obviously re-calibrate my intuition based on this behavior, but
before I did so, I thought I'd try to get some better/logical
rationale of the goings-on here.

(Thanks to all those who responded so far.)

Uri Guttman · Jul 8, 2009

S> use warnings;
S> use strict 'vars';

S> my $x;
S> print "list context...\n";
S> ($x) = <STDIN>; # Here, I must press Ctrl-D explicitly.

this reads ALL lines from stdin to eof but only assigns the first one to
$x. each line is passed to perl when you hit enter. you need to hit ^D
to send an eof so perl knows it finished reading ALL the lines. then
there is no more data to be read.

S> $x = <STDIN>; # Here, control returns after my first <ENTER>.

this resets eof and reads one line. the line isn't seen by perl until
you hit enter due to unix line buffering. perl sees the one line and
assigns it to $x and continues as you would expect.
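A small sketch of the scalar case, assuming an in-memory handle in place of STDIN (so terminal line buffering plays no part here): in scalar context each readline returns only the next line and leaves the rest on the handle.

```perl
use strict;
use warnings;

# Assumption: an in-memory handle stands in for the terminal.
my $input = "first\nsecond\nthird\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";

# Scalar context: each readline returns just the next line.
my $one = <$fh>;
my $two = <$fh>;

# The handle is not exhausted yet; "third\n" is still waiting.
my $more = eof($fh) ? 'no' : 'yes';
close $fh;

print "one=$one", "two=$two", "more input left: $more\n";
```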

nothing odd has happened. you don't seem to understand unix line
buffering or how perl deals with STDIN.

S> I fully understand that 'my' has no effect on scalar vs list context
S> being provided to the rhs. What I did (and do) not understand is the
S> semantics. Here's how I thought (and still think) Perl should've
S> worked:
S> 1. Perl encounters the statement:
S> ($x) = <STDIN>
S> 2. Perl provides a list context to the rhs.
S> 3. The <> operator becomes aware of its list context, and begins
S> expecting/reading (possibly) multiple lines of input until Ctrl-D is
S> seen.
S> 4. Perl holds the input read so far in some internal temporary.
S> 5. Perl sees that there is only 1 variable -- $x -- in the list on
S> the lhs, and so assigns the entire input text to $x, much like the
S> behavior of join('', <STDIN>).

more like perl reads STDIN and parses it into a list of lines all the
way to seeing EOF (^D). then it assigns the list to ($x) which only
takes the first line and puts it into $x. <> knows only about context
and its input, not how its results get assigned.

S> But, apparently, what is happening in step 5 above is that Perl
S> picks up only the first line from the text input and discards the
S> rest!

no. it reads all of the input and puts it into a list. the list is
assigned to a short list of vars ($x) so it assigns the first line and
discards the rest. <> has NOTHING to do with the discarding, the list
assignment has all to do with it.
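The list-context case can be sketched the same way (the in-memory handle is again an assumption standing in for STDIN). After `my ($x) = <$fh>` the handle is at end of file, which shows that readline consumed every line and the assignment, not readline, threw the extras away:

```perl
use strict;
use warnings;

my $input = "first\nsecond\nthird\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";

# List assignment: <$fh> is evaluated in list context and reads to EOF
# before the assignment keeps only the first element.
my ($x) = <$fh>;

# Nothing is left for a later read: the other lines were read and then
# dropped by the assignment, not left on the handle.
my $at_eof = eof($fh) ? 1 : 0;
my $next   = <$fh>;    # undef at end of file
close $fh;

print "x=$x", "at eof: $at_eof\n", defined $next ? "more data\n" : "no more data\n";
```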

look at this:

@x = <> ; # type in input as you did before

($x) = @x ;

the same thing happens and the assignment of all the lines in @x to ($x)
causes only the first line to be assigned to $x. in your example there
just is no @x var to hold the lines. perl holds those lines on the stack
during the assignment to ($x).
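Uri's two-step version, rewritten as a sketch that runs under `strict` (the in-memory handle in place of `<>` is an assumption), makes the intermediate list visible:

```perl
use strict;
use warnings;

my $input = "first\nsecond\nthird\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";

# Step 1: keep every line in a named array instead of on Perl's stack.
my @x = <$fh>;
close $fh;

# Step 2: the same list assignment to a one-element list.
my ($x) = @x;

print scalar(@x), " lines read; \$x = $x";
```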

S> This appears to me to be more than the semantics of an assignment
S> operator: after the r-value (a multi-line text input) is ready to be
S> assigned to the l-value (the $x var), Perl kind of intervenes and
S> parses the first line out of the r-value.

nope. not parsed at all. each perl op is separated and just passes its
values to the next op. <> reads a line or lines to eof based on
context. those lines are put on the stack and passed to the next op
which is assignment. assignment sees a one element list and assigns the
first element on the stack to that list. no different than this:

($x) = 1 .. 3 ;

$x will be 1 but perl will generate 1,2,3 on the stack first.
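The claim that the whole right-hand side is generated first can be checked with a sketch in which the list comes from `map`, and a counter (a name made up for this example) records how many elements were produced:

```perl
use strict;
use warnings;

my $produced = 0;

# map bumps the counter for every element it hands to the assignment.
my ($x) = map { ++$produced; $_ } 1 .. 3;

print "x=$x, elements produced=$produced\n";   # x=1, elements produced=3
```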

S> I can obviously re-calibrate my intuition based on this behavior, but
S> before I did so, I thought I'd try to get some better/logical
S> rationale of the goings-on here.

you are just thinking too much. it is much simpler than you
realize. separate each op and then string them together with the stack
holding intermediate results. most all dynamic langs generally do
this. passing complex context like how many elements to even read or
parse is tricky. perl6 will be able to do this due to lazy lists and
such and it will pass deeper context like list lengths. perl5 doesn't do this.

uri

Alan Curry · Jul 9, 2009

Quoth (e-mail address removed) (Alan Curry):

No. my (and local) are among the few exceptions to 'if it looks like a
function',

Exceptions make language learning harder. And it still looks like a function
to me, long after I learned that it doesn't behave the way I thought it
should.

which is one reason why I *always* put a space between the
'my' and the open paren. For example, a sub called 'my' will always be
simply ignored, without even a warning. The other syntax keywords like
'if' and 'while' are the same: presumably you don't interpret

if ($x == 1) {

as a function call?

Interesting question. The answer is no, obviously, but I don't know exactly
why. Because "if" is so deeply ingrained as being a keyword that it's
impossible to read it as a function name, or maybe because it doesn't parse
as a term in an expression.

($x) = ...;

makes the LHS into a list of one element, one of the two cases (the
other being the empty list ()) where parens *are* necessary to construct
a list.
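Both list-forming cases can be sketched in a few lines. The empty-list form is the basis of the well-known `() =` counting idiom, which relies on a list assignment in scalar context returning the number of right-hand-side elements:

```perl
use strict;
use warnings;

# A one-element list on the left: list assignment, so $first gets 10.
my ($first) = (10, 20, 30);

# The empty list on the left: a list assignment that stores nothing.
# Used in scalar context, it yields the number of elements on its
# right-hand side.
my $count = () = (10, 20, 30);

print "first=$first count=$count\n";   # first=10 count=3
```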

All right, by now we all know that
($x) = something_returning_a_list();
my ($x) = something_returning_a_list();
both discard all but the first element of the generated list. New questions:

If you saw a statement of that form in perl code, without knowing anything
about the author, would you consider it more likely to be a mistake similar
to the one that started this thread, or a deliberate use of single-element
list assignment?

In your own code, would you actually use a single-element list assignment to
take the first element of a list and discard the rest, or would you make the
discard more explicit, as a favor to future readers of your code?

If it can be agreed that the slightly more verbose forms
$x = (something_returning_a_list())[0];
my $x = (something_returning_a_list())[0];
are easier to read, might we get a warning for the "too subtle" form?
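A sketch comparing the forms under discussion, using a hypothetical `something_returning_a_list` that returns three values. It also shows why the parentheses (or the slice) matter: the plain scalar assignment calls the function in scalar context, where, for this particular function, the result is the last value rather than the first.

```perl
use strict;
use warnings;

# Hypothetical function returning a three-element list.
sub something_returning_a_list { return ('a', 'b', 'c') }

my ($via_list_assign) = something_returning_a_list();   # 'a'
my $via_slice = (something_returning_a_list())[0];      # 'a'

# No parentheses, no slice: the call is in scalar context, and the
# comma operator in the return expression yields its last element.
my $via_scalar = something_returning_a_list();          # 'c'

print "$via_list_assign $via_slice $via_scalar\n";      # a a c
```

If the function returned an array (`return @list`) instead of a literal list, the scalar-context call would yield the element count instead.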

Peter J. Holzer · Jul 9, 2009

Exceptions make language learning harder. And it still looks like a
function to me, long after I learned that it doesn't behave the way I
thought it should.

Maybe my background in statically typed compiled languages like Pascal,
Modula and C is showing here, but my ($foo) never looked like a function
call to me - it is /obviously/ a variable declaration, and a variable
declaration is /obviously/ not a function call, just like if and return
are /obviously/ not function calls. (The premise is wrong of course -
all of these /could/ be implemented as functions, especially in an
interpreted, dynamic language like Perl, but I wouldn't expect them to
be).

All right, by now we all know that
($x) = something_returning_a_list();
my ($x) = something_returning_a_list();
both discard all but the first element of the generated list. New questions:

If you saw a statement of that form in perl code, without knowing
anything about the author, would you consider it more likely to be a
mistake similar to the one that started this thread, or a deliberate
use of single-element list assignment?

The latter.

In your own code, would you actually use a single-element list
assignment to take the first element of a list and discard the rest,

Yes. I do that all the time. It is a very common Perl idiom:

sub foo {
my ($arg) = @_;

...
}

This assigns the first element of @_ to $arg and
discards the rest.
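A sketch of Peter's idiom next to the mistake it prevents; the sub names are made up for illustration. Dropping the parentheses puts `@_` in scalar context, which yields the number of arguments rather than the first one:

```perl
use strict;
use warnings;

sub first_arg {
    my ($arg) = @_;    # list assignment: first element of @_
    return $arg;
}

sub arg_count_by_mistake {
    my $arg = @_;      # scalar assignment: an array in scalar context gives its length
    return $arg;
}

print first_arg('apple', 'banana'), "\n";              # apple
print arg_count_by_mistake('apple', 'banana'), "\n";   # 2
```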

or would you make the discard more explicit, as a favor to future
readers of your code?

I don't think writing it as

sub foo {
my ($arg, @ignored) = @_;

...
}

or even

sub foo {
my ($arg) = $_[0];

...
}

would be a favour to readers. It is much more likely to confuse them.

If it can be agreed that the slightly more verbose forms
$x = (something_returning_a_list())[0];
my $x = (something_returning_a_list())[0];
are easier to read,

I don't agree they are easier to read. I have to explicitly match
parens here to find out what's going on. I do use that form
occasionally, but mostly if the index is substantially larger than 0, as
in

my $mtime = (stat($file))[9];

but I try to avoid that (in this case, by using File::stat).

hp
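A sketch contrasting the two approaches mentioned above; the temporary file (via the core File::Temp module) is an assumption standing in for `$file`. Loading File::stat with an empty import list keeps the built-in `stat` available, so both can be called side by side:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::stat ();   # load without overriding the built-in stat

# Assumption: a throwaway temporary file stands in for $file.
my (undef, $file) = tempfile(UNLINK => 1);

# List slice: element 9 of the list returned by the built-in stat is mtime.
my $mtime_slice = (stat($file))[9];

# File::stat returns an object with named accessors instead.
my $mtime_named = File::stat::stat($file)->mtime;

print $mtime_slice == $mtime_named ? "same mtime\n" : "different\n";
```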

Ralph Malph · Jul 9, 2009

This was posted in another thread by Jon Kim and is worth wider
distribution. Users new to this group should be aware of whom they
are dealing with.

Jürgen Exner · Jul 9, 2009

Ralph Malph said:
This was posted in another thread by Jon Kim and is worth wider
distribution. Users new to this group should be aware of whom they
are dealing with.

New users to this group should also be aware that neither Ralph Malph
nor Jon Kim have ever posted to this NG before. Their only(!)
contribution ever is badmouthing a long-standing, respected member of
CLPM.

I'd call them drive-by posters: no good for nothing

jue

Jürgen Exner · Jul 9, 2009

Jürgen Exner said:
New users to this group should also be aware that neither Ralph Malph
nor Jon Kim have ever posted to this NG before. Their only(!)
contribution ever is badmouthing a long-standing, respected member of
CLPM.

I'd call them drive-by posters: no good for nothing

Oh, and just to state the obvious: of course Mr. Ralph Malph and Mr Jon
Kim just happen to post from the same NNTP server using identical
versions of the same client running on the same OS using exactly the
same configuration. What a remarkable coincidence!

jue

C.DeRykus · Jul 9, 2009

On Jul 8, 4:42 pm, (e-mail address removed) (Alan Curry) wrote:

...

All right, by now we all know that
($x) = something_returning_a_list();
my ($x) = something_returning_a_list();
both discard all but the first element of the generated list. New questions:
...
If you saw a statement of that form in perl code, withou
If it can be agreed that the slightly more verbose forms
$x = (something_returning_a_list())[0];
my $x = (something_returning_a_list())[0];
are easier to read, might we get a warning for the "too subtle" form?

I tend to agree but IMO the less cluttered

( my $x, () ) = something_returning_a_list();

is an improvement with the empty list () clearly
signaling the throwaways.

Jürgen Exner · Jul 9, 2009

Tad J McClellan said:
I have a subroutine that returns a 2-element list: a user ID, and the
age of their session.

When I don't intend to use the age of the session
my($user_id) = somefunc();
_IS_ a favor to future readers. It tells them that I don't intend
to use the age.

Or that the programmer isn't aware/forgot that there are more return
values.
(my $user_id, undef) = somefunc();
would be even more explicit.

jue
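A sketch of the explicit-discard form with a hypothetical stand-in for Tad's function, which returns a user ID and a session age. As Tim McDaniel confirms further down, `undef` is accepted as a placeholder on the left-hand side:

```perl
use strict;
use warnings;

# Hypothetical stand-in for Tad's function: returns (user ID, session age).
sub somefunc { return ('u1001', 42) }

# Implicit discard: the session age is dropped by the list assignment.
my ($user_id) = somefunc();

# Explicit discard: undef as a placeholder on the left-hand side.
(my $user_id2, undef) = somefunc();

print "$user_id $user_id2\n";   # u1001 u1001
```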

anno4000 · Jul 9, 2009

C.DeRykus said:
On Jul 8, 4:42 pm, (e-mail address removed) (Alan Curry) wrote:

...

All right, by now we all know that
($x) = something_returning_a_list();
my ($x) = something_returning_a_list();
both discard all but the first element of the generated list. New questions:
...
If you saw a statement of that form in perl code, withou
If it can be agreed that the slightly more verbose forms
$x = (something_returning_a_list())[0];
my $x = (something_returning_a_list())[0];
are easier to read, might we get a warning for the "too subtle" form?

I tend to agree but IMO the less cluttered

( my $x, () ) = something_returning_a_list();

is an improvement with the empty list () clearly
signaling the throwaways.

Ah, but that only looks like a throwaway, but isn't one.

(my $x, (), my $y) = qw(one two three);
say "$x $y"; # "one two"

So it doesn't convey the intention very clearly.

Anno
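Anno's point can be checked with a sketch that puts `()` and `undef` in the same middle position (the variable names are made up for the example). An empty list contributes no slots to the left-hand side, while `undef` occupies one:

```perl
use strict;
use warnings;

# () contributes no elements to the left-hand list, so it skips nothing.
(my $x1, (), my $y1) = qw(one two three);

# undef really does occupy a slot and discards the value assigned to it.
(my $x2, undef, my $y2) = qw(one two three);

print "$x1 $y1\n";   # one two
print "$x2 $y2\n";   # one three
```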

Tim McDaniel · Jul 9, 2009

Or that the programmer isn't aware/forgot that there are more return
values.
(my $user_id, undef) = somefunc();
would be even more explicit.

I just checked
perl -e 'my($a, undef) = (2, 3); print $a, "\n"'
under 5.005, 5.6.1, 5.8.8, and 5.10, and it prints "2" without a
warning or error in each of them. Even
perl -e 'my (undef) = 2;'
works. (The versions without "my" worked the same as with.)

I think I'll be more likely henceforth to put undef on the left-hand
side just to make it clear that I know that it's throwing away values.

Tim McDaniel · Jul 9, 2009

New users to this group should also be aware that neither Ralph Malph
nor Jon Kim have ever posted to this NG before. Their only(!)
contribution ever is badmouthing a long-standing, respected member of
CLPM.

I'd call them drive-by posters

The usual Internet jargon for at least the second poster is
<http://www.catb.org/jargon/html/S/sock-puppet.html>:

sock puppet: n.

[Usenet: from the act of placing a sock over your hand and talking
to it and pretending it's talking back] In Usenet parlance, a
_pseudo_ through which the puppeteer posts follow-ups to their own
original message to give the appearance that a number of people
support the views held in the original message. See also
_astroturfing_, _tentacle_.

If there was an original person dissing Uri, then both Jon Kim and
Ralph Malph would be sock puppets, of course.

sln · Jul 9, 2009

C.DeRykus said:
C.DeRykus said:

On Jul 8, 4:42 pm, (e-mail address removed) (Alan Curry) wrote:

...

All right, by now we all know that
($x) = something_returning_a_list();
my ($x) = something_returning_a_list();
both discard all but the first element of the generated list. New questions:
...
If you saw a statement of that form in perl code, withou
If it can be agreed that the slightly more verbose forms
$x = (something_returning_a_list())[0];
my $x = (something_returning_a_list())[0];
are easier to read, might we get a warning for the "too subtle" form?

I tend to agree but IMO the less cluttered

( my $x, () ) = something_returning_a_list();

is an improvement with the empty list () clearly
signaling the throwaways.

Ah, but that only looks like a throwaway, but isn't one.

(my $x, (), my $y) = qw(one two three);
say "$x $y"; # "one two"

So it doesn't convey the intention very clearly.

Anno

Or, more explanatory:

my $result = my ($aa,$bb) = (1,2,3,4,5,6,7);
print $result,"\n";   # prints 7: a list assignment in scalar context yields the RHS element count

-sln

C.DeRykus · Jul 9, 2009

On Jul 8, 4:42 pm, (e-mail address removed) (Alan Curry) wrote:

...

All right, by now we all know that
($x) = something_returning_a_list();
my ($x) = something_returning_a_list();
both discard all but the first element of the generated list. New questions:
...
If you saw a statement of that form in perl code, withou
If it can be agreed that the slightly more verbose forms
$x = (something_returning_a_list())[0];
my $x = (something_returning_a_list())[0];
are easier to read, might we get a warning for the "too subtle" form?

I tend to agree but IMO the less cluttered

( my $x, () ) = something_returning_a_list();

is an improvement with the empty list () clearly
signaling the throwaways.

Ah, but that only looks like a throwaway, but isn't one.

(my $x, (), my $y) = qw(one two three);
say "$x $y"; # "one two"

So it doesn't convey the intention very clearly.

Maybe the better word would be 'unused'. But, if everything other than
the first return arg isn't needed, I believe the trailing empty list
would visually reinforce that intent more clearly than the idiomatic
single arg in parens. And, IMO, it's aesthetically more pleasing than
indexing into a list-ified tangle.

sln · Jul 10, 2009

On Jul 8, 4:42 pm, (e-mail address removed) (Alan Curry) wrote:

...

All right, by now we all know that
($x) = something_returning_a_list();
my ($x) = something_returning_a_list();
both discard all but the first element of the generated list. New questions:
...
If you saw a statement of that form in perl code, withou
If it can be agreed that the slightly more verbose forms
$x = (something_returning_a_list())[0];
my $x = (something_returning_a_list())[0];
are easier to read, might we get a warning for the "too subtle" form?

I tend to agree but IMO the less cluttered

( my $x, () ) = something_returning_a_list();

is an improvement with the empty list () clearly
signaling the throwaways.

Ah, but that only looks like a throwaway, but isn't one.

(my $x, (), my $y) = qw(one two three);
say "$x $y"; # "one two"

So it doesn't convey the intention very clearly.

Maybe the better word would be 'unused'. But, if everything other than
the first return arg isn't needed, I believe the trailing empty list
would visually reinforce that intent more clearly than the idiomatic
single arg in parens. And, IMO, it's aesthetically more pleasing than
indexing into a list-ified tangle.

I'm sure there's an algo that does ()[index] in a loop.
Unused and throwaway are both misnomers.

-sln
