Assigning split to a list: undefined values?

Dmitry Epstein

Here is a code snippet:

while ($line = <F>) {
    my ($node1, $node2, $node3) = split ' ', $line;
    last unless defined $node3;
    ...
}

The idea here was that the loop stops if the line was split into
fewer than 3 values. It doesn't work however: when the line has
only 2 values, the variable $node3 is initialized with zero instead
of remaining undefined. How come?

(I have since changed the code to use an array instead of a list.)
 
ko

Dmitry said:
Here is a code snippet:

while ($line = <F>) {
my ($node1, $node2, $node3) = split ' ', $line;
last unless defined $node3;
...
}

The idea here was that the loop stops if the line was split into
fewer than 3 values. It doesn't work however: when the line has
only 2 values, the variable $node3 is initialized with zero instead
of remaining undefined. How come?

(I have since changed the code to use an array instead of a list.)

Since the pattern ' ' is split()ing on whitespace, the newline on each
line results in an empty trailing field ('', which is a defined value).
So you need to get rid of the newline on each line:

use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    my ($node1, $node2, $node3) = split ' ', $line;
    last unless defined $node3;
    print join(" ", $node1, $node2, $node3), "\n";
}

__DATA__
a s d
a s

HTH- keith
 
Dmitry Epstein

ko said:
Since the pattern ' ' is split()ing on whitespace, the newline
on each line results in an empty trailing field ('', which is
a defined value). So you need to get rid of the newline on
each line:

use strict;
use warnings;

while (my $line = <DATA>) {
chomp $line;
my ($node1, $node2, $node3) = split ' ', $line;
last unless defined $node3;
print join(" ", $node1, $node2, $node3), "\n"
}

__DATA__
a s d
a s

HTH- keith

I don't think so. Split on ' ' is supposed to "chomp" any trailing
whitespace. In fact, if I use an array to store the value of split
instead of a list of scalars, the array does not receive a null
value in the end.
 
Malcolm Dew-Jones

Dmitry Epstein ([email protected]) wrote:

: I don't think so. Split on ' ' is supposed to "chomp" any trailing
: whitespace. In fact, if I use an array to store the value of split
: instead of a list of scalars, the array does not receive a null
: value in the end.

Not exactly. The "default" behaviour is to drop empty trailing fields,
but what does "default" mean?

perldoc -f split also states that assigning to a _list_ (which is what you
have) creates an implicit limit on the number of fields to be extracted.

So I am guessing that "default" is referring to the "limit", but since you
have implicitly defined a limit, you don't get the default behaviour.

Instead you get the first three fields, the third of which is the empty
string "found" at the end.


"first-item second-item \n"
 1111111111 22222222222 3   (3rd is 0 chars long)
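
To see the difference between the two kinds of assignment side by side, a minimal sketch along these lines may help. The sample string and the variable names are made up for illustration; only split() itself comes from the discussion above.

use strict;
use warnings;

# Hypothetical sample line with a trailing newline, as read from a file.
my $line = "first-item second-item\n";

# Three scalars on the left: Perl supplies an implicit LIMIT of 4,
# so the empty field after the last separator is kept.
my ($node1, $node2, $node3) = split ' ', $line;

# An array on the left: no implicit LIMIT, so trailing empty
# fields are dropped.
my @nodes = split ' ', $line;

print "third scalar: ",
    (defined $node3 ? "defined, length " . length($node3) : "undef"), "\n";
print "array holds ", scalar(@nodes), " fields\n";

The third scalar should come out defined but empty, while the array should hold only the two real fields.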
 
Brad Baxter

Here is a code snippet:

while ($line = <F>) {
my ($node1, $node2, $node3) = split ' ', $line;
last unless defined $node3;
...
}

The idea here was that the loop stops if the line was split into
fewer than 3 values. It doesn't work however: when the line has
only 2 values, the variable $node3 is initialized with zero instead
of remaining undefined. How come?

I doubt that $node3 is initialized to zero, but it is initialized to '',
apparently because of the "\n" at the end of $line.

That isn't what I would have expected from reading this in perldoc -f
split:

    If LIMIT is unspecified or zero, trailing null fields are
    stripped (which potential users of "pop" would do well to
    remember)

but if you chomp $line, your logic works as it appears you intend...

% cat ./qt
#!/usr/local/bin/perl
use strict;
use warnings;

my $line = '';
while ($line = <DATA>) {
    my ($node1, $node2, $node3) = split " ", $line;
    last unless defined $node3;
    #...
    print "-$node1- -$node2- -$node3- : $line";
}

__DATA__
1 2 3
1 2

% ./qt
-1- -2- -3- : 1 2 3
-1- -2- -- : 1 2

% cat ./qt
#!/usr/local/bin/perl
use strict;
use warnings;

my $line = '';
while ($line = <DATA>) {
    chomp $line;
    my ($node1, $node2, $node3) = split " ", $line;
    last unless defined $node3;
    #...
    print "-$node1- -$node2- -$node3- : $line\n";
}

__DATA__
1 2 3
1 2

% ./qt
-1- -2- -3- : 1 2 3


Regards,

Brad
 
ko

Malcolm said:
Dmitry Epstein ([email protected]) wrote:
[snip]

perldoc -f split also states that assigning to a _list_ (which is what you
have) creates an implicit limit on the number of fields to be extracted.

So I am guessing that "default" is referring to the "limit", but since you
have implicitly defined a limit, you don't get the default behaviour.

Thanks for pointing out my error. Missed the most relevant part of the
docs in relation to this problem :(

keith
 
Anno Siegel

Dmitry Epstein said:
I don't think so. Split on ' ' is supposed to "chomp" any trailing
whitespace. In fact, if I use an array to store the value of split
instead of a list of scalars, the array does not receive a null
value in the end.

Quite so.

Assigning to a list (with a given number of slots) is equivalent to
supplying a limit parameter of one more than the number of slots to
split(). With a limit parameter, trailing empty fields aren't dropped.

Granted, split() is a little too clever for its own good, but it's all
in the documentation.
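
A small sketch of the same rule with the limit written out explicitly, in case it helps. The sample string and variable names are hypothetical; the LIMIT of 4 is what a three-variable list would get implicitly.

use strict;
use warnings;

# Hypothetical two-field line with its newline still attached.
my $line = "1 2\n";

# LIMIT 4: the same limit a list of three scalars receives.
# The empty field after the newline is kept.
my @limit4 = split ' ', $line, 4;

# LIMIT 2: at most two fields; the remainder of the string,
# newline included, stays in the last field.
my @limit2 = split ' ', $line, 2;

# With the newline removed there is no trailing separator,
# so no empty field appears even with a limit.
my $chomped = $line;
chomp $chomped;
my @chomped4 = split ' ', $chomped, 4;

printf "limit 4: %d fields, limit 2: %d fields, chomped: %d fields\n",
    scalar @limit4, scalar @limit2, scalar @chomped4;

This is also why chomping the line first makes the original "last unless defined" test behave as intended.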

Anno
 
Brad Baxter

That isn't what I would have expected from reading this in perldoc -f
split:

    If LIMIT is unspecified or zero, trailing null fields are
    stripped (which potential users of "pop" would do well to
    remember)

My apologies. I did also see this:

    When assigning to a list, if LIMIT is omitted, Perl supplies
    a LIMIT one larger than the number of variables in the list,
    to avoid unnecessary work.

But when I tried out code like the following last night, I got it wrong
and confused myself. Will try to improve. :)

#!/usr/local/bin/perl
use strict;
use warnings;

my $line = "1 2\n";

my ($node1, $node2, $node3) = split " ", $line;
my @x = split " ", $line;

print "$node1-$node2-$node3\n";
print join('-', @x), "\n";

1-2-
1-2


Regards,

Brad
 
Dmitry Epstein

(e-mail address removed) (Malcolm Dew-Jones) wrote in
Dmitry Epstein ([email protected]) wrote:

: I don't think so. Split on ' ' is supposed to "chomp" any
: trailing whitespace. In fact, if I use an array to store the
: value of split instead of a list of scalars, the array does
: not receive a null value in the end.

Not exactly. The "default" behaviour is to drop empty
trailing fields, but what does "default" mean?

perldoc -f split also states that assigning to a _list_ (which
is what you have) creates an implicit limit on the number of
fields to be extracted.

So I am guessing that "default" is referring to the "limit",
but since you have implicitly defined a limit, you don't
get the default behaviour.

Instead you get the first three fields, the third of which is
the empty string "found" at the end.


"first-item second-item \n"
1111111111 22222222222 3 (3rd is 0 chars long)

Heh, I didn't say my example was the "default" case (split with no
arguments, as I understand it), now did I? :) This is what the
docs say:

As a special case, specifying a PATTERN of space (' ') will split
on white space just as split with no arguments does.

There is a bit more on that case in the docs, but nowhere does it
say that specifying a limit, whether implicitly or explicitly,
somehow abrogates the standard behaviour of split on ' '.

Still confused.
 
Anno Siegel

Dmitry Epstein said:
(e-mail address removed) (Malcolm Dew-Jones) wrote in

[from the split doc]
As a special case, specifying a PATTERN of space (' ') will split
on white space just as split with no arguments does.

There is a bit more on that case in the docs, but nowhere does it
say that specifying a limit, whether implicitly or explicitly,
somehow abrogates the standard behaviour of split on ' '.

Put these together (from the same document):

    If LIMIT is unspecified or zero, trailing null fields are
    stripped...

    When assigning to a list, if LIMIT is omitted, or zero, Perl
    supplies a LIMIT one larger than the number of variables in
    the list...

It isn't explicit, but in combination they say that trailing null fields
will *not* be stripped when assigning to a list.
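
Given that, one way to get the loop the original poster wanted is to split into an array and test the number of fields, rather than testing whether the third scalar is defined. This is only a sketch under assumptions: the in-memory list of lines stands in for the file handle, and the variable names are invented for the example.

use strict;
use warnings;

# Hypothetical sample input standing in for lines read from a file handle.
my @lines = ("a s d\n", "a s\n", "x y z\n");
my @kept;

for my $line (@lines) {
    # Array assignment: no implicit LIMIT, so a trailing empty field
    # never appears and the element count reflects the real fields.
    my @nodes = split ' ', $line;
    last if @nodes < 3;

    my ($node1, $node2, $node3) = @nodes;
    push @kept, "$node1 $node2 $node3";
}

print "$_\n" for @kept;

The loop stops at the first line with fewer than three fields, so only the first line is kept here.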

Anno
 
Malcolm Dew-Jones

Dmitry Epstein ([email protected]) wrote:
: (e-mail address removed) (Malcolm Dew-Jones) wrote in

: Heh, I didn't say my example was the "default" case (split with no
: arguments, as I understand it), now did I? :) This is what the
: docs say:

: As a special case, specifying a PATTERN of space (' ') will split
: on white space just as split with no arguments does.

The _default_ behaviour to which I referred, and which I quote below, is
described in the first paragraph of the split documentation for the perl
I am using today (v5.6.1 built for MSWin32-x86-multi-thread):

=item split

... By default,
empty leading fields are preserved, and empty trailing ones are
deleted.

So, what will split do with the trailing empty fields?

: There is a bit more on that case in the docs, but nowhere does it
: say that specifying a limit, whether implicitly or explicitly,
: somehow abrogates the standard behaviour of split on ' '.

No, but by the same token, the docs for the special case of ' ' do not
indicate what happens to the trailing fields, only ever the leading
fields. Since no mention is made of the trailing fields, and since you do
not specify the number of fields, therefore I assume the indicated default
behaviour for trailing fields applies.
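
A rough sketch of that distinction, with a made-up sample string and variable names: the ' ' special case changes how leading whitespace is treated, while what happens to trailing empty fields depends on LIMIT for either pattern.

use strict;
use warnings;

# Hypothetical line with leading whitespace and a trailing newline.
my $line = "  a s\n";

# PATTERN ' ': leading whitespace is skipped, so no empty leading field.
my @awk = split ' ', $line;

# PATTERN /\s+/: leading whitespace produces an empty first field.
my @regex = split /\s+/, $line;

# A negative LIMIT keeps trailing empty fields with either pattern.
my @awk_all   = split ' ', $line, -1;
my @regex_all = split /\s+/, $line, -1;

printf "' ': %d, /\\s+/: %d, ' ' with -1: %d, /\\s+/ with -1: %d\n",
    scalar @awk, scalar @regex, scalar @awk_all, scalar @regex_all;

Without a limit, both patterns drop the trailing empty field; with a limit, both keep it. Only the leading field differs between the two patterns.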
 
