Assigning split to a list: undefined values?

Discussion in 'Perl Misc' started by Dmitry Epstein, Nov 22, 2003.

  1. Here is a code snippet:

    while ($line = <F>) {
        my ($node1, $node2, $node3) = split ' ', $line;
        last unless defined $node3;
        ...
    }

    The idea here was that the loop stops if the line was split into
    fewer than 3 values. It doesn't work however: when the line has
    only 2 values, the variable $node3 is initialized with zero instead
    of remaining undefined. How come?

    (I have since changed the code to use an array instead of a list.)

    --
    Dmitry Epstein
    Northwestern University, Evanston, IL. USA
    mitia(at)northwestern(dot)edu
     
    Dmitry Epstein, Nov 22, 2003
    #1

  2. ko Guest

    Dmitry Epstein wrote:
    > Here is a code snippet:
    >
    > while ($line = <F>) {
    > my ($node1, $node2, $node3) = split ' ', $line;
    > last unless defined $node3;
    > ...
    > }
    >
    > The idea here was that the loop stops if the line was split into
    > fewer than 3 values. It doesn't work however: when the line has
    > only 2 values, the variable $node3 is initialized with zero instead
    > of remaining undefined. How come?
    >
    > (I have since changed the code to use an array instead of a list.)
    >


    Since the pattern ' ' is split()ing on whitespace, the newline on each
    line results in an empty trailing field ('', which is a defined value).
    So you need to get rid of the newline on each line:

    use strict;
    use warnings;

    while (my $line = <DATA>) {
        chomp $line;
        my ($node1, $node2, $node3) = split ' ', $line;
        last unless defined $node3;
        print join(" ", $node1, $node2, $node3), "\n";
    }

    __DATA__
    a s d
    a s

    HTH- keith
     
    ko, Nov 22, 2003
    #2

  3. ko <> wrote in
    news:bpme0m$25q$:
    > Dmitry Epstein wrote:
    >> Here is a code snippet:
    >>
    >> while ($line = <F>) {
    >> my ($node1, $node2, $node3) = split ' ', $line;
    >> last unless defined $node3;
    >> ...
    >> }
    >>
    >> The idea here was that the loop stops if the line was split
    >> into fewer than 3 values. It doesn't work however: when the
    >> line has only 2 values, the variable $node3 is initialized
    >> with zero instead of remaining undefined. How come?
    >>
    >> (I have since changed the code to use an array instead of a
    >> list.)
    >>

    >
    > Since the pattern ' ' is split()ing on whitespace, the newline
    > on each line results in an empty trailing field ('', which is
    > a defined value). So you need to get rid of the newline on
    > each line:
    >
    > use strict;
    > use warnings;
    >
    > while (my $line = <DATA>) {
    > chomp $line;
    > my ($node1, $node2, $node3) = split ' ', $line;
    > last unless defined $node3;
    > print join(" ", $node1, $node2, $node3), "\n"
    > }
    >
    > __DATA__
    > a s d
    > a s
    >
    > HTH- keith


    I don't think so. Split on ' ' is supposed to "chomp" any trailing
    whitespace. In fact, if I use an array to store the value of split
    instead of a list of scalars, the array does not receive a null
    value in the end.


    --
    Dmitry Epstein
    Northwestern University, Evanston, IL. USA
    mitia(at)northwestern(dot)edu
     
    Dmitry Epstein, Nov 24, 2003
    #3
  4. Dmitry Epstein () wrote:
    : ko <> wrote in
    : news:bpme0m$25q$:
    : > Dmitry Epstein wrote:
    : >> Here is a code snippet:
    : >>
    : >> while ($line = <F>) {
    : >> my ($node1, $node2, $node3) = split ' ', $line;
    : >> last unless defined $node3;
    : >> ...
    : >> }
    : >>
    : >> The idea here was that the loop stops if the line was split
    : >> into fewer than 3 values. It doesn't work however: when the
    : >> line has only 2 values, the variable $node3 is initialized
    : >> with zero instead of remaining undefined. How come?
    : >>
    : >> (I have since changed the code to use an array instead of a
    : >> list.)
    : >>
    : >
    : > Since the pattern ' ' is split()ing on whitespace, the newline
    : > on each line results in an empty trailing field ('', which is
    : > a defined value). So you need to get rid of the newline on
    : > each line:
    : >
    : > use strict;
    : > use warnings;
    : >
    : > while (my $line = <DATA>) {
    : > chomp $line;
    : > my ($node1, $node2, $node3) = split ' ', $line;
    : > last unless defined $node3;
    : > print join(" ", $node1, $node2, $node3), "\n"
    : > }
    : >
    : > __DATA__
    : > a s d
    : > a s
    : >
    : > HTH- keith

    : I don't think so. Split on ' ' is supposed to "chomp" any trailing
    : whitespace. In fact, if I use an array to store the value of split
    : instead of a list of scalars, the array does not receive a null
    : value in the end.

    Not exactly. The "default" behaviour is to drop empty trailing fields,
    but what does "default" mean?

    perldoc -f split also states that assigning to a _list_ (which is what you
    have) creates an implicit limit on the number of fields to be extracted.

    So I am guessing that "default" is referring to the "limit", but since you
    have implicitly defined a limit then you don't get the default behaviour.

    Instead you get the first three fields, the third of which is the empty
    string "found" at the end.


    "first-item second-item \n"
    1111111111 22222222222 3 (3rd is 0 chars long)
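
A minimal sketch of the difference described above, assuming the behaviour documented in perldoc -f split. The sub names fields_via_array and fields_via_list are hypothetical, made up for this illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: split into an array, so no LIMIT is implied
# and trailing empty fields are stripped.
sub fields_via_array {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return @fields;
}

# Hypothetical helper: split into three scalars, so Perl supplies an
# implicit LIMIT of 4 and the trailing empty field is kept.
sub fields_via_list {
    my ($line) = @_;
    my ($n1, $n2, $n3) = split ' ', $line;
    return ($n1, $n2, $n3);
}

my $line = "first-item second-item \n";

my @from_array = fields_via_array($line);
my @from_list  = fields_via_list($line);

# Prints "array assignment: 2 fields"
print "array assignment: ", scalar(@from_array), " fields\n";

# Prints "third scalar: defined, length 0"
print "third scalar: ",
    (defined $from_list[2]
        ? "defined, length " . length($from_list[2])
        : "undef"),
    "\n";
```

The third scalar comes back as the empty string, not undef, which is why a defined test never fires on a two-value line that still carries its newline.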
     
    Malcolm Dew-Jones, Nov 25, 2003
    #4
  5. Brad Baxter Guest

    On Fri, 22 Nov 2003, Dmitry Epstein wrote:

    > Here is a code snippet:
    >
    > while ($line = <F>) {
    > my ($node1, $node2, $node3) = split ' ', $line;
    > last unless defined $node3;
    > ...
    > }
    >
    > The idea here was that the loop stops if the line was split into
    > fewer than 3 values. It doesn't work however: when the line has
    > only 2 values, the variable $node3 is initialized with zero instead
    > of remaining undefined. How come?


    I doubt that $node3 is initialized to zero, but it is initialized to '',
    apparently because of the "\n" at the end of $line.

    That isn't what I would have expected from reading this in perldoc -f
    split:

    If LIMIT is unspecified or zero, trailing
    null fields are stripped (which potential users of
    "pop" would do well to remember)

    but if you chomp $line, your logic works as it appears you intend...

    % cat ./qt
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    my $line = '';
    while ($line = <DATA>) {
        my ($node1, $node2, $node3) = split " ", $line;
        last unless defined $node3;
        #...
        print "-$node1- -$node2- -$node3- : $line";
    }

    __DATA__
    1 2 3
    1 2

    % ./qt
    -1- -2- -3- : 1 2 3
    -1- -2- -- : 1 2

    % cat ./qt
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    my $line = '';
    while ($line = <DATA>) {
        chomp $line;
        my ($node1, $node2, $node3) = split " ", $line;
        last unless defined $node3;
        #...
        print "-$node1- -$node2- -$node3- : $line\n";
    }

    __DATA__
    1 2 3
    1 2

    % ./qt
    -1- -2- -3- : 1 2 3


    Regards,

    Brad
     
    Brad Baxter, Nov 25, 2003
    #5
  6. ko Guest

    Malcolm Dew-Jones wrote:
    > Dmitry Epstein () wrote:


    [snip]

    > perldoc -f split also states that assigning to a _list_ (which is what you
    > have) creates an implicit limit on the number of fields to be extracted.
    >
    > So I am guessing that "default" is referring to the "limit", but since you
    > have implicitly defined a limit then you don't get the default behaviour.
    >


    Thanks for pointing out my error. Missed the most relevant part of the
    docs in relation to this problem :(

    keith
     
    ko, Nov 25, 2003
    #6
  7. Anno Siegel Guest

    Dmitry Epstein <> wrote in comp.lang.perl.misc:
    > ko <> wrote in
    > news:bpme0m$25q$:
    > > Dmitry Epstein wrote:
    > >> Here is a code snippet:
    > >>
    > >> while ($line = <F>) {
    > >> my ($node1, $node2, $node3) = split ' ', $line;
    > >> last unless defined $node3;
    > >> ...
    > >> }
    > >>
    > >> The idea here was that the loop stops if the line was split
    > >> into fewer than 3 values. It doesn't work however: when the
    > >> line has only 2 values, the variable $node3 is initialized
    > >> with zero instead of remaining undefined. How come?
    > >>
    > >> (I have since changed the code to use an array instead of a
    > >> list.)
    > >>

    > >
    > > Since the pattern ' ' is split()ing on whitespace, the newline
    > > on each line results in an empty trailing field ('', which is
    > > a defined value). So you need to get rid of the newline on
    > > each line:
    > >
    > > use strict;
    > > use warnings;
    > >
    > > while (my $line = <DATA>) {
    > > chomp $line;
    > > my ($node1, $node2, $node3) = split ' ', $line;
    > > last unless defined $node3;
    > > print join(" ", $node1, $node2, $node3), "\n"
    > > }
    > >
    > > __DATA__
    > > a s d
    > > a s
    > >
    > > HTH- keith

    >
    > I don't think so. Split on ' ' is supposed to "chomp" any trailing
    > whitespace. In fact, if I use an array to store the value of split
    > instead of a list of scalars, the array does not receive a null
    > value in the end.


    Quite so.

    Assigning to a list (with a given number of slots) is equivalent to
    supplying a limit parameter of one more than the number of slots to
    split(). With a limit parameter, trailing empty fields aren't dropped.

    Granted, split() is a little too clever for its own good, but it's all
    in the documentation.
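
To make the equivalence above concrete, here is a small sketch (an illustration under the documented behaviour, not part of the original code) comparing an explicit LIMIT of 4 with the implicit one that a three-scalar list assignment gets.

```perl
use strict;
use warnings;

my $line = "1 2\n";

# Explicit LIMIT of 4 into an array: the trailing empty field is kept.
my @with_limit = split ' ', $line, 4;

# No LIMIT into an array: the trailing empty field is stripped.
my @no_limit = split ' ', $line;

# Three scalars on the left: Perl supplies a LIMIT of 4 on its own.
my ($n1, $n2, $n3) = split ' ', $line;

printf "explicit LIMIT 4: %d fields\n", scalar @with_limit;   # 3
printf "no LIMIT:         %d fields\n", scalar @no_limit;     # 2
printf "list assignment:  third field is %s\n",
    defined $n3 ? "'$n3'" : "undef";                          # ''
```

The first and third cases agree: both end with an empty string that the no-LIMIT array form throws away.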

    Anno
     
    Anno Siegel, Nov 25, 2003
    #7
  8. Brad Baxter Guest

    On Mon, 24 Nov 2003, Brad Baxter wrote:
    > That isn't what I would have expected from reading this in perldoc -f
    > split:
    >
    > If LIMIT is unspecified or zero, trailing
    > null fields are stripped (which potential users of
    > "pop" would do well to remember)


    My apologies. I did also see this:

    When assigning to a list, if LIMIT is omitted, Perl
    supplies a LIMIT one larger than the number of
    variables in the list, to avoid unnecessary work.

    But when I tried out code like the following last night, I got it wrong
    and confused myself. Will try to improve. :)

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    my $line = "1 2\n";

    my ($node1, $node2, $node3) = split " ", $line;
    my @x = split " ", $line;

    print "$node1-$node2-$node3\n";
    print join('-', @x), "\n";

    1-2-
    1-2


    Regards,

    Brad
     
    Brad Baxter, Nov 25, 2003
    #8
  9. (Malcolm Dew-Jones) wrote in
    news::

    > Dmitry Epstein () wrote:
    >: ko <> wrote in
    >: news:bpme0m$25q$:
    >: > Dmitry Epstein wrote:
    >: >> Here is a code snippet:
    >: >>
    >: >> while ($line = <F>) {
    >: >> my ($node1, $node2, $node3) = split ' ', $line;
    >: >> last unless defined $node3;
    >: >> ...
    >: >> }
    >: >>
    >: >> The idea here was that the loop stops if the line was
    >: >> split into fewer than 3 values. It doesn't work however:
    >: >> when the line has only 2 values, the variable $node3 is
    >: >> initialized with zero instead of remaining undefined. How
    >: >> come?
    >: >>
    >: >> (I have since changed the code to use an array instead of
    >: >> a list.)
    >: >>
    >: >
    >: > Since the pattern ' ' is split()ing on whitespace, the
    >: > newline on each line results in an empty trailing field
    >: > ('', which is a defined value). So you need to get rid of
    >: > the newline on each line:
    >: >
    >: > use strict;
    >: > use warnings;
    >: >
    >: > while (my $line = <DATA>) {
    >: > chomp $line;
    >: > my ($node1, $node2, $node3) = split ' ', $line;
    >: > last unless defined $node3;
    >: > print join(" ", $node1, $node2, $node3), "\n"
    >: > }
    >: >
    >: > __DATA__
    >: > a s d
    >: > a s
    >: >
    >: > HTH- keith
    >
    >: I don't think so. Split on ' ' is supposed to "chomp" any
    >: trailing whitespace. In fact, if I use an array to store the
    >: value of split instead of a list of scalars, the array does
    >: not receive a null value in the end.
    >
    > Not exactly. The "default" behaviour is to drop empty
    > trailing fields, but what does "default" mean?
    >
    > perldoc -f split also states that assigning to a _list_ (which
    > is what you have) creates an implicit limit on the number of
    > fields to be extracted.
    >
    > So I am guessing that "default" is referring to the "limit",
    > but since you have implicitly defined a limit then you don't
    > get the default behaviour.
    >
    > Instead you get the first three fields, the third of which is
    > the empty string "found" at the end.
    >
    >
    > "first-item second-item \n"
    > 1111111111 22222222222 3 (3rd is 0 chars long)


    Heh, I didn't say my example was the "default" case (split with no
    arguments, as I understand it), now did I? :) This is what the
    docs say:

    As a special case, specifying a PATTERN of space (' ') will split
    on white space just as split with no arguments does.

    There is a bit more on that case in the docs, but nowhere does it
    say that specifying a limit, whether implicitly or explicitly,
    somehow abrogates the standard behaviour of split on ' '.

    Still confused.

    --
    Dmitry Epstein
    Northwestern University, Evanston, IL. USA
    mitia(at)northwestern(dot)edu
     
    Dmitry Epstein, Nov 26, 2003
    #9
  10. Anno Siegel Guest

    Dmitry Epstein <> wrote in comp.lang.perl.misc:
    > (Malcolm Dew-Jones) wrote in
    > news::
    > > Dmitry Epstein () wrote:
    > >: ko <> wrote in
    > >: news:bpme0m$25q$:


    > >: >> Here is a code snippet:
    > >: >>
    > >: >> while ($line = <F>) {
    > >: >> my ($node1, $node2, $node3) = split ' ', $line;
    > >: >> last unless defined $node3;
    > >: >> ...
    > >: >> }


    [from the split doc]

    > As a special case, specifying a PATTERN of space (' ') will split
    > on white space just as split with no arguments does.
    >
    > There is a bit more on that case in the docs, but nowhere does it
    > say that specifying a limit, whether implicitly or explicitly,
    > somehow abrogates the standard behaviour of split on ' '.


    Put these together (from the same document):

    If LIMIT is unspecified or
    zero, trailing null fields are stripped...

    When assigning to a list, if LIMIT is omitted, or
    zero, Perl supplies a LIMIT one larger than the
    number of variables in the list...

    It isn't explicit, but in combination they say that trailing null fields
    will *not* be stripped when assigning to a list.
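
Given that, one way to keep the original "stop on fewer than 3 values" logic is to split into an array first and count the fields, roughly what the original poster reports having switched to. This is only a sketch; the helper name three_nodes is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first three fields of a line, or an
# empty list when the line holds fewer than three fields.
sub three_nodes {
    my ($line) = @_;
    my @fields = split ' ', $line;   # array target, so no implicit LIMIT
    return if @fields < 3;
    return @fields[0 .. 2];
}

for my $line ("a s d\n", "a s\n") {
    # The list assignment evaluates to the number of values returned,
    # so an empty list ends the loop.
    my ($node1, $node2, $node3) = three_nodes($line) or last;
    print join(' ', $node1, $node2, $node3), "\n";   # prints "a s d" once
}
```

Chomping the line before the split, as shown earlier in the thread, gets the same result; counting fields just avoids depending on whether the trailing whitespace was removed.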

    Anno
     
    Anno Siegel, Nov 26, 2003
    #10
  11. Dmitry Epstein () wrote:
    : (Malcolm Dew-Jones) wrote in
    : news::

    : > Dmitry Epstein () wrote:
    : >: ko <> wrote in
    : >: news:bpme0m$25q$:
    : >: > Dmitry Epstein wrote:
    : >: >> Here is a code snippet:
    : >: >>
    : >: >> while ($line = <F>) {
    : >: >> my ($node1, $node2, $node3) = split ' ', $line;
    : >: >> last unless defined $node3;
    : >: >> ...
    : >: >> }
    : >: >>
    : >: >> The idea here was that the loop stops if the line was
    : >: >> split into fewer than 3 values. It doesn't work however:
    : >: >> when the line has only 2 values, the variable $node3 is
    : >: >> initialized with zero instead of remaining undefined. How
    : >: >> come?
    : >: >>
    : >: >> (I have since changed the code to use an array instead of
    : >: >> a list.)
    : >: >>
    : >: >
    : >: > Since the pattern ' ' is split()ing on whitespace, the
    : >: > newline on each line results in an empty trailing field
    : >: > ('', which is a defined value). So you need to get rid of
    : >: > the newline on each line:
    : >: >
    : >: > use strict;
    : >: > use warnings;
    : >: >
    : >: > while (my $line = <DATA>) {
    : >: > chomp $line;
    : >: > my ($node1, $node2, $node3) = split ' ', $line;
    : >: > last unless defined $node3;
    : >: > print join(" ", $node1, $node2, $node3), "\n"
    : >: > }
    : >: >
    : >: > __DATA__
    : >: > a s d
    : >: > a s
    : >: >
    : >: > HTH- keith
    : >
    : >: I don't think so. Split on ' ' is supposed to "chomp" any
    : >: trailing whitespace. In fact, if I use an array to store the
    : >: value of split instead of a list of scalars, the array does
    : >: not receive a null value in the end.
    : >
    : > Not exactly. The "default" behaviour is to drop empty
    : > trailing fields, but what does "default" mean?
    : >
    : > perldoc -f split also states that assigning to a _list_ (which
    : > is what you have) creates an implicit limit on the number of
    : > fields to be extracted.
    : >
    : > So I am guessing that "default" is referring to the "limit",
    : > but since you have implicitly defined a limit then you don't
    : > get the default behaviour.
    : >
    : > Instead you get the first three fields, the third of which is
    : > the empty string "found" at the end.
    : >
    : >
    : > "first-item second-item \n"
    : > 1111111111 22222222222 3 (3rd is 0 chars long)

    : Heh, I didn't say my example was the "default" case (split with no
    : arguments, as I understand it), now did I? :) This is what the
    : docs say:

    : As a special case, specifying a PATTERN of space (' ') will split
    : on white space just as split with no arguments does.

    The _Default_ behaviour to which I referred, and which I quote below, is in
    the first paragraph of the split documentation for the perl I am using
    today ( v5.6.1 built for MSWin32-x86-multi-thread )

    =item split

    ... By default,
    empty leading fields are preserved, and empty trailing ones are
    deleted.

    So, what will split do with the trailing empty fields?

    : There is a bit more on that case in the docs, but nowhere does it
    : say that specifying a limit, whether implicitly or explicitly,
    : somehow abrogates the standard behaviour of split on ' '.

    No, but by the same token, the docs for the special case of ' ' do not
    indicate what happens to the trailing fields, only ever the leading
    fields. Since no mention is made of the trailing fields, and since you do
    not specify the number of fields, therefore I assume the indicated default
    behaviour for trailing fields applies.
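
For the leading-field side of this, a short sketch (assuming the behaviour described in perldoc -f split) of how the special ' ' pattern differs from /\s+/ when neither has a LIMIT:

```perl
use strict;
use warnings;

my $line = "  a b  \n";

# Special-case ' ': leading whitespace is skipped before splitting.
my @awk_mode = split ' ', $line;

# Ordinary regex: the match at the start produces an empty leading field.
my @regex = split /\s+/, $line;

print scalar(@awk_mode), "\n";   # 2: ('a', 'b')
print scalar(@regex), "\n";      # 3: ('', 'a', 'b')
```

In both cases the trailing empty field is stripped, because no LIMIT was given and the result goes into an array.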
     
    Malcolm Dew-Jones, Nov 26, 2003
    #11