Help: separate words divided by different lengths of spaces

Discussion in 'Perl Misc' started by Lei, Nov 25, 2004.

  1. Lei



    I have a file "file.txt" which looks like this:

    aaaaaaa 11111 XXXX
    bbbbb 222222 YYYYYY

    The number of spaces between words is not fixed.

    I want to get $D = "first column word" and $E = "second column word",
    and discard the last word of the row.

    Then continue getting data from the second row.

    What would the syntax be?

    I am confused about how to use split with a pattern in this case. Thanks a lot.
    Lei, Nov 25, 2004

  2. Jim Keenan

    perldoc -f split

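    # rough code: untested
    my (@data);
    open(FH, '<', 'file.txt') or die "Cannot open file.txt: $!";
    while (<FH>) {
        my @temp = split(/\s+/, $_);
        push(@data, [@temp[0..1]]);    # keep only the first two columns
    }
    close(FH);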

    Jim Keenan
    Jim Keenan, Nov 25, 2004
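Here is a minimal sketch of how the array of arrays built by the loop above can be read back. It assumes the two sample lines from the question as input. To avoid needing a real file.txt, it reads them through an in-memory filehandle (`open` on a scalar reference). The variable `$text` is illustrative only.

```perl
use strict;
use warnings;

# Illustrative input: the two sample lines from the question.
my $text = "aaaaaaa 11111 XXXX\nbbbbb   222222    YYYYYY\n";

# An in-memory filehandle stands in for file.txt; swap \$text for 'file.txt'.
open(my $fh, '<', \$text) or die "Cannot open input: $!";

my @data;
while (my $line = <$fh>) {
    my @temp = split(/\s+/, $line);   # a run of spaces counts as one separator
    push(@data, [ @temp[0, 1] ]);     # store only columns 1 and 2
}
close($fh);

# Each element of @data is a reference to a two-element array.
for my $row (@data) {
    my ($D, $E) = @$row;
    print "D=$D E=$E\n";
}
```

Because each row is an array reference, `$data[1][0]` gives the first word of the second line.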

  3. John W. Krahn

    Assuming that the current row is in $_, then:

    my ( $D, $E ) = split;

    Which is short for:

    my ( $D, $E ) = split ' ', $_;

    John W. Krahn, Nov 25, 2004
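This sketch shows John's shorthand at work. It uses hypothetical sample lines, one with leading and uneven spaces, and calls `split` with no arguments inside a loop that sets `$_`. The default pattern is the special single-space string, so leading whitespace is skipped and any run of whitespace separates fields.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the second has leading and uneven spaces.
my @lines = ("aaaaaaa 11111 XXXX", "   bbbbb   222222    YYYYYY");

my @pairs;
for (@lines) {              # foreach aliases $_ to each line in turn
    my ($D, $E) = split;    # same as: split ' ', $_
    push @pairs, "$D|$E";
}
print "$_\n" for @pairs;    # aaaaaaa|11111 then bbbbb|222222
```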
  4. KKramsch

    Which is short for:

    my ( $D, $E ) = split /\s+/, $_;

    Both the forms that John gives are *special cases* of split.
    Splitting at whitespace is common enough that it warrants the
    special treatment.

    Also note that in all cases, the LHS of the assignment captures
    only the first two split items, and discards the rest. If instead
    of only three words per line you had 1000, but still only wanted
    the first 2, then it may be more efficient to write

    my ( $D, $E ) = split ' ', $_, 3;

    This would split the line into 3 pieces: the first 2 would be the
    desired words, and the last one would be the rest of the line (after
    the second stretch of whitespace). In contrast, the forms given
    earlier would have split the line into 1000 pieces, of which the
    last 998 would have been discarded.


    perldoc -f split

    KKramsch, Nov 25, 2004
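You can check the effect of an explicit LIMIT of 3 directly. This sketch uses a made-up five-word line and keeps the whole result list instead of assigning it to two variables. That makes the third piece, which holds the unsplit remainder, visible.

```perl
use strict;
use warnings;

my $line = "first second third fourth fifth";   # made-up input

# With LIMIT 3, split returns at most 3 fields; the last holds the remainder.
my @pieces = split ' ', $line, 3;
print scalar(@pieces), "\n";   # 3
print "$pieces[2]\n";          # third fourth fifth
```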

  5. Tad McClellan

    No, it isn't; it behaves differently when $_ =~ /^\s/ (that is, when
    the line begins with whitespace). See:

    perldoc -f split

    Wrong again: you don't need to write it that way to get that benefit.
    The documentation for split (perldoc -f split) says that Perl will do
    that for you *automatically* when split is used as John did above.

    No, they wouldn't have.

    Errr, right!
    Tad McClellan, Nov 25, 2004
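Tad's first point is that `split ' '` and `split /\s+/` are not interchangeable. The difference appears as soon as a line starts with whitespace. Here is a minimal sketch with an assumed indented input line.

```perl
use strict;
use warnings;

my $line = "   aaaaaaa   11111 XXXX";   # assumed input with leading spaces

my @awk_style   = split ' ',   $line;   # special case: leading whitespace ignored
my @regex_style = split /\s+/, $line;   # leading whitespace gives an empty first field

print join('|', @awk_style),   "\n";    # aaaaaaa|11111|XXXX
print join('|', @regex_style), "\n";    # |aaaaaaa|11111|XXXX
```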
  6. KKramsch

    So, Lei, here's another important lesson: to get the real scoop in
    CLPM (comp.lang.perl.misc), don't ask. Just post your best guess as
    fact, and someone who really knows will go ballistic and post the
    correct answer.

    KKramsch, Nov 25, 2004

  7. Tad McClellan

    You can count on that not happening for future posts of yours.
    Tad McClellan, Nov 25, 2004
  8. Ben Morrow


    Ben Morrow, Nov 25, 2004
  9. Michele Dondi

    Why not

    my @data;

    instead? That is, do (unnecessary) parens really add to readability?
    Why explicitly use something that will be used anyway? Why not

    my @temp = split /\s+/;

    instead? While we're at it, why not use

    my @temp = split;

    instead? It's not *exactly* the same, but it is what the OP wants.
    Why C<0..1> when it is just the same as C<0,1>?

    All in all, why not

    push @data, (split)[0,1] while <FH>;

    (i.e. no need for intermediate variables)?

    No offense intended,
    Michele Dondi, Nov 26, 2004
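There is one difference worth noticing. The one-liner pushes the two fields of every line into a single flat list, whereas the original loop pushed one array reference per line. The sketch below runs both forms, supplying input through in-memory filehandles (an assumption for self-containment). Wrapping the slice in `[ ]` is a small variant, not part of the post, that restores one row per line.

```perl
use strict;
use warnings;

my $text = "aaaaaaa 11111 XXXX\nbbbbb 222222 YYYYYY\n";   # sample input

# The one-liner as posted: fields from every line land in one flat list.
open(my $in1, '<', \$text) or die "Cannot open input: $!";
my @flat;
push @flat, (split)[0, 1] while <$in1>;
close($in1);

# Variant: [ ] keeps one two-element row per line, like the original loop.
open(my $in2, '<', \$text) or die "Cannot open input: $!";
my @rows;
push @rows, [ (split)[0, 1] ] while <$in2>;
close($in2);

print scalar(@flat), " flat elements\n";   # 4 flat elements
print scalar(@rows), " rows\n";            # 2 rows
```

Which form is preferable depends on whether later code needs to know where one line's fields end and the next line's begin.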
  10. Jim Keenan

    [snip further refinements]
    I was at work when responding to the OP and only had a few minutes to
    dash off a response -- didn't even have time to test it, which I
    almost always do before posting to this list.

    If I had had more time, I would have refined the response as you and
    subsequent posters did. But my first inclination was to give the OP
    something that spelled everything out.

    Jim Keenan
    Jim Keenan, Nov 26, 2004
