Help: splitting words separated by varying amounts of whitespace

Lei

Greetings

I have a file "file.txt" which looks like this:

aaaaaaa 11111 XXXX
bbbbb 222222 YYYYYY
.....


The number of spaces between words is not fixed.

I want to get $D = "first column word" and $E = "second column word"
and discard the last word of the row.

Then continue getting data from the second row.

What would the syntax be?

I am confused about how to use split and a pattern in this case. Thanks a lot
 
Jim Keenan

Lei said:
Greetings

I have a file "file.txt" which looks like this:

aaaaaaa 11111 XXXX
bbbbb 222222 YYYYYY
....


The number of spaces between words is not fixed.

I want to get $D = "first column word" and $E = "second column word"
and discard the last word of the row.

Then continue getting data from the second row.

What would the syntax be?

I am confused about how to use split and a pattern in this case. Thanks a lot

perldoc -f split


# rough code: untested
my (@data);
while (<FH>) {
    my @temp = split(/\s+/, $_);
    # keep only the first two fields of each row
    push(@data, [ @temp[0..1] ]);
}

Jim Keenan
 
John W. Krahn

Lei said:
I have a file "file.txt" which looks like this:

aaaaaaa 11111 XXXX
bbbbb 222222 YYYYYY
....


The number of spaces between words is not fixed.

I want to get $D = "first column word" and $E = "second column word"
and discard the last word of the row.

Then continue getting data from the second row.

What would the syntax be?

I am confused about how to use split and a pattern in this case. Thanks a lot

Assuming that the current row is in $_ then:

my ( $D, $E ) = split;

Which is short for:

my ( $D, $E ) = split ' ', $_;
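
As a minimal sketch of how this fits into a read loop: the sample string below stands in for file.txt (an assumption made so the example is self-contained), and @pairs is a hypothetical helper used only to show what was collected.

```perl
use strict;
use warnings;

# Sample data standing in for file.txt (an assumption for this sketch).
# An in-memory filehandle keeps the example self-contained.
my $sample = "aaaaaaa   11111  XXXX\nbbbbb 222222      YYYYYY\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";

my @pairs;    # hypothetical: records what each row produced
while (<$fh>) {
    # split with no arguments splits $_ on runs of whitespace;
    # only the first two fields are kept, the rest are discarded.
    my ($D, $E) = split;
    next unless defined $E;    # skip blank or short lines
    push @pairs, "$D,$E";
    print "D=$D E=$E\n";
}
close $fh;
```

To read the real file, the open line would presumably become open(my $fh, '<', 'file.txt') with the same error check.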




John
 
KKramsch

John W. Krahn said:
Assuming that the current row is in $_ then:

my ( $D, $E ) = split;

Which is short for:

my ( $D, $E ) = split ' ', $_;

Which is short for:

my ( $D, $E ) = split /\s+/, $_;

Both the forms that John gives are *special cases* of split.
Splitting at whitespace is common enough that it warrants the
special treatment.

Also note that in all cases, the LHS of the assignment captures
only the first two split items, and discards the rest. If instead
of only three words per line you had 1000, but still only wanted
the first 2, then it may be more efficient to write

my ( $D, $E ) = split ' ', $_, 3;

This would split the line into 3 pieces: the first 2 would be the
desired words, and the last one would be the rest of the line (after
the second stretch of whitespace). In contrast, the forms given
earlier would have split the line into 1000 pieces, of which the
last 998 would have been discarded.

Definitely,

perldoc -f split

Karl
 
Tad McClellan

KKramsch said:
Which is short for:

my ( $D, $E ) = split /\s+/, $_;


No it isn't, it behaves differently when $_ =~ /^\s/, see:

perldoc -f split
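
A small sketch of the difference, assuming a line that starts with whitespace (the variable names are made up for illustration):

```perl
use strict;
use warnings;

my $line = "  aaaaaaa   11111 XXXX";    # note the leading spaces

# The special ' ' pattern skips leading whitespace before splitting.
my @awk_style = split ' ', $line;

# A real regex match at the start of the string produces an empty first field.
my @regex = split /\s+/, $line;

print scalar(@awk_style), " fields, first is '$awk_style[0]'\n";
print scalar(@regex),     " fields, first is '$regex[0]'\n";
```

With /\s+/, $D would end up as the empty string and $E as the first word, which is not what the OP wants.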

If instead
of only three words per line you had 1000, but still only wanted
the first 2, then it may be more efficient to write

my ( $D, $E ) = split ' ', $_, 3;


Wrong again, you don't need to write it that way to get the
efficiency.

perldoc -f split

says that Perl will do that for you *automatically* when used
like John did above.
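
For reference, the split documentation states that when assigning to a list with the limit omitted, the limit is treated as one larger than the number of variables in the list. The sketch below only shows what an explicit limit of 3 returns next to the plain two-variable form; it does not measure the optimization itself, and the sample line is made up.

```perl
use strict;
use warnings;

my $line = "aaaaaaa   11111  XXXX  more  words here";

# Explicit limit of 3: the third piece holds the unsplit remainder.
my @pieces = split ' ', $line, 3;

# No explicit limit, assigning to a list of two scalars.
my ($D, $E) = split ' ', $line;

print "third piece: '$pieces[2]'\n";
print "D=$D E=$E\n";
```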

In contrast, the forms given
earlier would have split the line into 1000 pieces,


No they wouldn't have.

perldoc -f split


Errr, right!
 
KKramsch

Tad McClellan said:
No it isn't, it behaves differently when $_ =~ /^\s/, see:
perldoc -f split


Wrong again, you don't need to write it that way to get the
efficiency.

So, Lei, here's another important lesson: to get the real scoop in
CLPM, don't ask. Just post your best guess as fact, and someone
who really knows will go ballistic and post the correct answer.

Karl
 
Tad McClellan

So, Lei, here's another important lesson: to get the real scoop in
CLPM, don't ask. Just post your best guess as fact, and someone
who really knows will go ballistic and post the correct answer.


You can count on that not happening for future posts of yours.
 
Ben Morrow

KKramsch said:
So, Lei, here's another important lesson: to get the real scoop in
CLPM, don't ask. Just post your best guess as fact, and someone
who really knows will go ballistic and post the correct answer.

*plonk*

Ben
 
Michele Dondi

# rough code: untested
my (@data);

Why not

my @data;

instead? I.e., do (unnecessary) parens really add to readability?

while (<FH>) {
my @temp = split(/\s+/, $_);

Why explicitly pass something that will be used anyway? Why not
write

my @temp = split /\s+/;

instead? While we're at it, why not use

my @temp = split;

instead? It's not *exactly* the same, but it is what the OP wants
anyway...

push(@data, [ @temp[0..1] ]);

Why C<0..1> when it is just the same as C<0,1>?

All in all, why not

push @data, [ (split)[0,1] ] while <FH>;

(i.e. no need for intermediate variables)?
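
One thing worth checking with the one-line form is whether the slice is wrapped in an anonymous array. A rough sketch, assuming in-memory sample data in place of FH (the names @flat and @rows are made up here):

```perl
use strict;
use warnings;

# Sample data standing in for the file (an assumption for this sketch).
my $sample = "aaaaaaa   11111  XXXX\nbbbbb 222222      YYYYYY\n";

my (@flat, @rows);

open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
# Without brackets, each line pushes two separate scalars.
push @flat, (split)[0,1] while <$fh>;
close $fh;

open($fh, '<', \$sample) or die "Cannot open in-memory file: $!";
# With brackets, each line pushes one array reference (one row).
push @rows, [ (split)[0,1] ] while <$fh>;
close $fh;

print scalar(@flat), " scalars, ", scalar(@rows), " rows\n";
print "second row: $rows[1][0] $rows[1][1]\n";
```

The bracketed form keeps one entry per row, matching the structure the longer loop builds.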


No offense intended,
Michele
 
Jim Keenan

Michele Dondi said:
Why not

my @data;

[snip further refinements]

I was at work when responding to the OP and only had a few minutes to
dash off a response -- didn't even have time to test it, which I
almost always do before posting to this list.

If I had had more time, I would have refined the response as you and
subsequent posters did. But my first inclination was to give the OP
something that spelled everything out.

Jim Keenan
 
