Help: separating words divided by different lengths of spaces

Lei · Nov 24, 2004

Greetings

I have a file "file.txt" which looks like this:

aaaaaaa 11111 XXXX
bbbbb 222222 YYYYYY
.....

The number of spaces between words is not fixed.

I want to get $D = "first column word" and $E = "second column word",
and discard the last word of the row.

Then continue getting data from the second row.

What would the syntax be?

I am confused about how to use split and a pattern in this case. Thanks a lot.

Jim Keenan · Nov 24, 2004

perldoc -f split

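# rough code: untested
open(FH, '<', 'file.txt') or die "Cannot open file.txt: $!";
my (@data);
while (<FH>) {
    my @temp = split(/\s+/, $_);
    push(@data, [ @temp[0..1] ]);   # keep only the first two columns
}
close(FH);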

Jim Keenan

John W. Krahn · Nov 25, 2004

Assuming that the current row is in $_ then:

my ( $D, $E ) = split;

Which is short for:

my ( $D, $E ) = split ' ', $_;
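A minimal, runnable sketch of John's approach applied to every row, assuming the two sample rows from the question as input. To keep it self-contained it reads from an in-memory string ($sample and @rows are hypothetical names) instead of file.txt; for the real file, open 'file.txt' instead of \$sample.

```perl
use strict;
use warnings;

# Assumed sample input standing in for file.txt (rows from the question,
# with uneven spacing). Replace \$sample with 'file.txt' to read the real file.
my $sample = "aaaaaaa   11111  XXXX\nbbbbb 222222     YYYYYY\n";
open(my $fh, '<', \$sample) or die "Cannot open input: $!";

my @rows;
while (<$fh>) {
    # Bare split splits $_ on runs of whitespace and ignores leading
    # whitespace; the third word is simply not captured.
    my ( $D, $E ) = split;
    push @rows, [ $D, $E ];
    print "D=$D E=$E\n";
}
close($fh);
```

Each row's first two words end up in $D and $E; the last word is discarded without any extra code.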

John

KKramsch · Nov 25, 2004

John W. Krahn said:
Assuming that the current row is in $_ then:

my ( $D, $E ) = split;

Which is short for:

my ( $D, $E ) = split ' ', $_;

Which is short for:

my ( $D, $E ) = split /\s+/, $_;

Both the forms that John gives are *special cases* of split.
Splitting at whitespace is common enough that it warrants the
special treatment.

Also note that in all cases, the LHS of the assignment captures
only the first two split items, and discards the rest. If instead
of only three words per line you had 1000, but still only wanted
the first 2, then it may be more efficient to write

my ( $D, $E ) = split ' ', $_, 3;

This would split the line into 3 pieces: the first 2 would be the
desired words, and the last one would be the rest of the line (after
the second stretch of whitespace). In contrast, the forms given
earlier would have split the line into 1000 pieces, of which the
last 998 would have been discarded.
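What a LIMIT of 3 returns can be seen in a small sketch, assuming a hypothetical line with more than three words: the third element holds the rest of the line, internal spacing included.

```perl
use strict;
use warnings;

# Hypothetical line with more than three words.
my $line = "aaaaaaa   11111  XXXX  more words here";

# With LIMIT 3, split returns at most 3 fields: the first two words,
# then everything after the second run of whitespace, unsplit.
my @pieces = split ' ', $line, 3;

print scalar(@pieces), " pieces\n";
print "[$_]\n" for @pieces;
```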

Definitely,

perldoc -f split

Karl

Tad McClellan · Nov 25, 2004

KKramsch said:
Which is short for:

my ( $D, $E ) = split /\s+/, $_;

No, it isn't; it behaves differently when $_ =~ /^\s/. See:

perldoc -f split
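The difference Tad points out shows up on a row that starts with whitespace. A minimal sketch, assuming such a hypothetical row: split ' ' skips the leading whitespace, while split /\s+/ produces an empty first field, which would shift the words into the wrong variables.

```perl
use strict;
use warnings;

# Hypothetical row with leading whitespace.
my $row = "   aaaaaaa 11111 XXXX";

# The special ' ' pattern skips leading whitespace:
my @awk_style = split ' ', $row;     # ('aaaaaaa', '11111', 'XXXX')

# A /\s+/ regex matches at the start, giving an empty leading field:
my @regex_style = split /\s+/, $row; # ('', 'aaaaaaa', '11111', 'XXXX')

my ( $D1, $E1 ) = @awk_style;
my ( $D2, $E2 ) = @regex_style;
print "split ' ':   D=[$D1] E=[$E1]\n";
print "split /\\s+/: D=[$D2] E=[$E2]\n";
```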

If instead
of only three words per line you had 1000, but still only wanted
the first 2, then it may be more efficient to write
my ( $D, $E ) = split ' ', $_, 3;

Wrong again, you don't need to write it that way to get the
efficiency.

perldoc -f split

says that Perl will do that for you *automatically* when used
like John did above.
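A small sketch of the equivalence, assuming an arbitrary ten-word line: the two-variable list assignment with no LIMIT yields the same values as the explicit LIMIT of 3. This only compares the results; the implicit LIMIT itself (one more than the number of variables in the list) is what perldoc -f split describes.

```perl
use strict;
use warnings;

# Arbitrary line with many words (an assumption for illustration).
my $line = "w1 w2 w3 w4 w5 w6 w7 w8 w9 w10";

# No explicit LIMIT: assigning to a list of two scalars, Perl supplies
# a LIMIT one larger than the number of variables (per perldoc -f split).
my ( $d1, $e1 ) = split ' ', $line;

# Explicit LIMIT of 3, as suggested earlier in the thread.
my ( $d2, $e2 ) = split ' ', $line, 3;

print "implicit: $d1 $e1\nexplicit: $d2 $e2\n";
```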

In contrast, the forms given
earlier would have split the line into 1000 pieces,

No, they wouldn't have.

perldoc -f split

Errr, right!

KKramsch · Nov 25, 2004

Tad McClellan said:
No it isn't, it behaves differently when $_ =~ /^\s/, see:

perldoc -f split

Wrong again, you don't need to write it that way to get the
efficiency.

So, Lei, here's another important lesson: to get the real scoop in
CLPM, don't ask. Just post your best guess as fact, and someone
who really knows will go ballistic and post the correct answer.

Karl

Tad McClellan · Nov 25, 2004

KKramsch said:
So, Lei, here's another important lesson: to get the real scoop in
CLPM, don't ask. Just post your best guess as fact, and someone
who really knows will go ballistic and post the correct answer.

You can count on that not happening for future posts of yours.

Ben Morrow · Nov 25, 2004

KKramsch said:
So, Lei, here's another important lesson: to get the real scoop in
CLPM, don't ask. Just post your best guess as fact, and someone
who really knows will go ballistic and post the correct answer.

*plonk*

Ben

Michele Dondi · Nov 26, 2004

Jim Keenan said:
# rough code: untested
my (@data);

Why not

my @data;

instead? i.e. do (unnecessary) parens really add to readability?

while (<FH>) {
my @temp = split(/\s+/, $_);

Why explicitly use something that will be used anyway? Why not
write

my @temp = split /\s+/;

instead? While we're at it, why not use

my @temp = split;

instead? It's not *exactly* the same (it ignores leading whitespace), but it
is what the OP wants, anyway...

push(@data, [@temp[0..1]]);

Why C<0..1> when it is just the same as C<0,1>?

All in all, why not

push @data, (split)[0,1] while <FH>;

(i.e. no need for intermediate variables)?
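One difference worth noting: the one-liner pushes the two words as separate elements, so @data ends up as a flat list rather than the array of pairs that Jim's [ ... ] version builds. A minimal sketch of both, assuming the sample rows from the question in an in-memory string ($sample, @flat and @pairs are hypothetical names):

```perl
use strict;
use warnings;

# Assumed sample input standing in for file.txt.
my $sample = "aaaaaaa 11111 XXXX\nbbbbb 222222 YYYYYY\n";

# Flat version, as in the one-liner: two scalars pushed per line.
open(my $fh, '<', \$sample) or die "Cannot open input: $!";
my @flat;
push @flat, (split)[0,1] while <$fh>;
close($fh);

# Nested version: wrapping the slice in [ ] keeps one pair per row.
open($fh, '<', \$sample) or die "Cannot open input: $!";
my @pairs;
push @pairs, [ (split)[0,1] ] while <$fh>;
close($fh);

print "flat:  @flat\n";
print "pairs: ", join(', ', map { "[$_->[0] $_->[1]]" } @pairs), "\n";
```

Which shape is better depends on how @data is used later; the nested form keeps the row boundaries explicit.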

No offense intended,
Michele

Jim Keenan · Nov 26, 2004

Michele Dondi said:
Why not

my @data;

[snip further refinements]
I was at work when responding to the OP and only had a few minutes to
dash off a response; I didn't even have time to test it, which I
almost always do before posting to this list.

If I had had more time, I would have refined the response as you and
subsequent posters did. But my first inclination was to give the OP
something that spelled everything out.

Jim Keenan

Similar Threads

Help with finding difference between two bodies of text in order	0	Sep 10, 2024
Need help in comparing the string words in two arrays.	5	Apr 29, 2006
PyWart: PEP8: a seething cauldron of inconsistencies.	1	Jul 28, 2011
PyWart: PEP8: A cauldron of inconsistencies.	7	Jul 27, 2011
The devolution of English language and slothful c.l.p behaviors exposed!	50	Jan 24, 2012
CSV dB script help	9	Jun 2, 2004
[SUMMARY] Word Search Generator (#159)	3	Apr 18, 2008
Finding the value of "TOP" from prior block-containers used	2	Nov 11, 2005
