read and parse a single line file

Rainer Weikusat · Apr 1, 2014

Right now, I'm dealing with (two) single line files whose single line
contains data in the form of

YYYYMMDD XXXX

the first being a date and the second a counter. So far, I've been using
a pretty conventional

$rc = <$fh>;
chomp($rc);
($date, $counter) = split(/\s+/, $rc);

for getting the data out of the file. While working with this code in
order to add some features to it, it came to me that

($date, $counter) = split for <$fh>

works as well, as does

($date, $counter) = map { split } <$fh>;

but I like the first one better. Comments or alternate suggestions?

Jim Gibson · Apr 2, 2014

Rainer said:
Right now, I'm dealing with (two) single line files whose single line
contains data in the form of

YYYYMMDD XXXX

the first being a date and the second a counter. So far, I've been using
a pretty conventional

$rc = <$fh>;
chomp($rc);
($date, $counter) = split(/\s+/, $rc);

for getting the data out of the file. While working with this code in
order to add some features to it, it came to me that

($date, $counter) = split for <$fh>

works as well, as does

($date, $counter) = map { split } <$fh>;

but I like the first one better. Comments or alternate suggestions?

<> in scalar context will read one line, whereas <> in list context
will read the entire file. Since your files only have one line, it
doesn't matter. But what if in the future a blank line gets added to
the end of the file? I would prefer that my code still worked, so I
would prefer a solution that keeps <> in scalar context and only reads
the first line.

What about this:

($date, $counter) = split(' ',<$fh>);

That has <> in scalar context and also takes advantage of the
skip-any-null-fields-at-the-beginning feature of split with a single
space first argument, which is the default for split with no arguments
as in your second and third solutions.
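
To make the context difference concrete, here is a minimal sketch. It assumes a made-up file content (one data line followed by a stray blank line) and reads it through an in-memory filehandle; the sub names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical file content: one data line plus a trailing blank line.
my $content = "20140401 42\n\n";

sub parse_scalar {
    open my $fh, '<', \$content or die "Can't open in-memory file: $!";
    # split evaluates its second argument in scalar context,
    # so <$fh> returns only the first line.
    my ($date, $counter) = split ' ', <$fh>;
    return ($date, $counter);
}

sub parse_list {
    open my $fh, '<', \$content or die "Can't open in-memory file: $!";
    my ($date, $counter);
    # The for modifier supplies list context: every line is read and
    # the assignment runs once per line, so the last line wins.
    ($date, $counter) = split for <$fh>;
    return ($date, $counter);
}

my @scalar = parse_scalar();
my @list   = parse_list();

print "scalar context: @scalar\n";
print "list context:   ",
    join(' ', map { defined($_) ? $_ : 'undef' } @list), "\n";
```

With the blank line present, the list-context variant ends up assigning the (empty) result of splitting the last line, so both variables come back undefined.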

gamo · Apr 2, 2014

On 02/04/14 01:27, Jim Gibson wrote:
This is a clear solution.

This is not clear, and does a for for one element (one string).
And where is the chomp?

Rainer Weikusat · Apr 2, 2014

gamo said:
On 02/04/14 01:27, Jim Gibson wrote:

This is a clear solution.

It's a seriously verbose solution. In particular, I'd like to get rid of
the helper variable.

This is not clear, and does a for for one element (one string).
And where is the chomp?

Can you provide a definition of 'clear' which is not "different from
what I'm used to"? The 'foreach' for aliases all elements of the list to
$_ in turn and then executes whatever the 'loop body' happens to be, in
this case, the statement annotated with the for statement
modifier. Using for in this way is actually a Perl-idiom because it is
one of the 'traditional' ways to emulate a switch-style multi-way
conditional, eg (untested)

for ($text) {
/supersonic/ && do {
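
A minimal sketch of the switch-style use of for described above. The second pattern and the labels 'fast', 'slow' and 'unknown' are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical classifier built on the for-as-switch idiom.
sub classify {
    my ($text) = @_;
    my $kind = 'unknown';
    # for aliases $text to $_, so the bare pattern matches test it;
    # last leaves the loop as soon as one branch has run.
    for ($text) {
        /supersonic/ && do { $kind = 'fast'; last; };
        /subsonic/   && do { $kind = 'slow'; last; };
    }
    return $kind;
}

print classify('a supersonic jet'), "\n";
print classify('a subsonic glider'), "\n";
print classify('a balloon'), "\n";
```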

Rainer Weikusat · Apr 2, 2014

Jim Gibson said:
Rainer said:

Right now, I'm dealing with (two) single line files whose single line
contains data in the form of

YYYYMMDD XXXX

the first being a date and the second a counter.

[...]

While working with this code in
order to add some features to it, it came to me that

($date, $counter) = split for <$fh>

works

[...]

<> in scalar context will read one line, whereas <> in list context
will read the entire file. Since your files only have one line, it
doesn't matter. But what if in the future a blank line gets added to
the end of the file? I would prefer that my code still worked, so I
would prefer a solution that keeps <> in scalar context and only reads
the first line.

The first thing I noticed about that is that I now need to truncate the
file before updating it to prevent a trailing blank line from appearing
in case the counter wraps from a two-digit to a one-digit number when
the date changes ;-).

What about this:

($date, $counter) = split(' ',<$fh>);

See also "cannot see the forest because of all the trees". All the
one-line variants have one common problem, though: They're
debugging-unfriendly because it is not easily possible to inspect the
data read from the file before processing it. Presently, I'm thinking
about either using a helper variable nevertheless or something like

for (<$fh>) {
($date, $counter) = split;
}

possibly with the additional requirement that the counter will become a
fixed-width field.

John Bokma · Apr 2, 2014

Rainer Weikusat said:
for (<$fh>) {
($date, $counter) = split;
}

Seen out of context, this code gives me the impression that the author
wants the 2 values from the last line, which is correct, since there is
only one. If I were to use this, I would probably add:

# There is only one line; get the 2 values on this line.

I probably would write it like this:

chomp ( my $line = <$fh> );
my ( $date, $counter ) = split ' ', $line;

As for the fixed field, I probably would use

truncate( $fh, 0 ) or die "Can't truncate '$filename': $!";
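
A rough sketch of the read, truncate and rewrite cycle discussed above. It assumes the file is opened read/write and rewritten in place; the temporary file and the sub name update_state are hypothetical. Note that truncate takes a length as its second argument.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical state file, created in a temporary location for the demo.
my ($tmp, $filename) = tempfile(UNLINK => 1);
print {$tmp} "20140401 10\n";
close $tmp or die "Can't close '$filename': $!";

sub update_state {
    my ($file, $new_date, $new_counter) = @_;
    open my $fh, '+<', $file or die "Can't open '$file': $!";
    my ($old_date, $old_counter) = split ' ', <$fh>;
    # Rewind and cut the file to zero length so that a shorter new
    # line cannot leave stale bytes from the old one behind.
    seek($fh, 0, 0)  or die "Can't rewind '$file': $!";
    truncate($fh, 0) or die "Can't truncate '$file': $!";
    print {$fh} "$new_date $new_counter\n";
    close $fh or die "Can't close '$file': $!";
    return ($old_date, $old_counter);
}

my @old = update_state($filename, '20140402', 1);

open my $check, '<', $filename or die "Can't open '$filename': $!";
my @lines = <$check>;
close $check;

print "old values: @old\n";
print "lines in file now: ", scalar(@lines), "\n";
```

Without the truncate call, rewriting "20140401 10" as "20140402 1" would leave the old final newline in place, which is the trailing blank line mentioned earlier in the thread.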

Rainer Weikusat · Apr 2, 2014

John Bokma said:
Seen out of context, this code gives me the impression that the author
wants the 2 values from the last line, which is correct, since there is
only one. If I were to use this, I would probably add:

# There is only one line; get the 2 values on this line.

I probably would write it like this:

chomp ( my $line = <$fh> );
my ( $date, $counter ) = split ' ', $line;

After flirting with

local $_ = <$fh>;
($date, $counter) = split;

I've meanwhile settled on

$rc = <$fh>;
($date, $counter) = split(' ', $rc);

as the 'least byzantine way to express what I want' which has at least a
'simplified split' (' ' instead of /\s+/) and does away with the
redundant chomp.
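
A small sketch of why the chomp is redundant and how ' ' differs from /\s+/. The sample line, with deliberately added leading whitespace, is made up.

```perl
use strict;
use warnings;

# Hypothetical input with leading whitespace and a trailing newline.
my $line = "  20140401 42\n";

# ' ' skips leading whitespace; the trailing newline counts as
# whitespace too, so no chomp is needed.
my @awk_style = split ' ', $line;

# /\s+/ treats the leading whitespace as a separator and therefore
# yields an empty first field.
my @regex_style = split /\s+/, $line;

print scalar(@awk_style), " fields with ' '\n";
print scalar(@regex_style), " fields with /\\s+/\n";
```

For a line without leading whitespace, both forms return the same two fields.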

The third programming language I learnt (after Apple Basic and 65C02
machine language[*]) was Pascal which is strictly 'declare everything
before use' and forces declarations of similar things to occur in
blocks, eg, 'all constants, all types, all variables'. I've mostly kept
this as a habit and in particular, I start every subroutine with
declarations of all 'local' (as in 'my', not as in 'local') variables. I
consider declarations distributed all throughout the code extremely
messy, not only because the mixing of 'different things' (declarations
and statements) but also because this tends to hide the real complexity
of the subroutine in question: If all variables are declared at the top,
subroutines ripe for segmentation can be identified by this list
becoming 'lengthy and messy', ie, containing lots of variables and
'strange naming conventions' in order to avoid name clashes.

[*] As a friendly reminder, a home computer looks like this:

http://upload.wikimedia.org/wikiped..._monitor.jpg/600px-Apple_IIc_with_monitor.jpg

and not like this

http://upload.wikimedia.org/wikipedia/commons/thumb/5/5e/Toes.jpg/800px-Toes.jpg

even if you have 64 of them (in German, C is pronounced like Zeh which
means toe).

John Bokma · Apr 2, 2014

The third programming language I learnt (after Apple Basic and 65C02
machine language[*]) was Pascal which is strictly 'declare everything

If you don't count COMAL, same here. At least that's what I recall. And
replace Apple with Sinclair and 65C02 with Z80 ;-)

before use' and forces declarations of similar things to occur in
blocks, eg, 'all constants, all types, all variables'. I've mostly kept
this as a habit and in particular, I start every subroutine with
declarations of all 'local' (as in 'my', not as in 'local') variables. I
consider declarations distributed all throughout the code extremely
messy, not only because the mixing of 'different things' (declarations
and statements) but also because this tends to hide the real complexity
of the subroutine in question: If all variables are declared at the top,
subroutines ripe for segmentation can be identified by this list
becoming 'lengthy and messy', ie, containing lots of variables and
'strange naming conventions' in order to avoid name clashes.

I split a sub if:

- it makes it more readable as in I can move lines of code to a
separate sub and replace this with a call that makes the code more
easy to read.
- it has too many lines (more than 60 or so) and it makes sense to
split it.

And I do prefer to put each 'my' close to first use (makes factoring out
easier). But that probably also has a lot to do with my liking early
returns, etc. And a bunch of 'my's followed by a '... or return' (or
'return if ...') looks weird to me.

even if you have 64 of them (in German, C is pronounced like Zeh which
means toe).

Ah, didn't know that even though being Dutch and having had one year of
German at school, and having read quite some (well written) German
computer magazines back in the day.

Peter J. Holzer · Apr 2, 2014

It's a seriously verbose solution.

The chomp is unnecessary, as you already noticed.

In particular, I'd like to get rid of the helper variable.

Why not:
($date, $counter) = split(/\s+/, <$fh>);
?
hp

George Mpouras · Apr 2, 2014

<$fh> =~/^(?<date>\w+)\s+(?<counter>\w+)/;
print "*$+{date}* *$+{counter}*\n";

or

read $fh, my $date, 8;
seek $fh, 1,1;
read $fh, my $count, 4;

Rainer Weikusat · Apr 3, 2014

George Mpouras said:
<$fh> =~/^(?<date>\w+)\s+(?<counter>\w+)/;
print "*$+{date}* *$+{counter}*\n";

Slightly modified variant:

($date, $counter) = <$fh> =~ /(\d+)\s+(\d+)/;

Another we didn't have so far:

($date, $counter) = unpack('A8xA', <$fh>);

George Mpouras said:
read $fh, my $date, 8;
seek $fh, 1,1;
read $fh, my $count, 4;

This won't work because the count isn't a fixed-width field. Using

read($fh, $date, 8);
$counter = <$fh> + 0;

would, though.
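
A self-contained sketch of this read-plus-numeric-conversion variant, using an in-memory filehandle and a made-up one-digit counter to show that the counter width does not matter:

```perl
use strict;
use warnings;

# Hypothetical file content with a one-digit counter.
my $content = "20140401 7\n";
open my $fh, '<', \$content or die "Can't open in-memory file: $!";

# Read exactly the 8 date characters.
my $got = read($fh, my $date, 8);
die "short read" unless defined $got && $got == 8;

# The rest of the line is " 7\n"; adding 0 converts it to a number,
# ignoring the surrounding whitespace.
my $counter = <$fh> + 0;

print "*$date* *$counter*\n";
```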

Rainer Weikusat · Apr 3, 2014

[...]

Another we didn't have so far:

($date, $counter) = unpack('A8xA', <$fh>);

This doesn't work either, as it only uses the first character of the
counter.

($date, $counter) = unpack('A9A*', <$fh>);
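
The two templates can be compared side by side in a short sketch (the sample line is made up):

```perl
use strict;
use warnings;

# Hypothetical line as it would come from <$fh>.
my $line = "20140401 123\n";

# A8 takes the date, x skips one byte, and a bare A means A1:
# only the first character of the counter is returned.
my @short = unpack 'A8xA', $line;

# A9 takes the date plus the separating space and strips the trailing
# space; A* takes the rest and strips the trailing newline.
my @full = unpack 'A9A*', $line;

print "A8xA: *$short[0]* *$short[1]*\n";
print "A9A*: *$full[0]* *$full[1]*\n";
```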

George Mpouras · Apr 3, 2014

On 3/4/2014 18:07, Rainer Weikusat wrote:

($date, $counter) = unpack('A9A*', <$fh>);

my @array = unpack "A9 A*", <$fh>;

Rainer Weikusat · Apr 3, 2014

George Mpouras said:
On 3/4/2014 18:07, Rainer Weikusat wrote:

my @array = unpack "A9 A*", <$fh>;

What is this now supposed to communicate?

George Mpouras · Apr 3, 2014

Rainer Weikusat said:
What is this now supposed to communicate?

#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', 'file.txt' or die "Can't open 'file.txt': $!";
@{$_}{qw/date x count/} = unpack "A8ZA*", <$fh>;
print "*$_->{date}*";
print "*$_->{count}*";

George Mpouras · Apr 3, 2014

On 3/4/2014 20:47, Rainer Weikusat wrote:

What is this now supposed to communicate?

# substr is considered faster than regexs

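open my $fh, '<', 'file.txt' or die "Can't open 'file.txt': $!";
$_ = <$fh>;
chomp;                           # keep the newline out of $count
my $date = substr $_, 0, 8, '';  # 4-arg substr also removes the date from $_
my $count = substr $_, 1;        # skip the separating space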

print "*$date* *$count*\n";

Rainer Weikusat · Apr 3, 2014

George Mpouras said:
On 3/4/2014 20:47, Rainer Weikusat wrote:

# substr is considered faster than regexs

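open my $fh, '<', 'file.txt' or die "Can't open 'file.txt': $!";
$_ = <$fh>;
chomp;                           # keep the newline out of $count
my $date = substr $_, 0, 8, '';  # 4-arg substr also removes the date from $_
my $count = substr $_, 1;        # skip the separating space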

print "*$date* *$count*\n";

$date=substr($_,0,-length($count=substr($_,rindex($_,' ')+1,-1)))for<$fh>

?

Rainer Weikusat · Apr 3, 2014

Rainer Weikusat said:
$date=substr($_,0,-length($count=substr($_,rindex($_,' ')+1,-1)))for<$fh>

ts, ts, ts ... hasty postings bad ...

$date=substr($_,0,-(length($count=substr($_,rindex($_,' ')+1,-1))+2))for<$fh>

George Mpouras · Apr 3, 2014

$date=substr($_,0,-length($count=substr($_,rindex($_,' ')+1,-1)))for<$fh>

nice, but something goes wrong.
for file content "YYYYMMDD 123"
I got

*YYYYMMDD 1* *12*

George Mpouras · Apr 3, 2014

the last character is missing

*YYYYMMDD* *12*
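
The missing character is consistent with a test file that has no trailing newline: the -1 in the inner substr then cuts off the last digit instead of the newline. A possible variant that copes with both cases, sketched with a hypothetical helper named parse_substr, uses chomp, which removes a newline only if one is present:

```perl
use strict;
use warnings;

# Hypothetical helper: split a "DATE COUNTER" line at the last space.
sub parse_substr {
    my ($line) = @_;
    chomp $line;                       # no-op when there is no newline
    my $sep = rindex $line, ' ';
    die "no separator in '$line'" if $sep < 0;
    return (substr($line, 0, $sep), substr($line, $sep + 1));
}

for my $input ("YYYYMMDD 123", "YYYYMMDD 123\n") {
    my ($date, $count) = parse_substr($input);
    print "*$date* *$count*\n";
}
```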
