read and parse a single line file

  • Thread starter Rainer Weikusat

Rainer Weikusat

Right now, I'm dealing with (two) single line files whose single line
contains data in the form of

YYYYMMDD XXXX

the first being a date and the second a counter. So far, I've been using
a pretty conventional

$rc = <$fh>;
chomp($rc);
($date, $counter) = split(/\s+/, $rc);

for getting the data out of the file. While working with this code in
order to add some features to it, it came to me that

($date, $counter) = split for <$fh>

works as well, as does

($date, $counter) = map { split } <$fh>;

but I like the first one better. Comments or alternate suggestions?
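
To compare the three variants side by side, a small sketch like the following may help. It assumes an in-memory file handle (opened on a string reference) standing in for the real file; the sample data and the `read_with` helper are made up for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Sample content standing in for the real single-line file (assumed data).
my $content = "20140402 17\n";

# Hypothetical helper: open an in-memory handle on a copy of the sample
# data and pass it to the given reader.
sub read_with {
    my ($reader) = @_;
    my $data = $content;
    open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
    return $reader->($fh);
}

# Variant 1: helper variable, chomp, explicit split pattern.
my @conventional = read_with(sub {
    my ($fh) = @_;
    my $rc = <$fh>;
    chomp($rc);
    return split(/\s+/, $rc);
});

# Variant 2: statement-modifier 'for' aliasing each line to $_.
my @with_for = read_with(sub {
    my ($fh) = @_;
    my ($date, $counter);
    ($date, $counter) = split for <$fh>;
    return ($date, $counter);
});

# Variant 3: map over the lines read in list context.
my @with_map = read_with(sub {
    my ($fh) = @_;
    return map { split } <$fh>;
});

print "@conventional | @with_for | @with_map\n";
```

All three should yield the same two fields for a well-formed single-line file; they differ in how they behave once the file contains more than one line.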
 

Jim Gibson

Rainer said:
Right now, I'm dealing with (two) single line files whose single line
contains data in the form of

YYYYMMDD XXXX

the first being a date and the second a counter. So far, I've been using
a pretty conventional

$rc = <$fh>;
chomp($rc);
($date, $counter) = split(/\s+/, $rc);

for getting the data out of the file. While working with this code in
order to add some features to it, it came to me that

($date, $counter) = split for <$fh>

works as well, as does

($date, $counter) = map { split } <$fh>;

but I like the first one better. Comments or alternate suggestions?

<> in scalar context will read one line, whereas <> in list context
will read the entire file. Since your files only have one line, it
doesn't matter. But what if, in the future, a blank line gets added to
the end of the file? I would prefer that my code still worked, so I
would prefer a solution that keeps <> in scalar context and only reads
the first line.

What about this:

($date, $counter) = split(' ',<$fh>);

That has <> in scalar context and also takes advantage of the
skip-any-null-fields-at-the-beginning feature of split with a single
space first argument, which is the default for split with no arguments
as in your second and third solutions.
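
The difference between the two contexts can be sketched as follows. The content with a stray blank line is an assumption made up for this illustration, and an in-memory handle stands in for the real file.

```perl
use strict;
use warnings;

# Assumed file content: the expected line plus a stray blank line.
my $content = "20140402 17\n\n";

# List context: 'for' visits every line, so the blank second line
# overwrites the values taken from the first one.
open(my $fh, '<', \$content) or die "cannot open in-memory file: $!";
my ($date_list, $counter_list);
($date_list, $counter_list) = split for <$fh>;
close($fh);

# Scalar context: only the first line is read; the rest is ignored.
open($fh, '<', \$content) or die "cannot open in-memory file: $!";
my ($date_scalar, $counter_scalar) = split(' ', <$fh>);
close($fh);

printf "list context:   date=%s\n", defined $date_list ? $date_list : 'undef';
printf "scalar context: date=%s counter=%s\n", $date_scalar, $counter_scalar;
```

With the blank line present, the list-context variant ends up with undefined values, because splitting a whitespace-only line returns an empty list.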
 

gamo

On 02/04/14 01:27, Jim Gibson wrote:
$rc = <$fh>;
chomp($rc);
($date, $counter) = split(/\s+/, $rc);

This is a clear solution.

($date, $counter) = split for <$fh>

This is not clear, and does a for for one element (one string).
And where is the chomp?
 

Rainer Weikusat

gamo said:
On 02/04/14 01:27, Jim Gibson wrote:

This is a clear solution.

It's a seriously verbose solution. In particular, I'd like to get rid of
the helper variable.

gamo said:
This is not clear, and does a for for one element (one string).
And where is the chomp?

Can you provide a definition of 'clear' which is not "different from
what I'm used to"? The 'foreach' for aliases all elements of the list to
$_ in turn and then executes whatever the 'loop body' happens to be, in
this case, the statement annotated with the for statement
modifier. Using for in this way is actually a Perl-idiom because it is
one of the 'traditional' ways to emulate a switch-style multi-way
conditional, eg (untested)

for ($text) {
/supersonic/ && do {
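
The example above is cut off in this copy of the thread. A minimal sketch of the idiom it describes might look like this; the patterns, the result strings and the `classify` helper are all made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical classifier: 'for' aliases $text to $_, each pattern is
# tried in turn, and 'last' leaves the loop after the first match.
sub classify {
    my ($text) = @_;
    my $kind = 'unknown';

    for ($text) {
        /supersonic/ && do { $kind = 'fast';  last; };
        /subsonic/   && do { $kind = 'slow';  last; };
        /^\s*$/      && do { $kind = 'empty'; last; };
    }
    return $kind;
}

print classify('a supersonic jet'), "\n";
```

The loop runs at most once; it exists only to alias the value to $_ and to give 'last' something to exit.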
 

Rainer Weikusat

Jim Gibson said:
Rainer said:
Right now, I'm dealing with (two) single line files whose single line
contains data in the form of

YYYYMMDD XXXX

the first being a date and the second a counter.
[...]
While working with this code in
order to add some features to it, it came to me that

($date, $counter) = split for <$fh>

works
[...]

<> in scalar context will read one line, whereas <> in list context
will read the entire file. Since your files only have one line, it
doesn't matter. But what if in the future a blank line gets added to
the end of the file. I would prefer that my code still worked, so I
would prefer a solution that keeps <> in scalar context and only reads
the first line.

The first thing I noticed about that is that I now need to truncate the
file before updating it to prevent a trailing blank line from appearing
in case the counter wraps from a two-digit to a one-digit number when
the date changes ;-).

Jim Gibson said:
What about this:

($date, $counter) = split(' ',<$fh>);

See also "cannot see the forest because of all the trees". All the
one-line variants have one common problem, though: They're
debugging-unfriendly because it is not easily possible to inspect the
data read from the file before processing it. Presently, I'm thinking
about either using a helper variable nevertheless or something like

for (<$fh>) {
($date, $counter) = split;
}

possibly with the additional requirement that the counter will become a
fixed-width field.
 

John Bokma

Rainer Weikusat said:
for (<$fh>) {
($date, $counter) = split;
}

When I see this code it gives me the impression (out of context) that
the author wants to have the 2 values on the last line. Which is
correct, since there is only one. If I would use this, I probably would
add:

# There is only one line; get the 2 values on this line.

I probably would write it like this:

chomp ( my $line = <$fh> );
my ( $date, $counter ) = split ' ', $line;

As for the fixed field, I probably would use

truncate( $fh, 0 ) or die "Can't truncate '$filename': $!";
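
How truncating fits into a read-then-update cycle can be sketched as below. This is only an illustration under assumptions: the temporary file, the sample data and the `start_new_day` routine are hypothetical, and truncating to length 0 before rewriting is one possible choice, not necessarily what the real program does.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file holding a two-digit counter (assumed sample data).
my ($tmp, $filename) = tempfile(UNLINK => 1);
print {$tmp} "20140402 99\n";
close($tmp) or die "Can't close '$filename': $!";

# Hypothetical update routine: read the line, then start a new day with
# the counter at 1 and rewrite the file in place.
sub start_new_day {
    my ($path, $new_date) = @_;

    open(my $fh, '+<', $path) or die "Can't open '$path': $!";
    my ($date, $counter) = split(' ', scalar <$fh>);

    seek($fh, 0, 0)  or die "Can't rewind '$path': $!";
    truncate($fh, 0) or die "Can't truncate '$path': $!";
    print {$fh} "$new_date 1\n";
    close($fh)       or die "Can't close '$path': $!";

    return ($date, $counter);
}

my ($old_date, $old_counter) = start_new_day($filename, '20140403');

# Read the file back to confirm that exactly one line is left.
open(my $check, '<', $filename) or die "Can't open '$filename': $!";
my @lines = <$check>;
close($check);
print "old: $old_date $old_counter, file now: @lines";
```

Without the truncate call, the shorter new line would leave the final newline of the old content behind, which is the trailing blank line mentioned earlier in the thread.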
 

Rainer Weikusat

John Bokma said:
When I see this code it gives me the impression (out of context) that
the author wants to have the 2 values on the last line. Which is
correct, since there is only one. If I would use this, I probably would
add:

# There is only one line; get the 2 values on this line.

I probably would write it like this:

chomp ( my $line = <$fh> );
my ( $date, $counter ) = split ' ', $line;

After flirting with

local $_ = <$fh>;
($date, $counter) = split;

I've meanwhile settled on

$rc = <$fh>;
($date, $counter) = split(' ', $rc);

as the 'least byzantine way to express what I want' which has at least a
'simplified split' (' ' instead of /\s+/) and does away with the
redundant chomp.
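
Why the chomp is redundant, and how ' ' differs from /\s+/, can be seen in a short sketch. The sample line with leading blanks is an assumption chosen to make the difference visible; the real file has no leading whitespace.

```perl
use strict;
use warnings;

# Assumed raw line, including leading blanks and the trailing newline.
my $line = "  20140402 17\n";

# The special ' ' pattern skips leading whitespace; the trailing newline
# only produces an empty trailing field, which split drops.
my @awk_style = split(' ', $line);

# /\s+/ treats the leading blanks as a separator, so the first field
# is an empty string.
my @regex_style = split(/\s+/, $line);

print scalar(@awk_style), " fields with ' '\n";
print scalar(@regex_style), " fields with /\\s+/\n";
```

In both cases the newline never ends up in the last field, so no chomp is needed before splitting.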

The third programming language I learnt (after Apple Basic and 65C02
machine language[*]) was Pascal which is strictly 'declare everything
before use' and forces declarations of similar things to occur in
blocks, eg, 'all constants, all types, all variables'. I've mostly kept
this as a habit and in particular, I start every subroutine with
declarations of all 'local' (as in 'my', not as in 'local') variables. I
consider declarations distributed all throughout the code extremely
messy, not only because the mixing of 'different things' (declarations
and statements) but also because this tends to hide the real complexity
of the subroutine in question: If all variables are declared at the top,
subroutines ripe for segmentation can be identified by this list
becoming 'lengthy and messy', ie, containing lots of variables and
'strange naming conventions' in order to avoid name clashes.

[*] As a friendly reminder, a home computer looks like this:

http://upload.wikimedia.org/wikiped..._monitor.jpg/600px-Apple_IIc_with_monitor.jpg

and not like this

http://upload.wikimedia.org/wikipedia/commons/thumb/5/5e/Toes.jpg/800px-Toes.jpg

even if you have 64 of them (in German, C is pronounced like Zeh which
means toe).
 

John Bokma

Rainer Weikusat said:
The third programming language I learnt (after Apple Basic and 65C02
machine language[*]) was Pascal which is strictly 'declare everything

If you don't count COMAL, same here. At least that's what I recall. And
replace Apple with Sinclair and 65C02 with Z80 ;-)

Rainer Weikusat said:
before use' and forces declarations of similar things to occur in
blocks, eg, 'all constants, all types, all variables'. I've mostly kept
this as a habit and in particular, I start every subroutine with
declarations of all 'local' (as in 'my', not as in 'local') variables. I
consider declarations distributed all throughout the code extremely
messy, not only because the mixing of 'different things' (declarations
and statements) but also because this tends to hide the real complexity
of the subroutine in question: If all variables are declared at the top,
subroutines ripe for segmentation can be identified by this list
becoming 'lengthy and messy', ie, containing lots of variables and
'strange naming conventions' in order to avoid name clashes.

I split a sub if:

- it makes it more readable, as in I can move lines of code to a
separate sub and replace them with a call that makes the code
easier to read.
- it has too many lines (more than 60 or so) and it makes sense to
split it.

And I do prefer to put each 'my' close to its first use (makes factoring out
easier). But that probably also has a lot to do with the fact that I like early
returns, etc. And a bunch of 'my's followed by a '... or return' (or 'return
if ...') looks weird to me.

Rainer Weikusat said:
even if you have 64 of them (in German, C is pronounced like Zeh which
means toe).

Ah, didn't know that even though being Dutch and having had one year of
German at school, and having read quite some (well written) German
computer magazines back in the day.
 

George Mpouras

<$fh> =~/^(?<date>\w+)\s+(?<counter>\w+)/;
print "*$+{date}* *$+{counter}*\n";

or

read $fh, my $date, 8;
seek $fh, 1,1;
read $fh, my $count, 4;
 

Rainer Weikusat

George Mpouras said:
<$fh> =~/^(?<date>\w+)\s+(?<counter>\w+)/;
print "*$+{date}* *$+{counter}*\n";

Slightly modified variant:

($date, $counter) = <$fh> =~ /(\d+)\s+(\d+)/;
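
One property of the regex variant is that a failed match can be detected, because a list assignment in boolean context yields the number of values on its right-hand side. A minimal sketch, assuming a stricter anchored pattern and a hypothetical `parse_line` helper (neither is from the original posts):

```perl
use strict;
use warnings;

# Hypothetical parser: returns (date, counter), or an empty list when
# the line does not have the expected shape.
sub parse_line {
    my ($line) = @_;
    return unless defined $line;

    # Anchored pattern: exactly eight digits, whitespace, then digits.
    if (my ($date, $counter) = $line =~ /^(\d{8})\s+(\d+)\s*$/) {
        return ($date, $counter);
    }
    return;
}

my @good = parse_line("20140402 17\n");
my @bad  = parse_line("garbage\n");
print "good: @good, bad has ", scalar(@bad), " fields\n";
```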

Another we didn't have so far:

($date, $counter) = unpack('A8xA', <$fh>);

George Mpouras said:
read $fh, my $date, 8;
seek $fh, 1,1;
read $fh, my $count, 4;

This won't work because the count isn't a fixed-width field. Using

read($fh, $date, 8);
$counter = <$fh> + 0;

would, though.
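
A self-contained sketch of that combination, assuming an in-memory handle and made-up sample data with a four-digit counter:

```perl
use strict;
use warnings;

# Assumed content; the counter can have any number of digits.
my $content = "20140402 1234\n";
open(my $fh, '<', \$content) or die "cannot open in-memory file: $!";

# Fixed-width part: exactly eight characters for the date.
my $got = read($fh, my $date, 8);
die "short read" unless defined $got && $got == 8;

# Variable-width part: the rest of the line, forced to a number.
# Numeric conversion ignores the leading space and the trailing newline.
my $counter = <$fh> + 0;
close($fh);

print "$date $counter\n";
```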
 

Rainer Weikusat

[...]
Another we didn't have so far:

($date, $counter) = unpack('A8xA', <$fh>);

This doesn't work either, as it only uses the first character of the
counter.

($date, $counter) = unpack('A9A*', <$fh>);
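
The two templates can be compared directly. The sample line below is made up for this sketch:

```perl
use strict;
use warnings;

# Assumed sample line with a multi-digit counter.
my $line = "20140402 1234\n";

# 'A8 x A': eight characters, skip one byte, then a single character.
my ($date_short, $counter_short) = unpack('A8xA', $line);

# 'A9 A*': nine characters (the trailing space is stripped by 'A'), then
# the rest of the line (the trailing newline is stripped by 'A' as well).
my ($date, $counter) = unpack('A9A*', $line);

print "A8xA: $date_short/$counter_short\n";
print "A9A*: $date/$counter\n";
```

Since 'A' strips trailing whitespace when unpacking, neither field needs a chomp.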
 

George Mpouras

On 3/4/2014 18:07, Rainer Weikusat wrote:
($date, $counter) = unpack('A9A*', <$fh>);

my @array = unpack "A9 A*", <$fh>;
 

Rainer Weikusat

George Mpouras said:
On 3/4/2014 18:07, Rainer Weikusat wrote:

my @array = unpack "A9 A*", <$fh>;

What is this now supposed to communicate?
 

George Mpouras

Rainer Weikusat said:
What is this now supposed to communicate?

#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', 'file.txt' or die "Can't open 'file.txt': $!";
# A8 = date, Z = the separating space, A* = the rest of the line
@{$_}{qw/date x count/} = unpack "A8ZA*", <$fh>;
print "*$_->{date}*";
print "*$_->{count}*";
 

George Mpouras

On 3/4/2014 20:47, Rainer Weikusat wrote:
What is this now supposed to communicate?


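# substr is considered faster than regexes

open my $fh, '<', 'file.txt' or die "Can't open 'file.txt': $!";
chomp($_ = <$fh>);                # drop the newline so it does not end up in $count
my $date = substr $_, 0, 8, '';   # 4-arg substr: cut the date out of $_
my $count = substr $_, 1;         # skip the separating space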


print "*$date* *$count*\n";
 

Rainer Weikusat

George Mpouras said:
On 3/4/2014 20:47, Rainer Weikusat wrote:


# substr is considered faster than regexs

open my $fh, 'file.txt' or die;
$_ = <$fh>;
my $date = substr $_, 0, 8, '';
my $count = substr $_, 1;


print "*$date* *$count*\n";

$date=substr($_,0,-length($count=substr($_,rindex($_,' ')+1,-1)))for<$fh>

?
 

Rainer Weikusat

Rainer Weikusat said:
$date=substr($_,0,-length($count=substr($_,rindex($_,' ')+1,-1)))for<$fh>

ts, ts, ts ... hasty postings bad ...

$date=substr($_,0,-(length($count=substr($_,rindex($_,' ')+1,-1))+2))for<$fh>
 

George Mpouras

$date=substr($_,0,-length($count=substr($_,rindex($_,' ')+1,-1)))for<$fh>


nice, but something goes wrong.
for file content "YYYYMMDD 123"
I got

*YYYYMMDD 1* *12*
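
That output appears consistent with a file that has no trailing newline: the inner substr with a length of -1 then drops the last digit of the counter instead of the newline. A rindex/substr sketch that does not depend on the newline might look like this, assuming a hypothetical `parse` helper that chomps first (not code from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: remove an optional newline first, so the offsets
# no longer depend on whether the line ends in one.
sub parse {
    my ($line) = @_;
    chomp($line);
    my $space = rindex($line, ' ');          # position of the last space
    my $count = substr($line, $space + 1);   # everything after it
    my $date  = substr($line, 0, $space);    # everything before it
    return ($date, $count);
}

my @with_newline    = parse("YYYYMMDD 123\n");
my @without_newline = parse("YYYYMMDD 123");
print "*$with_newline[0]* *$with_newline[1]*\n";
print "*$without_newline[0]* *$without_newline[1]*\n";
```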
 
