read and parse a single line file

Discussion in 'Perl Misc' started by Rainer Weikusat, Apr 1, 2014.

  1. Right now, I'm dealing with (two) single line files whose single line
    contains data in the form of

    YYYYMMDD XXXX

    the first being a date and the second a counter. So far, I've been using
    a pretty conventional

    $rc = <$fh>;
    chomp($rc);
    ($date, $counter) = split(/\s+/, $rc);

    for getting the data out of the file. While working with this code in
    order to add some features to it, it came to me that

    ($date, $counter) = split for <$fh>;

    works as well, as does

    ($date, $counter) = map { split } <$fh>;

    but I like the first one better. Comments or alternate suggestions?
     
    Rainer Weikusat, Apr 1, 2014
    #1
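
    The three variants in the post above can be compared side by side. This is
    a minimal sketch, assuming a sample line of "20140401 0042" (the real file
    contents are not shown in the thread) and using an in-memory filehandle in
    place of the real file. The helper open_sample and the %result hash are
    made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical sample contents; the real files are not shown in the thread.
my $data = "20140401 0042\n";

# Open a fresh in-memory filehandle on a copy of the sample for each variant.
sub open_sample {
    my $copy = $data;
    open(my $fh, '<', \$copy) or die "in-memory open failed: $!";
    return $fh;
}

my %result;

# Variant 1: read one line, chomp it, split on runs of whitespace.
{
    my $fh = open_sample();
    my $rc = <$fh>;
    chomp($rc);
    my ($date, $counter) = split(/\s+/, $rc);
    $result{conventional} = "$date|$counter";
}

# Variant 2: statement-modifier for; <$fh> is evaluated in list context.
{
    my $fh = open_sample();
    my ($date, $counter);
    ($date, $counter) = split for <$fh>;
    $result{for_modifier} = "$date|$counter";
}

# Variant 3: map over all lines and flatten the resulting fields.
{
    my $fh = open_sample();
    my ($date, $counter) = map { split } <$fh>;
    $result{map} = "$date|$counter";
}

print "$_: $result{$_}\n" for sort keys %result;
```

    For a file that really contains exactly one line, all three variants
    should yield the same pair of values.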

  2. Jim Gibson Guest

    <> in scalar context will read one line, whereas <> in list context
    will read the entire file. Since your files only have one line, it
    doesn't matter. But what if in the future a blank line gets added to
    the end of the file. I would prefer that my code still worked, so I
    would prefer a solution that keeps <> in scalar context and only reads
    the first line.

    What about this:

    ($date, $counter) = split(' ',<$fh>);

    That has <> in scalar context and also takes advantage of the
    skip-any-null-fields-at-the-beginning feature of split with a single
    space first argument, which is the default for split with no arguments
    as in your second and third solutions.
     
    Jim Gibson, Apr 2, 2014
    #2
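
    The context difference described above can be made visible with a file
    that has picked up a trailing blank line. A small sketch, assuming
    made-up contents and an in-memory filehandle; the variable names are
    illustrative only.

```perl
use strict;
use warnings;

# Hypothetical file contents with a blank line accidentally appended.
my $data = "20140401 42\n\n";

# List context: every line is processed, so the blank line is split last
# and its empty result overwrites the values taken from the first line.
open(my $all, '<', \$data) or die "in-memory open failed: $!";
my ($date_all, $counter_all);
($date_all, $counter_all) = split for <$all>;

# Scalar context: split's second argument reads the first line only.
open(my $first, '<', \$data) or die "in-memory open failed: $!";
my ($date, $counter) = split(' ', <$first>);

printf "list context:   date=%s counter=%s\n",
    map { defined $_ ? $_ : 'undef' } $date_all, $counter_all;
printf "scalar context: date=%s counter=%s\n", $date, $counter;
```

    With the blank line present, the for variant ends up with two undefined
    values, while the scalar-context split still returns the date and counter.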

  3. gamo Guest

    On 02/04/14 01:27, Jim Gibson wrote:
    This is a clear solution.


    This is not clear, and it runs a for loop over a single element (one string).
    And where is the chomp?
     
    gamo, Apr 2, 2014
    #3
  4. It's a seriously verbose solution. In particular, I'd like to get rid of
    the helper variable.
    Can you provide a definition of 'clear' which is not "different from
    what I'm used to"? The 'foreach' for aliases all elements of the list to
    $_ in turn and then executes whatever the 'loop body' happens to be, in
    this case, the statement annotated with the for statement
    modifier. Using for in this way is actually a Perl-idiom because it is
    one of the 'traditional' ways to emulate a switch-style multi-way
    conditional, eg (untested)

    for ($text) {
    /supersonic/ && do {
     
    Rainer Weikusat, Apr 2, 2014
    #4
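
    The switch-style use of for mentioned above is cut off in this copy of the
    thread. The following is a separate, self-contained sketch of the general
    idiom, not a reconstruction of the elided code: the classify sub, the
    second pattern and the labels are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical classifier; the patterns and labels are illustrative only.
sub classify {
    my ($text) = @_;
    my $kind;
    for ($text) {                  # aliases $text to $_ for the block
        /supersonic/ && do { $kind = 'fast'; last; };
        /subsonic/   && do { $kind = 'slow'; last; };
        $kind = 'unknown';         # default branch, reached if nothing matched
    }
    return $kind;
}

print classify('a supersonic jet'), "\n";
print classify('a subsonic glider'), "\n";
print classify('a bicycle'), "\n";
```

    The last inside each do block leaves the enclosing for, which is what
    makes the construct behave like a multi-way conditional.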
  5. The first thing I noticed about that is that I now need to truncate the
    file before updating it to prevent a trailing blank line from appearing
    in case the counter wraps from a two-digit to a one-digit number when
    the date changes ;-).
    See also "cannot see the forest because of all the trees". All the
    one-line variants have one common problem, though: They're
    debugging-unfriendly because it is not easily possible to inspect the
    data read from the file before processing it. Presently, I'm thinking
    about either using a helper variable nevertheless or something like

    for (<$fh>) {
    ($date, $counter) = split;
    }

    possibly with the additional requirement that the counter will become a
    fixed-width field.
     
    Rainer Weikusat, Apr 2, 2014
    #5
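
    The trailing-blank-line effect described above can be reproduced when a
    shorter line is written over a longer one. A sketch under assumptions: the
    file contents, the temporary file and the helpers write_file, read_file
    and update_state are all made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helpers for the demo; none of these names come from the thread.
sub write_file {
    my ($path, $text) = @_;
    open(my $fh, '>', $path) or die "Can't write '$path': $!";
    print {$fh} $text;
    close($fh) or die "Can't close '$path': $!";
    return;
}

sub read_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't read '$path': $!";
    local $/;                      # slurp mode: read the whole file at once
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Overwrite the single line in place, optionally truncating afterwards.
sub update_state {
    my ($path, $date, $counter, $do_truncate) = @_;
    open(my $fh, '+<', $path) or die "Can't open '$path': $!";
    my $old = <$fh>;               # read the current line first
    seek($fh, 0, 0) or die "Can't seek '$path': $!";
    print {$fh} "$date $counter\n";
    if ($do_truncate) {
        # Cut the file off at the current position, dropping leftover bytes.
        truncate($fh, tell($fh)) or die "Can't truncate '$path': $!";
    }
    close($fh) or die "Can't close '$path': $!";
    return;
}

my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close($tmp_fh) or die "Can't close temporary file: $!";

# Counter wraps from two digits to one digit, no truncate.
write_file($filename, "20140401 10\n");
update_state($filename, '20140402', 1, 0);
my $without = read_file($filename);

# Same update, this time with truncate.
write_file($filename, "20140401 10\n");
update_state($filename, '20140402', 1, 1);
my $with = read_file($filename);

printf "without truncate: %d bytes\n", length $without;
printf "with truncate:    %d bytes\n", length $with;
```

    Without the truncate, the old newline survives as an extra empty line. A
    fixed-width counter (for instance one written with sprintf and a
    zero-padded format) would presumably avoid the leftover byte as well,
    since every rewrite would then have the same length.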
  6. John Bokma Guest

    When I see this code it gives me the impression (out of context) that
    the author wants to have the 2 values on the last line. Which is
    correct, since there is only one. If I would use this, I probably would
    add:

    # There is only one line; get the 2 values on this line.

    I probably would write it like this:

    chomp ( my $line = <$fh> );
    my ( $date, $counter ) = split ' ', $line;

    As for the fixed field, I probably would use

    truncate( $fh, 0 ) or die "Can't truncate '$filename': $!";
     
    John Bokma, Apr 2, 2014
    #6
  7. After flirting with

    local $_ = <$fh>;
    ($date, $counter) = split;

    I've meanwhile settled on

    $rc = <$fh>;
    ($date, $counter) = split(' ', $rc);

    as the 'least byzantine way to express what I want' which has at least a
    'simplified split' (' ' instead of /\s+/) and does away with the
    redundant chomp.

    The third programming language I learnt (after Apple Basic and 65C02
    machine language[*]) was Pascal which is strictly 'declare everything
    before use' and forces declarations of similar things to occur in
    blocks, eg, 'all constants, all types, all variables'. I've mostly kept
    this as a habit and in particular, I start every subroutine with
    declarations of all 'local' (as in 'my', not as in 'local') variables. I
    consider declarations distributed all throughout the code extremely
    messy, not only because the mixing of 'different things' (declarations
    and statements) but also because this tends to hide the real complexity
    of the subroutine in question: If all variables are declared at the top,
    subroutines ripe for segmentation can be identified by this list
    becoming 'lengthy and messy', ie, containing lots of variables and
    'strange naming conventions' in order to avoid name clashes.

    [*] As a friendly reminder, a home computer looks like this:

    http://upload.wikimedia.org/wikiped..._monitor.jpg/600px-Apple_IIc_with_monitor.jpg

    and not like this

    http://upload.wikimedia.org/wikipedia/commons/thumb/5/5e/Toes.jpg/800px-Toes.jpg

    even if you have 64 of them (in German, C is pronounced like Zeh which
    means toe).
     
    Rainer Weikusat, Apr 2, 2014
    #7
  8. John Bokma Guest

    If you don't count COMAL, same here. At least that's what I recall. And
    replace Apple with Sinclair and 65C02 with Z80 ;-)
    I split a sub if:

    - it makes it more readable as in I can move lines of code to a
    separate sub and replace this with a call that makes the code more
    easy to read.
    - it has too many lines (more than 60 or so) and it makes sense to
    split it.

    And I do prefer to put my close to first use (makes factoring out
    easier). But that probably also has a lot to do with that I like early
    returns, etc. And a bunch of mys followed by a .... or return (or return
    if ... ) looks weird to me.
    Ah, didn't know that even though being Dutch and having had one year of
    German at school, and having read quite some (well written) German
    computer magazines back in the day.
     
    John Bokma, Apr 2, 2014
    #8
  9. The chomp is unnecessary, as you already noticed.
    Why not:
    ($date, $counter) = split(/\s+/, <$fh>);
    ?
    hp
     
    Peter J. Holzer, Apr 2, 2014
    #9
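
    Why the chomp is unnecessary, and where /\s+/ and ' ' differ, can be
    checked with a line that has leading whitespace. A minimal sketch; the
    sample line is an assumption, chosen to expose the difference.

```perl
use strict;
use warnings;

# Hypothetical line with leading whitespace and the usual trailing newline.
my $line = "  20140401 42\n";

# /\s+/ treats the leading whitespace as a separator,
# which produces an empty first field.
my @by_regex = split(/\s+/, $line);

# ' ' skips leading whitespace first, then splits on runs of whitespace.
my @by_space = split(' ', $line);

# In both cases the newline counts as whitespace and the resulting trailing
# empty field is dropped, so no chomp is needed.
print scalar(@by_regex), " fields: ", join('|', @by_regex), "\n";
print scalar(@by_space), " fields: ", join('|', @by_space), "\n";
```

    On a line without leading whitespace both forms return the same two
    fields, so the choice only matters if the file format may drift.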
  10. <$fh> =~/^(?<date>\w+)\s+(?<counter>\w+)/;
    print "*$+{date}* *$+{counter}*\n";

    or

    read $fh, my $date, 8;
    seek $fh, 1,1;
    read $fh, my $count, 4;
     
    George Mpouras, Apr 2, 2014
    #10
  11. Slightly modified variant:

    ($date, $counter) = <$fh> =~ /(\d+)\s+(\d+)/;

    Another we didn't have so far:

    This won't work because the count isn't a fixed-width field. Using

    read($fh, $date, 8);
    $counter = <$fh> + 0;

    would, though.
     
    Rainer Weikusat, Apr 3, 2014
    #11
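
    The two approaches from the post above, a match in list context and
    read() followed by numeric conversion, can be tried on a counter of
    arbitrary width. A sketch assuming made-up contents and in-memory
    filehandles; the numbered variable names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical contents with a counter that is not four characters wide.
my $data = "20140401 123\n";

# A match in list context returns the captured groups.
open(my $fh1, '<', \$data) or die "in-memory open failed: $!";
my ($date1, $counter1) = <$fh1> =~ /(\d+)\s+(\d+)/;

# read() takes the fixed-width date; adding 0 converts the rest of the line
# to a number, ignoring the surrounding whitespace.
open(my $fh2, '<', \$data) or die "in-memory open failed: $!";
my $date2;
read($fh2, $date2, 8);
my $counter2 = <$fh2> + 0;

print "regex: $date1 $counter1\n";
print "read:  $date2 $counter2\n";
```

    Note that the numeric conversion would drop leading zeros, which matters
    if the counter is ever stored zero-padded.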
  12. [...]
    This doesn't work either, as it only uses the first character of the
    counter.

    ($date, $counter) = unpack('A9A*', <$fh>);
     
    Rainer Weikusat, Apr 3, 2014
    #12
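
    The unpack variant relies on the A format stripping trailing whitespace,
    which removes both the separating space and the newline. A small sketch
    with assumed contents; the 'A8 x A*' template is an additional
    illustration that skips the separator explicitly and does not appear in
    the thread.

```perl
use strict;
use warnings;

# Hypothetical contents; the real file is not shown in the thread.
my $data = "20140401 123\n";

# A9 takes the date plus the separating space and strips the space;
# A* takes the rest of the line and strips the trailing newline.
open(my $fh, '<', \$data) or die "in-memory open failed: $!";
my ($date, $counter) = unpack('A9 A*', <$fh>);

# Alternative template: take 8 characters, skip one byte, take the rest.
my ($date_x, $counter_x) = unpack('A8 x A*', $data);

print "*$date* *$counter*\n";
print "*$date_x* *$counter_x*\n";
```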
  13. On 3/4/2014 18:07, Rainer Weikusat wrote:
    my @array = unpack "A9 A*", <$fh>;
     
    George Mpouras, Apr 3, 2014
    #13
  14. What is this now supposed to communicate?
     
    Rainer Weikusat, Apr 3, 2014
    #14
  15. #!/usr/bin/perl
    use strict;
    use warnings;
    open my $fh, '<', 'file.txt' or die "Can't open 'file.txt': $!";
    @{$_}{qw/date x count/} = unpack "A8ZA*", <$fh>;
    print "*$_->{date}*";
    print "*$_->{count}*";
     
    George Mpouras, Apr 3, 2014
    #15
  16. On 3/4/2014 20:47, Rainer Weikusat wrote:

    # substr is considered faster than regexes

    open my $fh, '<', 'file.txt' or die "Can't open 'file.txt': $!";
    chomp($_ = <$fh>);    # drop the newline so it does not end up in $count
    my $date = substr $_, 0, 8, '';
    my $count = substr $_, 1;


    print "*$date* *$count*\n";
     
    George Mpouras, Apr 3, 2014
    #16
  17. $date=substr($_,0,-length($count=substr($_,rindex($_,' ')+1,-1)))for<$fh>

    ?
     
    Rainer Weikusat, Apr 3, 2014
    #17
  18. ts, ts, ts ... hasty postings bad ...

    $date=substr($_,0,-(length($count=substr($_,rindex($_,' ')+1,-1))+2))for<$fh>
     
    Rainer Weikusat, Apr 3, 2014
    #18

  19. nice, but something goes wrong.
    for file content "YYYYMMDD 123"
    I got

    *YYYYMMDD 1* *12*
     
    George Mpouras, Apr 3, 2014
    #19
  20. the last character is missing

    *YYYYMMDD* *12*
     
    George Mpouras, Apr 3, 2014
    #20
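
    The outputs reported in the last two posts are consistent with a test
    file that has no trailing newline, because the corrected one-liner
    unconditionally drops the last character of the line. That is an
    inference, not something the thread states. The sketch below wraps the
    one-liner in a sub (parse_line, a made-up name) and adds a hypothetical
    alternative, parse_line_chomped, that does not depend on the newline.

```perl
use strict;
use warnings;

# The corrected one-liner from the thread, wrapped in a sub for testing.
sub parse_line {
    my ($line) = @_;
    my ($date, $count);
    $date = substr($_, 0, -(length($count = substr($_, rindex($_, ' ') + 1, -1)) + 2))
        for $line;
    return ($date, $count);
}

# Hypothetical alternative: chomp first, then cut at the last space.
sub parse_line_chomped {
    my ($line) = @_;
    chomp($line);
    my $sep = rindex($line, ' ');
    return (substr($line, 0, $sep), substr($line, $sep + 1));
}

for my $input ("YYYYMMDD 123\n", "YYYYMMDD 123") {
    my ($d1, $c1) = parse_line($input);
    my ($d2, $c2) = parse_line_chomped($input);
    print "one-liner: *$d1* *$c1*   chomped: *$d2* *$c2*\n";
}
```

    With the newline present both subs agree; without it, the one-liner loses
    the last digit of the counter, matching the *YYYYMMDD* *12* report.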