read and parse a single line file

Discussion in 'Perl Misc' started by Rainer Weikusat, Apr 1, 2014.

  1. Right now, I'm dealing with (two) single line files whose single line
    contains data in the form of

    YYYYMMDD XXXX

    the first being a date and the second a counter. So far, I've been using
    a pretty conventional

    $rc = <$fh>;
    chomp($rc);
    ($date, $counter) = split(/\s+/, $rc);

    for getting the data out of the file. While working with this code in
    order to add some features to it, it came to me that

    ($date, $counter) = split for <$fh>

    works as well, as does

    ($date, $counter) = map { split } <$fh>;

    but I like the first one better. Comments or alternate suggestions?
    Rainer Weikusat, Apr 1, 2014
    #1

  2. Jim Gibson Guest

    Rainer Weikusat wrote:

    > Right now, I'm dealing with (two) single line files whose single line
    > contains data in the form of
    >
    > YYYYMMDD XXXX
    >
    > the first being a date and the second a counter. So far, I've been using
    > a pretty conventional
    >
    > $rc = <$fh>;
    > chomp($rc);
    > ($date, $counter) = split(/\s+/, $rc);
    >
    > for getting the data out of the file. While working with this code in
    > order to add some features to it, it came to me that
    >
    > ($date, $counter) = split for <$fh>
    >
    > works as well, as does
    >
    > ($date, $counter) = map { split } <$fh>;
    >
    > but I like the first one better. Comments or alternate suggestions?
    >


    <> in scalar context will read one line, whereas <> in list context
    will read the entire file. Since your files only have one line, it
    doesn't matter. But what if in the future a blank line gets added to
    the end of the file? I would prefer that my code still worked, so I
    would prefer a solution that keeps <> in scalar context and only reads
    the first line.

    What about this:

    ($date, $counter) = split(' ',<$fh>);

    That has <> in scalar context and also takes advantage of the
    skip-any-null-fields-at-the-beginning feature of split with a single
    space first argument, which is the default for split with no arguments
    as in your second and third solutions.
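
    A minimal sketch of the difference, assuming a hypothetical file
    content with a stray blank line after the data (an in-memory
    filehandle stands in for the real file, and the variable names are
    made up for the comparison):

```perl
use strict;
use warnings;

# Hypothetical content: the data line followed by a stray blank line.
my $content = "20140401 17\n\n";

# List context: <$fh> returns every line, so the assignment runs twice
# and the blank line overwrites both values with undef.
open my $fh, '<', \$content or die "Can't open in-memory file: $!";
my ($list_date, $list_counter);
($list_date, $list_counter) = split ' ', $_ for <$fh>;
close $fh;

# Scalar context: split evaluates its second argument in scalar context,
# so only the first line is read.
open $fh, '<', \$content or die "Can't open in-memory file: $!";
my ($date, $counter) = split ' ', <$fh>;
close $fh;

print defined $list_date ? "list:   $list_date $list_counter\n"
                         : "list:   values lost\n";
print "scalar: $date $counter\n";
```

    With the blank line present, the list-context variant should end up
    with undefined values, while the scalar-context variant still yields
    the date and the counter.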

    --
    Jim Gibson
    Jim Gibson, Apr 2, 2014
    #2

  3. gamo Guest

    On 02/04/14 01:27, Jim Gibson wrote:
    > Rainer Weikusat wrote:
    >
    >> Right now, I'm dealing with (two) single line files whose single line
    >> contains data in the form of
    >>
    >> YYYYMMDD XXXX
    >>
    >> the first being a date and the second a counter. So far, I've been using
    >> a pretty conventional
    >>
    >> $rc = <$fh>;
    >> chomp($rc);
    >> ($date, $counter) = split(/\s+/, $rc);


    This is a clear solution.


    >>
    >> for getting the data out of the file. While working with this code in
    >> order to add some features to it, it came to me that
    >>
    >> ($date, $counter) = split for <$fh>


    This is not clear, and does a for for one element (one string).
    And where is the chomp?



    --
    http://www.telecable.es/personales/gamo/
    gamo, Apr 2, 2014
    #3
  4. gamo writes:
    > On 02/04/14 01:27, Jim Gibson wrote:
    >> Rainer Weikusat wrote:
    >>
    >>> Right now, I'm dealing with (two) single line files whose single line
    >>> contains data in the form of
    >>>
    >>> YYYYMMDD XXXX
    >>>
    >>> the first being a date and the second a counter. So far, I've been using
    >>> a pretty conventional
    >>>
    >>> $rc = <$fh>;
    >>> chomp($rc);
    >>> ($date, $counter) = split(/\s+/, $rc);

    >
    > This is a clear solution.


    It's a seriously verbose solution. In particular, I'd like to get rid of
    the helper variable.

    >>> for getting the data out of the file. While working with this code in
    >>> order to add some features to it, it came to me that
    >>>
    >>> ($date, $counter) = split for <$fh>

    >
    > This is not clear, and does a for for one element (one string).
    > And where is the chomp?


    Can you provide a definition of 'clear' which is not "different from
    what I'm used to"? The 'foreach' for aliases all elements of the list to
    $_ in turn and then executes whatever the 'loop body' happens to be, in
    this case, the statement annotated with the for statement
    modifier. Using for in this way is actually a Perl-idiom because it is
    one of the 'traditional' ways to emulate a switch-style multi-way
    conditional, eg (untested)

    for ($text) {
    /supersonic/ && do {
    Rainer Weikusat, Apr 2, 2014
    #4
  5. Jim Gibson writes:
    > Rainer Weikusat wrote:
    >
    >> Right now, I'm dealing with (two) single line files whose single line
    >> contains data in the form of
    >>
    >> YYYYMMDD XXXX
    >>
    >> the first being a date and the second a counter.


    [...]

    >> While working with this code in
    >> order to add some features to it, it came to me that
    >>
    >> ($date, $counter) = split for <$fh>
    >>
    >> works


    [...]

    > <> in scalar context will read one line, whereas <> in list context
    > will read the entire file. Since your files only have one line, it
    > doesn't matter. But what if in the future a blank line gets added to
    > the end of the file? I would prefer that my code still worked, so I
    > would prefer a solution that keeps <> in scalar context and only reads
    > the first line.


    The first thing I noticed about that is that I now need to truncate the
    file before updating it to prevent a trailing blank line from appearing
    in case the counter wraps from a two-digit to a one-digit number when
    the date changes ;-).

    > What about this:
    >
    > ($date, $counter) = split(' ',<$fh>);


    See also "cannot see the forest because of all the trees". All the
    one-line variants have one common problem, though: They're
    debugging-unfriendly because it is not easily possible to inspect the
    data read from the file before processing it. Presently, I'm thinking
    about either using a helper variable nevertheless or something like

    for (<$fh>) {
    ($date, $counter) = split;
    }

    possibly with the additional requirement that the counter will become a
    fixed-width field.
    Rainer Weikusat, Apr 2, 2014
    #5
  6. John Bokma Guest

    Rainer Weikusat writes:

    > for (<$fh>) {
    > ($date, $counter) = split;
    > }


    When I see this code it gives me the impression (out of context) that
    the author wants to have the 2 values on the last line. Which is
    correct, since there is only one. If I would use this, I probably would
    add:

    # There is only one line; get the 2 values on this line.

    I probably would write it like this:

    chomp ( my $line = <$fh> );
    my ( $date, $counter ) = split ' ', $line;

    As for the fixed field, I probably would use

    truncate( $fh, 0 ) or die "Can't truncate '$filename': $!";
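
    A rough sketch of how that could fit into a read-update-write cycle.
    The temporary file, the sample values and the reset to 1 are
    assumptions for illustration only; note that truncate needs a length
    as its second argument:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical setup: a temporary file standing in for the real counter file.
my ($tmp, $filename) = tempfile(UNLINK => 1);
print {$tmp} "20140401 10\n";
close $tmp or die "Can't close '$filename': $!";

# Open for reading and writing without clobbering the content.
open my $fh, '+<', $filename or die "Can't open '$filename': $!";
my ($date, $counter) = split ' ', <$fh>;

# Pretend the date changed, so the counter restarts at 1 (one digit shorter).
($date, $counter) = ('20140402', 1);

# Rewind, drop the old content, then write the new line.
seek $fh, 0, 0 or die "Can't seek in '$filename': $!";
truncate $fh, 0 or die "Can't truncate '$filename': $!";
print {$fh} "$date $counter\n";
close $fh or die "Can't close '$filename': $!";

# Read the file back to check that no stale bytes remain.
open $fh, '<', $filename or die "Can't open '$filename': $!";
my $result = do { local $/; <$fh> };
close $fh;
print $result;
```

    Without the truncate, the shorter line would leave the last byte of
    the old content behind, which is the trailing blank line mentioned
    earlier in the thread.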

    --
    John Bokma j3b

    Blog: http://johnbokma.com/ Perl Consultancy: http://castleamber.com/
    Perl for books: http://johnbokma.com/perl/help-in-exchange-for-books.html
    John Bokma, Apr 2, 2014
    #6
  7. John Bokma writes:
    > Rainer Weikusat writes:
    >
    >> for (<$fh>) {
    >> ($date, $counter) = split;
    >> }

    >
    > When I see this code it gives me the impression (out of context) that
    > the author wants to have the 2 values on the last line. Which is
    > correct, since there is only one. If I would use this, I probably would
    > add:
    >
    > # There is only one line; get the 2 values on this line.
    >
    > I probably would write it like this:
    >
    > chomp ( my $line = <$fh> );
    > my ( $date, $counter ) = split ' ', $line;


    After flirting with

    local $_ = <$fh>;
    ($date, $counter) = split;

    I've meanwhile settled on

    $rc = <$fh>;
    ($date, $counter) = split(' ', $rc);

    as the 'least byzantine way to express what I want' which has at least a
    'simplified split' (' ' instead of /\s+/) and does away with the
    redundant chomp.
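
    A small check of both claims, using a made-up sample line (the
    array names are only for this illustration):

```perl
use strict;
use warnings;

my $line = "20140401 17\n";   # hypothetical line, newline still attached

# Both patterns treat the trailing newline as a separator; the empty
# field it would produce at the end is dropped by split.
my @by_space = split ' ',   $line;
my @by_regex = split /\s+/, $line;

# The patterns differ only when the line starts with whitespace:
# /\s+/ keeps an empty leading field, ' ' skips it.
my @lead_space = split ' ',   "  $line";
my @lead_regex = split /\s+/, "  $line";

print scalar(@by_space), " ", scalar(@by_regex), "\n";
print scalar(@lead_space), " ", scalar(@lead_regex), "\n";
```

    So the chomp buys nothing with either pattern, and ' ' is the more
    forgiving choice should leading whitespace ever appear.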

    The third programming language I learnt (after Apple Basic and 65C02
    machine language[*]) was Pascal which is strictly 'declare everything
    before use' and forces declarations of similar things to occur in
    blocks, eg, 'all constants, all types, all variables'. I've mostly kept
    this as a habit and in particular, I start every subroutine with
    declarations of all 'local' (as in 'my', not as in 'local') variables. I
    consider declarations distributed all throughout the code extremely
    messy, not only because the mixing of 'different things' (declarations
    and statements) but also because this tends to hide the real complexity
    of the subroutine in question: If all variables are declared at the top,
    subroutines ripe for segmentation can be identified by this list
    becoming 'lengthy and messy', ie, containing lots of variables and
    'strange naming conventions' in order to avoid name clashes.

    [*] As a friendly reminder, a home computer looks like this:

    http://upload.wikimedia.org/wikiped..._monitor.jpg/600px-Apple_IIc_with_monitor.jpg

    and not like this

    http://upload.wikimedia.org/wikipedia/commons/thumb/5/5e/Toes.jpg/800px-Toes.jpg

    even if you have 64 of them (in German, C is pronounced like Zeh which
    means toe).
    Rainer Weikusat, Apr 2, 2014
    #7
  8. John Bokma Guest

    Rainer Weikusat writes:


    > The third programming language I learnt (after Apple Basic and 65C02
    > machine language[*]) was Pascal which is strictly 'declare everything


    If you don't count COMAL, same here. At least that's what I recall. And
    replace Apple with Sinclair and 65C02 with Z80 ;-)

    > before use' and forces declarations of similar things to occur in
    > blocks, eg, 'all constants, all types, all variables'. I've mostly kept
    > this as a habit and in particular, I start every subroutine with
    > declarations of all 'local' (as in 'my', not as in 'local') variables. I
    > consider declarations distributed all throughout the code extremely
    > messy, not only because the mixing of 'different things' (declarations
    > and statements) but also because this tends to hide the real complexity
    > of the subroutine in question: If all variables are declared at the top,
    > subroutines ripe for segmentation can be identified by this list
    > becoming 'lengthy and messy', ie, containing lots of variables and
    > 'strange naming conventions' in order to avoid name clashes.


    I split a sub if:

    - it makes it more readable as in I can move lines of code to a
    separate sub and replace this with a call that makes the code more
    easy to read.
    - it has too many lines (more than 60 or so) and it makes sense to
    split it.

    And I do prefer to put 'my' close to first use (makes factoring out
    easier). But that probably also has a lot to do with that I like early
    returns, etc. And a bunch of mys followed by a .... or return (or return
    if ... ) looks weird to me.

    > even if you have 64 of them (in German, C is pronounced like Zeh which
    > means toe).


    Ah, didn't know that even though being Dutch and having had one year of
    German at school, and having read quite some (well written) German
    computer magazines back in the day.

    --
    John Bokma j3b

    Blog: http://johnbokma.com/ Perl Consultancy: http://castleamber.com/
    Perl for books: http://johnbokma.com/perl/help-in-exchange-for-books.html
    John Bokma, Apr 2, 2014
    #8
  9. On 2014-04-02 11:43, Rainer Weikusat wrote:
    > gamo writes:
    >> On 02/04/14 01:27, Jim Gibson wrote:
    >>> Rainer Weikusat wrote:
    >>>> Right now, I'm dealing with (two) single line files whose single line
    >>>> contains data in the form of
    >>>>
    >>>> YYYYMMDD XXXX
    >>>>
    >>>> the first being a date and the second a counter. So far, I've been using
    >>>> a pretty conventional
    >>>>
    >>>> $rc = <$fh>;
    >>>> chomp($rc);
    >>>> ($date, $counter) = split(/\s+/, $rc);

    >>
    >> This is a clear solution.

    >
    > It's a seriously verbose solution.


    The chomp is unnecessary, as you already noticed.

    > In particular, I'd like to get rid of the helper variable.


    Why not:
    ($date, $counter) = split(/\s+/, <$fh>);
    ?
    hp

    --
    _ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung:
    |_|_) | | Man feilt solange an seinen Text um, bis
    | | | | die Satzbestandteile des Satzes nicht mehr
    __/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
    Peter J. Holzer, Apr 2, 2014
    #9
  10. <$fh> =~/^(?<date>\w+)\s+(?<counter>\w+)/;
    print "*$+{date}* *$+{counter}*\n";

    or

    read $fh, my $date, 8;
    seek $fh, 1,1;
    read $fh, my $count, 4;
    George Mpouras, Apr 2, 2014
    #10
  11. George Mpouras writes:
    > <$fh> =~/^(?<date>\w+)\s+(?<counter>\w+)/;
    > print "*$+{date}* *$+{counter}*\n";


    Slightly modified variant:

    ($date, $counter) = <$fh> =~ /(\d+)\s+(\d+)/;

    Another we didn't have so far:

    ($date, $counter) = unpack('A8xA', <$fh>);

    > read $fh, my $date, 8;
    > seek $fh, 1,1;
    > read $fh, my $count, 4;


    This won't work because the count isn't a fixed-width field. Using

    read($fh, $date, 8);
    $counter = <$fh> + 0;

    would, though.
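
    As a self-contained sketch, with an in-memory filehandle and a
    made-up three-digit counter standing in for the real file:

```perl
use strict;
use warnings;

my $content = "20140401 123\n";   # hypothetical file content
open my $fh, '<', \$content or die "Can't open in-memory file: $!";

my ($date, $counter);
# The date is always eight characters wide, so a fixed-size read is safe.
read($fh, $date, 8) == 8 or die "Short read";
# The rest of the line is " 123\n"; adding 0 converts it to a number,
# ignoring the surrounding whitespace regardless of how many digits follow.
$counter = <$fh> + 0;
close $fh;

print "*$date* *$counter*\n";
```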
    Rainer Weikusat, Apr 3, 2014
    #11
  12. Rainer Weikusat writes:

    [...]

    > Another we didn't have so far:
    >
    > ($date, $counter) = unpack('A8xA', <$fh>);


    This doesn't work either, as it only uses the first character of the
    counter.

    ($date, $counter) = unpack('A9A*', <$fh>);
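
    A side-by-side comparison of the two templates on a made-up line
    (the spaces inside the templates are only for readability, and the
    variable names are hypothetical):

```perl
use strict;
use warnings;

my $line = "20140401 123\n";   # hypothetical line with a three-digit counter

# 'A8 x A': eight characters, skip one byte, then a single character.
my ($date_a, $counter_a) = unpack 'A8 x A', $line;

# 'A9 A*': nine characters (trailing space stripped), then the rest
# of the line (trailing newline stripped).
my ($date_b, $counter_b) = unpack 'A9 A*', $line;

print "*$date_a* *$counter_a*\n";
print "*$date_b* *$counter_b*\n";
```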
    Rainer Weikusat, Apr 3, 2014
    #12
  13. On 3/4/2014 18:07, Rainer Weikusat wrote:
    > ($date, $counter) = unpack('A9A*', <$fh>);


    my @array = unpack "A9 A*", <$fh>;
    George Mpouras, Apr 3, 2014
    #13
  14. George Mpouras writes:
    > On 3/4/2014 18:07, Rainer Weikusat wrote:
    >> ($date, $counter) = unpack('A9A*', <$fh>);

    >
    > my @array = unpack "A9 A*", <$fh>;


    What is this now supposed to communicate?
    Rainer Weikusat, Apr 3, 2014
    #14
  15. >> my @array = unpack "A9 A*", <$fh>;
    >
    > What is this now supposed to communicate?
    >


    #!/usr/bin/perl
    use strict;
    use warnings;
    open my $fh, 'file.txt' or die;
    @{$_}{qw/date x count/} = unpack "A8ZA*", <$fh>;
    print "*$_->{date}*";
    print "*$_->{count}*";
    George Mpouras, Apr 3, 2014
    #15
  16. On 3/4/2014 20:47, Rainer Weikusat wrote:
    > George Mpouras writes:
    >> On 3/4/2014 18:07, Rainer Weikusat wrote:
    >>> ($date, $counter) = unpack('A9A*', <$fh>);

    >>
    >> my @array = unpack "A9 A*", <$fh>;

    >
    > What is this now supposed to communicate?
    >



    # substr is considered faster than regexs

    open my $fh, 'file.txt' or die;
    $_ = <$fh>;
    my $date = substr $_, 0, 8, '';
    my $count = substr $_, 1;


    print "*$date* *$count*\n";
    George Mpouras, Apr 3, 2014
    #16
  17. George Mpouras writes:
    > On 3/4/2014 20:47, Rainer Weikusat wrote:
    >> George Mpouras writes:
    >>> On 3/4/2014 18:07, Rainer Weikusat wrote:
    >>>> ($date, $counter) = unpack('A9A*', <$fh>);
    >>>
    >>> my @array = unpack "A9 A*", <$fh>;

    >>
    >> What is this now supposed to communicate?
    >>

    >
    >
    > # substr is considered faster than regexs
    >
    > open my $fh, 'file.txt' or die;
    > $_ = <$fh>;
    > my $date = substr $_, 0, 8, '';
    > my $count = substr $_, 1;
    >
    >
    > print "*$date* *$count*\n";


    $date=substr($_,0,-length($count=substr($_,rindex($_,' ')+1,-1)))for<$fh>

    ?
    Rainer Weikusat, Apr 3, 2014
    #17
  18. Rainer Weikusat writes:
    > George Mpouras writes:
    >> On 3/4/2014 20:47, Rainer Weikusat wrote:
    >>> George Mpouras writes:
    >>>> On 3/4/2014 18:07, Rainer Weikusat wrote:
    >>>>> ($date, $counter) = unpack('A9A*', <$fh>);
    >>>>
    >>>> my @array = unpack "A9 A*", <$fh>;
    >>>
    >>> What is this now supposed to communicate?
    >>>

    >>
    >>
    >> # substr is considered faster than regexs
    >>
    >> open my $fh, 'file.txt' or die;
    >> $_ = <$fh>;
    >> my $date = substr $_, 0, 8, '';
    >> my $count = substr $_, 1;
    >>
    >>
    >> print "*$date* *$count*\n";

    >
    > $date=substr($_,0,-length($count=substr($_,rindex($_,' ')+1,-1)))for<$fh>


    ts, ts, ts ... hasty postings bad ...

    $date=substr($_,0,-(length($count=substr($_,rindex($_,' ')+1,-1))+2))for<$fh>
    Rainer Weikusat, Apr 3, 2014
    #18
  19. >
    > $date=substr($_,0,-length($count=substr($_,rindex($_,' ')+1,-1)))for<$fh>
    >
    >



    nice, but something goes wrong.
    for file content "YYYYMMDD 123"
    I got

    *YYYYMMDD 1* *12*
    George Mpouras, Apr 3, 2014
    #19
  20. > $date=substr($_,0,-(length($count=substr($_,rindex($_,' ')+1,-1))+2))for<$fh>
    >


    the last character is missing

    *YYYYMMDD* *12*
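
    The missing character appears to come from the -1 in the inner
    substr, which drops the last character on the assumption that it is
    a newline. A small check, assuming the test file had no trailing
    newline (parse_line is a hypothetical wrapper around the one-liner):

```perl
use strict;
use warnings;

# The corrected one-liner from the thread, wrapped in a hypothetical helper.
sub parse_line {
    my ($date, $count);
    $date = substr($_, 0, -(length($count = substr($_, rindex($_, ' ') + 1, -1)) + 2))
        for $_[0];
    return ($date, $count);
}

my @with_newline    = parse_line("YYYYMMDD 123\n");
my @without_newline = parse_line("YYYYMMDD 123");

print "*$with_newline[0]* *$with_newline[1]*\n";
print "*$without_newline[0]* *$without_newline[1]*\n";
```

    With the newline present the counter comes out complete; without it
    the last digit is cut off, which matches the output above.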
    George Mpouras, Apr 3, 2014
    #20