Reading Data File Records

Discussion in 'Perl Misc' started by Graham, Sep 9, 2003.

  1. Graham

    I'm a little frustrated with Perl's line-by-line file reading and I am
    hoping that someone can help me.

    I have a data file that looks like:

    --
    ! Comment 1
    ! Comment 2
    ! Comment ...
    5 ! number of levels
    *aaa [aaa units] ! space delimited is common
    1.0 2.0 3.0 4.0 5.0
    *bbb [bbb units] ! csv is possible
    1.0, 2.0, 3.0,
    4.0 5.0
    *ccc [ccc units] ! the file is written from fortran and the number of
    columns is not fixed
    10.0
    20.0
    30.0
    40.0
    50.0
    ....
    --

    Essentially, there is a header block that always begins with '!' in
    the first column. This is followed by the number of elements in each
    data block and an unknown number of data blocks having a set number of
    elements.

    The file is generated using about five lines of FORTRAN so it seems
    somewhat surprising that I am up to 30 lines of perl with almost no
    end in sight... Does anyone have an example showing how to process a
    file in blocks using Perl?

    Thanks,
    Graham
     
    Graham, Sep 9, 2003
    #1

  2. Brian Wakem

    "Graham" <> wrote in message
    news:...
    > I'm a little frustrated with Perl's line-by-line file reading and I am
    > hoping that someone can help me.
    >
    > I have a data file that looks like:
    >
    > --
    > ! Comment 1
    > ! Comment 2
    > ! Comment ...
    > 5 ! number of levels
    > *aaa [aaa units] ! space delimited is common
    > 1.0 2.0 3.0 4.0 5.0
    > *bbb [bbb units] ! csv is possible
    > 1.0, 2.0, 3.0,
    > 4.0 5.0
    > *ccc [ccc units] ! the file is written from fortran and the number of
    > columns is not fixed
    > 10.0
    > 20.0
    > 30.0
    > 40.0
    > 50.0
    > ...
    > --
    >
    > Essentially, there is a header block that always begins with '!' in
    > the first column. This is followed by the number of elements in each
    > data block and an unknown number of data blocks having a set number of
    > elements.
    >
    > The file is generated using about five lines of FORTRAN so it seems
    > somewhat surprising that I am up to 30 lines of perl with almost no
    > end in sight... Does anyone have an example showing how to process a
    > file in blocks using Perl?



    What do you want to do with it?

    --
    Brian Wakem
     
    Brian Wakem, Sep 9, 2003
    #2

  3. On 9 Sep 2003 08:14:57 -0700
    (Graham) wrote:
    <snip>
    > The file is generated using about five lines of FORTRAN so it seems
    > somewhat surprising that I am up to 30 lines of perl with almost no
    > end in sight... Does anyone have an example showing how to process
    > a file in blocks using Perl?


    Post your code - I have no idea what you are trying to do. Maybe it's
    just me ;)

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    You cannot kill time without injuring eternity.
     
    James Willmore, Sep 9, 2003
    #3
  4. Tulan W. Hu

    "Graham" <> wrote in message ...
    [snip..]
    > The file is generated using about five lines of FORTRAN so it seems
    > somewhat surprising that I am up to 30 lines of perl with almost no
    > end in sight... Does anyone have an example showing how to process a
    > file in blocks using Perl?


    I would download the File::Slurp module from CPAN and install it.
    http://search.cpan.org/author/MUIR/File-Slurp-2004.0904/

    ====
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Slurp;

    # read_file in list context returns one element per line
    my @allLines = read_file("data_file_name");
    foreach my $line (@allLines) {
        # in case you need to process each line
        if ($line =~ /^!/) {
            # comment lines
        }
        else {
            # data lines
        }
    }
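    If installing File::Slurp is not an option, core Perl can do the same job. A minimal sketch, not part of the original reply: reading a filehandle in list context returns all of its lines, and localizing $/ (the input record separator) to undef makes a single read return the whole file. The sub names slurp_lines and slurp_text and the command-line usage are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every line of a file, core Perl only.
sub slurp_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    my @lines = <$fh>;    # list context: one element per line
    close $fh;
    return @lines;
}

# Hypothetical helper: return the whole file as a single string.
sub slurp_text {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    local $/;             # undef record separator: read to end of file
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Usage (hypothetical script name): perl slurp.pl data_file_name
if (@ARGV) {
    foreach my $line (slurp_lines($ARGV[0])) {
        if ($line =~ /^!/) {
            # comment lines
        }
        else {
            # data lines
        }
    }
}
```

    The "local" keeps the change to $/ confined to slurp_text, so other reads in the program still go line by line.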
     
    Tulan W. Hu, Sep 9, 2003
    #4
  5. Jay Tilton

    (Graham) wrote:

    : I have a data file that looks like:
    :
    : --
    : ! Comment 1
    : ! Comment 2
    : ! Comment ...
    : 5 ! number of levels
    : *aaa [aaa units] ! space delimited is common
    : 1.0 2.0 3.0 4.0 5.0
    : *bbb [bbb units] ! csv is possible
    : 1.0, 2.0, 3.0,
    : 4.0 5.0
    Should there be a comma between 4.0 and 5.0?

    : *ccc [ccc units] ! the file is written from fortran and the number of
    : columns is not fixed

    Is this really how the data file is formatted, or did your newsreader
    word-wrap that line for you?

    : 10.0
    : 20.0
    : 30.0
    : 40.0
    : 50.0
    : ...
    : --
    :
    : Essentially, there is a header block that always begins with '!' in
    : the first column. This is followed by the number of elements in each
    : data block and an unknown number of data blocks having a set number of
    : elements.

    The problem is determining where one block ends and another begins when
    the only thing known about the block is how many elements it contains.
    There's no apparent consistency or predictability to how the blocks may
    be formatted, or to how the elements are separated. Altering the input
    record separator, $/, then reading in a number of records isn't going to
    work.

    What might work would be to read lines of data until a block's requisite
    number of elements have been acquired, but the elements themselves will
    need to have a consistent, recognizable format, and a newline character
    has to mark the boundary between blocks. From the sample data, the
    elements all seem to be numbers with one digit after the decimal point.

    As a first approximation of workable code,

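    #!perl
    use warnings;
    use strict;

    # skip the "!" header lines; the next line starts with the block size
    my $elems_per_block;
    while(<DATA>) {
        next if /^!/;
        ($elems_per_block) = /^(\d+)/;
        last;
    }

    my @blocks;
    while(<DATA>) {
        # this line is the block's "*name [units]" header
        my $block = $_;
        my $n = 0;
        # append lines until the block holds the expected number of elements
        while(<DATA>) {
            $block .= $_;
            last if $elems_per_block == ($n += () = /(\b\d+\.\d\b)/g);
        }
        push @blocks, $block;
    }

    for( @blocks ) {
        # whatever processing each block needs
        print "Block:\n$_\n";
    }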

    __DATA__
    ! Comment 1
    ! Comment 2
    ! Comment ...
    5 ! number of levels
    *aaa [aaa units] ! space delimited is common
    1.0 2.0 3.0 4.0 5.0
    *bbb [bbb units] ! csv is possible
    1.0, 2.0, 3.0,
    4.0 5.0
    *ccc [ccc units] ! the file is written from fortran and the number of
    columns is not fixed
    10.0
    20.0
    30.0
    40.0
    50.0
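    A variant that keeps the numbers themselves, closer to the "suitable perl structure" Graham asks for later in the thread, might look like the sketch below. It is not Jay's code and swaps in a different boundary rule: it treats each "*name" line as the start of a block and uses the element count only as a sanity check. That assumes, as in the sample, that every block opens with such a line and that values are plain decimals separated by spaces and/or commas. The sub name read_blocks is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub: read the block format from a filehandle.
# ASSUMPTIONS (from the sample, not a spec): header lines start with "!",
# the first other line starts with the element count, and every block
# opens with a "*name [units]" line.
sub read_blocks {
    my ($fh) = @_;

    # Skip the "!" header and pick up the element count.
    my $count;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*!/;
        ($count) = $line =~ /^\s*(\d+)/;
        last;
    }
    die "No element count found\n" unless defined $count;

    my (%blocks, @order, $name);
    while (my $line = <$fh>) {
        if ($line =~ /^\s*\*(\S+)/) {    # "*aaa [aaa units] ..." opens a block
            $name = $1;
            push @order, $name;
            $blocks{$name} = [];
            next;
        }
        next unless defined $name;
        $line =~ s/!.*//s;               # drop any trailing "!" comment
        # Grab plain decimal numbers whether separated by spaces or commas.
        push @{ $blocks{$name} },
            $line =~ /([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)/g;
    }

    # The count is only a sanity check here, not the block boundary.
    for my $n (@order) {
        my $got = scalar @{ $blocks{$n} };
        warn "Block $n has $got values, expected $count\n" if $got != $count;
    }
    return ($count, \@order, \%blocks);
}

# Demo on an in-memory copy of part of the sample file.
my $sample = join("\n",
    '! Comment 1',
    '5 ! number of levels',
    '*aaa [aaa units] ! space delimited is common',
    '1.0 2.0 3.0 4.0 5.0',
    '*bbb [bbb units] ! csv is possible',
    '1.0, 2.0, 3.0,',
    '4.0 5.0') . "\n";
open(my $in, '<', \$sample) or die "Can't open in-memory file: $!\n";
my ($count, $order, $blocks) = read_blocks($in);
print "$_: @{ $blocks->{$_} }\n" for @$order;
```

    Keeping @order alongside the hash preserves the file's block order, which a plain hash would lose. Values written in other forms (for example ".5" or Fortran "D" exponents) would need a wider number pattern.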

    : The file is generated using about five lines of FORTRAN so it seems
    : somewhat surprising that I am up to 30 lines of perl with almost no
    : end in sight...

    Why should that be surprising? You're trying to build a modicum of
    intelligence into one tool to compensate for another's lack of
    sophistication. The Perl program would have a much easier time reading
    if the FORTRAN program was only a little better at writing.
     
    Jay Tilton, Sep 10, 2003
    #5
  6. On 9 Sep 2003 15:41:03 -0700
    (Graham) wrote:
    > It seems it isn't just you. All I am trying to do is get the data
    > blocks into a suitable perl structure so I can calculate some simple
    > statistics and reformat it for another program. See comments in the
    > second while loop.
    >
    > I really appreciate the help. I have a pile of files with this type
    > of structure (a legacy of an ancient postdoc) that I need to
    > manipulate and reformat.


    First, let me say that each language is going to handle files and
    variables differently. I say this because you commented on using
    FORTRAN. I know nothing about FORTRAN, but have had _some_ dealings
    with COBOL. Some functionality in COBOL is unavailable in Perl (such
    as strictly defining variables). By the same token, there's
    functionality in Perl that is not available in COBOL (such as regular
    expressions). Having said that, here is some untested code that _may_
    fit the bill for you. Again, it's untested and may _not_ be exactly
    what you're looking for. If I'm off, I'm hoping someone will point
    out where the errors are.

    ==untested==
    #!/usr/bin/perl -w
    use strict;
    use Scalar::Util qw(looks_like_number);

    #define the name of the file
    my $file = 'name_of_file_here';

    #define a hash (associative array) for your records
    my %records;

    #open a file handle to the file - die if we can't open it
    open(my $fh, '<', $file)
        or die "Can't open file $file: $!\n";

    #skip the header lines (they lead with a "!"), then get the number
    #of levels from the portion of the next line before the first "!"
    my $numLev;
    while (my $line = <$fh>) {
        next if $line =~ /^!/;
        ($numLev) = $line =~ /^\s*(\d+)/;
        last;
    }

    #the current block id, e.g. "aaa" from "*aaa [aaa units]"
    my $key;

    #while the file is open and does not return eof
    while (<$fh>) {
        #chomp the newline off the line
        chomp;
        #get the beginning of the line up until the first "!"
        #(strip the comments); a line with no "!" is kept whole
        my ($uncommented_line) = /^([^!]*)/;
        #a line starting with "*" names a new block - use it as the key
        if ($uncommented_line =~ /^\*(\S+)/) {
            $key = $1;
            next;
        }
        next unless defined $key;
        #split the line on whitespace and/or commas, keeping only numbers
        my @data = grep { looks_like_number($_) }
                   split /[\s,]+/, $uncommented_line;
        #store the values as an array in the hash, using the block id as the key
        push @{$records{$key}}, @data;
    }
    close $fh;

    #to retrieve the records ...
    foreach my $k (sort keys %records) {
        print "$k => ", join(" ", @{$records{$k}}), "\n";
    }
    ==untested==
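    Graham said he wants simple statistics and a reformatted copy for another program. Once the values sit in a hash of arrays like %records, that step is short. A sketch, not part of Jim's post: %records is filled with made-up values so it runs on its own, block_stats is a hypothetical helper, sd is the sample standard deviation (dividing by n - 1), and the printf layout is only a placeholder for whatever the other program expects. min, max and sum come from List::Util, which ships with Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Hypothetical helper: summary statistics for one list of numbers.
sub block_stats {
    my @v = @_;
    return unless @v;
    my $n    = scalar @v;
    my $mean = sum(@v) / $n;
    # sample variance (n - 1); zero when there is only one value
    my $var  = $n > 1 ? sum(map { ($_ - $mean) ** 2 } @v) / ($n - 1) : 0;
    return (n => $n, min => min(@v), max => max(@v),
            mean => $mean, sd => sqrt $var);
}

# Stand-in data in the shape built above (block id => list of values).
my %records = (
    aaa => [1.0, 2.0, 3.0, 4.0, 5.0],
    ccc => [10.0, 20.0, 30.0, 40.0, 50.0],
);

# One reformatted line per block; the column layout is only an example.
foreach my $k (sort keys %records) {
    my %s = block_stats(@{ $records{$k} });
    printf "%-6s n=%d min=%.2f max=%.2f mean=%.2f sd=%.2f\n",
        $k, @s{qw(n min max mean sd)};
}
```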

    HTH

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    What this country needs is a good five cent microcomputer.
     
    James Willmore, Sep 10, 2003
    #6
  7. Anno Siegel

    Jay Tilton <> wrote in comp.lang.perl.misc:
    > (Graham) wrote:


    > : The file is generated using about five lines of FORTRAN so it seems
    > : somewhat surprising that I am up to 30 lines of perl with almost no
    > : end in sight...
    >
    > Why should that be surprising? You're trying to build a modicum of
    > intelligence into one tool to compensate for another's lack of
    > sophistication. The Perl program would have a much easier time reading
    > if the FORTRAN program was only a little better at writing.


    Also, parsing input is generally harder than generating output. Printing
    what comes along is easy. To read it back in, you must often (as in
    the OP's case) understand what you have read so far to know how to
    proceed.

    The C functions printf() and scanf() are an attempt to make printing
    and scanning symmetric. A look at their respective frequency of use
    shows that the attempt wasn't a full success.
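    The same asymmetry shows up in Perl, which has sprintf but no scanf: writing is one sprintf per value, while reading back means choosing the separators and checking what arrived. A small round trip, assuming values written with one decimal place; parse_values is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @values = (1, 2.5, 30);

# Writing: one sprintf per value, joined with spaces.
my $line = join ' ', map { sprintf('%.1f', $_) } @values;   # "1.0 2.5 30.0"

# Reading back: split on spaces and/or commas, then validate every field,
# because the reader cannot assume the writer was consistent.
sub parse_values {
    my ($text, $expected) = @_;
    my @fields = grep { length } split /[\s,]+/, $text;
    for my $f (@fields) {
        die "Not a number: '$f'\n" unless $f =~ /^[-+]?\d+(?:\.\d+)?$/;
    }
    die "Expected $expected values, got ", scalar(@fields), "\n"
        if defined $expected && @fields != $expected;
    return @fields;
}

my @back = parse_values($line, 3);
print "$line -> @back\n";
```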

    Anno
     
    Anno Siegel, Sep 10, 2003
    #7
  8. Graham wrote:

    >
    > It seems it isn't just you. All I am trying to do is get the data
    > blocks into a suitable perl structure so I can calculate some simple
    > statistics and reformat it for another program. See comments in the
    > second while loop.
    >
    > I really appreciate the help. I have a pile of files with this type
    > of structure (a legacy of an ancient postdoc) that I need to
    > manipulate and reformat.


    snip


    Don't be afraid to slurp the whole file. I slurp 400,000+
    line files very quickly and do the processing. The only
    trouble is if you do it more than once in the program.
    You might see a big slowdown - at least on Win2000.

    I never found a good solution to this (yet), so I just
    run a bunch of individual perl scripts - one for each
    file.

    If you find a better solution, let us know.
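    With the whole file in one string, the blocks can be cut apart with a single regular expression instead of a line-by-line loop. A sketch, assuming block headers are the only lines that begin with "*" (true of the sample, not confirmed for every file); blocks_from_text is a hypothetical name, and a literal string stands in for the slurped file, which in practice could come from File::Slurp's read_file in scalar context or a read with $/ localized to undef.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub: parse an already-slurped file held in one string.
# ASSUMPTION: block headers are the only lines starting with "*".
sub blocks_from_text {
    my ($text) = @_;
    my %blocks;
    # Zero-width split just before every line that begins with "*";
    # the first chunk is the "!" header plus the element-count line.
    my ($header, @chunks) = split /^(?=[ \t]*\*)/m, $text;
    my ($count) = $header =~ /^[ \t]*(\d+)/m;
    for my $chunk (@chunks) {
        my ($first, $rest) = split /\n/, $chunk, 2;
        my ($name) = $first =~ /^\s*\*(\S+)/;
        $rest = '' unless defined $rest;
        $rest =~ s/!.*$//mg;    # drop trailing "!" comments
        $blocks{$name} = [ $rest =~ /([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)/g ];
    }
    return ($count, \%blocks);
}

my $text = join("\n",
    '! Comment 1',
    '5 ! number of levels',
    '*aaa [aaa units] ! space delimited is common',
    '1.0 2.0 3.0 4.0 5.0',
    '*bbb [bbb units] ! csv is possible',
    '1.0, 2.0, 3.0,',
    '4.0 5.0') . "\n";
my ($n, $blocks) = blocks_from_text($text);
for my $name (sort keys %$blocks) {
    my @v = @{ $blocks->{$name} };
    printf "%s: %d values (expected %d): %s\n", $name, scalar @v, $n, "@v";
}
```

    The pattern uses [ \t]* rather than \s* so that a blank line just before a header does not become its own split point.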


    Mike
     
    Mike Flannigan, Sep 10, 2003
    #8