Reading Data File Records

Graham

I'm a little frustrated with Perl's line-by-line file reading and I am
hoping that someone can help me.

I have a data file that looks like:

--
! Comment 1
! Comment 2
! Comment ...
5 ! number of levels
*aaa [aaa units] ! space deliminated is common
1.0 2.0 3.0 4.0 5.0
*bbb [bbb units] ! csv is possible
1.0, 2.0, 3.0,
4.0 5.0
*ccc [ccc units] ! the file is written from fortran and the number of
columns is not fixed
10.0
20.0
30.0
40.0
50.0
....
--

Essentially, there is a header block that always begins with '!' in
the first column. This is followed by the number of elements in each
data block and an unknown number of data blocks having a set number of
elements.

The file is generated using about five lines of FORTRAN so it seems
somewhat surprising that I am up to 30 lines of Perl with almost no
end in sight... Does anyone have an example showing how to process a
file in blocks using Perl?

Thanks,
Graham
 
Brian Wakem

Graham said:
The file is generated using about five lines of FORTRAN so it seems
somewhat surprising that I am up to 30 lines of Perl with almost no
end in sight... Does anyone have an example showing how to process a
file in blocks using Perl?


What do you want to do with it?
 
James Willmore

On 9 Sep 2003 08:14:57 -0700, Graham wrote:
The file is generated using about five lines of FORTRAN so it seems
somewhat surprising that I am up to 30 lines of Perl with almost no
end in sight... Does anyone have an example showing how to process
a file in blocks using Perl?

Post your code - I have no idea what you are trying to do. Maybe it's
just me ;)

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
You cannot kill time without injuring eternity.
 
Tulan W. Hu

Graham said:
The file is generated using about five lines of FORTRAN so it seems
somewhat surprising that I am up to 30 lines of Perl with almost no
end in sight... Does anyone have an example showing how to process a
file in blocks using Perl?

I would download the File::Slurp module from CPAN and install it.
http://search.cpan.org/author/MUIR/File-Slurp-2004.0904/

====
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;

# in list context, read_file returns one element per line
my @allLines = read_file("data_file_name");
foreach my $line (@allLines) {
    # in case you need to process each line
    if ($line =~ /^!/) {
        # comment lines
    }
    else {
        # data lines
    }
}
 
Jay Tilton

(e-mail address removed) (Graham) wrote:

: I have a data file that looks like:
:
: --
: ! Comment 1
: ! Comment 2
: ! Comment ...
: 5 ! number of levels
: *aaa [aaa units] ! space deliminated is common
: 1.0 2.0 3.0 4.0 5.0
: *bbb [bbb units] ! csv is possible
: 1.0, 2.0, 3.0,
: 4.0 5.0
     ^
     ^
Should there be a comma between those two values?

: *ccc [ccc units] ! the file is written from fortran and the number of
: columns is not fixed

Is this really how the data file is formatted, or did your newsreader
word-wrap that line for you?

: 10.0
: 20.0
: 30.0
: 40.0
: 50.0
: ...
: --
:
: Essentially, there is a header block that always begins with '!' in
: the first column. This is followed by the number of elements in each
: data block and an unknown number of data blocks having a set number of
: elements.

The problem is determining where one block ends and another begins when
the only thing known about the block is how many elements it contains.
There's no apparent consistency or predictability to how the blocks may
be formatted, or to how the elements are separated. Altering the input
record separator, $/, then reading in a number of records isn't going to
work.

What might work would be to read lines of data until a block's requisite
number of elements have been acquired, but the elements themselves will
need to have a consistent, recognizable format, and a newline character
has to mark the boundary between blocks. From the sample data, the
elements all seem to be numbers with one place after the decimal.

As a first approximation of workable code,

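#!perl
use warnings;
use strict;

# Skip the "!" header lines; the first line after them starts with
# the number of elements in each block.
my $elems_per_block;
while (<DATA>) {
    next if /^!/;
    ($elems_per_block) = /^(\d+)/;
    last;
}

my @blocks;
while (<DATA>) {
    next unless /\S/;    # ignore blank lines between blocks
    my $block = $_;      # the "*name [units]" line that opens a block
    my $n = 0;           # elements seen so far in this block
    while (<DATA>) {
        $block .= $_;
        # "() = /.../g" counts the matches on this line
        last if $elems_per_block == ($n += () = /(\b\d+\.\d\b)/g);
    }
    push @blocks, $block;
}

for (@blocks) {
    # whatever processing each block needs
    print "Block:\n$_\n";
}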

__DATA__
! Comment 1
! Comment 2
! Comment ...
5 ! number of levels
*aaa [aaa units] ! space deliminated is common
1.0 2.0 3.0 4.0 5.0
*bbb [bbb units] ! csv is possible
1.0, 2.0, 3.0,
4.0 5.0
*ccc [ccc units] ! the file is written from fortran and the number of
columns is not fixed
10.0
20.0
30.0
40.0
50.0
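
As one possible shape for the "whatever processing each block needs" step,
here is a minimal sketch that splits a block string into its name, units
and numeric values. The helper name parse_block and the number pattern
(optional sign, digits, optional fraction) are assumptions for
illustration only; Fortran exponent notation such as 1.0E+03, or digits
inside a wrapped comment line, would need more care.

```perl
use strict;
use warnings;

# Hypothetical helper: split one block string (header line plus data
# lines) into its name, units, and numeric values.
sub parse_block {
    my ($block) = @_;
    my ($header, @lines) = split /\n/, $block;

    # Header is assumed to look like "*aaa [aaa units] ! optional comment".
    my ($name, $units) = $header =~ /^\*(\S+)\s*(?:\[([^\]]*)\])?/;

    my @values;
    for my $line (@lines) {
        $line =~ s/!.*//;    # drop any trailing comment
        # Assumed number format: optional sign, digits, optional fraction.
        push @values, $line =~ /([-+]?\d+(?:\.\d*)?)/g;
    }
    return { name => $name, units => $units, values => \@values };
}

my $parsed = parse_block(
    "*bbb [bbb units] ! csv is possible\n1.0, 2.0, 3.0,\n4.0 5.0\n"
);
print "$parsed->{name} ($parsed->{units}): @{ $parsed->{values} }\n";
```

Because the values are pulled out with a pattern match rather than a
split, it makes no difference whether a line uses spaces, commas, or
both as separators.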

: The file is generated using about five lines of FORTRAN so it seems
: somewhat surprising that I am up to 30 lines of Perl with almost no
: end in sight...

Why should that be surprising? You're trying to build a modicum of
intelligence into one tool to compensate for another's lack of
sophistication. The Perl program would have a much easier time reading
if the FORTRAN program was only a little better at writing.
 
James Willmore

On 9 Sep 2003 15:41:03 -0700, Graham wrote:
It seems it isn't just you. All I am trying to do is get the data
blocks into a suitable perl structure so I can calculate some simple
statistics and reformat it for another program. See comments in the
second while loop.

I really appreciate the help. I have a pile of files with this type
of structure (a legacy of an ancient postdoc) that I need to
manipulate and reformat.

First, let me say that each language is going to handle files and
variables differently. I say this because you commented on using
FORTRAN. I know nothing about FORTRAN, but have had _some_ dealings
with COBOL. Some functionality in COBOL is unavailable in Perl (such
as strictly defining variables). By the same token, there's
functionality in Perl that is not available in COBOL (such as regular
expressions). Having said that, here is some untested code that _may_
fit the bill for you. Again, it's untested and may _not_ be exactly
what you're looking for. If I'm off, I'm hoping someone will point
out where the errors are.

==untested==
#!/usr/bin/perl
use strict;
use warnings;

# define the name of the file
my $file = 'name_of_file_here';

# define a hash (associative array) for your records
my %records;

# open a file handle to the file - die if we can't open it
open(my $fh, '<', $file)
    or die "Can't open file $file: $!\n";

# skip the header lines (those that lead with a "!"); the first
# line after them holds the number of levels before its "!"
my $numLev;
while (my $line = <$fh>) {
    next if $line =~ /^!/;
    # can be done with substr - regular expression used for
    # demonstration purposes
    ($numLev) = $line =~ m/^\s*(\d+)/;
    last;
}

# id of the block the current line belongs to
my $key;

# while the file is open and does not return eof
while (my $line = <$fh>) {
    # chomp the newline off the line
    chomp $line;
    # get the beginning of the line up until the first "!"
    # (strip the comments) - again, substr could be used
    (my $uncommented_line = $line) =~ s/!.*//;
    # a line that starts with "*" names a new block; use the
    # block id as the key for the records that follow
    if ($uncommented_line =~ /^\*(\S+)/) {
        $key = $1;
        next;
    }
    # skip anything that appears before the first block id
    next unless defined $key;
    # split the line on whitespace and/or commas and keep only
    # the 'sections' that look like numbers
    my @data = grep { /^[-+]?\.?\d/ } split /[\s,]+/, $uncommented_line;
    # store the values as an array in the hash using the block id
    # as the key
    push @{ $records{$key} }, @data;
}
close $fh;

# to retrieve the records ...
foreach my $k (sort keys %records) {
    print "$k => ", join(" ", @{ $records{$k} }), "\n";
}
==untested==
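
Once the values sit in a hash of arrays like %records, the "simple
statistics" mentioned in the question could be computed along these
lines. This is only a sketch: the helper name block_stats and the
stand-in data are made up for illustration, and List::Util ships with
Perl.

```perl
use strict;
use warnings;
use List::Util qw(sum min max);

# Stand-in data; in practice %records would be filled by the parser.
my %records = (
    aaa => [ 1.0, 2.0, 3.0, 4.0, 5.0 ],
    ccc => [ 10.0, 20.0, 30.0, 40.0, 50.0 ],
);

# Hypothetical helper: count, mean, min and max of a list of numbers.
sub block_stats {
    my @v = @_;
    return unless @v;    # nothing to summarize
    return {
        n    => scalar @v,
        mean => sum(@v) / @v,
        min  => min(@v),
        max  => max(@v),
    };
}

for my $k (sort keys %records) {
    my $s = block_stats(@{ $records{$k} });
    printf "%s: n=%d mean=%.2f min=%.2f max=%.2f\n",
        $k, @{$s}{qw(n mean min max)};
}
```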

HTH

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
What this country needs is a good five cent microcomputer.
 
Anno Siegel

Jay Tilton said:
(e-mail address removed) (Graham) wrote:
: The file is generated using about five lines of FORTRAN so it seems
: somewhat surprising that I am up to 30 lines of Perl with almost no
: end in sight...

Why should that be surprising? You're trying to build a modicum of
intelligence into one tool to compensate for another's lack of
sophistication. The Perl program would have a much easier time reading
if the FORTRAN program was only a little better at writing.

Also, parsing input is generally harder than generating output. Printing
what comes along is easy. To read it back in, you must often (as in
the OP's case) understand what you have read so far to know how to
proceed.

The C functions printf() and scanf() are an attempt to make printing
and scanning symmetric. A look at their respective frequency of use
shows that the attempt wasn't a full success.

Anno
 
Mike Flannigan

Graham said:
It seems it isn't just you. All I am trying to do is get the data
blocks into a suitable perl structure so I can calculate some simple
statistics and reformat it for another program. See comments in the
second while loop.

I really appreciate the help. I have a pile of files with this type
of structure (a legacy of an ancient postdoc) that I need to
manipulate and reformat.

snip


Don't be afraid to slurp the whole file. I slurp 400,000+
line files very quickly and do the processing. The only
trouble is if you do it more than once in the program.
You might see a big slowdown - at least on Win2000.

I haven't found a good solution to this yet, so I just
run a bunch of individual Perl scripts - one for each
file.

If you find a better solution, let us know.
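
For what it's worth, a minimal sketch of the slurp approach applied to
this kind of file: read everything into one string, then cut it at each
line that starts with "*". The helper names slurp and split_blocks are
made up for illustration, and the sketch assumes no data line ever
begins with "*".

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    local $/;    # undefined record separator: read everything at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Cut slurped text into chunks, each starting at a line that begins with "*".
sub split_blocks {
    my ($text) = @_;
    my @chunks = split /^(?=\*)/m, $text;
    # Whatever precedes the first "*" line is the "!" header; drop it.
    shift @chunks if @chunks && $chunks[0] !~ /^\*/;
    return @chunks;
}

# For a real file: my @blocks = split_blocks(slurp('data_file_name'));
my @blocks = split_blocks(
    "! Comment\n2 ! number of levels\n*aaa [u]\n1.0 2.0\n*bbb [u]\n3.0,\n4.0\n"
);
print scalar(@blocks), " blocks\n";
```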


Mike
 
