Parsing long files by using read

gavs · Jan 16, 2004

Hi,

I am fairly new to Perl and need to split a fairly large file that
contains no newlines. The records contained in this file are fixed
length. I have written the following code to split this long record
into 600-byte records and append a newline to each. After executing
this program, the file size doubles.

For example: a record in this file can be split up into 3 records of
600 byte length; hence the original length of this file is 1800 bytes.

size = size of the original file.

while($bytes_read < $size) {
    my $record;
    $bytes_read += read(FIN, $record, $record_len, $offset);
    print "Bytes read # $bytes_read, OFFSET=$offset\n";

    $record .= "\n";

    print FOUT $record;
    $offset += $record_len;
}

fclose(FIN);
fclose(FOUT);

Viewing the out file with vi generates the following:
"a" 3 lines, 3603 characters (1800 null characters)

Where are the extra 1800 bytes coming from? How do I get rid of them?

Thanks.
gavs

Uri Guttman · Jan 16, 2004

perldoc perlvar. look for $/ and assigning it a ref to an integer.

uri

Ben Morrow · Jan 16, 2004

> I am fairly new to Perl and need to split a fairly large file that
> contains no newlines. The records contained in this file are fixed
> length. I have written the following code to split this long record
> into 600-byte records and append a newline to each. After executing
> this program, the file size doubles.
>
> For example: a record in this file can be split up into 3 records of
> 600 byte length; hence the original length of this file is 1800 bytes.
>
> size = size of the original file.
>
> while($bytes_read < $size) {
>     my $record;
>     $bytes_read += read(FIN, $record, $record_len, $offset);
>     print "Bytes read # $bytes_read, OFFSET=$offset\n";
>
>     $record .= "\n";
>
>     print FOUT $record;
>     $offset += $record_len;
> }
>
> fclose(FIN);
> fclose(FOUT);

Perl has no fclose function. Please show us your real code.

> Viewing the out file with vi generates the following:
> "a" 3 lines, 3603 characters (1800 null characters)
>
> Where are the extra 1800 bytes coming from? How do I get rid of them?

The 'offset' parameter to read() is an offset into the string, not
into the file. The bytes are read from the file starting wherever the
last read left off. However, the whole thing looks more like C than
Perl.
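
To make the padding concrete, here is a small sketch that uses an in-memory filehandle in place of the real file. The helper name read_records and the sample data are made up for illustration. Reading into a fresh scalar at a growing OFFSET pads that scalar with "\0" bytes, which would account for the 600 + 1200 = 1800 nulls reported above (601 + 1201 + 1801 = 3603 characters); leaving OFFSET out avoids the padding.

```perl
use strict;
use warnings;

# Hypothetical helper: read fixed-size chunks from an in-memory copy of $data.
# When $use_offset is true, it passes a growing OFFSET like the original loop.
sub read_records {
    my ($data, $record_len, $use_offset) = @_;
    open my $in, '<', \$data or die "can't open in-memory file: $!";

    my @records;
    my $offset = 0;
    while (1) {
        my $record = '';    # fresh, empty buffer on every pass
        my $got = $use_offset
            ? read($in, $record, $record_len, $offset)   # OFFSET is into $record
            : read($in, $record, $record_len);           # file position advances by itself
        die "read failed: $!" unless defined $got;
        last if $got == 0;  # end of file
        push @records, $record;
        $offset += $record_len;
    }
    close $in;
    return @records;
}

my @padded = read_records("AAABBBCCC", 3, 1);
my @plain  = read_records("AAABBBCCC", 3, 0);

printf "with OFFSET:    %s\n", join ',', map { length } @padded;   # 3,6,9
printf "without OFFSET: %s\n", join ',', map { length } @plain;    # 3,3,3
```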

Here's how I'd do it (untested):

{
    local $/ = \600; # 600-byte input records
    local $\ = "\n"; # see perldoc perlvar

    open my $IN, ... or die "can't open input: $!";
    open my $OUT, ... or die "can't open output: $!";

    print $OUT $_ while <$IN>;
}
# no need for close() as the filehandles are closed when they go out
# of scope.

or indeed

perl -lpe'BEGIN { $/ = \600 }' < in > out
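
For a version that runs without any files on disk, here is a sketch of the same idea using in-memory filehandles. The name split_fixed and the sample data are illustrative assumptions, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: split $data into records of $record_len bytes
# and return them joined with a newline after each record.
sub split_fixed {
    my ($data, $record_len) = @_;
    my $out = '';
    open my $in_fh,  '<', \$data or die "can't open input: $!";
    open my $out_fh, '>', \$out  or die "can't open output: $!";
    {
        local $/ = \$record_len;   # <$in_fh> returns up to $record_len bytes
        local $\ = "\n";           # print appends a newline to each record
        print {$out_fh} $_ while <$in_fh>;
    }
    close $out_fh or die "can't close output: $!";
    close $in_fh;
    return $out;
}

print split_fixed("AAABBBCCC", 3);   # three lines: AAA, BBB, CCC
```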

Ben

Walter Roberson · Jan 16, 2004

: local $/ = \600; # 600-byte input records

How does that work, Ben? When I look at the documentation for $/
there does not appear to be an option for setting a record size.
And a reference to a scalar looks odd there...

Uri Guttman · Jan 16, 2004

WR> : local $/ = \600; # 600-byte input records

WR> How does that work, Ben? When I look at the documentation for $/
WR> there does not appear to be an option for setting a record size.
WR> And a reference to a scalar looks odd there...

what docs are you looking at? perldoc perlvar says this:

Setting "$/" to a reference to an integer, scalar
containing an integer, or scalar that's convertible
to an integer will attempt to read records instead
of lines, with the maximum record size being the
referenced integer. So this:

$/ = \32768; # or \"32768", or \$var_containing_32768
open(FILE, $myfile);
$_ = <FILE>;

will read a record of no more than 32768 bytes from
FILE. If you're not reading from a record-oriented
file (or your OS doesn't have record-oriented
files), then you'll likely get a full chunk of data
with every read. If a record is larger than the
record size you've set, you'll get the record back
in pieces.

seems to be clearly documented to me.
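
As a small illustration of "maximum record size" in that passage: if the input length is not a multiple of the record size, the last record simply comes back short. This is only a sketch; record_lengths and the 1500-byte sample are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the length of each record read from $data
# when $/ is set to a reference to $max_len.
sub record_lengths {
    my ($data, $max_len) = @_;
    open my $fh, '<', \$data or die "can't open in-memory file: $!";
    local $/ = \$max_len;               # records of at most $max_len bytes
    my @lengths = map { length } <$fh>; # list context reads every record
    close $fh;
    return @lengths;
}

print join(',', record_lengths('x' x 1500, 600)), "\n";   # 600,600,300
```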

uri

gnari · Jan 16, 2004

Walter Roberson said:
> : local $/ = \600; # 600-byte input records
>
> How does that work, Ben? When I look at the documentation for $/
> there does not appear to be an option for setting a record size.

see http://perldoc.com/perl5.8.0/pod/perlvar.html
look for $/, where it says:

Setting $/ to a reference to an integer, scalar containing an integer,
or scalar that's convertible to an integer will attempt to read records
instead of lines, with the maximum record size being the referenced
integer.

gnari
