Parsing long files by using read

gavs

Hi,

I am fairly new to Perl and need to split a fairly large file that
contains no newlines. The records contained in this file are fixed
length. I have written the following code to split this long record
into 600-byte records, appending a newline to each. After executing
this program, the file size doubles.

For example: a record in this file can be split up into 3 records of
600 bytes each; hence the original length of this file is 1800 bytes.

$size = size of the original file.

while($bytes_read < $size) {
    my $record;
    $bytes_read += read(FIN, $record, $record_len, $offset);
    print "Bytes read # $bytes_read, OFFSET=$offset\n";

    $record .= "\n";

    print FOUT $record;
    $offset += $record_len;
}

fclose(FIN);
fclose(FOUT);

Viewing the out file with vi generates the following:
"a" 3 lines, 3603 characters (1800 null characters)

Where are the extra 1800 bytes coming from? How do I get rid of them?

Thanks.
gavs
 
Ben Morrow

> I am fairly new to Perl and need to split a fairly large file that
> contains no newlines. The records contained in this file are fixed
> length. I have written the following code to split this long record
> into 600-byte records, appending a newline to each. After executing
> this program, the file size doubles.
>
> For example: a record in this file can be split up into 3 records of
> 600 bytes each; hence the original length of this file is 1800 bytes.
>
> $size = size of the original file.
>
> while($bytes_read < $size) {
>     my $record;
>     $bytes_read += read(FIN, $record, $record_len, $offset);
>     print "Bytes read # $bytes_read, OFFSET=$offset\n";
>
>     $record .= "\n";
>
>     print FOUT $record;
>     $offset += $record_len;
> }
>
> fclose(FIN);
> fclose(FOUT);

Perl has no fclose function. Please show us your real code.

> Viewing the out file with vi generates the following:
> "a" 3 lines, 3603 characters (1800 null characters)
>
> Where are the extra 1800 bytes coming from? How do I get rid of them?

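The 'offset' parameter to read() is an offset into the string, not
into the file. The bytes are read from the file starting wherever the
last read left off. When the offset is past the end of the string,
the gap is padded with null bytes, which is where your extra 1800
bytes come from. However, the whole thing looks more like C than
Perl.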
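
To make the padding visible, here is a minimal sketch that repeats the original loop on a tiny input. The 6-byte string, the 2-byte record length, and the in-memory filehandle are assumptions chosen only to keep the example self-contained; they stand in for the real file and its 600-byte records.

```perl
use strict;
use warnings;

# Hypothetical 6-byte input standing in for the real file.
my $data = "AABBCC";
open my $in, '<', \$data or die "can't open in-memory input: $!";

my $record_len = 2;   # stands in for the 600-byte record length
my $offset     = 0;
my @lengths;

for (1 .. 3) {
    my $record;
    # OFFSET says where in $record the data is placed; anything
    # before that position is filled with "\0" bytes.
    read($in, $record, $record_len, $offset);
    push @lengths, length $record;
    $offset += $record_len;
}

print "@lengths\n";   # prints "2 4 6"
```

The three records carry 0, 2 and 4 bytes of padding. Scaled up to 600-byte records that is 0 + 600 + 1200 = 1800 null bytes, which matches the figure vi reports.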

Here's how I'd do it (untested):

{
    local $/ = \600; # 600-byte input records
    local $\ = "\n"; # see perldoc perlvar

    open my $IN, ... or die "can't open input: $!";
    open my $OUT, ... or die "can't open output: $!";

    print $OUT $_ while <$IN>;
}
# no need for close() as the filehandles are closed when they go out
# of scope.

or indeed

perl -lpe'BEGIN { $/ = \600 }' < in > out
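
For comparison, a read()-based loop closer to the original code might look like the sketch below. It simply drops the OFFSET argument so that each read() overwrites the buffer. The in-memory handles, the sample data, and the 4-byte record length are assumptions made so the example runs on its own; with real files you would open them by name and use 600.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real input and output files.
my $input      = "AAAABBBBCCCC";
my $output     = '';
my $record_len = 4;   # stands in for 600

open my $in,  '<', \$input  or die "can't open input: $!";
open my $out, '>', \$output or die "can't open output: $!";
binmode $in;
binmode $out;

my $record;
while (1) {
    # Without OFFSET, read() replaces the contents of $record.
    # It returns the byte count, 0 at end of file, undef on error.
    my $got = read($in, $record, $record_len);
    die "read failed: $!" unless defined $got;
    last if $got == 0;
    print {$out} $record, "\n";
}

close $out or die "can't close output: $!";
print $output;
```

There is no need to track the file size or a running byte count: the loop ends when read() reports end of file.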

Ben
 
Walter Roberson

: local $/ = \600; # 600-byte input records

How does that work, Ben? When I look at the documentation for $/
there does not appear to be an option for setting a record size.
And a reference to a scalar looks odd there...
 
Uri Guttman

WR> In article <[email protected]>,
WR> : local $/ = \600; # 600-byte input records

WR> How does that work, Ben? When I look at the documentation for $/
WR> there does not appear to be an option for setting a record size.
WR> And a reference to a scalar looks odd there...

what docs are you looking at? perldoc perlvar says this:

Setting "$/" to a reference to an integer, scalar
containing an integer, or scalar that's convertible
to an integer will attempt to read records instead
of lines, with the maximum record size being the
referenced integer. So this:

$/ = \32768; # or \"32768", or \$var_containing_32768
open(FILE, $myfile);
$_ = <FILE>;

will read a record of no more than 32768 bytes from
FILE. If you're not reading from a record-oriented
file (or your OS doesn't have record-oriented
files), then you'll likely get a full chunk of data
with every read. If a record is larger than the
record size you've set, you'll get the record back
in pieces.


seems to be clearly documented to me.
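
A small sketch of the documented behaviour, assuming a 10-byte in-memory string and a 4-byte record size (both made up for illustration): each readline returns at most 4 bytes, and the final record is simply shorter.

```perl
use strict;
use warnings;

my $data = "AAAABBBBCC";   # hypothetical 10-byte input
open my $fh, '<', \$data or die "can't open in-memory input: $!";

my @records;
{
    local $/ = \4;         # each <$fh> returns at most 4 bytes
    @records = <$fh>;      # list context: read all records
}

print join('|', @records), "\n";   # prints "AAAA|BBBB|CC"
```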

uri
 
gnari

Walter Roberson said:
> : local $/ = \600; # 600-byte input records
>
> How does that work, Ben? When I look at the documentation for $/
> there does not appear to be an option for setting a record size.

see http://perldoc.com/perl5.8.0/pod/perlvar.html
look for $/, where it says:

Setting $/ to a reference to an integer, scalar containing an integer,
or scalar that's convertible to an integer will attempt to read records
instead of lines, with the maximum record size being the referenced
integer.

gnari
 
