newbie questions about File IO

Matthew Crema

Hello,

I'm brand new to Perl. I spent the last few weeks reading the "Llama"
and I want to get started now. This question is more for a beginner's
curiosity than an actual application.

Say I have 1000 text files called:
data1.dat
data2.dat
data3.dat
..
..
..
data1000.dat

Each of these files has 32768 integer numbers (between -100 and 100)
like so:
10
-20
-14
..
..
..
90

Say I issue the command:
$ cat *.dat > one_large_file

This takes about 1.5 seconds.

However, the Perl script I wrote takes 28 seconds. Am I doing
something wrong? My guess is that it has something to do with type
conversions behind the scenes.

Thanks.
-Matt

Here's the code:

#!/usr/bin/perl -w

use strict;

my $NFILES = 1000;

for my $i ( 1 .. $NFILES ) {
    my $filename = "data$i.dat";

    open my $fh, '<', $filename or die "Cannot open file: $!";
    while ( my $line = <$fh> ) {
        print $line;
    }
    close $fh;
}
 

Fabian Pilkowski

* Matthew Crema said:
> Say I have 1000 text files called:
> [...]
> Say I issue the command:
> $ cat *.dat > one_large_file
>
> This takes about 1.5 seconds.
>
> However, the Perl script I wrote takes 28 seconds. Am I doing
> something wrong? My guess is that it has something to do with type
> conversions behind the scenes.

Perl's default behavior when reading files with the <> operator is to
read line by line. So your code opens every file and reads each line
separately, just to print it out. Perl has to find every newline in
your files and make a separate print call per line -- that's why your
code is slow.

But it's simple to change this behavior. Try reading your files in
chunks of a few kilobytes, like:

#!/usr/bin/perl -w
use strict;

for my $i ( 1 .. 1000 ) {
    my $file = "data$i.dat";
    open my $fh, '<', $file or die "Cannot open '$file': $!";
    local $/ = \8192;    # read fixed 8 KB records instead of lines
    print $_ while <$fh>;
}

I'm fairly sure cat works in a similar way. The important statement is
the assignment to the special variable »$/«. Have a look at
`perldoc perlvar`, where you can learn more about this variable.
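A related variant of the same trick (just a sketch, I haven't benchmarked it against the chunked read): since each of your files is small -- 32768 short lines is only on the order of 100 KB -- you can also slurp each file whole by undefining »$/«:

```perl
#!/usr/bin/perl -w
use strict;

for my $i ( 1 .. 1000 ) {
    my $file = "data$i.dat";
    open my $fh, '<', $file or die "Cannot open '$file': $!";
    local $/;              # undef: slurp mode, read the whole file at once
    print scalar <$fh>;    # one read and one print per file
}
```

This does one read and one print per file instead of one per 8 KB chunk, at the cost of holding a whole file in memory at a time.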

regards,
fabian
 
M

Matthew Crema

Fabian said:
> But it's simple to change this behavior. Try to read in your files in
> chunks of some kilobytes [...]

Thanks.

Works great.

-Matt
 
