newbie questions about File IO

Matthew Crema

Hello,

I'm brand new to Perl. I spent the last few weeks reading the "Llama"
and I want to get started now. This question is more for a beginner's
curiosity than an actual application.

Say I have 1000 text files called:
data1.dat
data2.dat
data3.dat
..
..
..
data1000.dat

Each of these files has 32768 integer numbers (between -100 and 100)
like so:
10
-20
-14
..
..
..
90

Say I issue the command:
$ cat *.dat > one_large_file

This takes about 1.5 seconds.

However, the Perl script I wrote takes 28 seconds. Am I doing
something wrong? My guess is that it has something to do with type
conversions behind the scenes.

Thanks.
-Matt

Here's the code:

#!/usr/bin/perl -w

use strict;

my $NFILES = 1000;

for my $i ( 1 .. $NFILES ) {
    my $filename = "data$i.dat";

    open my $fh, '<', $filename or die "Cannot open file: $!";
    while ( my $line = <$fh> ) {
        print $line;
    }
    close $fh;
}
 

Fabian Pilkowski

* Matthew Crema said:
> Say I have 1000 text files called:
> [...]
> Say I issue the command:
> $ cat *.dat > one_large_file
>
> This takes about 1.5 seconds.
>
> However, the Perl script I wrote takes 28 seconds. Am I doing
> something wrong? My guess is that it has something to do with type
> conversions behind the scenes.

Perl's default behavior when reading files with the <> operator is to
read line by line. So your code opens every file and reads each line
separately, just to print it out. Perl has to find every newline in
your files and make a separate print call per line -- that's why your
code is slow.

But it's simple to change this behavior. Try reading your files in
chunks of a few kilobytes, like:

#!/usr/bin/perl -w
use strict;

for my $i ( 1 .. 1000 ) {
    my $file = "data$i.dat";
    open my $fh, '<', $file or die "Cannot open '$file': $!";
    local $/ = \8192;    # read fixed 8 KB records instead of lines
    print $_ while <$fh>;
}

I'm fairly sure cat works in a similar way. The important statement is
the assignment to the special variable »$/«. Have a look at
`perldoc perlvar`, where you can learn more about this variable.
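A related variant of the same trick (just a sketch, I haven't benchmarked it against the chunked read): since each of your files is small -- 32768 short lines is only on the order of 100 KB -- you can also slurp each file whole by undefining »$/«:

```perl
#!/usr/bin/perl -w
use strict;

for my $i ( 1 .. 1000 ) {
    my $file = "data$i.dat";
    open my $fh, '<', $file or die "Cannot open '$file': $!";
    local $/;              # undef: slurp mode, read the whole file at once
    print scalar <$fh>;    # one read and one print per file
}
```

This does one read and one print per file instead of one per 8 KB chunk, at the cost of holding a whole file in memory at a time.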

regards,
fabian
 
M

Matthew Crema

Fabian said:
> But it's simple to change this behavior. Try to read in your files in
> chunks of some kilobytes [...]

Thanks.

Works great.

-Matt
 
