Slurp large files into an array, first is quick, rest are slow



gdtrob

I am slurping a series of large .csv files (6MB) directly into an array
one at a time (then querying). The first time I slurp a file it is
incredibly quick. The second time I do it the slurping is very slow
despite the fact that I close the file (using a filehandle) and undef
the array. Here is the relevant code:

open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open targetfile: $!";
print "opened";
@chrfile = <TARGETFILE>; # slurp the chromosome-specific repeat file into memory
print "slurped";

(and after each loop)

close (TARGETFILE);
undef @chrfile;

If it is possible to quickly/simply fix this I would much rather keep
this method than setting up a line by line input to the array. The
first slurp is very efficient.

I am using ActiveState Perl 5.6 on a Win32 system with 1 GB of RAM.
 

Kevin Collins

I am slurping a series of large .csv files (6MB) directly into an array
one at a time (then querying). The first time I slurp a file it is
incredibly quick. The second time I do it the slurping is very slow
despite the fact that I close the file (using a filehandle) and undef
the array. here is the relevant code:

open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open
                          ^^^^^^^^^^^^^

No need to quote this. It should be either:
open (TARGETFILE,"CanRPT".$chromosome.".csv") || die "can't open
or
open (TARGETFILE,"CanRPT$chromosome.csv") || die "can't open
targetfile: $!";
print "opened";
@chrfile = <TARGETFILE>; # slurp the chromosome-specific repeat file into memory
print "slurped";

(and after each loop)

close (TARGETFILE);

Not that it answers your question, but you should be able to close your file
immediately after slurping it in, rather than after a loop...
undef @chrfile;

If it is possible to quickly/simply fix this I would much rather keep
this method than setting up a line by line input to the array. The
first slurp is very efficient.

I am using activestate perl 5.6 on a win32 system with 1 gig ram:


Kevin
 

Mark Clements

I am slurping a series of large .csv files (6MB) directly into an array
one at a time (then querying). The first time I slurp a file it is
incredibly quick. The second time I do it the slurping is very slow
despite the fact that I close the file (using a filehandle) and undef
the array. here is the relevant code:

open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open targetfile: $!";
print "opened";
@chrfile = <TARGETFILE>; # slurp the chromosome-specific repeat file into memory
print "slurped";

(and after each loop)

close (TARGETFILE);
undef @chrfile;

If it is possible to quickly/simply fix this I would much rather keep
this method than setting up a line by line input to the array. The
first slurp is very efficient.

I am using activestate perl 5.6 on a win32 system with 1 gig ram:

I'd argue you'd be better off processing one line at a time, but anyway...

You need more detailed timing data: you are assuming that the extra time
is being spent in the slurp, but you have no timing data to prove this.

Use something like

Benchmark::Timer

to provide a detailed breakdown of where the time is being spent. You
may be surprised. It would be an idea to display file size and number of
lines at the same time.
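A minimal sketch of that timing advice, using only the core Time::HiRes module (Benchmark::Timer from CPAN would give a richer per-tag breakdown). The file name and CSV layout here are invented for illustration; the real name would be "CanRPT$chromosome.csv":

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Demo data so the sketch runs stand-alone.
my $filename = 'demo.csv';
open my $out, '>', $filename or die "can't create $filename: $!";
print {$out} "chr1,$_,repeat\n" for 1 .. 1000;
close $out;

# Time just the slurp, and report size and line count alongside it.
my $t0 = [gettimeofday];
open my $fh, '<', $filename or die "can't open $filename: $!";
my @chrfile = <$fh>;
close $fh;
my $elapsed = tv_interval($t0);

printf "slurped %d lines (%d bytes) in %.4fs\n",
    scalar @chrfile, -s $filename, $elapsed;
```

Wrapping each suspect stage (slurp, query, cleanup) in its own timer like this shows where the time actually goes.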

Running with

use strict;
use warnings;

will save you a lot of heartache. Also, it is now recommended to use
lexically scoped filehandles:

open my $fh, "<", $filename
or die "could not open $filename for read: $!";

You may also want to check out one of the CSV parsing modules available,
e.g.

DBD::CSV
Text::CSV_XS
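A minimal sketch of what using one of those modules looks like, assuming Text::CSV_XS is installed from CPAN. The file name and the three-column layout are invented for illustration:

```perl
use strict;
use warnings;
use Text::CSV_XS;   # CPAN module; Text::CSV is a pure-Perl alternative

# Demo data so the sketch runs stand-alone; note the quoted field
# containing commas, which a naive split(/,/) would mangle.
my $filename = 'demo.csv';
open my $out, '>', $filename or die "can't create $filename: $!";
print {$out} qq{chr1,$_,"repeat,with,commas"\n} for 1 .. 3;
close $out;

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
open my $fh, '<', $filename or die "can't open $filename: $!";
my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;   # each $row is an arrayref of parsed fields
}
close $fh;

print scalar @rows, " rows, ", scalar @{ $rows[0] }, " fields each\n";
```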

Mark
 

A. Sinan Unur

(e-mail address removed) wrote in
I am slurping a series of large .csv files (6MB) directly into an
array one at a time (then querying). The first time I slurp a file it
is incredibly quick. The second time I do it the slurping is very slow
despite the fact that I close the file (using a filehandle) and undef
the array. here is the relevant code:

open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open targetfile: $!";
print "opened";
@chrfile = <TARGETFILE>; # slurp the chromosome-specific repeat file into memory
print "slurped";

(and after each loop)

close (TARGETFILE);
undef @chrfile;

Here is what the loop body would look like if I were writing this:

{
my $name = sprintf 'CanRPT%s.csv', $chromosome;
open my $target, '<', $name
or die "Cannot open '$name': $!";
my @chrfile = <$target>;

# do something with @chrfile
}
If it is possible to quickly/simply fix this I would much rather keep
this method than setting up a line by line input to the array. The
first slurp is very efficient.

I am using activestate perl 5.6 on a win32 system with 1 gig ram:

I am assuming the problem has to do with your coding style. You don't
seem to be using lexicals effectively, and the fact that you are
repeatedly slurping is a red flag.

Can't you read each file once (slurped or line-by-line), build the
data structure it represents, and then use that data structure for
further processing?
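A rough sketch of that read-once idea: cache each chromosome's lines in a hash the first time they are requested, so repeated queries against the same chromosome never re-slurp the file. The CanRPT naming follows the original post; the demo data is invented:

```perl
use strict;
use warnings;

my %repeats;    # chromosome => arrayref of lines, filled lazily

sub repeats_for {
    my ($chromosome) = @_;
    return $repeats{$chromosome} ||= do {
        my $name = "CanRPT$chromosome.csv";
        open my $fh, '<', $name or die "Cannot open '$name': $!";
        [ <$fh> ];   # $fh closes when it goes out of scope
    };
}

# Demo: the second call returns the cached arrayref without touching disk.
open my $out, '>', 'CanRPT1.csv' or die "can't create demo file: $!";
print {$out} "chr1,$_,repeat\n" for 1 .. 5;
close $out;

my $first  = repeats_for(1);
my $second = repeats_for(1);
print "cached\n" if $first == $second;   # same reference both times
```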

It is impossible to tell without having seen the program, but the
constant slurping might be causing memory fragmentation and therefore
excessive pagefile hits. Dunno, really.

Sinan
--
 

Smegal

Thanks everyone,

I thought this might be a simple slurp usage problem, i.e. freeing up
memory or something, because the program runs; it's just really slow
after the first slurp. But I wasn't able to find anything by searching
Google. I'll look into improving my coding as suggested and see if
the problem persists.

Grant
 

Eric J. Roode

my $name = sprintf 'CanRPT%s.csv', $chromosome;

OOC, why use sprintf here instead of

my $name = "CanRPT$chromosome.csv";

?

--
Eric
`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`
 

Big and Blue

undef @chrfile;

Why bother? You are about to replace this with the read of the next
file. Undeffing it means you throw away all of the memory allocation you
have, just for Perl to reassign it all, and this may lead to heap memory
fragmentation.
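A sketch of that suggestion: drop the undef and simply reassign into the same array, letting perl recycle the existing allocation between files. The demo files and the two-chromosome list are invented for illustration:

```perl
use strict;
use warnings;

# Demo files so the sketch runs stand-alone; the real names would be
# CanRPT1.csv, CanRPT2.csv, ... per chromosome.
for my $c (1, 2) {
    open my $out, '>', "CanRPT$c.csv" or die "can't create demo file: $!";
    print {$out} "chr$c,$_,repeat\n" for 1 .. 10 * $c;
    close $out;
}

# Plain reassignment overwrites @chrfile in place each pass, so the
# memory is reused rather than freed and regrown as undef would force.
my @chrfile;
for my $chromosome (1, 2) {
    my $name = "CanRPT$chromosome.csv";
    open my $fh, '<', $name or die "can't open $name: $!";
    @chrfile = <$fh>;
    close $fh;
    print "$name: ", scalar @chrfile, " lines\n";
    # ... query @chrfile here ...
}
```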
 
