Slurp large files into an array, first is quick, rest are slow

Discussion in 'Perl Misc' started by gdtrob@gmail.com, Dec 28, 2005.

  1. Guest

I am slurping a series of large .csv files (6 MB) directly into an array
one at a time (then querying). The first time I slurp a file it is
incredibly quick. The second time I do it, the slurping is very slow,
despite the fact that I close the file (using a filehandle) and undef
the array. Here is the relevant code:

open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open targetfile: $!";
print "opened";
@chrfile = <TARGETFILE>; # slurp the chromosome-specific repeat file into memory
print "slurped";

    (and after each loop)

    close (TARGETFILE);
    undef @chrfile;

If it is possible to fix this quickly and simply, I would much rather
keep this method than set up line-by-line input to the array. The
first slurp is very efficient.

I am using ActiveState Perl 5.6 on a Win32 system with 1 GB of RAM.
gdtrob@gmail.com, Dec 28, 2005
    #1

2. In article <>, gdtrob@gmail.com wrote:
    > I am slurping a series of large .csv files (6MB) directly into an array
    > one at a time (then querying). The first time I slurp a file it is
    > incredibly quick. The second time I do it the slurping is very slow
    > despite the fact that I close the file (using a filehandle) and undef
    > the array. here is the relevant code:
    >
    > open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open
                                ^^^^^^^^^^^^^

No need to quote this. It should either be:

    open (TARGETFILE,"CanRPT".$chromosome.".csv") || die "can't open targetfile: $!";

or

    open (TARGETFILE,"CanRPT$chromosome.csv") || die "can't open targetfile: $!";

    > targetfile: $!";
    > print "opened";
    > @chrfile = <TARGETFILE>; #slurp the chromosome-specific repeat file
    > into memory
    > print "slurped";
    >
    > (and after each loop)
    >
    > close (TARGETFILE);


    Not that it answers your question, but you should be able to close your file
    immediately after slurping it in, rather than after a loop...
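
    For instance, a sketch in the OP's own style (only the placement of
    the close changes; $chromosome is assumed to be set as in the
    original loop):

        open (TARGETFILE,"CanRPT$chromosome.csv") || die "can't open targetfile: $!";
        @chrfile = <TARGETFILE>; # slurp
        close (TARGETFILE); # the lines are in memory now; the handle isn't needed
        # ... query @chrfile here ...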

    > undef @chrfile;
    >
    > If it is possible to quickly/simply fix this I would much rather keep
    > this method than setting up a line by line input to the array. The
    > first slurp is very efficient.
    >
    > I am using activestate perl 5.6 on a win32 system with 1 gig ram:



    Kevin


    --
    Unix Guy Consulting, LLC
    Unix and Linux Automation, Shell, Perl and CGI scripting
    http://www.unix-guy.com
    Kevin Collins, Dec 28, 2005
    #2

3. gdtrob@gmail.com wrote:
    > I am slurping a series of large .csv files (6MB) directly into an array
    > one at a time (then querying). The first time I slurp a file it is
    > incredibly quick. The second time I do it the slurping is very slow
    > despite the fact that I close the file (using a filehandle) and undef
    > the array. here is the relevant code:
    >
    > open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open
    > targetfile: $!";
    > print "opened";
    > @chrfile = <TARGETFILE>; #slurp the chromosome-specific repeat file
    > into memory
    > print "slurped";
    >
    > (and after each loop)
    >
    > close (TARGETFILE);
    > undef @chrfile;
    >
    > If it is possible to quickly/simply fix this I would much rather keep
    > this method than setting up a line by line input to the array. The
    > first slurp is very efficient.
    >
    > I am using activestate perl 5.6 on a win32 system with 1 gig ram:
    >


    I'd argue you'd be better off processing one line at a time, but anyway...

    You need more detailed timing data: you are assuming that the extra time
    is being spent in the slurp, but you have no timing data to prove this.

Use something like Benchmark::Timer to provide a detailed breakdown of
where the time is being spent. You may be surprised. It would also be a
good idea to display the file size and number of lines at the same time.
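
    For instance, a rough sketch of what that instrumentation might look
    like (@chromosomes, the tag names, and the query placeholder are my
    own assumptions, not the OP's code):

        use strict;
        use warnings;
        use Benchmark::Timer;

        my $t = Benchmark::Timer->new();
        my @chromosomes = (1 .. 22, 'X', 'Y'); # assumed; the real list isn't shown

        for my $chromosome (@chromosomes) {
            my $name = "CanRPT$chromosome.csv";

            $t->start('slurp');
            open my $fh, "<", $name or die "could not open $name for read: $!";
            my @chrfile = <$fh>;
            close $fh;
            $t->stop('slurp');

            printf "%s: %d bytes, %d lines\n", $name, -s $name, scalar @chrfile;

            $t->start('query');
            # ... whatever querying is done on @chrfile goes here ...
            $t->stop('query');
        }

        print $t->report; # per-tag timing breakdown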

    Running with

    use strict;
    use warnings;

    will save you a lot of heartache. Also, it is now recommended to use
    lexically scoped filehandles:

open my $fh, "<", $filename
    or die "could not open $filename for read: $!";

You may also want to check out one of the CSV parsing modules available,
e.g.

    DBD::CSV
    Text::CSV_XS
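
    For instance, a minimal Text::CSV_XS sketch (the filename and the
    column layout are invented for illustration; the OP has not shown
    the file format):

        use strict;
        use warnings;
        use Text::CSV_XS;

        my $filename = "CanRPT1.csv"; # assumed example name

        my $csv = Text::CSV_XS->new({ binary => 1 })
            or die "Cannot construct CSV parser: " . Text::CSV_XS->error_diag;

        open my $fh, "<", $filename
            or die "could not open $filename for read: $!";
        while (my $row = $csv->getline($fh)) {
            # $row is an array reference, one element per field
            my ($start, $end, $repeat) = @$row; # hypothetical column layout
            # ... process one record ...
        }
        close $fh;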

    Mark
    Mark Clements, Dec 28, 2005
    #3
4. gdtrob@gmail.com wrote in news::

    > I am slurping a series of large .csv files (6MB) directly into an
    > array one at a time (then querying). The first time I slurp a file it
    > is incredibly quick. The second time I do it the slurping is very slow
    > despite the fact that I close the file (using a filehandle) and undef
    > the array. here is the relevant code:
    >
    > open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open
    > targetfile: $!";
    > print "opened";
    > @chrfile = <TARGETFILE>; #slurp the chromosome-specific repeat file
    > into memory
    > print "slurped";
    >
    > (and after each loop)
    >
    > close (TARGETFILE);
    > undef @chrfile;


    Here is what the loop body would look like if I were writing this:

{
    my $name = sprintf 'CanRPT%s.csv', $chromosome;
    open my $target, $name
        or die "Cannot open '$name': $!";
    my @chrfile = <$target>;

    # do something with @chrfile
}

    > If it is possible to quickly/simply fix this I would much rather keep
    > this method than setting up a line by line input to the array. The
    > first slurp is very efficient.
    >
    > I am using activestate perl 5.6 on a win32 system with 1 gig ram:


    I am assuming the problem has to do with your coding style. You don't
    seem to be using lexicals effectively, and the fact that you are
    repeatedly slurping is a red flag.

Can't you read each file once (slurped or line-by-line), build the
data structure it represents, and then use that data structure for
further processing?
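
    For instance, something along these lines (just a sketch:
    @chromosomes and the hash layout are assumptions, and the right
    structure really depends on the queries, which we have not seen):

        my %repeats; # chromosome => array ref of lines; each file read once

        for my $chromosome (@chromosomes) { # @chromosomes is assumed
            my $name = sprintf 'CanRPT%s.csv', $chromosome;
            open my $target, $name
                or die "Cannot open '$name': $!";
            $repeats{$chromosome} = [ <$target> ];
        }

        # all later processing works on %repeats; no file is read twice

    Whether everything fits comfortably in memory at once is a separate
    question, of course.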

    It is impossible to tell without having seen the program, but the
    constant slurping might be causing memory fragmentation and therefore
    excessive pagefile hits. Dunno, really.

    Sinan
--
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Dec 28, 2005
    #4
  5. Larry Guest

gdtrob@gmail.com wrote:

    > I am using activestate perl 5.6 on a win32 system with 1 gig ram:


    You may want to consider upgrading... 5.8 has been out for several
    years.
    Larry, Dec 28, 2005
    #5
  6. Smegal Guest

    Thanks everyone,

I thought this might be a simple slurp usage problem, i.e. freeing up
memory or something, because the program runs; it's just really slow
after the first slurp. But I wasn't able to find anything by searching
Google. I'll look into improving my coding as suggested and see if
the problem persists.

    Grant
    Smegal, Dec 28, 2005
    #6
  7. "A. Sinan Unur" <> wrote in
    news:Xns973A9C0195EA7asu1cornelledu@127.0.0.1:

    > my $name = sprintf 'CanRPT%s.csv', $chromosome;


Out of curiosity, why use sprintf here instead of

    my $name = "CanRPT$chromosome.csv";

    ?

    --
    Eric
    `$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
    $!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
    $_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
    ;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`
    Eric J. Roode, Dec 29, 2005
    #7
  8. Big and Blue Guest

gdtrob@gmail.com wrote:

    > undef @chrfile;


Why bother? You are about to replace this with the read of the next
file. Undefing the array means you throw away all of the memory
allocation you have, just for Perl to reallocate it all. That may lead
to heap memory fragmentation.
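
    In other words, something like this sketch (the loop shape and
    @chromosomes are assumed; the point is a single @chrfile reused
    across iterations):

        my @chrfile; # declared once, outside the loop

        for my $chromosome (@chromosomes) { # @chromosomes is assumed
            open my $fh, '<', "CanRPT$chromosome.csv"
                or die "can't open CanRPT$chromosome.csv: $!";
            @chrfile = <$fh>; # the new contents replace the old in place,
            close $fh;        # letting perl reuse the array's allocation
            # ... query @chrfile here ...
        }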

    --
    Just because I've written it doesn't mean that
    either you or I have to believe it.
    Big and Blue, Dec 30, 2005
    #8
