Script "terminates" when processing large numbers of files

Scott Stark

Hi, I'm running a script that reads through a large number of HTML
files (1500-2000 or so in each of about 20 directories), searching for
strings in the files.

For some reason the script quits midway through, and I get a
"Terminated" message. It quits while checking a batch of files at a
different point in the file system every time, so I know it's not a
code error. In fact if I limit the total number of files processed to
a couple of hundred, the script runs fine.

Is this some kind of memory or other resource problem? I've tried
breaking up each directory pass into separate subroutine calls, and
have even broken up the individual directory lists so that they're
processed in smaller batches of 300 each, thinking that might free up
resources. Something like this:

foreach my $d (@dirs) {
    my @files = glob("$basedir/$d/*.html $basedir/$d/*.htm");
    # hand the files to search_files() in batches of at most 300
    while (my @shortList = splice(@files, 0, 300)) {
        search_files(@shortList);
    }
}

sub search_files {
    my @files = @_;
    # ... search through each file
}
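For comparison, here is a self-contained sketch of the same batching that lists each directory with opendir/readdir instead of glob, which is the alternative Tim Heaney suggests later in the thread. The helper names list_html_files and in_batches are hypothetical, and unlike the glob above, the extension test here is case-insensitive.

```perl
use strict;
use warnings;

# Hypothetical helper: return the .html/.htm files in $dir (case-insensitively),
# read with opendir/readdir instead of glob. Sorted so runs are repeatable.
sub list_html_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    my @files = map  { "$dir/$_" }
                grep { /\.html?\z/i && -f "$dir/$_" }
                readdir($dh);
    closedir($dh);
    return sort @files;
}

# Hypothetical helper: call $callback once per batch of at most $size files.
sub in_batches {
    my ($size, $callback, @files) = @_;
    while (my @batch = splice(@files, 0, $size)) {
        $callback->(@batch);
    }
}

# Usage, with @dirs, $basedir and search_files() as in the script above:
#   foreach my $d (@dirs) {
#       in_batches(300, \&search_files, list_html_files("$basedir/$d"));
#   }
```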

I've run the script under perl -d and with #! /usr/bin/perl -w;
neither reports any errors, and I get the same result, just at
different points in the file system.

Any thoughts? If it's a memory problem, is there some way to free up
memory?

thanks,
Scott

Scott Stark

Tim Heaney said:
Perhaps the glob is hitting the expansion limit. Try reading the
directory yourself...something like

Hi Tim, well that didn't work either. I've done some further testing
and discovered that the "termination" is happening not in the glob (or
read) but in the search_files() subroutine, always (as far as I can
gather) after it's closed one file in the @files list and before it
opens the next.

Here's an abbreviated version of the search_files subroutine that's
called for each directory:

use CGI qw(param);   # assuming param() comes from CGI.pm

sub search_files {
    my @files = @_;
    # split ' ' ignores leading whitespace, so no empty search term
    my @searchStrings = split(' ', param('terms'));
    foreach my $f (@files) {
        open(my $fh, '<', $f) or on_error("Can't open file $f for reading: $!");
        while (my $line = <$fh>) {
            foreach my $s (@searchStrings) {
                # \Q...\E matches the term literally; s///gi returns the
                # number of replacements, so this is true only on a match
                if ($line =~ s{(\Q$s\E)}{<font color="blue"><b>$1</b></font>}gi) {
                    $found{$s} = $line;
                }
            }
        }
        close($fh);
    }
}
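A standalone sketch of the same search that can be tried outside the CGI script. It assumes the terms arrive as one whitespace-separated string, as param('terms') supplies them; the name search_files_for and the use of die in place of on_error are hypothetical. Each term's pattern is compiled once with qr// and quoted with \Q...\E, so a term such as c++ is searched for literally instead of being rejected as an invalid regex.

```perl
use strict;
use warnings;

# Hypothetical standalone version of search_files: terms are passed in
# instead of read from param('terms'), and results are returned instead
# of stored in a global %found.
sub search_files_for {
    my ($terms, @files) = @_;
    # split ' ' skips leading whitespace, so no empty term sneaks in;
    # each pattern is compiled once, with regex metacharacters quoted
    my @searches = map { [ $_, qr/(\Q$_\E)/i ] } split ' ', $terms;
    my %found;
    foreach my $f (@files) {
        open(my $fh, '<', $f) or die "Can't open file $f for reading: $!";
        while (my $line = <$fh>) {
            foreach my $s (@searches) {
                my ($term, $re) = @$s;
                # Note: a later term can still match inside markup added by
                # an earlier one (e.g. the term "font"), as in the original.
                if ($line =~ s{$re}{<font color="blue"><b>$1</b></font>}g) {
                    $found{$term} = $line;   # keeps the last matching line
                }
            }
        }
        close($fh);
    }
    return \%found;
}
```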

Not much unusual is going on here. Perhaps Gregory is right and
there's a time limit? The whole run never takes more than a couple of
minutes, though, and the point where it stops varies every time.
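One way to test the time-limit idea is to catch the signal. A shell prints "Terminated" when a process is killed with SIGTERM, so a handler can report how long the script had been running and which file it was on. If the elapsed time is about the same on every run even though the stopping file differs, an external time limit is a likely cause. This is a sketch: term_report and $current_file are hypothetical names, the message goes to STDERR (the server's error log, for a CGI script), and nothing is logged if the process is killed with SIGKILL, which cannot be caught.

```perl
use strict;
use warnings;

my $started = time;              # when the run began
our $current_file = '(none)';    # hypothetical: set by the loop for each file

# Build the message logged when the process is told to terminate.
sub term_report {
    my $elapsed = time - $started;
    return "Caught SIGTERM after ${elapsed}s while processing $current_file\n";
}

# Log elapsed time and position on SIGTERM, then exit.
$SIG{TERM} = sub {
    print STDERR term_report();
    exit 1;
};

# Usage inside the search loop:
#   foreach my $f (@files) { $current_file = $f; ... }
```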

thanks
Scott