Speeding up my script

Discussion in 'Perl Misc' started by Petyr David, Feb 22, 2008.

  1. Petyr David

    Petyr David Guest

    I have a web page calling a Perl script that searches for patterns in
    20,000+ files and returns links to the files and the lines matching
    the pattern. I use a call to `find` and `egrep`.

    Q: The script works, but it is straining under the load - the files
    run into the gigabytes. How can I speed up the process? How simple
    would it be to employ threads or to split off new processes?

    I know I should RTFM (LOL) and I will, but I'm just looking for some
    quick guidance/suggestions.

    pseudo code:

    chdir to the root of the document directory

    load @dirnames with the names of the subdirectories

    foreach my $subdir (@dirnames) {
        chdir $subdir;
        # lots of if statements to figure out which find command
        # and which options to use
        my @temp_array = `$long_find_grep_command`;
        push @big_array, @temp_array;
        # other processing
    }

    What I'd like to do is be able to search more than one subdirectory
    at the same time.

    TX for your help -
     
    Petyr David, Feb 22, 2008
    #1

  2. smallpond

    smallpond Guest

    On Feb 22, 1:38 pm, Petyr David <> wrote:
    > I have a web page calling a Perl script that searches for patterns in
    > 20,000+ files and returns links to the files and the lines matching
    > the pattern. I use a call to `find` and `egrep`.
    >
    > Q: The script works, but it is straining under the load - the files
    > run into the gigabytes. How can I speed up the process? How simple
    > would it be to employ threads or to split off new processes?
    >
    > I know I should RTFM (LOL) and I will, but I'm just looking for some
    > quick guidance/suggestions.
    >
    > pseudo code:
    >
    > chdir to the root of the document directory
    >
    > load @dirnames with the names of the subdirectories
    >
    > foreach my $subdir (@dirnames) {
    >     chdir $subdir;
    >     # lots of if statements to figure out which find command
    >     # and which options to use
    >     my @temp_array = `$long_find_grep_command`;
    >     push @big_array, @temp_array;
    >     # other processing
    > }
    >
    > What I'd like to do is be able to search more than one subdirectory
    > at the same time.
    >
    > TX for your help -


    Your idea is only likely to help if the directories reside on
    different disks; otherwise it will slow down the search by thrashing
    the disks.

    Better would be to analyze the types of requests. Maybe there are
    common searches you can cache. For example, a search for
    /the magic words are squeamish ossifrage/ need only be performed
    on files known to contain the word "ossifrage".
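
    A minimal sketch of that filtering idea, assuming a pre-built cache
    file that maps each word to the files containing it (the cache path,
    its one-line-per-word format, and the word-picking heuristic are all
    made up for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical cache, rebuilt nightly by a separate pass:
    # one line per word, "word<TAB>file1 file2 file3 ..."
    my $cache_file = '/var/cache/docsearch/word_to_files.txt';

    my $pattern = shift @ARGV or die "usage: $0 pattern\n";

    # Pick a literal word out of the pattern to use as the filter
    # (crudely: the longest run of word characters).
    my ($filter_word) = sort { length $b <=> length $a } $pattern =~ /(\w+)/g;

    my @candidates;
    if (defined $filter_word) {
        open my $cache, '<', $cache_file or die "open $cache_file: $!";
        while (my $line = <$cache>) {
            chomp $line;
            my ($word, $files) = split /\t/, $line, 2;
            if (lc $word eq lc $filter_word) {
                @candidates = split ' ', $files;
                last;
            }
        }
        close $cache;
    }

    # egrep only the candidate files instead of everything under find.
    # (No shell quoting here for brevity; a real script should escape $pattern.)
    if (@candidates) {
        print `egrep -n -- '$pattern' @candidates /dev/null`;
    } else {
        warn "no cached word matched; fall back to the full find/egrep\n";
    }

    How much this saves depends on how selective the chosen word is: a
    rare word like "ossifrage" cuts the candidate list to a handful of
    files, while a very common one buys almost nothing.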
     
    smallpond, Feb 22, 2008
    #2

  3. J. Gleixner

    J. Gleixner Guest

    Petyr David wrote:
    > I have a web page calling a Perl script that searches for patterns in
    > 20,000+ files and returns links to the files and the lines matching
    > the pattern. I use a call to `find` and `egrep`.
    >
    > Q: The script works, but it is straining under the load - the files
    > run into the gigabytes. How can I speed up the process? How simple
    > would it be to employ threads or to split off new processes?
    >
    > I know I should RTFM (LOL) and I will, but I'm just looking for some
    > quick guidance/suggestions.


    No need to LOL at your laziness.

    Using find/grep on thousands of files and Gb of data is a poor
    choice. Try looking at various indexing tools: htdig, glimpse,
    Swish-e, etc.
     
    J. Gleixner, Feb 22, 2008
    #3
  4. Petyr David

    Petyr David Guest

    On Feb 22, 2:48 pm, "J. Gleixner" <>
    wrote:
    > Petyr David wrote:
    > > I have a web page calling a Perl script that searches for patterns in
    > > 20,000+ files and returns links to the files and the lines matching
    > > the pattern. I use a call to `find` and `egrep`.
    > >
    > > Q: The script works, but it is straining under the load - the files
    > > run into the gigabytes. How can I speed up the process? How simple
    > > would it be to employ threads or to split off new processes?
    > >
    > > I know I should RTFM (LOL) and I will, but I'm just looking for some
    > > quick guidance/suggestions.
    >
    > No need to LOL at your laziness.
    >
    > Using find/grep on thousands of files and Gb of data is a poor
    > choice. Try looking at various indexing tools: htdig, glimpse,
    > Swish-e, etc.


    Agreed, but it was my first project in Perl. It started out as a very,
    very simple file searcher, and then a bunch of people asked if anyone
    knew of file search software that could be implemented quickly.

    I meekly raised my hand. Since then a lot of options have been added,
    and I believe I should either take this to the next step, using one of
    the indexing tools mentioned, or leave it as is. I have plenty of other
    things to do. It's just that I like programming. My other
    responsibilities pay me plenty, but they are boring and almost clerical
    in nature.

    TX to all for the help
     
    Petyr David, Feb 23, 2008
    #4
  5. Jamie

    Jamie Guest

    In <>,
    Petyr David <> mentions:
    >I have a web page calling a Perl script that searches for patterns in
    >20,000+ files and returns links to the files and the lines matching
    >the pattern. I use a call to `find` and `egrep`.


    That is going to take a long, long time.

    >Q: The script works, but it is straining under the load - the files
    >run into the gigabytes. How can I speed up the process? How simple
    >would it be to employ threads or to split off new processes?


    That's an option. Check into File::Find, fork() and pipes. You could
    create some pipes, fork several processes, do a select on the handles,
    and run the commands in parallel.

    This will still run awfully slow, though.
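
    A rough sketch of that fork-and-pipes approach, assuming one child per
    subdirectory (the pattern, the directory list, and the exact find/egrep
    pipeline are placeholders; GNU find/xargs options are assumed):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Select;

    # Placeholders: in the real script these come from the CGI parameters
    # and the existing if/else logic that builds the find command.
    my $pattern = 'some pattern';
    my @subdirs = qw(dir1 dir2 dir3 dir4);

    my $sel = IO::Select->new;

    # Fork one child per subdirectory; each child runs find/egrep and its
    # output comes back to the parent through a pipe.
    for my $subdir (@subdirs) {
        open my $fh, '-|',
            "find $subdir -type f -print0 | xargs -0 egrep -n -- '$pattern' /dev/null"
            or die "can't fork for $subdir: $!";
        $sel->add($fh);
    }

    # Multiplex the pipes with select and collect everything into one array.
    my @big_array;
    while ($sel->count) {
        for my $fh ($sel->can_read) {
            # Buffered readline after select is a simplification; a fully
            # robust version would sysread into per-handle buffers.
            my $line = <$fh>;
            if (defined $line) {
                push @big_array, $line;
            }
            else {                       # undef means EOF: child is finished
                $sel->remove($fh);
                close $fh;
            }
        }
    }

    print @big_array;

    As smallpond noted, this mostly pays off when the subdirectories sit
    on different disks; on a single spindle the children just thrash each
    other.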

    >What I'd like to do is be able to search more than one subdirectory
    >at the same time.


    If you don't need full regex capability, you could check into indices. If
    you know one of the words, you can use that to filter out which documents
    to scan.

    If you can get the words sorted, look into Search::Dict (or use a tied hash).

    The best bet is to use an index, though. Even if it's crude: a substantial
    amount of your time is probably spent opening and closing files (well, with
    find/grep anyway).

    An example of a "crude index" is the whatis database.

    When you type 'apropos keyword' you're not opening a zillion manpages and
    scanning them.
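
    A minimal sketch of such a crude index, under the assumption that a cron
    job writes one sorted "word<TAB>file" line per distinct word per file, and
    that per-request lookups then use Search::Dict's look() to binary-search
    the sorted file instead of scanning it (the paths and the three-letter
    word cutoff are invented for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;
    use Search::Dict;

    my $doc_root   = '/srv/documents';                       # hypothetical
    my $index_file = '/var/cache/docsearch/word_index.txt';  # hypothetical

    # Build pass: run from cron, not per request.
    sub build_index {
        my %seen;    # dedupes "word<TAB>file" pairs
        find(sub {
            return unless -f $_;
            open my $in, '<', $_ or return;
            while (my $line = <$in>) {
                while ($line =~ /(\w{3,})/g) {
                    $seen{ lc($1) . "\t" . $File::Find::name } = 1;
                }
            }
            close $in;
        }, $doc_root);
        open my $out, '>', $index_file or die "open $index_file: $!";
        print {$out} "$_\n" for sort keys %seen;
        close $out;
    }

    # Lookup pass: per request, returns the files known to contain $word.
    sub files_containing {
        my ($word) = @_;
        open my $idx, '<', $index_file or die "open $index_file: $!";
        look($idx, lc($word) . "\t", 0, 0);   # seek to the first matching line
        my @files;
        while (my $line = <$idx>) {
            my ($w, $file) = split /\t/, $line, 2;
            last if $w ne lc $word;           # sorted, so stop at the first miss
            chomp $file;
            push @files, $file;
        }
        close $idx;
        return @files;
    }

    That's essentially what the whatis database buys apropos: one sorted
    flat file, a cheap lookup, and the expensive pattern match only ever
    runs over the handful of files the index returns.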

    Jamie
    --
    http://www.geniegate.com Custom web programming
    Perl * Java * UNIX User Management Solutions
     
    Jamie, Feb 23, 2008
    #5
  6. Petyr David

    Petyr David Guest

    On Feb 23, 3:07 am, (Jamie) wrote:
    > In <>,
    > Petyr David <> mentions:
    >
    > >I have a web page calling a Perl script that searches for patterns in
    > >20,000+ files and returns links to the files and the lines matching
    > >the pattern. I use a call to `find` and `egrep`.
    >
    > That is going to take a long, long time.
    >
    > >Q: The script works, but it is straining under the load - the files
    > >run into the gigabytes. How can I speed up the process? How simple
    > >would it be to employ threads or to split off new processes?
    >
    > That's an option. Check into File::Find, fork() and pipes. You could
    > create some pipes, fork several processes, do a select on the handles,
    > and run the commands in parallel.
    >
    > This will still run awfully slow, though.
    >
    > >What I'd like to do is be able to search more than one subdirectory
    > >at the same time.
    >
    > If you don't need full regex capability, you could check into indices. If
    > you know one of the words, you can use that to filter out which documents
    > to scan.
    >
    > If you can get the words sorted, look into Search::Dict (or use a tied hash).
    >
    > The best bet is to use an index, though. Even if it's crude: a substantial
    > amount of your time is probably spent opening and closing files (well, with
    > find/grep anyway).
    >
    > An example of a "crude index" is the whatis database.
    >
    > When you type 'apropos keyword' you're not opening a zillion manpages and
    > scanning them.
    >
    > Jamie
    > --
    > http://www.geniegate.com Custom web programming
    > Perl * Java * UNIX User Management Solutions


    > If you don't need full regex capability, you could check into indices. If
    > you know one of the words, you can use that to filter out which documents
    > to scan.

    But I do. I've considered it, and I will install Swish-e. Would I not be
    able to use regexes with something like Swish-e?
     
    Petyr David, Feb 25, 2008
    #6
