Speeding my script

Petyr David

I have a web page calling a Perl script that searches for patterns in
20,000+ files and returns links to the files and the lines matching the
pattern. I use a call to `find` and `egrep`.

Q: The script works, but it is straining under the load: the files run
to gigabytes. How can I speed up the process? How simple would it be to
employ threads or split off new processes?

I know I should RTFM (LOL) and I will, but I'm just looking for some
quick guidance/suggestions.

Pseudocode:

cd root of document directory

load array with names of directories

foreach subdir in @dirnames
    cd $subdir
    lots of if statements to figure out which find command and which
        options to use
    @temp_array = `$long_find_grep_command`
    push @temp_array onto big array
    other processing
end foreach

What I'd like to do is search more than one subdirectory
simultaneously.

TX for your help -
 
smallpond

Your idea is only likely to help if the directories reside on
different disks; otherwise it will slow down the search by thrashing
the disks.

Better would be to analyze the type of requests. Maybe there
are common searches you can cache. For example, a search for
/the magic words are squeamish ossifrage/ need only be performed
on files known to contain the common word "ossifrage".
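
As a rough illustration of the caching idea, here is a minimal sketch in Perl. The names `slow_search`, `cached_search`, `%cache` and `$searches` are made up for this example, and `slow_search` only greps an in-memory list as a stand-in for the real find/egrep call. It also assumes one long-running process; a plain CGI script would have to keep the cache on disk, and the cache would need clearing whenever the files change.

```perl
use strict;
use warnings;

my %cache;          # pattern string => array ref of matching lines
my $searches = 0;   # counts how often the expensive search actually runs

# Stand-in for the expensive find/egrep call (hypothetical): it greps an
# in-memory list of lines so that the sketch stays self-contained.
sub slow_search {
    my ($pattern, $lines) = @_;
    $searches++;
    return [ grep { /$pattern/ } @$lines ];
}

# Return cached results when the same pattern has been searched before.
sub cached_search {
    my ($pattern, $lines) = @_;
    $cache{$pattern} = slow_search($pattern, $lines)
        unless exists $cache{$pattern};
    return $cache{$pattern};
}

my @lines = ('the magic words', 'are squeamish ossifrage', 'nothing here');
my $first  = cached_search('ossifrage', \@lines);   # runs the search
my $second = cached_search('ossifrage', \@lines);   # served from the cache
print "$searches search(es), ", scalar(@$first), " hit(s)\n";
```

The second call never reaches `slow_search`, which is the whole saving: a popular query costs one scan instead of one scan per request.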
 
J. Gleixner

Petyr said:
I know I should RTFM (LOL) and I will, but I'm just looking for some
quick guidance/suggestions.

No need to LOL at your laziness.

Using find/grep on thousands of files and Gb of data is a poor
choice. Try looking at various indexing tools: htdig, glimpse,
Swish-e, etc.
 
Petyr David

Agreed, but it was my first project in Perl. It started out as a very,
very simple file searcher, and then a bunch of people asked if anyone
knew of file search software that could be implemented quickly.

I meekly raised my hand. Since then a lot of options have been added,
and I do believe that I either take this to the next step, using one of
the indexing tools mentioned, or I leave it "as is". I have plenty of
other things to do. It's just that I like programming. My other
responsibilities pay me plenty, but are boring and almost clerical in
nature.

TX to all for the help
 
Jamie

Petyr said:
have a web page calling PERL script that searches for patterns in
20,000+ files and returns link to files and lines found matching
pattern. I use a call to `find` and `egrep`

That is going to take a long, long time.

Petyr said:
Q: Script works - but is straining under the load - files are in the
Gbs.
How to speed process? How simple to employ threads or splitting off
new processes?

That's an option. Check into File::Find, fork() and pipes. You could
create some pipes, fork several processes, do a select on the handles
and run the commands in parallel.

This will still run awfully slowly, though.
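
A minimal sketch of that fork-and-pipes approach, assuming a Unix-like system. The function name `parallel_search` and the `$make_cmd` callback are invented for this example; the list form of a piped `open` does the pipe creation and the fork in one step, and IO::Select plays the role of the select on the handles. Results come back in completion order, not directory order, and nothing here limits the number of children, so a real script would probably want a cap.

```perl
use strict;
use warnings;
use IO::Select;

# Run one command per directory in parallel and collect the output lines.
# $make_cmd is a hypothetical callback returning the command (as a list)
# to run for a given directory, e.g. the find/egrep invocation.
sub parallel_search {
    my ($dirs, $make_cmd) = @_;
    my $select = IO::Select->new;
    my %buffer;                      # stringified handle => output so far

    for my $dir (@$dirs) {
        # A piped open forks a child and connects its stdout to $fh.
        my $pid = open(my $fh, '-|', $make_cmd->($dir));
        die "cannot start search in $dir: $!" unless defined $pid;
        $select->add($fh);
        $buffer{$fh} = '';
    }

    my @results;
    while ($select->count) {
        for my $fh ($select->can_read) {
            my $n = sysread($fh, my $chunk, 65536);
            if ($n) {
                $buffer{$fh} .= $chunk;          # more output from this child
            }
            else {
                # End of file (or read error): this child is done.
                push @results, split /\n/, delete $buffer{$fh};
                $select->remove($fh);
                close $fh;                       # also reaps the child
            }
        }
    }
    return @results;
}
```

A call might look like `parallel_search(\@dirnames, sub { ('egrep', '-rn', $pattern, $_[0]) })`, where `@dirnames` and `$pattern` are whatever the surrounding script already has.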

Petyr said:
what I'd like to do is to be able to simultaneously be searching more
than 1 subdirectory

If you don't need full regex capability, you could check into indices. If you
know one of the words, you can use that to filter out which documents to scan.
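
To make the filtering idea concrete, here is a minimal sketch of a word index used as a prefilter. `build_index` and `search_with_index` are names invented for this example, and the documents are held in a hash of file name to text purely to keep it self-contained; a real indexer would read the files from disk and store the index between runs.

```perl
use strict;
use warnings;

# Build a crude inverted index: lowercased word => { file name => 1 }.
# $docs is a hash ref mapping a file name to that file's text.
sub build_index {
    my ($docs) = @_;
    my %index;
    for my $file (keys %$docs) {
        $index{lc $1}{$file} = 1 while $docs->{$file} =~ /(\w+)/g;
    }
    return \%index;
}

# Use one known word to pick candidate files, then run the full regex
# only against those candidates.
sub search_with_index {
    my ($index, $docs, $word, $regex) = @_;
    my @candidates = sort keys %{ $index->{lc $word} || {} };
    return grep { $docs->{$_} =~ $regex } @candidates;
}

my %docs = (
    'a.txt' => 'The magic words are squeamish ossifrage',
    'b.txt' => 'nothing of interest here',
    'c.txt' => 'Ossifrage is another name for the bearded vulture',
);
my $index = build_index(\%docs);
my @found = search_with_index($index, \%docs, 'ossifrage',
                              qr/squeamish\s+ossifrage/);
print "@found\n";
```

Only the two files containing "ossifrage" are ever matched against the regex, so full regex capability is kept while the bulk of the files is skipped.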

If you can get the words sorted, look into Search::Dict (or, use a tied hash)
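
A minimal sketch of the Search::Dict route. It assumes a hypothetical index file, sorted in plain byte order, with one "word<TAB>files" entry per line; that file format and the function name `lines_with_prefix` are inventions for this example. `look` does a binary search on the file and leaves the read position at the first line that sorts at or after the key.

```perl
use strict;
use warnings;
use Search::Dict;

# Return every line of a sorted index file that begins with $prefix.
# Assumes the file at $path is sorted in byte order, one entry per line.
sub lines_with_prefix {
    my ($path, $prefix) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";

    # Position the handle at the first line >= $prefix.
    look($fh, $prefix);

    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        last unless index($line, $prefix) == 0;   # past the matching run
        push @hits, $line;
    }
    close $fh;
    return @hits;
}
```

Looking up `"ossifrage\t"` would then return the one entry for that word without reading the rest of the file.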

Best bet is to use an index though. Even if it's crude, a substantial amount
of your time is probably spent opening and closing files. (well, find/grep anyway)

An example of a "crude index" is the whatis database.

When you type 'apropos keyword' you're not opening a zillion manpages and
scanning them.

Jamie
 
Petyr David

Jamie said:
If you don't need full regex capability, you could check into indices. If you
know one of the words, you can use that to filter out which documents to scan.

But I do. I've considered Swish-e and will install it. Would I not be
able to use regexes with something like Swish-e?
 
