Speeding up an application - general rules


Petyr David

I have a small Perl application that searches through a series of
directories chosen by the user for files containing a pattern or group
of patterns. The file names and matching patterns are returned to the
user sorted by the file's modification time. The user also has the
choice of how far back in time to search and how many lines of output
he wants to see for each file.

With files growing in both number and size, the application is
bogging down a bit. I didn't design it with performance in mind and I
will be reviewing what I've done, but are there general rules or
specific suggestions you could offer to enhance performance?

Basically: the script uses Perl's system command to run a long-winded
"find" command which is piped to sed to correct patterns that match
HTML markers. The matching lines are then shoved into an array. The
elements of the array are moved into a hash for the purpose of sorting
the file names. Then file names and matching lines are printed.

Q: Can I speed things up by eliminating the sed command and letting Perl
filter and modify the matching patterns? If so, how much of a
performance gain?

Is using Perl's grep to search through every file for the pattern
faster than using the find command? The find command has the advantage
that I can search for files of a certain date rather easily. Again:
could that be done more rapidly by Perl's looking at the file's mod
time?

Any thoughts or suggestions would be appreciated

TX
 

Eric Schwartz

Petyr David said:
Basically: the script uses Perl's system command to run a long-winded
"find" command which is piped to sed to correct patterns that match
HTML markers.

You are unclear here, which is why we generally ask you to post
example code. In fact, it's really kinda hard to say anything for
sure because you didn't. I'm not sure, for instance, if you pipe the
output of find to sed, or if you iterate over the list of files
returned by find and run sed on the contents of those files. I'm
guessing the former, but it's just a guess. If you want people to be
able to help you the best way possible, you probably don't want to
make them guess.
The matching lines are then shoved into an array.

Which lines? Are you talking about contents of the files, or names of
files? Now I think you're talking about contents. It would help if
you were more clear.
The elements of the array are moved into a hash for the purpose of
sorting the file names.

Er, now I think you're talking about file names.
Then file names and matching lines are printed.

Now I have no idea. What are you actually doing? Can you please show
some code?
Q: Can I speed things up by eliminating the sed command and letting Perl
filter and modify the matching patterns? If so, how much of a
performance gain?

Honestly, rather than asking us, you should ask Perl. The answer to
"how do I speed things up?" is profile profile profile! Until you
profile, you don't know what will help.

'perldoc -q profile' mentions the Devel::DProf module, and you can use
'perldoc Devel::DProf' to find out more about it. You'll also want to
learn about the Benchmark module ('perldoc Benchmark'), which will
help you compare two different ways of doing the same thing to find
out which is faster.
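As a concrete sketch of the Benchmark module Eric mentions, cmpthese runs each candidate sub for a fixed amount of CPU time and prints a rate-comparison table. The data and both candidate subs below are invented purely to show the interface; substitute your own two ways of doing the job:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up workload: 1000 lines containing an HTML marker.
my @lines = map { "line $_ <b>marker</b>" } 1 .. 1000;

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    substitute => sub {
        my @out = @lines;
        s/<b>/&lt;b&gt;/g for @out;    # rewrite the marker on every line
    },
    match_only => sub {
        my @out = grep { /<b>/ } @lines;   # just select matching lines
    },
});
```

Profiling the real script is similarly low-effort: `perl -d:DProf script.pl` writes a tmon.out file, and running `dprofpp` in the same directory summarizes where the time went.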
Is using Perl's grep to search through every file for the pattern
faster than using the find command?

Wait, are you on Windows? It's been a very long time, but I vaguely
recall that the Windows 'find' command searches in files, whereas the
Unix one mostly just looks at file names and metadata.
The find command has the advantage that I can search for files of a
certain date rather easily. Again: could that be done more rapidly
by Perl's looking at the file's mod time?

Those questions really depend on such a large number of things,
including your system's OS, configuration, load from other tasks, etc.,
that it's almost impossible for anyone to tell you for certain.
Honestly, even if somebody were to give you an answer here, I wouldn't
believe them-- they may be telling you what worked for them, but it
might not be the same for you. Profile, then optimize the
worst-performing part, then profile again, optimize what's left, and
repeat. Take care that in optimizing one part you don't make another
slower-- but that's all part of the art, really.
Any thoughts or suggestions would be appreciated

Enjoy. But next time, please post some code, so we can actually tell
what you're doing. Making people guess and make stuff up is
frustrating for us, because we can't tell if we're guessing right, or
going completely off the deep end. I hope I was helpful anyway.

-=Eric
 

xhoster

Petyr David said:
Basically: the script uses Perl's system command to run a long-winded
"find" command which is piped to sed to correct patterns that match
HTML markers. The matching lines are then shoved into an array. The
elements of the array are moved into a hash for the purpose of sorting
the file names. Then file names and matching lines are printed.

Q: Can I speed things up by eliminating the sed command and letting Perl
filter and modify the matching patterns?

Probably not. It should be a 30 second job to take out the sed pipe.
Sure, the answers will now be wrong, but unless it gives the wrong answers
much faster than it used to, you will know there is no speed benefit to be
had by rewriting the sed into Perl.
If so, how much of a
performance gain?

Is using Perl's grep to search through every file for the pattern
faster than using the find command?

Probably not. Also, Perl's grep (currently) forces the list to be
evaluated to completion (in memory) before it gets started, so potentially
takes much more memory. You may want to look at Perl's File::Find,
although I see no particular reason to think it will be faster than the
system's find.
The find command has the advantage
that I can search for files of a certain date rather easily. Again:
could that be done more rapidly by Perl's looking at the file's mod
time?

Probably not more rapidly, no.
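For what it's worth, a pure-Perl version of the whole pipeline, using File::Find for the traversal, the -M file test for the age check, and a substitution in place of sed, might look roughly like this. Every directory name, pattern, and replacement is a placeholder, and the snippet builds its own scratch directory so it runs standalone:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Demo setup: one scratch file with a matching line. In the real app
# @dirs, $days, and the patterns would come from the user's request.
my $demo = tempdir(CLEANUP => 1);
open my $out, '>', "$demo/sample.html" or die $!;
print $out "keep this somepattern line\nskip this line\n";
close $out;

my @dirs    = ($demo);
my $days    = 7;                     # how far back to look
my $pattern = qr/somepattern/;

my %hits;                            # filename => [matching lines]
find(sub {
    return unless -f $_;             # plain files only
    return unless -M $_ < $days;     # modified within $days days
    open my $fh, '<', $_ or return;
    while (my $line = <$fh>) {
        next unless $line =~ $pattern;
        $line =~ s/somepattern/diffpattern/g;   # the old sed step
        push @{ $hits{$File::Find::name} }, $line;
    }
    close $fh;
}, @dirs);

# Sort file names by modification time, newest first, then print.
for my $file (sort { -M $a <=> -M $b } keys %hits) {
    print $file, "\n", @{ $hits{$file} };
}
```

Whether this beats the external find depends on the system, which is exactly why it's worth putting both versions through Benchmark rather than guessing.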

What is the total CPU usage? What is the relative usage of each process
(perl, find, sed)?

Xho
 

Petyr David

I haven't yet done any measuring of the CPU usage for the processes,
but will look into that -TX. I just heard yesterday (the day of my
post) that the application was bogging down. When I do my testing, I'm
working with live, production data, but typically limit my search to
one of three patterns and do it on only one or two directories. I want
to get my results back quickly. The users of this app apparently make
heavy use of this and are looking for the "needle in a haystack".
 

Petyr David

I will also review those URLs. Creating an app that indexed the files
never came up, since this script grew out of a far simpler one that
merely found files matching a single pattern and printed a link to
the file. I also don't have the time to make this a full-time job.
Something was needed quick and dirty and that's what they got : -)

TX
 

Ric

Petyr said:
I will also review those URLS. Creating an app that did indexing of the
files did not come up as this script came from a far simpler one that
merely found files matching the single pattern and printed a link to
the file. I also don't have the time to make this a full time job.
Something was needed quick and dirty and that's what they got : -)

I just took a quick look at your problem description, not sure what your
needs are, but have you considered using a desktop search engine to do
the work for you?

http://beagle-project.org/Searching_Data
 

Petyr David

You're correct: I use Perl's backticks to take output from a command
that looks similar to this to populate an array:

my @filepatterns = `find $subdir -mtime -$days -type f -exec egrep $pattern {}
\; | sed "s/somepattern/diffpattern/"`;

I like using the find command because I can also control how many days
to go back in my search. I will also check Devel::DProf
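If the goal is only to drop the sed process while keeping find, the substitution can move into Perl right after the backticks. $subdir, $days, and both patterns below are stand-ins for the script's real variables, and the snippet makes its own scratch directory so it can run on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Demo setup: a scratch directory with one matching file.
my $subdir = tempdir(CLEANUP => 1);
open my $fh, '>', "$subdir/demo.txt" or die $!;
print $fh "a somepattern b\n";
close $fh;

my $days    = 7;
my $pattern = 'somepattern';

# Same find + egrep as before, but sed's job is now a Perl
# substitution, saving one external process and one pass over the data.
my @filepatterns =
    `find $subdir -mtime -$days -type f -exec egrep '$pattern' {} +`;
s/somepattern/diffpattern/g for @filepatterns;
```

As a side note, `-exec egrep '$pattern' {} +` (with `+` instead of `\;`) batches many files into each egrep invocation instead of spawning one process per file, which by itself can cut the overhead considerably on large trees.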
 

Petyr David

This little app is web-based and is going against files on a Red Hat
server's NFS file system. I suppose I could use Samba ...
 
