Speeding up my script

Discussion in 'Perl Misc' started by Petyr David, Feb 22, 2008.

  1. Petyr David

    Petyr David Guest

    I have a web page calling a Perl script that searches for patterns in
    20,000+ files and returns links to the files and the lines matching
    the pattern. I use a call to `find` and `egrep`.

    Q: The script works, but it is straining under the load - the files
    run into the gigabytes. How can I speed up the process? How simple
    would it be to employ threads or to split off new processes?

    I know I should RTFM (LOL) and I will, but I'm just looking for some
    quick guidance/suggestions.

    pseudo code:

    chdir to the root of the document directory

    load @dirnames with the names of the subdirectories

    foreach my $subdir (@dirnames) {
        chdir $subdir;
        # lots of if statements to figure out which find command
        # and which options to use
        my @temp_array = `$long_find_grep_command`;
        push @big_array, @temp_array;
        # other processing
    }

    What I'd like to do is be able to search more than one subdirectory
    at the same time.

    TX for your help -
     
    Petyr David, Feb 22, 2008
    #1

  2. smallpond

    smallpond Guest

    On Feb 22, 1:38 pm, Petyr David <> wrote:
    > I have a web page calling a Perl script that searches for patterns in
    > 20,000+ files and returns links to the files and the lines matching
    > the pattern. I use a call to `find` and `egrep`.
    >
    > Q: The script works, but it is straining under the load - the files
    > run into the gigabytes. How can I speed up the process? How simple
    > would it be to employ threads or to split off new processes?
    >
    > I know I should RTFM (LOL) and I will, but I'm just looking for some
    > quick guidance/suggestions.
    >
    > pseudo code:
    >
    > chdir to the root of the document directory
    >
    > load @dirnames with the names of the subdirectories
    >
    > foreach my $subdir (@dirnames) {
    >     chdir $subdir;
    >     # lots of if statements to figure out which find command
    >     # and which options to use
    >     my @temp_array = `$long_find_grep_command`;
    >     push @big_array, @temp_array;
    >     # other processing
    > }
    >
    > What I'd like to do is be able to search more than one subdirectory
    > at the same time.
    >
    > TX for your help -


    Your idea is only likely to help if the directories reside on
    different disks; otherwise it will slow down the search by thrashing
    the disks.

    Better would be to analyze the types of requests. Maybe there are
    common searches you can cache. For example, a search for
    /the magic words are squeamish ossifrage/ need only be performed
    on files known to contain the word "ossifrage".
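
    A minimal sketch of that filtering idea, assuming a pre-built cache
    file that maps each word to the files containing it (the cache path,
    its one-line-per-word format, and the word-picking heuristic are all
    made up for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical cache, rebuilt nightly by a separate pass:
    # one line per word, "word<TAB>file1 file2 file3 ..."
    my $cache_file = '/var/cache/docsearch/word_to_files.txt';

    my $pattern = shift @ARGV or die "usage: $0 pattern\n";

    # Pick a literal word out of the pattern to use as the filter
    # (crudely: the longest run of word characters).
    my ($filter_word) = sort { length $b <=> length $a } $pattern =~ /(\w+)/g;

    my @candidates;
    if (defined $filter_word) {
        open my $cache, '<', $cache_file or die "open $cache_file: $!";
        while (my $line = <$cache>) {
            chomp $line;
            my ($word, $files) = split /\t/, $line, 2;
            if (lc $word eq lc $filter_word) {
                @candidates = split ' ', $files;
                last;
            }
        }
        close $cache;
    }

    # egrep only the candidate files instead of everything under find.
    # (No shell quoting here for brevity; a real script should escape $pattern.)
    if (@candidates) {
        print `egrep -n -- '$pattern' @candidates /dev/null`;
    } else {
        warn "no cached word matched; fall back to the full find/egrep\n";
    }

    How much this saves depends on how selective the chosen word is: a
    rare word like "ossifrage" cuts the candidate list to a handful of
    files, while a very common one buys almost nothing.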
     
    smallpond, Feb 22, 2008
    #2

  3. J. Gleixner

    J. Gleixner Guest

    Petyr David wrote:
    > I have a web page calling a Perl script that searches for patterns in
    > 20,000+ files and returns links to the files and the lines matching
    > the pattern. I use a call to `find` and `egrep`.
    >
    > Q: The script works, but it is straining under the load - the files
    > run into the gigabytes. How can I speed up the process? How simple
    > would it be to employ threads or to split off new processes?
    >
    > I know I should RTFM (LOL) and I will, but I'm just looking for some
    > quick guidance/suggestions.


    No need to LOL at your laziness.

    Using find/grep on thousands of files and Gb of data is a poor
    choice. Try looking at various indexing tools: htdig, glimpse,
    Swish-e, etc.
     
    J. Gleixner, Feb 22, 2008
    #3
  4. Petyr David

    Petyr David Guest

    On Feb 22, 2:48 pm, "J. Gleixner" <>
    wrote:
    > Petyr David wrote:
    > > I have a web page calling a Perl script that searches for patterns in
    > > 20,000+ files and returns links to the files and the lines matching
    > > the pattern. I use a call to `find` and `egrep`.
    > >
    > > Q: The script works, but it is straining under the load - the files
    > > run into the gigabytes. How can I speed up the process? How simple
    > > would it be to employ threads or to split off new processes?
    > >
    > > I know I should RTFM (LOL) and I will, but I'm just looking for some
    > > quick guidance/suggestions.
    >
    > No need to LOL at your laziness.
    >
    > Using find/grep on thousands of files and Gb of data is a poor
    > choice. Try looking at various indexing tools: htdig, glimpse,
    > Swish-e, etc.


    Agreed, but it was my first project in Perl. It started out as a very,
    very simple file searcher, and then a bunch of people asked if anyone
    knew of file search software that could be implemented quickly.

    I meekly raised my hand. Since then a lot of options have been added,
    and I believe I should either take this to the next step, using one of
    the indexing tools mentioned, or leave it as is. I have plenty of other
    things to do. It's just that I like programming. My other
    responsibilities pay me plenty, but they are boring and almost clerical
    in nature.

    TX to all for the help
     
    Petyr David, Feb 23, 2008
    #4
  5. Jamie

    Jamie Guest

    In <>,
    Petyr David <> mentions:
    >I have a web page calling a Perl script that searches for patterns in
    >20,000+ files and returns links to the files and the lines matching
    >the pattern. I use a call to `find` and `egrep`.


    That is going to take a long, long time.

    >Q: The script works, but it is straining under the load - the files
    >run into the gigabytes. How can I speed up the process? How simple
    >would it be to employ threads or to split off new processes?


    That's an option. Check into File::Find, fork() and pipes. You could
    create some pipes, fork several processes, do a select on the handles,
    and run the commands in parallel.

    This will still run awfully slow, though.
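
    A rough sketch of that fork-and-pipes approach, assuming one child per
    subdirectory (the pattern, the directory list, and the exact find/egrep
    pipeline are placeholders; GNU find/xargs options are assumed):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Select;

    # Placeholders: in the real script these come from the CGI parameters
    # and the existing if/else logic that builds the find command.
    my $pattern = 'some pattern';
    my @subdirs = qw(dir1 dir2 dir3 dir4);

    my $sel = IO::Select->new;

    # Fork one child per subdirectory; each child runs find/egrep and its
    # output comes back to the parent through a pipe.
    for my $subdir (@subdirs) {
        open my $fh, '-|',
            "find $subdir -type f -print0 | xargs -0 egrep -n -- '$pattern' /dev/null"
            or die "can't fork for $subdir: $!";
        $sel->add($fh);
    }

    # Multiplex the pipes with select and collect everything into one array.
    my @big_array;
    while ($sel->count) {
        for my $fh ($sel->can_read) {
            # Buffered readline after select is a simplification; a fully
            # robust version would sysread into per-handle buffers.
            my $line = <$fh>;
            if (defined $line) {
                push @big_array, $line;
            }
            else {                       # undef means EOF: child is finished
                $sel->remove($fh);
                close $fh;
            }
        }
    }

    print @big_array;

    As smallpond noted, this mostly pays off when the subdirectories sit
    on different disks; on a single spindle the children just thrash each
    other.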

    >What I'd like to do is be able to search more than one subdirectory
    >at the same time.


    If you don't need full regex capability, you could check into indices. If
    you know one of the words, you can use that to filter out which documents
    to scan.

    If you can get the words sorted, look into Search::Dict (or use a tied hash).

    The best bet is to use an index, though. Even if it's crude: a substantial
    amount of your time is probably spent opening and closing files (well, with
    find/grep anyway).

    An example of a "crude index" is the whatis database.

    When you type 'apropos keyword' you're not opening a zillion manpages and
    scanning them.
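
    A minimal sketch of such a crude index, under the assumption that a cron
    job writes one sorted "word<TAB>file" line per distinct word per file, and
    that per-request lookups then use Search::Dict's look() to binary-search
    the sorted file instead of scanning it (the paths and the three-letter
    word cutoff are invented for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;
    use Search::Dict;

    my $doc_root   = '/srv/documents';                       # hypothetical
    my $index_file = '/var/cache/docsearch/word_index.txt';  # hypothetical

    # Build pass: run from cron, not per request.
    sub build_index {
        my %seen;    # dedupes "word<TAB>file" pairs
        find(sub {
            return unless -f $_;
            open my $in, '<', $_ or return;
            while (my $line = <$in>) {
                while ($line =~ /(\w{3,})/g) {
                    $seen{ lc($1) . "\t" . $File::Find::name } = 1;
                }
            }
            close $in;
        }, $doc_root);
        open my $out, '>', $index_file or die "open $index_file: $!";
        print {$out} "$_\n" for sort keys %seen;
        close $out;
    }

    # Lookup pass: per request, returns the files known to contain $word.
    sub files_containing {
        my ($word) = @_;
        open my $idx, '<', $index_file or die "open $index_file: $!";
        look($idx, lc($word) . "\t", 0, 0);   # seek to the first matching line
        my @files;
        while (my $line = <$idx>) {
            my ($w, $file) = split /\t/, $line, 2;
            last if $w ne lc $word;           # sorted, so stop at the first miss
            chomp $file;
            push @files, $file;
        }
        close $idx;
        return @files;
    }

    That's essentially what the whatis database buys apropos: one sorted
    flat file, a cheap lookup, and the expensive pattern match only ever
    runs over the handful of files the index returns.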

    Jamie
    --
    http://www.geniegate.com Custom web programming
    Perl * Java * UNIX User Management Solutions
     
    Jamie, Feb 23, 2008
    #5
  6. Petyr David

    Petyr David Guest

    On Feb 23, 3:07 am, (Jamie) wrote:
    > In <>,
    > Petyr David <> mentions:
    >
    > >I have a web page calling a Perl script that searches for patterns in
    > >20,000+ files and returns links to the files and the lines matching
    > >the pattern. I use a call to `find` and `egrep`.
    >
    > That is going to take a long, long time.
    >
    > >Q: The script works, but it is straining under the load - the files
    > >run into the gigabytes. How can I speed up the process? How simple
    > >would it be to employ threads or to split off new processes?
    >
    > That's an option. Check into File::Find, fork() and pipes. You could
    > create some pipes, fork several processes, do a select on the handles,
    > and run the commands in parallel.
    >
    > This will still run awfully slow, though.
    >
    > >What I'd like to do is be able to search more than one subdirectory
    > >at the same time.
    >
    > If you don't need full regex capability, you could check into indices. If
    > you know one of the words, you can use that to filter out which documents
    > to scan.
    >
    > If you can get the words sorted, look into Search::Dict (or use a tied hash).
    >
    > The best bet is to use an index, though. Even if it's crude: a substantial
    > amount of your time is probably spent opening and closing files (well, with
    > find/grep anyway).
    >
    > An example of a "crude index" is the whatis database.
    >
    > When you type 'apropos keyword' you're not opening a zillion manpages and
    > scanning them.
    >
    > Jamie
    > --
    > http://www.geniegate.com Custom web programming
    > Perl * Java * UNIX User Management Solutions


    > If you don't need full regex capability, you could check into indices. If
    > you know one of the words, you can use that to filter out which documents
    > to scan.

    But I do. I've considered it, and I will install Swish-e. Would I not be
    able to use regexes with something like Swish-e?
     
    Petyr David, Feb 25, 2008
    #6
