perl vs Unix grep

Discussion in 'Perl' started by Al Belden, Jul 3, 2004.

  1. Al Belden

    Al Belden Guest

    Hi all,
    I've been working on a problem that I thought might be of interest: I'm
    trying to replace some Korn shell scripts that search source code files with
    perl scripts to gain certain features such as:

    More powerful regular expressions available in perl
    Ability to print out lines before and after matches (GNU grep supports this
    but is not available on our Digital Unix and AIX platforms)
    Make searches case insensitive by default (yes, I know this can be done with
    grep, but the shell scripts that use grep don't do this)

    We're talking about approx. 5000 files spread over 15 directories. To date
    it has proven quite difficult (for me) to match the performance of the Korn
    shell scripts using perl scripts and still obtain the line number and
    context information needed. The crux of the problem is that I have seen the
    best performance from perl when I match with the /g option on a string that
    represents the current slurped file:

    local $/;                # undefine $/ so the next read slurps the whole file
    my $curStr = <FH>;
    while ($curStr =~ /$compiledRegex/g)
    {
        # write matches to file for eventual paging
    }

    This works well except that when each match is found I need the line number
    the match has been found in. As far as I can tell from reading and research
    there is no variable that holds this information as I am not reading from
    the file at this point. I can get the information in other ways such as:

    1. Reading each file a line at a time, testing for a match and keeping a
    line counter or using $. ($NR under "use English").
    2. Reading the file into an array and processing it a line at a time.
    3. Creating index files for the source files that store line offsets and
    using them with the slurp method in the paragraph above.
    4. Creating an in-memory index for each file that contains a match and using
    it for subsequent matches in that file.
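A minimal sketch of how option 4 might look, assuming the file is already slurped into a string. The names build_line_index and line_for_offset and the sample data are hypothetical, chosen only for illustration. The index is a sorted list of line-start offsets, and each match offset (taken from $-[0]) is mapped to a line number by binary search:

```perl
use strict;
use warnings;

# Record the offset at which every line of the slurped text starts.
# index() is used rather than a /g match so pos() on the string is untouched.
sub build_line_index {
    my ($text_ref) = @_;
    my @starts = (0);
    my $p = -1;
    while (($p = index($$text_ref, "\n", $p + 1)) >= 0) {
        push @starts, $p + 1;
    }
    return \@starts;
}

# Binary search for the last line start <= $offset; returns a 1-based line number.
sub line_for_offset {
    my ($starts, $offset) = @_;
    my ($lo, $hi) = (0, $#$starts);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);
        if ($starts->[$mid] <= $offset) { $lo = $mid }
        else                            { $hi = $mid - 1 }
    }
    return $lo + 1;
}

# Hypothetical sample data standing in for one slurped source file.
my $curStr        = "alpha\nbeta foo\ngamma\nFOO again\n";
my $compiledRegex = qr/foo/i;

my $index;
while ($curStr =~ /$compiledRegex/g) {
    my ($start, $end) = ($-[0], $+[0]);         # offsets of the current match
    $index ||= build_line_index(\$curStr);      # built only once the file has a match
    printf "%d: %s\n", line_for_offset($index, $start),
        substr($curStr, $start, $end - $start);
}
```

Files with no match never pay for the index, and the reported line is the one on which the match starts.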

    Options 1, 2 and 4 above suffer performance degradation relative to Unix
    grep. Option 3 provides good performance and is the method I am currently
    using, but it requires creating and maintaining index files. I was wondering
    if I could tie a scalar to a file and use the slurping loop above. Then
    perhaps $. (or $NR under "use English") would contain the current line
    number, as the file would be read as the loop is traversed. Any other ideas
    would be welcome.
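One further idea, offered as an untimed sketch rather than a measured result: matches from a /g loop arrive in increasing offset order, so the line number can be carried along by counting newlines only in the text between one match and the next. No index is needed and no line-by-line read. The sub name matches_with_line_numbers and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Return a list of [line_number, matched_text] pairs for a slurped string.
sub matches_with_line_numbers {
    my ($text_ref, $regex) = @_;
    my @hits;
    my ($line, $scanned) = (1, 0);   # current line, offset counted so far
    while ($$text_ref =~ /$regex/g) {
        my ($start, $end) = ($-[0], $+[0]);
        # Count newlines only in the gap since the previous match.
        my $gap = substr($$text_ref, $scanned, $start - $scanned);
        $line += ($gap =~ tr/\n//);
        $scanned = $start;
        push @hits, [$line, substr($$text_ref, $start, $end - $start)];
    }
    return @hits;
}

# Hypothetical sample data standing in for one slurped source file.
my $curStr = "alpha\nbeta foo\ngamma\nFOO foo\n";
for my $hit (matches_with_line_numbers(\$curStr, qr/foo/i)) {
    print "$hit->[0]: $hit->[1]\n";
}
```

Each character of the file is examined by tr at most once, so the extra cost over the plain slurp loop is a single pass. The line reported is the one on which the match begins.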

    Al
     
    Al Belden, Jul 3, 2004
    #1

  2. Hello Al Belden,

    I had a similar problem getting the index number of the matching
    element when searching for elements in an array. It was
    fruitless. I tried a hash map, but that was a burden on the system. In
    another possible implementation I used a separate
    variable, indexCount, on the array and reinitialized it every time.
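For the array case, a small sketch that avoids a separate counter variable by iterating over the indices themselves. The sub name matching_indices and the sample list are hypothetical:

```perl
use strict;
use warnings;

# Return the indices of all elements that match the given regex.
sub matching_indices {
    my ($regex, @items) = @_;
    return grep { $items[$_] =~ $regex } 0 .. $#items;
}

my @names = qw(bar foo baz food);   # hypothetical sample data
my @found = matching_indices(qr/foo/, @names);
print "matched at indices: @found\n";
```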

    That's it.
    Perl is a language for making things work at any cost. All the best.
    Thanks.
    Giridhar Nandigam

     
    Giridhar Nandigam, Jul 7, 2004
    #2