Perl vs Unix grep

Al Belden

Hi all,
I've been working on a problem that I thought might be of interest: I'm
trying to replace some Korn shell scripts that search source code files with
Perl scripts to gain certain features, such as:

- More powerful regular expressions available in Perl
- Ability to print out lines before and after matches (GNU grep supports
  this but is not available on our Digital Unix and AIX platforms)
- Case-insensitive searches by default (yes, I know this can be done with
  grep, but the shell scripts that use grep don't do this)

We're talking about approximately 5000 files spread over 15 directories. To
date it has proven quite difficult (for me) to match the performance of the
Korn shell scripts using Perl scripts and still obtain the line number and
context information needed. The crux of the problem is that I have seen the
best performance from Perl when I match with the /g option on a string that
holds the entire slurped file:

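local $/;              # undefine the record separator so <FH> reads the whole file
my $curStr = <FH>;     # FH is assumed to be already open on the current file
while ($curStr =~ /$compiledRegex/g)
{
    # write matches to file for eventual paging
}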

This works well, except that for each match I also need the number of the
line it was found on. As far as I can tell from reading and research, no
variable holds this information, since I am no longer reading from the file
at that point. I can get the information in other ways, such as:

1. Reading each file a line at a time, testing for a match and keeping a
   line counter or using $. (available as $NR under use English).
2. Reading the file into an array and processing a line at a time.
3. Creating index files for the source files that store line offsets and
   using them with the slurp method in the paragraph above.
4. Creating an in-memory index for each file that contains a match and using
   it for subsequent matches in that file.
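
To make option 4 concrete, here is a minimal sketch of what I mean by an
in-memory index. The sub names build_line_index and line_number_at are made
up for illustration, and it assumes "\n" line endings. It records the offset
at which each line starts, then binary-searches that list with the match
offset from $-[0]. It uses index() rather than a second //g match so that
building the index does not disturb pos() on the slurped string.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the offset at which every line starts.
# Uses index() instead of a regex so pos() on the string is left alone.
sub build_line_index {
    my ($text_ref) = @_;
    my @starts = (0);
    my $at = 0;
    while (($at = index($$text_ref, "\n", $at)) >= 0) {
        push @starts, ++$at;    # the next line starts just after the newline
    }
    return \@starts;
}

# Hypothetical helper: binary search for the last line start <= $offset.
# Returns a 1-based line number.
sub line_number_at {
    my ($starts, $offset) = @_;
    my ($lo, $hi) = (0, $#$starts);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);
        if ($starts->[$mid] <= $offset) { $lo = $mid }
        else                            { $hi = $mid - 1 }
    }
    return $lo + 1;
}

# Usage with the slurp loop: $-[0] is the start offset of the last match.
my $curStr = "alpha\nbeta\ngamma beta\n";
my $starts = build_line_index(\$curStr);
while ($curStr =~ /beta/g) {
    printf "line %d, offset %d\n", line_number_at($starts, $-[0]), $-[0];
}
```

The index costs one pass over the file, so it only pays off for files with
several matches.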

Methods 1, 2 and 4 above suffer performance degradation relative to Unix
grep. Method 3 provides good performance and is the one I am currently
using, but it requires creating and maintaining index files. I was wondering
if I could tie a scalar to a file and use the slurping loop above. Then
perhaps $. (or $NR under use English) would contain the current line number,
as the file would be read as the loop is traversed. Any other ideas would be
welcome.
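
For comparison, here is a rough sketch of one idea that needs neither an
index nor a tie. Since /g finds matches in ascending order, the newlines
between the previous match and the current one can be counted with tr and
added to a running total, so each byte is counted at most once. The sub name
matches_with_line_numbers is invented for this example, and I have not
benchmarked it against grep.

```perl
use strict;
use warnings;

# Hypothetical helper: return [line_number, matched_text] pairs for every
# match of $regex in the slurped string referenced by $text_ref.
sub matches_with_line_numbers {
    my ($text_ref, $regex) = @_;
    my ($line, $counted_to) = (1, 0);
    my @found;
    while ($$text_ref =~ /$regex/g) {
        # @- and @+ hold the start and end offsets of the last match.
        my ($start, $end) = ($-[0], $+[0]);
        # Count only the newlines between the previous match and this one.
        $line += substr($$text_ref, $counted_to, $start - $counted_to) =~ tr/\n//;
        $counted_to = $start;
        push @found, [$line, substr($$text_ref, $start, $end - $start)];
    }
    return @found;
}

# Usage: case-insensitive search over one slurped string.
my $curStr = "foo\nbar Foo\n\nfoo";
for my $hit (matches_with_line_numbers(\$curStr, qr/foo/i)) {
    print "$hit->[0]: $hit->[1]\n";
}
```

The same offsets could also be used to pull out the surrounding lines for
context, by searching backwards and forwards for "\n" from the match.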

Al
 
Giridhar Nandigam

Hello Al Belden,

I have had a similar problem getting the index number of the matching
element when searching for elements in an array. It was fruitless. I used
a hash map, but that was a burden on the system. In another possible
implementation I used a separate variable, indexCount, on the array and
reinitialized it every time.
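
As an illustration of the array case only, a minimal sketch that avoids both
the hash and the manual counter is to grep over the index range instead of
the values. The sub name matching_indices is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the indices (not the values) of the
# elements of @items that match $regex.
sub matching_indices {
    my ($regex, @items) = @_;
    return grep { $items[$_] =~ $regex } 0 .. $#items;
}

# Usage: print the position of each matching element.
my @words = qw(apple berry banana);
print "matched at index $_\n" for matching_indices(qr/an/, @words);
```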

That's it.
Perl is a language for making things work at any cost. All the best.
Thanks.
Giridhar Nandigam
 
