Pattern match over multiple files is slow - Help Needed!

RV

Hi:

I'm having a huge performance problem in one of my scripts.

I have an array containing some reference keys (about 1000 entries
or so).
I also have a list of files (about 100 or so), and I need to locate
occurrences of these keys in all of the files and replace them with some
value (let's say the key-value hash is also given).

My code looks something like this:

# Note: %keyval holds the key-value mapping
# @keylist is the array with the 1000 keys (i.e. keys %keyval)
# @files holds the list of files (about 100 or so)

foreach my $f (@files)
{
    open(my $fh, '<', $f) or die "Cannot open $f: $!";  # open file - validate
    while (my $line = <$fh>)    # each line
    {
        foreach my $k (@keylist)
        {
            $line =~ s/$k/$keyval{$k}/ig;   # replace key with value
        } # key loop
        # ( the modified $line gets written out here )
    }
    close($fh);
} # foreach file

This code works - but it's too slow! Obviously I run the inner loop
1000 times for each line of each file.
The constraints are that multiple keys may occur on the same line (and
even the same key may occur multiple times on the same line).

I tried slurping each file into a scalar (unsetting $/) - no big
difference in timing.

Can someone help me here? If you can give me some ideas to look
into, I'd greatly appreciate it.
Pseudocode is fine as well.

If you can include a courtesy CC: that would be great!

Thanks - I hope I've conveyed my problem accurately (this is among my
first posts - I'm a frequent "reader" though!).

-RV.
 

Donald 'Paddy' McCarthy

RV said:
This code works - but it's too slow! Obviously I run the inner loop
1000 times for each line of each file.
You could read each file in turn into one string, then apply your
thousand replacements to the whole file (not line by line).
If the files are too big, then you could apply your replacements to, say,
one hundred thousand lines at a time.
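
A rough, untested sketch of the first idea, reusing the %keyval,
@keylist and @files from your post (error handling kept minimal):

# Slurp each file whole, run every replacement over the full
# contents, then write the result back out.
foreach my $f (@files)
{
    local $/;                                # slurp mode: no line separator
    open(my $fh, '<', $f) or die "Cannot open $f: $!";
    my $content = <$fh>;                     # whole file as one string
    close($fh);
    foreach my $k (@keylist)
    {
        $content =~ s/$k/$keyval{$k}/ig;     # one pass per key, whole file
    }
    # write $content back to $f (or to a new file) here
}

This does the same substitutions, but each s///g runs once per file
instead of once per line, so you avoid re-entering the key loop (and
recompiling each interpolated pattern) for every single line.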

Cheers, Pad.
 
