Pattern match over multiple files is slow - Help Needed!

Discussion in 'Perl' started by RV, Oct 22, 2003.

  1. RV


    Hi:

    I'm having a huge performance problem in one of my scripts.

    I have an array containing some reference keys (about 1000 entries
    or so).
    I also have a list of files (about 100 or so), and I need to locate
    occurrences of these keys in all of the files and replace them with
    some value (let's say the key-value hash is also given).

    My code looks something like this:

    use strict;
    use warnings;

    my %keyval;                   # key => value mapping (filled in elsewhere)
    my @keylist = keys %keyval;   # the ~1000 keys
    my @files   = @ARGV;          # the ~100 files to process

    foreach my $f (@files) {
        open(my $fh, '<', $f) or die "Cannot open $f: $!";
        while (my $line = <$fh>) {    # each line
            foreach my $k (@keylist) {
                # \Q...\E matches the key literally, not as a pattern
                $line =~ s/\Q$k\E/$keyval{$k}/ig;    # replace key with value
            }    # key loop
            print $line;    # output the modified line
        }
        close($fh);
    }    # foreach

    This code works, but it's too slow! Obviously I run the inner loop
    1000 times for each line in the file.
    The constraint is that multiple keys may occur on the same line (and
    the same key can occur multiple times on the same line).
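
    One common way around the 1000-passes-per-line cost (not something proposed in this thread) is to join all keys into a single alternation regex and look each replacement up in a hash, so every line is scanned once no matter how many keys it contains or how often they repeat. A minimal sketch, assuming the keys are plain literal strings that stay unique when lower-cased; build_key_regex, replace_all and the sample %keyval are hypothetical names for illustration:

```perl
use strict;
use warnings;

# Build one regex that matches any of the keys literally. Longer keys
# come first so that "foobar" is preferred over its prefix "foo" when
# both could match at the same position.
sub build_key_regex {
    my @keys = @_;
    die "build_key_regex: no keys given\n" unless @keys;
    my $alt = join '|',
        map { quotemeta($_) } sort { length($b) <=> length($a) } @keys;
    return qr/($alt)/i;
}

# Replace every key occurrence in $text in a single left-to-right scan.
# $lc_map maps lower-cased keys to values, so a case-insensitive match
# such as "Foo" still finds the value stored under "foo".
sub replace_all {
    my ($text, $re, $lc_map) = @_;
    $text =~ s/$re/$lc_map->{lc $1}/ge;
    return $text;
}

# Hypothetical sample data standing in for the real %keyval.
my %keyval = (foo => 'FOO_VAL', foobar => 'FOOBAR_VAL', baz => 'BAZ_VAL');
my %lc_map = map { lc($_) => $keyval{$_} } keys %keyval;
my $re     = build_key_regex(keys %keyval);

print replace_all("Foo foobar baz foo\n", $re, \%lc_map);
```

    Unlike the key-by-key loop, this never re-scans text that was already substituted, so a value that itself contains another key is left alone; check that this matches the intended behaviour. Perl 5.10 and later generally compile a large alternation of literal strings into a trie, which helps with a key list of this size.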

    I tried slurping the whole file into a scalar (undefining $/), but
    that made no big difference in timing.
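
    A sketch of the slurp idea, assuming literal keys; slurp_fh and replace_keys_sequentially are hypothetical names, and an in-memory file stands in for a real one. Localizing $/ to undef makes a single read return the whole file, but the loop still makes one full pass over the text per key, which may explain why slurping alone changed little:

```perl
use strict;
use warnings;

# Read everything remaining on $fh into one string. With $/ localized
# to undef, a single <$fh> returns the rest of the file rather than
# one line; the old $/ is restored when the sub returns.
sub slurp_fh {
    my ($fh) = @_;
    local $/;
    my $content = <$fh>;
    return defined $content ? $content : '';
}

# The original key-by-key substitution, applied once to the whole
# string. This is still one full scan of the text per key.
sub replace_keys_sequentially {
    my ($text, $keyval) = @_;
    foreach my $k (keys %$keyval) {
        $text =~ s/\Q$k\E/$keyval->{$k}/ig;    # literal, case-insensitive
    }
    return $text;
}

# Demo with an in-memory file standing in for a real one.
my $data = "alpha beta\nbeta gamma\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my %keyval = (alpha => 'A', beta => 'B');
print replace_keys_sequentially(slurp_fh($fh), \%keyval);
close($fh);
```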

    Can someone help me here? If you can give me some ideas to look
    into, I'd greatly appreciate it. Pseudocode is fine as well.

    If you could include a courtesy CC, that would be great!

    Thanks, and I hope I've conveyed my problem accurately (this is among
    my first posts; I'm a frequent "reader" though!).

    -RV.
     
    RV, Oct 22, 2003

  2. Donald 'Paddy' McCarthy

    RV wrote:

    > This code works, but it's too slow! Obviously I run the inner loop
    > 1000 times for each line in the file.

    You could read each file in turn into one string, then apply your
    thousand replacements to the whole file (not line by line).
    If the files are too big, you could apply your replacements to, say,
    one hundred thousand lines at a time.
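
    The chunked fallback might look roughly like this. It is a minimal sketch that keeps the original per-key substitutions and only changes the unit of work from one line to a block of lines, so memory use is bounded by the chunk size; process_in_chunks and apply_replacements are hypothetical names, and it assumes no key spans a line break:

```perl
use strict;
use warnings;

# Apply every key/value substitution to one block of text. Keys are
# quoted with \Q...\E so they are matched literally.
sub apply_replacements {
    my ($text, $keyval) = @_;
    foreach my $k (keys %$keyval) {
        $text =~ s/\Q$k\E/$keyval->{$k}/ig;
    }
    return $text;
}

# Read $in in chunks of up to $chunk_lines lines, run the replacements
# on each chunk as a single string, and print the result to $out.
# Chunks always end on a line boundary.
sub process_in_chunks {
    my ($in, $out, $keyval, $chunk_lines) = @_;
    my ($buf, $count) = ('', 0);
    while (my $line = <$in>) {
        $buf .= $line;
        if (++$count >= $chunk_lines) {
            print {$out} apply_replacements($buf, $keyval);
            ($buf, $count) = ('', 0);
        }
    }
    # Flush whatever is left after the last full chunk.
    print {$out} apply_replacements($buf, $keyval) if length $buf;
}

# Demo with an in-memory input file, writing to STDOUT.
my $data = "one two\ntwo three\nthree one\n";
open(my $in, '<', \$data) or die "Cannot open input: $!";
my %keyval = (one => '1', two => '2');
process_in_chunks($in, \*STDOUT, \%keyval, 100_000);
close($in);
```

    For real files, pass an input handle opened on the source file and an output handle opened on a new file.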

    Cheers, Pad.
     
    Donald 'Paddy' McCarthy, Oct 23, 2003
