replace words in string with hash values

Discussion in 'Perl Misc' started by wana, Nov 29, 2004.

  1. wana

    wana Guest

    foreach (keys %h)
    {
    $a =~ s/\b$_\b/$h{$_}/g;
    }

    I want to replace every occurrence of a hash key in a string with the
    corresponding hash value. I am replacing acronyms with phrases, where
    the acronym is the hash key. Is there a better or different way?

    thanks!

    wana (on pda)
     
    wana, Nov 29, 2004
    #1

  2. You have Perl search the whole string as many times as there are keys in
    the hash. With this approach, searching the string once is sufficient:

    my $keys = join '|', keys %h;
    $a =~ s/($keys)/$h{$1}/g;
     
    Gunnar Hjalmarsson, Nov 29, 2004
    #2

  3. Paul Lalli

    Paul Lalli Guest

    Well, here's one that's different, though not necessarily better (in
    fact, quite likely worse):

    $a =~ s/\b(\w+)\b/$h{$1} or $1/ge;

    Rather than searching the string for each hash key, this one would
    search each word in the string to determine if it is in the hash (more
    correctly stated: if it has a true value in the hash). If so, replace
    it with the hash value, if not, leave it as is.
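
    The "true value" caveat above can be made concrete. The following is
    only a minimal sketch: the sub names expand_if_true and
    expand_if_exists and the sample hash are hypothetical, and the "NA"
    entry deliberately maps to the false value "0" to show where the two
    tests differ.

    ```perl
    use strict;
    use warnings;

    # Hypothetical acronym table; "NA" maps to "0", which is defined but false.
    my %h = (
        FAQ => 'frequently asked questions',
        NA  => '0',
    );

    # Replace a word only if its hash value is TRUE (the "or" test).
    sub expand_if_true {
        my ($text, $map) = @_;
        $text =~ s/\b(\w+)\b/$map->{$1} or $1/ge;
        return $text;
    }

    # Replace a word whenever the key EXISTS, whatever its value.
    sub expand_if_exists {
        my ($text, $map) = @_;
        $text =~ s/\b(\w+)\b/exists($map->{$1}) ? $map->{$1} : $1/ge;
        return $text;
    }

    print expand_if_true('see FAQ, value NA', \%h), "\n";
    print expand_if_exists('see FAQ, value NA', \%h), "\n";
    ```

    With the "or" test the word "NA" is left alone because "0" is false;
    with exists it is replaced by "0". Which behaviour is wanted depends on
    whether empty or zero replacements are legitimate in the data.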

    Benchmarking is left as an exercise to the OP. ;-)

    One comment, however: be careful about the use of \b. While \b does
    mean 'word boundary', it uses Perl's definition of a 'word', which is
    [0-9a-zA-Z_]+. That means that "don't" is two words: "don" and "t".
    This may or may not be what you actually want.
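
    To see the effect of that definition, here is a small sketch. The
    helper name words_of and the one-letter key "t" are made up purely for
    illustration, and the input is plain ASCII.

    ```perl
    use strict;
    use warnings;

    # Return every \w+ run in the string, delimited by \b.
    sub words_of {
        my ($s) = @_;
        return $s =~ /\b(\w+)\b/g;
    }

    # The apostrophe is not a word character, so "don't" splits in two.
    print join('|', words_of("don't stop")), "\n";    # prints: don|t|stop

    # Consequence: a hypothetical key "t" also matches inside "don't".
    my %h = (t => 'time');
    (my $text = "don't wait until t")
        =~ s/\b(\w+)\b/exists($h{$1}) ? $h{$1} : $1/ge;
    print "$text\n";    # prints: don'time wait until time
    ```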

    Paul Lalli
     
    Paul Lalli, Nov 29, 2004
    #3
  4. Don't know if it's "better" (depends on far too many different
    things!), but

    s/\b\w+\b/$h{$&}||$&/ge;

    should do the job. Of course if your acronyms follow some convention
    (e.g. 2 to 4 uppercase letters only) it could be improved
    performance-wise:

    s/\b[A-Z]{2,4}\b/$h{$&}||$&/ge;
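
    A runnable variant of the restricted pattern, as a sketch only. It
    swaps two details compared with the one-liner above: it uses a capture
    group instead of $& (older perls are documented as slowing down every
    match in the program once $& is mentioned), and it tests with exists
    rather than ||. The sub name expand_acronyms and the sample data are
    hypothetical.

    ```perl
    use strict;
    use warnings;

    # Hypothetical acronym table.
    my %h = (
        FAQ => 'frequently asked questions',
        PDA => 'personal digital assistant',
    );

    sub expand_acronyms {
        my ($text, $map) = @_;
        # Only standalone runs of 2 to 4 capital letters trigger a hash
        # lookup; unknown acronyms are put back unchanged.
        $text =~ s/\b([A-Z]{2,4})\b/exists($map->{$1}) ? $map->{$1} : $1/ge;
        return $text;
    }

    print expand_acronyms('Read the FAQ on your PDA, OK?', \%h), "\n";
    ```

    Here "OK" fits the pattern but is not a key, so it stays as it is, and
    ordinary words such as "Read" are never looked up at all.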


    HTH,
    Michele
     
    Michele Dondi, Nov 29, 2004
    #4

  5. But you had better put the word boundaries back in, though!
     
    Tad McClellan, Nov 30, 2004
    #5
  6. Yes, they were omitted by mistake; thanks for pointing it out.
     
    Gunnar Hjalmarsson, Nov 30, 2004
    #6

  7. But if a metachar could be the first or last char, then
    the word boundary probably won't match where you want it to...
     
    Tad McClellan, Nov 30, 2004
    #7
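
    The points raised in posts 2, 5 and 7 can be combined into one
    single-pass substitution. This is a hedged sketch, not anything posted
    in the thread: the sub name expand_all and the sample keys are
    hypothetical, quotemeta is used so keys containing metacharacters match
    literally, and the lookarounds (?<!\w) and (?!\w) are swapped in for \b
    on the assumption that "word boundary" should mean "not adjacent to a
    word character", even when a key starts or ends with punctuation.

    ```perl
    use strict;
    use warnings;

    # Hypothetical table; note the key ending in metacharacters.
    my %h = (
        'FAQ' => 'frequently asked questions',
        'C++' => 'C plus plus',
        'C'   => 'the C language',
    );

    sub expand_all {
        my ($text, $map) = @_;

        # An empty alternation would match the empty string everywhere.
        return $text unless %$map;

        # Escape each key and try longer keys first, so that "C++" is
        # attempted before its prefix "C".
        my $alt = join '|',
                  map  { quotemeta($_) }
                  sort { length($b) <=> length($a) }
                  keys %$map;

        # One pass over the string; replaced text is not scanned again.
        $text =~ s/(?<!\w)($alt)(?!\w)/$map->{$1}/g;
        return $text;
    }

    print expand_all('C++ and C FAQ', \%h), "\n";
    # prints: C plus plus and the C language frequently asked questions
    ```

    With plain \b the key "C++" would not match before a space, because
    there is no word boundary between "+" and " ". Unlike the original
    foreach loop, this version also never re-expands an acronym that
    happens to appear inside a replacement phrase.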