Newbie asking, interesting question

Discussion in 'Perl Misc' started by Wondering, Feb 4, 2005.

  1. Wondering

    Wondering Guest

    I'm struggling to learn Perl, with some degree of success. I have a
    question that's a bit more advanced than I am, but I hope someone can
    help (thanks in advance to all who read this and bigger thanks to

    I'm trying to match name and address records in a large (~300,000
    record) database against potential new records to avoid duplicates.
    Anyone who has tried this knows that exact matching is problematic,
    especially if no convention was followed when the data was entered.
    (Consider all the possible variations of "avenue": "avenue", "av",
    "ave", etc. Add drive, boulevard, and so on with all their possible
    abbreviations, and you begin to get the picture.) So I want to be able
    to extract just the numeric characters from a string and do the
    matching on those. It's fuzzy, but since other fields are considered
    too, we can get a fairly high matching rate. Does anyone know how to
    extract just the numeric characters?
    I'll also accept any other ideas for doing the match.
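    One way to tame the suffix variations described above (an idea not
    proposed in this thread) is to normalize each address before comparing
    it. The following is a minimal sketch, assuming a small hypothetical
    lookup table %SUFFIX and a hypothetical helper normalize_street; a real
    table would need far more variants, and ambiguous tokens such as "ST"
    (Street or Saint) would need more care.

```perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete table: spelling variants of a few
# street suffixes mapped to one canonical abbreviation. Words already in
# canonical form (AVE, BLVD, DR, ST) fall through unchanged.
my %SUFFIX = (
    AVENUE    => 'AVE',  AV   => 'AVE',  AVN => 'AVE',
    BOULEVARD => 'BLVD', BOUL => 'BLVD', BLV => 'BLVD',
    DRIVE     => 'DR',   DRV  => 'DR',
    STREET    => 'ST',   STR  => 'ST',
);

# Upper-case the address, split on whitespace, strip punctuation from each
# word, drop words that became empty, and replace any word found in %SUFFIX
# with its canonical form. (Simplification: every word is checked, not just
# the last one.)
sub normalize_street {
    my ($street) = @_;
    my @words = map { (my $w = $_) =~ s/[^A-Z0-9]//g; $w } split ' ', uc $street;
    return join ' ', map { exists $SUFFIX{$_} ? $SUFFIX{$_} : $_ } grep { length } @words;
}

print normalize_street("1234 Main Avenue"), "\n";   # 1234 MAIN AVE
print normalize_street("1234 main av."), "\n";      # 1234 MAIN AVE
```

    Comparing normalized strings lets "Avenue" and "av." match exactly,
    which is stricter than matching on digits alone.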
    Wondering, Feb 4, 2005

  2. Wondering

    Wondering Guest

    Right on. I know tr from *nix; it just didn't occur to me to use it for
    this. Big thanks!
    Wondering, Feb 4, 2005

  3. Please put the subject of your article in the Subject of your article.

    Your article was not about a newbie asking interesting questions.
    Tad McClellan, Feb 4, 2005
  4. Wondering

    Anno Siegel Guest


    That will delete everything except digits.
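    The exact tr line quoted in the thread is not preserved here, so the
    following is only a minimal sketch of the same idea, assuming plain
    string input; digits_only is a hypothetical helper name. tr/0-9//cd
    complements the search list (/c) and deletes (/d) everything that is
    not a digit; s/[^0-9]//g would do the same job with a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the digits 0-9 from a string.
# tr/0-9//cd : /c complements the search list (every character that is
# NOT 0-9) and /d deletes those characters. Copying into $digits first
# leaves the caller's string unchanged.
sub digits_only {
    my ($str) = @_;
    (my $digits = $str) =~ tr/0-9//cd;
    return $digits;
}

print digits_only("1234 N. Main Ave., Apt 5B"), "\n";   # 12345
print digits_only("1234 North Main Avenue #5B"), "\n";  # 12345
```

    Both spellings of the same address reduce to the same digit string,
    which is what makes this a usable (if fuzzy) match key.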

    There's the Soundex method, with a corresponding standard module,
    Text::Soundex. It maps words to short codes so that similar-sounding
    words get the same code. It may also map different-sounding words to
    the same code, but you don't seem overly concerned about false
    positives. Your fields may need some pre-processing (such as breaking
    them into words in a useful way).
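    A minimal sketch of Soundex keys with simple pre-processing, assuming
    Text::Soundex is available (it shipped with Perl at the time; on newer
    Perls it may have to be installed from CPAN). soundex_key is a
    hypothetical helper: it splits a field into alphabetic words and
    Soundex-codes each one.

```perl
use strict;
use warnings;
use Text::Soundex;   # exports soundex(); may need installing from CPAN on newer Perls

# Hypothetical helper: build a fuzzy key for a name or street field.
# Pre-processing: split on anything that is not a letter (this also drops
# digits, which can be compared separately), discard empty fields, then
# Soundex-code each remaining word and join the codes with spaces.
sub soundex_key {
    my ($field) = @_;
    my @words = grep { length } split /[^A-Za-z]+/, $field;
    return join ' ', map { scalar soundex($_) } @words;
}

print soundex_key("Smith"), "\n";              # S530
print soundex_key("Smyth"), "\n";              # S530
print soundex_key("12 Robert Street"), "\n";   # R163 S363
print soundex_key("12 Rupert St."), "\n";      # R163 S300
```

    Note that "Street" and "St." still get different codes: Soundex
    handles spelling variants, not abbreviations, which is why some
    pre-processing of the fields is still needed.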

    Anno Siegel, Feb 6, 2005

  5. Make that


    Tad McClellan, Feb 6, 2005
  6. Wondering

    Anno Siegel Guest

    Yes. Oh boy. Looks like I violated the copy/paste rule.

    Anno Siegel, Feb 6, 2005
