Search methods, "fuzzy" logic, duplicate checking, oh my!

Discussion in 'ASP General' started by William Morris, Feb 25, 2004.

  1. Apologies in advance for the cross-post. Not sure if this is better handled
    in ASP code or TransactSQL.

    Windows2000 Server
    SQL 2000

    tblPeople
    contactid int
    firstname varchar(25)
    lastname varchar(25)
    address varchar(255)
    city varchar(100)
    state int
    zip varchar(10)
    homephone varchar(25)
    workphone varchar(25)
    mobilephone varchar(25)

    Our contact search screen allows the users to search for a contact by phone,
    lastname, firstname, city, state, zip. If no matches are found, then the
    user is given the option of adding the information into the database as a
    new contact. When searching, all whitespace and punctuation is removed for
    the comparison.

    The database is populated two ways:

    1. by hand by our users, where the format is controlled through the
    interface - i.e. phone numbers always have an area code, and the numbers are
    stored with dashes. Numbers, dashes, spaces, and parentheses are the only
    allow characters.
    2. Client data as CSV files from a variety of proprietary sources, and
    the data could have any ol' crap in it. We scrub as best we can, but some
    garbage gets through.

    So, what we find is when doing the phone number search, someone with
    816-555-1234 won't be found if the user enters 555-1234, and vice versa.
    Likewise, if any of the names are mis-spelled, the search will fail.

    We'd like to make the query "smart". We've explored SOUNDEX, but the
    results that come back often confuse the end-user: "Mercer" comes back with
    "Mercer", "Mercier" (not too bad, yet), "Marker" (ok, but...), "Markwardt"
    (um...), "Margarucci" (now we're getting blank stares...), and against
    numeric values (like phone) it just dies. DIFFERENCE might get us there
    somehow, though it returns a 4 on every variation in the preceeding list so
    we can't sort that way. I've seen some searches where they're sorted by
    "relevance", but I'm not sure how to make that happen.

    So I'm looking for ideas on making a "smart search". Thanks just for
    reading this far, and thanks for any help you can give.

    - Wm


    --
    William Morris
    Product Development, Seritas LLC
    Kansas City, Missouri
    William Morris, Feb 25, 2004
    #1
    1. Advertising

  2. William Morris

    jenny mabe Guest

    On Wed, 25 Feb 2004 10:27:08 -0600, William Morris
    <> wrote:

    > tblPeople
    > contactid int
    > firstname varchar(25)
    > lastname varchar(25)
    > address varchar(255)
    > city varchar(100)
    > state int
    > zip varchar(10)
    > homephone varchar(25)
    > workphone varchar(25)
    > mobilephone varchar(25)
    >
    > So I'm looking for ideas on making a "smart search". Thanks just for
    > reading this far, and thanks for any help you can give.


    You have clearly goteen into some more advanced areas than i can speak to,
    but I wanted to share a couple simple things things I have used in the
    past:

    -- Use a "LIKE" clause in SQL to search for phone numbers without area
    codes and to search for words or names that begin with the entered text
    (searching for "Mer" to get "Mercer"). For example:

    SELECT firstname, lastname FROM tblPeople WHERE homephone LIKE '%5551212'
    -- will give you all the people with phone numbers that are the same but
    include an area code

    -- You might be able to reduce some of your search complexity depending on
    the interface by using a list that dynamically reduces itself as the user
    enters their search string (at least in the case of names or cities),
    showing those people whose names are alphabetically similar. Some exciting
    javascript and ASP trickery is needed to make this work, but by showing a
    longer list of possibilities a user can look ahead at the whole list and
    often find what they are looking for faster than a search. Also neatly
    combined with an autocomplete script to fill in the rest of the selected
    name automatically.

    -- Encourage or even only offer a zip code search. Zip codes are far more
    likely to be entered correctly than cities and states when dealing with
    lots of user-created data. To reduce errors in cities and states, you
    could even consider regenerating cities and states on import from a
    commercially-available zip code database.


    HTH,
    Jenny
    jenny mabe, Feb 26, 2004
    #2
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. VisionSet

    Fuzzy Logic Library

    VisionSet, Dec 12, 2005, in forum: Java
    Replies:
    3
    Views:
    1,043
    Roedy Green
    Dec 12, 2005
  2. Ray Gardener

    Fuzzy logic question

    Ray Gardener, Dec 28, 2003, in forum: C++
    Replies:
    1
    Views:
    424
    Jeff Schwab
    Dec 28, 2003
  3. Christos KK Loverdos

    Fuzzy Logic

    Christos KK Loverdos, Dec 6, 2003, in forum: Python
    Replies:
    2
    Views:
    506
    Raymond Hettinger
    Dec 7, 2003
  4. BBands

    Fuzzy Logic

    BBands, Aug 1, 2004, in forum: Python
    Replies:
    1
    Views:
    2,976
    BBands
    Aug 7, 2004
  5. DNKMCA

    Fuzzy Logic for String search

    DNKMCA, Nov 11, 2005, in forum: ASP General
    Replies:
    0
    Views:
    143
    DNKMCA
    Nov 11, 2005
Loading...

Share This Page