Search methods, "fuzzy" logic, duplicate checking, oh my!

W

William Morris

Apologies in advance for the cross-post. Not sure if this is better handled
in ASP code or TransactSQL.

Windows2000 Server
SQL 2000

tblPeople
contactid int
firstname varchar(25)
lastname varchar(25)
address varchar(255)
city varchar(100)
state int
zip varchar(10)
homephone varchar(25)
workphone varchar(25)
mobilephone varchar(25)

Our contact search screen allows the users to search for a contact by phone,
lastname, firstname, city, state, zip. If no matches are found, then the
user is given the option of adding the information into the database as a
new contact. When searching, all whitespace and punctuation is removed for
the comparison.

The database is populated two ways:

1. by hand by our users, where the format is controlled through the
interface - i.e. phone numbers always have an area code, and the numbers are
stored with dashes. Numbers, dashes, spaces, and parentheses are the only
allow characters.
2. Client data as CSV files from a variety of proprietary sources, and
the data could have any ol' crap in it. We scrub as best we can, but some
garbage gets through.

So, what we find is when doing the phone number search, someone with
816-555-1234 won't be found if the user enters 555-1234, and vice versa.
Likewise, if any of the names are mis-spelled, the search will fail.

We'd like to make the query "smart". We've explored SOUNDEX, but the
results that come back often confuse the end-user: "Mercer" comes back with
"Mercer", "Mercier" (not too bad, yet), "Marker" (ok, but...), "Markwardt"
(um...), "Margarucci" (now we're getting blank stares...), and against
numeric values (like phone) it just dies. DIFFERENCE might get us there
somehow, though it returns a 4 on every variation in the preceeding list so
we can't sort that way. I've seen some searches where they're sorted by
"relevance", but I'm not sure how to make that happen.

So I'm looking for ideas on making a "smart search". Thanks just for
reading this far, and thanks for any help you can give.

- Wm
 
J

jenny mabe

tblPeople
contactid int
firstname varchar(25)
lastname varchar(25)
address varchar(255)
city varchar(100)
state int
zip varchar(10)
homephone varchar(25)
workphone varchar(25)
mobilephone varchar(25)

So I'm looking for ideas on making a "smart search". Thanks just for
reading this far, and thanks for any help you can give.

You have clearly goteen into some more advanced areas than i can speak to,
but I wanted to share a couple simple things things I have used in the
past:

-- Use a "LIKE" clause in SQL to search for phone numbers without area
codes and to search for words or names that begin with the entered text
(searching for "Mer" to get "Mercer"). For example:

SELECT firstname, lastname FROM tblPeople WHERE homephone LIKE '%5551212'
-- will give you all the people with phone numbers that are the same but
include an area code

-- You might be able to reduce some of your search complexity depending on
the interface by using a list that dynamically reduces itself as the user
enters their search string (at least in the case of names or cities),
showing those people whose names are alphabetically similar. Some exciting
javascript and ASP trickery is needed to make this work, but by showing a
longer list of possibilities a user can look ahead at the whole list and
often find what they are looking for faster than a search. Also neatly
combined with an autocomplete script to fill in the rest of the selected
name automatically.

-- Encourage or even only offer a zip code search. Zip codes are far more
likely to be entered correctly than cities and states when dealing with
lots of user-created data. To reduce errors in cities and states, you
could even consider regenerating cities and states on import from a
commercially-available zip code database.


HTH,
Jenny
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top