W
William Morris
Apologies in advance for the cross-post. Not sure if this is better handled
in ASP code or TransactSQL.
Windows2000 Server
SQL 2000
tblPeople
contactid int
firstname varchar(25)
lastname varchar(25)
address varchar(255)
city varchar(100)
state int
zip varchar(10)
homephone varchar(25)
workphone varchar(25)
mobilephone varchar(25)
Our contact search screen allows the users to search for a contact by phone,
lastname, firstname, city, state, zip. If no matches are found, then the
user is given the option of adding the information into the database as a
new contact. When searching, all whitespace and punctuation is removed for
the comparison.
The database is populated two ways:
1. by hand by our users, where the format is controlled through the
interface - i.e. phone numbers always have an area code, and the numbers are
stored with dashes. Numbers, dashes, spaces, and parentheses are the only
allow characters.
2. Client data as CSV files from a variety of proprietary sources, and
the data could have any ol' crap in it. We scrub as best we can, but some
garbage gets through.
So, what we find is when doing the phone number search, someone with
816-555-1234 won't be found if the user enters 555-1234, and vice versa.
Likewise, if any of the names are mis-spelled, the search will fail.
We'd like to make the query "smart". We've explored SOUNDEX, but the
results that come back often confuse the end-user: "Mercer" comes back with
"Mercer", "Mercier" (not too bad, yet), "Marker" (ok, but...), "Markwardt"
(um...), "Margarucci" (now we're getting blank stares...), and against
numeric values (like phone) it just dies. DIFFERENCE might get us there
somehow, though it returns a 4 on every variation in the preceeding list so
we can't sort that way. I've seen some searches where they're sorted by
"relevance", but I'm not sure how to make that happen.
So I'm looking for ideas on making a "smart search". Thanks just for
reading this far, and thanks for any help you can give.
- Wm
in ASP code or TransactSQL.
Windows2000 Server
SQL 2000
tblPeople
contactid int
firstname varchar(25)
lastname varchar(25)
address varchar(255)
city varchar(100)
state int
zip varchar(10)
homephone varchar(25)
workphone varchar(25)
mobilephone varchar(25)
Our contact search screen allows the users to search for a contact by phone,
lastname, firstname, city, state, zip. If no matches are found, then the
user is given the option of adding the information into the database as a
new contact. When searching, all whitespace and punctuation is removed for
the comparison.
The database is populated two ways:
1. by hand by our users, where the format is controlled through the
interface - i.e. phone numbers always have an area code, and the numbers are
stored with dashes. Numbers, dashes, spaces, and parentheses are the only
allow characters.
2. Client data as CSV files from a variety of proprietary sources, and
the data could have any ol' crap in it. We scrub as best we can, but some
garbage gets through.
So, what we find is when doing the phone number search, someone with
816-555-1234 won't be found if the user enters 555-1234, and vice versa.
Likewise, if any of the names are mis-spelled, the search will fail.
We'd like to make the query "smart". We've explored SOUNDEX, but the
results that come back often confuse the end-user: "Mercer" comes back with
"Mercer", "Mercier" (not too bad, yet), "Marker" (ok, but...), "Markwardt"
(um...), "Margarucci" (now we're getting blank stares...), and against
numeric values (like phone) it just dies. DIFFERENCE might get us there
somehow, though it returns a 4 on every variation in the preceeding list so
we can't sort that way. I've seen some searches where they're sorted by
"relevance", but I'm not sure how to make that happen.
So I'm looking for ideas on making a "smart search". Thanks just for
reading this far, and thanks for any help you can give.
- Wm