Best practice for search function / RegExp?

Russell

Hey,

OK, I have numerous tables to search through for a 'site search'.

Some of the searchable fields have HTML embedded in them, so after some quick
referencing I saw I can use the RegExp object to strip out all the HTML,
leaving only the raw text.
(Done, and it works a treat.)
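
For anyone reading along, the tag stripping could look roughly like the sketch below. StripHtml is a made-up name and the pattern is an assumption about what "strip out all the HTML" means here; it is not necessarily what the original code did. It only removes things that look like tags, so entities such as &amp;nbsp; are left alone and a ">" inside an attribute value would confuse it.

```vbscript
' Hypothetical helper: removes anything that looks like an HTML tag.
Function StripHtml(sHtml)
    Dim oRegExp

    Set oRegExp = New RegExp
    oRegExp.Pattern = "<[^>]*>"   ' a "<", then everything up to the next ">"
    oRegExp.Global = True         ' replace every match, not just the first

    ' "" & sHtml turns a Null database value into an empty string
    StripHtml = oRegExp.Replace("" & sHtml, "")

    Set oRegExp = Nothing
End Function

' Example:
' StripHtml("<p>Hello <b>world</b></p>") returns "Hello world"
```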

My issue is:

What is the best method to remove a lot of words such as "a", "and", "I",
"so", "that", "this" ...etc... from the search string, leaving essentially
only keywords? Each page/field will then be searched for occurrences of
the text the user enters through an input field.

The idea is to return only the suitable records, without a lot of
rubbish...

Your assistance is appreciated.

Thank you
 
Rob Meade

...
What is the best method to remove a lot of words such as "a", "and", "I",
"so", "that", "this" ...etc... from the search string, leaving essentially
only keywords? Each page/field will then be searched for occurrences of
the text the user enters through an input field.

The idea is to return only the suitable records, without a lot of
rubbish...

You want to create yourself an IGNORE WORDS list. I put one of these
together before; it contains about 390 words now, I think. You can search
Google and find lists like this pretty easily; the pain is sometimes having
to get them from the web into a format you can use, perhaps from a web page
to Excel to XML or something...

With the list in hand, you just need a little function that takes your
search criteria and iterates through each word in the criteria, comparing it
against the ignore words. If you find a match, don't keep the word as good
criteria; if there's no match, keep it...

Example:

Dim aIgnoreWords(3)   ' upper bound 3 = four elements (0 to 3)
Dim aSearchCriteriaWords, sSearchCriteria, sNewSearchCriteria
Dim bMatchFound, x, y

aIgnoreWords(0) = "a"
aIgnoreWords(1) = "i"
aIgnoreWords(2) = "them"
aIgnoreWords(3) = "must"

sSearchCriteria = "I must find them and a donkey"
sNewSearchCriteria = ""

aSearchCriteriaWords = Split(sSearchCriteria, " ")

' iterate through search criteria words
For x = 0 To UBound(aSearchCriteriaWords)

    ' reset flag ready for this search criteria word
    bMatchFound = False

    ' iterate through our ignore words
    For y = 0 To UBound(aIgnoreWords)

        ' do we have a match?
        If LCase(aSearchCriteriaWords(x)) = LCase(aIgnoreWords(y)) Then

            ' match found!
            bMatchFound = True
            Exit For

        End If

    Next

    ' if we didn't match our criteria word to any ignore words then it's a
    ' good criteria word, add it to our new search criteria
    If Not bMatchFound Then

        ' add a space to separate the words
        If Len(sNewSearchCriteria) > 0 Then

            sNewSearchCriteria = sNewSearchCriteria & " "

        End If

        sNewSearchCriteria = sNewSearchCriteria & aSearchCriteriaWords(x)

    End If

Next

By the end of this, your sNewSearchCriteria string should contain: "find and
donkey" ("and" survives because it isn't in this short four-word list)

It's only a little example and untested, so it might error. You should also
consider doing some checks:

Is Len(sSearchCriteria) > 0? (Do we have anything to search for at
all?!)
Have you loaded your ignore words successfully (perhaps from a
database/XML)?
Do you want to store the words that were omitted so you can do what Google
used to do... "The following common words were excluded: a, i, them, must"
etc.?
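
Those checks could be folded into one function along the lines of the sketch below. The names RemoveIgnoreWords, oIgnoreWords and sExcluded are made up for illustration, and using a Scripting.Dictionary for the lookup is a swap-in for the nested loop above, not something the original example did. It is a sketch rather than tested production code.

```vbscript
' Hypothetical helper: returns the criteria with ignore words removed.
' oIgnoreWords is a Scripting.Dictionary whose keys are lower-case ignore words.
' sExcluded is passed ByRef (the VBScript default) and receives a
' comma-separated list of the words that were dropped.
Function RemoveIgnoreWords(sSearchCriteria, oIgnoreWords, sExcluded)
    Dim aWords, sWord, sInput, sResult, x

    sResult = ""
    sExcluded = ""
    sInput = Trim("" & sSearchCriteria)   ' "" & ... copes with Null

    ' do we have anything to search for at all?
    If Len(sInput) = 0 Then
        RemoveIgnoreWords = ""
        Exit Function
    End If

    aWords = Split(sInput, " ")

    For x = 0 To UBound(aWords)
        sWord = LCase(aWords(x))
        If Len(sWord) = 0 Then
            ' double space in the input, nothing to keep
        ElseIf oIgnoreWords.Exists(sWord) Then
            ' remember the dropped word for the "words excluded" message
            If Len(sExcluded) > 0 Then sExcluded = sExcluded & ", "
            sExcluded = sExcluded & sWord
        Else
            If Len(sResult) > 0 Then sResult = sResult & " "
            sResult = sResult & aWords(x)
        End If
    Next

    RemoveIgnoreWords = sResult
End Function

' Usage
Dim oIgnore, sDropped, sClean
Set oIgnore = CreateObject("Scripting.Dictionary")
oIgnore.Add "a", True
oIgnore.Add "i", True
oIgnore.Add "them", True
oIgnore.Add "must", True

sClean = RemoveIgnoreWords("I must find them and a donkey", oIgnore, sDropped)
' sClean   is now "find and donkey"
' sDropped is now "i, must, them, a"
```

The dictionary saves looping over the whole ignore list for every search word, which starts to matter once the list has a few hundred entries.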

My list of ignore words is below. It's in XML format, but you could easily do
a REPLACE ALL in Word or something to remove the tags... hope it's of use.

Regards

Rob


<?xml version="1.0" encoding="utf-8" ?>
<IgnoreWords>
<Word>a</Word>
<Word>about</Word>
<Word>above</Word>
<Word>according</Word>
<Word>across</Word>
<Word>actually</Word>
<Word>adj</Word>
<Word>after</Word>
<Word>afterwards</Word>
<Word>again</Word>
<Word>against</Word>
<Word>all</Word>
<Word>almost</Word>
<Word>alone</Word>
<Word>along</Word>
<Word>already</Word>
<Word>also</Word>
<Word>although</Word>
<Word>always</Word>
<Word>among</Word>
<Word>amongst</Word>
<Word>an</Word>
<Word>and</Word>
<Word>another</Word>
<Word>any</Word>
<Word>anyhow</Word>
<Word>anyone</Word>
<Word>anything</Word>
<Word>anywhere</Word>
<Word>are</Word>
<Word>aren't</Word>
<Word>around</Word>
<Word>as</Word>
<Word>at</Word>
<Word>b</Word>
<Word>be</Word>
<Word>became</Word>
<Word>because</Word>
<Word>become</Word>
<Word>becomes</Word>
<Word>becoming</Word>
<Word>been</Word>
<Word>before</Word>
<Word>beforehand</Word>
<Word>begin</Word>
<Word>beginning</Word>
<Word>behind</Word>
<Word>being</Word>
<Word>below</Word>
<Word>beside</Word>
<Word>besides</Word>
<Word>between</Word>
<Word>beyond</Word>
<Word>billion</Word>
<Word>both</Word>
<Word>but</Word>
<Word>by</Word>
<Word>c</Word>
<Word>can</Word>
<Word>can't</Word>
<Word>cannot</Word>
<Word>caption</Word>
<Word>co</Word>
<Word>co.</Word>
<Word>could</Word>
<Word>couldn't</Word>
<Word>d</Word>
<Word>did</Word>
<Word>didn't</Word>
<Word>do</Word>
<Word>does</Word>
<Word>doesn't</Word>
<Word>don't</Word>
<Word>down</Word>
<Word>during</Word>
<Word>e</Word>
<Word>each</Word>
<Word>eg</Word>
<Word>eight</Word>
<Word>eighty</Word>
<Word>either</Word>
<Word>else</Word>
<Word>elsewhere</Word>
<Word>end</Word>
<Word>ending</Word>
<Word>enough</Word>
<Word>etc</Word>
<Word>even</Word>
<Word>ever</Word>
<Word>every</Word>
<Word>everyone</Word>
<Word>everything</Word>
<Word>everywhere</Word>
<Word>except</Word>
<Word>f</Word>
<Word>few</Word>
<Word>fifty</Word>
<Word>first</Word>
<Word>five</Word>
<Word>for</Word>
<Word>former</Word>
<Word>formerly</Word>
<Word>forty</Word>
<Word>found</Word>
<Word>four</Word>
<Word>from</Word>
<Word>further</Word>
<Word>g</Word>
<Word>h</Word>
<Word>had</Word>
<Word>has</Word>
<Word>hasn't</Word>
<Word>have</Word>
<Word>haven't</Word>
<Word>he</Word>
<Word>he'd</Word>
<Word>he'll</Word>
<Word>he's</Word>
<Word>hence</Word>
<Word>her</Word>
<Word>here</Word>
<Word>here's</Word>
<Word>hereafter</Word>
<Word>hereby</Word>
<Word>herein</Word>
<Word>hereupon</Word>
<Word>hers</Word>
<Word>herself</Word>
<Word>him</Word>
<Word>himself</Word>
<Word>his</Word>
<Word>how</Word>
<Word>however</Word>
<Word>hundred</Word>
<Word>i</Word>
<Word>i'd</Word>
<Word>i'll</Word>
<Word>i'm</Word>
<Word>i've</Word>
<Word>ie</Word>
<Word>if</Word>
<Word>in</Word>
<Word>inc.</Word>
<Word>indeed</Word>
<Word>instead</Word>
<Word>into</Word>
<Word>is</Word>
<Word>isn't</Word>
<Word>it</Word>
<Word>it's</Word>
<Word>its</Word>
<Word>itself</Word>
<Word>j</Word>
<Word>k</Word>
<Word>l</Word>
<Word>last</Word>
<Word>later</Word>
<Word>latter</Word>
<Word>latterly</Word>
<Word>least</Word>
<Word>less</Word>
<Word>let</Word>
<Word>let's</Word>
<Word>like</Word>
<Word>likely</Word>
<Word>ltd</Word>
<Word>m</Word>
<Word>made</Word>
<Word>make</Word>
<Word>makes</Word>
<Word>many</Word>
<Word>maybe</Word>
<Word>me</Word>
<Word>meantime</Word>
<Word>meanwhile</Word>
<Word>might</Word>
<Word>million</Word>
<Word>miss</Word>
<Word>more</Word>
<Word>moreover</Word>
<Word>most</Word>
<Word>mostly</Word>
<Word>mr</Word>
<Word>mrs</Word>
<Word>much</Word>
<Word>must</Word>
<Word>my</Word>
<Word>myself</Word>
<Word>n</Word>
<Word>namely</Word>
<Word>neither</Word>
<Word>never</Word>
<Word>nevertheless</Word>
<Word>next</Word>
<Word>nine</Word>
<Word>ninety</Word>
<Word>no</Word>
<Word>nobody</Word>
<Word>none</Word>
<Word>nonetheless</Word>
<Word>noone</Word>
<Word>nor</Word>
<Word>not</Word>
<Word>nothing</Word>
<Word>now</Word>
<Word>nowhere</Word>
<Word>o</Word>
<Word>of</Word>
<Word>off</Word>
<Word>often</Word>
<Word>on</Word>
<Word>once</Word>
<Word>one</Word>
<Word>one's</Word>
<Word>only</Word>
<Word>onto</Word>
<Word>or</Word>
<Word>other</Word>
<Word>others</Word>
<Word>otherwise</Word>
<Word>our</Word>
<Word>ours</Word>
<Word>ourselves</Word>
<Word>out</Word>
<Word>over</Word>
<Word>overall</Word>
<Word>own</Word>
<Word>p</Word>
<Word>per</Word>
<Word>perhaps</Word>
<Word>q</Word>
<Word>r</Word>
<Word>rather</Word>
<Word>recent</Word>
<Word>recently</Word>
<Word>s</Word>
<Word>same</Word>
<Word>seem</Word>
<Word>seemed</Word>
<Word>seeming</Word>
<Word>seems</Word>
<Word>seven</Word>
<Word>seventy</Word>
<Word>several</Word>
<Word>she</Word>
<Word>she'd</Word>
<Word>she'll</Word>
<Word>she's</Word>
<Word>should</Word>
<Word>shouldn't</Word>
<Word>since</Word>
<Word>six</Word>
<Word>sixty</Word>
<Word>so</Word>
<Word>some</Word>
<Word>somehow</Word>
<Word>someone</Word>
<Word>something</Word>
<Word>sometime</Word>
<Word>sometimes</Word>
<Word>somewhere</Word>
<Word>still</Word>
<Word>stop</Word>
<Word>stoplist</Word>
<Word>such</Word>
<Word>t</Word>
<Word>taking</Word>
<Word>ten</Word>
<Word>than</Word>
<Word>that</Word>
<Word>that'll</Word>
<Word>that's</Word>
<Word>that've</Word>
<Word>the</Word>
<Word>their</Word>
<Word>them</Word>
<Word>themselves</Word>
<Word>then</Word>
<Word>thence</Word>
<Word>there</Word>
<Word>there'd</Word>
<Word>there'll</Word>
<Word>there're</Word>
<Word>there's</Word>
<Word>there've</Word>
<Word>thereafter</Word>
<Word>thereby</Word>
<Word>therefore</Word>
<Word>therein</Word>
<Word>thereupon</Word>
<Word>these</Word>
<Word>they</Word>
<Word>they'd</Word>
<Word>they'll</Word>
<Word>they're</Word>
<Word>they've</Word>
<Word>thirty</Word>
<Word>this</Word>
<Word>those</Word>
<Word>though</Word>
<Word>thousand</Word>
<Word>three</Word>
<Word>through</Word>
<Word>throughout</Word>
<Word>thru</Word>
<Word>thus</Word>
<Word>to</Word>
<Word>together</Word>
<Word>too</Word>
<Word>toward</Word>
<Word>towards</Word>
<Word>trillion</Word>
<Word>twenty</Word>
<Word>two</Word>
<Word>u</Word>
<Word>under</Word>
<Word>unless</Word>
<Word>unlike</Word>
<Word>unlikely</Word>
<Word>until</Word>
<Word>up</Word>
<Word>upon</Word>
<Word>us</Word>
<Word>used</Word>
<Word>using</Word>
<Word>v</Word>
<Word>very</Word>
<Word>via</Word>
<Word>w</Word>
<Word>was</Word>
<Word>wasn't</Word>
<Word>we</Word>
<Word>we'd</Word>
<Word>we'll</Word>
<Word>we're</Word>
<Word>we've</Word>
<Word>well</Word>
<Word>were</Word>
<Word>weren't</Word>
<Word>what</Word>
<Word>what'll</Word>
<Word>what's</Word>
<Word>what've</Word>
<Word>whatever</Word>
<Word>when</Word>
<Word>whence</Word>
<Word>whenever</Word>
<Word>where</Word>
<Word>where's</Word>
<Word>whereafter</Word>
<Word>whereas</Word>
<Word>whereby</Word>
<Word>wherein</Word>
<Word>whereupon</Word>
<Word>wherever</Word>
<Word>whether</Word>
<Word>which</Word>
<Word>while</Word>
<Word>whither</Word>
<Word>who</Word>
<Word>who'd</Word>
<Word>who'll</Word>
<Word>who's</Word>
<Word>whoever</Word>
<Word>whole</Word>
<Word>whom</Word>
<Word>whomever</Word>
<Word>whose</Word>
<Word>why</Word>
<Word>will</Word>
<Word>with</Word>
<Word>within</Word>
<Word>without</Word>
<Word>won't</Word>
<Word>would</Word>
<Word>wouldn't</Word>
<Word>x</Word>
<Word>y</Word>
<Word>yes</Word>
<Word>yet</Word>
<Word>you</Word>
<Word>you'd</Word>
<Word>you'll</Word>
<Word>you're</Word>
<Word>you've</Word>
<Word>your</Word>
<Word>yours</Word>
<Word>yourself</Word>
<Word>yourselves</Word>
<Word>z</Word>
</IgnoreWords>
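
If the list is kept as XML, loading it might look something like the sketch below. LoadIgnoreWords and the file name are made up for illustration, and it assumes MSXML is installed on the server (it normally is on a box running classic ASP). An empty dictionary coming back means the file failed to load or contained no words.

```vbscript
' Hypothetical helper: reads every <Word> element into a dictionary.
' sXmlPath is assumed to be the physical path to the XML file.
Function LoadIgnoreWords(sXmlPath)
    Dim oXml, oNodes, oNode, oWords, sWord

    Set oWords = CreateObject("Scripting.Dictionary")
    Set oXml = CreateObject("MSXML2.DOMDocument")
    oXml.async = False   ' wait until the file has been fully parsed

    If oXml.load(sXmlPath) Then
        Set oNodes = oXml.getElementsByTagName("Word")
        For Each oNode In oNodes
            sWord = LCase(Trim(oNode.text))
            ' skip blanks and duplicates so Add cannot raise an error
            If Len(sWord) > 0 And Not oWords.Exists(sWord) Then
                oWords.Add sWord, True
            End If
        Next
    End If

    Set LoadIgnoreWords = oWords
End Function

' Usage inside an ASP page (the file name is only an example):
' Dim oIgnore
' Set oIgnore = LoadIgnoreWords(Server.MapPath("ignorewords.xml"))
' If oIgnore.Count = 0 Then Response.Write "Ignore words failed to load"
```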
 
Russell

Much appreciated for the advice AND your list!!
Thanks!!

Rob Meade

...
Much appreciated for the advice AND your list!!
Thanks!!

You're welcome, Russell. If I'd had more time I'd have done a quick test on
the code too, but I suspect you'll get the idea from it and be able to
write your own, and better! :o)

Rob
 
