Keyword Parsing with ASP

ARK

I am writing a search program in ASP (VBScript). The user can enter keywords
and press submit.
The user can separate the keywords by spaces and/or commas, and keywords may
be plain words, single-quoted strings (phrases), or double-quoted strings
(phrases).
For example:

Keywords:

Jack, Jill, Jim, "Timothy Brown", Mary OR
Jack Jill Jim 'Timothy Brown' Mary OR
Jack, Jill Jim, 'Timothy Brown' "Mary"

When I parse it, I store the keywords in an array. The results must be:

Jack
Jill
Jim
Timothy Brown
Mary

I have tried doing this using Split but am unable to get the phrases. Any
suggestions, code examples or links would help.

Thanks in advance

ARK.
 
Andrew J Durstewitz

You might want to replace the spaces the user puts in with commas and
then use the split command.

strVariable = Replace(strVariable," ",",")
arrKeywords = Split(strVariable,",")

Then you should have your array of items.

hth,
Andrew

* * * Sent via DevBuilder http://www.devbuilder.org * * *
Developer Resources for High End Developers.
 
TomB

Unfortunately, that would put a comma in "Timothy Brown" as well.

My suggestion would be to work your way through the string a character at a
time. If the character is a space and is not within quotes (" or '), then add
a comma; otherwise move along.
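
The character-at-a-time idea can be sketched roughly as follows. It is written in JavaScript rather than VBScript so it can be run outside IIS, and it collects the keywords directly instead of inserting commas. The names `parseKeywords` and `flush` are made up for illustration, and the rule that a quote only opens a phrase at the start of a token (so that O'Malley stays one word) is an assumption, not something stated in the thread.

```javascript
// Hypothetical helper: walk the input one character at a time and
// split on spaces/commas, except inside a quoted phrase.
function parseKeywords(input) {
  const keywords = [];
  let current = "";   // the keyword being built
  let quote = null;   // the quote character we are inside, or null

  // Store the pending keyword (if any) and start a new one.
  const flush = () => {
    if (current !== "") keywords.push(current);
    current = "";
  };

  for (const ch of input) {
    if (quote !== null) {
      if (ch === quote) {          // matching quote closes the phrase
        flush();
        quote = null;
      } else {
        current += ch;             // spaces and commas are kept as-is
      }
    } else if ((ch === '"' || ch === "'") && current === "") {
      quote = ch;                  // assumption: quotes open only at token start
    } else if (ch === " " || ch === ",") {
      flush();                     // delimiter outside quotes ends the keyword
    } else {
      current += ch;
    }
  }
  flush();                         // keep whatever is left, even if a quote was never closed
  return keywords;
}

console.log(parseKeywords("Jack, Jill Jim, 'Timothy Brown' \"Mary\""));
// → [ 'Jack', 'Jill', 'Jim', 'Timothy Brown', 'Mary' ]
```

The same loop should port to VBScript with Mid(s, i, 1) and a dynamic array, though that port is untested here.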
 
Aaron Bertrand - MVP

TomB said:
My suggestion would be to work your way through the string a character at a
time. If the character is a space, and not within quotes (" or ') then add
a comma, otherwise move along

This can get infinitely complex, e.g.

Bob, Mary, "Timothy, Brown" 'franke, "tom, hula hoop" mea, culpa"

You never know what a user is going to enter, and it's hard to write code to
understand exactly what they mean.

I'd be really interested to see how Google's parsing algorithm works. I
wasn't brave enough to do that: www.aspfaq.com supports all words, any
words, or exact phrase... but no combination of the three.
 
Andrew Durstewitz

I agree, keep it as simple as possible. Setting up validation characters
such as " will require that you analyze the string one character at a
time.

Andrew

 
Ray at

Google is a non-stop source of awe. This is why I buy Google t-shirts. The
calculator can also do some math in addition to unit conversion (okay,
that's also math, but fine), e.g.
5 percent of 343

Ray at work
 
Jon Mundsack

SWEET! Thanks for the tip.

TomB said:
Yes you are right, that would be complex.
Speaking of Google's parsing have you seen the calculator? Try "searching"
for

100 kilometers in miles

Very cool.


 
TomB

Yeah, I think that's why they call it a calculator ;)
I thought the fact that it was able to determine that I wanted a calculation
rather than a search for the words was the cool part.
 
Aaron Bertrand - MVP

It even works with slightly more complex phrases, like 100 degrees
fahrenheit in celsius
 
Bob Barrows

OK, I've come up with the following function that returns an array
containing the keywords. However, in order for this to work, you need to set
some ground rules:
1. Don't mix delimiters for a phrase. This will work correctly:
Jack, Jill Jim, "Timothy Brown", 'Mary'
but this will not:
Jack, Jill Jim, "Timothy Brown', 'Mary'

2. If literal delimiter characters are used, then they must not match the
delimiters used. For example, this will work:
"O'Malley"
but this will not:
'O'Malley'
Also, if literal delimiter characters are used, all delimiters in the entire
list must be the same. This will work:
Jim, "Tom Brown", "Pat O'Malley"
This won't:
Jim, 'Tom Brown', "Pat O'Malley"

Anyways, the function appears below my signature. You can use this code to
test it:
Dim iCount, arResult, sWords
sWords="Jack, Jill Jim, ""Timothy Brown"", 'Mary'"
Response.Write sWords & "<BR>"
arResult= ParseKeywords(sWords)
if IsArray(arResult) then
    for iCount = 0 to UBound(arResult)
        Response.Write arResult(iCount) & "<BR>"
    next
end if

HTH,
Bob Barrows

Function ParseKeywords(pKeywords)
    Dim sKeywords,iQuotes, arQuoted(), i, j, k, sTmp, bQfound, bSQFound
    dim arCommas, arSpaces, bArrayDefined, arKeywords()
    bArrayDefined = false
    sKeywords = pKeywords
    'first see if sKeywords contains quoted sections - if so, make
    'sure they are paired, ie, there is an even number of quotes
    iQuotes = len(sKeywords) - len(Replace(sKeywords,"""",""))
    bQfound = false
    if iQuotes > 0 then
        if iQuotes mod 2 = 0 then
            bQfound = true
            redim arQuoted(iQuotes/2 - 1)
            i=instr(sKeywords,"""")
            k = 0
            'pull each quoted section out of sKeywords and store it
            Do Until i = 0
                j = instr(i+1,sKeywords,"""")
                sTmp = mid(sKeywords,i,j+1-i)
                arQuoted(k) = sTmp
                k=k+1
                sKeywords = replace(sKeywords,sTmp,"")
                i=instr(sKeywords,"""")
            Loop
            for i = 0 to ubound(arQuoted)
                arQuoted(i) = replace(arQuoted(i),"""","")
            next
        end if
    end if

    'now find single-quoted sections
    iQuotes = len(sKeywords) - len(Replace(sKeywords,"'",""))
    bSQFound = false
    if iQuotes > 0 then
        if iQuotes mod 2 = 0 then
            bSQFound = true
            if bQfound = false then
                redim arQuoted(iQuotes/2 - 1)
                k = 0
            else
                k = ubound(arQuoted) + 1
                Redim preserve arQuoted(UBound(arQuoted) + iQuotes/2)
            end if
            i=instr(sKeywords,"'")
            Do Until i = 0
                j = instr(i+1,sKeywords,"'")
                sTmp = mid(sKeywords,i,j+1-i)
                arQuoted(k) = sTmp
                k=k+1
                sKeywords = replace(sKeywords,sTmp,"")
                i=instr(sKeywords,"'")
            Loop
            'note: this also strips apostrophes from double-quoted phrases
            for i = 0 to ubound(arQuoted)
                arQuoted(i) = replace(arQuoted(i),"'","")
            next
        end if
    end if
    'drop trailing commas left behind by the removed phrases
    sKeywords = RTrim(sKeywords)
    do until right(sKeywords,1) <> ","
        sKeywords = rtrim(left(sKeywords,len(sKeywords)-1))
    loop

    'add quoted sections to result array
    if bQfound or bSQFound then
        redim arKeywords(UBound(arQuoted))
        for i = 0 to ubound(arQuoted)
            arKeywords(i) = arQuoted(i)
        next
        bArrayDefined = true
    end if

    'now process commas and spaces

    arCommas=split(sKeywords,",")
    for i = 0 to ubound(arCommas)
        arCommas(i) = RTrim(LTrim(arCommas(i)))
        if len(arCommas(i)) > 0 then
            if instr(arCommas(i)," ") = 0 then
                if bArrayDefined then
                    redim preserve arKeywords(UBound(arKeywords) + 1)
                else
                    redim arKeywords(0)
                    'fix: remember the array exists so later words are appended
                    bArrayDefined = true
                end if
                arKeywords(ubound(arKeywords)) = arCommas(i)
            else
                arSpaces = split(arCommas(i)," ")
                for j = 0 to ubound(arSpaces)
                    arSpaces(j) = RTrim(LTrim(arSpaces(j)))
                    if len(arSpaces(j)) > 0 then
                        if bArrayDefined then
                            redim preserve arKeywords(UBound(arKeywords) + 1)
                        else
                            redim arKeywords(0)
                            bArrayDefined = true
                        end if
                        arKeywords(ubound(arKeywords)) = arSpaces(j)
                    end if
                next
            end if
        end if
    next
    'return a zero-length array when no keywords were found
    if bArrayDefined then
        ParseKeywords=arKeywords
    else
        ParseKeywords=Array()
    end if
end function
 
Bob Barrows

Chris said:
Here's a regular expression alternative:
<%
Dim s,oRE,oMatches,oMatch
s = "Jack, Jill, Jim, 'Timothy Brown', Mary"
Set oRE = New RegExp
oRE.Global=True
oRE.Pattern = "\w+|('|"")([^\1]|\1{2})+\1"
Set oMatches = oRE.Execute(s)
For Each oMatch In oMatches
Response.Write oMatch.Value & "<br>"
Next
%>

Showoff! ;-)

Actually, I have to dive into this regexp stuff. I've been meaning to but I
just haven't had the time.

If you have a few min. could you break down that pattern you used and
explain each element?

I'm assuming the same ground rules I laid out still apply to your solution
here ... ?

Bob Barrows
 
Chris Hohmann

Bob Barrows said:
If you have a few min. could you break down that pattern you used and
explain each element?

I'm assuming the same ground rules I laid out still apply to your solution
here ... ?

Bob Barrows

Sure...

\w+ = a series of one(1) or more word characters, i.e.
[a-zA-Z0-9_]

| = OR

('|") = a quote (") OR an apostrophe ('), let's call this submatch
QUALIFIER

([^\1]|\1{2})+ = one(1) or more characters that are either not the
QUALIFIER OR a double occurrence of the QUALIFIER (escaping quotes)

\1 = a closing instance of the QUALIFIER
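
One detail of this breakdown is worth testing before relying on it. The sketch below uses JavaScript, whose pattern syntax VBScript's RegExp object is, as far as I know, modelled on. In Node.js, `\1` inside a character class is not a backreference: `[^\1]` is read as "any character except U+0001", so the greedy group can run past the first closing qualifier. Whether VBScript behaves identically is an assumption to verify on the server. The `alternative` pattern and the sample `list` are my own illustration, not something from the thread.

```javascript
// The pattern from the post, written as a JavaScript regex literal.
const original = /\w+|('|")([^\1]|\1{2})+\1/g;

// Assumed alternative: one branch per quote style, so each character
// class can name its own closing quote explicitly.
const alternative = /"[^"]*"|'[^']*'|[^\s,'"]+/g;

// Two phrases that use the same qualifier.
const list = "Jack, 'Timothy Brown', Jill, 'Mary Ann'";

const originalMatches = list.match(original);
const alternativeMatches = list.match(alternative);

console.log(originalMatches);
// → [ 'Jack', "'Timothy Brown', Jill, 'Mary Ann'" ]
console.log(alternativeMatches);
// → [ 'Jack', "'Timothy Brown'", 'Jill', "'Mary Ann'" ]
```

With only one quoted phrase in the list, as in the posted test string, both patterns give the same matches, which may be why the issue does not show up there. Note that in both cases the matched value still includes the surrounding qualifiers.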

A perennial favorite for those interested in regular expressions is
O'Reilly's "Mastering Regular Expressions" (ISBN:0596002890)

"HTH".replace(/HTH/g,"Hope that helps,");
-Chris
 
Chris Hohmann

Bob Barrows said:
I'm assuming the same ground rules I laid out still apply to your solution
here ... ?

Sorry, I forgot to answer this in my previous post. Your first rule
about balanced (matched) text qualifiers applies to my solution as well.
However, your second rule does not apply. The value list can contain a
mixture of quote-qualified phrases and apostrophe-qualified phrases.
Also, a qualifier can be embedded in a phrase by doubling it up
(escaping). Finally, regular expressions are greedy by default
(although you can override this behavior). As such, the
expression will match as much of the string as possible. Having said all
that, the following should be a valid value list:

Bob, Barrows, "Bob 'The Man' Barrows", 'Bob "The Man" Barrows', "Bob
""The Man"" Barrows", 'Bob ''The Man'' Barrows'
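
A value list like this one can be exercised with a sketch along the following lines, again in JavaScript so it runs outside IIS. It deliberately uses a different pattern from the one posted earlier in the thread: each quote style gets its own branch, and a doubled qualifier is allowed inside a phrase. `extractKeywords` is a hypothetical name, and stripping the qualifiers and collapsing the doubled ones is my reading of what the final keyword array should contain.

```javascript
// Hypothetical helper: pull keywords out of a value list, strip the
// surrounding qualifier, and turn a doubled qualifier into a single one.
function extractKeywords(input) {
  // Branch 1: "..." where "" is an escaped quote
  // Branch 2: '...' where '' is an escaped apostrophe
  // Branch 3: a plain word (no spaces, commas or qualifiers)
  const pattern = /"((?:[^"]|"")*)"|'((?:[^']|'')*)'|([^\s,'"]+)/g;
  const result = [];
  let m;
  while ((m = pattern.exec(input)) !== null) {
    let keyword;
    if (m[1] !== undefined) {
      keyword = m[1].replace(/""/g, '"');   // unescape doubled quotes
    } else if (m[2] !== undefined) {
      keyword = m[2].replace(/''/g, "'");   // unescape doubled apostrophes
    } else {
      keyword = m[3];
    }
    if (keyword !== "") result.push(keyword); // skip empty phrases such as ""
  }
  return result;
}

const valueList =
  "Bob, Barrows, \"Bob 'The Man' Barrows\", 'Bob \"The Man\" Barrows', " +
  "\"Bob \"\"The Man\"\" Barrows\", 'Bob ''The Man'' Barrows'";

console.log(extractKeywords(valueList));
```

Run against the list above, this should give Bob and Barrows followed by the four phrases, each with one level of quoting around The Man. The non-capturing groups `(?:...)` would need VBScript 5.5 or later if the pattern is ported, which is another assumption to check.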

HTH
-Chris
 
ARK

Hi! Everyone,

Thanks for the replies. I will try out the code and post my findings. From
which version onwards does VBScript
support regular expressions?

Thanks again!
ARK.
 
ARK

I tried the Function that uses RegExp but the following does not work
on my server (windows 2000 Prof./IIS 5.0) -

Set re = new RegExp

This Object is supposed to be supported by VBScript 5.0 which comes in
Windows 2000 / IE 5.0 upwards, how come it does not work on my server?
 
Bob Barrows

ARK said:
I tried the Function that uses RegExp but the following does not work
on my server (windows 2000 Prof./IIS 5.0) -

Set re = new RegExp

This Object is supposed to be supported by VBScript 5.0 which comes in
Windows 2000 / IE 5.0 upwards, how come it does not work on my server?

It's very difficult to troubleshoot when all we are told is that something
"does not work." If a user called you and said one of your programs did not
work, what would be your first response?

Bob Barrows
 
Chris Hohmann

Bob Barrows said:
So the users will have to be trained to escape their quotes, eh? I don't
know ... it's hard enough to train some of the programmers to do this ....
;-)

Bob

Only if they want to embed quotes in the phrase they're looking for.
Most users (and programmers) can remain ignorantly blissful about the
concept. Perhaps you should teach your users about "stored procedure as
method", then they wouldn't have to worry about quotes/apostrophes in
their parameters. :)
 
ARK

Well the error shown is -
Technical Information (for support personnel)

Error Type:
(0x8002801D)
Library not registered.
/regexp.asp, line 5

Browser Type:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

Page:
GET /regexp.asp

and Line 5 happens to have the following -

Set re = new RegExp

I guess the DLL is there somewhere but did not get registered during the
Windows 2K install?
 
