Boolean query parsing.... what tools, examples, suggestionss

P

puzzlecracker

am implementing a fairly simple and straightforward text-search (I
display each line that contains required pattern ) that supports
Boolean queries in the following format:

str1 AND str2 NOT str3 - where not is a unary operation thus the
following would be equivalent to str1 AND str ANDNOT str3 by default
unless a user specifies otherwise....


Additionally, It is left-associative;

I would also like to have parenthesis as well: (str1 AND (str2 OR
str3))


That is it....


Should I write a parser for that followed by walking the AST (abstract
syntax tree) - javacc or antlar? - or would java regular expressions
suffice? it should be fast, whereas java regex is known to be slow....


Any suggestions, examples, references would be highly appreciated.

Thx
 
L

Luc The Perverse

puzzlecracker said:
str1 AND str2 NOT str3 - where not is a unary operation thus the
following would be equivalent to str1 AND str ANDNOT str3 by default
unless a user specifies otherwise....

Either not is a binary operator that means "and not" or it is a unary
operator and the operation you listed is not defined, but rather should be
str1 AND str2 AND NOT str3
 
P

puzzlecracker

Luc said:
Either not is a binary operator that means "and not" or it is a unary
operator and the operation you listed is not defined, but rather should be
str1 AND str2 AND NOT str3


Thx. any implementable suggestions?
 
L

Luc The Perverse

puzzlecracker said:
Thx. any implementable suggestions?

Yes.

Parse the string and break it up into operators and test conditions. Make
the class which holds the condition have a negatable clause, and remove nots
while parsing by just including them in the next condition instance.

In this way you should have

condition operator condition operator . . . operator.

This can be easily tested for.

I'm not entirely convinced that recursion is the best way to handle
parenthesis as I ran into trouble with it trying to make a calculator
program. I devised a very simple (although perhaps inefficient) method in
which I would search for the first closing parenthesis and then search
backwards till I found an open parenthesis. This is the first innermost
parenthesis, once it is computed it can be saved as a single operand
(value), or you could allow your operands themselves to be a tree structure,
depending on how much you want to analyze and use them. Once you have done
this, then just start over.

It depends - do you need a way to store the result, or just need to solve
it. Recursion is very easy, and I don't remember anymore what trouble I was
having. So if all you want to do is get the result, making a function
which knows if it is inside a parenthesis is useful. Make it return
boolean and throw ParseException, and accept the equation from the current
position and tell it if it is inside a parenthesis (so it will know that
hitting an ending parenthesis is not a parse error, and that it needs to
exit) If it hits an opening parenthesis, then it will recursively call
itself. The only trick will be remembering where in the string you are.
You could find a way to have the recursive call pass back how many
chatacters were advanced during the operation. The calling parent
recursive function should "own" the child, so the child shouldn't be making
any changes to the parent's data - as such it would be a violation of
accepted OOP to allow the child to advance a string counter owned or shared
by the parent, even if you know it will work. This is of course my
opinion.
 
P

puzzlecracker

Luc said:
Yes.

Parse the string and break it up into operators and test conditions. Make
the class which holds the condition have a negatable clause, and remove nots
while parsing by just including them in the next condition instance.

In this way you should have

condition operator condition operator . . . operator.
I am not sure what you mean by operator...please explain

This can be easily tested for.

I'm not entirely convinced that recursion is the best way to handle
parenthesis as I ran into trouble with it trying to make a calculator
program. I devised a very simple (although perhaps inefficient) method in
which I would search for the first closing parenthesis and then search
backwards till I found an open parenthesis. This is the first innermost
parenthesis, once it is computed it can be saved as a single operand
(value), or you could allow your operands themselves to be a tree structure,
depending on how much you want to analyze and use them. Once you have done
this, then just start over.


How do you create a tree... I mean what data structure should be used
,advisable?
It depends - do you need a way to store the result, or just need to solve
it. Recursion is very easy, and I don't remember anymore what trouble I was
having. So if all you want to do is get the result, making a function
which knows if it is inside a parenthesis is useful. Make it return
boolean and throw ParseException, and accept the equation from the current
position and tell it if it is inside a parenthesis (so it will know that
hitting an ending parenthesis is not a parse error, and that it needs to

what are the recursive solutions available?

thx
 
L

Luc The Perverse

puzzlecracker said:
I am not sure what you mean by operator...please explain

An operator is like AND, OR, XOR, NAND

*snip*
How do you create a tree... I mean what data structure should be used
,advisable? *snip*
what are the recursive solutions available?

thx

Maybe we should start from the beginning? What experience do you have
programming?

Writing any kind of a parser should not be an introductory learning
assignment - unless that is truly what interests you.
 
P

puzzlecracker

Luc said:
An operator is like AND, OR, XOR, NAND

*snip*

Maybe we should start from the beginning? What experience do you have
programming?

Writing any kind of a parser should not be an introductory learning
assignment - unless that is truly what interests you.

:) working professional with 5 years of software dev.... and 0 in
compiler creation!
 
P

puzzlecracker

Luc said:
An operator is like AND, OR, XOR, NAND

*snip*

Maybe we should start from the beginning? What experience do you have
programming?

Writing any kind of a parser should not be an introductory learning
assignment - unless that is truly what interests you.

then what is the difference between operator and condition in your
context?

thanks
 
L

Luc The Perverse

puzzlecracker said:
then what is the difference between operator and condition in your
context?

thanks

An operator is any "+ - ! & && AND XOR " etc in your example it is AND OR
etc.

A condition is something which is true or false. Such as "true" "1==X"
"b!=c" etc.
 
P

puzzlecracker

---In this way you should have

--condition operator condition operator . . . operator.

I am not sure how this is possible then..... doesn't make much
sense...pls explain. thx
 
L

Luc The Perverse

puzzlecracker said:
---In this way you should have

--condition operator condition operator . . . operator.

I am not sure how this is possible then..... doesn't make much
sense...pls explain. thx

Please quote what you are replying to.

You should never end in an operator.

Maybe you should try to code it, and then when you run into a specific
problem you can ask us.

Make a base class (can be empty) called Operand

Derive classes Operator and Condition. Don't include NOT in your first
draft, also don't include parenthesis. Don't attempt to test actual
conditions, just use TRUE and FALSE.

First do a sanity check, to make sure you start with a condition and you
alternate condition and operator. Make sure you end with a condition.

In first draft, don't try to parse complex expressions - just one character
for each operand.

T&F|T

Your set will consist of T for true, F for fale & for AND and | for or.

Next you can add whatever you feel like next. Make the &'s into ANDs, make
it space tolerant. Add Parenthesis.

Seperately develop your condition tests. I'm not sure what your conditions
are, but if you have variables 1 through 10 stored in variable names $1 $2
$3 . . . $10 You could make it something like this

$1 > $3 Your legal variables are $[1 . . 10] and your legal operators for
the expressions are = (or ==) > < <= >= <> (or !=)

When I haved developped very simple scripting languages (the closest thing I
can relate to through experience) I find it useful to use special characters
to identify literals, variables. For instance a dollar sign for any kind of
variable, a pound sign for any kind of number. Also I like to make all my
operators single characters, but this is certainly not a requisite. But it
makes it SO much easier to parse! Example

$1<#1.0|$1=#1.0

If you want to keep <= as an operator define a special character for it, and
then just search and replace in string for <= before starting.

The point of this is to be able to immediately identify from a single
character what the current operand is.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,564
Members
45,039
Latest member
CasimiraVa

Latest Threads

Top