Unordered tree to store elements of a parenthesized expression

K

Kaushik

I have a string like (A AND B OR (C AND D AND E OR F)) which has to be
put into a tree. The number of children any node can have is variable.
For the above expression, I'd like to have a tree which looks like
this:
OR
AND
A
B
OR
F
AND
C
D
E

Could someone tell me how to construct this tree and then traverse this
tree?
I'll be traversing this tree to put the contents into a W3C document
later on.

Thanks
Kaushik
 
V

voorth

Kaushik said:
I have a string like (A AND B OR (C AND D AND E OR F)) which has to be
put into a tree. The number of children any node can have is variable.
For the above expression, I'd like to have a tree which looks like
this:
OR
AND
A
B
OR
F
AND
C
D
E

Could someone tell me how to construct this tree and then traverse this
tree?
I'll be traversing this tree to put the contents into a W3C document
later on.

Thanks
Kaushik

A google search on "parse boolean expression java" will get you a _lot_
of relevant information.

You might also want to take a look ad Dijkstra's "shunting yard
algorithm" : http://en.wikipedia.org/wiki/Shunting_yard_algorithm
 
J

John Ersatznom

Kaushik said:
I have a string like (A AND B OR (C AND D AND E OR F)) which has to be
put into a tree. The number of children any node can have is variable.
For the above expression, I'd like to have a tree which looks like
this:
OR
AND
A
B
OR
F
AND
C
D
E

Could someone tell me how to construct this tree and then traverse this
tree?
I'll be traversing this tree to put the contents into a W3C document
later on.

I guess you've decided OR takes precedence over AND from the above.

You need to build a tree so

public class Node {
public List<Node> children = new LinkedList<Node>();
public Node parent;
public String contents;
public boolean isOrNode;

public static Node buildTree (String x) {
int pos = 0;
StringBuffer wordBuilder = new StringBuffer();
String word = null;
boolean expectLeaf = true;
Deque<Node> stackOr = new LinkedList<Node>();
Deque<Node> stackAnd = new LinkedList<Node>();
Node flag = new Node();
Node currentOr = flag;
Node currentAnd = null;
Node child = null;
currentOr.isOrNode = true;
if (!x.startsWith('(')) x = "(" + x + ')';
while (pos < x.length()) {
char c = x.charAt(pos);
++pos;
if (!Character.isWhitespace(c)
&& c != '(' && c != ')') {
wordBuilder.append(c);
continue;
}
if (wordBuilder.length() > 0) {
word = wordBuilder.toString();
wordBuilder.delete(0,
wordBuilder.length() - 1);
}
if (word != null) {
if (word.equals("AND")) {
if (expectLeaf) throw whatever;
child = new Node();
currentAnd.children.add(child);
expectLeaf = true;
} else if (word.equals("OR")) {
if (expectLeaf) throw whatever;
Node cur = currentAnd;
if (cur.children.size() == 0)
throw whatever;
if (cur.children.size() == 1) {
currentOr.remove(cur);
cur=cur.children.get(0);
currentOr.add(cur);
}
currentAnd = new Node();
currentOr.children.
add(currentAnd);
expectLeaf = true;
} else {
if (!expectLeaf) throw whatever;
child.contents = word;
expectLeaf = false;
}
}
if (c == '(') {
if (!expectLeaf) throw whatever;
stackOr.push(currentOr);
stackAnd.push(currentAnd);
currentOr = new Node();
currentAnd = new Node();
child = new Node();
currentOr.children.add(currentAnd);
currentAnd.children.add(child);
currentOr.isOrNode = true;
word = null;
continue;
}
if (c == ')') {
if (expectLeaf) throw whatever;
if (currentAnd.children.size() == 0)
throw whatever;
if (currentAnd.children.size() == 1) {
currentOr.remove(currentAnd);
currentOr.add(currentAnd.
children.get(0));
}
if (currentOr.children.size() == 0)
throw whatever;
Node cur = currentOr;
if (currentOr.children.size() == 1)
cur = currentAnd;
currentOr = stackOr.pop(); // May throw
if (currentOr == flag) {
if (pos != string.length())
throw whatever;
return cur;
}
currentAnd.children.add(cur);
currentAnd = stackAnd.pop();
word = null;
expectLeaf = true;
}
}
throw whatever; // End of string came early.
}
}

Untested but should probably work with any string of the form you gave,
producing a tree of Nodes. Only leaf Nodes have non-null contents field,
while the rest have non-empty children list with at least 2 children and
isOrNode set for the "OR" nodes. Whitespace is insignificant and only
the parentheses and the words OR and AND (case-sensitive) are special.
Any other agglomeration of non-parenthesis non-whitespace is treated as
a leaf node. The outermost pair of parentheses are optional.

The string you gave should produce
OR
AND OR
A B AND F
C D E

as the tree structure. The loop starts with expectLeaf true and sees the
initial (, wordBuilder is empty and word is null, and it pushes the node
"flag" and currentAnd (never used). It then reads A and whitespace and
word becomes "A", and it changes expectLeaf to false and child's
contents to "A". Then it reads AND and makes child a new empty Node
added to currentAnd. It reads B, assigns the child's contents field;
reads OR, and creates a new currentAnd to add to currentOr. We now have
the left hand side of the tree. It sees a ( next and pushes currentOr
and currentAnd -- the former is the tree root above and the latter is
empty. It creates new ones -- they'll end up the right hand OR and right
hand AND above. It sees C now, setting the currently-under-construction
child's content field and unsetting expectLeaf. (Note the pattern:
expectLeaf goes up, then child gets its contents field filled in and
expectLeaf goes down, then a new child begins construction and the flag
goes back up.) Next is AND, so a new child is started. D goes into it;
new child; E. currentAnd is now the tree's other AND node with three
children when it sees OR. It creates an empty currentAnd and adds it to
currentOr. This currentAnd won't last. It sees F, sets child contents,
and then sees ). In the ) handler it sees that currentAnd has size 1 and
so drops it from currentOr and adds its sole child directly to
currentOr. Since currentOr has two children cur ends up currentOr;
otherwise it would have ended up currentOr's sole child. The original
currentAnd and currentOr get popped (the top OR and an empty currentAnd)
and cur (the right hand OR) added to the latter. And it immediately sees
another ). Again currentAnd has only one child so the top OR ends up
with the right hand OR as a direct child. As pos was incremented after
pulling the final ) from x, pos is the string's length and the top OR is
returned from the method.

It's trivial to verify that it handles a one-element non-rightmost AND
correctly as well -- building an OR is only terminated by a ) but
building an AND is terminated by any OR, so when it sees say (X OR Y)'s
"OR" it snaps the X to being a direct child of the OR node. It also
handles (FOO AND BAR) correctly, removing the superfluous OR node it was
building with the AND as its only child on encountering the ).

I don't guarantee it totally free of errors; in particular some imports
are needed (mainly from java.util) and you should throw a new instance
of some chosen exception class in place of each "whatever" occurrence.
It will choke on a string with trailing whitespace, unless it didn't
start with a (, as well. It will handle (X) properly, as the ) handler
first eats the useless currentAnd node and then the equally useless
currentOr node. It will choke on empty parenthesis pair (() or ( );
some whitespace may be between them) too. It also isn't easy to extend,
or fix if OR taking precedence wasn't your intention (though it gives
your intended tree for the example string you posted). In the general case:

X AND Y AND Z OR A AND B AND C OR D AND E AND F

gets you a single OR node with three AND nodes, the first with X, Y, and
Z, the next with A, B, and C, and the last with D, E, and F as children.

The only limit to parenthesis nesting depth is memory and any size
limitations LinkedList has when used as a stack via the Deque interface.
Memory consumption is O(nesting depth) + O(string length) and time
consumption is O(string length) as there's no backtracking for such a
simple grammar.

If you need more flexibility, the APIs that come with the JDK include
some parsing stuff, much of it geared toward HTML and XML but some of it
more general-purpose. But I personally hate mucking with StringTokenizer...

Exceptions: besides your own parse exception, it will throw
NoSuchElementException if there's an unbalanced ) in the input string.
You might want to transform that into another of your choice of parse
exception class with an added outer try block.

Efficiency caveats: empty nodes are sometimes created and discarded.
This includes the initial "flag" node, used to put in the OR-node deque
to flag exit from the final parentheses and any one-child nodes that get
created. At least it recycles the StringBuffer. :)
 
L

Luc The Perverse

Kaushik said:
How would I be able to implement this in version 4 of java?

Be able to implement what?

You need to quote whatever message you are replying to, or else people won't
know ;)
 
J

John Ersatznom

Kaushik said:
How would I be able to implement this in version 4 of java?

It wasn't that heavily generic. You'll have to change the List<Node>
references and the like to drop <Node> and cast things taken out of
collections to (Node); that's probably it, unless lists don't implement
Deque in older versions...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top