H
Hal Vaughan
Oliver said:Picking the right data structure for a given problem is one of those
things you just acquire through experience, I think. It's difficult for me
to say for sure, because I don't know what you're doing with your tables,
but based on what I heard so far, an acyclic directed graph sounds like
the the data structure I'd used to model the dependencies between your
tables.
I checked out the Wikipedia article on acyclic directed graphs. In some
ways that is close to what I was doing. Each table contained a list of its
dependent tables and, of course, they had lists of their dependent tables
too. I made sure they always went to the next level so no path would go to
a table on the same or a previous level.
Presumably, your program is trying to "do something": model global
warming, schedule airplane flights, track inventory in a warehouse, etc.
This "thing" that it is trying to do exists independently of your program;
that is, you could do it manually if you needed to, but it might be
painful, boring, repetitive, etc. This "thing" that is independent of your
program is your "problem domain".
It's a setting editor for another system (which is mostly in Perl, since it
handles a LOT of text). For various reasons, I don't want this program
interacting directly with the database, in part because this would be
accessible from the web and nothing else is. The settings data from the
database is converted to text in what is almost XML, but is something I set
up that made it easy to read and write the data easily and quickly from
Java and Perl, saved in files on the server, and this reads them in and
creates the tables. Basically, this program is a GUI that lets me edit the
settings quickly with point-and-click which, for me, is much faster than
dealing with all the SQL commands and the interrelations of the different
settings.
So what I'm wondering is whether the entities in your problem domain
naturally have a tree-like structure to them. If so, then it might be
worth it to stick with trees in your program that reflect the structure of
the problem domain, and use the queue-based algorithm mentioned below.
Otherwise, it might be worth chucking the trees in favour of the acyclic
directed graph (http://en.wikipedia.org/wiki/Acyclic_directed_graph).
After what you're saying, I'd say it's more in the acyclic directed graph.
I had never heard of a data structure like that before, but it fits better
than a tree.
Side note here: on some forums when I ask questions like the one I started
with, I get treated like I'm an idiot because I don't know what a lot of
people think is something basic. On this group, I've noticed that it may
happen once in a while, so I usually include that I'm self taught in my
first post. I've found that once people see that I have taught myself,
they understand why there are holes in what I know and people here often
give me a lot of info that goes way beyond answering my current question
and helps me later.
[A good explanation of acyclic directed graphs snipped for brevity.]
....
What you've written is essentially what I described as being a
breadth-first traversal using a queue (except you need not make these
sublists). I think generally, it would not be considered a form of
recursion. I think (but am not sure) that the technical definition of
recursion requires a method to call itself either directly or indirectly.
I didn't really know what recursion was until after I had created several
recursive subroutines on my own. Then I found that "recursion" described
what I had been doing. All the examples I saw were routines calling
themselves, but when I thought about it, and what I did here amounted to
something like this:
while (!isDone) {
//All sorts of stuff to be done
if (no more tables)
isDone = true;
}
It doesn't call itself, but it keeps going and repeating itself until it's
done. It does essentially the same thing as a recursive subroutine. I did
something slightly different. I used a LinkedList and each node is another
LinkedList of all the tables on a particular level that need updating, so I
did this:
private void updateTables(MyTable firstTable) {
int iLevel = 0;
LinkedList updateTables = new LinkedList(), oneLevel = new LinkedList();
oneLevel.add(firstTable);
updateTables.add(oneLevel);
while (intLevel < updateTables.size() {
//Go through all the tables in oneLevel and get a list of all their child
// child tables. Put them in a new LinkedList and add it to updateTables as
// the next node. If there are none, we don't add a node to updateTables so
// it doesn't increase in length, but intLevel will increase and be the
// same as the size of updateTables. That is the flag that we're out of
// dependent or child tables
intLevel++;
}
// code to go through updateTables and update all tables we've found.
return;
}
[More of explanation snipped for brevity]
....
The drawback of this algorithm is that you must always process the
entire tree. That is, you cannot, for example, only update B and its
dependents, because adding B to the queue will eventually trigger the
processing of E, but you'll miss the processing of F.
What I've ended up with is a routine that will start at any table, start
reading all the dependent ones from it, then get their dependents and so
on. It tracks which level each one is on and update them by going through
each level and updating each table in that level before moving on to the
next. I can start on the root table or on any other tables after it and
still do the same thing.
Having the dependency list as an acyclic graph allows you to only
process the parts of the trees that need to be processed.
At this point, I've got something that does close to that. Maybe I figured
it out without the actual terms?!
Thank you for all the time and effort I know it took to write up a clear
explanation. It was a help!
Hal