a position on Treestructure in SQL and request for comments on same

J

John Benson

I checked out the

http://c2.com/cgi/wiki?TreeInSql

link and found the following:

Motivation

There are many applications where data is structured as trees. This data
needs to be stored in a database. This data needs to be searched
efficiently. A join operation (set intersection) needs to be efficient.

Forces

a.. Loading an entire tree into memory before searching is not scalable.
b.. Relational tables are two dimensional, so trees must be expressed
using relational linkage.
c.. Fast searching at the database relies on early subsetting against some
index.
(end quote)

This whole "store tree nodes as individual records in an RDBMS" sounds
doomed to irrelevance in the near future because:

1) Loading an entire file into memory wasn't scalable, but it's sure looking
that way now, and specific file <--> string idioms in Python and the
availability of scads of cheap RAM encourage it more with each passing year.
Analogously, maintaining an entire pointery data structure in memory and
dispensing with access through RDBMS key blocks in favor of simple pointer
traversal is cheaper and cheaper as time goes by. Large (64-bit offset)
files and 64-bit OSes are waiting in the wings, too.

2) Virtual memory will also help you in your efforts to cram ever larger
data structures into memory. If you want to persist them easily, they could
be implemented in memory-mapped files with lazy read semantics (i.e. file
portions are only retrieved from disk on demand).

3) Perhaps a felicitous partition of functionality would be appropriate,
such as in-memory tree data structures with leaf node data storage delegated
to an RDBMS or hash table on disk.

I feel a little like a devil's advocate here, because in the past I have
also advocated shoehorning tree-structured stuff into an ISAM file, similar
to the tree-to-RDBMS idea posted here earlier, and, like a good code monkey,
I was excited about doing it. But it seems now that the technological
drivers are rendering this approach outmoded.

Although there are limitations of having stuff in-memory (like, how do you
provide scalable access to it from other processors and checkpoint it to
disk), it's interesting to note that your average RDBMS has already adopted
the approach outlined in 3): key blocks tend to get cached in memory anyway,
since they're on the critical path to the data block leaf nodes, which can
also get cached too. Whoops! Sounds like your RDBMS is already a mugwump
sort of tree, with it's mug in-memory and it's wump on disk. So it sound's
like there is really a continuum of mug/wump distribution, and that the wump
would wither away were it not necessary for persisting data structures.

Comments, please: am I missing anything important here?
 
R

Rene Pijlman

John Benson:
Although there are limitations of having stuff in-memory (like, how do you
provide scalable access to it from other processors and checkpoint it to
disk), it's interesting to note that your average RDBMS has already adopted
the approach outlined in 3):

And Prevayler has adopted 1 and 2:
http://www.prevayler.org/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,020
Latest member
GenesisGai

Latest Threads

Top