a position on Treestructure in SQL and request for comments on same

John Benson · Nov 26, 2003

I checked out the

http://c2.com/cgi/wiki?TreeInSql

link and found the following:

Motivation

There are many applications where data is structured as trees. This data
needs to be stored in a database. This data needs to be searched
efficiently. A join operation (set intersection) needs to be efficient.

Forces

a.. Loading an entire tree into memory before searching is not scalable.
b.. Relational tables are two dimensional, so trees must be expressed
using relational linkage.
c.. Fast searching at the database relies on early subsetting against some
index.
(end quote)

This whole "store tree nodes as individual records in an RDBMS" sounds
doomed to irrelevance in the near future because:

1) Loading an entire file into memory wasn't scalable, but it's sure looking
that way now, and specific file <--> string idioms in Python and the
availability of scads of cheap RAM encourage it more with each passing year.
Analogously, maintaining an entire pointery data structure in memory and
dispensing with access through RDBMS key blocks in favor of simple pointer
traversal is cheaper and cheaper as time goes by. Large (64-bit offset)
files and 64-bit OSes are waiting in the wings, too.

2) Virtual memory will also help you in your efforts to cram ever larger
data structures into memory. If you want to persist them easily, they could
be implemented in memory-mapped files with lazy read semantics (i.e. file
portions are only retrieved from disk on demand).

3) Perhaps a felicitous partition of functionality would be appropriate,
such as in-memory tree data structures with leaf node data storage delegated
to an RDBMS or hash table on disk.

I feel a little like a devil's advocate here, because in the past I have
also advocated shoehorning tree-structured stuff into an ISAM file, similar
to the tree-to-RDBMS idea posted here earlier, and, like a good code monkey,
I was excited about doing it. But it seems now that the technological
drivers are rendering this approach outmoded.

Although there are limitations of having stuff in-memory (like, how do you
provide scalable access to it from other processors and checkpoint it to
disk), it's interesting to note that your average RDBMS has already adopted
the approach outlined in 3): key blocks tend to get cached in memory anyway,
since they're on the critical path to the data block leaf nodes, which can
also get cached too. Whoops! Sounds like your RDBMS is already a mugwump
sort of tree, with it's mug in-memory and it's wump on disk. So it sound's
like there is really a continuum of mug/wump distribution, and that the wump
would wither away were it not necessary for persisting data structures.

Comments, please: am I missing anything important here?

Rene Pijlman · Nov 26, 2003

John Benson:

Although there are limitations of having stuff in-memory (like, how do you
provide scalable access to it from other processors and checkpoint it to
disk), it's interesting to note that your average RDBMS has already adopted
the approach outlined in 3):

And Prevayler has adopted 1 and 2:
http://www.prevayler.org/

Request for comments on a design	2	Oct 23, 2010
PHP cURL for large content and single HTTP request	1	Feb 23, 2023
Hi, I am a webflow user. I am looking for CSS code that can KEEP ALL ELEMENTS POSITIONED in the SAME spot across all resolutions	0	Oct 27, 2023
Request for comments on python distributed technologies	5	May 19, 2006
Im working on a school project and i need some advice in MPLAB X and some programming around it	0	Apr 26, 2023
Request for useful functions on dicts	3	Jul 15, 2012
Why in 'due column', same value is showing for every clients?	4	Jul 18, 2020
I need some help on a format issue that should be simple for someone here (but not me!)	0	Jul 6, 2023

a position on Treestructure in SQL and request for comments on same

John Benson

Rene Pijlman

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads