persistent composites


Aaron Brady

Hi, please forgive the multi-posting on this general topic.

Some time ago, I recommended a pursuit of keeping 'persistent
composite' types on disk, to be read and updated at other times by
other processes. Databases provide this functionality, with the
exception that field types in any given table are required to be
uniform. Python has no such restriction.
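
For instance (my illustration, not from the original post), a single
Python composite can mix element types freely, where a uniformly typed
SQL column cannot:

# One Python object holding several element types at once; a plain SQL
# table could not hold this mix without encoding it somehow.
record = [42, "forty-two", (4.2, None), {"nested": [1, "two"]}]
print(type(record[0]), type(record[1]), type(record[2]))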

I tried out an implementation of composite collections, specifically
lists, sets, and dicts, using 'sqlite3' as a persistence back-end.
It's significantly slower, but we might argue that attempting to do it
by hand classifies as a premature optimization; it is easy to optimize
debugged code.

The essentials of the implementation are:
- each 'object' gets its own table.
= this includes immutable types
- a reference count table
= when an object's ref. count reaches zero, its table is dropped
- a type-map table
= maps object's table ID to a string of its type
- a single 'entry point' table, with the table ID of the entry-point
object
= the entry point is the only data structure available to new
connections. (I imagine it will usually be a list or dict.)
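
A minimal sketch of what that bookkeeping might look like in sqlite3,
using the table and column names that appear in the snippets below
(the 'entry' table name and the per-object table layout are my
assumptions, not the poster's):

import sqlite3

conn = sqlite3.connect("composites.db")
cur = conn.cursor()

# Bookkeeping tables: reference counts, the type map, and the single entry point.
cur.execute("CREATE TABLE IF NOT EXISTS refcounts ( tableid INTEGER, count INTEGER )")
cur.execute("CREATE TABLE IF NOT EXISTS types ( tableid INTEGER, type TEXT )")
cur.execute("CREATE TABLE IF NOT EXISTS entry ( tableid INTEGER )")

# Each object then gets a table of its own, e.g. a list's elements keyed
# by position, holding the table IDs of the element objects.
cur.execute("CREATE TABLE IF NOT EXISTS obj0 ( idx INTEGER, valueid INTEGER )")
conn.commit()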

I will be sure to kill any interest you might have by now, by
"revealing" a snippet of code.

The object creation procedure:

def new_table( self, type ):
  ''' 'type' is a string, the name of the class the object is an
  instance of '''
  cur= self.conn.cursor( )
  # Allocate the next table ID: one past the largest ID already in use.
  cur.execute( '''SELECT max( tableid ) FROM refcounts''' )
  rec= cur.fetchone( )
  if rec[ 0 ] is None:
    obid= 0
  else:
    obid= rec[ 0 ]+ 1
  # Record the new object's type name and start its reference count at 1.
  cur.execute( '''INSERT INTO types VALUES( ?, ? )''', ( obid, type ) )
  cur.execute( '''INSERT INTO refcounts VALUES( ?, ? )''', ( obid, 1 ) )
  return obid

The increment ref. count procedure:

def incref( self, obid ):
  cur= self.conn.cursor( )
  # Read the current count for this object's table...
  cur.execute( '''SELECT count FROM refcounts WHERE tableid = ?''', ( obid, ) )
  rec= cur.fetchone( )
  newct= rec[ 0 ]+ 1
  # ...and write the incremented value back.
  cur.execute( '''UPDATE refcounts SET count = ? WHERE tableid = ?''',
               ( newct, obid ) )
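
The decrement side isn't shown in the post; a rough sketch of what it
might look like, following the drop-on-zero rule above (the per-object
table name 'obj<N>' is my assumption):

def decref( self, obid ):
  cur= self.conn.cursor( )
  cur.execute( '''SELECT count FROM refcounts WHERE tableid = ?''', ( obid, ) )
  newct= cur.fetchone( )[ 0 ]- 1
  if newct > 0:
    cur.execute( '''UPDATE refcounts SET count = ? WHERE tableid = ?''',
                 ( newct, obid ) )
  else:
    # Count reached zero: drop the object's own table and its bookkeeping rows.
    cur.execute( 'DROP TABLE IF EXISTS obj%d' % obid )
    cur.execute( '''DELETE FROM refcounts WHERE tableid = ?''', ( obid, ) )
    cur.execute( '''DELETE FROM types WHERE tableid = ?''', ( obid, ) )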

The top-level structure contains these two procedures, 'new_table' and
'incref', as well as 'setentry', 'getentry', and 'revive' procedures.

Most notably, user-defined types are possible. The dict is merely a
persistent dict. 'revive' checks the global namespace by name for the
original type, subject to the same restrictions that we all know and
love that 'pickle' has.
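
A rough sketch of the kind of lookup 'revive' might perform; the use
of object.__new__ and the '_tableid' attribute are my assumptions, in
the spirit of how pickle reconstitutes instances:

def revive( self, obid ):
  ''' Reconstitute an instance from its stored type name.  Like pickle,
  this only works if the class is still findable by name. '''
  cur= self.conn.cursor( )
  cur.execute( '''SELECT type FROM types WHERE tableid = ?''', ( obid, ) )
  typename= cur.fetchone( )[ 0 ]
  cls= globals( )[ typename ]
  inst= object.__new__( cls )      # create the instance without running __init__
  inst._tableid= obid              # hypothetical bookkeeping attribute
  return inst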

Deadlocks and cyclic garbage pose the usual problems. The approach I
used last time was to maintain a graph of acquired locks in a separate
table, and simply check it for cycles to avert deadlocks (a sketch
follows the docs excerpt below). For garbage, I can't find a better
solution than the one Python already uses.

From the 3.0 docs:
gc.garbage

A list of objects which the collector found to be unreachable but
could not be freed (uncollectable objects).
....
Python doesn’t collect such [garbage] cycles automatically because, in
general, it isn’t possible for Python to guess a safe order in which
to run the __del__() methods. If you know a safe order, you can force
the issue by examining the garbage list, and explicitly breaking
cycles due to your objects within the list.
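
A minimal sketch of the deadlock check mentioned above: store who
waits on whom in a separate table, and refuse any lock request that
would close a cycle in that wait-for graph (the 'lockwaits( waiter,
holder )' table and the function name are my own):

def would_deadlock(cur, waiter, holder):
    """Return True if adding the edge waiter -> holder to the wait-for
    graph stored in lockwaits(waiter, holder) would create a cycle."""
    seen = set()
    frontier = [holder]
    while frontier:
        node = frontier.pop()
        if node == waiter:
            return True          # holder already (transitively) waits on waiter
        if node in seen:
            continue
        seen.add(node)
        cur.execute('SELECT holder FROM lockwaits WHERE waiter = ?', (node,))
        frontier.extend(row[0] for row in cur.fetchall())
    return False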

Before I go and flesh out the entire interfaces for the provided
types, does anyone have a use for it?
 

kindly

<snip/>

Before I go and flesh out the entire interfaces for the provided
types, does anyone have a use for it?

I like it as a concept, have not got any use for it this minute, but I
am sure it will be useful someday.
 

Steven D'Aprano

Aaron said:
Some time ago, I recommended a pursuit of keeping 'persistent
composite' types on disk, to be read and updated at other times by
other processes. Databases provide this functionality, with the
exception that field types in any given table are required to be
uniform. Python has no such restriction. ....
I will be sure to kill any interest you might have by now, by
"revealing" a snippet of code.

How about telling us what problem you're trying to solve? Giving us your
implementation isn't useful if we don't even know what it's for. Persistent
data on disk, okay, I get that -- but how is it different from (say) XML,
or INI files, or pickles, or similar? Is the idea to have *live* data
stored on disk, updated by multiple processes simultaneously? Don't assume
that people will remember your recommendation from "some time ago".
 

Aaron Brady

How about telling us what problem you're trying to solve? Giving us your
implementation isn't useful if we don't even know what it's for. Persistent
data on disk, okay, I get that -- but how is it different from (say) XML,
or INI files, or pickles, or similar? Is the idea to have *live* data
stored on disk, updated by multiple processes simultaneously? Don't assume
that people will remember your recommendation from "some time ago".

It was no time at all in the Usenet world, I suppose. There are lots
of strategies for storing data on a disk, and lots of strategies for
storing data in memory. I was trying on disk what Python uses on
memory.

If you want to argue that Python isn't useful, you're in the wrong
place. If you want to argue that Python's strategy is useful on
disks, you're talking to the right person.

I'll draw an analogy. static data structures : sql & co. :: Python
data structures : this undertaking. It has all the advantages over
SQL & XML structures that Python has over C structures: sort of a
'worst of neither' combination. I haven't written a textbook on DSs,
though, so I don't quite know where to begin. In my eyes, you're
practically asking, "What's your use case for Python?" Either that,
or I don't know enough other languages.

I have nothing to use it for, but someone might, and it might be just
plumb interesting to fellow Pythoneers. If I did, I'd have somewhere
to start.
 

Diez B. Roggisch

Aaron said:
Hi, please forgive the multi-posting on this general topic.

Some time ago, I recommended a pursuit of keeping 'persistent
composite' types on disk, to be read and updated at other times by
other processes. Databases provide this functionality, with the
exception that field types in any given table are required to be
uniform. Python has no such restriction.

I tried out an implementation of composite collections, specifically
lists, sets, and dicts, using 'sqlite3' as a persistence back-end.
It's significantly slower, but we might argue that attempting to do it
by hand classifies as a premature optimization; it is easy to optimize
debugged code.

<snip/>

Sounds like you are re-inventing the ZODB.

Diez
 

Aaron Brady

<snip/>

Sounds like you are re-inventing the ZODB.

Diez

Alright, Diez. Here is some private consulting for free.

''Section 2.6.1:
The most common idiom that isn't caught by the ZODB is mutating a list
or dictionary''

My approach performs this for free.
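
To spell out the idiom the quoted section refers to (plain Python
stand-ins here, not actual ZODB code; which of these two lines a given
persistence layer notices is exactly the point):

class Account:
    def __init__(self):
        self.holders = ["alice"]

acct = Account()

# In-place mutation: the attribute still refers to the same list object, so a
# mechanism that only watches attribute rebinding has nothing to notice.
acct.holders.append("bob")

# Rebinding: a new list is assigned to the attribute, the kind of change an
# attribute-level mechanism does see.
acct.holders = acct.holders + ["carol"]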

The docs also don't mention interprocess communication, which is one
of the two primary functions that I satisfy in my approach.

The syntax is different, non-trivially so, and mine is more Pythonic.
 

Aaron Brady

...or SQLAlchemy or pickles in a SQL BLOB or...

I recognize your aversion to my claim of making a new approach on
something. I suppose an austere review of literature would compare
with the existing alternatives.

My approach does not define tables, so it is not SQLAlchemy; it is
not mere sugar for SQL. It defines a 'set' class and a 'tuple' class,
so some of the functionality may (and probably should be expected to)
overlap.

http://www.sqlalchemy.org/docs/05/ormtutorial.html#define-and-create-a-table

My approach does not pickle objects, so it is not a mere BLOB pickle.
If a user writes, listA[ 50 ].append( "abcde" ), one addition is made
to an existing structure, and the remaining contents of 'listA' are
unread and unchanged.
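
A rough sketch of how that append could come down to a single row
insertion, assuming the per-object table layout sketched earlier in
the thread ('obj<N>( idx, valueid )' is my naming, not the poster's):

def list_append( self, list_obid, value_obid ):
  ''' Append a reference to an existing persistent object onto the
  persistent list stored in table obj<list_obid>. '''
  cur= self.conn.cursor( )
  # Only the current maximum index is read; no other element is touched.
  cur.execute( 'SELECT max( idx ) FROM obj%d' % list_obid )
  rec= cur.fetchone( )
  nextidx= 0 if rec[ 0 ] is None else rec[ 0 ]+ 1
  cur.execute( 'INSERT INTO obj%d VALUES( ?, ? )' % list_obid,
               ( nextidx, value_obid ) )
  self.incref( value_obid )    # the list now holds one more reference to it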
 

Aaron Brady

I'm not much for the interface.  But the back end might match what I'm
doing.

Making the charitable interpretation that this was the extent of c-l-
py's support and enthusiasm for my idea, I will now go into mourning.
Death occurred at oh-eight-hundred. Rest in peace, support &
enthusiasm.
 

Mike Kazantsev

Making the charitable interpretation that this was the extent of c-l-
py's support and enthusiasm for my idea, I will now go into mourning.
Death occurred at oh-eight-hundred. Rest in peace, support &
enthusiasm.

I've read this thread from the beginning, being tempted to insert
remarks about shelve module or ORMs like SQLAlchemy, but that'd be
meaningless without the problem description, which I haven't seen
anywhere. Is it some trick idea like "let's walk on our heads"?

--
Mike Kazantsev // fraggod.net

 

Aaron Brady

I've read this thread from the beginning, being tempted to insert
remarks about shelve module or ORMs like SQLAlchemy, but that'd be
meaningless without the problem description, which I haven't seen
anywhere. Is it some trick idea like "let's walk on our heads"?

More like bronze them, or hang them on a tackboard. You haven't
convinced me that it's not a problem, or that it's an easy one.
 

Rhodri James

More like bronze them, or hang them on a tackboard. You haven't
convinced me that it's not a problem, or that it's an easy one.

Unfortunately it's up to you to demonstrate that it is a problem,
whichever of the many possible 'it's you're talking about. So far,
the question "Why would I want to use this? What's the use case?"
has gone unanswered, and I'm sure I'm not the only one baffled by it.
 

Aaron Brady

Unfortunately it's up to you to demonstrate that it is a problem,
whichever of the many possible 'it's you're talking about.  So far,
the question "Why would I want to use this?  What's the use case?"
has gone unanswered, and I'm sure I'm not the only one baffled by it.

I can demonstrate it's a problem; many things are.

The subject line says it all. SQL will persist (and share for IPC,
after a fashion) sets of statically typed tuples. But we have need
for sets of dynamically typed tuples, in volatile storage at the very
least, or no one would like Python.

Whether we have need for them in persistent storage remains to be
seen.

POSH Python Object SHaring has been at least one student's graduate
thesis. It couldn't hurt to pursue it if one has time, as with many
things. It's pretty likely my estimates of end user usefulness of
'KeepDB', say, derive sturdily and steadily from our estimated
popularity of Python. So much for my motives, though not for my
opportunities.

My implementation has followed two separate paths: one using SQL for a
back-end host; the other using hand-coded bytes on disk, complete with
large file support using views, and on-disk (or in-buffer) memory
management, which incidentally I spoke up about about a year ago. It
was the topic of an honors lecture in the Data Structures course in my
undergrad. For the record, I still can't beat a log-N balanced tree
for finding free nodes, but I am operating on the assumption that the
smallest sufficient region is the best to return from 'alloc'.
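
A small sketch of that best-fit policy, here using a sorted list with
bisect as a stand-in for the balanced tree (the FreeList class and its
layout are mine):

import bisect

class FreeList:
    """Best-fit allocator over free (size, offset) regions of an on-disk buffer."""

    def __init__(self):
        self.free = []                                  # sorted list of (size, offset)

    def release(self, offset, size):
        bisect.insort(self.free, (size, offset))

    def alloc(self, size):
        """Return the offset of the smallest free region that fits, or None."""
        i = bisect.bisect_left(self.free, (size, 0))    # first region with size >= requested
        if i == len(self.free):
            return None                                 # nothing large enough
        rsize, roffset = self.free.pop(i)
        if rsize > size:
            # Hand back the unused tail of the region.
            self.release(roffset + size, rsize - size)
        return roffset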

You are not being any help, Rhodri, in your question. The approach's
chief virtue is fidelity of programming, that is, what you write is
most true to what you mean. If you have an object on disk and you want
the third element of its 'coords' list attribute, it's a waste to do
anything other than find the object's disk address, then the disk
address of its 'coords' attribute, then the disk address of the third
element of that. Loading, or worse, processing, the entire structure
is a waste even if, and (in light of parallel considerations about
economic market transparency and efficiency, that is to say, economy)
maybe /especially/ if, you are having a lowly machine make do. You don't
want to open every box in your basement to find your tennis racket;
you want to open one box, find the racket in the index, then open all
and only the boxes that 'nestedly' contain the racket. (Yes, that's /
all/ and /only/, or 'if and only if'-- not to be confused with 'one
and only'.)

It's more economical.
 

Rhodri James

You are not being any help, Rhodri, in your question.

To you, perhaps not. To me, it has at least had the effect of making
what you're trying to do (write a pythonic object database) clearer.
 

Aahz

You are not being any help, Rhodri, in your question.

Maybe not, but honestly, you're getting pretty close to going back in my
killfile. Although you're no longer trolling this group, I find your
writing difficult to read at best; answering questions from people like
Rhodri is about your only chance to reach people like me. Even without
the killfile, I'm skipping about 60-80% of your posts. It's a free
Usenet, of course, so you're under no obligation to pay any attention to
me, but I think you ought to consider the merits of changing your
opinion of Rhodri's questions.

Side note: the rhetorical trick of inserting a person's name in a
statement is often intended and received as a mildly patronizing insult.
It's not at all clear to me whether that was your intent; if not, you
might want to change your writing style a bit.
 

Aaron Brady

Maybe not, but honestly, you're getting pretty close to going back in my
killfile.  Although you're no longer trolling this group, I find your
writing difficult to read at best; answering questions from people like
Rhodri is about your only chance to reach people like me.  Even without
the killfile, I'm skipping about 60-80% of your posts.  It's a free
Usenet, of course, so you're under no obligation to pay any attention to
me, but I think you ought to consider the merits of changing your
opinion of Rhodri's questions.

Side note: the rhetorical trick of inserting a person's name in a
statement is often intended and received as a mildly patronizing insult.
It's not at all clear to me whether that was your intent; if not, you
might want to change your writing style a bit.

This is a welcome digression for me. I wasn't sure that my idea was
being taken seriously. On a temperature scale from freezing to
boiling, I took Rhodri's message to be somewhere in the single
digits-- quite chilly. Maybe if it hadn't made me feel so defensive,
I could've just said so. But, judging from his reply, he got the
message.

Aahz said:
I find your
writing difficult to read at best; ....
I'm skipping about 60-80% of your posts.

My loss.
 
