persistent deque (continued)

C

castironpi

Some time ago, I was asking about the feasibility of a persistent
deque, a double-ended queue.

It runs into the typical space allocation problems. If you're storing
a pickle, you have to allocate and fragment the file you've opened,
since pickles can be variable-length strings; i.e. if the new data is
too long, blank out its entry, and grow the file. If you're storing a
data-type, you lose Python's dynamic-type advantages, as well as its
long integers, as they can be any length. If you change the object in
the deque, such as when using any mutable type, you have to update the
container too.

Does anyone have any experience storing pickles (I am aware of the
similarities to shelf) to a database?
Can the file system facilitate the variable-length string problem?
How feasible is a length-of-length - length - data solution to the
unbounded integer problem?
Is there any alternative to completely re-pickling a large (say 1k
pickled) object you only change slightly?
What other issues are there?
Is a hybrid-storage type possible, that stores the contents of its
Python-allocated memory block to disk, including reference count, even
if it's a lot slower? The object could not contain any references to
objects not allocated on disk.

A first approach is for the file to look like this:

00 data 01 data 02
01 data 03 data 04
02 data 05 data 06

Append would add:

03 data 07 data 08

AppendLeft would add:

-01 data 09 data 0a

Pop would remove 03, PopLeft would remove -01. You would need a
length-and-head index to make 'rotate' available. Remove would run a
worst-case risk of renumbering half of the indices stored, plus a
rotate.
 
M

M.-A. Lemburg

Some time ago, I was asking about the feasibility of a persistent
deque, a double-ended queue.

You might want to have a look at mxBeeBase:

http://www.egenix.com/products/python/mxBase/mxBeeBase/

Using the integer index you could probably write an on-disk
dequeue.

The B*Tree indexes available in mxBeeBase keep the indexes sorted,
so you'd only have to keep incrementing the index as you add new
values and keep a record of the highest and lowest index currently
in use.

Details are left as exercise for the interested reader ;-)
It runs into the typical space allocation problems. If you're storing
a pickle, you have to allocate and fragment the file you've opened,
since pickles can be variable-length strings; i.e. if the new data is
too long, blank out its entry, and grow the file. If you're storing a
data-type, you lose Python's dynamic-type advantages, as well as its
long integers, as they can be any length. If you change the object in
the deque, such as when using any mutable type, you have to update the
container too.

Does anyone have any experience storing pickles (I am aware of the
similarities to shelf) to a database?
Can the file system facilitate the variable-length string problem?
How feasible is a length-of-length - length - data solution to the
unbounded integer problem?
Is there any alternative to completely re-pickling a large (say 1k
pickled) object you only change slightly?
What other issues are there?
Is a hybrid-storage type possible, that stores the contents of its
Python-allocated memory block to disk, including reference count, even
if it's a lot slower? The object could not contain any references to
objects not allocated on disk.

A first approach is for the file to look like this:

00 data 01 data 02
01 data 03 data 04
02 data 05 data 06

Append would add:

03 data 07 data 08

AppendLeft would add:

-01 data 09 data 0a

Pop would remove 03, PopLeft would remove -01. You would need a
length-and-head index to make 'rotate' available. Remove would run a
worst-case risk of renumbering half of the indices stored, plus a
rotate.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jul 21 2008)________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
 
C

castironpi

Try starting with a dict-based implementation of a double-ended queue
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/259179)
and then replace the dict with a shelf.  Presto, you've got a
persistent deque.

Raymond

We have the cookbook recipe Raymond brings up, and an automatic
pickler that inhahe brought up in May.

I've initiated the approach to the solution I have in mind, but the
dilemma comes early and isn't pleasant. I can either (i) recompile
the whole interpreter, replacing malloc with a custom function that
allocates from disk; or (ii) duplicate every class I want to be
persistable, making every change to the allocations by hand.

Disk-resident objects can clearly not contain references to any
objects not on disk (arguably at finalization-stage only), and
references to ones that are need to be wrapped by offset, and 'made
live' at initialization-time, so that references to the mapped file
resume succeeding after termination and restart--- the mapped memory
segment won't be the same from run to run.

Further, if I can't hijack the existing arena-pool model native in
Python, it would take writing a dedicated memory manager.

Assuming the whole memory isn't moved to disk, which is feasible, the
usage would look like this, to ensure that 'disk-resident' objects
don't refer to 'live' ones.

listA= perst( [ perst( 0 ), perst( 1 ), perst( "abc" ) ] )

to create the list [ 0, 1, "abc" ]. Modifications, e.g. append "def",
would look like this:

listA.append( perst( "def" ) )

and would execute the addition directly to mapped memory. The memory
file would then have five objects: 0, 1, "abc", "def", and this list.

It is not clear that normal assignments would still work, self.b= c,
even for disk-resident c, since on a later run, 'self.b' would refer
to a different place in memory. It might take either trapping updates
to self.__dict__, or something deeper at the level of the ast and
Assm't statements. Or an extra level of indirection in accessing it.
<C instance at index 12 in mem-mapped file 'data.dat'>

The indirection would consist of an offset-finder and a delegation
layer.

It would first have to look-up 'self.b', as opposed to accessing it
directly by memory address.

What I want from the group is to know if I've assessed the options for
a model of persistence accurately. Or if I haven't depicted what I'm
going for clearly. It seems enormous and if it is this hard, I doubt
I'll produce anything other than a few correct persistent types.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,576
Members
45,054
Latest member
LucyCarper

Latest Threads

Top