Object Persistence Using a File System

Chris Spencer

Before I get too carried away with something that's probably
unnecessary, please allow me to throw around some ideas. I've been
looking for a method of transparent, scalable, and human-readable object
persistence, and I've tried the standard lib's Shelve, Zope's ZODB,
Divmod's Axiom, and others. However, while they're all useful, none
satisfies all my criteria. So I started writing some toy code of my own:
http://paste.plone.org/5227

All my code currently does is transparently keep track of object changes
without requiring any special coding on part of the user, and a function
to convert an object to a file system hierarchy of folders and files.
Please, let me know what you think.

Thanks,
Chris
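
For readers who cannot reach the paste, here is a minimal sketch of what the "object to folders and files" step could look like. It is not Chris's code: `dump_to_fs` and `Person` are hypothetical names, change tracking is left out, and it assumes that attribute names and dict keys are safe to use as file names.

```python
import os


def dump_to_fs(obj, path):
    """Write obj under path: containers become folders, leaves become files."""
    if isinstance(obj, dict):
        os.makedirs(path, exist_ok=True)
        for key, value in obj.items():
            dump_to_fs(value, os.path.join(path, str(key)))
    elif isinstance(obj, (list, tuple)):
        os.makedirs(path, exist_ok=True)
        for index, value in enumerate(obj):
            dump_to_fs(value, os.path.join(path, str(index)))
    elif hasattr(obj, "__dict__"):
        # Plain instances are stored as a folder of their attributes.
        dump_to_fs(vars(obj), path)
    else:
        # Leaf values are stored as their repr(), which stays human-readable.
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(repr(obj))


class Person:
    """Hypothetical example class, used only for illustration."""

    def __init__(self, name, tags):
        self.name = name
        self.tags = tags
```

Calling `dump_to_fs(Person("Chris", ["a", "b"]), "person")` would create a `person` folder containing a `name` file and a `tags` folder with one file per list item.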
 
Bruno Desthuilliers

Chris said:
> Before I get too carried away with something that's probably
> unnecessary, please allow me to throw around some ideas. I've been
> looking for a method of transparent, scalable, and human-readable object
> persistence, and I've tried the standard lib's Shelve, Zope's ZODB,
> Divmod's Axiom, and others. However, while they're all useful, none
> satisfies all my criteria. So I started writing some toy code of my own:
> http://paste.plone.org/5227
>
> All my code currently does is transparently keep track of object changes
> without requiring any special coding on part of the user, and a function
> to convert an object to a file system hierarchy of folders and files.
> Please, let me know what you think.

As you say, using the filesystem for fine-grained persistence may not be
the most efficient solution. I also wonder how (or if) you intend to
address concurrent R/W access and transactions...
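
To make the concurrency concern concrete: one common building block is to write each file to a temporary name and then rename it into place, so a reader never sees a half-written file. The sketch below assumes one value per file; `atomic_write` is a hypothetical helper, and it only gives per-file atomicity, not transactions spanning several files.

```python
import os
import tempfile


def atomic_write(path, text):
    """Write text to path so that readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # do not leave stray temporary files behind
        raise
```

Two writers racing on the same attribute would still need a locking or versioning scheme on top of this.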

A few observations and questions:
- you should avoid tests on concrete types as much as possible; at
least use isinstance
- tuples are immutable containers. What about them?
- what about multiple references to the same object?
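
The points in the list above can be illustrated with a short sketch. All names here (`Inventory`, `is_mapping_exact`, `is_mapping`, `find_shared`) are made up for the example and are not taken from Chris's paste. The first two functions contrast an exact type test with isinstance; the third finds containers that are reachable through more than one reference, which a naive dump would write twice and reload as two independent copies.

```python
class Inventory(dict):
    """A dict subclass, standing in for a user-defined container."""


def is_mapping_exact(obj):
    return type(obj) is dict  # rejects subclasses such as Inventory


def is_mapping(obj):
    return isinstance(obj, dict)  # accepts dict and its subclasses


def find_shared(obj, seen=None, shared=None):
    """Return the ids of containers reachable from obj by more than one path."""
    seen = set() if seen is None else seen
    shared = set() if shared is None else shared
    if not isinstance(obj, (dict, list, tuple)):
        return shared
    if id(obj) in seen:
        shared.add(id(obj))  # second visit: this object is referenced twice
        return shared
    seen.add(id(obj))
    children = obj.values() if isinstance(obj, dict) else obj
    for child in children:
        find_shared(child, seen, shared)
    return shared
```

Tuples can be walked just like lists here, but a loader would have to record the container type somewhere in order to rebuild a tuple rather than a list.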
 
Nick Vatamaniuc

Chris,

Interesting concept. But why is there a need for a human-readable
object persistence that is 10x slower than pickle? In other words,
present a good use case with a rationale (i.e. the "criteria" that you
mentioned). The only one I can think of so far is debugging.

Also, some objects are inherently not human readable (they are large, or
just binary/array data, for example). Or you could simply end up with
so many of them (10GB worth of disk space) that the sheer volume makes
them hard to read, and you would need some kind of custom query
mechanism (or find+grep) to search through your objects if you wanted
to read or edit values in them.

In your code comments I saw that another reason is resistance to
corruption. I think that a database that is backed up regularly is
probably a better solution. Besides, sometimes you want your failure to
be dramatic (go down with a bang!) so it gets your attention and you
can restore everything from a backup. Otherwise, with tens of
thousands of files, some of them might end up being corrupted before
your next filesystem check gets to them. By then, the corruption could
have spread (your program would load it, perform computations, persist
the wrong results and so on), and you wouldn't even notice it.

Hope these comments help,
Nick V.
 
Bruno Desthuilliers

Nick Vatamaniuc wrote:
(please don't top-post - corrected)
Bruno Desthuilliers wrote:
(snip)

>
> Good point about isinstance. Here is a good explanation why:
> http://www.canonical.org/~kragen/isinstance/

Err... I'm sorry, but justifying explicit tests on concrete types, which
don't even take inheritance into account, with a paper explaining why
type tests that do take inheritance into account may be bad seems rather
strange to me... Or did I miss your point here?
 
Lawrence D'Oliveiro

> I've been looking for a method of transparent, scalable, and
> human-readable object persistence...

Don't do object persistence. What is an object? It's a combination of code
and data. Code structure is internal to your program--it has no business
being reflected in external data that may be saved to a persistent medium,
transmitted over an external channel or whatever. Otherwise, when you
refactor your code, that external data is no longer readable without
major backward-compatibility hacks.

Use data abstraction instead: define a high-level data structure that is
independent of implementation details of your code. When you look at the
major data-interchange formats in use today--such as XML, ODF, MPEG,
WAV--there's a reason why none of them are built on object persistence.
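
As an illustration of the data-abstraction approach described above, here is a minimal sketch. The `Account` class, the `version` field, and the record layout are assumptions made up for the example. The saved record names only the data, so internal attributes can be renamed or restructured without invalidating files already on disk.

```python
import json

FORMAT_VERSION = 1  # hypothetical version tag for the external format


class Account:
    def __init__(self, owner, balance_cents):
        self.owner = owner
        self._cents = balance_cents  # internal name, free to change later

    def to_record(self):
        """Map internal state onto the stable, code-independent record."""
        return {
            "version": FORMAT_VERSION,
            "owner": self.owner,
            "balance_cents": self._cents,
        }

    @classmethod
    def from_record(cls, record):
        if record.get("version") != FORMAT_VERSION:
            raise ValueError("unsupported record version")
        return cls(record["owner"], record["balance_cents"])


def save(account):
    return json.dumps(account.to_record(), sort_keys=True)


def load(text):
    return Account.from_record(json.loads(text))
```

Renaming `_cents` later would only touch `to_record` and `from_record`; the stored JSON stays the same.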
 
