pickle complexity limit?

M

Mark Hahn

I have a heavily interlinked dom like tree structure of objects. A node
will link to many children redundantly and many children will reference
parents redundantly. So there are many cycles of references.

When I tried to cPickle one of these structures with about 700 nodes the
python interpreter crashed. I did a binary search limiting the number of
nodes added and it could handle 217 nodes. I switched to pickle and now it
gives a recursion limit exception instead of crashing.

I don't understand how this could happen with pickle. Isn't it supposed to
stop when it runs into an object it has already pickled? Couldn't it have
been coded in a stackless way?

What do I do now? I don't want to recompile with a deeper stack as this
will obviously only help in limited cases, not solve the real problem.
 
M

Martin v. =?iso-8859-15?q?L=F6wis?=

Mark Hahn said:
I don't understand how this could happen with pickle. Isn't it supposed to
stop when it runs into an object it has already pickled?

Yes, and it does.
Couldn't it have been coded in a stackless way?

Yes, but it doesn't. Please look at save_inst implementation; it ends
with

save(stuff)
write(BUILD)

This is where the recursion happens. You run into the limit only if
you can find a path through your object graph that does not visit the
same object twice.
What do I do now? I don't want to recompile with a deeper stack as this
will obviously only help in limited cases, not solve the real problem.

Read the pickle code, understand why it is recursive, and implement an
alternative that isn't recursive, yet preserves the original semantics
in terms of pickling order and sequence in which methods are called on
objects. When you are done, please contribute your changes back, to
sf.net/projects/python.

Regards,
Martin
 
S

Simon Burton

Read the pickle code, understand why it is recursive, and implement an
alternative that isn't recursive, yet preserves the original semantics
in terms of pickling order and sequence in which methods are called on
objects. When you are done, please contribute your changes back, to
sf.net/projects/python.

Regards,
Martin

yow, eek... Well, i had this same problem a few days ago.
Could the docs be at least modified to take a note of this limitation?
I'll stick a post on bugzilla.

BTW, it's likely that i will write a non-recursive pickler, but it won't try
to be compatable.

Simon.
 
F

Frank Bechmann

If I understood your problem correctly it might help to find out
multiple occurences of childs in your tree, define them once and
giving them a unique ID and then using the ID in following occurences
and the appropriate XML syntax.

I can't remember the details but at least I can remember having seen
this kind of logic in the python ZSI framework (when a SOAP/XML tree
is created from python-objects), available at
http://pywebsvcs.sourceforge.net/zsi.html.

Hope this helps.
 
P

Paul Rubin

Read the pickle code, understand why it is recursive, and implement an
alternative that isn't recursive, yet preserves the original semantics
in terms of pickling order and sequence in which methods are called on
objects. When you are done, please contribute your changes back, to
sf.net/projects/python.

I wonder if it's feasible to write a pickler as a C extension that
works something like a pointer-reversing garbage collector,
eliminating the need for recursion and still not needing any extra
storage in the objects.
 
M

Mark Hahn

Paul Rubin said:
I wonder if it's feasible to write a pickler as a C extension that
works something like a pointer-reversing garbage collector,
eliminating the need for recursion and still not needing any extra
storage in the objects.

You are quoting Martin from a message that I never saw. How can that be?
I'm reading all the messages from comp.lang.python in usenet with outlook
express.
 
M

Martin v. =?iso-8859-15?q?L=F6wis?=

Paul Rubin said:
I wonder if it's feasible to write a pickler as a C extension that
works something like a pointer-reversing garbage collector,
eliminating the need for recursion and still not needing any extra
storage in the objects.

You definitely will need storage, to remember the objects you have to
come back to; you won't necessarily need storage *in the objects*.

Regards,
Martin
 
M

Martin v. =?iso-8859-15?q?L=F6wis?=

Mark Hahn said:
You are quoting Martin from a message that I never saw. How can that be?
I'm reading all the messages from comp.lang.python in usenet with outlook
express.

I see my own message back on Usenet; messages overtaking each other is
not unheard of.

Regards,
Martin
 
M

Martin v. =?iso-8859-15?q?L=F6wis?=

Simon Burton said:
yow, eek... Well, i had this same problem a few days ago.
Could the docs be at least modified to take a note of this limitation?
I'll stick a post on bugzilla.

Better yet, contribute a documentation change: submit the precise
wording that you want to appear.

Regards,
Martin
 
M

Mark Hahn

Martin v. Löwis said:
Better yet, contribute a documentation change: submit the precise
wording that you want to appear.

Regards,
Martin

I still don't have Martin's original post and I never saw Simon's post. I'm
not getting large parts of this thread and I started it. Life isn't fair.
(boohoo)

I see a total of 7 messages in this thread, how many does everyone else see?
 
M

Mark Hahn

What do I do now? I don't want to recompile with a deeper stack as
this
Read the pickle code, understand why it is recursive, and implement an
alternative that isn't recursive, yet preserves the original semantics
in terms of pickling order and sequence in which methods are called on
objects. When you are done, please contribute your changes back, to
sf.net/projects/python.

Being lazy, I think I will look for persistent storage alternatives instead.
I've been meaning to check out zope. If I were to write something of my own
in this area I'd like to try something fresh, like transparent storage using
metaclasses.

So I guess I'll just have to remember that pickle is a toy for little things
like option storage. What a shame.

P.S. My messages finally all came through. I guess they got clogged up
somewhere. Sorry about all the whining.
 
P

Paul Rubin

You definitely will need storage, to remember the objects you have to
come back to; you won't necessarily need storage *in the objects*.

No. I'm talking about a crazy hack that was used in some old-time
Lisp systems. The idea is to temporarily reverse the pointers that
already exist in the objects to track what's been GC'd, so you don't
need more than a constant amount of additional storage. Google for
"Deutsch Schorr Waite garbage collection" or look in Knuth vol 1 for
more info.
 
S

Simon Burton

I have a heavily interlinked dom like tree structure of objects. A node
will link to many children redundantly and many children will reference
parents redundantly. So there are many cycles of references.
...

I am working on trying to flatten the data before it is pickled.
Here is the code below: it searches all references and replaces them
with dummy Ids (TankId) that key the original reference in a dict.

eg.

def fat_dump(nested):
t=Tank()
t.flatten(nested)
# pickle.dump( t, ... )
# ...
# t = pickle.load( ... )
nested = t.lift()

Rather untested as i am about to take a short holiday.
( note that == doesn't work well on nested data either! )

ciao,

Simon.


#!/usr/bin/env python

class TankId(int):
id = 0
def __new__(self):
TankId.id += 1
return int(TankId.id)

class Tank:
def __init__(self, root = None):
self.items = {}
self.ids = {} # XX use a list
self.todo = []
if root is not None:
self.flatten( root )
def __str__(self):
return str( self.ids.values() )
###########################################
# before pickle

def flat_list(self, item):
for i in range(len(item)):
item = self.queue( item )
def flat_dict(self, item):
for key in item.keys():
item[key] = self.queue( item[key] )
def flat_attrs(self, item):
for key in item.__dict__.keys():
item.__dict__[key] = self.queue( item.__dict__[key] )
def queue(self, item):
assert not isinstance( item, TankId )
if not ( isinstance(item, list) or isinstance(item, dict) or hasattr(item, "__dict__") ):
return item
if id(item) not in self.items:
self.todo.append(item)
tankid = TankId()
self.items[ id(item) ] = tankid # temporary
self.ids[ tankid ] = item
else:
tankid = self.items[ id(item) ]
return tankid
def flatten(self, root):
print "flaten()"
self.root = root
self.queue( self.root )
while self.todo:
item = self.todo.pop(0)
if isinstance(item, list):
self.flat_list(item)
if isinstance(item, dict):
self.flat_dict(item)
if hasattr(item, "__dict__"):
self.flat_attrs(item)
# tuple ? other immutables ? __set/getstate__ ?

del self.items
###########################################
# after pickle

def lift_list(self, item):
for i in range(len(item)):
item = self.recover( item )
def lift_dict(self, item):
for key in item.keys():
item[key] = self.recover( item[key] )
def lift_attrs(self, item):
for key in item.__dict__.keys():
item.__dict__[key] = self.recover( item.__dict__[key] )
def recover(self, item):
if isinstance( item, TankId ):
item = self.ids[ item ]
assert not isinstance( item, TankId )
return item
def lift(self):
print "lifting"
for item in self.ids.values():
if isinstance(item, list):
self.lift_list(item)
if isinstance(item, dict):
self.lift_dict(item)
if hasattr(item, "__dict__"):
self.lift_attrs(item)
# tuple ? other immutables ? __set/getstate__ ?
return self.root



--
Simon Burton, B.Sc.
Licensed PO Box A66
ANU Canberra 2601
Australia
Ph. 02 6249 6940
http://arrowtheory.com
 
C

Christian Tismer

Martin said:
Yes, and it does.


Yes, but it doesn't. Please look at save_inst implementation; it ends
with

save(stuff)
write(BUILD)

This is where the recursion happens. You run into the limit only if
you can find a path through your object graph that does not visit the
same object twice.

There are limitations, and there are ways to cope with them. See below.
Read the pickle code, understand why it is recursive, and implement an
alternative that isn't recursive, yet preserves the original semantics
in terms of pickling order and sequence in which methods are called on
objects. When you are done, please contribute your changes back, to
sf.net/projects/python.

The reason why pickling is recursive was of course a thing
that I never really understood. For me, this is a bit like
asking for problems.
But well, the same was true with recursive destructors of
objects, until I came up with the Py_Trashcan structure.
It has been changed very much, but the idea is still alive:
try to limit recursions as much as possible.

Back to your problem: You have a chance to avoid recursions.
I had the same problem with Stackless. I support pickling of
frame chains to any extent, together with tracebacks. The
recursion problems would be there, if I didn't take some action.
ANd for Stackless, these problems are in fact harder, since
it doesn't have any restriction in recursion depth by nature.

Let me give you an example from my recent efforts:

For Stackless, I implemented pickling of frames and tracebacks,
besides many others. With the standard approach of pickling
them as they come, I would have got into recursion problems.
Although I got different advice of Armin Rigo ("do it right,
trash the pickler and build a stackless engine for it"), I didn't,
since I'm not trying to convince the stack-bound heads any longer.
I will do that for PyPy, when the time has come.

Keeping the current recursive pickling engine, I did this for
pickling of Tasklets, which are just small header objects
for frame chains:
Frames are pickled alone. They don't care about their f_back
field, therefore not introducing any recursive calls.
Tasklets are frame chains. They know which frames they
contain. And instead of causing recursions, tasklets simply
build a list of all the frames they need to be pickled,
and exactly that will happen, in an iterative manner.

Analogously it works with tracebacks, which also can show
very serious nesting. The problem with tracebacks is a bit
harder than with tasklets, since they don't have the
tasklet serving like a header. Therefore, I added a flag to
tracebacks, which marks the leading traceback object. That is,
this leading one will pickle all the other ones into a list,
again avoiding deep recursions. The trick was to modify
PyTraceBack_Here to tag the leading traceback object. Under the
(reasonable) assumption that this leading traceback object
will show up in your pickle, things work very smoothly.

If you want to know about how to pickle certain stuff, download
the Stackless CVS and have a look. There is support for pickling
tasklets, frames, code objects, iterators, tracebacks, and some
more to come. The most straight forward thing for you to look
into would probably be tracebackobject.c .

CVSROOT=:pserver:[email protected]:/home/cvs
export CVSROOT
cvs co stackless

cheers - chris

--
Christian Tismer :^) <mailto:[email protected]>
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/
14109 Berlin : PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/
 
M

Mark Hahn

Christian Tismer said:
Sorry? What exactly is the bad tool you are referring to?

The bad tool is pickle. It cannot handle arbitrary graphs, which to me is a
bug.
I'm slightly unhappy with the time that I spent with
this message, in order to get *that* answer.

I didn't mean to be unappreciative of your help. Hopefully someone reading
this group will be helped by your message. I just don't want to try to work
around pickle's problems.
thanks for no further replies -- chris

I don't know what you mean by this.
 
M

Mark Hahn

Was your reply directing me to an example of how to write something that
works around the recursion limits of pickle, or was it something that could
be used directly to do pickling without the limits? I thought it was the
former.

In either case I apologize for my curt reply to your well thought out
answer. I didn't realize that I had pissed you off until I finally parsed
"thanks for no further replies -- chris" as "shut up" (which is kind of rude
by the way). One reason I had trouble parsing that is you asked me a
non-rhetorical question in the same post, which is a bit of a contradiction.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,050
Latest member
AngelS122

Latest Threads

Top