Please have a look at this class

Neil Cerutti

While working on a program I encountered a situation where I'd
construct a largish data structure (a tree) from parsing a host
of files and would end up having to throw away parts of my
newly built tree if a file turned out to contain invalid data.

The first idea that occurs to me is to provide a merge function
for your data structure, which you use to merge in another tree
object when that data is known to be valid.

So the process would work like this:

temp_tree = process(the_file)
if temp_tree.is_valid():
    real_tree.merge(temp_tree)
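
Fleshed out a little, the merge approach might look like the sketch below. It is written in Python 3 syntax; the Tree class, its name-to-node mapping, the validity rule, and the 'name=value' line format are all assumptions made for illustration, not the original poster's data structure.

```python
class Tree:
    """Hypothetical stand-in for the poster's tree: a name-to-node mapping."""

    def __init__(self):
        self.nodes = {}

    def add(self, name, data):
        self.nodes[name] = data

    def is_valid(self):
        # Assumed validity rule for illustration: no node holds None.
        return all(data is not None for data in self.nodes.values())

    def merge(self, other):
        # Copy the other tree's nodes into this one.
        self.nodes.update(other.nodes)


def process(lines):
    """Hypothetical parser: builds a temporary tree from 'name=value' lines."""
    tree = Tree()
    for line in lines:
        name, sep, value = line.partition("=")
        # A line without '=' yields a None node, which marks the file invalid.
        tree.add(name.strip(), value.strip() if sep else None)
    return tree


real_tree = Tree()
for file_lines in (["a=1", "b=2"], ["c=3", "broken line"]):
    temp_tree = process(file_lines)
    if temp_tree.is_valid():
        real_tree.merge(temp_tree)
```

Only the first "file" is merged here; the second one is discarded along with its temporary tree, so the real tree never needs to be rolled back.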
<CODE SNIPPET>

u = Unrollable()
u.someValue = 3.14
u.aString = 'Hi there'

# If we decide we want to keep those changes ...
u.commit()

# Or we could have reverted to the original. This would have restored
# the state prior to the last call to commit() (or simply the
# state at the beginning, if there hasn't been a call to commit
# yet).
#u.rollback()

</CODE SNIPPET>

The basic idea behind this is that each instance of the
Unrollable class keeps an internal dictionary (which, in lieu
of a better name I'm currently calling 'sand box') to which all
changed attribute values are saved (attribute changes are
intercepted via __setattr__). Then, with a call to commit(),
all attributes are transferred to the instance's __dict__
dictionary and hence become actual attributes per se.

A nice use for this class might be to pass large mutable objects
to a function as if they were immutable, without needing to copy
them. Unfortunately, it only works for one level of call, I think.
 
antred

Hello everyone,

While working on a program I encountered a situation where I'd
construct a largish data structure (a tree) from parsing a host of
files and would end up having to throw away parts of my newly built
tree if a file turned out to contain invalid data. My first thought was
"Well, you can always make a deep copy of your tree first, then add new
data to the copy and revert to the original if you need to.", but as
this tree can grow very big this is not exactly efficient.
So my second idea was to come up with a class that offers a very
limited degree of database-like behavior, meaning you can make changes
to the object and then decide whether you want to commit those changes
or roll them back to get back to the original. For example:

<CODE SNIPPET>

u = Unrollable()
u.someValue = 3.14
u.aString = 'Hi there'

# If we decide we want to keep those changes ...
u.commit()

# Or we could have reverted to the original. This would have restored
# the state prior to the last call to commit() (or simply the state at
# the beginning, if there hasn't been a call to commit yet).
#u.rollback()

</CODE SNIPPET>

The basic idea behind this is that each instance of the Unrollable
class keeps an internal dictionary (which, in lieu of a better name I'm
currently calling 'sand box') to which all changed attribute values are
saved (attribute changes are intercepted via __setattr__). Then, with a
call to commit(), all attributes are transferred to the instance's
__dict__ dictionary and hence become actual attributes per se.

Similarly, the rollback() function simply empties the contents of the
sand box without committing them to __dict__. The rollback() function
can work recursively, too, if passed some extra parameters. If so, it
walks either the sand box or the __dict__ (or both) and invokes the
rollback() function on any attribute members that are instances of the
Unrollable class or a derived class.

Finally, this works for 'private' attributes (i.e. names with two
leading underscores), too, as the __setattr__ implementation mangles
the name of the attribute if it detects a private name.

I'm posting this for 2 reasons. Firstly, I feel that I have finally
produced something that others _might_ find useful, too. Secondly,
since I'm still learning Python (yeah, even after 2 years), I would be
very grateful to hear people's criticisms. Are there things that could
be done more efficiently? Do you spot any grave errors? Does something
similar already exist that I should just have used instead? Right now
I'm rather pleased with my class, but if someone tells me there is
already something like this in Python's library (and then it'll most
likely be more efficient anyway) then I'd of course rather use that.

Entire class definition + some test code attached to this post.

P.S. I __LOVE__ how something like this is just barely 70 lines of code
in Python!



class Unrollable( object ):
    """Provides a very simple commit/rollback system."""

    def __setattr__( self, attributeName, attributeValue ):
        """Changes the specified attribute by setting it to the passed value.
        The change is only made to the sandbox and is not committed."""

        if attributeName.find( '__' ) == 0:
            # Mangle name to make attribute private.
            attributeName = '_' + self.__class__.__name__ + attributeName

        try:
            theDict = self.__dict__[ '_Unrollable__dSandBox' ]
        except KeyError:
            theDict = self.__dict__[ '_Unrollable__dSandBox' ] = {}

        theDict[ attributeName ] = attributeValue


    def __getattr__( self, attributeName ):
        """This method ensures an attribute can be accessed even when it
        hasn't been committed yet (since it might not exist in the object
        itself yet)."""

        if attributeName.find( '__' ) == 0:
            # Mangle name to make attribute private.
            attributeName = '_' + self.__class__.__name__ + attributeName

        try:
            theDict = self.__dict__[ '_Unrollable__dSandBox' ]
        except KeyError:
            # Our sandbox doesn't exist yet, therefore the requested attribute
            # doesn't exist yet either.
            raise AttributeError

        try:
            return theDict[ attributeName ]
        except KeyError:
            # No such attribute in our sandbox.
            raise AttributeError

    def commitChanges( self ):
        """Commits the contents of the sandbox to the actual object. Clears
        the sandbox."""
        while len( self.__dSandBox ) > 0:
            key, value = self.__dSandBox.popitem()
            self.__dict__[ key ] = value

    def unroll( self, bRecurseSandBox = True, bRecurseDict = False ):
        """Ditches all changes currently in the sandbox. Recurses all objects
        in the instance itself and in its sandbox and, if
        they're unrollable instances themselves, invokes the unroll method on
        them as well."""
        if bRecurseSandBox:
            while len( self.__dSandBox ) > 0:
                key, value = self.__dSandBox.popitem()

                if isinstance( value, Unrollable ):
                    value.unroll( bRecurseSandBox, bRecurseDict )
        else:
            self.__dSandBox.clear()

        if bRecurseDict:
            iterator = self.__dict__.itervalues()

            while True:
                try:
                    nextVal = iterator.next()
                except StopIteration:
                    break

                if isinstance( nextVal, Unrollable ):
                    nextVal.unroll( bRecurseSandBox, bRecurseDict )

    def hasUncommittedChanges( self ):
        """Returns true if there are uncommitted changes, false otherwise."""
        return len( self.__dSandBox ) > 0




if __name__ == '__main__':
    # With a public attribute ...
    u = Unrollable()

    print 'Before.'

    try:
        theValue = u.theValue
    except AttributeError:
        print 'u does not have a theValue attribute yet.'
    else:
        print 'u.theValue is', theValue

    u.theValue = 3.147634

    print 'After set().'

    try:
        theValue = u.theValue
    except AttributeError:
        print 'u does not have a theValue attribute yet.'
    else:
        print 'u.theValue is', theValue

    u.commitChanges()

    print 'After commitChanges().'

    try:
        theValue = u.theValue
    except AttributeError:
        print 'u does not have a theValue attribute yet.'
    else:
        print 'u.theValue is', theValue

    print u.__dict__


    # With a private attribute ...
    class MyClass( Unrollable ):
        def accessPrivateAttr( self ):
            try:
                theValue = self.__theValue
            except AttributeError:
                print 'self does not have a __theValue attribute yet.'
            else:
                print 'self.__theValue is', theValue


    anObject = MyClass()

    print 'Before.'
    anObject.accessPrivateAttr()

    anObject.__theValue = 6.667e-11
    print 'After set().'
    anObject.accessPrivateAttr()

    #anObject.commitChanges()

    print 'After commitChanges().'
    anObject.accessPrivateAttr()


    anObject.subObject = Unrollable()
    anObject.subObject.aString = 'Yeeehaawww'

    print anObject.__dict__
    print anObject.subObject.__dict__

    anObject.unroll( True, True )

    print anObject.__dict__
 
Bruno Desthuilliers

antred wrote:
Hello everyone,

While working on a program I encountered a situation where I'd
construct a largish data structure (a tree) from parsing a host of
files and would end up having to throw away parts of my newly built
tree if a file turned out to contain invalid data. My first thought was
"Well, you can always make a deep copy of your tree first, then add new
data to the copy and revert to the original if you need to.", but as
this tree can grow very big this is not exactly efficient.
So my second idea was to come up with a class that offers a very
limited degree of database-like behavior, meaning you can make changes
to the object and then decide whether you want to commit those changes
or roll them back to get back to the original.

Then you may want to have a look at the zodb (and possibly Durus - it
has support for transactions too IIRC) instead of reinventing the square
wheel.
 
Steven D'Aprano

Hello everyone,

While working on a program I encountered a situation where I'd
construct a largish data structure (a tree) from parsing a host of
files and would end up having to throw away parts of my newly built
tree if a file turned out to contain invalid data.

Why not validate the file before you add it to the tree?

My first thought was
"Well, you can always make a deep copy of your tree first, then add new
data to the copy and revert to the original if you need to.", but as
this tree can grow very big this is not exactly efficient.

My first thought would have been to just delete the invalid node from the
tree.

BTW, what sort of tree are you using? A basic binary tree, or something
more sophisticated? Deleting nodes from trees need not be expensive.

So my second idea was to come up with a class that offers a very
limited degree of database-like behavior, meaning you can make changes
to the object and then decide whether you want to commit those changes
or roll them back to get back to the original.

I don't know whether that would be a useful tool to have in general. How
do you use it? I can guess two broad strategies:

(1) commit; add; validate; commit or roll-back.

That seems wasteful, because if you're adding only a single node to the
tree, it should be *very* easy to delete it and revert to the previous
state.

Alternatively:

(2) commit; add multiple times; validate multiple items; commit the lot,
or roll-back the lot and then re-add the ones that validated.

But does it really make sense to be rolling back a whole lot of
transactions, just because one of them is faulty? I don't think so.

In this specific example, I think the right solution is to make adding and
deleting a file an atomic operation (or at least as close to atomic as
Python allows), so you don't need to think about attributes. You just add
or delete nodes. Anyway, that's the solution I'd be looking at.
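
One way to read "add a file atomically" is sketched below in Python 3 syntax: add the nodes from one file, remember which ones were added, and delete exactly those if validation fails partway through. The plain dict standing in for the tree, the (key, value) entry format, and the add_file and is_valid names are assumptions for illustration only.

```python
def add_file(tree, entries, is_valid):
    """Add all entries from one file, or none of them.

    `tree` is a plain dict standing in for the real tree and `entries` is a
    list of (key, value) pairs parsed from one file. Both are assumptions.
    """
    added = []
    for key, value in entries:
        # Treat a clash with an existing node as invalid, so the undo step
        # below can never delete a node that was there before this call.
        if key in tree or not is_valid(key, value):
            for added_key in added:
                del tree[added_key]
            return False
        tree[key] = value
        added.append(key)
    return True


def has_value(key, value):
    return value is not None


tree = {"root": 0}
ok = add_file(tree, [("a", 1), ("b", 2)], has_value)
bad = add_file(tree, [("c", 3), ("d", None)], has_value)
```

After the second call the tree holds only the root and the nodes from the first file; the partially added node "c" has been removed again.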

[snip]
The basic idea behind this is that each instance of the Unrollable
class keeps an internal dictionary (which, in lieu of a better name I'm
currently calling 'sand box')

Bad name! "Sand box" already has an established usage, and that's not it.

A better name might be something like "provisional" or "uncommitted" or
similar.

[snip]
Finally, this works for 'private' attributes (i.e. names with two
leading underscores), too, as the __setattr__ implementation mangles
the name of the attribute if it detects a private name.

I think it would be better NOT to mangle the names of the attribute, as
that defeats the purpose of name mangling in the first place. If the user
of your code wants to muck about with private variables, then he should
mangle the name before passing it to your code.


[snip]
class Unrollable( object ):
    """Provides a very simple commit/rollback system."""

    def __setattr__( self, attributeName, attributeValue ):
        """Changes the specified attribute by setting it to the passed value.
        The change is only made to the sandbox and is not committed."""

        if attributeName.find( '__' ) == 0:
            # Mangle name to make attribute private.
            attributeName = '_' + self.__class__.__name__ + attributeName

You use mangle twice in your code -- that should be factored out as a
function or method.

Or better yet, just don't do it at all.
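
If the mangling is kept, factoring it out could look roughly like this (Python 3 syntax). The helper name mangle is hypothetical, and the check that skips names ending in two underscores is an addition that the posted code does not have; it only approximates the rule Python itself applies.

```python
def mangle(owner, name):
    """Return an approximation of the name-mangled form of `name`.

    Names that start with two underscores and do not end with two
    underscores get '_ClassName' prepended, with leading underscores
    stripped from the class name.
    """
    if name.startswith("__") and not name.endswith("__"):
        return "_" + owner.__name__.lstrip("_") + name
    return name


class Unrollable(object):
    """Empty placeholder, used here only for its class name."""


private_name = mangle(Unrollable, "__theValue")   # '_Unrollable__theValue'
special_name = mangle(Unrollable, "__init__")     # left unchanged
public_name = mangle(Unrollable, "theValue")      # left unchanged
```

Both __setattr__ and __getattr__ could then call the same helper instead of repeating the string concatenation.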

        try:
            theDict = self.__dict__[ '_Unrollable__dSandBox' ]
        except KeyError:
            theDict = self.__dict__[ '_Unrollable__dSandBox' ] = {}

Use this instead:

theDict = self.__dict__.setdefault('_Unrollable__dSandBox', {})


        theDict[ attributeName ] = attributeValue


    def __getattr__( self, attributeName ):
        """This method ensures an attribute can be accessed even when it
        hasn't been committed yet (since it might not exist in the object
        itself yet)."""

Doesn't that defeat the purpose of having a distinct commit? If something
hasn't been committed, you shouldn't be able to access it!


[snip]

What about commits/roll-backs for deletion of attributes?
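
As a rough sketch of what staged deletions might involve (Python 3 syntax): a sentinel object marks a pending deletion, and commit() or rollback() decides whether it takes effect. The class name Staged, the _pending and _committed dictionaries, and the choice to keep committed values outside the instance __dict__ are all assumptions of this sketch and differ from the posted class.

```python
_DELETED = object()  # sentinel marking a pending deletion


class Staged(object):
    """Hypothetical variant of the posted class that also stages deletions."""

    def __init__(self):
        # Bypass our own __setattr__ so both stores always exist.
        object.__setattr__(self, "_committed", {})
        object.__setattr__(self, "_pending", {})

    def __setattr__(self, name, value):
        self._pending[name] = value

    def __delattr__(self, name):
        getattr(self, name)  # raises AttributeError if not currently visible
        self._pending[name] = _DELETED

    def __getattr__(self, name):
        # Reached for every staged or committed name, because neither kind
        # is stored directly in the instance __dict__.
        value = self._pending.get(name, self._committed.get(name, _DELETED))
        if value is _DELETED:
            raise AttributeError(name)
        return value

    def commit(self):
        for name, value in self._pending.items():
            if value is _DELETED:
                self._committed.pop(name, None)
            else:
                self._committed[name] = value
        self._pending.clear()

    def rollback(self):
        self._pending.clear()


s = Staged()
s.x = 1
s.commit()
del s.x
hidden_while_pending = not hasattr(s, "x")
s.rollback()
restored = s.x
del s.x
s.commit()
gone_after_commit = not hasattr(s, "x")
```

Keeping committed values in their own dictionary means every read goes through __getattr__, which is what lets a pending deletion hide a committed attribute.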

    def commitChanges( self ):
        """Commits the contents of the sandbox to the actual object. Clears
        the sandbox."""
        while len( self.__dSandBox ) > 0:
            key, value = self.__dSandBox.popitem()
            self.__dict__[ key ] = value

If you change the name of your class, or subclass it, your 'sandbox' will
break. Here, you use a double-underscore name and let Python mangle it,
but earlier you hard-coded the mangled name. Do one or the other, not both.
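
The rename case can be reproduced in a few lines (Python 3 syntax). The class below is a cut-down copy of the posted code under a hypothetical new name, Renamed, with the hard-coded key left as it was.

```python
class Renamed(object):
    """Cut-down copy of the posted class under a new, hypothetical name."""

    def __setattr__(self, name, value):
        # Hard-coded key, still spelled with the old class name.
        store = self.__dict__.setdefault("_Unrollable__dSandBox", {})
        store[name] = value

    def hasUncommittedChanges(self):
        # Compiled as self._Renamed__dSandBox, which was never stored.
        return len(self.__dSandBox) > 0


r = Renamed()
r.x = 1
try:
    r.hasUncommittedChanges()
    lookup_failed = False
except AttributeError:
    lookup_failed = True
```

The value is stored under the old hard-coded key while the method looks it up under the newly mangled one, so the lookup fails.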

    def unroll( self, bRecurseSandBox = True, bRecurseDict = False ):
        """Ditches all changes currently in the sandbox. Recurses all objects
        in the instance itself and in its sandbox and, if
        they're unrollable instances themselves, invokes the unroll method on
        them as well."""

But if you're ditching the objects in the 'sandbox', why would you need to
unroll them? Aren't they going to disappear?

        if bRecurseSandBox:
            while len( self.__dSandBox ) > 0:
                key, value = self.__dSandBox.popitem()

                if isinstance( value, Unrollable ):
                    value.unroll( bRecurseSandBox, bRecurseDict )
        else:
            self.__dSandBox.clear()

        if bRecurseDict:
            iterator = self.__dict__.itervalues()

            while True:
                try:
                    nextVal = iterator.next()
                except StopIteration:
                    break

                if isinstance( nextVal, Unrollable ):
                    nextVal.unroll( bRecurseSandBox, bRecurseDict )


Use this instead:


for nextVal in iterator:
    if isinstance( ... ):


    def hasUncommittedChanges( self ):
        """Returns true if there are uncommitted changes, false otherwise."""
        return len( self.__dSandBox ) > 0

Which will fail if the 'sandbox' doesn't exist. In other parts of the code
you test for its existence.
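
A minimal sketch of a check that tolerates a missing store (Python 3 syntax). The class name UnrollableSketch and the '_pending' key are placeholders chosen for this example, not names from the posted code.

```python
class UnrollableSketch(object):
    """Reduced sketch: only the parts needed to show a safe check."""

    def __setattr__(self, name, value):
        # Create the store on first use, then record the change in it.
        self.__dict__.setdefault("_pending", {})[name] = value

    def hasUncommittedChanges(self):
        # .get() tolerates a store that has not been created yet.
        return len(self.__dict__.get("_pending", {})) > 0


u = UnrollableSketch()
before = u.hasUncommittedChanges()
u.theValue = 3.147634
after = u.hasUncommittedChanges()
```

Creating the store once in __init__ (via object.__setattr__) would be another option and would remove the need for existence checks elsewhere.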


Another thing... I notice that you seem to be inconsistently using a form
of the Hungarian Notation (e.g. dSandbox, bRecurseDict, etc.). I suggest
you read this for a better understanding of when you should do so:

http://www.joelonsoftware.com/articles/Wrong.html


Hope this was helpful,
 
