Creating an object that can track when its attributes are modified


Ben Sizer

I am trying to make an object that can track when its attributes have been assigned new values, and which can rollback to previous values where necessary. I have the following code which I believe works, but would like to know if there are simpler ways to achieve this goal, or if there are any bugs I haven't seen yet.


class ChangeTrackingObject(object):
    def __init__(self):
        self.clean()

    def clean(self):
        """Mark all attributes as unmodified."""
        object.__setattr__(self, '_dirty_attributes', dict())

    def dirty_vals(self):
        """Returns all dirty values."""
        return dict( [ (k,v) for k,v in self.__dict__.iteritems() if k in self._dirty_attributes] )

    def get_changes_and_clean(self):
        """Helper that collects all the changes and returns them, cleaning the dirty flags at the same time."""
        changes = self.dirty_vals()
        self.clean()
        return changes

    def rollback(self):
        """Reset attributes to their previous values."""
        for k,v in self._dirty_attributes.iteritems():
            object.__setattr__(self, k, v)
        self.clean()

    def __setattr__(self, key, value):
        # If the first modification to this attribute, store the old value
        if key not in self._dirty_attributes:
            if key in self.__dict__:
                self._dirty_attributes[key] = object.__getattribute__(self, key)
            else:
                self._dirty_attributes[key] = None
        # Set the new value
        object.__setattr__(self, key, value)


I am aware that adding a new attribute and then calling rollback() leaves the new attribute in place with a None value - maybe I can use a special DeleteMe marker object in the _dirty_attributes dict along with a loop that calls delattr on any attribute that has that value after a rollback.
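
The marker idea described above might look roughly like the sketch below. It is written for Python 3 (`items()` rather than `iteritems()`), and `_MISSING` is a hypothetical name standing in for the DeleteMe marker. It is an illustration under those assumptions, not a tested replacement for the class above.

```python
_MISSING = object()  # hypothetical marker: "attribute did not exist before"


class ChangeTrackingObject(object):
    def __init__(self):
        self.clean()

    def clean(self):
        """Forget all recorded old values."""
        object.__setattr__(self, '_dirty_attributes', {})

    def rollback(self):
        """Restore old values; delete attributes that were newly added."""
        for key, old in self._dirty_attributes.items():
            if old is _MISSING:
                object.__delattr__(self, key)
            else:
                object.__setattr__(self, key, old)
        self.clean()

    def __setattr__(self, key, value):
        # On the first modification, remember the old value (or the marker).
        if key not in self._dirty_attributes:
            self._dirty_attributes[key] = self.__dict__.get(key, _MISSING)
        object.__setattr__(self, key, value)


obj = ChangeTrackingObject()
obj.x = 1
obj.clean()
obj.x = 2          # modifies an existing attribute
obj.y = 'new'      # adds a new attribute
obj.rollback()
print(obj.x, hasattr(obj, 'y'))
```

Using a private sentinel instead of None also lets an attribute whose previous value really was None be rolled back correctly.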

I also believe that this won't catch modification to existing attributes as opposed to assignments: e.g. if one of the attributes is a list and I append to it, this system won't notice. Is that something I can rectify easily?

Any other comments or suggestions?

Thanks,
 

Chris Angelico

I also believe that this won't catch modification to existing attributes as opposed to assignments: eg. if one of the attributes is a list and I append to it, this system won't notice. Is that something I can rectify easily?

The only way you could detect mutation of one of its attributes is
with that object's assistance. Effectively, you would need to have a
subclass of list/dict/tuple/whatever that can respond to the change.
Alternatively, you could retain a deep copy and do a comparison at
time of rollback; this, however, would have annoying consequences wrt
performance and other references and such.
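
As a rough illustration of the first option, here is a minimal sketch of a list subclass that reports mutations to a callback. `NotifyingList` and `on_change` are hypothetical names, only a few mutating methods are covered, and the code assumes Python 3.

```python
class NotifyingList(list):
    """A list that calls on_change() whenever it is mutated (partial sketch)."""

    def __init__(self, iterable=(), on_change=None):
        super().__init__(iterable)
        self._on_change = on_change

    def _notify(self):
        if self._on_change is not None:
            self._on_change()

    # Only some mutators are hooked here; insert, pop, remove, sort,
    # += and so on would need the same treatment.
    def append(self, item):
        super().append(item)
        self._notify()

    def extend(self, items):
        super().extend(items)
        self._notify()

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self._notify()

    def __delitem__(self, index):
        super().__delitem__(index)
        self._notify()


changes = []
items = NotifyingList([1, 2], on_change=lambda: changes.append('items'))
items.append(3)
items[0] = 10
print(items, changes)
```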

What's the goal of this class? Can you achieve the same thing by
using, perhaps, a before-and-after snapshot of a JSON-encoded form of
the object?
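
A before-and-after JSON comparison could be sketched as follows, assuming Python 3 and assuming every attribute value is JSON-serialisable. `json_snapshot`, `json_changes` and the `Player` class are hypothetical names used only for illustration.

```python
import json


def json_snapshot(obj):
    """Encode the instance attributes as a JSON string."""
    return json.dumps(vars(obj), sort_keys=True)


def json_changes(before, after):
    """Return attributes whose decoded value differs between two snapshots."""
    old, new = json.loads(before), json.loads(after)
    return {k: v for k, v in new.items() if k not in old or old[k] != v}


class Player(object):  # hypothetical example class
    def __init__(self):
        self.name = 'ben'
        self.inventory = ['sword']


p = Player()
before = json_snapshot(p)
p.inventory.append('shield')   # in-place mutation, no assignment
after = json_snapshot(p)
print(json_changes(before, after))
```

Because the snapshot is a string, in-place mutations such as the append are detected, at the cost of encoding the whole object twice.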

ChrisA
 

Lele Gaifax

Ben Sizer said:
I also believe that this won't catch modification to existing
attributes as opposed to assignments: eg. if one of the attributes is
a list and I append to it, this system won't notice. Is that something
I can rectify easily?

It's really up to how far you wanna go: a similar use case is
implemented by SQLAlchemy__, which "instruments" builtin collection
classes to achieve the goal. But I'd not call that "easy" though :)

ciao, lele.

__ http://www.sqlalchemy.org/trac/browser/lib/sqlalchemy/orm/collections.py
 

Ben Sizer

Effectively, you would need to have a
subclass of list/dict/tuple/whatever that can respond to the change.

This is certainly something I'd be interested in having, but I guess that would be fragile since the user would have the burden of having to remember to use those types.

What's the goal of this class? Can you achieve the same thing by
using, perhaps, a before-and-after snapshot of a JSON-encoded form of
the object?

I need to be able to perform complex operations on the object that may modify several properties, and then gather the properties at the end as an efficient way to see what has changed and to store those changes. Any comparison of before-and-after snapshots could work in theory, but in practice it could be expensive to produce the snapshots on larger objects and probably expensive to calculate the differences that way too. Performance is important so I would probably just go for an explicit function call to mark an attribute as having been modified rather than trying to do a diff like that. (It wouldn't work for rollbacks, but I can accept that.)
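
The explicit-call approach mentioned above might look something like this minimal sketch (Python 3). `DirtyFlagObject` and `mark_dirty` are hypothetical names, and as noted it records only which attributes changed, so it cannot roll anything back.

```python
class DirtyFlagObject(object):
    """Tracks changed attribute names only; old values are not kept."""

    def __init__(self):
        object.__setattr__(self, '_dirty', set())

    def __setattr__(self, key, value):
        self._dirty.add(key)              # assignments are flagged automatically
        object.__setattr__(self, key, value)

    def mark_dirty(self, key):
        """Explicitly flag an attribute that was mutated in place."""
        self._dirty.add(key)

    def get_changes_and_clean(self):
        changes = {k: getattr(self, k) for k in self._dirty}
        self._dirty.clear()
        return changes


obj = DirtyFlagObject()
obj.scores = []
obj.get_changes_and_clean()
obj.scores.append(42)        # not seen by __setattr__
obj.mark_dirty('scores')     # so the caller reports it explicitly
print(obj.get_changes_and_clean())
```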
 

Chris Angelico

This is certainly something I'd be interested in having, but I guess that would be fragile since the user would have the burden of having to remember to use those types.

Since you're already overriding setattr, you could simply force all
non-string sequences to your special subclass of list. That reduces
that burden, though it'd break if there are any other references to
the object.

I need to be able to perform complex operations on the object that may modify several properties, and then gather the properties at the end as an efficient way to see what has changed and to store those changes. Any comparison of before-and-after snapshots could work in theory, but in practice it could be expensive to produce the snapshots on larger objects and probably expensive to calculate the differences that way too. Performance is important so I would probably just go for an explicit function call to mark an attribute as having been modified rather than trying to do a diff like that. (It wouldn't work for rollbacks, but I can accept that.)

Hmm. Interesting. The perfect solution probably is too messy, yeah.
But if you have your subclassing done, you could possibly
snapshot-on-write, which would allow the rollback. Not sure if it'd
help though.
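
Combining the two suggestions (forcing lists into a subclass inside `__setattr__`, then snapshotting on the first write) could be sketched as below. This assumes Python 3; `SnapshotList` and `Tracked` are hypothetical names, only `append` is hooked, newly added attributes are not rolled back, and, as warned above, other references to the original list are not tracked.

```python
class SnapshotList(list):
    """List that saves a copy of itself with its owner before the first write."""

    def __init__(self, iterable, owner, name):
        super().__init__(iterable)
        self._owner, self._name = owner, name

    def append(self, item):  # other mutators would need the same hook
        old = self._owner._old
        if self._name not in old:
            old[self._name] = SnapshotList(self, self._owner, self._name)
        super().append(item)


class Tracked(object):
    def __init__(self):
        object.__setattr__(self, '_old', {})

    def __setattr__(self, key, value):
        if isinstance(value, list) and not isinstance(value, SnapshotList):
            # Force plain lists into the tracking subclass (this copies the list).
            value = SnapshotList(value, self, key)
        if key in self.__dict__ and key not in self._old:
            self._old[key] = self.__dict__[key]
        object.__setattr__(self, key, value)

    def clean(self):
        self._old.clear()

    def rollback(self):
        for key, old in self._old.items():
            object.__setattr__(self, key, old)
        self._old.clear()


t = Tracked()
t.items = [1, 2]
t.clean()
t.items.append(3)   # mutation is noticed because of the subclass
t.rollback()
print(t.items)
```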

ChrisA
 

88888 Dihedral

On Thursday, March 7, 2013 at 12:56:09 AM UTC+8, Ben Sizer wrote:
This is certainly something I'd be interested in having, but I guess that would be fragile since the user would have the burden of having to remember to use those types.

I need to be able to perform complex operations on the object that may modify several properties, and then gather the properties at the end as an efficient way to see what has changed and to store those changes. Any comparison of before-and-after snapshots could work in theory, but in practice it could be expensive to produce the snapshots on larger objects and probably expensive to calculate the differences that way too. Performance is important so I would probably just go for an explicit function call to mark an attribute as having been modified rather than trying to do a diff like that. (It wouldn't work for rollbacks, but I can accept that.)

Please attach a stack, implemented as a Python list, to every property
of the object that you want to track.
 

Steven D'Aprano

I need to be able to perform complex operations on the object that may
modify several properties, and then gather the properties at the end as
an efficient way to see what has changed and to store those changes. Any
comparison of before-and-after snapshots could work in theory, but in
practice it could be expensive to produce the snapshots on larger
objects and probably expensive to calculate the differences that way
too. Performance is important so I would probably just go for an
explicit function call to mark an attribute as having been modified
rather than trying to do a diff like that. (It wouldn't work for
rollbacks, but I can accept that.)


Premature optimization.

Unless you have been eating and breathing Python code for 15+ years, your
intuition of what is expensive and what isn't will probably be *way* off.
I've been using Python for ~15 years, and I wouldn't want to try to guess
what the most efficient way to do this will be.

Actually I lie. I would guess that the simple, most obvious way is
faster: don't worry about storing what changed, just store *everything*.
But I could be wrong.

Fortunately, Python development is rapid enough that you can afford to
develop this object the straightforward way, profile your application to
see where the bottlenecks are, and if it turns out that the simple
approach is too expensive, then try something more complicated.
 

Ben Sizer

Premature optimization.

Unless you have been eating and breathing Python code for 15+ years, your
intuition of what is expensive and what isn't will probably be *way* off.
I've been using Python for ~15 years, and I wouldn't want to try to guess
what the most efficient way to do this will be.

I admit, I've only been using Python for 10 years, but I've learned a lot about optimisation and what needs optimising from my time as a game developer. This code needs to be fairly high-performing due to the role it plays in my server and the frequency with which the behaviour gets called.

Actually I lie. I would guess that the simple, most obvious way is
faster: don't worry about storing what changed, just store *everything*.
But I could be wrong.

The use case I have is not one where that is suitable. It's not the snapshots that are important, but the changes between them.

Fortunately, Python development is rapid enough that you can afford to
develop this object the straightforward way, profile your application to
see where the bottlenecks are, and if it turns out that the simple
approach is too expensive, then try something more complicated.

I don't see a more straightforward solution to the problem I have than the one I have posted. I said that a system that took snapshots of the whole object and attempted to diff them would probably perform worse, but it would probably be more complex too, given the traversal and copying requirements.
 

Schneider

Hi,
maybe you could do this with a decorator on the setattr method. It should
look more or less like your implementation, but in my eyes it's cleaner
and can be reused.

Further, I would use a stack for each attribute, so that you can restore
all previous values.

bg,
Johannes
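
The decorator-plus-stack suggestion above could be sketched roughly like this, assuming Python 3. `with_history`, `Undoable`, `undo` and the `_history` attribute are hypothetical names chosen for the example.

```python
import functools


def with_history(setattr_method):
    """Decorator for __setattr__: push each replaced value onto a per-attribute stack."""
    @functools.wraps(setattr_method)
    def wrapper(self, key, value):
        if key in self.__dict__:
            # Write to __dict__ directly so the bookkeeping is not tracked itself.
            history = self.__dict__.setdefault('_history', {})
            history.setdefault(key, []).append(self.__dict__[key])
        setattr_method(self, key, value)
    return wrapper


class Undoable(object):
    @with_history
    def __setattr__(self, key, value):
        object.__setattr__(self, key, value)

    def undo(self, key):
        """Restore the previous value of one attribute."""
        previous = self.__dict__['_history'][key].pop()
        object.__setattr__(self, key, previous)


obj = Undoable()
obj.hp = 10
obj.hp = 7
obj.hp = 3
obj.undo('hp')
print(obj.hp)
obj.undo('hp')
print(obj.hp)
```

The stacks grow with every assignment, so a long-lived object would need some way to trim or clear them.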
 

Steven D'Aprano

The use case I have is not one where that is suitable. It's not the
snapshots that are important, but the changes between them.

I'm afraid that doesn't make much sense to me. You're performing
calculations, and stuffing them into instance attributes, but you don't
care about the result of the calculations, only how they differ from the
previous result?

I obviously don't understand the underlying problem you're trying to
solve.

I don't see a more straightforward solution to the problem I have than
the one I have posted. I said that a system that took snapshots of the
whole object and attempted to diff them would probably perform worse,
but it would probably be more complex too, given the traversal and
copying requirements.

Yes, and I said that your intuition of what will be fast and what will be
slow is not necessarily trustworthy. Without testing, neither of us knows
for sure.

Given the code you showed in the original post, I don't see that
traversal and copying requirements are terribly complicated. You don't do
deep-copies of attributes, so a shallow copy of the instance __dict__
ought to be enough. Assuming you have a well-defined "start processing"
moment, just grab a snapshot of the dict, which will be fast, then do
your calculations, then call get_changes:


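    def snapshot(self):
        # Shallow copy of the instance attributes at the "start processing" moment.
        self._snapshot = self.__dict__.copy()

    def get_changes(self):
        # The sentinel stands for attributes that were absent from the snapshot.
        sentinel = object()
        return dict( [ (k,v) for k,v in self.__dict__.iteritems()
                       if k != '_snapshot' and v != self._snapshot.get(k, sentinel) ] )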


This doesn't support *deleting* attributes, but neither does your
original version.

Obviously I don't know for sure which strategy is fastest, but since your
version already walks the entire __dict__, this shouldn't be much slower,
and has a good chance of being faster.

(Your version slows down *every* attribute assignment. My version does
not.)
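
A variant of the snapshot idea that also reports deleted attributes might look like the sketch below. It assumes Python 3, compares each current value against the snapshot, and uses the hypothetical class name `SnapshotObject`. Like the original, it takes only a shallow copy, so in-place mutation of a list attribute goes unnoticed.

```python
class SnapshotObject(object):
    def snapshot(self):
        """Shallow-copy the current attributes, excluding the snapshot itself."""
        state = {k: v for k, v in self.__dict__.items() if k != '_snapshot'}
        self._snapshot = state

    def get_changes(self):
        """Return (changed_or_added, deleted_names) relative to the snapshot."""
        sentinel = object()   # stands for "attribute was absent in the snapshot"
        current = {k: v for k, v in self.__dict__.items() if k != '_snapshot'}
        changed = {k: v for k, v in current.items()
                   if v != self._snapshot.get(k, sentinel)}
        deleted = set(self._snapshot) - set(current)
        return changed, deleted


obj = SnapshotObject()
obj.a, obj.b, obj.c = 1, 2, 3
obj.snapshot()
obj.a = 10      # changed
del obj.b       # deleted
obj.d = 4       # added
print(obj.get_changes())
```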


By the way, your original version describes the get_changes_and_clean()
method as cleaning the dirty *flags*. But the implementation doesn't
store flags. Misleading documentation is worse than no documentation.


But if you insist on the approach you've taken, you can simplify the
__setattr__ method:


    def __setattr__(self, key, value):
        # If the first modification to this attribute, store the old value
        dirty = self._dirty_attributes
        if key not in dirty:
            dirty[key] = getattr(self, key, None)
        # Set the new value
        object.__setattr__(self, key, value)


You might try this (slightly) obfuscated version, which *could* be faster
still, although I doubt it.


    def __setattr__(self, key, value):
        # If the first modification to this attribute, store the old value
        self._dirty_attributes.setdefault(key, getattr(self, key, None))
        # Set the new value
        object.__setattr__(self, key, value)


but if you really need to get every bit of performance, it's worth trying
them both and seeing which is faster.


(P.S. I trust you know to use timeit for timing small code snippets,
rather than rolling your own timing code?)
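
For instance, the two `__setattr__` variants could be compared with `timeit` along these lines. This is only a sketch: the class names `Base`, `IfVersion` and `SetdefaultVersion` are hypothetical, and the numbers will vary by machine and Python version, so no result is implied here.

```python
import timeit

setup = """
class Base(object):
    def __init__(self):
        object.__setattr__(self, '_dirty_attributes', {})

class IfVersion(Base):
    def __setattr__(self, key, value):
        dirty = self._dirty_attributes
        if key not in dirty:
            dirty[key] = getattr(self, key, None)
        object.__setattr__(self, key, value)

class SetdefaultVersion(Base):
    def __setattr__(self, key, value):
        self._dirty_attributes.setdefault(key, getattr(self, key, None))
        object.__setattr__(self, key, value)

a = IfVersion()
b = SetdefaultVersion()
"""

# repeat() returns one total per run; the minimum is the least noisy figure.
t_if = min(timeit.repeat("a.x = 1", setup=setup, number=100000, repeat=3))
t_setdefault = min(timeit.repeat("b.x = 1", setup=setup, number=100000, repeat=3))
print("if-test:    %.4f s" % t_if)
print("setdefault: %.4f s" % t_setdefault)
```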
 
