Help Create Good Data Model

M

mwt

Hi. I'm reworking a little app I wrote, in order to separate the data
from the UI. As a start, I wanted to create a iron-clad data recepticle
that will hold all the important values, and stand up to being queried
by various sources, perhaps concurrently. In all likelihood, the app
will never need anything that robust, but I want to learn to write it
anyway, as an exercise. So here is my code. It's really simple, and I'm
sure you can see my Java background. Are there any problems here?
Something I'm missing or screwing up? I welcome any and alll feedback,
especially if it includes the *why's.* Thanks!

#!/usr/bin/python
# author mwt
# Mar '06

import copy, threading

class FAHData(object):
"""The data model for the F@H monitor."""

def __init__(self):
self.data = {}#this dict will hold all data
self.mutex = threading.RLock()

def get_all_data(self):
"""Returns a COPY of entire data dict."""
#not sure deepcopy() is really necessary here
#but using it for now
#might cause some weird synchronization problems...
try:
self.mutex.acquire()
return copy.deepcopy(self.data)
finally:
self.mutex.release()

def get_data(self, key):
"""Returns a COPY of <key> data element."""
try:
self.mutex.acquire()
return copy.deepcopy(self.data[key])
finally:
self.mutex.release()

def set_value(self, key, value):
"""Sets value of <key> data element."""
try:
self.mutex.acquire()
self.data[key] = value
finally:
self.mutex.release()

def set_data(self, data):
"""Sets entire data dictionary."""
try:
self.mutex.acquire()
self.data = data
finally:
self.mutex.release()

def clear_data(self):
"""Clears entire data dictionary."""
try:
self.mutex.acquire()
self.data = {}
finally:
self.mutex.release()
 
S

Sybren Stuvel

mwt enlightened us with:
I'm reworking a little app I wrote, in order to separate the data
from the UI.

Good idea.
As a start, I wanted to create a iron-clad data recepticle that will
hold all the important values, and stand up to being queried by
various sources, perhaps concurrently.

Why do that yourself, if you can have SQLite databases? SQLite is even
capable of in-memory databases. No need to re-invent the wheel.

Sybren
 
F

fumanchu

There's nothing really *broken* jumping out at me. The last three
methods (set_value, set_data, and clear_data) probably don't need a
mutex, since they will each have their own frame, and the operations
are atomic. If that makes no sense, Google for "Python GIL" ;). If you
just returned a value from the dict instead of using copy, the same
might be said for the get methods--it depends on whether you're storing
mutable objects in self.data or not.

When you're done with the exercise and want to persist those values
somewhere, give Dejavu a try: http://projects.amor.org/dejavu/


Robert Brewer
System Architect
Amor Ministries
(e-mail address removed)
 
M

mwt

fumanchu: Interesting. I'm trying to understand atomicity. Also, since
I want this class to work using the Observer pattern, I've complicated
things, as shown below. I'll look into Dejavu for persistence (although
most of the basic values are persisted elsewhere, so this app will
mainly need only in-memory values and configuration stuff (for which I
think a ConfigParser will probably be enough).

Sybren: That's a cool choice, but I don't think it's right for what I'm
trying to do. Even more massive overkill than what I'm already doing.
Plus, I'm trying to write this thing so that I can hand it *anything*
(a value, a list, another dict, whole objects, etc.), which might be
tough with a DB.

Here's what I've got cooking at this point (adapted heavily from Bruce
Eckel, as well as incorporating fumanchu's corrections). If you get a
chance, please let me know what you think.


#!/usr/bin/python
# author mwt
# Mar '06
import copy, threading, observable

class FAHData(Observable):
"""The data model for the F@H monitor."""

def __init__(self):
Observable.__init__(self)
self.data = {}#this dict will hold all data
self.mutex = threading.RLock()

def get_all_data(self):
"""Returns a COPY of entire data dict."""
#not sure deepcopy() is really necessary here
#but using it for now
#might cause some weird synchronization problems...
try:
self.mutex.acquire()
return copy.deepcopy(self.data)
finally:
self.mutex.release()

def get_data(self, key):
"""Returns a COPY of <key> data element."""
try:
self.mutex.acquire()
return copy.deepcopy(self.data[key])
finally:
self.mutex.release()

#these three methods don't need a mutex because they are atomic (I
think):
#-------------------------------------------->
def set_value(self, key, value):
"""Sets value of <key> data element."""
self.data[key] = value
Observable.notifyObservers(self, arg = 'ELEMENT_CHANGED')

def set_data(self, data):
"""Sets entire data dictionary."""
self.data = data
Observable.notifyObservers(self, arg = 'DATA_CHANGED')

def clear_data(self):
"""Clears entire data dictionary."""
self.data = {}
Observable.notifyObservers(self, arg = 'DATA_CHANGED')
#<---------------------------------------------


#!/usr/bin/python
# author mwt
# Mar '06
import threading

class Observer(object):
def update(observable, arg):
"""OVERRIDE ME"""
pass


class Observable(object):
def __init__(self):
self.obs = []
self.mutex = threading.RLock()

def addObserver(self, observer):
self.mutex.aquire()
try:
if observer not in self.obs:
self.obs.append(observer)
finally:
self.mutex.release()

def notifyObservers(self, arg = None):
self.mutex.aquire()
try:
localArray = self.obs[:]
finally:
self.mutex.release()
for observer in localArray:
observer.update(self, arg)

#these methods don't need a mutex because they are atomic (I
think):
#-------------------------------------------->

def deleteObserver(self, observer):
self.obs.remove(observer)

def deleteObservers(self):
self.obs = []

def countObservers(self):
return len(self.obs)

#<---------------------------------------------

mwt
 
M

mwt

Well, thank the gods for unit testing. Here's the fah_data module with
fewer errors:

import copy, threading, observable

class FAHData(observable.Observable):
"""The data model for the F@H monitor."""

def __init__(self):
observable.Observable.__init__(self)
self.data = {}#this dict will hold all data
self.mutex = threading.RLock()

def get_all_data(self):
"""Returns a COPY of entire data dict."""
#not sure deepcopy() is really necessary here
#but using it for now
#might cause some weird synchronization problems...
try:
self.mutex.acquire()
return copy.deepcopy(self.data)
finally:
self.mutex.release()

def get_data(self, key):
"""Returns a COPY of <key> data element."""
try:
self.mutex.acquire()
return copy.deepcopy(self.data[key])
finally:
self.mutex.release()

#these three methods don't need a mutex because they are atomic (I
think):
#-------------------------------------------->
def set_value(self, key, value):
"""Sets value of <key> data element."""
self.data[key] = value
observable.Observable.notifyObservers(self, arg =
'ELEMENT_CHANGED')

def set_data(self, data):
"""Sets entire data dictionary."""
self.data = data
observable.Observable.notifyObservers(self, arg =
'DATA_CHANGED')

def clear_data(self):
"""Clears entire data dictionary."""
self.data = {}
observable.Observable.notifyObservers(self, arg =
'DATA_CHANGED')
#<---------------------------------------------
 
C

Carl Banks

mwt said:
def get_data(self, key):
"""Returns a COPY of <key> data element."""
try:
self.mutex.acquire()
return copy.deepcopy(self.data[key])
finally:
self.mutex.release()


self.mutex.acquire() should be outside the try block, like this:

self.mutex.acquire()
try:
return copy.deepcopy(self.data[key])
finally:
self.mutex.release()

The reason is: what if the call to self.mutex.acquire fails (raises an
exception)?

Suppose that another thread (#1) has the mutex, and this thread (#2) is
waiting on it, when an exception is raised in it (say a timeout,
keyboard interrupt, or runtime error). What will happen? Because the
acquire call is inside the try block, the finally block gets run, and
thread #2 releases the mutex being held by #1. But wait! Now let's
say there's a third thread (#3) that's also waiting on this mutex.
When #2 releases the mutex, #3 runs. But #1 is not done... #3 corrupts
the data. You've just destroyed the secret plans. You're fired.

The same logic applies to open/close, allocate/deallocate, set/unset,
or any other resource acquisistion. Acquire the resource before the
try block, release it in the finally block.

(Note: it may be the case that 1. the acquisition function cannot raise
an exception for some reason, or 2. attempting to release a unacquired
resource is harmless, in which case it won't exactly hurt to put the
acquisition inside the try block. It might be true of mutexes for all
I know. But even in those rare cases, I recommend always sticking to
the idiom for consistency.)

Carl Banks
 
F

fumanchu

I didn't say that right. As long as you are using deepcopy (or any
operation which might iterate over the keys or values in self.data),
your setter methods need that mutex, *and* it should probably be a
threading.Lock, not an RLock, just in case that iteration ends up
mutating the dict somehow. You can "get away with" no mutexes only if
*all* operations are atomic, meaning you'd have to at the least iterate
over *copies* of the dict's keys. Otherwise, you need the mutex in your
setters as well as your getters.

That turns into a performance problem quickly, which is why there's so
much literature out there on alternatives (mostly written by those
designing databases). Once you hit the performance issues, you can try
multiple dicts to simulate page locking, or cooperating locks to
implement one-writer/multiple-readers, or other techniques.

Dejavu uses a separate sandbox for each thread and lets the back end
(usually a well-written database) handle the concurrency issues.
There's a CachingProxy backend in dejavu/storage/__init__.py which has
full locks as your app does.

Of course, even atomic operations don't guarantee that overlapping
threads do what you expect. If one thread sets self.data['a'] = 1 and
another sets self.data = {}, the order of their operation may or may
not be important to you.


Robert Brewer
System Architect
Amor Ministries
(e-mail address removed)
 
M

mwt

I get what you're saying fumanchu (or should I say Robert?).
I've been working and reworking this code. It's in a lot better shape
now (although I hestitate to keep flooding the conversation with each
iteration of the file). At the level this app should be operating, I
doubt I'll hit performance issues, and it's good to learn the basics
first. However, I doubt this would scale very well, so the next step
will be to educate myself aobut the performance-enhancing alternatives
you're talking about.

One thing I'm still not sure about -- and I suspect that there is no
right answer -- is the fact that although I am writing the code in
Python, the idiom is purely Java. Having my data bucket in the form of,
essentially, a bean with setters and getters, and each method
surrounded by (the Python version of) a "synchronized" piece, and so on
all comes from my Java background. It's ending up working well as code
(I'm a lot further along today), and it's accomplishing the decoupling
of front and back end I was looking for, so that's excellent. However I
do have the vague feeling that I am doing the equivalent of, say,
writing Greek hexameters in English (i.e. it works but it is
stylistically clunky).

Anyway, thanks for your insight. I will probably be posting more of the
code later, if you are interested in checking it out. The app is a
Folding@Home client monitor (for Gnome) -- one of those applications,
like a web spider, that lots of people want to create, even though
there are already a zillion perfectly working versions out there. It's
just about the right level of complexity for me now.

mwt (Michael Taft)
 
C

Carl Banks

mwt said:
One thing I'm still not sure about -- and I suspect that there is no
right answer -- is the fact that although I am writing the code in
Python, the idiom is purely Java. Having my data bucket in the form of,
essentially, a bean with setters and getters, and each method
surrounded by (the Python version of) a "synchronized" piece, and so on
all comes from my Java background. It's ending up working well as code
(I'm a lot further along today), and it's accomplishing the decoupling
of front and back end I was looking for, so that's excellent. However I
do have the vague feeling that I am doing the equivalent of, say,
writing Greek hexameters in English (i.e. it works but it is
stylistically clunky).

However, have a look at the Queue module. It's arguably more Pythonic,
in the sense that it's in the standard libarary. But interally it's
fairly similar to how you're doing it. (It has a few extra locks to
handle empty and full conditions, IIRC.)

Carl Banks
 
M

mwt

The Queue won't work for this app, because it removes the data from the
Queue when you query (the way I understand it).
 
F

fumanchu

If you used a Queue, it wouldn't be the container itself; rather, it
would be a gatekeeper between the container and consumer code. A
minimal example of user-side code would be:

class Request:
def __init__(self, op, data):
self.op = op
self.data = data
self.reply = None
req = Request('get', key)
data_q.put(req, block=True)
while req.reply is None:
time.sleep(0.1)
do_something_with(req.reply)

The container-side code would be:

while True:
request = data_q.get(block=True)
request.reply = handle(request)

That can be improved with queue timeouts on both sides, but it shows
the basic idea.


Robert Brewer
System Architect
Amor Ministries
(e-mail address removed)
 
M

mwt

fumanchu said:
If you used a Queue, it wouldn't be the container itself; rather, it
would be a gatekeeper between the container and consumer code. A
minimal example of user-side code would be:

class Request:
def __init__(self, op, data):
self.op = op
self.data = data
self.reply = None
req = Request('get', key)
data_q.put(req, block=True)
while req.reply is None:
time.sleep(0.1)
do_something_with(req.reply)

The container-side code would be:

while True:
request = data_q.get(block=True)
request.reply = handle(request)

That can be improved with queue timeouts on both sides, but it shows
the basic idea.


Robert Brewer
System Architect
Amor Ministries
(e-mail address removed)

I get it. That's cool.
Here's the latest incarnation of this code. I haven't implemented the
lock you suggested yet, although this version seems to be working fine.


#!/usr/bin/python
# author mwt
# Mar '06
import copy, threading, observable

class FAHData(observable.Observable):
"""The data model for the F@H monitor."""

def __init__(self):
observable.Observable.__init__(self)
self.data = {}#this dict will hold all data
self.mutex = threading.RLock()

def get_all_data(self):
"""Returns a COPY of entire data dict."""
#not sure deepcopy() is really necessary here
#but using it for now
#might cause some weird synchronization problems...
self.mutex.acquire()
try:
return copy.deepcopy(self.data)
finally:
self.mutex.release()

def get_data(self, key):
"""Returns a COPY of <key> data element."""
self.mutex.acquire()
try:
return copy.deepcopy(self.data[key])
finally:
self.mutex.release()

def set_value(self, key, value):
"""Sets value of <key> data element."""
self.mutex.acquire()
try:
self.data[key] = value
observable.Observable.notifyObservers(self, event =
'VALUE_CHANGED', key = key)
finally:
self.mutex.release()

def set_values(self, values):
"""Sets a list of values"""
self.mutex.acquire()
try:
for value in values:
self.data[value] = values[value]
#print 'values input are: %s' %self.data
observable.Observable.notifyObservers(self, event =
'VALUES_CHANGED', keys = values.keys())
finally:
self.mutex.release()

def set_data(self, data):
"""Sets entire data dictionary."""
self.mutex.acquire()
try:
self.data = data
observable.Observable.notifyObservers(self, event =
'DATA_SET')
finally:
self.mutex.release()

def clear_data(self):
"""Clears entire data dictionary."""
self.mutex.acquire()
try:
self.data = {}
observable.Observable.notifyObservers(self, event =
'DATA_CLEARED')
finally:
self.mutex.release()
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,796
Messages
2,569,645
Members
45,368
Latest member
EwanMacvit

Latest Threads

Top