Minimalistic Software Transactional Memory

M

Michael Sparks

Hi,


I'm interested in writing a simple, minimalistic, non persistent (at this
stage) software transactional memory (STM) module. The idea being it should
be possible to write such a beast in a way that can be made threadsafe fair
easily.

For those who don't know, STM is a really fancy way of saying variables
with version control (as far as I can tell :) designed to enable threadsafe
shared data.

I'm starting with the caveat here that the following code is almost
certainly not threadsafe (not put any real thought into that as yet),
and I'm interested in any feedback on the following:

* Does the API look simple enough?
* Are there any glaring mistakes in the code ? (It's always harder to see
your own bugs)
* What key areas appear least threadsafe, and any general suggestions
around that.

If I get no feedback I hope this is of interest. Since these things get
archived, if you're reading this a month, 6 months, a year or more from
now, I'll still be interested in feedback...

OK, API.

First of all we need to initialise the store:

S = Store()

We then want to get a value from the store such that we can use the value,
and do stuff with it:

greeting = S.using("hello")

Access the value:

print repr(greeting.value)

Update the value:

greeting.set("Hello World")

Commit the value back to the store:

greeting.commit()

If you have concurrent updates of the same value, the following exception
gets thrown:
ConcurrentUpdate

cf: Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 12, in commit
File "<stdin>", line 11, in set
__main__.ConcurrentUpdate

That's pretty much the simplest API I can come up with. (I've tried a few
others but didn't really like them)

The way this works is we have a Store that manages Values. (perhaps a better
name may be variables to avoid clashing with pythons parlance of labels and
values?)

Anyhow, you can ask the store for Value, which it will give you. The Value
knows what it's called and where it's from, so when you tell it to commit,
it can try to do so. The little detail here is you get a /copy/ of the
Value not the stored Value. (This is to avoid accidental concurrent update
of the same actual object)

As a result, Store looks like this:

class Store(object):
def __init__(self):
self.store = {}

def get(self, key):
return self.store[key].clone()

def set(self, key, value):
if not (self.store[key].version > value.version):
self.store[key] = Value(value.version+1, value.value, self, key)
value.version= value.version+1
else:
raise ConcurrentUpdate

def using(self, key):
try:
return self.get(key)
except KeyError:
self.store[key] = Value(0, None,self,key)
return self.get(key)

def dump(self):
for k in self.store:
print k, ":", self.store[k]

You'll note that the set method is the greatest candidate for any possible
race hazard here - though I can see a possible boundary issue in "using".
(I think :)

Otherwise I think the above code is relatively straightforward. You'll note
that this API allows this:

greeting.set("Hello")
greeting.commit()
greeting.set("Hello World")
greeting.commit()
greeting.set("Hello World. Game")
greeting.commit()
greeting.set("Hello World. Game Over")
greeting.commit()

The other class is value that looks like this:

class Value(object):
def __init__(self, version, value,store,key):
self.version = version
self.value = value
self.store = store
self.key = key

def __repr__(self):
return "Value"+repr((self.version,self.value,self.store,self.key))

def set(self, value):
self.value = value

def commit(self):
self.store.set(self.key, self)

def clone(self):
return Value(self.version, self.value,self.store,self.key)

To me this looks like a pretty complete minimalistic thing, which does seem
to work OK, but I'm interested in the three points I mention above - if
anyone is willing to comment - specifcally:

* Does the API look simple enough?
* Are there any glaring mistakes in the code ? (It's always harder to see
your own bugs)
* What key areas appear least threadsafe, and any general suggestions
around that.

Full code below.

Many thanks for any comments in advance,


Michael
--
Michael Sparks, Kamaelia Project Lead
http://kamaelia.sourceforge.net/Developers/
http://yeoldeclue.com/blog

#!/usr/bin/python

class ConcurrentUpdate(Exception): pass

class Value(object):
def __init__(self, version, value,store,key):
self.version = version
self.value = value
self.store = store
self.key = key

def __repr__(self):
return "Value"+repr((self.version,self.value))

def set(self, value):
self.value = value

def commit(self):
self.store.set(self.key, self)

def clone(self):
return Value(self.version, self.value,self.store,self.key)

class Store(object):
def __init__(self):
self.store = {}

def get(self, key):
return self.store[key].clone()

def set(self, key, value):
if not (self.store[key].version > value.version):
self.store[key] = Value(value.version+1, value.value, self, key)
value.version= value.version+1
else:
raise ConcurrentUpdate

def using(self, key):
try:
return self.get(key)
except KeyError:
self.store[key] = Value(0, None,self,key)
return self.get(key)

def dump(self):
for k in self.store:
print k, ":", self.store[k]

S = Store()
greeting = S.using("hello")
print repr(greeting.value)
greeting.set("Hello World")
greeting.commit()
# ------------------------------------------------------
print greeting
S.dump()
# ------------------------------------------------------
par = S.using("hello")
par.set("Woo")
par.commit()
# ------------------------------------------------------
print greeting
S.dump()
# ------------------------------------------------------
greeting.set("Woo")
greeting.commit()

print repr(greeting), repr(greeting.value)

S.dump()
 
F

Fuzzyman

Hi,

I'm interested in writing a simple, minimalistic, non persistent (at this
stage) software transactional memory (STM) module. The idea being it should
be possible to write such a beast in a way that can be made threadsafe fair
easily.

For those who don't know, STM is a really fancy way of saying variables
with version control (as far as I can tell :) designed to enable threadsafe
shared data.

I'm starting with the caveat here that the following code is almost
certainly not threadsafe (not put any real thought into that as yet),
and I'm interested in any feedback on the following:

* Does the API look simple enough?
* Are there any glaring mistakes in the code ? (It's always harder to see
your own bugs)
* What key areas appear least threadsafe, and any general suggestions
around that.

If I get no feedback I hope this is of interest. Since these things get
archived, if you're reading this a month, 6 months, a year or more from
now, I'll still be interested in feedback...

OK, API.

First of all we need to initialise the store:

S = Store()

We then want to get a value from the store such that we can use the value,
and do stuff with it:

greeting = S.using("hello")

Access the value:

print repr(greeting.value)

Update the value:

greeting.set("Hello World")

Commit the value back to the store:

greeting.commit()

If you have concurrent updates of the same value, the following exception
gets thrown:
ConcurrentUpdate

cf:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 12, in commit
File "<stdin>", line 11, in set
__main__.ConcurrentUpdate

That's pretty much the simplest API I can come up with. (I've tried a few
others but didn't really like them)

The way this works is we have a Store that manages Values. (perhaps a better
name may be variables to avoid clashing with pythons parlance of labels and
values?)

Anyhow, you can ask the store for Value, which it will give you. The Value
knows what it's called and where it's from, so when you tell it to commit,
it can try to do so. The little detail here is you get a /copy/ of the
Value not the stored Value. (This is to avoid accidental concurrent update
of the same actual object)

Wouldn't it be more useful having a store (possibly with a mapping
API?) that stored several values. Then a single commit can be done at
the end of the transaction.

If the transaction fails, then the whole process can be repeated with
none of the changes having been committed.

The tricky part is that other threads asking for the value should get
the *old* value until the commit has been done - so you need to know
which thread 'value' is being asked for from.

Michael
 
M

Michael Sparks

Fuzzyman said:
Wouldn't it be more useful having a store (possibly with a mapping
API?) that stored several values. Then a single commit can be done at
the end of the transaction.

If the transaction fails, then the whole process can be repeated with
none of the changes having been committed.

Quite probably yes :)

However initially I thought I'd start off with trying to get single values
right :) (A single value can be a dict of course, but your point holds for
arbitrary unrelated values)

I suspect the way to deal with that might be to do something like:

S = Store()
D = S.using("account_one", "account_two", "myaccount")
D["myaccount"].set(D["account_one"].value+D["account_two"].value)
D["account_one"].set(0)
D["account_two"].set(0)
D.commit()

Which seems pretty nice/simple. It'd be useful for the commit failure to
indicate which value was broken by a concurrent update, but not necessary.

Context:

At present Kamaelia has something which tracks keys and values - primarily
services ("where's the person who manages the pygame display?"). At present
those are only really accessible by generator based components which means
that key/value context is actually accessed in a single threaded way making
services safe for non-threaded components.

However, making accessing that store safe from threads requires changing to
something more advanced, either locking, or using STM. STM seems more in
keeping with Kamaelia being generally lock-free.

The way this store is currently used is exclusively in a single key/value
lookup, which is why I was thinking of that first - though I can see to
be more generally useful allowing multi-value updates would be useful :)
The tricky part is that other threads asking for the value should get
the old value until the commit has been done - so you need to know
which thread 'value' is being asked for from.

I think this approach deals with this naturally. cf:
def get(self, key): # in Store
return self.store[key].clone()
....
def clone(self): # in Value
return Value(self.version, self.value,self.store,self.key)
....
def set(self, key, value):
if not (self.store[key].version > value.version):
self.store[key] = Value(value.version+1, value.value, self, key)
value.version= value.version+1
else:
raise ConcurrentUpdate

This means that the value in 2 threads for the same checked out value names
can be different, but they get warning of this when they try to commit.

eg suppose the following interleaving of commands:

S = Store()
D = S.using("account_one", "account_two", "myaccount")
D["myaccount"].set(D["account_one"].value+D["account_two"].value)
D["account_one"].set(0)
E = S.using("account_one", "myaccount")
E["myaccount"].set(E["myaccount"].value-100)
E["account_one"].set(100)
E.commit()
D["account_two"].set(0)
D.commit()

I did think that this would work with the current code but changing the
value to something more complex - like this:

S = Store()
D = S.using("accounts")
D.set({"account_one":50, "account_two":100, "myaccount":0})
D.commit()
print "here"
S.dump()
X = D.value
X["myaccount"] = X["account_one"] + X["account_two"]
X["account_one"] = 0
X["account_two"] = 0
D.set(X)
D.commit()
S.dump()
print "Committed", D.value["myaccount"]

But in practice, because I'm not doing a _copy_ here of self.value:
def clone(self): # in Value
return Value(self.version, self.value,self.store,self.key)

I end up with accidental concurrent access, which is easy to fix, for
moderately complex datatypes:
import copy
.....
def clone(self): # in Value
return Value(self.version, copy.deepcopy(self.value),
self.store, self.key)

As well as here:
def set(self, key, value):
if not (self.store[key].version > value.version):
self.store[key] = Value(value.version+1,
copy.deepcopy(value.value), self, key)
value.version= value.version+1
else:
raise ConcurrentUpdate

If I do that, then the following code fails as would be expected:
S = Store()
D = S.using("accounts")
D.set({"account_one":50, "account_two":100, "myaccount":0})
D.commit() # First
S.dump()
X = D.value
X["myaccount"] = X["account_one"] + X["account_two"]
X["account_one"] = 0

E = S.using("accounts")
Y = E.value
Y["myaccount"] = Y["myaccount"]-100
Y["account_one"]= 100
E.set(Y)
E.commit() # Second
S.dump()

X["account_two"] = 0
D.set(X)
D.commit() # Third
S.dump()
print "Committed", D.value["myaccount"]

Fails as follows:

accounts : Value(1, {'account_two': 100, 'myaccount': 0, 'account_one': 50})
accounts : Value(2, {'account_two': 100, 'myaccount': -100, 'account_one':
100})
Traceback (most recent call last):
File "./NewSTM.py", line 70, in <module>
D.commit() # Third
File "./NewSTM.py", line 20, in commit
self.store.set(self.key, self)
File "./NewSTM.py", line 37, in set
raise ConcurrentUpdate
__main__.ConcurrentUpdate

(which is of course the error wanted - since we want to be able to detect
failure)

It's probably useful to know for the more general approach you suggest.

Thanks!


Michael.
 
D

Duncan Booth

Michael Sparks said:
I'm interested in writing a simple, minimalistic, non persistent (at
this stage) software transactional memory (STM) module. The idea being
it should be possible to write such a beast in a way that can be made
threadsafe fair easily.

For those who don't know, STM is a really fancy way of saying
variables with version control (as far as I can tell :) designed to
enable threadsafe shared data.

I'm starting with the caveat here that the following code is almost
certainly not threadsafe (not put any real thought into that as yet),
and I'm interested in any feedback on the following:

* Does the API look simple enough?
* Are there any glaring mistakes in the code ? (It's always harder
to see
your own bugs)
* What key areas appear least threadsafe, and any general
suggestions
around that.

If I get no feedback I hope this is of interest. Since these things
get archived, if you're reading this a month, 6 months, a year or more
from now, I'll still be interested in feedback...

Unless you really are desperate to reinvent the wheel, have you looked at
ZODB? https://launchpad.net/zodb

ZODB gives you the transactional model you want. It also gives you
persistence, but if you don't want that you can simply connect to a non-
persistent store.
 
F

Fuzzyman

STM seems more in
keeping with Kamaelia being generally lock-free.

STM isn't lock free - it just abstracts the locks away from the
'user'. You still need to lock around committing the transaction.
Other threads accessing the data during the transaction should see the
old values until the commit is successful.

Michael
http://www.manning.com/foord
 
M

Michael Sparks

Duncan said:
Unless you really are desperate to reinvent the wheel, have you looked at
ZODB? https://launchpad.net/zodb

ZODB gives you the transactional model you want. It also gives you
persistence, but if you don't want that you can simply connect to a non-
persistent store.

That seems somewhat overkill for my needs. ZODB's distribution is 3.3MB in
size, whereas the system I want a minimalistic, non-persistant[1]
transactional memory for is 345K in size. (The Axon part of kamaelia)
I'll take a look though.

[1] I said "at this stage" because its something I *may* want in future
for some possible usecases, but generally I'm not really very interested
in persistence. (At least not where I'm putting this :)

Context: I want thing that use the following class threadsafe, and can think
of a few ways of doing this, and STM seems one of the more "nicer" ways of
doing so. (it's clients are currently just generators which makes it safe at
the moment)

https://kamaelia.svn.sourceforge.ne...tch/Axon/Axon/CoordinatingAssistantTracker.py

It's used to provide an ENVironment like facility (similar to os.environ)
for generator based Kamaelia components.

Thanks for the feedback :)


Michael.
 
M

Michael Sparks

Fuzzyman said:
STM isn't lock free - it just abstracts the locks away from the
'user'. You still need to lock around committing the transaction.

I perhaps phrased what I meant too tersely.

Kamaelia isn't lock free either. It's generally lock free. User code
however, doesn't ever see locks... (Though threaded components do
communicate over thread safe Queues which contain locks internally.
Generator components don't use locking)

I suppose the reason why I like STM (conceptually) is because it appears to
follow the same sort of idea - the user of the code doesn't have to worry
about locks - the just have to worry about whether they're using the system
correctly or not. :)

Based on this, I think I'm right in thinking then that given the code I
posted, that at minimum I need locking here, since it is the critical
section.

def set(self, key, value):
# Attempt to acquire lock
if not (self.store[key].version > value.version):
# Also assuming we have lock
self.store[key] = Value(value.version+1,
copy.deepcopy(value.value), self, key)
value.version= value.version+1
# Release lock
else:
# Also do this if we can't get access to the lock
raise ConcurrentUpdate

I know I then have a choice of locking the value (ie a lock specific to
self.store[key]) or the self.store as a whole. Depends on how antisocial
you want the locking to be I suppose :)
Other threads accessing the data during the transaction should see the
old values until the commit is successful.

The implementation I'm suggesting means that other threads would see the
old values until the try to commit their changes. (I'm after something
simple since I'm expecting to deal with simple cases)

Many thanks :)


Michael.
 
J

John J. Lee

Michael Sparks said:
Duncan said:
Michael Sparks said:
I'm interested in writing a simple, minimalistic, non persistent (at
this stage) software transactional memory (STM) module. The idea being
it should be possible to write such a beast in a way that can be made
threadsafe fair easily.
[...]
Unless you really are desperate to reinvent the wheel, have you looked at
ZODB? https://launchpad.net/zodb
[...]
That seems somewhat overkill for my needs. ZODB's distribution is 3.3MB in
size, whereas the system I want a minimalistic, non-persistant[1]
transactional memory for is 345K in size. (The Axon part of kamaelia)
I'll take a look though.
[...]

Durus might be worth a look too (though I doubt it's suitable for your
situation):

http://www.mems-exchange.org/software/durus/


The link to their paper about it seems to be broken, but I think it
was based somewhat on ZODB, but is simpler (67k tarball :).


John
 
M

Michael Sparks

John said:
Durus might be worth a look too (though I doubt it's suitable for your
situation):

http://www.mems-exchange.org/software/durus/

The link to their paper about it seems to be broken, but I think it
was based somewhat on ZODB, but is simpler (67k tarball :).

Much appreciated. I've downloaded this and it looks more suitable,
however it still looks like overkill... After all, I'm just after a
simple system for concurrent update of in-memory shared values by
multiple threads - how hard can that be ?[1] ;-) :)

[1] Yes, yes, for those who don't know me, I know, I know... :)

Thanks to Fuzzyman's comments and a code review on IRC I think I've got what
I think is a minimally sufficient system - though I'll extend to include
Fuzzyman's suggestion regarding concurrent update of multiple independent
values :)

Code is here for those curious/wanting something similar/similarly
lightweight:

https://kamaelia.svn.sourceforge.net/svnroot/kamaelia/trunk/Sketches/MPS/Experiments/NewSTM.py

I'll package it up independently of the rest of Kamaelia as well probably
once I've made the addition noted above :)

Regards,


Michael
 
P

Paul Rubin

Fuzzyman said:
STM isn't lock free - it just abstracts the locks away from the
'user'. You still need to lock around committing the transaction.

The idea is that readers don't need locks. They just look at the
version number before they start reading and after they finish. If
the number didn't change, the read was successful. If it changed,
they have to retry.

On multiprocessor systems this relies on an instruction like CMPXCHG
for writers to increment the version number. Otherwise it requires a
lock separate from the version number, and multiple operations on the
lock.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,904
Latest member
HealthyVisionsCBDPrice

Latest Threads

Top