Minimalistic Software Transactional Memory

Discussion in 'Python' started by Michael Sparks, Dec 8, 2007.

  1. Hi,


    I'm interested in writing a simple, minimalistic, non persistent (at this
    stage) software transactional memory (STM) module. The idea being it should
    be possible to write such a beast in a way that can be made threadsafe fair
    easily.

    For those who don't know, STM is a really fancy way of saying variables
    with version control (as far as I can tell :) designed to enable threadsafe
    shared data.

    I'm starting with the caveat here that the following code is almost
    certainly not threadsafe (not put any real thought into that as yet),
    and I'm interested in any feedback on the following:

    * Does the API look simple enough?
    * Are there any glaring mistakes in the code ? (It's always harder to see
    your own bugs)
    * What key areas appear least threadsafe, and any general suggestions
    around that.

    If I get no feedback I hope this is of interest. Since these things get
    archived, if you're reading this a month, 6 months, a year or more from
    now, I'll still be interested in feedback...

    OK, API.

    First of all we need to initialise the store:

    S = Store()

    We then want to get a value from the store such that we can use the value,
    and do stuff with it:

    greeting = S.using("hello")

    Access the value:

    print repr(greeting.value)

    Update the value:

    greeting.set("Hello World")

    Commit the value back to the store:

    greeting.commit()

    If you have concurrent updates of the same value, the following exception
    gets thrown:
    ConcurrentUpdate

    cf:
    >>> S = Store()
    >>> greeting = S.using("hello")
    >>> par = S.using("hello")
    >>> greeting.set("Hello World")
    >>> par.set("Woo")
    >>> greeting.commit()
    >>> par.commit()

    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 12, in commit
    File "<stdin>", line 11, in set
    __main__.ConcurrentUpdate

    That's pretty much the simplest API I can come up with. (I've tried a few
    others but didn't really like them)

    The way this works is we have a Store that manages Values. (perhaps a better
    name may be variables to avoid clashing with pythons parlance of labels and
    values?)

    Anyhow, you can ask the store for Value, which it will give you. The Value
    knows what it's called and where it's from, so when you tell it to commit,
    it can try to do so. The little detail here is you get a /copy/ of the
    Value not the stored Value. (This is to avoid accidental concurrent update
    of the same actual object)

    As a result, Store looks like this:

    class Store(object):
    def __init__(self):
    self.store = {}

    def get(self, key):
    return self.store[key].clone()

    def set(self, key, value):
    if not (self.store[key].version > value.version):
    self.store[key] = Value(value.version+1, value.value, self, key)
    value.version= value.version+1
    else:
    raise ConcurrentUpdate

    def using(self, key):
    try:
    return self.get(key)
    except KeyError:
    self.store[key] = Value(0, None,self,key)
    return self.get(key)

    def dump(self):
    for k in self.store:
    print k, ":", self.store[k]

    You'll note that the set method is the greatest candidate for any possible
    race hazard here - though I can see a possible boundary issue in "using".
    (I think :)

    Otherwise I think the above code is relatively straightforward. You'll note
    that this API allows this:

    greeting.set("Hello")
    greeting.commit()
    greeting.set("Hello World")
    greeting.commit()
    greeting.set("Hello World. Game")
    greeting.commit()
    greeting.set("Hello World. Game Over")
    greeting.commit()

    The other class is value that looks like this:

    class Value(object):
    def __init__(self, version, value,store,key):
    self.version = version
    self.value = value
    self.store = store
    self.key = key

    def __repr__(self):
    return "Value"+repr((self.version,self.value,self.store,self.key))

    def set(self, value):
    self.value = value

    def commit(self):
    self.store.set(self.key, self)

    def clone(self):
    return Value(self.version, self.value,self.store,self.key)

    To me this looks like a pretty complete minimalistic thing, which does seem
    to work OK, but I'm interested in the three points I mention above - if
    anyone is willing to comment - specifcally:

    * Does the API look simple enough?
    * Are there any glaring mistakes in the code ? (It's always harder to see
    your own bugs)
    * What key areas appear least threadsafe, and any general suggestions
    around that.

    Full code below.

    Many thanks for any comments in advance,


    Michael
    --
    Michael Sparks, Kamaelia Project Lead
    http://kamaelia.sourceforge.net/Developers/
    http://yeoldeclue.com/blog

    #!/usr/bin/python

    class ConcurrentUpdate(Exception): pass

    class Value(object):
    def __init__(self, version, value,store,key):
    self.version = version
    self.value = value
    self.store = store
    self.key = key

    def __repr__(self):
    return "Value"+repr((self.version,self.value))

    def set(self, value):
    self.value = value

    def commit(self):
    self.store.set(self.key, self)

    def clone(self):
    return Value(self.version, self.value,self.store,self.key)

    class Store(object):
    def __init__(self):
    self.store = {}

    def get(self, key):
    return self.store[key].clone()

    def set(self, key, value):
    if not (self.store[key].version > value.version):
    self.store[key] = Value(value.version+1, value.value, self, key)
    value.version= value.version+1
    else:
    raise ConcurrentUpdate

    def using(self, key):
    try:
    return self.get(key)
    except KeyError:
    self.store[key] = Value(0, None,self,key)
    return self.get(key)

    def dump(self):
    for k in self.store:
    print k, ":", self.store[k]

    S = Store()
    greeting = S.using("hello")
    print repr(greeting.value)
    greeting.set("Hello World")
    greeting.commit()
    # ------------------------------------------------------
    print greeting
    S.dump()
    # ------------------------------------------------------
    par = S.using("hello")
    par.set("Woo")
    par.commit()
    # ------------------------------------------------------
    print greeting
    S.dump()
    # ------------------------------------------------------
    greeting.set("Woo")
    greeting.commit()

    print repr(greeting), repr(greeting.value)

    S.dump()
     
    Michael Sparks, Dec 8, 2007
    #1
    1. Advertising

  2. Michael Sparks

    Fuzzyman Guest

    On Dec 8, 10:53 pm, Michael Sparks <> wrote:
    > Hi,
    >
    > I'm interested in writing a simple, minimalistic, non persistent (at this
    > stage) software transactional memory (STM) module. The idea being it should
    > be possible to write such a beast in a way that can be made threadsafe fair
    > easily.
    >
    > For those who don't know, STM is a really fancy way of saying variables
    > with version control (as far as I can tell :) designed to enable threadsafe
    > shared data.
    >
    > I'm starting with the caveat here that the following code is almost
    > certainly not threadsafe (not put any real thought into that as yet),
    > and I'm interested in any feedback on the following:
    >
    > * Does the API look simple enough?
    > * Are there any glaring mistakes in the code ? (It's always harder to see
    > your own bugs)
    > * What key areas appear least threadsafe, and any general suggestions
    > around that.
    >
    > If I get no feedback I hope this is of interest. Since these things get
    > archived, if you're reading this a month, 6 months, a year or more from
    > now, I'll still be interested in feedback...
    >
    > OK, API.
    >
    > First of all we need to initialise the store:
    >
    > S = Store()
    >
    > We then want to get a value from the store such that we can use the value,
    > and do stuff with it:
    >
    > greeting = S.using("hello")
    >
    > Access the value:
    >
    > print repr(greeting.value)
    >
    > Update the value:
    >
    > greeting.set("Hello World")
    >
    > Commit the value back to the store:
    >
    > greeting.commit()
    >
    > If you have concurrent updates of the same value, the following exception
    > gets thrown:
    > ConcurrentUpdate
    >
    > cf:
    > >>> S = Store()
    > >>> greeting = S.using("hello")
    > >>> par = S.using("hello")
    > >>> greeting.set("Hello World")
    > >>> par.set("Woo")
    > >>> greeting.commit()
    > >>> par.commit()

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in <module>
    > File "<stdin>", line 12, in commit
    > File "<stdin>", line 11, in set
    > __main__.ConcurrentUpdate
    >
    > That's pretty much the simplest API I can come up with. (I've tried a few
    > others but didn't really like them)
    >
    > The way this works is we have a Store that manages Values. (perhaps a better
    > name may be variables to avoid clashing with pythons parlance of labels and
    > values?)
    >
    > Anyhow, you can ask the store for Value, which it will give you. The Value
    > knows what it's called and where it's from, so when you tell it to commit,
    > it can try to do so. The little detail here is you get a /copy/ of the
    > Value not the stored Value. (This is to avoid accidental concurrent update
    > of the same actual object)
    >


    Wouldn't it be more useful having a store (possibly with a mapping
    API?) that stored several values. Then a single commit can be done at
    the end of the transaction.

    If the transaction fails, then the whole process can be repeated with
    none of the changes having been committed.

    The tricky part is that other threads asking for the value should get
    the *old* value until the commit has been done - so you need to know
    which thread 'value' is being asked for from.

    Michael
     
    Fuzzyman, Dec 8, 2007
    #2
    1. Advertising

  3. Fuzzyman wrote:

    > Wouldn't it be more useful having a store (possibly with a mapping
    > API?) that stored several values. Then a single commit can be done at
    > the end of the transaction.
    >
    > If the transaction fails, then the whole process can be repeated with
    > none of the changes having been committed.


    Quite probably yes :)

    However initially I thought I'd start off with trying to get single values
    right :) (A single value can be a dict of course, but your point holds for
    arbitrary unrelated values)

    I suspect the way to deal with that might be to do something like:

    S = Store()
    D = S.using("account_one", "account_two", "myaccount")
    D["myaccount"].set(D["account_one"].value+D["account_two"].value)
    D["account_one"].set(0)
    D["account_two"].set(0)
    D.commit()

    Which seems pretty nice/simple. It'd be useful for the commit failure to
    indicate which value was broken by a concurrent update, but not necessary.

    Context:

    At present Kamaelia has something which tracks keys and values - primarily
    services ("where's the person who manages the pygame display?"). At present
    those are only really accessible by generator based components which means
    that key/value context is actually accessed in a single threaded way making
    services safe for non-threaded components.

    However, making accessing that store safe from threads requires changing to
    something more advanced, either locking, or using STM. STM seems more in
    keeping with Kamaelia being generally lock-free.

    The way this store is currently used is exclusively in a single key/value
    lookup, which is why I was thinking of that first - though I can see to
    be more generally useful allowing multi-value updates would be useful :)

    > The tricky part is that other threads asking for the value should get
    > the old value until the commit has been done - so you need to know
    > which thread 'value' is being asked for from.


    I think this approach deals with this naturally. cf:
    def get(self, key): # in Store
    return self.store[key].clone()
    ....
    def clone(self): # in Value
    return Value(self.version, self.value,self.store,self.key)
    ....
    def set(self, key, value):
    if not (self.store[key].version > value.version):
    self.store[key] = Value(value.version+1, value.value, self, key)
    value.version= value.version+1
    else:
    raise ConcurrentUpdate

    This means that the value in 2 threads for the same checked out value names
    can be different, but they get warning of this when they try to commit.

    eg suppose the following interleaving of commands:

    S = Store()
    D = S.using("account_one", "account_two", "myaccount")
    D["myaccount"].set(D["account_one"].value+D["account_two"].value)
    D["account_one"].set(0)
    E = S.using("account_one", "myaccount")
    E["myaccount"].set(E["myaccount"].value-100)
    E["account_one"].set(100)
    E.commit()
    D["account_two"].set(0)
    D.commit()

    I did think that this would work with the current code but changing the
    value to something more complex - like this:

    S = Store()
    D = S.using("accounts")
    D.set({"account_one":50, "account_two":100, "myaccount":0})
    D.commit()
    print "here"
    S.dump()
    X = D.value
    X["myaccount"] = X["account_one"] + X["account_two"]
    X["account_one"] = 0
    X["account_two"] = 0
    D.set(X)
    D.commit()
    S.dump()
    print "Committed", D.value["myaccount"]

    But in practice, because I'm not doing a _copy_ here of self.value:
    def clone(self): # in Value
    return Value(self.version, self.value,self.store,self.key)

    I end up with accidental concurrent access, which is easy to fix, for
    moderately complex datatypes:
    import copy
    .....
    def clone(self): # in Value
    return Value(self.version, copy.deepcopy(self.value),
    self.store, self.key)

    As well as here:
    def set(self, key, value):
    if not (self.store[key].version > value.version):
    self.store[key] = Value(value.version+1,
    copy.deepcopy(value.value), self, key)
    value.version= value.version+1
    else:
    raise ConcurrentUpdate

    If I do that, then the following code fails as would be expected:
    S = Store()
    D = S.using("accounts")
    D.set({"account_one":50, "account_two":100, "myaccount":0})
    D.commit() # First
    S.dump()
    X = D.value
    X["myaccount"] = X["account_one"] + X["account_two"]
    X["account_one"] = 0

    E = S.using("accounts")
    Y = E.value
    Y["myaccount"] = Y["myaccount"]-100
    Y["account_one"]= 100
    E.set(Y)
    E.commit() # Second
    S.dump()

    X["account_two"] = 0
    D.set(X)
    D.commit() # Third
    S.dump()
    print "Committed", D.value["myaccount"]

    Fails as follows:

    accounts : Value(1, {'account_two': 100, 'myaccount': 0, 'account_one': 50})
    accounts : Value(2, {'account_two': 100, 'myaccount': -100, 'account_one':
    100})
    Traceback (most recent call last):
    File "./NewSTM.py", line 70, in <module>
    D.commit() # Third
    File "./NewSTM.py", line 20, in commit
    self.store.set(self.key, self)
    File "./NewSTM.py", line 37, in set
    raise ConcurrentUpdate
    __main__.ConcurrentUpdate

    (which is of course the error wanted - since we want to be able to detect
    failure)

    It's probably useful to know for the more general approach you suggest.

    Thanks!


    Michael.
    --
    Michael Sparks, Kamaelia Project Lead
    http://kamaelia.sourceforge.net/Developers/
    http://yeoldeclue.com/blog
     
    Michael Sparks, Dec 9, 2007
    #3
  4. Michael Sparks

    Duncan Booth Guest

    Michael Sparks <> wrote:

    > I'm interested in writing a simple, minimalistic, non persistent (at
    > this stage) software transactional memory (STM) module. The idea being
    > it should be possible to write such a beast in a way that can be made
    > threadsafe fair easily.
    >
    > For those who don't know, STM is a really fancy way of saying
    > variables with version control (as far as I can tell :) designed to
    > enable threadsafe shared data.
    >
    > I'm starting with the caveat here that the following code is almost
    > certainly not threadsafe (not put any real thought into that as yet),
    > and I'm interested in any feedback on the following:
    >
    > * Does the API look simple enough?
    > * Are there any glaring mistakes in the code ? (It's always harder
    > to see
    > your own bugs)
    > * What key areas appear least threadsafe, and any general
    > suggestions
    > around that.
    >
    > If I get no feedback I hope this is of interest. Since these things
    > get archived, if you're reading this a month, 6 months, a year or more
    > from now, I'll still be interested in feedback...


    Unless you really are desperate to reinvent the wheel, have you looked at
    ZODB? https://launchpad.net/zodb

    ZODB gives you the transactional model you want. It also gives you
    persistence, but if you don't want that you can simply connect to a non-
    persistent store.
     
    Duncan Booth, Dec 9, 2007
    #4
  5. Michael Sparks

    Fuzzyman Guest

    > STM seems more in
    > keeping with Kamaelia being generally lock-free.


    STM isn't lock free - it just abstracts the locks away from the
    'user'. You still need to lock around committing the transaction.
    Other threads accessing the data during the transaction should see the
    old values until the commit is successful.

    Michael
    http://www.manning.com/foord
     
    Fuzzyman, Dec 9, 2007
    #5
  6. Duncan Booth wrote:

    > Michael Sparks <> wrote:
    >
    >> I'm interested in writing a simple, minimalistic, non persistent (at
    >> this stage) software transactional memory (STM) module. The idea being
    >> it should be possible to write such a beast in a way that can be made
    >> threadsafe fair easily.
    >>
    >> For those who don't know, STM is a really fancy way of saying
    >> variables with version control (as far as I can tell :) designed to
    >> enable threadsafe shared data.
    >>
    >> I'm starting with the caveat here that the following code is almost
    >> certainly not threadsafe (not put any real thought into that as yet),
    >> and I'm interested in any feedback on the following:
    >>
    >> * Does the API look simple enough?
    >> * Are there any glaring mistakes in the code ? (It's always harder
    >> to see
    >> your own bugs)
    >> * What key areas appear least threadsafe, and any general
    >> suggestions
    >> around that.
    >>
    >> If I get no feedback I hope this is of interest. Since these things
    >> get archived, if you're reading this a month, 6 months, a year or more
    >> from now, I'll still be interested in feedback...

    >
    > Unless you really are desperate to reinvent the wheel, have you looked at
    > ZODB? https://launchpad.net/zodb
    >
    > ZODB gives you the transactional model you want. It also gives you
    > persistence, but if you don't want that you can simply connect to a non-
    > persistent store.


    That seems somewhat overkill for my needs. ZODB's distribution is 3.3MB in
    size, whereas the system I want a minimalistic, non-persistant[1]
    transactional memory for is 345K in size. (The Axon part of kamaelia)
    I'll take a look though.

    [1] I said "at this stage" because its something I *may* want in future
    for some possible usecases, but generally I'm not really very interested
    in persistence. (At least not where I'm putting this :)

    Context: I want thing that use the following class threadsafe, and can think
    of a few ways of doing this, and STM seems one of the more "nicer" ways of
    doing so. (it's clients are currently just generators which makes it safe at
    the moment)

    https://kamaelia.svn.sourceforge.ne...tch/Axon/Axon/CoordinatingAssistantTracker.py

    It's used to provide an ENVironment like facility (similar to os.environ)
    for generator based Kamaelia components.

    Thanks for the feedback :)


    Michael.
     
    Michael Sparks, Dec 9, 2007
    #6
  7. Fuzzyman wrote:

    >> STM seems more in
    >> keeping with Kamaelia being generally lock-free.

    >
    > STM isn't lock free - it just abstracts the locks away from the
    > 'user'. You still need to lock around committing the transaction.
    >


    I perhaps phrased what I meant too tersely.

    Kamaelia isn't lock free either. It's generally lock free. User code
    however, doesn't ever see locks... (Though threaded components do
    communicate over thread safe Queues which contain locks internally.
    Generator components don't use locking)

    I suppose the reason why I like STM (conceptually) is because it appears to
    follow the same sort of idea - the user of the code doesn't have to worry
    about locks - the just have to worry about whether they're using the system
    correctly or not. :)

    Based on this, I think I'm right in thinking then that given the code I
    posted, that at minimum I need locking here, since it is the critical
    section.

    def set(self, key, value):
    # Attempt to acquire lock
    if not (self.store[key].version > value.version):
    # Also assuming we have lock
    self.store[key] = Value(value.version+1,
    copy.deepcopy(value.value), self, key)
    value.version= value.version+1
    # Release lock
    else:
    # Also do this if we can't get access to the lock
    raise ConcurrentUpdate

    I know I then have a choice of locking the value (ie a lock specific to
    self.store[key]) or the self.store as a whole. Depends on how antisocial
    you want the locking to be I suppose :)

    > Other threads accessing the data during the transaction should see the
    > old values until the commit is successful.


    The implementation I'm suggesting means that other threads would see the
    old values until the try to commit their changes. (I'm after something
    simple since I'm expecting to deal with simple cases)

    Many thanks :)


    Michael.
     
    Michael Sparks, Dec 9, 2007
    #7
  8. Michael Sparks

    John J. Lee Guest

    Michael Sparks <> writes:

    > Duncan Booth wrote:
    >
    >> Michael Sparks <> wrote:
    >>
    >>> I'm interested in writing a simple, minimalistic, non persistent (at
    >>> this stage) software transactional memory (STM) module. The idea being
    >>> it should be possible to write such a beast in a way that can be made
    >>> threadsafe fair easily.

    [...]
    >> Unless you really are desperate to reinvent the wheel, have you looked at
    >> ZODB? https://launchpad.net/zodb

    [...]
    > That seems somewhat overkill for my needs. ZODB's distribution is 3.3MB in
    > size, whereas the system I want a minimalistic, non-persistant[1]
    > transactional memory for is 345K in size. (The Axon part of kamaelia)
    > I'll take a look though.

    [...]

    Durus might be worth a look too (though I doubt it's suitable for your
    situation):

    http://www.mems-exchange.org/software/durus/


    The link to their paper about it seems to be broken, but I think it
    was based somewhat on ZODB, but is simpler (67k tarball :).


    John
     
    John J. Lee, Dec 9, 2007
    #8
  9. John J. Lee wrote:

    > Durus might be worth a look too (though I doubt it's suitable for your
    > situation):
    >
    > http://www.mems-exchange.org/software/durus/
    >
    > The link to their paper about it seems to be broken, but I think it
    > was based somewhat on ZODB, but is simpler (67k tarball :).


    Much appreciated. I've downloaded this and it looks more suitable,
    however it still looks like overkill... After all, I'm just after a
    simple system for concurrent update of in-memory shared values by
    multiple threads - how hard can that be ?[1] ;-) :)

    [1] Yes, yes, for those who don't know me, I know, I know... :)

    Thanks to Fuzzyman's comments and a code review on IRC I think I've got what
    I think is a minimally sufficient system - though I'll extend to include
    Fuzzyman's suggestion regarding concurrent update of multiple independent
    values :)

    Code is here for those curious/wanting something similar/similarly
    lightweight:

    https://kamaelia.svn.sourceforge.net/svnroot/kamaelia/trunk/Sketches/MPS/Experiments/NewSTM.py

    I'll package it up independently of the rest of Kamaelia as well probably
    once I've made the addition noted above :)

    Regards,


    Michael
     
    Michael Sparks, Dec 9, 2007
    #9
  10. Michael Sparks

    Paul Rubin Guest

    Fuzzyman <> writes:
    > STM isn't lock free - it just abstracts the locks away from the
    > 'user'. You still need to lock around committing the transaction.


    The idea is that readers don't need locks. They just look at the
    version number before they start reading and after they finish. If
    the number didn't change, the read was successful. If it changed,
    they have to retry.

    On multiprocessor systems this relies on an instruction like CMPXCHG
    for writers to increment the version number. Otherwise it requires a
    lock separate from the version number, and multiple operations on the
    lock.
     
    Paul Rubin, Jan 11, 2008
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Michael Sparks
    Replies:
    0
    Views:
    260
    Michael Sparks
    Dec 10, 2007
  2. Michael Sparks
    Replies:
    0
    Views:
    304
    Michael Sparks
    Dec 24, 2007
  3. Dmitriy V'jukov
    Replies:
    3
    Views:
    347
    Ian Collins
    Oct 23, 2008
  4. DreiJane
    Replies:
    0
    Views:
    304
    DreiJane
    Nov 20, 2009
  5. legends2k
    Replies:
    10
    Views:
    717
    legends2k
    May 16, 2010
Loading...

Share This Page