How to go about read/write locks


Emanuele D'Arrigo

Hi everybody,

I'm having a threading-related design issue and I suspect it has a
name that I just don't know. Here's a description.

Let's assume a resource (e.g. a dictionary) that needs to be accessed
by multiple threads. A simple lock will do the job, but in some
circumstances it will create an unnecessary bottleneck. For example,
let's assume that most threads only need -read- access to the
resource, while only a few threads actually change the dictionary.
Ideally, the reading threads should not block each other. However, as
soon as a thread intends to change the dictionary it should let all
the reading threads finish, lock access to the resource, change
it, and then release the lock.

I don't think it'd be difficult to implement, but I'm wondering if
something in this direction has been done already, if it has a name,
or if it's even a well-known design pattern.

Can anybody shed some light?

Manu
 

Diez B. Roggisch


There are two answers to this - the general, and the CPython-specific.

The general is this: if you need reading to finish before you write, you
need a lock that guards every access to the dict. There is no "cheap to
obtain but only effective for some threads"-lock.


The CPython-specific answer is that the GIL takes care of that for you
right now anyway. So unless you plan for a distant future where some
kind of swallows fly around that don't have a GIL, you are safe to
simply read and write in threads without any locking whatsoever.

Diez
 

Emanuele D'Arrigo

The CPython-specific answer is that the GIL takes care of that for you
right now anyway. So unless you plan for a distant future where some
kind of swallows fly around that don't have a GIL, you are safe to
simply read and write in threads without any locking whatsoever.

Diez, thanks for your reply. I didn't know what the GIL was. I did
some research and found an interesting article that clarified many
multithreading-related concepts and issues:

http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/

Python's approach with the GIL is both reasonable and disappointing.
Reasonable because I understand how it can make things easier for its
internals. Disappointing because it means that standard Python cannot
take advantage of the parallelism that today's computers can more and
more often afford. For example, I found only recently, almost by
chance, that my wife's laptop has not one but two processors, even
though it isn't a particularly high-end computer. I now understand
that OS-level threading does use them both, but that the GIL
effectively prevents parallel operations. (Am I understanding
correctly?)

I do not completely understand your statement in the context of my
original example though, the shared dictionary. As the GIL is released
every X bytecode operations, surely it can happen that while the
dictionary is being iterated through, e.g. in a for/in loop, a
different thread changes it, with potentially catastrophic
consequences. The GIL wouldn't be able to prevent this, would it?

Manu
 

Piet van Oostrum

Emanuele D'Arrigo said:
ED> Hi everybody,
ED> I'm having a threading-related design issue and I suspect it has a
ED> name that I just don't know. Here's a description.
ED> Let's assume a resource (e.g. a dictionary) that needs to be accessed
ED> by multiple threads. A simple lock will do the job, but in some
ED> circumstances it will create an unnecessary bottleneck. For example,
ED> let's assume that most threads only need -read- access to the
ED> resource, while only a few threads actually change the dictionary.
ED> Ideally, the reading threads should not block each other. However, as
ED> soon as a thread intends to change the dictionary it should let all
ED> the reading threads finish, lock access to the resource, change
ED> it, and then release the lock.
ED> I don't think it'd be difficult to implement, but I'm wondering if
ED> something in this direction has been done already, if it has a name,
ED> or if it's even a well-known design pattern.
ED> Can anybody shed some light?

This is a classical synchronization problem with a classical solution:
You treat the readers as a group, and the writers individually. So you
have a write lock that each writer has to acquire and release, but it is
acquired only by the first reader and released by the last one.
Therefore you need a counter of the number of readers, and manipulations
of this counter must be protected by another lock.

#------------------------------------------------------------------------
from threading import Lock

mutex = Lock()
writelock = Lock()
numreaders = 0
#------------------------------------------------------------------------
# Reader code:

mutex.acquire()
numreaders += 1
if numreaders == 1:
    writelock.acquire()
mutex.release()

## critical section for reader

mutex.acquire()
numreaders -= 1
if numreaders == 0:
    writelock.release()
mutex.release()
#------------------------------------------------------------------------
# Writer code:

writelock.acquire()

## critical section for writer

writelock.release()
#------------------------------------------------------------------------

Notes:
1. From Python 2.6 on you can use the with statement, which makes it
   more robust against exceptions:
       with mutex:
           numreaders += 1
           if numreaders == 1:
               writelock.acquire()
       etc.
   In Python 2.5 you can also use this if you use:
       from __future__ import with_statement

2. The code above can cause starvation for writers if there are many
   readers (or if new readers come in before all other readers have
   finished). You need at least one more lock and probably a writer
   counter to solve this.

3. See also http://code.activestate.com/recipes/465156/

4. The code has not been tested, not even for syntax errors.
 

Diez B. Roggisch

This is a classical synchronization problem with a classical solution:
You treat the readers as a group, and the writers individually. So you
have a write lock that each writer has to acquire and release, but it is
acquired only by the first reader and released by the last one.
Therefore you need a counter of the number of readers, and manipulations
of this counter must be protected by another lock.

I was going to suggest a similar approach but refrained because of a
problem I see with your code as well: if the readers are reading very
fast (the OP didn't state what his actual application is, so it might
not be a consumer-producer scheme, for which I don't think a dict
would be the natural choice anyway), they can block the writer from
writing altogether.


Or do I miss something here?

Diez
 

Diez B. Roggisch

Python's approach with the GIL is both reasonable and disappointing.
Reasonable because I understand how it can make things easier for its
internals. Disappointing because it means that standard Python cannot
take advantage of the parallelism that today's computers can more and
more often afford. For example, I found only recently, almost by
chance, that my wife's laptop has not one but two processors, even
though it isn't a particularly high-end computer. I now understand
that OS-level threading does use them both, but that the GIL
effectively prevents parallel operations. (Am I understanding
correctly?)

Not entirely. Yes, if your application is CPU-bound; no, if it's
IO-bound. And a lot of people think that threads are actually the
wrong approach for concurrency anyway, so with Python 2.6 there comes
the multiprocessing module that lets you use the full capacity of
your CPUs.
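To illustrate that point, here is a minimal sketch of the multiprocessing module sidestepping the GIL with worker processes (using the context-manager syntax, which arrived in later Python versions than 2.6; the function name is illustrative):

```python
from multiprocessing import Pool

def square(n):
    # Runs in a worker process; each process has its own interpreter
    # and therefore its own GIL.
    return n * n

if __name__ == "__main__":
    # Four worker processes chew on the inputs in parallel.
    with Pool(4) as pool:
        print(pool.map(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Unlike threads, the workers share nothing by default; inputs and results are pickled between processes.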
I do not completely understand your statement in the context of my
original example though, the shared dictionary. As the GIL is released
every X bytecode operations, surely it can happen that while the
dictionary is being iterated through, e.g. in a for/in loop, a
different thread changes it, with potentially catastrophic
consequences. The GIL wouldn't be able to prevent this, would it?

You didn't give a concrete usage scenario for your shared dict - but I
assumed that by reading and writing you meant

mydict[key] = value

value = mydict[key]

which are both atomic thanks to the GIL.

More complex operations - such as iteration - might need more
coarse-grained locking.
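A minimal sketch of such coarse-grained locking, holding an ordinary Lock for the whole iteration rather than per access (the names here are illustrative, not from the thread):

```python
import threading

shared = {"a": 1, "b": 2}
lock = threading.Lock()

def read_total():
    # Hold the lock for the entire iteration, so no writer can
    # mutate the dict while we walk over it.
    with lock:
        return sum(shared.values())

def update(key, value):
    # Writers take the same lock, even for a single atomic-looking
    # assignment, so they serialize against in-progress iterations.
    with lock:
        shared[key] = value

update("c", 3)
print(read_total())  # 6
```

The cost is that readers now also block each other, which is exactly the bottleneck the readers-writer schemes in this thread try to avoid.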

Diez
 

Carl Banks

Python's approach with the GIL is both reasonable and disappointing.
Reasonable because I understand how it can make things easier for its
internals. Disappointing because it means that standard Python cannot
take advantage of the parallelism that today's computers can more and
more often afford. For example, I found only recently, almost by
chance, that my wife's laptop has not one but two processors, even
though it isn't a particularly high-end computer. I now understand
that OS-level threading does use them both, but that the GIL
effectively prevents parallel operations. (Am I understanding
correctly?)

Mostly, but keep in mind that non-Python code can run on a different
core at the same time. This could be stuff like I/O or numerical
calculations written in C.


I do not completely understand your statement in the context of my
original example though, the shared dictionary. As the GIL is released
every X bytecode operations, surely it can happen that while the
dictionary is being iterated through, e.g. in a for/in loop, a
different thread changes it, with potentially catastrophic
consequences. The GIL wouldn't be able to prevent this, would it?

It'll prevent catastrophic consequences such as segfaults, yes. It
won't prevent incorrect results, or some other thread changing
something under your feet, though.

Also, I believe dicts know when they're being iterated over and will
raise an exception on any attempt to add or remove a key, so at
worst you might get a value changed under your feet. I would say,
therefore, that as long as you are only modifying values, and not
adding or removing keys, Diez is correct. (Someone more familiar with
dict internals might want to verify.)
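That matches what CPython actually does: replacing a value during iteration is allowed, while adding or removing a key raises RuntimeError. A quick sketch:

```python
d = {"a": 1, "b": 2}

# Replacing values mid-iteration is allowed:
for k in d:
    d[k] = d[k] * 10

# Adding (or removing) a key mid-iteration is not:
try:
    for k in d:
        d["new"] = 0
except RuntimeError as exc:
    print(exc)  # dictionary changed size during iteration
```

Note the exception is raised on the *next* step of the iteration, so the offending insertion itself does go through before the loop aborts.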


Carl Banks
 

Carl Banks

I was going to suggest a similar approach but refrained because of a
problem I see with your code as well: if the readers are reading very
fast (the OP didn't state what his actual application is, so it might
not be a consumer-producer scheme, for which I don't think a dict
would be the natural choice anyway), they can block the writer from
writing altogether.

You could implement some kind of fair ordering where whoever requests
a lock first is guaranteed to get it first, but I can't think of a way
to do that without requiring all readers to acquire two locks.
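One way to sketch that two-lock idea is the "turnstile" from the readers-writers literature: each reader passes through an extra lock that a waiting writer holds, so readers arriving after the writer queue up behind it instead of starving it. The names below are mine, building on Piet's variables:

```python
from threading import Lock, Thread

turnstile = Lock()    # the readers' "second lock"; a waiting writer holds it
mutex = Lock()        # guards numreaders
writelock = Lock()    # held by the writer, or by the reader group
numreaders = 0
shared = {"hits": 0}
reads = []

def reader():
    global numreaders
    turnstile.acquire()               # queue behind any waiting writer
    turnstile.release()
    with mutex:
        numreaders += 1
        if numreaders == 1:           # first reader locks out writers
            writelock.acquire()
    reads.append(shared["hits"])      # critical section (read)
    with mutex:
        numreaders -= 1
        if numreaders == 0:           # last reader lets writers in
            writelock.release()

def writer():
    with turnstile:                   # new readers now block at the turnstile
        writelock.acquire()           # wait for the current reader group to drain
    shared["hits"] += 1               # critical section (write)
    writelock.release()

threads = [Thread(target=reader) for _ in range(5)] + [Thread(target=writer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["hits"], len(reads))  # 1 5
```

The writer releases the turnstile as soon as it holds the write lock, so at most one extra lock acquisition is added to each reader's fast path.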


Carl Banks
 

Dennis Lee Bieber

You could implement some kind of fair ordering where whoever requests
a lock first is guaranteed to get it first, but I can't think of a way
to do that without requiring all readers to acquire two locks.
The more I follow this discussion, the more it begins to sound like
the scheme SQLite uses, with its multilevel locks...

--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 

Piet van Oostrum

DBR> I was going to suggest a similar approach but refrained because of a
DBR> problem I see with your code as well: if the readers are reading very fast
DBR> (the OP didn't state what his actual application is, so it might not be a
DBR> consumer-producer scheme, for which I don't think a dict would be the
DBR> natural choice anyway), they can block the writer from writing altogether.

DBR> Or do I miss something here?

Yes, you missed note 2 at the end of my posting.
 

Piet van Oostrum

CB> You could implement some kind of fair ordering where whoever requests
CB> a lock first is guaranteed to get it first, but I can't think of a way
CB> to do that without requiring all readers to acquire two locks.

The original implementation (with writer starvation) comes from [1].
They also describe a solution where writers have priority, but it
needs 5 semaphores and 2 counters (one for writers and one for
readers). It can cause starvation for readers, however. For the OP
this wouldn't be a problem because writers are rare in his situation.

However, I found a solution [2] with just one additional counter for
the number of writers and no additional semaphores. The manipulations
of the writers counter are also protected by the same mutex. This
solution is fair for both readers and writers. Translated into Python
this would be:

#------------------------------------------------------------------------
from threading import Lock

mutex = Lock()
writelock = Lock()
numreaders = 0
numwriters = 0
#------------------------------------------------------------------------
# Reader code:

mutex.acquire()
if numwriters > 0 or numreaders == 0:
    mutex.release()
    writelock.acquire()
    mutex.acquire()
numreaders += 1
mutex.release()

## critical section for reader

mutex.acquire()
numreaders -= 1
if numreaders == 0:
    writelock.release()
mutex.release()
#------------------------------------------------------------------------
# Writer code:

mutex.acquire()
numwriters += 1
mutex.release()
writelock.acquire()

## critical section for writer

mutex.acquire()
numwriters -= 1
mutex.release()
writelock.release()
#------------------------------------------------------------------------

I am going to put this in a class, with a context manager, so that it
can be used with the 'with' statement like the normal locks.
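Such a class might look something like this - a sketch of the algorithm above wrapped in context managers; the class and method names are mine, and this transcription is as untested as the rest of the thread's code:

```python
from contextlib import contextmanager
from threading import Lock

class ReadWriteLock:
    """Readers-writer lock following the scheme sketched above."""

    def __init__(self):
        self.mutex = Lock()        # guards the two counters
        self.writelock = Lock()
        self.numreaders = 0
        self.numwriters = 0

    @contextmanager
    def reading(self):
        self.mutex.acquire()
        if self.numwriters > 0 or self.numreaders == 0:
            # Writers are waiting (or we are the group's first reader):
            # queue on the write lock before joining the reader group.
            self.mutex.release()
            self.writelock.acquire()
            self.mutex.acquire()
        self.numreaders += 1
        self.mutex.release()
        try:
            yield
        finally:
            with self.mutex:
                self.numreaders -= 1
                if self.numreaders == 0:
                    self.writelock.release()

    @contextmanager
    def writing(self):
        with self.mutex:
            self.numwriters += 1   # signals new readers to queue up
        self.writelock.acquire()
        try:
            yield
        finally:
            with self.mutex:
                self.numwriters -= 1
            self.writelock.release()

rwlock = ReadWriteLock()
shared = {}
with rwlock.writing():
    shared["answer"] = 42
with rwlock.reading():
    print(shared["answer"])  # 42
```

The try/finally around each yield makes the counters and the write lock unwind correctly even if the guarded code raises.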

I also found a solution with no additional counters but an additional
semaphore, but I found it only in lecture notes [3,4]; I couldn't find any
scientific publications about it. So I don't know for sure if it is
correct, fair etc.

[1] P. J. Courtois, F. Heymans, D. L. Parnas, Concurrent control with
"readers" and "writers", Communications of the ACM, v.14 n.10,
p.667-668, Oct. 1971
[2] Jalal Kawash, Process Synchronization with Readers and Writers
Revisited, Journal of Computing and Information Technology - CIT 13,
2005, 1, 43-51
[3] http://pages.cs.wisc.edu/~haryadi/537/slides/lec18-semaphores.ppt
[4] http://vorlon.case.edu/~jrh23/338/HW3.pdf
 
