Read-Write Lock vs primitive Lock()

k3xji

Hi,

I am trying to see in which situations a read-write lock performs
better than the primitive Lock() itself. Below is the code I am
using to test the performance:
import threading
import locks
import time

class mylock(object):
    def __init__(self):
        self.__notreading = threading.Event()
        self.__notwriting = threading.Event()
        self.__notreading.set()
        self.__notwriting.set()
    def acquire_read(self):
        self.__notreading.clear()
        self.__notwriting.wait()
    def acquire_write(self):
        self.__notreading.wait()
        self.__notwriting.clear()
    def release_read(self):
        self.__notreading.set()
    def release_write(self):
        self.__notwriting.set()

GLOBAL_VAR = 1
#GLOBAL_LOCK = locks.ReadWriteLock()
GLOBAL_LOCK = threading.Lock()
#GLOBAL_LOCK = mylock()
GLOBAL_LOOP_COUNT = 100000
GLOBAL_READER_COUNT = 1000
GLOBAL_WRITER_COUNT = 1



class wthread(threading.Thread):
    def run(self):
        try:
            #GLOBAL_LOCK.acquireWrite()
            #GLOBAL_LOCK.acquire_write()
            GLOBAL_LOCK.acquire()
            for i in range(GLOBAL_LOOP_COUNT):
                GLOBAL_VAR = 4
        finally:
            #GLOBAL_LOCK.release_write()
            GLOBAL_LOCK.release()

class rthread(threading.Thread):
    def run(self):
        try:
            #GLOBAL_LOCK.acquireRead()
            #GLOBAL_LOCK.acquire_read()
            GLOBAL_LOCK.acquire()
            for i in range(GLOBAL_LOOP_COUNT):
                GLOBAL_VAR = 3
        finally:
            #GLOBAL_LOCK.release_read()
            GLOBAL_LOCK.release()

# module executed?
if __name__ == "__main__":
    starttime = time.clock()
    threads = []
    for i in range(GLOBAL_READER_COUNT):
        rt = rthread()
        threads.append(rt)
    for i in range(GLOBAL_WRITER_COUNT):
        wt = wthread()
        threads.append(wt)

    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print "All operations took " + str(time.clock() - starttime) + " msecs"


What I am doing is: I am creating multiple readers and having them try
to do something. I had assumed that using the primitive Lock() in the
above situation would create a bottleneck on the rthreads. But the
numbers indicate that there is no difference at all. I implemented my
own read-write lock, mylock above, and also used the one here:
code.activestate.com/recipes/502283/.

All three give roughly the same numbers:

above test with primitive Lock:
C:\Python25\mytest>python test_rw.py
All operations took 14.4584082614 msecs

above test with mylock:
C:\Python25\mytest>python test_rw.py
All operations took 14.5185156214 msecs

above test with the one in the recipe:
C:\Python25\mytest>python test_rw.py
All operations took 14.4641975447 msecs

So, I am confused: in which situations does a read-write lock scale better?

Thanks,
 
Gabriel Genellina

I am trying to see in which situations a read-write lock performs
better than the primitive Lock() itself. Below is the code I am
using to test the performance:
import threading
import locks
import time

class mylock(object):

(I'm not convinced your lock is correct)
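
It keeps a single Event per side and never counts its readers: the first
reader's release_read() sets __notreading again while other readers may
still be inside, so a writer can slip in. A read-write lock normally keeps
a reader count. A rough, untested sketch of that shape -- not the recipe's
code, the class name is made up here -- would be:

class countingrwlock(object):
    # sketch only: readers are counted, a writer waits until the count is zero
    def __init__(self):
        self.__cond = threading.Condition()
        self.__readers = 0
    def acquire_read(self):
        self.__cond.acquire()
        self.__readers += 1
        self.__cond.release()
    def release_read(self):
        self.__cond.acquire()
        self.__readers -= 1
        if self.__readers == 0:
            self.__cond.notifyAll()
        self.__cond.release()
    def acquire_write(self):
        # keeps the underlying lock held until release_write, so new
        # readers and other writers stay out
        self.__cond.acquire()
        while self.__readers > 0:
            self.__cond.wait()
    def release_write(self):
        self.__cond.release()

Note that writers can starve in this sketch if readers keep arriving.
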
GLOBAL_VAR = 1
#GLOBAL_LOCK = locks.ReadWriteLock()
GLOBAL_LOCK = threading.Lock()
#GLOBAL_LOCK = mylock()
GLOBAL_LOOP_COUNT = 100000
GLOBAL_READER_COUNT = 1000
GLOBAL_WRITER_COUNT = 1

Only one writer? If this is always the case, you don't need a lock at all.
class wthread(threading.Thread):
    def run(self):
        try:
            #GLOBAL_LOCK.acquireWrite()
            #GLOBAL_LOCK.acquire_write()
            GLOBAL_LOCK.acquire()
            for i in range(GLOBAL_LOOP_COUNT):
                GLOBAL_VAR = 4
        finally:
            #GLOBAL_LOCK.release_write()
            GLOBAL_LOCK.release()

Note that the thread acquires the lock ONCE, repeats several thousand
times an assignment to a *local* variable called GLOBAL_VAR (!), finally
releases the lock and exits. As every thread does the same, they just run
one after another, they never have a significant overlap.

Also, you should acquire the lock *before* the try block (you have to
ensure that, *after* acquiring the lock, it is always released; that
requirement does not apply *before* acquiring the lock).

I'd test again with something like this:

class wthread(threading.Thread):
    def run(self):
        global GLOBAL_VAR
        for i in xrange(GLOBAL_LOOP_COUNT):
            GLOBAL_LOCK.acquire()
            try:
                GLOBAL_VAR += 1
            finally:
                GLOBAL_LOCK.release()

class rthread(threading.Thread):
    def run(self):
        try:
            #GLOBAL_LOCK.acquireRead()
            #GLOBAL_LOCK.acquire_read()
            GLOBAL_LOCK.acquire()
            for i in range(GLOBAL_LOOP_COUNT):
                GLOBAL_VAR = 3
        finally:
            #GLOBAL_LOCK.release_read()
            GLOBAL_LOCK.release()

Hmmm, it's a reader but attempts to modify the value?
You don't have to protect a read operation on a global variable - so a
lock isn't required here.
What I am doing is: I am creating multiple readers and having them try
to do something. I had assumed that using the primitive Lock() in the
above situation would create a bottleneck on the rthreads. But the
numbers indicate that there is no difference at all. I implemented my
own read-write lock, mylock above, and also used the one here:
code.activestate.com/recipes/502283/.

I hope you now understand why you always got the same numbers.
 
k3xji

(I'm not convinced your lock is correct)

No problem.:)
Only one writer? If this is always the case, you don't need a lock at all.

No, that was just used for testing. It does not matter; what I want is
to create a bottleneck on the readers.
Note that the thread acquires the lock ONCE, repeats several thousand
times an assignment to a *local* variable called GLOBAL_VAR (!), finally
releases the lock and exits. As every thread does the same, they just run
one after another, they never have a significant overlap.

If I put the for loop outside, in that case the readers will not overlap
at all, and you would be amazed by the numbers for that test. They
indicate the primitive lock is faster than the read-write lock, as it
acquires the lock, executes only one bytecode operation and releases the
lock. So, in order to create a bottleneck on the readers, we need to
somehow not release the lock immediately.
Also, you should acquire the lock *before* the try block (you have to
ensure that, *after* acquiring the lock, it is always released; that
requirement does not apply *before* acquiring the lock).

Yeah, you are right but it is irrelevant.
I'd test again with something like this:

class wthread(threading.Thread):
      def run(self):
          global GLOBAL_VAR
          for i in xrange(GLOBAL_LOOP_COUNT):
              GLOBAL_LOCK.acquire()
              try:
                  GLOBAL_VAR += 1
              finally:
                  GLOBAL_LOCK.release()

With that, primitive locks perform 10 times better than Read-Write
lock. See above.

Hmmm, it's a reader but attempts to modify the value?
You don't have to protect a read operation on a global variable - so a
lock isn't required here.

This is just for testing. Suppose that I am actually reading the
value. I don't understand why a lock is not required. Are you saying a
lock is not necessary because GLOBAL_VAR is an immutable object? If
so, suppose it is not. This is just a test. Suppose GLOBAL_VAR is a
list and we are calling append() on it, which is not an atomic
operation.
I hope you now understand why you always got the same numbers.

Unfortunately, I do not understand anything.

Thanks.
 
Aaron Brady

With that, primitive locks perform 10 times better than Read-Write
lock. See above.
snip

Gabriel's point (one of them) was that 'GLOBAL_VAR' isn't global in
your example. Your 'wthread' objects aren't sharing anything. He
added the 'global GLOBAL_VAR' statement, which is important.
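
For example (a toy snippet, not from the original code):

GLOBAL_VAR = 1

def without_global():
    GLOBAL_VAR = 4          # binds a new *local* name; the module-level one is untouched

def with_global():
    global GLOBAL_VAR
    GLOBAL_VAR = 4          # rebinds the module-level name

without_global()
print GLOBAL_VAR            # -> 1
with_global()
print GLOBAL_VAR            # -> 4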
 
Gabriel Genellina

No problem.:)


No, that was just used for testing. It does not matter; what I want is
to create a bottleneck on the readers.

Is this a theoretical question? What do your readers/writers actually do?
Just reading a global variable does not require any locking. If it is
being modified, you either get the previous value, or the new value.
If I put the for loop outside, in that case the readers will not overlap
at all, and you would be amazed by the numbers for that test. They
indicate the primitive lock is faster than the read-write lock, as it
acquires the lock, executes only one bytecode operation and releases the
lock. So, in order to create a bottleneck on the readers, we need to
somehow not release the lock immediately.

Again, what is your specific use case? why do you think readers will see a
bottleneck?
With that, primitive locks perform 10 times better than Read-Write
lock. See above.

So what? You don't like such results? The read-write lock in the recipe
you posted is rather complex, and isn't obvious whether it is correct or
not, and the author doesn't prove it nor provide a reference. I'd stick to
the standard Lock object unless I have specific needs.
Note that threading.Lock objects are implemented in C and don't have
significant overhead (other than the wait itself); even threading.RLock
objects are a lot slower. So depending on what you really have to do, the
overhead of using a class like ReadWriteLock may be worse than using a
standard Lock.
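
If you want to gauge that overhead yourself, a rough uncontended
micro-benchmark along these lines is enough (a sketch; the loop count is
arbitrary and the absolute numbers will vary by machine):

import threading
import time

def overhead(lock, n=1000000):
    # time n uncontended acquire/release pairs
    start = time.clock()
    for i in xrange(n):
        lock.acquire()
        lock.release()
    return time.clock() - start

print "Lock: ", overhead(threading.Lock())
print "RLock:", overhead(threading.RLock())
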
This is just for testing. Suppose that I am actually reading the
value. I don't understand why a lock is not required. Are you saying a
lock is not necessary because GLOBAL_VAR is an immutable object? If
so, suppose it is not. This is just a test. Suppose GLOBAL_VAR is a
list and we are calling append() on it, which is not an atomic
operation.

It doesn't matter whether it is mutable or immutable. Remember that, in
Python, a global variable is just a *name* that refers to an object; you
either get the old object referred, or the new one. If it is a list and it
is being modified "in place", you always get the same list -- its contents
may differ from one instant to another so iterating over it isn't safe.
That's why I insist on knowing your use case: you may be doing things
more complicated than they need to be.
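
For instance (a toy example):

GLOBAL_VAR = [1, 2, 3]

GLOBAL_VAR = [7, 8, 9]      # rebinds the *name*: readers see either the old
                            # list or the new one, never a half-built object

GLOBAL_VAR.append(10)       # mutates the *object* in place: everyone keeps
                            # seeing the same list, but its contents change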

Attached there is a modified test using a shared list. A group of writers
append elements; another group of writers pop elements. Readers
continuously check the list sanity. The list starts empty, should be empty
at the end, and should be equal to range(len(global_list)) at any time.
Run the code as-is, with all locks enabled, and it should work fine.
Disable the locking in each place, and things start going wrong.
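
In outline, the attached test looks something like this (just a rough
sketch here; names and counts are made up):

import threading

global_list = []
lock = threading.Lock()

class Appender(threading.Thread):
    def run(self):
        for i in xrange(1000):
            lock.acquire()
            try:
                # keeps the invariant global_list == range(len(global_list))
                global_list.append(len(global_list))
            finally:
                lock.release()

class Popper(threading.Thread):
    def run(self):
        for i in xrange(1000):
            lock.acquire()
            try:
                if global_list:
                    global_list.pop()
            finally:
                lock.release()

class Checker(threading.Thread):
    def run(self):
        for i in xrange(1000):
            lock.acquire()
            try:
                assert global_list == range(len(global_list))
            finally:
                lock.release()

if __name__ == "__main__":
    threads = ([Appender() for i in range(2)] +
               [Popper() for i in range(2)] +
               [Checker() for i in range(5)])
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert global_list == range(len(global_list))
    print "list is still sane, length", len(global_list)
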
Unfortunately, I do not understand anything.

Each thread in your original code was like this:

begin thread; wait for lock; execute something locally; release lock; exit thread;

There was no shared state between threads. Even if you fix them to share
some variable, each is a single, sequential piece of code (the inner
loop is irrelevant here). Once a thread gets the lock, it just runs to its
end. No thread waits more than once for the lock. So all of them run
sequentially, in arbitrary order, just one after the other. They never
overlap.
 
k3xji

Ok, forget about GLOBAL_VAR. See the code below:

import threading
import locks
import time

GLOBAL_VAR = []
#GLOBAL_LOCK = locks.ReadWriteLock()
GLOBAL_LOCK = threading.Lock()
#GLOBAL_LOCK = mylock()
GLOBAL_LOOP_COUNT = 10000000
GLOBAL_READER_COUNT = 100
GLOBAL_WRITER_COUNT = 1
global GLOBAL_LEN


class wthread(threading.Thread):
    def run(self):
        try:
            GLOBAL_LOCK.acquireWrite()
            #GLOBAL_LOCK.acquire_write()
            #GLOBAL_LOCK.acquire()
            for i in range(GLOBAL_LOOP_COUNT):
                pass
        finally:
            #GLOBAL_LOCK.release_write()
            GLOBAL_LOCK.release()

class rthread(threading.Thread):
    def run(self):
        try:
            GLOBAL_LOCK.acquireRead()
            #GLOBAL_LOCK.acquire_read()
            print 'ENTER:'+str(threading.currentThread())
            #GLOBAL_LOCK.acquire()
            print 'ACQUIRE:'+str(threading.currentThread())
            for i in range(GLOBAL_LOOP_COUNT):
                pass
        finally:
            #GLOBAL_LOCK.release_read()
            GLOBAL_LOCK.release()
            print 'RELEASE:'+str(threading.currentThread())

# module executed?
if __name__ == "__main__":
    starttime = time.clock()
    threads = []
    for i in range(GLOBAL_READER_COUNT):
        rt = rthread()
        threads.append(rt)
    for i in range(GLOBAL_WRITER_COUNT):
        wt = wthread()
        threads.append(wt)

    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print "All operations took " + str(time.clock() - starttime) + " msecs"

As GLOBAL_LOOP_COUNT is 10000000, this now creates a bottleneck on
the readers. I had assumed that, as every thread is given only 100
bytecodes to execute, a value of 10000 for this number would be enough
to let the other rthreads start() and try to acquire the lock.
But Python, undocumentedly, does not grab the GIL easily if a Lock is
held. This is strange to me.
Again, what is your specific use case? why do you think readers will see a  
bottleneck?


So what? You don't like such results? The read-write lock in the recipe  
you posted is rather complex, and isn't obvious whether it is correct or  
not, and the author doesn't prove it nor provide a reference. I'd stick to  
the standard Lock object unless I have specific needs.
Note that threading.Lock objects are implemented in C and don't have  
significant overhead (other than the wait itself); even threading.RLock  
objects are a lot slower. So depending on what you really have to do, the  
overhead of using a class like ReadWriteLock may be worse than using a  
standard Lock.

That may be the point. That is why I am trying to test this: when does a
read-write lock perform better than the primitive lock? By the way,
for reference, the code in the recipe is used in CherryPy and is
based on Tanenbaum's book Operating Systems Design.
It doesn't matter whether it is mutable or immutable. Remember that, in  
Python, a global variable is just a *name* that refers to an object; you  
either get the old object referred, or the new one. If it is a list and it  
is being modified "in place", you always get the same list -- its contents  
may differ from one instant to another so iterating over it isn't safe.
That's why I insist on knowing your use case: you may be doing things
more complicated than they need to be.

I understand your point completely, but let's put aside what exactly is
inside the loop. Just assume that we have a thread that is supposed to
read something and a thread that is supposed to write. If I create
thousands of readers, and if the operation is an enormously huge
calculation (as Python does not grab GIL easily inside a Lock), then
this will create a bottleneck on the readers. But with a read-write lock
this problem *SHOULD* be fixed, and in my tests I see it is fixed when
I increase the time spent in the loop by huge amounts.
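
To make that concrete, the kind of test I mean looks roughly like this
(only a sketch, using the recipe's locks module as above; the sleep just
stands in for slow work that releases the GIL, such as blocking I/O):

import threading
import time
import locks                        # recipe 502283, as above

READER_COUNT = 50
RWLOCK = locks.ReadWriteLock()
PLAIN = threading.Lock()

class slow_reader(threading.Thread):
    # pretends to read shared data for 0.1s while holding a read lock
    def run(self):
        RWLOCK.acquireRead()        # or: PLAIN.acquire()
        try:
            time.sleep(0.1)         # stands in for work that releases the GIL
        finally:
            RWLOCK.release()        # or: PLAIN.release()

if __name__ == "__main__":
    start = time.time()
    readers = [slow_reader() for i in range(READER_COUNT)]
    for r in readers:
        r.start()
    for r in readers:
        r.join()
    # with the read-write lock the sleeping readers overlap (roughly 0.1s
    # total); with the plain Lock they run one at a time (roughly 5s total)
    print "readers took", time.time() - start, "secs"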

Each thread in your original code was like this:

begin thread; wait for lock; execute something locally; release lock; exit thread;

There was no shared state between threads. Even if you fix them to share
some variable, each is a single, sequential piece of code (the inner
loop is irrelevant here). Once a thread gets the lock, it just runs to its
end. No thread waits more than once for the lock. So all of them run
sequentially, in arbitrary order, just one after the other. They never
overlap.

You are correct about this. But this is an undocumented thing. I had
assumed that Python will release the GIL for other threads after 100
bytecodes are executed by default. However, as I stated, when a Lock()
is held this changes. I think this is to avoid simple MT
problems for new programmers. Not sure. I will read threads.c for
information.

Thanks.
 
Gabriel Genellina

As GLOBAL_LOOP_COUNT is 10000000, this now creates a bottleneck on
the readers. I had assumed that, as every thread is given only 100
bytecodes to execute, a value of 10000 for this number would be enough
to let the other rthreads start() and try to acquire the lock.
But Python, undocumentedly, does not grab the GIL easily if a Lock is
held. This is strange to me.

This has nothing to do with the GIL, nor undocumented behavior. If all
threads are blocked waiting for the same lock, yielding the CPU has no
effect, because nobody can progress until the lock is released.

I'll ask you again: what is your use case? what do you want to do? what is
a "reader" in your problem? what is a "writer"? what is the resource you
want to protect using a lock?
Different problems have different behaviors; for example, for
reading/writing files, your model is basically wrong.
That may be the point. That is why I am trying to test this: when does a
read-write lock perform better than the primitive lock? By the way,
for reference, the code in the recipe is used in CherryPy and is
based on Tanenbaum's book Operating Systems Design.

Ah, good to know it at least has some background. Anyway, that isn't a
guarantee of correctness. For example, Tanenbaum's algorithm for the dining
philosophers problem was (is?) flawed (according to W. Stallings).
I understand your point completely, but let's put aside what exactly is
inside the loop. Just assume that we have a thread that is supposed to
read something and a thread that is supposed to write. If I create
thousands of readers, and if the operation is an enormously huge
calculation (as Python does not grab GIL easily inside a Lock), then
this will create a bottleneck on the readers. But with a read-write lock
this problem *SHOULD* be fixed, and in my tests I see it is fixed when
I increase the time spent in the loop by huge amounts.

But why would you protect the whole long calculation with a lock? And why
do you say "Python does not grab GIL easily inside a Lock", whatever that
means?
If you hold a scarce resource for a long time, this is likely to cause a
bottleneck, and the language (be it Python or other) is mostly irrelevant
here.
I had
assumed that Python will release the GIL for other threads after 100
bytecodes are executed by default.

Yes. You can alter how often this occurs, using sys.setcheckinterval (the
code I posted does that, just to see how results vary).
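
For example:

import sys
sys.setcheckinterval(10)    # consider a thread switch every ~10 bytecodes
                            # instead of the default 100
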
However, as I stated, when a Lock()
is held this changes. I think this is to avoid simple MT
problems for new programmers. Not sure. I will read threads.c for
information.

No, it's always the same thing, locks don't affect this behavior. But as I
said above, if no other thread can progress because all of them are
waiting to acquire the same lock, you gain nothing by releasing the GIL.
 
