Locking around

Nikolaus Rath

Hello,

I need to synchronize the access to a couple of hundred-thousand
files[1]. It seems to me that creating one lock object for each of the
files is a waste of resources, but I cannot use a global lock for all
of them either (since the locked operations go over the network, this
would make the whole application essentially single-threaded even
though most operations act on different files).

My idea is therefore to create and destroy per-file locks "on-demand"
and to protect the creation and destruction by a global lock
(self.global_lock). For that, I add a "usage counter"
(wlock.user_count) to each lock, and destroy the lock when it reaches
zero. The number of currently active lock objects is stored in a dict:

def lock_s3key(s3key):

    self.global_lock.acquire()
    try:
        # If there is already a lock object for this key, use it
        if s3key in self.key_lock:
            wlock = self.key_lock[s3key]
            wlock.user_count += 1

        # otherwise create a new lock object
        else:
            wlock = WrappedLock()
            wlock.lock = threading.Lock()
            wlock.user_count = 1
            self.key_lock[s3key] = wlock

        lock = wlock.lock

    finally:
        self.global_lock.release()

    # Lock the key itself
    lock.acquire()


and similarly

def unlock_s3key(s3key):

    # Lock dictionary of lock objects
    self.global_lock.acquire()
    try:
        # Get lock object
        wlock = self.key_lock[s3key]

        # Unlock key
        wlock.lock.release()

        # We don't use the lock object any longer
        wlock.user_count -= 1
        assert wlock.user_count >= 0

        # If no other thread uses the lock, dispose of it
        if wlock.user_count == 0:
            del self.key_lock[s3key]

    finally:
        self.global_lock.release()


WrappedLock is just an empty class that allows me to add the
additional user_count attribute.
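
In other words, something like:

class WrappedLock(object):
    # Empty container class: the actual threading.Lock goes into
    # .lock and the usage counter into .user_count.
    pass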


My questions:

- Does that look like a proper solution, or does anyone have a better
one?

- Did I overlook any deadlock possibilities?


Best,
Nikolaus



[1] Actually, it's not really files (because in that case I could use
fcntl) but blobs stored on Amazon S3.


--
»It is not worth an intelligent man's time to be in the majority.
By definition, there are already enough people to do that.«
-G.H. Hardy

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
 
Ulrich Eckhardt

Nikolaus said:
I need to synchronize the access to a couple of hundred-thousand
files[1]. It seems to me that creating one lock object for each of the
files is a waste of resources, but I cannot use a global lock for all
of them either (since the locked operations go over the network, this
would make the whole application essentially single-threaded even
though most operations act on different files).

Just wondering, but at what point do you know which files are needed? If you
know that rather early, you could simply 'check out' the required files, do
whatever you want with them and then release them again. If one of the
requested files is marked as already in use, you simply wait (without
reserving the others) until someone releases files and then try again. You
could also wait for that precise file to be available, but that would
require that you already reserve the other files, which might unnecessarily
block other accesses.

Note that this idea requires that each access locks one set of files at the
beginning and releases them at the end, i.e. no attempts to lock files in
between, which would otherwise easily lead to deadlocks.

Further, you could distinguish between read-only and read-write access as an
optimisation.
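
For instance, a minimal version of such a reader/writer lock might look
like this (a sketch, not from the thread; it lets readers share access
and can starve writers under a steady stream of readers):

import threading

class ReadWriteLock(object):
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0

    def acquire_read(self):
        # Readers only bump a counter; many may hold the lock at once
        with self._cond:
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                # Last reader out wakes any waiting writer
                self._cond.notifyAll()

    def acquire_write(self):
        # The writer keeps the underlying lock for the whole write,
        # so new readers block in acquire_read() until release_write()
        self._cond.acquire()
        while self._readers > 0:
            self._cond.wait()

    def release_write(self):
        self._cond.release()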
Nikolaus said:
My idea is therefore to create and destroy per-file locks "on-demand"
and to protect the creation and destruction by a global lock
(self.global_lock). For that, I add a "usage counter"
(wlock.user_count) to each lock, and destroy the lock when it reaches
zero. [...code...]

- Does that look like a proper solution, or does anyone have a better
one?

This should work; at least the idea is not flawed. However, I'd say there
are too many locks involved. Rather, you just need a simple flag and the
global lock. Further, you need a condition/event that tells waiting threads
that you released some of the files, so that they can check again whether
the ones they want are available.
- Did I overlook any deadlock possibilities?

The normal deadlock possibilities when multiple locks are involved apply:
you must make sure that locks are always acquired in an order that prevents
two threads from each waiting on a resource held by the other.

Uli
 
Nikolaus Rath

Ulrich Eckhardt said:
Nikolaus said:
I need to synchronize the access to a couple of hundred-thousand
files[1]. It seems to me that creating one lock object for each of the
files is a waste of resources, but I cannot use a global lock for all
of them either (since the locked operations go over the network, this
would make the whole application essentially single-threaded even
though most operations act on different files).

Just wondering, but at what point do you know which files are needed?

As soon as I have read a client request. Also, I will only need one
file per request, not multiple.
If you know that rather early, you could simply 'check out' the
required files, do whatever you want with them and then release them
again. If one of the requested files is marked as already in use,
you simply wait (without reserving the others) until someone
releases files and then try again. You could also wait for that
precise file to be available, but that would require that you
already reserve the other files, which might unnecessarily block
other accesses.

Note that this idea requires that each access locks one set of files at the
beginning and releases them at the end, i.e. no attempts to lock files in
between, which would otherwise easily lead to deadlocks.

I am not sure that I understand your idea. To me this sounds exactly
like what I'm already doing, just replace 'check out' by 'lock' in
your description... Am I missing something?
My idea is therefore to create and destroy per-file locks "on-demand"
and to protect the creation and destruction by a global lock
(self.global_lock). For that, I add a "usage counter"
(wlock.user_count) to each lock, and destroy the lock when it reaches
zero. [...code...]

- Does that look like a proper solution, or does anyone have a better
one?

This should work; at least the idea is not flawed. However, I'd say
there are too many locks involved. Rather, you just need a simple
flag and the global lock. Further, you need a condition/event that
tells waiting threads that you released some of the files, so that
they can check again whether the ones they want are available.

I have to agree that this sounds like an easier implementation. I just
have to think about how to do the signalling. Thanks a lot!


Best,

-Nikolaus

 
Nikolaus Rath

Nikolaus Rath said:
I have to agree that this sounds like an easier implementation. I
just have to think about how to do the signalling. Thanks a lot!

Here's the code I use now. I think it's also significantly easier to
understand (cv is a threading.Condition() object and cv.locked_keys a
set()).

def lock_s3key(s3key):
    cv = self.s3_lock

    try:
        # Lock set of locked s3 keys (global lock)
        cv.acquire()

        # Wait for given s3 key becoming unused
        while s3key in cv.locked_keys:
            cv.wait()

        # Mark it as used (local lock)
        cv.locked_keys.add(s3key)
    finally:
        # Release global lock
        cv.release()


def unlock_s3key(s3key):
    cv = self.s3_lock

    try:
        # Lock set of locked s3 keys (global lock)
        cv.acquire()

        # Mark key as free (release local lock)
        cv.locked_keys.remove(s3key)

        # Notify other threads
        cv.notify()

    finally:
        # Release global lock
        cv.release()


Best,

-Nikolaus

 
Carl Banks

Nikolaus Rath said:
Hello,

I need to synchronize the access to a couple of hundred-thousand
files[1]. It seems to me that creating one lock object for each of the
files is a waste of resources, but I cannot use a global lock for all
of them either (since the locked operations go over the network, this
would make the whole application essentially single-threaded even
though most operations act on different files).

My idea is therefore to create and destroy per-file locks "on-demand"
and to protect the creation and destruction by a global lock
(self.global_lock). For that, I add a "usage counter"
(wlock.user_count) to each lock, and destroy the lock when it reaches
zero. [snip]
My questions:

- Does that look like a proper solution, or does anyone have a better
one?


You don't need the per-file locks at all if you use a global lock like
this. Here's a way to do it using a threading.Condition object. I
suspect it might not perform so well if there is a lot of competition
for certain keys, but it doesn't sound like that's the case for you.
Performance and robustness improvements left as an exercise. (Note:
I'm not sure where self comes from in your examples so I left it out
of mine.)


import threading

global_lock = threading.Condition()
locked_keys = set()

def lock_s3key(s3key):
    global_lock.acquire()
    while s3key in locked_keys:
        global_lock.wait()
    locked_keys.add(s3key)
    global_lock.release()

def unlock_s3key(s3key):
    global_lock.acquire()
    locked_keys.remove(s3key)
    global_lock.notifyAll()
    global_lock.release()
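
A worker thread would then bracket each network operation with that
pair; something like this (do_s3_operation is a hypothetical
placeholder, not part of the original code):

def process_request(s3key):
    lock_s3key(s3key)
    try:
        # The slow network call happens outside the global lock, so
        # threads working on different keys proceed concurrently
        do_s3_operation(s3key)
    finally:
        unlock_s3key(s3key)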



Carl Banks
 
Carl Banks

Nikolaus Rath said:
Here's the code I use now. I think it's also significantly easier to
understand (cv is a threading.Condition() object and cv.locked_keys a
set()).

    def lock_s3key(s3key):
        cv = self.s3_lock

        try:
            # Lock set of locked s3 keys (global lock)
            cv.acquire()

            # Wait for given s3 key becoming unused
            while s3key in cv.locked_keys:
                cv.wait()

            # Mark it as used (local lock)
            cv.locked_keys.add(s3key)
        finally:
            # Release global lock
            cv.release()

    def unlock_s3key(s3key):
        cv = self.s3_lock

        try:
            # Lock set of locked s3 keys (global lock)
            cv.acquire()

            # Mark key as free (release local lock)
            cv.locked_keys.remove(s3key)

            # Notify other threads
            cv.notify()

        finally:
            # Release global lock
            cv.release()

Freaky... I just posted nearly this exact solution.

I have a couple comments. First, the call to acquire should come
before the try block. If the acquire were to fail, you wouldn't want
to release the lock on cleanup.

Second, you need to change notify() to notifyAll(); notify alone won't
cut it. Consider what happens if you have two threads waiting for
keys A and B respectively. When the thread that has B is done, it
releases B and calls notify, but notify happens to wake up the thread
waiting on A. Thus the thread waiting on B is starved.
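
With both changes folded in, the pair would look something like this:

def lock_s3key(s3key):
    cv = self.s3_lock

    # Acquire *before* the try block: if acquire() itself failed,
    # the finally clause must not release a lock we never got
    cv.acquire()
    try:
        # Wait for the given s3 key to become unused
        while s3key in cv.locked_keys:
            cv.wait()

        # Mark it as used
        cv.locked_keys.add(s3key)
    finally:
        cv.release()

def unlock_s3key(s3key):
    cv = self.s3_lock

    cv.acquire()
    try:
        # Mark key as free
        cv.locked_keys.remove(s3key)

        # Wake *all* waiters; a bare notify() might wake a thread
        # that is waiting on a different key and starve the rest
        cv.notifyAll()
    finally:
        cv.release()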


Carl Banks
 
Nikolaus Rath

Carl Banks said:
Freaky... I just posted nearly this exact solution.

I have a couple comments. First, the call to acquire should come
before the try block. If the acquire were to fail, you wouldn't want
to release the lock on cleanup.

Second, you need to change notify() to notifyAll(); notify alone won't
cut it. Consider what happens if you have two threads waiting for
keys A and B respectively. When the thread that has B is done, it
releases B and calls notify, but notify happens to wake up the thread
waiting on A. Thus the thread waiting on B is starved.

You're right. Thanks for pointing it out.

Best,

-Nikolaus

 
MRAB

Nikolaus Rath said:
You're right. Thanks for pointing it out.

There's also less chance of deadlock if the files are always locked in
the same order, i.e. sort the files by, say, name, and never lock a
file when you already hold one that comes later in the sorted order.
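
If a request ever did need several keys at once, that rule could be as
simple as this (a sketch reusing lock_s3key/unlock_s3key from earlier
in the thread):

def lock_s3keys(s3keys):
    # Acquiring in sorted order guarantees that two threads can
    # never each hold a key the other one is waiting for
    for key in sorted(s3keys):
        lock_s3key(key)

def unlock_s3keys(s3keys):
    for key in sorted(s3keys):
        unlock_s3key(key)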
 
Nikolaus Rath

Tobiah said:
Hello,

I need to synchronize the access to a couple of hundred-thousand
files[1]. It seems to me that creating one lock object for each of the
files is a waste of resources, but I cannot use a global lock for all
of them either (since the locked operations go over the network, this
would make the whole application essentially single-threaded even
though most operations act on different files).

Do you think you could use an SQL database on the network to
handle the locking?

Yeah, I could. It wouldn't even have to be over the network (I'm
synchronizing access from within the same program). But I think that
is even more resource-wasteful than my original idea.
Tobiah also said:
Hey, maybe the files themselves should go into blobs.

Nope, not possible. They're on Amazon S3.

Best,

-Nikolaus

 
linuxnow

Nikolaus Rath said:
Yeah, I could. It wouldn't even have to be over the network (I'm
synchronizing access from within the same program). But I think that
is even more resource-wasteful than my original idea.

You could use Amazon SQS to queue the requests. Since you are talking
about files kept in S3 anyway, that way it would be fully distributed.
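
A sketch of that idea with the modern boto3 client (the names here are
placeholders, and note that queueing alone does not by itself provide
per-key mutual exclusion; SQS delivery is at-least-once):

import boto3

sqs = boto3.client('sqs')

def enqueue_request(queue_url, s3key):
    # Each client request names the S3 key it needs
    sqs.send_message(QueueUrl=queue_url, MessageBody=s3key)

def process_next_request(queue_url):
    resp = sqs.receive_message(QueueUrl=queue_url,
                               MaxNumberOfMessages=1)
    for msg in resp.get('Messages', []):
        do_s3_operation(msg['Body'])  # hypothetical handler
        # Deleting the message marks the request as done
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg['ReceiptHandle'])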
 
