What's an "atomic" operation (in a threaded context)?

Paul Moore

I can't find anything which spells this out in the manuals. I guess
that, at some level, the answer is "a single bytecode operation", but
I'm not sure that explains it for me.

This thought was triggered by a comment on the Python Cookbook site,
which basically said that it was OK to do
tss = {}
...
id = thread.get_ident()
tss[id] = {}

(where tss is a global) without a lock, because id is unique to the
thread.

But couldn't this result in 2 threads allocating a new entry in tss at
the same time, and so get tss in an inconsistent state?

I tried to understand this with the dis module:
...     d[n] = {}

  2           0 BUILD_MAP                0
              3 LOAD_GLOBAL              0 (d)
              6 LOAD_FAST                0 (n)
              9 STORE_SUBSCR
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

But I'm not sure what to make of that.
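For reference, output like the above can be reproduced with the dis module; the function and variable names here are just placeholders, and the exact opcodes and offsets vary between Python versions:

```python
import dis

d = {}

def store(n):
    d[n] = {}  # the entire assignment compiles to one STORE_SUBSCR

# Opcode numbering differs across Python versions, but STORE_SUBSCR
# is always the single instruction that performs the store.
dis.dis(store)
```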

Can anyone clarify this for me? I'd like to avoid littering my
threaded code with huge numbers of unnecessary locks...

Thanks,
Paul.
 
Peter Hansen

Paul said:
I can't find anything which spells this out in the manuals. I guess
that, at some level, the answer is "a single bytecode operation", but
I'm not sure that explains it for me.

This thought was triggered by a comment on the Python Cookbook site,
which basically said that it was OK to do
tss = {}
...
id = thread.get_ident()
tss[id] = {}

(where tss is a global) without a lock, because id is unique to the
thread.

But couldn't this result in 2 threads allocating a new entry in tss at
the same time, and so get tss in an inconsistent state?

I tried to understand this with the dis module:
...     d[n] = {}

  2           0 BUILD_MAP                0
              3 LOAD_GLOBAL              0 (d)
              6 LOAD_FAST                0 (n)
              9 STORE_SUBSCR
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

In a nutshell, your "single bytecode operation" theory is correct,
and each of the above lines represents a single bytecode. Therefore
the only line which matters is STORE_SUBSCR (the actual assignment)
and it is atomic, so you can have any number of threads running but
no matter how they try, they won't be able to interrupt each other's
STORE_SUBSCR operations and the dictionary will stay in a consistent
state.

Much of the time, basic operations such as this on the standard
mutable built-in types (e.g. lists and dictionaries) are atomic/safe for
use in the way you wish to use them, without needing locks.
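A minimal sketch of the pattern under discussion, using the modern threading module (thread.get_ident in the original post is the older low-level spelling). The Barrier keeps all workers alive at once so their idents are guaranteed unique; everything else is assumption-free:

```python
import threading

tss = {}                          # shared global, one slot per thread
barrier = threading.Barrier(10)   # keep all threads alive together so
                                  # thread idents cannot be recycled

def worker():
    barrier.wait()
    ident = threading.get_ident()
    # A single dict assignment under a key unique to this thread;
    # per the discussion, the STORE_SUBSCR itself needs no lock.
    tss[ident] = {"owner": ident}

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# tss now holds one entry per worker thread
```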

-Peter
 
Dave Brueck

Paul said:
I can't find anything which spells this out in the manuals. I guess
that, at some level, the answer is "a single bytecode operation", but

Bytecode instructions are atomic in that you won't switch to another Python
thread "mid-bytecode", but that's only sorta interesting.
;-)
This thought was triggered by a comment on the Python Cookbook site,
which basically said that it was OK to do
tss = {}
...
id = thread.get_ident()
tss[id] = {}

(where tss is a global) without a lock, because id is unique to the
thread.

But couldn't this result in 2 threads allocating a new entry in tss at
the same time, and so get tss in an inconsistent state?

This is what's more interesting: object access is threadsafe in the sense that
you can't get Python's internal data structures into an inconsistent state via
pure Python code (or, for that matter, well-behaved C extensions). IOW, the
above code won't ever crash the interpreter or screw up Python's internal
state. Even if two different threads did tss[5] = <somevalue> (both setting a
value using the same key) you wouldn't get an inconsistent internal state,
although you would obviously have no guarantees about the ordering of the
operations - you'd never know if thread A or thread B changed it last.
Can anyone clarify this for me? I'd like to avoid littering my
threaded code with huge numbers of unnecessary locks...

The above applies to CPython - I'm not too sure about Jython. If you're pretty
familiar with multithreading in other languages and view Python's built-in data
types as fundamental types like integers, then the rules for when to use
locking are pretty much the same as in other languages.
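To illustrate those "same as in other languages" rules with a sketch: a read-modify-write like counter += 1 spans several bytecodes, so it is not atomic and needs a lock, just as it would elsewhere:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # counter += 1 compiles to separate LOAD / ADD / STORE
        # bytecodes, so another thread can run in between them;
        # the lock restores atomicity, exactly as in C or Java.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter == 40000: every increment accounted for
```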

-Dave
 
Paul Moore

Peter Hansen said:
In a nutshell, your "single bytecode operation" theory is correct,
and each of the above lines represents a single bytecode. Therefore
the only line which matters is STORE_SUBSCR (the actual assignment)
and it is atomic, so you can have any number of threads running but
no matter how they try, they won't be able to interrupt each other's
STORE_SUBSCR operations and the dictionary will stay in a consistent
state.

Thanks. I think I'd got bogged down in C-level thinking ("STORE_SUBSCR
can reallocate the dictionary, so its address changes, so doesn't
that mean another thread might have loaded its address and then it
moves...")
Much of the time, basic operations such as this on the standard
mutable built-in types (e.g. lists and dictionaries) are atomic/safe for
use in the way you wish to use them, without needing locks.

That's good to know. And the trick of using dis to analyze it will
help as well.

Thanks for the explanation,
Paul
 
Dieter Maurer

Peter Hansen said:
...
...
In a nutshell, your "single bytecode operation" theory is correct,
and each of the above lines represents a single bytecode. Therefore
the only line which matters is STORE_SUBSCR (the actual assignment)
and it is atomic

Are you sure? "STORE_SUBSCR" may dereference an object causing
the object to be deleted. Its destructor may execute arbitrary
code.

Almost surely (I trust the Python developers), this happens
after the assignment has been made (but before "STORE_SUBSCR" has finished).
Maybe this is enough "atomicity" for you.
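Dieter's point can be demonstrated in CPython (this relies on CPython's immediate reference counting, so it is implementation behaviour, not a language guarantee): replacing a value whose old occupant defines __del__ runs that destructor inside the assignment statement, after the new value is already installed:

```python
events = []

class Noisy:
    def __del__(self):
        # Runs when the old value's last reference is dropped, which
        # in CPython happens inside the d[0] = "new" statement, after
        # the new value has already been stored in the dict.
        events.append(("del", d[0]))

d = {0: Noisy()}
events.append("before")
d[0] = "new"          # the old Noisy is decref'd to zero here
events.append("after")
# events records the destructor firing between "before" and "after",
# and shows the new value was already in place when it ran
```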


Dieter
 
Parzival

Dieter said:
Are you sure? "STORE_SUBSCR" may dereference an object causing
the object to be deleted. Its destructor may execute arbitrary code.
...
Almost surely (I trust the Python developers), this happens
after the assignment has been made (but before "STORE_SUBSCR" has finished).
Maybe this is enough "atomicity" for you.

Do you really think this is a sound approach? You are obviously concerned
about correct concurrency, yet you are prepared to disassemble Python byte
code and to speculate about the actions, and then you are going to risk the
correctness of your program on what you observe and deduce?

I'll wager that nobody who is somebody in the Python world is going to offer
*any* guarantees about what source code generates what bytecode. AFAIK,
the Python GIL guarantees that each individual bytecode operation is atomic,
but the _language_ makes absolutely no commitment as to what source code
construction is guaranteed to be (now and forever) atomic. If I am wrong
about this, I want to see it in writing in the official documentation.

Until then, for correct concurrent operation, you *must* lock around all
shared concurrent data accesses.

(-: Stop, smile, and enjoy your breathing :)
-- Parzival
-- Reply-to is confuggled: parzp (@) shaw (.) ca
 