atomic operations in presence of multithreading

Glenn Kasten

I am wondering which operations in Python
are guaranteed to be atomic in the presence
of multi-threading. In particular, are assignment
and reading of a dictionary entry atomic?
For example, initially:
dictionary = {}
dictionary[key] = old_value
Then thread 1 does:
v = dictionary[key]
And thread 2 does:
dictionary[key] = new_value2
And thread 3 does:
dictionary[key] = new_value3
What I want to make sure of is that
thread 1 will read either the old_value
for key, or new_value2, or new_value3,
but not an intermediate value.
Which value in particular does not matter to me, as that
obviously depends on timing. I just want to make sure
that the dictionary internal data structures can't
become corrupted by multiple writers / readers.
Note: I realize that in this particular case
I can use locks to guarantee safety, but I am
wondering whether the basic language operations
such as dictionary reads and updates are guaranteed
to be atomic.
A related, more general question, is which
Python operations are atomic, and which are not.
I was unable to find this in the language reference,
but please direct me to the correct location
if I missed it.
Thanks,
Glenn
 
Dave Brueck

Glenn said:
I am wondering which operations in Python
are guaranteed to be atomic in the presence
of multi-threading. In particular, are assignment
and reading of a dictionary entry atomic? [snip]
I just want to make sure
that the dictionary internal data structures can't
become corrupted by multiple writers / readers. [snip]
Python operations are atomic, and which are not.
I was unable to find this in the language reference,
but please direct me to the correct location
if I missed it.

It's probably not spelled out anywhere, but the "global interpreter
lock" entry in the Glossary and the first few paragraphs of section 8.1
of the documentation ("Thread State and the Global Interpreter Lock")
give you the gist of it.

Basically: multiple threads can't corrupt the interpreter's internals
(but a buggy C extension could).
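
To make Glenn's scenario concrete, here is a minimal sketch with one reader and two writers hitting the same key without any locks. The function names, the `seen` set and the iteration counts are my own choices, not anything from the original question. A clean run only illustrates the behaviour on the interpreter you run it on; it is not proof of a language-level guarantee.

```python
import threading

dictionary = {"key": "old_value"}
ALLOWED = {"old_value", "new_value2", "new_value3"}
seen = set()  # only the reader thread adds to this

def reader(n):
    # Thread 1: v = dictionary[key]
    for _ in range(n):
        seen.add(dictionary["key"])

def writer(value, n):
    # Threads 2 and 3: dictionary[key] = new_value
    for _ in range(n):
        dictionary["key"] = value

threads = [
    threading.Thread(target=reader, args=(10000,)),
    threading.Thread(target=writer, args=("new_value2", 10000)),
    threading.Thread(target=writer, args=("new_value3", 10000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every value the reader saw should be one of the three complete values.
print(seen <= ALLOWED)
```

Which of the three values the reader sees on any given pass depends on timing, as Glenn already noted.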

-Dave
 
Peter Hansen

Dave said:
Glenn said:
I am wondering which operations in Python
are guaranteed to be atomic in the presence
of multi-threading. In particular, are assignment
and reading of a dictionary entry atomic?

[...]
Basically: multiple threads can't corrupt the interpreter's internals
(but a buggy C extension could).

The "dis" module can be helpful in analyzing such things, sometimes.
Since the GIL ensures that individual bytecodes are
executed atomically ("boom!"), anything that shows up as a
single bytecode instruction is basically thread-safe:

    >>> import dis
    >>> def func():
    ...     v = dictionary[key]
    ...
    >>> def func2():
    ...     dictionary[key] = new_value2
    ...
    >>> dis.dis(func)
      2           0 LOAD_GLOBAL              0 (dictionary)
                  3 LOAD_GLOBAL              1 (key)
                  6 BINARY_SUBSCR
                  7 STORE_FAST               0 (v)
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE
    >>> dis.dis(func2)
      2           0 LOAD_GLOBAL              0 (new_value2)
                  3 LOAD_GLOBAL              1 (dictionary)
                  6 LOAD_GLOBAL              2 (key)
                  9 STORE_SUBSCR
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE

The key instructions in the above are the dictionary lookup,
which is merely "BINARY_SUBSCR" in func(), and the dictionary
assignment, which is "STORE_SUBSCR" in func2(). If func
and func2 were in separate threads, either the lookup or
the store executes first, then the other, but they cannot
both be executing at the same time.
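
The same kind of inspection shows where the single-instruction argument stops applying. The sketch below is my own illustration (the `counts` dictionary and the helper `opnames` are hypothetical names): a plain store is one STORE_SUBSCR, but an in-place increment has a subscript load and a separate store, so another thread can run between the two. Opcode names differ between Python versions, so the check accepts either name for the load.

```python
import dis

counts = {"hits": 0}

def store():
    counts["hits"] = 1      # the store itself is one STORE_SUBSCR

def increment():
    counts["hits"] += 1     # load, add, store: several bytecodes

def opnames(func):
    # Names of the bytecode instructions compiled for func.
    return [ins.opname for ins in dis.get_instructions(func)]

store_ops = opnames(store)
increment_ops = opnames(increment)

# The subscript load is BINARY_SUBSCR on many releases and a BINARY_OP
# on newer ones, so look for whichever appears first.
load_index = min(i for i, name in enumerate(increment_ops)
                 if name in ("BINARY_SUBSCR", "BINARY_OP"))
store_index = increment_ops.index("STORE_SUBSCR")

print(store_ops)
print(increment_ops)
print(load_index < store_index)
```

If several threads need to run something like increment(), wrapping it in a threading.Lock is the usual way to make the whole read-modify-write step behave as one unit.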

-Peter
 
Paul Moore

Peter Hansen said:
Dave said:
Glenn said:
I am wondering which operations in Python
are guaranteed to be atomic in the presence
of multi-threading. In particular, are assignment
and reading of a dictionary entry atomic?

[...]
Basically: multiple threads can't corrupt the interpreter's
internals (but a buggy C extension could).

[...]

The key instructions in the above are the dictionary lookup,
which is merely "BINARY_SUBSCR" in func(), and the dictionary
assignment, which is "STORE_SUBSCR" in func2(). If func
and func2 were in separate threads, either the lookup or
the store executes first, then the other, but they cannot
both be executing at the same time.

As was pointed out to me when this came up recently, it is possible
for callbacks into Python to screw this simple picture up a little.

The obvious case is a user-defined class with a __setitem__ method. In
this case, STORE_SUBSCR calls arbitrary Python code, and so can be
interrupted by a thread switch.
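
A small sketch of that case, assuming a hypothetical dict subclass called `AuditedDict` (not something from this thread): the calling code compiles to the same single STORE_SUBSCR either way, but for the subclass that one instruction runs a Python-level method made up of many bytecodes of its own.

```python
import dis

class AuditedDict(dict):
    # Hypothetical subclass whose __setitem__ is written in Python.
    def __init__(self):
        super().__init__()
        self.log = []

    def __setitem__(self, key, value):
        # Several bytecodes run here; a thread switch can occur between them.
        self.log.append(key)
        super().__setitem__(key, value)

def assign(mapping):
    mapping["key"] = "new_value"

plain, audited = {}, AuditedDict()
assign(plain)
assign(audited)

# The caller's bytecode is the same for both kinds of mapping.
caller_ops = [ins.opname for ins in dis.get_instructions(assign)]
# The Python-level __setitem__ compiles to many instructions of its own.
setitem_ops = [ins.opname
               for ins in dis.get_instructions(AuditedDict.__setitem__)]

print(caller_ops.count("STORE_SUBSCR"), len(setitem_ops), audited.log)
```

So looking at the caller's disassembly alone does not tell you whether the store is atomic; you also need to know what type is on the receiving end.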

Given that the original question was about dictionaries, which are
coded in C (and so not subject to this issue) there is still the
following case: when the old value stored in the dictionary is
replaced, that could be the last reference to it. When the old value
is freed, its __del__ method gets called - arbitrary Python code
again.

But the basic idea is sound. The interpreter releases the GIL *only*
between Python bytecodes. As long as you cater for recursive cases
like the above (and obvious ones like CALL_METHOD), you're OK.

Finally, C code has the option of explicitly releasing the GIL -
"long-running" operations like file reads do this, but basic ops
don't.
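
A rough way to see a C-level GIL release from Python, as a sketch only: here time.sleep stands in for a blocking read (my substitution, chosen because it needs no files or sockets), and `blocking_call` is a hypothetical name. If the GIL were held during the call, four threads would need about four times as long as one.

```python
import threading
import time

def blocking_call():
    # time.sleep stands in for a blocking read; it releases the GIL
    # while waiting, so the other threads can run.
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=blocking_call) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Run back to back, the four calls would take at least 0.8 seconds.
print(f"{elapsed:.2f}s")
```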

Paul.
 
Dave Brueck

Paul said:
Dave Brueck wrote:

Glenn Kasten wrote:


I am wondering which operations in Python
are guaranteed to be atomic in the presence
of multi-threading. In particular, are assignment
and reading of a dictionary entry atomic?

[...]
Basically: multiple threads can't corrupt the interpreter's
internals (but a buggy C extension could).

[...]


The key instructions in the above are the dictionary lookup,
which is merely "BINARY_SUBSCR" in func(), and the dictionary
assignment, which is "STORE_SUBSCR" in func2(). If func
and func2 were in separate threads, either the lookup or
the store executes first, then the other, but they cannot
both be executing at the same time.


As was pointed out to me when this came up recently, it is possible
for callbacks into Python to screw this simple picture up a little.

The obvious case is a user-defined class with a __setitem__ method. In
this case, STORE_SUBSCR calls arbitrary Python code, and so can be
interrupted by a thread switch.

Given that the original question was about dictionaries, which are
coded in C (and so not subject to this issue) there is still the
following case: when the old value stored in the dictionary is
replaced, that could be the last reference to it. When the old value
is freed, its __del__ method gets called - arbitrary Python code
again.

But the basic idea is sound. The interpreter releases the GIL *only*
between Python bytecodes. As long as you cater for recursive cases
like the above (and obvious ones like CALL_METHOD), you're OK.

Yep, though I think the OP was mostly concerned with Python's internals
getting messed up by unlocked multithreaded access, so he doesn't have to
be concerned about the above.

-Dave
 
Duncan Booth

Paul Moore said:
Given that the original question was about dictionaries, which are
coded in C (and so not subject to this issue) there is still the
following case: when the old value stored in the dictionary is
replaced, that could be the last reference to it. When the old value
is freed, its __del__ method gets called - arbitrary Python code
again.

Right, but the new value has already been stored in the dictionary at the
point where the __del__ method is called, so as far as the OP is concerned
this is still safe.
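
This can be checked with a small sketch. It assumes CPython's reference counting, where the old value is finalized as soon as its last reference goes away; other implementations may run __del__ later. The names `Resource`, `shared` and `seen_in_del` are made up for the illustration.

```python
shared = {}
seen_in_del = []

class Resource:
    def __del__(self):
        # Arbitrary Python code, triggered from inside the dictionary
        # store. Record what the dictionary holds at this moment.
        seen_in_del.append(shared.get("key"))

shared["key"] = Resource()
# The dictionary holds the only reference, so replacing the entry
# drops the last reference to the Resource.
shared["key"] = "new_value"

print(seen_in_del)
```

On CPython I would expect the list to show the new value, which means the finalizer already sees the updated dictionary.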
 
Jeff Shannon

Duncan said:
Right, but the new value has already been stored in the dictionary at the
point where the __del__ method is called, so as far as the OP is concerned
this is still safe.

For some definition of "safe", perhaps. It won't corrupt any Python
internals, but if the __del__() method has side effects that the
threaded code is relying on, it may give unexpected results. (One might
want to rely on the fact that, once an object holding a socket is
removed from a dictionary, that socket will be closed; if that close
happens in the object's __del__(), then there's no guarantee that it
*will* actually be closed before a thread switch occurs. (Relying on
__del__() is hazardous in any case, actually, thanks to differing GC
between different implementations of Python, but it's even worse in a
threaded environment.))
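
One way to avoid depending on __del__ here, sketched under my own assumptions: `Connection` is a hypothetical stand-in for an object that owns a socket, and `drop` is a made-up helper. The entry is removed while holding a lock and then closed explicitly, so the close no longer depends on when the object happens to be finalized.

```python
import threading

class Connection:
    # Hypothetical stand-in for an object that owns a socket.
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

connections = {"peer": Connection()}
connections_lock = threading.Lock()

def drop(key):
    # Remove the entry under the lock, then close it explicitly
    # instead of waiting for __del__ to run.
    with connections_lock:
        conn = connections.pop(key, None)
    if conn is not None:
        conn.close()
    return conn

dropped = drop("peer")
print(dropped.closed, "peer" in connections)
```

Only the thread that actually popped the entry gets a non-None result, so the close happens exactly once even if two threads call drop() for the same key.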

Jeff Shannon
Technician/Programmer
Credit International
 
Paul Moore

Dave Brueck said:
Yep, though I think the OP was mostly concerned with Python's
internals getting messed up by unlocked multithreaded access, so he
doesn't have to be concerned about the above.

I don't think that Python's internals can *ever* be messed up by
unlocked multithreaded access - that's what the GIL is for.

Paul.
 
Dave Brueck

Paul said:
I don't think that Python's internals can *ever* be messed up by
unlocked multithreaded access - that's what the GIL is for.

Yeah, that was mentioned earlier:

"multiple threads can't corrupt the interpreter's internals (but a buggy
C extension could)."

-Dave
 
