Are Python objects thread-safe?

RajNewbie

Say I have two threads updating the same dictionary object, but under
different keys. For example:

    a = {'file1Data': '',
         'file2Data': ''}

I pass it to two different threads, both of which loop forever.
In thread 1:

    a['file1Data'] = open(filename1).read()

and in thread 2:

    a['file2Data'] = open(filename2).read()

My question is: is this object thread-safe, given that the two threads
work on different keys? Or do I have to lock the whole object?
 
James Mills

I believe (IIRC) all basic data types and objects are thread-safe. I could
be wrong though - I don't tend to use threads much myself :)

cheers
James
 
Aaron Brady

Threads take turns holding the Global Interpreter Lock, so a Python
thread is sure to hold the GIL before it calls a method on any
object. So yes, with one rare exception worth mentioning: if you
somehow have non-Python threads running in your process, they make no
guarantee of respecting the GIL.
 
Rhamphoryncus

In general, Python makes few promises. It has a *strong* preference
for failing gracefully (i.e. an exception rather than a segfault),
which implies atomic operations underneath, but it makes no promise as to
the granularity of those atomic operations.

In practice though, it is safe to update two distinct keys in a dict.
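
A minimal sketch of the scenario from the question, for anyone who wants to try it. The key names come from the original post; the `writer` helper, the iteration count, and the integer payloads are assumptions standing in for the file reads. It shows CPython behaviour, not a guarantee made by the language:

```python
import threading

shared = {'file1Data': '', 'file2Data': ''}

def writer(key, count):
    # Each thread only ever assigns to its own, already existing key,
    # so the size of the dict never changes.
    for i in range(count):
        shared[key] = i

threads = [
    threading.Thread(target=writer, args=('file1Data', 10000)),
    threading.Thread(target=writer, args=('file2Data', 10000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each key holds the last value written by its own thread.
print(shared)
```

Neither thread disturbs the other's key, which matches the "in practice" statement above.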
 
Aaron Brady

It depends exactly what you mean by 'threadsafe'. The GIL will guarantee
that you can't screw up Python's internal data structures: so your
dictionary always remains a valid dictionary rather than a pile of bits.

However, when you dig a bit deeper, it makes very few guarantees at the
Python level. Individual bytecode instructions are not guaranteed
atomic: for example, any assignment (including setting a new value into
the dictionary) could overwrite an existing value and the value which is
overwritten may have a destructor written in Python. If that happens you
can get context switches within the assignment.
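
A small sketch of the point about destructors, assuming CPython's reference counting (the `Noisy` class and the `log` list are made up for illustration). Replacing a dictionary value can run arbitrary Python code before the assignment statement has finished:

```python
class Noisy:
    """Hypothetical value type whose destructor is written in Python."""

    def __init__(self, log):
        self.log = log

    def __del__(self):
        # Runs when the last reference goes away, i.e. while the
        # assignment below is still in progress.
        self.log.append('destructor ran')

log = []
d = {'k': Noisy(log)}

d['k'] = 'new'                      # overwrites the only reference to the Noisy object
log.append('assignment finished')

print(log)
```

Because `__del__` is ordinary Python code, the interpreter may switch threads while it runs, which is why the assignment as a whole cannot be called atomic.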

Th.1      Th.2
a=X
a=Y
          a=Z

You are saying that if 'a=Z' interrupts 'a=Y' at the wrong time, the
destructor for 'X' or 'Y' might not get called. Correct? In serial
flow, the destructor for X is called, then Y.

Other nasty things can happen if you use dictionaries from multiple
threads. You cannot add or remove a dictionary key while iterating over
a dictionary. This isn't normally a big issue, but as soon as you try to
share the dictionary between threads you'll have to be careful never to
iterate through it.
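
The restriction on iteration can be seen without any threads at all. The following is a sketch using the key names from the question plus some made-up extra keys. With threads, the same error can appear whenever another thread adds or removes a key while a loop is running:

```python
d = {'file1Data': '', 'file2Data': ''}

# Adding a key while iterating changes the dict's size, which the
# iterator notices on its next step.
error = None
try:
    for key in d:
        d['extra'] = 'x'
except RuntimeError as exc:
    error = str(exc)
    print(error)

# Iterating over a snapshot avoids the error: list() copies the keys
# before the loop body starts modifying the dict.
for key in list(d):
    d[key + '_copy'] = d[key]

print(sorted(d))
```

The snapshot only helps the loop itself. It says nothing about whether the values read inside the loop are still current.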

These aren't documented, IIRC. Did you just discover them by trial
and error?

You will probably find it less error prone in the long run if you get
your threads to write (key,value) tuples into a queue which the
consuming thread can read and use to update the dictionary.
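
A rough sketch of that queue suggestion, written for Python 3, where the module is called `queue` (it was `Queue` in Python 2). The `producer` and `consumer` functions, the `STOP` sentinel, and the sample values are all assumptions made for the example:

```python
import queue
import threading

updates = queue.Queue()
result = {}
STOP = object()  # sentinel telling the consumer to finish

def producer(key, values):
    # Producers never touch the dict; they only enqueue (key, value) tuples.
    for value in values:
        updates.put((key, value))

def consumer():
    # The consumer is the only thread that ever mutates the dict.
    while True:
        item = updates.get()
        if item is STOP:
            break
        key, value = item
        result[key] = value

producers = [
    threading.Thread(target=producer, args=('file1Data', ['a', 'b', 'c'])),
    threading.Thread(target=producer, args=('file2Data', ['x', 'y', 'z'])),
]
consumer_thread = threading.Thread(target=consumer)
consumer_thread.start()
for t in producers:
    t.start()
for t in producers:
    t.join()
updates.put(STOP)        # every update is already queued ahead of the sentinel
consumer_thread.join()

print(result)
```

Since only one thread writes to `result`, that thread can also iterate over it freely between updates.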

Perhaps there's a general data structure which can honor 'fire-and-
forget' method calls in serial.

a= async( {} )
a[0]= X
a[0]= Y

-->
obj_queue[a].put( a.__setitem__, 0, X )
obj_queue[a].put( a.__setitem__, 0, Y )

If you need the return value, you'll need to block.

print a[0]
-->
res= obj_queue[a].put( a.__getitem__, 0 )
res.wait()
return res
print res

Or you can use a Condition object. But you can also delegate the
print farther down the line of processing:

obj_queue[a].link( print ).link( a.__getitem__, 0 )

(As you can see, the author (I) finds it a more interesting problem to
get required information in the right places at the right times in
execution. The actual implementation is left to the reader; I'm
merely claiming that there exists a consistent one taking the above
instructions to be sufficient givens.)
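
One possible reading of the idea above, sketched with the standard library's `concurrent.futures`. This is not the implementation the post has in mind (it leaves that open). The `SerialProxy` name and its methods are hypothetical, and a single-worker executor stands in for `obj_queue`:

```python
from concurrent.futures import ThreadPoolExecutor

class SerialProxy:
    """Hypothetical wrapper that forwards item access to one worker thread."""

    def __init__(self, target):
        self._target = target
        # A single worker runs submitted calls one at a time, in the
        # order they were submitted.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def __setitem__(self, key, value):
        # Fire and forget: the caller does not wait for the assignment.
        self._executor.submit(self._target.__setitem__, key, value)

    def __getitem__(self, key):
        # A read needs the return value, so it blocks until the worker
        # has processed everything queued before it.
        return self._executor.submit(self._target.__getitem__, key).result()

    def close(self):
        self._executor.shutdown(wait=True)

a = SerialProxy({})
a[0] = 'X'
a[0] = 'Y'
value = a[0]     # blocks until both assignments have been applied
print(value)
a.close()
```

The blocking read plays the role of `res.wait()` in the pseudocode above.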
 
Gabriel Genellina

On Tue, 23 Dec 2008 11:30:25 -0200, Duncan Booth wrote:
No, the destructors will be called, but the destructors can do pretty
much anything they want so you can't say the assignment is atomic. This
isn't actually a threading issue: you don't need multiple threads to
experience weird issues here. If you do strange things in a destructor
then you can come up with confusing code even with a single thread.

A simple example showing what you said:

py> class A:
...     def __del__(self):
...         global a
...         a = None
...
py> a = A()
py> a = 3
py> print(a)
None
 
Aaron Brady

I see. What about

del a
a= Z

Then, can we say 'a=Z' is atomic? At least, it eliminates the
destructor issue you raise.

It is documented, but I can't remember where for Python 2.x. For Python 3,
PEP 3106 says: "As in Python 2.x, mutating a dict while iterating over it
using an iterator has an undefined effect and will in most cases raise a
RuntimeError exception. (This is similar to the guarantees made by the Java
Collections Framework.)"

I infer that d.items() holds the GIL during the entire operation, and
it's safe to use from a thread. It is merely using an iterator that is
unsafe. (In Python 3.0, d.items() no longer builds a list; it returns a
view, and d.iteritems() was removed, I understand.)

I'm looking at the code, and I don't see where the size is safely
checked. That is, can't I sneak in an add and a remove during
iteration, so long as it doesn't catch me?

I'm looking at 'dict_traverse':
    while (PyDict_Next(op, &i, &pk, &pv)) {
        Py_VISIT(pk);
        Py_VISIT(pv);
    }

No locks are acquired here, though I might have missed acquiring the
GIL somewhere else.

In the OP's example, he wasn't changing the size of the dict.
 
