Threads modify "global" variable -- asking for trouble?

J

J Rice

I have been experimenting with some thread programming, but as I'm
doing this on my own I am worried I might be making a major mistake.

Here's a brief rundown of what I am working on. Multiple threads, via
Queue, are used to perform RBL checks on an IP. The threads are passed
a defined class (ConnectionScore) in their init. This is used to
collect the results, including a method to add the result. If I run
the program, everything goes fine and once all the threads complete the
main program has all the data from their RBL checks in the
ConnectionScore class.

It seems to work, even with an obscene (and silly) number of WORKERS
and RBLs to check. But I was reading in the Python Cookbook this
evening and it raised the issue in my mind of what happens if multiple
threads try to add data to the class at the same time.

I've attached the code below. The ConnectionClass.addRBL method just
takes the result and adds it to a dictionary object attached to the
class.


#!/usr/bin/python
from UCE_weight import ConnectionScore
from rbl import *
import Queue
import threading
import time, random

starttime = time.time()

WORKERS=5

class Worker(threading.Thread):

def __init__(self,queue,score):
self.__queue = queue
threading.Thread.__init__(self)
self.score = score

def run(self):
while 1:
RBL = self.__queue.get()
# Are we at the end of the queue?
if RBL is None: break
self.result = checkRBL("127.0.0.2",RBL,True,5)
score.addRBL(RBL,self.result[0],self.result[1])

queue = Queue.Queue(0)
score = ConnectionScore()

f = open("rbl.list",'r')
MEGARBL = []
while 1:
line = f.readline().strip()
if not line: break
MEGARBL.append(line)
f.close()


for i in range(WORKERS):
Worker(queue,score).start() # start a worker

for i in MEGARBL:
queue.put(i)

for i in range(WORKERS):
queue.put(None) # add end of queue marker

while 1:
# wait for the queue???
if queue.empty():
break
else:
time.sleep(0.1)


elapsed = time.time() - starttime
rate = len(MEGARBL)/elapsed
rate = str(round(rate,2))
elapsed = str(round(elapsed,2))

score.showRBL()
print "Checked",str(len(MEGARBL)),"zones in",elapsed,"seconds.
(",rate.strip(),"per second )"
 
J

J Rice

My apologizes, I missed the newish FAQ entry on this. The addrbl()
method looks like this:

def addRBL(self, testname, result, info=""):
self.testresultsRBL[testname] = result, info

So according to the FAQ, D[x] = y, where D is a dictionary, is atomic
and therefore thread-safe. Right?
 
A

Alex Martelli

J Rice said:
My apologizes, I missed the newish FAQ entry on this. The addrbl()
method looks like this:

def addRBL(self, testname, result, info=""):
self.testresultsRBL[testname] = result, info

So according to the FAQ, D[x] = y, where D is a dictionary, is atomic
and therefore thread-safe. Right?

In the current implementation specifically, for CPython exclusively, if
the hashing of testname is itself somehow 'atomic', and so is the access
to attribute testresultsRBL of object self, I do believe you might luck
out and end up "atomic" (by accident of implementation, only) if
everything's just right. Of course, any tiny change (any different
implementation of Python, any type that's not a pure primitive, any
future version of Python, etc, etc) could break this extremely fragile
set of circumstances -- Python as a language makes no guarantees of
atomicity for anything except the synchronization primitives (of which
Queue is the most powerful). There was a rather heated exchange on the
subject quite recently on this group.

Just put a lock acquire/release around this assignment (and any _use_ of
the same dictionary) and you should be vastly safer, IMNSHO.


Alex
 
T

Terry Reedy

J Rice said:
My apologizes, I missed the newish FAQ entry on this. The addrbl()
method looks like this:

def addRBL(self, testname, result, info=""):
self.testresultsRBL[testname] = result, info

So according to the FAQ, D[x] = y, where D is a dictionary, is atomic
and therefore thread-safe. Right?

I believe an alternative, mentioned elsewhere by Tim Peters, is for workers
to send results (here (testname,result,info) ) back in a results queue read
by the main thread.

tjr



tjr
 
D

Dennis Lee Bieber

My apologizes, I missed the newish FAQ entry on this. The addrbl()
method looks like this:

def addRBL(self, testname, result, info=""):
self.testresultsRBL[testname] = result, info

So according to the FAQ, D[x] = y, where D is a dictionary, is atomic
and therefore thread-safe. Right?

It should be, but...

1) You should have included that class in the listing shown

2) This is an ideal place for a collection THREAD, with a receiving
Queue... The workers would be passed a reference to the queue, and as
they come up with results, put the results onto the queue. The
collection thread would then take the items from the queue and do
whatever with them.

(note: the collection "thread" could be the main program, in a loop
after starting the workers)
--
 
J

J Rice

Thank you. Implementing a results queue was much simpler than I
expected, and I think as I add this into the rest of the program it
will avoid a lot of potential problems later too.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,575
Members
45,053
Latest member
billing-software

Latest Threads

Top