Python multithreading problem

abhinav

//A CRAWLER IMPLEMENTATION
Please run this program from the shell, and then again under the control of
a debugger. When it is run normally, the program does not terminate: the
condition if c<5: never seems to stop it, so it keeps running indefinitely.
When it is run under the debugger, it terminates as soon as the condition
if c<5: becomes false.
I think this problem may be due to multithreading. Please help.


from sgmllib import SGMLParser
import threading
import re
import urllib
import pdb
import time

class urlist(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.list=[]

    def start_a(self,attr):
        href=[v for k,v in attr if k=="href"]
        if href:
            self.list.extend(href)

mid=2
c=0

class mythread(threading.Thread):
    stdmutex=threading.Lock()
    global threads
    threads=[]

    def __init__(self,u,myid):
        self.u=u
        self.myid=myid
        threading.Thread.__init__(self)

    def run(self):
        global c
        global mid
        if c<5:
            self.stdmutex.acquire()
            self.usock=urllib.urlopen(self.u)
            self.p=urlist()
            self.s=self.usock.read()
            self.p.feed(self.s)
            self.usock.close()
            self.p.close()
            c=c+1
            fname="/root/" + str(c) + ".txt"
            self.f=open(fname,"w")
            self.f.write(self.s)
            self.f.close()
            print c
            print self.p.list
            print self.u
            print self.myid
            for j in self.p.list:
                k=re.search("^https?:",j)
                if k:
                    i=mythread(j,mid)
                    i.start()
                    threads.append(i)
                    mid=mid+1
                    self.stdmutex.release()

if __name__=="__main__":
    thread=mythread("http://www.google.co.in/",1)
    thread.start()
    threads.append(thread)
    for thread in threads:
        thread.join()
    print "main thread exits"
 
Dennis Lee Bieber

            self.list.extend(href)
mid=2
c=0
class mythread(threading.Thread):
    stdmutex=threading.Lock()
    global threads
    threads=[]

Move that line out -- initialize all your globals (ugh) in the same
spot...

self.stdmutex.acquire()

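There is no per-instance self.stdmutex (using self. implies that a copy
exists for each instance); the lookup falls back to the class attribute,
so every instance ends up sharing the one lock. You should be using a
class-level reference (mythread.stdmutex), or __init__ should make
self.stdmutex a reference to the class-level lock...

It works through the class lookup, but it doesn't look safe to me: any
assignment to self.stdmutex would silently give that instance its own lock.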
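
To illustrate the point about class-level attributes, here is a minimal sketch. It is written for Python 3 (an assumption; the code in this thread is Python 2), and the names Worker and hits are made up for the example. It only shows that the lock reached through an instance is the single class-level object, and that referring to it through the class makes the sharing explicit.

```python
import threading

class Worker(threading.Thread):
    # One lock object, created once when the class body runs.
    lock = threading.Lock()
    hits = 0  # hypothetical shared counter, protected by Worker.lock

    def run(self):
        for _ in range(1000):
            # Explicit class-level reference: clearly the shared lock.
            with Worker.lock:
                Worker.hits += 1

workers = [Worker() for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Attribute lookup on each instance finds the same class-level lock.
same_lock = all(w.lock is Worker.lock for w in workers)
print(same_lock, Worker.hits)
```

If some code assigned w.lock = threading.Lock() on one instance, that instance would stop sharing the lock, which is the risk with going through self.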

            for j in self.p.list:
                k=re.search("^https?:",j)
                if k:
                    i=mythread(j,mid)
                    i.start()
                    threads.append(i)
                    mid=mid+1
                    self.stdmutex.release()

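That doesn't look good either -- you are releasing the lock from
inside the loop, so it gets released again for every link that matches
while you keep spawning new threads. Releasing a lock that is already
unlocked raises an error, and if another thread has acquired it in the
meantime you release that thread's lock instead...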

Maybe something like:

for j in ...:
    k = re...
    if k:
        # start() returns None, so keep the thread object itself
        t = mythread(j, mid)
        t.start()
        threads.append(t)
        mid = mid + 1
        break
self.stdmutex...
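
The same idea (acquire once, release once after the loop) could be sketched with a with block, which releases the lock exactly once even if the loop body raises. This is Python 3 and only an illustration: pick_links, seen and the example URLs are hypothetical, and no threads are started here.

```python
import re
import threading

lock = threading.Lock()
LINK_RE = re.compile(r"^https?:")

def pick_links(hrefs, seen):
    """Return the new absolute links, updating `seen` under the lock."""
    fresh = []
    # `with` acquires once and releases once, even if the body raises.
    with lock:
        for href in hrefs:
            if LINK_RE.search(href) and href not in seen:
                seen.add(href)
                fresh.append(href)
    # The lock is already released here; threads can be started safely.
    return fresh

seen = set()
links = pick_links(
    ["http://a.example/", "/relative", "https://b.example/", "http://a.example/"],
    seen,
)
print(links)
```

Starting the new threads only after the with block has exited keeps the time the lock is held short.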




Serge Orlov

abhinav said:
//A CRAWLER IMPLEMENTATION
Please run this program from the shell, and then again under the control of
a debugger. When it is run normally, the program does not terminate: the
condition if c<5: never seems to stop it, so it keeps running indefinitely.

How do you know? Have you waited *infinitely* ;)

        if c<5:
            self.stdmutex.acquire()

The problem is that you have a lot of threads that have already checked
the c < 5 condition but have not yet acquired the lock, so each of them
still goes on to fetch a page once it gets the lock. Besides, you have
another problem: if a thread raises an exception, you never release the
lock. Why don't you use the Queue module for sane thread management?
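
A rough sketch of what a queue-based version might look like, assuming Python 3 (where the module is named queue rather than Queue). To keep it runnable without network access it crawls a hypothetical in-memory link table, FAKE_WEB, instead of calling urllib; MAX_PAGES, NUM_WORKERS and the worker function are likewise invented for the example. The page limit is tested and updated in a single locked step, and the with and finally blocks release the lock and mark the task done even if something raises.

```python
import queue
import threading

# Hypothetical stand-in for the web: page URL -> links found on it.
FAKE_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/", "http://d.example/"],
    "http://c.example/": ["http://e.example/", "http://f.example/"],
    "http://d.example/": [],
    "http://e.example/": [],
    "http://f.example/": [],
}
MAX_PAGES = 5
NUM_WORKERS = 3

todo = queue.Queue()
lock = threading.Lock()
fetched = []   # URLs claimed so far, guarded by `lock`
seen = set()   # URLs already claimed, guarded by `lock`

def worker():
    while True:
        url = todo.get()
        try:
            if url is None:          # sentinel: time to stop
                return
            # Test and update the limit in one locked step, so no
            # thread acts on a stale "fewer than MAX_PAGES" check.
            with lock:
                if len(fetched) >= MAX_PAGES or url in seen:
                    continue
                seen.add(url)
                fetched.append(url)
            for link in FAKE_WEB.get(url, []):
                todo.put(link)
        finally:
            todo.task_done()         # runs on return, continue, or error

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in workers:
    t.start()
todo.put("http://a.example/")
todo.join()                          # wait until every queued URL is handled
for _ in workers:
    todo.put(None)                   # one sentinel per worker
for t in workers:
    t.join()
print(len(fetched))
```

A fixed pool of workers also avoids starting one thread per link, so the number of threads stays bounded no matter how many links a page contains.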

Serge.
 
