Ningyu Shi
I'm trying to write a multi-threaded downloader that fetches files
from a website. One thread analyzes the webpage, extracts the
addresses of the files to be downloaded, and puts them in a Queue. The
main thread then starts threads that each take an address from the
queue and download it. To cap the number of files downloaded
concurrently, I use a semaphore, allowing at most 5 downloads at the
same time.
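The setup described above can be sketched roughly as follows. This is only an illustration written for Python 3 (where the module is named queue rather than Queue), not the actual program: analyze_page, run, the DONE sentinel and the fetch callback are hypothetical names, and fetch stands in for whatever performs the real download.

```python
import queue
import threading

MAX_CONCURRENT = 5  # at most 5 downloads at the same time

url_queue = queue.Queue()
semaphore = threading.Semaphore(MAX_CONCURRENT)
print_lock = threading.Lock()
DONE = None  # hypothetical sentinel marking the end of the address stream


def analyze_page(urls):
    """Stand-in for the page-analysis thread: queue every address found."""
    for url in urls:
        url_queue.put(url)
    url_queue.put(DONE)


def download(url, fetch):
    """Download thread: fetch one address, report it, then free a slot."""
    fetch(url)
    with print_lock:
        print(url, 'finished.')
    semaphore.release()


def run(urls, fetch):
    """Main thread: start one download thread per queued address."""
    parser = threading.Thread(target=analyze_page, args=(urls,))
    parser.start()
    workers = []
    while True:
        url = url_queue.get()
        if url is DONE:
            break
        semaphore.acquire()  # blocks while 5 downloads are in flight
        worker = threading.Thread(target=download, args=(url, fetch))
        worker.start()
        workers.append(worker)
    parser.join()
    for worker in workers:
        worker.join()
    return len(workers)
```

Calling run(list_of_addresses, some_fetch_function) returns the number of download threads that were started. Note that in this layout the main thread acquires the semaphore and each download thread releases it, so a download thread that never returns permanently uses up one of the 5 slots.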
I tried using urllib.urlretrieve in the download() thread, but from
time to time a single download thread seems to freeze the whole
program. So I gave up on that and used subprocess to call wget instead.
My download thread looks like this:
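import subprocess

# print_lock and semaphore are created elsewhere in the program
def download(url):
    # blocks until the wget process exits
    subprocess.call(["wget", "-q", url])
    with print_lock:
        print url, 'finished.'
    semaphore.release()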
But I later found that after a particular wget job finishes
downloading, the corresponding download() thread never reaches the
print statement. So the files do get downloaded, but the download()
thread never ends and never releases the semaphore, which then blocks
the whole program. My guess is that when the wget process ends, that
particular download thread is not active and misses the return of the
call.
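One way to keep a stalled or failed download from holding its slot forever is sketched below. It does not explain why the thread stalls; it only makes sure the semaphore is released on every exit path and gives the wget call an upper time limit. It assumes Python 3.3 or later, where subprocess.call accepts a timeout argument; the extra parameters (semaphore, print_lock, cmd, timeout) and the 300-second default are hypothetical choices for illustration.

```python
import subprocess
import threading


def download(url, semaphore, print_lock, cmd=("wget", "-q"), timeout=300):
    """Run the downloader for one URL and always give the slot back."""
    status = None
    try:
        # call() returns the exit status. With timeout set, a child that
        # runs too long is killed and subprocess.TimeoutExpired is raised.
        status = subprocess.call(list(cmd) + [url], timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        with print_lock:
            print(url, 'failed:', exc)
    else:
        with print_lock:
            print(url, 'finished with exit status', status)
    finally:
        # runs on success, failure, or timeout
        semaphore.release()
    return status
```

If the program still blocks with this version, the timeout message at least shows which address was being fetched when the call failed to return.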
Any comments or suggestions about this problem? Thanks.