Query on queue (thread-safe) multithreading


Jaiprakash Singh

Hey, I am working on scraping a site, so I am using multithreading.
I wrote code based on a (thread-safe) queue, but it still blocks after some time. I have searched a lot but have been unable to resolve it. Please help, I am stuck here.

My code is below.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

import subprocess
import multiprocessing
import logging
from scrapy import cmdline
import time

logging.basicConfig(level=logging.DEBUG,
                    format='[%(levelname)s] (%(threadName)-10s) %(message)s',)


num_fetch_threads = 150
enclosure_queue = multiprocessing.JoinableQueue()


def main3(i, q):
    for pth in iter(q.get, None):
        try:
            cmdline.execute(['scrapy', 'runspider', 'page3_second_scrapy_flipkart.py', '-a', 'pth=%s' %(pth)])
            print pth
        except:
            pass

        time.sleep(i + 2)
        q.task_done()

    q.task_done()


def main2(output):
    procs = []

    for i in range(num_fetch_threads):
        procs.append(multiprocessing.Process(target=main3, args=(i, enclosure_queue,)))
        #worker.setDaemon(True)
        procs[-1].start()

    for pth in output:
        enclosure_queue.put(pth)

    print '*** Main thread waiting'
    enclosure_queue.join()
    print '*** Done'

    for p in procs:
        enclosure_queue.put(None)

    enclosure_queue.join()

    for p in procs:
        p.join()
 

Jim Gibson

Jaiprakash Singh said:
Hey, I am working on scraping a site, so I am using multithreading.
I wrote code based on a (thread-safe) queue, but it still blocks after
some time. I have searched a lot but have been unable to resolve it.
Please help, I am stuck here.

Do you really want to subject the web server to 150 simultaneous
requests? Some would consider that a denial-of-service attack.

When I scrape a site, and I have been doing that occasionally of late,
I put a 10-second sleep after each HTTP request. That makes my program
more considerate of other people's resources and a better web citizen.
It is also much easier to program.
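
A minimal sketch of that sequential approach, in case it helps. The names here are my own assumptions, not anything from the original code: fetch_politely is a hypothetical helper, fetch is whatever callable you use to download one URL, and sleep is passed in only so the pause can be stubbed out when testing. It pauses between consecutive requests (10 seconds by default) and records failures instead of swallowing them with a bare except.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=10, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    `fetch` is a caller-supplied callable (url -> result); it is a
    placeholder for your own download function, not a library API.
    `sleep` is injectable so the pause can be replaced in tests.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        try:
            results.append((url, fetch(url)))
        except Exception as exc:
            # Keep the error next to the URL so failures stay visible.
            results.append((url, exc))
    return results
```

You would call it as fetch_politely(list_of_urls, my_download_function). With one request in flight at a time there is no queue or worker pool to deadlock, which is the "much easier to program" part.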
 
