Threads and temporary files


aiwarrior

Hi
I've recently been meddling with threads and wanted to make a threaded
class that, instead of processing anything, just retrieves data from a
file and returns that data to a main thread, which takes all the
gathered data and concatenates it sequentially.

An example: fetching various ranges of an HTTP resource in parallel.

import threading
import urllib2

class download(threading.Thread):
    def __init__(self, url, starts, ends):
        threading.Thread.__init__(self)
        self.url = url
        self.starts = starts
        self.ends = ends
        self.content = None

    def getBytesRange(self):
        request = urllib2.Request(self.url)  # New Request object
        if self.ends is not None:
            # The end of the desired range is specified
            request.add_header("Range", "bytes=%d-%d" % (self.starts, self.ends))
        else:
            # Everything from start up to the resource's length
            request.add_header("Range", "bytes=%d-" % self.starts)
        response = urllib2.urlopen(request)  # Make the request
        self.content = response.read()       # Store the data

    def run(self):
        self.getBytesRange()

Then, when we create the threads, we wait for them to complete and write
to a file. To reduce the memory footprint we can write the downloaded
data to a tempfile and then, when all the threads complete, concatenate the
temp files into a single file at a specified location. The problem here is
the same: how to get references to the temporary file objects created
in the threads, sequentially. I hope I have made myself clear. Thank
you in advance.
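One way to sketch this (not the original code, and using Python 3 naming: `queue`, `tempfile`): each thread writes its chunk to a `NamedTemporaryFile` and reports an `(index, path)` pair on a result queue; the main thread sorts by index before concatenating, so the pieces always land in order. The HTTP download is replaced by a dummy payload here so the example is self-contained.

```python
import os
import queue
import tempfile
import threading

def fetch_range(index, data, results):
    """Write one chunk to a named temp file and report (index, path)."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.write(data)
    tmp.close()
    results.put((index, tmp.name))

# Stand-ins for the byte ranges fetched from the server.
chunks = [b"part0-", b"part1-", b"part2"]
results = queue.Queue()
workers = [threading.Thread(target=fetch_range, args=(i, c, results))
           for i, c in enumerate(chunks)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Drain the result queue, then sort by index so order is restored.
pieces = sorted(results.get() for _ in chunks)
with open("combined.bin", "wb") as out:
    for _, path in pieces:
        with open(path, "rb") as part:
            out.write(part.read())
        os.remove(path)
```

Because every result carries its own index, it does not matter in which order the threads finish; the sort puts everything back in sequence.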

Gabriel Genellina

I've recently been meddling with threads and wanted to make a threaded
class that, instead of processing anything, just retrieves data from a
file and returns that data to a main thread, which takes all the
gathered data and concatenates it sequentially.
An example: fetching various ranges of an HTTP resource in parallel.

The usual way to communicate between threads is using a Queue object.
Instead of (create a thread, do some work, exit/destroy thread) you could
create the threads in advance (a "thread pool" of "worker threads") and
make them wait for some work to do from a queue (in a quasi-infinite
loop). When work is done, they put results in another queue. The main
thread just places work units on the first queue; another thread
reassembles the pieces from the result queue. For an I/O bound application
like yours, this should work smoothly.
You should be able to find examples on the web - try the Python Cookbook.
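A minimal version of that pattern (hypothetical names, Python 3's `queue` module): the workers loop quasi-infinitely on an input queue, a `None` sentinel shuts them down, and every result carries its task index so the reassembling thread can restore order.

```python
import queue
import threading

def worker(tasks, results):
    # Quasi-infinite loop: pull work until the sentinel arrives.
    while True:
        item = tasks.get()
        if item is None:          # shutdown sentinel
            break
        index, payload = item
        results.put((index, payload.upper()))  # stand-in for real I/O work

tasks, results = queue.Queue(), queue.Queue()
pool = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(3)]
for t in pool:
    t.start()

for i, word in enumerate(["alpha", "beta", "gamma", "delta"]):
    tasks.put((i, word))
for _ in pool:                    # one sentinel per worker
    tasks.put(None)
for t in pool:
    t.join()

ordered = [text for _, text in sorted(results.get() for _ in range(4))]
print(ordered)
```

The pool is created once and reused for every work unit, which is the point of the pattern: thread startup cost is paid up front, not per task.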

Lawrence D'Oliveiro

I've recently been meddling with threads and wanted to make a threaded
class that, instead of processing anything, just retrieves data from a
file and returns that data to a main thread, which takes all the
gathered data and concatenates it sequentially.

Any reason you're using threads instead of processes?

aiwarrior

The usual way to communicate between threads is using a Queue object.
Instead of (create a thread, do some work, exit/destroy thread) you could  
create the threads in advance (a "thread pool" of "worker threads") and  
make them wait for some work to do from a queue (in a quasi-infinite  
loop). When work is done, they put results in another queue. The main  
thread just places work units on the first queue; another thread  
reassembles the pieces from the result queue. For an I/O bound application  
like yours, this should work smoothly.
You should be able to find examples on the web - try the Python Cookbook.

I already tried a double-queue implementation, as you suggest, with one
queue for the threads to get work from and another for the threads to
put results in. My test implementation used a file containing some
lines of random data.
Here it is:

import threading
import Queue

class DownloadUrl(threading.Thread):
    def __init__(self, queue_in, queue_out):
        threading.Thread.__init__(self)
        #self.url = url
        #self.starts = starts
        #self.ends = ends
        self.queue_in = queue_in
        self.queue_out = queue_out

    def run(self):
        (fp, i) = self.queue_in.get()
        self.queue_in.task_done()
        #print var
        #self.queue_out.put(i, False)

worknr = 5
queue_in = Queue.Queue(worknr)
queue_out = Queue.Queue(worknr)
threads = []
fp = open("./xi", "r")
#print fp.readlines()

for i in xrange(10):
    queue_in.put((fp, i))
    DownloadUrl(queue_in, queue_out).start()

queue_in.join()
while queue_out.qsize():
    print queue_out.get()
    queue_out.task_done()
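The missing piece in the `run()` above is that nothing ever goes back on `queue_out`, and results carry no sequence number, so the main thread has no way to reassemble them in order. A self-contained variant (a sketch using Python 3 names and dummy lines in place of the file) might look like:

```python
import queue
import threading

class DownloadUrl(threading.Thread):
    def __init__(self, queue_in, queue_out):
        threading.Thread.__init__(self)
        self.queue_in = queue_in
        self.queue_out = queue_out

    def run(self):
        i, line = self.queue_in.get()
        # Tag the result with its index so order can be restored later.
        self.queue_out.put((i, line.strip()))
        self.queue_in.task_done()

lines = ["first\n", "second\n", "third\n"]   # stand-in for fp.readlines()
queue_in = queue.Queue()
queue_out = queue.Queue()
for i, line in enumerate(lines):
    queue_in.put((i, line))
    DownloadUrl(queue_in, queue_out).start()

queue_in.join()
results = [text for _, text in sorted(queue_out.get() for _ in lines)]
print(results)
```

Note that the work units are `(index, line)` pairs rather than a shared file object: handing the same file handle to several threads is a race, whereas distributing already-read lines keeps each thread's input independent.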
Any reason you're using threads instead of processes?
Perhaps because threads offer a more flexible way to share data than
processes do.
 
