Threads and temporary files


aiwarrior

Hi
I've recently been meddling with threads and wanted to make a threaded
class that, instead of processing anything, just retrieves data from a
file and returns that data to a main thread, which takes all the
gathered data and concatenates it sequentially.

An example: fetching various ranges of an HTTP resource in parallel.

import threading
import urllib2

class download(threading.Thread):
    def __init__(self, url, starts, ends):
        threading.Thread.__init__(self)
        self.url = url
        self.starts = starts
        self.ends = ends
        self.content = None

    def getBytesRange(self):
        request = urllib2.Request(self.url)  # New Request object
        if self.ends is not None:
            # The end of the desired range is specified
            request.add_header("Range", "bytes=%d-%d" % (self.starts, self.ends))
        else:
            # Everything from start up to the resource's length
            request.add_header("Range", "bytes=%d-" % self.starts)
        response = urllib2.urlopen(request)  # Make the request
        self.content = response.read()       # Store the data

    def run(self):
        self.getBytesRange()

Then, when we create the threads, we wait for them to complete and write
to a file. To reduce the memory footprint we can write the downloaded
data to a tempfile and then, when all the threads complete, concatenate the
temp files into a single file at a specified location. The problem here is
the same: how to get references to the temporary file objects created
in the threads, sequentially. I hope I have made myself clear. Thank
you in advance.
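One way to sketch this (not the original code, and using Python 3 naming: `queue`, `tempfile`): each thread writes its chunk to a `NamedTemporaryFile` and reports an `(index, path)` pair on a result queue; the main thread sorts by index before concatenating, so the pieces always land in order. The HTTP download is replaced by a dummy payload here so the example is self-contained.

```python
import os
import queue
import tempfile
import threading

def fetch_range(index, data, results):
    """Write one chunk to a named temp file and report (index, path)."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.write(data)
    tmp.close()
    results.put((index, tmp.name))

# Stand-ins for the byte ranges fetched from the server.
chunks = [b"part0-", b"part1-", b"part2"]
results = queue.Queue()
workers = [threading.Thread(target=fetch_range, args=(i, c, results))
           for i, c in enumerate(chunks)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Drain the result queue, then sort by index so order is restored.
pieces = sorted(results.get() for _ in chunks)
with open("combined.bin", "wb") as out:
    for _, path in pieces:
        with open(path, "rb") as part:
            out.write(part.read())
        os.remove(path)
```

Because every result carries its own index, it does not matter in which order the threads finish; the sort puts everything back in sequence.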

Gabriel Genellina

I've recently been meddling with threads and wanted to make a threaded
class that, instead of processing anything, just retrieves data from a
file and returns that data to a main thread, which takes all the
gathered data and concatenates it sequentially.
An example: fetching various ranges of an HTTP resource in parallel.

The usual way to communicate between threads is using a Queue object.
Instead of (create a thread, do some work, exit/destroy thread) you could
create the threads in advance (a "thread pool" of "worker threads") and
make them wait for some work to do from a queue (in a quasi-infinite
loop). When work is done, they put results in another queue. The main
thread just places work units on the first queue; another thread
reassembles the pieces from the result queue. For an I/O bound application
like yours, this should work smoothly.
You should be able to find examples on the web - try the Python Cookbook.
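A minimal version of that pattern (hypothetical names, Python 3's `queue` module): the workers loop quasi-infinitely on an input queue, a `None` sentinel shuts them down, and every result carries its task index so the reassembling thread can restore order.

```python
import queue
import threading

def worker(tasks, results):
    # Quasi-infinite loop: pull work until the sentinel arrives.
    while True:
        item = tasks.get()
        if item is None:          # shutdown sentinel
            break
        index, payload = item
        results.put((index, payload.upper()))  # stand-in for real I/O work

tasks, results = queue.Queue(), queue.Queue()
pool = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(3)]
for t in pool:
    t.start()

for i, word in enumerate(["alpha", "beta", "gamma", "delta"]):
    tasks.put((i, word))
for _ in pool:                    # one sentinel per worker
    tasks.put(None)
for t in pool:
    t.join()

ordered = [text for _, text in sorted(results.get() for _ in range(4))]
print(ordered)
```

The pool is created once and reused for every work unit, which is the point of the pattern: thread startup cost is paid up front, not per task.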

Lawrence D'Oliveiro

I've recently been meddling with threads and wanted to make a threaded
class that, instead of processing anything, just retrieves data from a
file and returns that data to a main thread, which takes all the
gathered data and concatenates it sequentially.

Any reason you're using threads instead of processes?

aiwarrior

The usual way to communicate between threads is using a Queue object.
Instead of (create a thread, do some work, exit/destroy thread) you could  
create the threads in advance (a "thread pool" of "worker threads") and  
make them wait for some work to do from a queue (in a quasi-infinite  
loop). When work is done, they put results in another queue. The main  
thread just places work units on the first queue; another thread  
reassembles the pieces from the result queue. For an I/O bound application  
like yours, this should work smoothly.
You should be able to find examples on the web - try the Python Cookbook.

I already tried a double-queue implementation, as you suggest, with one
queue for the threads to get work from and another for the threads to
put results in. My test implementation used a file containing some
lines of random data.
Here it is:

import threading
import Queue

class DownloadUrl(threading.Thread):
    def __init__(self, queue_in, queue_out):
        threading.Thread.__init__(self)
        #self.url = url
        #self.starts = starts
        #self.ends = ends
        self.queue_in = queue_in
        self.queue_out = queue_out

    def run(self):
        (fp, i) = self.queue_in.get()
        self.queue_in.task_done()
        #print var
        #self.queue_out.put(i, False)

worknr = 5
queue_in = Queue.Queue(worknr)
queue_out = Queue.Queue(worknr)
threads = []
fp = open("./xi", "r")
#print fp.readlines()

for i in xrange(10):
    queue_in.put((fp, i))
    DownloadUrl(queue_in, queue_out).start()

queue_in.join()
while queue_out.qsize():
    print queue_out.get()
    queue_out.task_done()
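The missing piece in the `run()` above is that nothing ever goes back on `queue_out`, and results carry no sequence number, so the main thread has no way to reassemble them in order. A self-contained variant (a sketch using Python 3 names and dummy lines in place of the file) might look like:

```python
import queue
import threading

class DownloadUrl(threading.Thread):
    def __init__(self, queue_in, queue_out):
        threading.Thread.__init__(self)
        self.queue_in = queue_in
        self.queue_out = queue_out

    def run(self):
        i, line = self.queue_in.get()
        # Tag the result with its index so order can be restored later.
        self.queue_out.put((i, line.strip()))
        self.queue_in.task_done()

lines = ["first\n", "second\n", "third\n"]   # stand-in for fp.readlines()
queue_in = queue.Queue()
queue_out = queue.Queue()
for i, line in enumerate(lines):
    queue_in.put((i, line))
    DownloadUrl(queue_in, queue_out).start()

queue_in.join()
results = [text for _, text in sorted(queue_out.get() for _ in lines)]
print(results)
```

Note that the work units are `(index, line)` pairs rather than a shared file object: handing the same file handle to several threads is a race, whereas distributing already-read lines keeps each thread's input independent.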
Any reason you're using threads instead of processes?
Perhaps because threads offer a more flexible way to share data than
processes do.
 
