Thread Question

Ritesh Raj Sarraf · Aug 4, 2006

Carl said:
If you have multiple threads trying to access the same ZIP file at the
same time, whether or not they use the same ZipFile object, you'll have
trouble. You'd have to change download_from_web to protect against
simultaneous use. A simple lock should suffice. Create the lock in
the main thread, like so:

ziplock = threading.Lock()

Thanks. This looks to be the correct way to go. I do have access to all
the source code as it is under GPL.

Then change the zipping part of download_from_web to acquire and
release this lock; do zipfile operations only between them.

ziplock.acquire()
try:
    do_all_zipfile_stuff_here()
finally:
    ziplock.release()

I hope while one thread has acquired the lock, the other threads (which
have done the downloading work and are ready to zip) would wait.

If you can't change download_from_web, you might have no choice but to
download sequentially.

OTOH, if each thread uses a different ZIP file (and a different ZipFile
object), you wouldn't have to use a lock. It doesn't sound like you're
doing that, though.

It shouldn't be a problem if one thread is zipping at the same time
another is downloading, unless there's some common data between them
for some reason.

Thanks,
Ritesh

Carl Banks · Aug 4, 2006

Ritesh said:
I hope while one thread has acquired the lock, the other threads (which
have done the downloading work and are ready to zip) would wait.

Exactly. Only one thread can hold a lock at a time. If a thread tries
to acquire a lock that some other thread has, it'll wait until the
other thread releases it. You need locks to do this stuff because most
things (such as zipfile objects) don't wait for other threads to
finish.
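A minimal sketch of the waiting behaviour described above. It assumes an in-memory archive (io.BytesIO) in place of the real ZIP file, and the add_member helper and member names are hypothetical. The `with ziplock:` form is shorthand for the acquire/try/finally/release pattern.

```python
import io
import threading
import zipfile

archive_bytes = io.BytesIO()            # in-memory stand-in for the ZIP file on disk
archive = zipfile.ZipFile(archive_bytes, "w")
ziplock = threading.Lock()              # created once, shared by every worker


def add_member(name, payload):
    # Hypothetical helper: only one thread at a time gets past this point;
    # any other thread that arrives blocks until the lock is released.
    with ziplock:
        archive.writestr(name, payload)


workers = [
    threading.Thread(target=add_member, args=("file%d.txt" % i, "data %d" % i))
    for i in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
archive.close()

# Re-open the archive and list what was written.
names = sorted(zipfile.ZipFile(archive_bytes).namelist())
```

Every thread calls the same function and refers to the same module-level ziplock, which is what makes the writes take turns.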

Carl Banks

Ritesh Raj Sarraf · Aug 4, 2006

Carl said:
Exactly. Only one thread can hold a lock at a time. If a thread tries
to acquire a lock that some other thread has, it'll wait until the
other thread releases it. You need locks to do this stuff because most
things (such as zipfile objects) don't wait for other threads to
finish.

I would heartily like to thank you for the suggestion you made.
My program now works exactly as I wanted. Thanks.

Ritesh

Bryan Olson · Aug 5, 2006

Carl said:
Exactly. Only one thread can hold a lock at a time.

In the code above, a form called a "critical section", we might
think of a thread as holding the lock when it is between the
acquire() and release(). But that's not really how Python's
locks work. A lock, even in the locked state, is not held by
any particular thread.

If a thread tries
to acquire a lock that some other thread has, it'll wait until the
other thread releases it.

More accurate: If a thread tries to acquire a lock that is in
the locked state, it will wait until some thread releases it.
(Unless it set the blocking flag false.) If more than one thread
is waiting to acquire the lock, it may be blocked longer.

I think the doc for threading.Lock is good:

http://docs.python.org/lib/lock-objects.html
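A small sketch of the two points made in this post, using only threading.Lock: a non-blocking acquire returns at once instead of waiting, and a lock in the locked state can be released by a thread other than the one that acquired it. The variable names are illustrative only.

```python
import threading

lock = threading.Lock()
lock.acquire()                        # the main thread puts the lock in the locked state

# With the blocking flag false, acquire() returns immediately instead of waiting.
got_it = lock.acquire(False)          # False, because the lock is already locked

# A plain Lock has no owner, so a different thread is allowed to release it.
releaser = threading.Thread(target=lock.release)
releaser.start()
releaser.join()

unlocked_after = not lock.locked()    # True, the other thread unlocked it
got_it_now = lock.acquire(False)      # True, the lock was free again
lock.release()
```

The critical-section pattern (acquire, work, release in the same thread) is a convention built on top of this, not something the lock itself enforces.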

Ritesh Raj Sarraf · Aug 5, 2006

Bryan said:
In the code above, a form called a "critical section", we might
think of a thread as holding the lock when it is between the
acquire() and release(). But that's not really how Python's
locks work. A lock, even in the locked state, is not held by
any particular thread.

More accurate: If a thread tries to acquire a lock that is in
the locked state, it will wait until some thread releases it.
(Unless it set the blocking flag false.) If more than one thread
is waiting to acquire the lock, it may be blocked longer.

I think the doc for threading.Lock is good:

http://docs.python.org/lib/lock-objects.html

You're correct.
I noticed that even while one thread holds the lock, the other threads
don't respect it. They just go ahead and execute the statements
between acquire() and release(). With this behavior, I end up with a
partially corrupted zip archive.

def run(request, response, func=copy_first_match):
    '''Get items from the request Queue, process them
    with func(), put the results along with the
    Thread's name into the response Queue.

    Stop running once an item is None.'''

    name = threading.currentThread().getName()
    ziplock = threading.Lock()
    while 1:
        item = request.get()
        if item is None:
            break
        (sUrl, sFile, download_size, checksum) = stripper(item)
        response.put((name, sUrl, sFile,
                      func(cache, sFile, sSourceDir, checksum)))

        # This will take care of making sure that if downloaded, they are zipped
        (thread_name, Url, File, exit_status) = responseQueue.get()
        if exit_status == False:
            log.verbose("%s not available in local cache %s\n" % (File, cache))
            if download_from_web(sUrl, sFile, sSourceDir, checksum) != True:
                log.verbose("%s not downloaded from %s and NA in local cache %s\n\n"
                            % (sFile, sUrl, sRepository))
            else:
                # We need this because we can't do join or exists operation on None
                if cache is None or os.path.exists(os.path.join(cache, sFile)):
                    #INFO: The file is already there.
                    pass
                else:
                    shutil.copy(sFile, cache)
                if zip_bool:
                    ziplock.acquire()
                    try:
                        compress_the_file(zip_type_file, sFile, sSourceDir)
                        # Remove it because we don't need the file once it is zipped.
                        os.remove(sFile)
                    finally:
                        ziplock.release()
        elif exit_status == True:
            if zip_bool:
                ziplock.acquire()
                try:
                    compress_the_file(zip_type_file, sFile, sSourceDir)
                    os.unlink(sFile)
                finally:
                    ziplock.release()

--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
"Stealing logic from one person is plagiarism, stealing from many is research."
"The great are those who achieve the impossible, the petty are those who
cannot - rrs"

Bryan Olson · Aug 5, 2006

Ritesh Raj Sarraf wrote:
[...]

I noticed that even while one thread holds the lock, the other threads
don't respect it. They just go ahead and execute the statements
between acquire() and release(). With this behavior, I end up with a
partially corrupted zip archive.

No, Carl's code was fine. I just found his explanation
misleading.

def run(request, response, func=copy_first_match):
    '''Get items from the request Queue, process them
    with func(), put the results along with the
    Thread's name into the response Queue.

    Stop running once an item is None.'''

    name = threading.currentThread().getName()
    ziplock = threading.Lock()

You don't want "ziplock = threading.Lock()" in the body of
the function. It creates a new and different lock on every
execution. Your threads are all acquiring different locks.
To coordinate your threads, they need to be using the same
lock.

Try moving "ziplock = threading.Lock()" out of the function, so
your code might read, in part:

ziplock = threading.Lock()

def run(request, response, func=copy_first_match):
    # And so on...
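To see why the placement matters, here is a small sketch. The make_lock_inside function is a hypothetical stand-in for a thread function that creates its own lock on every call. Non-blocking acquires are used so the result can be inspected without starting any threads.

```python
import threading


def make_lock_inside():
    # Mirrors the bug: every call builds a brand-new, unrelated lock.
    return threading.Lock()


first = make_lock_inside()
second = make_lock_inside()
first.acquire()
# The second lock knows nothing about the first, so this succeeds at once
# and nothing is actually being protected.
per_call_excludes = not second.acquire(False)
first.release()
second.release()

shared = threading.Lock()             # created once, outside the function
shared.acquire()
# The same lock is already in the locked state, so a second attempt fails.
shared_excludes = not shared.acquire(False)
shared.release()
```

Here per_call_excludes ends up False and shared_excludes ends up True: only the single shared lock keeps a second caller out.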

Ritesh Raj Sarraf · Aug 5, 2006

Bryan said:
You don't want "ziplock = threading.Lock()" in the body of
the function. It creates a new and different lock on every
execution. Your threads are all acquiring different locks.
To coordinate your threads, they need to be using the same
lock.

Try moving "ziplock = threading.Lock()" out of the function, so
your code might read, in part:

ziplock = threading.Lock()

def run(request, response, func=copy_first_match):
    # And so on...

Thanks. That did it.

Ritesh

Simon Forman · Aug 6, 2006

Ritesh said:
Thanks. That did it.

Ritesh

Another thing you might want to consider would be to split your
download and zipping code into separate functions then create one more
thread to do all the zipping. That way your downloading threads would
never be waiting around for each other to zip.

Just a thought.

~Simon
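A rough sketch of that layout, under stated assumptions: fake_download is a hypothetical stand-in for download_from_web, the archive lives in memory (io.BytesIO) so the example is self-contained, and None is used as an end-of-work sentinel. Because only the zipper thread touches the archive, no lock is needed around the zipfile calls.

```python
import io
import queue
import threading
import zipfile

to_zip = queue.Queue()                # downloaded items waiting to be zipped
archive_bytes = io.BytesIO()          # in-memory stand-in for the ZIP file


def fake_download(name):
    # Hypothetical stand-in for download_from_web: hand the result to the zipper.
    to_zip.put((name, "contents of %s" % name))


def zipper():
    # The only thread that touches the archive.
    with zipfile.ZipFile(archive_bytes, "w") as archive:
        while True:
            item = to_zip.get()
            if item is None:          # sentinel: no more work is coming
                break
            name, payload = item
            archive.writestr(name, payload)


zip_thread = threading.Thread(target=zipper)
zip_thread.start()

downloaders = [
    threading.Thread(target=fake_download, args=("pkg%d.deb" % i,))
    for i in range(5)
]
for d in downloaders:
    d.start()
for d in downloaders:
    d.join()

to_zip.put(None)                      # all downloads are done; tell the zipper to stop
zip_thread.join()

zipped = sorted(zipfile.ZipFile(archive_bytes).namelist())
```

The queue does the coordination here: downloaders never wait for each other, they only hand finished work to the single zipping thread.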

Gerhard Fiedler · Aug 6, 2006

I was using this approach earlier. The problem with this approach is
too much temporary disk usage.

Say I'm downloading 2 GB of data which is a combination of, say 600
files. Now following this approach, I'll have to make sure that I have
4 GB of disk space available on my hard drive.

Not necessarily. You have a minimum speed of the zipping process, and a
maximum speed of the download. Between the two you can figure out what the
maximum required temp storage space is. It's in any case less than the full
amount, and if the minimum zipping speed is faster than the maximum
download speed, it's not more than a few files.
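As a back-of-the-envelope sketch of that estimate: assume constant rates and ignore file boundaries, so that by the time the download finishes the zipper has processed min_zip_rate times the download time, and whatever is left is the peak backlog. The function name and the example rates are made up for illustration.

```python
def peak_backlog_bytes(total_bytes, max_download_rate, min_zip_rate):
    """Rough upper bound on not-yet-zipped data, assuming constant rates."""
    if min_zip_rate >= max_download_rate:
        # The zipper keeps up; in practice only the files in flight remain.
        return 0
    download_time = total_bytes / max_download_rate
    return total_bytes - min_zip_rate * download_time


GB = 1024 ** 3
MB = 1024 ** 2

# Hypothetical numbers: 2 GB total, downloading at 2 MB/s, zipping at 1 MB/s.
backlog = peak_backlog_bytes(2 * GB, 2 * MB, 1 * MB)

# Zipping faster than downloading: no backlog builds up in this model.
no_backlog = peak_backlog_bytes(2 * GB, 1 * MB, 2 * MB)
```

With these made-up rates the backlog peaks at 1 GB rather than the full 2 GB. The model returns 0 when zipping outpaces downloading, which corresponds to the "not more than a few files" case.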

But if your current solution works, then that's good enough.

It probably
wouldn't be much faster anyway; it would only avoid a few waiting periods.

Gerhard
