timeout on os.popen3?

selwyn

hi all,

I would like some advice on how I can include a timeout for a scanning
operation using unzip on linux and os.popen3.

I am scanning through about 30 GB of rescued zip files, looking for xml
extensions within those files. What I have put together 'works', but
only for what appears to be properly reconstructed files.
Unfortunately, some aren't AND for some reason no standard error
messages are being triggered. This causes my script to hang indefinitely.

What I would like is for the script to move on to the next file, after
say a 10sec period of inactivity, but am unsure how this could be
included. I have googled around and think the select module may be
helpful, but after reading the docs I am still confused :-(

Here is what I have so far:

#!/usr/bin/python
import os, sys, time, string

filesscanned=0
possibles=0
nonzips=0
files=[]
a = os.listdir(sys.argv[1])

for i in a:
    print i
    stdin, stdout, stderr = os.popen3('unzip -l %s%s' % (sys.argv[1], i))
    if stderr.read()=='':
        zippedfiles = string.lower(stdout.read())
        if zippedfiles.find('xml')!= -1:
            os.system("""cp '%s%s' candidates"""% (sys.argv[1], i))
            possibles +=1
            print 'found a candidate:- %s%s'% (sys.argv[1],i)
            files.append(i)
    else:
        os.system("""cp '%s%s' nonzips"""% (sys.argv[1], i))
        nonzips +=1
        print 'found nonzip or broken file:- %s' %i

    filesscanned +=1

Any help gratefully received.
cheers,
Selwyn.
 
Donn Cave

Quoth selwyn:
| I would like some advice on how I can include a timeout for a scanning
| operation using unzip on linux and os.popen3.
|
| I am scanning through about 30 GB of rescued zip files, looking for xml
| extensions within those files. What I have put together 'works', but
| only for what appears to be properly reconstructed files.
| Unfortunately, some aren't AND for some reason no standard error
| messages are being triggered. This causes my script to hang indefinitely.
|
| What I would like is for the script to move on to the next file, after
| say a 10sec period of inactivity, but am unsure how this could be
| included. I have googled around and think the select module may be
| helpful, but after reading the docs I am still confused :-(

You might be able to manage it with select. When you start up a program
on two or more pipes, you have kind of a juggling act. Select is the
juggler, it can tell which pipe has data to read and which is ready for
more data to be written to it. However, it's still a juggling act and
you need some skill, too. If you decide to try it, also read about
os.read, and don't try to use the file object for reading.
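For readers who do want to try the select route, here is a minimal sketch of the juggling act described above. It is not code from this thread: it assumes Python 3 on Linux, where subprocess.Popen takes the place of os.popen3, and the name run_with_timeout is made up for illustration. The timeout is per select call, so it measures inactivity rather than total run time.

```python
import os
import select
import subprocess


def run_with_timeout(cmd, timeout=10.0):
    """Run cmd and collect its output; give up after `timeout` idle seconds.

    Hypothetical helper. Returns (stdout_bytes, stderr_bytes, timed_out).
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_fd = proc.stdout.fileno()
    err_fd = proc.stderr.fileno()
    chunks = {out_fd: [], err_fd: []}
    open_fds = {out_fd, err_fd}
    timed_out = False

    while open_fds:
        # Wait until at least one pipe has data (or hits EOF), or the
        # timeout passes with no activity on either pipe.
        ready, _, _ = select.select(list(open_fds), [], [], timeout)
        if not ready:
            timed_out = True
            proc.kill()
            break
        for fd in ready:
            # Use os.read on the raw descriptor, never the file object,
            # so buffered reads cannot block behind select's back.
            data = os.read(fd, 4096)
            if data:
                chunks[fd].append(data)
            else:
                open_fds.discard(fd)  # empty read means EOF on this pipe

    proc.stdout.close()
    proc.stderr.close()
    proc.wait()
    return b"".join(chunks[out_fd]), b"".join(chunks[err_fd]), timed_out
```

In the scanning loop this could be called as run_with_timeout(['unzip', '-l', path]), treating timed_out or a non-empty stderr as a broken file.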

On the other hand, if you don't mind writing to disk files instead, that
will completely eliminate this aspect of your problem. Like

# nonEmptyFile, dealWithError, searchFile and dealWithFile stand for
# helper functions you would write yourself.
file = '%s%s' % (sys.argv[1], i)
ev = os.system('unzip -l "%s" > /tmp/zout 2> /tmp/zerr' % (file,))
if os.WEXITSTATUS(ev) != 0 or nonEmptyFile('/tmp/zerr'):
    dealWithError(ev, open('/tmp/zerr', 'r'))
elif searchFile('/tmp/zout', 'xml'):
    dealWithFile(file)
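As a rough, self-contained illustration of the same disk-file idea, the sketch below sends both output streams to temporary files and only reads them once the command has finished, so there are no pipes to juggle. It assumes Python 3; run_to_files and classify are hypothetical names rather than the helpers mentioned above, and the 'candidate' / 'broken' / 'other' labels are just one possible way to sort the results.

```python
import subprocess
import tempfile


def run_to_files(cmd):
    """Run cmd with stdout and stderr redirected to temporary files.

    Hypothetical helper. Returns (exit_status, stdout_bytes, stderr_bytes).
    """
    with tempfile.TemporaryFile() as out, tempfile.TemporaryFile() as err:
        status = subprocess.call(cmd, stdin=subprocess.DEVNULL,
                                 stdout=out, stderr=err)
        # The child wrote through the same file descriptors, so rewind
        # before reading back what it produced.
        out.seek(0)
        err.seek(0)
        return status, out.read(), err.read()


def classify(status, listing, errors):
    """Sort one result into 'broken', 'candidate' or 'other'."""
    if status != 0 or errors:
        return 'broken'
    if b'xml' in listing.lower():
        return 'candidate'
    return 'other'
```

For the scan in this thread the call would look like classify(*run_to_files(['unzip', '-l', path])), with path built from sys.argv[1] and the file name.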

Donn Cave
 
selwyn

| On the other hand, if you don't mind writing to disk files instead, that
| will completely eliminate this aspect of your problem.

Thanks, that worked perfectly. A classic case of not seeing the forest
for the trees!
 
