Using subprocesses in a simple way...

Dara Durum

Hi !

Py2.4, Win32.

I need to optimize a program that has one specific job: hashing
(MD5/SHA1) the contents of many files.

Right now I do this in a simple Python program because, unfortunately,
the FSUM utility died on a file with a Unicode filename. (The cause of
the error is unknown: I passed the alternate file name, but FSUM did not
find the file.)
FSUM also has some problems with its parameter handling.

So now I want to choose the best way to get the hash values:
1.) Create pyshamd5.exe with py2exe and Python, and call this
program with CreateProcess and parameters?
2.) Create pyshamd5.exe with py2exe and Python, and call this
program with popen, using pipes?
3.) Use an external utility?
4.) Something else?

I want to avoid dead time such as starting and ending many processes,
doing many file operations, etc.
I need to use Unicode filenames in a form that the outer program can
handle (alternate file names?).
I "send" the filename to the subprocess, and it sends the hash
results (or error codes) back to my main process.

A pipe is good for this, but pipes are synchronous (blocking). For
synchronous things I need threads.

A subprocess.Popen object is better, because I do not need threads
(I can use poll() to get the return code) and I can manage the
subprocesses from single-threaded code (the main thread).
(I need more than one process because that speeds up my hashing. If I
have one ISO, hashing that file blocks all operations in a single
thread/process. Multiprocess code can hash several files at the same
time.)

So: I need a solution that does not require multithreading chaos, where
I can poll my processes.
Persistent subprocesses would be even better, because they avoid the
idle start-up and shutdown time of each subprocess.

How it would work, in pseudocode:

while HaveAnyFile or HaveOpenedProcesses:
    if not Processes:
        CreateProcess(MaxProcesses)
    if HaveUsableProcesses:
        fn=GetNextFile
        if fn:
            SendToSubProcess(fn)
    PollProcessesAndStoreResults()
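
The loop in the pseudocode above can be approximated from a single thread with subprocess.Popen and poll(). What follows is only a sketch under some assumptions: it is written for a current Python with hashlib (on Python 2.4 the md5 and sha modules fill that role), it starts one short-lived child per file instead of keeping persistent children, and the names hash_files, CHILD and max_processes are made up for illustration.

```python
import hashlib
import subprocess
import sys
import time

# Child program (illustrative): hash the file named in argv[1], print the digest.
CHILD = (
    "import hashlib, sys\n"
    "h = hashlib.md5()\n"
    "f = open(sys.argv[1], 'rb')\n"
    "for block in iter(lambda: f.read(1 << 20), b''):\n"
    "    h.update(block)\n"
    "f.close()\n"
    "print(h.hexdigest())\n"
)

def hash_files(paths, max_processes=4):
    """Hash files in child processes, polling them from one thread."""
    pending = list(paths)
    running = {}   # path -> Popen object
    results = {}   # path -> hex digest, or None if the child failed
    while pending or running:
        # Top up the pool while there is work and a free slot.
        while pending and len(running) < max_processes:
            path = pending.pop()
            running[path] = subprocess.Popen(
                [sys.executable, "-c", CHILD, path],
                stdout=subprocess.PIPE)
        # poll() returns None while a child is still running.
        for path, proc in list(running.items()):
            if proc.poll() is not None:
                output = proc.stdout.read().decode("ascii").strip()
                proc.stdout.close()
                results[path] = output if proc.returncode == 0 else None
                del running[path]
        time.sleep(0.01)  # avoid spinning at 100% CPU between polls
    return results
```

Each child prints only a short digest, so reading its stdout after poll() reports completion does not block. Exchanging data with long-lived children over pipes without blocking is a harder problem that this sketch does not address.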

The questions:
1.) Do you know a "method" that can send/receive data to/from persistent
subprocesses without blocking?
2.) Do you know a command-line tool like FSUM that can compute file
hashes (MD5/SHA1) and has no problems with Unicode alternate file
names?
3.) Are pipes better than CreateProcess(params)?

Important: sockets are unavailable in this project. I cannot use
them, because the users forbid it.

Thanks for any help:
dd



Serge Orlov

Dara Durum wrote:

[snip design of a multi-processor algorithm]

I thought the MD5 algorithm is pretty light, so you'll be I/O-bound. Why
bother with a multi-processor algorithm then?
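
If the job is I/O-bound as suggested here, most of the time goes into reading the files, so it makes sense to read each file only once even when both digests are wanted. A minimal sketch, assuming a Python with hashlib (2.5 or later; Python 2.4 would use the md5 and sha modules instead). The names hash_file and block_size are illustrative, not from the thread.

```python
import hashlib

def hash_file(path, block_size=1 << 20):
    """Return (md5_hex, sha1_hex) for a file, reading it once in blocks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:        # empty read means end of file
                break
            md5.update(block)    # feed the same block to both hashes
            sha1.update(block)
    return md5.hexdigest(), sha1.hexdigest()
```

Reading in fixed-size blocks keeps memory use flat even for a large ISO image.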
> 2.) Do you know a command-line tool like FSUM that can compute file
> hashes (MD5/SHA1) and has no problems with Unicode alternate file
> names?

I believe you can wrap the broken program with a simple python wrapper.
Use win32api.GetShortPathName to convert non-ascii file names to DOS
filenames.
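
A rough sketch of such a wrapper follows. It assumes pywin32 is installed on Windows; on other platforms, or without pywin32, it passes the path through unchanged. The helper names to_short_path and run_tool are hypothetical, and tool_argv stands for whatever command line the external hashing tool needs (FSUM's actual options are not shown here because they are not given in this thread).

```python
import subprocess
import sys

try:
    import win32api          # part of pywin32, Windows only
except ImportError:
    win32api = None

def to_short_path(path):
    """Return the 8.3 (DOS) form of an existing path on Windows.

    Without pywin32 the path is returned unchanged (assumption made
    so the sketch can also run elsewhere).
    """
    if win32api is None:
        return path
    return win32api.GetShortPathName(path)

def run_tool(tool_argv, path):
    """Run an external tool on the short form of path; return its stdout."""
    argv = list(tool_argv) + [to_short_path(path)]
    return subprocess.check_output(argv)
```

A short name is not guaranteed to exist for every file, so the wrapper may still need a pure-Python fallback for the files the tool cannot open.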
 
DurumDara

10 May 2006 04:57:17 -0700 said:
> I thought md5 algorithm is pretty light, so you'll be I/O-bound, then
> why bother with multi-processor algorithm?

This is an assessor utility.
The program's architecture must be flexible, because I don't know
where it will need to run (the only way I can pin this down is to write
it into the user's guide).

But I want to speed up my algorithm with native code and multiprocess
code. I have not tested it yet, but I think that 4 subprocesses will be
quicker than one large process.
> I believe you can wrap the broken program with a simple python wrapper.
> Use win32api.GetShortPathName to convert non-ascii file names to DOS
> filenames.

I use FindFilesW and take the 8th (?) field of the result. This is the
alternate name of the file, but I still found a file that the FSUM
utility could not handle...

Thanx for help:
dd

P.S.:
I wrote some code to test pipes and subprocesses. The module is named
testpipe.py.
The code is:

import sys, time, os, threading

# No command-line argument: run as master. Any argument: run as slave.
IsMaster = len(sys.argv) == 1

class ProcessThread(threading.Thread):
    def __init__(self, Param):
        threading.Thread.__init__(self)
        self.Param = Param
        self.RetVal = None
        self.start()

    def run(self):
        param = self.Param
        print "New thread with param", param
        # Start this same script as a slave process.
        po = os.popen2('c:\\python24\\python.exe testpipe.py 1')
        child_stdin, child_stdout = po
        child_stdin.write(str(param) + '\n')
        # Close stdin first so the buffered line reaches the slave
        # before we block reading its output.
        child_stdin.close()
        retval = child_stdout.readlines()
        child_stdout.close()
        self.RetVal = retval

if IsMaster:
    print "M:", time.time()
    print "M: Start"
    print "M: Open subprocess"
    cnt = 1
    pths = []
    ress = [None] * 9
    while True:
        # Start one new worker thread per pass until nine exist.
        if cnt < 10:
            pt = ProcessThread(cnt)
            pths.append(pt)
            cnt += 1
        # Count the threads whose results are still outstanding.
        pcnt = 0
        for pt in pths:
            if pt: pcnt += 1
        if pcnt:
            for i in range(len(pths)):
                pt = pths[i]
                if pt and pt.RetVal:
                    pths[i] = None
                    ress[i] = pt.RetVal
                    print [pt.RetVal]
            time.sleep(0.05)  # do not spin at full CPU while waiting
        else:
            break
    print "\nM: The results are:"
    for s in ress:
        print s
    print "M: End"
else:
    print "S:", time.time()
    print "S: Start"
    print "S: Data"
    print "S: End"
    s = sys.stdin.readline()
    print "S: %s" % s
    time.sleep(1)
    print "Echo: %s" % s
    print "Finished"
 
Serge Orlov

DurumDara said:
> This is an assessor utility.
> The program's architecture must be flexible, because I don't know,
> where it need to run (only I have a possibility to fix this: I write
> to user's guide).
>
> But I want to speedup my alg. with native code, and multiprocess code.
> I not tested yed, but I think that 4 subprocess quickly as one large
> process.

I believe you need to look at the Queue module. Using Queue will help
you avoid the threading hell that you're afraid of (and rightly so!).
Create two queues: one for jobs, another one for results. The main
thread submits jobs and picks up results from the results queue. As soon
as the number of results == the number of jobs, it's time to quit.
Submit N special jobs that indicate it's time to exit, where N is the
number of worker threads. Then "join" the main thread with the worker
threads and exit the application.
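
The two-queue scheme described above could look roughly like this. It is a sketch, not the poster's program: it assumes a Python with hashlib, and the names STOP, md5_of, worker and hash_all are invented for the example. The module is called Queue on Python 2 and queue on Python 3, hence the guarded import.

```python
import hashlib
import threading
try:
    import queue              # Python 3 name
except ImportError:
    import Queue as queue     # Python 2 name

STOP = object()  # the "special job" that tells a worker to exit

def md5_of(path, block_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()

def worker(jobs, results):
    while True:
        path = jobs.get()
        if path is STOP:
            break
        try:
            results.put((path, md5_of(path)))
        except (IOError, OSError):
            results.put((path, None))  # report the failure, keep working

def hash_all(paths, num_workers=4):
    paths = list(paths)
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for path in paths:
        jobs.put(path)
    # Number of results == number of jobs: collect exactly one per job.
    digests = dict(results.get() for _ in paths)
    for _ in threads:
        jobs.put(STOP)        # one exit job per worker thread
    for t in threads:
        t.join()
    return digests
```

The main thread never touches shared state directly; everything passes through the two queues, which do their own locking.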
 
