subprocess.Popen and multiprocessing fail to execute external program

Niklas Berliner

I have a pipeline that involves processing some data, handing the data to an
external program (t_coffee, used for sequence alignments in bioinformatics),
and postprocessing the result. Since I have a lot of data, I need to run my
pipeline in parallel, which I implemented using the multiprocessing module
following Doug Hellmann's blog
(http://blog.doughellmann.com/2009/04/pymotw-multiprocessing-part-1.html).
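My consumer setup follows the pattern from that post. A minimal sketch of it
(simplified; the callables put on the task queue each run one pipeline job):

import multiprocessing

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        while True:
            task = self.task_queue.get()
            if task is None:
                # poison pill: no more work for this consumer
                self.task_queue.task_done()
                break
            self.result_queue.put(task())   # task() runs one pipeline job
            self.task_queue.task_done()

tasks = multiprocessing.JoinableQueue()
results = multiprocessing.Queue()
num_consumers = 2   # works with 1, fails with more
consumers = [Consumer(tasks, results) for i in xrange(num_consumers)]
for c in consumers:
    c.start()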

My pipeline works perfectly well when I run it with the multiprocessing
implementation and one consumer, i.e. on one core. If I increase the number
of consumers so that multiple instances of my pipeline run in parallel, the
external program fails with a core dump.

To call the external program I let Python write a bash wrapper script, which
is then invoked like this (system_command holds the call to the wrapper):

import subprocess

childProcess = subprocess.Popen(system_command, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, shell=True)
result, error = childProcess.communicate()
rc = childProcess.returncode

(I also tried shell=False and calling the program directly, specifying the
env argument for the call.)
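
That shell=False attempt looked roughly like this; the argument list is taken
from the actual call shown further down, but the exact env dictionary is
reconstructed from memory (in particular the PATH entry), so treat the
details as illustrative:

import subprocess

home = '/home/niklas/tcoffee/parallel/99-1-Consumer-2/'
env = {
    'HOME_4_TCOFFEE': home,
    'CACHE_4_TCOFFEE': home + 'cache/',
    'TMP_4_TCOFFEE': home + 'tmp/',
    'LOCKDIR_4_TCOFFEE': home + 'lock/',
    'PATH': '/usr/local/bin:/usr/bin:/bin',   # t_coffee must be on PATH
}
args = ['t_coffee', '-mode', 'expresso',
        '-seq', '/home/niklas/tcoffee/parallel/Consumer-2Q9FHL4_ARATH',
        '-blast_server=LOCAL', '-pdb_db=pdbaa', '-outorder=input',
        '-output', 'fasta_aln', '-quiet', '-no_warning',
        '-outfile=/tmp/tmpm3mViZ']
childProcess = subprocess.Popen(args, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, env=env)
result, error = childProcess.communicate()
rc = childProcess.returncode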

To avoid conflicts between the parallel instances of the external program,
each call gets a flushed environment, and the important environment
variables are set to unique, existing paths. An example looks like this:
#!/bin/bash
env -i
export HOME_4_TCOFFEE="/home/niklas/tcoffee/parallel/99-1-Consumer-2/"
export CACHE_4_TCOFFEE="$HOME_4_TCOFFEE/cache/"
export TMP_4_TCOFFEE="$HOME_4_TCOFFEE/tmp/"
export LOCKDIR_4_TCOFFEE="$HOME_4_TCOFFEE/lock/"
mkdir -p $CACHE_4_TCOFFEE
mkdir -p $TMP_4_TCOFFEE
mkdir -p $LOCKDIR_4_TCOFFEE

t_coffee -mode expresso \
    -seq /home/niklas/tcoffee/parallel/Consumer-2Q9FHL4_ARATH \
    -blast_server=LOCAL -pdb_db=pdbaa -outorder=input \
    -output fasta_aln -quiet -no_warning -outfile=/tmp/tmpm3mViZ
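
The Python side writes one such wrapper per job, along these lines (the
make_wrapper name and its arguments are simplified for the example; the
directory layout matches the script above):

import os
import stat

def make_wrapper(script_path, consumer_name, t_coffee_command):
    # every consumer gets its own scratch tree so parallel runs cannot collide
    home = '/home/niklas/tcoffee/parallel/%s/' % consumer_name
    lines = [
        '#!/bin/bash',
        'env -i',
        'export HOME_4_TCOFFEE="%s"' % home,
        'export CACHE_4_TCOFFEE="$HOME_4_TCOFFEE/cache/"',
        'export TMP_4_TCOFFEE="$HOME_4_TCOFFEE/tmp/"',
        'export LOCKDIR_4_TCOFFEE="$HOME_4_TCOFFEE/lock/"',
        'mkdir -p $CACHE_4_TCOFFEE',
        'mkdir -p $TMP_4_TCOFFEE',
        'mkdir -p $LOCKDIR_4_TCOFFEE',
        t_coffee_command,
    ]
    with open(script_path, 'w') as fh:
        fh.write('\n'.join(lines) + '\n')
    os.chmod(script_path, stat.S_IRWXU)   # make the wrapper executable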

If I replace the t_coffee command with a simple 'touch I-<unique
ID>-was-here', the files are created as expected and no error is produced.
The developers of the external program assured me that running it in
parallel should not be a problem as long as the environment variables are
set correctly. And if I take the exact same bash scripts that were generated
by Python, and that failed when run in parallel through Python, and execute
batches of them manually using a for loop in multiple terminals (i.e. in
parallel), they don't produce an error.
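
For the touch test I just swap the payload that goes into the wrapper
instead of the t_coffee call (unique_id is a stand-in for however the
per-job ID is built):

unique_id = '99-1-Consumer-2'   # illustrative; any per-job unique string
touch_command = 'touch I-%s-was-here' % unique_id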


I am really puzzled and stuck. Python seems to work correctly on its own,
and the external program seems to work correctly on its own. But somehow,
when combined, they won't work.
Any help and hints would be greatly appreciated! I really need this to work.

I am using Ubuntu 12.04 with Python 2.7.3.

Cheers,
Niklas
 
