How do I do subprocess.Popen("ls | grep foo", shell=True) with shell=False?

Chris Seberino

How do I do subprocess.Popen("ls | grep foo", shell=True) with shell=False?

Do complex commands with "|" in them mandate shell=True?

cs
 
Chris Rebert

> How do I do subprocess.Popen("ls | grep foo", shell=True) with shell=False?

I would think:

from subprocess import Popen, PIPE
# ls writes into a pipe, and grep reads that pipe as its stdin
ls = Popen("ls", stdout=PIPE)
grep = Popen(["grep", "foo"], stdin=ls.stdout)

Cheers,
Chris
 
Nobody

> How do I do subprocess.Popen("ls | grep foo", shell=True) with shell=False?

The same way that the shell does it, e.g.:

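from subprocess import Popen, PIPE
p1 = Popen("ls", stdout=PIPE)  # writer: the "ls" half of the pipeline
p2 = Popen(["grep", "foo"], stdin=p1.stdout, stdout=PIPE)  # reader: "grep foo"
p1.stdout.close()  # leave grep holding the only read end of ls's output
result = p2.communicate()[0]  # collect grep's output and wait for grep
p1.wait()  # reap ls so it isn't left as a zombie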

Notes:

Without the p1.stdout.close(), if the reader (grep) terminates before
consuming all of its input, the writer (ls) won't terminate so long as
Python retains the descriptor corresponding to p1.stdout. In this
situation, the p1.wait() will deadlock.

The communicate() method wait()s for the process to terminate. Other
processes need to be wait()ed on explicitly, otherwise you end up with
"zombies" (labelled "<defunct>" in the output from "ps").
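The first note is hard to observe with "ls | grep", because grep normally reads everything. A minimal sketch that forces the situation, assuming a POSIX system with the coreutils `yes` and `head` programs (neither appears in the thread): `yes` never stops writing, so it can only exit once its reader is gone and no other process still holds the pipe's read end. The `timeout` argument to wait() (Python 3.3+) is only there so a mistake shows up as an exception rather than a hang.

```python
from subprocess import Popen, PIPE

# "yes" writes "y" lines forever; "head -n 1" reads one line and exits.
p1 = Popen(["yes"], stdout=PIPE)
p2 = Popen(["head", "-n", "1"], stdin=p1.stdout, stdout=PIPE)

# Drop the parent's copy of the pipe's read end, so head holds the only one.
# When head exits, yes's next write fails (SIGPIPE) and yes terminates.
p1.stdout.close()

first_line = p2.communicate()[0]  # reads head's output and reaps head

# Reap yes too. Without the close() above, yes would block on a full pipe
# and this call would raise TimeoutExpired instead of returning.
writer_status = p1.wait(timeout=10)
print(first_line, writer_status)
```

On Linux, writer_status is typically negative (yes was killed by SIGPIPE); the important part is that wait() returns at all.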

> Do complex commands with "|" in them mandate shell=True?

No.
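Extending the two-process version to any number of stages is mostly bookkeeping. A sketch with a hypothetical helper, `run_pipeline` (not part of the subprocess module), that takes one argument list per stage and applies both notes above: close the parent's copy of each intermediate pipe, and wait() on every process. The usage line assumes the POSIX tools `printf`, `grep` and `wc` are installed.

```python
from subprocess import Popen, PIPE

def run_pipeline(*commands):
    """Emulate `cmd1 | cmd2 | ... | cmdN` without a shell.

    Each command is an argument list; returns the last stage's stdout as bytes.
    """
    if not commands:
        raise ValueError("need at least one command")
    procs = []
    prev_stdout = None  # the first stage inherits our stdin
    for args in commands:
        proc = Popen(args, stdin=prev_stdout, stdout=PIPE)
        if prev_stdout is not None:
            # The new stage now holds this read end; close the parent's copy
            # so the previous stage gets SIGPIPE if this one exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    output = procs[-1].communicate()[0]  # read final output, reap last stage
    for proc in procs[:-1]:
        proc.wait()  # reap the earlier stages so none is left as a zombie
    return output

# Roughly: printf 'foo\nbar\nfood\n' | grep foo | wc -l
count = run_pipeline(["printf", "foo\nbar\nfood\n"], ["grep", "foo"], ["wc", "-l"])
print(int(count))
```

int() is used because some wc implementations pad the number with spaces.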

Also, "ls | grep" may provide a useful tutorial for the subprocess module,
but if you actually need to enumerate files, use e.g. os.listdir()/os.walk()
and re.search/fnmatch, or glob. Spawning child processes to perform tasks
which can easily be performed in Python is inefficient (and often creates
unnecessary portability issues).
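A rough pure-Python counterpart of "ls | grep foo", as a sketch: the helper name `ls_grep` is hypothetical, it assumes Python's `re` syntax is close enough to grep's for the pattern in question (the two differ in details), and it sorts by code point whereas ls sorts according to the locale. Like plain ls, it skips dot-files; glob does the same by default.

```python
import glob
import os
import re
import tempfile

def ls_grep(directory, pattern):
    """Roughly `ls directory | grep pattern`: non-hidden entry names that
    contain a match for the regular expression `pattern`, sorted by name."""
    regex = re.compile(pattern)
    return sorted(name for name in os.listdir(directory)
                  if not name.startswith(".") and regex.search(name))

# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    for name in ["foo.txt", "bar.txt", "xfoo", ".foo_hidden"]:
        open(os.path.join(d, name), "w").close()
    by_regex = ls_grep(d, "foo")
    # glob also skips dot-files; "*foo*" is a wildcard pattern, not a regex.
    pattern = os.path.join(glob.escape(d), "*foo*")
    by_glob = sorted(os.path.basename(p) for p in glob.glob(pattern))
    print(by_regex, by_glob)
```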
 
Chris Seberino

> Without the p1.stdout.close(), if the reader (grep) terminates before
> consuming all of its input, the writer (ls) won't terminate so long as
> Python retains the descriptor corresponding to p1.stdout. In this
> situation, the p1.wait() will deadlock.
>
> The communicate() method wait()s for the process to terminate. Other
> processes need to be wait()ed on explicitly, otherwise you end up with
> "zombies" (labelled "<defunct>" in the output from "ps").

You are obviously very wise on such things. I'm curious whether this
deadlock issue is a rare event, since grep (hopefully) would rarely
terminate before consuming all its input.

Even if zombies are created, they will eventually get dealt with by my OS
without any user intervention needed, right?

I'm just trying to verify that the naive solution of not worrying about
these deadlocks will still be OK and handled adequately by the OS. :)

cs
 
Lie Ryan

> Spawning child processes to perform tasks
> which can easily be performed in Python is inefficient

Not necessarily. Recently I wrote a script that finished in the blink of an
eye when I piped the input through cat/grep to prefilter the lines before
doing further complex filtering in Python. When I eliminated the cat/grep
subprocesses and rewrote that part in pure Python, what had taken a blink of
an eye took ~8 seconds (not much to fret about, but it shows that using a
subprocess can be faster). I eventually optimized a couple of things and got
it down to ~1.5 seconds, at which point I stopped, since going any faster
would require reading in larger chunks, something I don't really want to do.

The task was to take a directory of ~10 files, each containing thousands
of short lines (~5-10 chars per line on average), and count the number of
lines that match certain criteria: a very typical scripting job. However,
the overhead of reading the files line by line in pure Python can be
significant (you can read in larger chunks, but that's not the point:
eliminating grep may not come for free).
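A sketch of the hybrid approach described above: grep does the cheap prefiltering and Python applies the more complex test to the surviving lines. This is not the poster's actual script; the helper `count_matching`, the fixed-string prefilter and the length predicate are all hypothetical, and a `grep` executable on PATH is assumed.

```python
import os
import tempfile
from subprocess import Popen, PIPE

def count_matching(paths, prefilter, accept):
    """Count lines that contain the fixed string `prefilter` (checked by grep)
    and also satisfy the Python predicate `accept` (called on bytes)."""
    if not paths:
        return 0  # with no file arguments grep would read our stdin instead
    # -h: don't prefix lines with file names; -F: fixed string, not a regex;
    # "--" stops option parsing in case the pattern starts with "-".
    grep = Popen(["grep", "-h", "-F", "--", prefilter] + list(paths), stdout=PIPE)
    count = 0
    for line in grep.stdout:  # Python only sees lines grep let through
        if accept(line.rstrip(b"\n")):
            count += 1
    grep.stdout.close()
    grep.wait()  # exit status 1 just means "no lines matched"
    return count

with tempfile.TemporaryDirectory() as d:
    paths = []
    for i, text in enumerate(["foo1\nbar\nfoo22\n", "foo333\nbaz\n"]):
        path = os.path.join(d, "f%d.txt" % i)
        with open(path, "w") as fh:
            fh.write(text)
        paths.append(path)
    n = count_matching(paths, "foo", lambda line: len(line) > 4)
    print(n)
```

Whether this beats a pure-Python loop depends on how much grep discards; timing both on the real data is the only reliable check.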
 
Nobody

> You are obviously very wise on such things. I'm curious whether this
> deadlock issue is a rare event, since grep (hopefully) would rarely
> terminate before consuming all its input.

That depends; it might never start (missing grep, missing shared
library), segfault, terminate due to a signal, etc. Also, the program
might later be modified to use "grep -m <count> ..." which will terminate
after finding <count> matching lines.

> Even if zombies are created, they will eventually get dealt with by my OS
> without any user intervention needed, right?

They will persist until the parent either wait()s for them (I think that
this will happen if the process gets garbage-collected) or terminates. For
short-lived processes, you can forget about them; for long-lived
processes, they need to be dealt with.
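On Linux the zombie state can be seen directly in /proc/<pid>/stat. A small sketch, assuming Linux (for /proc) and the `true` command; `proc_state` is a hypothetical helper. The child exits at once but stays in the process table as "Z" until the parent wait()s for it.

```python
import time
from subprocess import Popen

def proc_state(pid):
    """Return the one-letter state of `pid` from /proc (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        # Format is "pid (comm) state ..."; comm may contain spaces or ")".
        return f.read().rsplit(")", 1)[1].split()[0]

child = Popen(["true"])  # exits immediately

# Poll until the exited child shows up as a zombie (give up after 5 s).
deadline = time.time() + 5
while proc_state(child.pid) != "Z" and time.time() < deadline:
    time.sleep(0.05)
state_before = proc_state(child.pid)

child.wait()  # reaping removes the zombie and records the exit status
print(state_before, child.returncode)
```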

> I'm just trying to verify that the naive solution of not worrying about
> these deadlocks will still be OK and handled adequately by the OS. :)

Deadlock is deadlock. If you wait() on the child while it's blocked
waiting for your Python program to consume its output, the wait() will
block forever.
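The usual way to avoid that deadlock when the child's stdout is a pipe is communicate(), which reads the output while waiting for the process. A sketch, assuming the output is larger than the OS pipe buffer (64 KiB is common on Linux); the child is just another Python interpreter, used here so the example needs no external tools.

```python
import sys
from subprocess import Popen, PIPE

# The child writes 1,000,000 bytes, more than a pipe buffer holds,
# so it blocks until someone reads from the other end.
child_code = "import sys; sys.stdout.write('x' * 1000000)"
p = Popen([sys.executable, "-c", child_code], stdout=PIPE)

# Calling p.wait() here would block forever: the child is stuck writing
# and nobody reads. communicate() drains stdout while it waits.
out, _ = p.communicate()
print(len(out), p.returncode)  # → 1000000 0
```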
 
