Piping processes works with 'shell = True' but not otherwise.


Luca Cerone

Hi everybody,
I am new to the group (and relatively new to Python), so I am sorry if this issue has already been discussed (I searched the group but couldn't find a solution to my problem).

I am using Python 2.7.3 to analyse the output of two third-party programs that can be launched in a Linux shell as:

program1 | program2

To do this I have written a function that pipes program1 into program2 (using subprocess.Popen) and returns the stdout of the second subprocess, and a function that parses the output:

A basic example:

from subprocess import Popen, STDOUT, PIPE

def run():
    # Connect program1's stdout (with stderr merged in) to program2's stdin
    p1 = Popen(['program1'], stdout = PIPE, stderr = STDOUT)
    p2 = Popen(['program2'], stdin = p1.stdout, stdout = PIPE, stderr = STDOUT)
    p1.stdout.close()
    return p2.stdout


def parse(out):
    parsed_output = []  # placeholder for whatever the parsing produces
    for row in out:
        print row
        #do something else with each line
    out.close()
    return parsed_output


# main block here

pout = run()

parsed = parse(pout)

#--- END OF PROGRAM ----#

I want to parse the output of 'program1 | program2' line by line because the output is very large.

When running the code above, occasionally some error occurs (IOERROR: [Errno 0]). However this error doesn't occur if I code the run() function as:

def run():
    p = Popen('program1 | program2', shell = True, stderr = STDOUT, stdout = PIPE)
    return p.stdout

I really can't understand why the first version causes errors, while the second one doesn't.

Can you please help me understand what the difference between the two cases is?

Thanks a lot in advance for the help,
Cheers, Luca
 
Chris Rebert

> Hi everybody,
> I am new to the group (and relatively new to Python)
> so I am sorry if this issue has been discussed (although searching for
> topics in the group I couldn't find a solution to my problem).
> I am using Python 2.7.3 to analyse the output of two third-party programs
> that can be launched in a Linux shell as:
> program1 | program2
>
> To do this I have written a function that pipes program1 into program2
> (using subprocess.Popen) and returns the stdout of the second subprocess,
> and a function that parses the output:
> A basic example:
>
> from subprocess import Popen, STDOUT, PIPE
> def run():
>     p1 = Popen(['program1'], stdout = PIPE, stderr = STDOUT)
>     p2 = Popen(['program2'], stdin = p1.stdout, stdout = PIPE, stderr = STDOUT)

Could you provide the *actual* commands you're using, rather than the
generic "program1" and "program2" placeholders? It's *very* common for
people to get the tokenization of a command line wrong (see the Note box in
http://docs.python.org/2/library/subprocess.html#subprocess.Popen for some
relevant advice).

>     p1.stdout.close()
>     return p2.stdout
>
> def parse(out):
>     for row in out:
>         print row
>         #do something else with each line
>     out.close()
>     return parsed_output
>
> # main block here
>
> pout = run()
>
> parsed = parse(pout)
>
> #--- END OF PROGRAM ----#
>
> I want to parse the output of 'program1 | program2' line by line because the output is very large.
>
> When running the code above, occasionally some error occurs (IOERROR:
> [Errno 0]).

Could you provide the full & complete error message and exception traceback?

> However this error doesn't occur if I code the run() function as:
>
> def run():
>     p = Popen('program1 | program2', shell = True, stderr = STDOUT, stdout = PIPE)
>     return p.stdout
>
> I really can't understand why the first version causes errors, while the second one doesn't.
>
> Can you please help me understand what the difference between the
> two cases is?

One obvious difference between the 2 approaches is that the shell doesn't
redirect the stderr streams of the programs, whereas you /are/ redirecting
the stderrs to stdout in the non-shell version of your code. But this is
unlikely to be causing the error you're currently seeing.
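
A minimal sketch of how stderr ends up in different places in the two versions. The commands are hypothetical stand-ins for program1 and program2 (a POSIX sh and tr are assumed), not the poster's actual tools. Without the shell, program1's stderr is merged into its stdout and therefore fed into program2. With shell = True, stderr = STDOUT applies to the shell process, both programs inherit that stderr, and program1's error output reaches the caller without passing through program2.

```python
from subprocess import Popen, PIPE, STDOUT

# Hypothetical stand-ins for program1 and program2 (assumes POSIX sh and tr).
PROG1 = ['sh', '-c', 'echo out; echo err >&2']
PROG2 = ['tr', 'a-z', 'A-Z']

def run_without_shell():
    # program1's stderr is merged into its stdout, so program2 receives it too.
    p1 = Popen(PROG1, stdout=PIPE, stderr=STDOUT)
    p2 = Popen(PROG2, stdin=p1.stdout, stdout=PIPE, stderr=STDOUT)
    p1.stdout.close()
    out = p2.communicate()[0]
    p1.wait()
    return sorted(out.decode().split())

def run_with_shell():
    # stderr=STDOUT is applied to the shell; the programs inherit that stderr,
    # so program1's stderr bypasses program2 and goes straight to the caller.
    cmd = "sh -c 'echo out; echo err >&2' | tr a-z A-Z"
    p = Popen(cmd, shell=True, stdout=PIPE, stderr=STDOUT)
    out = p.communicate()[0]
    return sorted(out.decode().split())
```

In this sketch the non-shell version upper-cases both words, while the shell version leaves "err" untouched because tr never sees it.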

You may also want to provide /dev/null as p1's stdin, out of an abundance
of caution.

Lastly, you may want to consider using a wrapper library such as
http://plumbum.readthedocs.org/en/latest/ , which makes it easier to do
pipelining and other such "fancy" things with subprocesses, while still
avoiding the many perils of the shell.

Cheers,
Chris
 
Luca Cerone

> Could you provide the *actual* commands you're using, rather than the generic "program1" and "program2" placeholders? It's *very* common for people to get the tokenization of a command line wrong (see the Note box in http://docs.python.org/2/library/subprocess.html#subprocess.Popen for some relevant advice).

Hi Chris, first of all thanks for the help. Unfortunately I can't provide the actual commands because they are tools that are not publicly available.
I think I get the tokenization right, though. The problem is not that the programs don't run; it is just that sometimes I get that error.

Just to be clear, I run the process like:

p = subprocess.Popen(['program1','--opt1','val1',...'--optn','valn'], ...the rest)

which I think is the right way to pass arguments (it works fine for other commands).

> Could you provide the full & complete error message and exception traceback?

Yes, as soon as I get to my work laptop.

> One obvious difference between the 2 approaches is that the shell doesn't redirect the stderr streams of the programs, whereas you /are/ redirecting the stderrs to stdout in the non-shell version of your code. But this is unlikely to be causing the error you're currently seeing.

> You may also want to provide /dev/null as p1's stdin, out of an abundance of caution.

I tried to redirect stdin to /dev/null using the Popen argument
'stdin = os.path.devnull' (having imported os of course),
but this seemed to cause even more trouble.

> Lastly, you may want to consider using a wrapper library such as http://plumbum.readthedocs.org/en/latest/ , which makes it easier to do pipelining and other such "fancy" things with subprocesses, while still avoiding the many perils of the shell.

Thanks, I didn't know this library, I'll give it a try.
I forgot to mention, though, that I was using the subprocess module because I want the code to be portable (even though for now it is OK if it works on Unix platforms).

Thanks a lot for your help,
Cheers,
Luca
 
Carlos Nepomuceno

Pipes usually consume disk storage at '/tmp'. Are you sure you have enough room on that filesystem? Make sure no other processes are competing for that space. Just my 50c, because I don't know what's causing Errno 0. I don't even know what the possible causes of such an error are. Good luck!

----------------------------------------
Date: Sun, 26 May 2013 16:58:57 -0700
Subject: Re: Piping processes works with 'shell = True' but not otherwise.
From: (e-mail address removed)
To: (e-mail address removed) [...]
> I tried to redirect stdin to /dev/null using the Popen argument:
> 'stdin = os.path.devnull' (having imported os of course)..
> But this seemed to cause even more troubles...
>
>> Lastly, you may want to consider using a wrapper library such as http://plumbum.readthedocs.org/en/latest/ , which makes it easier to do pipelining and other such "fancy" things with subprocesses, while still avoiding the many perils of the shell.
>
> Thanks, I didn't know this library, I'll give it a try.
> Though I forgot to mention that I was using the subprocess module, because I want the code to be portable (even though for now if it works in Unix platform is OK).
>
> Thanks a lot for your help,
> Cheers,
> Luca
 
Luca Cerone

> Will it violate privacy / NDA to post the command line? Even if we
> can't actually replicate your system, we may be able to see something
> from the commands given.

Unfortunately yes..
 
Chris Rebert

> Hi Chris, first of all thanks for the help. Unfortunately I can't provide the actual commands because they are tools that are not publicly available.
> I think I get the tokenization right, though.. the problem is not that the programs don't run.. it is just that sometimes I get that error..
>
> Just to be clear I run the process like:
>
> p = subprocess.Popen(['program1','--opt1','val1',...'--optn','valn'], ... the rest)
>
> which I think is the right way to pass arguments (it works fine for other commands)..
>
>> You may also want to provide /dev/null as p1's stdin, out of an abundance of caution.
>
> I tried to redirect stdin to /dev/null using the Popen argument:
> 'stdin = os.path.devnull' (having imported os of course)..
> But this seemed to cause even more troubles...

That's because stdin/stdout/stderr take file descriptors or file
objects, not path strings.
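
As a minimal sketch of that point: open the null device and hand the resulting file object to Popen. Passing the path string itself typically fails with an AttributeError, because a string has no fileno() method. The helper name and the use of cat as the child command are assumptions for illustration only.

```python
import os
from subprocess import Popen, PIPE

def run_with_devnull_stdin(cmd):
    """Run cmd with its stdin connected to the null device."""
    # Pass the opened file object, not the os.devnull path string.
    with open(os.devnull, 'rb') as devnull:
        p = Popen(cmd, stdin=devnull, stdout=PIPE)
        out = p.communicate()[0]
    return out, p.returncode
```

For example, run_with_devnull_stdin(['cat']) returns immediately, since cat sees end-of-file on its first read.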

Cheers,
Chris
 
Thomas Rachel

On 27.05.2013 at 02:14, Carlos Nepomuceno wrote:
> pipes usually consumes disk storage at '/tmp'.

Good that my pipes don't know about that.

Why should that happen?


Thomas
 
Carlos Nepomuceno

----------------------------------------
From: (e-mail address removed)
Subject: Re: Piping processes works with 'shell = True' but not otherwise.
Date: Wed, 29 May 2013 19:39:40 +0200
To: (e-mail address removed)

> On 27.05.2013 at 02:14, Carlos Nepomuceno wrote:
>
> Good that my pipes don't know about that.
>
> Why should that happen?
>
> Thomas

Oops! My mistake! We've been using 'tee' when in debugging mode and I thought that would apply to this case. Never mind!
 
Cameron Simpson

| On 27.05.2013 at 02:14, Carlos Nepomuceno wrote:
| >pipes usually consumes disk storage at '/tmp'.
|
| Good that my pipes don't know about that.
| Why should that happen?

It probably doesn't on anything modern. On V7 UNIX at least there
was a kernel notion of the "pipe fs", where pipe storage existed;
usually /tmp; using small real (but unnamed) files is an easy way
to implement them, especially on systems where RAM is very small
and without a paging VM - for example, V7 UNIX ran on PDP-11s amongst
other things. And files need a filesystem.

But even then pipes are still small fixed length buffers; they don't
grow without bound as you might have inferred from the quoted
statement.

Cheers,
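
The point about pipes being fixed-length buffers can be checked with a rough sketch like the one below. It assumes a POSIX system (it relies on fcntl and O_NONBLOCK), and the function name is made up for illustration. It fills an empty pipe from a non-blocking write end and counts the bytes accepted before the kernel refuses more.

```python
import errno
import fcntl
import os

def pipe_capacity():
    """Count how many bytes an empty pipe accepts before a write would block."""
    r, w = os.pipe()
    try:
        # Make the write end non-blocking so a full pipe raises instead of hanging.
        flags = fcntl.fcntl(w, fcntl.F_GETFL)
        fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        total = 0
        while True:
            try:
                total += os.write(w, b'x' * 1024)
            except OSError as exc:
                if exc.errno != errno.EAGAIN:
                    raise
                return total  # the pipe is full
    finally:
        os.close(r)
        os.close(w)
```

The exact figure depends on the platform (often 64 KiB on Linux), but it is finite, which is why a writer blocks once the reader stops draining the pipe.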
 
Luca Cerone

> That's because stdin/stdout/stderr take file descriptors or file
> objects, not path strings.

Thanks Chris, how do I set the file descriptor to /dev/null then?
 
Peter Otten

Luca said:
Thanks Chris, how do I set the file descriptor to /dev/null then?

For example:

with open(os.devnull, "wb") as stderr:
    p = subprocess.Popen(..., stderr=stderr)
    ...


In Python 3.3 and above:

p = subprocess.Popen(..., stderr=subprocess.DEVNULL)
 
Luca Cerone

Thanks, and what about Python 2.7?

> In Python 3.3 and above:
>
> p = subprocess.Popen(..., stderr=subprocess.DEVNULL)

P.S. Sorry for the late reply; I discovered I don't receive notifications from Google Groups.
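
On Python 2.7, where subprocess.DEVNULL is not available, opening os.devnull explicitly should work. The sketch below is one possible way to combine that with the two-process pipeline from the first post. The function name and the commands in the usage note are assumptions for illustration, and this is not claimed to cure the intermittent IOError, whose cause was never established in this thread.

```python
import os
from subprocess import Popen, PIPE

def pipeline_lines(cmd1, cmd2):
    """Yield the output lines of `cmd1 | cmd2`, discarding stderr of both."""
    # Plain file objects on the null device work on Python 2.7 and 3.x alike.
    with open(os.devnull, 'rb') as null_in, open(os.devnull, 'wb') as null_err:
        p1 = Popen(cmd1, stdin=null_in, stdout=PIPE, stderr=null_err)
        p2 = Popen(cmd2, stdin=p1.stdout, stdout=PIPE, stderr=null_err)
        p1.stdout.close()  # lets p1 receive SIGPIPE if p2 exits early
        try:
            for line in p2.stdout:
                yield line
        finally:
            # Reap both children so no zombie processes are left behind.
            p2.stdout.close()
            p2.wait()
            p1.wait()
```

A caller can then iterate lazily, for example `for line in pipeline_lines(['program1'], ['program2']): ...`, which keeps memory use low when the output is very large.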
 