ANN: Sarge, a library wrapping the subprocess module, has been released.


Vinay Sajip

Sarge, a cross-platform library which wraps the subprocess module in
the standard library, has been released.

What does it do?
----------------

Sarge tries to make interfacing with external programs from your
Python applications easier than just using subprocess alone.

Sarge offers the following features:

* A simple way to run command lines which allows a rich subset of Bash-
style shell command syntax, but parsed and run by sarge so that you
can run on Windows without cygwin (subject to having those commands
available):
...
'foo\n'
'bar\n'

* The ability to format shell commands with placeholders, such that
variables are quoted to prevent shell injection attacks.

* The ability to capture output streams without requiring you to
program your own threads. You just use a Capture object and then you
can read from it as and when you want.
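
To illustrate the placeholder idea with nothing but the standard library, here is a minimal sketch that fills {0}-style placeholders with shell-quoted values. It is not sarge's implementation: format_command is a hypothetical helper, and shlex.quote (Python 3.3+) follows POSIX shell quoting rules, so it does not cover cmd.exe.

```python
import shlex


def format_command(template, *args):
    """Fill {0}, {1}, ... placeholders with shell-quoted arguments.

    Hypothetical helper for illustration only; not part of sarge.
    """
    # shlex.quote wraps a value in single quotes whenever it contains
    # characters that a POSIX shell would otherwise interpret.
    return template.format(*(shlex.quote(str(arg)) for arg in args))


# A hostile "filename" stays a single, inert argument.
print(format_command("ls {0}", "docs; rm -rf /"))
# A harmless value is passed through unchanged.
print(format_command("echo {0}", "plain"))
```

The point is only that quoting happens at formatting time, so the caller never concatenates raw user input into a command line.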

Advantages over subprocess
---------------------------

Sarge offers the following benefits compared to using subprocess:

* The API is very simple.

* It's easier to use command pipelines - using subprocess out of the
box often leads to deadlocks because pipe buffers get filled up.

* It would be nice to use Bash-style pipe syntax on Windows, but
Windows shells don't support some of the syntax which is useful, like
&&, ||, |& and so on. Sarge gives you that functionality on Windows,
without cygwin.

* Sometimes, subprocess.Popen.communicate() is not flexible enough for
one's needs - for example, when one needs to process output a line at
a time without buffering the entire output in memory.

* It's desirable to avoid shell injection problems by having the
ability to quote command arguments safely.

* subprocess allows you to let stderr be the same as stdout, but not
the other way around - and sometimes, you need to do that.
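
As a rough illustration of the line-at-a-time point above, here is a minimal sketch using only subprocess. iter_lines is a hypothetical helper, not part of sarge, and the child is a Python one-liner standing in for a real external program. It reads a single pipe only; reading stdout and stderr together this way is where the pipe-buffer deadlocks mentioned above tend to appear.

```python
import subprocess
import sys

# Stand-in for an external program: prints three lines and exits.
child = [sys.executable, "-c", "for i in range(3): print('line', i)"]


def iter_lines(cmd):
    """Yield a command's stdout one line at a time, as it arrives."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True)
    try:
        # Iterating over the pipe avoids holding the whole output in memory,
        # unlike communicate(), which returns everything at once.
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()


lines = list(iter_lines(child))
print(lines)
```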

Python version and platform compatibility
-----------------------------------------

Sarge is intended to be used on any Python version >= 2.6 and is
tested on Python versions 2.6, 2.7, 3.1, 3.2 and 3.3 on Linux,
Windows, and Mac OS X (not all versions are tested on all platforms,
but sarge is expected to work correctly on all these versions on all
these platforms).

Finding out more
----------------

You can read the documentation at

http://sarge.readthedocs.org/

There's a lot more information, with examples, than I can put into
this post.

You can install Sarge using "pip install sarge" to try it out. The
project is hosted on BitBucket at

https://bitbucket.org/vinay.sajip/sarge/

And you can leave feedback on the issue tracker there.

I hope you find Sarge useful!

Regards,


Vinay Sajip
 

Anh Hai Trinh

Having written something with similar purpose (https://github.com/aht/extproc), here are my comments:

* Parsing commands from a string is complicated. Why not just have an OOP API to construct commands? extproc does this, but you opted to write a recursive descent parser. I'm sure it's fun, but I think simple is better than complex. Most users would prefer to deal with Python, not another language.

* Threads and fork()ing processes do not play nicely together unless extreme care is taken. Disasters await. For a shell-like library, I would recommend that its users never use threads (so that those who do otherwise know what they are in for).
 
Vinay Sajip

Having written something with similar purpose (https://github.com/aht/extproc), here are my comments:

* Having command parsed from a string is complicated. Why not just have an OOP API to construct commands?

It's not hard for the user, and less work e.g. when migrating from an
existing Bash script. I may have put in the effort to use a recursive
descent parser under the hood, but why should the user of the library
care? It doesn't make their life harder. And it's not complicated, not
even particularly complex - such parsers are commonplace.

* Using threads and fork()ing process does not play nice together unless extreme care is taken. Disasters await.

By that token, disasters await if you ever use threads, unless you
know what you're doing (and sometimes even then). Sarge doesn't force
the use of threads with forking - you can do everything synchronously
if you want. The test suite does cover the particular case of
thread+fork. Do you have specific caveats, or is it just a "there be
dragons" sentiment? Sarge is still in alpha status; no doubt bugs will
surface, but unless a real show-stopper occurs, there's not much to be
gained by throwing up our hands.

BTW extproc is nice, but I wanted to push the envelope a little :)

Regards,

Vinay Sajip
 
Anh Hai Trinh

It's not hard for the user

I think most users like to use Python, or they'd use Bash. I think people would prefer not to have another language that is different from both and has little benefit. My own opinion of course.

Re. threads & fork(): http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them

For a careful impl of fork-exec with threads, see http://golang.org/src/pkg/syscall/exec_unix.go

By that token, disasters await if you ever use threads, unless you know what you're doing

So don't. This package is mainly a fork-exec-wait library providing shell-like functionality. Just use fork().

BTW extproc is nice, but I wanted to push the envelope a little :)

Hmm, if the extra "envelope" is the async code with threads that may deadlock, I would say "thanks but no thanks" :p

I do think that IO redirection is much nicer with extproc.
 
Anh Hai Trinh

For a careful impl of fork-exec with threads, see http://golang.org/src/pkg/syscall/exec_unix.go

I forgot to mention that this impl is indeed "correct" only because you cannot start a thread or call fork() directly in the Go language, other than by using goroutines and the ForkExec() function implemented there. So all that locking is internal.

If you use threads and call fork(), you're almost guaranteed to face deadlocks. Perhaps not in a particular piece of code, but in some other. Perhaps not on your laptop, but on the production machine with a different kernel. Like most race conditions, they will eventually show up.
 
Vinay Sajip

I think most users like to use Python, or they'd use Bash. I think people would prefer not to have another language that is different from both and has little benefit. My own opinion of course.

I have looked at pbs and clom: they Pythonify calls to external
programs by making spawning those look like function calls. There's
nothing wrong with that, it's just a matter of taste. I find that e.g.

wc(ls("/etc", "-1"), "-l")

is not as readable as

call("ls /etc -1 | wc -l")

and the attempt to Pythonify doesn't buy you much, IMO. Of course, it
is a matter of taste - I understand that there are people who will
prefer the pbs/clom way of doing things.
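
For comparison, this is roughly what a two-stage pipeline looks like when built by hand with subprocess, following the pattern shown in the standard library docs. The two stages are Python one-liners (assumed stand-ins for ls and wc -l) so that the sketch does not depend on which commands are installed.

```python
import subprocess
import sys

# Stage 1: stand-in for "ls", prints three names, one per line.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('a'); print('b'); print('c')"],
    stdout=subprocess.PIPE,
)

# Stage 2: stand-in for "wc -l", counts the lines arriving on stdin.
counter = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(len(sys.stdin.readlines()))"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)

# Close our copy of the pipe so the producer sees a broken pipe
# if the counter exits early.
producer.stdout.close()

output, _ = counter.communicate()
producer.wait()
count = int(output)
print(count)
```

Whether the string form or the explicit form reads better is exactly the matter of taste under discussion; the sketch only shows how much plumbing the string form hides.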

Re. threads & fork(): http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-befo...

For a careful impl of fork-exec with threads, see http://golang.org/src/pkg/syscall/exec_unix.go

Thanks for the links. The first seems to me to be talking about the
dangers of locking and forking; if you don't use threads, you don't
need locks, so the discussion about locking only really applies in a
threading+forking scenario.

I agree that locking+forking can be problematic because of the semantics
of what happens to the state of the locks and threads in the child
(for example, as mentioned in http://bugs.python.org/issue6721).
However, it's not clear that any problem occurs if the child just
execs a new program, overwriting the old - which is the case here. The
link you pointed to says that

"It seems that calling execve(2) to start another program is the only
sane reason you would like to call fork(2) in a multi-threaded
program."

which is what we're doing in this case. Even though it goes on to
mention the dangers inherent in inherited file handles, it also
mentions that these problems have been overcome in recent Linux
kernels, and the subprocess module does contain code to handle at
least some of these conditions (e.g. preexec_fn, close_fds keyword
arguments to subprocess.Popen).
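
As a small sketch of the scenario being discussed (several threads each spawning a child that immediately runs a new program), assuming a reasonably recent Python 3; on Python versions before 3.7, Windows rejected close_fds=True together with redirected streams. The worker function and the one-liner child are made up for illustration; this demonstrates the pattern, and is not evidence that it is free of platform-specific races.

```python
import subprocess
import sys
import threading

results = []


def worker():
    # close_fds=True asks Popen to close inherited file descriptors
    # (other than stdin/stdout/stderr) in the child before the new
    # program starts running.
    out = subprocess.check_output(
        [sys.executable, "-c", "print('child ran')"],
        close_fds=True,
        universal_newlines=True,
    )
    results.append(out.strip())


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```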

Hopefully, if there are race conditions which emerge in the subprocess
code (as has happened in the past), they will be fixed (as has
happened in the past).

Hmm, if the extra "envelope" is the async code with threads that may deadlock, I would say "thanks but no thanks" :p

That is of course your privilege. I would hardly expect you to drop
extproc in favour of sarge. But there might be people who need to
tread in these dangerous waters, and hopefully sarge will make things
easier for them. As I said earlier, one doesn't *need* to use
asynchronous calls.

I agree that I may have to review the design decisions I've made,
in light of feedback from people actually trying the async
functionality out. I don't feel that shying away from difficult
problems without even trying to solve them is the best way of moving
things forward. What are the outcomes?

* Maybe people won't even try the async functionality (in which case,
they won't hit problems)

* They'll hit problems and just give up on the library (I hope not -
if I ever have a problem with a library I want to use, I always try
and engage with the developers to find a workaround or fix)

* They'll report problems which, on investigation, will turn out to be
fixable bugs - well and good

* The reported bugs will be unfixable for some reason, in which case
I'll just have to deprecate that functionality.

Remember, this is version 0.1 of the library, not version 1.0. I
expect to do some API and functionality tweaks based on feedback and
bugs which show up.

I do think that IO redirection is much nicer with extproc.

Again, a matter of taste. You feel that it's better to pass dicts
around in the public API where integer file handles map to other
handles or streams; I feel that using a Capture instance is less
fiddly for the user. Let a thousand flowers bloom, and all that.

I do thank you for the time you've taken to make these comments, and I
found the reading you pointed me to interesting. I will update the
sarge docs to point to the link on the Linux Programming blog, to make
sure people are informed of potential pitfalls.

Regards,

Vinay Sajip
 
Vinay Sajip

If you use threads and call fork(), you're almost guaranteed to face deadlocks. Perhaps not in a particular piece of code, but in some other. Perhaps not on your laptop, but on the production machine with a different kernel. Like most race conditions, they will eventually show up.

You can hit deadlocks in multi-threaded programs even without the
fork(), can't you? In that situation, you either pin it down to a bug
in your code (and even developers experienced in writing multi-
threaded programs hit these), or a bug in the underlying library
(which can hopefully be fixed, but that applies to any bug you might
hit in any library you use, and is something you have to consider
whenever you use a library written by someone else), or an unfixable
problem (e.g. due to problems in the Python or C runtime) which
requires a different approach. I understand your concerns, but you are
just a little further along the line from people who say "If you use
threads, you will have deadlock problems. Don't use threads." I'm not
knocking that POV - people need to use what they're comfortable with,
and to avoid things that make them uncomfortable. I'm not pushing the
async feature as a major advantage of the library - it's still useful
without that, IMO.

Regards,

Vinay Sajip
 
Anh Hai Trinh

I have looked at pbs and clom: they Pythonify calls to external
programs by making spawning those look like function calls. There's
nothing wrong with that, it's just a matter of taste. I find that e.g.

wc(ls("/etc", "-1"), "-l")

is not as readable as

call("ls /etc -1 | wc -l")

I don't disagree with it. But the solution is really easy: just call 'sh' and pass it a string!

No parser needs to be written!

Yes there is a danger of argument parsing and globs and all that. But people are aware of it. With string parsing, ambiguity is always there. Even when you have a BNF grammar, people easily make mistakes.
 
Rick Johnson

I think most users like to use Python, or they'd use Bash. I think people would prefer not to have another language that is different from both and has little benefit. My own opinion of course.

Objection! Does the defense REALLY expect this court to believe that
he can testify as to how MOST members of the Python community would or
would not favor bash over Python? And IF they do in fact prefer bash,
is this display of haughty arrogance nothing more than a hastily
stuffed straw-man presented to protect his own ego?

Hmm, if the extra "envelope" is the async code with threads that may deadlock, I would say "thanks but no thanks" :p

And why do you need to voice such strong opinions of disdain in an
announcement thread? Testing the integrity of a module (or module)
author is fine so long as we are respectful whilst doing so. However,
i must take exception with your crass attitude.
 
Rick Johnson

wc(ls("/etc", "-1"), "-l")

is not as readable as

call("ls /etc -1 | wc -l")

And i agree!

I remember a case where i was forced to use an idiotic API for
creating inputbox dialogs. Something like this:

prompts = ['Height', 'Width', 'Color']
values = [10, 20, Null]
options = [Null, Null, "Red|White|Blue"]
dlg(prompts, values, options)

....and as you can see this is truly asinine!

Later, someone "slightly more intelligent" wrapped this interface up
like this:

dlg = Ipb("Title")
dlg.add("Height")
dlg.add("Width", 39)
dlg.add("Color", ["Red", "White", "Blue"])
dlg.show()

....and whilst i prefer this interface over the original, i knew we
could make it better; because we had the technology!

dlg = Ipb(
    "Title",
    "Height=10",
    "Width=20",
    "Color=Red|Green|Blue",
)

Ahh... refreshing as a cold brew!
 
Anh Hai Trinh

Objection! Does the defense REALLY expect this court to believe that
he can testify as to how MOST members of the Python community would or
would not favor bash over Python? And IF they do in fact prefer bash,
is this display of haughty arrogance nothing more than a hastily
stuffed straw-man presented to protect his own ego?

Double objection! Relevance. The point is that the OP created another language that is neither Python nor Bash.

And why do you need to voice such strong opinions of disdain in an
announcement thread? Testing the integrity of a module (or module)
author is fine so long as we are respectful whilst doing so. However,
i must take exception with your crass attitude.

My respectful opinion is that the OP's approach is fundamentally flawed. There are many platform-specific issues when forking and threading are fused. My benign intent was to warn others about unsolved problems and scratching-your-head situations.

Obviously, the OP can always choose to continue his direction at his own discretion.
 
Vinay Sajip

I don't disagree with it. But the solution is really easy: just call 'sh' and pass it a string!

No parser needs to be written!

Yes there is a danger of argument parsing and globs and all that. But people are aware of it. With string parsing, ambiguity is always there. Even when you have a BNF grammar, people easily make mistakes.

You're missing a few points:

* The parser is *already* written, so there's no point worrying about
saving any effort.
* Your solution is to pass shell=True, which as you point out, can
lead to shell injection problems. To say "people are aware of it" is
glossing over it a bit - how come you don't say that when it comes to
locking+forking? ;-)
* I'm aiming to offer cross-platform functionality across Windows and
Posix. Your approach will require a lower common denominator, since
the Windows shell (cmd.exe) is not as flexible as Bash. For example -
no "echo foo; echo bar"; no "a && b" or "a || b" - these are not
supported by cmd.exe.
* Your comment about people making mistakes applies just as much if
someone passes a string with a Bash syntax error, to Bash, via your
sh() function. After all, Bash contains a parser, too. For instance:
/bin/sh: Syntax error: redirection unexpected
''

If you're saying there might be bugs in the parser, that's something
else - I'll address those as and when they turn up.

Regards,

Vinay Sajip
 
Vinay Sajip

Double objection! Relevance. The point is that the OP created another language that is neither Python nor Bash.

Triple objection! I think Rick's point was only that he didn't think
you were expressing the views of "most" people, which sort of came
across in your post.

To say I've created "another language" is misleading - it's just a
subset of Bash syntax, so you can do things like "echo foo; echo bar",
use "&&", "||" etc. (I used the Bash man page as my guide when
designing the parser.)

As an experiment on Windows, in a virtualenv, with GnuWin32 installed
on the path:

(venv) C:\temp>python
ActivePython 2.6.6.17 (ActiveState Software Inc.) based on
Python 2.6.6 (r266:84292, Nov 24 2010, 09:16:51) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\temp\venv\lib\site-packages\extproc.py", line 412, in sh
    f = Sh(cmd, fd=fd, e=e, cd=cd).capture(1).stdout
  File "C:\temp\venv\lib\site-packages\extproc.py", line 202, in capture
    p = subprocess.Popen(self.cmd, cwd=self.cd, env=self.env,
        stdin=self.fd[0],
        stdout=self.fd[1], stderr=self.fd[2])
  File "C:\Python26\Lib\subprocess.py", line 623, in __init__
    errread, errwrite)
  File "C:\Python26\Lib\subprocess.py", line 833, in _execute_child
    startupinfo)
WindowsError: [Error 3] The system cannot find the path specified

That's all from a single interactive session. So as you can see, my
use cases are a little different to yours, which in turn makes a
different approach reasonable.

My respectful opinion is that the OP's approach is fundamentally flawed. There are many platform-specific issues when forking and threading are fused. My benign intent was to warn others about unsolved problems and scratching-your-head situations.

Obviously, the OP can always choose to continue his direction at his own discretion.

I think you were right to bring up the forking+threading issue, but I
have addressed the points you made in this thread - please feel free
to respond to the points I made about the Linux Programming Blog
article. I've updated the sarge docs to point to that article, and
I've added a section on API stability to highlight the fact that the
library is in alpha status and that API changes may be needed based on
feedback.

I'm not being blasé about the issue - it's just that I don't want to
be too timid, either. Python does not proscribe using subprocess and
threads together, and the issues you mention could easily occur even
without the use of sarge. You might say that sarge makes it more
likely that the issues will surface - but it'll only do that if you
pass "a & b & c & d" to sarge, and not otherwise.

The other use of threads by sarge - to read output streams from child
processes - is no different from the stdlib usage of threads in
subprocess.Popen.communicate().
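
To make the comparison concrete, here is a minimal sketch of the general technique: one background thread per pipe, each draining its stream so the child never blocks on a full pipe buffer. LineCapture is a hypothetical class written for this post's illustration; it is not sarge's Capture and not the stdlib's internal code.

```python
import subprocess
import sys
import threading


class LineCapture(object):
    """Drain a stream on a background thread, keeping its lines.

    Hypothetical illustration only; not sarge's Capture class.
    """

    def __init__(self, stream):
        self.lines = []
        self._thread = threading.Thread(target=self._drain, args=(stream,))
        self._thread.daemon = True
        self._thread.start()

    def _drain(self, stream):
        # Reading continuously keeps the pipe buffer from filling up.
        for line in stream:
            self.lines.append(line)
        stream.close()

    def wait(self):
        self._thread.join()


# Stand-in child that writes one line to each of stdout and stderr.
proc = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; print('out'); sys.stderr.write('err\\n')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
out = LineCapture(proc.stdout)
err = LineCapture(proc.stderr)
proc.wait()
out.wait()
err.wait()
print(out.lines, err.lines)
```

Because both pipes are drained concurrently, the caller can wait on the process without the stdout/stderr ordering problems that arise when reading the two pipes one after the other.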

Possibly Rick was objecting to the tone of your comments, but I
generally disregard any tone that seems confrontational when the
benefit of the doubt can be given - on the Internet, you can never
take for granted, and have to make allowances for, the language style
of your interlocutor ... I think you meant well when you responded,
and I have taken your posts in that spirit.

Regards,

Vinay Sajip
 
Jean-Michel Pichavant

Hi,

Thanks for sharing, I hope this one will be as successful as the logging
module, possibly integrated into a next version of subprocess.
I can't use it though, I'm still using a vintage 2.5 version :-/

JM
 
Vinay Sajip

I can't use it though, I'm still using a vintage 2.5 version :-/

That's a shame. I chose 2.6 as a baseline for this package, because I
need it to work on Python 2.x and 3.x with the same code base and
minimal work, and that meant supporting Unicode literals via "from
__future__ import unicode_literals".

I'm stuck on 2.5 with other projects, so I share your pain :-(

Regards,

Vinay Sajip
 
88888 Dihedral

Check PY2EXE, PYREX and PSYChO. I must use these packages
to release commercial products with my own DLL in C.
 
