porting shell scripts: system(list), system_pipe(lists)

eichin

One of my recent projects has involved taking an accretion of sh and
perl scripts and "doing them right" - making them modular, improving
the error reporting, making it easier to add even more features to
them. "Of course," I'm redoing them in python - much of the cut&paste
reuse has become common functions, which then get made more robust and
have a common style and are callable from other (python) tools
directly, instead of having to exec scripts to get at them. The usual
"glorious refactoring."

Most of it has been great - os.listdir+i.endswith() instead of
globbing, exception handling instead of "exit 1", that sort of thing.
I've run into one weakness, though: executing programs.

Python has, of course, os.fork and os.exec* corresponding to the raw
unix functions. It also has the higher level os.system, popen,
expect, and commands.get* functions. The former need a bunch of
stylized operations performed; the latter *all* involve passing in
strings which then leads one to quoting issues, which can be serious
risks in some applications.

Perl had one very helpful interface for this kind of thing: system and
exec will both take array arguments:

    $ perl -e 'system("echo", "*")'
    *
    $ perl -e 'exec("echo", "*")'
    *

versus

    $ perl -e 'exec("echo *")'
    #.newsrc-dribble# CVS stuff ...

This has always struck me as "correct" - not the overloading,
necessarily, but the use of a list.
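
For comparison, here is a minimal sketch of the same list-versus-string distinction using Python's standard subprocess module. It assumes a POSIX system with echo on the PATH; the helper name run_argv is made up for this illustration and does not come from the post.

```python
import subprocess


def run_argv(argv):
    """Run argv directly (no shell involved) and return its stdout as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # List form: "*" reaches echo untouched because no shell ever parses it.
    print(run_argv(["echo", "*"]), end="")
    # String form with shell=True: /bin/sh parses the string and expands "*".
    shell_run = subprocess.run("echo *", shell=True, capture_output=True, text=True)
    print(shell_run.stdout, end="")
```

The list form has nothing to quote, since each element is handed to the program as one argument.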

So, implementing system this way is easy enough:

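    import os
    import traceback

    # ExecFailed is not shown in the post; a plain Exception subclass
    # is assumed here.
    class ExecFailed(Exception):
        pass

    def system(cmd):
        pid = os.fork()
        if pid > 0:
            # Parent: block until the child finishes (options=0).
            p, st = os.waitpid(pid, 0)
            if st == 0:
                return
            raise ExecFailed(str(cmd), st)
        elif pid == 0:
            try:
                os.execvp(cmd[0], cmd)
            except OSError:
                traceback.print_exc()
            # Reached only if the exec failed: leave the child at once.
            os._exit(113)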

[The try/except is an interesting issue: if cmd[0] isn't found,
os.execvp throws -- but it is already in the child, and this walks up
the stack to any surrounding try/except, which then continues,
possibly disastrously, whatever that code had been doing *in a
duplicate process*. The _exit explicitly short cuts this.]
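
The st value that ends up in the exception is a raw wait status word, not an exit code. As a rough sketch of how it could be decoded (assuming a Unix system; describe_status and run_and_describe are hypothetical names, and 113 is simply the exit code used in the post), the os.WIF* helpers do the unpacking:

```python
import os


def describe_status(st):
    """Turn a raw os.waitpid() status word into a readable string."""
    if os.WIFEXITED(st):
        return "exited with code %d" % os.WEXITSTATUS(st)
    if os.WIFSIGNALED(st):
        return "killed by signal %d" % os.WTERMSIG(st)
    return "unrecognized status %d" % st


def run_and_describe(cmd):
    """Fork, exec cmd (an argument list), wait, and describe the outcome."""
    pid = os.fork()
    if pid == 0:
        # Child: exec never returns on success. If it raises, the finally
        # clause exits immediately so no caller's handler runs in the child.
        try:
            os.execvp(cmd[0], cmd)
        finally:
            os._exit(113)
    _, st = os.waitpid(pid, 0)
    return describe_status(st)
```

With this, a command that cannot be found shows up in the parent as "exited with code 113", which is how the parent can tell a failed exec apart from ordinary failures (as long as no real program uses that code).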

So, this makes a big difference when porting simple bits of shell (and
usually, just in passing, fixing quoting bugs - if you had code that
used to do "ci -l $foo" and it is now "system(['ci', '-l', foo])"
you now properly handle spaces and punctuation in the value of foo,
"for free".) However, the other thing you tend to find in
"advanced"[1] shell scripts is lengthy pipelines. (Sure, you find
while loops and case statements and such - but python's control
structures handle those fine.)

Implementing pipelines takes rather a bit more work, and one might
(not unreasonably) throw up one's hands and just use os.system and
some re.sub's to do the quoting. However, I had enough cases where
the goal really was to run a complex shell pipeline (I also had cases
where the pipeline converted nicely to some inline python code,
especially with the help of the gzip module) that I sat down and
cooked up a pipeline class.

The interface I ended up with is pretty simple:

    g_pipe = pipeline()
    g_pipe.stdin(open("blort.gz", "r"))
    g_pipe.append(["gunzip"])
    g_pipe.append(["sort", "-u"])
    g_pipe.append(["wc", "-l"])
    g_pipe.stdout(open("blort.count", "w"))
    print g_pipe.run()

is equivalent to the sh:

    gunzip < blort.gz | sort -u | wc -l > blort.count

pipeline also has obvious stderr and chdir methods; pipeline.run
actually returns an array with the return status of *each* pipeline
element (which leads to "if filter(None, st): deal_with_error" being a
useful idiom for noticing failures that a shell script would typically
miss.)
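
The class itself is not included in the post. The following is only a sketch of how an interface like the one described might be built on subprocess.Popen, not the author's implementation; the class name Pipeline and its internals are assumptions, and the stderr and chdir methods are left out for brevity.

```python
import subprocess


class Pipeline:
    """Hypothetical stand-in for the pipeline class described above."""

    def __init__(self):
        self._cmds = []       # one argument list per pipeline stage
        self._stdin = None    # None means: inherit from this process
        self._stdout = None

    def stdin(self, fileobj):
        self._stdin = fileobj

    def stdout(self, fileobj):
        self._stdout = fileobj

    def append(self, argv):
        self._cmds.append(list(argv))

    def run(self):
        """Start every stage, then return one exit status per stage."""
        procs = []
        source = self._stdin
        for index, argv in enumerate(self._cmds):
            is_last = index == len(self._cmds) - 1
            sink = self._stdout if is_last else subprocess.PIPE
            proc = subprocess.Popen(argv, stdin=source, stdout=sink)
            if procs:
                # Drop our copy of the upstream read end; only the new
                # stage should hold it, so writers see a closed pipe if
                # the reader exits early.
                procs[-1].stdout.close()
            source = proc.stdout
            procs.append(proc)
        return [proc.wait() for proc in procs]
```

On a current Python, any(statuses) is the natural spelling of the failure check, because filter() now returns a lazy iterator that is always true.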

This has led me to a few questions:

1. Am I being dense? Are there already common modules (included or
otherwise) that do this, or solve the problem some other way?
2. Is there a more pythonic way of expressing the construction?
Would exposing the internal array of commands make more sense,
possibly by "passing through" various array operations on the
class to the internal array (as the use of "append" hints at)? Or
maybe "exec" objects that a "pipe" combiner operates on?
3. Should an interface like this be in a "battery" somewhere? shutil
didn't seem to quite match...
4. Any reason to even try porting this interface to non-unix systems?
Is there a close enough match to os.pipe/os.fork/os.exec/os.wait,
or some other construct that works on microsoft platforms?

_Mark_ <[email protected]>

[1] in the Invader Zim sense :)
 
Donald 'Paddy' McCarthy

I think that your pipeline code looks nothing like the original sh
script pipeline, which to me counts heavily against it.
Just playing at the cygwin prompt...

    $ ls -l|wc -l > /tmp/lines_in_dir
    $ cat /tmp/lines_in_dir
    465
    $ python 463
    0

I prefer the above because it looks like the original sh command.
Of course, if script security is very important then you may want to
change the way things are implemented again.
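
The Python half of the session did not survive intact in this copy, so the following is only a guess at the style being advocated: hand the shell a command string that looks like the original one-liner, and quote whatever gets interpolated into it. The function name count_lines_via_shell is hypothetical; shlex.quote and subprocess.run are standard library calls.

```python
import shlex
import subprocess


def count_lines_via_shell(path):
    """Run `wc -l < path` through /bin/sh, quoting the one variable part."""
    command = "wc -l < " + shlex.quote(path)
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())
```

The command still reads like shell, but every interpolated value has to pass through shlex.quote, which is the bookkeeping the list-based interface avoids.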

Cheers, Paddy.
 
Donn Cave

Quoth (e-mail address removed):
....
| 1. Am I being dense? Are there already common modules (included or
| otherwise) that do this, or solve the problem some other way?

I can't tell you whether any of them has come to be common, but
there have been a handful of efforts along these lines - process
and pipeline creation.

| 2. Is there a more pythonic way of expressing the construction?
| Would exposing the internal array of commands make more sense,
| possibly by "passing through" various array operations on the
| class to the internal array (as the use of "append" hints at)? Or
| maybe "exec" objects that a "pipe" combiner operates on?

Only thing that comes to mind is error handling. It certainly is
not characteristic of Python functions to return an error status,
rather they typically raise exceptions. Ideally, I would think
the exception type for this would carry the exit status, other
information in the status word, and text from error/diagnostic
output. That last one is particularly important and particularly
awkward to get.
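
As a rough sketch of what such an exception could look like on a current Python (the names CommandError and run_checked are made up for this illustration; subprocess does the awkward part of collecting the diagnostic output):

```python
import subprocess


class CommandError(Exception):
    """Hypothetical exception carrying exit status, signal, and stderr text."""

    def __init__(self, argv, returncode, stderr):
        self.argv = argv
        # subprocess reports death by signal N as a return code of -N.
        self.exit_status = returncode if returncode >= 0 else None
        self.signal = -returncode if returncode < 0 else None
        self.stderr = stderr
        super().__init__(
            "%r failed with status %d: %s" % (argv, returncode, stderr.strip())
        )


def run_checked(argv):
    """Run argv without a shell; return stdout or raise CommandError."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        raise CommandError(argv, proc.returncode, proc.stderr)
    return proc.stdout
```

A caller can then catch CommandError and look at exit_status, signal, and stderr instead of checking a return value.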

See appended example for a trick to deal with the special case
where a Python exception is caught in the fork.

| 3. Should an interface like this be in a "battery" somewhere? shutil
| didn't seem to quite match...

No one ever likes anyone else's version of this, so it's typically
reinvented as required.

| 4. Any reason to even try porting this interface to non-unix systems?
| Is there a close enough match to os.pipe/os.fork/os.exec/os.wait,
| or some other construct that works on microsoft platforms?

There's os.spawnv, if you haven't noticed that.
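
A minimal sketch of that call: os.spawnv takes an argument list rather than a shell string, needs the full path to the program (it does not search PATH), and with os.P_WAIT it returns the exit code once the program finishes. The wrapper name spawn_and_wait is hypothetical.

```python
import os
import sys


def spawn_and_wait(path, args):
    """Run the program at `path` with argument list `args`; return its exit code."""
    return os.spawnv(os.P_WAIT, path, args)


if __name__ == "__main__":
    # Use the running interpreter as a program that is known to exist.
    code = spawn_and_wait(sys.executable, [sys.executable, "-c", "print('hello')"])
    print("exit code:", code)
```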

Donn Cave, (e-mail address removed)
-----------
    import fcntl
    import posix
    import sys
    import pickle

    def spawn_wnw(wait, file, args, env):
        # The pipe carries a pickled exception back from the child if
        # the exec fails; after a successful exec it just reads as EOF.
        p0, p1 = posix.pipe()
        pid = posix.fork()
        if pid:
            posix.close(p1)
            ps = posix.read(p0, 1024)
            posix.close(p0)
            if wait:
                junk, ret = posix.waitpid(pid, 0)
            else:
                ret = pid
            if ps:
                # Re-raise the child's exception in the parent.
                e, v = pickle.loads(ps)
                raise v
            else:
                return ret
        else:
            try:
                # Close-on-exec: a successful exec closes the write end.
                fcntl.fcntl(p1, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
                posix.close(p0)
                posix.execve(file, args, env)
            except:
                e, v, t = sys.exc_info()
                s = pickle.dumps((e, v))
                posix.write(p1, s)
                posix._exit(117)

    def spawnw(file, args, env):
        # Wait for the child and return its wait status.
        return spawn_wnw(1, file, args, env)

    def spawn(file, args, env):
        # Do not wait; return the child's pid.
        return spawn_wnw(0, file, args, env)

    pid = spawn('/bin/bummer', ['bummer', '-ever', 'summer'], posix.environ)
 
