How to simulate tar filename substitution across piped subprocess.Popen() calls?

jkn

Hi All
I am trying to build up a set of subprocess.Popen calls to
replicate the effect of a horribly long shell command. I'm not clear
how I can do one part of this and wonder if anyone can advise. I'm on
Linux, fairly obviously.

I have a command which (simplified) is a tar -c command piped through
to xargs:

tar -czvf myfile.tgz -c $MYDIR mysubdir/ | xargs -I '{}' sh -c "test -f $MYDIR/'{}'"

(The full command is more complicated than this; I got it from a shell
guru).

IIUC, when called like this, the two occurrences of '{}' in the xargs
command will get replaced with the file being added to the tarfile.

Also IIUC, I will need two calls to subprocess.Popen() and use
the stdin argument on the second to receive the output from the first.
But how can I achieve the substitution of the '{}' construction across
these two calls?

Apologies if I've made any howlers in this description - it's very
likely...

Cheers
J^n
 
Hans Mulder

That's what 'xargs' will do for you. All you need to do, is invoke
xargs with arguments containing '{}'. I.e., something like:

cmd1 = ['tar', '-czvf', 'myfile.tgz', '-c', mydir, 'mysubdir']
first_process = subprocess.Popen(cmd1, stdout=subprocess.PIPE)

cmd2 = ['xargs', '-I', '{}', 'sh', '-c', "test -f %s/'{}'" % mydir]
second_process = subprocess.Popen(cmd2, stdin=first_process.stdout)

> Apologies if I've made any howlers in this description - it's very
> likely...

I think the second '-c' argument to tar should have been a '-C'.

I'm not sure I understand what the second command is trying to
achieve. On my system, nothing happens, because tar writes the
names of the files it is adding to stderr, so xargs receives no
input at all. If I send the stderr from tar to the stdin of
xargs, then it still doesn't seem to do anything sensible.

Perhaps your real xargs command is more complicated and more
sensible.



Hope this helps,

-- HansM
 
jkn

Hi Hans
thanks a lot for your reply:
> That's what 'xargs' will do for you. All you need to do, is invoke
> xargs with arguments containing '{}'. I.e., something like:
>
> cmd1 = ['tar', '-czvf', 'myfile.tgz', '-c', mydir, 'mysubdir']
> first_process = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
>
> cmd2 = ['xargs', '-I', '{}', 'sh', '-c', "test -f %s/'{}'" % mydir]
> second_process = subprocess.Popen(cmd2, stdin=first_process.stdout)

Hmm - that's pretty much what I've been trying. I will have to
experiment a bit more and post the results in a bit more detail.

> I think the second '-c' argument to tar should have been a '-C'.

You are correct, thanks. Serves me right for typing the simplified
version in by hand. I actually use the equivalent "--directory=..." in
the actual code.

> I'm not sure I understand what the second command is trying to
> achieve. On my system, nothing happens, because tar writes the
> names of the files it is adding to stderr, so xargs receives no
> input at all. If I send the stderr from tar to the stdin of
> xargs, then it still doesn't seem to do anything sensible.

That's interesting ... on my system, and all others that I know about,
the file list goes to stdout.

> Perhaps your real xargs command is more complicated and more
> sensible.

Yes, in fact the output from xargs is piped to a third process. But I
realise this doesn't alter the result of your experiment; the xargs
process should filter a subset of the files being fed to it.

I will experiment a bit more and hopefully post some results. Thanks
in the meantime...

Regards
Jon N
 
jkn

slight followup ...

I have made some progress; for now I'm using Popen.communicate() to
read the output from the first subprocess, then writing it into the
second subprocess. This way I at least get to see what is
happening ...
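
A minimal sketch of that intermediate approach: collect the first command's output with communicate(), then feed it to the second command. The two commands here are stand-ins run through the Python interpreter (an assumption, so the sketch runs without tar or xargs); in this thread they would be the tar and xargs argument lists. The name run_in_two_steps is hypothetical.

```python
import subprocess
import sys


def run_in_two_steps(cmd1, cmd2):
    """Run cmd1, capture its stdout, then feed that to cmd2's stdin."""
    first = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
    first_out, _ = first.communicate()  # waits for cmd1 to finish

    second = subprocess.Popen(cmd2, stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE)
    second_out, _ = second.communicate(input=first_out)
    return first_out, second_out


# Stand-in commands (hypothetical): the first prints two names,
# the second upper-cases whatever it reads.
producer = [sys.executable, "-c", "print('a.txt'); print('b.txt')"]
consumer = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]

listing, result = run_in_two_steps(producer, consumer)
```

Because the intermediate output is held in a Python variable, it can be printed or logged before the second step runs, at the cost of buffering the whole listing in memory.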

The reason 'we' weren't seeing any output from the second call (the
'xargs') is that as mentioned I had simplified this. The actual shell
command was more like (in python-speak):

"xargs -I {} sh -c \"test -f %s/{} && md5sum %s/{}\"" % (mydir, mydir)

i.e. I am running md5sum on each tar-file entry which passes the 'is
this a file' test.

My next problem: how to translate the command-string clause

"test -f %s/{} && md5sum %s/{}" # ...

into a parameter to subprocess.Popen(). I think it's the command
chaining '&&' which is tripping me up...

Cheers
J^n
 
Hans Mulder

It is not really necessary to translate the '&&': you can
just write:

"test -f '%s/{}' && md5sum '%s/{}'" % (mydir, mydir)

, and xargs will pass that to the shell, and then the shell
will interpret the '&&' for you: you have shell=False in your
subprocess.Popen call, but the arguments to xargs are -I {}
sh -c "....", and this means that xargs ends up invoking the
shell (after replacing the {} with the name of a file).

Alternatively, you could translate it as:

"if [ -f '%s/{}' ]; then md5sum '%s/{}'; fi" % (mydir, mydir)

; that might make the intent clearer to whoever gets to
maintain your code.
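
As a sketch of the point above, the whole 'test ... && md5sum ...' clause travels as a single element of the argument list. Python never interprets the '&&'; xargs substitutes the file name for {} and hands the string to sh -c, which does. The function name build_xargs_cmd and the directory used below are hypothetical.

```python
def build_xargs_cmd(mydir):
    """Return the argument list for the xargs stage of the pipeline.

    The shell clause is ONE list element; sh (started by xargs)
    is what interprets the '&&'.
    """
    script = "test -f '%s/{}' && md5sum '%s/{}'" % (mydir, mydir)
    return ["xargs", "-I", "{}", "sh", "-c", script]


cmd = build_xargs_cmd("/tmp/example")  # hypothetical directory
```

The resulting list can be passed to subprocess.Popen() with the default shell=False.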


Hope this helps,

-- HansM
 
Rebelo

On Thursday, 8 November 2012 at 19:05:12 UTC+1, user jkn wrote:
> Hi All
> I am trying to build up a set of subprocess.Popen calls to
> replicate the effect of a horribly long shell command. I'm not clear
> how I can do one part of this and wonder if anyone can advise. I'm on
> Linux, fairly obviously.
>
> J^n

You should try to do it in pure python, avoiding shell altogether.
The first step would be to actually write what it is you want to do.

To filter the files you add to the tar file, check the tarfile module (http://docs.python.org/2/library/tarfile.html?highlight=tar#module-tarfile),
specifically:
TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)
which takes a filter parameter:
"If filter is specified it must be a function that takes a TarInfo object argument and returns the changed TarInfo object. If it instead returns None the TarInfo object will be excluded from the archive."
 
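A minimal sketch of the tarfile suggestion above, assuming Python 3 (tempfile.TemporaryDirectory is used for the demonstration). The filter skip_backups and its rule of dropping names ending in '~' are purely hypothetical; the thread does not say which files should be excluded.

```python
import os
import tarfile
import tempfile


def skip_backups(tarinfo):
    """Hypothetical filter: drop names ending in '~', keep the rest."""
    if tarinfo.name.endswith("~"):
        return None      # returning None excludes the member
    return tarinfo       # returning the TarInfo keeps it


def make_archive(archive_path, source_dir, arcname):
    """Write a gzip-compressed tar of source_dir, applying the filter."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=arcname, filter=skip_backups)


# Small self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mysubdir")
    os.mkdir(src)
    for name in ("keep.txt", "drop.txt~"):
        with open(os.path.join(src, name), "w") as fh:
            fh.write("data\n")
    archive = os.path.join(tmp, "myfile.tgz")
    make_archive(archive, src, "mysubdir")
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
```

The filter is called for the directory entry as well as for each file, so a rule based on the name has to let the directory itself through.
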
jkn

Hi Hans

Yes to both points; turns out that my problem was in building up the
command sequence to subprocess.Popen() - when to use, and not use,
quotes etc. It has ended up as (spelled out in longhand...)


xargsproc = ['xargs']

xargsproc.append('-I')
xargsproc.append("{}")

xargsproc.append('sh')
xargsproc.append('-c')

xargsproc.append("test -f %s/{} && md5sum %s/{}" % (mydir,
mydir))


As usual, breaking it all down for the purposes of clarification has
helped a lot, as has your input. Thanks a lot.

Cheers
Jon N
 
jkn

> You should try to do it in pure python, avoiding shell altogether.
> The first step would be to actually write what it is you want to do.

Hi Rebelo
FWIW I intend to do exactly this - but I wanted to duplicate the
existing shell action beforehand, so that I could get rid of the shell
command.

After I've tidied things up, that will be my next step.

Cheers
Jon N
 
Hans Mulder

> Yes to both points; turns out that my problem was in building up the
> command sequence to subprocess.Popen() - when to use, and not use,
> quotes etc. It has ended up as (spelled out in longhand...)
>
> xargsproc = ['xargs']
>
> xargsproc.append('-I')
> xargsproc.append("{}")
>
> xargsproc.append('sh')
> xargsproc.append('-c')
>
> xargsproc.append("test -f %s/{} && md5sum %s/{}" % (mydir,
> mydir))

This will break if there are spaces in the file name, or other
characters meaningful to the shell. If you change it to

xargsproc.append("test -f '%s/{}' && md5sum '%s/{}'"
                 % (mydir, mydir))

, then it will only break if there are single quotes in the file name.
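
One way to sidestep the shell-quoting problem, sketched here as an alternative that is not from the thread: pass the path to sh as a positional parameter instead of pasting it into the script text. With sh -c SCRIPT NAME ARG, NAME becomes $0 and ARG becomes $1, so the script can refer to "$1" and never re-parses the file name. The function name and directory are hypothetical. Note that xargs applies its own quote and backslash handling to input lines by default, so this addresses only the shell layer.

```python
def build_safe_xargs_cmd(mydir):
    """xargs stage that hands each path to sh as $1.

    The script text contains no file name, so spaces or quotes in
    the name are not re-interpreted by the shell.
    """
    script = 'test -f "$1" && md5sum "$1"'
    # Argument after the script is $0; the one after that is $1.
    return ["xargs", "-I", "{}", "sh", "-c", script, "sh", mydir + "/{}"]


cmd = build_safe_xargs_cmd("/tmp/my dir")  # hypothetical directory
```

xargs still replaces {} in the last argument, so each input line ends up as the value of $1.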

As I understand, your plan is to rewrite this bit in pure Python, to
get rid of any and all such problems.

> As usual, breaking it all down for the purposes of clarification has
> helped a lot, as has your input. Thanks a lot.

You're welcome.

-- HansM
 
jkn

Hi Hans

[...]
> This will break if there are spaces in the file name, or other
> characters meaningful to the shell. If you change it to
>
> xargsproc.append("test -f '%s/{}' && md5sum '%s/{}'"
>                  % (mydir, mydir))
>
> , then it will only break if there are single quotes in the file name.

Fair point. As it happens, I know that there are no 'unhelpful'
characters in the filenames ... but it's still worth doing.

> As I understand, your plan is to rewrite this bit in pure Python, to
> get rid of any and all such problems.

Yep - as mentioned in another reply I wanted first to have something
which duplicated the current action (which has taken longer than I
expected), and then rework in a more pythonic way.

Still, I've learned some things about the subprocess module, and also
about the shell, so it's been far from wasted time.

Regards
Jon N
 
Thomas Rachel

On 09.11.2012 at 02:12, Hans Mulder wrote:
> That's what 'xargs' will do for you. All you need to do, is invoke
> xargs with arguments containing '{}'. I.e., something like:
>
> cmd1 = ['tar', '-czvf', 'myfile.tgz', '-c', mydir, 'mysubdir']
> first_process = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
>
> cmd2 = ['xargs', '-I', '{}', 'sh', '-c', "test -f %s/'{}'" % mydir]
> second_process = subprocess.Popen(cmd2, stdin=first_process.stdout)

After launching second_process, it might be useful to call
first_process.stdout.close(). If you fail to do so, your own process
remains a second reader of the pipe, which might cause problems.

At least, I once had issues with it; I currently cannot recall
what these were nor how they could arise; maybe it was just the open
file descriptor that annoyed me.
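
A sketch of the two-process pipeline with that close() call in place. The commands are stand-ins run through the Python interpreter (an assumption, so the sketch runs without tar or xargs), and the name run_pipeline is hypothetical.

```python
import subprocess
import sys


def run_pipeline(cmd1, cmd2):
    """Connect cmd1's stdout to cmd2's stdin; return cmd2's output."""
    first = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
    second = subprocess.Popen(cmd2, stdin=first.stdout,
                              stdout=subprocess.PIPE)
    # Drop the parent's copy of the read end, so only cmd2 reads the
    # pipe and cmd1 can receive SIGPIPE if cmd2 exits early.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output


# Stand-in commands (hypothetical): print one name, upper-case it.
producer = [sys.executable, "-c", "print('mysubdir/a.txt')"]
consumer = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]
result = run_pipeline(producer, consumer)
```

Unlike reading the first command's output with communicate() and writing it to the second, the data here flows directly between the two child processes.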


Thomas
 
Thomas Rachel

On 12.11.2012 at 19:30, Hans Mulder wrote:
> This will break if there are spaces in the file name, or other
> characters meaningful to the shell. If you change it to
>
> xargsproc.append("test -f '%s/{}' && md5sum '%s/{}'"
>                  % (mydir, mydir))
>
> , then it will only break if there are single quotes in the file name.

And if you do mydir_q = mydir.replace("'", "'\\''") and use mydir_q, you
should be safe...


Thomas
 
Hans Mulder

> And if you do mydir_q = mydir.replace("'", "'\\''") and use mydir_q, you
> should be safe...

The problem isn't single quotes in mydir, but single quotes in the
file names that 'tar' generates and 'xargs' consumes. In the shell
script, these names go directly from tar to xargs via a pipe. If the
OP wants to do your replace, his script would have to read the output
of tar and do the replace before passing the filenames down a second
pipe to xargs.

However, once he does that, it's simpler to cut out xargs and invoke
"sh" directly. Or even cut out "sh" and "test" and instead use
os.path.isfile and then call md5sum directly. And once he does that,
he no longer needs to worry about single quotes.

The OP has said he's going to do all that, one step at a time.
That sounds like a sensible plan to me.
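
A minimal sketch of that end state, assuming Python 3 and with one substitution stated plainly: hashlib.md5 stands in for the external md5sum command. The function name md5_of_regular_files is hypothetical, and the list of names is assumed to come from wherever the tar listing is obtained.

```python
import hashlib
import os
import tempfile


def md5_of_regular_files(mydir, names):
    """Yield (name, hex digest) for each name that is a regular file
    under mydir, mirroring 'test -f ... && md5sum ...'."""
    for name in names:
        path = os.path.join(mydir, name)
        if os.path.isfile(path):
            digest = hashlib.md5()
            with open(path, "rb") as fh:
                # Read in chunks so large files are not held in memory.
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            yield name, digest.hexdigest()


# Small self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as mydir:
    os.mkdir(os.path.join(mydir, "mysubdir"))
    with open(os.path.join(mydir, "mysubdir", "a.txt"), "wb") as fh:
        fh.write(b"hello\n")
    names = ["mysubdir/", "mysubdir/a.txt"]
    results = list(md5_of_regular_files(mydir, names))
```

No shell is involved at any point, so file names containing spaces or quotes need no special treatment.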


Hope this helps,

-- HansM
 
jkn

Hi Hans

[...]
> However, once he does that, it's simpler to cut out xargs and invoke
> "sh" directly. Or even cut out "sh" and "test" and instead use
> os.path.isfile and then call md5sum directly. And once he does that,
> he no longer needs to worry about single quotes.

Yes indeed, using os.path.isfile() and then md5sum directly is my plan ... for reasons of maintainability (by myself) more than anything else.

> The OP has said, he's going to do all that. One step at a time.
> That sounds like a sensible plan to me.

Thanks a lot.

J^n
 
