codecs / subprocess interaction: utf help requested

smitty1e

The first print statement does what you'd expect.
The second print statement has rather a lot of rat in it.
The goal here is to write a function that will return the man page for
some command (mktemp used as a short example here) as text to client
code, where the groff markup will be chopped to extract all of the
command options. Those options will eventually be used within an
emacs mode, all things going swimmingly.
I don't know what's going on with the piping in the second version.
It looks like the output of p0 gets converted to unicode at some
point, but I might be misunderstanding what's going on. The 4.8
codecs module documentation doesn't really offer much enlightenment,
nor does Google. About the only other place I can think to look would be
the unit test cases shipped with Python.
Sort of hoping one of the guru-level pythonistas can point to
illumination, or write something to help out the next chap. This
might be one of those catalytic questions, the answer to which tackles
five other questions you didn't really know you had.
Thanks,
Chris
---------------------------
#!/usr/bin/python
import subprocess

p = subprocess.Popen(["bzip2", "-c", "-d", "/usr/share/man/man1/mktemp.1.bz2"],
                     stdout=subprocess.PIPE)
stdout, stderr = p.communicate()
print stdout


p0 = subprocess.Popen(["cat","/usr/share/man/man1/mktemp.1.bz2"],
stdout=subprocess.PIPE)
p1 = subprocess.Popen(["bzip2"], stdin=p0.stdout ,
stdout=subprocess.PIPE)
stdout, stderr = p1.communicate()
print stdout
---------------------------
 
John Machin

> The first print statement does what you'd expect.
> The second print statement has rather a lot of rat in it.
> The goal here is to write a function that will return the man page for
> some command (mktemp used as a short example here) as text to client
> code, where the groff markup will be chopped to extract all of the
> command options. Those options will eventually be used within an
> emacs mode, all things going swimmingly.
> I don't know what's going on with the piping in the second version.
> It looks like the output of p0 gets converted to unicode at some
> point,

Whatever gave you that idea?

> but I might be misunderstanding what's going on. The 4.8
> codecs module documentation doesn't really offer much enlightenment,
> nor does Google. About the only other place I can think to look would be
> the unit test cases shipped with Python.

Get your head out of the red herring factory; unicode, "utf" (which
one?) and codecs have nothing to do with your problem. Think about
looking at your own code and at the bzip2 documentation.

> Sort of hoping one of the guru-level pythonistas can point to
> illumination, or write something to help out the next chap. This
> might be one of those catalytic questions, the answer to which tackles
> five other questions you didn't really know you had.
> Thanks,
> Chris
> ---------------------------
> #!/usr/bin/python
> import subprocess
>
> p = subprocess.Popen(["bzip2", "-c", "-d", "/usr/share/man/man1/mktemp.1.bz2"],
>                      stdout=subprocess.PIPE)
> stdout, stderr = p.communicate()
> print stdout
>
> p0 = subprocess.Popen(["cat","/usr/share/man/man1/mktemp.1.bz2"],
> stdout=subprocess.PIPE)
> p1 = subprocess.Popen(["bzip2"], stdin=p0.stdout ,
> stdout=subprocess.PIPE)
> stdout, stderr = p1.communicate()
> print stdout
> ---------------------------

You left out the command-line options for bzip2. The "rat" that you
saw was the result of compressing the already-compressed man page.
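
The double-compression effect can be reproduced without any subprocesses. The following is only a sketch: it uses Python 3 and the standard-library bz2 module as a stand-in for the bzip2 command, and the `page` bytes are a made-up placeholder rather than the real mktemp man page.

```python
import bz2

# Placeholder groff source; the real man page contents are not shown in this thread.
page = b".TH EXAMPLE 1\n.SH NAME\nexample \\- placeholder page\n"

compressed = bz2.compress(page)        # roughly what sits in the .bz2 file
twice = bz2.compress(compressed)       # what a bare "bzip2" reading stdin would emit
restored = bz2.decompress(compressed)  # what "bzip2 -d" would emit

print(twice[:3])         # a bzip2 stream header, followed by binary data
print(restored == page)  # decompressing once gives the text back
```

Printing `twice` in full gives binary noise of the kind described in the original post, which is why it can look like an encoding problem when it is not one.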
Read this:
http://www.bzip.org/docs.html
which is a bit obscure. The --help output from my copy of an antique
(2001, v1.02) bzip2 Windows port explains it plainly:
"""
If invoked as `bzip2', default action is to compress.
           as `bunzip2', default action is to decompress.
           as `bzcat', default action is to decompress to stdout.

If no file names are given, bzip2 compresses or decompresses
from standard input to standard output.
"""

HTH,
John
 
smitty1e

Don't I feel like the biggest dork on the planet.
I had started with
cat /usr/share/man/man1/paludis.1.bz2 | bunzip2
then proceeded right to a self-foot-shoot when I went to Python.
*sigh*
Thanks for the calibration, sir.
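
For the next chap: the stated goal, a function that returns the man page as text plus a way to pull the options out of the groff markup, might be sketched as follows. This swaps the subprocess pipeline for the standard-library bz2 module and uses Python 3. The names `man_page_text`, `extract_options` and `SAMPLE` are hypothetical, and so is the latin-1 default encoding. `SAMPLE` is an invented groff fragment, not the real mktemp page, and the regular expressions only handle the `\fB\-x\fR` style of markup it contains.

```python
import bz2
import re

# Invented groff fragment for illustration; not taken from a real man page.
SAMPLE = r""".TP
\fB\-d\fR, \fB\-\-directory\fR
create a directory, not a file
.TP
\fB\-q\fR, \fB\-\-quiet\fR
suppress diagnostics
"""


def man_page_text(path, encoding="latin-1"):
    """Return the decompressed contents of a .bz2 man page as text."""
    with bz2.open(path, "rb") as f:
        return f.read().decode(encoding)  # the encoding is an assumption


def extract_options(groff_text):
    """Return a sorted list of option names found in groff source."""
    text = re.sub(r"\\f[BIRP]", "", groff_text)  # drop font-change escapes
    text = text.replace("\\-", "-")              # groff writes hyphens as \-
    # One or two leading hyphens, not preceded by a word character or hyphen.
    return sorted(set(re.findall(r"(?<![\w-])--?[A-Za-z0-9][\w-]*", text)))


print(extract_options(SAMPLE))
```

Real man pages use a wider range of macros and escapes, so the patterns would need adjusting against actual pages before feeding the result to an emacs mode.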