Capturing stdout from lynx

sergio

I have a huge database that contains large amounts of HTML that I need
to convert to plain ASCII text.

I have tried using html2text.py:

http://www.aaronsw.com/2002/html2text/

but I could not figure out how to import it and use it as a library
without getting errors everywhere.

So I decided to try using lynx with the -dump switch.

It works great from the command line, but I am having trouble capturing
the output in a Python variable.

The only way I have figured out how to start it is:

s = subprocess.Popen(args='/sw/bin/lynx', stdout=subprocess.PIPE)

but I can't figure out how to pass it the "-dump" switch or the
<filename.html> argument and retrieve the output.

Any help would be appreciated.
 
Enigma Curry

Does this do what you want?

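import os

filename = "test.html"
# Run lynx through the shell and open a pipe to its stdout.
cmd = os.popen("lynx -dump %s" % filename)
output = cmd.read()  # everything lynx printed, as one string
cmd.close()
print(output)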
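
Since the question asked about subprocess specifically, here is a rough sketch of the same idea using subprocess.Popen. The helper names build_lynx_command and capture_stdout are made up for illustration, and the UTF-8 decoding is an assumption about what lynx emits on your system; adjust the lynx path (for example /sw/bin/lynx) as needed.

```python
import subprocess


def build_lynx_command(filename, lynx_path="lynx"):
    # Hypothetical helper: each switch and argument is its own list item,
    # so no shell quoting is needed for the filename.
    return [lynx_path, "-dump", filename]


def capture_stdout(command):
    # Start the program with its stdout connected to a pipe.
    proc = subprocess.Popen(command, stdout=subprocess.PIPE)
    # communicate() reads all output and waits for the process to exit.
    output, _ = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("command failed with exit code %d" % proc.returncode)
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return output.decode("utf-8", errors="replace")


# Example use (requires lynx to be installed):
# text = capture_stdout(build_lynx_command("test.html", "/sw/bin/lynx"))
```

Passing the arguments as a list avoids going through the shell, which matters if file names can contain spaces or other special characters.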
 
