Converting HTML to ASCII


gf gf

Hans,

Thanks for the tip. I took a look at Beautiful Soup,
and it looked like it was a framework to parse HTML.
I'm not really interested in going through it tag by
tag; I just want to get it converted to ASCII. How can I do
this with B. Soup?
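A minimal sketch of "just give me the text" in pure Python, with no per-tag handling on the caller's side. It uses the standard library's `html.parser.HTMLParser` instead of Beautiful Soup, so this is a swapped-in technique rather than B. Soup's API. In the current Beautiful Soup package (`bs4`), `BeautifulSoup(markup, "html.parser").get_text()` does a similar job. The names `TextExtractor`, `html_to_text`, `BLOCK_TAGS` and `SKIP_TAGS` are hypothetical, and the tag lists are assumptions you would tune. `HTMLParser` tolerates a good deal of ill-formed markup without raising.

```python
from html.parser import HTMLParser

# Hypothetical, partial list: tags that should start a new line of output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "table", "ul", "ol", "pre",
              "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents are never shown as text.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, ignoring its structure."""

    def __init__(self):
        # convert_charrefs=True decodes &amp;, &#65; etc. before handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(markup):
    """Return the text of `markup`, one block per line, whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end
    raw = "".join(parser.parts)
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

For example, `html_to_text("<p>Hello &amp; <b>world</b></p>")` gives `Hello & world`. Output is Unicode text; encoding it to strict ASCII (e.g. with `errors="replace"`) is a separate step.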

--Thanks

PS William: thanks for the reference to lynx, but I
need a Python solution. Forking and execing for each
file I need to convert is too slow for my application.


Hans wrote:
Try Beautiful Soup!
1) Be able to handle badly formed, or illegal, HTML,
as best as possible.
From the description:
"It won't choke if you give it ill-formed markup:
it'll just give you access to
a correspondingly ill-formed data structure."
Can anyone direct me to something which could help me
for this?
http://www.crummy.com/software/BeautifulSoup/

Hans Christian




Jorgen Grahn

Hans,

Thanks for the tip. I took a look at Beautiful Soup,
and it looked like it was a framework to parse HTML.

This is my understanding, too.
I'm not really interested in going through it tag by
tag; I just want to get it converted to ASCII. How can I do
this with B. Soup?

You should probably do what some other poster suggested -- download lynx or
some other text-only browser and make your code execute it in -dump mode to
get the text-formatted html. You'll get that working in an hour or so, and
then you can see if you need something more complicated.
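A minimal sketch of driving a text browser's dump mode from Python with `subprocess`, assuming the browser is installed and on `PATH`. `lynx -dump <file>` and `w3m -dump <file>` both print the rendered text to stdout; the function names and the 60-second timeout are hypothetical choices, and `text=True` decodes the output with your locale's encoding, which may not match the browser's output charset.

```python
import subprocess


def dump_command(path, browser="lynx"):
    """Build the argument list for a text browser's -dump mode."""
    # Extra flags (e.g. lynx's -nolist) can be appended here if needed.
    return [browser, "-dump", path]


def html_file_to_text(path, browser="lynx", timeout=60):
    """Render an HTML file to plain text by running a text-mode browser."""
    result = subprocess.run(
        dump_command(path, browser),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str using the locale encoding
        check=True,           # raise CalledProcessError on a non-zero exit
        timeout=timeout,
    )
    return result.stdout
```

This starts one process per file, which is exactly the overhead the original poster wants to avoid; it is still the quickest way to get something working and measure it.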

/Jorgen

Paul Rubin

Jorgen Grahn said:
You should probably do what some other poster suggested -- download
lynx or some other text-only browser and make your code execute it
in -dump mode to get the text-formatted html. You'll get that
working in an hour or so, and then you can see if you need something
more complicated.

Lynx is pathetically slow for large files. It seems to use a
quadratic algorithm for remembering where the links point, or
something. I wrote a very crude but very fast renderer in C that I
can post if someone wants it, which is what I use for this purpose.
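Paul's C renderer isn't shown in the thread, so this is not it. As a hypothetical Python sketch, a renderer can number links roughly in the style of `lynx -dump` (a `[n]` marker in the text plus a References list) while keeping the bookkeeping linear: each href is appended to a list in constant time, and the output fragments are joined once at the end. `LinkNumberingRenderer` and `render_with_links` are illustrative names, and the small set of line-breaking tags is an assumption.

```python
from html.parser import HTMLParser


class LinkNumberingRenderer(HTMLParser):
    """Render text with [n] markers for links and a numbered reference list."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # text fragments, joined once in render()
        self.links = []  # hrefs in order of appearance; append is O(1)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
                self.out.append("[%d]" % len(self.links))
        elif tag in ("p", "br", "div", "li"):
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)

    def render(self):
        text = "".join(self.out).strip()
        if not self.links:
            return text
        refs = ["References", ""]
        refs += ["%d. %s" % (i, href) for i, href in enumerate(self.links, 1)]
        return text + "\n\n" + "\n".join(refs)


def render_with_links(markup):
    renderer = LinkNumberingRenderer()
    renderer.feed(markup)
    renderer.close()
    return renderer.render()
```

The point of the design is that nothing is searched or re-scanned per link, so the cost grows with the size of the document rather than with its square.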

Jorgen Grahn

Lynx is pathetically slow for large files. It seems to use a
quadratic algorithm for remembering where the links point, or
something. I wrote a very crude but very fast renderer in C that I
can post if someone wants it, which is what I use for this purpose.

That may be so, but it's fast enough for all the people who use it as a
general html->plaintext tool, so it's probably good enough for the OP.

w3m and links are other options. They provide better formatting than lynx,
and at least w3m has the -dump option.
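If you take the dump-mode route, a small helper can pick whichever text browser is actually installed. This is a hypothetical sketch using `shutil.which`; the preference order (w3m, then links, then lynx) is an assumption based on the formatting comments in this thread, and you should confirm each browser's dump flag in its own documentation before relying on it.

```python
import shutil

# Assumed preference order, best formatter first.
DUMP_BROWSERS = ("w3m", "links", "lynx")


def find_dump_browser(candidates=DUMP_BROWSERS):
    """Return the first browser name found on PATH, or None if none is installed."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None
```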

I wouldn't mind if there was a reusable library for rendering HTML to text,
from various languages. I'd also like to see one (CSS-aware) for rendering
to troff or PostScript.

/Jorgen

Grant Edwards

Lynx is pathetically slow for large files.

First, make it work. Then make it work right. Then worry
about how fast it is.

"Premature optimization..."
It seems to use a quadratic algorithm for remembering where
the links point, or something. I wrote a very crude but very
fast renderer in C that I can post if someone wants it, which
is what I use for this purpose.

If lynx really is too slow, try w3m or links. Both do a better
job of rendering anyway.
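In the spirit of Grant's "make it work first" advice, it is worth measuring whether per-file process startup really dominates before replacing the browser. A hypothetical timing helper with `time.perf_counter`: pass in whichever converter you are considering (a wrapper around a dump-mode browser, or an in-process parser) and a list of representative inputs. The best-of-N repeat is a common convention to reduce noise, not something from the thread.

```python
import time


def seconds_per_call(convert, inputs, repeat=3):
    """Best-of-`repeat` average wall-clock seconds per call of convert(item)."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for item in inputs:
            convert(item)
        best = min(best, time.perf_counter() - start)
    # Avoid dividing by zero when inputs is empty.
    return best / max(len(inputs), 1)
```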

Thomas Dickey

Grant Edwards said:
First, make it work. Then make it work right. Then worry
about how fast it is.
"Premature optimization..."

That could be - but then again, most of the comments I've seen for that
particular issue are for rather old releases.
If lynx really is too slow, try w3m or links. Both do a better
job of rendering anyway.

They lay out tables more or less as expected (though navigation in tables
for links seems to be an afterthought).
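Laying out a table in plain text mostly comes down to padding each column to its widest cell. A hypothetical sketch of that final step, assuming the cells have already been extracted from the HTML as strings; it ignores colspan/rowspan and measures width with `len()`, so wide (e.g. CJK) characters will misalign.

```python
def format_table(rows, sep="  "):
    """Lay out a list of rows (lists of cell strings) as aligned text columns."""
    if not rows:
        return ""
    ncols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [list(row) + [""] * (ncols - len(row)) for row in rows]
    widths = [max(len(row[c]) for row in padded) for c in range(ncols)]
    lines = [sep.join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
             for row in padded]
    return "\n".join(lines)
```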