text representation of HTML

Ksenia Marasanova

Hi,

I am looking for a library that will give me a very simple text
representation of HTML.
For example:
<div><h1>Title</h1><p>This is a <br />test</p></div>

will be transformed to:

Title

This is a
test


I want to send a plain-text alternative to an HTML email, and would prefer
to generate it automatically from the HTML source.
Any hints?

Thanks!
Ksenia.
 
Diez B. Roggisch

html2text is a command-line tool. You can invoke it from Python using the
subprocess module.

Diez
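
A minimal sketch of the subprocess approach in Python 3. It assumes an html2text executable is on the PATH and that it reads HTML from standard input and writes text to standard output, which is worth checking against the version you have installed. The function name html_to_text_via_cli and its command parameter are made up for illustration.

```python
import subprocess


def html_to_text_via_cli(html, command=("html2text",)):
    """Pipe an HTML string through an external converter and return its stdout.

    Assumption: the converter reads HTML on stdin and prints text on stdout.
    """
    completed = subprocess.run(
        list(command),
        input=html,            # sent to the child process on stdin
        capture_output=True,   # collect stdout and stderr
        text=True,             # work with str instead of bytes
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

If the tool is missing, subprocess.run raises FileNotFoundError, so the caller can fall back to a pure-Python conversion.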
 
garabik-news-2005-05

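Something like this:

import re

text = '<div><h1>Title</h1><p>This is a <br />test</p></div>'
# Collapse runs of whitespace into single spaces.
text = re.sub(r'[\n \t]+', ' ', text)
# Start a new line at each paragraph, line break, or heading tag;
# [^>]* also covers attributes and the self-closing form <br />.
text = re.sub(r'(?i)<(?:p|br|h[1-6])\b[^>]*>', '\n', text)
# Strip all remaining tags.
result = re.sub(r'<.+?>', '', text)
print(result)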

--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
 
Duncan Booth

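Use htmllib:

>>> import htmllib, formatter, StringIO
>>> def cleanup(s):
...     out = StringIO.StringIO()
...     p = htmllib.HTMLParser(
...         formatter.AbstractFormatter(formatter.DumbWriter(out)))
...     p.feed(s)
...     p.close()
...     if p.anchorlist:
...         print >>out
...         for idx, anchor in enumerate(p.anchorlist):
...             print >>out, "\n[%d]: %s" % (idx+1, anchor)
...     return out.getvalue()
...
>>> print cleanup('''<div><h1>Title</h1><p>This is a <br
... />test</p></div>''')

Title

This is a
test
>>> print cleanup('''<div><h1>Title</h1><p>This is a <br />test with <a
... href="http://python.org">a link</a> to the Python homepage</p></div>''')

Title

This is a
test with a link[1] to the Python homepage

[1]: http://python.org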
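
htmllib exists only in Python 2. For Python 3, a rough equivalent can be sketched with html.parser from the standard library. This is an approximation of the approach above, not a port of htmllib's formatter: the class name TextExtractor, the function name html_to_text, and the choice of which tags count as block-level are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text, insert line breaks at block tags, and number link targets."""

    # Assumed set of tags that start and end a paragraph-like block.
    BLOCK_TAGS = {"div", "p", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []      # text fragments in document order
        self.anchors = []    # href values, in order of appearance
        self._href = None    # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")
        elif tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Mark the link text with a footnote-style reference.
            self.anchors.append(self._href)
            self.parts.append("[%d]" % len(self.anchors))
            self._href = None
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]+", " ", text)      # collapse spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)     # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text).strip()  # at most one blank line
    if parser.anchors:
        refs = ["[%d]: %s" % (i, url) for i, url in enumerate(parser.anchors, 1)]
        text += "\n\n" + "\n".join(refs)
    return text
```

Like the htmllib version, this sketch makes no attempt to skip the contents of script or style elements.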
 
Tim Williams

cleanup() doesn't handle scripts and styles too well. html2text does
a much better job with these and gives more structured output
(compatible with Markdown):

http://www.aaronsw.com/2002/html2text/

>>> import html2text
>>> print html2text.html2text('''<div><h1>Title</h1><p>This is a <br
... />test with <a href="http://python.org">a link</a> to the Python
... homepage</p></div>''')

# Title

This is a
test with [a link][1] to the Python homepage

[1]: http://python.org


HTH :)
 
Ksenia Marasanova

Sorry for the late reply... better late than never :)
Thanks to all for the tips. Stripogram is the winner, since it is the
most configurable and accepts a line-length parameter, which is handy for
email.

Ksenia.
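
Whichever converter produces the text, the end goal described in this thread (an HTML email with a wrapped plain-text alternative) can be sketched with the Python 3 standard library alone. This does not use Stripogram's API; the function name build_message, the placeholder subject, and the default width of 72 are assumptions for illustration.

```python
import textwrap
from email.message import EmailMessage


def build_message(plain_text, html, width=72):
    """Return a multipart/alternative message with text and HTML parts."""
    # Wrap each line separately so blank lines between paragraphs survive.
    wrapped = "\n".join(
        textwrap.fill(line, width=width) for line in plain_text.splitlines()
    )
    msg = EmailMessage()
    msg["Subject"] = "Example"                  # placeholder subject
    msg.set_content(wrapped)                    # text/plain part
    msg.add_alternative(html, subtype="html")   # text/html part
    return msg
```

Mail clients that cannot render HTML show the first part, so the plain-text version goes in with set_content before the HTML alternative is added.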
 
