question on HTMLParser and parser.feed()

Stephen Briley · Dec 6, 2003

Hi,

I'm new to Python, so please bear with me.

I am satisfied with how HTMLParser parses my htmlsource
page, but I am unable to save the output of
parser.feed(htmlsource). When I type
parser.feed(htmlsource) into the interpreter, the
correct output streams across the screen, but all of
my attempts to capture this output in a variable are
unsuccessful (e.g. capt_text =
parser.feed(htmlsource)).

What am I missing and how can I get this to work?
Thanks in advance!

from htmllib import HTMLParser
from formatter import AbstractFormatter, DumbWriter
parser = HTMLParser(AbstractFormatter(DumbWriter()))
parser.feed(htmlsource)


Peter Otten · Dec 6, 2003

Stephen said:
I am satisfied with how HTMLParser parses my htmlsource
page, but I am unable to save the output of

My guess is that in the long run you will be even more satisfied with the
HTMLParser in the HTMLParser module - it has a cleaner interface and can
handle XHTML.
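
As a rough sketch of what that interface looks like: in Python 3 the HTMLParser module is named html.parser (htmllib no longer exists there). The class name TextCollector and the sample markup are illustrative assumptions, not something from this thread. The subclass overrides handle_data and keeps the text in a list, so the result ends up in a variable instead of on the screen.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers the text between tags into a list."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # feed() calls this for each run of text outside of tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


htmlsource = """
<html>
<head><title>Hello world</title></head>
<body>For demonstration purposes</body>
</html>
"""

collector = TextCollector()
collector.feed(htmlsource)
collector.close()  # flush any text still buffered by the parser

capt_text = "\n".join(collector.chunks)
print(capt_text)
```

Note that this sketch treats the title like any other text; filtering by tag would need handle_starttag and handle_endtag as well.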

parser.feed(htmlsource). When I type
parser.feed(htmlsource) into the interpreter, the
correct output streams across the screen. But all of
my attempts to capture this output to a variable are
unsuccessful (e.g. capt_text =
parser.feed(htmlsource)).

What am I missing and how can I get this to work?
Thanks in advance!

from htmllib import HTMLParser
from formatter import AbstractFormatter, DumbWriter
parser = HTMLParser(AbstractFormatter(DumbWriter()))
parser.feed(htmlsource)

You can pass a file object to the DumbWriter constructor so that the
formatter output is written to a file:

outstream = open("tmp.txt", "w")
parser = HTMLParser(AbstractFormatter(DumbWriter(outstream)))
parser.feed(htmlsource)
outstream.close()

When you don't want to store the output on disk, you can instead provide a
StringIO instance, which behaves like a file but keeps its contents in memory:

# cStringIO contains the faster version of StringIO
from cStringIO import StringIO
from htmllib import HTMLParser
from formatter import AbstractFormatter, DumbWriter
htmlsource = """
<html>
<head><title>Hello world</title></head>
<body>For demonstration purposes</body>
</html>
"""

outstream = StringIO()
parser = HTMLParser(AbstractFormatter(DumbWriter(outstream)))
parser.feed(htmlsource)
data = outstream.getvalue()
outstream.close()

# your code here, I just print it in uppercase
print data.upper()
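
The reason capt_text = parser.feed(htmlsource) captures nothing is that feed() returns None; the text is written to whatever file object the writer holds (standard output by default). A minimal Python 3 sketch of the same idea follows, assuming html.parser and io.StringIO in place of htmllib and cStringIO. StreamingTextParser is a hypothetical stand-in for the formatter/DumbWriter chain, not an equivalent of it.

```python
import io
from html.parser import HTMLParser


class StreamingTextParser(HTMLParser):
    """Hypothetical stand-in for the htmllib/formatter/DumbWriter chain."""

    def __init__(self, outstream):
        super().__init__()
        self.outstream = outstream

    def handle_data(self, data):
        # Output is a side effect: text goes to the stream, not to the caller.
        self.outstream.write(data)


outstream = io.StringIO()
parser = StreamingTextParser(outstream)
result = parser.feed("<p>For demonstration purposes</p>")
parser.close()

print(result)                # the return value of feed() carries no text
data = outstream.getvalue()  # the text lives in the stream instead
outstream.close()
print(data.upper())
```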

Peter
