question on HTMLParser and parser.feed()


Stephen Briley

Hi,

I'm new to Python, so please bear with me.

I am satisfied with the HTML parse of my htmlsource
page. But I am unable to save the output of
parser.feed(htmlsource). When I type
parser.feed(htmlsource) into the interpreter, the
correct output streams across the screen. But all of
my attempts to capture this output in a variable are
unsuccessful (e.g. capt_text =
parser.feed(htmlsource)).

What am I missing and how can I get this to work?
Thanks in advance!


from htmllib import HTMLParser
from formatter import AbstractFormatter, DumbWriter
# htmlsource holds the page's HTML as a string
parser = HTMLParser(AbstractFormatter(DumbWriter()))
parser.feed(htmlsource)


Peter Otten

Stephen said:
> I am satisfied with the HTML parse of my htmlsource
> page. But I am unable to save the output of
> parser.feed(htmlsource).

My guess is that in the long run you will be even more satisfied with the
HTMLParser class in the HTMLParser module: it has a cleaner interface and can
handle XHTML.
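A minimal sketch of that suggestion, assuming Python 3, where the HTMLParser module was renamed html.parser and htmllib no longer exists. It also shows why capt_text = parser.feed(htmlsource) cannot work: feed() returns None, so the text has to be collected by the parser itself and read back afterwards. The class name TextCollector and its get_text() method are hypothetical, chosen only for this illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical parser that collects text instead of printing it."""

    def __init__(self):
        super().__init__()
        self.chunks = []  # text pieces seen so far

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)

    def get_text(self):
        # Join the pieces and collapse runs of whitespace.
        return " ".join("".join(self.chunks).split())


htmlsource = """
<html>
<head><title>Hello world</title></head>
<body>For demonstration purposes</body>
</html>
"""

parser = TextCollector()
result = parser.feed(htmlsource)  # feed() itself returns None
parser.close()                    # flush any text still buffered
capt_text = parser.get_text()
print(result)     # None
print(capt_text)  # Hello world For demonstration purposes
```

The design choice is to keep state on the parser object: feed() can be called repeatedly with partial input, so a return value per call would not be very useful anyway.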

You can pass a file object to the DumbWriter to write the
formatter output to a file:

outstream = open("tmp.txt", "w")
# DumbWriter sends the formatted text to outstream instead of the screen
parser = HTMLParser(AbstractFormatter(DumbWriter(outstream)))
parser.feed(htmlsource)
outstream.close()
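The same idea, handing the parser a file-like object to write to, can be sketched with Python 3's html.parser module. StreamTextParser is a hypothetical name, and unlike DumbWriter it does no line wrapping: it simply writes each non-empty run of text on its own line. A temporary directory stands in for tmp.txt so the example leaves no file behind.

```python
import os
import tempfile
from html.parser import HTMLParser


class StreamTextParser(HTMLParser):
    """Hypothetical parser that writes text to a caller-supplied stream."""

    def __init__(self, outstream):
        super().__init__()
        self.outstream = outstream  # anything with a write() method

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.outstream.write(text + "\n")


htmlsource = ("<html><head><title>Hello world</title></head>"
              "<body>For demonstration purposes</body></html>")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tmp.txt")
    # The with-block closes the file, like outstream.close() above.
    with open(path, "w") as outstream:
        parser = StreamTextParser(outstream)
        parser.feed(htmlsource)
        parser.close()
    with open(path) as f:
        written = f.read()

print(written)
```

Because the parser only calls write(), the same class works unchanged with an io.StringIO instance when nothing should touch the disk.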

When you don't want to write the output to disk, you can instead provide a
StringIO instance, which behaves like a file but keeps its contents in memory:

# cStringIO contains the faster version of StringIO
from cStringIO import StringIO
from htmllib import HTMLParser
from formatter import AbstractFormatter, DumbWriter
htmlsource = """
<html>
<head><title>Hello world</title></head>
<body>For demonstration purposes</body>
</html>
"""

outstream = StringIO()
parser = HTMLParser(AbstractFormatter(DumbWriter(outstream)))
parser.feed(htmlsource)
data = outstream.getvalue()
outstream.close()

# your code here, I just print it in uppercase
print data.upper()
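Because DumbWriter() with no argument writes to sys.stdout, another option on Python 3.4 or later is to capture whatever gets printed with contextlib.redirect_stdout and an io.StringIO buffer. PrintingParser below is a hypothetical stand-in for a parser that prints to the screen, as the original code did. One caveat: a writer that stores a reference to sys.stdout when it is constructed (DumbWriter appears to do this) must be created inside the with block, or it will keep writing to the real screen. For that reason the parser is created inside the block here.

```python
import io
from contextlib import redirect_stdout
from html.parser import HTMLParser


class PrintingParser(HTMLParser):
    """Hypothetical parser that prints each run of text to the screen."""

    def handle_data(self, data):
        text = data.strip()
        if text:
            print(text)


htmlsource = ("<html><head><title>Hello world</title></head>"
              "<body>For demonstration purposes</body></html>")

buffer = io.StringIO()
with redirect_stdout(buffer):  # print() now writes into buffer
    parser = PrintingParser()  # created inside the block, see caveat
    parser.feed(htmlsource)
    parser.close()

data = buffer.getvalue()
print(data.upper())
```

Passing a stream explicitly is usually cleaner; redirecting stdout is mainly useful when you cannot change the code that does the printing.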


Peter