question on HTMLParser and parser.feed()

Discussion in 'Python' started by Stephen Briley, Dec 6, 2003.

  1. Hi,

    I'm new to Python, so please bear with me.

    I am satisfied with the HTMLparse of my htmlsource
    page. But I am unable to save the output of
    parser.feed(htmlsource). When I type
    parser.feed(htmlsource) into the interpreter, the
    correct output streams across the screen. But all of
    my attempts to capture this output to a variable are
    unsuccessful (e.g. capt_text =
    parser.feed(htmlsource)).

    What am I missing and how can I get this to work?
    Thanks in advance!


    from htmllib import HTMLParser
    from formatter import AbstractFormatter, DumbWriter
    parser = HTMLParser(AbstractFormatter(DumbWriter()))
    parser.feed(htmlsource)

    Stephen Briley, Dec 6, 2003
    #1

  2. Peter Otten, Guest

    Stephen Briley wrote:

    > I am satisfied with the HTMLparse of my htmlsource
    > page. But I am unable to save the output of


    My guess is that in the long run you will be even more satisfied with the
    HTMLParser class in the HTMLParser module: it has a cleaner interface and
    can handle XHTML.
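As a rough illustration of that interface, here is a minimal sketch. It assumes Python 3, where the HTMLParser module mentioned above lives on as html.parser. The class name TextCollector and its chunks attribute are hypothetical, invented for this example. The idea is to subclass the parser and collect the text yourself, so the result ends up in an ordinary variable:

```python
from html.parser import HTMLParser  # Python 3 name of the old HTMLParser module


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers the text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # The parser calls this for each run of text outside of tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


collector = TextCollector()
collector.feed("<html><head><title>Hello world</title></head>"
               "<body>For demonstration purposes</body></html>")
collector.close()

captured = " ".join(collector.chunks)
print(captured)
```

No formatter or writer object is involved here; whatever handle_data stores is what you get back.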

    > parser.feed(htmlsource). When I type
    > parser.feed(htmlsource) into the interpreter, the
    > correct output streams across the screen. But all of
    > my attempts to capture this output to a variable are
    > unsuccessful (e.g. capt_text =
    > parser.feed(htmlsource)).
    >
    > What am I missing and how can I get this to work?
    > Thanks in advance!
    >
    >
    > from htmllib import HTMLParser
    > from formatter import AbstractFormatter, DumbWriter
    > parser = HTMLParser(AbstractFormatter(DumbWriter()))
    > parser.feed(htmlsource)


    You can pass a file object to DumbWriter so that the formatter output
    is written to a file instead of the screen:

    outstream = open("tmp.txt", "w")
    parser = HTMLParser(AbstractFormatter(DumbWriter(outstream)))
    parser.feed(htmlsource)
    outstream.close()

    If you don't want to store the output in a file, you can instead pass a
    StringIO instance, which behaves like a file but keeps everything in memory:

    # cStringIO contains the faster version of StringIO
    from cStringIO import StringIO
    from htmllib import HTMLParser
    from formatter import AbstractFormatter, DumbWriter
    htmlsource = """
    <html>
    <head><title>Hello world</title></head>
    <body>For demonstration purposes</body>
    </html>
    """

    outstream = StringIO()
    parser = HTMLParser(AbstractFormatter(DumbWriter(outstream)))
    parser.feed(htmlsource)
    data = outstream.getvalue()
    outstream.close()

    # your code here, I just print it in uppercase
    print data.upper()
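The htmllib, formatter and cStringIO modules used above no longer exist in Python 3, so here is a hedged sketch of the same idea for a current interpreter. It assumes html.parser and io.StringIO as stand-ins, and StreamingTextParser is a hypothetical class written for this example, not part of any library. It also shows why the assignment in the question captures nothing: feed() only drives the parser and returns None, while the text goes to whatever file-like object the parser was given.

```python
import io
from html.parser import HTMLParser


class StreamingTextParser(HTMLParser):
    """Hypothetical helper: writes text content to a file-like object."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def handle_data(self, data):
        # Forward every run of text to the output stream.
        self.out.write(data)


htmlsource = """
<html>
<head><title>Hello world</title></head>
<body>For demonstration purposes</body>
</html>
"""

outstream = io.StringIO()
parser = StreamingTextParser(outstream)
result = parser.feed(htmlsource)  # feed() has no return value, so this is None
parser.close()
data = outstream.getvalue()       # the captured text lives in the stream
outstream.close()

print(result)
print(data.upper())
```

Passing an open file instead of the StringIO object would send the same text to disk.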


    Peter
    Peter Otten, Dec 6, 2003
    #2