Question: processing HTML, re-write default processing action of many tags

Discussion in 'Python' started by Hubert Hung-Hsien Chang, Sep 17, 2004.

  1. I know you could use


    def start_a(self, attrs):
        .....

    def end_a(self):
        .....

    to process the <a href=...> anchor </a> tags, but is there a
    default method for processing ALL tags? If I just want to change
    some parts of the hyperlink and keep the other parts of the HTML,
    could I just print them out? There should be such a method,
    but I can't find it...

    Thank you.
    Hubert Hung-Hsien Chang, Sep 17, 2004
    #1

  2. Hubert Hung-Hsien Chang wrote:

    > I know you could use the
    >
    >
    > def start_a(self, attrs):
    >     ....
    >
    > def end_a(self):
    >     ....
    >
    > to process the <a href=...> anchor </a> tags, but is there a
    > default method for processing ALL tags? If I just want to change
    > some parts of the hyperlink and keep the other parts of the HTML,
    > could I just print them out? There should be such a method,
    > but I can't find it...


    You could subclass HTMLParser.HTMLParser and override handle_starttag
    and handle_endtag (also, if needed, handle_charref, handle_entityref,
    and last but not least handle_data -- that's assuming that while you
    only talk about processing _tags_ you may in fact also want to process
    references and text nodes... possibly handle_comment, too, btw).
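
    A minimal sketch of that approach, assuming Python 3, where the
    module is html.parser rather than HTMLParser. The class name
    LinkRewriter and the rewrite_href callback are hypothetical names
    chosen for illustration. Every handler writes its piece of the
    document back out, and only the href of <a> tags is changed. The
    output is rebuilt from parsed pieces, so it may differ from the
    input in quoting and whitespace inside tags.

```python
from html import escape
from html.parser import HTMLParser


class LinkRewriter(HTMLParser):
    """Echo the document back, changing only the href of <a> tags."""

    def __init__(self, rewrite_href):
        # convert_charrefs=False makes the parser call handle_entityref and
        # handle_charref, so references can be passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.rewrite_href = rewrite_href  # hypothetical callback: str -> str
        self.out = []

    def _format_tag(self, tag, attrs, closer=">"):
        parts = [tag]
        for name, value in attrs:
            if tag == "a" and name == "href" and value is not None:
                value = self.rewrite_href(value)
            if value is None:
                parts.append(name)  # attribute without a value
            else:
                # The parser unescapes attribute values, so escape them again.
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        return "<" + " ".join(parts) + closer

    def handle_starttag(self, tag, attrs):
        self.out.append(self._format_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._format_tag(tag, attrs, closer=" />"))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def result(self):
        return "".join(self.out)


# Usage: switch every link to https and leave everything else alone.
parser = LinkRewriter(lambda url: url.replace("http://", "https://"))
parser.feed('<a href="http://example.com/">link</a> &amp; <b>text</b>')
parser.close()
print(parser.result())
```

    Collecting the output in a list and joining it at the end keeps
    the handlers simple; writing straight to a file object from each
    handler would work just as well.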


    Alex
    Alex Martelli, Sep 17, 2004
    #2

  3. Hubert Hung-Hsien Chang wrote:
    > I know you could use the
    >
    >
    > def start_a(self, attrs):
    >     ....
    >
    > def end_a(self):
    >     ....
    >
    > to process the <a href=...> anchor </a> tags, but is there a
    > default method for processing ALL tags? If I just want to change
    > some parts of the hyperlink and keep the other parts of the HTML,
    > could I just print them out? There should be such a method,
    > but I can't find it...
    >
    > Thank you.


    If you are modifying the contents of tags, I've written a simple HTML
    parser class called Scraper that does this. Unlike the HTMLParser in
    the standard library, it doesn't choke so much on badly formed HTML....

    It's part of approx.py, my cgiproxy....
    http://www.voidspace.org.uk/atlantibots/pythonutils.html#cgiproxy

    HTH

    Regards,

    Fuzzy
    Michael Foord, Sep 17, 2004
    #3
