HTMLParser question

Discussion in 'Python' started by Rajarshi Guha, Aug 19, 2004.

  1. Hi,
    I have some HTML that essentially consists of a series of <div>s,
    each <div> having one of two classes (tnt-question or tnt-answer).
    I'm using HTMLParser to handle the tags as follows:

    import string
    import HTMLParser

    class MyHTMLParser(HTMLParser.HTMLParser):

        def handle_starttag(self, tag, attrs):
            # Report start tags whose only attribute has the value 'tnt-question'
            if len(attrs) == 1:
                cls, whichcls = attrs[0]
                if whichcls == 'tnt-question':
                    print self.get_starttag_text(), self.getpos()

        def handle_endtag(self, tag):
            pass

        def handle_data(self, data):
            # Called for every run of text, whichever tag encloses it
            print data

    if __name__ == '__main__':
        htmldata = string.join(open('tt.html', 'r').readlines())
        parser = MyHTMLParser()
        parser.feed(htmldata)

    However what I would like is that when the parser reaches some HTML like
    this:

    <div class="tnt-question">
    How do I add a user to a MySQL system?
    </div>

    I should get back the data between the open and close tags. However the
    above code prints the text contained between all tags, not just the <div>
    tags with the class='tnt-question'.

    Is there a way to call handle_data() when a specific tag is being handled?
    Placing a call to handle_data() in handle_starttag() seems to be the way,
    but I'm not sure how to actually do it. What data should I pass to the
    call?

    Any pointers would be appreciated
    Thanks,
    Rajarshi
     
    Rajarshi Guha, Aug 19, 2004
    #1

  2. Rajarshi Guha wrote:
    > Hi,
    > I have some HTML that essentially consists of a series of <div>s,
    > each <div> having one of two classes (tnt-question or tnt-answer).
    > I'm using HTMLParser to handle the tags as follows:
    >
    > import string
    > import HTMLParser
    >
    > class MyHTMLParser(HTMLParser.HTMLParser):
    >
    >     def handle_starttag(self, tag, attrs):
    >         # Report start tags whose only attribute has the value 'tnt-question'
    >         if len(attrs) == 1:
    >             cls, whichcls = attrs[0]
    >             if whichcls == 'tnt-question':
    >                 print self.get_starttag_text(), self.getpos()
    >
    >     def handle_endtag(self, tag):
    >         pass
    >
    >     def handle_data(self, data):
    >         # Called for every run of text, whichever tag encloses it
    >         print data
    >
    > if __name__ == '__main__':
    >     htmldata = string.join(open('tt.html', 'r').readlines())
    >     parser = MyHTMLParser()
    >     parser.feed(htmldata)
    >
    > However what I would like is that when the parser reaches some HTML like
    > this:
    >
    > <div class="tnt-question">
    > How do I add a user to a MySQL system?
    > </div>
    >
    > I should get back the data between the open and close tags. However the
    > above code prints the text contained between all tags, not just the <div>
    > tags with the class='tnt-question'.
    >
    > Is there a way to call handle_data() when a specific tag is being handled?
    > Placing a call to handle_data() in handle_starttag() seems to be the way,
    > but I'm not sure how to actually do it. What data should I pass to the
    > call?

    Set a flag when the parser calls handle_starttag() and the tag
    matches your criteria, and unset it when the corresponding end tag is
    found. You'll probably have to count the nesting depth, so that for
    <div class="printme">Yo <div>man</div>!</div>
    the flag is unset on the second </div>. Then, in handle_data(), print
    the data only when the flag is set.
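
The flag-plus-depth idea described above could be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes Python 3, where the parser lives in `html.parser` rather than the Python 2 `HTMLParser` module used in the question, and the names `QuestionExtractor`, `depth`, `chunks`, `questions` and `sample` are made up for the example. A depth counter doubles as the flag: it is non-zero while the parser is inside a matching <div>.

```python
from html.parser import HTMLParser


class QuestionExtractor(HTMLParser):
    """Collect the text inside <div class="tnt-question"> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # 0 means "not inside a matching <div>"
        self.chunks = []     # text pieces of the question being read
        self.questions = []  # finished questions, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A <div> nested inside a matching one: only track the depth.
            self.depth += 1
        elif dict(attrs).get("class") == "tnt-question":
            # Set the flag and start collecting text.
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # The matching </div> was reached: unset the flag.
                self.questions.append("".join(self.chunks).strip())

    def handle_data(self, data):
        # Keep text only while the flag is set.
        if self.depth > 0:
            self.chunks.append(data)


# Hypothetical input, modelled on the snippets quoted in the thread.
sample = """
<div class="tnt-question">
How do I add a user to a MySQL system?
</div>
<div class="tnt-answer">(answer text, ignored)</div>
<div class="tnt-question">Yo <div>man</div>!</div>
"""

parser = QuestionExtractor()
parser.feed(sample)
parser.close()
print(parser.questions)
```

Note that handle_data() is never called by hand here. The parser keeps calling it for every run of text, and the method itself decides whether to keep the data. The class check is an exact string comparison, so a <div> carrying several classes would need a slightly different test.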
     
    Benjamin Niemann, Aug 19, 2004
    #2
