Rajarshi Guha
Hi,
I have some HTML that essentially consists of a series of <div>s,
each <div> having one of two classes (tnt-question or tnt-answer).
I'm using HTMLParser to handle the tags as follows:
import HTMLParser

class MyHTMLParser(HTMLParser.HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if len(attrs) == 1:
            cls, whichcls = attrs[0]
            if whichcls == 'tnt-question':
                print self.get_starttag_text(), self.getpos()

    def handle_endtag(self, tag):
        pass

    def handle_data(self, data):
        # called for every run of text, whatever tag encloses it
        print data

if __name__ == '__main__':
    # read the whole file in one go
    htmldata = open('tt.html', 'r').read()
    parser = MyHTMLParser()
    parser.feed(htmldata)
However what I would like is that when the parser reaches some HTML like
this:
<div class="tnt-question">
How do I add a user to a MySQL system?
</div>
I should get back the data between the open and close tags. However the
above code prints the text contained between all tags, not just the <div>
tags with the class='tnt-question'.
Is there a way to call handle_data() when a specific tag is being handled?
Placing a call to handle_data() in handle_starttag seems to be the way -
but I'm not sure how to actually do it - what data should I pass to the
call?
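To make the question concrete, here is a rough sketch of the behaviour I'm after. It is only one possible approach, not a confirmed answer: instead of calling handle_data() directly, handle_starttag sets a counter, and handle_data only keeps text while that counter is non-zero. The names QuestionParser, depth, chunks and questions are made up for illustration, and so is the tnt-answer text in the sample. The import is written so that it runs on either Python 2 or Python 3.

```python
try:
    from HTMLParser import HTMLParser   # Python 2, as in the code above
except ImportError:
    from html.parser import HTMLParser  # Python 3


class QuestionParser(HTMLParser):
    """Collect the text inside <div class="tnt-question"> (hypothetical name)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.depth = 0       # > 0 while inside a tnt-question <div>
        self.chunks = []     # text pieces of the question being read
        self.questions = []  # finished questions

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        if self.depth > 0:
            # a <div> nested inside a question: track it so the
            # matching </div> does not end the question early
            self.depth += 1
        elif dict(attrs).get('class') == 'tnt-question':
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.questions.append(''.join(self.chunks).strip())

    def handle_data(self, data):
        # the parser still calls this for all text; keep only
        # what arrives while inside a question <div>
        if self.depth > 0:
            self.chunks.append(data)


# made-up sample input for illustration
sample = ('<div class="tnt-question">\n'
          'How do I add a user to a MySQL system?\n'
          '</div>\n'
          '<div class="tnt-answer">Some answer text.</div>')

parser = QuestionParser()
parser.feed(sample)
parser.close()
print(parser.questions)
# prints ['How do I add a user to a MySQL system?']
```

The counter, rather than a simple True/False flag, is there only so that a <div> nested inside a question does not switch the capture off too early.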
Any pointers would be appreciated
Thanks,
Rajarshi