Nickolay Kolev
Hi all,
I am looking for a way to extract the titles of HTML documents. I have
made an honest attempt at doing it, and it even works. Is there an
easier (faster / more efficient / clearer) way?
------------ START SCRIPT --------------------
#!/usr/bin/python

import sgmllib

class MyParser(sgmllib.SGMLParser):

    inside_title = False
    title = ''

    def start_title(self, attrs):
        self.inside_title = True

    def end_title(self):
        self.inside_title = False

    def handle_data(self, data):
        if self.inside_title and data:
            self.title = self.title + data + ' '

p = MyParser()
p.feed(file('test.html').read())
p.close()
print p.title.strip()
---------------- END SCRIPT -------------------------
Many thanks in advance!
Best regards,
Nickolay Kolev
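
For anyone reading this on Python 3, where sgmllib is no longer available, the same flag-based idea can probably be sketched with the standard library's html.parser module. This is only a rough sketch under some assumptions: the names TitleParser and extract_title are made up for illustration, only the first <title> element is kept, and runs of whitespace in the title are collapsed to single spaces.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.inside_title = False   # True while between <title> and </title>
        self.done = False           # True once the first title has been closed
        self.chunks = []            # pieces of text seen inside the title

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.inside_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.inside_title:
            self.inside_title = False
            self.done = True

    def handle_data(self, data):
        if self.inside_title:
            self.chunks.append(data)

    @property
    def title(self):
        # Join the pieces and collapse any run of whitespace to one space.
        return " ".join("".join(self.chunks).split())


def extract_title(html_text):
    """Hypothetical helper: return the title of an HTML string, or ''."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title
```

Something like extract_title(open('test.html').read()) would then play the role of the script above. The parser still reads the whole document, so this is meant to be clearer rather than faster.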