confused by HTMLParser class

globalrev · May 28, 2008

tried all kinds of combos to get this to work.

http://docs.python.org/lib/module-HTMLParser.html

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

from HTMLParser import HTMLParser
import urllib
import myhtmlparser

x = MyHTMLParser(HTMLParser())
site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
for row in site:
    print x.handle_starttag()

alex23 · May 28, 2008

globalrev said:
tried all kinds of combos to get this to work.

Did you try searching this group? There were recent posts discussing
basic usage of HTMLParser.

Throwing random code together is the least likely way to actually get
it to work.

globalrev said:
x = MyHTMLParser(HTMLParser())
site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
for row in site:
    print x.handle_starttag()

Why are you passing HTMLParser in to initialise MyHTMLParser?

Why are you iterating over site and expecting your instance of
MyHTMLParser to magically know about it?

Why haven't you read the urllib.urlopen docs, to see you need to do
a .read() to actually get the page data?

Why are you so resistant to reading some basic tutorials?
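
The questions above point at the intended flow: create the subclass with no arguments, hand it the page text through feed(), and let the parser call the handlers itself. A minimal sketch of that flow follows. It assumes Python 3, where the module is html.parser rather than HTMLParser, and it parses a hard-coded string instead of fetching the URL. The TagLogger class and the page string are made up for illustration.

```python
# Python 3 module name; the Python 2 code in this thread uses
# "from HTMLParser import HTMLParser" instead.
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical subclass that records tag events in a list."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called by the parser for each opening tag found by feed().
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called by the parser for each closing tag found by feed().
        self.events.append(("end", tag))


# Stand-in for the text that reading the URL would return.
page = "<html><body><p>Hello</p></body></html>"

parser = TagLogger()   # no arguments: the base class is not passed in
parser.feed(page)      # feed() drives the handlers; they are never called by hand
parser.close()

for kind, tag in parser.events:
    print(kind, tag)
```

With a real page, the string passed to feed() would be the result of reading the response (and, in Python 3, decoding it to str).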

XLiIV · May 28, 2008

This works fine for me:

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

#from HTMLParser import HTMLParser
import urllib
#import myhtmlparser

site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
x = MyHTMLParser()     # no arguments, not MyHTMLParser(HTMLParser())
x.feed(site.read())    # feed() parses the page text and calls the handlers
x.close()
site.close()
# No loop over site is needed: feed() calls handle_starttag() and
# handle_endtag() itself, and read() has already consumed the response.

You should also read this, for example:
http://www.diveintopython.org/html_processing/extracting_data.html
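
The attrs argument of handle_starttag() is a list of (name, value) pairs, which is what makes the class useful for pulling data out of a page rather than just printing tag names. A rough sketch of that idea, assuming Python 3 (html.parser) and a made-up LinkCollector class and sample string:

```python
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass that collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the tag just opened.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


# Illustrative input; one anchor has an href, the other does not.
sample = '<p><a href="http://docs.python.org/">docs</a> and <a name="top">anchor</a></p>'

collector = LinkCollector()
collector.feed(sample)
collector.close()
print(collector.links)
```

Storing results on the instance, instead of printing inside the handler, keeps the parsing step separate from whatever is done with the data afterwards.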

Stefan Behnel · May 28, 2008

globalrev said:
tried all kinds of combos to get this to work.

In case you meant to say that you can't get it to work, consider using lxml
instead.

http://codespeak.net/lxml
http://codespeak.net/lxml/lxmlhtml.html

Stefan
