confused by HTMLParser class

globalrev

tried all kinds of combos to get this to work.


http://docs.python.org/lib/module-HTMLParser.html



from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag


from HTMLParser import HTMLParser
import urllib
import myhtmlparser

x = MyHTMLParser(HTMLParser())
site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
for row in site:
    print x.handle_starttag()
 
alex23

tried all kinds of combos to get this to work.

Did you try searching this group? There were recent posts discussing
basic usage of HTMLParser.

Throwing random code together is the least likely way to actually get
it to work.
x = MyHTMLParser(HTMLParser())
site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
for row in site:
    print x.handle_starttag()

Why are you passing HTMLParser in to initialise MyHTMLParser?

Why are you iterating over site and expecting your instance of
MyHTMLParser to magically know about it?

Why haven't you read the urllib.urlopen docs, to see you need to do
a .read() to actually get the page data?

Why are you so resistant to reading some basic tutorials?
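
To make the questions above concrete, here is a minimal sketch of the usual pattern: create the subclass with no arguments, then hand the whole document text to feed(). It is written in Python 3 syntax, where the module lives at html.parser (the thread itself uses the Python 2 HTMLParser module). The TagLogger name, the events list, and the sample HTML string are made up for illustration, and a literal string stands in for the result of urlopen(...).read().

```python
from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical subclass that records tag events instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# Stand-in for the page text; in the thread this would come from .read().
page = "<html><body><p>Hello</p></body></html>"

parser = TagLogger()   # no arguments: the base class is inherited, not passed in
parser.feed(page)      # the parser calls handle_starttag/handle_endtag itself
parser.close()

for kind, tag in parser.events:
    print(kind, tag)
```

The handlers are never called by hand here; feed() invokes them as it finds tags in the text.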
 
XLiIV

tried all kinds of combos to get this to work.

http://docs.python.org/lib/module-HTMLParser.html

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

from HTMLParser import HTMLParser
import urllib
import myhtmlparser

x = MyHTMLParser(HTMLParser())
site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
for row in site:
    print x.handle_starttag()

This works fine for me:


from HTMLParser import HTMLParser
import urllib

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

# The class is defined in this file, so "import myhtmlparser" is not needed.
site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
x = MyHTMLParser()      # not MyHTMLParser(HTMLParser())
x.feed(site.read())     # feed() calls the handle_* methods for you
x.close()
site.close()
# The "for row in site" loop is dropped: the page has already been read,
# and handle_starttag() is not meant to be called directly.
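
If you do want a loop over the response, as in the original question, the loop body has to pass each chunk to feed() instead of calling a handler directly. Here is a rough sketch in Python 3 syntax (module html.parser), assuming io.StringIO as a stand-in for the object returned by urlopen; with a real Python 3 response the lines would be bytes and would need decoding first. StartTagCollector is a hypothetical name.

```python
import io
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Hypothetical subclass that remembers every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# Stand-in for the open URL. Note that the <a ...> tag is split across two
# lines; the parser buffers the incomplete tag until the rest arrives.
site = io.StringIO('<ul>\n<li><a\n href="x.html">x</a></li>\n</ul>\n')

collector = StartTagCollector()
for row in site:
    collector.feed(row)   # feed() may be called repeatedly with partial data
collector.close()
site.close()

print(collector.tags)
```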


You should also read this, for example:
http://www.diveintopython.org/html_processing/extracting_data.html
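
As a small illustration of extracting data instead of just printing tag names: the attrs argument of handle_starttag is a list of (name, value) pairs, with tag and attribute names lowercased by the parser. The sketch below, again in Python 3 syntax, collects href values from a tags. LinkCollector and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass that gathers the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs looks like [("href", "a.html"), ...]
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


sample = ('<p><a href="a.html">A</a> and <a name="top">anchor</a> '
          '<A HREF="b.html">B</A></p>')

collector = LinkCollector()
collector.feed(sample)
collector.close()

print(collector.links)
```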
 
