An HTML parsing problem

cheng

Hi all,

If the HTML looks like this:
<meta name = "description" content = "a test page">
<meta name = "keywords" content = "keyword1 keyword2">

and I use:
def handle_starttag(self, tag, attrs):
    if tag == 'meta':
        self.attr = attrs
        self.headers += ['%s' % (self.attr)]
        self.attr = ''

I get this output:
[('name', 'description'), ('content', 'a test page')]

[('name', 'keywords'), ('content', 'keyword1 keyword2')]

Is there some way to take only the content, like "a test page, keyword1, keyword2"?
 
bruno modulix

And put it where ?-)

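Well, it may look like this:

def handle_starttag(self, tag, attrs):
    if tag == 'meta':
        # attrs arrives as a list of (name, value) pairs, so build a
        # dict from it before looking up 'content'
        # (assumes self.content = [] was set up in __init__)
        try:
            self.content.append(dict(attrs)['content'])
        except KeyError:
            # this <meta> tag has no content attribute
            pass
        self.headers.append(str(attrs))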
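
To try the idea end to end, here is a minimal sketch of a complete parser. It assumes Python 3 and the standard library's html.parser module (the original post does not say which parser class is being subclassed), and the class name MetaContentParser is made up for illustration.

```python
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Collect the content attribute of every <meta> tag (hypothetical name)."""

    def __init__(self):
        super().__init__()
        self.content = []

    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            # attrs is a list of (name, value) pairs; dict() allows lookup by name
            value = dict(attrs).get('content')
            if value is not None:
                self.content.append(value)


html = '''<meta name = "description" content = "a test page">
<meta name = "keywords" content = "keyword1 keyword2">'''

parser = MetaContentParser()
parser.feed(html)
parser.close()

# Join the collected values into one comma-separated string
joined = ', '.join(parser.content)
print(joined)
```

With the two meta tags from the question, parser.content should hold the two content strings, and joined should be "a test page, keyword1 keyword2". If the keywords need to be separated as well, each value could be split on whitespace before joining.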

HTH
 
