BeautifulSoup fetch help

ted

Hi,

I'm using the BeautifulSoup module and having some trouble processing a
file. It's not printing what I'm expecting. In the code below, I'm expecting
cells with only "bgcolor" attributes to be printed, but I'm getting cells
with other attributes and some without any attributes.

Any help appreciated. Thanks,
Ted

import re
from BeautifulSoup import BeautifulSoup

text = open('yahoo.html').read()
soup = BeautifulSoup(text)
tables = soup('table', {'border':re.compile('.+')})

for table in tables:
    cells = table.fetch('td', {'bgcolor':re.compile('.+')})
    for cell in cells:
        print cell
    print "================"

Mike Meyer

BeautifulSoup's matching selects any tag that has a matching attribute,
not tags that have only that attribute. That's why you're getting tags
with other attributes.
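A minimal sketch of the two kinds of test, in plain Python 3 with no BeautifulSoup involved. The sample attribute lists and the helper names `has_matching_attr` and `has_only_attr` are hypothetical; the list of (name, value) pairs mirrors how BeautifulSoup 3 stores `Tag.attrs`.

```python
import re

# Hypothetical attribute lists for three <td> cells, stored as (name, value)
# pairs, the same shape BeautifulSoup 3 uses for Tag.attrs.
cells = [
    [("bgcolor", "#eeeeee")],                      # only bgcolor
    [("bgcolor", "#eeeeee"), ("align", "right")],  # bgcolor plus another attribute
    [],                                            # no attributes at all
]

def has_matching_attr(attrs, name, pattern):
    """True if some attribute called `name` has a value the regex finds a
    match in. Any other attributes the tag carries are ignored."""
    return any(k == name and pattern.search(v or "") for k, v in attrs)

def has_only_attr(attrs, name):
    """True only if the tag carries exactly one attribute, and it is `name`."""
    return len(attrs) == 1 and attrs[0][0] == name

nonempty = re.compile(".+")
print([has_matching_attr(a, "bgcolor", nonempty) for a in cells])  # [True, True, False]
print([has_only_attr(a, "bgcolor") for a in cells])                 # [True, False, False]
```

The first test is the "has a matching attribute" behaviour, so the cell with `align` as well still passes; only the second test rejects it.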

However, you can use a callable as the tag argument to check for what
you want:

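def findtagswithonly(name, attr):
    # Match tags called `name` that carry exactly one attribute, `attr`.
    # (In BeautifulSoup 3, tag.attrs is a list of (name, value) pairs.)
    return (lambda tag: tag.name == name and
                        len(tag.attrs) == 1 and
                        tag.attrs[0][0] == attr)

...

cells = table.fetch(findtagswithonly('td', 'bgcolor'))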
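To make the callable idea concrete without needing BeautifulSoup installed, here is a small Python 3 sketch. `FakeTag` and `fetch_like` are hypothetical stand-ins, not part of BeautifulSoup: `FakeTag` mimics a BeautifulSoup 3 tag (a name plus a list of (name, value) attribute pairs), and `fetch_like` mimics only the part of `fetch` that accepts either a tag name or a callable.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup 3 Tag: a name plus a list
    of (attribute, value) pairs."""
    name: str
    attrs: List[Tuple[str, str]] = field(default_factory=list)

def findtagswithonly(name, attr):
    # Right tag name, exactly one attribute, and that attribute is `attr`.
    return lambda tag: (tag.name == name and
                        len(tag.attrs) == 1 and
                        tag.attrs[0][0] == attr)

def fetch_like(tags, matcher):
    """Hypothetical helper: a string matches on tag name; a callable is
    called with each tag and decides for itself."""
    if callable(matcher):
        return [t for t in tags if matcher(t)]
    return [t for t in tags if t.name == matcher]

tags = [
    FakeTag("td", [("bgcolor", "#eeeeee")]),
    FakeTag("td", [("bgcolor", "#eeeeee"), ("align", "right")]),
    FakeTag("td"),
    FakeTag("a", [("bgcolor", "#eeeeee")]),
]

print(len(fetch_like(tags, "td")))                          # 3
print(fetch_like(tags, findtagswithonly("td", "bgcolor")))  # [FakeTag(name='td', attrs=[('bgcolor', '#eeeeee')])]
```

If you are on BeautifulSoup 4 instead, `Tag.attrs` is a dict there, so the `tag.attrs[0][0]` test would need to become something like `list(tag.attrs) == [attr]`.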


Or, the version I actually wrote to check this out:

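def findtagswithoneattrib(name):
    # Match tags called `name` that carry exactly one attribute (any attribute).
    return lambda tag: tag.name == name and len(tag.attrs) == 1

...
# The callable restricts name and attribute count; the dict requires bgcolor.
cells = table.fetch(findtagswithoneattrib('td'), {'bgcolor': re.compile('.+')})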
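A rough end-to-end sketch of this second variant, assuming Python 3 and using the standard library's `html.parser` in place of BeautifulSoup. `TagCollector`, `with_one_attrib`, `attrs_match`, and `select` are hypothetical names. `select` keeps a tag only if both the one-attribute predicate and the attribute regexes pass, which is the intent of handing `fetch` the callable together with the `{'bgcolor': ...}` dict.

```python
import re
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record every start tag as (name, attrs); html.parser passes attrs as a
    list of (name, value) pairs, with value None for valueless attributes."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))

def with_one_attrib(name):
    # Right tag name and exactly one attribute, whatever it is.
    return lambda tag, attrs: tag == name and len(attrs) == 1

def attrs_match(attrs, wanted):
    # Every key in `wanted` must be present with a value the regex finds a match in.
    values = dict(attrs)
    return all(k in values and pattern.search(values[k] or "")
               for k, pattern in wanted.items())

def select(tags, predicate, wanted):
    # Both conditions must hold for a tag to be kept.
    return [(t, a) for t, a in tags if predicate(t, a) and attrs_match(a, wanted)]

html = """<table border="1"><tr>
<td bgcolor="#eeeeee">A</td>
<td bgcolor="#eeeeee" align="right">B</td>
<td align="right">C</td>
<td>D</td>
</tr></table>"""

collector = TagCollector()
collector.feed(html)
hits = select(collector.tags, with_one_attrib("td"), {"bgcolor": re.compile(".+")})
print(hits)  # [('td', [('bgcolor', '#eeeeee')])]
```

Cell C has exactly one attribute but no `bgcolor`, and cell B has `bgcolor` but two attributes, so only cell A survives both checks.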

I'm not sure why you're getting tags without attributes. If the above
code does that, post some sample data along with the code.

<mike

ted

Thanks Mike, works like a charm.

-Ted