B-Soup: broken iterator, tag a keyword?

Brendan

Hi there,
I have the following using Beautiful Soup:

import re
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(data)
tags = soup.findAll(href=re.compile(r"/MER_FRS_L2_Canada/MER_FRS_\S+gz"))
for tag in tags:
    print tag['href']
    print tag.parent.nextSibling.string
    print tag.parent.nextSibling.nextSibling.string
    print tag.parent.nextSibling.nextSibling.nextSibling.string
    print tag.parent.nextSibling.nextSibling.nextSibling.nextSibling.contents[0].string


For some reason I do not understand, using 'tag' as the loop variable
breaks the code. Can someone tell me why? Reading dir(soup) did not
enlighten me.

Thanks
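
Whether a name such as 'tag' is reserved can be checked directly with the standard library's keyword module. The following is a minimal sketch in Python 3 syntax (the thread itself uses Python 2); the variable names are made up for illustration.

```python
import keyword

# keyword.iskeyword() reports whether a string is a reserved word.
tag_is_keyword = keyword.iskeyword("tag")
for_is_keyword = keyword.iskeyword("for")

print(tag_is_keyword)  # False: 'tag' is an ordinary identifier
print(for_is_keyword)  # True: 'for' is reserved

# Because 'tag' is not reserved, it is legal as a loop variable.
seen = []
for tag in ["a", "b"]:
    seen.append(tag)
```

If the loop still fails, the cause is more likely in the loop body than in the variable name, and the traceback should show which attribute access went wrong.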
 
Stefan Behnel

Hi,
Brendan wrote:
> I have the following using Beautiful Soup:
>
> soup = BeautifulSoup(data)
> tags = soup.findAll(href=re.compile("/MER_FRS_L2_Canada/MER_FRS_\S+gz"))
> for tag in tags:
>     print tag['href']
>     print tag.parent.nextSibling.string
>     print tag.parent.nextSibling.nextSibling.string
>     print tag.parent.nextSibling.nextSibling.nextSibling.string
>     print tag.parent.nextSibling.nextSibling.nextSibling.nextSibling.contents[0].string

It's very unlikely that the name "tag" is the problem here; it is an ordinary
identifier, not a Python keyword. But since you didn't say what the actual
problem is, let me suggest not parsing markup with regular expressions in
general (which BS does). Use a real XML/HTML parser for that. lxml will work
just fine (and it also has a nicer API).

http://codespeak.net/lxml/

Stefan
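
For readers who want to try the lxml suggestion, here is a rough sketch of how the same extraction might look, written in Python 3 syntax. The sample markup, the extract_rows helper and the variable names are all assumptions made for illustration; the thread never shows the real page structure, so this is not a drop-in replacement for the original loop.

```python
import re
from lxml import html

# Hypothetical sample markup; the actual page is not shown in the thread.
data = """
<table>
  <tr>
    <td><a href="/MER_FRS_L2_Canada/MER_FRS_20080101.N1.gz">file 1</a></td>
    <td>2008-01-01</td>
    <td>12:00</td>
    <td>45.0</td>
    <td><b>OK</b></td>
  </tr>
  <tr>
    <td><a href="/other/file.txt">ignored</a></td>
    <td>n/a</td>
  </tr>
</table>
"""

pattern = re.compile(r"/MER_FRS_L2_Canada/MER_FRS_\S+gz")

def extract_rows(markup):
    """Return (href, [text of following cells]) for each matching link."""
    root = html.fromstring(markup)
    rows = []
    for link in root.iter("a"):
        href = link.get("href", "")
        if not pattern.search(href):
            continue
        cell = link.getparent()
        # itersiblings() never yields bare text, so whitespace between cells
        # does not have to be stepped over one sibling at a time.
        following = [sib.text_content().strip() for sib in cell.itersiblings()]
        rows.append((href, following))
    return rows

rows = extract_rows(data)
for href, cells in rows:
    print(href, cells)
```

Collecting all following cells in a list avoids the long chains of nextSibling lookups, which can fail with an AttributeError as soon as one sibling is missing.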
 
