Beautiful Soup iterator question....

cjl · Apr 20, 2007

I am screen-scraping a table. The table has an unknown number of rows,
but each row has exactly 8 cells. I would like to extract the data
from the cells, but the first three cells in each row have their data
nested inside other tags.

So I have the following code:

for row in table.findAll("tr"):
    for cell in row.findAll("td"):
        print cell.contents[0]

This code prints out all the data, but of course the first three cells
still contain their unwanted tags.

I would like to do something like this:

for cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 in row.findAll("td"):

Then treat each cell differently.

I can't figure this out. Can anyone point me in the right direction?

-CJL

Steve Holden · Apr 20, 2007

Did you try something like this (untested)?

cell1, cell2, cell3, cell4, cell5, \
cell6, cell7, cell8 = row.findAll("td")

No need for the "for" if you want to handle each cell differently: you
won't be iterating over them. And, as you saw, the "for" version doesn't
work unless row.findAll(...) returns a sequence of eight-item containers.

regards
Steve
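
To illustrate the difference between the two forms, here is a minimal sketch. It uses plain lists of strings as hypothetical stand-ins for what row.findAll("td") returns, so it runs without Beautiful Soup; real cells would be Tag objects rather than strings.

```python
# Stand-in for row.findAll("td"): a plain list of eight cell values.
# (Hypothetical data; a real row would hold Beautiful Soup Tag objects.)
cells = ["a", "b", "c", "d", "e", "f", "g", "h"]

# Direct unpacking binds one name per cell; no loop is involved.
cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 = cells

# The "for" form unpacks each *item* of the sequence it iterates over,
# so it only works when every item is itself an eight-element container.
rows = [cells, cells]
unpacked = []
for c1, c2, c3, c4, c5, c6, c7, c8 in rows:
    unpacked.append((c1, c8))

# Unpacking into the wrong number of names raises ValueError.
try:
    c1, c2 = cells
except ValueError as exc:
    error = str(exc)
```

Plain unpacking is the strict option: it fails as soon as a row does not have exactly eight cells, which may be what you want if a malformed row should stop the scrape.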

Paul McGuire · Apr 20, 2007

One defensive approach to handling rows that might have too few or too
many elements is to construct a larger list and then slice the right
number of elements from it.

cell1, cell2, cell3, cell4, cell5, \
cell6, cell7, cell8 = (row.findAll("td") + [None]*8)[:8]

-- Paul
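
The pad-and-slice idea can be wrapped in a small helper. This is only a sketch: the name first_n and its parameters are made up for illustration, and plain lists stand in for the result of row.findAll("td").

```python
def first_n(items, n=8, fill=None):
    """Return exactly n items: drop any extras, pad a shortfall with fill."""
    # Appending n fill values guarantees at least n elements before slicing.
    return (list(items) + [fill] * n)[:n]

short_row = ["a", "b", "c"]      # too few cells
long_row = list("abcdefghij")    # too many cells

padded = first_n(short_row)
trimmed = first_n(long_row)

# Either result can now be unpacked into eight names without an error.
cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 = padded
```

Note that padding hides malformed rows instead of reporting them, so any code that uses the cells afterwards has to cope with None values.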
