Parsing tables with Beautiful Soup?

cjl · Mar 21, 2007

I am learning Python and Beautiful Soup, and I'm stuck.

A web page has a table that contains data I would like to scrape. The
table has a unique class, so I can use:

soup.find("table", {"class": "class_name"})

This isolates the table. So far, so good. Next, this table has a
certain number of rows (I won't know ahead of time how many), and each
row has a set number of cells (which will be constant).

I couldn't find example code showing how to loop through the rows and
cells of a table using Beautiful Soup. I'm guessing I need an outer
loop for the rows and an inner loop for the cells, but I don't know
how to iterate over the tags that I want. The Beautiful Soup
documentation is a little beyond me at this point.

Can anyone point me in the right direction?

thanks again,
cjl

cjl · Mar 21, 2007

This works:

for row in soup.find("table", {"class": "class_name"}):
    for cell in row:
        print cell.contents[0]

Is there a better way to do this?

-cjl

Duncan Booth · Mar 22, 2007

cjl said:
This works:

for row in soup.find("table", {"class": "class_name"}):
    for cell in row:
        print cell.contents[0]

Is there a better way to do this?

It may work for the page you are testing against, but it is fragile. You are
assuming that the TR elements are direct children of the TABLE, but in HTML 4
the TR elements belong inside THEAD, TBODY or TFOOT elements. The TBODY tags
may be omitted from the markup, which is why your page works, but if anyone
ever adds them to the HTML your code will break.
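
To see the problem concretely, here is a minimal sketch. It assumes Beautiful Soup 4 (the `bs4` package, which postdates this thread) and a made-up sample table; `find_all` is the Beautiful Soup 4 spelling of `findAll`. Iterating over a tag only visits its direct children, so once a TBODY is present the loop sees the TBODY (plus whitespace strings), not the rows.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed
from bs4.element import Tag

# Hypothetical sample page in which the rows sit inside an explicit <tbody>.
html = """
<table class="class_name">
  <tbody>
    <tr><td>a</td><td>b</td></tr>
    <tr><td>c</td><td>d</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "class_name"})

# Iterating over the table walks its direct children only. Those include
# whitespace text nodes, so keep just the tags to see what is really there.
direct_children = [child.name for child in table if isinstance(child, Tag)]

# find_all searches every descendant, so it reaches the rows either way.
rows = table.find_all("tr")

print(direct_children)
print(len(rows))
```

The direct children contain the TBODY rather than any TR, while the descendant search still finds both rows.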

Something like this (untested) ought to work and be reasonably robust:

table = soup.find("table", {"class": "class_name"})
for row in table.findAll("tr"):
    for cell in row.findAll("td"):
        print cell.findAll(text=True)
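
For anyone who wants something self-contained to run, the same idea can be sketched as below. This is only an illustration under some assumptions: it uses Beautiful Soup 4 (`bs4`) rather than the version current when this thread was written, the sample HTML is invented, and `table_rows` is a hypothetical helper name, not part of the library.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

# Hypothetical sample page; the real page's markup is not shown in the thread.
html = """
<table class="class_name">
  <thead><tr><th>Name</th><th>Qty</th></tr></thead>
  <tbody>
    <tr><td>apple</td><td><b>3</b></td></tr>
    <tr><td>pear</td><td>5</td></tr>
  </tbody>
</table>
"""


def table_rows(soup, class_name):
    """Return the table's data cells as a list of rows (lists of strings)."""
    table = soup.find("table", {"class": class_name})
    if table is None:
        return []  # no table with that class on the page
    rows = []
    for row in table.find_all("tr"):  # outer loop: every row, at any depth
        # inner loop: the text of each <td>, with nested markup flattened
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if cells:  # skip rows that have no <td>, such as a header of <th>
            rows.append(cells)
    return rows


soup = BeautifulSoup(html, "html.parser")
data = table_rows(soup, "class_name")
for row in data:
    print(row)
```

Collecting the cells into a list of lists, instead of printing inside the loop, keeps the number of rows open-ended while making the fixed number of cells per row easy to check afterwards.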

cjl · Mar 22, 2007

DB:

Thank you, that worked perfectly.

-CJL
