parsing tables with beautiful soup?

Discussion in 'Python' started by cjl, Mar 21, 2007.

  1. cjl

    cjl Guest

    I am learning python and beautiful soup, and I'm stuck.

    A web page has a table that contains data I would like to scrape. The
    table has a unique class, so I can use:

    soup.find("table", {"class": "class_name"})

    This isolates the table. So far, so good. Next, this table has a
    certain number of rows (I won't know ahead of time how many), and each
    row has a set number of cells (which will be constant).

    I couldn't find example code on how to loop through the contents of
    the rows and cells of a table using beautiful soup. I'm guessing I
    need an outer loop for the rows and an inner loop for the cells, but I
    don't know how to iterate over the tags that I want. The beautiful
    soup documentation is a little beyond me at this point.

    Can anyone point me in the right direction?

    thanks again,
    cjl
    cjl, Mar 21, 2007
    #1

  2. cjl

    cjl Guest

    This works:

    for row in soup.find("table", {"class": "class_name"}):
        for cell in row:
            print cell.contents[0]

    Is there a better way to do this?

    -cjl
    cjl, Mar 21, 2007
    #2

  3. Duncan Booth

    Duncan Booth Guest

    "cjl" wrote:

    > This works:
    >
    > for row in soup.find("table",{"class": "class_name"}):
    >     for cell in row:
    >         print cell.contents[0]
    >
    > Is there a better way to do this?
    >


    It may work for the page you are testing against, but it wouldn't work if
    your page contained valid HTML. You are assuming that the TR elements are
    direct children of the TABLE, but HTML requires that the TR elements appear
    inside THEAD, TBODY or TFOOT elements, so if anyone ever corrects the html
    your code will break.
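
To illustrate the point about wrapper elements, here is a minimal sketch. It assumes the newer bs4 package (not the BeautifulSoup 3 API used elsewhere in this thread) with the standard-library "html.parser" backend, and the sample markup is made up for the example:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup: the rows are wrapped in an explicit <tbody>.
HTML = """
<table class="class_name">
  <tbody>
    <tr><td>a</td><td>b</td></tr>
    <tr><td>c</td><td>d</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table", {"class": "class_name"})

# Direct children of the table: whitespace strings plus the <tbody> tag.
# Keep only real tags and record their names.
direct_child_tags = [child.name for child in table.children
                     if isinstance(child, Tag)]

# A descendant search is not affected by the extra wrapper level.
rows_found = len(table.find_all("tr"))

print(direct_child_tags)
print(rows_found)
```

With this markup the only tag directly under the table is the tbody, so a loop over the table's children never reaches a row, while a descendant search still finds both rows.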

    Something like this (untested) ought to work and be reasonably robust:

    table = soup.find("table",{"class": "class_name"})
    for row in table.findAll("tr"):
        for cell in row.findAll("td"):
            print cell.findAll(text=True)
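
For readers on current versions, the same nested loop might look roughly like this with bs4, where findAll is spelled find_all. This is a sketch under assumptions: the helper name table_to_rows and the sample markup are hypothetical, and it collects cell text into lists instead of printing:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with a header row and two data rows.
HTML = """
<table class="class_name">
  <thead><tr><th>x</th><th>y</th></tr></thead>
  <tbody>
    <tr><td>1</td><td> 2 </td></tr>
    <tr><td>3</td><td>4</td></tr>
  </tbody>
</table>
"""


def table_to_rows(html, class_name):
    """Return the text of every <td>, grouped by row, for the first
    table with the given class. Returns [] if no such table exists."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=class_name)
    if table is None:
        return []
    rows = []
    for row in table.find_all("tr"):          # outer loop: rows
        cells = [cell.get_text(strip=True)    # inner loop: cells
                 for cell in row.find_all("td")]
        if cells:                             # skip rows with no <td> (e.g. header rows)
            rows.append(cells)
    return rows


rows = table_to_rows(HTML, "class_name")
print(rows)
```

Note that find_all("tr") searches all descendants, so rows of any nested table would be included too.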
    Duncan Booth, Mar 22, 2007
    #3
  4. cjl

    cjl Guest

    DB:

    Thank you, that worked perfectly.

    -CJL
    cjl, Mar 22, 2007
    #4
