Extract Information from Tables in HTML

Discussion in 'Python' started by Jackie Wang, Sep 5, 2008.

  1. Jackie Wang


    Dear all,

    Here is some HTML code:

    <td valign="top" headers="col4">

    Premier Community Bank of Southwest Florida
    <br />
    Fort Myers, FL

    </td>

    My question is: how can I extract the strings to get this result:
    Premier Community Bank of Southwest Florida; Fort Myers, FL

    Thanks a lot

    Jackie
     
    Jackie Wang, Sep 5, 2008

  2. Hi,

    Jackie Wang wrote:
    > Here is some HTML code:
    >
    > <td valign="top" headers="col4">
    >
    > Premier Community Bank of Southwest Florida
    > <br />
    > Fort Myers, FL
    >
    > </td>
    >
    > My question is: how can I extract the strings to get this result:
    > Premier Community Bank of Southwest Florida; Fort Myers, FL


    Use lxml.html. Something like this should do what you want:

    >>> from lxml import html
    >>> tree = html.parse("http://server.org/thefile.html")
    >>> all_tds = tree.xpath("//td")  # every <td> in the document
    >>> for td in all_tds:
    ...     print(td.xpath("normalize-space()"))  # cell text, whitespace collapsed

    Tweak as you see fit; tree iteration is at your service in case you need more.
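One possible tweak, sketched here as a hedged example: `normalize-space()` returns the whole cell as a single string ("Premier Community Bank of Southwest Florida Fort Myers, FL"), so to get the semicolon-separated result from the question you can instead take the cell's direct text nodes, which are the pieces on either side of the `<br />`. The sketch assumes the markup is already in a string; the `<table>` wrapper, the `SNIPPET` constant and the `cell_lines` helper are illustrative additions, not part of the original post.

```python
from lxml import html

# The cell from the question (hypothetical test input), wrapped in a
# <table> so the HTML parser treats the <td> as an ordinary table cell.
SNIPPET = """<table><tr><td valign="top" headers="col4">

Premier Community Bank of Southwest Florida
<br />
Fort Myers, FL

</td></tr></table>"""


def cell_lines(markup):
    """Return one string per <td>, joining its <br />-separated lines with '; '."""
    root = html.fromstring(markup)
    results = []
    for td in root.iter("td"):
        # "text()" selects only the cell's direct text nodes: the text before
        # the <br /> and the text after it (the <br />'s tail).
        lines = [" ".join(t.split()) for t in td.xpath("text()")]
        # Drop fragments that were only whitespace, then join the rest.
        results.append("; ".join(line for line in lines if line))
    return results


for line in cell_lines(SNIPPET):
    print(line)
# Premier Community Bank of Southwest Florida; Fort Myers, FL
```

Because `text()` only selects text sitting directly inside the `<td>`, text nested in other tags (for example `<b>`) would be skipped and would need extra handling.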

    http://codespeak.net/lxml/
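If lxml is not available, a similar result can be approximated with only the standard library's `html.parser` module. This swaps in a different technique (an event-driven parser instead of a parsed tree queried with XPath), and it is a minimal sketch under stated assumptions: cells are not nested inside other cells, and every `<br>` inside a cell marks a line break. The class `CellTextParser`, the function `cell_lines` and the `SNIPPET` input are hypothetical names used for illustration.

```python
from html.parser import HTMLParser

# The cell from the question, used as hypothetical test input.
SNIPPET = """<td valign="top" headers="col4">

Premier Community Bank of Southwest Florida
<br />
Fort Myers, FL

</td>"""


class CellTextParser(HTMLParser):
    """Collect the text of each <td>, split into fragments at <br> tags."""

    def __init__(self):
        super().__init__()
        self.cells = []       # one list of text fragments per finished <td>
        self._current = None  # fragments of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        # A self-closing <br /> also arrives here, because the default
        # handle_startendtag calls handle_starttag and then handle_endtag.
        if tag == "td":
            self._current = [""]
        elif tag == "br" and self._current is not None:
            self._current.append("")  # a <br> starts a new fragment

    def handle_endtag(self, tag):
        if tag == "td" and self._current is not None:
            self.cells.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Data may arrive in several chunks, so append to the open fragment.
        if self._current is not None:
            self._current[-1] += data


def cell_lines(markup):
    """Return one '; '-joined string per <td> found in the markup."""
    parser = CellTextParser()
    parser.feed(markup)
    parser.close()
    results = []
    for fragments in parser.cells:
        lines = [" ".join(f.split()) for f in fragments]  # collapse whitespace
        results.append("; ".join(line for line in lines if line))
    return results


print(cell_lines(SNIPPET))
# ['Premier Community Bank of Southwest Florida; Fort Myers, FL']
```

The trade-off is that this parser does no error recovery or tree building, so for messy real-world pages the lxml approach is usually the more forgiving choice.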

    Stefan
     
    Stefan Behnel, Sep 5, 2008
