need start point for getting html info from web

Discussion in 'Python' started by nephish@xit.net, Oct 31, 2005.

  1. Guest

    Hey there,

    I have a small app that will need to get information from a
    few tables on different websites. I have looked at urllib and httplib.
    The sites I need to get data from mostly have this data in tables,
    which I think would make it easier. Can anyone suggest a good starting
    point for finding out how to do this, or know of a link to a good
    how-to?
    Thanks,
    sk
    , Oct 31, 2005
    #1

  2. Mike Meyer Guest

    writes:
    > I have a small app that will need to get information from a
    > few tables on different websites. I have looked at urllib and httplib.
    > The sites I need to get data from mostly have this data in tables,
    > which I think would make it easier. Can anyone suggest a good starting
    > point for finding out how to do this, or know of a link to a good
    > how-to?


    Don't have a link to a howto. But you're halfway there. urllib (and
    urllib2) will get HTML text from the websites. Pulling data from it
    sort of depends on the nature of the HTML. If it's well-structured
    XHTML, you can use your favorite XML library. If it's well-structured
    HTML, you can try htmllib, but it's pretty primitive. If it's not
    well-structured, you can use BeautifulSoup. I've used it to pull data
    from tables. The problem with any of this is that your code really
    depends on the structure - or lack thereof - of the HTML you're
    scraping. If they change it, your code breaks.

    <mike
    --
    Mike Meyer <> http://www.mired.org/home/mwm/
    Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
    Mike Meyer, Oct 31, 2005
    #2
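A minimal sketch of the well-structured XHTML case, using only Python's standard-library XML parser. It targets modern Python 3, where urllib2 became urllib.request and htmllib no longer exists. It assumes the page is well-formed XHTML without HTML-only entities such as &nbsp;, and that the first row of the first table holds the column headers; the helper names fetch and table_rows are hypothetical. Comparing the header row against the columns you expect makes a layout change fail loudly, which addresses the breakage Mike warns about.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def fetch(url):
    """Download a page and return its text (assumes the page is UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def table_rows(xhtml, expected_header):
    """Return the data rows of the first <table> in a well-formed XHTML string.

    Raises ValueError if the header row differs from expected_header, so a
    change in the page layout is detected instead of yielding wrong data.
    """
    root = ET.fromstring(xhtml)
    # Strip the XHTML namespace (if present) so tags can be matched by local name.
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith(XHTML_NS):
            el.tag = el.tag[len(XHTML_NS):]
    table = root.find(".//table")
    if table is None:
        raise ValueError("no <table> found")
    rows = []
    for tr in table.iter("tr"):
        # Collect the text of each direct <th>/<td> child, including nested markup.
        cells = ["".join(cell.itertext()).strip()
                 for cell in tr if cell.tag in ("th", "td")]
        rows.append(cells)
    if not rows or rows[0] != list(expected_header):
        raise ValueError(f"unexpected header row: {rows[0] if rows else None}")
    return rows[1:]
```

For example, table_rows(fetch(url), ["Name", "Value"]), where url and the column names are placeholders for your own page.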

  3. Guest

    Re: need start point for getting html info from web

    Yeah, I know I am going to have to write a bunch of stuff because the
    values I want come from several different sites. Ah well, I just
    wanted to know the easiest way to get started. I will check out
    Beautiful Soup; I think I have heard it mentioned before.
    Thanks,
    shawn
    , Oct 31, 2005
    #3
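Beautiful Soup is a separate download. As a dependency-free starting point, here is a minimal sketch using Python 3's standard html.parser module (the successor to htmllib), which is more forgiving than an XML parser about unclosed <td> and <tr> tags. It assumes tables are not nested; the names TableParser and extract_tables are hypothetical.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every table as a list of rows, each a list of cell strings.

    Assumes tables are not nested; tolerates missing </td>, </th> and </tr> tags.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &amp; are decoded
        self.tables = []    # finished tables
        self._rows = None   # rows of the table being parsed, or None outside a table
        self._cell = None   # text fragments of the current cell, or None outside a cell

    def _close_cell(self):
        # Finish the open cell (if any), collapsing runs of whitespace.
        if self._cell is not None and self._rows:
            self._rows[-1].append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif self._rows is not None:
            if tag == "tr":
                self._close_cell()
                self._rows.append([])
            elif tag in ("td", "th"):
                self._close_cell()
                if not self._rows:  # cell before any <tr>: start an implicit row
                    self._rows.append([])
                self._cell = []

    def handle_endtag(self, tag):
        if self._rows is None:
            return
        if tag in ("td", "th", "tr"):
            self._close_cell()
        elif tag == "table":
            self._close_cell()
            self.tables.append([row for row in self._rows if row])
            self._rows = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return all tables in an HTML string as lists of rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

To scrape a live page, pass in the decoded text, e.g. extract_tables(urlopen(url).read().decode("utf-8")) with urlopen from urllib.request; UTF-8 is an assumption, so check the page's declared charset.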
  4. Paul McGuire Guest

    <> wrote in message
    news:...
    > hey there,
    >
    > I have a small app that will need to get information from a
    > few tables on different websites. I have looked at urllib and httplib.
    > The sites I need to get data from mostly have this data in tables,
    > which I think would make it easier. Can anyone suggest a good starting
    > point for finding out how to do this, or know of a link to a good
    > how-to?
    > Thanks,
    > sk
    >

    pyparsing comes with a simple HTML scraper example for extracting the NIST
    NTP servers from an HTML table. pyparsing is also fairly tolerant of
    "unclean" HTML. Download pyparsing at http://pyparsing.sourceforge.net.

    -- Paul
    Paul McGuire, Oct 31, 2005
    #4
  5. Guest

    Re: need start point for getting html info from web

    You can easily do it with SW Explorer Automation
    (http://home.comcast.net/~furmana/SWIEAutomation.htm).
    The program creates an automation API for any Web application that
    uses HTML and DHTML and works with Microsoft Internet Explorer. The Web
    application then becomes programmatically accessible from any .NET
    language.

    The tool includes a Visual Table Data Extractor, which lets you define
    the table structure visually. The table then becomes accessible from
    code as a DataTable class. With the tool, you can develop the
    extraction script in hours.
    , Oct 31, 2005
    #5