need start point for getting html info from web

nephish · Oct 31, 2005

hey there,

i have a small app that i am going to need to get information from a
few tables on different websites. i have looked at urllib and httplib.
the sites i need to get data from mostly have this data in tables. So
that, i think would make it easier. Anyone suggest a good starting
point for me to find out how to do this, or know of a link to a good
how-to?
thanks,
sk

Mike Meyer · Oct 31, 2005

i have a small app that i am going to need to get information from a
few tables on different websites. i have looked at urllib and httplib.
the sites i need to get data from mostly have this data in tables. So
that, i think would make it easier. Anyone suggest a good starting
point for me to find out how to do this, or know of a link to a good
how-to?

Don't have a link to a howto. But you're halfway there. urllib (and
urllib2) will get HTML text from the websites. Pulling data from it
sort of depends on the nature of the HTML. If it's well-structured
XHTML, you can use your favorite xml library. if it's well structured
HTML, you can try htmllib, but it's pretty primitive. If it's not
well-structured, you can use BeautifulSoup. I've used it to pull data
from tables. The problem with any of this is that your code really
depends on the structure - or lack thereof - of the HTML you're
scraping. If they change it, your code breaks.

<mike

nephish · Oct 31, 2005

yeah, i know i am going to have to write a bunch of stuff because the
values i want to get come from several different sites. ah-well, just
wanting to know the easiest way to learn how to get started. i will
check into beautiful soup, i think i have heard it referred to before.
thanks
shawn

Paul McGuire · Oct 31, 2005

hey there,

i have a small app that i am going to need to get information from a
few tables on different websites. i have looked at urllib and httplib.
the sites i need to get data from mostly have this data in tables. So
that, i think would make it easier. Anyone suggest a good starting
point for me to find out how to do this, or know of a link to a good
how-to?
thanks,
sk

pyparsing comes with a simple HTML scraper example for extracting the NIST
NTP servers from an HTML table. pyparsing is also fairly tolerant of
"unclean" HTML. Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

alex_f_il · Oct 31, 2005

You can easily do it with SW Explorer Automation
(http://home.comcast.net/~furmana/SWIEAutomation.htm).
The program creates an automation API for any Web application which
uses HTML and DHTML and works with Microsoft Internet Explorer. The Web
application becomes programmatically accessible from any .NET language.

The tool has Visual Table Data Extractor. It allows visually define the
table structure. The table becomes accessible from the code as
DataTable class. You can develop the extraction script in hours with
the tool.

Getting extra blank rows from appending HTML..?	2	Oct 24, 2023
Web scraping i guess (Yet to start, maybe this should be done in python?)	1	Nov 10, 2021
CORS/Express: Getting data from server from domain html	2	Sep 3, 2022
Need total amount displayed of data-price attribute from each table	2	Jul 3, 2022
Bash scripts for web apps	1	Jan 16, 2023
Python point location of intersect between two lines	0	Feb 28, 2018
How to start, if at all ?	2	Apr 18, 2022
Using JS to verify registration info?	1	Mar 19, 2020

need start point for getting html info from web

nephish

Mike Meyer

nephish

Paul McGuire

alex_f_il

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads