web crawling.

S Borg · Jan 19, 2006

Hello,

I have been writing very simple Python programs that parse HTML and
such, mainly to get a better feel for the language. Here is my
question: if I parse an HTML page and extract all of the image files
listed on it, how can I request those images and download them into a
specified folder? I am sure this is quite easy, but I am stuck.

Thank you very much.
Burgeoning Pythonista
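A minimal sketch of the downloading step asked about above, written for modern Python 3 (the thread predates it) and assuming you already have a list of absolute image URLs. The helper names `local_filename` and `download_images` are hypothetical, and naming each saved file after the last segment of its URL path is an assumption of this sketch, not the only sensible choice.

```python
import os
import posixpath
import shutil
import urllib.parse
import urllib.request


def local_filename(image_url, folder):
    """Return a path inside `folder` named after the URL's last path segment."""
    path = urllib.parse.unquote(urllib.parse.urlparse(image_url).path)
    name = posixpath.basename(path)
    if name in ("", ".", ".."):
        name = "image"  # fallback for URLs such as http://example.com/
    return os.path.join(folder, name)


def download_images(image_urls, folder):
    """Fetch each URL and save its bytes into `folder`; return the saved paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in image_urls:
        # Assumption: two URLs with the same file name overwrite each other.
        target = local_filename(url, folder)
        # urlopen understands http://, https:// and file:// URLs.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)  # stream rather than read() it all
        saved.append(target)
    return saved
```

Usage would look like `download_images(["http://example.com/logo.png"], "images")`. Streaming with `shutil.copyfileobj` keeps memory use flat for large images; a real script would also want error handling and a politeness delay between requests.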

Alex Martelli · Jan 19, 2006

There's a good crawler in the Demo directory of the Python source
distribution, so download and unpack said sources and look there.

Alex

gene tani · Jan 19, 2006

http://sig.levillage.org/?p=588

Fuzzyman · Jan 19, 2006

Use BeautifulSoup to get all the image tags out of the HTML.

You'll need to join the URLs of the images to the URL of the page
(urlparse.urljoin, off the top of my head). If you look at the
BeautifulSoup documentation you will see how to get the 'src'
attribute of each image tag.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
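Fuzzyman's approach, sketched in Python 3 with one substitution: the standard library's `html.parser.HTMLParser` stands in for BeautifulSoup so the example runs without a third-party install. In Python 3, `urljoin` lives in `urllib.parse` (it was `urlparse.urljoin` in Python 2). The names `ImageSrcCollector` and `image_urls` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes lowercased tag names and (name, value) attribute pairs;
        # self-closing <img ... /> tags are routed here too by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Relative references such as "pics/a.png" become absolute URLs.
                self.image_urls.append(urljoin(self.base_url, src))


def image_urls(html, base_url):
    """Return the absolute URLs of all images referenced in `html`."""
    parser = ImageSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls
```

The `base_url` should be the URL the page was actually fetched from, since that is what relative `src` values are resolved against (a `<base href>` tag in the page, which this sketch ignores, would override it).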

John M. Gabriele · Jan 20, 2006

Alex said:
There's a good crawler in the Demo directory of the Python source
distribution, so download and unpack said sources and look there.

Alex

Hm. Looks like that's:

Python-2.4.2/Tools/webchecker

See 'pydoc ./webchecker.py' for more info.

---J
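webchecker's own code is not reproduced here. As a rough, hypothetical illustration of what a crawler like that does at its core, the sketch below follows links breadth-first and stays on the starting host. It assumes the caller supplies a `fetch(url)` function that returns a page's HTML (for instance one built on `urllib.request.urlopen`), which keeps the loop itself free of network code; `LinkCollector` and `crawl` are made-up names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Gather href values of <a> tags, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Drop "#fragment" so page.html#top and page.html count as one page.
                self.links.append(urldefrag(urljoin(self.base_url, href))[0])


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl limited to start_url's host.

    `fetch(url) -> str` is a caller-supplied (hypothetical) function that
    returns the HTML of a page; this loop does no network I/O itself.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        collector = LinkCollector(url)
        collector.feed(fetch(url))
        collector.close()
        for link in collector.links:
            # Skip other hosts (and schemes like mailto:, which have no netloc).
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The `seen` set stops the crawler from revisiting pages that link to each other, and `max_pages` bounds the run; each visited page could then be passed through an image extractor to answer the original question.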

Bash scripts for web apps	1	Jan 16, 2023
Is crawling the stack "bad"? Why?	13	Feb 25, 2008
Simple web framework - improvements to makefile	0	Feb 1, 2023
Web Page Parsing/Downloading	1	Nov 22, 2013
Having difficulty with the layout of these images / video for this web page	2	Jul 5, 2022
[C Language] Need help transferring Linux CodeBlocks Project to Windows CodeBlocks Project	1	Jun 19, 2023
Web Crawling/Threading and Things That Go Bump in the Night	1	Aug 4, 2006
Help with code	0	Jun 12, 2022
