Get directory from http web site

rock69 · Jul 22, 2005

Hi all

I was wondering if there's some neat and easy way to get the entire
contents of a directory at a specific web url address.

I have the following link:

http://www.infomedia.it/immagini/riviste/covers/cp

and as you can see it's just a list containing all the files (images)
that I need. Is it possible to retrieve this list (not the physical
files) and have it stored in a variable of type list or something?

And, if so, what would be the easiest and most efficient way?

Thank you so much in advance.

Rock

Sybren Stuvel · Jul 22, 2005

rock69 enlightened us with:

I was wondering if there's some neat and easy way to get the entire
contents of a directory at a specific web url address. [...] Is it
possible to retrieve this list (not the physical files) and have it
stored in a variable of type list or something?

Check out the chapter on HTML parsing at
http://www.diveintopython.org/

Sybren
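Following Sybren's pointer, here is a minimal sketch of the HTML-parsing approach using only the Python 3 standard library. The Dive Into Python chapter was built on sgmllib, which Python 3 removed, so this uses html.parser instead. It assumes the URL serves an auto-generated index page whose entries are ordinary <a href> links. The names LinkCollector, list_links and list_directory are hypothetical.

```python
# Sketch only: collect every <a href> from an index page (Python 3 stdlib).
# Assumption: the server returns an auto-generated HTML directory listing.
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def list_links(html):
    """Return all link targets found in an HTML string, in page order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def list_directory(url):
    """Download an index page and return its link targets (needs network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "latin-1"
        return list_links(response.read().decode(charset))
```

If the page is still served, list_directory("http://www.infomedia.it/immagini/riviste/covers/cp/") should return a plain Python list of hrefs like the ones in Kent's output below, including the sort and parent-directory links.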

Kent Johnson · Aug 6, 2005

rock69 said:
I was wondering if there's some neat and easy way to get the entire
contents of a directory at a specific web url address. [...] Is it
possible to retrieve this list (not the physical files) and have it
stored in a variable of type list or something?

BeautifulSoup and urllib do this easily:

>>> from BeautifulSoup import BeautifulSoup
>>> import urllib
>>> data = urllib.urlopen('http://www.infomedia.it/immagini/riviste/covers/cp/').read()
>>> soup = BeautifulSoup(data)
>>> anchors = soup.fetch('a')
>>> len(anchors)
164
>>> for a in anchors[:10]:
...     print a['href'], a.string
...
?N=D Name
?M=A Last modified
?S=A Size
?D=A Description
/immagini/riviste/covers/ Parent Directory
cp100.jpg cp100.jpg
cp100sm.jpg cp100sm.jpg
cp101.jpg cp101.jpg
cp101sm.jpg cp101sm.jpg
cp102.jpg cp102.jpg

http://www.crummy.com/software/BeautifulSoup/

Kent
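The first five anchors in Kent's output are the listing's column-sort links and the parent directory, not files. A minimal Python 3 sketch for reducing the hrefs to image names and turning them into full URLs follows. The filtering rules (sort links start with "?", the parent link is an absolute path, thumbnails end in "sm") are assumptions read off the listing above, and image_files / absolute_urls are hypothetical names.

```python
# Sketch: keep only image file names from an index page's hrefs.
# Assumptions (hypothetical, inferred from the listing shown above):
# sort links start with "?", the parent directory is an absolute path,
# and thumbnail files end in "sm" (e.g. cp100sm.jpg).
import posixpath
from urllib.parse import urljoin

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")


def image_files(hrefs, include_thumbnails=True):
    """Return the hrefs that look like image files in the listed directory."""
    files = []
    for href in hrefs:
        if href.startswith(("?", "/")) or href.endswith("/"):
            continue  # column-sort links, parent directory, subdirectories
        if not href.lower().endswith(IMAGE_EXTENSIONS):
            continue  # not an image
        stem, _ext = posixpath.splitext(href)
        if not include_thumbnails and stem.endswith("sm"):
            continue  # assumed to be a small preview image
        files.append(href)
    return files


def absolute_urls(base_url, names):
    """Turn relative file names into full URLs.

    base_url should end with "/": without it, urljoin treats the last
    path segment ("cp") as a file name and replaces it.
    """
    return [urljoin(base_url, name) for name in names]
```

Note the trailing slash: Kent's URL ends in "cp/", while the one in the original question ends in "cp". With the latter as the base, urljoin would produce .../covers/cp100.jpg instead of .../covers/cp/cp100.jpg.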

lemon97 · Aug 7, 2005

You might also want to modify your c:/python/Lib/urllib.py file by
adding or modifying the following headers:

# Set both headers in one list; a second assignment would replace the first.
self.addheaders = [('User-agent', 'Mozilla/4.0'),        # trick the server into thinking it is a browser such as Explorer
                   ('Referer', 'http://www.infomedia.it')]  # trick the site into thinking you clicked a link on their site
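Editing the standard library's urllib.py affects every program on the machine and is lost on upgrade. A minimal Python 3 sketch that sends the same two headers for a single request instead, using urllib.request.Request (in Python 2, urllib2.Request accepts the same headers argument). The header values are the ones from the post above; build_request and fetch are hypothetical names.

```python
# Sketch: per-request headers instead of patching urllib's source (Python 3).
from urllib.request import Request, urlopen

HEADERS = {
    "User-Agent": "Mozilla/4.0",           # look like a browser
    "Referer": "http://www.infomedia.it",  # look like a link followed from the site
}


def build_request(url, headers=HEADERS):
    """Create a Request carrying the given headers; nothing is sent yet."""
    return Request(url, headers=dict(headers))


def fetch(url):
    """Download the page body as bytes (needs network access)."""
    with urlopen(build_request(url)) as response:
        return response.read()
```

The bytes returned by fetch() can be decoded and passed to any of the HTML parsers discussed in this thread.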

Similar Threads

PHP Web Site Counter?	5	Apr 3, 2023
Help figuring out a directory permission change problem	1	May 12, 2023
Using GIT to get remote code	1	Dec 30, 2021
Sending data from web page to Raspberry Pi	0	Nov 26, 2022
Retrieving and saving images from internet address	3	Jul 25, 2005
Directory Caching, suggestions and comments?	0	May 15, 2014
Simple web framework - improvements to makefile	0	Feb 1, 2023
Cross-platform way to get default directory for binary files like console scripts?	9	Feb 20, 2014