Agnostic fetching

jorpheus

OK, that sounds stupid. Anyway, I've been learning Python for some
time now, and am currently having fun with the urllib and urllib2
modules, but have run into a problem - is there any way to fetch
(urllib.urlretrieve) files from a server without knowing the filenames?
For instance, there is something like folder/spam.egg,
folder/unpredictable.egg and so on. If not, perhaps some kind of glob
to create a list of existing files? I'd really appreciate some help,
since I'm really out of my (newb) depth here.
 
Bruce Frederiksen

OK, that sounds stupid. Anyway, I've been learning Python for some
time now, and am currently having fun with the urllib and urllib2
modules, but have run into a problem - is there any way to fetch
(urllib.urlretrieve) files from a server without knowing the filenames?
For instance, there is something like folder/spam.egg,
folder/unpredictable.egg and so on. If not, perhaps some kind of glob
to create a list of existing files? I'd really appreciate some help,
since I'm really out of my (newb) depth here.

You might try the os.path module and/or the glob module in the standard
Python library.
 
Terry Reedy

jorpheus said:
OK, that sounds stupid. Anyway, I've been learning Python for some
time now, and am currently having fun with the urllib and urllib2
modules, but have run into a problem - is there any way to fetch
(urllib.urlretrieve) files from a server without knowing the filenames?
For instance, there is something like folder/spam.egg,
folder/unpredictable.egg and so on. If not, perhaps some kind of glob
to create a list of existing files? I'd really appreciate some help,
since I'm really out of my (newb) depth here.

If you are asking whether servers will let you go fishing around their
file system, the answer is that http is not designed for that (whereas
ftp is, as long as you stay under the main ftp directory). You can try
random file names, but the server may get unhappy and think you are
trying to break in through a back door or something. You are *expected*
to start at ..../index.html and proceed with the links given there, or
to use a valid filename that was retrieved by that method.
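Since ftp, unlike http, does support directory listing, here is a rough
sketch of what that could look like with the standard library's ftplib,
plus fnmatch for the glob-style filtering asked about above. The host
and folder names are placeholders, not a real server:

```python
import ftplib
from fnmatch import fnmatch

def list_matching(names, pattern):
    """Filter a list of filenames with a glob-style pattern."""
    return [n for n in names if fnmatch(n, pattern)]

def fetch_eggs(host, folder, pattern="*.egg"):
    """List an FTP directory and download every matching file.

    `host` and `folder` are placeholders -- substitute your own server.
    """
    ftp = ftplib.FTP(host)
    ftp.login()                              # anonymous login
    ftp.cwd(folder)
    for name in list_matching(ftp.nlst(), pattern):
        with open(name, "wb") as f:
            ftp.retrbinary("RETR " + name, f.write)
    ftp.quit()

# e.g. fetch_eggs("ftp.example.com", "folder")  # hypothetical host
```

nlst() returns the directory's filenames, which is exactly the "list of
existing files" an http server won't give you.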
 
Diez B. Roggisch

Bruce said:
You might try the os.path module and/or the glob module in the standard
python library.

Not on remote locations. They only work on your local filesystem.

Diez
 
Michael Torrie

jorpheus said:
OK, that sounds stupid. Anyway, I've been learning Python for some
time now, and am currently having fun with the urllib and urllib2
modules, but have run into a problem - is there any way to fetch
(urllib.urlretrieve) files from a server without knowing the filenames?
For instance, there is something like folder/spam.egg,
folder/unpredictable.egg and so on. If not, perhaps some kind of glob
to create a list of existing files? I'd really appreciate some help,
since I'm really out of my (newb) depth here.

If you happen to have a URL that simply lists files, then what you have
to do is relatively simple. Just fetch the HTML from the folder URL,
then parse it and look for the anchor tags. You can then fetch the
anchor URLs that interest you. BeautifulSoup can help with this: it can
list all the anchor tags in an HTML string in a single line of code.
Combine urllib2 and BeautifulSoup and you'll have a winner.
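A sketch of the scraping approach, in modern Python (urllib.request has
since replaced urllib2) and using the stdlib HTML parser as a stand-in
for BeautifulSoup so there is no third-party dependency; the folder URL
is a placeholder:

```python
from html.parser import HTMLParser   # stdlib stand-in for BeautifulSoup
from urllib.request import urlopen   # urllib2's successor in Python 3

class AnchorCollector(HTMLParser):
    """Collect the href of every anchor tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def list_links(html):
    """Return all anchor hrefs found in an HTML string."""
    parser = AnchorCollector()
    parser.feed(html)
    return parser.hrefs

# Against a real folder listing (URL is hypothetical):
# html = urlopen("http://example.com/folder/").read().decode("utf-8", "replace")
# print(list_links(html))
```

With BeautifulSoup installed, the same extraction is indeed roughly one
line: something like `[a["href"] for a in soup.find_all("a", href=True)]`.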
 
