Agnostic fetching

jorpheus

OK, that sounds stupid. Anyway, I've been learning Python for some
time now, and am currently having fun with the urllib and urllib2
modules, but have run into a problem - is there any way to fetch
(urllib.urlretrieve) files from a server without knowing the filenames?
For instance, there is something like folder/spam.egg,
folder/unpredictable.egg and so on. If not, perhaps some kind of glob
to create a list of existing files? I'd really appreciate some help,
since I'm really out of my (newb) depth here.
 
Bruce Frederiksen

OK, that sounds stupid. Anyway, I've been learning Python for some
time now, and am currently having fun with the urllib and urllib2
modules, but have run into a problem - is there any way to fetch
(urllib.urlretrieve) files from a server without knowing the filenames?
For instance, there is something like folder/spam.egg,
folder/unpredictable.egg and so on. If not, perhaps some kind of glob
to create a list of existing files? I'd really appreciate some help,
since I'm really out of my (newb) depth here.

You might try the os.path module and/or the glob module in the standard
Python library.
 
Terry Reedy

jorpheus said:
OK, that sounds stupid. Anyway, I've been learning Python for some
time now, and am currently having fun with the urllib and urllib2
modules, but have run into a problem - is there any way to fetch
(urllib.urlretrieve) files from a server without knowing the filenames?
For instance, there is something like folder/spam.egg,
folder/unpredictable.egg and so on. If not, perhaps some kind of glob
to create a list of existing files? I'd really appreciate some help,
since I'm really out of my (newb) depth here.

If you are asking whether servers will let you go fishing around their
file system, the answer is that http is not designed for that (whereas
ftp is, as long as you stay under the main ftp directory). You can try
random file names, but the server may get unhappy and think you are
trying to break in through a back door or something. You are *expected*
to start at ..../index.html and proceed with the links given there, or
to use a valid filename that was retrieved by that method.
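Since ftp, unlike http, does support directory listing, here is a rough
sketch of what that could look like with the standard library's ftplib,
plus fnmatch for the glob-style filtering asked about above. The host
and folder names are placeholders, not a real server:

```python
import ftplib
from fnmatch import fnmatch

def list_matching(names, pattern):
    """Filter a list of filenames with a glob-style pattern."""
    return [n for n in names if fnmatch(n, pattern)]

def fetch_eggs(host, folder, pattern="*.egg"):
    """List an FTP directory and download every matching file.

    `host` and `folder` are placeholders -- substitute your own server.
    """
    ftp = ftplib.FTP(host)
    ftp.login()                              # anonymous login
    ftp.cwd(folder)
    for name in list_matching(ftp.nlst(), pattern):
        with open(name, "wb") as f:
            ftp.retrbinary("RETR " + name, f.write)
    ftp.quit()

# e.g. fetch_eggs("ftp.example.com", "folder")  # hypothetical host
```

nlst() returns the directory's filenames, which is exactly the "list of
existing files" an http server won't give you.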
 
Diez B. Roggisch

Bruce said:
You might try the os.path module and/or the glob module in the standard
python library.

Not on remote locations. They only work on your local filesystem.

Diez
 
Michael Torrie

jorpheus said:
OK, that sounds stupid. Anyway, I've been learning Python for some
time now, and am currently having fun with the urllib and urllib2
modules, but have run into a problem - is there any way to fetch
(urllib.urlretrieve) files from a server without knowing the filenames?
For instance, there is something like folder/spam.egg,
folder/unpredictable.egg and so on. If not, perhaps some kind of glob
to create a list of existing files? I'd really appreciate some help,
since I'm really out of my (newb) depth here.

If you happen to have a URL that simply lists files, then what you have
to do is relatively simple. Just fetch the HTML from the folder URL,
then parse it and look for the anchor tags. You can then fetch the
anchor URLs that interest you. BeautifulSoup can help with this: it can
list all the anchor tags in an HTML string in a single line of code.
Combine urllib2 and BeautifulSoup and you'll have a winner.
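A sketch of the scraping approach, in modern Python (urllib.request has
since replaced urllib2) and using the stdlib HTML parser as a stand-in
for BeautifulSoup so there is no third-party dependency; the folder URL
is a placeholder:

```python
from html.parser import HTMLParser   # stdlib stand-in for BeautifulSoup
from urllib.request import urlopen   # urllib2's successor in Python 3

class AnchorCollector(HTMLParser):
    """Collect the href of every anchor tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def list_links(html):
    """Return all anchor hrefs found in an HTML string."""
    parser = AnchorCollector()
    parser.feed(html)
    return parser.hrefs

# Against a real folder listing (URL is hypothetical):
# html = urlopen("http://example.com/folder/").read().decode("utf-8", "replace")
# print(list_links(html))
```

With BeautifulSoup installed, the same extraction is indeed roughly one
line: something like `[a["href"] for a in soup.find_all("a", href=True)]`.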
 
