urlretrieve get file name


Sven

Hi guys and gals,

I'm wrestling with the urlretrieve function in the urllib module. I
want to download a file from a web server and save it locally with the
same name. The problem is the URL: it has the form
http://www.page.com/?download=12345, which doesn't reveal the file name.
Any hints pointing me in the right direction are greatly appreciated.

Sven
 
Gabriel Genellina

The file name *may* come in the Content-Disposition header (e.g.
Content-Disposition: attachment; filename="budget.xls").
Use urlopen to obtain a file-like object; its info() method gives you
those headers.
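A minimal sketch of this suggestion, written for Python 3 (where urlopen and urlretrieve moved to urllib.request). The helper names and the fallback name "download.bin" are hypothetical, and it only helps when the server actually sends Content-Disposition:

```python
import os
import shutil
import urllib.request
from email.message import Message
from typing import Optional


def filename_from_headers(headers: Message) -> Optional[str]:
    """Return the file name suggested by Content-Disposition, or None."""
    # get_filename() reads the "filename" parameter of Content-Disposition
    # (falling back to the "name" parameter of Content-Type), strips the
    # quotes, and returns None when neither parameter is present.
    name = headers.get_filename()
    if not name:
        return None
    # Never trust a server-supplied path: keep only the last component.
    return os.path.basename(name) or None


def download_with_suggested_name(url: str, fallback: str = "download.bin") -> str:
    """Save url locally under the server-suggested name (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # response.info() returns the headers as an email.message.Message.
        name = filename_from_headers(response.info()) or fallback
        with open(name, "wb") as out:
            shutil.copyfileobj(response, out)
    return name
```

If the server sends no such header, the fallback name is used; the next replies in this thread show why that can happen even when a browser does suggest a name.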


--
Gabriel Genellina
Softlab SRL

__________________________________________________
Correo Yahoo!
Espacio para todos tus mensajes, antivirus y antispam ¡gratis!
¡Abrí tu cuenta ya! - http://correo.yahoo.com.ar
 
S

Sven

Hello Gabriel,

Thanks for your help, but I'm a guy with no luck. :) I can't get the
file name from the response headers...
 
Gabriel Genellina

Try using a browser and "Save as..."; if it suggests a file name, it
*must* be in the headers, so look again carefully.
If it does not suggest a file name, the server is not providing one
(there is no obligation to do so).
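One way to "look again carefully" from Python is to print everything urlopen hands back. This is a sketch for Python 3, and the function names are hypothetical. Keep in mind that urlopen follows redirects silently, so the headers shown belong to the final response; a final URL that differs from the requested one is the telltale sign of a redirect:

```python
import urllib.request
from email.message import Message


def describe_response(status: int, final_url: str, headers: Message) -> str:
    """Render the status, final URL and every header, one per line."""
    lines = [f"Status: {status}", f"Final URL: {final_url}"]
    # items() yields the headers in the order the server sent them.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\n".join(lines)


def dump_headers(url: str) -> None:
    """Fetch url and print what urlopen reports about the response."""
    with urllib.request.urlopen(url) as response:
        print(describe_response(response.status, response.url, response.headers))
```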


Sven

Yes, the browser suggests a file name, but I did a little research using
http://web-sniffer.net/. The response header contains roughly this:

HTTP Status Code: HTTP/1.1 302 Found
Location: http://page.com/filename.zip
Content-Length: 0
Connection: close
Content-Type: text/html

The 302 status code, together with the Location header, tells the
browser where to find the file. The funny thing is that calling info()
on the file-like response object in Python doesn't return the same
headers. I'm so stuck. :) Thanks for your help.
 
Gabriel Genellina

That's because urlopen is smart enough to detect the redirect and make a
second request, so info() shows the headers of that second response.
You can use the geturl() method to obtain the true URL used (that
would be http://page.com/filename.zip) and then rename the file.
Or, you can install your own opener (for example a FancyURLopener
subclass that overrides http_error_302) and process the Location header
yourself. See the urllib documentation.
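A sketch of the second option in Python 3 terms. FancyURLopener is only a deprecated legacy class there, so this uses the urllib.request opener machinery instead: a redirect handler that declines every redirect, which makes the 302 surface as an HTTPError whose headers include Location. NoRedirectHandler and redirect_location are hypothetical names:

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "don't issue a follow-up request"; the
        # default error handler then raises HTTPError for the 3xx reply.
        return None


def redirect_location(url: str) -> Optional[str]:
    """Return the absolute Location of a 3xx reply, or None if not redirected."""
    # Passing a subclass of HTTPRedirectHandler replaces the default one.
    opener = urllib.request.build_opener(NoRedirectHandler)
    try:
        with opener.open(url):
            return None  # a 2xx response: no redirect happened
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        if 300 <= err.code < 400 and location:
            # Location may be relative, so resolve it against the request URL.
            return urllib.parse.urljoin(url, location)
        raise
```

For the URL in this thread it would presumably return http://page.com/filename.zip, and the last path segment of that URL is the file name.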


Sven

You can use the geturl() method to obtain the true URL used (that
would be http://page.com/filename.zip) and then rename the file.

Thanks mate, this was exactly what I needed. A really clean and simple
solution to my problem. :)
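For reference, the geturl() approach might look like this in Python 3, where the final URL is also available as response.url. Rather than downloading and then renaming, this sketch picks the name from the final URL before writing; the helper names and the "download.bin" fallback are hypothetical:

```python
import os
import posixpath
import shutil
import urllib.parse
import urllib.request


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Derive a local file name from the last path segment of a URL."""
    path = urllib.parse.urlsplit(url).path
    # unquote turns e.g. "my%20file.zip" into "my file.zip"; basename is
    # applied afterwards so an encoded "%2F" cannot smuggle in a directory.
    name = posixpath.basename(urllib.parse.unquote(path))
    return name or default


def download_following_redirects(url: str, directory: str = ".") -> str:
    """Save url under the name found in the final (post-redirect) URL."""
    with urllib.request.urlopen(url) as response:
        # urlopen has already followed any 302; response.url is the final
        # URL (the same value geturl() returns).
        target = os.path.join(directory, filename_from_url(response.url))
        with open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```

Assuming the server still redirects as described above, calling this with http://www.page.com/?download=12345 would save the file as filename.zip.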
 
