urlretrieve get file name

Discussion in 'Python' started by Sven, Nov 9, 2006.

  1. Sven

    Sven Guest

    Hi guys and gals,

    I'm wrestling with the urlretrieve function in the urllib module. I
    want to download a file from a web server and save it locally with the
    same name. The problem is the URL - it's of the form
    http://www.page.com/?download=12345. It doesn't reveal the file name.
    Some hints to point me in the right direction are greatly appreciated.

    Sven
     
    Sven, Nov 9, 2006
    #1

  2. At Thursday 9/11/2006 19:11, Sven wrote:

    >I'm wrestling with the urlretrieve function in the urllib module. I
    >want to download a file from a web server and save it locally with the
    >same name. The problem is the URL - it's on the form
    >http://www.page.com/?download=12345. It doesn't reveal the file name.
    >Some hints to point me in the right direction are greatly appreciated.


    The file name *may* come in the Content-Disposition header (ex:
    Content-Disposition: attachment; filename="budget.xls")
    Use urlopen to obtain a file-like object; its info() method gives you
    those headers.
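
A minimal sketch of that suggestion, assuming Python 3, where urlopen lives in urllib.request (the thread itself dates from the Python 2 urllib days). The helper names suggested_filename and fetch_suggested_filename are made up for illustration. The object returned by info() is an email-style message, so its get_filename() method can parse the Content-Disposition value; it returns None when the server sends no file name.

```python
import urllib.request


def suggested_filename(headers):
    """Return the file name from a Content-Disposition header, or None.

    `headers` is the message object returned by response.info(). In
    Python 3 it subclasses email.message.Message, so get_filename()
    is available and handles the quoting of the filename parameter.
    """
    return headers.get_filename()


def fetch_suggested_filename(url):
    """Open `url` and return the file name the server suggests, if any.

    Hypothetical helper: the server is under no obligation to send a
    Content-Disposition header, so callers must handle None.
    """
    with urllib.request.urlopen(url) as response:
        return suggested_filename(response.info())
```

Calling fetch_suggested_filename("http://www.page.com/?download=12345") would only yield a name if the final response actually carries that header.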


    --
    Gabriel Genellina
    Softlab SRL

    __________________________________________________
    Correo Yahoo!
    Space for all your messages, antivirus and antispam, free!
    Open your account now! - http://correo.yahoo.com.ar
     
    Gabriel Genellina, Nov 9, 2006
    #2

  3. Sven

    Sven Guest

    Hello Gabriel,

    Thanks for your help, but I'm a guy with no luck. :) I can't get the
    file name from the response header...

     
    Sven, Nov 9, 2006
    #3
  4. At Thursday 9/11/2006 20:52, Sven wrote:

    >Thanks for your help, but I'm a guy with no luck. :) I can't get the
    >file name from response header...


    Try using a browser and "Save as..."; if it suggests a file name, it
    *must* be in the headers - so look again carefully.
    If it does not suggest a file name, the server is not providing one
    (there is no obligation to do so).


    --
    Gabriel Genellina
    Softlab SRL

     
    Gabriel Genellina, Nov 10, 2006
    #4
  5. Sven

    Sven Guest

    Yes, the browser suggests a file name, but I did a little research using
    http://web-sniffer.net/. The Response Header contains roughly this:

    HTTP Status Code: HTTP/1.1 302 Found
    Location: http://page.com/filename.zip
    Content-Length: 0
    Connection: close
    Content-Type: text/html

    The 302 status code and the Location header tell the browser where to
    find the file. The funny thing is that calling the info() method on the
    file-like response object in Python doesn't return the same headers.
    I'm so stuck. :)
    Thanks for your help.
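
For anyone who wants to see that 302 from Python: a sketch, assuming Python 3's urllib.request, of an opener that declines to follow redirects so the Location header can be read directly. NoRedirect, location_from_error and redirect_target are hypothetical names. Returning None from redirect_request is how a handler declines a redirect; urllib then falls through to its default error handling and raises HTTPError, which carries the response headers.

```python
import urllib.error
import urllib.request

REDIRECT_CODES = (301, 302, 303, 307, 308)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses every redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # None means "not handled here"; the opener then raises HTTPError.
        return None


def location_from_error(err):
    """Return the Location header of a redirect HTTPError, else None."""
    if err.code in REDIRECT_CODES:
        return err.headers.get("Location")
    return None


def redirect_target(url):
    """Return where `url` redirects to, or None if it does not redirect."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url):
            return None  # a normal response, no redirect involved
    except urllib.error.HTTPError as err:
        target = location_from_error(err)
        if target is None:
            raise  # a genuine HTTP error, not a redirect
        return target
```

With the headers shown above, redirect_target("http://www.page.com/?download=12345") would be expected to return the Location value, assuming the server still answers with a 302.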

     
    Sven, Nov 10, 2006
    #5
  6. At Friday 10/11/2006 16:58, Sven wrote:

    >Yes the browser suggests a file name, but I did a little research using
    >http://web-sniffer.net/. The Response Header contains roughly this:
    >
    >HTTP Status Code: HTTP/1.1 302 Found
    >Location: http://page.com/filename.zip
    >Content-Length: 0
    >Connection: close
    >Content-Type: text/html
    >
    >The status code 302 tells the browser where to find the file. The funny
    >thing is that calling the info() function, on the file-like response
    >object, in Python doesn't return the same header. I'm so stuck. :)
    >Thanks for your help.


    That's because urlopen is smart enough to detect the redirection and
    make a second request, so info() gives you the headers of the final
    response rather than those of the 302.
    You can use the geturl() method to obtain the true URL used (that
    would be http://page.com/filename.zip) and then rename the file.
    Or you can install your own URLopener (I think a FancyURLopener set
    up not to follow redirects would be OK) and process the Location
    header yourself. See the urllib documentation.
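
The geturl() route could look roughly like this. It is a sketch, assuming Python 3 (urllib.request and urllib.parse); filename_from_url, download_with_server_name and the "download" fallback name are illustrative choices, not anything from the urllib API.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Return the last path segment of `url`, or `default` if there is none."""
    path = unquote(urlsplit(url).path)
    name = posixpath.basename(path)
    # Reject empty names and the special directory entries.
    if name in ("", ".", ".."):
        return default
    return name


def download_with_server_name(url, directory="."):
    """Download `url` and save it under the name found in the final URL."""
    with urllib.request.urlopen(url) as response:
        # After any redirects, geturl() reports the URL actually fetched.
        final_url = response.geturl()
        target = os.path.join(directory, filename_from_url(final_url))
        with open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```

For the URLs in this thread, filename_from_url("http://page.com/filename.zip") gives "filename.zip", while the original ?download=12345 URL has no usable path segment and falls back to the default name.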


    --
    Gabriel Genellina
    Softlab SRL

     
    Gabriel Genellina, Nov 10, 2006
    #6
  7. Sven

    Sven Guest

    > You can use the geturl() method to obtain the true URL used (that
    > would be http://page.com/filename.zip) and then rename the file.


    Thanks mate, this was exactly what I needed. A really clean and simple
    solution to my problem. :)
     
    Sven, Nov 11, 2006
    #7