Decompressing a file retrieved by URL seems too complex

Discussion in 'Python' started by John Nagle, Aug 12, 2010.

  1. John Nagle

    John Nagle Guest

    (Repost with better indentation)
    I'm reading a URL which is a .gz file, and decompressing
    it. This works, but it seems far too complex. Yet
    none of the "wrapping" you might expect to work
    actually does. You can't wrap a GzipFile around
    an HTTP connection, because GzipFile, reasonably enough,
    needs random access, and tries to do "seek" and "tell".
    Nor is the output descriptor from gzip general; it fails
    on "readline", but accepts "read". (No good reason
    for that.) So I had to make a second copy.

    John Nagle

    import gzip
    import tempfile
    import urllib2

    TIMEOUTSECS = 30  # assumed value; not defined in the original post

    def readurl(url):
        if url.endswith(".gz"):
            nd = urllib2.urlopen(url, timeout=TIMEOUTSECS)
            td1 = tempfile.TemporaryFile()  # compressed file
            td1.write(nd.read())  # fetch and copy file
            nd.close()  # done with network
            td2 = tempfile.TemporaryFile()  # decompressed file
            td1.seek(0)  # rewind
            gd = gzip.GzipFile(fileobj=td1, mode="rb")  # wrap unzip
            td2.write(gd.read())  # decompress file
            td1.close()  # done with compressed copy
            td2.seek(0)  # rewind
            return td2  # file object holding the decompressed data
        else:
            return urllib2.urlopen(url, timeout=TIMEOUTSECS)
     
    John Nagle, Aug 12, 2010
    #1

  2. On Thursday 12 August 2010, it occurred to John Nagle to exclaim:
    > (Repost with better indentation)


    Good, good.

    >
    > def readurl(url) :
    > if url.endswith(".gz") :


    The file name could be anything. You should be checking the response
    Content-Type header -- that's what it's for.
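A minimal sketch of that header check, written for Python 3 rather than the Python 2 used in this thread. The helper name `looks_gzipped` and the set of accepted media types are assumptions for illustration; servers vary in what they actually send, and some signal gzip through Content-Encoding instead.

```python
# Media types commonly used for gzip payloads (an assumed, non-exhaustive set).
GZIP_TYPES = {"application/gzip", "application/x-gzip"}


def looks_gzipped(headers):
    """Guess from response headers whether the body is gzip data.

    `headers` is any mapping with a .get() method, for example the
    object returned by response.info() on a urlopen() result.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    content_type = (headers.get("Content-Type") or "").split(";")[0].strip().lower()
    content_encoding = (headers.get("Content-Encoding") or "").strip().lower()
    return content_type in GZIP_TYPES or content_encoding == "gzip"
```

With a plain dict the lookup is case-sensitive; the header objects returned by urllib compare names case-insensitively.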

    > nd = urllib2.urlopen(url,timeout=TIMEOUTSECS)
    > td1 = tempfile.TemporaryFile() # compressed file


    You can keep the whole thing in memory by using StringIO.

    > td1.write(nd.read()) # fetch and copy file


    You're reading the entire file into memory anyway ;-)

    > nd.close() # done with network
    > td2 = tempfile.TemporaryFile() # decompressed file


    Okay, maybe there is something missing from GzipFile -- but still you could use
    StringIO again, I expect.
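The in-memory approach might look like the sketch below. It assumes Python 3, where the byte-oriented replacement for StringIO is `io.BytesIO`; the helper name `open_gzipped_bytes` is hypothetical, and the payload is built locally so the example does not need a network connection.

```python
import gzip
import io


def open_gzipped_bytes(compressed):
    """Return a readable file-like object over gzip-compressed bytes."""
    # BytesIO supplies the seek()/tell() that GzipFile wants,
    # without touching the disk.
    return gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb")


# Stand-in for the bytes that would come from reading the HTTP response.
payload = gzip.compress(b"first line\nsecond line\n")

with open_gzipped_bytes(payload) as f:
    first = f.readline()  # line-oriented reads work on the wrapper
    rest = f.read()
```

In real use the argument would be the full response body, so this still holds the whole compressed file in memory at once.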

    > Nor is the output descriptor from gzip general; it fails
    > on "readline", but accepts "read".


    >>> from gzip import GzipFile
    >>> GzipFile.readline
    <unbound method GzipFile.readline>
    >>> GzipFile.readlines
    <unbound method GzipFile.readlines>
    >>> GzipFile.__iter__
    <unbound method GzipFile.__iter__>
    >>>


    What exactly is it that's failing, and how?


    > td1.seek(0) # rewind
    > gd = gzip.GzipFile(fileobj=td1, mode="rb") # wrap unzip
    > td2.write(gd.read()) # decompress file
    > td1.close() # done with compressed copy
    > td2.seek(0) # rewind
    > return(td2) # return file object for compressed object
    > else :
    > return(urllib2.urlopen(url,timeout=TIMEOUTSECS))
     
    Thomas Jollans, Aug 12, 2010
    #2

  3. Aahz

    Aahz Guest

    In article <4c645c39$0$1595$>,
    John Nagle <> wrote:
    >
    >I'm reading a URL which is a .gz file, and decompressing it. This
    >works, but it seems far too complex. Yet none of the "wrapping"
    >you might expect to work actually does. You can't wrap a GzipFile
    >around an HTTP connection, because GzipFile, reasonably enough, needs
    >random access, and tries to do "seek" and "tell". Nor is the output
    >descriptor from gzip general; it fails on "readline", but accepts
    >"read". (No good reason for that.) So I had to make a second copy.


    Also consider using zlib directly.
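One way to read that suggestion, sketched in Python 3: a `zlib.decompressobj` created with `wbits = 16 + zlib.MAX_WBITS` accepts gzip-framed data and can be fed chunk by chunk, so no seekable file and no second copy is needed. The generator name `gunzip_stream` and the 64-byte chunk size are illustrative assumptions, and the sketch handles only a single gzip member.

```python
import gzip
import zlib


def gunzip_stream(chunks):
    """Yield decompressed bytes for an iterable of gzip-compressed chunks."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered once the input is exhausted.
    tail = decomp.flush()
    if tail:
        yield tail


# Stand-in for chunks read from the network, e.g. repeated response.read(n).
payload = gzip.compress(b"hello world\n" * 1000)
pieces = [payload[i:i + 64] for i in range(0, len(payload), 64)]
result = b"".join(gunzip_stream(pieces))
```

Because the decompressor only ever sees one chunk at a time, the same loop could consume an HTTP response directly instead of a list built in memory.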
    --
    Aahz () <*> http://www.pythoncraft.com/

    "...if I were on life-support, I'd rather have it run by a Gameboy than a
    Windows box." --Cliff Wells
     
    Aahz, Aug 13, 2010
    #3
