Decompressing a file retrieved by URL seems too complex

Discussion in 'Python' started by John Nagle, Aug 12, 2010.

  1. John Nagle

    John Nagle Guest

    (Repost with better indentation)
    I'm reading a URL which is a .gz file, and decompressing
    it. This works, but it seems far too complex. Yet
    none of the "wrapping" you might expect to work
    actually does. You can't wrap a GzipFile around
    an HTTP connection, because GzipFile, reasonably enough,
    needs random access, and tries to do "seek" and "tell".
    Nor is the output descriptor from gzip general; it fails
    on "readline", but accepts "read". (No good reason
    for that.) So I had to make a second copy.

    John Nagle

    import gzip
    import tempfile
    import urllib2

    TIMEOUTSECS = 30    # placeholder value; the original post does not give one

    def readurl(url):
        """Return a readable file object for url, decompressing .gz URLs."""
        if url.endswith(".gz"):
            nd = urllib2.urlopen(url, timeout=TIMEOUTSECS)
            td1 = tempfile.TemporaryFile()      # compressed file
            td1.write(nd.read())                # fetch and copy file
            nd.close()                          # done with network
            td2 = tempfile.TemporaryFile()      # decompressed file
            td1.seek(0)                         # rewind
            gd = gzip.GzipFile(fileobj=td1, mode="rb")  # wrap unzip
            td2.write(gd.read())                # decompress file
            td1.close()                         # done with compressed copy
            td2.seek(0)                         # rewind
            return td2      # file object for the decompressed data
        else:
            return urllib2.urlopen(url, timeout=TIMEOUTSECS)
    John Nagle, Aug 12, 2010
    #1

  2. Thomas Jollans

    On Thursday 12 August 2010, it occurred to John Nagle to exclaim:
    > (Repost with better indentation)


    Good, good.

    >
    > def readurl(url) :
    > if url.endswith(".gz") :


    The file name could be anything. You should be checking the response
    Content-Type header -- that's what it's for.
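
    A minimal sketch of such a header check, written for Python 3 rather than the urllib2 of the original post. The helper name looks_gzipped and the set of media types are assumptions for illustration. It also looks at Content-Encoding, which goes beyond the remark above but is another place a server may announce gzip.

```python
# Media types treated as gzip here; this set is an assumption, extend as needed.
GZIP_TYPES = {"application/gzip", "application/x-gzip"}

def looks_gzipped(headers):
    """Return True if the response headers indicate a gzip-compressed body.

    `headers` is any mapping of header name to value (hypothetical input;
    with urllib it could be built from dict(response.headers.items())).
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Drop parameters such as "; charset=utf-8" from the media type.
    ctype = lowered.get("content-type", "").split(";")[0].strip().lower()
    encoding = lowered.get("content-encoding", "").strip().lower()
    return ctype in GZIP_TYPES or encoding in ("gzip", "x-gzip")
```

    With something like this, readurl could branch on the response headers instead of on url.endswith(".gz").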

    > nd = urllib2.urlopen(url,timeout=TIMEOUTSECS)
    > td1 = tempfile.TemporaryFile() # compressed file


    You can keep the whole thing in memory by using StringIO.

    > td1.write(nd.read()) # fetch and copy file


    You're reading the entire file into memory anyway ;-)
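
    A rough sketch of the in-memory variant, assuming Python 3, where io.BytesIO plays the role StringIO had for binary data. The function name gunzip_in_memory is hypothetical, and the compressed bytes are assumed to have been fetched already (for example with nd.read()).

```python
import gzip
import io

def gunzip_in_memory(compressed):
    """Decompress gzip bytes and return a rewound in-memory file object."""
    # BytesIO supports seek() and tell(), so GzipFile can wrap it directly.
    source = io.BytesIO(compressed)
    with gzip.GzipFile(fileobj=source, mode="rb") as gz:
        data = gz.read()
    # A fresh BytesIO starts at offset 0, ready for read() or readline().
    return io.BytesIO(data)
```

    This replaces both temporary files with buffers, at the cost of holding the compressed and decompressed data in memory at the same time.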

    > nd.close() # done with network
    > td2 = tempfile.TemporaryFile() # decompressed file


    Okay, maybe there is something missing from GzipFile -- but still you could use
    StringIO again, I expect.

    > Nor is the output descriptor from gzip general; it fails
    > on "readline", but accepts "read".


    >>> from gzip import GzipFile
    >>> GzipFile.readline
    <unbound method GzipFile.readline>
    >>> GzipFile.readlines
    <unbound method GzipFile.readlines>
    >>> GzipFile.__iter__
    <unbound method GzipFile.__iter__>
    >>>


    What exactly is it that's failing, and how?
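
    One way to narrow the question down is a small check that calls readline() and then iterates over a GzipFile wrapped around an in-memory buffer. This is only a sketch under Python 3 assumptions (the helper name gzip_lines is made up), so it does not prove anything about the Python 2 release used in the original post.

```python
import gzip
import io

def gzip_lines(compressed):
    """Yield decompressed lines straight from a GzipFile, with no second copy."""
    with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as gz:
        first = gz.readline()       # explicit readline() call
        if first:
            yield first
        for line in gz:             # plain iteration picks up the rest
            yield line
```

    If a check like this passes but the real code fails, the traceback from the failing call would show which object is actually rejecting readline.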


    > td1.seek(0) # rewind
    > gd = gzip.GzipFile(fileobj=td1, mode="rb") # wrap unzip
    > td2.write(gd.read()) # decompress file
    > td1.close() # done with compressed copy
    > td2.seek(0) # rewind
    > return(td2) # return file object for compressed object
    > else :
    > return(urllib2.urlopen(url,timeout=TIMEOUTSECS))
    Thomas Jollans, Aug 12, 2010
    #2

  3. Aahz

    Aahz Guest

    In article <4c645c39$0$1595$>,
    John Nagle <> wrote:
    >
    >I'm reading a URL which is a .gz file, and decompressing it. This
    >works, but it seems far too complex. Yet none of the "wrapping"
    >you might expect to work actually does. You can't wrap a GzipFile
    >around an HTTP connection, because GzipFile, reasonably enough, needs
    >random access, and tries to do "seek" and "tell". Nor is the output
    >descriptor from gzip general; it fails on "readline", but accepts
    >"read". (No good reason for that.) So I had to make a second copy.


    Also consider using zlib directly.
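
    A hedged sketch of what using zlib directly might look like, assuming Python 3 and a hypothetical helper name gunzip_chunks. It decompresses a stream piece by piece, so the source never needs seek() or tell().

```python
import zlib

def gunzip_chunks(chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    # Emit anything still buffered once the input is exhausted.
    tail = decomp.flush()
    if tail:
        yield tail
```

    The chunks could come from repeated read(n) calls on the HTTP response, which would avoid holding the whole compressed file at once.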
    --
    Aahz () <*> http://www.pythoncraft.com/

    "...if I were on life-support, I'd rather have it run by a Gameboy than a
    Windows box." --Cliff Wells
    Aahz, Aug 13, 2010
    #3