Decompressing a file retrieved by URL seems too complex

John Nagle

I'm reading a URL which is a .gz file, and decompressing
it. This works, but it seems far too complex. Yet
none of the "wrapping" you might expect to work
actually does. You can't wrap a GzipFile around
an HTTP connection, because GzipFile, reasonably enough,
needs random access and tries to do "seek" and "tell".
Nor is the file object GzipFile returns fully general:
it fails on "readline" but accepts "read". (There's no
good reason for that.) So I had to make a second copy.
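Incidentally, the seek()/tell() requirement can be sidestepped by skipping GzipFile and feeding the compressed bytes straight to zlib's incremental decompressor, which never seeks. A sketch of that alternative (gunzip_stream is a made-up helper name, not part of any library):

```python
import zlib

def gunzip_stream(fileobj, chunk_size=8192):
    """Decompress a gzip stream incrementally; never calls seek() or tell()."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: expect a gzip header
    pieces = []
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:                    # end of stream
            break
        pieces.append(d.decompress(chunk))
    pieces.append(d.flush())             # drain any buffered output
    return b"".join(pieces)
```

Because it only ever calls read(), this would work directly on the urlopen() response object, with no temporary files at all.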

import gzip
import tempfile
import urllib2

def readurl(url):                                   # TIMEOUTSECS is defined elsewhere
    if url.endswith(".gz"):
        nd = urllib2.urlopen(url, timeout=TIMEOUTSECS)
        td1 = tempfile.TemporaryFile()              # compressed file
        td1.write(nd.read())                        # fetch and copy file
        nd.close()                                  # done with network
        td2 = tempfile.TemporaryFile()              # decompressed file
        td1.seek(0)                                 # rewind
        gd = gzip.GzipFile(fileobj=td1, mode="rb")  # wrap unzip
        td2.write(gd.read())                        # decompress file
        td1.close()                                 # done with compressed copy
        td2.seek(0)                                 # rewind
        return td2                                  # return file object for decompressed data
    else:
        return urllib2.urlopen(url, timeout=TIMEOUTSECS)
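If the whole compressed body fits in memory, the two temporary files could in principle be replaced by in-memory buffers. A sketch (gunzip_bytes is a hypothetical name; io.BytesIO stands in for the temp files):

```python
import gzip
import io

def gunzip_bytes(data):
    # Wrap the already-fetched bytes in a seekable BytesIO so GzipFile
    # can seek/tell freely, then hand back another BytesIO, which
    # supports readline() as well as read().
    gd = gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb")
    try:
        return io.BytesIO(gd.read())
    finally:
        gd.close()
```

The returned BytesIO behaves like a full file object, so the readline() complaint above doesn't arise.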
 
