Unzipping a .zip properly, and from a remote URL

  • Thread starter Christopher Culver

Christopher Culver

Returning to Python after several years away, I'm working on a little
script that will download a ZIP archive from a website and unzip it to
a mounted filesystem. The code is below, and it works so far, but I'm
unsure of a couple of things.

The first is, is there a way to read the .zip into memory without the
use of a temporary file? If I do archive = zipfile.ZipFile(remotedata.read())
directly without creating a temporary file, the zipfile module
complains that the data is in the wrong string type.

The second issue is that I don't know if this is the correct way to
unpack a file onto the filesystem. It's strange that the zipfile
module has no one simple function to unpack a zip onto the disk. Does
this code seem especially liable to break?

try:
    remotedata = urllib2.urlopen(theurl)
except IOError:
    print("Network down.")
    sys.exit()
data = os.tmpfile()
data.write(remotedata.read())

archive = zipfile.ZipFile(data)
if archive.testzip() != None:
    print "Invalid zipfile"
    sys.exit()
contents = archive.namelist()

for item in contents:
    try:
        os.makedirs(os.path.join(mountpoint, os.path.dirname(item)))
    except OSError:
        # OSError means that the dir already exists, but no matter.
        pass
    if item[-1] != "/":
        outputfile = open(os.path.join(mountpoint, item), 'w')
        outputfile.write(archive.read(item))
        outputfile.close()
 
Tino Wildenhain

Hi,

Christopher said:
Returning to Python after several years away, I'm working on a little
script that will download a ZIP archive from a website and unzip it to
a mounted filesystem. The code is below, and it works so far, but I'm
unsure of a couple of things.

The first is, is there a way to read the .zip into memory without the
use of a temporary file? If I do archive = zipfile.ZipFile(remotedata.read())
directly without creating a temporary file, the zipfile module
complains that the data is in the wrong string type.

Which makes sense given the documentation (note that you can either browse
the HTML online/offline or just use help() within the interpreter/IDE):

Help on class ZipFile in module zipfile:

class ZipFile
 |  Class with methods to open, read, write, close, list zip files.
 |
 |  z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True)
 |
 |  file: Either the path to the file, or a file-like object.
 |        If it is a path, the file will be opened and closed by ZipFile.
 |  mode: The mode can be either read "r", write "w" or append "a".
 |  compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
 |  allowZip64: if True ZipFile will create files with ZIP64 extensions when
 |              needed, otherwise it will raise an exception when this would
 |              be necessary.
 |
....

so instead you would use archive = zipfile.ZipFile(remotedata)

The second issue is that I don't know if this is the correct way to
unpack a file onto the filesystem. It's strange that the zipfile
module has no one simple function to unpack a zip onto the disk. Does
this code seem especially liable to break?

try:
    remotedata = urllib2.urlopen(theurl)
except IOError:
    print("Network down.")
    sys.exit()
data = os.tmpfile()
data.write(remotedata.read())

archive = zipfile.ZipFile(data)
if archive.testzip() != None:
    print "Invalid zipfile"
    sys.exit()
contents = archive.namelist()

for item in contents:
    ....

Here you should check the ZipInfo entry and normalize
and clean the path, to avoid unpacking a zip file
with specially crafted paths (like /etc/passwd and such).

Checking for the various encodings (like UTF-8)
in pathnames may also make sense.

The directory creation could be put into a class that caches
the subdirectories already created, recursively creates the
missing ones, and makes sure you do not ascend out of your
target directory by accident (or through a crafted zip, see above).
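
The checks described above might look roughly like the following. This is only a sketch under some assumptions: it is written for Python 3 rather than the Python 2.5 used in this thread, the name safe_extract is made up for illustration, and a plain function with a set stands in for the suggested class with a directory cache. It does not deal with filename encodings.

```python
import os
import zipfile


def safe_extract(archive, target_dir):
    """Extract every member of `archive` below `target_dir`, refusing
    any entry whose resolved path would land outside of it.
    (Hypothetical helper, not part of the zipfile module.)"""
    root = os.path.realpath(target_dir)
    made_dirs = set()  # cache of directories already created
    written = []

    for info in archive.infolist():
        dest = os.path.realpath(os.path.join(root, info.filename))
        # Absolute names and ".." components resolve outside of root.
        if dest != root and not dest.startswith(root + os.sep):
            raise ValueError("unsafe path in archive: %r" % info.filename)

        is_dir = info.filename.endswith("/")
        parent = dest if is_dir else os.path.dirname(dest)
        if parent not in made_dirs:
            os.makedirs(parent, exist_ok=True)  # creates missing parents too
            made_dirs.add(parent)
        if is_dir:
            continue  # directory entry: nothing to write

        # Binary mode, so the bytes are written unchanged on every platform.
        with archive.open(info) as src, open(dest, "wb") as dst:
            dst.write(src.read())
        written.append(dest)
    return written
```

Calling safe_extract(archive, mountpoint) would then replace the extraction loop from the original post. For the plain case, Python 2.6 and later also provide ZipFile.extractall(), which unpacks a whole archive in one call.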

Regards
Tino
 
Christopher Culver

Tino Wildenhain said:
so instead you would use archive = zipfile.ZipFile(remotedata)

That produces the following error if I try that in the Python
interpreter (URL edited for privacy):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/zipfile.py", line 346, in __init__
    self._GetContents()
  File "/usr/lib/python2.5/zipfile.py", line 366, in _GetContents
    self._RealGetContents()
  File "/usr/lib/python2.5/zipfile.py", line 376, in _RealGetContents
    endrec = _EndRecData(fp)
  File "/usr/lib/python2.5/zipfile.py", line 133, in _EndRecData
    fpin.seek(-22, 2) # Assume no archive comment.
AttributeError: addinfourl instance has no attribute 'seek'
 
M.-A. Lemburg

Try this:
File Name                 Modified              Size
locale/Makefile.pre.in    1997-10-31 21:13:06   9818
locale/Setup.in           1997-10-31 21:14:04     74
locale/locale.c           1997-11-19 17:36:46   4698
locale/CommandLine.py     1997-11-19 15:50:02   2306
locale/probe.py           1997-11-19 15:51:18   1870
locale/__init__.py        1997-11-19 17:55:02      0

The trick is to use a StringIO buffer to provide the .seek()
method.
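
A minimal sketch of that trick, assuming Python 3, where io.BytesIO takes the role of the Python 2 StringIO buffer. The helper names open_zip_from_bytes and fetch_zip are invented for this example, and the whole archive is held in memory, so this only suits downloads that fit comfortably in RAM.

```python
import io
import zipfile
from urllib.request import urlopen


def open_zip_from_bytes(data):
    """Wrap raw archive bytes in a seekable in-memory buffer.
    (Hypothetical helper; BytesIO supplies the .seek() that ZipFile needs.)"""
    return zipfile.ZipFile(io.BytesIO(data))


def fetch_zip(url):
    """Download `url` completely into memory and open it as a zip archive."""
    with urlopen(url) as response:
        return open_zip_from_bytes(response.read())
```

With these, archive = fetch_zip(theurl) followed by archive.printdir() would print a listing like the one above, with no temporary file involved.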

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Feb 04 2009)