Unzipping a .zip properly, and from a remote URL

Christopher Culver · Feb 3, 2009

Returning to Python after several years away, I'm working on a little
script that will download a ZIP archive from a website and unzip it to
a mounted filesystem. The code is below, and it works so far, but I'm
unsure of a couple of things.

The first is, is there a way to read the .zip into memory without the
use of a temporary file? If I do archive = zipfile.ZipFile(remotedata.read())
directly without creating a temporary file, the zipfile module
complains that the data is in the wrong string type.

The second issue is that I don't know if this is the correct way to
unpack a file onto the filesystem. It's strange that the zipfile
module has no one simple function to unpack a zip onto the disk. Does
this code seem especially liable to break?

try:
    remotedata = urllib2.urlopen(theurl)
except IOError:
    print("Network down.")
    sys.exit()
data = os.tmpfile()
data.write(remotedata.read())

archive = zipfile.ZipFile(data)
if archive.testzip() != None:
    print "Invalid zipfile"
    sys.exit()
contents = archive.namelist()

for item in contents:
    try:
        os.makedirs(os.path.join(mountpoint, os.path.dirname(item)))
    except OSError:
        # OSError means that the dir already exists, but no matter.
        pass
    if item[-1] != "/":
        outputfile = open(os.path.join(mountpoint, item), 'w')
        outputfile.write(archive.read(item))
        outputfile.close()

Tino Wildenhain · Feb 3, 2009

Hi,

Christopher said:
Returning to Python after several years away, I'm working on a little
script that will download a ZIP archive from a website and unzip it to
a mounted filesystem. The code is below, and it works so far, but I'm
unsure of a couple of things.

The first is, is there a way to read the .zip into memory without the
use of a temporary file? If I do archive = zipfile.ZipFile(remotedata.read())
directly without creating a temporary file, the zipfile module
complains that the data is in the wrong string type.

Which makes sense given the documentation (note that you can either browse
the HTML online/offline or just use help() within the interpreter/IDE):

Help on class ZipFile in module zipfile:

class ZipFile
| Class with methods to open, read, write, close, list zip files.
|
| z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True)
|
| file: Either the path to the file, or a file-like object.
| If it is a path, the file will be opened and closed by ZipFile.
| mode: The mode can be either read "r", write "w" or append "a".
| compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
| allowZip64: if True ZipFile will create files with ZIP64 extensions when
| needed, otherwise it will raise an exception when this would
| be necessary.
|
....

so instead you would use archive = zipfile.ZipFile(remotedata)

The second issue is that I don't know if this is the correct way to
unpack a file onto the filesystem. It's strange that the zipfile
module has no one simple function to unpack a zip onto the disk. Does
this code seem especially liable to break?

try:
    remotedata = urllib2.urlopen(theurl)
except IOError:
    print("Network down.")
    sys.exit()
data = os.tmpfile()
data.write(remotedata.read())

archive = zipfile.ZipFile(data)
if archive.testzip() != None:
    print "Invalid zipfile"
    sys.exit()
contents = archive.namelist()

for item in contents:

....

Here you should check the ZipInfo entry and normalize
and clean the path, just in case, to avoid unpacking a zipfile
with specially crafted paths (like /etc/passwd and such).

Maybe also checking for the various encodings (like utf8)
in pathnames makes sense.
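
As a rough illustration of the encoding check, here is a minimal sketch in present-day Python 3 (not the 2.5 used in this thread). It relies on the ZIP format's general-purpose flag bit 11 (0x800), which marks an entry name as UTF-8; names without the flag are in a legacy encoding (nominally CP437, though tools differ). The helper name name_encodings and the constant UTF8_FLAG are made up for this example.

```python
import zipfile

# General-purpose bit 11 of a ZIP entry: the name is stored as UTF-8.
UTF8_FLAG = 0x800


def name_encodings(archive):
    """Return (filename, "utf-8" or "legacy") for each entry of an open ZipFile.

    Hypothetical helper: "legacy" only means the UTF-8 flag is not set,
    so the real encoding of that name is not recorded in the archive.
    """
    return [
        (info.filename, "utf-8" if info.flag_bits & UTF8_FLAG else "legacy")
        for info in archive.infolist()
    ]
```

Entries reported as "legacy" are the ones worth a closer look before their names are turned into paths on disk.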

The dir-creation could be put into a class with caching
of already existing subdirectories created and recursive
creation of missing subdirectories, as well as to make
sure you do not ascend out of your target directory by
accident (or through a crafted zip, see above).
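
The path cleaning and cached directory creation described above could look roughly like the following. This is a sketch in present-day Python 3, not code from this thread: the function names safe_target and extract_all_safely are invented for the example, and the policy (strip leading slashes, skip anything that still resolves outside the target) is just one reasonable choice.

```python
import os
import zipfile


def safe_target(root, member_name):
    """Return the absolute path for member_name below root, or None if it escapes."""
    root = os.path.realpath(root)
    # ZIP names use "/" as separator; drop leading slashes so the name is relative.
    relative = member_name.replace("\\", "/").lstrip("/")
    target = os.path.realpath(os.path.join(root, *relative.split("/")))
    try:
        inside = os.path.commonpath([root, target]) == root
    except ValueError:  # e.g. paths on different drives on Windows
        inside = False
    return target if inside else None


def extract_all_safely(archive, root):
    """Extract every member of an open ZipFile below root; return the skipped names."""
    made_dirs = set()  # cache of directories that were already created
    skipped = []
    for info in archive.infolist():
        target = safe_target(root, info.filename)
        if target is None:
            skipped.append(info.filename)
            continue
        is_dir = info.filename.endswith("/")
        directory = target if is_dir else os.path.dirname(target)
        if directory not in made_dirs:
            os.makedirs(directory, exist_ok=True)  # creates missing parents too
            made_dirs.add(directory)
        if not is_dir:
            with open(target, "wb") as out:  # binary mode, no newline translation
                out.write(archive.read(info))
    return skipped
```

With this policy an entry named /etc/passwd ends up as etc/passwd inside the target directory, while ../something is skipped and reported. Python 2.6 and later also ship ZipFile.extractall(), which covers the simple case in one call.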

Regards
Tino

Christopher Culver · Feb 3, 2009

Tino Wildenhain said:
so instead you would use archive = zipfile.ZipFile(remotedata)

That produces the following error if I try that in the Python
interpreter (URL edited for privacy):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/zipfile.py", line 346, in __init__
    self._GetContents()
  File "/usr/lib/python2.5/zipfile.py", line 366, in _GetContents
    self._RealGetContents()
  File "/usr/lib/python2.5/zipfile.py", line 376, in _RealGetContents
    endrec = _EndRecData(fp)
  File "/usr/lib/python2.5/zipfile.py", line 133, in _EndRecData
    fpin.seek(-22, 2) # Assume no archive comment.
AttributeError: addinfourl instance has no attribute 'seek'

M.-A. Lemburg · Feb 4, 2009

Try this:
File Name                  Modified              Size
locale/Makefile.pre.in     1997-10-31 21:13:06   9818
locale/Setup.in            1997-10-31 21:14:04     74
locale/locale.c            1997-11-19 17:36:46   4698
locale/CommandLine.py      1997-11-19 15:50:02   2306
locale/probe.py            1997-11-19 15:51:18   1870
locale/__init__.py         1997-11-19 17:55:02      0

The trick is to use a StringIO buffer to provide the .seek()
method.
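
A minimal sketch of the buffer trick, written for present-day Python 3, where io.BytesIO plays the role that StringIO played in the Python 2 of this thread. The helper names open_zip_from_bytes and fetch_zip are made up for illustration, and the whole download is assumed to fit in memory.

```python
import io
import zipfile
from urllib.request import urlopen


def open_zip_from_bytes(raw):
    """Wrap raw archive bytes in a seekable in-memory buffer and open it as a ZipFile."""
    return zipfile.ZipFile(io.BytesIO(raw))


def fetch_zip(url):
    """Download url completely into memory and return an open ZipFile.

    Hypothetical helper: the HTTP response object is not seekable, so its
    bytes are read first and handed to open_zip_from_bytes().
    """
    with urlopen(url) as response:
        return open_zip_from_bytes(response.read())
```

Calling printdir() on the returned object gives a listing in the same style as the one above, and no temporary file is involved.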

--
Marc-Andre Lemburg
eGenix.com

