BadZipfile "file is not a zip file"

webcomm

The error...
Traceback (most recent call last):
  File "<pyshell#23>", line 1, in <module>
    file = zipfile.ZipFile('data.zip', "r")
  File "C:\Python25\lib\zipfile.py", line 346, in __init__
    self._GetContents()
  File "C:\Python25\lib\zipfile.py", line 366, in _GetContents
    self._RealGetContents()
  File "C:\Python25\lib\zipfile.py", line 378, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file

When I look at data.zip in Windows, it appears to be a valid zip
file. I am able to uncompress it in Windows XP, and can also
uncompress it with 7-Zip. It looks like zipfile is not able to read a
"table of contents" in the zip file. That's not a concept I'm
familiar with.

data.zip is created in this script...

decoded = base64.b64decode(datum)
f = open('data.zip', 'wb')
f.write(decoded)
f.close()
file = zipfile.ZipFile('data.zip', "r")

datum is a base64 encoded zip file. Again, I am able to open data.zip
as if it's a valid zip file. Maybe there is something wrong with the
approach I've taken to writing the data to data.zip? I'm not sure if
it matters, but the zipped data is Unicode.

What would cause a zip file to not have a table of contents? Is there
some way I can add a table of contents to a zip file using python?
Maybe there is some more fundamental problem with the data that is
making it seem like there is no table of contents?

Thanks in advance for your help.
Ryan
 
MRAB

You're just creating a file called "data.zip". That doesn't make it a
zip file. A zip file has a specific format. If the file doesn't have
that format then the zipfile module will complain.
 
webcomm

You're just creating a file called "data.zip". That doesn't make it a
zip file. A zip file has a specific format. If the file doesn't have
that format then the zipfile module will complain.

Hmm. When I open it in Windows or with 7-Zip, it contains a text file
that has the data I would expect it to have. I guess that alone
doesn't necessarily prove it's a zip file?

datum is something I'm downloading via a web service. The providers
of the service say it's a zip file, and have provided a code sample in
C# (which I know nothing about) that shows how to deal with it. In
the code sample, the file is base64 decoded and then unzipped. I'm
trying to write something in Python to decode and unzip the file.

I checked the file for comments and it has none. At least, when I
view the properties in Windows, there are no comments.
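
The sequence described above (base64-decode, then unzip) can be sketched in Python without writing a temporary file. This is only an illustration under assumptions: it is written for Python 3 rather than the Python 2.5 shown in the traceback, the helper names `open_base64_zip` and `make_sample_datum` are made up here, and the sample payload stands in for the real web-service data, which is not available.

```python
import base64
import io
import zipfile


def open_base64_zip(datum):
    """Decode a base64 payload and open the result as an in-memory zip archive."""
    decoded = base64.b64decode(datum)
    # ZipFile accepts any seekable file-like object, so BytesIO avoids a temp file.
    return zipfile.ZipFile(io.BytesIO(decoded), "r")


def make_sample_datum():
    """Build a small base64-encoded zip (a stand-in for the service's payload)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("report.txt", "hello from the sample archive\n")
    return base64.b64encode(buf.getvalue())


if __name__ == "__main__":
    archive = open_base64_zip(make_sample_datum())
    print(archive.namelist())
    print(archive.read("report.txt").decode("utf-8"))
```

If the real payload raises an exception at the `ZipFile(...)` call, the decoded bytes are worth inspecting directly, since the base64 step itself rarely fails silently.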
 
James Mills

Send us a sample of this file in question...

cheers
James
 
webcomm

Send us a sample of this file in question...

It contains data that I can't share publicly. I could ask the
providers of the service if they have a dummy file I could use that
doesn't contain any real data, but I don't know how responsive they'll
be. It's an event registration service called RegOnline.
 
MRAB

Ah, OK. You didn't explicitly say in your original posting that the
decoded data was definitely zipfile data. There was a thread a month ago
about gzip Unix commands which could also handle non-gzipped files and I
was wondering whether this problem was something like that. Have you
tried gzip instead?
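
One quick way to tell whether decoded data looks like zip or gzip is to check its first bytes. The sketch below is a hedged illustration for Python 3; `sniff_container` is a hypothetical helper name, and the check is deliberately shallow (for example, an empty zip archive starts with PK\x05\x06 and would be reported as "unknown").

```python
import gzip
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"   # local file header of the first zip member
GZIP_MAGIC = b"\x1f\x8b"    # gzip stream header


def sniff_container(data):
    """Guess the container format from the leading bytes only."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "x")
    print(sniff_container(buf.getvalue()))
    print(sniff_container(gzip.compress(b"x")))
    print(sniff_container(b"plain text"))
```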
 
Steven D'Aprano

The error... ....
BadZipfile: File is not a zip file

When I look at data.zip in Windows, it appears to be a valid zip file.
I am able to uncompress it in Windows XP, and can also uncompress it
with 7-Zip. It looks like zipfile is not able to read a "table of
contents" in the zip file. That's not a concept I'm familiar with.

No, ZipFile can read a table of contents:

    Help on method printdir in module zipfile:

    printdir(self) unbound zipfile.ZipFile method
        Print a table of contents for the zip file.



In my experience, zip files originating from Windows sometimes have
garbage at the end of the file. WinZip just ignores the garbage, but
other tools sometimes don't -- if I recall correctly, Linux unzip
successfully unzips the file but then complains that the file was
corrupt. It's possible that you're running into a similar problem.

data.zip is created in this script...

decoded = base64.b64decode(datum)
f = open('data.zip', 'wb')
f.write(decoded)
f.close()
file = zipfile.ZipFile('data.zip', "r")

datum is a base64 encoded zip file. Again, I am able to open data.zip
as if it's a valid zip file. Maybe there is something wrong with the
approach I've taken to writing the data to data.zip? I'm not sure if it
matters, but the zipped data is Unicode.


The full signature of ZipFile is:

ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True)

Try passing compression=zipfile.ZIP_DEFLATED and/or allowZip64=False and
see if that makes any difference.

The zip format does support alternative compression methods; it's
possible that this particular file uses a different sort of compression
which Python doesn't deal with.

What would cause a zip file to not have a table of contents?

What makes you think it doesn't have one?
 
Martin v. Löwis

What would cause a zip file to not have a table of contents?

AFAICT, _EndRecData is failing to find the "end of zipfile" structure in
the file. You might want to debug through it to see where it looks, and how
it decides that this structure is not present in the file. Towards
22 bytes before the end of the file, the bytes PK\005\006 should appear.
If they don't appear, you don't have a zipfile. If they appear, but
elsewhere towards the end of the file, there might be a bug in the
zip file module (or, more likely, the zip file uses an optional zip
feature which the module doesn't implement).
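
A rough way to check this without stepping through zipfile.py is to search the raw bytes for the PK\005\006 signature and compare its position with the file size. The sketch below assumes Python 3; `describe_end_record` is a hypothetical helper, and it simply takes the last occurrence of the signature, which could be fooled if the same four bytes also occur in a comment or in trailing junk.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # "end of zipfile" record signature
EOCD_SIZE = 22                  # fixed part of the record, without the comment


def describe_end_record(data):
    """Locate the last end-of-archive record and report bytes that follow it."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        return None  # signature missing or record truncated
    # The last two bytes of the fixed record hold the comment length.
    comment_len = struct.unpack("<H", data[pos + 20:pos + 22])[0]
    expected_end = pos + EOCD_SIZE + comment_len
    return {
        "eocd_pos": pos,
        "comment_len": comment_len,
        "trailing_bytes": len(data) - expected_end,
    }


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "hello")
    clean = buf.getvalue()
    print(describe_end_record(clean))
    print(describe_end_record(clean + b"\x00" * 100))
```

A non-zero `trailing_bytes` value means something follows the archive proper, which is the situation a strict reader may reject.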

Regards,
Martin
 
Carl Banks

The zipfile format is kind of brain dead, you can't tell where the end
of the file is supposed to be by looking at the header. If the end of
file hasn't yet been reached there could be more data. To make
matters worse, somehow zip files came to have text comments simply
appended to the end of them. (Probably this was for the benefit of
people who would cat them to the terminal.)

Anyway, if you see something that doesn't adhere to the zipfile
format, you don't have any foolproof way to know if it's because the
file is corrupted or if it's just an appended comment.

Most zipfile readers use a heuristic to distinguish. Python's zipfile
module just assumes it's corrupted.

The following post from a while back gives a solution that tries to
snip the comment off so that zipfile module can handle it. It might
help you out.

http://groups.google.com/group/comp.lang.python/msg/c2008e48368c6543
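
For illustration only, here is one way such snipping could look in Python 3. This is not the code from the linked post; `open_zip_ignoring_trailing_data` is a made-up name, and the approach assumes the last PK\005\006 signature in the data belongs to the real end-of-archive record.

```python
import io
import struct
import zipfile


def open_zip_ignoring_trailing_data(data):
    """Cut everything after the end-of-archive record, then open the archive."""
    pos = data.rfind(b"PK\x05\x06")
    if pos < 0 or pos + 22 > len(data):
        raise zipfile.BadZipFile("no end-of-archive record found")
    # Keep the declared archive comment, drop whatever follows it.
    comment_len = struct.unpack("<H", data[pos + 20:pos + 22])[0]
    trimmed = data[:pos + 22 + comment_len]
    return zipfile.ZipFile(io.BytesIO(trimmed), "r")


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "payload")
    padded = buf.getvalue() + b"\x00" * 30000  # simulated trailing garbage
    archive = open_zip_ignoring_trailing_data(padded)
    print(archive.namelist(), archive.read("a.txt"))
```

Trimming works on a copy in memory, so the original file is left untouched.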


Carl Banks
 
John Machin

The full signature of ZipFile is:

ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True)

Try passing compression=zipfile.ZIP_DEFLATED and/or allowZip64=False and
see if that makes any difference.

"compression" is irrelevant when reading. The compression method used
is stored on a per-file basis, not on a per-archive basis, and it
hasn't got anywhere near per-file details when that exception is
raised. "allowZip64" has not been used either.
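
To make the per-file point concrete, the following Python 3 sketch builds a small archive whose two members use different compression methods and then reads the method back from each member's entry. The helper names and the member names are invented for the example.

```python
import io
import zipfile


def make_mixed_archive():
    """Build an in-memory zip with one stored and one deflated member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("stored.txt", "a" * 100, compress_type=zipfile.ZIP_STORED)
        zf.writestr("deflated.txt", "a" * 100, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def compression_by_member(archive_bytes):
    """Map each member name to the compression method recorded for it."""
    # No compression argument is given here: the reader takes it from each entry.
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {info.filename: info.compress_type for info in zf.infolist()}


if __name__ == "__main__":
    print(compression_by_member(make_mixed_archive()))
```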
 
Steven D'Aprano

The zipfile format is kind of brain dead, you can't tell where the end
of the file is supposed to be by looking at the header. If the end of
file hasn't yet been reached there could be more data. To make matters
worse, somehow zip files came to have text comments simply appended to
the end of them. (Probably this was for the benefit of people who would
cat them to the terminal.)

Anyway, if you see something that doesn't adhere to the zipfile format,
you don't have any foolproof way to know if it's because the file is
corrupted or if it's just an appended comment.

Yes, this has led to a nice little attack vector, using hostile Java
classes inside JAR files (a variant of ZIP).

http://www.infoworld.com/article/08/08/01/A_photo_that_can_steal_your_online_credentials_1.html

or http://snipurl.com/9oh0e
 
John Machin

And here's a little gadget that might help the diagnostic effort; it
shows the archive size and the position of all the "magic" PKnn
markers. In a "normal" uncommented archive, EndArchive_pos + 22 ==
archive_size.
8<---
# usage: python zip_susser.py name_of_archive.zip
import sys
grimoire = [
    ("FileHeader", "PK\003\004"), # magic number for file header
    ("CentralDir", "PK\001\002"), # magic number for central directory
    ("EndArchive", "PK\005\006"), # magic number for end of archive record
    ("EndArchive64", "PK\x06\x06"), # magic token for Zip64 header
    ("EndArchive64Locator", "PK\x06\x07"), # magic token for locator header
    ]
f = open(sys.argv[1], 'rb')
buff = f.read()
f.close()
blen = len(buff)
print "archive size is", blen
for magic_name, magic in grimoire:
    pos = 0
    while pos < blen:
        pos = buff.find(magic, pos)
        if pos < 0:
            break
        print "%s at %d" % (magic_name, pos)
        pos += 4
8<---

HTH,
John
 
webcomm

The full signature of ZipFile is:

ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True)

Try passing compression=zipfile.ZIP_DEFLATED and/or allowZip64=False and
see if that makes any difference.

Those arguments didn't make a difference in my case.

The zip format does support alternative compression methods, it's
possible that this particular file uses a different sort of compression
which Python doesn't deal with.


What makes you think it doesn't have one?

Because when I search for the "file is not a zip file" error in
zipfile.py, there is a function that checks for a table of contents.
Though it looks like there are other ideas in this thread about what
might cause that error... I'll keep reading...
 
webcomm

Thanks Carl. I tried Scott's getzip() function yesterday... I
stumbled upon it in my searches. It didn't seem to help in my case,
though it did produce a different error: ValueError, substring not
found. Not sure what that means.
 
Chris Mellon

Thanks Carl. I tried Scott's getzip() function yesterday... I
stumbled upon it in my searches. It didn't seem to help in my case,
though it did produce a different error: ValueError, substring not
found. Not sure what that means.

This is a ticket about another issue or 2 with invalid zipfiles that
the zipfile module won't load, but that other tools will compensate
for:

http://bugs.python.org/issue1757072
 
webcomm

And here's a little gadget that might help the diagnostic effort; it
shows the archive size and the position of all the "magic" PKnn
markers. In a "normal" uncommented archive, EndArchive_pos + 22 ==
archive_size.

I ran the diagnostic gadget...

archive size is 69888
FileHeader at 0
CentralDir at 43796
EndArchive at 43846
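
Applying the rule quoted above (EndArchive_pos + 22 == archive_size) to these numbers is simple arithmetic, sketched here in Python 3. The function name is hypothetical, and the comment length is assumed to be zero, in line with the earlier observation that the file shows no comments.

```python
EOCD_SIZE = 22  # size of the end-of-archive record without a comment


def trailing_bytes(archive_size, end_archive_pos, comment_len=0):
    """Number of bytes that follow the end-of-archive record and its comment."""
    return archive_size - (end_archive_pos + EOCD_SIZE + comment_len)


if __name__ == "__main__":
    # Figures reported by the diagnostic run above.
    print(trailing_bytes(69888, 43846))  # 26020
```

So the archive proper would end at byte 43868, and roughly 26 KB of extra data follows it.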
 
webcomm

This is a ticket about another issue or 2 with invalid zipfiles that
the zipfile module won't load, but that other tools will compensate
for:

http://bugs.python.org/issue1757072

Hmm. That's interesting. Are there other tools I can use in a python
script that are more forgiving? I am using the zipfile module only
because it seems to be the most widely used. Are other options in
python likely to be just as unforgiving? Guess I'll look and see...
 
webcomm

This is a ticket about another issue or 2 with invalid zipfiles that
the zipfile module won't load, but that other tools will compensate
for:

http://bugs.python.org/issue1757072

Looks like I just need to do this to unzip with unix...

from os import popen
popen("unzip data.zip")

That works for me. No idea why I didn't think of that earlier. I'm
new to python but should have realized I could run unix commands with
python. I had blinders on. Now I just need to get rid of some bad
characters in the unzipped file. I'll start a new thread if I need
help with that...
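
A slightly more defensive variant of the same idea, sketched for Python 3.7 or later with the subprocess module, waits for unzip to finish and returns its exit status and messages. The helper names are invented here; `-o` (overwrite without prompting) and `-d` (target directory) are Info-ZIP unzip options, and the meaning of a non-zero exit status should be checked against the unzip documentation on the system in question.

```python
import shutil
import subprocess


def unzip_command(archive, dest_dir="."):
    """Build the argument list for the external unzip tool."""
    return ["unzip", "-o", archive, "-d", dest_dir]


def run_unzip(archive, dest_dir="."):
    """Run unzip, wait for it, and return (exit status, stdout, stderr)."""
    if shutil.which("unzip") is None:
        raise RuntimeError("unzip executable not found on PATH")
    result = subprocess.run(
        unzip_command(archive, dest_dir),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    print(unzip_command("data.zip", "out"))
```

Passing the arguments as a list avoids shell quoting problems if the archive name contains spaces.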
 
