Joseph said:
And because of this "fact," one can work around the differences in
behavior between ZipFile and ZipInputStream. This is a good thing.
The way you've put "fact" in quotes suggests to me that you don't understand
the way the ZIP file format is designed. It is not an accident of
implementation, it's a feature of the format's design that is very properly
reflected in the Java classes that implement that design.
Agreed; while the manner in which ZipFile and ZipInputStream approach the
compressed data differs, is it not true that a bad zip file is
always a bad file?
No.
The ZIP format is a *data format*, not a file format. It is designed to support
a "stream like" approach, where the only data that an application has is what
it has read *so far* (off the network, or from a tape, or whatever), and so it
*has* to be able to interpret what it has seen without waiting for the end of
the input. Hence "errors" in the file that occur after any given entry are
*irrelevant* to that entry, and hence irrelevant to any application that is
doing a forward-only pass over the data.
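A minimal sketch of that forward-only style: the ZIP data here is built entirely in memory (the entry names "a.txt" and "b.txt" are invented for illustration), so the reader sees nothing but a byte stream, exactly as it would off a socket or a tape. Each entry is decoded from the local header and data already read; nothing later in the stream is consulted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ForwardOnlyScan {
    // Build a small ZIP in memory -- there is no disk file at all,
    // which is the point: ZIP is a data format, not a file format.
    static byte[] makeZip() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buf)) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("hello".getBytes("UTF-8"));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("b.txt"));
            zout.write("world".getBytes("UTF-8"));
            zout.closeEntry();
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Forward-only pass: each getNextEntry() call uses only the bytes
        // read *so far*; damage after the current entry is irrelevant to it.
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(makeZip()))) {
            ZipEntry e;
            while ((e = zin.getNextEntry()) != null) {
                System.out.println(e.getName());
            }
        }
    }
}
// prints:
// a.txt
// b.txt
```

Note that in this mode an entry's size may be reported as -1 by ZipEntry.getSize(), because the stream reader cannot see the table of contents at the end of the data.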
Other applications don't restrict themselves to forwards-only, but allow
themselves to "know" that the data is held in a seekable format (such as a normal
disk file). Such applications, and such applications *only*, will use the
table-of-contents and will be sensitive to errors in it.
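ZipFile is the class that takes the table-of-contents approach. A sketch (the temp-file setup and entry names are invented for illustration): it starts by reading the central directory at the *end* of the file, which is why it requires a seekable file rather than a stream, and why it can jump straight to any entry without scanning the ones before it.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class TocLookup {
    // Write a small ZIP to a temporary file; unlike ZipInputStream,
    // ZipFile needs a real, seekable file.
    static File makeZipFile() throws IOException {
        File f = File.createTempFile("demo", ".zip");
        f.deleteOnExit();
        try (ZipOutputStream zout = new ZipOutputStream(new FileOutputStream(f))) {
            zout.putNextEntry(new ZipEntry("first.txt"));
            zout.write("aaa".getBytes("UTF-8"));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("last.txt"));
            zout.write("zzz".getBytes("UTF-8"));
            zout.closeEntry();
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        try (ZipFile zf = new ZipFile(makeZipFile())) {
            // Jump straight to an entry near the end: the central directory
            // records its offset, so no forward scan of earlier entries occurs.
            // It also records the size, so getSize() is known here.
            ZipEntry e = zf.getEntry("last.txt");
            try (InputStream in = zf.getInputStream(e)) {
                byte[] data = new byte[(int) e.getSize()];
                int n = in.read(data);
                System.out.println(e.getName() + " -> " + new String(data, 0, n, "UTF-8"));
            }
        }
    }
}
```

The flip side is the sensitivity described above: damage to the central directory breaks this lookup even when every local entry is intact.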
If unzip, zipinfo, jar, and other zip file commands
deal with the NWS zip file/stream, would it not be consistent for
ZipFile and ZipInputStream to both do the same, accept or reject it?
No.
Those programs are only some of the applications of the ZIP format. Any
individual one may use only a fraction of the power in the format, or they may
try to expose all the power. I don't know which do and which don't. One way
to see is to ask which of them can read/write ZIP-encoded data to/from a tape.
Any that can't must be relying on the random-access features of hard-disk
files. Equally you can check to see how efficient they are at retrieving data
from an entry near the end of a big ZIP file. Any that are slow are presumably
doing a forward-only scan and failing to make use of the random-access
features.
The Java classes expose both feature-sets.
If you want life to be simple, and for there just to be an easy-to-use and
easy-to-understand class library, then either you will have to drop some of the
features (a bad thing) or you will have to design something that is better than
the Sun-supplied stuff.
Take it from me, that's hard. I've recently been going through the exercise
of designing a class library for manipulating ZIP formatted data (in a
different language) and it is *not easy* to find a workable compromise, let
alone something that is both simple and comprehensive.
I don't particularly admire the design Sun have come up with (but then I don't
think much of the design of the ZIP format either). But it is genuinely
difficult to do better (my own attempt uses more layering).
I think that the real problems are that:
a) not everyone realises what the ZIP format is *for* (they've only ever used
it for files).
b) the documentation is *terrible*.
The fact that the behavior of ZipFile and ZipInputStream differ causes
the dilemma of knowing when to use one over the other. Since I know
this behavior exists, I for one will never again use ZipFile, creating
instead a MyZipFile class that incorporates the logic I know to work
(see the working example in previous post).
No, no, no, no, no. NO! You have it all wrong.
The two ZIP classes in the Java library are intended for different purposes.
In some cases there's an overlap and either could be used (albeit with
different performance tradeoffs). In most cases one is at least clearly
preferable to the other; and in some cases only one or the other can possibly
be used. It's *your* job, as a programmer, to understand the issues and make
an intelligent choice based on your understanding. To the extent that you
don't understand the issues then you are not doing your job properly (and to
the extent that that is the fault of the Sun documentation -- 100% I suspect --
the Sun programmers weren't doing their job properly either).
I guess the real question is "when is a bug a bug?" M$ has answered
this question many times by saying "when we say so."
<grin>
Either of the two classes may have bugs or deficiencies, of course, in any
given implementation. But bugs are not the issue here. For the data under
discussion, both classes (in the JDK 1.4.2 implementation) are working
perfectly.
-- chris