ZipEntry.getSize

Discussion in 'Java' started by Roedy Green, Sep 15, 2003.

  1. Roedy Green

    Roedy Green Guest

    I can create zip files that Winzip says are valid.
    All the lengths are there. They pass the test.

    I can read zip files that Winzip creates.

    However, I can't read zip files that I create in Java with
    ZipOutputStream. It claims the lengths of each entry are 0.
    Is there some trick to making this work?

    Here is a slightly simplified version of what I am doing to create the
    zip:

    String elementName = "adir/afile.txt";
    ZipEntry entry = new ZipEntry( elementName );

    File elementFile = new File ( "adir/afile.txt" );
    entry.setTime( elementFile.lastModified() );
    int fileLength = (int) elementFile.length();

    entry.setSize( fileLength );

    // A single read() may return fewer bytes than requested, so loop.
    FileInputStream fis = new FileInputStream ( elementFile );
    byte[] wholeFile = new byte [ fileLength ];
    int bytesRead = 0;
    while ( bytesRead < fileLength ) {
        int n = fis.read( wholeFile, bytesRead, fileLength - bytesRead );
        if ( n < 0 ) break;
        bytesRead += n;
    }
    fis.close();

    // no need to setCRC, computed automatically.
    // zip is an already-open ZipOutputStream.
    zip.putNextEntry( entry );

    zip.write( wholeFile, 0, fileLength );
    zip.closeEntry();

    I am getting the horrible feeling that ZipOutputStream is stupidly
    designed so that it is up to you to fill in fields like size, CRC, and
    compressed size by ESP, otherwise only the summary at the end of the
    file is accurate.
    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Sep 15, 2003
    #1

  2. Roedy Green

    Roedy Green Guest

    On Mon, 15 Sep 2003 01:45:07 GMT, Roedy Green <>
    wrote or quoted :

    >I am getting the horrible feeling that ZipOutputStream is stupidly
    >designed so that it is up to you to fill in fields like size, CRC,
    >compressed size by ESP, otherwise only the summary at the end of the
    >file is accurate.


    I sniffed the files ZipOutputStream generates.
    The local headers have 0 in the crc-32,
    uncompressed size, and compressed size fields.
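A minimal sketch of this kind of sniffing, assuming the standard local file header layout from the PKWARE ZIP specification (signature at offset 0, general purpose flags at offset 6, CRC-32 at 14, compressed size at 18, uncompressed size at 22, all little-endian). The class and method names are illustrative and not from the original post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class LocalHeaderSniffer {

    /** Builds a one-entry zip in memory using the default DEFLATED method. */
    public static byte[] buildZip(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Returns {flags, crc32, compressedSize, uncompressedSize} from the first local header. */
    public static long[] firstLocalHeader(byte[] zipBytes) {
        ByteBuffer buf = ByteBuffer.wrap(zipBytes).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != 0x04034b50) {
            throw new IllegalArgumentException("not a local file header");
        }
        long flags = buf.getShort(6) & 0xFFFF;
        long crc = buf.getInt(14) & 0xFFFFFFFFL;
        long compressedSize = buf.getInt(18) & 0xFFFFFFFFL;
        long size = buf.getInt(22) & 0xFFFFFFFFL;
        return new long[] { flags, crc, compressedSize, size };
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip("adir/afile.txt", "hello zip".getBytes(StandardCharsets.UTF_8));
        long[] h = firstLocalHeader(zip);
        // Bit 3 (mask 0x08) means the real values follow the data in a data descriptor.
        System.out.println("bit 3 set: " + ((h[0] & 0x08) != 0));
        System.out.println("crc=" + h[1] + " csize=" + h[2] + " size=" + h[3]);
    }
}
```

Running it against an archive written by a tool that seeks back to patch the header should show the same three fields filled in and bit 3 clear.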

    I have a sneaky feeling somebody should be shot, or at least fired. I
    hope I am just using the ZipOutputStream class incorrectly.


     
    Roedy Green, Sep 15, 2003
    #2

  3. Roedy Green

    Roedy Green Guest

    On Mon, 15 Sep 2003 02:02:09 GMT, Roedy Green <>
    wrote or quoted :

    >I sniffed the files ZipOutputStream generates.
    >The local headers have 0 in the crc-32
    >uncompressed size and compressed size fields.


    I discovered, though, that bit 3 of the general purpose flags is on.
    This is used for streams that are not seekable. ZipOutputStream works
    as a stream, even, for example, over a socket.

    There is a lame way in the PKZip format to put the lengths AFTER the
    data.

    However, that just postpones the problem. When you go to read,
    the fool ZipInputStream can't scan ahead to find the lengths, because
    that too is a stream. So you have to read the stream not knowing the
    size of the element. It won't even leave the length YOU set in the
    header intact.

    The alternative may be to use ZipFile to read, which uses the index
    at the end and allows random access.
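A short sketch of the ZipFile route, assuming the archive is an ordinary file on disk so that the central directory at the end can be read. The class name, helper names, and temp-file handling are illustrative only.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipFileSizes {

    /** Writes a one-entry zip to a temporary file without setting any sizes. */
    public static File writeZip(String name, byte[] content) throws IOException {
        File file = File.createTempFile("sizes", ".zip");
        file.deleteOnExit();
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(file))) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content);
            zip.closeEntry();
        }
        return file;
    }

    /** Looks the entry up through ZipFile, which reads the index at the end of the file. */
    public static long sizeFromCentralDirectory(File zipFile, String name) throws IOException {
        try (ZipFile zf = new ZipFile(zipFile)) {
            ZipEntry entry = zf.getEntry(name);
            return entry == null ? -1 : entry.getSize();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "some file content".getBytes(StandardCharsets.UTF_8);
        File zip = writeZip("adir/afile.txt", content);
        System.out.println("size via ZipFile: " + sizeFromCentralDirectory(zip, "adir/afile.txt"));
    }
}
```

The trade-off is that ZipFile needs random access, so it does not help when the archive arrives over a socket or any other non-seekable stream.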

     
    Roedy Green, Sep 15, 2003
    #3
  4. Roedy Green wrote:
    >
    > I can create zip files that Winzip says are valid.
    > All the lengths are there. They pass the test.
    >
    > I can read zip files that Winzip creates.
    >
    > However, I can't read zip files that I create in Java with
    > ZipOutputStream. It claims the lengths of each entry are 0.
    > Is there some trick to making this work?

    ZipEntry and related classes aren't very well documented. The size
    seems to get written automatically (and maybe trying to write it
    yourself mungs it somehow), and the lastModified works if set after
    "loading" the zip entry. The following worked for me in Windows 98
    under jdk 1.3:

    import java.util.zip.*;
    import java.io.*;

    public class TestZipMultiFile {
        public static void main(String[] args) {
            ZipOutputStream zo;
            ZipEntry ze;

            FileInputStream fis;
            BufferedInputStream bis;
            byte[] data = new byte[1024];
            int byteCount;

            try {
                FileOutputStream fos = new FileOutputStream("test.zip");
                zo = new ZipOutputStream(fos);
                fis = new FileInputStream("TestFile1.java");
                bis = new BufferedInputStream(fis);

                ze = new ZipEntry("TestFile1.java");
                System.out.print("TestFile1.java");
                zo.putNextEntry(ze);
                while ((byteCount = bis.read(data, 0, 1024)) > -1) {
                    zo.write(data, 0, byteCount);
                    System.out.print("*");
                }
                System.out.println("*");
                bis.close();
                ze.setTime( new File("TestFile1.java").lastModified() );

                fis = new FileInputStream("TestFile2.java");
                bis = new BufferedInputStream(fis);

                System.out.print("TestFile2.java");
                ze = new ZipEntry("TestFile2.java");
                zo.putNextEntry(ze);
                while ((byteCount = bis.read(data, 0, 1024)) > -1) {
                    zo.write(data, 0, byteCount);
                    System.out.print("*");
                }
                System.out.println("*");
                bis.close();
                ze.setTime( new File("TestFile2.java").lastModified() );

                zo.flush();
                zo.close();
                fos.close();
            }
            catch ( Exception e) { e.printStackTrace(); }
        }
    }


    --
    Steve
    --
    http://www.steveclaflin.com
     
    Steve Claflin, Sep 15, 2003
    #4
  5. Roedy Green

    Roedy Green Guest

    On Mon, 15 Sep 2003 01:29:33 -0400, Steve Claflin
    <> wrote or quoted :

    >ZipEntry and related classes aren't very well documented. The size
    >seems to get written automatically (and maybe trying to write it
    >yourself mungs it somehow), and the lastModified works if set after
    >"loading" the zip entry. The following worked for me in Windows 98
    >under jdk 1.3:


    I had the same problem whether I set the length myself or not. I
    explain what is going on at http://mindprod.com/jgloss/zip.html

    Basically the format ZipOutputStream produces is not compatible with
    ZipInputStream, though it is technically legal.

     
    Roedy Green, Sep 15, 2003
    #5
  6. Roedy Green

    Harald Hein Guest

    "Roedy Green" wrote:

    > The alternative, may be to use ZipFile to read, which uses the index
    > at the end and allows random access.


    That's why it is there.
     
    Harald Hein, Sep 15, 2003
    #6
  7. Roedy Green

    Luke Tulkas Guest

    "Roedy Green" <> wrote in message
    news:...
    > I can create zip files that Winzip says are valid.
    > All the lengths are there. They pass the test.
    >
    > I can read zip files that Winzip creates.
    >
    > However, I can't read zip files that I create in Java with
    > ZipOutputStream. It claims the lengths of each entry are 0.
    > Is there some trick to making this work?


    ZipInputStream zis = new ZipInputStream(...);
    ZipEntry entry;
    while ((entry = zis.getNextEntry()) != null) {
        // If you ask this entry for its size here, you get -1, so just ignore it.
        // Read from zis until you get -1.
        // If you haven't kept track of the number of bytes you read from zis,
        // you can ask the entry for its size now & be surprised. ;-)
    }

    Nothing to it, really. If you want I can mail you the code.
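The recipe above can be sketched as a self-contained program. This assumes an archive written by ZipOutputStream with the default DEFLATED method and no sizes set in advance; the in-memory archive and all names here are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StreamedSizes {

    /** Builds a one-entry zip in memory, leaving all size fields to ZipOutputStream. */
    public static byte[] buildZip(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Returns {size before reading, bytes counted while reading, size after reading}. */
    public static long[] probeFirstEntry(byte[] zipBytes) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zis.getNextEntry();
            long before = entry.getSize();      // header had no usable size yet
            byte[] buffer = new byte[1024];
            long counted = 0;
            int n;
            while ((n = zis.read(buffer, 0, buffer.length)) != -1) {
                counted += n;                   // count the bytes ourselves
            }
            long after = entry.getSize();       // filled in once the trailing descriptor is read
            return new long[] { before, counted, after };
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip("adir/afile.txt", "hello zip".getBytes(StandardCharsets.UTF_8));
        long[] r = probeFirstEntry(zip);
        System.out.println("before=" + r[0] + " counted=" + r[1] + " after=" + r[2]);
    }
}
```

Because the size is unknown up front, the reading side has to grow its buffer or stream the data onward rather than allocate a byte array of the right length in advance.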
     
    Luke Tulkas, Sep 15, 2003
    #7
  8. Roedy Green

    Roedy Green Guest

    On Mon, 15 Sep 2003 21:50:55 +0200, "Luke Tulkas"
    <> wrote or quoted :

    > //Read from zis until you get -1.
    > //If you haven't kept track of the number of bytes you read from
    >zis, you can ask the entry for size now & be surprised. ;-)


    The documentation on this really stinks. They don't explain which
    fields you have to set yourself. They don't tell you the order you
    are supposed to use the methods. They don't tell you about the
    getSize problem. They don't document the / \ problem. They don't
    warn you about trying to read zips created with Winzip or Pkzip
    because of unsupported compression algorithms.

     
    Roedy Green, Sep 15, 2003
    #8
  9. Roedy Green

    Harald Hein Guest

    "Roedy Green" wrote:

    > The documentation on this really stinks. They don't explain which
    > fields you have to set yourself. They don't tell you the order
    > you are supposed to use the methods. They don't tell you about
    > the getSize problem. They don't document the / \ problem. They
    > don't warn you about trying to read zips created with Winzip or
    > Pkzip because of unsupported compression algorithms.


    The whole API is a stupid hack done in a hurry to use the info-zip
    library from within Java. It was just hacked to add JARs to Java.
    The API only contains the rudimentary stuff. It was for sure never
    intended to be published. I guess Sun just had to publish it when
    they recognized that people might want to play with JARs and ZIPs
    themselves.

    If you want to see some strange things, have a look at the jar tool
    and weep. They have interesting problems in the tool. E.g. the
    MANIFEST file must be the first file in a jar. But it contains
    per-file data. So you can't just walk through your list of files and
    add them to the jar, while at the same time complete the data in the
    MANIFEST file. Instead you have to run two passes over the input
    files. The first to build the MANIFEST file, the second to add the
    individual files. This creates the risk that a file might change
    between the first and the second pass, leading to completely broken
    JARs. The same with the new index data in a JAR. Here Sun didn't
    even bother to update the JAR/ZIP API. Instead they do everything
    hidden in the jar tool.

    The jardiff tool in the WebStart framework is also interesting. It has
    to compensate for different orders of entries in the input jars.
     
    Harald Hein, Sep 15, 2003
    #9
  10. Roedy Green

    Roedy Green Guest

    On 15 Sep 2003 21:11:56 GMT, Harald Hein <> wrote or
    quoted :

    >I guess Sun just had to publish it when
    >they recognized that people might want to play with JARs and ZIPs
    >themselves.


    The other thing that is odd about them is they use native methods that
    use long handles.

    They are also at least an order of magnitude slower than Winzip/PkZip
    for compressing.

    At any rate my Replicator is now replicating, now that I understand
    their limitations.

     
    Roedy Green, Sep 15, 2003
    #10
  11. Roedy Green

    Roedy Green Guest

    On 15 Sep 2003 21:11:56 GMT, Harald Hein <> wrote or
    quoted :

    >The whole API is a stupid hack done in a hurry to use the info-zip
    >library from within Java. It was just hacked to add JARs to Java.
    >The API only contains the rudimentary stuff. It was for sure never
    >intended to be published. I guess Sun just had to publish it when
    >they recognized that people might want to play with JARs and ZIPs
    >themselves.


    Part of the problem was they wanted to make ZipOutputStream a true
    OutputStream even though the file structure properly requires random
    access and buffering to create.

    If I were re-inventing jar files, they would have an alphabetical
    index at the HEAD of the file with absolute offsets into the file
    where to find the data. There might be a little indexing added to
    speed searching for a particular name, e.g. class file loading. There
    would be no embedded headers. That index itself would be optionally
    compressed too. The names of the elements would be in UTF-8 encoding.

    You could open a ZIP, add elements, delete elements, merge other zips,
    and when you closed, then it would do a flurry of copying to create
    the new zip. There would be no need to uncompress and recompress to
    merge two zip files.

    We have no way to update a ZIP now, only create a new one from
    scratch.


    I'd also like to add convenience methods so you could just say which
    files you wanted added, and it got them, with dates etc, and when it
    unpacked them automatically did the necessary mkdirs( f.getParent() ).
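A rough sketch of what such an unpacking convenience method might look like with the existing API, creating parent directories as it goes. The class name and the `unpack` method are hypothetical; the check that an entry stays inside the target directory is an added precaution, not something discussed in the thread.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class Unpacker {

    /** Extracts every entry of zipFile under targetDir, creating directories as needed. */
    public static void unpack(File zipFile, File targetDir) throws IOException {
        String root = targetDir.getCanonicalPath() + File.separator;
        try (ZipFile zf = new ZipFile(zipFile)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                File out = new File(targetDir, entry.getName());
                // Refuse names such as "../x" that would land outside targetDir.
                if (!out.getCanonicalPath().startsWith(root)) {
                    throw new IOException("entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    out.mkdirs();
                    continue;
                }
                File parent = out.getParentFile();
                if (parent != null) {
                    parent.mkdirs();            // the mkdirs step done automatically
                }
                try (InputStream in = zf.getInputStream(entry);
                     OutputStream os = new FileOutputStream(out)) {
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        os.write(buffer, 0, n);
                    }
                }
                if (entry.getTime() != -1) {
                    out.setLastModified(entry.getTime());   // restore the stored date
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: Unpacker <zip file> <target directory>");
            return;
        }
        unpack(new File(args[0]), new File(args[1]));
    }
}
```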

     
    Roedy Green, Sep 15, 2003
    #11
  12. Roedy Green

    Harald Hein Guest

    "Roedy Green" wrote:

    > The other thing that is odd about them is they use native methods
    > that use long handles.


    This is because of the underlying info-zip library. At some places in
    the Java jar/zip API the layer around the library is very thin and you
    see the library implementation shining through. If you grab the library
    from the net you see the similarities.

    It gets even better when you try to figure out stuff like the dictionary
    in the Inflater/Deflater. That magic byte[] goes directly into the
    corresponding calls of the info-zip library.

    And for the record, if someone googles for the dictionary stuff: That
    array of bytes is supposed to contain a sequence of C-style
    null-terminated strings. Don't ask about the encoding, we are back in
    C "a char is a byte" land. Disgusting.
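For anyone experimenting with that byte[] argument, here is a hedged sketch of a round trip through the `setDictionary` methods of `java.util.zip.Deflater` and `java.util.zip.Inflater`. The dictionary content is an arbitrary ASCII byte array chosen for illustration; as far as the Java signatures go, any byte array is accepted, and both sides must use the same one. The class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DictionaryDemo {

    /** Compresses input with a preset dictionary, then restores it with the same dictionary. */
    public static byte[] roundTrip(byte[] input, byte[] dictionary) throws DataFormatException {
        Deflater deflater = new Deflater();
        deflater.setDictionary(dictionary);     // must be set before any data is deflated
        deflater.setInput(input);
        deflater.finish();
        byte[] compressed = new byte[input.length + 64];   // generous for small inputs
        int compressedLength = deflater.deflate(compressed);
        deflater.end();

        Inflater inflater = new Inflater();
        inflater.setInput(compressed, 0, compressedLength);
        byte[] restored = new byte[input.length];
        int n = inflater.inflate(restored);
        if (n == 0 && inflater.needsDictionary()) {
            // The stream announces that it needs a dictionary; supply it and retry.
            inflater.setDictionary(dictionary);
            n = inflater.inflate(restored);
        }
        inflater.end();
        return Arrays.copyOf(restored, n);
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] dictionary = "public static void class import".getBytes(StandardCharsets.US_ASCII);
        byte[] input = "public class Demo { public static void main() {} }"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] restored = roundTrip(input, dictionary);
        System.out.println("round trip ok: " + Arrays.equals(input, restored));
    }
}
```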
     
    Harald Hein, Sep 16, 2003
    #12
  13. Roedy Green

    Luke Tulkas Guest

    "Roedy Green" <> wrote in message
    news:...
    > On Mon, 15 Sep 2003 21:50:55 +0200, "Luke Tulkas"
    > <> wrote or quoted :
    >
    > > //Read from zis until you get -1.
    > > //If you haven't kept track of the number of bytes you read from
    > >zis, you can ask the entry for size now & be surprised. ;-)

    >
    > The documentation on this really stinks.


    Not only documentation. The whole API is, as you noticed, badly
    designed.
     
    Luke Tulkas, Sep 16, 2003
    #13
  14. Roedy Green

    Eric Sosman Guest

    Roedy Green wrote:
    >
    > If I were re-inventing jar files, they would have an alphabetical
    > index at the HEAD of the file with absolute offsets into the file
    > where to find the data. There might be a little indexing added to
    > speed searching for a particular name, e.g. class file loading. There
    > would be no embedded headers. That index itself would be optionally
    > compressed too. The names of the elements would be in UTF-8 encoding.


    Wouldn't compression of the index just exacerbate the
    problem Harald Hein mentioned concerning the MANIFEST file?
    Actually, I think it makes the problem insoluble: You don't
    know the file offsets until you know the size of the compressed
    index, but you can't compress the index until you know the offset
    values it contains, and if the offset values change the index may
    compress to a different size, ... I imagine many .rgjar files
    would settle down to a steady state after one or two passes,
    but there's the nagging possibility of an eternal oscillation.

    --
     
    Eric Sosman, Sep 16, 2003
    #14
  15. Roedy Green

    Roedy Green Guest

    On Tue, 16 Sep 2003 11:20:42 -0400, Eric Sosman <>
    wrote or quoted :

    >Wouldn't compression of the index just exacerbate the
    >problem Harald Hein mentioned concerning the MANIFEST file?
    >Actually, I think it makes the problem insoluble:


    For simplicity, you would put the length of the index uncompressed
    followed by the index.

    Aren't the decompressors capable of detecting the end of a stream just
    from the compressed bytes? The PKZip format with the length AFTER the
    data implies that.

     
    Roedy Green, Sep 16, 2003
    #15
  16. Roedy Green

    Eric Sosman Guest

    Roedy Green wrote:
    >
    > On Tue, 16 Sep 2003 11:20:42 -0400, Eric Sosman <>
    > wrote or quoted :
    >
    > >Wouldn't compression of the index just exacerbate the
    > >problem Harald Hein mentioned concerning the MANIFEST file?
    > >Actually, I think it makes the problem insoluble:

    >
    > For simplicity, you would put the length of the index uncompressed
    > followed by the index.


    Perhaps I didn't explain the problem clearly (or perhaps
    I've just imagined the whole thing ...).

    Your suggestion, if I understood correctly, was to put a
    compressed index at the beginning of the .rgjar file. The index
    would contain (among other things) the offsets of the various
    content files. The offset of any particular content file is
    the sum of the sizes of all things that appear before it, and
    one of these things is the index. Thus, the values recorded in
    the index depend on the size of the compressed index. But the
    values also (potentially) influence the size of the compressed
    index; change the values and you get a different compressed size.
    Looks like a feedback loop to me.

    You could avoid the loop by storing just the file sizes
    instead of their offsets, along with a sequence number (or other
    ordering information) to allow the offsets to be computed from
    the decompressed index. But this is exactly Harald Hein's
    problem: You'd now need to compress all the files *before*
    creating the index, then write the index at the beginning of
    the .rgjar file, then write all the compressed files. Byte code
    isn't too voluminous and could probably be kept around in memory
    between compression time and writing time, but if the .rgjar
    archive also carries images, sounds, video clips, and the entire
    database of RIAA lawsuits you're probably stuck with two complete
    compression passes.

    --
     
    Eric Sosman, Sep 16, 2003
    #16
  17. Roedy Green

    Roedy Green Guest

    On Tue, 16 Sep 2003 16:24:49 -0400, Eric Sosman <>
    wrote or quoted :

    > Your suggestion, if I understood correctly, was to put a
    >compressed index at the beginning of the .rgjar file. The index
    >would contain (among other things) the offsets of the various
    >content files. The offset of any particular content file is
    >the sum of the sizes of all things that appear before it, and
    >one of these things is the index. Thus, the values recorded in
    >the index depend on the size of the compressed index. But the
    >values also (potentially) influence the size of the compressed
    >index; change the values and you get a different compressed size.


    You have to build the index and the file separately then glue them
    together at the last minute. The offsets in the compressed index are
    relative to the end of the index, as if the index and the data were
    two separate files.

    If you tried to make them absolute offsets, you get into your chicken
    and egg loop.
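A toy sketch of that layout, to make the offset arithmetic concrete. Everything here is hypothetical: the format (a 4-byte uncompressed index length, then the compressed index, then the data section), the class, and the method names are invented for illustration, and the entry data is left uncompressed for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class RelativeIndexArchive {

    /** Packs files as: int indexLength, compressed index, data section. */
    public static byte[] pack(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        ByteArrayOutputStream indexBytes = new ByteArrayOutputStream();
        try (DataOutputStream index =
                 new DataOutputStream(new DeflaterOutputStream(indexBytes))) {
            index.writeInt(files.size());
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                index.writeUTF(e.getKey());
                index.writeInt(data.size());          // offset relative to start of data section
                index.writeInt(e.getValue().length);
                data.write(e.getValue());
            }
        }
        // Index and data were built separately; glue them together at the last minute.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream header = new DataOutputStream(out);
        header.writeInt(indexBytes.size());           // stored uncompressed, ahead of the index
        header.flush();
        indexBytes.writeTo(out);
        data.writeTo(out);
        return out.toByteArray();
    }

    /** Returns the bytes of the named element, or null if it is not in the index. */
    public static byte[] read(byte[] archive, String wanted) throws IOException {
        int indexLength = new DataInputStream(new ByteArrayInputStream(archive)).readInt();
        int dataStart = 4 + indexLength;              // the data section begins after the index
        try (DataInputStream index = new DataInputStream(new InflaterInputStream(
                 new ByteArrayInputStream(archive, 4, indexLength)))) {
            int count = index.readInt();
            for (int i = 0; i < count; i++) {
                String name = index.readUTF();
                int offset = index.readInt();
                int length = index.readInt();
                if (name.equals(wanted)) {
                    int from = dataStart + offset;
                    return Arrays.copyOfRange(archive, from, from + length);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new TreeMap<>();  // TreeMap keeps the index alphabetical
        files.put("adir/afile.txt", "hello".getBytes(StandardCharsets.UTF_8));
        files.put("bdir/bfile.txt", "world".getBytes(StandardCharsets.UTF_8));
        byte[] archive = pack(files);
        System.out.println(new String(read(archive, "bdir/bfile.txt"), StandardCharsets.UTF_8));
    }
}
```

Because the recorded offsets never include the size of the index, changing how well the index compresses cannot change the values stored inside it, which is what breaks the loop.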

    I notice now we are going for directories nested 10 deep with great
    long names containing spaces. The NAMES of the files themselves are
    sometimes just as big as the contents. There is plenty of opportunity
    there for compressing.

    On rethinking it may make more sense to tack the index on the end, so
    long as in the very last bytes of the file is a pointer to the
    beginning of the index. PKZip format lacks this. You must find the
    start by wending your way back field by field.

    This way you can append to the file more efficiently. You can tack
    new data on the end, and then write a new index on the end, without
    necessarily copying the entire front section. This is a more dangerous
    way to live, but putting the index on the end would at least leave
    that option.

    Putting it on the front however, makes it easier to sample a zip
    without downloading the whole thing.



     
    Roedy Green, Sep 16, 2003
    #17
  18. Roedy Green

    Luke Tulkas Guest

    "Roedy Green" <> wrote in message
    news:...

    I see that you still haven't changed the code that "won't work" into the
    one that will (on http://mindprod.com/jgloss/zip.html). Why not? It's
    ever so simple. I even told you how.
     
    Luke Tulkas, Sep 18, 2003
    #18
  19. Roedy Green

    M1chael

    Hello :)
    Is there any way to create a zip file using ZipOutputStream that sets the sizes correctly? Using ZipFile for decompression is not an option for me because we have a lot of clients in the field which do not use ZipFile and cannot be changed.
    I have tried several libraries, but have yet to find a satisfactory one...
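One approach that may help, sketched under the assumption that losing compression is acceptable: write each entry with the STORED method and set its size and CRC-32 before calling putNextEntry. ZipOutputStream requires those values up front for STORED entries, so they can go into the local header where ZipInputStream reads them. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StoredEntryWriter {

    /** Builds a one-entry zip whose local header carries the real size and CRC. */
    public static byte[] buildStoredZip(String name, byte[] content) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(content, 0, content.length);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);           // no compression
        entry.setSize(content.length);
        entry.setCompressedSize(content.length);    // same as size for STORED
        entry.setCrc(crc.getValue());

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(entry);
            zip.write(content, 0, content.length);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Returns the size ZipInputStream reports before any entry data has been read. */
    public static long sizeSeenByStream(byte[] zipBytes) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zis.getNextEntry();
            return entry.getSize();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "some file content".getBytes(StandardCharsets.UTF_8);
        byte[] zip = buildStoredZip("adir/afile.txt", content);
        System.out.println("size seen by ZipInputStream: " + sizeSeenByStream(zip));
    }
}
```

The cost is archive size, since STORED entries are not compressed. Whether a DEFLATED entry can be handled the same way depends on knowing the compressed size in advance, which this sketch does not attempt.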
     
    M1chael, Jan 22, 2008
    #19
