possible to read self-extracting zip file?

Discussion in 'Java' started by Bomb Diggy, Aug 13, 2003.

  1. Bomb Diggy

    Hi,

    Is it possible to use java.util.zip to read and decompress/inflate a
    self-extracting zip file? I've been able to unzip regular zip files
    (produced by WinZip), but not self-extracting zip files (produced by
    Winzip and PKZip).

    Thanks.
     
    Bomb Diggy, Aug 13, 2003
    #1

  2. Roedy Green

    On 13 Aug 2003 14:19:29 -0700, (Bomb Diggy)
    wrote or quoted :

    >Is it possible to use java.util.zip to read and decompress/inflate a
    >self-extracting zip file? I've been able to unzip regular zip files
    >(produced by WinZip), but not self-extracting zip files (produced by
    >Winzip and PKZip).


    Just a guess. Try renaming the extension and see if that is sufficient
    to fool it.

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Aug 14, 2003
    #2

  3. Bomb Diggy:

    >Is it possible to use java.util.zip to read and decompress/inflate a
    >self-extracting zip file? I've been able to unzip regular zip files
    >(produced by WinZip), but not self-extracting zip files (produced by
    >Winzip and PKZip).


    A zip file looks like this (simplified):

    HEADER1 FILE1 HEADER2 FILE2 [... other pairs] HEADER1 HEADER2 [...all
    headers]

    So the headers are repeated at the end of the archive.
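A minimal sketch of this layout, assuming a current JDK: it writes a two-entry archive with java.util.zip.ZipOutputStream and reports where the record signatures from PKWARE's APPNOTE occur (local file header PK\3\4, central directory header PK\1\2, end of central directory record PK\5\6). The class and method names are made up for illustration, and the naive byte search could in principle also hit a signature inside compressed data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipLayoutDemo {
    // Record signatures as they appear in the file (little-endian), per the APPNOTE.
    static final byte[] LOCAL_HEADER = {'P', 'K', 3, 4};   // local file header
    static final byte[] CENTRAL_HEADER = {'P', 'K', 1, 2}; // central directory file header
    static final byte[] END_OF_CENTRAL = {'P', 'K', 5, 6}; // end of central directory record

    /** Builds a small two-entry archive in memory. */
    static byte[] buildZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"a.txt", "b.txt"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(("contents of " + name).getBytes(StandardCharsets.US_ASCII));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Returns every offset at which {@code sig} occurs in {@code data}. */
    static List<Integer> offsetsOf(byte[] data, byte[] sig) {
        List<Integer> hits = new ArrayList<>();
        outer:
        for (int i = 0; i + sig.length <= data.length; i++) {
            for (int j = 0; j < sig.length; j++) {
                if (data[i + j] != sig[j]) continue outer;
            }
            hits.add(i);
        }
        return hits;
    }
}
```

For an archive like this one, expect two local headers starting at offset 0, two central directory headers after all the file data, and one end record occupying the final 22 bytes (there is no archive comment).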

    A self-extracting zip file looks almost the same, except that a CODE
    section is prepended:

    CODE HEADER1 FILE1 [...rest as above]

    where CODE is some executable code with unzip functionality that
    searches the file it is in for headers and then unzips the data.

    The problem with java.util.zip is that it never reads the header
    directory at the end, which would be much quicker than going over the
    complete file. It always expects a stream to start with a header.

    So your only chance is to write code that searches for the first
    header in an InputStream (skipping any potential CODE section), then
    wrap a ZipInputStream around that. Then you can work with that
    self-extracting zip file.

    Here are pointers to the zip file format specs and related
    information:
    <http://www.geocities.com/SiliconValley/Lakes/6686/zip-archive-file-format.html>.
    IIRC a header starts with P K \003 \004 (or similar numbers, it's in
    the specs). That's what you must search for.

    Regards,
    Marco
    --
    Please reply in the newsgroup, not by email!
    Java programming tips: http://jiu.sourceforge.net/javatips.html
    Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
     
    Marco Schmidt, Aug 14, 2003
    #3
  4. Roedy Green:

    >Structure of a zip file is that there are embedded headers on each
    >element, then a summary complete set of headers at the end.


    That's what I tried to explain.

    >You have
    >to scan backwards doing a bit of fancy footwork to find the last
    >header, then you can chase backwards through the summaries to the
    >first one.


    You don't have to do that. But it's quicker than reading the complete
    file, collecting local headers on the way.
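The backwards route can be sketched as follows. It is a hypothetical helper (not part of java.util.zip) that looks for the end of central directory record in the last 22 + 65535 bytes, reads the directory's size and stored offset from it, and walks the central directory entries to the smallest local header offset. The gap between where the directory actually sits and where its stored offset says it sits is the length of a prepended stub whose offsets were not adjusted; if the SFX tool did adjust them, that gap is 0 and the stored offsets are already absolute. ZIP64, multi-disk archives and very large directories are ignored here.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class CentralDirectoryLocator {
    private static final long EOCD_SIG = 0x06054b50L; // "PK\5\6" read as a little-endian int
    private static final long CEN_SIG = 0x02014b50L;  // "PK\1\2", one per central directory entry
    private static final int EOCD_LEN = 22;            // end record size without its comment
    private static final int MAX_COMMENT = 0xFFFF;     // the comment length is a 16-bit field

    /** Returns the absolute file offset of the first local file header, skipping any stub. */
    static long firstLocalHeader(RandomAccessFile file) throws IOException {
        long length = file.length();
        int window = (int) Math.min(length, EOCD_LEN + MAX_COMMENT);
        byte[] tail = new byte[window];
        file.seek(length - window);
        file.readFully(tail);
        for (int i = window - EOCD_LEN; i >= 0; i--) {            // scan backwards from the end
            if (u32(tail, i) != EOCD_SIG) continue;
            if (i + EOCD_LEN + u16(tail, i + 20) != window) continue; // comment must end at EOF
            int entries = u16(tail, i + 10);
            if (entries == 0) throw new IOException("archive has no entries");
            long cdSize = u32(tail, i + 12);
            long cdOffset = u32(tail, i + 16);
            long cdStart = (length - window + i) - cdSize; // directory sits right before the end record
            long shift = cdStart - cdOffset;               // stub bytes the stored offsets don't count
            byte[] cd = new byte[(int) cdSize];
            file.seek(cdStart);
            file.readFully(cd);
            long first = Long.MAX_VALUE;
            int p = 0;
            for (int n = 0; n < entries; n++) {            // chase through the summaries
                if (u32(cd, p) != CEN_SIG) throw new IOException("bad central directory entry");
                first = Math.min(first, u32(cd, p + 42) + shift); // relative offset of local header
                p += 46 + u16(cd, p + 28) + u16(cd, p + 30) + u16(cd, p + 32); // name, extra, comment
            }
            return first;
        }
        throw new IOException("end of central directory record not found");
    }

    private static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8;
    }

    private static long u32(byte[] b, int off) {
        return u16(b, off) | (long) u16(b, off + 2) << 16;
    }
}
```

Everything before the returned offset is the head Roedy suggests chopping off; as Marco notes, chopping it without rewriting the stored offsets leaves a technically corrupt ZIP.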

    >Now chop the head off and you should have something java can eat. But
    >in the meantime, you have almost written your own ruddy zip package!


    No, as I pointed out, all you have to do is search for the first local
    header. Then you can wrap that stream into a ZipInputStream. The
    searching probably has to be done with some unread capability like
    PushbackInputStream.
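Marco's forward-scanning approach might look like the following sketch. The helper name is hypothetical; it reads one byte at a time until it has seen P K \003 \004, pushes those four bytes back with PushbackInputStream.unread, and hands the stream to ZipInputStream. It assumes the first occurrence of the signature is a real header, which a stub's own code or data can violate (see the sanity-check discussion later in the thread). Wrapping the input in a BufferedInputStream first would make the byte-by-byte scan cheaper.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.ZipInputStream;

final class StubSkipper {
    private static final byte[] LOCAL_SIG = {'P', 'K', 3, 4}; // local file header signature

    /**
     * Consumes bytes up to the first local file header signature, pushes the
     * signature back, and wraps the remaining stream in a ZipInputStream.
     */
    static ZipInputStream skipToFirstLocalHeader(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, LOCAL_SIG.length);
        int matched = 0; // how many signature bytes have been seen in a row
        int b;
        while ((b = pin.read()) != -1) {
            if (b == (LOCAL_SIG[matched] & 0xFF)) {
                matched++;
                if (matched == LOCAL_SIG.length) {
                    pin.unread(LOCAL_SIG); // give the whole signature back to ZipInputStream
                    return new ZipInputStream(pin);
                }
            } else {
                // No proper prefix of "PK\3\4" is also a suffix, so only a 'P' can restart a match.
                matched = (b == LOCAL_SIG[0]) ? 1 : 0;
            }
        }
        throw new IOException("no local file header signature found");
    }
}
```

Usage: `ZipInputStream zin = StubSkipper.skipToFirstLocalHeader(new BufferedInputStream(new FileInputStream("setup.exe")));` then iterate with `getNextEntry()` as for an ordinary archive. The file name is only an example.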

    If you "chop off" the CODE section, the offsets in the "header
    summary" (central directory) at the end of the ZIP archive are
    incorrect. Java can still read those files because it never touches
    that central directory. But the ZIP file is corrupted.

    >I could write you a beast to do this in Java, C++, C or MASM as a
    >separate prestep for $50 US. It would convert an exe to a zip.


    If you want to create a valid ZIP, that's a non-trivial task (see
    above). 50 USD is not much for that functionality.

    >Phil Katz of Pkzip.com put the 8.3 PKZip format into the public
    >domain. It must have been mildly changed to deal with long file names.
    >
    >The idea of the duplication was you could recover some of the elements
    >from a corrupted file.


    I think it was more about speed. With a normal archive it's much more
    likely that file data gets corrupted than the relatively small headers.

    But it's been a while since I've read the ZIP specs.

    Regards,
    Marco
    --
    Please reply in the newsgroup, not by email!
    Java programming tips: http://jiu.sourceforge.net/javatips.html
    Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
     
    Marco Schmidt, Aug 14, 2003
    #4
  5. Roedy Green

    On Thu, 14 Aug 2003 06:48:35 +0200, Marco Schmidt
    <> wrote or quoted :

    >>You have
    >>to scan backwards doing a bit of fancy footwork to find the last
    >>header, then you can chase backwards through the summaries to the
    >>first one.

    >
    >You don't have to do that. But it's quicker than reading the complete
    >file, collecting local headers on the way.


    If you buffer appropriately, you will do less i/o scanning backwards
    through the pure headers at the end than if you scan forward through
    the embedded headers. Also, if you have an exe file, I don't know how
    you would find the first header if you scan forward. There may be a
    quick way to find the first header; that would be ideal. If Phil were
    clever, he would have hidden it at offset 0 of the program proper,
    right after the exe header. I don't know whether these are DOS, Win16
    or Win32 exe headers.


    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Aug 14, 2003
    #5
  6. Roedy Green:

    >If you buffer appropriately, you will do less i/o scanning backwards
    >through the pure headers at the end than if you scan forward through
    >the embedded headers.


    Yes, that's what I was saying - getting the central directory is
    faster.

    > Also if you have an exe file, I don't know how
    >you would find the first header if you scan forward. It may be there
    >is a quick way to find the first header. That would be ideal. If Phil
    >were clever he would have hidden it at offset 0 of the program proper,
    >right after the exe header. I don't know if these are DOS, Win16 or
    >Win32 exe headers.


    As I said in <>:

    >IIRC a header starts with P K \003 \004 (or similar numbers, it's in
    >the specs). That's what you must search for.


    Just search for that local header signature. It doesn't matter what
    platform the native stub was written for. Make some sanity checks so
    that you don't get that signature stored in the native code.
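One way to do such sanity checks, sketched under stated assumptions: after matching the signature, read the rest of the 30-byte local file header (field offsets per the APPNOTE) and reject candidates whose fields are implausible. The thresholds (version needed at most 63, method 0 or 8 because those are what java.util.zip decodes, a non-empty name without control characters that fits in the buffer) are heuristics chosen for illustration, the class name is hypothetical, and the whole file is assumed to be in a byte array.

```java
final class LocalHeaderCheck {
    private static final int LOCAL_HEADER_LEN = 30; // fixed part of a local file header

    /** Heuristic: does a plausible local file header start at {@code pos}? */
    static boolean looksLikeLocalHeader(byte[] data, int pos) {
        if (pos < 0 || pos + LOCAL_HEADER_LEN > data.length) return false;
        if (data[pos] != 'P' || data[pos + 1] != 'K' || data[pos + 2] != 3 || data[pos + 3] != 4) {
            return false;
        }
        int versionNeeded = u16(data, pos + 4);
        int method = u16(data, pos + 8);
        int nameLen = u16(data, pos + 26);
        int extraLen = u16(data, pos + 28);
        if (versionNeeded > 63) return false;                 // assumed upper bound
        if (method != 0 && method != 8) return false;         // stored or deflated only
        if (nameLen == 0) return false;
        if ((long) pos + LOCAL_HEADER_LEN + nameLen + extraLen > data.length) return false;
        for (int i = 0; i < nameLen; i++) {                   // file names should be printable
            if ((data[pos + LOCAL_HEADER_LEN + i] & 0xFF) < 0x20) return false;
        }
        return true;
    }

    /** Offset of the first position passing the checks, or -1 if there is none. */
    static int find(byte[] data) {
        for (int i = 0; i + LOCAL_HEADER_LEN <= data.length; i++) {
            if (looksLikeLocalHeader(data, i)) return i;
        }
        return -1;
    }

    private static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8;
    }
}
```

A stronger check, if needed, is to confirm that the candidate offset also appears (possibly shifted by the stub length) in the central directory.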

    Regards,
    Marco
    --
    Please reply in the newsgroup, not by email!
    Java programming tips: http://jiu.sourceforge.net/javatips.html
    Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
     
    Marco Schmidt, Aug 14, 2003
    #6
  7. Roedy Green

    On Thu, 14 Aug 2003 09:38:42 +0200, Marco Schmidt
    <> wrote or quoted :

    >>IIRC a header starts with P K \003 \004 (or similar numbers, it's in
    >>the specs). That's what you must search for.

    >
    >Just search for that local header signature. It doesn't matter what
    >platform the native stub was written for. Make some sanity checks so
    >that you don't get that signature stored in the native code.



    That strikes me as a tad dangerous. The string could appear in the
    code itself.

    For DOS headers it is fairly simple to jump over the relocation
    header and the exe portion. For Win16 it is more complex. For Win32
    I'd guess it gets yucky.
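For the DOS case, the exe portion is described by two fields of the MZ header: e_cp (number of 512-byte pages) and e_cblp (bytes used on the last page, where 0 means a full page), while e_cparhdr gives the header size, relocation table included, in 16-byte paragraphs. A minimal sketch with a hypothetical class name; for Win16 and Win32 files these fields only describe the DOS stub, not the real program, so the result is not where appended zip data starts in those formats.

```java
import java.io.IOException;

final class DosExeLayout {
    /** Size of the MZ header including its relocation table (e_cparhdr paragraphs). */
    static int headerLength(byte[] h) throws IOException {
        checkMz(h);
        return u16(h, 8) * 16;
    }

    /** Bytes of the file covered by the MZ page counts: header plus DOS load image. */
    static long imageLength(byte[] h) throws IOException {
        checkMz(h);
        int lastPageBytes = u16(h, 2); // e_cblp: bytes used on the last page (0 = full page)
        int pages = u16(h, 4);         // e_cp: number of 512-byte pages
        long size = (long) pages * 512;
        if (lastPageBytes != 0) size -= 512 - lastPageBytes;
        return size;
    }

    private static void checkMz(byte[] h) throws IOException {
        if (h.length < 28 || h[0] != 'M' || h[1] != 'Z') throw new IOException("not an MZ executable");
    }

    private static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8;
    }
}
```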

    Do we know which style of header he is dealing with? What program
    created them?

    I wrote a little utility years ago that could at least tell DOS and
    Win16 style apart. It is on my website at
    http://mindprod.com/products2.html#ISWIN

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Aug 14, 2003
    #7
  8. Roedy Green

    On Thu, 14 Aug 2003 07:50:57 GMT, Roedy Green <>
    wrote or quoted :

    >Do we know which style of header he is dealing with? What program
    >created them?


    Winzip creates PE Win32 headers. I just checked with a new version of
    isWin.

    The zip format (for non-self-extracting zips) is documented at
    http://pkware.com/products/enterprise/white_papers/appnote.html

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Aug 14, 2003
    #8
  9. Roedy Green:

    >>Just search for that local header signature. It doesn't matter what
    >>platform the native stub was written for. Make some sanity checks so
    >>that you don't get that signature stored in the native code.

    >
    >
    >That strikes me as a tad dangerous. The string could appear in the
    >code itself.


    But that's what I meant when I wrote "Make some sanity checks so
    that you don't get that signature stored in the native code." It
    should be relatively easy to identify a real local header.

    >For DOS headers it is fairly simple to jump over the relocation
    >header and the exe portion. For Win16 it is more complex. For Win32
    >I'd guess it gets yucky.


    >Do we know which style of header he is dealing with? What program
    >created them?


    I wouldn't want to try to interpret the native executable part. Then
    again, I have no experience with those, maybe it's easier than I
    think.

    >I wrote a little utility years ago that could at least tell DOS and
    >Win16 style apart. It is on my website at
    >http://mindprod.com/products2.html#ISWIN


    At <http://www.wotsit.org> or in any Unix magic file there are probably
    signatures to identify types of executables. But that's only
    part of finding out where the actual data starts. I'd rather implement
    my variant, checking for P K \003 \004 (or whatever the numbers after
    PK are).

    Regards,
    Marco
    --
    Please reply in the newsgroup, not by email!
    Java programming tips: http://jiu.sourceforge.net/javatips.html
    Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
     
    Marco Schmidt, Aug 14, 2003
    #9
  10. Marco Schmidt <> writes:

    > A self-extracting zip file looks almost the same, except that a CODE
    > section is prepended:
    >
    > CODE HEADER1 FILE1 [...rest as above]
    >
    > where CODE is some executable code with unzip functionality that
    > searches the file it is in for headers and then unzips the data.


    Actually, just look at the PE spec. It describes the EXE header
    format, and from there you can find out where the file's data segment
    (the zip stream) begins.
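A sketch of that idea for PE files, with field offsets taken from Microsoft's PE/COFF specification: the DOS header's e_lfanew (offset 0x3C) points at the "PE\0\0" signature, the COFF header that follows gives NumberOfSections and SizeOfOptionalHeader, and each 40-byte section header holds SizeOfRawData and PointerToRawData. The end of the last section's raw data is where appended data such as an SFX archive usually begins. The class name is hypothetical, and an Authenticode signature or debug data placed after the sections would shift the real start, so checking for a local header at the returned offset is still advisable.

```java
import java.io.IOException;

final class PeOverlay {
    /** File offset just past the last section's raw data (the usual start of appended data). */
    static long overlayOffset(byte[] exe) throws IOException {
        if (exe.length < 0x40 || exe[0] != 'M' || exe[1] != 'Z') throw new IOException("no MZ header");
        long peOff = u32(exe, 0x3C);                         // e_lfanew: where the PE header starts
        if (peOff + 24 > exe.length) throw new IOException("PE header out of range");
        int pe = (int) peOff;
        if (exe[pe] != 'P' || exe[pe + 1] != 'E' || exe[pe + 2] != 0 || exe[pe + 3] != 0) {
            throw new IOException("no PE signature");
        }
        int coff = pe + 4;                                   // COFF file header follows the signature
        int sections = u16(exe, coff + 2);                   // NumberOfSections
        int optionalSize = u16(exe, coff + 16);              // SizeOfOptionalHeader
        int table = coff + 20 + optionalSize;                // section table, 40 bytes per entry
        long end = 0;
        for (int i = 0; i < sections; i++) {
            int s = table + 40 * i;
            if (s + 40 > exe.length) throw new IOException("truncated section table");
            long rawSize = u32(exe, s + 16);                 // SizeOfRawData
            long rawPointer = u32(exe, s + 20);              // PointerToRawData
            end = Math.max(end, rawPointer + rawSize);
        }
        return end;
    }

    private static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8;
    }

    private static long u32(byte[] b, int off) {
        return u16(b, off) | (long) u16(b, off + 2) << 16;
    }
}
```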
     
    Tor Iver Wilhelmsen, Aug 14, 2003
    #10
  11. Roedy Green:

    >I noticed the new PKZip format now allows files > 2 GIG.
    >see http://mindprod.com/jgloss/zip.html


    Unfortunately, both PKWare and WinZip Computing added proprietary,
    undocumented extensions (compression and encryption types) to ZIP.
    Once these take off, the ZIP file format will be much less valuable.
    There is a discussion on comp.compression on the topic, see
    <http://www.geocities.com/SiliconValley/Lakes/6686/zip-archive-file-format.html>
    for links to that discussion and information on the format.

    Regards,
    Marco
    --
    Please reply in the newsgroup, not by email!
    Java programming tips: http://jiu.sourceforge.net/javatips.html
    Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
     
    Marco Schmidt, Aug 15, 2003
    #11
  12. Roedy Green

    On Fri, 15 Aug 2003 04:17:32 +0200, Marco Schmidt
    <> wrote or quoted :

    >Once these take off, the ZIP file format will be much less valuable.
    >There is a discussion on comp.compression on the topic, see
    ><http://www.geocities.com/SiliconValley/Lakes/6686/zip-archive-file-format.html>


    They really should not do that. At least the information needed to
    unpack should be public.
    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Aug 15, 2003
    #12
  13. On Thu, 14 Aug 2003 02:51:50 +0200, Marco Schmidt <> wrote:
    > Bomb Diggy:
    >
    >>Is it possible to use java.util.zip to read and decompress/inflate a
    >>self-extracting zip file? I've been able to unzip regular zip files
    >>(produced by WinZip), but not self-extracting zip files (produced by
    >>Winzip and PKZip).

    >
    > The problem with java.util.zip is that it never reads the header
    > directory at the end, which would be much quicker than going over the
    > complete file. It always expects a stream to start with a header.
    >
    > So your only chance is to write code that searches for the first
    > header in an InputStream (skipping any potential CODE section), then
    > wrap a ZipInputStream around that. Then you can work with that
    > self-extracting zip file.
    >


    I'm certain that some classes in the java.util.zip or java.util.jar
    packages can handle zip files with code prepended at the start. The
    java -jar command (which uses the same class libraries to load classes)
    works fine on executable jar files even when the jar file has extra
    code up front, so it must be using the central directory to get at the
    files.

    I think using the ZipFile class instead of a ZipInputStream would most
    probably work. (Not tested - just an educated guess).
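A quick way to test that guess, sketched with a hypothetical helper class: open the file with java.util.zip.ZipFile, which locates entries through the end of central directory record, instead of streaming it with ZipInputStream, which expects a local header at offset 0 and reports no entries when it finds stub bytes there. Recent JDK sources compute the length of a prepended stub from the central directory's recorded position, so this should work on current JVMs, but it is worth verifying on the one you target. InputStream.readAllBytes needs Java 9 or later.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class SfxViaZipFile {
    /** Entry names, found through the central directory rather than by streaming from offset 0. */
    static List<String> entryNames(File archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Reads one entry's uncompressed bytes. */
    static byte[] readEntry(File archive, String name) throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            ZipEntry entry = zip.getEntry(name);
            if (entry == null) throw new IOException("no such entry: " + name);
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes();
            }
        }
    }
}
```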

    BK
     
    Babu Kalakrishnan, Aug 20, 2003
    #13
  14. Roedy Green

    On Wed, 20 Aug 2003 11:33:13 +0530, Babu Kalakrishnan
    <> wrote or quoted :

    >>>I've been able to unzip regular zip files
    >>>(produced by WinZip), but not self-extracting zip files (produced by
    >>>Winzip and PKZip).


    The other thing to consider is that Winzip and Pkzip have a large
    variety of compression algorithms. I don't know which ones Java
    supports. If you are preparing zips for Java, you have to control
    which algorithms are used.
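Before handing an archive to Java, a pre-flight check along these lines could report entries that java.util.zip cannot decode. It is a hypothetical helper that takes the whole file as a byte array, locates the central directory through the end record, and flags any entry whose compression method is not 0 (stored) or 8 (deflated), or whose general-purpose flag bit 0 (encryption) is set. The directory is parsed by hand rather than with ZipFile because recent JDKs appear to refuse to open an archive at all if any entry uses an unsupported method. Decoding names as UTF-8 is an assumption that may not hold for older archives.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class MethodCheck {
    private static final long EOCD_SIG = 0x06054b50L; // "PK\5\6"
    private static final long CEN_SIG = 0x02014b50L;  // "PK\1\2"

    /** Describes each entry java.util.zip cannot decode; empty if the archive looks fine. */
    static List<String> unsupportedEntries(byte[] zip) {
        int eocd = -1;
        for (int i = zip.length - 22; i >= Math.max(0, zip.length - 22 - 0xFFFF); i--) {
            if (u32(zip, i) == EOCD_SIG) { eocd = i; break; }
        }
        if (eocd < 0) throw new IllegalArgumentException("no end of central directory record");
        int entries = u16(zip, eocd + 10);
        int p = (int) (eocd - u32(zip, eocd + 12)); // directory sits right before the end record
        List<String> problems = new ArrayList<>();
        for (int n = 0; n < entries; n++) {
            if (u32(zip, p) != CEN_SIG) throw new IllegalArgumentException("bad central directory entry");
            int flags = u16(zip, p + 8);
            int method = u16(zip, p + 10);
            int nameLen = u16(zip, p + 28);
            String name = new String(zip, p + 46, nameLen, StandardCharsets.UTF_8);
            if (method != 0 && method != 8) problems.add(name + " (method " + method + ")");
            if ((flags & 1) != 0) problems.add(name + " (encrypted)");
            p += 46 + nameLen + u16(zip, p + 30) + u16(zip, p + 32); // skip name, extra, comment
        }
        return problems;
    }

    private static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8;
    }

    private static long u32(byte[] b, int off) {
        return u16(b, off) | (long) u16(b, off + 2) << 16;
    }
}
```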

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Aug 20, 2003
    #14
  15. On Wed, 20 Aug 2003 06:44:00 GMT, Roedy Green <> wrote:
    > On Wed, 20 Aug 2003 11:33:13 +0530, Babu Kalakrishnan
    ><> wrote or quoted :
    >
    >>>>I've been able to unzip regular zip files
    >>>>(produced by WinZip), but not self-extracting zip files (produced by
    >>>>Winzip and PKZip).

    >
    > The other thing to consider is that Winzip and Pkzip have a large
    > variety of compression algorithms. I don't know which ones Java
    > supports. If you are preparing zips for Java, you have to control
    > which algorithms are used.
    >


    True. Java supports only the "deflate" compression scheme (the one used
    by gzip and zlib, RFCs 1950 to 1952) since the compression/decompression
    engine it uses is the open source "zlib" library.

    BK
     
    Babu Kalakrishnan, Aug 20, 2003
    #15
  16. Babu Kalakrishnan:

    >True. Java supports only the "deflate" compression scheme (the ones used
    >by gzip - RFC 1950 to 1952) since the compression/decompression engine
    >used by it is the open source "zlib" library.


    I think Java also supports uncompressed entries.
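Java does support uncompressed entries: ZipEntry.STORED (method 0) alongside ZipEntry.DEFLATED (method 8). A minimal sketch of writing a stored entry, with a hypothetical class name. Unlike deflated entries, ZipOutputStream needs the size, compressed size and CRC-32 of a STORED entry before its data is written, so they are computed up front.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class StoredEntryDemo {
    /** Writes a one-entry archive whose entry is stored without compression. */
    static byte[] zipStored(String name, byte[] data) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);           // required for STORED entries
        entry.setCompressedSize(data.length); // equal to the size when nothing is compressed
        entry.setCrc(crc.getValue());         // also required up front
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(entry);
            zip.write(data);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```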

    Regards,
    Marco
    --
    Please reply in the newsgroup, not by email!
    Java programming tips: http://jiu.sourceforge.net/javatips.html
    Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
     
    Marco Schmidt, Aug 29, 2003
    #16
  17. Roedy Green

    On Fri, 29 Aug 2003 03:45:36 +0200, Marco Schmidt
    <> wrote or quoted :

    >>True. Java supports only the "deflate" compression scheme (the ones used
    >>by gzip - RFC 1950 to 1952) since the compression/decompression engine
    >>used by it is the open source "zlib" library.

    >
    >I think Java also supports uncompressed entries.


    I was just looking at WZZIP, the command-line part of Winzip. It gives
    you no way to control the algorithms. Winzip32 also has a command-line
    interface where you can say whether you want fast or thorough, but
    not the particular algorithm.

    I am working on a project called the Replicator which simply
    distributes a set of files and keeps them up to date. For now I will
    use jar.exe to create the zips and later do it with the zip classes.

    I think PKZIP may give the control needed plus speed for the zip
    creation.

    The Winzip people announce that the new version uses a form of
    compression the old versions can't read. This screws up zip format
    for interchange. The PKZip people seem to have a proprietary
    algorithm now too. These formats need to be open and if not
    universally supported, at least suppressible.




    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Aug 29, 2003
    #17
  18. Roedy Green

    On Fri, 29 Aug 2003 06:54:07 GMT, Roedy Green <>
    wrote or quoted :

    >I think PKZIP may give the control needed plus speed for the zip
    >creation.


    I downloaded the evaluation copy of PKzip. The information on how you
    control it from the command line is hidden in a file called pkzipc.pdf,
    which is not indexed anywhere. pkzipc.exe is the command-line
    version; pkzipw.exe is the Windows GUI version.

    It lets you specify the precise algorithm on the command line.
    However, that is not really what you usually want.

    You want compatibility with something else, so you want it to use a
    SELECTION of algorithms, or to avoid some algorithm, not to use one
    particular one which may be inappropriate.

    For Java use, though, I think forcing it to use deflate only is what
    you want.
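On the Java side, the equivalent of forcing deflate is simply writing the archive with ZipOutputStream, whose default method is already DEFLATED. A sketch with a hypothetical class name; each file is stored under its bare file name, so two files with the same name in different directories would clash, and setLevel picks the speed versus size trade-off, roughly the fast versus thorough choice Winzip32 offers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class DeflateOnlyZipper {
    /** Writes every file as a DEFLATED entry, which java.util.zip can always read back. */
    static void zip(Path target, List<Path> files) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(target))) {
            zip.setMethod(ZipOutputStream.DEFLATED); // already the default; stated for clarity
            zip.setLevel(Deflater.BEST_COMPRESSION); // "thorough" rather than "fast"
            for (Path file : files) {
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip);               // stream the file's bytes into the entry
                zip.closeEntry();
            }
        }
    }
}
```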


    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Aug 29, 2003
    #18