Java support of GNU Tar

james.w.appleby

Hello, I'm having difficulties with writing Java code that unpacks and
repacks tar.gz files. I'm really stuck and would appreciate some
help.

I've got the code running using the http://www.trustice.com/java/tar/
library. It is able to uncompress and unpack the tarred gzip file, do
the work the code is supposed to do, and then repack and recompress it
successfully. However, it fails on certain files.

The problem that it reports is that the file name is too long,
exceeding 100 characters. I did a little research and found that GNU
tar supports longer file names but other implementations don't.

After a little more research I found that there is at least one GNU
tar implementation in Java, made by the Apache Ant team. In fact it's
derived from the package I'm already using, so I was hopeful that I
could just drop in their version. This didn't work and I'm not sure
why.

The error I get occurs when I am trying to repack the files into a
Tar. The relevant part of the stack trace is:

java.io.IOException: request to write '7608' bytes exceeds size in header of '0' bytes
at org.apache.tools.tar.TarOutputStream.write(TarOutputStream.java:235)
at utilities.ArchiveFormatter.packGZip(ArchiveFormatter.java:324)

Has anyone ever tried to use the Apache code for their own code or
have any idea why I would get the exception above?
 
Chris Uppal

I have never used any of the tar packages for Java, so the following is only an
educated guess.

I suspect that you'll have to supply the size of the tar file entry before you
start writing the data for that entry.

From the sound of the error message, the Apache code hasn't been told how long
the data is going to be and has defaulted to zero. So when you call
TarOutputStream.write() it checks that you are not writing more data than it
has been told to expect, and -- finding that you are -- refuses to play.
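
To make that guess concrete, here is a toy model of the kind of check I mean. SizeCheckedEntry is a made-up class for illustration, not the Apache code, and the message text simply mimics the one in your stack trace. If the Apache classes behave this way, the fix would presumably be to call setSize() on the TarEntry, with the real byte count, before handing the entry to the output stream.

```java
import java.io.IOException;

// Toy stand-in for a tar entry writer; hypothetical, NOT the Apache class.
class SizeCheckedEntry {
    private final long declaredSize; // size announced before any data is written
    private long written;            // bytes accepted so far

    SizeCheckedEntry(long declaredSize) {
        this.declaredSize = declaredSize;
    }

    // Refuses any write that would exceed the size declared up front.
    void write(byte[] buf) throws IOException {
        if (written + buf.length > declaredSize) {
            throw new IOException("request to write '" + buf.length
                    + "' bytes exceeds size in header of '" + declaredSize + "' bytes");
        }
        written += buf.length;
    }

    long written() {
        return written;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[7608];

        // Size declared first: the write is accepted.
        SizeCheckedEntry sized = new SizeCheckedEntry(data.length);
        sized.write(data);
        System.out.println("accepted " + sized.written() + " bytes");

        // Size left at the default of zero: the same write is refused.
        try {
            new SizeCheckedEntry(0).write(data);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```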

In the tar file format the size of each file's data is embedded in the output
file /before/ the data itself. So obviously the TarOutputStream needs to know
how much data you are going to supply in advance[*]. The ICE implementation of
TarGzOutputStream has code to buffer-up the write()s in memory until the entry
is close()ed, at which time it updates the header, and then writes the real
data. That buffering is presumably why your existing code worked with that
package. (I don't know why TarOutputStream doesn't do the same thing for
uncompressed tar files -- maybe I'm missing an option somewhere.)
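
As a rough sketch of that buffering idea (my own illustration using only the JDK, not the ICE source), the class below collects the write()s in memory and emits the 512-byte header only once the entry is closed and its length is known. BufferedTarEntrySketch and its method names are invented for this example; the header offsets follow the ustar layout, in which the name field is 100 bytes and the size is stored as octal text at offset 124.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: buffer each entry so the size is known before the header is written.
class BufferedTarEntrySketch {
    private final ByteArrayOutputStream archive = new ByteArrayOutputStream();
    private ByteArrayOutputStream pending; // data of the entry currently open
    private String pendingName;

    void beginEntry(String name) {
        pendingName = name;
        pending = new ByteArrayOutputStream();
    }

    void write(byte[] data) {
        pending.write(data, 0, data.length);
    }

    // Only now is the size known: header first, then data, padded to a 512-byte boundary.
    void closeEntry() {
        byte[] body = pending.toByteArray();
        byte[] h = header(pendingName, body.length);
        archive.write(h, 0, h.length);
        archive.write(body, 0, body.length);
        int pad = (512 - body.length % 512) % 512;
        archive.write(new byte[pad], 0, pad);
        pending = null;
    }

    // A tar archive ends with two all-zero 512-byte blocks.
    byte[] finish() {
        archive.write(new byte[1024], 0, 1024);
        return archive.toByteArray();
    }

    static byte[] header(String name, long size) {
        if (name.getBytes(StandardCharsets.US_ASCII).length > 100) {
            throw new IllegalArgumentException("name exceeds the 100-byte field");
        }
        byte[] h = new byte[512];
        put(h, 0, name);                                  // file name
        put(h, 100, "0000644\0");                         // mode
        put(h, 108, "0000000\0");                         // uid
        put(h, 116, "0000000\0");                         // gid
        put(h, 124, String.format("%011o", size) + "\0"); // size, 11 octal digits
        put(h, 136, "00000000000\0");                     // mtime
        put(h, 148, "        ");                          // checksum field counts as spaces
        h[156] = '0';                                     // regular file
        put(h, 257, "ustar\0" + "00");                    // magic and version
        int sum = 0;
        for (byte b : h) sum += b & 0xff;
        put(h, 148, String.format("%06o", sum) + "\0 ");
        return h;
    }

    private static void put(byte[] h, int off, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(b, 0, h, off, b.length);
    }

    public static void main(String[] args) {
        BufferedTarEntrySketch tar = new BufferedTarEntrySketch();
        tar.beginEntry("hello.txt");
        tar.write("hello ".getBytes(StandardCharsets.US_ASCII));
        tar.write("world".getBytes(StandardCharsets.US_ASCII));
        tar.closeEntry();
        byte[] bytes = tar.finish();
        System.out.println("size field: " + new String(bytes, 124, 11, StandardCharsets.US_ASCII));
    }
}
```

The cost of this approach is that each entry is held in memory in full before anything reaches the output, which is fine for small files and less so for very large ones.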

If the Apache equivalent of TarGzOutputStream lacks that feature for some
reason, or if it doesn't work the same way, then maybe that's why you are
seeing this failure.

-- chris

[*] Actually, it could just write the data as you supply it and then go back in
the output file and patch the size information, but the ICE implementation
doesn't try to do that -- and anyway it couldn't because you can't jump back
and forth in a gzip-compressed stream.
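
For what it's worth, the patching approach from the footnote might look roughly like this on a seekable, uncompressed file. This is only a sketch under my own assumptions: PatchSizeSketch and its methods are invented names, and it relies on RandomAccessFile, which is exactly what a gzip stream cannot offer. A placeholder header goes out first, the data is streamed, and the header is then rewritten in place with the real size and checksum.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration: write data first, then seek back and patch the header.
class PatchSizeSketch {
    static void writeEntry(RandomAccessFile out, String name, byte[][] chunks) throws IOException {
        long headerPos = out.getFilePointer();
        out.write(header(name, 0));          // placeholder: size not yet known
        long size = 0;
        for (byte[] chunk : chunks) {        // stream the data straight to the file
            out.write(chunk);
            size += chunk.length;
        }
        out.write(new byte[(int) ((512 - size % 512) % 512)]); // pad to a block boundary
        long end = out.getFilePointer();
        out.seek(headerPos);                 // jump back to the header...
        out.write(header(name, size));       // ...and rewrite it with the real size
        out.seek(end);                       // resume at the end for the next entry
    }

    static byte[] header(String name, long size) {
        if (name.getBytes(StandardCharsets.US_ASCII).length > 100) {
            throw new IllegalArgumentException("name exceeds the 100-byte field");
        }
        byte[] h = new byte[512];
        put(h, 0, name);                                  // file name
        put(h, 100, "0000644\0");                         // mode
        put(h, 108, "0000000\0");                         // uid
        put(h, 116, "0000000\0");                         // gid
        put(h, 124, String.format("%011o", size) + "\0"); // size, 11 octal digits
        put(h, 136, "00000000000\0");                     // mtime
        put(h, 148, "        ");                          // checksum field counts as spaces
        h[156] = '0';                                     // regular file
        put(h, 257, "ustar\0" + "00");                    // magic and version
        int sum = 0;
        for (byte b : h) sum += b & 0xff;
        put(h, 148, String.format("%06o", sum) + "\0 ");
        return h;
    }

    private static void put(byte[] h, int off, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(b, 0, h, off, b.length);
    }

    // Builds a one-entry archive in a temporary file and returns its bytes.
    static byte[] buildDemoArchive() {
        try {
            Path file = Files.createTempFile("patched", ".tar");
            try (RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
                byte[][] chunks = {
                    "hello ".getBytes(StandardCharsets.US_ASCII),
                    "world".getBytes(StandardCharsets.US_ASCII)
                };
                writeEntry(out, "hello.txt", chunks);
                out.write(new byte[1024]);   // end-of-archive marker
            }
            byte[] bytes = Files.readAllBytes(file);
            Files.deleteIfExists(file);
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] tar = buildDemoArchive();
        System.out.println("size field: " + new String(tar, 124, 11, StandardCharsets.US_ASCII));
    }
}
```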
 
