incremental archive format for outputstream?

NOBODY

Hi,

Do you know of an 'incremental' archive format that would be suited to an
OutputStream?

In other words, is there any archive format that can keep existing entries
open and allow appending to them in an interleaved fashion (an
incrementally updating archive)?


Let me explain.

Let's say I have 2 types (A and B) of CSV data to send:
A1.csv, A2.csv, A3.csv
B1.csv, B2.csv, B3.csv

I want to write to a stream an archive format that will contain 2 entries
(A and B), where A is the concatenation of A1+A2+A3 and B is the
concatenation of B1+B2+B3.

Now, imagine a zip file. It is easy enough to create a new zip entry A, and
push all A1, A2, A3 files in sequence, and create a second zip entry B and
push B1, B2, B3.

But here is the problem: the sequence is rolling (like a log4j
file-size rolling appender), and by the time I have finished pushing A3,
B1 will have rolled off. I need to push in the order A1 B1, A2 B2, A3 B3.

So I cannot use Java's zip support (java.util.zip), at least not in any way
I know of: it offers only putNextEntry(), and no way to append to an
existing entry.

Something smart like gzip (where you can concatenate independent gzip files
and they become a single valid gzip file), but with support for multiple
entries (which gzip doesn't have), would be great!
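
For reference, the gzip property mentioned above can be checked with a small sketch like the following. It assumes a JDK whose GZIPInputStream reads concatenated members (current JDKs do); the class name GzipConcat and its helper methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: shows that independently compressed gzip
// members, written back to back, decompress as one stream.
public class GzipConcat {

    /** Compresses one piece of text into a complete, standalone gzip member. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /** Decompresses a byte array that may hold several gzip members in a row. */
    static String gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Each chunk is compressed on its own, then simply appended.
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        stream.write(gzip("A1 rows\n"));
        stream.write(gzip("A2 rows\n"));
        System.out.print(gunzip(stream.toByteArray()));
    }
}
```

This only covers a single logical stream, which is exactly the limitation described above: there is no notion of a second entry B.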

Thanks.
 
Chris Uppal

NOBODY said:
I want to write to a stream an archive format that will contain 2 entries
(A and B) where A is the concatenation of A1+A2+A3, and B is the
concatenation of B1+B2+B3.

I doubt if that's possible in any existing archive format. Since the library
doesn't know how many "A" entries you are going to add, it doesn't know where
to put the "B" entries in the output file.

I suggest that you redesign. One simple option would be to use two (or more)
output archives which you write concurrently. A somewhat more complex, but
more elegant (IMO), option would be to layer your own "protocol" over an
existing archive format, so that what the archive code thinks of as "files"
are mere "chunks" of (logically) connected streams.

In the latter case, the archive would "think" that it contained:

A.csv/A1.csv
A.csv/A2.csv
B.csv/B1.csv
A.csv/A3.csv
B.csv/B2.csv
B.csv/B3.csv

but your code would interpret that as simply:

A.csv
B.csv

The ZIP file format (which has a table of contents) would be highly suitable
for the lower level of such a scheme, I think. Note that you can use any names
you like for the entries in a ZIP file -- they don't have to be names of real
files (nor even valid filenames).
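
A minimal sketch of this layering in Java, using java.util.zip. The class name ChunkedZip, the helpers writeChunk and readLogical, and the "logical/chunk" naming rule (split at the first slash) are assumptions made for illustration, not part of any library. The reader here scans entries in stream order rather than using the table of contents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: each ZIP entry is one chunk of a logical file.
public class ChunkedZip {

    /** Writes one chunk as its own ZIP entry named "logical/chunk". */
    static void writeChunk(ZipOutputStream zip, String logical, String chunk, byte[] data)
            throws IOException {
        zip.putNextEntry(new ZipEntry(logical + "/" + chunk));
        zip.write(data);
        zip.closeEntry();
    }

    /** Reads entries in archive order and concatenates chunks per logical name. */
    static Map<String, byte[]> readLogical(InputStream in) throws IOException {
        Map<String, ByteArrayOutputStream> buffers = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                int slash = name.indexOf('/');
                // Assumed convention: text before the first '/' is the logical file.
                String logical = slash < 0 ? name : name.substring(0, slash);
                ByteArrayOutputStream out =
                        buffers.computeIfAbsent(logical, k -> new ByteArrayOutputStream());
                int n;
                while ((n = zip.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
        }
        Map<String, byte[]> result = new LinkedHashMap<>();
        buffers.forEach((k, v) -> result.put(k, v.toByteArray()));
        return result;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(sink)) {
            // Chunks arrive interleaved, as in the rolling scenario.
            writeChunk(zip, "A.csv", "A1.csv", "a1\n".getBytes(StandardCharsets.UTF_8));
            writeChunk(zip, "B.csv", "B1.csv", "b1\n".getBytes(StandardCharsets.UTF_8));
            writeChunk(zip, "A.csv", "A2.csv", "a2\n".getBytes(StandardCharsets.UTF_8));
            writeChunk(zip, "B.csv", "B2.csv", "b2\n".getBytes(StandardCharsets.UTF_8));
        }
        Map<String, byte[]> files = readLogical(new ByteArrayInputStream(sink.toByteArray()));
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().length + " bytes");
        }
    }
}
```

Since the chunks of each logical file are concatenated in the order they appear in the archive, the writer only has to emit each stream's chunks in the right relative order; how A and B chunks interleave does not matter.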

-- chris
 
Andrey Kuznetsov

I doubt if that's possible in any existing archive format. Since the library
doesn't know how many "A" entries you are going to add, it doesn't know where
to put the "B" entries in the output file.

A possible solution could be to keep the table of contents in a separate file.
 
