Would like to read zip file directly from web

ducnbyu

Hi all,

There is a set of zip files available on the Web via http (also
available via ftp) for public consumption that I would like to
read programmatically without first downloading them to local
storage. This is for productivity reasons and to reduce the
consumption and maintenance of local storage, as some of them are
quite large.

This is basically what I tried (everything below was also tried with
http, with the same results)...

1) ZipFile zf = new ZipFile("ftp://...");
result: ZipException "The filename, directory name, or volume label
syntax is incorrect"

2) ZipFile zf = new ZipFile(new File(new URI("ftp://...")));
result: File constructor throws IllegalArgumentException "URI Scheme is
not file"
It seems the ZipFile constructors only take a String or a File, and File
apparently only takes URIs that resolve to "local" paths.

3) is = new ZipInputStream((new URL("ftp://...")).openStream());
result: IOException UnknownHostException... "www.xyz.abc"

4) is = new ZipInputStream((new URL("ftp://...")).openConnection(new
Proxy(Proxy.Type.HTTP, new InetSocketAddress("<proxy host from my
IExplorer>", <port number from my IE>) )).getInputStream());
result: IOException FileNotFound... "ftp://..." (This exception occurs
faster, in debug mode, than the others if that's useful.)

Should I continue to pursue any of these such as trying #3 outside of a
firewall, as in perhaps I'm not forming the proxy correctly in #4? I
ask because I will have to install a bunch of stuff at home to try it,
but only if what I'm trying to do sounds possible to you all? Or am I
stuck with having to download first?

The obvious workaround is to programmatically download, read, delete,
but I'd like to streamline a bit if possible since I only need a subset
of the contents of the zip archives. FWIW, the application reads
locally stored zip files successfully.

Thanks for your thoughts.
 
ducnbyu

Shoot, right after I hit send it occurred to me to navigate to the zip
file via my import file dialog, since the code reads directly from zip
files. That seems to work most of the time, but sometimes it just
hangs. Any thoughts about that?

Thanks for your thoughts.
 
ducnbyu

Which is no good because it is copying the file to temp local space
anyway... so I'm back to my original question.

Thanks for your thoughts.
 
ducnbyu

Thanks for the link to the code generator, added it to my favorites.
Glanced at the generated code. I will give it a try tomorrow. In the
meantime, can the GZIPInputStream class deal with multiple entries in
the zip archive? The class itself doesn't seem to have methods to
treat them. However, I shouldn't encounter any problems substituting
ZipInputStream (which can) in place of GZIPInputStream, should I?

Thanks again for the lead!
 
Roedy Green

I will give it a try tomorrow. In the
meantime, can the GZIPInputStream class deal with multiple entries in
the zip archive? The class itself doesn't seem to have methods to
treat them. However, I shouldn't encounter any problems substituting
ZipInputStream (which can) in place of GZIPInputStream, should I?

They are quite different. GZIP compresses a single file as one stream.
ZipInputStream reads the individually compressed members of an archive.
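
To make the difference concrete, here is a minimal sketch using only java.util.zip. The class and method names (GzipVersusZip, gunzip, zipEntryNames and the two helper builders) are made up for illustration. The GZIP side has exactly one payload and nothing to iterate over, while the ZIP side is walked entry by entry with getNextEntry().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class GzipVersusZip {

    /** GZIP: a single compressed stream, so there are no entries to enumerate. */
    static byte[] gunzip(byte[] gz) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** ZIP: an archive of named members, visited one at a time. */
    static List<String> zipEntryNames(byte[] zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Helper for the demo: gzip a byte array in memory. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        }
        return bytes.toByteArray();
    }

    /** Helper for the demo: build an in-memory ZIP with one small member per name. */
    static byte[] zip(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.write(name.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(gunzip(gzip(text)), StandardCharsets.UTF_8));
        System.out.println(zipEntryNames(zip("a.txt", "b.txt")));
    }
}
```

So for a .zip archive with several members, ZipInputStream is the class to use; GZIPInputStream would not understand the archive structure at all.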
 
Chris Uppal

There is a set of zip files available on the Web via http (also
available via ftp) for public consumption that I would like to
programmatically read directly without having to download to local
storage first.

You can't -- not unless you are willing to do a fair amount of work.

There are two ways to read a ZIP file, one is to start at the beginning and
read forward, iterating over every entry as you find them. The other is to use
random access to find the entries directly.

To do the first, you would need to wrap a ZipInputStream around an InputStream
which is reading the contents of the file. You can probably (I haven't tried
this) get a suitable InputStream from an URL object (do URL objects support FTP
?). If you know in advance which entries you want to read, then you just
iterate until you have read all of them. Note that this will (internally)
download /all/ of the data from the Web up to (and a bit past) the end of the
last entry you are interested in. Obviously you need to ensure that the URL
object isn't downloading the whole file to a cache first!
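
The forward-reading approach just described could look roughly like the sketch below. It is untested against a real server or proxy; the names StreamingZipReader, readEntries, readEntriesFromUrl and sampleZip are invented for illustration. The core method takes any InputStream, so the same code works on a URL stream or on an in-memory archive, and it stops reading as soon as every wanted entry has been seen.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class StreamingZipReader {

    /** Reads only the named entries, stopping once all of them have been found. */
    static Map<String, byte[]> readEntries(InputStream in, Set<String> wanted) throws IOException {
        Map<String, byte[]> found = new HashMap<>();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while (found.size() < wanted.size() && (entry = zin.getNextEntry()) != null) {
                if (!wanted.contains(entry.getName())) {
                    continue; // the next getNextEntry() call skips the rest of this entry
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = zin.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                found.put(entry.getName(), out.toByteArray());
            }
        }
        return found;
    }

    /** Same thing over a URL stream; assumes the URL is reachable from this machine. */
    static Map<String, byte[]> readEntriesFromUrl(String url, Set<String> wanted) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readEntries(in, wanted);
        }
    }

    /** Small in-memory archive so the sketch can be run without a network. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            String[][] members = {{"a.txt", "alpha"}, {"b.txt", "beta"}, {"c.txt", "gamma"}};
            for (String[] m : members) {
                out.putNextEntry(new ZipEntry(m[0]));
                out.write(m[1].getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> got =
                readEntries(new ByteArrayInputStream(sampleZip()), Collections.singleton("b.txt"));
        System.out.println(new String(got.get("b.txt"), StandardCharsets.UTF_8));
    }
}
```

Closing the stream after the last wanted entry is what limits the transfer; anything after that point in the archive is never requested from the underlying stream.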

To do the second, you require random access to the contents of the file on the
Web. HTTP (and perhaps FTP) does have support for that, but I don't --
personally -- know of any Java HTTP client software that supports it. Then you
have the problem that the Java ZIP implementation is too limited to allow you
to supply your own source of random-access data, so you next have to implement
the ZIP decoder yourself (not as difficult as it sounds since you can use the
provided deflate decompression, java.util.zip.Inflater, to do the hard bit). You might be able to find an
independent implementation of the ZIP format which is able to do this for you,
but I don't know of one myself[*]. All in all, not obviously feasible unless
you /really/ need this functionality.
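
For what it is worth, the first step of the random-access route can be sketched with the standard HttpURLConnection by setting a Range header by hand. This is only a partial sketch under several assumptions: the server must honour byte ranges (answering 206 Partial Content), the archive is not Zip64, and the names RemoteZipTail, fetchTail, findEocd, parseEocd and sampleZip are invented here. It fetches the tail of the file and locates the ZIP "end of central directory" record, which says where the entry index lives; decoding the index and the entries themselves is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class RemoteZipTail {
    static final int EOCD_SIGNATURE = 0x06054b50; // "PK\5\6" read as a little-endian int
    static final int EOCD_MIN_SIZE = 22;          // record size when there is no archive comment

    /** Asks the server for the last n bytes only; fails if ranges are not supported. */
    static byte[] fetchTail(String url, int n) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Range", "bytes=-" + n);
        try {
            if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("No partial content, got HTTP " + conn.getResponseCode());
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int read;
                while ((read = in.read(buf)) != -1) {
                    out.write(buf, 0, read);
                }
                return out.toByteArray();
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Scans backwards for the end-of-central-directory signature; -1 if absent. */
    static int findEocd(byte[] tail) {
        ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = tail.length - EOCD_MIN_SIZE; i >= 0; i--) {
            if (buf.getInt(i) == EOCD_SIGNATURE) {
                return i;
            }
        }
        return -1;
    }

    /** Returns {entry count, central directory size, central directory offset}. */
    static long[] parseEocd(byte[] tail, int eocd) {
        ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        long entries = buf.getShort(eocd + 10) & 0xFFFF;
        long size = buf.getInt(eocd + 12) & 0xFFFFFFFFL;
        long offset = buf.getInt(eocd + 16) & 0xFFFFFFFFL;
        return new long[] {entries, size, offset};
    }

    /** Two-entry in-memory archive used to exercise the parsing without a network. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"one.txt", "two.txt"}) {
                out.putNextEntry(new ZipEntry(name));
                out.write(name.getBytes("UTF-8"));
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip(); // stands in for fetchTail(url, n) on a real server
        long[] info = parseEocd(zip, findEocd(zip));
        System.out.println(info[0] + " entries, directory at offset " + info[2]);
    }
}
```

A second ranged request for the bytes from the directory offset onwards would then return the central directory, and from there each entry's local header offset. That remaining decoding is the "fair amount of work" part.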

-- chris

[*] It might be worth looking, though. Two places to start:
http://truezip.dev.java.net/
http://jazzlib.sourceforge.net/
 
ducnbyu

Chris said:
To do the first, you would need to wrap a ZipInputStream around an InputStream
which is reading the contents of the file. You can probably (I haven't tried
this) get a suitable InputStream from an URL object (do URL objects support FTP
?).

Isn't this what I tried in #3 and #4 in my original post? Sounds like
you are suggesting that this should work and perhaps I do need to
install and try #3 at home (no firewall), to see if I'm using the proxy
class wrong in #4.
If you know in advance which entries you want to read, then you just
iterate until you have read all of them.

I do know which entries, but I can't get a successful openStream.
Note that this will (internally)
download /all/ of the data from the Web up to (and a bit past) the end of the
last entry you are interested in. Obviously you need to ensure that the URL
object isn't downloading the whole file to a cache first!

Yeah that's what I want to avoid. I'm hoping that by invoking the
getNextEntry() method of the ZipInputStream class as necessary it will
"rapidly" skip through the unneeded entries while reading directly from
the webserver.
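
One way to check how "rapid" that skipping really is: wrap the source in a byte-counting stream and see how much ZipInputStream pulls before it reaches a later entry. The sketch below does this on an in-memory archive; SkipCostDemo, CountingInputStream, bytesPulledToReach and sampleZip are names made up for the example, and the first entry is filled with seeded random bytes so that it barely compresses. As far as I can tell, getNextEntry() has no way to seek past an entry on a plain stream, so the skipped entry's compressed bytes are still read from the source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class SkipCostDemo {

    /** Counts every byte that is pulled from the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
            }
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = super.skip(n);
            count += skipped;
            return skipped;
        }
    }

    /** Returns how many source bytes were consumed before the named entry was reached. */
    static long bytesPulledToReach(byte[] zip, String name) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(zip));
        try (ZipInputStream zin = new ZipInputStream(counter)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    return counter.count;
                }
            }
        }
        throw new IOException("Entry not found: " + name);
    }

    /** Archive with one large incompressible entry followed by a tiny one. */
    static byte[] sampleZip(int bigSize) throws IOException {
        byte[] big = new byte[bigSize];
        new Random(42).nextBytes(big);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("big.bin"));
            out.write(big);
            out.closeEntry();
            out.putNextEntry(new ZipEntry("small.txt"));
            out.write("tiny".getBytes("UTF-8"));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip(100_000);
        System.out.println("archive size: " + zip.length);
        System.out.println("bytes pulled to reach small.txt: " + bytesPulledToReach(zip, "small.txt"));
    }
}
```

Over a network stream the same thing would apply: reaching an entry near the end of the archive costs about as much transfer as downloading everything before it, although nothing needs to be written to local storage.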
I don't know of one myself[*]. All in all, not obviously feasible unless
you /really/ need this functionality.

Thanks for the links, comments, and considerations. This is not a
showstopper. Just wanting to simplify the workflow since the archive
and the rest of its contents are not needed. I'm hoping to at least
get it working so I can try it out. Even if I do get it working, it
might be a dog anyway and not worth doing from that standpoint. There
are also use cases where the archive is needed multiple times, so the
download first workflow will still be available.
 
Roedy Green

Should I continue to pursue any of these such as trying #3 outside of a
firewall, as in perhaps I'm not forming the proxy correctly in #4? I
ask because I will have to install a bunch of stuff at home to try it,
but only if what I'm trying to do sounds possible to you all? Or am I
stuck with having to download first?

You either have to download it first, or else put some code on the
server to do the work for you and send you just what you needed.

You might look into the JarDiff protocol part of Java Web Start.
http://mindprod.com/jgloss/javawebstart.html
 
Chris Uppal

(e-mail address removed) wrote:

[me:]
Isn't this what I tried in #3 and #4 in my original post?

Yes, I think it is. I missed that, sorry.

Sounds like
you are suggesting that this should work and perhaps I do need to
install and try #3 at home (no firewall), to see if I'm using the proxy
class wrong in #4.

I think that logically it should work, but logic is not always a reliable guide
;-) I haven't tried it myself at all.

Good luck !

-- chris
 
