[rubyzip + open-uri] reading zipfiles from a url?

Janus Bor

Hi everyone!

I'm trying to read a zipfile directly from a url. However, open-uri and
rubyzip don't seem to cooperate very well:

require 'zip/zip'
require 'open-uri'

url = "http://www.cibiv.at/~phuong/vien/8a375.zip"
zip_file = Zip::ZipFile.open(url)

This code raises a ZipError saying it can't find the zip file.
Zip::ZipFile.open obviously doesn't go through open-uri. I'm pretty new to
Ruby (and programming in general); is there any other way to read the
zipfile directly?

Writing every zipfile out to a new file on the hard disk first would be a
nightmare performance-wise, as I only need a few very small files out of
every zipfile, but I need to process a few hundred zipfiles...
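
(For reference, the plain download-to-disk approach I'm trying to avoid would
look roughly like the sketch below -- fetch each archive into a temporary
file and let rubyzip seek in the local copy. The entry handling is only
illustrative.)

require 'zip/zip'
require 'open-uri'
require 'tempfile'

url = "http://www.cibiv.at/~phuong/vien/8a375.zip"

# Download the archive once into a temporary file, then let rubyzip
# seek around in the local copy; the tempfile is cleaned up afterwards.
Tempfile.open('remote-zip') do |tmp|
  tmp.binmode
  tmp.write(open(url).read)     # open-uri fetches the whole file over HTTP
  tmp.flush

  Zip::ZipFile.open(tmp.path) do |zip|
    zip.each do |entry|
      puts entry.name
      # data = zip.read(entry.name)   # contents of a single entry
    end
  end
end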

Thanks in advance!
Janus
 
David Masover

url = "http://www.cibiv.at/~phuong/vien/8a375.zip"
zip_file = Zip::ZipFile.open(url)

Maybe there is a right way to do this...

I'm going to argue that it would be difficult at best. Zip files store
metadata, such as the location of each compressed file, at the end of the
archive. After reading that, you'd want to seek back somewhere into the
middle. Since open-uri is probably meant to issue a single, straightforward
HTTP request (that is, ask for the whole file, from beginning to end), I'm
not sure this would work well.

But that's just an educated guess.
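
To make the guess concrete, seeking over HTTP would mean something like a
Range request for just the tail of the archive, where the end-of-central-
directory record lives -- a rough sketch with net/http (it assumes the
server honours Range requests, and feeding the partial data to rubyzip
would still be a separate problem):

require 'net/http'
require 'uri'

uri = URI.parse("http://www.cibiv.at/~phuong/vien/8a375.zip")

Net::HTTP.start(uri.host, uri.port) do |http|
  size = http.head(uri.path)['Content-Length'].to_i    # total archive size

  # Ask for roughly the last 64 KB, where the central directory usually sits.
  tail_start = [size - 65_536, 0].max
  resp = http.get(uri.path, 'Range' => "bytes=#{tail_start}-#{size - 1}")

  puts "HTTP #{resp.code}: fetched #{resp.body.size} of #{size} bytes"
end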

Depending on how portable you need this to be, you might consider the
FUSE-based HTTPFS:

http://httpfs.sourceforge.net/

When I last tried this, performance was pretty terrible -- no caching, and it
would fetch the file in blocks, so _many_ separate HTTP requests. But it
would probably do what you want.
 
Axel Etzold

-------- Original Message --------
Date: Sun, 13 Jul 2008 22:50:21 +0900
From: David Masover <[email protected]>
To: (e-mail address removed)
Subject: Re: [rubyzip + open-uri] reading zipfiles from a url?
url = "http://www.cibiv.at/~phuong/vien/8a375.zip"
zip_file = Zip::ZipFile.open(url)

Maybe there is a right way to do this...

I'm going to argue that it would be difficult at best. Zip files store
metadata, such as the location of each compressed file, at the end of the
archive. After reading that, you'd want to seek back somewhere into the
middle. Since open-uri is probably meant to issue a single, straightforward
HTTP request (that is, ask for the whole file, from beginning to end), I'm
not sure this would work well.

But that's just an educated guess.

Depending on how portable you need this to be, you might consider the
FUSE-based HTTPFS:

http://httpfs.sourceforge.net/

When I last tried this, performance was pretty terrible -- no caching, and it
would fetch the file in blocks, so _many_ separate HTTP requests. But it
would probably do what you want.


Hi ---

there is also rio (http://rio.rubyforge.org/):

# Copy a file from an FTP server into a local file, un-gzipping it
rio('ftp://host/afile.gz').gzip > rio('afile')

I have no idea what its performance would be.

Best regards,

Axel
 
Michal Suchanek

Maybe there is a right way to do this...

I'm going to argue that it would be difficult at best. Zip files store
metadata, such as the location of each compressed file, at the end of the
archive. After reading that, you'd want to seek back somewhere into the
middle. Since open-uri is probably meant to issue a single, straightforward
HTTP request (that is, ask for the whole file, from beginning to end), I'm
not sure this would work well.

But that's just an educated guess.

Depending on how portable you need this to be, you might consider the
FUSE-based HTTPFS:

http://httpfs.sourceforge.net/

When I last tried this, performance was pretty terrible -- no caching, and it
would fetch the file in blocks, so _many_ separate HTTP requests. But it
would probably do what you want.

This is how FUSE is designed. The caching is supposed to happen in the
upper layers of the kernel. The requests are sent exactly as they are
received from the kernel, so there is nothing httpfs can do about the
granularity. In practice the kernel requests chunks of up to ~16 KB, but
probably only when the application does large block reads.

I tried keep-alive, which should speed up subsequent requests. However,
the sockets can then hang, and it takes time to detect that. Still, this
should only happen when there are network problems anyway.

It is very nice for mounting CD or DVD images, and extracting a single
file from a zip could also be faster that way. If you want all the files
anyway, it's probably better to just download the zip.
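
As a sketch of that single-file case from Ruby (the mountpoint below is
made up), rubyzip can treat the mounted archive like any local file, and
only the blocks it actually touches get fetched:

require 'zip/zip'

# Hypothetical path where httpfs exposes the remote archive.
Zip::ZipFile.open('/mnt/httpfs/8a375.zip') do |zip|
  wanted = zip.find { |entry| entry.name =~ /\.txt\z/ }  # pick one small entry
  puts zip.read(wanted.name) if wanted
end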

Also, last time I looked, the httpfs on SourceForge was broken: there was
a bug around the SSL #ifdefs in read/write. You may need to define or
undefine USE_SSL, or fix the code.

FUSE is theoretically portable to *BSD, but this particular code is not,
because it relies on undefined behaviour of directory operations to keep
the underlying directory visible after the mount.

Thanks

Michal
 
