how to stream or write data into a tar.gz file as if the data were from files?

Discussion in 'Ruby' started by bwv549, Sep 15, 2008.

  1. bwv549

    bwv549 Guest

    I have a gazillion little files in memory (each is really just a chunk
    of data, but it represents what needs to be a single file) and I need
    to throw them all into a .tar.gz archive. In this case, it must be
    in .tar.gz format and it must unzip into actual files--although I pity
    the fellow that actually has to unzip this monstrosity.

    Here are the solutions I've come up with so far:

    1. Not portable, *extremely* slow:
    write out all these "files" into a directory and make a system
    call to tar (tar -czf ...)

    2. Portable but still just as slow:
    write out all these "files" into a directory and use archive-tar-
    minitar to make the archive

    3. Not portable, but fast:
    stream information into tar/gzip to create the archive (without
    ever first writing out files)

    I've been looking around on this and the closest I've come is this:
    tar cvf - some_directory | gzip - > some_directory.tar.gz

    Note that this would still require me to write the files to a
    directory (which must be avoided at all costs), but at least the
    problem now is how to write data into a tar file. I've been googling
    and still haven't turned up anything yet.

    4. Hack archive-tar-minitar to enable me to write my data directly
    into the format. Looking at the source code, this doesn't seem
    terribly hard, but not terribly easy either. Am I missing a method
    already written for this kind of thing?

    Others?

    Right now, anything resembling #3 or #4 would work for me.

    My feeling is that it shouldn't be that hard to write data into
    a .tar.gz format in either linux or ruby without actually having any
    files (i.e., everything in memory or streamed in).

    Thanks a lot for any suggestions or ideas!
    bwv549, Sep 15, 2008
    #1

  2. Re: how to stream or write data into a tar.gz file as if the data were from files?

    On 15.09.2008 20:35, bwv549 wrote:
    > I have a gazillion little files in memory (each is really just a chunk
    > of data, but it represents what needs to be a single file) and I need
    > to throw them all into a .tar.gz archive. In this case, it must be
    > in .tar.gz format and it must unzip into actual files--although I pity
    > the fellow that actually has to unzip this monstrosity.


    > 3. Not portable, but fast:
    > stream information into tar/gzip to create the archive (without
    > ever first writing out files)
    >
    > I've been looking around on this and the closest I've come is this:
    > tar cvf - some_directory | gzip - > some_directory.tar.gz
    >
    > Note that this would still require me to write the files to a
    > directory (which must be avoided at all costs), but at least the
    > problem now is how to write data into a tar file. I've been googling
    > and still haven't turned up anything yet.


    So why then do you say "without ever first writing out files"?

    I'd say #3 (the original formulation) is the way to go. Googling for
    "ruby tar" quickly turned up this:

    http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/32588

    And there is zlib, which can read and write gzip streams. So, if
    ruby-tar can write to any stream, you've got your solution.
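
A minimal sketch of the zlib half of that idea, using only Ruby's standard library. The helper names `gzip_string` and `gunzip_string` are made up for illustration; the point is just that `Zlib::GzipWriter` will wrap any IO-like object (here a `StringIO`), so nothing has to touch the disk.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: gzip a string into an in-memory buffer.
def gzip_string(data)
  buffer = StringIO.new(String.new)  # empty binary string as the sink
  gz = Zlib::GzipWriter.new(buffer)
  gz.write(data)
  gz.finish                          # ends the gzip stream without closing buffer
  buffer.string
end

# Hypothetical helper: read the gzip bytes back, again in memory.
def gunzip_string(bytes)
  Zlib::GzipReader.new(StringIO.new(bytes)).read
end

packed = gzip_string('my data')
puts gunzip_string(packed)
```

Anything that can write a tar stream into an object with a `write` method could be pointed at the `GzipWriter` in place of the single `gz.write` call.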

    Kind regards

    robert
    Robert Klemme, Sep 15, 2008
    #2

  3. Re: how to stream or write data into a tar.gz file as if the data were from files?

    > Others?

    Although it's not what you're asking for, as you mention "zipping" maybe
    you could consider rubyzip:

    require 'zip/zipfilesystem'
    Zip::ZipFile.open("foo.zip") { |zfs|
      zfs.file.open("member.txt") { |f| f << data }
      zfs.commit
    }

    zip is not tar, but it does have some advantages - in particular the
    ability to get random-access to any particular member without having to
    read through the whole thing from the start.

    > My feeling is that it shouldn't be that hard to write data into
    > a .tar.gz format in either linux or ruby without actually having any
    > files (i.e., everything in memory or streamed in).


    When reading, rubyzip lets you spool directly out of the zip. When
    writing, I think that behind the scenes it spools to a tempfile, and
    when you commit it then packs this into the archive.
    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Sep 15, 2008
    #3
  4. Re: how to stream or write data into a tar.gz file as if the data were from files?

    And Googling for "ruby tar library" turns up:

    <http://raa.ruby-lang.org/project/minitar/>

    :: which looks pretty appropriate :)

    FWIW,
    --
    Hassan Schroeder ------------------------
    Hassan Schroeder, Sep 15, 2008
    #4
  5. bwv549

    bwv549 Guest

    Re: how to stream or write data into a tar.gz file as if the data were from files?

    > So why then do you say "without ever first writing out files"?

    I'm just trying to show that if I can stream out a tar file, then I
    can at least pipe it into gzip (on many OS's). So, I'm really stuck
    at making a tar file without actually having to write files to disk
    first.

    > And there is zlib which allows to read and write GZip streams.  So, if
    > ruby-tar allows to write into any stream you got your solution.


    I looked at ruby-tar (on your suggestion) but ruby-tar turns out to
    not have any write capabilities.

    So, I'm still looking deeper into archive-tar-minitar. I also found
    'tarruby' (bindings to the C libtar library) in rubyforge but it seems
    more difficult to hack into than minitar.

    As pointed out, the difficulty here has been narrowed down to writing
    tar files without having to write files out to disk first.

    Sincere thanks for the suggestions.
    bwv549, Sep 15, 2008
    #5
  6. bwv549

    bwv549 Guest

    Re: how to stream or write data into a tar.gz file as if the data were from files?

    > you could consider rubyzip:
    >
    >   require 'zip/zipfilesystem'
    >   Zip::ZipFile.open("foo.zip") { |zfs|
    >     zfs.file.open("member.txt") { |f| f << data }
    >     zfs.commit
    >   }


    This is *exactly* what I need to be able to do, except with .tar.gz
    files. I will use this solution for now, even while still searching
    for (or maybe writing) the .tar.gz equivalent. Short term, this will
    get me by... [even though a .tar.gz equivalent would be really nice].

    Thanks!!
    bwv549, Sep 15, 2008
    #6
  7. Re: how to stream or write data into a tar.gz file as if the data were from files?

    > This is *exactly* what I need to be able to do, except with .tar.gz
    > files. I will use this solution for now


    Do test it though. I tested it streaming large files in (100MB), and
    found that it created a tempfile behind the scenes. If it does this for
    *all* files, then it may not be any more efficient than using
    archive-tar-minitar.

    But it does have a simple API, which is essentially the same as File and
    Dir. (Although unfortunately you can't use it to open a zipfile which is
    within a zipfile :)
    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Sep 15, 2008
    #7
  8. bwv549

    bwv549 Guest

    Re: how to stream or write data into a tar.gz file as if the data were from files?

    > Do test it though. I tested it streaming large files in (100MB), and

    Yes, upon testing I saw that it was creating a bunch of temp files,
    too. It's too bad since the API is so clean! Perhaps it will be
    reimplemented someday...

    ********************************************************************
    ********************** A solution using Minitar *******************

    So, I hacked on archive-tar-minitar for a while and came up with a
    solution. Right now I add a class method that fits with the style of
    the pack_file method (indeed, pilfers most of its code) and then I can
    access it using the slightly lower level interface than 'pack':

    require 'archive/tar/minitar'
    require 'stringio'

    module Archive::Tar::Minitar

      # entry may be a string (the name), or it may be a hash specifying the
      # following:
      #   :name   (REQUIRED)
      #   :mode   33188 (rw-r--r--) for files, 16877 (rwxr-xr-x) for dirs
      #           (0o100644)                   (0o40755)
      #   :uid    nil
      #   :gid    nil
      #   :mtime  Time.now
      #
      # if data == nil, then this is considered a directory!
      # (use an empty string for a normal empty file)
      # data should be something that can be opened by StringIO
      def self.pack_as_file(entry, data, outputter) #:yields action, name, stats:
        outputter = outputter.tar if outputter.kind_of?(Archive::Tar::Minitar::Output)

        stats = {}
        stats[:uid] = nil
        stats[:gid] = nil
        stats[:mtime] = Time.now

        if data.nil?
          # a directory
          stats[:size] = 4096 # is this OK???
          stats[:mode] = 16877 # rwxr-xr-x
        else
          stats[:size] = data.size
          stats[:mode] = 33188 # rw-r--r--
        end

        if entry.kind_of?(Hash)
          name = entry[:name]

          # caller-supplied values override the defaults above
          entry.each { |kk, vv| stats[kk] = vv unless vv.nil? }
        else
          name = entry
        end

        if data.nil? # a directory
          yield :dir, name, stats if block_given?
          outputter.mkdir(name, stats)
        else # a file
          outputter.add_file_simple(name, stats) do |os|
            stats[:current] = 0
            yield :file_start, name, stats if block_given?
            # read the in-memory data in 4096-byte chunks, as pack_file does
            StringIO.open(data, "rb") do |ff|
              until ff.eof?
                stats[:currinc] = os.write(ff.read(4096))
                stats[:current] += stats[:currinc]
                yield :file_progress, name, stats if block_given?
              end
            end
            yield :file_done, name, stats if block_given?
          end
        end
      end
    end

    #####################################
    # Then to use it to make a .tgz file:
    #####################################

    require 'zlib'

    file_names = ['a_dir/dorky1', 'dorky2', 'an_empty_dir']
    file_data_strings = ['my data', 'my data also', nil]

    tgz = Zlib::GzipWriter.new(File.open('my_tar.tgz', 'wb'))

    Archive::Tar::Minitar::Output.open(tgz) do |outp|
      file_names.zip(file_data_strings) do |name, data|
        Archive::Tar::Minitar.pack_as_file(name, data, outp)
      end
    end

    ***********************************************************

    So, not terribly pretty, but not too terrible either.
    bwv549, Sep 15, 2008
    #8
  9. ara.t.howard

    ara.t.howard Guest

    Re: how to stream or write data into a tar.gz file as if the data were from files?

    On Sep 15, 2008, at 1:38 PM, bwv549 wrote:

    > This is *exactly* what I need to be able to do, except with .tar.gz
    > files. I will use this solution for now, even while still searching
    > for (or maybe writing) the .tar.gz equivalent. Short term, this will
    > get me by... [even though a .tar.gz equivalent would be really nice].
    >
    > Thanks!!


    IO.popen 'tar cfz -', 'w+' do |pipe|

    end

    and just send files down the pipe

    a @ http://codeforpeople.com/
    --
    we can deny everything, except that we have the possibility of being
    better. simply reflect on that.
    h.h. the 14th dalai lama
    ara.t.howard, Sep 15, 2008
    #9
  10. Re: how to stream or write data into a tar.gz file as if the

    Ara Howard wrote:
    > IO.popen 'tar cfz -', 'w+' do |pipe|
    >
    > end
    >
    > and just send files down the pipe


    Uh??

    "tar cfz -" creates a tarfile called "z" and tries to pack a file called
    "-" in it.

    "tar czf - file1 file2 file3" reads the named files from disk and sends
    the *output* to stdout.

    If you don't specify any files, then nothing is created:

    $ tar -czf -
    tar: Cowardly refusing to create an empty archive
    Try `tar --help' or `tar --usage' for more information.

    That's for gnu tar, maybe others work differently. However, as far as I
    know, you can't get tar to read the *content* of files on stdin - and
    even if you could, how would you format them? That is, how would you
    delimit the start and end of each file, and assign a name to each one?
    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Sep 16, 2008
    #10
  11. ara.t.howard

    ara.t.howard Guest

    Re: how to stream or write data into a tar.gz file as if the

    On Sep 16, 2008, at 3:30 AM, Brian Candler wrote:

    > Ara Howard wrote:
    >> IO.popen 'tar cfz -', 'w+' do |pipe|
    >>
    >> end
    >>
    >> and just send files down the pipe

    >
    > Uh??
    >
    > "tar cfz -" creates a tarfile called "z" and tries to pack a file
    > called
    > "-" in it.
    >
    > "tar czf - file1 file2 file3" reads the named files from disk and
    > sends
    > the *output* to stdout.
    >
    > If you don't specify any files, then nothing is created:
    >
    > $ tar -czf -
    > tar: Cowardly refusing to create an empty archive
    > Try `tar --help' or `tar --usage' for more information.
    >
    > That's for gnu tar, maybe others work differently. However, as far
    > as I
    > know, you can't get tar to read the *content* of files on stdin - and
    > even if you could, how would you format them? That is, how would you
    > delimit the start and end of each file, and assign a name to each one?
    > --
    > Posted via http://www.ruby-forum.com/.
    >




    sorry. i misread the OPs question. tar can only unpack to stdout,
    not create from stdin.

    a @ http://codeforpeople.com/
    --
    we can deny everything, except that we have the possibility of being
    better. simply reflect on that.
    h.h. the 14th dalai lama
    ara.t.howard, Sep 16, 2008
    #11
  12. Re: how to stream or write data into a tar.gz file as if the

    > So, I hacked on archive-tar-minitar for a while and came up with a
    > solution.


    You got me interested now.

    I just installed the archive-tar-minitar gem and it looks pretty easy to
    generate a tar file, without any patching of the library:

    require 'rubygems'
    require 'archive/tar/minitar'

    src = {
    "foo.txt" => "This is file foo",
    "bar.txt" => "This is file bar",
    }

    File.open("test.tar", "wb") do |tarfile|
      Archive::Tar::Minitar::Writer.open(tarfile) do |tar|
        src.each do |name, data|
          tar.add_file_simple(name, :size => data.size, :mode => 0644) { |f| f.write(data) }
        end
      end
    end

    All I did was a quick poke around the API (gem server --daemon; launch
    web browser pointing at http://localhost:8808/) and look for something
    called "Writer" :)
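
For anyone without the minitar gem handy, here is a rough sketch of the same idea going all the way to .tar.gz in memory. Note that it swaps in a different library: it uses `Gem::Package::TarWriter` and `Gem::Package::TarReader`, which ship with RubyGems, instead of minitar, so the `add_file_simple` signature is not the same as above. The helper names `build_tgz` and `read_tgz` are invented for this example, and the nil-means-directory convention is borrowed from pack_as_file earlier in the thread.

```ruby
require 'rubygems/package'
require 'zlib'
require 'stringio'

# Hypothetical helper: build a .tar.gz in memory from { name => data }.
# A nil value is written as an empty directory entry.
def build_tgz(entries)
  buffer = StringIO.new(String.new)   # binary in-memory sink
  gz = Zlib::GzipWriter.new(buffer)
  Gem::Package::TarWriter.new(gz) do |tar|
    entries.each do |name, data|
      if data.nil?
        tar.mkdir(name, 0755)
      else
        # the tar header needs the size in bytes up front
        tar.add_file_simple(name, 0644, data.bytesize) { |io| io.write(data) }
      end
    end
  end                                 # the block form writes the tar trailer
  gz.finish                           # end the gzip stream, keep buffer open
  buffer.string
end

# Hypothetical helper: read the regular files back out, still in memory.
def read_tgz(bytes)
  files = {}
  gz = Zlib::GzipReader.new(StringIO.new(bytes))
  Gem::Package::TarReader.new(gz) do |tar|
    tar.each { |entry| files[entry.full_name] = entry.read if entry.file? }
  end
  files
end

tgz = build_tgz('a_dir/dorky1' => 'my data',
                'dorky2'       => 'my data also',
                'an_empty_dir' => nil)
File.open('my_tar.tgz', 'wb') { |f| f.write(tgz) }
```

The last line is the only place a real file is touched; the members themselves never exist on disk. For a truly huge archive you would probably hand `GzipWriter` the output `File` directly rather than buffering everything in a `StringIO`.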

    HTH,

    Brian.
    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Sep 16, 2008
    #12
  13. raleighr3

    raleighr3 Guest

    Re: how to stream or write data into a tar.gz file as if the data were from files?

    On Sep 15, 1:35 pm, bwv549 <> wrote:
    > I have a gazillion little files in memory (each is really just a chunk
    > of data, but it represents what needs to be a single file) and I need
    > to throw them all into a .tar.gz archive.  In this case, it must be
    > in .tar.gz format and it must unzip into actual files--although I pity
    > the fellow that actually has to unzip this monstrosity.
    >


    This may be a little late, but better late than never.
    Have you considered using #1 with a tmpfs and memory mapped files?
    This isn't exactly portable, but should be pretty fast since as far as
    tar is concerned your in-memory files just look like a regular
    filesystem thanks to tmpfs.
    raleighr3, Oct 6, 2008
    #13
