gunzip while downloading via ftp

Discussion in 'Perl Misc' started by none, Feb 19, 2006.

  1. none

    none Guest

    I am trying to gunzip a file while downloading it.

    it is 11 gigs in total, so it is an important task.

    what is the proper command format? everything I tried did not work. I
    tried a large number of combinations

    open (my $fh, "gunzip -c");
    $ftp->get("outfile.csv.gz", $fh);
    close $fh;
    none, Feb 19, 2006
    #1

  2. "none" <> wrote in news:1140316587.609593.223100
    @g47g2000cwa.googlegroups.com:

    > I am trying to gunzip a file while downloading it.
    >
    > it is 11 gigs in total, so it is an important task.


    It may be important to you.

    > what is the proper command format?


    What command format? Perl has statements, functions, operators,
    variables, modules etc but no commands.

    > everything I tried did not work. I tried a large number of
    > combinations
    >
    > open (my $fh, "gunzip -c");


    This tries to open a file named "gunzip -c" for reading. See for
    yourself:

    open my $fh, 'gunzip -c' or die "Cannot open 'gunzip -c': $!";

    What you probably want is to open a pipe to gunzip.

    Please read the posting guidelines for this group. Post a short but
    complete script which others can compile and run.


    Sinan
    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Feb 19, 2006
    #2

  3. none

    Guest

    "none" <> wrote:
    > I am trying to gunzip a file while downloading it.
    >
    > it is 11 gigs in total, so it is an important task.


    Is 11 gigs supposed to have some magic significance to us? It doesn't. If
    it is important, then tell us why it is important.

    Aside from which, if the data is so large, then wouldn't it be important to
    *not* unzip it?


    >
    > what is the proper command format? everything I tried did not work. I
    > tried a large number of combinations
    >
    > open (my $fh, "gunzip -c");


    ###perhaps you are trying for this?

    open (my $fh, "| gunzip -c") or die $!;

    > $ftp->get("outfile.csv.gz", $fh);
    > close $fh;


    close $fh or die $!;
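
    A minimal sketch of that pipe open, written with the three-argument
    form. Locally generated gzip data stands in for the FTP download, and
    the output file name demo_out.csv is made up for illustration. It
    assumes a gunzip binary is available on the PATH.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Stand-in for the remote file: gzip a small CSV in memory.
my $csv = "id,name\n1,alpha\n2,beta\n";
gzip(\$csv => \my $gz) or die "gzip failed: $GzipError";

# Hypothetical output file name, for illustration only.
my $outfile = 'demo_out.csv';

# '|-' opens a pipe for writing: whatever we print goes to gunzip's stdin.
open(my $fh, '|-', "gunzip -c > $outfile") or die "Cannot start gunzip: $!";
binmode $fh;
print {$fh} $gz or die "Cannot write to gunzip: $!";

# close waits for gunzip and reports a non-zero exit status through $?.
close $fh or die "gunzip failed: " . ($! || "exit status $?");

# Read the result back to confirm the data survived the round trip.
open(my $in, '<', $outfile) or die "Cannot read $outfile: $!";
my $roundtrip = do { local $/; <$in> };
close $in;

my $verdict = $roundtrip eq $csv ? 'ok' : 'mismatch';
print "round trip $verdict\n";
```

    With Net::FTP, the same handle would be passed as the second argument
    to get() in place of the print.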

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    , Feb 19, 2006
    #3
  4. wrote:
    > "none" <> wrote:
    >> I am trying to gunzip a file while downloading it.
    >>
    >> it is 11 gigs in total, so it is an important task.

    >
    > Is 11 gigs supposed to have some magic significance to us? It doesn't. If
    > it is important, then tell us why it is important.


    What hardware do you use, where gunzipping an 11GB file takes negligible
    time?

    --
    John W. Kennedy
    Half an hour into unzipping the OpenOffice.org 2.0 SDK.
    John W. Kennedy, Feb 19, 2006
    #4
  5. wrote:
    > "John W. Kennedy" <> wrote:
    >> wrote:
    >>> "none" <> wrote:
    >>>> I am trying to gunzip a file while downloading it.
    >>>>
    >>>> it is 11 gigs in total, so it is an important task.
    >>> Is 11 gigs supposed to have some magic significance to us? It doesn't.
    >>> If it is important, then tell us why it is important.

    >> What hardware do you use, where gunzipping an 11GB file takes negligible
    >> time?

    >
    > What hardware do you use where gunzipping a file while you are downloading
    > requires a non-negligibly different amount of time than gunzipping it after
    > you download it?


    Since the bottleneck will normally be the download, downloading while
    gunzipping will normally take only a fraction of a second more than the
    time needed for the download alone.

    --
    John W. Kennedy
    "But now is a new thing which is very old--
    that the rich make themselves richer and not poorer,
    which is the true Gospel, for the poor's sake."
    -- Charles Williams. "Judgement at Chelmsford"
    John W. Kennedy, Feb 20, 2006
    #5
  6. "John W. Kennedy" <> writes:

    > What hardware do you use, where gunzipping an 11GB file takes
    > negligible time?


    The point is that long uncompression time has nothing to do with
    whether your perl code is correct, so you're confusing the issue by
    even bringing it up. Even so, if your program will be run
    interactively (especially if as a CGI), you would be better off *not*
    adding that delay to your program. Better would be to spawn off a
    background process to decompress it, or dump it into a directory where
    a crontask occasionally decompresses any new files found there.

    Many FTP servers will decompress files on the fly as you download
    them, but that will cost CPU at the server end and bandwidth at both
    ends, so I wouldn't do that unless you own the FTP server.


    --
    Aaron --
    http://360.yahoo.com/aaron_baugher
    Aaron Baugher, Feb 20, 2006
    #6
  7. "John W. Kennedy" <> wrote in
    news:BzaKf.71$:

    > wrote:
    >> "John W. Kennedy" <> wrote:
    >>> wrote:
    >>>> "none" <> wrote:
    >>>>> I am trying to gunzip a file while downloading it.
    >>>>>
    >>>>> it is 11 gigs in total, so it is an important task.
    >>>> Is 11 gigs supposed to have some magic significance to us? It
    >>>> doesn't. If it is important, then tell us why it is important.
    >>> What hardware do you use, where gunzipping an 11GB file takes
    >>> negligible time?

    >>
    >> What hardware do you use where gunzipping a file while you are
    >> downloading requires a non-negligibly different amount of time than
    >> gunzipping it after you download it?

    >
    > Since the bottleneck will normally be the download, downloading while
    > gunzipping will normally take only a fraction of a second more than
    > the time needed for the download alone.


    Are you claiming that the de-compression process can begin before the
    whole file is downloaded? I have to admit that I do not know if it can
    or cannot. I am just curious to know. I would have thought that kind of
    decompression (one that does not require the compressed file to be
    stored on the filesystem first) could only be done with RLE.

    Sinan
    A. Sinan Unur, Feb 20, 2006
    #7
  8. none

    Guest

    A. Sinan Unur <> wrote:
    > "John W. Kennedy" <> wrote in


    >> Since the bottleneck will normally be the download, downloading while
    >> gunzipping will normally take only a fraction of a second more than
    >> the time needed for the download alone.


    > Are you claiming that the de-compression process can begin before the
    > whole file is downloaded? I have to admit that I do not know if it can
    > or cannot. I am just curious to know. I would have thought that kind of
    > decompression (one that does not require the compressed file to be
    > stored on the filesystem first) could only be done with RLE.


    Why not? If the download can be piped into gunzip there should be
    no problem.

    Some FTP sites enable downloading compressed files with decompression
    on the fly but I have never bothered experimenting with this feature.

    Axel
    , Feb 20, 2006
    #8
  9. wrote in
    news:LvjKf.23610$:

    > A. Sinan Unur <> wrote:
    >> "John W. Kennedy" <> wrote in

    >
    >>> Since the bottleneck will normally be the download, downloading
    >>> while gunzipping will normally take only a fraction of a second more
    >>> than the time needed for the download alone.

    >
    >> Are you claiming that the de-compression process can begin before the
    >> whole file is downloaded? I have to admit that I do not know if it
    >> can or cannot. I am just curious to know. I would have thought that
    >> kind of decompression (one that does not require the compressed file
    >> to be stored on the filesystem first) could only be done with RLE.

    >
    > Why not? If the download can be piped into gunzip there should be
    > no problem.


    Can gunzip start decompressing before it has seen the whole file? I
    don't know the format very well.

    > Some FTP sites enable downloading compressed files with decompression
    > on the fly but I have never bothered experimenting with this feature.


    In that case, the original file already exists on the FTP server. That
    is not the same as the client piping the input stream through gunzip.

    I am not claiming to know. I guess I should run a couple of experiments
    instead of taking up bandwidth here.

    Sinan
    A. Sinan Unur, Feb 20, 2006
    #9
  10. "A. Sinan Unur" <> writes:

    > Can gunzip start decompressing before it has seen the whole file? I
    > don't know the format very well.


    Yes. That's why you can do either of these:

    gunzip file.gz
    gunzip -c <file.gz >file

    The first one eliminates file.gz after creating file; the second one
    does not, since gunzip is getting file.gz streamed from stdin and
    doesn't even know it exists as a file.
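
    To see the same thing from Perl without any external tools, here is a
    rough sketch using the core Compress::Raw::Zlib module. It feeds a gzip
    stream to the inflater 64 bytes at a time and notes how much input had
    been supplied when the first decompressed bytes appeared. The sample
    data and the chunk size are arbitrary choices for the demonstration.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use Compress::Raw::Zlib;

# Build some compressible sample data and gzip it in memory.
my $plain = join '', map { "row $_,value $_\n" } 1 .. 1000;
gzip(\$plain => \my $gz) or die "gzip failed: $GzipError";

# WANT_GZIP tells zlib to expect a gzip header rather than a raw zlib stream.
my ($inflater, $status) = Compress::Raw::Zlib::Inflate->new(
    -WindowBits   => WANT_GZIP,
    -AppendOutput => 1,    # accumulate output across calls
);
die "Cannot create inflater: $status" unless $inflater;

my $out = '';
my $fed = 0;               # compressed bytes handed over so far
my $first_output_at;       # value of $fed when output first appeared

for (my $pos = 0; $pos < length($gz); $pos += 64) {
    my $chunk = substr($gz, $pos, 64);
    $fed += length $chunk;

    # inflate() consumes $chunk and appends whatever it can decode to $out.
    $status = $inflater->inflate($chunk, $out);
    die "inflate failed: $status"
        unless $status == Z_OK || $status == Z_STREAM_END;

    $first_output_at = $fed if !defined $first_output_at && length $out;
    last if $status == Z_STREAM_END;
}

printf "first output after %d of %d compressed bytes\n",
    $first_output_at, length $gz;
```

    If the first number printed is smaller than the second, the inflater
    produced output before it had seen the whole stream.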


    --
    Aaron --
    http://360.yahoo.com/aaron_baugher
    Aaron Baugher, Feb 20, 2006
    #10
  11. A. Sinan Unur wrote:
    > wrote in


    >>Why not? If the download can be piped into gunzip there should be
    >>no problem.

    >
    >
    > Can gunzip start decompressing before it has seen the whole file? I
    > don't know the format very well.


    Yes, it can.

    It can also compress on-the-fly, so you can compress on one side of a
    network connection and decompress on the other side. If the CPU is fast
    enough, this can be quite a speedup.

    I guess it (de)compresses block-wise.
    --
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize
    -- T. Pratchett
    Josef Moellers, Feb 20, 2006
    #11
  12. Aaron Baugher <> wrote in
    news::

    > "A. Sinan Unur" <> writes:
    >
    >> Can gunzip start decompressing before it has seen the whole file? I
    >> don't know the format very well.

    >
    > Yes. That's why you can do either of these:
    >
    > gunzip file.gz
    > gunzip -c <file.gz >file
    >
    > The first one eliminates file.gz after creating file; the second one
    > does not, since gunzip is getting file.gz streamed from stdin and
    > doesn't even know it exists as a file.


    In principle, that alone would not rule out gzip waiting until the complete
    compressed file is available somewhere (such as in $TEMP). However, having
    looked at the algorithm, I now realize that it probably is possible to
    start decompressing the file before seeing all the data.

    Sinan
    A. Sinan Unur, Feb 20, 2006
    #12
  13. Josef Moellers <> wrote in news:dtcnlv
    $rdu$-siemens.com:

    > A. Sinan Unur wrote:
    >> wrote in

    >
    >>>Why not? If the download can be piped into gunzip there should be
    >>>no problem.

    >>
    >>
    >> Can gunzip start decompressing before it has seen the whole file? I
    >> don't know the format very well.

    >
    > Yes, it can.
    >
    > It can also compress on-the-fly, so you can compress on one side of a
    > network connection and decompress on the other side. If the CPU is fast
    > enough, this can be quite a speedup.
    >
    > I guess it (de)compresses block-wise.


    Indeed, that's what it looks like:

    http://www.gzip.org/algorithm.txt

    Sinan
    A. Sinan Unur, Feb 20, 2006
    #13
  14. none

    Guest

    "John W. Kennedy" <> wrote:
    > wrote:
    > > "John W. Kennedy" <> wrote:
    > >> wrote:
    > >>> "none" <> wrote:
    > >>>> I am trying to gunzip a file while downloading it.
    > >>>>
    > >>>> it is 11 gigs in total, so it is an important task.
    > >>> Is 11 gigs supposed to have some magic significance to us? It
    > >>> doesn't. If it is important, then tell us why it is important.
    > >> What hardware do you use, where gunzipping an 11GB file takes
    > >> negligible time?

    > >
    > > What hardware do you use where gunzipping a file while you are
    > > downloading requires a non-negligibly different amount of time than
    > > gunzipping it after you download it?

    >
    > Since the bottleneck will normally be the download, downloading while
    > gunzipping will normally take only a fraction of a second more than the
    > time needed for the download alone.


    I don't know. I don't expect he will be downloading 11 Gig over a
    dial-up line. So, especially if the gzip did a good job and therefore
    the uncompressed data is several times more than 11 Gig, I think writing
    the uncompressed data to the local disk might actually be the bottleneck.
    In which case he should probably store it zipped and unzip on the fly with
    a pipe open when he goes to use the data.

    In any case, if the OP had told us, we wouldn't have to speculate.

    Xho
    , Feb 20, 2006
    #14
  15. Aaron Baugher wrote:
    > "John W. Kennedy" <> writes:
    >
    >> What hardware do you use, where gunzipping an 11GB file takes
    >> negligible time?

    >
    > The point is that long uncompression time has nothing to do with
    > whether your perl code is correct, so you're confusing the issue by
    > even bringing it up.


    It isn't my code, and I never claimed it to be correctly coded. I was
    responding to a challenge to it being done at all.

    > Even so, if your program will be run
    > interactively (especially if as a CGI), you would be better off *not*
    > adding that delay to your program. Better would be to spawn off a
    > background process to decompress it,


    It is not "my" program, it is clearly intended to run on the client, and
    the entire point was to run it in a spawned process; the original poster
    just got the syntax wrong.

    --
    John W. Kennedy
    "But now is a new thing which is very old--
    that the rich make themselves richer and not poorer,
    which is the true Gospel, for the poor's sake."
    -- Charles Williams. "Judgement at Chelmsford"
    John W. Kennedy, Feb 21, 2006
    #15
  16. none <> wrote:
    > what is the proper command format? everything I tried did not work. I
    > tried a large number of combinations


    > open (my $fh, "gunzip -c");


    this tries to open file "gunzip", which is bad.

    > $ftp->get("outfile.csv.gz", $fh);


    Since you are saving the file, anything opening it will only read what is
    already downloaded and then finish.

    Simple, but unportable approach:

    my $url = "ftp://server/filename.gz" ;
    open ( my $fh, "wget -q -O - $url | gunzip -c|" ) or die "Problem: $!\n" ;

    while ( <$fh> ) {
    # do something

    }

    close $fh ;

    Maybe that would do in your case. Otherwise, there are Perl modules
    that let you gunzip files on the fly (like PerlIO::gzip).
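
    As a sketch of the pure-Perl route, the core IO::Uncompress::Gunzip
    module can wrap a readable filehandle and hand back decompressed lines.
    An in-memory handle stands in for the download below, and the sample
    data is invented for the example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Stand-in for the compressed download.
my $csv = "id,name\n1,alpha\n2,beta\n";
gzip(\$csv => \my $gz) or die "gzip failed: $GzipError";

# Any readable handle could go here; an in-memory one keeps the demo local.
open(my $raw, '<', \$gz) or die "Cannot open in-memory handle: $!";
binmode $raw;

my $z = IO::Uncompress::Gunzip->new($raw)
    or die "gunzip failed: $GunzipError";

# Read decompressed data one line at a time.
my @lines;
while (defined(my $line = $z->getline)) {
    chomp $line;
    push @lines, $line;    # "do something" with each line here
}
$z->close;

print scalar(@lines), " lines read\n";
```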

    j.

    --
    January Weiner, Feb 22, 2006
    #16
  17. January Weiner wrote:
    > none <> wrote:
    >
    >>what is the proper command format? everything I tried did not work. I
    >>tried a large number of combinations

    >
    >
    >> open (my $fh, "gunzip -c");

    >
    >
    > this tries to open file "gunzip", which is bad.


    Slight correction: it will (attempt to) open the file "gunzip -c",
    easily checked by

    open(my $fh, '>', "gunzip -c");

    The failure would be evident, if one checked the return value of open:

    open(my $fh, "gunzip -c") or die "$0: cannot start gunzip: $!";

    However, note that "You are not allowed to "open" to a command that
    pipes both in and out, but see IPC::Open2, IPC::Open3, and
    "Bidirectional Communication with Another Process" in perlipc for
    alternatives." (perldoc -f open), so this would not work at all as desired.

    Josef
    --
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize
    -- T. Pratchett
    Josef Moellers, Feb 22, 2006
    #17
  18. none

    none Guest

    Sinan,

    why don't you grow up before you post a response like that.

    How can I post sample code if I don't know how to write the code to do
    it?

    Why would something that is important to me not be worth posting about?
    If you don't think it is important, don't reply.

    I know what I need, I just don't know how the PERL IMPLEMENTATION does
    it. Believe it or not, every programming language is different. OS's
    are different too.

    For those who wish to know how I solved it... load_test is my custom
    parsing script that imports into the database while it is unzipping.
    And yes it is faster than downloading, unzipping, and importing. In
    fact it takes about the same time to download whether you import to the
    database and unzip or not.

    The myth that decompressing slows down a cpu and uses too many process
    cycles was broken such a long time ago; I am surprised it is still
    around. The bottleneck is the bandwidth, as another user pointed out.

    open (my $fh, "|gunzip -c |cut -f1,8 | perl ./load_test.pl");
    $ftp->binary;
    $ftp->get("ffilename.gz", $fh);
    close $fh;
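
    For anyone adapting this, a more defensive sketch of the same idea is
    below. The host, credentials, remote file name and downstream pipeline
    are placeholders supplied by the caller, not values from this thread,
    and the function name is made up. It is only defined here, not called,
    since running it needs a live FTP server.

```perl
use strict;
use warnings;
use Net::FTP;

# Stream a remote file through a shell pipeline while it downloads.
# All arguments are caller-supplied placeholders, for example:
#   ftp_get_through_pipe(
#       host        => 'ftp.example.com',
#       user        => 'anonymous',
#       password    => 'guest@example.com',
#       remote_file => 'filename.gz',
#       pipeline    => 'gunzip -c | cut -f1,8 | perl ./load_test.pl',
#   );
sub ftp_get_through_pipe {
    my (%arg) = @_;

    my $ftp = Net::FTP->new($arg{host})
        or die "Cannot connect to $arg{host}: $@";
    $ftp->login($arg{user}, $arg{password})
        or die "Login failed: " . $ftp->message;

    # Compressed data must be transferred in binary mode.
    $ftp->binary
        or die "Cannot switch to binary mode: " . $ftp->message;

    # '|-' opens the pipeline for writing; get() writes into the handle.
    open(my $fh, '|-', $arg{pipeline})
        or die "Cannot start '$arg{pipeline}': $!";
    binmode $fh;

    my $ok          = $ftp->get($arg{remote_file}, $fh);
    my $ftp_message = $ftp->message;
    $ftp->quit;

    # close waits for the pipeline and reflects its exit status in $?.
    my $closed = close $fh;
    die "Download failed: $ftp_message" unless $ok;
    die "Pipeline failed: " . ($! || "exit status $?") unless $closed;
    return 1;
}
```

    Checking the return values of open, get and close matters on a long
    transfer, because a failure partway through would otherwise go unnoticed.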
    none, Feb 22, 2006
    #18