gunzip while downloading via ftp

none

I am trying to gunzip a file while downloading it.

it is 11 gigs in total, so it is an important task.

what is the proper command format? everything I tried did not work. I
tried a large number of combinations

open (my $fh, "gunzip -c");
$ftp->get("outfile.csv.gz", $fh);
close $fh;
 
A. Sinan Unur

I am trying to gunzip a file while downloading it.

it is 11 gigs in total, so it is an important task.

It may be important to you.
what is the proper command format?

What command format? Perl has statements, functions, operators,
variables, modules etc but no commands.
everything I tried did not work. I tried a large number of
combinations

open (my $fh, "gunzip -c");

This tries to open a file named "gunzip -c" for reading. See for
yourself:

open my $fh, 'gunzip -c' or die "Cannot open 'gunzip -c': $!";

What you probably want is to open a pipe to gunzip.

Please read the posting guidelines for this group. Post a short but
complete script which others can compile and run.


Sinan
 
xhoster

none said:
I am trying to gunzip a file while downloading it.

it is 11 gigs in total, so it is an important task.

Is 11 gigs supposed to have some magic significance to us? It doesn't. If
it is important, then tell us why it is important.

Aside from which, if the data is so large, then wouldn't it be important to
*not* unzip it?

what is the proper command format? everything I tried did not work. I
tried a large number of combinations

open (my $fh, "gunzip -c");

###perhaps you are trying for this?

open (my $fh, "| gunzip -c") or die $!;
$ftp->binary;
$ftp->get("outfile.csv.gz", $fh) or die $ftp->message;
close $fh or die "gunzip failed: $! (status $?)";
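For reference, a fuller sketch of the same piped-open approach, assuming Net::FTP (part of libnet, bundled with Perl) and a gunzip binary on PATH. The sub names, the anonymous login, and the command-line arguments are illustrative placeholders, not the original poster's setup. Net::FTP's get() accepts an open filehandle as its local file, so the compressed bytes go straight into gunzip as they arrive.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Net::FTP;

# Start "gunzip -c" with its output redirected to $out_path, then let
# $writer print the compressed bytes into the pipe. Assumes gunzip is
# on PATH and that $out_path contains no shell metacharacters.
sub gunzip_into {
    my ($out_path, $writer) = @_;
    open(my $pipe, '|-', "gunzip -c > $out_path")
        or die "Cannot start gunzip: $!";
    binmode $pipe;                  # the compressed stream is binary
    $writer->($pipe);
    # close() waits for gunzip to finish; a non-zero exit status lands in $?
    close $pipe or die "gunzip failed: $! (exit status $?)\n";
    return 1;
}

# Download $remote from $host straight into the gunzip pipe.
# The anonymous login is a placeholder; substitute real credentials.
sub fetch_and_gunzip {
    my ($host, $remote, $out_path) = @_;
    my $ftp = Net::FTP->new($host) or die "Cannot connect to $host: $@";
    $ftp->login('anonymous', 'anonymous@')
        or die "Login failed: ", $ftp->message;
    $ftp->binary;                   # ASCII mode could corrupt gzip data
    gunzip_into($out_path, sub {
        # get() writes into the filehandle we pass instead of a file name
        $ftp->get($remote, $_[0]) or die "get failed: ", $ftp->message;
    });
    $ftp->quit;
    return 1;
}

# Usage: perl fetch_gunzip.pl HOST REMOTE_FILE LOCAL_FILE
fetch_and_gunzip(@ARGV) if @ARGV == 3;
```

Run it as, say, `perl fetch_gunzip.pl ftp.example.com outfile.csv.gz outfile.csv` (script name and host are hypothetical). Splitting the pipe handling from the FTP code keeps the gunzip part testable without a server.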

Xho
 
John W. Kennedy

Is 11 gigs supposed to have some magic significance to us? It doesn't. If
it is important, then tell us why it is important.

What hardware do you use, where gunzipping an 11GB file takes negligible
time?
 
John W. Kennedy

What hardware do you use where gunzipping a file while you are downloading
it requires a non-negligibly different amount of time than gunzipping it after
you download it?

Since the bottleneck will normally be the download, downloading while
gunzipping will normally take only a fraction of a second more than the
time needed for the download alone.

--
John W. Kennedy
"But now is a new thing which is very old--
that the rich make themselves richer and not poorer,
which is the true Gospel, for the poor's sake."
-- Charles Williams. "Judgement at Chelmsford"
 
Aaron Baugher

John W. Kennedy said:
What hardware do you use, where gunzipping an 11GB file takes
negligible time?

The point is that long uncompression time has nothing to do with
whether your perl code is correct, so you're confusing the issue by
even bringing it up. Even so, if your program will be run
interactively (especially if as a CGI), you would be better off *not*
adding that delay to your program. Better would be to spawn off a
background process to decompress it, or dump it into a directory where
a cron job occasionally decompresses any new files found there.
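A minimal sketch of the "spawn off a background process" option, assuming a Unix-like system with fork and gunzip on PATH; `gunzip_in_background` is a hypothetical helper name. Running `gunzip FILE.gz` replaces FILE.gz with FILE, and the calling program gets control back immediately.

```perl
use strict;
use warnings;
use POSIX ();

# Start "gunzip $path" in a child process and return the child's PID
# right away, so the calling program can carry on with other work.
# Assumes gunzip is on PATH.
sub gunzip_in_background {
    my ($path) = @_;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                              # child
        exec('gunzip', $path) or warn "exec gunzip failed: $!\n";
        POSIX::_exit(127);                        # skip the parent's cleanup
    }
    return $pid;                                  # parent
}
```

The caller should eventually reap the child with waitpid (or a $SIG{CHLD} handler); otherwise it lingers as a zombie until the parent exits.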

Many FTP servers will decompress files on the fly as you download
them, but that will cost CPU at the server end and bandwidth at both
ends, so I wouldn't do that unless you own the FTP server.
 
A. Sinan Unur

Since the bottleneck will normally be the download, downloading while
gunzipping will normally take only a fraction of a second more than
the time needed for the download alone.

Are you claiming that the de-compression process can begin before the
whole file is downloaded? I have to admit that I do not know if it can
or cannot. I am just curious to know. I would have thought that kind of
decompression (one that does not require the compressed file to be
stored on the filesystem first) could only be done with RLE.

Sinan
 
axel

Are you claiming that the de-compression process can begin before the
whole file is downloaded? I have to admit that I do not know if it can
or cannot. I am just curious to know. I would have thought that kind of
decompression (one that does not require the compressed file to be
stored on the filesystem first) could only be done with RLE.

Why not? If the download can be piped into gunzip there should be
no problem.

Some FTP sites enable downloading compressed files with decompression
on the fly but I have never bothered experimenting with this feature.

Axel
 
A. Sinan Unur

axel wrote:
Why not? If the download can be piped into gunzip there should be
no problem.

Can gunzip start decompressing before it has seen the whole file? I
don't know the format very well.
Some FTP sites enable downloading compressed files with decompression
on the fly but I have never bothered experimenting with this feature.

In that case, the original file already exists on the FTP server. That
is not the same as the client piping the input stream through gunzip.

I am not claiming to know. I guess I should run a couple of experiments
instead of taking up bandwidth here.

Sinan
 
Aaron Baugher

A. Sinan Unur said:
Can gunzip start decompressing before it has seen the whole file? I
don't know the format very well.

Yes. That's why you can do either of these:

gunzip file.gz
gunzip -c <file.gz >file

The first one eliminates file.gz after creating file; the second one
does not, since gunzip is getting file.gz streamed from stdin and
doesn't even know it exists as a file.
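The same thing can be seen from Perl. This sketch feeds a gzip stream to zlib in fixed-size pieces, the way a network read would deliver it, and collects the output produced by each piece. It assumes Compress::Raw::Zlib (bundled with Perl since 5.10); `inflate_in_chunks` and the 4096-byte chunk size are illustrative choices. For typical input, the very first piece already yields decompressed text, long before the end of the compressed data has been seen.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib;

# Feed a gzip stream to zlib in fixed-size pieces, the way reads from a
# socket or pipe would deliver it, and return the decompressed output
# produced after each piece.
sub inflate_in_chunks {
    my ($gz_data, $chunk_size) = @_;
    # WindowBits 15 + 16: 32K window, and expect a gzip (RFC 1952) header
    my ($inflater, $status) =
        Compress::Raw::Zlib::Inflate->new(-WindowBits => 15 + 16);
    die "Cannot create inflater: $status\n" unless $status == Z_OK;

    my @pieces;
    for (my $off = 0; $off < length $gz_data; $off += $chunk_size) {
        my $in  = substr($gz_data, $off, $chunk_size);
        my $out = '';
        $status = $inflater->inflate($in, $out);   # consumes $in
        die "inflate error: $status\n"
            unless $status == Z_OK || $status == Z_STREAM_END;
        push @pieces, $out;    # usable now, before later pieces arrive
        last if $status == Z_STREAM_END;
    }
    die "gzip stream ended early\n" unless $status == Z_STREAM_END;
    return @pieces;
}
```

Deflate only needs the previous 32 KB of output to decode what comes next, which is why no temporary copy of the whole compressed file is required.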
 
Josef Moellers

A. Sinan Unur said:
Can gunzip start decompressing before it has seen the whole file? I
don't know the format very well.

Yes, it can.

It can also compress on-the-fly, so you can compress on one side of a
network connection and decompress on the other side. If the CPU is fast
enough, this can be quite a speedup.

I guess it (de)compresses block-wise.
 
A. Sinan Unur

Yes. That's why you can do either of these:

gunzip file.gz
gunzip -c <file.gz >file

The first one eliminates file.gz after creating file; the second one
does not, since gunzip is getting file.gz streamed from stdin and
doesn't even know it exists as a file.

In principle, that alone would not rule out gzip waiting until a complete
compressed file is available somewhere (such as in $TEMP). However, having
looked at the algorithm, I now realize that it probably is possible to
start decompressing the file before seeing all the data.

Sinan
 
xhoster

John W. Kennedy said:
Since the bottleneck will normally be the download, downloading while
gunzipping will normally take only a fraction of a second more than the
time needed for the download alone.

I don't know. I don't expect he will be downloading 11 GB over a
dial-up line. So, especially if gzip did a good job and the uncompressed
data is therefore several times larger than 11 GB, I think writing the
uncompressed data to the local disk might actually be the bottleneck.
In that case he should probably store it gzipped and decompress it on the
fly with a piped open when he goes to use the data.
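A sketch of that last suggestion: keep the data gzipped on disk and read it through a piped open only when it is needed. It assumes a gunzip binary on PATH; `each_gz_line` and its callback argument are hypothetical names.

```perl
use strict;
use warnings;

# Read a gzipped file line by line through "gunzip -c", so the
# uncompressed data never has to be written to disk. Calls $handle_line
# for each line and returns the number of lines read.
sub each_gz_line {
    my ($path, $handle_line) = @_;
    # List form of a piped open: no shell is involved, so $path needs no quoting
    open(my $in, '-|', 'gunzip', '-c', $path)
        or die "Cannot start gunzip: $!";
    my $count = 0;
    while (my $line = <$in>) {      # Perl adds the implicit defined() test here
        $handle_line->($line);
        $count++;
    }
    close $in or die "gunzip failed: $! (exit status $?)\n";
    return $count;
}
```

For example, `each_gz_line('outfile.csv.gz', sub { ... })` processes the file without ever materialising the uncompressed copy; checking close() catches a corrupt archive, because gunzip then exits non-zero.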

In any case, if the OP had told us, we wouldn't have to speculate.

Xho
 
John W. Kennedy

Aaron said:
The point is that long uncompression time has nothing to do with
whether your perl code is correct, so you're confusing the issue by
even bringing it up.

It isn't my code, and I never claimed it to be correctly coded. I was
responding to a challenge to it being done at all.
Even so, if your program will be run
interactively (especially if as a CGI), you would be better off *not*
adding that delay to your program. Better would be to spawn off a
background process to decompress it,

It is not "my" program, it is clearly intended to run on the client, and
the entire point was to run it in a spawned process; the original poster
just got the syntax wrong.

January Weiner

none said:
what is the proper command format? everything I tried did not work. I
tried a large number of combinations
open (my $fh, "gunzip -c");

this tries to open file "gunzip", which is bad.
$ftp->get("outfile.csv.gz", $fh);

Since you are saving the file, anything opening it will only read what is
already downloaded and then finish.

Simple, but unportable approach:

my $url = "ftp://server/filename.gz" ;
open ( my $fh, "wget -q -O - $url | gunzip -c|" ) or die "Problem: $!\n" ;

while ( <$fh> ) {
# do something

}

close $fh ;

Maybe that would do in your case. Otherwise, there are Perl modules
that let you gunzip files on the fly (like PerlIO::gzip).
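As a pure-Perl alternative to the external gunzip, here is a sketch using IO::Uncompress::Gunzip (bundled with Perl since 5.10; swapped in here in place of PerlIO::gzip, which is a separate CPAN module). It wraps any handle that delivers gzip bytes, such as a file or a pipe from wget, and hands each decompressed line to a callback; `gunzip_lines` is a hypothetical name.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Read decompressed lines from any handle that delivers gzip bytes
# (a plain file, a pipe from wget, a socket) without an external gunzip.
# Calls $handle_line for each line and returns the number of lines.
sub gunzip_lines {
    my ($raw_fh, $handle_line) = @_;
    binmode $raw_fh;
    my $z = IO::Uncompress::Gunzip->new($raw_fh)
        or die "Cannot read gzip stream: $GunzipError\n";
    my $count = 0;
    while (defined(my $line = $z->getline)) {
        $handle_line->($line);
        $count++;
    }
    $z->close;
    return $count;
}

# Example with the URL placeholder from the post above (needs wget):
#   open(my $raw, '-|', 'wget', '-q', '-O', '-', 'ftp://server/filename.gz')
#       or die "Cannot start wget: $!";
#   gunzip_lines($raw, sub { print $_[0] });
```

This removes one external process from the pipeline at the cost of doing the decompression inside the Perl process.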

j.

Josef Moellers

January said:
this tries to open file "gunzip", which is bad.

Slight correction: it will (attempt to) open the file "gunzip -c",
easily checked by

open(my $fh, '>', "gunzip -c");

The failure would be evident if one checked the return value of open:

open(my $fh, "gunzip -c") or die "$0: cannot start gunzip: $!";

However, note that "You are not allowed to "open" to a command that
pipes both in and out, but see IPC::Open2, IPC::Open3, and
"Bidirectional Communication with Another Process" in perlipc for
alternatives." (perldoc -f open), so this would not work at all as desired.
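For completeness, here is a sketch of the IPC::Open2 route with gunzip, assuming a Unix-like system with fork and gunzip on PATH. Writing all the input before reading any output can deadlock once the pipe buffers fill, so a forked child does the writing while the parent reads. `gunzip_via_open2` is a hypothetical name; for the original problem, a one-way pipe into gunzip remains the simpler answer.

```perl
use strict;
use warnings;
use IPC::Open2;
use POSIX ();

# Feed compressed bytes to gunzip's stdin and read the decompressed
# bytes back from its stdout. A forked child does all the writing so
# that neither side can block forever on a full pipe buffer.
sub gunzip_via_open2 {
    my ($gz_data) = @_;
    my $gunzip_pid = open2(my $from_gunzip, my $to_gunzip, 'gunzip', '-c');
    binmode $_ for $from_gunzip, $to_gunzip;

    my $writer_pid = fork;
    die "fork failed: $!" unless defined $writer_pid;
    if ($writer_pid == 0) {              # child: writer only
        close $from_gunzip;
        print {$to_gunzip} $gz_data;
        close $to_gunzip;                # flush, then signal EOF to gunzip
        POSIX::_exit(0);                 # skip the parent's cleanup
    }

    close $to_gunzip;                    # parent: reader only
    my $plain = do { local $/; <$from_gunzip> };
    close $from_gunzip;
    waitpid($writer_pid, 0);
    waitpid($gunzip_pid, 0);
    die "gunzip exited with status $?\n" if $?;
    return $plain;
}
```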

Josef
 
none

Sinan,

why don't you grow up before you post a response like that.

How can I post sample code if I don't know how to write the code to do
it?

Why would something that is important to me not be worth posting about?
If you don't think it is important, don't reply.

I know what I need, I just don't know how the PERL IMPLEMENTATION does
it. Believe it or not, every programming language is different. OS's
are different too.

For those who wish to know how I solved it... load_test is my custom
parsing script that imports into the database while it is unzipping.
And yes it is faster than downloading, unzipping, and importing. In
fact it takes about the same time to download whether you import to the
database and unzip or not.

The myth that decompressing slows down a CPU and uses too many processor
cycles was broken such a long time ago; I am surprised it is still
around. The bottleneck is the bandwidth, as another user pointed out.

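open (my $fh, "|gunzip -c |cut -f1,8 | perl ./load_test.pl")
    or die "Cannot start pipeline: $!";
$ftp->binary;
$ftp->get("ffilename.gz", $fh) or die "get failed: ", $ftp->message;
close $fh or die "Pipeline failed: $! (status $?)";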
 
