gunzip stdin stream?

Markus Dehmann

I have a convenient way to open possibly gzip'ed files:

open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");

So, if the file name ends in .gz I send it through gunzip. So far, so
good. (I don't want to use the PerlIO::gzip module because it's not
installed by default, so it's a hassle.)

But now, my script should be callable in the following ways:
$ cat data | ./script.pl
$ ./script.pl data.gz
$ ./script.pl data

Usually, I would just use the while loop: while(<>){...}. But that does
not read gzip'ed data.

How would you handle that? I could think of the following code, but it's
long and not nice ...

if(defined $ARGV[0] && -f $ARGV[0]){
    readFromFile($ARGV[0]);
}else{
    readFromStdin();
}

sub readFromFile{
    my ($f) = @_;
    open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f")
        or die("Could not open $f: $!");
    while(<F>){
        processLine($_);
    }
    close F;
}

sub readFromStdin{
    while(<>){
        processLine($_);
    }
}

sub processLine{ ... }


Thanks!
Markus
 
attn.steven.kuo

Markus said:
I have a convenient way to open possibly gzip'ed files:

open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");

So, if the file name ends in .gz I send it through gunzip. So far, so
good. (I don't want to use the PerlIO:Gzip module because it's not
installed by default, so it's a hassle.)

But now, my script should be callable in the following ways:
$ cat data | ./script.pl
$ ./script.pl data.gz
$ ./script.pl data

Usually, I would just use the while loop: while(<>){...}. But that does
not read gzip'ed data.

How would you handle that? I could think of the following code, but it's
long and not nice ...

if(defined $ARGV[0] && -f $ARGV[0]){
readFromFile($ARGV[0]);
}else{
readFromStdin();
}

(snipped)

Look under 'perldoc perlopentut'
where the minus (-) file is discussed:

my $input = defined($ARGV[0]) ? $ARGV[0] : '-';
$input = $input =~ /\.gz$/
    ? "gunzip -c $input |"
    : $input;

open (FH, $input)
    or die $!;

process_line($_) while (<FH>);

close FH;
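
For readers who prefer lexical filehandles, here is a minimal sketch of the same idea using the three-argument open. It is not from the thread: the helper name open_input and the temporary demo file are made up for illustration, and it assumes gunzip is on the PATH. The list form of the piped open hands the file name to gunzip as a single argument, so the shell never sees it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the thread): return a read handle for a plain
# file, a .gz file, or STDIN when no name (or "-") is given.
sub open_input {
    my ($name) = @_;
    return \*STDIN if !defined $name || $name eq '-';

    my $fh;
    if ($name =~ /\.gz\z/) {
        # List form of the piped open: gunzip gets the name as one argument.
        open($fh, '-|', 'gunzip', '-c', $name)
            or die "Cannot run gunzip on '$name': $!\n";
    }
    else {
        open($fh, '<', $name)
            or die "Cannot open '$name': $!\n";
    }
    return $fh;
}

# Small demonstration on a temporary plain file.
my ($tmp, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp} "first\nsecond\n";
close $tmp;

my $in = open_input($tmp_name);
while (my $line = <$in>) {
    print "read: $line";    # stand-in for process_line($line)
}
close $in;
```

In a real script the call would be open_input($ARGV[0]), which covers all three ways of invoking the script that Markus listed.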
 
jgraber

Markus said:
I have a convenient way to open possibly gzip'ed files:
open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");

So, if the file name ends in .gz I send it through gunzip. So far, so
good. (I don't want to use the PerlIO:Gzip module because it's not
installed by default, so it's a hassle.)

But now, my script should be callable in the following ways:
$ cat data | ./script.pl
$ ./script.pl data.gz
$ ./script.pl data

Usually, I would just use the while loop: while(<>){...}. But that does
not read gzip'ed data.
(snipped)

Look under 'perldoc perlopentut'
where the minus (-) file is discussed:

my $input = defined($ARGV[0]) ? $ARGV[0] : '-';
$input = $input =~ /\.gz$/
? "gunzip -c $input |"
: $input ;
open (FH, $input)
or die $!;
process_line($_) while (<FH>);
close FH;

I discovered that my currently installed version of gzip, run as
gzip -dfc, correctly reads plain files, gzipped files (.gz),
and even compress'ed files (.Z). So now I use gzip -dfc
for everything. According to top, it uses only 1%
of the CPU when called uselessly. It also works
for the occasional file that is gzipped without a .gz
extension, or vice versa. I remember it working for
$infile = "-" as well, for those gzipped output pipes.

I've been recommending this as the "universal input pipe":
$gzip_pid = open( FH, $fp="/usr/local/bin/gzip -dfc $infile |" )
    || die "Can't open input pipe '$fp' : $!\n";

I'm primarily used to writing in Perl 4 style.
I'd welcome the likely followup to this post with an
example of a more modern style.
Is this a security hole for the occasional
maliciously named file like "x;rm -rf / "?
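
One way around the shell question altogether, offered as a sketch rather than anything proposed in the thread: IO::Uncompress::Gunzip, which I believe has shipped with Perl since 5.10, reads gzipped data in-process and, in its transparent mode, passes data that is not gzipped through unchanged. The helper name read_lines and the temporary files are hypothetical. As far as I know it does not handle .Z files, so it is not a full replacement for gzip -dfc.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper (not from the thread): read every line from a plain or
# gzipped file, or from STDIN when no name is given.
sub read_lines {
    my ($name) = @_;
    my $source = defined $name ? $name : \*STDIN;

    # Transparent => 1 lets data that is not gzip-compressed pass through as-is.
    my $z = IO::Uncompress::Gunzip->new($source, Transparent => 1)
        or die "Cannot read input: $GunzipError\n";

    my @lines;
    while (defined(my $line = $z->getline)) {
        push @lines, $line;
    }
    $z->close;
    return @lines;
}

# Demonstration: the same text stored plain and gzipped, the latter
# deliberately without a .gz extension.
my $text = "alpha\nbeta\n";
my (undef, $plain)  = tempfile(UNLINK => 1);
my (undef, $packed) = tempfile(UNLINK => 1);

open(my $out, '>', $plain) or die "Cannot write '$plain': $!\n";
print {$out} $text;
close $out;
gzip(\$text => $packed) or die "gzip failed: $GzipError\n";

print "plain:   ", read_lines($plain);
print "gzipped: ", read_lines($packed);
```

Because no external command is started, a file name like "x;rm -rf / " is only ever treated as a file name.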
 
Anno Siegel

Markus Dehmann said:
I have a convenient way to open possibly gzip'ed files:

open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");

So, if the file name ends in .gz I send it through gunzip. So far, so
good. (I don't want to use the PerlIO:Gzip module because it's not
installed by default, so it's a hassle.)

But now, my script should be callable in the following ways:
$ cat data | ./script.pl
$ ./script.pl data.gz
$ ./script.pl data

Usually, I would just use the while loop: while(<>){...}. But that does
not read gzip'ed data.

How would you handle that? I could think of the following code, but it's
long and not nice ...

[snip]

/\.gz$/ and $_ = "gunzip -c $_ |" for @ARGV;
print while <>;

Anno
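
Why this works: <> opens each element of @ARGV with the two-argument open, so a string ending in | is run as a command, and with an empty @ARGV it reads STDIN. The catch is that the file name passes through the shell. Below is a hedged sketch that quotes the name first; rewrite_args is a hypothetical helper, not part of Anno's post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): rewrite *.gz arguments into
# "command |" strings, single-quoting the file name for the shell.
sub rewrite_args {
    my @args = @_;
    for my $arg (@args) {
        next unless $arg =~ /\.gz\z/;
        (my $quoted = $arg) =~ s/'/'\\''/g;    # close, escape, reopen the quote
        $arg = "gunzip -c '$quoted' |";
    }
    return @args;
}

# Show what <> would be asked to open for a few sample arguments.
print "$_\n" for rewrite_args('data.gz', 'data', "it's.gz");
```

Assign the result back to @ARGV before the while (<>) loop. Plain names are still subject to the two-argument open's own magic, for example a leading > or a trailing |.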
 
Markus Dehmann

Markus said:
I have a convenient way to open possibly gzip'ed files:
open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");

So, if the file name ends in .gz I send it through gunzip. So far, so
good. (I don't want to use the PerlIO:Gzip module because it's not
installed by default, so it's a hassle.)

But now, my script should be callable in the following ways:
$ cat data | ./script.pl
$ ./script.pl data.gz
$ ./script.pl data

Usually, I would just use the while loop: while(<>){...}. But that does
not read gzip'ed data.

(snipped)

Look under 'perldoc perlopentut'
where the minus (-) file is discussed:

my $input = defined($ARGV[0]) ? $ARGV[0] : '-';
$input = $input =~ /\.gz$/
? "gunzip -c $input |"
: $input ;
open (FH, $input)
or die $!;
process_line($_) while (<FH>);
close FH;


I discovered that my currently installed version of gzip, run as
gzip -dfc, correctly reads plain files, gzipped files (.gz),
and even compress'ed files (.Z). So now I use gzip -dfc
for everything. According to top, it uses only 1%
of the CPU when called uselessly. It also works
for the occasional file that is gzipped without a .gz
extension, or vice versa. I remember it working for
$infile = "-" as well, for those gzipped output pipes.

I've been recommending this as the "universal input pipe",
$gzip_pid = open( FH, $fp="/usr/local/bin/gzip -dfc $infile |" )

Now, a slightly offtopic question:

Why do people often use the full path to an application (like here,
/usr/local/bin/gzip)? That just makes it less likely to work, since
my gzip might be in /usr/bin.

Why not just: open(F, "gzip -dfc $infile |");


Same thing with the perl command: Why don't we write
#!perl -w

as the first line of a perl program, and let the $PATH variable figure
out which perl is meant?

Thanks!
Markus
 
Anno Siegel

Markus Dehmann said:
[snip]

Now, a slightly offtopic question:

Why do people often use the full path to an application (like here,
/usr/local/bin/gzip)? That just makes it more unlikely to work, since
my gzip might be in /usr/bin.

If it is run in an environment with no path (or an unusual path), it
will fail.

Why not just: open(F, "gzip -dfc $infile |");

The command path is a convenience for interactive use. In a program
(even a "script") you want to be sure what executable you're calling,
mostly for security and reliability reasons.

Same thing with the perl command: Why don't we write
#!perl -w

as the first line of a perl program, and let the $PATH variable figure
out which perl is meant?

See above. You want to know your interpreter.

Anno
 
jgraber

Markus Dehmann said:
I have a convenient way to open possibly gzip'ed files:
open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");

[gzip -d reads .txt, .gz, .Z files]
... the "universal input pipe",
$gzip_pid = open( FH, $fp="/usr/local/bin/gzip -dfc $infile |" )

Markus Dehmann said:
Now, a slightly offtopic question:

Why do people often use the full path to an application (like here,
/usr/local/bin/gzip)? That just makes it more unlikely to work, since
my gzip might be in /usr/bin.

Why not just: open(F, "gzip -dfc $infile |");

Same thing with the perl command: Why don't we write
#!perl -w

as the first line of a perl program, and let the $PATH variable figure
out which perl is meant?

I do it because I write scripts for an audience that may have a
variety of PATH settings based on their group/culture/history,
and "I think I know better than they do" what path I want to use,
i.e. the one that works for me.
Plus I might have read a security warning somewhere that said
a full path is better because it doesn't call the shell interpreter?
But that doesn't appear to be the case nowadays (perl 5.8.0).

Diverging back to a related subject, I tried updating this to
PBP (Perl Best Practices) recommendations, but am puzzled by some of the results.

#!/usr/local/bin/perl
use strict;
use warnings;

## version 0
my ($pid0,$file0,$fp0,$infile0);
$infile0 = "x;touch y0;echo 'twerp'";
$pid0 = open( FH, $fp0 = "/usr/local/bin/gzip -dfc $infile0 |" ) || die "oops '$fp0' :$!\n";
print "v0:pid0='$pid0', fp0='$fp0', infile0 = '$infile0', \nlines = >>",<FH>,"<<\n";
close (FH) || warn "close error on '$fp0' : $!\n";
# results : also creates file 'y0' due to shell processing
#v0:pid0='6900', fp0='/usr/local/bin/gzip -dfc x;touch y0;echo 'twerp' |', infile0 = 'x;touch y0;echo 'twerp'',
#lines = >>this is line 1 of file x
#twerp
#<<

# version 1
my $infile1 = "x;touch y1;echo 'twerp'";
my $fp1; # why can't 'my $fp1' be used in the next line?
my $pid1 = open( my $file1, "-|", $fp1 = "/usr/local/bin/gzip -dfc $infile1" ) || die "oops '$fp1' :$!\n";
print "v1:pid='$pid1', file1='$file1', fp1='$fp1', infile1 = '$infile1', \nline1 = >>",<$file1>,"<<\n";
close ($file1) || warn "close error on '$fp1' : $!\n";
# Results : lexical filehandle and variables; still creates 'y1' due to shell processing
#v1:pid='6903', file1='GLOB(0x804ccb0)', fp1='/usr/local/bin/gzip -dfc x;touch y1;echo 'twerp'', infile1 = 'x;touch y1;echo 'twerp'',
#line1 = >>this is line 1 of file x
#twerp
#<<

# version 2
my $infile2 = "x;touch y2;echo 'twerp'";
my $fp2; # perl v3 p751 "pipe from bare command"
my $pid2 = open( my $file2, "-|", $fp2 = 'gzip', '-dfc', $infile2 ) || die "oops '$fp2' :$!\n";
print "v2:pid2='$pid2', file2='$file2', fp2='$fp2', infile2 = '$infile2', \nline2 = >>",<$file2>,"<<\n";
close ($file2) || warn "close error on '$fp2' : $!\n";
# Results: list-form command, no shell processing, no touch on file 'y2', no error since a file with that literal name exists
#v2:pid2='6906', file2='GLOB(0x8062d94)', fp2='gzip', infile2 = 'x;touch y2;echo 'twerp'',
#line2 = >>this is only line of file "x;touch y2;echo 'twerp'"
#<<

# version 3 (same call as version 2, but no file with that literal name exists)
my $infile3 = "x;touch y3;echo 'twerp'";
my $fp3; # perl v3 p751 "pipe from bare command"
my $pid3 = open( my $file3, "-|", $fp3 = 'gzip', '-dfc', $infile3 ) || die "oops '$fp3' :$!\n";
print "v3:pid3='$pid3', file3='$file3', fp3='$fp3', infile3 = '$infile3', \nline3 = >>",<$file3>,"<<\n";
close ($file3) || warn "close error on '$fp3' : $!\n";
# Results : no shell processing, gzip itself reports an error since no such file exists
#gzip: x;touch y3;echo 'twerp'.gz: No such file or directory
#v3:pid3='6907', file3='GLOB(0x80954ec)', fp3='gzip', infile3 = 'x;touch y3;echo 'twerp'',
#line3 = >><<
#close error on 'gzip' :

# version 4
my $infile4 = "-";
my $fp4; # perl v3 p751 "pipe from bare command" to avoid shell processing. Note: a single string like "gzip -dfc $infile4" can still reach the shell!
my $pid4 = open( my $file4, "-|", $fp4 = 'gzip', '-dfc', $infile4 ) || die "oops '$fp4' :$!\n";
print "v4:pid4='$pid4', file4='$file4', fp4='$fp4', infile4 = '$infile4', \nline4 = >>",<$file4>,"<<\n";
close ($file4) || warn "close error on '$fp4' : $!\n";
#Results: waits for 'keyboardtext^D^D^D' from stdin, works on - for std input.
#v4:pid4='6908', file4='GLOB(0x804cae8)', fp4='gzip', infile4 = '-',
#line4 = >>keyboardtext<<

Summary:
The PBP lexical version of my "gzip -dfc is the universal input pipe" works
without unsafe shell interpretation of the variable $infile4.
Note that the pipe command $fp is now an array:
my $pid5 = open( my $file5, "-|", my @fp5 = ('gzip', '-dfc', $infile5) );
$pid5 || die "oops '@fp5' :$!\n";

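Notes:
1 A lone '/bin/gzip' won't call the shell.
2 A single string such as "/bin/gzip -dfc $infile" is handed to the shell
whenever it contains shell metacharacters, which an interpolated file name can supply.
3 Can't declare 'my $fp4' inside the open and still use $fp4 in the die on the same line,
because a 'my' variable only becomes visible in the statement after its declaration.
my $pid4 = open( my $file4, "-|", my $fp4 = 'gzip', '-dfc', $infile4 ) || die "$fp4:$!\n";
Error: Global symbol "$fp4" requires explicit package name at line 36.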

Question:
Why is there no $! error message printed for version 3 from this line?
close ($file3) || warn "close error on '$fp3' : $!\n";
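
A possible answer, going by perldoc -f close: when the only problem is that the piped program exited non-zero, close returns false with $! set to 0, and the exit status goes into $?. Here is a minimal sketch of checking both. The helper read_pipe is made up, and it uses $^X (the running perl) as the child process so that nothing depends on gzip being installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): run a command in list form
# (no shell), collect its output, and report how close() went.
sub read_pipe {
    my @cmd = @_;    # declared before open, so it is visible in the die below
    my $pid = open(my $fh, '-|', @cmd)
        or die "Cannot start '@cmd': $!\n";
    my @lines = <$fh>;

    my $status = 'ok';
    unless (close $fh) {
        # $! is only meaningful for a system-level failure; a plain
        # non-zero exit leaves $! at 0 and puts the status in $?.
        $status = $! ? "close failed: $!" : 'exit status ' . ($? >> 8);
    }
    return ($status, @lines);
}

my ($status, @out) = read_pipe($^X, '-e', 'print "partial\n"; exit 3');
print "status: $status\n";
print "output: @out";
```

If that reading is right, gzip exiting with a non-zero status in version 3 would explain the empty $!, and printing $? >> 8 in the warn should show the status instead.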
 
axel

Markus Dehmann said:
Same thing with the perl command: Why don't we write
#!perl -w
as the first line of a perl program, and let the $PATH variable figure
out which perl is meant?

Because $PATH is not applied to the 'shebang' line.

sh-2.05a$ cat q1
#!sh

echo "Goat"

sh-2.05a$ sh q1
Goat
sh-2.05a$ ./q1
sh: ./q1: sh: bad interpreter: No such file or directory
sh-2.05a$

Axel
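
The usual workaround, if you do want $PATH to pick the interpreter, is to go through env. A small sketch, assuming env lives at /usr/bin/env (common, but not guaranteed). Passing -w on such a line is not portable, because many kernels hand everything after the interpreter path to it as a single argument, so the sketch uses the warnings pragma instead.

```perl
#!/usr/bin/env perl
# env looks perl up along $PATH; the path to env itself is still absolute.
# Assumption: env is installed as /usr/bin/env on the target systems.
use strict;
use warnings;    # stands in for -w, since extra shebang arguments are not portable

# $^X holds the path of the perl binary that is actually running this script.
print "interpreter: $^X\n";
```

This trades Anno's "know your interpreter" for convenience, so it suits personal scripts better than anything security-sensitive.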
 
