gunzip stdin stream?

Discussion in 'Perl Misc' started by Markus Dehmann, Jan 21, 2006.

  1. I have a convenient way to open possibly gzip'ed files:

    open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");

    So, if the file name ends in .gz I send it through gunzip. So far, so
    good. (I don't want to use the PerlIO::gzip module because it's not
    installed by default, so it's a hassle.)

    But now, my script should be callable in the following ways:
    $ cat data | ./script.pl
    $ ./script.pl data.gz
    $ ./script.pl data

    Usually, I would just use the while loop: while(<>){...}. But that does
    not read gzip'ed data.

    How would you handle that? I could think of the following code, but it's
    long and not nice ...

    if(defined $ARGV[0] && -f $ARGV[0]){
        readFromFile($ARGV[0]);
    }else{
        readFromStdin();
    }

    sub readFromFile{
        my ($f) = @_;
        open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f")
            or die("Could not open $f: $!");
        while(<F>){
            processLine($_);
        }
        close F;
    }

    sub readFromStdin{
        while(<>){
            processLine($_);
        }
    }

    sub processLine{ ... }


    Thanks!
    Markus
     
    Markus Dehmann, Jan 21, 2006
    #1

  2. Markus Dehmann

    Guest

    Markus Dehmann wrote:
    > I have a convenient way to open possibly gzip'ed files:
    >
    > open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");
    >
    > So, if the file name ends in .gz I send it through gunzip. So far, so
    > good. (I don't want to use the PerlIO:Gzip module because it's not
    > installed by default, so it's a hassle.)
    >
    > But now, my script should be callable in the following ways:
    > $ cat data | ./script.pl
    > $ ./script.pl data.gz
    > $ ./script.pl data
    >
    > Usually, I would just use the while loop: while(<>){...}. But that does
    > not read gzip'ed data.
    >
    > How would you handle that? I could think of the following code, but it's
    > long and not nice ...
    >
    > if(defined $ARGV[0] && -f $ARGV[0]){
    > readFromFile($ARGV[0]);
    > }else{
    > readFromStdin();
    > }


    (snipped)

    Look under 'perldoc perlopentut'
    where the minus (-) file is discussed:

    my $input = defined($ARGV[0]) ? $ARGV[0] : '-';
    $input = $input =~ /\.gz$/
        ? "gunzip -c $input |"
        : $input ;

    open (FH, $input)
        or die $!;

    process_line($_) while (<FH>);

    close FH;
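
A minimal sketch of the same idea in a more current style, assuming Perl 5.8 or later and a gunzip on PATH: a lexical filehandle, three-argument open for plain files, and list-form pipe open so the file name never passes through a shell. The helper name open_input is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return a read handle for STDIN, a plain file,
# or the output of "gunzip -c" for names ending in .gz.
sub open_input {
    my ($name) = @_;

    # No argument or '-' means standard input.
    return \*STDIN if !defined $name || $name eq '-';

    my $fh;
    if ($name =~ /\.gz\z/) {
        # List form: gunzip is executed directly, no shell parses $name.
        open($fh, '-|', 'gunzip', '-c', $name)
            or die "Cannot run gunzip on '$name': $!\n";
    }
    else {
        open($fh, '<', $name)
            or die "Cannot open '$name': $!\n";
    }
    return $fh;
}

# Usage (left as a comment so the sketch does not block on STDIN):
#   my $fh = open_input($ARGV[0]);
#   process_line($_) while <$fh>;
#   close $fh;
```

Unlike the two-argument open above, this does not treat a leading or trailing special character in the file name as an open mode.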

    --
    Hope this helps,
    Steven
     
    , Jan 21, 2006
    #2

  3. Markus Dehmann

    Guest

    "" <> writes:
    > Markus Dehmann wrote:
    > > I have a convenient way to open possibly gzip'ed files:
    > > open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");
    > >
    > > So, if the file name ends in .gz I send it through gunzip. So far, so
    > > good. (I don't want to use the PerlIO:Gzip module because it's not
    > > installed by default, so it's a hassle.)
    > >
    > > But now, my script should be callable in the following ways:
    > > $ cat data | ./script.pl
    > > $ ./script.pl data.gz
    > > $ ./script.pl data
    > >
    > > Usually, I would just use the while loop: while(<>){...}. But that does
    > > not read gzip'ed data.

    > (snipped)
    >
    > Look under 'perldoc perlopentut'
    > where the minus (-) file is discussed:
    >
    > my $input = defined($ARGV[0]) ? $ARGV[0] : '-';
    > $input = $input =~ /\.gz$/
    > ? "gunzip -c $input |"
    > : $input ;
    > open (FH, $input)
    > or die $!;
    > process_line($_) while (<FH>);
    > close FH;


    I discovered that my currently installed version of gzip -d
    would correctly read plain files, gzipped files (.gz),
    and even packed files (.Z). So now I use gzip -d
    for everything. According to top, it uses only 1%
    of the CPU when called uselessly. It also works
    for the occasional file that is gzipped without a .gz
    extension, or vice-versa. I remember it working for
    $infile = "-" as well, for those gzipped output pipes.

    I've been recommending this as the "universal input pipe",
    $gzip_pid = open( FH, $fp="/usr/local/bin/gzip -dfc $infile |" )
    || die "Cant open input pipe '$fp' : $!\n";

    I'm primarily used to writing in perl4 style.
    I'd welcome the likely followup to this post with an
    example of a more modern style.
    Is this a security hole if a file is ever maliciously
    named, like "x;rm -rf / "?
    --
    Joel
     
    , Jan 23, 2006
    #3
  4. Markus Dehmann

    Anno Siegel Guest

    Markus Dehmann <> wrote in comp.lang.perl.misc:
    > I have a convenient way to open possibly gzip'ed files:
    >
    > open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");
    >
    > So, if the file name ends in .gz I send it through gunzip. So far, so
    > good. (I don't want to use the PerlIO:Gzip module because it's not
    > installed by default, so it's a hassle.)
    >
    > But now, my script should be callable in the following ways:
    > $ cat data | ./script.pl
    > $ ./script.pl data.gz
    > $ ./script.pl data
    >
    > Usually, I would just use the while loop: while(<>){...}. But that does
    > not read gzip'ed data.
    >
    > How would you handle that? I could think of the following code, but it's
    > long and not nice ...


    [snip]

    /\.gz$/ and $_ = "gunzip -c $_ |" for @ARGV;
    print while <>;
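
The one-liner works because the magic <> opens each @ARGV entry with two-argument open, so an entry ending in "|" is run as a command. That also means the file name is pasted into a shell command line. A hedged sketch of the same trick with the name quoted for a POSIX shell; shell_quote and rewrite_args are hypothetical helper names, not anything from the thread:

```perl
use strict;
use warnings;

# Quote a string for a POSIX shell: wrap it in single quotes and
# escape any embedded single quote as '\''.
sub shell_quote {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;
    return "'$s'";
}

# Turn names ending in .gz into "gunzip -c ... |" pipe commands that the
# two-argument open behind <> will run; other names pass through unchanged.
sub rewrite_args {
    return map { /\.gz\z/ ? 'gunzip -c ' . shell_quote($_) . ' |' : $_ } @_;
}

# Usage (left as a comment so the sketch does not read STDIN when loaded):
#   @ARGV = rewrite_args(@ARGV);
#   print while <>;
```

Plain names are still handed to two-argument open as they are, so a name that itself starts with "<" or ">" or ends in "|" keeps its special meaning.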

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Jan 23, 2006
    #4
  5. wrote:
    > "" <> writes:
    >
    >>Markus Dehmann wrote:
    >>
    >>>I have a convenient way to open possibly gzip'ed files:
    >>>open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");
    >>>
    >>>So, if the file name ends in .gz I send it through gunzip. So far, so
    >>>good. (I don't want to use the PerlIO:Gzip module because it's not
    >>>installed by default, so it's a hassle.)
    >>>
    >>>But now, my script should be callable in the following ways:
    >>>$ cat data | ./script.pl
    >>>$ ./script.pl data.gz
    >>>$ ./script.pl data
    >>>
    >>>Usually, I would just use the while loop: while(<>){...}. But that does
    >>>not read gzip'ed data.

    >>
    >>(snipped)
    >>
    >>Look under 'perldoc perlopentut'
    >>where the minus (-) file is discussed:
    >>
    >>my $input = defined($ARGV[0]) ? $ARGV[0] : '-';
    >> $input = $input =~ /\.gz$/
    >> ? "gunzip -c $input |"
    >> : $input ;
    >>open (FH, $input)
    >> or die $!;
    >>process_line($_) while (<FH>);
    >>close FH;

    >
    >
    > I discovered that my currently installed version of gzip -d
    > would correctly read plain files, gzipped files (.gz),
    > and even packed files (.Z). So now I use gzip -d
    > for everything. According to top, it uses only 1%
    > of the CPU when called uselessly. It also works
    > for the occasional file that is gzipped without a .gz
    > extention, or vice-versa. I remember it working for
    > $infile = "-" as well, for those gzipped output pipes.
    >
    > I've been recommending this as the "universal input pipe",
    > $gzip_pid = open( FH, $fp="/usr/local/bin/gzip -dfc $infile |" )


    Now, a slightly offtopic question:

    Why do people often use the full path to an application (like here,
    /usr/local/bin/gzip)? That just makes it more unlikely to work, since
    my gzip might be in /usr/bin.

    Why not just: open(F, "gzip -dfc $infile |");


    Same thing with the perl command: Why don't we write
    #!perl -w

    as the first line of a perl program, and let the $PATH variable figure
    out which perl is meant?

    Thanks!
    Markus
     
    Markus Dehmann, Jan 23, 2006
    #5
  6. Markus Dehmann

    Anno Siegel Guest

    Markus Dehmann <> wrote in comp.lang.perl.misc:
    > wrote:
    > > "" <> writes:
    > >
    > >>Markus Dehmann wrote:
    > >>
    > >>>I have a convenient way to open possibly gzip'ed files:
    > >>>open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");


    [snip]

    > Now, a slightly offtopic question:
    >
    > Why do people often use the full path to an application (like here,
    > /usr/local/bin/gzip)? That just makes it more unlikely to work, since
    > my gzip might be in /usr/bin.


    If it is run in an environment with no path (or an unusual path), it
    will fail.

    > Why not just: open(F, "gzip -dfc $infile |");


    The command path is a convenience for interactive use. In a program
    (even a "script") you want to be sure what executable you're calling,
    mostly for security and reliability reasons.

    > Same thing with the perl command: Why don't we write
    > #!perl -w
    >
    > as the first line of a perl program, and let the $PATH variable figure
    > out which perl is meant?


    See above. You want to know your interpreter.

    Anno
     
    Anno Siegel, Jan 23, 2006
    #6
  7. Markus Dehmann

    Guest

    Markus Dehmann <> writes:
    >I have a convenient way to open possibly gzip'ed files:
    >open(F, ($f =~ m/\.gz$/) ? "gunzip -c $f |" : "$f");


    wrote:
    > [gzip -d reads .txt, .gz, .Z files]
    > ... the "universal input pipe",
    > $gzip_pid = open( FH, $fp="/usr/local/bin/gzip -dfc $infile |" )


    Markus Dehmann <> writes:
    > Now, a slightly offtopic question:
    >
    > Why do people often use the full path to an application (like here,
    > /usr/local/bin/gzip)? That just makes it more unlikely to work, since
    > my gzip might be in /usr/bin.
    >
    > Why not just: open(F, "gzip -dfc $infile |");
    >
    > Same thing with the perl command: Why don't we write
    > #!perl -w
    >
    > as the first line of a perl program, and let the $PATH variable figure
    > out which perl is meant?


    I do it because I write scripts for an audience that may have a
    variety of PATH settings based on their group/culture/history,
    and "I think I know better than they do" what path I want to use;
    i.e. the one that works for me.
    Plus I might have read a security warning somewhere that said
    a full path is better, because it doesn't call the shell interpreter?
    But that doesn't appear to be the case nowadays. (perl 5.8.0)

    Diverging back to a related subject, I tried updating this to
    PBP recommendations, but am puzzled by some of the results.

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    ## version 0
    my ($pid0,$file0,$fp0,$infile0);
    $infile0 = "x;touch y0;echo 'twerp'";
    $pid0 = open( FH, $fp0 = "/usr/local/bin/gzip -dfc $infile0 |" ) || die "oops '$fp0' :$!\n";
    print "v0:pid0='$pid0', fp0='$fp0', infile0 = '$infile0', \nlines = >>",<FH>,"<<\n";
    close (FH) || warn "close error on '$fp0' : $!\n";
    # results : also creates file 'y0' due to shell processing
    #v0:pid0='6900', fp0='/usr/local/bin/gzip -dfc x;touch y0;echo 'twerp' |', infile0 = 'x;touch y0;echo 'twerp'',
    #lines = >>this is line 1 of file x
    #twerp
    #<<

    # version 1
    my $infile1 = "x;touch y1;echo 'twerp'";
    my $fp1; # why cant my $fp1 be used in next line?
    my $pid1 = open( my $file1, "-|", $fp1 = "/usr/local/bin/gzip -dfc $infile1" ) || die "oops '$fp1' :$!\n";
    print "v1:pid='$pid1', file1='$file1', fp1='$fp1', infile1 = '$infile1', \nline1 = >>",<$file1>,"<<\n";
    close ($file1) || warn "close error on '$fp1' : $!\n";
    # Results : lexical filehandle and variables; still creates 'y1' due to shell processing
    #v1:pid='6903', file1='GLOB(0x804ccb0)', fp1='/usr/local/bin/gzip -dfc x;touch y1;echo 'twerp'', infile1 = 'x;touch y1;echo 'twerp'',
    #line1 = >>this is line 1 of file x
    #twerp
    #<<

    # version2
    my $infile2 = "x;touch y2;echo 'twerp'";
    my $fp2; # perl v3 p751 "pipe from bare command"
    my $pid2 = open( my $file2, "-|", $fp2 = 'gzip', '-dfc', $infile2 ) || die "oops '$fp2' :$!\n";
    print "v2:pid2='$pid2', file2='$file2', fp2='$fp2', infile2 = '$infile2', \nline2 = >>",<$file2>,"<<\n";
    close ($file2) || warn "close error on '$fp2' : $!\n";
    # Results: bareword commands, no shell processing, no touch on file 'y2', no error since file exists
    #v2:pid2='6906', file2='GLOB(0x8062d94)', fp2='gzip', infile2 = 'x;touch y2;echo 'twerp'',
    #line2 = >>this is only line of file "x;touch y2;echo 'twerp'"
    #<<

    # version3
    my $infile3 = "x;touch y3;echo 'twerp'";
    my $fp3; # perl v3 p751 "pipe from bare command"
    my $pid3 = open( my $file3, "-|", $fp3 = 'gzip', '-dfc', $infile3 ) || die "oops '$fp3' :$!\n";
    print "v3:pid3='$pid3', file3='$file3', fp3='$fp3', infile3 = '$infile3', \nline3 = >>",<$file3>,"<<\n";
    close ($file3) || warn "close error on '$fp3' : $!\n";
    # Results : no shell processing, shell error on gzip since no such file
    #gzip: x;touch y3;echo 'twerp'.gz: No such file or directory
    #v3:pid3='6907', file3='GLOB(0x80954ec)', fp3='gzip', infile3 = 'x;touch y3;echo 'twerp'',
    #line3 = >><<
    #close error on 'gzip' :

    # version4
    my $infile4 = "-";
    my $fp4; # perl v3 p751 "pipe from bare command" to avoid shell processing, Note 'gzip -dfc' will call shell!
    my $pid4 = open( my $file4, "-|", $fp4 = 'gzip', '-dfc', $infile4 ) || die "oops '$fp4' :$!\n";
    print "v4:pid4='$pid4', file4='$file4', fp4='$fp4', infile4 = '$infile4', \nline4 = >>",<$file4>,"<<\n";
    close ($file4) || warn "close error on '$fp4' : $!\n";
    #Results: waits for 'keyboardtext^D^D^D' from stdin, works on - for std input.
    #v4:pid4='6908', file4='GLOB(0x804cae8)', fp4='gzip', infile4 = '-',
    #line4 = >>keyboardtext<<

    Summary:
    The PBP-style lexical version of my "gzip -dfc is the universal input pipe"
    works without unsafe shell interpretation of the variable $infile4.
    Note that the pipe command $fp is now an array:
    my $pid5 = open( my $file5, "-|", my @fp5 = ('gzip', '-dfc', $infile5) );
    $pid5 || die "oops '@fp5' :$!\n";

    Notes:
    1 '/bin/gzip' won't call shell
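    2 '/bin/gzip -dfc $infile' as a single string will call the shell
      whenever the string contains shell metacharacters such as ';'
    3 Can't declare 'my $fp4' inside the open and still use $fp4 in the die on the same line: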
    my $pid4 = open( my $file4, "-|", my $fp4 = 'gzip', '-dfc', $infile4 ) || die "$fp4:$!\n";
    Error: Global symbol "$fp4" requires explicit package name at line 36.
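
On note 3: a variable declared with my only becomes visible in the statement after its declaration, so the command has to be declared before the open if the die message should mention it. A small sketch under that reading; open_pipe_from is a hypothetical name, and the gzip call in the usage comment assumes gzip is on PATH:

```perl
use strict;
use warnings;

# Open a read pipe from a command given as a list (no shell involved).
# Dies with the full command line if the child cannot be started.
sub open_pipe_from {
    my @cmd = @_;    # declared before open(), so the die below can see it
    my $pid = open(my $fh, '-|', @cmd)
        or die "oops '@cmd': $!\n";
    return ($fh, $pid);
}

# Usage with the command from the summary:
#   my ($fh, $pid) = open_pipe_from('gzip', '-dfc', $infile);
#   print while <$fh>;
#   close $fh;
```

Because the command is a list, an argument such as "x;touch y" reaches the child as one literal string.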

    Question:
    Why is there no $! error message printed for version 3 from this line?
    close ($file3) || warn "close error on '$fp3' : $!\n";
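
A likely explanation, going by perldoc -f close: when close on a piped handle fails only because the child exited with a non-zero status, $! is set to 0 and the status is left in $?. A sketch that reports both cases; close_pipe is a hypothetical helper, and it ignores the signal bits in the low byte of $?:

```perl
use strict;
use warnings;

# Close a pipe handle and describe what went wrong, if anything.
# Returns an empty string on success.
sub close_pipe {
    my ($fh, $label) = @_;
    return '' if close $fh;

    # A failing system call leaves a non-zero errno in $!.
    return "close error on '$label': $!" if $!;

    # Otherwise the child ran but exited non-zero; its exit code
    # is in the high byte of $?.
    return "'$label' exited with status " . ($? >> 8);
}
```

In version 3 this would print the gzip exit status instead of an empty $! message.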

    --
    Joel
     
    , Jan 24, 2006
    #7
  8. Markus Dehmann

    Guest

    Markus Dehmann <> wrote:
    > Same thing with the perl command: Why don't we write
    > #!perl -w


    > as the first line of a perl program, and let the $PATH variable figure
    > out which perl is meant?


    Because $PATH is not applied to the 'shebang' line.

    sh-2.05a$ cat q1
    #!sh

    echo "Goat"

    sh-2.05a$ sh q1
    Goat
    sh-2.05a$ ./q1
    sh: ./q1: sh: bad interpreter: No such file or directory
    sh-2.05a$
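
One common workaround, sketched here on the assumption that the system has env at /usr/bin/env (typical, but not guaranteed): name env by its full path on the shebang line and let it do the PATH lookup for perl.

```perl
#!/usr/bin/env perl
# Assumption: env lives at /usr/bin/env; it looks perl up in $PATH.
use strict;
use warnings;   # used instead of "-w" on the shebang line

# $^X holds the path of the perl binary that is actually running.
my $interpreter = $^X;
print "running under $interpreter\n";
```

Switches are better set with pragmas here, since some systems pass everything after the interpreter name to env as a single argument. The trade-off is the one Anno describes: the script no longer pins down which perl it runs under.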

    Axel
     
    , Jan 24, 2006
    #8
