Re: utilities in perl

Discussion in 'Perl Misc' started by Peter J. Holzer, Sep 21, 2013.

  1. On 2013-09-21 14:49, Henry Law <> wrote:
    > On 21/09/13 01:04, Cal Dershowitz wrote:
    >> if ($#ARGV < 1) {
    >> print "Needs directory and filetype\n";
    >> exit;
    >> }
    >> my $dir = $ARGV[0];
    >> my $filetype = $ARGV[1];

    [...]
    >> Q1) Do the ultimate 2 statements effectively pipe the input from stdin
    >> to stdout?

    >
    > No. STDIN contained the two values in one line;


    No, STDIN didn't contain those values at all. @ARGV is not STDIN!

    hp


    --
    _ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung:
    |_|_) | | Man feilt solange an seinen Text um, bis
    | | | | die Satzbestandteile des Satzes nicht mehr
    __/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
     
    Peter J. Holzer, Sep 21, 2013
    #1

  2. On 2013-09-21 17:26, Cal Dershowitz <> wrote:
    > On 9/21/2013 8:33 AM, Peter J. Holzer wrote:
    >> On 2013-09-21 14:49, Henry Law <> wrote:
    >>> On 21/09/13 01:04, Cal Dershowitz wrote:
    >>>> if ($#ARGV < 1) {
    >>>> print "Needs directory and filetype\n";
    >>>> exit;
    >>>> }
    >>>> my $dir = $ARGV[0];
    >>>> my $filetype = $ARGV[1];

    >> [...]
    >>>> Q1) Do the ultimate 2 statements effectively pipe the input from stdin
    >>>> to stdout?
    >>>
    >>> No. STDIN contained the two values in one line;

    >>
    >> No STDIN didn't contain those values at all. @ARGV is not STDIN!

    >
    > Hmmmm. I'm looking at Stevens and Rago, p. 774
    >
    > #include <fcntl.h>
    >
    > int getopt(int argc, const * const argv[], const char *options);
    >
    > It certainly reminds a person of the straight C version of it.


    The C getopt function (and the arguments argc and argv to the main
    function) doesn't have anything to do with stdin in C either.

    > Maybe you can say a few words why this is not STDIN.


    Because it isn't. They are completely separate. There is no "why" except
    that Ken and Dennis thought it was a good idea to have both a standard
    input and program arguments.

    hp

     
    Peter J. Holzer, Sep 21, 2013
    #2

  3. "Peter J. Holzer" <> wrote:
    >On 2013-09-21 17:26, Cal Dershowitz <> wrote:

    [...]
    >>>>> my $dir = $ARGV[0];
    >>>>> my $filetype = $ARGV[1];
    >>> [...]
    >>>>> Q1) Do the ultimate 2 statements effectively pipe the input from stdin
    >>>>> to stdout?
    >>>>
    >>>> No. STDIN contained the two values in one line;
    >>>
    >>> No STDIN didn't contain those values at all. @ARGV is not STDIN!

    >>
    >> Maybe you can say a few words why this is not STDIN.


    Try a trivial experiment and redirect STDIN, e.g. feed it from a pipe:

    cat whateverfile | myprog.pl foo bar
    (yes, this is a useless use of cat, just to make the example
    super-explicit)

    Now, what values do you expect in $dir and $filetype? Based on your
    reasoning it must be (part of) the content of whateverfile because that
    is where the content of STDIN is coming from.
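
    A minimal sketch of that experiment, assuming a POSIX sh. The one-line child below is a hypothetical stand-in for myprog.pl (it is not the original script); it reports its arguments and its standard input separately, so you can see that the piped text never shows up in the arguments:

```shell
#!/bin/sh
# Hypothetical stand-in for myprog.pl: prints its two arguments, then
# whatever arrives on standard input.
report='printf "dir=%s filetype=%s\n" "$1" "$2"; printf "stdin=%s\n" "$(cat)"'

# Feed stdin from a pipe while passing two arguments at the same time.
printf 'contents of whateverfile\n' | sh -c "$report" myprog foo bar
# prints:
#   dir=foo filetype=bar
#   stdin=contents of whateverfile
```

    The arguments stay "foo" and "bar" no matter what is piped in.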

    jue
     
    Jürgen Exner, Sep 21, 2013
    #3
  4. In our last episode, the evil Dr. Lacto had captured our hero,
    Cal Dershowitz <>, who said:

    >I don't want to spend too long talking about something where I clearly
    >don't get it, but everyone else here does. I know this is a perl group,
    >so C talk is OT.
    >
    >int main(int argc, char * argv)
    >
    >Do people still think these values don't come from STDIN in this context?


    STDIN means that a program that is already running has asked you a
    question and is waiting for you to type in an answer.

    In your case, on the other hand, you are starting the program with a set
    of arguments already provided when the program starts. That's ARGV.

    It is possible, however, that one of the arguments you provide to
    the program is - . That is a clue to the operating system that
    "this argument should not read data from a pre-existing file, it should
    read from STDIN."

    --hymie! http://lactose.homelinux.net/~hymie
    -------------------------------------------------------------------------------
     
    hymie!, Sep 21, 2013
    #4
  5. (hymie!) wrote:
    >In our last episode, the evil Dr. Lacto had captured our hero,
    > Cal Dershowitz <>, who said:
    >
    >>I don't want to spend too long talking about something where I clearly
    >>don't get it, but everyone else here does. I know this is a perl group,
    >>so C talk is OT.
    >>
    >>int main(int argc, char * argv)
    >>
    >>Do people still think these values don't come from STDIN in this context?


    Of course they don't come from STDIN. They are command line parameters
    and have absolutely nothing to do with STDIN.

    jue
     
    Jürgen Exner, Sep 21, 2013
    #5
  6. Cal Dershowitz <> writes:
    <snip>
    > Ok. Does one say "data from the command line" for whatever populates
    > ARGV? Something specified by Unix?


    Yes, but it would be better just to say "the arguments for the program".
    After all, that's all @ARGV stands for -- the "argument vector" so named
    because C programmers use argv as the canonical name for the C
    equivalent.

    They do often come from a command line, but some environments don't have a
    command line (for example Perl scripts running on a web server may not
    have such a thing), and programs can be started in other ways (see, for
    example, perldoc -f exec).

    <snip>
    --
    Ben.
     
    Ben Bacarisse, Sep 21, 2013
    #6
  7. On 2013-09-21 19:59, hymie! <> wrote:
    > In our last episode, the evil Dr. Lacto had captured our hero,
    > Cal Dershowitz <>, who said:
    >>I don't want to spend too long talking about something where I clearly
    >>don't get it, but everyone else here does. I know this is a perl group,
    >>so C talk is OT.
    >>
    >>int main(int argc, char * argv)
    >>
    >>Do people still think these values don't come from STDIN in this context?

    >
    > STDIN means that a program that is already running has asked you a
    > question and is waiting for you to type in an answer.


    No, it doesn't mean that. Many programs reading from stdin never ask you
    any questions. For example all the typical Unix filters: cat, grep, cut,
    sort, ...


    > In your case, on the other hand, you are starting the program with a set
    > of arguments already provided when the program starts. That's ARGV.


    Yes, he has provided the program with arguments and those can be
    accessed through @ARGV. However, that hasn't anything to do with stdin.
    A program can choose to read or not to read from stdin whether it was
    passed any command line arguments or not.


    > It is possible, however, that one of the arguments you provide to
    > the program is - . That is a clue to the operating system that
    > "this argument should not read data from a pre-existing file, it should
    > read from STDIN."


    Also wrong. It's not a clue to the operating system, it is a clue to the
    program. Many programs accept a "-" instead of a filename to mean either
    "read from stdin" or "write to stdout". This is something the program
    has to handle. The OS just passes the "-" to the program.
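
    A small sketch of a program doing that check itself, assuming a POSIX sh. The function name show_file is made up for illustration:

```shell
#!/bin/sh
# show_file: hypothetical helper that treats "-" as "read standard input".
# The test is made by the program; the OS only delivers the string "-".
show_file() {
    if [ "$1" = "-" ]; then
        cat            # no file operand: copy our own stdin to stdout
    else
        cat -- "$1"    # any other argument is taken as a file name
    fi
}

printf 'from the pipe\n' | show_file -
# prints: from the pipe
```

    Leave out the comparison against "-" and the program would simply try to open a file with that name.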


    To summarize:

    On startup, a program is provided with three sets of information:

    1) The argument vector: This is an array of strings containing the
    command name and any "command line" arguments, i.e. the arguments
    you type on the command line after the command (interactively), or
    the arguments to exec (in a program). (Perl is a bit unusual in that
    it shoves the first argument (the command name) into $0 and only the
    rest of the arguments into @ARGV).

    2) The environment: Another array of strings. By convention each program
    passes this through to any programs it invokes and the strings are in
    "key=value" format. This contains the PATH, locale information,
    information about the terminal (if applicable) and other
    configuration information.

    3) A set of three file descriptors numbered 0, 1, and 2, and typically
    called stdin, stdout, and stderr respectively in most programming
    languages. These are *file descriptors*, not strings. You can read
    from them (well, you should read only from stdin) with the read
    system call (or higher level functions like getc() in C or <> in
    Perl) and write to them (stdout and stderr, at least) with write (or
    print or printf, etc.)
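
    The three kinds of startup information can be seen side by side in a short sketch, assuming a POSIX sh. The child program and the variable name GREETING are invented for the example:

```shell
#!/bin/sh
# A hypothetical child program that looks at all three kinds of startup data.
child='
printf "args: %s %s\n" "$1" "$2"          # 1) argument vector
printf "env: GREETING=%s\n" "$GREETING"    # 2) environment
read -r line                               # 3) read from fd 0 (stdin)
printf "stdin: %s\n" "$line"               #    write to fd 1 (stdout)
printf "diagnostic message\n" >&2          #    write to fd 2 (stderr)
'

# "foo bar" travel in the argument vector, GREETING in the environment,
# and "typed input" arrives on file descriptor 0.
printf 'typed input\n' | GREETING=hello sh -c "$child" child foo bar
# stdout:
#   args: foo bar
#   env: GREETING=hello
#   stdin: typed input
```

    Redirecting any one of the three channels leaves the other two untouched.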

    hp


     
    Peter J. Holzer, Sep 22, 2013
    #7
  8. In article <>,
    Ben Morrow <> wrote:
    >
    >Quoth "Peter J. Holzer" <>:
    >>
    >> To summarize:
    >>
    >> On startup, a program is provided with three sets of information:
    >>
    >> 1) The argument vector: This is an array of strings containing the
    >> command name and any "command line" arguments, i.e. the arguments
    >> you type on the command line after the command (interactively), or
    >> the arguments to exec (in a program). (Perl is a bit unusual in that
    >> it shoves the first argument (the command name) into $0 and only the
    >> rest of the arguments into @ARGV).

    >
    >No, the first argument (the command name) goes into $^X, the first
    >non-option argument goes into $0, and the rest of the arguments go into
    >@ARGV.


    I found your answer confusing. When I type a command line, like just
    now with
    $ chmod u+x local/test/106.pl
    $ local/test/106.pl hello world
    $0 was 'local/test/106.pl', as I expected, which was what I was
    thinking of as the "command name", and I was thinking of "hello" as
    the "first non-option argument".

    However, the first line of the script was
    #! /usr/bin/perl
    and $^X was output as '/usr/bin/perl'.

    So I think the explanation should be expanded. In UNIXy systems, for
    a script that starts with #! and run from the command line, the
    program on the #! line is put into $^X, and in particular, if it's a
    Perl script, $^X is the perl program being run. $0 is set using the
    first word on the command line (identifying the script itself), and
    the rest of the arguments are put into @ARGV.

    --
    Tim McDaniel,
     
    Tim McDaniel, Sep 23, 2013
    #8
  9. On 2013-09-22 18:43, Ben Morrow <> wrote:
    >
    > Quoth "Peter J. Holzer" <>:
    >>
    >> To summarize:
    >>
    >> On startup, a program is provided with three sets of information:
    >>
    >> 1) The argument vector: This is an array of strings containing the
    >> command name and any "command line" arguments, i.e. the arguments
    >> you type on the command line after the command (interactively), or
    >> the arguments to exec (in a program). (Perl is a bit unusual in that
    >> it shoves the first argument (the command name) into $0 and only the
    >> rest of the arguments into @ARGV).

    >
    > No, the first argument (the command name) goes into $^X, the first
    > non-option argument goes into $0, and the rest of the arguments go into
    > @ARGV.


    That's what I get from adding parenthetical remarks just before posting.
    You are right of course, from the POV of the perl process. $0 and @ARGV
    are handled as I described from the POV of the caller, but I didn't
    write that.


    > (Unless perl gets $^X from somewhere else, in which case the
    > first argument is thrown away, or you pass an -e option, in which case
    > $0 is "-e".)
    >
    > This is further confused by the kernel's (and perl's) #! processing, but
    > by the time perl gets its final argument list to process the first
    > argument is a path to perl itself.


    Why "further confused"? The mechanism you describe is perl's attempt to
    undo the effects of kernel's #! processing.

    1) The caller invokes execl("/usr/local/bin/script", "script", "foo",
       NULL).
    2) The kernel finds "#!/usr/bin/perl" in "/usr/local/bin/script" and
       invokes /usr/bin/perl with the argv ["/usr/bin/perl",
       "/usr/local/bin/script", "foo"] instead (note that the original
       argv[0] is lost in the process).
    3) The perl interpreter then "hides" itself by putting what it thinks
       was the original argv[0] into $0 and the original argv[1] ..
       argv[argc-1] into @ARGV.


    > This is not really unusual: it's what all the shells do, and I'd wager
    > also any other language which has some equivalent to $0.
    >
    >> 2) The environment: Another array of strings. By convention each program
    >> passes this through to any programs it invokes and the strings are in
    >> "key=value" format. This contains the PATH, locale information,
    >> information about the terminal (if applicable) and other
    >> configuration information.
    >>
    >> 3) A set of three file descriptors numbered 0, 1, and 2, and typically
    >> called stdin, stdout, and stderr respectively in most programming
    >> languages. These are *file descriptors*, not strings. You can read
    >> from them (well, you should read only from stdin) with the read
    >> system call (or higher level functions like getc() in C or <> in
    >> Perl) and write to them (stdout and stderr, at least) with write (or
    >> print or printf, etc.)

    >
    > In fact, a completely arbitrary set of file descriptors, which may or
    > may not be contiguously numbered. It's entirely possible to invoke a
    > program with one of the standard fds closed, though it's not a good idea
    > since many programs misbehave.


    Linux enforces that at least these three file descriptors are open at
    least on setuid programs, but I don't know offhand whether that's done
    by the kernel or the startup code. And I am aware that this isn't true
    for other unixes.


    > It's also not uncommon to pass additional open file descriptors.


    Yes, I should have written "at least three". You can always pass more,
    and indeed some Unixes did pass a fd to the controlling terminal as file
    descriptor 3 ("stdtty") by default.
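
    Both points (an extra descriptor, and a standard descriptor that is closed) can be tried from a shell. This is only a sketch, assuming a POSIX sh; the child one-liners are made up for the example:

```shell
#!/bin/sh
# Hypothetical child that reads one line from descriptor 3 instead of stdin.
child='read -r msg <&3; printf "fd 3 carried: %s\n" "$msg"'

# 3<&0 duplicates the pipe onto fd 3 first; </dev/null then replaces fd 0,
# so the child starts with its usual three descriptors plus one extra.
printf 'extra channel\n' | sh -c "$child" child 3<&0 </dev/null
# prints: fd 3 carried: extra channel

# <&- starts a child with fd 0 closed; its read is expected to fail.
sh -c 'read -r x || echo "nothing to read on fd 0"' child 2>/dev/null <&-
```

    The child in the second command is a well-behaved one that checks the result of read; programs that assume fd 0 is always open are the ones that misbehave.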

    hp


     
    Peter J. Holzer, Sep 23, 2013
    #9
  10. (Tim McDaniel) writes:
    > In article <>,
    > Ben Morrow <> wrote:
    >>
    >>Quoth "Peter J. Holzer" <>:
    >>>
    >>> To summarize:
    >>>
    >>> On startup, a program is provided with three sets of information:
    >>>
    >>> 1) The argument vector: This is an array of strings containing the
    >>> command name and any "command line" arguments, i.e. the arguments
    >>> you type on the command line after the command (interactively), or
    >>> the arguments to exec (in a program). (Perl is a bit unusual in that
    >>> it shoves the first argument (the command name) into $0 and only the
    >>> rest of the arguments into @ARGV).

    >>
    >>No, the first argument (the command name) goes into $^X, the first
    >>non-option argument goes into $0, and the rest of the arguments go into
    >>@ARGV.

    >
    > I found your answer confusing. When I type a command line, like just
    > now with
    > $ chmod u+x local/test/106.pl
    > $ local/test/106.pl hello world
    > $0 was 'local/test/106.pl', as I expected, which was what I was
    > thinking of as the "command name", and I was thinking of "hello" as
    > the "first non-option argument".


    That's probably how the shell invoked it but it need not be done in this
    way. Assuming execl as an example, the general format of that is

    execl("/path/to/file", "argument #0", ...);

    the first argument to execl being the pathname of the file which is
    supposed to be executed and the next being what ends up in argv[0]. By
    convention, this should be 'the program name' and IIRC, POSIX even says
    somewhere that it should really just be the name and not the
    path. Assuming that /tmp/a.pl is the following perl script,

    -----
    #!/usr/bin/perl
    print($^X, "\t", $0, "\t", $ARGV[0], "\n");
    -----

    this could be invoked via

    -----
    #include <unistd.h>

    int main(void)
    {
    execl("/tmp/a.pl", "Blafasel", "Are we having an argument?", (void *)0);
    return 0;
    }
    -----

    and the output would be

    -----
    /usr/bin/perl /tmp/a.pl Are we having an argument?
    -----

    with the original 'program name' ("Blafasel") vanishing in the
    process. It could also be called with

    -----
    #include <unistd.h>

    int main(void)
    {
    execl("/usr/bin/perl", "Now what?", "/tmp/a.pl", "Are we having an argument?", (void *)0);
    return 0;
    }
    -----

    This will result in the same output on a system which supports
    /proc/self/exe aka 'Linux' but in case perl has to resort to the real
    'program name' argument, $^X should become "Now what?" (according to the
    documentation).
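
    A loosely related illustration that needs no C compiler: with "sh -c", the caller also picks the string the child sees as $0. This is the shell's own convention for -c, not what execl does, and the strings are simply borrowed from the example above:

```shell
#!/bin/sh
# With "sh -c", the first word after the command string becomes $0 and the
# remaining words become $1, $2, ...; the caller may pick any string for $0.
show='printf "%s\t%s\n" "$0" "$1"'
sh -c "$show" "Blafasel" "Are we having an argument?"
# prints the two strings separated by a tab
```

    In both cases the "program name" is just another string supplied by whoever starts the program.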
     
    Rainer Weikusat, Sep 23, 2013
    #10
  11. On 2013-09-22, Ben Morrow <> wrote:
    >
    > Quoth Cal Dershowitz <>:
    >>
    >> Do germans typically have directories for their own stuff that have
    >> german encodings like:
    >>
    >> /home/jue/Documents/Persoenlichkeiten/
    >>
    >> , where you have an actual o umlaut as opposed to the english
    >> transcription? Maybe even the u too.

    >
    > The answer to that is rather complicated :).


    [snip]

    Ben, you really need to get out more.


    Justin.

    --
    Justin C, by the sea.
     
    Justin C, Sep 23, 2013
    #11
  12. In our last episode, the evil Dr. Lacto had captured our hero,
    "Peter J. Holzer" <>, who said:
    >On 2013-09-21 19:59, hymie! <> wrote:
    >> In our last episode, the evil Dr. Lacto had captured our hero,
    >> Cal Dershowitz <>, who said:
    >>>I don't want to spend too long talking about something where I clearly
    >>>don't get it, but everyone else here does. I know this is a perl group,
    >>>so C talk is OT.
    >>>
    >>>int main(int argc, char * argv)
    >>>
    >>>Do people still think these values don't come from STDIN in this context?

    >>
    >> STDIN means that a program that is already running has asked you a
    >> question and is waiting for you to type in an answer.

    >
    >No, it doesn't mean that. Many programs reading from stdin never ask you
    >any questions.


    I was trying to simplify the situation for a user who, by his own
    admission, doesn't get it.

    >> It is possible, however, that one of the arguments you provide to
    >> the program is - . That is a clue to the operating system that
    >> "this argument should not read data from a pre-existing file, it should
    >> read from STDIN."

    >
    >Also wrong. It's not a clue to the operating system, it is a clue to the
    >program.


    My mistake.

    --hymie! http://lactose.homelinux.net/~hymie
    -------------------------------------------------------------------------------
     
    hymie!, Sep 23, 2013
    #12
  13. In article <>,
    Rainer Weikusat <> wrote:
    > (Tim McDaniel) writes:
    >> I found your answer confusing. When I type a command line, ...

    ....
    >That's probably how the shell invoked it but it need not be done in this
    >way. Assuming execl as an example, ...


    I was restricting myself to the shell, and in particular to my
    *perception* of the command line, in particular the "program name" and
    "first argument". Certainly exec.*() makes things clearer and allows
    playing some games.

    --
    Tim McDaniel,
     
    Tim McDaniel, Sep 23, 2013
    #13
  14. On 2013-09-24 06:54, Ben Morrow <> wrote:
    > Quoth "Peter J. Holzer" <>:
    >> On 2013-09-22 18:43, Ben Morrow <> wrote:
    >> >
    >> > No, the first argument (the command name) goes into $^X, the first
    >> > non-option argument goes into $0, and the rest of the arguments go into
    >> > @ARGV.

    >>
    >> That's what I get from adding parenthetical remarks just befor posting.
    >> You are right of course, from the POV of the perl process. $0 and @ARGV
    >> are handles as I described from the POV of the caller, but I didn't
    >> write that.
    >>
    >> > (Unless perl gets $^X from somewhere else, in which case the
    >> > first argument is thrown away, or you pass an -e option, in which case
    >> > $0 is "-e".)
    >> >
    >> > This is further confused by the kernel's (and perl's) #! processing, but
    >> > by the time perl gets its final argument list to process the first
    >> > argument is a path to perl itself.

    >>
    >> Why "further confused"? The mechanism you describe is perl's attempt to
    >> undo the effects of kernel's #! processing.

    >
    > Hmm, we seem to be approaching this from opposite sides :). Here and
    > above, you seem to be considering invocation via #! to be the 'normal'
    > case,


    Yes. When I invoke a script, I normally invoke it as
    name args
    or maybe as
    dir/name args
    (if it isn't in the path).

    I never invoke it as
    interpreter name args
    unless I actually want to invoke the interpreter in a special mode (e.g.
    perl -d for debugging, perl -d:NYTProf for profiling).

    Normally, I want to invoke the *program*, and I don't care whether it is
    an executable which can be interpreted directly by the CPU or a script
    which needs an additional interpreter.

    And on the other side, as a programmer, I don't want to care about this
    either. The program was invoked with certain arguments, and I want to
    get at those arguments (and maybe also the program name). And I don't
    want to care about whether there is an interpreter involved at run-time
    and how that was invoked. If there is, it should get out of the way (and
    thankfully, perl does).


    [and now for something completely different]

    > IIRC some of the qmail programs made non-standard use of the first three
    > file descriptors, such as taking input on fd 1 or writing to fd 0.


    Yes. I thought about mentioning that, but decided against confusing the
    issue further.

    Also, if file descriptors 0 to 2 are still attached to the tty they were
    originally opened on, it is very likely that all three of them are open
    for both reading and writing. You shouldn't rely on that, of course, but
    I've seen a few interactive programs write to fd 0 (after all fd 1 isn't
    necessarily the same device ...).

    hp


     
    Peter J. Holzer, Sep 24, 2013
    #14
  15. >Quoth Cal Dershowitz <>:
    >> I'm just catching up on this reading. It seems like almost everyone had
    >> a turn to be wrong about some aspect of it. Would anyone say that STDIN
    >> is doing anything here:
    >>
    >> find . -name "*.JPG" -printf "%p\n$2/%h-%f\n" | xargs -n2 echo cp
    >> find . -name "*.JPG" -printf "%p\n$2/%h-%f\n" | xargs -n2 cp


    Yes. Whatever find writes to the STDOUT filehandle is piped into the
    STDIN filehandle of xargs such that xargs can read those values.
    But STDIN itself is not "doing" anything, it is just a passive data
    source from where xargs can read.
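
    A cut-down sketch of the same pipeline, assuming a POSIX sh with xargs available. The file names are invented, and printf stands in for find, so nothing is actually copied:

```shell
#!/bin/sh
# plan_copies: hypothetical wrapper around the xargs call from the question.
# xargs reads words from its stdin and turns each pair into the arguments
# of one "echo cp" invocation.
plan_copies() {
    xargs -n2 echo cp
}

# Four lines go into the pipe; they come out as two argument lists.
printf '%s\n' a.JPG dest/a.JPG b.JPG dest/b.JPG | plan_copies
# prints:
#   cp a.JPG dest/a.JPG
#   cp b.JPG dest/b.JPG
```

    So xargs is exactly the kind of program that converts data from its stdin into arguments for another program; the two channels remain distinct.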

    What does this have to do with Perl?

    jue
     
    Jürgen Exner, Sep 25, 2013
    #15