[Q] $ARGV, <>, and command-line Perl


J Krugman

I've rtfm'd this to death, but I still don't get it. If someone
can explain it to me (as opposed to tell me something like "man
perlfrotz"), I'd be very grateful.

The immediate problem that serves as the context of the question
is this: find all the files below /path/to/subdir (this is Linux)
that contain either of the strings "foo bar baz" or "quux frobozz",
and do this *from the command line* (i.e. I'm looking for a one-liner
here, not a longwinded affair using File::Find, etc.).

I tried

% perl -e 'print "$ARGV\n" if grep /(foo bar baz|quux frobozz)/, <>' `find /path/to/subdir -type f`

which failed to generate any output, even though I *know* that
there are files under /path/to/subdir that contain strings "foo
bar baz" and/or "quux frobozz".

Even if it had worked, the last alternative is not good, because
it can easily fail through choking the shell with an excessively
long arguments list. There has to be a better way.

At any rate, I also tried

% perl -e 'for (@ARGV) { print "$ARGV\n" if grep /(foo bar baz|quux frobozz)/, <> }' `find /path/to/subdir -type f`

which also failed. In fact, even

% perl -e 'for (@ARGV) { print "$ARGV\n" }' `find /path/to/subdir -type f`

failed: it generated a whole bunch of empty lines. Clearly I have
*no clue* of what's going on. What exactly is the relationship
between $ARGV and <>? Is it possible to write a simple one-liner
that cycles through all the lines of *each* of the files named
in @ARGV and prints the name of the file if at least one of its
lines meets a condition?

Any help would be much appreciated.

jill
 

J. Gleixner

J said:
I've rtfm'd this to death, but I still don't get it. If someone
can explain it to me (as opposed to tell me something like "man
perlfrotz"), I'd be very grateful.

The immediate problem that serves as the context of the question
is this: find all the files below /path/to/subdir (this is Linux)
that contain either of the strings "foo bar baz" or "quux frobozz",
and do this *from the command line* (i.e. I'm looking for a one-liner
here, not a longwinded affair using File::Find, etc.).

I tried

Since you're on a real OS, why not simply use the shell?

find /path/to/subdir -exec egrep -l 'foo bar baz|quux frobozz' {} \;
 

A. Sinan Unur

The immediate problem that serves as the context of the question
is this: find all the files below /path/to/subdir (this is Linux)
that contain either of the strings "foo bar baz" or "quux frobozz",
and do this *from the command line* (i.e. I'm looking for a one-liner
here, not a longwinded affair using File::Find, etc.).

Hmmmm ...
I tried

% perl -e 'print "$ARGV\n" if grep /(foo bar baz|quux frobozz)/, <>' `find /path/to/subdir -type f`

Why? find, the command whose output you are processing, can take a regex
to match against. Use that.

$ find . -regex '.*/z.*' -type f -print
../Dload/fpTeX-0.7/package/zed-csp.tpm
../Dload/fpTeX-0.7/package/zed-csp.zip
../Dload/fpTeX-0.7/package/zefonts.tpm
../Dload/fpTeX-0.7/package/zefonts.zip
../z.pl
../zz.pl

Please consult

man find
% perl -e 'for (@ARGV) { print "$ARGV\n" if grep /(foo bar baz|quux frobozz)/, <> }' `find /path/to/subdir -type f`

You could also go the pure Perl route and use File::Find::Rule but that's
going to get a little wordy.

I am not too well versed in various shell syntaxes but aren't you
supposed to write the above as:

% find /path/to/subdir -type f | perl -e "print if /foo/"

But all this seems so unnecessary.

Sinan
 

KKramsch

Since you're on a real OS, why not simply use the shell?
find /path/to/subdir -exec egrep -l 'foo bar baz|quux frobozz' {} \;


I wish there were a standard term for "answering something other
than what the OP asked." I bet that this describes far more than
50% of the answers given to questions posted in the Usenet.

I can think of one reason why the OP didn't simply "use the shell"
and that is because there are times when one wants to use regexps
(perhaps coupled with other tests, e.g. such as this regexp must
happen in the file's first line) that are more sophisticated than
the ones that egrep can handle (clearly the "foo bar" regexp was
just a generic example).

To the OP, I can't answer the more theoretical aspects of your
question; I'll leave that to the experts. But the following works
for me:

$ find ~/ -type f | xargs perl -e 'for (@ARGV) { next unless -T; while (<>) { ++$found if /(foo bar baz|quux frobozz)/; if (eof && $found) { print "$ARGV\n"; $found = 0; }}}'

I'm sure there are less verbose and/or more efficient (e.g. not
reading *every* line) ways of doing the above in Perl; I look
forward to reading them.

Karl
 

Brian McCauley

J Krugman wrote:

I tried

% perl -e 'print "$ARGV\n" if grep /(foo bar baz|quux frobozz)/, <>' `find /path/to/subdir -type f`

which failed to generate any output, even though I *know* that
there are files under /path/to/subdir that contain strings "foo
bar baz" and/or "quux frobozz".

Look at your code. Your print is outside the loop.

You slurp all the lines from all the input files into a big list. Then
you count the number of matching lines in the list. Then, and only
then, if that number is non-zero you print the name of the _current_
input file. Of course at this point there is no current input file.
Even if it had worked, the last alternative is not good, because
it can easily fail through choking the shell with an excessively
long arguments list. There has to be a better way.

At any rate, I also tried

% perl -e 'for (@ARGV) { print "$ARGV\n" if grep /(foo bar baz|quux frobozz)/, <> }' `find /path/to/subdir -type f`

Exactly the same problem as above.
which also failed. In fact, even

% perl -e 'for (@ARGV) { print "$ARGV\n" }' `find /path/to/subdir -type f`

failed: it generated a whole bunch of empty lines. Clearly I have
*no clue* of what's going on. What exactly is the relationship
between $ARGV and <>?

Your problem is that you are failing to realise that in a list context <>
slurps all the lines of the input files into a single list. The value of
$ARGV is only meaningful if you are using <> in a scalar context.

Is it possible to write a simple one-liner
that cycles through all the lines of *each* of the files named
in @ARGV and prints the name of the file if at least one of its
lines meets a condition?

perl -ne 'print "$ARGV\n" if /whatever/ && !$seen{$ARGV}++'
 

bill

perl -ne 'print "$ARGV\n" if /whatever/ && !$seen{$ARGV}++'


Interesting topic.

I have two follow up questions regarding this:

1. Is there a simple modification of this one-liner that would short
circuit the unnecessary reading of lines when $seen{$ARGV} evaluates
to true?

I tried

perl -ne 'if (/whatever/) { print "$ARGV\n"; last }'

which seems to work, but I really don't know what I'm doing.

2. A related issue is the resetting of $. . Suppose that I wanted
to check for /whatever/ only within the first 10 lines of every
file. This is perhaps getting to be too difficult for a "one-liner",
but I tried

perl -ne '$seen{$ARGV}++ if /whatever/; if ($seen{$ARGV} || eof || $. >= 10) { print "$ARGV\n" if $seen{$ARGV}; close ARGV }'

which almost works, but I get a slew of weird "Can't open ./xyz:
no such file or directory." errors at the end. I haven't been able
to figure out the reason for these. (Actually, I got those with
Brian's version too.)

bill
 

Brian McCauley

bill said:
Interesting topic.

I have two follow up questions regarding this:

1. Is there a simple modification of this one-liner that would short
circuit the unnecessary reading of lines when $seen{$ARGV} evaluates
to true?

But.... this is one-liner Perl! It's very unlikely that any such
optimization could repay the time to make the keystrokes, much less the
time to think about it.

My understanding is that close(ARGV) will advance <> to the next file.

If you were going to write a real script then sure it would be worth the
candle.

I tried

perl -ne 'if (/whatever/) { print "$ARGV\n"; last }'

which seems to work, but I really don't know what I'm doing.

I would expect that to print only the first matching file.
2. A related issue is the resetting of $. . Suppose that I wanted
to check for /whatever/ only within the first 10 lines of every
file. This is perhaps getting to be too difficult for a "one-liner",
but I tried

perl -ne '$seen{$ARGV}++ if /whatever/; if ($seen{$ARGV} || eof || $. >= 10) { print "$ARGV\n" if $seen{$ARGV}; close ARGV }'

which almost works, but I get a slew of weird "Can't open ./xyz:
no such file or directory." errors at the end. I haven't been able
to figure out the reason for these. (Actually, I got those with
Brian's version too.)

Seems likely that there are files with shell metacharacters or spaces in
their names.
 

lpetrov

KKramsch said:
I wish there were a standard term for "answering something other
than what the OP asked." I bet that this describes far more than
50% of the answers given to questions posted in the Usenet.

I can think of one reason why the OP didn't simply "use the shell"
and that is because there are times when one wants to use regexps
(perhaps coupled with other tests, e.g. such as this regexp must
happen in the file's first line) that are more sophisticated than
the ones that egrep can handle (clearly the "foo bar" regexp was
just a generic example).

To the OP, I can't answer the more theoretical aspects of your
question; I'll leave that to the experts. But the following works
for me:

$ find ~/ -type f | xargs perl -e 'for (@ARGV) { next unless -T; while (<>) { ++$found if /(foo bar baz|quux frobozz)/; if (eof && $found) { print "$ARGV\n"; $found = 0; }}}'

I'm sure there are less verbose and/or more efficient (e.g. not
reading *every* line) ways of doing the above in Perl; I look
forward to reading them.

Karl
perl -ne '/search_string/ && print "$ARGV:$&\n"' *
 

Arndt Jonasson

A. Sinan Unur said:
[...]
Why? find, the command whose output you are processing, can take a regex
to match against. Use that.

$ find . -regex '.*/z.*' -type f -print

The OP said he was using Linux, where 'find' does have the -regex option,
but it may be worth noting that the -regex option does not exist on
all Unix platforms.
 

Brian McCauley

bill said:
No, actually, it prints more than one.

I cannot reproduce that on Perl 5.8.4 (ActiveState Win32 build 810), or
5.6.1 or 5.8.0 or 5.8.5 (Linux).

Which version are you using?
 

bill

Brian McCauley said:
I cannot reproduce that on Perl 5.8.4 (ActiveState Win32 build 810), or
5.6.1 or 5.8.0 or 5.8.5 (Linux).

Hmm. If that's the case, then there's an error somewhere in the
following perl documentation:

(from perlrun)

-n causes Perl to assume the following loop around your
program, which makes it iterate over filename
arguments somewhat like sed -n or awk:

    LINE:
    while (<>) {
        ... # your program goes here
    }

and (from perlop)

...The loop

    while (<>) {
        ... # code for each line
    }

is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ... # code for each line
        }
    }

except that it isn't so cumbersome to say, and will
actually work. It really does shift the @ARGV array and put
the current filename into the $ARGV variable. It also
uses filehandle ARGV internally--<> is just a synonym for
<ARGV>, which is magical. (The pseudo code above doesn't
work because it treats <ARGV> as non-magical.)

Of course the catch could be in that last "non-magical" bit, which
as far as I know is totally undefined crap (i.e. useless
non-documentation). This documentation suggests that a "last" in
the code given to a command-line script run under -ne should exit
only the inner loop above; instead it appears to exit the outer
one.

BTW, I figured out why I got more than one line to print out in my
version: I had preceded the perl statement with "find . -type f |
xargs ". I didn't question the result in light of the documentation
cited.

bill
 

chris-usenet

find all the files below /path/to/subdir (this is Linux)
that contain either of the strings "foo bar baz" or "quux frobozz",

A. Sinan Unur said:
Why? find, the command whose output you are processing, can take a regex
to match against. Use that.
$ find . -regex '.*/z.*' -type f -print

But -regex doesn't search the file contents like the OP required. It
matches against file names.
Please consult
man find

Exactly:

-regex pattern
File name matches regular expression pattern. This is a
match on the whole path, not a search. For example to match
a file named './fubar3', you can use the regular expression
`.*bar.' or `.*b.*3', but not `b.*r3'.


Quick alternative, since this is CLPM:

find . -type f | perl -ane 'chomp;open(F,$_)||next;print "$_\n" if grep {/foo bar baz|quux frobozz/} <F>'

More complex alternative. I've left the (apparently redundant) perl
bit in on the assumption that you want a more complex RE match than can
simply be performed by egrep (the egrep provides an efficient first-pass
filter to reduce the number of candidate files that perl has to process):

find . -type f -print0 | xargs -0 egrep -l 'foo bar baz|quux frobozz' | perl -ane 'chomp;open(F,$_)||next;print "$_\n" if grep {/foo bar baz|quux frobozz/} <F>'

Chris
 

A. Sinan Unur

chris-usenet said:
But -regex doesn't search the file contents like the OP required. It
matches against file names.

Well, it seems like I misunderstood the OP. Apologies.

Sinan
 

bill

Of course the catch could be in that last "non-magical" bit, which
as far as I know is totally undefined crap (i.e. useless
non-documentation). [...]

Documentation patches are a great way to contribute to the Perl
project. Thanks in advance for yours.

I can't possibly produce this patch, since I have no idea what this
magical/non-magical stuff means. Do you?

bill
 

Michele Dondi

The immediate problem that serves as the context of the question
is this: find all the files below /path/to/subdir (this is Linux)
that contain either of the strings "foo bar baz" or "quux frobozz",
and do this *from the command line* (i.e. I'm looking for a one-liner
here, not a longwinded affair using File::Find, etc.).

Please note, that indeed it will tend to be somewhat clumsy, but you
can use File::Find on the cmd line:

perl -MFile::Find -le 'find sub { print if -f }, @ARGV' .

(OK, accomplishing the task you described *can't* be that terse!)
I tried

% perl -e 'print "$ARGV\n" if grep /(foo bar baz|quux frobozz)/, <>' `find /path/to/subdir -type f`

As a side note to the other good suggestions you already received,
I'll just add that if I wanted to do it "like this", then I'd do:

perl -lne 'print $ARGV and close ARGV if /whatever/' \
$(find /path/to/subdir -type f)

This has the big advantage (in terms of keystrokes) that you do not
have to explicitly open the files, but (I fear that) you may get into
trouble if the output of find gets large enough...


Michele
 

Brian McCauley

bill said:

Of course the catch could be in that last "non-magical" bit, which
as far as I know is totally undefined crap (i.e. useless
non-documentation). [...]


Documentation patches are a great way to contribute to the Perl
project. Thanks in advance for yours.


I can't possibly produce this patch, since I have no idea what this
magical/non-magical stuff means. Do you?

Yes, the magical nature of the *ARGV special variable is explained in
the documentation describing the *ARGV special variable. Oddly enough,
when I started to try to help you in this thread, which asks questions
about *ARGV, the _first_ thing I did was go re-read the relevant
documentation. This was before I actually looked at the code in any
detail and noticed that the print $ARGV was outside the loop.
 

Brian McCauley

bill said:
BTW, I figured out why I got more than one line to print out in my
version: I had preceded the perl statement with "find . -type f |
xargs ".

Yes, I'd already worked that out.
 

bill

Brian McCauley said:


Of course the catch could be in that last "non-magical" bit, which
as far as I know is totally undefined crap (i.e. useless
non-documentation). [...]


Documentation patches are a great way to contribute to the Perl
project. Thanks in advance for yours.


I can't possibly produce this patch, since I have no idea what this
magical/non-magical stuff means. Do you?
Yes, the magical nature of the *ARGV special variable is explained in
the documentation describing the *ARGV special variable.

Where exactly is that documentation? If I search for ARGV in
perlvar, this is what comes up:

input_line_number HANDLE EXPR
$INPUT_LINE_NUMBER
$NR
$. The current input record number for the last file
handle from which you just read() (or called a
"seek" or "tell" on). The value may be different
from the actual physical line number in the file,
depending on what notion of "line" is in
effect--see $/ on how to change that. An explicit
close on a filehandle resets the line number.
Because "<>" never does an explicit close, line
numbers increase across ARGV files (but see
examples in "eof" in perlfunc). Consider this
variable read-only: setting it does not reposition the
seek pointer; you'll have to do that on your own.
Localizing $. has the effect of also localizing
Perl's notion of "the last read filehandle".
(Mnemonic: many programs use "." to mean the
current line number.)

...

$ARGV contains the name of the current file when reading
from <>.

@ARGV The array @ARGV contains the command-line
arguments intended for the script. $#ARGV is
generally the number of arguments minus one, because
$ARGV[0] is the first argument, not the program's
command name itself. See $0 for the command name.

...

Perl identifiers that begin with digits, control
characters, or punctuation characters are exempt from the
effects of the "package" declaration and are always forced
to be in package "main". A few other names are also
exempt:

ENV STDIN
INC STDOUT
ARGV STDERR
ARGVOUT
SIG

That's it. Nothing on magic. Programming Perl 3d ed. says

ARGV
[ALL] The special filehandle that iterates over command-line
filenames in @ARGV. Usually written as the null filehandle
in the angle operator: <>.

No mention of magic here either. Also there's no mention of anything
(whether magic or not) that explains the inconsistency I cited
earlier, namely that a "last" statement in an -ne script doesn't
exit just the inner loop.

bill
 

Brian McCauley

bill said:
bill wrote:
[...]


Of course the catch could be in that last "non-magical" bit, which
as far as I know is totally undefined crap (i.e. useless
non-documentation). [...]



Documentation patches are a great way to contribute to the Perl
project. Thanks in advance for yours.


I can't possibly produce this patch, since I have no idea what this
magical/non-magical stuff means. Do you?

Yes, the magical nature of the *ARGV special variable is explained in
the documentation describing the *ARGV special variable.


Where exactly is that documentation? If I search for ARGV in
perlvar, this is what comes up:

[ snip ]

Sorry, I was looking at the current version. I didn't realise it had
changed recently. I also followed the implied cross-reference and
looked up the <> operator. I didn't remember doing this because it was
barely a conscious act - I've been reading reference manuals for a couple
of decades.

Anyhow, there you have it - I was wrong, I should not have criticised.
ARGV
[ALL] The special filehandle that iterates over command-line
filenames in @ARGV. Usually written as the null filehandle
in the angle operator: <>.

No mention of magic here either. Also there's no mention of anything
(whether magic or not) that explains the inconsistency I cited
earlier, namely that a "last" statement in an -ne script doesn't
exit just the inner loop.

There is only one loop.

As you yourself cited earlier in this thread:

perl -ne 'WHATEVER'

is, as a matter of fact, a shorthand for

perl -ne 'LINE: while (<>) { WHATEVER }'

It then went on to try to explain the effect of this with some pseudo
code. But the pseudo code is an analogy. A parable. Where you are
given a literal explanation and an analogy you should not fall into the
trap of drawing false inferences from a literal interpretation of
the analogy. (Or else you may end up knocking on people's doors trying
to persuade them to accept free magazines :) ).

In reality the <> or <ARGV> behaves as an iterator that iterates over all
the records in all the files in @ARGV.

Anyhow you should submit a documentation patch that inserts the phrase
"There are not really two nested loops" at the appropriate place in the
explanation of the pseudocode. As has been said a number of times in
this newsgroup (mostly by me, it has to be admitted), people who are
reading the documentation for the first time are actually most qualified
to find bits that are hard to follow.
 
