Perl grep problem

demolitionz

hey, wonder if anyone can help 'cause i'm fresh out of ideas why my
perl script isn't working!

basically the script reads all the data from files in a directory into
an array. i then want the user to be able to search that array for
keywords (in each line) and output the keywords to a file. i've got
the script to work using the following line:

@found = grep(/$ARGV[2]/i, @rf);

where @rf is the array that's being searched, @found is the array the
found words are stored to (i output it to a file later which also works
fine) and $ARGV[2] is the user input word to search for. the problem
with this script is that because the user inputs the search word as
$ARGV[2] the program can only search for one word per run, which means
when they want to search for another word they have to run the whole
program again and this slows things down as the @rf array has to be
created from scratch once more.

what i want to do (and what i've tried endlessly to do in the 2nd
remake of the script!) is to have a 2 step process, where the files
are read into the @rf array as step one, and then in step 2 the user
inputs the keyword to search for and we loop step 2 as many times as
the user wants. what i'm currently doing with that then, is this:

$keyword = <STDIN>;
chop $keyword;
@found = grep(/$keyword/i, @rf);

now i've printed to screen everything so as to debug it, and if the
user inputs "chickens" for example, then print $keyword; will return
"chickens" correctly. the problem is, no matter what i try, i cannot
get the grep(/$keyword/) bit to work and @found is *always* empty! i
don't really understand why grep would work fine with $ARGV[2] but not
with $keyword and it's drivin me crazy! i've tried @found =
grep(/"$keyword"/i, @rf); and i've tried chomp $keyword; and i've even
resorted to pushing $keyword into an array and calling the same value
from the array as a scalar (i got very very desperate by this point and
would try anything ;)) but nothing i do works.

can anyone help?! :)

cheers,
d
 
Mark Clements

You need to post a small, complete program that displays this
behaviour, as well as sample data and output, copying and pasting
rather than retyping. Check out the posting guidelines. I would have
suggested "chomp" rather than "chop", but you've tried that. It is also
possible that the data you are feeding to STDIN has something
unexpected in it. Bear in mind that you'll need to escape special
characters if you want to use them in a regex to match the, er, special
characters.
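
To make the two points above concrete, here is a minimal sketch. It assumes the search term should be matched literally; the sample lines and the helper name find_literal are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines that contain $term literally,
# ignoring case. \Q...\E escapes any regex metacharacters in $term.
sub find_literal {
    my ($term, @lines) = @_;
    return grep { /\Q$term\E/i } @lines;
}

# chop always removes the last character; chomp removes only a
# trailing newline, so it is safe on input that has none.
my $chopped = "chickens";
chop $chopped;                 # now "chicken"
my $chomped = "chickens";
chomp $chomped;                # still "chickens"

# Made-up sample data.
my @lines = ("price: 3.50\n", "price: 3150\n", "c++ notes\n");

# Simulated user input, as it would arrive from <STDIN>.
my $input = "3.50\n";
chomp $input;

my @unescaped = grep { /$input/i } @lines;      # "." matches any character
my @escaped   = find_literal($input, @lines);   # "." matches only a dot

print "unescaped matches: ", scalar(@unescaped), "\n";
print "escaped matches:   ", scalar(@escaped), "\n";
```

Whether escaping is wanted depends on whether users are meant to type regular expressions or plain words; for plain keywords, \Q...\E is usually the safer default.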

Mark
 
demolitionz

Okay, have read the posting guidelines and hopefully understood them,
sorry about that :)

Here's a scaled down version of the program that isn't working...

#!usr/bin/perl
$filenumber = 0;
do {
print "Processing file $filenumber of $#rd\n"; # nb: this is just to debug
opendir(DH,$ARGV[0]);
@rd = readdir(DH);
open(FH,"$ARGV[0]/$rd[$filenumber]");
@rf = <FH>;
$filenumber++;
}
while ($filenumber <= $#rd);
do {
print "File to save to: ";
$filename = <STDIN>;
chomp $filename;
print "Keyword to search for:";
$searchterm = <STDIN>;
chomp $searchterm;
@found = grep(/$searchterm/i, @rf);
open(SAVETOFILE,">>./new/$filename");
print SAVETOFILE @found;
print "rf array: @rf\n";
print "keyword: $searchterm\n";
print "found array: @found\n";
print "Search again? y/n\n";
$stop = <STDIN>;
chop $stop;
if ($stop eq "n") { exit; }
}
while ($filename ne "!exit");
exit;

Have also tried adding in $searchterm =~ s/[^A-Za-z0-9 .\\:-]*//g; but
doesn't seem to make a difference. (oh and the directory 'New' does
exist just in case you were wondering :)).

And here's some sample data (directory contains 4 txt files. 1.txt
contains word eggs, 2.txt contains word bacon, 3.txt contains word
chickens, 4.txt contains word flower)...

c:\scriptdir> new1.pl c:\prltest\
Processing file 0 of -1 #this is just cos i put the debug print in weird place :)
Processing file 1 of 5
Processing file 2 of 5
Processing file 3 of 5
Processing file 4 of 5
Processing file 5 of 5
File to save to: new.txt
Keyword to search for: chicken
rf array: flower
keyword: chicken
found array:
Search again? n

And that's basically it. As i say, it works absolutely fine with
$ARGV[2] as input so i'm stumped!

cheers,
d
ps this is only the 3rd script i've ever written in perl, so pls go
easy on me if i've done something obviously stupid ;)
 
demolitionz

oh and just to pre-empt anyone lol, i did actually copy and paste that
script so i assume some of the misformats are due to google's
newsreader - e.g. open(FH,"$ARGV[0]/$rd[$filenum ber]"); is not a
mistake in the script (it's actually
open(FH,"$ARGV[0]/$rd[$filenumber]"); in the script) :)
 
John Bokma

demolitionz wrote:
Okay, have read the posting guidelines and hopefully understood them,

Probably not entirely, so I added some guidelines ;-)

#!usr/bin/perl

use strict;
use warnings;

opendir(DH,$ARGV[0]);

check return value

open(FH,"$ARGV[0]/$rd[$filenumber]");

check return value

open(SAVETOFILE,">>./new/$filename");

check return value

chop $stop;

use chomp if you want to chomp, see perldoc -f chomp

if ($stop eq "n") { exit; }
}
while ($filename ne "!exit");

nicer:

while ( 1 ) {

:
last if $stop eq 'n';
}
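
As a rough sketch of what "check return value" and the while ( 1 ) loop could look like together: the helper read_lines, the path no/such/file.txt, and the canned answers below are hypothetical stand-ins, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: reads every line of one file and dies with the
# system error message ($!) if the file cannot be opened or closed.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    my @lines = <$fh>;
    close($fh) or die "Can't close '$path': $!";
    return @lines;
}

# Opening a path that does not exist now fails loudly instead of
# silently producing an empty list. eval traps the die for the demo.
my @lines = eval { read_lines('no/such/file.txt') };
print "open failed: $@" if $@;

# Loop shape from the reply: run until the answer is 'n'.
# @answers stands in for successive reads from <STDIN>.
my @answers = ("y\n", "y\n", "n\n");
my $rounds  = 0;
while (1) {
    my $stop = shift @answers;
    chomp $stop;
    $rounds++;
    last if $stop eq 'n';
}
print "rounds: $rounds\n";
```

With use strict in place, every variable has to be declared with my, which tends to surface typos in variable names at compile time.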
 
demolitionz

Ok, I've added the following debug in which will check to see if
directory can be opened and files are being read properly...

open(FH,"$ARGV[0]/$rd[$filenumber]");
print "\nFH is: ";
print <FH>;

this returns...

Processing file 1 of 5
FH is:
Processing file 2 of 5
FH is: eggs
Processing file 3 of 5
FH is: bacon
Processing file 4 of 5
FH is: chickens
Processing file 5 of 5
FH is: flower

So return value for open(FH,"$ARGV[0]/$rd[$filenumber]"); (and
therefore opendir(DH,$ARGV[0]);) seem fine.

Also, the new file is created ok (and it writes to the new file ok when
using $ARGV[2]), so open(SAVETOFILE,">>./new/$filename"); seems fine
too; it's just annoying that it's always bleeding empty lol!

Have changed the ending of the script per your suggestion, and ty for
that :)

So I'm once again totally baffled, as all my debug checks seem to show
everything is working ok. The files in the directory are read to the
@rf array ok, the new file is created fine, the $keyword stdin works,
but the script just refuses to grep using the $keyword. And to top it
off google is doing its best to misformat these posts lol :) ty for
ongoing help btw, appreciate it :)
 
John Bokma

demolitionz wrote:

Learn how to quote, otherwise you will notice that no one is going to reply
to your postings.

Ok, I've added the following debug

wrong, try again.

(hint open( ... ) or die "Can't open '$filename': $!";

BTW: I am not saying that it's going to fix your problem, but it might trap
errors now, or in your future work.
 
demolitionz

mmkay, well these will be manual quotes then as google doesn't have a
quote feature that i can find, so hopefully they come out ok.
(hint open( ... ) or die "Can't open '$filename': $!";

done on opendir(DH,$ARGV[0]); and open(FH,"$ARGV[0]/$rd[$filenumber]");
and open(SAVETOFILE,">>./new/$filename"); and they all work fine, no
errors...
 
Mark Clements

Okay, have read the posting guidelines and hopefully understood them,
sorry about that :)

Here's a scaled down version of the program that isn't working...

I've piped it through perltidy to make it semi-legible.

$filenumber = 0;
do {
    print "Processing file $filenumber of $#rd\n"; # nb: this is just to debug
    opendir( DH, $ARGV[0] );
    @rd = readdir(DH);
    open( FH, "$ARGV[0]/$rd[$filenumber]" );
    @rf = <FH>;
    $filenumber++;
} while ( $filenumber <= $#rd );

You are doing opendir and reading the directory each time through the
loop. You don't need to do this. You aren't checking the return value
of your system calls. You aren't running with strict and warnings
(already pointed out). You are probably trying to open "." and ".." as
files. You are overwriting the value of @rf each time through the loop,
so @rf will only contain the contents of the last file found in the
directory, whatever that is.

use strict;
use warnings;
use Data::Dumper;

my $dirName = shift;
opendir DIRTOREAD, $dirName or die "could not open dir $dirName: $!";

# keep only plain files, which skips "." and ".."
my @filesToSearch = grep { -f "$dirName/$_" } readdir DIRTOREAD;
closedir DIRTOREAD or die "error closing dir $dirName: $!";

my %fileData = ();

foreach my $fileName(@filesToSearch){
    # join directory and file name with "/", as in the -f test above
    my $fileToSearch = "$dirName/$fileName";
    open IN, "<$fileToSearch"
        or die "could not open $fileToSearch: $!";
    my @lines = map { chomp , $_} <IN>;
    $fileData{$fileName} = \@lines;

}

warn Dumper %fileData;

while(my($fileName,$lines)=each %fileData){

    print "enter search term for $fileName: ";
    my $searchTerm = <STDIN>;
    chomp $searchTerm;
    last unless $searchTerm;
    print "\n";

    my @foundLines = grep { /$searchTerm/ } @$lines;

    print "filename = $fileName searchTerm = $searchTerm\n";
    print "found ".Dumper(@foundLines)."\n";

}

use Data::Dumper to make sure that your arrays contain what you think
they contain....

Note that doing this loads *all* of the files in the directory into
memory; you may not want to do this.
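
If memory is a concern, one alternative is to scan the files line by line on every search instead of holding them all in memory. The sketch below assumes that trade-off is acceptable; the helper search_dir and the temporary demo directory are illustrative only, with file contents borrowed from the sample data earlier in the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: scans each plain file in $dir line by line and
# returns the matching lines, without keeping whole files in memory.
sub search_dir {
    my ($dir, $term) = @_;
    opendir(my $dh, $dir) or die "could not open dir $dir: $!";
    my @files = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;

    my @found;
    foreach my $file (sort @files) {
        open(my $in, '<', "$dir/$file")
            or die "could not open $dir/$file: $!";
        while (my $line = <$in>) {
            # \Q...\E treats the search term as literal text
            push @found, $line if $line =~ /\Q$term\E/i;
        }
        close $in;
    }
    return @found;
}

# Demo data mirroring the four sample files described in the thread.
my $dir = tempdir(CLEANUP => 1);
my %sample = (
    '1.txt' => "eggs\n",
    '2.txt' => "bacon\n",
    '3.txt' => "chickens\n",
    '4.txt' => "flower\n",
);
for my $name (keys %sample) {
    open(my $out, '>', "$dir/$name")
        or die "could not write $dir/$name: $!";
    print {$out} $sample{$name};
    close $out;
}

print search_dir($dir, 'chicken');
```

The cost is that every search re-reads the files from disk, which is the slowdown the original poster wanted to avoid, so this only makes sense when the data is too large to keep in an array.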

Mark
 
John Bokma

Mark said:
my $dirName = shift;
opendir DIRTOREAD, $dirName or die "could not open dir $dirName: $!";

Isn't it more common to use:

opendir my $dh, etc

nowadays? (Also CamelCase is something I prefer not to use ;-) )

my @lines = map { chomp , $_} <IN>;

chomp( my @lines = <IN> ); ?

(Just curious, not nitpicking, ok a little).
 
demolitionz

Thanks for your reply. Haven't used your code primarily because the
point of the exercise for me was just to try and learn some perl and
see if i could make the thing work (i'll move on to elegance later ;)),
but you did hit the nail on the head with this...
You are overwriting the value of @rf each time through the loop,
so @rf will only contain the contents of the last file found in the
directory, whatever that is.

Have now changed the code from

@rf = <FH>;

to:

push(@rf, <FH>);

and it works fine :)
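
The difference between the two lines can be seen in a small self-contained sketch. It uses in-memory filehandles in place of the real files, and the variable names @assigned and @pushed are made up for the comparison.

```perl
use strict;
use warnings;

# Stand-ins for the contents of the four sample files.
my @contents = ("eggs\n", "bacon\n", "chickens\n", "flower\n");

my (@assigned, @pushed);
foreach my $text (@contents) {
    # An in-memory filehandle lets the demo run without real files.
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
    @assigned = <$fh>;        # replaces whatever was read before
    close $fh;

    open($fh, '<', \$text) or die "Can't open in-memory file: $!";
    push @pushed, <$fh>;      # appends to what is already there
    close $fh;
}

my @found_assigned = grep { /chicken/i } @assigned;
my @found_pushed   = grep { /chicken/i } @pushed;

print "assigned holds ", scalar(@assigned), " line(s)\n";
print "pushed holds ", scalar(@pushed), " line(s)\n";
print "found via assignment: @found_assigned\n";
print "found via push: @found_pushed";
```

With plain assignment only the last file's line survives, which matches the "rf array: flower" output in the earlier debug run; the earlier $ARGV[2] version presumably only appeared to work when the keyword happened to be in the last file read.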

have also moved opendir(DH,$ARGV[0]); @rd = readdir(DH); out of the
first loop as you suggested.

many thanks to you both for your help :)

d
 
Mark Clements

John said:
Isn't it more common to use:

opendir my $dh, etc

Sure - I was just following on from the OP's style, or, er, perhaps it
just didn't occur to me. On another point, I tend not to put the "my"
there unless it is e.g. at the start of a foreach. I think it makes
things clearer if the my is the first non-whitespace on the line.

nowadays? (Also CamelCase is something I prefer not to use ;-) )

Yeah - I've been whistled on this one before :)

chomp( my @lines = <IN> ); ?

Good point. I hadn't realised that chomp could be fed a list argument.
You learn something new every day.
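
For reference, a minimal sketch of chomp applied to a whole list, using an in-memory filehandle and made-up data rather than the IN handle from the code above:

```perl
use strict;
use warnings;

# Made-up file contents, read through an in-memory filehandle.
my $text = "eggs\nbacon\nchickens\n";
open(my $in, '<', \$text) or die "Can't open in-memory file: $!";

# Read all lines and strip the trailing newline from each in one step.
chomp(my @lines = <$in>);
close $in;

print join('|', @lines), "\n";   # prints eggs|bacon|chickens
```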

regards,

Mark
 
