Pattern search over many files

pavan734

Hi,
Suppose I have got 3 files like this

file1.txt    file2.txt    file3.txt
abc          gsywg        wrtw
def abc      abc hshs     dhwu wwg
dadq aft     hhs gtc      ffs
abc ttsg     abc hhshh    abc

Assume that all the files have the same number of lines.

I need a script that compares each line of all the files against the
pattern "abc" and prints the number of lines that do not contain the
pattern "abc" in any of the files.

In the above example, the script must print 1, because only the 3rd line
lacks the pattern "abc" in all the files. I think you
understood my question; if not, please ask me again and I will elaborate.
 
Mirco Wahab

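use strict;
use warnings;

my $term = qr/abc/;

open my $f1, '<', 'file1.txt';
open my $f2, '<', 'file2.txt';
open my $f3, '<', 'file3.txt';

# Start both counters at 0 so "Total" prints 0 instead of an undefined value.
my ($count, $line) = (0, 0);
while( ! eof($f1) and ! eof($f2) and ! eof($f3) ) {
    # Read exactly one line from each file per iteration.
    my @lines = (scalar <$f1>, scalar <$f2>, scalar <$f3>);
    ++$line;
    # Report the line number when none of the three lines matches.
    print ++$count, ". # $line\n" unless grep /$term/, @lines;
}

print "Total: $count\n";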



Regards

M.
 
Tad McClellan

----------------------
#!/usr/bin/perl
use warnings;
use strict;

my @file1 = ('abc', 'def abc', 'dadq aft', 'abc ttsg');
my @file2 = ('gsywg', 'abc hshs', 'hhs gtc', 'abc hhshh');
my @file3 = ('wrtw', 'dhwu wwg', 'ffs', 'abc');

my $cnt=0;
foreach my $i ( 0 .. $#file1 ) {
    next if $file1[$i] =~ /abc/;
    next if $file2[$i] =~ /abc/;
    next if $file3[$i] =~ /abc/;
    $cnt++;
}

print "$cnt\n";
 
pavan734

I think you have misunderstood. file1's name is file1.txt, and 'abc',
'def abc', 'dadq aft', 'abc ttsg' are its contents. Similarly for
file2.txt and file3.txt. Note that the contents can be anything, and my
real files contain as many as 2000 lines.
 
Gunnar Hjalmarsson

<homework assignment snipped>

I think you have misunderstood. file1's name is file1.txt, and 'abc',
'def abc', 'dadq aft', 'abc ttsg' are its contents. Similarly for
file2.txt and file3.txt.

Then adapt the solution accordingly!
 
pavan734

Thank you.
 
pavan734

One more question, kindly answer. How should the code change if I
want to print the number of lines at which every file contains the pattern
"abc"? Again I should get the answer 1, because only the 4th line
contains the pattern "abc" in all the files.
 
Mirco Wahab

Change one line:

# old
# print ++$count, ". # $line\n" unless grep /$term/, @lines;
# new
print ++$count, ". # $line\n" if 3 == grep /$term/, @lines;

"unless grep" means:
    if 0 == grep ...

so we change that to
    if 3 == grep ...

in order to find the lines where all three files match.
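
The reason this works is that grep returns the number of matching elements when it is used in scalar (numeric) context. A minimal sketch, assuming the sample data from the question is held in memory; the helper name count_hits and the @rows layout are made up for illustration and are not part of the code above:

```perl
use strict;
use warnings;

my $term = qr/abc/;

# Hypothetical helper: in scalar context, grep yields the number of
# elements that matched rather than the list of matches.
sub count_hits {
    my @lines = @_;
    return scalar grep { /$term/ } @lines;
}

# Each row holds the Nth line of file1, file2 and file3 from the question.
my @rows = (
    [ 'abc',      'gsywg',     'wrtw'     ],
    [ 'def abc',  'abc hshs',  'dhwu wwg' ],
    [ 'dadq aft', 'hhs gtc',   'ffs'      ],
    [ 'abc ttsg', 'abc hhshh', 'abc'      ],
);

for my $n ( 0 .. $#rows ) {
    my $hits = count_hits( @{ $rows[$n] } );
    printf "line %d: %d of 3 files match\n", $n + 1, $hits;
}
```

A count of 0 is the "unless grep" case, and a count of 3 is the "3 == grep" case.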

M.
 
Mirco Wahab

Mirco said:
Change one line
...
in order to find lines w/all three occurrences

BTW, my initial code was more like 'enhanced pseudo Perl'.
If you are interested in doing some more Perl, you should
adopt a more robust programming style. For example, you should
(at least) check that file handles were opened successfully and
avoid unnecessarily repeated statements, more like:

[revamped example]

use strict;
use warnings;

my $term = qr/abc/;
my @name = qw' file1.txt file2.txt file3.txt ';

open my $f1, '<', $name[0] or warn "$name[0] $!";
open my $f2, '<', $name[1] or warn "$name[1] $!";
open my $f3, '<', $name[2] or warn "$name[2] $!";

my ($count, $line) = (0, 0);
while( ! eof($f1) and ! eof($f2) and ! eof($f3) ) {
    my @lines = (scalar <$f1>, scalar <$f2>, scalar <$f3>);
    ++$line;
    print ++$count, ". # $line\n" if 3 == grep /$term/, @lines;
}

print "Total: $count\n";

...

Regards

M.
 
Dr.Ruud

(e-mail address removed) wrote:
I think you have misunderstood.

No, you have misunderstood. An approach was shown, for you to learn.

Tip: With larger files, a while loop takes less memory.

Also: You really shouldn't quote old text that is not relevant to your
reply. Especially don't quote signatures!
 
Tad McClellan

I think you have misunderstood.


No, I think I didn't.

file1 name is file1.txt and 'abc',
'def abc', 'dadq aft', 'abc ttsg' are its contents.


Yes, I understood all of that, but I didn't want to litter my
filesystem with a bunch of files just for testing.

I left it to you to figure out yourself how to read each of
the 3 files into an array, or to adapt it to read 3 files
in parallel.
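
For the first option, a minimal sketch of reading each file into an array might look like the following. The helper name read_lines is hypothetical, and the in-memory strings stand in for the real files so that the sketch runs without creating anything on disk; with real data you would pass 'file1.txt' and so on instead.

```perl
use strict;
use warnings;

# Hypothetical helper: return all lines of a file with newlines removed.
# $source may be a file name or a reference to a string (in-memory file).
sub read_lines {
    my ($source) = @_;
    open my $fh, '<', $source or die "Cannot open $source: $!";
    chomp( my @lines = <$fh> );
    close $fh;
    return @lines;
}

# In-memory stand-ins for file1.txt, file2.txt and file3.txt.
my $text1 = "abc\ndef abc\ndadq aft\nabc ttsg\n";
my $text2 = "gsywg\nabc hshs\nhhs gtc\nabc hhshh\n";
my $text3 = "wrtw\ndhwu wwg\nffs\nabc\n";

my @file1 = read_lines( \$text1 );
my @file2 = read_lines( \$text2 );
my @file3 = read_lines( \$text3 );

# Same loop as in the array-based example.
my $cnt = 0;
foreach my $i ( 0 .. $#file1 ) {
    next if $file1[$i] =~ /abc/;
    next if $file2[$i] =~ /abc/;
    next if $file3[$i] =~ /abc/;
    $cnt++;
}

print "$cnt\n";
```

This holds every line of every file in memory at once, which should be fine for files of a few thousand lines.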
 
Mirco Wahab

Tad said:
No, I think I didn't.


Yes, I understood all of that, but I didn't want to litter my
filesystem with a bunch of files just for testing.

I left it to you to figure out yourself how to read each of
the 3 files into an array, or to adapt it to read 3 files
in parallel.

I know you wouldn't hesitate to spell out
a complete solution for all this within 45
seconds if necessary, but I believe in this
case your example wasn't really a good one.

I actually tried to start from it and derive
a solution, but there are (imho) some larger
difficulties from a beginner's view, because
you can't really index file lines via
loop variables.

The closest working solution to that (without
cluttering the file system) I found was this:

#!/usr/bin/perl
use warnings;
use strict;

my $file1 = join "\n", ('abc', 'def abc', 'dadq aft', 'abc ttsg');
my $file2 = join "\n", ('gsywg', 'abc hshs', 'hhs gtc', 'abc hhshh');
my $file3 = join "\n", ('wrtw', 'dhwu wwg', 'ffs', 'abc');

# Open read handles on the in-memory strings instead of real files.
open my $h1, '<', \$file1 or warn "$!";
open my $h2, '<', \$file2 or warn "$!";
open my $h3, '<', \$file3 or warn "$!";

my $cnt=0;

# One iteration per line; the sample data has 4 lines.
foreach my $i ( 1 .. 4 ) {
    $cnt += ( (<$h1> =~ /abc/)
            + (<$h2> =~ /abc/)
            + (<$h3> =~ /abc/) )
            == 0 ? 1 : 0;
}

print int($cnt),"\n";



As you can see, to adapt it to a 'file reading solution',
you have to completely rewrite the loop ... ;-)

(If I'm not completely mistaken)


Regards

Mirco (intermediate level)
 
Mirco Wahab

Tad said:
Yes, I understood all of that, but I didn't want to litter my
filesystem with a bunch of files just for testing.
I left it to you to figure out yourself how to read each of
the 3 files into an array, or to adapt it to read 3 files
in parallel.

I know you wouldn't hesitate to spell out
a complete solution for all this within 45
seconds if necessary, but I believe in this
case your example wasn't really a good one.

I actually tried to start from it and derive
a solution, but there are (imho) some larger
difficulties from a beginner's view, because
you can't really index file lines via
loop variables.

The closest working solution to that (without
cluttering the file system) I found was this:

#!/usr/bin/perl
use warnings;
use strict;

my $file1 = join "\n", ('abc', 'def abc', 'dadq aft', 'abc ttsg');
my $file2 = join "\n", ('gsywg', 'abc hshs', 'hhs gtc', 'abc hhshh');
my $file3 = join "\n", ('wrtw', 'dhwu wwg', 'ffs', 'abc');

# Open read handles on the in-memory strings instead of real files.
open my $h1, '<', \$file1 or warn "$!";
open my $h2, '<', \$file2 or warn "$!";
open my $h3, '<', \$file3 or warn "$!";

my $cnt=0;
my $line=0;

while( ++$line ) {
    # Stop as soon as any of the handles is exhausted.
    last if( eof($h1) || eof($h2) || eof($h3) );

    ++$cnt unless ( (<$h1> =~ /abc/)
                  + (<$h2> =~ /abc/)
                  + (<$h3> =~ /abc/) );
}

print $cnt,"\n";



As one can see, to adapt it to a 'file reading solution',
you have to completely rewrite the loop ... ;-)

(If I'm not completely mistaken)


Regards

Mirco (intermediate level)
 
Tad McClellan

Mirco Wahab said:
I know you wouldn't hesitate to spell out
a complete solution for all this within 45
seconds if necessary, but I believe in this
case your example wasn't really a good one.

I actually tried to start from it and pull
a solution but there are (imho) some larger
difficulties from a beginners view because
you can't really index file lines via
loop variables.

But since he said we are guaranteed that the files all have
the same number of lines ...

As one can see, to adopt to a 'file reading solution',
you have to completely rewrite the loop ... ;-)


... the loop can be adapted to read from files instead of having
the data in arrays without much trouble.


# assume 3 open()s done here

while ( my $file1 = <FILE1> ) {
    next if $file1 =~ /abc/;

    my $file2 = <FILE2>;
    next if $file2 =~ /abc/;

    my $file3 = <FILE3>;
    next if $file3 =~ /abc/;

    $cnt++;
}
 
anno4000

Mirco Wahab said:
[revamped example]


my @name = map "file$_.txt", 1 .. 3;
my @fh;
open $fh[ @fh], '<', $_ for @name;

while ( ! grep eof( $_), @fh ) {
    chomp( my @lines = map scalar <$_>, @fh);
    # etc...
}
# etc...

Anno
 
Mirco Wahab

Tad said:
But since he said we are guaranteed that the files all have
the same number of lines ...
...
... the loop can be adapted to read from files instead of having
the data in arrays without much trouble.

This would imho lose synchronization
(the number of lines read from each file).

You need to read
/one line out of each file/
in every iteration. If you don't,
you'd compare file1's twentieth
line with file2's second or so.
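
A small sketch of that drift, using in-memory file handles so nothing touches the disk. The two-line sample data and the sub names are made up for illustration (they are not from the thread); the data is chosen so that the correct answer is 0:

```perl
use strict;
use warnings;

# Made-up sample data: every line position has "abc" in some file,
# so the correct count of "abc-free" line positions is 0.
my @data = (
    "abc\nxyz\n",    # stands in for file1
    "foo\nabc\n",    # stands in for file2
    "bar\nbaz\n",    # stands in for file3
);

# Open one fresh read handle per in-memory "file".
sub open_all {
    my @fh;
    for my $i ( 0 .. $#data ) {
        open $fh[$i], '<', \$data[$i] or die "in-memory open failed: $!";
    }
    return @fh;
}

# An early 'next' skips the remaining reads, so the handles drift apart.
sub count_with_early_next {
    my ( $f1, $f2, $f3 ) = open_all();
    my $cnt = 0;
    while ( my $l1 = <$f1> ) {
        next if $l1 =~ /abc/;
        my $l2 = <$f2>;
        next if $l2 =~ /abc/;
        my $l3 = <$f3>;
        next if $l3 =~ /abc/;
        $cnt++;
    }
    return $cnt;
}

# Reading one line from every handle first keeps them in step.
sub count_in_step {
    my @fh  = open_all();
    my $cnt = 0;
    until ( grep { eof($_) } @fh ) {
        my @lines = map { scalar readline($_) } @fh;
        $cnt++ unless grep { /abc/ } @lines;
    }
    return $cnt;
}

print "early next: ", count_with_early_next(), "\n";
print "in step:    ", count_in_step(), "\n";
```

With the early 'next', line 2 of file1 ends up being compared with line 1 of the other two files, so the first count comes out as 1 instead of 0.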

Regards

M.
 
comp.llang.perl.moderated

I think you have misunderstood. file1's name is file1.txt, and 'abc',
'def abc', 'dadq aft', 'abc ttsg' are its contents. Similarly for
file2.txt and file3.txt. Note that the contents can be anything, and my
real files contain as many as 2000 lines.


Tie::File (comes with Perl distro now) will
load a file into an array for you. Slower
if big or many files but an easy upfront
change to use Tad's solution.

Eg,

tie @array1, 'Tie::File', 'file1.txt' or die ...;
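
A fuller sketch of that idea, assuming the sample data from the question. The temporary directory and the code that writes the three files are only there to make the example self-contained; in real use you would tie the existing files directly.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempdir);

# Throwaway copies of the sample files, so the sketch is self-contained.
my $dir  = tempdir( CLEANUP => 1 );
my %data = (
    'file1.txt' => [ 'abc',   'def abc',  'dadq aft', 'abc ttsg'  ],
    'file2.txt' => [ 'gsywg', 'abc hshs', 'hhs gtc',  'abc hhshh' ],
    'file3.txt' => [ 'wrtw',  'dhwu wwg', 'ffs',      'abc'       ],
);
for my $name ( sort keys %data ) {
    open my $out, '>', "$dir/$name" or die "$dir/$name: $!";
    print {$out} "$_\n" for @{ $data{$name} };
    close $out or die "$dir/$name: $!";
}

# Tie each file to an array. Element $i is line $i+1, without its newline.
tie my @file1, 'Tie::File', "$dir/file1.txt" or die "file1.txt: $!";
tie my @file2, 'Tie::File', "$dir/file2.txt" or die "file2.txt: $!";
tie my @file3, 'Tie::File', "$dir/file3.txt" or die "file3.txt: $!";

# The array-based loop then works unchanged.
my $cnt = 0;
foreach my $i ( 0 .. $#file1 ) {
    next if $file1[$i] =~ /abc/;
    next if $file2[$i] =~ /abc/;
    next if $file3[$i] =~ /abc/;
    $cnt++;
}
print "$cnt\n";    # prints 1 for the sample data

untie @file1;
untie @file2;
untie @file3;
```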
 
Mirco Wahab

anno4000 said:
my @name = map "file$_.txt", 1 .. 3;
my @fh;
open $fh[ @fh], '<', $_ for @name;

while ( ! grep eof( $_), @fh ) {
    chomp( my @lines = map scalar <$_>, @fh);
    # etc...
}
# etc...

Not a bad idea!
But much too verbose ... ;-)


use strict;
use warnings;

my (@fh, $count);
open $fh[ @fh ], '<', 'file'.$_.'.txt' for 1..3;

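# Perl allows only one statement modifier per statement,
# so the "unless" test is written with "or" instead.
grep( /abc/, map scalar <$_>, @fh ) or ++$count
    while ! grep eof($_), @fh;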

print $count if defined $count;



(Actually I didn't believe yours would work
at all, but learned from you that

...
map <$_>, @handle_array;
...

does indeed work!)
Thanks for that hint.

Regards

Mirco
 
anno4000

comp.llang.perl.moderated said:
On Feb 15, 4:39 am, (e-mail address removed) wrote:
[...]

Tie::File (comes with Perl distro now) will
load a file into an array for you. Slower
if big or many files but an easy upfront
change to use Tad's solution.

Tie::File doesn't preload the file, if that's your concern. It uses tie
magic to make it look as if it did.

Anno
 
anton.vandersteen

Hello to Perl Monks,

This is my solution to the problem:

#!/perl/bin/perl
# This programme is written by Anton van der Steen
# Email address: (e-mail address removed)
use Tk;
use File::Find;

my $mw = new MainWindow; # Main Window
$mw->title("Search Engine Version 2.0 by Stone Logic Systems");


######################################################################
$mw->configure(-menu => my $menubar = $mw->Menu);

my $filemenu = $menubar->cascade(-label => "~File",
    -tearoff => 1);
my $execute_sql_statement = $menubar->cascade(-label => "Count ~Phrase",
    -tearoff => 1);
my $save_result_to_file = $menubar->cascade(-label => "~Show Text",
    -tearoff => 1);
my $clear_text_area = $menubar->cascade(-label => "~Clear Result Set",
    -tearoff => 1);
my $create_pdf = $menubar->cascade(-label => "~Export to File",
    -tearoff => 1);
my $create_excel = $menubar->cascade(-label => "Find File",
    -tearoff => 1);

#my $create_xml = $menubar->cascade(-label=> "Export to XML",
#    -tearoff=>1);

my $helpmenu = $menubar->cascade(-label => "~Help",
    -tearoff => 1);



$filemenu->command(-label => "E~xit",
-command => sub{$mw->destroy});

$execute_sql_statement->command(-label => "Count ~Phrase",
-command=> sub{push_button1()});

$save_result_to_file->command(-label =>"~Show Text",
-command=> sub{push_button2()});

$clear_text_area->command(-label => "~Clear Result Set",
-command=> sub {push_button3()});

$create_pdf->command(-label => "~Export to File",
-command=> sub {push_button4()});

$create_excel->command(-label =>"Find File",
-command=> sub {push_button5()}) ;

#$create_xml->command(-label =>"Export to XML",
#-command=> sub {XML()});

$helpmenu->command(-label => "~Help Contents",
-command => sub{showhelp()});

######################################################################
my $frm_name = $mw -> Frame() -> pack();

my $lab1 = $frm_name -> Label(-text=>"Phrase :",
    -font => '-adobe-helvetica-bold-r-normal--11-120-75-75-p-70-*-1') -> pack();

my $ent1 = $frm_name -> Entry(-width=>100, -borderwidth=>2) -> pack();
$ent1->configure(-font => '-adobe-helvetica-bold-r-normal--11-120-75-75-p-70-*-1');


my $lab2=$frm_name->Label(-text=>"Search in File :",
    -font => '-adobe-helvetica-bold-r-normal--11-120-75-75-p-70-*-1')->pack();

my $ent2=$frm_name->Entry(-width=>100)->pack();
$ent2->configure(-font => '-adobe-helvetica-bold-r-normal--12-120-75-75-p-70-*-1');

my $lab3=$frm_name->Label(-text=>"Export to file :",
    -font => '-adobe-helvetica-bold-r-normal--11-120-75-75-p-70-*-1')->pack();

my $ent3=$frm_name->Entry(-width=>100)->pack();
$ent3->configure(-font => '-adobe-helvetica-bold-r-normal--12-120-75-75-p-70-*-1');


#my $but1 = $mw -> Button(-text=>"Count Appearance Phrase", -command => \&push_button1, -background=>"green",
# -font => '-adobe-helvetica-bold-r-normal--11-120-75-75-p-70-*-1') -> pack();

#my $but2 = $mw -> Button(-text=>"Show text", -command => \&push_button2, -background=>"yellow",
# -font => '-adobe-helvetica-bold-r-normal--11-120-75-75-p-70-*-1') -> pack();

#my $but3 = $mw -> Button(-text=>"Clear Text Area", -command => \&push_button3, -background=>"orange",
# -font => '-adobe-helvetica-bold-r-normal--11-120-75-75-p-70-*-1') -> pack();

#my $but4 = $mw -> Button(-text=>"Save result to file", -command => \&push_button4, -background=>"cyan",
# -font => '-adobe-helvetica-bold-r-normal--11-120-75-75-p-70-*-1') -> pack();



#Text Area
my $txt = $mw->Scrolled( 'Text' , -scrollbars=>'se' , -wrap=>'none',);
$txt->configure(-width=>120, -height=>20,
    -font => '-adobe-helvetica-bold-r-normal--14-120-75-75-p-70-*-1');
$txt->pack();

MainLoop;

use Getopt::Std;    # loaded as in the original; $opt_1 is never set, so "last if ($opt_1)" never fires

# Count the lines of the given file that match the phrase.
sub push_button1 {
    my $name1 = $ent1 -> get();
    my $name2 = $ent2 -> get();
    @ARGV = ($name1, $name2);

    $i = 0;
    my $pattern = shift @ARGV;

    foreach $file (@ARGV) {
        open (FILE, $file);
        while ($line = <FILE>) {
            if ($line =~ m"$pattern") {
                $i++;
                last if ($opt_1);
            }
        }
        $txt -> insert ('0.0', "The phrase $pattern is $i times found in file $file.\n");
        close (FILE);
        $i = 0;
    }
}

# Show every matching line in the text area.
sub push_button2 {
    my $name1 = $ent1 -> get();
    my $name2 = $ent2 -> get();
    @ARGV = ($name1, $name2);

    $i = 0;
    my $pattern = shift @ARGV;

    foreach $file (@ARGV) {
        open (FILE, $file);
        while ($line = <FILE>) {
            if ($line =~ m"$pattern") {
                $i++;
                last if ($opt_1);
                $txt -> insert ('end', "$line\n");
            }
        }
        close (FILE);
        $i = 0;
    }
}

# Clear the text area.
sub push_button3 {
    $txt -> delete ('0.0', 'end');
}

# Write every matching line to the export file.
sub push_button4 {
    my $name1    = $ent1 -> get();
    my $name2    = $ent2 -> get();
    my $file_out = $ent3 -> get();
    @ARGV = ($name1, $name2);

    $i = 0;
    my $pattern = shift @ARGV;

    open (OUT, ">$file_out");

    foreach $file (@ARGV) {
        open (FILE, $file);
        while ($line = <FILE>) {
            if ($line =~ m"$pattern") {
                $i++;
                last if ($opt_1);
                print OUT "$line\n";
            }
        }
        close (FILE);
        $i = 0;
    }
    close (OUT);
}

# Walk the C: drive and list files whose extension matches the phrase.
sub push_button5 {
    find(\&push_button6, "c:\\");
}

# File::Find callback: $_ holds the current file name.
sub push_button6 {
    my $a = $ent1 -> get();

    if (/\.$a$/) {
        $txt -> insert ('end', "$File::Find::name\n");
    }
}




Have fun......
 
