pattern search over many files

pavan734 · Feb 15, 2007

Hi,
Suppose I have 3 files like this:

file1.txt    file2.txt    file3.txt
abc          gsywg        wrtw
def abc      abc hshs     dhwu wwg
dadq aft     hhs gtc      ffs
abc ttsg     abc hhshh    abc

Assume that all the files have the same number of lines.

I need a script that compares each line across all the files for the
pattern "abc" and prints the number of lines that do not contain the
pattern "abc" in any of the files.

In the above example, the script must print 1, because the 3rd line is
the only one that does not contain the pattern "abc" in any file. I hope
the question is clear; if not, please ask and I will elaborate.

Mirco Wahab · Feb 15, 2007

use strict;
use warnings;

my $term = qr/abc/;

open my $f1, '<', 'file1.txt';
open my $f2, '<', 'file2.txt';
open my $f3, '<', 'file3.txt';

my ($count, $line);
while( ! eof($f1) and ! eof($f2) and ! eof($f3) ) {
    my @lines = (scalar <$f1>, scalar <$f2>, scalar <$f3>);
    ++$line;
    print ++$count, ". # $line\n" unless grep /$term/, @lines;
}

print "Total: $count\n";

Regards

M.

Tad McClellan · Feb 15, 2007

----------------------
#!/usr/bin/perl
use warnings;
use strict;

my @file1 = ('abc', 'def abc', 'dadq aft', 'abc ttsg');
my @file2 = ('gsywg', 'abc hshs', 'hhs gtc', 'abc hhshh');
my @file3 = ('wrtw', 'dhwu wwg', 'ffs', 'abc');

my $cnt=0;
foreach my $i ( 0 .. $#file1 ) {
    next if $file1[$i] =~ /abc/;
    next if $file2[$i] =~ /abc/;
    next if $file3[$i] =~ /abc/;
    $cnt++;
}

print "$cnt\n";

pavan734 · Feb 15, 2007

I think you have misunderstood. The first file's name is file1.txt, and 'abc',
'def abc', 'dadq aft', 'abc ttsg' are its contents. The same goes for
file2.txt and file3.txt. Note that the contents can be anything, and my
real application files contain as many as 2000 lines.

Gunnar Hjalmarsson · Feb 15, 2007

I think you have misunderstood. file1 name is file1.txt and `abc',
'def abc', 'dadq aft', `abc ttsg' are its contents. Similarly for
file2.txt and file3.txt.

Then adapt the solution accordingly!

pavan734 · Feb 15, 2007

Thank you.

pavan734 · Feb 15, 2007

One more question, kindly answer. How should the code change if I
want to print the number of lines at which every file contains the
pattern "abc"? Again I should get the answer 1, because the 4th line is
the only one that contains the pattern "abc" in all the files.

Mirco Wahab · Feb 15, 2007

Change one line:

# old
# print ++$count, ". # $line\n" unless grep /$term/, @lines;
# new
print ++$count, ". # $line\n" if 3 == grep /$term/, @lines;

In scalar context, grep returns the number of matching elements,
so "unless grep" means:
if 0 == grep ...

We change that to
if 3 == grep ...

in order to find the lines where all three files contain the pattern.
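To illustrate the point about grep: a minimal sketch, using the 3rd and 4th rows of the sample data from the question. The helper name count_matches is made up for this illustration and is not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: how many of the given lines match the pattern?
sub count_matches {
    my ($term, @lines) = @_;
    my $n = grep { /$term/ } @lines;    # scalar assignment: grep returns a count
    return $n;
}

my $term = qr/abc/;
my @row3 = ('dadq aft', 'hhs gtc',   'ffs');    # 3rd line of each sample file
my @row4 = ('abc ttsg', 'abc hhshh', 'abc');    # 4th line of each sample file

print "row 3: ", count_matches($term, @row3), " of 3 files match\n";
print "row 4: ", count_matches($term, @row4), " of 3 files match\n";

# The two tests from the thread, spelled out
print "row 3 counts for the first question\n"  if 0 == count_matches($term, @row3);
print "row 4 counts for the second question\n" if 3 == count_matches($term, @row4);
```

Comparing against any other number (say, 2 == grep ...) would select rows where exactly that many files match.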

M.

Mirco Wahab · Feb 15, 2007

Mirco said:
Change one line
...
in order to find lines w/all three occurrences

BTW, my initial code was more like 'enhanced pseudo Perl'.
If you are interested in doing some more Perl, you should
adopt a more robust programming style: for example, you should
(at least) check that the files were opened successfully and
avoid unnecessarily repeated statements, more like:

[revamped example]

use strict;
use warnings;

my $term = qr/abc/;
my @name = qw' file1.txt file2.txt file3.txt ';

# die instead of warn: there is no point in reading from a handle that failed to open
open my $f1, '<', $name[0] or die "$name[0]: $!";
open my $f2, '<', $name[1] or die "$name[1]: $!";
open my $f3, '<', $name[2] or die "$name[2]: $!";

my ($count, $line) = (0, 0);
while( ! eof($f1) and ! eof($f2) and ! eof($f3) ) {
    my @lines = (scalar <$f1>, scalar <$f2>, scalar <$f3>);
    ++$line;
    print ++$count, ". # $line\n" if 3 == grep /$term/, @lines;
}

print "Total: $count\n";

...

Regards

M.

Dr.Ruud · Feb 15, 2007

(e-mail address removed) wrote:

I think you have misunderstood.

No, you have misunderstood. An approach was shown, for you to learn.

Tip: With larger files, a while loop takes less memory.
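A minimal sketch of that tip, assuming a small in-memory string as a stand-in for a large file (the data and variable names are illustrative only). Assigning the filehandle to an array reads every line before any of them is examined, whereas the while loop holds only the current line.

```perl
use strict;
use warnings;

# Stand-in for a large file: five numbered lines in a string
my $text = join '', map { "line $_\n" } 1 .. 5;

# Slurping: the whole file is held in @all at once
open my $in, '<', \$text or die "in-memory open failed: $!";
my @all = <$in>;
close $in;
my $slurped = grep { /line [24]/ } @all;

# Streaming: only the current line is held in $line
open $in, '<', \$text or die "in-memory open failed: $!";
my $streamed = 0;
while ( my $line = <$in> ) {
    $streamed++ if $line =~ /line [24]/;
}
close $in;

# Both approaches find the same number of matching lines
print "slurped: $slurped, streamed: $streamed\n";
```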

Also: You really shouldn't quote old text that is not relevant to your
reply. Especially don't quote signatures!

Tad McClellan · Feb 16, 2007

I think you have misunderstood.

No, I think I didn't.

file1 name is file1.txt and `abc',
'def abc', 'dadq aft', `abc ttsg' are its contents.

Yes, I understood all of that, but I didn't want to litter my
filesystem with a bunch of files just for testing.

I left it to you to figure out yourself how to read each of
the 3 files into an array, or to adapt it to read 3 files
in parallel.

Mirco Wahab · Feb 16, 2007

Tad said:
No, I think I didn't.

Yes, I understood all of that, but I didn't want to litter my
filesystem with a bunch of files just for testing.

I left it to you to figure out yourself how to read each of
the 3 files into an array, or to adapt it to read 3 files
in parallel.

I know you wouldn't hesitate to spell out
a complete solution for all this within 45
seconds if necessary, but I believe in this
case your example wasn't really a good one.

I actually tried to start from it and derive
a solution, but there are (imho) some larger
difficulties from a beginner's view, because
you can't really index file lines via
loop variables.

The closest working solution to that (without
cluttering the file system) I found was this:

#!/usr/bin/perl
use warnings;
use strict;

my $file1 = join "\n", ('abc', 'def abc', 'dadq aft', 'abc ttsg');
my $file2 = join "\n", ('gsywg', 'abc hshs', 'hhs gtc', 'abc hhshh');
my $file3 = join "\n", ('wrtw', 'dhwu wwg', 'ffs', 'abc');

# open in-memory filehandles on the strings
open my $h1, '<', \$file1 or warn "$!";
open my $h2, '<', \$file2 or warn "$!";
open my $h3, '<', \$file3 or warn "$!";

my $cnt=0;

foreach my $i ( 1 .. 4 ) {    # one pass per line of the sample data
    $cnt += ( (<$h1> =~ /abc/)
            + (<$h2> =~ /abc/)
            + (<$h3> =~ /abc/) )
            == 0 ? 1 : 0;
}

print int($cnt),"\n";

As you can see, to adapt this to a 'file reading solution',
you have to completely rewrite the loop ... ;-)

(If I'm not completely mistaken)

Regards

Mirco (intermediate level)

Mirco Wahab · Feb 16, 2007

Tad said:
Yes, I understood all of that, but I didn't want to litter my
filesystem with a bunch of files just for testing.
I left it to you to figure out yourself how to read each of
the 3 files into an array, or to adapt it to read 3 files
in parallel.

I know you wouldn't hesitate to spell out
a complete solution for all this within 45
seconds if necessary, but I believe in this
case your example wasn't really a good one.

I actually tried to start from it and derive
a solution, but there are (imho) some larger
difficulties from a beginner's view, because
you can't really index file lines via
loop variables.

The closest working solution to that (without
cluttering the file system) I found was this:

#!/usr/bin/perl
use warnings;
use strict;

my $file1 = join "\n", ('abc', 'def abc', 'dadq aft', 'abc ttsg');
my $file2 = join "\n", ('gsywg', 'abc hshs', 'hhs gtc', 'abc hhshh');
my $file3 = join "\n", ('wrtw', 'dhwu wwg', 'ffs', 'abc');

# open in-memory filehandles on the strings
open my $h1, '<', \$file1 or warn "$!";
open my $h2, '<', \$file2 or warn "$!";
open my $h3, '<', \$file3 or warn "$!";

my $cnt=0;
my $line=0;

while( ++$line ) {
    # stop as soon as any of the files runs out of lines
    last if( eof($h1) || eof($h2) || eof($h3) );

    ++$cnt unless ( (<$h1> =~ /abc/)
                  + (<$h2> =~ /abc/)
                  + (<$h3> =~ /abc/) );
}

print $cnt,"\n";

As one can see, to adapt this to a 'file reading solution',
you have to completely rewrite the loop ... ;-)

(If I'm not completely mistaken)

Regards

Mirco (intermediate level)

Tad McClellan · Feb 16, 2007

Mirco Wahab said:
I know you wouldn't hesitate to spell out
a complete solution for all this within 45
seconds if necessary, but I believe in this
case your example wasn't really a good one.

I actually tried to start from it and pull
a solution but there are (imho) some larger
difficulties from a beginners view because
you can't really index file lines via
loop variables.

But since he said we are guaranteed that the files all have
the same number of lines ...

As one can see, to adopt to a 'file reading solution',
you have to completely rewrite the loop ... ;-)

... the loop can be adapted to read from files instead of having
the data in arrays without much trouble.

# assume 3 open()s done here

while ( my $file1 = <FILE1> ) {
    next if $file1 =~ /abc/;

    my $file2 = <FILE2>;
    next if $file2 =~ /abc/;

    my $file3 = <FILE3>;
    next if $file3 =~ /abc/;

    $cnt++;
}

anno4000 · Feb 16, 2007

Mirco Wahab said:
BTW, my initial code was more like 'enhanced pseudo Perl',
so if you are interested to do some more Perl, you should
adopt to a more robust programming style, you should
for example (at least) check opened file handles and
avoid unnecessary repeated statements, more like:

my @name = map "file$_.txt", 1 .. 3;
my @fh;
open $fh[ @fh], '<', $_ for @name;

while ( ! grep eof( $_), @fh ) {
    chomp( my @lines = map scalar <$_>, @fh);
    # etc...
}
# etc...

Anno

Mirco Wahab · Feb 16, 2007

Tad said:
But since he said we are guaranteed that the files all have
the same number of lines ...
...
... the loop can be adopted to reading from files instead of having
the data in arrays without much trouble.

# assume 3 open()s done here

while ( my $file1 = <FILE1> ) {
    next if $file1 =~ /abc/;

    my $file2 = <FILE2>;
    next if $file2 =~ /abc/;

    my $file3 = <FILE3>;
    next if $file3 =~ /abc/;

    $cnt++;
}

This would imho lose synchronization
(the number of lines read would differ from file to file).

You need to read
/one line out of each file/
in every iteration. If you don't,
you'd end up comparing file1's twentieth
line with file2's second or so.
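A minimal sketch of the synchronized variant described above: one line is read from every handle before any of them is tested, so a "next" can never leave the files at different positions. The helper name count_rows_without and the in-memory sample data are assumptions made for this illustration, and all inputs are assumed to have the same number of lines.

```perl
use strict;
use warnings;

# Hypothetical helper: count the rows at which no handle's line matches $term.
sub count_rows_without {
    my ($term, @handles) = @_;
    my $count = 0;
    while ( defined( my $first = readline( $handles[0] ) ) ) {
        # Read the same row from every other handle *before* testing anything
        my @row = ( $first, map { scalar readline($_) } @handles[ 1 .. $#handles ] );
        next if grep { defined($_) && /$term/ } @row;
        $count++;
    }
    return $count;
}

# In-memory stand-ins for file1.txt, file2.txt and file3.txt from the question
my @sample = (
    "abc\ndef abc\ndadq aft\nabc ttsg\n",
    "gsywg\nabc hshs\nhhs gtc\nabc hhshh\n",
    "wrtw\ndhwu wwg\nffs\nabc\n",
);
my @fh = map { open my $h, '<', \$_ or die "in-memory open failed: $!"; $h } @sample;

print count_rows_without( qr/abc/, @fh ), "\n";    # prints 1 (only row 3 qualifies)
```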

Regards

M.

comp.llang.perl.moderated · Feb 16, 2007

I think you have misunderstood. file1 name is file1.txt and `abc',
'def abc', 'dadq aft', `abc ttsg' are its contents. Similarly for
file2.txt and file3.txt. Note that the contents can be anything and my
real application files are containing as many as 2000 lines.

Tie::File (comes with the Perl distro now) will
load a file into an array for you. It is slower
for big or many files, but it is an easy upfront
change that lets you use Tad's solution.

E.g.,

tie @array1, 'Tie::File', 'file1.txt' or die ...;
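A sketch of how that could look end to end, assuming the three sample files from the question. To keep the example self-contained it first writes scratch copies into a temporary directory (that setup is an addition for illustration); with real files, only the three tie lines and the loop would be needed.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempdir);

# Scratch copies of the sample files from the question (illustrative setup)
my $dir  = tempdir( CLEANUP => 1 );
my %data = (
    'file1.txt' => [ 'abc',   'def abc',  'dadq aft', 'abc ttsg'  ],
    'file2.txt' => [ 'gsywg', 'abc hshs', 'hhs gtc',  'abc hhshh' ],
    'file3.txt' => [ 'wrtw',  'dhwu wwg', 'ffs',      'abc'       ],
);
for my $name ( keys %data ) {
    open my $out, '>', "$dir/$name" or die "$dir/$name: $!";
    print {$out} "$_\n" for @{ $data{$name} };
    close $out or die "$dir/$name: $!";
}

# Each tied array behaves like the file's lines, with line endings removed
tie my @file1, 'Tie::File', "$dir/file1.txt" or die "file1.txt: $!";
tie my @file2, 'Tie::File', "$dir/file2.txt" or die "file2.txt: $!";
tie my @file3, 'Tie::File', "$dir/file3.txt" or die "file3.txt: $!";

# Tad's loop, unchanged apart from where the arrays come from
my $cnt = 0;
foreach my $i ( 0 .. $#file1 ) {
    next if $file1[$i] =~ /abc/;
    next if $file2[$i] =~ /abc/;
    next if $file3[$i] =~ /abc/;
    $cnt++;
}

print "$cnt\n";

# Release the files before the temporary directory is cleaned up
untie @file1;
untie @file2;
untie @file3;
```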

Mirco Wahab · Feb 17, 2007

my @name = map "file$_.txt", 1 .. 3;
my @fh;
open $fh[ @fh], '<', $_ for @name;

while ( ! grep eof( $_), @fh ) {
    chomp( my @lines = map scalar <$_>, @fh);
    # etc...
}
# etc...

Not a bad idea!
But much too verbose ... ;-)

use strict;
use warnings;

my (@fh, $count);
open $fh[ @fh ], '<', 'file'.$_.'.txt' for 1..3;

# statement modifiers cannot be stacked, so the loop needs a block
while( ! grep eof($_), @fh ) {
    ++$count unless grep /abc/, map scalar <$_>, @fh;
}

print $count if defined $count;

(Actually, I didn't believe yours would work
at all, but learned from you that

...
map <$_>, @handle_array;
...

does indeed work!)
Thanks for that hint.

Regards

Mirco

anno4000 · Mar 3, 2007

comp.llang.perl.moderated said:
On Feb 15, 4:39 am, (e-mail address removed) wrote:
[...]

Tie::File (comes with Perl distro now) will
load a file into an array for you. Slower
if big or many files but an easy upfront
change to use Tad's solution.

Tie::File doesn't preload the file, if that's your concern. It uses tie
magic to make it look as if it did.

Anno

anton.vandersteen · Mar 4, 2007

Hello to Perl Monks,

This is my solution to the problem:

#!/perl/bin/perl
# This programme is written by Anton van der Steen
# Email address: (e-mail address removed)
use Tk;
use File::Find;

# X11 font descriptions shared by the widgets below (sizes 11, 12 and 14)
my $font11 = '-adobe-helvetica-bold-r-normal--11-120-75-75-p-70-*-1';
my $font12 = '-adobe-helvetica-bold-r-normal--12-120-75-75-p-70-*-1';
my $font14 = '-adobe-helvetica-bold-r-normal--14-120-75-75-p-70-*-1';

my $mw = new MainWindow; # Main Window
$mw->title("Search Engine Version 2.0 by Stone Logic Systems");

######################################################################
# Menu bar: one cascade per action
$mw->configure(-menu => my $menubar = $mw->Menu);

my $filemenu = $menubar->cascade(-label => "~File", -tearoff => 1);
my $execute_sql_statement = $menubar->cascade(-label => "Count ~Phrase", -tearoff => 1);
my $save_result_to_file = $menubar->cascade(-label => "~Show Text", -tearoff => 1);
my $clear_text_area = $menubar->cascade(-label => "~Clear Result Set", -tearoff => 1);
my $create_pdf = $menubar->cascade(-label => "~Export to File", -tearoff => 1);
my $create_excel = $menubar->cascade(-label => "Find File", -tearoff => 1);

#my $create_xml = $menubar->cascade(-label => "Export to XML", -tearoff => 1);

my $helpmenu = $menubar->cascade(-label => "~Help", -tearoff => 1);

$filemenu->command(-label => "E~xit",
                   -command => sub { $mw->destroy });

$execute_sql_statement->command(-label => "Count ~Phrase",
                                -command => sub { push_button1() });

$save_result_to_file->command(-label => "~Show Text",
                              -command => sub { push_button2() });

$clear_text_area->command(-label => "~Clear Result Set",
                          -command => sub { push_button3() });

$create_pdf->command(-label => "~Export to File",
                     -command => sub { push_button4() });

$create_excel->command(-label => "Find File",
                       -command => sub { push_button5() });

#$create_xml->command(-label => "Export to XML",
#                     -command => sub { XML() });

# Note: showhelp() is not defined anywhere in this post
$helpmenu->command(-label => "~Help Contents",
                   -command => sub { showhelp() });

######################################################################
# Entry fields: search phrase, file to search, file to export to
my $frm_name = $mw->Frame()->pack();

my $lab1 = $frm_name->Label(-text => "Phrase :", -font => $font11)->pack();

my $ent1 = $frm_name->Entry(-width => 100, -borderwidth => 2)->pack();
$ent1->configure(-font => $font11);

my $lab2 = $frm_name->Label(-text => "Search in File :", -font => $font11)->pack();

my $ent2 = $frm_name->Entry(-width => 100)->pack();
$ent2->configure(-font => $font12);

my $lab3 = $frm_name->Label(-text => "Export to file :", -font => $font11)->pack();

my $ent3 = $frm_name->Entry(-width => 100)->pack();
$ent3->configure(-font => $font12);

#my $but1 = $mw->Button(-text => "Count Appearance Phrase", -command => \&push_button1,
#    -background => "green", -font => $font11)->pack();

#my $but2 = $mw->Button(-text => "Show text", -command => \&push_button2,
#    -background => "yellow", -font => $font11)->pack();

#my $but3 = $mw->Button(-text => "Clear Text Area", -command => \&push_button3,
#    -background => "orange", -font => $font11)->pack();

#my $but4 = $mw->Button(-text => "Save result to file", -command => \&push_button4,
#    -background => "cyan", -font => $font11)->pack();

# Text area
my $txt = $mw->Scrolled('Text', -scrollbars => 'se', -wrap => 'none');
$txt->configure(-width => 120, -height => 20, -font => $font14);
$txt->pack();

MainLoop;

# "Count Phrase": count the lines in the file named in the second entry
# that match the phrase in the first entry.
sub push_button1 {

    use Getopt::Std;

    my $name1 = $ent1->get();
    my $name2 = $ent2->get();
    @ARGV = ($name1, $name2);
    #print @ARGV;

    $i = 0;

    my $pattern = shift @ARGV;

    foreach $file (@ARGV) {
        open(FILE, $file);
        while ($line = <FILE>) {
            if ($line =~ m"$pattern") {
                $i++;
                last if ($opt_1);    # $opt_1 is never set here: getopts() is not called
            }
        }
        #print "The phrase $pattern is $i times found!!\n";

        $txt->insert('0.0', "The phrase $pattern is $i times found in file $file.\n");
        close(FILE);
        $i = 0;
    }
}

# "Show Text": append every matching line to the text area.
sub push_button2 {

    use Getopt::Std;

    my $name1 = $ent1->get();
    my $name2 = $ent2->get();
    @ARGV = ($name1, $name2);

    $i = 0;

    my $pattern = shift @ARGV;

    foreach $file (@ARGV) {
        open(FILE, $file);
        while ($line = <FILE>) {
            if ($line =~ m"$pattern") {
                $i++;
                last if ($opt_1);
                $txt->insert('end', "$line\n");
                #print "$line\n";
            }
        }
        close(FILE);
        $i = 0;
    }
}

# "Clear Result Set": empty the text area.
sub push_button3 {
    $txt->delete('0.0', 'end');
}

# "Export to File": write every matching line to the file named in the third entry.
sub push_button4 {

    use Getopt::Std;

    my $name1 = $ent1->get();
    my $name2 = $ent2->get();
    my $file_out = $ent3->get();

    @ARGV = ($name1, $name2);

    $i = 0;

    my $pattern = shift @ARGV;

    open(OUT, ">$file_out");

    foreach $file (@ARGV) {
        open(FILE, $file);
        while ($line = <FILE>) {
            if ($line =~ m"$pattern") {
                $i++;
                last if ($opt_1);
                print OUT "$line\n";
            }
        }
        close(FILE);
        $i = 0;
    }
    close(OUT);
}

# "Find File": walk the c:\ drive and hand every entry to push_button6.
sub push_button5 {
    find(\&push_button6, "c:\\");
}

# File::Find callback: list every file whose extension matches the first entry.
sub push_button6 {

    use Getopt::Std;
    use File::Find;

    my $ext = $ent1->get();    # renamed from $a to avoid shadowing sort's special $a

    if (/\.$ext$/) {    # ((/\.zip$/) ||
        #print "$File::Find::name\n";
        $txt->insert('end', "$File::Find::name\n");
    }
}

Have fun......
