Filter content from a list: hard-coded expression or read from a file?

Francois Massion · Mar 26, 2012

Newbie question:
I have a list of strings like the following list:

Log file content
a long date
the mandatory check
Mark text to replace

I want to keep only the strings which do not begin with certain words.
So far I have done it with a hard-coded list of words, but this list
may vary and can be very long. I wonder how I could read the list from
a file and achieve the same result.
Here is the code that works:

open(INPUT,'mytext.txt') || die("File cannot be opened!\n");
@sentence = <INPUT>;
close(INPUT);
foreach $sentence (@sentence) {
    chomp $sentence;
    if ($sentence !~ m/^a |^the |^therefore /i) { # Actually a very long list
        push (@filteredresult,$sentence);
    }
}

Dr.Ruud · Mar 26, 2012

> Newbie question:

See also the beginners list at perl.org (beginners@perl.org).

[...]
> open(INPUT,'mytext.txt') || die("File cannot be opened!\n");

my $infile = 'mytext.txt';

open my $input, '<', $infile
    or die "Error opening '$infile': $!\n";

> @sentence = <INPUT>;

There is no need to slurp the whole file in when you are going to process it line by line.

my @words = qw/ a the therefore /;

my $re = join '|', @words;

while ( <$input> ) {
    next if /^(?:$re)\x{20}/;
    ...;
}
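A minimal, runnable sketch that puts these pieces together and answers the original question of reading the list from a file. It assumes the exclusion words are stored one per line, and it takes the word file and the text file as command-line arguments (that argument order is an assumption of this sketch). It goes two steps beyond the snippet above. First, it chomps each word, because otherwise the trailing newline ends up inside the pattern. Second, it passes each word through quotemeta so that regex metacharacters are matched literally. The /i flag keeps the case-insensitive matching of the original hard-coded pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one case-insensitive regex that matches a line beginning with
# any of the given words followed by a space.
sub build_exclusion_re {
    my @words = @_;
    return qr/(?!)/ unless @words;    # empty list: a pattern that never matches
    my $alt = join '|', map { quotemeta($_) } @words;
    return qr/^(?:$alt)\x{20}/i;
}

# Expects two arguments: the word-list file and the text file.
if (@ARGV == 2) {
    my ($word_file, $text_file) = @ARGV;

    open my $wfh, '<', $word_file or die "Error opening '$word_file': $!\n";
    chomp(my @words = <$wfh>);           # drop the trailing newlines
    close $wfh;
    @words = grep { length($_) } @words; # ignore blank lines in the list

    my $re = build_exclusion_re(@words);

    # Process the text line by line; no need to slurp it.
    open my $in, '<', $text_file or die "Error opening '$text_file': $!\n";
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ $re;
        print "$line\n";
    }
    close $in;
}
```

For example, you could run it as "perl filter.pl words.txt mytext.txt". The names filter.pl and words.txt are only placeholders. Because the whole list is compiled into one pattern, each line is tested with a single regex match no matter how many words the list holds.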

Rainer Weikusat · Mar 26, 2012

Francois Massion said:
> I want to keep only the strings which do not begin with certain words.
> So far I have done it with a hard coded list of words but this list
> may vary and can be very long. I wonder how I could read the list from
> a file and achieve the same result.
> [...]

My suggestion would be to put the exclusion list into a hash (this is
uncompiled example code), i.e.,

open(my $fh, '<', '/path/to/list') or die "Cannot open /path/to/list: $!\n";
my %excls = map { chomp; lc($_), 1; } <$fh>;

and then check it as follows:

next if $sentence =~ /^(\w+)/ && $excls{lc($1)};

(with the push coming after this line) or

push(@result, $sentence) unless $sentence =~ /^(\w+)/ && $excls{lc($1)};
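Here is a runnable sketch of the hash-lookup variant. It assumes one exclusion word per line in the list file and takes the file names from the command line. The subroutine names are made up for this example. It captures the first word with ^(\w+). Note that \w is a word character, while \W would match non-word characters instead. Both the stored words and the looked-up word are lower-cased, so the check is case-insensitive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one exclusion word per line into a hash: lower-cased word => 1.
sub load_exclusions {
    my ($fh) = @_;
    my %excls;
    while (my $word = <$fh>) {
        chomp $word;
        next unless length $word;      # skip blank lines
        $excls{ lc $word } = 1;
    }
    return \%excls;
}

# True if the first word of the sentence is in the exclusion hash.
sub is_excluded {
    my ($excls, $sentence) = @_;
    return $sentence =~ /^(\w+)/ && $excls->{ lc $1 };
}

# Expects two arguments: the exclusion-list file and the text file.
if (@ARGV == 2) {
    my ($list_file, $text_file) = @ARGV;

    open my $lfh, '<', $list_file or die "Error opening '$list_file': $!\n";
    my $excls = load_exclusions($lfh);
    close $lfh;

    open my $in, '<', $text_file or die "Error opening '$text_file': $!\n";
    my @result;
    while (my $sentence = <$in>) {
        chomp $sentence;
        push @result, $sentence unless is_excluded($excls, $sentence);
    }
    close $in;

    print "$_\n" for @result;
}
```

A hash lookup takes about the same time however long the list is, which suits a very long exclusion list. The trade-off is that only the first word is checked, so an entry in the list that contains a space would never match. The regex-alternation approach can handle such entries.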

Francois Massion · Mar 26, 2012

I have tested 2 versions, unsuccessfully:

Version # 1 (based on Rainer's suggestion):
#!/usr/bin/perl -w

my $infile = 'a.txt';
open my $input, '<', $infile;
open($fh, '<', 'b.txt');
%excls = map { chomp; $_, 1; } <$fh>;
next if $input =~ /^(\W*)/ && $excls{lc($1)};
push(@result, $input) unless $input =~ /^(\W*)/ && $excls{lc($1)} ;
foreach (@result) {
print "$_\n";
}

RESULT: GLOB(0x36f178)
(No idea what this means)

Version # 2 (based on Dr.Ruud's and Ben's suggestions; sorry if I messed
it up):

#!/usr/bin/perl -w

my $infile = 'a.txt';

open my $input, '<', $infile;
open my $WORDS, '<', 'b.txt';
my @words = <$WORDS>;
my $re = join "|", map quotemeta, @words;
while ( <$input> ) {
next if /^(?:$re)\x{20}/;
push (@filteredresult,$input);

foreach (@filteredresult) {
print "$_\n";
}}

RESULT:
GLOB(0x1ff178)
GLOB(0x1ff178)
GLOB(0x1ff178)
....
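Version #2 pushes $input (the filehandle) where the line in $_ was meant. It has a second problem as well. "my @words = <$WORDS>" keeps the trailing newline on each word that has one, so the alternatives in $re end in a newline, and a line such as "a long date" can never match. The sketch below simulates the contents of b.txt in an array (the sample words are assumptions). It then compares the pattern built from unchomped words with the one built from chomped words.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Join the words into an anchored alternation, as Version #2 does.
sub make_re {
    my @words = @_;
    my $alt = join '|', map { quotemeta($_) } @words;
    return qr/^(?:$alt)\x{20}/;
}

# Simulated contents of the word file: reading it with <$WORDS>
# leaves the trailing newline on every element.
my @raw_words = ("a\n", "the\n", "therefore\n");
my $unchomped = make_re(@raw_words);

chomp(my @words = @raw_words);    # strip the newlines first
my $chomped = make_re(@words);

for my $line ("a long date", "Log file content") {
    printf "%-17s unchomped: %-5s chomped: %s\n", $line,
        ($line =~ $unchomped ? 'skip' : 'keep'),
        ($line =~ $chomped   ? 'skip' : 'keep');
}
```

With this sample data, the unchomped pattern keeps "a long date" and the chomped one skips it. "Log file content" is kept by both.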

Rainer Weikusat · Mar 26, 2012

Francois Massion said:
> Version # 1 (based on Rainer's suggestion):
> [...]
> next if $input =~ /^(\W*)/ && $excls{lc($1)};
> push(@result, $input) unless $input =~ /^(\W*)/ && $excls{lc($1)} ;
> [...]
> RESULT: GLOB(0x36f178)
> (No idea what this means)

The reason why I wrote 'you can do this OR that' was that these were
supposed to be mutually exclusive options. Also, you obviously need
some kind of input processing loop, and you need to test the condition
against the sentences, NOT against the result of stringifying the input
file handle (which is 'some glob').
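To see where the GLOB(0x...) output comes from, here is a small sketch that uses an in-memory file instead of a.txt. The sample text and subroutine names are made up. A lexical filehandle is a reference to a glob. Interpolating it into a string, or matching a regex against it as Version #1 does, therefore operates on its stringified form. The sentences only appear when you read from the handle, one line per pass of a while (<$fh>) loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A lexical filehandle is a reference to a glob; putting it in a string
# (or matching a regex against it) uses its stringified form.
sub describe_handle {
    my ($fh) = @_;
    return "$fh";
}

# The sentences themselves only appear when you read from the handle,
# one line per iteration of the loop.
sub read_sentences {
    my ($fh) = @_;
    my @sentences;
    while (my $line = <$fh>) {
        chomp $line;
        push @sentences, $line;
    }
    return @sentences;
}

# Demo on an in-memory file so the sketch runs without a.txt.
open my $input, '<', \"a long date\nLog file content\n"
    or die "Cannot open in-memory file: $!\n";
print "handle:   ", describe_handle($input), "\n";    # e.g. GLOB(0x...)
print "sentence: $_\n" for read_sentences($input);
close $input;
```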

ccc31807 · Mar 26, 2012

> Newbie question:
> I have a list of strings like the following list:
> [...]
> I want to keep only the strings which do not begin with certain words.

It would have been more helpful (for me, anyway) if you had posted
your actual data, but that's okay.

I have found that these kinds of tasks often decompose into a
particular pattern, illustrated below. The pattern has three phases:
(1) read the file contents into a data structure, (2) munge the data,
and (3) write the data to a file. The following (hypothetical) script
illustrates this:

#! perl
use strict;
use warnings;

my %data;
read_file_contents();
munge_data();
write_data_to_file();
exit(0);

sub read_file_contents
{
    open my $in, '<', 'data_file.csv' or die "data_file.csv: $!";
    while (<$in>)
    {
        next unless /\w/;               #skip empty lines
        next if /your REGEX to skip/;   #skip unneeded lines
        chomp;
        # comma-separated fields assumed; adjust the split pattern to your data
        my ($val1, $val2, $val3, $val4) = split(/,/, $_);
        $data{$val1} = {
            KEY2 => $val2,
            KEY3 => $val3,
            KEY4 => $val4,
            # ... further keys as needed
        };
    }
    close $in;
}
sub munge_data
{
    #you now have your data in a convenient structure
    #so you can manipulate it how you please
    foreach my $key (keys %data) { munge_record($data{$key}); }
}
sub write_data_to_file
{
    open my $out, '>', 'output.csv' or die "output.csv: $!";
    # header row; add further column names as needed
    print $out qq("COL1","COL2","COL3","COL4"\n);
    foreach my $key (keys %data)
    {
        print $out qq("$key","$data{$key}{KEY2}","$data{$key}{KEY3}","$data{$key}{KEY4}"\n);
    }
    close $out;
}
sub munge_record
{
    my $record = shift;
    # munge here
}
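Here is a minimal sketch that applies the same read / munge / write pattern to the original question. It assumes the exclusion words are in one file (one per line) and the sentences in another, with both files given on the command line. The kept sentences go to standard output. The subroutine names are made up. Passing filehandles into the subroutines, rather than opening fixed file names inside them, makes each phase easy to test on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Phase 1: read the exclusion words and the sentences into data structures.
sub read_inputs {
    my ($words_fh, $text_fh) = @_;
    my %excluded;
    while (my $word = <$words_fh>) {
        chomp $word;
        $excluded{ lc $word } = 1 if length $word;
    }
    chomp(my @sentences = <$text_fh>);
    return (\%excluded, \@sentences);
}

# Phase 2: munge the data, keeping sentences whose first word is not excluded.
sub munge {
    my ($excluded, $sentences) = @_;
    return [ grep { !( /^(\w+)/ && $excluded->{ lc $1 } ) } @$sentences ];
}

# Phase 3: write the kept sentences to the given handle.
sub write_output {
    my ($out_fh, $kept) = @_;
    print {$out_fh} "$_\n" for @$kept;
    return;
}

# Expects two arguments: the word-list file and the text file.
if (@ARGV == 2) {
    my ($word_file, $text_file) = @ARGV;
    open my $wfh, '<', $word_file or die "Error opening '$word_file': $!\n";
    open my $tfh, '<', $text_file or die "Error opening '$text_file': $!\n";
    my ($excluded, $sentences) = read_inputs($wfh, $tfh);
    close $wfh;
    close $tfh;
    write_output(\*STDOUT, munge($excluded, $sentences));
}
```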

Ted Zlatanov · Mar 26, 2012

FM> I have tested 2 versions, unsuccessfully:

Hi Francois,

if you're OK with using different tools, maybe try the GNU egrep tool.

Given files a and b:

% grep . a b
a:1
a:2
a:3
a:4
a:5
b:^[12]
b:^[4]

You can just use the -f option to read patterns from b to filter a:

% egrep -f b a
1
2
4

This approach may work better for you, depending on the OS platforms you
have to support, the size of the file, and the complexity of the regular
expressions. Try it out.
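If you would rather stay in Perl, here is a minimal sketch of the same idea. It is inverted so that it keeps the lines that match none of the patterns. That is what grep's -v option does, and it is what the original question asks for. Each line of the pattern file is compiled as a Perl regex with qr//. Perl's regex syntax is close to, but not identical with, egrep's extended regular expressions, so patterns taken from an egrep pattern file may need adjusting. The subroutine names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile each non-empty line of the pattern file as a Perl regex.
sub load_patterns {
    my ($fh) = @_;
    my @patterns;
    while (my $p = <$fh>) {
        chomp $p;
        push @patterns, qr/$p/ if length $p;
    }
    return @patterns;
}

# Keep a line only if it matches none of the patterns (like grep -v).
sub keep_line {
    my ($line, @patterns) = @_;
    for my $re (@patterns) {
        return 0 if $line =~ $re;
    }
    return 1;
}

# Expects two arguments: the pattern file and the data file.
if (@ARGV == 2) {
    my ($pattern_file, $data_file) = @ARGV;
    open my $pfh, '<', $pattern_file or die "Error opening '$pattern_file': $!\n";
    my @patterns = load_patterns($pfh);
    close $pfh;

    open my $in, '<', $data_file or die "Error opening '$data_file': $!\n";
    while (my $line = <$in>) {
        print $line if keep_line($line, @patterns);
    }
    close $in;
}
```

With Ted's files a and b, running "perl notmatch.pl b a" (notmatch.pl is a hypothetical script name) should print 3 and 5, the complement of the egrep output above.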

Ted
