perl newbie: leaner code ideas

J

John Adams

To learn Perl, I have written a bit of code that needs to do the following:

1) Load a configuration file with various line formats (i.e. starting
with }, or ), or SEC_CRIT, etc)
2) Load a second file with a list of file names in the form of
"/usr/sbin/someprog", "/usr/etc/somefile" etc
3) Write each line (keeping the same format in terms of leading spaces and
blank lines) unless it matches
an entry in the second file list where it will add a "# " to the line to
comment it out.

The configuration file is called "list.txt" while the configuration file is
"policy.txt". The code below works, but it could probably be done more
efficiently. Some ideas on how this script can be made leaner would be
greatly appreciated. Below is the code and then some sample data for the
two files:


__CODE__ match.pl

#!/usr/bin/perl

use strict;
use warnings;

open(SEARCH, "list.txt") || die "Can't open file list.txt";
my (@search_string) = <SEARCH>;
close(SEARCH);
open(POLICY, "policy.txt") || die "Can't open file policy.txt";
my (@policy_lines) = <POLICY>;
close(POLICY);

# Declare Variables #

my $first_part;
my $first_part2;
my $i;
my $search_string;
my $policy_line;
my $policy_line2;

foreach $policy_line(@policy_lines) { #1
for ($i=0;$i<scalar(@search_string);$i++ ) { #2
if ($policy_line =~ /^(\s)*$/) {next;} #
Ignore Blank Lines
$policy_line2 = $policy_line;
$policy_line2 =~ s/^\s*//; #
Remove leading whitespace
($first_part) = split(/\s+/,$policy_line2); #
Avoid collisions with text after search string
$first_part2 = quotemeta($first_part); #
Quote Meta Characters
if ($search_string[$i] =~ /$first_part2$/) { #3 #
Match strings with first and last anchors
print "# ",$policy_line; last; #
Print commented line if matched
}
else {next;}
} #4
{
if ($i>=scalar(@search_string)) { print $policy_line; next;} #
Just print line if not matched
}
}
__END__

__FILE__ list.txt

/usr/sbin/a
/usr/sbin/11
/sbin/busybox
/usr/sbin/3
/usr/sbin/22
/sbin/accton

__END__

__FILE__ policy.txt



############################################################################
##
#
##
############################################################################
## #
#
# #
# Policy file
# #
#
##
############################################################################
##



############################################################################
##

/usr/sbin
/usr/sbin/a
/usr/sbin/b
/usr/sbin/c
/usr/sbin/d
/usr/sbin/e
/usr/sbin/f
/usr/sbin/1
/usr/sbin/2
/usr/sbin/3
/usr/sbin/4
/usr/sbin/11
/usr/sbin/22
/usr/sbin/33
/usr/sbin/44
/usr/sbin/55

@@section GLOBAL
TWROOT=/usr/sbin;
TWBIN=/usr/sbin;

@@section FS
SEC_CRIT = $(IgnoreNone)-SHa ; # Critical files that cannot change
SEC_SUID = $(IgnoreNone)-SHa ; # Binaries with the SUID or SGID flags
set
SEC_BIN = $(ReadOnly) ; # Binaries that should not change
SEC_CONFIG = $(Dynamic) ; # Config files that are changed
infrequently but accessed often
SEC_LOG = $(Growing) ; # Files that grow, but that should
never change ownership
SEC_INVARIANT = +tpug ; # Directories that should never change
permission or ownership
SIG_LOW = 33 ; # Non-critical files that are of
minimal security impact
SIG_MED = 66 ; # Non-critical files that are of
significant security impact
SIG_HI = 100 ; # Critical files that are significant
points of vulnerability


# Binaries
(
)
{

{
/sbin/accton -> $(SEC_CRIT) ;
/sbin/badblocks -> $(SEC_CRIT) ;
/sbin/busybox -> $(SEC_CRIT) ;
/sbin/busybox.anaconda -> $(SEC_CRIT) ;
/sbin/convertquota -> $(SEC_CRIT) ;
/sbin/dosfsck -> $(SEC_CRIT) ;
/sbin/debugfs -> $(SEC_CRIT) ;
/sbin/debugreiserfs -> $(SEC_CRIT) ;

__END__
 
N

nobull

John Adams said:
To learn Perl, I have written a bit of code that needs to do the following:

1) Load a configuration file with various line formats (i.e. starting
with }, or ), or SEC_CRIT, etc)
2) Load a second file with a list of file names in the form of
"/usr/sbin/someprog", "/usr/etc/somefile" etc
3) Write each line (keeping the same format in terms of leading spaces and
blank lines) unless it matches
an entry in the second file list where it will add a "# " to the line to
comment it out.

The configuration file is called "list.txt" while the configuration file is
"policy.txt". The code below works, but it could probably be done more
efficiently. Some ideas on how this script can be made leaner would be
greatly appreciated. Below is the code and then some sample data for the
two files:

Congrats on an excellently constucted post. You've done everything
right! Clear, concise, well thought out... Probably in the top 1% of
all newbie posts. I won't even criticise you (much) for putting
"newbie" in the subject line although it's usually frowned upon and
would prevent your message being seen by many of the most knowledgible
and helpful people in comp.lang.perl.*.

There's just one tiny thing. This newsgroup does not exist. A post
sent to this newsgroup will only show up on a very few misconfigured
news servers.
__CODE__ match.pl

#!/usr/bin/perl

use strict;
use warnings;

open(SEARCH, "list.txt") || die "Can't open file list.txt";
my (@search_string) = <SEARCH>;
close(SEARCH);
open(POLICY, "policy.txt") || die "Can't open file policy.txt";
my (@policy_lines) = <POLICY>;
close(POLICY);

All OK so far, assuming that is you needed to slurp. You probably
didn't need to slurp POLICY - you could process it line-wise.
# Declare Variables #

Your code has too many comments. The comments you see in programming
examples in tutorials are there to explain what the code does for the
beniefit of people who don't already know the language. Many people
fall into the trap of thinking that is what comments in real programs
should do. That is not true.

In real programs comments should not simply restate in English (or
Japanese or whatever) what you've just said in Perl (or C++ or
whatever).

Where possible use comments to explain in English why you do things
not how. (Try to make the how clear in the Perl - use as much
whitespace as you need to achieve this).
my $first_part;
my $first_part2;
my $i;
my $search_string;
my $policy_line;
my $policy_line2;

Don't declare variables prematurely. Declare them in the smallest
applicable scope. Premature declaration defeats half the purpose of
insisting on explicit declaration in the first place. In practice
declaration should usually be combined with the first time a variable
is used.
foreach $policy_line(@policy_lines) { #1

You should declare $policy_line here:

foreach my $policy_line (@policy_lines) { #1

As it happens, if the foreach loop contriol variable has already been
declared lexical it gets implicitly redeclared so you've actually got
two variables called $policy_line - one inside the loop and one
outside. IMNSHO Perl should emit a warning if you use a lexical
variable as a foreach loop contriol variable without an explicit "my".

But as I said, there was no need to slurp...

while ( my $policy_line = said:
for ($i=0;$i<scalar(@search_string);$i++ ) { #2

That's more conventionally written

for my $i ( 0 .. $#search_string ) {

However I suspect you really should be iterating over the contents of
@search_string directly rather then using a subscript.
if ($policy_line =~ /^(\s)*$/) {next;} #

This is more coventionally written:

next unless $policy_line =~ /\S/;

The above is such a common idiom that it cetainly needs no comment.
Any experienced Perl programmer (including, presumably yourself in 6
months when you come to look back at this script) will instantly
recognise it.
$policy_line2 = $policy_line;
$policy_line2 =~ s/^\s*//; #
($first_part) = split(/\s+/,$policy_line2); #
$first_part2 = quotemeta($first_part); #

It is always difficult finding a ballance between one expression being
hard to follow because it's too complex and several smaller step with
intermediate variables being hard to follow because there are just too
many of them.

I would say, however, you've definely gone to the too many side. Not
least because each of your steps is as complex as the whole problem!

To get the first sequence of non-whitespace in $policy_line:

my ($first_part) = $policy_line =~ /(\S+)/;

The above will also be false if there are no non-whitespace characters
so you can use its return value instead of checking for such lines
explicitly first.

It is best to explicitly consturct a pre-compiled regex. (Mostly for
the sake of readability but also there is a small speed gain).

my $pattern = qr/\Q$first_part\E$/;

Also you should take all the stuff thats invarient wrt the inner loop,
ouside the inner loop.

if ($search_string[$i] =~ /$first_part2$/) { #3 #

if ($search_string[$i] =~ /$pattern/) {

I'm assuming here that the fact that the pattern is only anchored at
the end is deliberate. If not there's a much much simpler way to do
the whole thing using a hash.
print "# ",$policy_line; last; #

You really want to do a next on the outer loop here. You can do this
with labeled loops.

Alternatively just print the "# " and then there's not need to prevent
$policy_line being printed later.
}
else {next;}
} #4

The next was already redundant. Reaching the end of a loop is an
implicit next.
{
if ($i>=scalar(@search_string)) { print $policy_line; next;} #
Just print line if not matched
}
}

Again the next is redundant. You probaly didn't even realise it was
exiting the bare block.

while ( my $policy_line = <POLICY> ) {
if ( my ($first_part) = $policy_line =~ /(\S+)/ ) {
my $pattern = qr/\Q$first_part\E$/;
foreach ( @search_string ) {
if ( /$pattern/ ) {
print '# ';
last;
}
}
}
print $policy_line;
}

Actually instead of slurping list.txt into and array @search_string it
would be rather simpler if you had slurped it into a scalar
$search_string and used a /m (multi-line) qualifier on the pattern
match.

while ( my $policy_line = <POLICY> ) {
if ( my ($first_part) = $policy_line =~ /(\S+)/ ) {
print '# ' if $search_string =~ /\Q$first_part\E$/m;
}
print $policy_line;
}

Once the loop is that small I'd be inclined not to bother with
$policy_line and just use $_. (Note: some highly respected people
would disagree).

while ( <POLICY> ) {
if ( my ($first_part) = /(\S+)/ ) {
print '# ' if $search_string =~ /\Q$first_part\E$/m;
}
print;
}
 
N

nobull

Further to my previous post... (sorry I can self-follow use to the
posting latency on Google (this wouldn't be an issue in a group that
still exists)).

Read my first post first.

John Adams said:
if ($search_string[$i] =~ /$first_part2$/) { #3 #
Match strings with first and last anchors

Now as a commented before, comments that simply restate in English
what you've said in Perl are bad.

Once you'd cried-woolf several times with comments that restated what
you'd said in the Perl I stopped reading the comments so I didn't even
notice the disrepancy.

Put the effort into getting the code right, leave out the restatement.
If one sees a piece of code that does one thing and a comment along
side that says it does somthing else one can't help sometimes
mis-reading the code.

As I also said, if you are anchoring both ends there's an alternative
hash based approach that will be much more efficient if @search_string
is large (although it'll probably not be as efficient as the scalar
slurp approach if it isn't ever going to be large).

Populate %search_string thus:

my %search_string;
while ( <SEARCH> ) {
chomp;
$search_string{$_}++;
}

(Note: =1 is faster than ++ but I find ++ more ideomatic).

Then later:

print '# ' if $search_string{$first_part};
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,262
Messages
2,571,049
Members
48,769
Latest member
Clifft

Latest Threads

Top