perl newbie: leaner code ideas

Discussion in 'Perl' started by John Adams, Jan 31, 2004.

  1. John Adams

    John Adams Guest

    To learn Perl, I have written a bit of code that needs to do the following:

    1) Load a configuration file with various line formats (i.e. starting
    with }, or ), or SEC_CRIT, etc)
    2) Load a second file with a list of file names in the form of
    "/usr/sbin/someprog", "/usr/etc/somefile" etc
    3) Write each line (keeping the same format in terms of leading spaces and
    blank lines) unless it matches
    an entry in the second file list where it will add a "# " to the line to
    comment it out.

    The configuration file is called "list.txt" while the configuration file is
    "policy.txt". The code below works, but it could probably be done more
    efficiently. Some ideas on how this script can be made leaner would be
    greatly appreciated. Below is the code and then some sample data for the
    two files:


    __CODE__ match.pl

    #!/usr/bin/perl

    use strict;
    use warnings;

    open(SEARCH, "list.txt") || die "Can't open file list.txt";
    my (@search_string) = <SEARCH>;
    close(SEARCH);
    open(POLICY, "policy.txt") || die "Can't open file policy.txt";
    my (@policy_lines) = <POLICY>;
    close(POLICY);

    # Declare Variables #

    my $first_part;
    my $first_part2;
    my $i;
    my $search_string;
    my $policy_line;
    my $policy_line2;

    foreach $policy_line(@policy_lines) { #1
    for ($i=0;$i<scalar(@search_string);$i++ ) { #2
    if ($policy_line =~ /^(\s)*$/) {next;} #
    Ignore Blank Lines
    $policy_line2 = $policy_line;
    $policy_line2 =~ s/^\s*//; #
    Remove leading whitespace
    ($first_part) = split(/\s+/,$policy_line2); #
    Avoid collisions with text after search string
    $first_part2 = quotemeta($first_part); #
    Quote Meta Characters
    if ($search_string[$i] =~ /$first_part2$/) { #3 #
    Match strings with first and last anchors
    print "# ",$policy_line; last; #
    Print commented line if matched
    }
    else {next;}
    } #4
    {
    if ($i>=scalar(@search_string)) { print $policy_line; next;} #
    Just print line if not matched
    }
    }
    __END__

    __FILE__ list.txt

    /usr/sbin/a
    /usr/sbin/11
    /sbin/busybox
    /usr/sbin/3
    /usr/sbin/22
    /sbin/accton

    __END__

    __FILE__ policy.txt



    ############################################################################
    ##
    #
    ##
    ############################################################################
    ## #
    #
    # #
    # Policy file
    # #
    #
    ##
    ############################################################################
    ##



    ############################################################################
    ##

    /usr/sbin
    /usr/sbin/a
    /usr/sbin/b
    /usr/sbin/c
    /usr/sbin/d
    /usr/sbin/e
    /usr/sbin/f
    /usr/sbin/1
    /usr/sbin/2
    /usr/sbin/3
    /usr/sbin/4
    /usr/sbin/11
    /usr/sbin/22
    /usr/sbin/33
    /usr/sbin/44
    /usr/sbin/55

    @@section GLOBAL
    TWROOT=/usr/sbin;
    TWBIN=/usr/sbin;

    @@section FS
    SEC_CRIT = $(IgnoreNone)-SHa ; # Critical files that cannot change
    SEC_SUID = $(IgnoreNone)-SHa ; # Binaries with the SUID or SGID flags
    set
    SEC_BIN = $(ReadOnly) ; # Binaries that should not change
    SEC_CONFIG = $(Dynamic) ; # Config files that are changed
    infrequently but accessed often
    SEC_LOG = $(Growing) ; # Files that grow, but that should
    never change ownership
    SEC_INVARIANT = +tpug ; # Directories that should never change
    permission or ownership
    SIG_LOW = 33 ; # Non-critical files that are of
    minimal security impact
    SIG_MED = 66 ; # Non-critical files that are of
    significant security impact
    SIG_HI = 100 ; # Critical files that are significant
    points of vulnerability


    # Binaries
    (
    )
    {

    {
    /sbin/accton -> $(SEC_CRIT) ;
    /sbin/badblocks -> $(SEC_CRIT) ;
    /sbin/busybox -> $(SEC_CRIT) ;
    /sbin/busybox.anaconda -> $(SEC_CRIT) ;
    /sbin/convertquota -> $(SEC_CRIT) ;
    /sbin/dosfsck -> $(SEC_CRIT) ;
    /sbin/debugfs -> $(SEC_CRIT) ;
    /sbin/debugreiserfs -> $(SEC_CRIT) ;

    __END__
     
    John Adams, Jan 31, 2004
    #1
    1. Advertising

  2. John Adams

    Guest

    "John Adams" <> wrote in message news:<YlJSb.336879$JQ1.113917@pd7tw1no>...
    > To learn Perl, I have written a bit of code that needs to do the following:
    >
    > 1) Load a configuration file with various line formats (i.e. starting
    > with }, or ), or SEC_CRIT, etc)
    > 2) Load a second file with a list of file names in the form of
    > "/usr/sbin/someprog", "/usr/etc/somefile" etc
    > 3) Write each line (keeping the same format in terms of leading spaces and
    > blank lines) unless it matches
    > an entry in the second file list where it will add a "# " to the line to
    > comment it out.
    >
    > The configuration file is called "list.txt" while the configuration file is
    > "policy.txt". The code below works, but it could probably be done more
    > efficiently. Some ideas on how this script can be made leaner would be
    > greatly appreciated. Below is the code and then some sample data for the
    > two files:


    Congrats on an excellently constucted post. You've done everything
    right! Clear, concise, well thought out... Probably in the top 1% of
    all newbie posts. I won't even criticise you (much) for putting
    "newbie" in the subject line although it's usually frowned upon and
    would prevent your message being seen by many of the most knowledgible
    and helpful people in comp.lang.perl.*.

    There's just one tiny thing. This newsgroup does not exist. A post
    sent to this newsgroup will only show up on a very few misconfigured
    news servers.

    > __CODE__ match.pl
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > open(SEARCH, "list.txt") || die "Can't open file list.txt";
    > my (@search_string) = <SEARCH>;
    > close(SEARCH);
    > open(POLICY, "policy.txt") || die "Can't open file policy.txt";
    > my (@policy_lines) = <POLICY>;
    > close(POLICY);


    All OK so far, assuming that is you needed to slurp. You probably
    didn't need to slurp POLICY - you could process it line-wise.

    > # Declare Variables #


    Your code has too many comments. The comments you see in programming
    examples in tutorials are there to explain what the code does for the
    beniefit of people who don't already know the language. Many people
    fall into the trap of thinking that is what comments in real programs
    should do. That is not true.

    In real programs comments should not simply restate in English (or
    Japanese or whatever) what you've just said in Perl (or C++ or
    whatever).

    Where possible use comments to explain in English why you do things
    not how. (Try to make the how clear in the Perl - use as much
    whitespace as you need to achieve this).

    > my $first_part;
    > my $first_part2;
    > my $i;
    > my $search_string;
    > my $policy_line;
    > my $policy_line2;


    Don't declare variables prematurely. Declare them in the smallest
    applicable scope. Premature declaration defeats half the purpose of
    insisting on explicit declaration in the first place. In practice
    declaration should usually be combined with the first time a variable
    is used.

    > foreach $policy_line(@policy_lines) { #1


    You should declare $policy_line here:

    foreach my $policy_line (@policy_lines) { #1

    As it happens, if the foreach loop contriol variable has already been
    declared lexical it gets implicitly redeclared so you've actually got
    two variables called $policy_line - one inside the loop and one
    outside. IMNSHO Perl should emit a warning if you use a lexical
    variable as a foreach loop contriol variable without an explicit "my".

    But as I said, there was no need to slurp...

    while ( my $policy_line = <POLICY> ) {

    > for ($i=0;$i<scalar(@search_string);$i++ ) { #2


    That's more conventionally written

    for my $i ( 0 .. $#search_string ) {

    However I suspect you really should be iterating over the contents of
    @search_string directly rather then using a subscript.

    > if ($policy_line =~ /^(\s)*$/) {next;} #


    This is more coventionally written:

    next unless $policy_line =~ /\S/;

    The above is such a common idiom that it cetainly needs no comment.
    Any experienced Perl programmer (including, presumably yourself in 6
    months when you come to look back at this script) will instantly
    recognise it.

    > $policy_line2 = $policy_line;
    > $policy_line2 =~ s/^\s*//; #
    > ($first_part) = split(/\s+/,$policy_line2); #
    > $first_part2 = quotemeta($first_part); #


    It is always difficult finding a ballance between one expression being
    hard to follow because it's too complex and several smaller step with
    intermediate variables being hard to follow because there are just too
    many of them.

    I would say, however, you've definely gone to the too many side. Not
    least because each of your steps is as complex as the whole problem!

    To get the first sequence of non-whitespace in $policy_line:

    my ($first_part) = $policy_line =~ /(\S+)/;

    The above will also be false if there are no non-whitespace characters
    so you can use its return value instead of checking for such lines
    explicitly first.

    It is best to explicitly consturct a pre-compiled regex. (Mostly for
    the sake of readability but also there is a small speed gain).

    my $pattern = qr/\Q$first_part\E$/;

    Also you should take all the stuff thats invarient wrt the inner loop,
    ouside the inner loop.


    > if ($search_string[$i] =~ /$first_part2$/) { #3 #


    if ($search_string[$i] =~ /$pattern/) {

    I'm assuming here that the fact that the pattern is only anchored at
    the end is deliberate. If not there's a much much simpler way to do
    the whole thing using a hash.

    > print "# ",$policy_line; last; #


    You really want to do a next on the outer loop here. You can do this
    with labeled loops.

    Alternatively just print the "# " and then there's not need to prevent
    $policy_line being printed later.

    > }
    > else {next;}
    > } #4


    The next was already redundant. Reaching the end of a loop is an
    implicit next.

    > {
    > if ($i>=scalar(@search_string)) { print $policy_line; next;} #
    > Just print line if not matched
    > }
    > }


    Again the next is redundant. You probaly didn't even realise it was
    exiting the bare block.

    > __END__


    while ( my $policy_line = <POLICY> ) {
    if ( my ($first_part) = $policy_line =~ /(\S+)/ ) {
    my $pattern = qr/\Q$first_part\E$/;
    foreach ( @search_string ) {
    if ( /$pattern/ ) {
    print '# ';
    last;
    }
    }
    }
    print $policy_line;
    }

    Actually instead of slurping list.txt into and array @search_string it
    would be rather simpler if you had slurped it into a scalar
    $search_string and used a /m (multi-line) qualifier on the pattern
    match.

    while ( my $policy_line = <POLICY> ) {
    if ( my ($first_part) = $policy_line =~ /(\S+)/ ) {
    print '# ' if $search_string =~ /\Q$first_part\E$/m;
    }
    print $policy_line;
    }

    Once the loop is that small I'd be inclined not to bother with
    $policy_line and just use $_. (Note: some highly respected people
    would disagree).

    while ( <POLICY> ) {
    if ( my ($first_part) = /(\S+)/ ) {
    print '# ' if $search_string =~ /\Q$first_part\E$/m;
    }
    print;
    }
     
    , Jan 31, 2004
    #2
    1. Advertising

  3. John Adams

    Guest

    Further to my previous post... (sorry I can self-follow use to the
    posting latency on Google (this wouldn't be an issue in a group that
    still exists)).

    Read my first post first.

    "John Adams" <> wrote in message news:<YlJSb.336879$JQ1.113917@pd7tw1no>...

    > if ($search_string[$i] =~ /$first_part2$/) { #3 #
    > Match strings with first and last anchors


    Now as a commented before, comments that simply restate in English
    what you've said in Perl are bad.

    Once you'd cried-woolf several times with comments that restated what
    you'd said in the Perl I stopped reading the comments so I didn't even
    notice the disrepancy.

    Put the effort into getting the code right, leave out the restatement.
    If one sees a piece of code that does one thing and a comment along
    side that says it does somthing else one can't help sometimes
    mis-reading the code.

    As I also said, if you are anchoring both ends there's an alternative
    hash based approach that will be much more efficient if @search_string
    is large (although it'll probably not be as efficient as the scalar
    slurp approach if it isn't ever going to be large).

    Populate %search_string thus:

    my %search_string;
    while ( <SEARCH> ) {
    chomp;
    $search_string{$_}++;
    }

    (Note: =1 is faster than ++ but I find ++ more ideomatic).

    Then later:

    print '# ' if $search_string{$first_part};
     
    , Jan 31, 2004
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Dave
    Replies:
    12
    Views:
    782
    Ioannis Vranos
    Dec 19, 2004
  2. Siki

    Programming Ideas/Exercises for Newbie

    Siki, May 22, 2004, in forum: C Programming
    Replies:
    1
    Views:
    443
    Allin Cottrell
    May 22, 2004
  3. Jasper
    Replies:
    17
    Views:
    185
    David H. Adler
    Jul 8, 2004
  4. Perl CGI project ideas

    , Nov 3, 2004, in forum: Perl Misc
    Replies:
    9
    Views:
    342
    krakle
    Nov 4, 2004
  5.  (Cody Houston)

    Learning Perl - resources and ideas

    (Cody Houston), Feb 8, 2005, in forum: Perl Misc
    Replies:
    5
    Views:
    113
    Sherm Pendley
    Feb 9, 2005
Loading...

Share This Page