Help me to Improve

Discussion in 'Perl Misc' started by SSS Develop, Oct 7, 2011.

  1. SSS Develop

    SSS Develop Guest

    Hello,

    The intention of asking this question in the forum is to improve this
    piece of code (for performance and other best practices).

    I have a couple of Web Applications and need to monitor the HTTP Response
    for them. I decided to use Perl - the script should fetch the HTTP
    page from each Web App within the SLA time (Service Level). If the SLA is
    not met, then the system should send an alert. For sending alerts, it
    interacts with command-line tools like email/nagios.

    SLAs are in seconds.

    The typical file structure of the application is as follows:

    /bin/httpresp.pl
    /conf/{app1.conf, app2.conf, ..}
    /logs/{app1.log, app2.log...}


    The script uses one configuration file per web app; a configuration file
    looks like this:

    ---------------
    url = https://www.example.com
    sla = 10
    logfile = example.com.prod

    -----------

    The script looks as follows:

    -------------------------------

    #!/usr/bin/perl

    use strict;
    use warnings;
    use AppConfig;
    use Parallel::ForkManager;
    use Log::Log4perl qw(:easy);
    use HTTP::Request::Common;
    use LWP::UserAgent;
    use HTTP::Cookies;
    use Benchmark ':hireswallclock';
    use Time::HiRes qw ( time alarm sleep );
    use IPC::Run('run'); # not shown in this script; useful for interacting
                         # with nagios or running other command-line tools


    our %LOGFILE;
    our $APP_HOME = '/usr/local/application';
    our $HTTPRESP = 'httpresp';
    $LOGFILE{file} = $APP_HOME . "/" . $HTTPRESP . "/" . 'logs/generic.log';




    my $MAX_PROC = 100; # restricting to 100 processes at the moment, but can be modified any time

    $SIG{ALRM} = \&response_sla;

    my $confdir = "$APP_HOME/$HTTPRESP/conf";
    my $pm = new Parallel::ForkManager($MAX_PROC);

    my $cnfhs = get_configurations();



    foreach my $key ( keys %$cnfhs ) {

    my $conf = $cnfhs->{$key};

    my $pid = $pm->start and next;
    my $logger = logger( $conf->get('logfile') );

    crawl_sites( $conf, $logger );
    $pm->finish;

    }
    $pm->wait_all_children;


    sub get_configurations {

    my @confs = glob( $confdir . "/*.conf" );
    my $confhash = {};
    foreach my $cnf (@confs) {
    my $config = AppConfig->new();
    $config->define('name=s');
    $config->define('url=s');
    $config->define('sla=s');
    $config->define('logfile=s');
    $config->define('appname=s');
    $config->file($cnf);
    $confhash->{$cnf} = $config;
    }

    return $confhash;

    }

    sub logger {
    my $logfilename = shift;
    $LOGFILE{file} = $logfilename;
    my $conf = q(
        log4perl.logger                    = INFO, FileApp
        log4perl.appender.FileApp          = Log::Log4perl::Appender::File
        log4perl.appender.FileApp.filename = sub { getLogfilename(); }
        log4perl.appender.FileApp.layout   = PatternLayout
        log4perl.appender.FileApp.layout.ConversionPattern = %d> %m%n
    );

    # Initialize logging behaviour
    Log::Log4perl->init( \$conf );

    # Obtain a logger instance
    my $logger = get_logger();

    return $logger;

    }

    sub getLogfilename {

    return "$APP_HOME/$HTTPRESP/logs/" . $LOGFILE{file} . ".log";

    }

    sub crawl_sites {
    my ( $conf, $logger ) = @_;
    my $sla = $conf->get('sla');
    my $url = $conf->get('url');
    eval {
    alarm($sla);
    get_response( $url, $logger );
    alarm(0);
    };

    if ( $@ =~ /SLA: DID NOT MET/ ) {
        print "SLA DID NOT MET\n";
        # code to interact with nagios
    } else {
        # code to interact with nagios
    }

    }

    sub get_response {
    my ( $url, $logger ) = @_;

    $logger->info("Started monitoring: $url ");
    my $ua = new LWP::UserAgent;
    $ua->cookie_jar( HTTP::Cookies->new() );
    $ua->agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.18) Gecko/20110614 Firefox/3.6.18');
    $ua->ssl_opts( verify_hostname => 0 ); ## skip verifying the SSL certificate

    my $t1 = Benchmark->new;

    my $response = $ua->request( GET $url);
    if ( $response->is_success ) {
    $logger->info("SUCCESS: Got the response"); # or whatever
    }
    else {
    $logger->info( "ERROR: the response was, " . $response->status_line );

    }
    my $t2 = Benchmark->new;
    my $td = timediff( $t2, $t1 );
    print "Total Response time: " . timestr($td) . "\n";
    $logger->info( "Total Response time: " . timestr($td) );
    }


    sub response_sla {
    die "SLA: DID NOT MET";
    }

    --------------------------------

    Help me to improve this program - performance, and scaling to a large
    number of apps (say 1k or 2k web apps).


    thank you for your time !

    --sss
    SSS Develop, Oct 7, 2011
    #1

  2. SSS Develop


    Thank you so much, Ben - this was the best feedback/review I have got. I
    was not expecting such detailed help. I am so happy, and thank you so
    much!

    I have some comments below. I will correct the code and post the final
    version here later.


    On Oct 8, 4:47 am, Ben Morrow <> wrote:
    > Quoth SSS Develop <>:
    >
    > > Hello,

    >
    > > The intention to ask question in this forum is to improve this piece
    > > of code (for performance and other best practices)

    >
    > I have to say, it's a lot better than a lot of what gets posted here...
    >
    >
    >
    > > I have couple of Web Applications - need to monitor the HTTP Response
    > > for them. Decided to use the Perl - the script should fetch the HTTP
    > > page from Web App within SLA  time (Service Level). If the SLA did not
    > > met - then system should send alert. For sending alert, it interacts
    > > with command line tools like email/nagios.

    > <snip>
    >
    > > #!/usr/bin/perl

    >
    > > use strict;
    > > use warnings;
    > > use AppConfig;
    > > use Parallel::ForkManager;

    >
    > If the performance of this script is an issue, forking may not be the
    > best solution. It's pretty much always faster to use a non-blocking
    > single-process (or single-process-per-CPU) approach, using something
    > like POE. That would require restructuring the whole program, though, so
    > if P::FM works for you I should stick with it.
    >
    > > use Log::Log4perl qw(:easy);
    > > use HTTP::Request::Common;
    > > use LWP::UserAgent;
    > > use HTTP::Cookies;
    > > use Benchmark ':hireswallclock';
    > > use Time::HiRes qw ( time alarm sleep );
    > > use IPC::Run('run');   # Not shown the use of it in this script, is
    > > useful to interact with nagios or run any other command line commands

    >
    > > our %LOGFILE;
    > > our $APP_HOME = '/usr/local/application';
    > > our $HTTPRESP = 'httpresp';
    > > $LOGFILE{file} =  $APP_HOME . "/" . $HTTPRESP . "/". 'logs/
    > > generic.log';

    >
    > I see Tad's already mentioned you're using 'our' where you could be
    > using 'my'. I would add that there's no need for concatentation when you
    > can interpolate, and that you can assign directly to a hash too:
    >
    >     my $APP_HOME    = '/usr/local/application';
    >     my $HTTPRESP    = 'httpresp';
    >     my %LOGFILE     = (
    >         file    => "$APP_HOME/$HTTPRESP/logs/generic.log",
    >     );
    >
    > (Whether to use single or double quotes when nothing will be
    > interpolated is a matter of taste. I prefer double, but many people here
    > prefer single.)
    >
    > > my $MAX_PROC =
    > >   100; # restricting to 100 process at this moment, but can be
    > > modified any time

    >
    > > $SIG{ALRM} = \&response_sla;

    >
    > It's a little confusing to refer to a sub all the way down at the bottom
    > of the file, especially when it's so trivial it doesn't really need a
    > name:
    >
    >     $SIG{ALRM} = sub {
    >         die "SLA: DID NOT MET";
    >     };
    >
    > > my $confdir = "$APP_HOME/$HTTPRESP/conf";
    > > my $pm      = new Parallel::ForkManager($MAX_PROC);

    >
    > It's best to avoid dative method syntax (METHOD OBJECT ARGS or METHOD
    > CLASS ARGS), at least in part because it encourages the idea that there
    > is something special about the 'new' method.
    >
    >     my $pm = Parallel::ForkManager->new($MAX_PROC);
    >
    > (Yes, I know you just copied that from the documentation. Unfortunately
    > a lot of CPAN documentation uses older, now-discouraged idioms in the
    > examples.)
    >
    > > my $cnfhs = get_configurations();

    >
    > > foreach my $key ( keys %$cnfhs ) {

    >
    > >     my $conf = $cnfhs->{$key};

    >
    > You can write that loop as
    >
    >     while (my ($key, $conf) = each %$cnfhs) {
    >
    > and it'll even be more efficient on really large hashes. (Not enough
    > people remember about 'each'.)
    >
    >
    >
    > >     my $pid = $pm->start and next;
    > >     my $logger = logger( $conf->get('logfile') );

    >
    > >     crawl_sites( $conf, $logger );
    > >     $pm->finish;

    >
    > > }
    > > $pm->wait_all_children;

    >
    > > sub get_configurations {

    >
    > >     my @confs    = glob( $confdir . "/*.conf" );

    >
    > Passing a value into a sub in a file-scoped global is always worth
    > avoiding if you can. In this case it's easy: pass it in as a real
    > parameter instead.
    >
    > It may seem silly calling get_configurations($confdir) when you know
    > perfectly well that get_configurations can see that variable, but it
    > rapidly becomes less silly when you come back six months later and take
    > it out without remembering some sub half way down the file needed it.
    >
    > <snip>
    >
    > > sub logger {
    > >     my $logfilename = shift;
    > >     $LOGFILE{file} = $logfilename;
    > >     my $conf = q(
    > >         log4perl.logger                    = INFO, FileApp
    > >         log4perl.appender.FileApp          =
    > > Log::Log4perl::Appender::File
    > >         log4perl.appender.FileApp.filename = sub {getLogfilename();}

    >
    > <...later...>
    >
    > > sub getLogfilename {

    >
    > >     return "$APP_HOME/$HTTPRESP/logs/" . $LOGFILE{file} . ".log";

    >
    > > }

    >
    > It's not really clear to me what you're trying to do here with
    > $LOGFILE{file}, but it can almost certainly be done less confusingly.
    > Since you don't appear to use it anywhere else, is there any reason not
    > to just put the correct filename straight in the log4perl config?
    >


    [sssdevelop] I am creating a log file for each forked process (one log
    file per web app). The logger method accepts the logfile name as an
    argument; that's used to change the "file" value of %LOGFILE, which is
    then used inside getLogfilename().
    I really wanted to interpolate the logfile name in the appender. Now I
    have the clue: I will use "qq" instead of a single "q" :)



    >     my $logpath = "$APP_HOME/$HTTPRESP/logs/$logfilename.log";
    >     my $conf    = qq(
    >         log4perl.logger             = INFO, FileApp
    >         log4perl.appender.FileApp   = Log::Log4perl::Appender::File
    >         log4perl.appender.FileApp.filename = $logpath
    >         ...
    >     );
    >
    > (Notice I've changed 'q()' to 'qq()'. I said I preferred double quotes :) ...)
    >
    > I'd also want to tidy up this "$APP_HOME/$HTTPRESP" prefix that keeps
    > turning up. Probably I'd just chdir to that directory and use relative
    > paths.
    >
    > >         log4perl.appender.FileApp.layout   = PatternLayout
    > >         log4perl.appender.FileApp.layout.ConversionPattern = %d> %m%n
    > >     );

    >
    > >     # Initialize logging behaviour
    > >     Log::Log4perl->init( \$conf );

    >
    > >     # Obtain a logger instance
    > >     my $logger = get_logger();

    >
    > I see no 'sub get_logger' here...
    >
    > ...oh, I see, it's from Log4perl. I don't think you should be mixing the
    > normal and the 'easy' interfaces like that. Since you've got as far as
    > setting up a custom logger config, use the real method so people reading
    > the code have some idea what's going on:
    >
    >     return Log::Log4perl->get_logger();
    >
    > >     return $logger;

    >
    > > }

    >




    [sssdevelop] - yup, I agree with this. I will change it!


    > > sub crawl_sites {
    > >     my ( $conf, $logger ) = @_;
    > >     my $sla = $conf->get('sla');
    > >     my $url = $conf->get('url');
    > >     eval {
    > >         alarm($sla);

    >
    > Once we get as far as this 'alarm' statement, we've entirely forgotten
    > about the $SIG{ALRM} assignment at the top of the file that said what
    > the alarm will actually *do*. It's also better, where possible, to write
    > subs so they don't rely on or change global values. Generally this means
    > that if you need to use one of the magic Perl globals, you need to
    > 'local' it.
    >
    >     eval {
    >         local $SIG{ALRM} = sub { die "SLA NOT MET" };
    >         alarm($sla);
    >
    > >         get_response( $url, $logger );
    > >         alarm(0);

    >


    [sssdevelop] Now I see the importance of this - thanks for the
    advice.

    > You have a race condition here. It's entirely possible the alarm will
    > fire after get_response has received a successful response but before it
    > has returned and the alarm has been deactivated. In this case, you may
    > not care (if it was that close to the deadline you may want to count it
    > as 'late' anyway), but it needs considering. If you have, in fact,
    > considered it, you should put in a comment so you remember not to
    > consider it again :).
    >
    > >     };

    >
    > You need to cancel the alarm here, as well. get_response may die for
    > some reason other than an alarm, and if it does the alarm will still go
    > off, but nothing will catch the exception. (There is another race here,
    > of course, but it's rather hard to avoid. Alarms are tricky.)
    >



    [sssdevelop] Ah.. I did not put much thought into the race happening here.
    I need to work on this. That was good advice :)
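    Putting Ben's two points together (a localized handler, plus cancelling
    the alarm outside the eval as well), a minimal sketch could look like
    this; the $do_request code ref is a hypothetical stand-in for the real
    get_response call:

```perl
use strict;
use warnings;

# Sketch of the timeout pattern discussed above. The handler is
# localized so it does not leak into other code, and alarm(0) also
# runs outside the eval, so an unrelated exception inside $do_request
# cannot leave the alarm armed with nothing to catch it.
sub fetch_with_deadline {
    my ( $sla, $do_request ) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "SLA: DID NOT MET\n" };
        alarm($sla);
        $do_request->();
        alarm(0);    # normal path: deactivate the alarm
        1;
    };
    alarm(0);        # also cancel here, in case $do_request died early
    return 'ok'   if $ok;
    return 'late' if $@ =~ /SLA: DID NOT MET/;
    return "error: $@";
}
```

    The second alarm(0) is the fix Ben describes: without it, a non-alarm
    exception inside the eval would leave a pending alarm that fires later
    with no handler in place to catch it.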


    > <snip>
    >
    > > sub get_response {
    > >     my ( $url, $logger ) = @_;

    >
    > >     $logger->info("Started monitoring: $url ");
    > >     my $ua = new LWP::UserAgent;
    > >     $ua->cookie_jar( HTTP::Cookies->new() );
    > >     $ua->agent(
    > > 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.18) Gecko/
    > > 20110614 Firefox/3.6.18'
    > >     );
    > >     $ua->ssl_opts( verify_hostname => 0 );    ##Skip verifying SSL
    > > Certificate

    >
    > You don't need to re-do all this every time. Create a LWP::UA object as
    > part of your script initialization, then pass it in to get_response.
    >


    [sssdevelop] yup, I will use a single LWP::UserAgent object! This is helpful.
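    A minimal sketch of that change, reusing the same options as the
    original script (the agent string and timeout value here are just
    placeholders):

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build the UA once at script startup; each forked worker then reuses
# it instead of constructing a fresh one inside get_response per call.
my $ua = LWP::UserAgent->new(
    agent      => 'Mozilla/5.0 (monitoring script)',    # placeholder string
    cookie_jar => HTTP::Cookies->new(),
    ssl_opts   => { verify_hostname => 0 },             # skip SSL cert checks
    timeout    => 30,                                   # illustrative hard cap
);

# get_response would then take the UA as a parameter:
# get_response( $ua, $url, $logger );
```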

    > >     my $t1 = Benchmark->new;

    >
    > >     my $response = $ua->request( GET $url);
    > >     if ( $response->is_success ) {
    > >         $logger->info("SUCCESS: Got the response");    # orwhatever
    > >     }
    > >     else {
    > >         $logger->info( "ERROR: the response was, " . $response-
    > > >status_line );

    > >     }
    > >     my $t2 = Benchmark->new;
    > >     my $td = timediff( $t2, $t1 );
    > >     print "Total Response time: " . timestr($td) . "\n";
    > >     $logger->info( "Total Response time: " . timestr($td) );

    >
    > Since you're timing the response anyway, it's probably better to rely on
    > that measurement, rather than the alarm, to see whether it's gone over
    > its SLA. You might still want to set an alarm for something like twice
    > the SLA, just in case the request never returns for some reason (though
    > LWP will timeout for you eventually).
    >
    > You might also want to look at LWPx::TimedHTTP, which gives detailed
    > timings of the various stages of the request cycle.
    >


    [sssdevelop] - thanks, I will surely check out LWPx::TimedHTTP.
    Ben, I appreciate your help, and thank you so much for
    your time.
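    A sketch of the measure-instead-of-alarm idea: the SLA verdict comes
    from the measured wall-clock time, and the alarm is kept only as a
    backstop at twice the SLA. The $do_request code ref is again a
    hypothetical stand-in for the HTTP fetch:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

sub check_sla {
    my ( $sla, $do_request ) = @_;
    my $start = time();
    eval {
        local $SIG{ALRM} = sub { die "hard timeout\n" };
        alarm( 2 * $sla );    # backstop only, not the SLA check itself
        $do_request->();
        alarm(0);
    };
    alarm(0);
    my $elapsed = time() - $start;
    # The SLA verdict comes from the measurement, not from the alarm:
    return {
        elapsed => $elapsed,
        ok      => ( !$@ && $elapsed <= $sla ) ? 1 : 0,
    };
}
```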


    ----sssdevelop


    > Ben
    SSS Develop, Oct 8, 2011
    #2

  3. Ben Morrow <> writes:

    [...]

    > I see Tad's already mentioned you're using 'our' where you could be
    > using 'my'. I would add that there's no need for concatentation when you
    > can interpolate, and that you can assign directly to a hash too:
    >
    > my $APP_HOME = '/usr/local/application';
    > my $HTTPRESP = 'httpresp';
    > my %LOGFILE = (
    > file => "$APP_HOME/$HTTPRESP/logs/generic.log",
    > );
    >
    > (Whether to use single or double quotes when nothing will be
    > interpolated is a matter of taste.


    Using double quotes means the compiler has to analyze the string in
    order to determine if and how something needs to be interpolated into
    it. Otherwise, this effort can be avoided, as can the effort the next
    person who looks at the code needs to come to the same conclusion.

    [...]

    >> my $confdir = "$APP_HOME/$HTTPRESP/conf";
    >> my $pm = new Parallel::ForkManager($MAX_PROC);

    >
    > It's best to avoid dative method syntax (METHOD OBJECT ARGS or METHOD
    > CLASS ARGS), at least in part because it encourages the idea that there
    > is something special about the 'new' method.
    >
    > my $pm = Parallel::ForkManager->new($MAX_PROC);
    >
    > (Yes, I know you just copied that from the documentation. Unfortunately
    > a lot of CPAN documentation uses older, now-discouraged idioms in the
    > examples.)


    In this particular case, the 'indirect object' syntax is actually even
    'discouraged' by the corresponding Perl documentation. More generally,
    however, the sentence above parses as "a lot of people who wrote
    useful code disagreed with some (or all) opinions
    $random_people_on_usenet had regarding what they should have done".

    >> my $cnfhs = get_configurations();
    >>
    >> foreach my $key ( keys %$cnfhs ) {
    >>
    >> my $conf = $cnfhs->{$key};

    >
    > You can write that loop as
    >
    > while (my ($key, $conf) = each %$cnfhs) {
    >
    > and it'll even be more efficient on really large hashes. (Not enough
    > people remember about 'each'.)


    Except in the case of insanely large hashes (> 500,000 keys, according to
    some tests I did a while back), getting the list of keys and
    traversing the hash 'open coded' will be faster than calling a Perl
    subroutine per key. OTOH, the latter should need less memory.
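    Rainer's numbers are easy to re-check on your own perl with the core
    Benchmark module; the hash size here is purely illustrative:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Compare iterating with keys() against each() on a moderately
# large hash; run each variant for at least 2 CPU seconds.
my %h = map { $_ => $_ } 1 .. 100_000;

cmpthese( -2, {
    keys_loop => sub {
        my $sum = 0;
        $sum += $h{$_} for keys %h;
    },
    each_loop => sub {
        my $sum = 0;
        while ( my ( $k, $v ) = each %h ) { $sum += $v }
    },
} );
```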

    [...]


    >> sub get_configurations {
    >>
    >> my @confs = glob( $confdir . "/*.conf" );

    >
    > Passing a value into a sub in a file-scoped global is always worth
    > avoiding if you can. In this case it's easy: pass it in as a real
    > parameter instead.


    Doing something because it can be done is just about the most awful
    reason for doing it which can be imagined, especially in code where
    more or less everything can be done.

    > It may seem silly calling get_configurations($confdir) when you know
    > perfectly well that get_configurations can see that variable, but it
    > rapidly becomes less silly when you come back six months later and take
    > it out without remembering some sub half way down the file needed
    > it.


    It is silly to remove something without checking that it isn't needed
    and even more silly to remove something and not even run the changed
    file through the compiler before assuming that the change was ok. And
    with 'strict', such a change won't go through the compiler.

    >
    > <snip>
    >> sub logger {
    >> my $logfilename = shift;
    >> $LOGFILE{file} = $logfilename;
    >> my $conf = q(
    >> log4perl.logger = INFO, FileApp
    >> log4perl.appender.FileApp =
    >> Log::Log4perl::Appender::File
    >> log4perl.appender.FileApp.filename = sub {getLogfilename();}

    >
    > <...later...>
    >> sub getLogfilename {
    >>
    >> return "$APP_HOME/$HTTPRESP/logs/" . $LOGFILE{file} . ".log";
    >>
    >> }

    >
    > It's not really clear to me what you're trying to do here with
    > $LOGFILE{file}, but it can almost certainly be done less confusingly.
    > Since you don't appear to use it anywhere else, is there any reason not
    > to just put the correct filename straight in the log4perl config?


    The subroutine above is a specific algorithm with a generic name, and
    the usual reason for functional decomposition (didn't somebody claim
    that was actually used in practice not that long ago?) is that it
    simplifies the code at each level of abstraction by moving details
    which are not relevant to the present algorithm elsewhere.
    Rainer Weikusat, Oct 9, 2011
    #3
  4. Rainer Weikusat <> writes:

    > Ben Morrow <> writes:
    >
    >
    >>> my $confdir = "$APP_HOME/$HTTPRESP/conf";
    >>> my $pm = new Parallel::ForkManager($MAX_PROC);

    >>
    >> It's best to avoid dative method syntax (METHOD OBJECT ARGS or METHOD
    >> CLASS ARGS), at least in part because it encourages the idea that there
    >> is something special about the 'new' method.
    >>
    >> my $pm = Parallel::ForkManager->new($MAX_PROC);
    >>
    >> (Yes, I know you just copied that from the documentation. Unfortunately
    >> a lot of CPAN documentation uses older, now-discouraged idioms in the
    >> examples.)

    >
    > In this particular case, the 'indirect object' syntax is actually even
    > 'discouraged' by the corresponding Perl documentation. More generally,
    > however, the sentence above parses as "a lot of people who wrote
    > useful code disagreed with some (or all) opinions
    > $random_people_on_usenet had regarding what they should have done".
    >

    There are actually good, technical, reasons why the indirect syntax is
    being discouraged.

    But do not let that stop you from ranting at what you perceive to be
    irrational authoritarianism. The rest of us will enjoy our code without
    hard to find bugs.

    Mart

    --
    "We will need a longer wall when the revolution comes."
    --- AJS, quoting an uncertain source.
    Mart van de Wege, Oct 9, 2011
    #4
  5. Rainer Weikusat <> wrote:
    >Ben Morrow <> writes:
    >> Passing a value into a sub in a file-scoped global is always worth
    >> avoiding if you can. In this case it's easy: pass it in as a real
    >> parameter instead.

    >
    >Doing something because it can be done is just about the most awful
    >reason for doing it which can be imagined, especially in code where
    >more or less everything can be done.


    Are you arguing in favour of or against global variables?

    jue
    Jürgen Exner, Oct 9, 2011
    #5
  6. Ben Morrow <> wrote:
    >In message <31836.1270260026@chthon>, <> wrote:
    >|
    >| Also, I see you keep using single quotes for strings that don't need them.
    >| Why? Those aren't really a generic string, you know. They loudly scream
    >| "DO NOT INTERPOLATE ANYTHING HERE!!!", so I always scrutinize them extra
    >| hard to try to figure out what it is that you're trying to suppress. Plus
    >| I hate having to deal with backslash escapes that aren't. Perl strings
    >| normally interpolate, and it takes something special to suppress that:
    >|
    >| $string = $normal ; # interpolate values normally
    >| $string = "$normal" ; # interpolate values normally


    By some definition of "normally". This assignment actually stringifies
    $normal, which most often is not the desired action when dealing with,
    e.g., numbers. Therefore you could argue whether this is the "normal" or
    "naively expected" semantic.

    >| $string = `$normal` ; # interpolate values normally
    >| $string = '$abnormal' ; # DON'T INTERPOLATE VALUES!!!


    I like to think of '...' as the literal operator or the literal quotes.
    They don't munge their text.

    >| $string = \$abnormal ; # DON'T INTERPOLATE VALUES!!!
    >| $string = "\$abnormal" ; # DON'T INTERPOLATE VALUES!!!


    Actually, this example does interpolate: it interpolates \$ as the
    literal dollar sign.
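    The point about \$ can be checked directly (the variable names here are
    just for illustration):

```perl
use strict;
use warnings;

my $abnormal = 'value';

my $interp  = "$abnormal";     # interpolates: 'value'
my $literal = '$abnormal';     # single quotes: literal '$abnormal'
my $escaped = "\$abnormal";    # double quotes, but \$ yields a literal '$'
```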

    etc, etc.

    jue
    Jürgen Exner, Oct 10, 2011
    #6
  7. SSS Develop


    A very healthy discussion! Thanks, everyone, for participating!
    (I will post the final version soon :))
    thank you,

    sssdevelop

    On Oct 10, 5:09 am, Ben Morrow <> wrote:
    > Quoth Rainer Weikusat <>:
    >
    > > Ben Morrow <> writes:

    >
    > > > It's best to avoid dative method syntax (METHOD OBJECT ARGS or METHOD
    > > > CLASS ARGS), at least in part because it encourages the idea that there
    > > > is something special about the 'new' method.

    >
    > > >     my $pm = Parallel::ForkManager->new($MAX_PROC);

    >
    > > > (Yes, I know you just copied that from the documentation. Unfortunately
    > > > a lot of CPAN documentation uses older, now-discouraged idioms in the
    > > > examples.)

    >
    > > In this particular case, the 'indirect object' syntax is actually even
    > > 'discouraged' by the corresponding Perl documentation. More generally,
    > > however, the sentence above parses as "a lot of people who wrote
    > > useful code disagreed with some (or all) opinions
    > > $random_people_on_usenet had regarding what they should have done".

    >
    > I know. I was aware when I wrote that paragraph that I was weaselling
    > out of properly explaining what's wrong with the dative syntax, and the
    > reason I was doing that was that I can never remember an example of
    > where it goes wrong :). I can give you *lots* of examples where Perl
    > assumes I'm making a dative method call when I didn't want it to, but
    > that's a reason for not putting it in the language at all, rather than
    > one for not using it if its there.
    >
    > I believe the canonical example is
    >
    >     my $rv = method $obj;
    >
    > with a 'sub method' in the current package. While that is a problem
    > (since it will parse differently depending on the presence of the sub)
    > it is easily disambiguated:
    >
    >     my $rv = method $obj ();
    >
    > So, can anyone remind me of when this actually goes wrong?
    >
    > > >> my $cnfhs = get_configurations();

    >
    > > >> foreach my $key ( keys %$cnfhs ) {

    >
    > > >>     my $conf = $cnfhs->{$key};

    >
    > > > You can write that loop as

    >
    > > >     while (my ($key, $conf) = each %$cnfhs) {

    >
    > > > and it'll even be more efficient on really large hashes. (Not enough
    > > > people remember about 'each'.)

    >
    > > Except in case of insanely large hashes (> 500,000 keys, according to
    > > some tests I did a while back), getting the list of keys and
    > > traversing the hash 'open coded' will be faster than calling Perl
    > > subroutine per key. OTOH, the latter should need less memory.

    >
    > Interesting. The efficiency wasn't the point, of course. The point was
    > to get key and value in one go.
    >
    > When you say 'calling Perl subroutine per key' do you mean just the call
    > to 'each'? 'each' isn't a subroutine, of course, but the while loop does
    > have to run some Perl every iteration rather than just popping the
    > stack.
    >
    > > > <snip>
    > > >> sub logger {
    > > >>     my $logfilename = shift;
    > > >>     $LOGFILE{file} = $logfilename;
    > > >>     my $conf = q(
    > > >>         log4perl.logger                    = INFO, FileApp
    > > >>         log4perl.appender.FileApp          =
    > > >> Log::Log4perl::Appender::File
    > > >>         log4perl.appender.FileApp.filename = sub {getLogfilename();}

    >
    > > > <...later...>
    > > >> sub getLogfilename {

    >
    > > >>     return "$APP_HOME/$HTTPRESP/logs/" . $LOGFILE{file} . ".log";

    >
    > > >> }

    >
    > > > It's not really clear to me what you're trying to do here with
    > > > $LOGFILE{file}, but it can almost certainly be done less confusingly.
    > > > Since you don't appear to use it anywhere else, is there any reason not
    > > > to just put the correct filename straight in the log4perl config?

    >
    > > The subroutine above is a specific algorithm with a generic name and
    > > the usual reason for functional decomposition (didn't someboday claim
    > > that was acutally used in practice not that long ago) is that it
    > > simplifies the code at each level abstraction by moving details
    > > which are not relevant for the present algorithm elsewhere.

    >
    > For there to be any merit to that argument, getLogFilename would have to
    > be independant of logger. Since logger sets a global read by
    > getLogFilename, this obviously isn't the case. If you were to rewrite
    > the code so it looked like
    >
    >     sub logger {
    >         my $logfilename = shift;
    >         my $logpath     = getLogFilename($logfilename);
    >         my $conf        = qq(
    >             ...filename = $logpath
    >
    > I would consider that a sensible arrangement. You can argue about
    > whether or not the mapping from basename to full path should be a
    > separate function: in this case, since it's neither particularly
    > complicated nor something that can be changed at runtime ISTM better to
    > put the expression inline, but it makes little difference.
    >
    > What I was complaining about was the extremely convoluted data path that
    > went like this:
    >
    >     - stuff a value into (the sole element of) a global;
    >     - create a string containing the Perl code for a sub call;
    >     - pass that string to a method, which will
    >         - parse out the sub call;
    >         - call the sub, which will
    >             - pull the value out of the global;
    >             - interpolate it into a path;
    >             - return that path;
    >         - use the returned value as a log filename.
    >
    > I was recommending something more like
    >
    >     - interpolate the basename into a path;
    >     - interpolate that path into a config string;
    >     - pass that string to a method, which will
    >         - use it as a log filename.
    >
    > Whether the first step is a sub call or a direct expression is
    > irrelevant.
    >
    > Ben
    SSS Develop, Oct 10, 2011
    #7
  8. with <> Ben Morrow wrote:
    >
    > Quoth Tad McClellan <>:
    >> Rainer Weikusat <> wrote:
    >> > Ben Morrow <> writes:

    *SKIP*
    >>> Using double-quotes means the compiler has to analyze the string order
    >>> to determine if and how something needs to be interpolated into
    >>> it. Otherwise, this effort can be avoided and also the effort the next
    >>> person who looks at the code needs to come to the same result.

    *SKIP*
    >> I, errr, agree with Rainer here, more for his second reason
    >> that for his first.
    >>
    >> I am always looking to optimize for maintenance.
    >>
    >> Using double quotes when nothing is intended to be interpolated
    >> is in _bad_ taste. :)

    >
    > Well, at least I'm not alone:


    So the double-quote war has begun? Let me introduce some fuel to that
    so-far ceased fire:
    #!/usr/bin/perl

    use strict;
    use warnings;
    use Benchmark qw{ cmpthese timethese };

    my $aa;
    my $ab = 'abc';

    cmpthese timethese -5, {
    code00 => sub { $aa = eval q|"$ab"| },
    code01 => sub { $aa = eval q|'$ab'| },
    code02 => sub { $aa = eval q|"\\$ab"| },
    code03 => sub { $aa = eval q|'def'| },
    };

    __END__
    Benchmark: running code00, code01, code02, code03 for at least 5 CPU seconds...
        code00:  6 wallclock secs ( 5.00 usr +  0.02 sys =  5.02 CPU) @ 64101.99/s (n=321792)
        code01:  6 wallclock secs ( 5.18 usr +  0.00 sys =  5.18 CPU) @ 96095.95/s (n=497777)
        code02:  6 wallclock secs ( 5.24 usr +  0.00 sys =  5.24 CPU) @ 69054.58/s (n=361846)
        code03:  5 wallclock secs ( 5.34 usr +  0.00 sys =  5.34 CPU) @ 94380.15/s (n=503990)
                  Rate  code00  code02  code03  code01
        code00 64102/s      --     -7%    -32%    -33%
        code02 69055/s      8%      --    -27%    -28%
        code03 94380/s     47%     37%      --     -2%
        code01 96096/s     50%     39%      2%      --

    Hopefully, I did leaning sticks correctly in 'code02'.
    *CUT*

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
    Eric Pozharski, Oct 10, 2011
    #8
  9. Jürgen Exner <> writes:
    > Rainer Weikusat <> wrote:
    >>Ben Morrow <> writes:
    >>> Passing a value into a sub in a file-scoped global is always worth
    >>> avoiding if you can. In this case it's easy: pass it in as a real
    >>> parameter instead.

    >>
    >>Doing something because it can be done is just about the most awful
    >>reason for doing it which can be imagined, especially in code where
    >>more or less everything can be done.

    >
    > Are you arguing in favour of or against global variables?


    I'm arguing against 'do this whenever you can' design recipes.

    As to 'global variables': a (non-OO) module composed of some
    subroutines and some shared state variables these subroutines work
    with is structurally similar, even identical (isomorphic), to an
    object with some instance variables and some methods working with
    them. Programs can often be decomposed into relatively independent
    functional units providing services to each other by invoking
    interface routines. These independent functional units will usually
    be stateful, meaning they will have a set of internal variables
    holding 'state information'.

    I see no ground for an argument 'for' or 'against' something here,
    except maybe at the completely different level of 'should computer
    programs work like other complex real-world objects', IOW, have
    state which changes over time, or should fat runtimes be used to
    provide the illusion that this isn't so.
    Rainer Weikusat, Oct 10, 2011
    #9
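[Editor's note: Rainer's module-vs-object isomorphism above can be sketched in a few lines of Perl. The `Counter` name and its `bump`/`total` interface are illustrative assumptions for this sketch, not anything posted in the thread.]

```perl
#!/usr/bin/perl
# Sketch: a non-OO "functional unit" (subs sharing lexical state) next
# to the structurally identical object (instance variable + methods).
use strict;
use warnings;

# Functional unit: shared state variable + interface routines.
{
    my $count = 0;                      # internal 'state information'
    sub bump  { $count += shift // 1 }  # interface routine
    sub total { $count }                # interface routine
}

# The isomorphic object: same state, same operations, as methods.
package Counter;
sub new   { bless { count => 0 }, shift }
sub bump  { my ($self, $n) = @_; $self->{count} += $n // 1 }
sub total { $_[0]{count} }

package main;
bump(2); bump();
my $obj = Counter->new;
$obj->bump(2); $obj->bump;
print total(), " ", $obj->total, "\n";   # both report 3
```

Either form keeps the state private to the unit; only the calling convention differs.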
  10. SSS Develop Guest

    Tad McClellan <> wrote:
    > Rainer Weikusat <> wrote:
    > > Ben Morrow <> writes:
    > >
    > > [...]
    > >
    > >> I see Tad's already mentioned you're using 'our' where you could be
    > >> using 'my'. I would add that there's no need for concatentation when
    > >> you can interpolate, and that you can assign directly to a hash too:
    > >>
    > >> my $APP_HOME = '/usr/local/application';
    > >> my $HTTPRESP = 'httpresp';
    > >> my %LOGFILE = (
    > >> file => "$APP_HOME/$HTTPRESP/logs/generic.log",
    > >> );
    > >>
    > >> (Whether to use single or double quotes when nothing will be
    > >> interpolated is a matter of taste.

    > >
    > > Using double-quotes means the compiler has to analyze the string in
    > > order to determine if and how something needs to be interpolated into
    > > it.


    Sure, but who cares?

    > > Otherwise, this effort can be avoided and also the effort the next
    > > person who looks at the code needs to come to the same result.


    Why would the person need to do that? I find it more common that I need to
    interpolate where I (or someone) didn't before, and now I have to inspect
    the whole string to find out what it was that motivated the person to make
    it a single quote rather than double in the first place, only to find out
    there was no reason for it.

    >
    > The Moon is blue...
    >
    > I, errr, agree with Rainer here, more for his second reason
    > than for his first.
    >
    > I am always looking to optimize for maintenance.


    Using single quotes because the content does not *currently* require
    interpolation is not optimizing for maintenance. Optimizing for
    maintenance would require you to predict how the usage is likely to change
    in the future, and using whatever style of quotes is less likely to need to
    change under that theory of the future.


    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
    , Oct 11, 2011
    #10
  11. writes:
    > Tad McClellan <> wrote:
    >> Rainer Weikusat <> wrote:
    >> > Ben Morrow <> writes:
    >> >
    >> > [...]
    >> >
    >> >> I see Tad's already mentioned you're using 'our' where you could be
    >> >> using 'my'. I would add that there's no need for concatentation when
    >> >> you can interpolate, and that you can assign directly to a hash too:
    >> >>
    >> >> my $APP_HOME = '/usr/local/application';
    >> >> my $HTTPRESP = 'httpresp';
    >> >> my %LOGFILE = (
    >> >> file => "$APP_HOME/$HTTPRESP/logs/generic.log",
    >> >> );
    >> >>
    >> >> (Whether to use single or double quotes when nothing will be
    >> >> interpolated is a matter of taste.
    >> >
    >> > Using double-quotes means the compiler has to analyze the string in
    >> > order to determine if and how something needs to be interpolated into
    >> > it.

    >
    > Sure, but who cares?


    I assume that you got this wrong by accident. It should have been
    "Sure, but I don't care". Why precisely you consider this to be
    noteworthy enough to post it escapes me, however.

    >> > Otherwise, this effort can be avoided and also the effort the next
    >> > person who looks at the code needs to come to the same result.

    >
    > Why would the person need to do that?


    In order to determine what will become of the string.

    > I find it more common that I need to interpolate where I (or
    > someone) didn't before, and now I have to inspect
    > the whole string to find out what it was that motivated the person to make
    > it a single quote rather than double in the first place, only to find out
    > there was no reason for it.


    But there was a reason for it, this reason being given above. And if
    you need to change the string, you'll have to inspect it anyway, in

    [...]

    >> I, errr, agree with Rainer here, more for his second reason
    >> than for his first.
    >>
    >> I am always looking to optimize for maintenance.

    >
    > Using single quotes because the content does not *currently* require
    > interpolation is not optimizing for maintenance. Optimizing for
    > maintenance would require you to predict how the usage is likely to change
    > in the future, and using whatever style of quotes is less likely to need to
    > change under that theory of the future.


    If 'optimizing for maintenance' requires 'predicting the future' then
    it can't be done because the future can't be predicted.
    Rainer Weikusat, Oct 11, 2011
    #11
  12. Ben Morrow <> writes:
    > Quoth Rainer Weikusat <>:
    >> writes:
    >> > Tad McClellan <> wrote:
    >> >> Rainer Weikusat <> wrote:
    >> >> >
    >> >> > Using double-quotes means the compiler has to analyze the string in
    >> >> > order to determine if and how something needs to be interpolated into
    >> >> > it.
    >> >
    >> > Sure, but who cares?

    >>
    >> I assume that you got this wrong by accident. It should have been
    >> "Sure, but I don't care". Why precisely you consider this to be
    >> noteworthy enough to post it escapes me, however.

    >
    > No, it should've been 'No, it doesn't'.


    That's not possible, given that the compiler needs to determine what
    needs to be interpolated into which locations of the string.

    > When the compiler is scanning a
    > q// string, it runs along looking for a \ or a final delimiter. When
    > it's scanning a qq// string, it does exactly the same thing, looking for
    > $ or @ as well. In particular, it does *not* pull out the whole string
    > up to the closing delimiter and then rescan it looking for
    > interpolations:


    Nobody except you wrote that. And - of course - that's exactly what
    perl is doing, judging from a rather cursory look at toke.c: first,
    a 'string token' is created by calling scan_str, then, the string
    created in this way is scanned for 'characters requiring special
    processing' and if one is found, the 'op' which is being created is
    changed from OP_CONST to OP_STRINGIFY, cf

    case '"':
        s = scan_str(s,!!PL_madskills,FALSE);
        DEBUG_T( { printbuf("### Saw string before %s\n", s); } );
        if (PL_expect == XOPERATOR) {
            if (PL_lex_formbrack && PL_lex_brackets == PL_lex_formbrack) {
                PL_expect = XTERM;
                deprecate_old(commaless_variable_list);
                return REPORT(','); /* grandfather non-comma-format format */
            }
            else
                no_op("String",s);
        }
        if (!s)
            missingterm(NULL);
        pl_yylval.ival = OP_CONST;
        /* FIXME. I think that this can be const if char *d is replaced by
           more localised variables. */
        for (d = SvPV(PL_lex_stuff, len); len; len--, d++) {
            if (*d == '$' || *d == '@' || *d == '\\' || !UTF8_IS_INVARIANT((U8)*d)) {
                pl_yylval.ival = OP_STRINGIFY;
                break;
            }
        }
        TERM(sublex_start());

    [Perl_yylex, toke.c]

    [...]

    > In any case, the difference is almost certainly unmeasurable, so it
    > doesn't matter.


    That should probably also have been "I don't care about that" instead
    of "it doesn't matter". Also, the difference is measurable using
    sufficiently long strings. Eg, on the computer where I tested this
    (3.2GHz Intel i3), creating a subroutine returning a string of 8192 As
    with that string being "-quoted is about 1.4 times slower than when
    the string is '-quoted. Granted, that's a contrived example and the
    absolute numbers are really low (4.75E-5s vs 3.37E-5s for a single
    invocation of the function, IOW, the per-character cost is in the
    nanosecond range).

    >> >> > Otherwise, this effort can be avoided and also the effort the next
    >> >> > person who looks at the code needs to come to the same result.
    >> >
    >> > Why would the person need to do that?

    >>
    >> In order to determine what will become of the string.

    >
    > Under normal circumstances, if I see something looking like $foo it
    > represents a variable. The only exceptions are inside a single-quoted
    > string, and \-escaped inside a double-quoted string. \-escaping is so
    > universal I don't have to think about it any more, so the case which
    > trips me up is a (non-\-escaped) $foo inside a single-quoted string.


    The common way to read an (English) text is to start at the top and
    read each line from left to right until the bottom has been reached;
    using shortcuts based on assumptions about what the content of the
    text will be is bound to lead to misunderstandings.

    >> But there was a reason for it, this reason being given above. And if
    >> you need to change the string, you'll have to inspect it anyway, in

    > [snip]
    >>
    >> If 'optimizing for maintenance' requires 'predicting the future' then
    >> it can't be done because the future can't be predicted.

    >
    > You really don't understand about maintainable programming, do you? Yes,
    > it requires predicting the future.


    If it requires predicting the future, then it can't be done. This has
    nothing to do with conjectures about my understanding of anything but
    is a fact: the future can't be predicted.
    Rainer Weikusat, Oct 11, 2011
    #12
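[Editor's note: the measurement Rainer describes in #12 (compiling a sub that returns an 8192-character constant, "-quoted vs '-quoted) can be sketched roughly as below. Using string eval to force recompilation on each iteration is an assumption about his setup, not his actual harness.]

```perl
#!/usr/bin/perl
# Rough sketch: compare the compile-time cost of a long "-quoted string
# against the same string '-quoted. eval STRING forces perl to re-scan
# and recompile the sub body on every iteration; the 8192-char payload
# matches the length mentioned in the post above.
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $payload = 'A' x 8192;

cmpthese(-3, {
    # The interpolated $payload becomes a literal inside the eval'd text,
    # so the compiler scans 8192 characters in each quoting style.
    double => sub { eval qq{sub { "$payload" }} },
    single => sub { eval qq{sub { '$payload' }} },
});
```

With a payload this size the '-quoted variant should come out ahead, though by a per-character cost in the nanosecond range, as noted above.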
