Help me to Improve

Discussion in 'Perl Misc' started by SSS Develop, Oct 7, 2011.

  1. SSS Develop

    SSS Develop Guest

    Hello,

    The intention of asking this question in the forum is to improve this
    piece of code (for performance and other best practices).

    I have a couple of Web Applications and need to monitor the HTTP Response
    for them. I decided to use Perl - the script should fetch the HTTP
    page from each Web App within the SLA time (Service Level). If the SLA is
    not met, then the system should send an alert. For sending alerts, it
    interacts with command-line tools like email/nagios.

    SLAs are in seconds.

    The typical file structure of the application is as follows:

    /bin/httpresp.pl
    /conf/{app1.conf, app2.conf, ..}
    /logs/{app1.log, app2.log...}


    The script uses one configuration file per web app; a configuration file
    looks like this:

    ---------------
    url = https://www.example.com
    sla = 10
    logfile = example.com.prod

    -----------

    The script looks as follows:

    -------------------------------

    #!/usr/bin/perl

    use strict;
    use warnings;
    use AppConfig;
    use Parallel::ForkManager;
    use Log::Log4perl qw(:easy);
    use HTTP::Request::Common;
    use LWP::UserAgent;
    use HTTP::Cookies;
    use Benchmark ':hireswallclock';
    use Time::HiRes qw ( time alarm sleep );
    use IPC::Run('run'); # not shown in this script; useful for interacting
                         # with nagios or running other command-line tools


    our %LOGFILE;
    our $APP_HOME = '/usr/local/application';
    our $HTTPRESP = 'httpresp';
    $LOGFILE{file} = $APP_HOME . "/" . $HTTPRESP . "/" . 'logs/generic.log';




    my $MAX_PROC = 100; # restricting to 100 processes at the moment, but can be modified any time

    $SIG{ALRM} = \&response_sla;

    my $confdir = "$APP_HOME/$HTTPRESP/conf";
    my $pm = new Parallel::ForkManager($MAX_PROC);

    my $cnfhs = get_configurations();



    foreach my $key ( keys %$cnfhs ) {

    my $conf = $cnfhs->{$key};

    my $pid = $pm->start and next;
    my $logger = logger( $conf->get('logfile') );

    crawl_sites( $conf, $logger );
    $pm->finish;

    }
    $pm->wait_all_children;


    sub get_configurations {

    my @confs = glob( $confdir . "/*.conf" );
    my $confhash = {};
    foreach my $cnf (@confs) {
    my $config = AppConfig->new();
    $config->define('name=s');
    $config->define('url=s');
    $config->define('sla=s');
    $config->define('logfile=s');
    $config->define('appname=s');
    $config->file($cnf);
    $confhash->{$cnf} = $config;
    }

    return $confhash;

    }

    sub logger {
    my $logfilename = shift;
    $LOGFILE{file} = $logfilename;
    my $conf = q(
        log4perl.logger                    = INFO, FileApp
        log4perl.appender.FileApp          = Log::Log4perl::Appender::File
        log4perl.appender.FileApp.filename = sub { getLogfilename(); }
        log4perl.appender.FileApp.layout   = PatternLayout
        log4perl.appender.FileApp.layout.ConversionPattern = %d> %m%n
    );

    # Initialize logging behaviour
    Log::Log4perl->init( \$conf );

    # Obtain a logger instance
    my $logger = get_logger();

    return $logger;

    }

    sub getLogfilename {

    return "$APP_HOME/$HTTPRESP/logs/" . $LOGFILE{file} . ".log";

    }

    sub crawl_sites {
    my ( $conf, $logger ) = @_;
    my $sla = $conf->get('sla');
    my $url = $conf->get('url');
    eval {
    alarm($sla);
    get_response( $url, $logger );
    alarm(0);
    };

    if ( $@ =~ /SLA: DID NOT MET/ ) {
        print "SLA DID NOT MET\n";
        # code to interact with nagios
    } else {
        # code to interact with nagios
    }

    }

    sub get_response {
    my ( $url, $logger ) = @_;

    $logger->info("Started monitoring: $url ");
    my $ua = new LWP::UserAgent;
    $ua->cookie_jar( HTTP::Cookies->new() );
    $ua->agent('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.18) Gecko/20110614 Firefox/3.6.18');
    $ua->ssl_opts( verify_hostname => 0 ); ## skip verifying the SSL certificate

    my $t1 = Benchmark->new;

    my $response = $ua->request( GET $url);
    if ( $response->is_success ) {
    $logger->info("SUCCESS: Got the response"); # or whatever
    }
    else {
    $logger->info( "ERROR: the response was, " . $response->status_line );

    }
    my $t2 = Benchmark->new;
    my $td = timediff( $t2, $t1 );
    print "Total Response time: " . timestr($td) . "\n";
    $logger->info( "Total Response time: " . timestr($td) );
    }


    sub response_sla {
    die "SLA: DID NOT MET";
    }

    --------------------------------

    Help me to improve this program - performance, and scaling to a large
    number of apps (say 1k or 2k web apps).


    thank you for your time !

    --sss
    SSS Develop, Oct 7, 2011
    #1

  2. SSS Develop


    Thank you so much, Ben - this was the best feedback/review I have got. I
    was not expecting such detailed help. I am so happy, and thank you so
    much!

    I have some comments below. I will correct the code and post the final
    version here later.


    On Oct 8, 4:47 am, Ben Morrow <> wrote:
    > Quoth SSS Develop <>:
    >
    > > Hello,

    >
    > > The intention to ask question in this forum is to improve this piece
    > > of code (for performance and other best practices)

    >
    > I have to say, it's a lot better than a lot of what gets posted here...
    >
    >
    >
    > > I have couple of Web Applications - need to monitor the HTTP Response
    > > for them. Decided to use the Perl - the script should fetch the HTTP
    > > page from Web App within SLA  time (Service Level). If the SLA did not
    > > met - then system should send alert. For sending alert, it interacts
    > > with command line tools like email/nagios.

    > <snip>
    >
    > > #!/usr/bin/perl

    >
    > > use strict;
    > > use warnings;
    > > use AppConfig;
    > > use Parallel::ForkManager;

    >
    > If the performance of this script is an issue, forking may not be the
    > best solution. It's pretty much always faster to use a non-blocking
    > single-process (or single-process-per-CPU) approach, using something
    > like POE. That would require restructuring the whole program, though, so
    > if P::FM works for you I should stick with it.
    >
    > > use Log::Log4perl qw(:easy);
    > > use HTTP::Request::Common;
    > > use LWP::UserAgent;
    > > use HTTP::Cookies;
    > > use Benchmark ':hireswallclock';
    > > use Time::HiRes qw ( time alarm sleep );
    > > use IPC::Run('run');   # Not shown the use of it in this script, is
    > > useful to interact with nagios or run any other command line commands

    >
    > > our %LOGFILE;
    > > our $APP_HOME = '/usr/local/application';
    > > our $HTTPRESP = 'httpresp';
    > > $LOGFILE{file} =  $APP_HOME . "/" . $HTTPRESP . "/". 'logs/
    > > generic.log';

    >
    > I see Tad's already mentioned you're using 'our' where you could be
    > using 'my'. I would add that there's no need for concatentation when you
    > can interpolate, and that you can assign directly to a hash too:
    >
    >     my $APP_HOME    = '/usr/local/application';
    >     my $HTTPRESP    = 'httpresp';
    >     my %LOGFILE     = (
    >         file    => "$APP_HOME/$HTTPRESP/logs/generic.log",
    >     );
    >
    > (Whether to use single or double quotes when nothing will be
    > interpolated is a matter of taste. I prefer double, but many people here
    > prefer single.)
    >
    > > my $MAX_PROC =
    > >   100; # restricting to 100 process at this moment, but can be
    > > modified any time

    >
    > > $SIG{ALRM} = \&response_sla;

    >
    > It's a little confusing to refer to a sub all the way down at the bottom
    > of the file, especially when it's so trivial it doesn't really need a
    > name:
    >
    >     $SIG{ALRM} = sub {
    >         die "SLA: DID NOT MET";
    >     };
    >
    > > my $confdir = "$APP_HOME/$HTTPRESP/conf";
    > > my $pm      = new Parallel::ForkManager($MAX_PROC);

    >
    > It's best to avoid dative method syntax (METHOD OBJECT ARGS or METHOD
    > CLASS ARGS), at least in part because it encourages the idea that there
    > is something special about the 'new' method.
    >
    >     my $pm = Parallel::ForkManager->new($MAX_PROC);
    >
    > (Yes, I know you just copied that from the documentation. Unfortunately
    > a lot of CPAN documentation uses older, now-discouraged idioms in the
    > examples.)
    >
    > > my $cnfhs = get_configurations();

    >
    > > foreach my $key ( keys %$cnfhs ) {

    >
    > >     my $conf = $cnfhs->{$key};

    >
    > You can write that loop as
    >
    >     while (my ($key, $conf) = each %$cnfhs) {
    >
    > and it'll even be more efficient on really large hashes. (Not enough
    > people remember about 'each'.)
    >
    >
    >
    > >     my $pid = $pm->start and next;
    > >     my $logger = logger( $conf->get('logfile') );

    >
    > >     crawl_sites( $conf, $logger );
    > >     $pm->finish;

    >
    > > }
    > > $pm->wait_all_children;

    >
    > > sub get_configurations {

    >
    > >     my @confs    = glob( $confdir . "/*.conf" );

    >
    > Passing a value into a sub in a file-scoped global is always worth
    > avoiding if you can. In this case it's easy: pass it in as a real
    > parameter instead.
    >
    > It may seem silly calling get_configurations($confdir) when you know
    > perfectly well that get_configurations can see that variable, but it
    > rapidly becomes less silly when you come back six months later and take
    > it out without remembering some sub half way down the file needed it.
    >
    > <snip>
    >
    > > sub logger {
    > >     my $logfilename = shift;
    > >     $LOGFILE{file} = $logfilename;
    > >     my $conf = q(
    > >         log4perl.logger                    = INFO, FileApp
    > >         log4perl.appender.FileApp          =
    > > Log::Log4perl::Appender::File
    > >         log4perl.appender.FileApp.filename = sub {getLogfilename();}

    >
    > <...later...>
    >
    > > sub getLogfilename {

    >
    > >     return "$APP_HOME/$HTTPRESP/logs/" . $LOGFILE{file} . ".log";

    >
    > > }

    >
    > It's not really clear to me what you're trying to do here with
    > $LOGFILE{file}, but it can almost certainly be done less confusingly.
    > Since you don't appear to use it anywhere else, is there any reason not
    > to just put the correct filename straight in the log4perl config?
    >


    [sssdevelop] I am creating a log file for each forked process (one log
    file per web app). The logger method accepts the logfile name as an
    argument; that's used to change the "file" value of %LOGFILE, which is
    then used inside getLogfilename().
    I really wanted to interpolate the logfile name in the appender. Now I
    have the clue: I will use "qq" instead of a single "q" :)



    >     my $logpath = "$APP_HOME/$HTTPRESP/logs/$logfilename.log";
    >     my $conf    = qq(
    >         log4perl.logger             = INFO, FileApp
    >         log4perl.appender.FileApp   = Log::Log4perl::Appender::File
    >         log4perl.appender.FileApp.filename = $logpath
    >         ...
    >     );
    >
    > (Notice I've changed 'q()' to 'qq()'. I said I preferred double quotes :) ...)
    >
    > I'd also want to tidy up this "$APP_HOME/$HTTPRESP" prefix that keeps
    > turning up. Probably I'd just chdir to that directory and use relative
    > paths.
    >
    > >         log4perl.appender.FileApp.layout   = PatternLayout
    > >         log4perl.appender.FileApp.layout.ConversionPattern = %d> %m%n
    > >     );

    >
    > >     # Initialize logging behaviour
    > >     Log::Log4perl->init( \$conf );

    >
    > >     # Obtain a logger instance
    > >     my $logger = get_logger();

    >
    > I see no 'sub get_logger' here...
    >
    > ...oh, I see, it's from Log4perl. I don't think you should be mixing the
    > normal and the 'easy' interfaces like that. Since you've got as far as
    > setting up a custom logger config, use the real method so people reading
    > the code have some idea what's going on:
    >
    >     return Log::Log4perl->get_logger();
    >
    > >     return $logger;

    >
    > > }

    >




    [sssdevelop] - yup, I agree with this. I will change it!


    > > sub crawl_sites {
    > >     my ( $conf, $logger ) = @_;
    > >     my $sla = $conf->get('sla');
    > >     my $url = $conf->get('url');
    > >     eval {
    > >         alarm($sla);

    >
    > Once we get as far as this 'alarm' statement, we've entirely forgotten
    > about the $SIG{ALRM} assignment at the top of the file that said what
    > the alarm will actually *do*. It's also better, where possible, to write
    > subs so they don't rely on or change global values. Generally this means
    > that if you need to use one of the magic Perl globals, you need to
    > 'local' it.
    >
    >     eval {
    >         local $SIG{ALRM} = sub { die "SLA NOT MET" };
    >         alarm($sla);
    >
    > >         get_response( $url, $logger );
    > >         alarm(0);

    >


    [sssdevelop] Now I see the importance of this - thanks for the
    advice.

    > You have a race condition here. It's entirely possible the alarm will
    > fire after get_response has received a successful response but before it
    > has returned and the alarm has been deactivated. In this case, you may
    > not care (if it was that close to the deadline you may want to count it
    > as 'late' anyway), but it needs considering. If you have, in fact,
    > considered it, you should put in a comment so you remember not to
    > consider it again :).
    >
    > >     };

    >
    > You need to cancel the alarm here, as well. get_response may die for
    > some reason other than an alarm, and if it does the alarm will still go
    > off, but nothing will catch the exception. (There is another race here,
    > of course, but it's rather hard to avoid. Alarms are tricky.)
    >



    [sssdevelop] Ah.. I did not put much thought into the race happening here.
    I need to work on this. That was good advice :)
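    Putting Ben's two points together (a localized handler, plus cancelling
    the alarm outside the eval as well), a minimal sketch could look like
    this; the $do_request code ref is a hypothetical stand-in for the real
    get_response call:

```perl
use strict;
use warnings;

# Sketch of the timeout pattern discussed above. The handler is
# localized so it does not leak into other code, and alarm(0) also
# runs outside the eval, so an unrelated exception inside $do_request
# cannot leave the alarm armed with nothing to catch it.
sub fetch_with_deadline {
    my ( $sla, $do_request ) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "SLA: DID NOT MET\n" };
        alarm($sla);
        $do_request->();
        alarm(0);    # normal path: deactivate the alarm
        1;
    };
    alarm(0);        # also cancel here, in case $do_request died early
    return 'ok'   if $ok;
    return 'late' if $@ =~ /SLA: DID NOT MET/;
    return "error: $@";
}
```

    The second alarm(0) is the fix Ben describes: without it, a non-alarm
    exception inside the eval would leave a pending alarm that fires later
    with no handler in place to catch it.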


    > <snip>
    >
    > > sub get_response {
    > >     my ( $url, $logger ) = @_;

    >
    > >     $logger->info("Started monitoring: $url ");
    > >     my $ua = new LWP::UserAgent;
    > >     $ua->cookie_jar( HTTP::Cookies->new() );
    > >     $ua->agent(
    > > 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.18) Gecko/
    > > 20110614 Firefox/3.6.18'
    > >     );
    > >     $ua->ssl_opts( verify_hostname => 0 );    ##Skip verifying SSL
    > > Certificate

    >
    > You don't need to re-do all this every time. Create a LWP::UA object as
    > part of your script initialization, then pass it in to get_response.
    >


    [sssdevelop] yup, I will use a single LWP::UserAgent object! This is helpful.
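    A minimal sketch of that change, reusing the same options as the
    original script (the agent string and timeout value here are just
    placeholders):

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build the UA once at script startup; each forked worker then reuses
# it instead of constructing a fresh one inside get_response per call.
my $ua = LWP::UserAgent->new(
    agent      => 'Mozilla/5.0 (monitoring script)',    # placeholder string
    cookie_jar => HTTP::Cookies->new(),
    ssl_opts   => { verify_hostname => 0 },             # skip SSL cert checks
    timeout    => 30,                                   # illustrative hard cap
);

# get_response would then take the UA as a parameter:
# get_response( $ua, $url, $logger );
```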

    > >     my $t1 = Benchmark->new;

    >
    > >     my $response = $ua->request( GET $url);
    > >     if ( $response->is_success ) {
    > >         $logger->info("SUCCESS: Got the response");    # orwhatever
    > >     }
    > >     else {
    > >         $logger->info( "ERROR: the response was, " . $response-
    > > >status_line );

    > >     }
    > >     my $t2 = Benchmark->new;
    > >     my $td = timediff( $t2, $t1 );
    > >     print "Total Response time: " . timestr($td) . "\n";
    > >     $logger->info( "Total Response time: " . timestr($td) );

    >
    > Since you're timing the response anyway, it's probably better to rely on
    > that measurement, rather than the alarm, to see whether it's gone over
    > its SLA. You might still want to set an alarm for something like twice
    > the SLA, just in case the request never returns for some reason (though
    > LWP will timeout for you eventually).
    >
    > You might also want to look at LWPx::TimedHTTP, which gives detailed
    > timings of the various stages of the request cycle.
    >


    [sssdevelop] - thanks, I will surely check out LWPx::TimedHTTP.
    Ben, I appreciate your help, and thank you so much for
    your time.
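    A sketch of the measure-instead-of-alarm idea: the SLA verdict comes
    from the measured wall-clock time, and the alarm is kept only as a
    backstop at twice the SLA. The $do_request code ref is again a
    hypothetical stand-in for the HTTP fetch:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

sub check_sla {
    my ( $sla, $do_request ) = @_;
    my $start = time();
    eval {
        local $SIG{ALRM} = sub { die "hard timeout\n" };
        alarm( 2 * $sla );    # backstop only, not the SLA check itself
        $do_request->();
        alarm(0);
    };
    alarm(0);
    my $elapsed = time() - $start;
    # The SLA verdict comes from the measurement, not from the alarm:
    return {
        elapsed => $elapsed,
        ok      => ( !$@ && $elapsed <= $sla ) ? 1 : 0,
    };
}
```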


    ----sssdevelop


    > Ben
    SSS Develop, Oct 8, 2011
    #2

  3. Ben Morrow <> writes:

    [...]

    > I see Tad's already mentioned you're using 'our' where you could be
    > using 'my'. I would add that there's no need for concatentation when you
    > can interpolate, and that you can assign directly to a hash too:
    >
    > my $APP_HOME = '/usr/local/application';
    > my $HTTPRESP = 'httpresp';
    > my %LOGFILE = (
    > file => "$APP_HOME/$HTTPRESP/logs/generic.log",
    > );
    >
    > (Whether to use single or double quotes when nothing will be
    > interpolated is a matter of taste.


    Using double quotes means the compiler has to analyze the string in
    order to determine if and how something needs to be interpolated into
    it. Otherwise, this effort can be avoided, as can the effort the next
    person who looks at the code needs to come to the same conclusion.

    [...]

    >> my $confdir = "$APP_HOME/$HTTPRESP/conf";
    >> my $pm = new Parallel::ForkManager($MAX_PROC);

    >
    > It's best to avoid dative method syntax (METHOD OBJECT ARGS or METHOD
    > CLASS ARGS), at least in part because it encourages the idea that there
    > is something special about the 'new' method.
    >
    > my $pm = Parallel::ForkManager->new($MAX_PROC);
    >
    > (Yes, I know you just copied that from the documentation. Unfortunately
    > a lot of CPAN documentation uses older, now-discouraged idioms in the
    > examples.)


    In this particular case, the 'indirect object' syntax is actually even
    'discouraged' by the corresponding Perl documentation. More generally,
    however, the sentence above parses as "a lot of people who wrote
    useful code disagreed with some (or all) opinions
    $random_people_on_usenet had regarding what they should have done".

    >> my $cnfhs = get_configurations();
    >>
    >> foreach my $key ( keys %$cnfhs ) {
    >>
    >> my $conf = $cnfhs->{$key};

    >
    > You can write that loop as
    >
    > while (my ($key, $conf) = each %$cnfhs) {
    >
    > and it'll even be more efficient on really large hashes. (Not enough
    > people remember about 'each'.)


    Except in the case of insanely large hashes (> 500,000 keys, according to
    some tests I did a while back), getting the list of keys and
    traversing the hash 'open coded' will be faster than calling a Perl
    subroutine per key. OTOH, the latter should need less memory.
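    Rainer's numbers are easy to re-check on your own perl with the core
    Benchmark module; the hash size here is purely illustrative:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Compare iterating with keys() against each() on a moderately
# large hash; run each variant for at least 2 CPU seconds.
my %h = map { $_ => $_ } 1 .. 100_000;

cmpthese( -2, {
    keys_loop => sub {
        my $sum = 0;
        $sum += $h{$_} for keys %h;
    },
    each_loop => sub {
        my $sum = 0;
        while ( my ( $k, $v ) = each %h ) { $sum += $v }
    },
} );
```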

    [...]


    >> sub get_configurations {
    >>
    >> my @confs = glob( $confdir . "/*.conf" );

    >
    > Passing a value into a sub in a file-scoped global is always worth
    > avoiding if you can. In this case it's easy: pass it in as a real
    > parameter instead.


    Doing something because it can be done is just about the most awful
    reason for doing it which can be imagined, especially in code where
    more or less everything can be done.

    > It may seem silly calling get_configurations($confdir) when you know
    > perfectly well that get_configurations can see that variable, but it
    > rapidly becomes less silly when you come back six months later and take
    > it out without remembering some sub half way down the file needed
    > it.


    It is silly to remove something without checking that it isn't needed
    and even more silly to remove something and not even run the changed
    file through the compiler before assuming that the change was ok. And
    with 'strict', such a change won't go through the compiler.

    >
    > <snip>
    >> sub logger {
    >> my $logfilename = shift;
    >> $LOGFILE{file} = $logfilename;
    >> my $conf = q(
    >> log4perl.logger = INFO, FileApp
    >> log4perl.appender.FileApp =
    >> Log::Log4perl::Appender::File
    >> log4perl.appender.FileApp.filename = sub {getLogfilename();}

    >
    > <...later...>
    >> sub getLogfilename {
    >>
    >> return "$APP_HOME/$HTTPRESP/logs/" . $LOGFILE{file} . ".log";
    >>
    >> }

    >
    > It's not really clear to me what you're trying to do here with
    > $LOGFILE{file}, but it can almost certainly be done less confusingly.
    > Since you don't appear to use it anywhere else, is there any reason not
    > to just put the correct filename straight in the log4perl config?


    The subroutine above is a specific algorithm with a generic name, and
    the usual reason for functional decomposition (didn't somebody claim
    that was actually used in practice not that long ago?) is that it
    simplifies the code at each level of abstraction by moving details
    which are not relevant to the present algorithm elsewhere.
    Rainer Weikusat, Oct 9, 2011
    #3
  4. Rainer Weikusat <> writes:

    > Ben Morrow <> writes:
    >
    >
    >>> my $confdir = "$APP_HOME/$HTTPRESP/conf";
    >>> my $pm = new Parallel::ForkManager($MAX_PROC);

    >>
    >> It's best to avoid dative method syntax (METHOD OBJECT ARGS or METHOD
    >> CLASS ARGS), at least in part because it encourages the idea that there
    >> is something special about the 'new' method.
    >>
    >> my $pm = Parallel::ForkManager->new($MAX_PROC);
    >>
    >> (Yes, I know you just copied that from the documentation. Unfortunately
    >> a lot of CPAN documentation uses older, now-discouraged idioms in the
    >> examples.)

    >
    > In this particular case, the 'indirect object' syntax is actually even
    > 'discouraged' by the corresponding Perl documentation. More generally,
    > however, the sentence above parses as "a lot of people who wrote
    > useful code disagreed with some (or all) opinions
    > $random_people_on_usenet had regarding what they should have done".
    >

    There are actually good, technical, reasons why the indirect syntax is
    being discouraged.

    But do not let that stop you from ranting at what you perceive to be
    irrational authoritarianism. The rest of us will enjoy our code without
    hard to find bugs.

    Mart

    --
    "We will need a longer wall when the revolution comes."
    --- AJS, quoting an uncertain source.
    Mart van de Wege, Oct 9, 2011
    #4
  5. Rainer Weikusat <> wrote:
    >Ben Morrow <> writes:
    >> Passing a value into a sub in a file-scoped global is always worth
    >> avoiding if you can. In this case it's easy: pass it in as a real
    >> parameter instead.

    >
    >Doing something because it can be done is just about the most awful
    >reason for doing it which can be imagined, especially in code where
    >more or less everything can be done.


    Are you arguing in favour of or against global variables?

    jue
    Jürgen Exner, Oct 9, 2011
    #5
  6. Ben Morrow <> wrote:
    >In message <31836.1270260026@chthon>, <> wrote:
    >|
    >| Also, I see you keep using single quotes for strings that don't need them.
    >| Why? Those aren't really a generic string, you know. They loudly scream
    >| "DO NOT INTERPOLATE ANYTHING HERE!!!", so I always scrutinize them extra
    >| hard to try to figure out what it is that you're trying to suppress. Plus
    >| I hate having to deal with backslash escapes that aren't. Perl strings
    >| normally interpolate, and it takes something special to suppress that:
    >|
    >| $string = $normal ; # interpolate values normally
    >| $string = "$normal" ; # interpolate values normally


    By some definition of "normally". This assignment actually stringifies
    $normal, which most often is not the desired action when dealing with,
    e.g., numbers. Therefore you could argue whether this is the "normal" or
    "naively expected" semantic.

    >| $string = `$normal` ; # interpolate values normally
    >| $string = '$abnormal' ; # DON'T INTERPOLATE VALUES!!!


    I like to think of '...' as the literal operator or the literal quotes.
    They don't munge their text.

    >| $string = \$abnormal ; # DON'T INTERPOLATE VALUES!!!
    >| $string = "\$abnormal" ; # DON'T INTERPOLATE VALUES!!!


    Actually, this example does interpolate: it interpolates \$ as the
    literal dollar sign.
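    The point about \$ can be checked directly (the variable names here are
    just for illustration):

```perl
use strict;
use warnings;

my $abnormal = 'value';

my $interp  = "$abnormal";     # interpolates: 'value'
my $literal = '$abnormal';     # single quotes: literal '$abnormal'
my $escaped = "\$abnormal";    # double quotes, but \$ yields a literal '$'
```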

    etc, etc.

    jue
    Jürgen Exner, Oct 10, 2011
    #6
  7. SSS Develop


    A very healthy discussion! Thanks, everyone, for participating!
    (I will post the final version soon :))
    thank you,

    sssdevelop

    On Oct 10, 5:09 am, Ben Morrow <> wrote:
    > Quoth Rainer Weikusat <>:
    >
    > > Ben Morrow <> writes:

    >
    > > > It's best to avoid dative method syntax (METHOD OBJECT ARGS or METHOD
    > > > CLASS ARGS), at least in part because it encourages the idea that there
    > > > is something special about the 'new' method.

    >
    > > >     my $pm = Parallel::ForkManager->new($MAX_PROC);

    >
    > > > (Yes, I know you just copied that from the documentation. Unfortunately
    > > > a lot of CPAN documentation uses older, now-discouraged idioms in the
    > > > examples.)

    >
    > > In this particular case, the 'indirect object' syntax is actually even
    > > 'discouraged' by the corresponding Perl documentation. More generally,
    > > however, the sentence above parses as "a lot of people who wrote
    > > useful code disagreed with some (or all) opinions
    > > $random_people_on_usenet had regarding what they should have done".

    >
    > I know. I was aware when I wrote that paragraph that I was weaselling
    > out of properly explaining what's wrong with the dative syntax, and the
    > reason I was doing that was that I can never remember an example of
    > where it goes wrong :). I can give you *lots* of examples where Perl
    > assumes I'm making a dative method call when I didn't want it to, but
    > that's a reason for not putting it in the language at all, rather than
    > one for not using it if its there.
    >
    > I believe the canonical example is
    >
    >     my $rv = method $obj;
    >
    > with a 'sub method' in the current package. While that is a problem
    > (since it will parse differently depending on the presence of the sub)
    > it is easily disambiguated:
    >
    >     my $rv = method $obj ();
    >
    > So, can anyone remind me of when this actually goes wrong?
    >
    > > >> my $cnfhs = get_configurations();

    >
    > > >> foreach my $key ( keys %$cnfhs ) {

    >
    > > >>     my $conf = $cnfhs->{$key};

    >
    > > > You can write that loop as

    >
    > > >     while (my ($key, $conf) = each %$cnfhs) {

    >
    > > > and it'll even be more efficient on really large hashes. (Not enough
    > > > people remember about 'each'.)

    >
    > > Except in case of insanely large hashes (> 500,000 keys, according to
    > > some tests I did a while back), getting the list of keys and
    > > traversing the hash 'open coded' will be faster than calling Perl
    > > subroutine per key. OTOH, the latter should need less memory.

    >
    > Interesting. The efficiency wasn't the point, of course. The point was
    > to get key and value in one go.
    >
    > When you say 'calling Perl subroutine per key' do you mean just the call
    > to 'each'? 'each' isn't a subroutine, of course, but the while loop does
    > have to run some Perl every iteration rather than just popping the
    > stack.
    >
    > > > <snip>
    > > >> sub logger {
    > > >>     my $logfilename = shift;
    > > >>     $LOGFILE{file} = $logfilename;
    > > >>     my $conf = q(
    > > >>         log4perl.logger                    = INFO, FileApp
    > > >>         log4perl.appender.FileApp          =
    > > >> Log::Log4perl::Appender::File
    > > >>         log4perl.appender.FileApp.filename = sub {getLogfilename();}

    >
    > > > <...later...>
    > > >> sub getLogfilename {

    >
    > > >>     return "$APP_HOME/$HTTPRESP/logs/" . $LOGFILE{file} . ".log";

    >
    > > >> }

    >
    > > > It's not really clear to me what you're trying to do here with
    > > > $LOGFILE{file}, but it can almost certainly be done less confusingly.
    > > > Since you don't appear to use it anywhere else, is there any reason not
    > > > to just put the correct filename straight in the log4perl config?

    >
    > > The subroutine above is a specific algorithm with a generic name and
    > > the usual reason for functional decomposition (didn't someboday claim
    > > that was acutally used in practice not that long ago) is that it
    > > simplifies the code at each level abstraction by moving details
    > > which are not relevant for the present algorithm elsewhere.

    >
    > For there to be any merit to that argument, getLogFilename would have to
    > be independant of logger. Since logger sets a global read by
    > getLogFilename, this obviously isn't the case. If you were to rewrite
    > the code so it looked like
    >
    >     sub logger {
    >         my $logfilename = shift;
    >         my $logpath     = getLogFilename($logfilename);
    >         my $conf        = qq(
    >             ...filename = $logpath
    >
    > I would consider that a sensible arrangement. You can argue about
    > whether or not the mapping from basename to full path should be a
    > separate function: in this case, since it's neither particularly
    > complicated nor something that can be changed at runtime ISTM better to
    > put the expression inline, but it makes little difference.
    >
    > What I was complaining about was the extremely convoluted data path that
    > went like this:
    >
    >     - stuff a value into (the sole element of) a global;
    >     - create a string containing the Perl code for a sub call;
    >     - pass that string to a method, which will
    >         - parse out the sub call;
    >         - call the sub, which will
    >             - pull the value out of the global;
    >             - interpolate it into a path;
    >             - return that path;
    >         - use the returned value as a log filename.
    >
    > I was recommending something more like
    >
    >     - interpolate the basename into a path;
    >     - interpolate that path into a config string;
    >     - pass that string to a method, which will
    >         - use it as a log filename.
    >
    > Whether the first step is a sub call or a direct expression is
    > irrelevant.
    >
    > Ben
    SSS Develop, Oct 10, 2011
    #7
  8. with <> Ben Morrow wrote:
    >
    > Quoth Tad McClellan <>:
    >> Rainer Weikusat <> wrote:
    >> > Ben Morrow <> writes:

    *SKIP*
    >>> Using double-quotes means the compiler has to analyze the string order
    >>> to determine if and how something needs to be interpolated into
    >>> it. Otherwise, this effort can be avoided and also the effort the next
    >>> person who looks at the code needs to come to the same result.

    *SKIP*
    >> I, errr, agree with Rainer here, more for his second reason
    >> that for his first.
    >>
    >> I am always looking to optimize for maintenance.
    >>
    >> Using double quotes when nothing is intended to be interpolated
    >> is in _bad_ taste. :)

    >
    > Well, at least I'm not alone:


    So the double-quote war has begun? Let me introduce some fuel to that
    so-far ceased fire:
    #!/usr/bin/perl

    use strict;
    use warnings;
    use Benchmark qw{ cmpthese timethese };

    my $aa;
    my $ab = 'abc';

    cmpthese timethese -5, {
    code00 => sub { $aa = eval q|"$ab"| },
    code01 => sub { $aa = eval q|'$ab'| },
    code02 => sub { $aa = eval q|"\\$ab"| },
    code03 => sub { $aa = eval q|'def'| },
    };

    __END__
    Benchmark: running code00, code01, code02, code03 for at least 5 CPU seconds...
        code00:  6 wallclock secs ( 5.00 usr +  0.02 sys =  5.02 CPU) @ 64101.99/s (n=321792)
        code01:  6 wallclock secs ( 5.18 usr +  0.00 sys =  5.18 CPU) @ 96095.95/s (n=497777)
        code02:  6 wallclock secs ( 5.24 usr +  0.00 sys =  5.24 CPU) @ 69054.58/s (n=361846)
        code03:  5 wallclock secs ( 5.34 usr +  0.00 sys =  5.34 CPU) @ 94380.15/s (n=503990)
                  Rate  code00  code02  code03  code01
        code00 64102/s      --     -7%    -32%    -33%
        code02 69055/s      8%      --    -27%    -28%
        code03 94380/s     47%     37%      --     -2%
        code01 96096/s     50%     39%      2%      --

    Hopefully, I did leaning sticks correctly in 'code02'.
    *CUT*

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
    Eric Pozharski, Oct 10, 2011
    #8
  9. Jürgen Exner <> writes:
    > Rainer Weikusat <> wrote:
    >>Ben Morrow <> writes:
    >>> Passing a value into a sub in a file-scoped global is always worth
    >>> avoiding if you can. In this case it's easy: pass it in as a real
    >>> parameter instead.

    >>
    >>Doing something because it can be done is just about the most awful
    >>reason for doing it which can be imagined, especially in code where
    >>more or less everything can be done.

    >
    > Are you arguing in favour of or against global variables?


    I'm arguing against 'do this whenever you can' design recipes.

    As to 'global variables': a (non-OO) module composed of some
    subroutines and some shared state variables these subroutines work
    with is structurally similar, even identical (isomorphic), to an
    object with some instance variables and some methods working with
    them. Programs can often be decomposed into relatively independent
    functional units providing services to each other by invoking
    interface routines. These independent functional units will usually
    be stateful, meaning they will have a set of internal variables
    holding 'state information'.

    I see no ground for an argument 'for' or 'against' something here,
    except maybe at the completely different level of 'should computer
    programs work like other complex real-world objects', IOW, have
    state which changes over time, or should fat runtimes be used to
    provide the illusion that this isn't so.
    Rainer Weikusat, Oct 10, 2011
    #9
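[Editor's note: Rainer's module-vs-object isomorphism above can be sketched in a few lines of Perl. The `Counter` name and its `bump`/`total` interface are illustrative assumptions for this sketch, not anything posted in the thread.]

```perl
#!/usr/bin/perl
# Sketch: a non-OO "functional unit" (subs sharing lexical state) next
# to the structurally identical object (instance variable + methods).
use strict;
use warnings;

# Functional unit: shared state variable + interface routines.
{
    my $count = 0;                      # internal 'state information'
    sub bump  { $count += shift // 1 }  # interface routine
    sub total { $count }                # interface routine
}

# The isomorphic object: same state, same operations, as methods.
package Counter;
sub new   { bless { count => 0 }, shift }
sub bump  { my ($self, $n) = @_; $self->{count} += $n // 1 }
sub total { $_[0]{count} }

package main;
bump(2); bump();
my $obj = Counter->new;
$obj->bump(2); $obj->bump;
print total(), " ", $obj->total, "\n";   # both report 3
```

Either form keeps the state private to the unit; only the calling convention differs.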
  10. SSS Develop Guest

    Tad McClellan <> wrote:
    > Rainer Weikusat <> wrote:
    > > Ben Morrow <> writes:
    > >
    > > [...]
    > >
    > >> I see Tad's already mentioned you're using 'our' where you could be
    > >> using 'my'. I would add that there's no need for concatentation when
    > >> you can interpolate, and that you can assign directly to a hash too:
    > >>
    > >> my $APP_HOME = '/usr/local/application';
    > >> my $HTTPRESP = 'httpresp';
    > >> my %LOGFILE = (
    > >> file => "$APP_HOME/$HTTPRESP/logs/generic.log",
    > >> );
    > >>
    > >> (Whether to use single or double quotes when nothing will be
    > >> interpolated is a matter of taste.

    > >
    > > Using double-quotes means the compiler has to analyze the string in
    > > order to determine if and how something needs to be interpolated into
    > > it.


    Sure, but who cares?

    > > Otherwise, this effort can be avoided and also the effort the next
    > > person who looks at the code needs to come to the same result.


    Why would the person need to do that? I find it more common that I need to
    interpolate where I (or someone) didn't before, and now I have to inspect
    the whole string to find out what it was that motivated the person to make
    it a single quote rather than double in the first place, only to find out
    there was no reason for it.

    >
    > The Moon is blue...
    >
    > I, errr, agree with Rainer here, more for his second reason
    > than for his first.
    >
    > I am always looking to optimize for maintenance.


    Using single quotes because the content does not *currently* require
    interpolation is not optimizing for maintenance. Optimizing for
    maintenance would require you to predict how the usage is likely to change
    in the future, and using whatever style of quotes is less likely to need to
    change under that theory of the future.


    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
    , Oct 11, 2011
    #10
  11. writes:
    > Tad McClellan <> wrote:
    >> Rainer Weikusat <> wrote:
    >> > Ben Morrow <> writes:
    >> >
    >> > [...]
    >> >
    >> >> I see Tad's already mentioned you're using 'our' where you could be
    >> >> using 'my'. I would add that there's no need for concatentation when
    >> >> you can interpolate, and that you can assign directly to a hash too:
    >> >>
    >> >> my $APP_HOME = '/usr/local/application';
    >> >> my $HTTPRESP = 'httpresp';
    >> >> my %LOGFILE = (
    >> >> file => "$APP_HOME/$HTTPRESP/logs/generic.log",
    >> >> );
    >> >>
    >> >> (Whether to use single or double quotes when nothing will be
    >> >> interpolated is a matter of taste.
    >> >
    >> > Using double-quotes means the compiler has to analyze the string in
    >> > order to determine if and how something needs to be interpolated into
    >> > it.

    >
    > Sure, but who cares?


    I assume that you got this wrong by accident. It should have been
    "Sure, but I don't care". Why precisely you consider this to be
    noteworthy enough to post it escapes me, however.

    >> > Otherwise, this effort can be avoided and also the effort the next
    >> > person who looks at the code needs to come to the same result.

    >
    > Why would the person need to do that?


    In order to determine what will become of the string.

    > I find it more common that I need to interpolate where I (or
    > someone) didn't before, and now I have to inspect
    > the whole string to find out what it was that motivated the person to make
    > it a single quote rather than double in the first place, only to find out
    > there was no reason for it.


    But there was a reason for it, this reason being given above. And if
    you need to change the string, you'll have to inspect it anyway, in

    [...]

    >> I, errr, agree with Rainer here, more for his second reason
    >> than for his first.
    >>
    >> I am always looking to optimize for maintenance.

    >
    > Using single quotes because the content does not *currently* require
    > interpolation is not optimizing for maintenance. Optimizing for
    > maintenance would require you to predict how the usage is likely to change
    > in the future, and using whatever style of quotes is less likely to need to
    > change under that theory of the future.


    If 'optimizing for maintenance' requires 'predicting the future' then
    it can't be done because the future can't be predicted.
    Rainer Weikusat, Oct 11, 2011
    #11
  12. Ben Morrow <> writes:
    > Quoth Rainer Weikusat <>:
    >> writes:
    >> > Tad McClellan <> wrote:
    >> >> Rainer Weikusat <> wrote:
    >> >> >
    >> >> > Using double-quotes means the compiler has to analyze the string in
    >> >> > order to determine if and how something needs to be interpolated into
    >> >> > it.
    >> >
    >> > Sure, but who cares?

    >>
    >> I assume that you got this wrong by accident. It should have been
    >> "Sure, but I don't care". Why precisely you consider this to be
    >> noteworthy enough to post it escapes me, however.

    >
    > No, it should've been 'No, it doesn't'.


    That's not possible, given that the compiler needs to determine what
    needs to be interpolated into which locations of the string.

    > When the compiler is scanning a
    > q// string, it runs along looking for a \ or a final delimiter. When
    > it's scanning a qq// string, it does exactly the same thing, looking for
    > $ or @ as well. In particular, it does *not* pull out the whole string
    > up to the closing delimiter and then rescan it looking for
    > interpolations:


    Nobody except you wrote that. And - of course - that's exactly what
    perl is doing, judging from a rather cursory look at toke.c: first,
    a 'string token' is created by calling scan_str, then, the string
    created in this way is scanned for 'characters requiring special
    processing' and if one is found, the 'op' which is being created is
    changed from OP_CONST to OP_STRINGIFY, cf

    case '"':
        s = scan_str(s,!!PL_madskills,FALSE);
        DEBUG_T( { printbuf("### Saw string before %s\n", s); } );
        if (PL_expect == XOPERATOR) {
            if (PL_lex_formbrack && PL_lex_brackets == PL_lex_formbrack) {
                PL_expect = XTERM;
                deprecate_old(commaless_variable_list);
                return REPORT(','); /* grandfather non-comma-format format */
            }
            else
                no_op("String",s);
        }
        if (!s)
            missingterm(NULL);
        pl_yylval.ival = OP_CONST;
        /* FIXME. I think that this can be const if char *d is replaced by
           more localised variables. */
        for (d = SvPV(PL_lex_stuff, len); len; len--, d++) {
            if (*d == '$' || *d == '@' || *d == '\\' || !UTF8_IS_INVARIANT((U8)*d)) {
                pl_yylval.ival = OP_STRINGIFY;
                break;
            }
        }
        TERM(sublex_start());

    [Perl_yylex, toke.c]

    [...]

    > In any case, the difference is almost certainly unmeasurable, so it
    > doesn't matter.


    That should probably also have been "I don't care about that" instead
    of "it doesn't matter". Also, the difference is measurable using
    sufficiently long strings. Eg, on the computer where I tested this
    (3.2GHz Intel i3), creating a subroutine returning a string of 8192 As
    with that string being "-quoted is about 1.4 times slower than when
    the string is '-quoted. Granted, that's a contrived example and the
    absolute numbers are really low (4.75E-5s vs 3.37E-5s for a single
    invocation of the function, IOW, the per-character cost is in the
    nanosecond range).

    >> >> > Otherwise, this effort can be avoided and also the effort the next
    >> >> > person who looks at the code needs to come to the same result.
    >> >
    >> > Why would the person need to do that?

    >>
    >> In order to determine what will become of the string.

    >
    > Under normal circumstances, if I see something looking like $foo it
    > represents a variable. The only exceptions are inside a single-quoted
    > string, and \-escaped inside a double-quoted string. \-escaping is so
    > universal I don't have to think about it any more, so the case which
    > trips me up is a (non-\-escaped) $foo inside a single-quoted string.


    The common way to read an (English) text is to start at the top and
    read each line from left to right until the bottom has been reached;
    using shortcuts based on assumptions about what the content of the
    text will be is bound to lead to misunderstandings.

    >> But there was a reason for it, this reason being given above. And if
    >> you need to change the string, you'll have to inspect it anyway, in

    > [snip]
    >>
    >> If 'optimizing for maintenance' requires 'predicting the future' then
    >> it can't be done because the future can't be predicted.

    >
    > You really don't understand about maintainable programming, do you? Yes,
    > it requires predicting the future.


    If it requires predicting the future, then it can't be done. This has
    nothing to do with conjectures about my understanding of anything but
    is a fact: the future can't be predicted.
    Rainer Weikusat, Oct 11, 2011
    #12
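[Editor's note: the measurement Rainer describes in #12 (compiling a sub that returns an 8192-character constant, "-quoted vs '-quoted) can be sketched roughly as below. Using string eval to force recompilation on each iteration is an assumption about his setup, not his actual harness.]

```perl
#!/usr/bin/perl
# Rough sketch: compare the compile-time cost of a long "-quoted string
# against the same string '-quoted. eval STRING forces perl to re-scan
# and recompile the sub body on every iteration; the 8192-char payload
# matches the length mentioned in the post above.
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $payload = 'A' x 8192;

cmpthese(-3, {
    # The interpolated $payload becomes a literal inside the eval'd text,
    # so the compiler scans 8192 characters in each quoting style.
    double => sub { eval qq{sub { "$payload" }} },
    single => sub { eval qq{sub { '$payload' }} },
});
```

With a payload this size the '-quoted variant should come out ahead, though by a per-character cost in the nanosecond range, as noted above.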
