help analyzing cause of return code

Discussion in 'Perl Misc' started by axeman, Feb 22, 2006.

  1. axeman

    axeman Guest

    Synopsis:

    A variant of a typical host availability / pinger script has performed
    well for many years. Multiple daemons process various lists at various
    intervals with various timeouts. The tool was recently modified to
    support attempting sequences of tests (i.e. ping and TCP port test, ...
    vs. just one test). The daemons will run fine for days, but then some
    will suddenly receive non-zero return codes for every command/test they
    perform. Specifically, the return code is 16777215 (i.e. $? is -1 before
    the >> 8 shift).
    Searches have suggested problems with CHLD signals, though they have
    never been a problem before. Appreciate any insight.

    Code versions:

    AIX 4.3.3
    Perl 5.005_03

    Basic daemon model:

    ....

    sub timed_out { # ALRM signal handler for command time-out
    die "timed out";
    }

    ....

    $SIG{'HUP'} = 'IGNORE'; # don't die on these signals
    $SIG{'PIPE'} = 'IGNORE';
    $SIG{'TERM'} = 'IGNORE';
    $SIG{'ALRM'} = \&timed_out;
    $SIG{'USR1'} = \&quiesce;
    use POSIX ":sys_wait_h";

    ....

    foreach $test ( split(/;/,$TESTS) ) {

    # std wrapper for timed operation, return code in $rc,
    # output in @out
    ($rc,@out) = eval {
    alarm($timeout);
    $test =~ s/HOST/$check/g;
    $test[$testCount] = $test;
    @eout = `$test 2>&1`;
    $erc = ($? >> 8);
    alarm(0);
    return ($erc,@eout);
    };
    if( $@ =~ /^timed out/ ) {
    $rc = 1;
    $timeouts++;
    $test_timeout[$testCount] = 1;
    }
    $test_rc[$testCount] = $rc;
    $test_console[$testCount] = join('',@out);
    $testCount++;
    $spawned++;

    last if( $rc == 0 ); # successful test
    }

    ....

    # clean up any hung children for every 10 or more spawned
    # processes

    if( $spawned > 10 ) {
    reap; # NOTE - also new code - this recursively traverses
    # the process tree
    # and kill KILL's any children
    $spawned = 0;
    }

    # clean up zombies - not done w/signal handler due to unreliable
    # signals

    while( ($waitedPid = waitpid(-1, &WNOHANG)) > 0 ) {}

    ....
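For reference, the 16777215 value in the synopsis is what a 32-bit Perl produces when $? is -1 and is shifted right by 8 bits without first being tested for -1. The following is a minimal sketch of decoding the status; describe_status is a hypothetical helper name and is not taken from the original script.

```perl
use strict;
use warnings;

# describe_status is a hypothetical helper, not part of the original script.
# It decodes a raw wait status as left in $? by backticks or system().
sub describe_status {
    my ($status) = @_;
    # -1 means the command could not be started or the wait itself failed;
    # $! holds the reason at the point $? was set.
    return "could not run or wait for child" if $status == -1;
    return sprintf("killed by signal %d", $status & 127) if $status & 127;
    return sprintf("exited with code %d", $status >> 8);
}

# Shifting -1 as an unsigned 32-bit value reproduces the number in the report.
my $shifted = (-1 & 0xFFFFFFFF) >> 8;

print describe_status(-1),  "\n";
print describe_status(256), "\n";
print "$shifted\n";
```

Testing for -1 before shifting keeps a failed wait from being mistaken for an ordinary non-zero exit code.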
    axeman, Feb 22, 2006
    #1

  2. axeman

    Guest

    axeman wrote:
    > vs. just one test). The daemons will run fine for days, but then some
    > will suddenly receive non-zero return codes for every command/test they
    > perform.


    Is your process reaper reaping? For some odd reason, AIX has an
    insanely-low default max-pid-per-user limitation (I think default is
    256 - I usually run it at 1024). Check "smitty chgsys" and check your
    process table.

    You would have messages in /var/spool/mail if you were pid-starved.
    And, of course, if the process is running as root, I don't think it
    would matter, since (I believe) root is not limited.

    FWIW, whatever is happening here probably (almost surely) has nothing
    to do with Perl.

    --
    http://DavidFilmer.com
    , Feb 22, 2006
    #2

  3. axeman

    axeman Guest

    Thanks David.

    Unfortunately, it is running as root (even though the limit is low -
    128 - and there is no related mail). The reaper is misnamed (not my
    code): it just kills hung test procs but does not reap their exit
    status; that's what the asynchronous
    'while( ($waitedPid = waitpid(-1, &WNOHANG)) > 0 ) {}' line does.
    CHLD signals are not mapped (i.e. left to DEFAULT). Curiously, if I do
    map them to a handler or IGNORE, the bad return code always occurs.
    axeman, Feb 22, 2006
    #3
  4. axeman

    Guest

    "axeman" <> wrote:
    > Synopsis:
    >
    > A variant of a typical host availability / pinger script has performed
    > well for many years. Multiple daemons process various lists at various
    > intervals with various timeouts.


    How often are the timeouts actually activated?

    > The tool was recently modified to
    > support attempting sequences of tests (i.e. ping and TCP port test, ...
    > vs. just one test).


    Did these changes change how often timeouts were actually activated?

    >
    > AIX 4.3.3
    > Perl 5.005_03
    > ...
    > sub timed_out { # ALRM signal handler for command time-out
    > die "timed out";
    > }


    Does the handler need to re-install itself after being activated
    on your system?


    > ($rc,@out) = eval {
    > alarm($timeout);
    > $test =~ s/HOST/$check/g;
    > $test[$testCount] = $test;
    > @eout = `$test 2>&1`;
    > $erc = ($? >> 8);
    > alarm(0);
    > return ($erc,@eout);
    > };
    > if( $@ =~ /^timed out/ ) {
    > $rc = 1;
    > $timeouts++;
    > $test_timeout[$testCount] = 1;
    > }


    If $@ is defined but not timed out, shouldn't you do something about it?
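A minimal sketch of that idea, assuming a hypothetical helper named run_with_timeout (not part of the original script): the eval result is classified as success, timeout, or some other error, so that a non-timeout failure is reported rather than silently dropped.

```perl
use strict;
use warnings;

# run_with_timeout is a hypothetical helper for illustration only.
# It runs a code reference under an alarm and classifies the outcome.
sub run_with_timeout {
    my ($timeout, $code) = @_;
    my @result = eval {
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm($timeout);
        my @r = $code->();
        alarm(0);
        @r;
    };
    my $err = $@;
    alarm(0);    # make sure no alarm is left pending if the block died
    return ('ok', @result) if $err eq '';
    return ('timeout')     if $err =~ /^timed out/;
    return ('error', $err);    # anything else is reported, not ignored
}

my @ok  = run_with_timeout(5, sub { 42 });
my @bad = run_with_timeout(5, sub { die "boom\n" });
my @to  = run_with_timeout(1, sub { select(undef, undef, undef, 3); 1 });
print "$ok[0] $bad[0] $to[0]\n";
```

Returning a tag as the first element lets the caller branch on the outcome without inspecting $@ again later, when it may already have been overwritten.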

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    , Feb 23, 2006
    #4
  5. axeman

    Guest

    "axeman" <> wrote:
    > Thanks David.
    >
    > Unfortunately, it is running as root (even though the limit is low -
    > 128 - and no related mail). The reaper is misnamed (not my code), it
    > just kills hung test procs, but does not reap their exit status, that's
    > what the asynchronous 'while( ($waitedPid = waitpid(-1, &WNOHANG)) > 0
    > ) {}' line does. CHLD signals are not mapped (i.e. left to DEFAULT).
    > Curiously, if I do map them to a handler or IGNORE, the bad return code
    > occurs always.


    qx{} automatically waits for the job it spawns--that is how it sets $?.
    If you set $SIG{CHLD}, it will interfere with qx{}'s wait.
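One way to guard against this, sketched under assumptions: force CHLD back to DEFAULT only for the duration of the backtick call, and test $? for -1 before shifting. The helper name run_command and its return layout are hypothetical and not from the original script.

```perl
use strict;
use warnings;

# run_command is a hypothetical helper, not part of the original script.
# It runs a command via qx{} with CHLD handling forced back to the default
# for the duration of the call, so a handler or IGNORE setting elsewhere
# does not take the child away from qx{}'s own wait.
sub run_command {
    my ($cmd) = @_;
    local $SIG{CHLD} = 'DEFAULT';
    my @out    = qx{$cmd 2>&1};
    my $status = $?;
    my $reason = "$!";
    if ($status == -1) {
        # The command could not be started, or the wait failed
        # (for example because the child had already been reaped).
        return (undef, "exec/wait failed: $reason", @out);
    }
    return ($status >> 8, '', @out);
}

my ($rc, $err, @out) = run_command('echo hello');
print "rc=$rc out=$out[0]";
```

An undefined first element tells the caller that no exit code is available at all, which is a different condition from a test that ran and failed.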

    Xho

    , Feb 23, 2006
    #5
  6. axeman

    axeman Guest

    >> Multiple daemons process various lists at various
    >> intervals with various timeouts.

    > How often are the timeouts actually activated?


    Rarely, i.e. only when a test fails / system is down, and most are
    usually up.

    > Did these changes change how often timeouts were actually activated?


    No.

    > Does the handler need to re-install itself after being activated
    > on your system?


    As mentioned, there is no handler, exit statuses are gathered
    asynchronously.

    > If $@ is defined but not timed out, shouldn't you do something about it?


    Yes, clearly. That code was left out (the ellipses ....) because it was
    not relevant to the problem.

    > qx{} automatically waits for the job it spawns--that is how it sets $?.
    > If you set $SIG{CHLD}, it will interfere with qx{}'s wait.


    Thanks, that makes sense.
    axeman, Feb 23, 2006
    #6
  7. axeman

    Guest

    "axeman" <> wrote:


    Note: snipped material restored with "] ]".

    ] ] > sub timed_out { # ALRM signal handler for command time-out
    ] ] > die "timed out";
    ] ] > }

    > > Does the handler need to re-install itself after being activated
    > > on your system?

    >
    > As mentioned, there is no handler, exit statuses are gathered
    > asynchronously.


    If the thing whose comment says "ALRM signal handler" is not a handler,
    then what the heck is it? And why is it commented thusly?

    Xho

    , Feb 23, 2006
    #7
  8. axeman

    axeman Guest

    Lol. Thought you meant a handler for CHLD. No, the ALRM handler does
    not need to be reinstalled.
    axeman, Feb 23, 2006
    #8