I have a script that is severely misbehaving. Its purpose is simple: It is
supposed to retrieve some records from a local database, compare these
records to others fetched from the NCBI genbank database via BioPerl, and
then insert the results of these comparisons into the local database.
It does all of these things admirably. However, it seems to be implicitly
forking or something, because after it terminates, execution still
continues. I'll elaborate on this in a bit.
I don't like posting code, but I'm sort of stumped here. Here's a
trimmed down version of what I'm doing:
#!/usr/bin/perl -w
use strict;
use lib "D:/cygwin/site/www";
use Site;
use Bio::Seq;
use Bio::DB::GenPept;
<SNIP: Many more Bio:: includes>
use DBI;
use Error qw(:try);
use File::Copy;
use Log::Log4perl qw(get_logger :levels);
######################### Main routine ##############################
#Initialize the logger we'll be using to record output
Log::Log4perl->init("mlogger.conf");
my $mlogger = get_logger('MITOSEEK');
$mlogger -> info("Job started.");
#define a single handle to use everywhere here.
my $dbh = Site::createDBHandle()
or &logexit("Could not open handle to database.");
$mlogger -> info("Examining genbank N. crassa homologues.");
&examineHomologues();
eval
{
    $dbh -> commit();
    $dbh -> disconnect();
};
$mlogger -> error("Commit/disconnect failed; $@") if ($@);
#donesville
exit(0);
#Sanity check 1 -- should never get here
die("DIDN'T EXIT");
#Sanity check 2... pull out big guns, aim at perl -- shouldn't happen
system('D:/cygwin/usr/local/bin/killall -s 9 perl');
#----------------------- End main routine ---------------------------
######################### Helper methods ############################
sub examineHomologues()
{
    my $sth = Site::createStmt($dbh, "SELECT <UNINTERESTING DATABASE
                                       SCHTUFF HERE>;", 0)
        or &logexit("Couldn't fetch homologue data");

    #create a cached query to do the insertion
    my $insert_handle = Site::createStmt($dbh, "INSERT INTO
                                          <MORE DATABASE STUFF>", 1);

    Site::executeStmt($sth)
        or &logexit("Couldn't execute stmt to retrieve homologue data");

    #connection to NCBI genbank database
    my $fetcher = new Bio::DB::GenBank;

    eval
    {
        my $count = 0;
        while (my $dp = $sth -> fetchrow_hashref)
        {
            my $cur_accession = $dp -> {'accession'};
            my $refseq;
            eval
            {
                $refseq = $fetcher -> get_Seq_by_id($cur_accession);
            };
            #we can afford data loss
            next if ($@ || !defined($refseq));

            <LOTS OF BORING REGEXP STRINGY STUFF HERE, FOLLOWED
             BY SOME DATABASE INSERTIONS>
        }
    };
    $mlogger -> error("Homologue analysis died early. $@") if ($@);
}
sub logexit($)
{
    my $logerr_string = shift(@_);
    $mlogger -> fatal($logerr_string);

    eval { $dbh -> rollback(); };
    if ($@)
    {
        $mlogger -> fatal("Also, an attempted rollback failed to
                           execute with error: '$@'.");
    }
    exit(1);
}
#----------------------- End helper methods -------------------------
The Site::createDBHandle/createStmt etc. functions are just calls to the
corresponding DBI methods (connect, prepare, execute, etc.) wrapped
in evals. I'm using PrintError=0, RaiseError=1.
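For reference, the wrappers look roughly like this. This is paraphrased from
memory, so the DSN, credentials, and exact attribute handling are placeholders,
not the real Site.pm:

package Site;
use strict;
use DBI;

# Rough sketch of the real wrappers -- DSN and credentials are placeholders.
sub createDBHandle
{
    my $dbh;
    eval
    {
        $dbh = DBI->connect("dbi:Pg:dbname=mitoseek", "user", "pass",
                            { PrintError => 0, RaiseError => 1,
                              AutoCommit => 0 });
    };
    return $@ ? undef : $dbh;
}

# The third argument selects prepare_cached() for statements reused in a loop.
sub createStmt
{
    my ($dbh, $sql, $cached) = @_;
    my $sth;
    eval
    {
        $sth = $cached ? $dbh->prepare_cached($sql) : $dbh->prepare($sql);
    };
    return $@ ? undef : $sth;
}

sub executeStmt
{
    my ($sth, @bind) = @_;
    my $rv;
    eval { $rv = $sth->execute(@bind); };
    return $@ ? undef : $rv;
}

1;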
The abbreviated output (in the log file) looks like this (note that
there are about 24650 records to be processed):
2005/06/22 09:44:53 INFO> Job started.
2005/06/22 09:44:53 INFO> Examining genbank N. crassa homologues.
2005/06/22 09:47:32 INFO> 50 processed.
2005/06/22 09:49:54 INFO> 100 processed.
<..SNIP...>
2005/06/23 05:40:03 INFO> 24550 processed.
2005/06/23 05:41:27 INFO> 24600 processed.
2005/06/23 05:42:06 ERROR> Commit/disconnect failed; DBD::Pg::db commit failed: commit failed at ../cgi/mitoseek line 44.
2005/06/23 05:42:25 INFO> 24550 processed.
2005/06/23 05:43:49 INFO> 24600 processed.
2005/06/23 05:44:28 ERROR> Commit/disconnect failed; DBD::Pg::db commit failed: commit failed at ../cgi/mitoseek line 44.
2005/06/23 05:45:07 INFO> 23500 processed.
2005/06/23 05:46:41 INFO> 23550 processed.
2005/06/23 05:47:49 INFO> 23600 processed.
<...SNIP...>
2005/06/23 06:16:45 INFO> 24550 processed.
2005/06/23 06:18:09 INFO> 24600 processed.
2005/06/23 06:18:48 ERROR> Commit/disconnect failed; DBD::Pg::db commit failed: commit failed at ../cgi/mitoseek line 44.
...and so forth until I arrive in the morning. When I check 'ps',
there are about 20-30 perl processes happily chugging away. I haven't
checked whether the 'original' process that started it all is still alive,
which I should really do (see the diagnostic sketch after the list below).
Some things I've noticed:
- the first commit actually does succeed, since I find new records
  in the database. I assume the subsequent commits fail because of
  the initial disconnect call on the handle
- My 'sanity checks' (the die() and system('killall...') calls)
never actually get executed, so there is some exiting going on
somewhere
- exit(0) behaves 'normally' if it occurs before the execution of
the eval{}-wrapped while loop in examineHomologues()
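One thing I plan to try next is logging the process and parent-process IDs,
so I can tell whether the extra processes are children forked off the
original or completely fresh invocations of the script. Something like this
(just a sketch -- logpid() is a hypothetical helper, not in the current
script):

# Hypothetical diagnostic -- log pid/ppid at startup and inside the loop.
my $start_pid = $$;

sub logpid
{
    my ($where) = @_;
    $mlogger -> info(sprintf("[%s] pid=%d ppid=%d original pid=%d",
                             $where, $$, getppid(), $start_pid));
}

logpid('startup');
# ...and inside the while loop in examineHomologues(), every 50 records:
logpid("after $count records") if ($count % 50 == 0);

If the forking theory is right, the extra processes should report the
original's pid as their ppid (at least until the original exits); if they
are fresh invocations, they won't.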
I apologize for the glut of information, but this is sort of the bare
minimum that I thought would be required to diagnose the problem (although
it might even be less than that).
Any suggestions for what I might be able to do to pick apart this problem
would be greatly appreciated. I've never seen this sort of 'implicit
forking' behaviour from perl before.
-Debo