Question about system() in multiple threads

enstrophy.2000

Hi,
I'm trying to write a Perl script for managing multiple tasks. I create
a queue of tasks with Thread::Queue and have two separate threads
retrieve task names from the queue and execute each one via system()
calls. However, when I run the script, I find that if I call system()
more than once after each retrieval of a task name, only the first
system() call actually executes, whereas the rest are ignored. Even more
counterintuitive, the remaining system() calls do return 0, which should
mean they succeeded. Is there anything fundamentally wrong with this
approach? Any feedback will be greatly appreciated.

Here is the script in question:

#!/usr/bin/perl -w

use threads;
use Thread::Queue;
use Cwd;

$num_thread = 2;

my $q = new Thread::Queue;

#populate this queue using directory names

opendir(DDIR,"..");
@ddir= grep(/^\d/,readdir(DDIR));

foreach $ddir(@ddir){
    $q->enqueue($ddir);
}

for (1..$num_thread){
    $thr[$_-1] = threads->new(\&sub1,$_);
    # print "creating tid ", $thr[$_-1]->tid,"\n";
}

sub sub1 {
    my ($id) = @_;
    while($foo =$q->dequeue_nb){
        batch($foo);
        $left = $q->pending;
        print "In the thread $id foo=$foo left=$left\n";
        last if($left==0);
        #sleep($id);
    }
}

foreach $tt(@thr){
    print "joining ", $tt->tid,"\n";
    $tt->join();
}

sub batch{
    my ($ddir) = @_;
    chdir "../".$ddir||die "unable to enter $ddir\n";

    print "rpt2dat.pl\n";
    system("rpt2dat.pl");
    print "$ddir R CMD BATCH cmb.obs.mod.R \n";
    system("R CMD BATCH cmb.obs.mod.R")==0
        or die "$ddir cmb.obs.mod failed $?";

    print "$ddir R CMD BATCH comp.ann.sum.R \n";
    system("R CMD BATCH comp.ann.sum.R")==0
        or die "$ddir cmb.ann.sum failed $?";
    print "$ddir R CMD BATCH comp.ann.sum.R finished\n";

    wait();
    print "$ddir R CMD BATCH comp.mon.sum.R \n";
    system("R CMD BATCH comp.mon.sum.R")==0
        or die "$ddir cmb.mon.sum failed $?";
    wait();
    print "$ddir R CMD BATCH comp.mon.sum.R finished\n";
}
 
xhoster

> Hi,
> I'm trying to create a perl script for managing multiple tasks. What
> I have been doing is creating a queue of tasks via Thread::Queue, and
> having two separate threads retrieving the task names from the queue
> and execute each one via system() call. However, upon running the
> script, I found that, if I call system() for more than once after each
> retrieval of the task name, only the first system() call actually
> succeeds in executing, whereas the rest got ignored.

Replacing R with echo, I cannot reproduce your results. How do you know
the system() calls are being ignored, rather than running and simply
producing no results?
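
One way to tell an ignored call from one that ran quietly is to record, for
each command, its output, its exit status, and the working directory at the
moment of the call. The following is only a sketch under assumptions:
run_logged is a hypothetical helper name that does not appear in the posted
script, and the 2>&1 redirection assumes a POSIX shell.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical helper: run $cmd through the shell and capture stdout and
# stderr together. Returns (exit status, captured output, cwd at call time).
sub run_logged {
    my ($cmd) = @_;
    my $cwd    = getcwd();
    my $output = qx{$cmd 2>&1};               # scalar context: one string
    my $status = $? == -1 ? -1 : $? >> 8;     # -1 means the shell did not start
    return ($status, defined $output ? $output : '', $cwd);
}

my ($status, $output, $cwd) = run_logged("echo hello");
print "cwd=$cwd status=$status output=$output";
```

Logging the cwd next to each command makes it visible when a command ran in
a directory other than the one intended.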

> Here is the script in question:
>
> #!/usr/bin/perl -w
>
> use threads;
> use Thread::Queue;
> use Cwd;

$|=1; # at least for debugging purposes

> sub batch{
>     my ($ddir) = @_;
>     chdir "../".$ddir||die "unable to enter $ddir\n";

I think chdir changes the working directory of the entire process, not on
a per-thread basis. So this is a race condition.

>     print "rpt2dat.pl\n";
>     system("rpt2dat.pl");
>     print "$ddir R CMD BATCH cmb.obs.mod.R \n";
>     system("R CMD BATCH cmb.obs.mod.R")==0
>         or die "$ddir cmb.obs.mod failed $?";
>
>     print "$ddir R CMD BATCH comp.ann.sum.R \n";
>     system("R CMD BATCH comp.ann.sum.R")==0
>         or die "$ddir cmb.ann.sum failed $?";
>     print "$ddir R CMD BATCH comp.ann.sum.R finished\n";
>
>     wait();

What are you waiting for?

Xho
 
enstrophy.2000

xhoster,

Thank you so much for your kind reply. You are right that the wait()
is redundant; I put it there only because I found the system() calls
were not being executed.

Good point that chdir() may create a race condition. I wonder if there
is a way to test specifically for this? I have multiple directories, and
each script writes its output to the directory it is run from. Perhaps I
need a semaphore to make this work? I will appreciate any input.
 
xhoster

I don't think that *simply* adding a semaphore will help you, because then
you might as well not use threads at all: the jobs would run one at a time.

My first recommendation would be to use Parallel::ForkManager rather than
threads. Then you should get an independent cwd for each job (at least on
Linux; I don't know what would happen on Windows).
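
A minimal sketch of that approach, assuming Parallel::ForkManager is
installed from CPAN (it is not a core module). The name run_batches and its
arguments are hypothetical, chosen for illustration; the directory list and
the commands are whatever your script already uses.

```perl
use strict;
use warnings;
use Parallel::ForkManager;    # CPAN module, assumed to be installed

# Hypothetical helper: run every command in @$commands inside each directory
# of @$dirs, with at most $max_procs child processes at a time.
# Returns the number of directories whose child exited non-zero.
sub run_batches {
    my ($max_procs, $dirs, $commands) = @_;
    my $failed = 0;
    my $pm     = Parallel::ForkManager->new($max_procs);

    # Called in the parent each time a child has been reaped.
    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $dir) = @_;
        if ($exit_code != 0) {
            warn "$dir: child $pid exited with status $exit_code\n";
            $failed++;
        }
    });

    for my $dir (@$dirs) {
        $pm->start($dir) and next;    # parent: go on to the next directory

        # Child process: it has its own working directory, so this chdir
        # cannot disturb any other job.
        chdir $dir
            or do { warn "cannot enter $dir: $!\n"; $pm->finish(1) };
        for my $cmd (@$commands) {
            system($cmd) == 0
                or do { warn "$dir: '$cmd' failed: $?\n"; $pm->finish(1) };
        }
        $pm->finish(0);               # the child exits here
    }
    $pm->wait_all_children;
    return $failed;
}
```

For the script in this thread, the call might look something like
run_batches(2, [map { "../$_" } @ddir], ["rpt2dat.pl", "R CMD BATCH cmb.obs.mod.R"]).
The parent never calls chdir, so the relative paths stay valid for every child.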

If not that, then I'd try re-writing your processes to use full (or at
least fuller) paths, so you don't need to chdir at all.

A third choice would be to do the chdir inside the "system" calls:
system "cd ../$ddir && R CMD BATCH cmb.obs.mod.R" and die ....;
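
A slightly fuller sketch of the same idea, with the exit status decoded. It
assumes a POSIX shell, and run_in_dir is a hypothetical helper name used
only for illustration. The cd happens in the child shell, so the Perl
process (and every thread in it) keeps its own working directory untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: run $cmd with $dir as its working directory without
# calling chdir in the Perl process itself.
# Returns the exit status (0 on success); dies if the shell could not be
# started or the command was killed by a signal.
sub run_in_dir {
    my ($dir, $cmd) = @_;
    (my $quoted = $dir) =~ s/'/'\\''/g;    # escape single quotes for the shell
    my $rc = system("cd '$quoted' && $cmd");
    die "could not start shell: $!\n" if $rc == -1;
    die "'$cmd' in $dir died from signal " . ($? & 127) . "\n" if $? & 127;
    return $? >> 8;
}
```

A call such as run_in_dir("../$ddir", "R CMD BATCH cmb.obs.mod.R") == 0 or die ...
would then replace the chdir plus system pair. A non-zero return also covers
the case where the cd itself fails.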

Finally, maybe a semaphore method, combined with putting jobs in the
background:

{
lock $semaphore;
chdir "../$ddir" or die;
system "R CMD BATCH cmb.obs.mod.R &" and die ...;
}; # release the lock
wait;

But on second thought, this won't work, because you (probably) can't be sure
that wait will reap its own background job rather than the other thread's
background job, and when you put a job in the background, the success of
"system" is no longer an indicator of overall success.

Xho
 
enstrophy.2000

xhoster,
Thanks again for offering these solutions. I tried the first one and
it worked fine. I did some more testing on the script that I posted, and
found that I can reproduce the problem by replacing "R CMD BATCH..." with
the execution of two Perl scripts. Here are the scripts:

####################################################
test1.pl
####################################################
#!/usr/bin/perl

print "this is test1\n";


####################################################
test2.pl
####################################################
#!/usr/bin/perl

print "this is test2\n";


####################################################
test.thread.pl
####################################################
#!/usr/bin/perl -w

use threads;
use Thread::Queue;
use Cwd;

$|=1;

$num_thread = 2;

my $q = new Thread::Queue;

#populate this queue using directory names

opendir(DDIR,"..");
@ddir= grep(/^\d/,readdir(DDIR));

foreach $ddir(@ddir){
    $q->enqueue($ddir);
}

for (1..$num_thread){
    $thr[$_-1] = threads->new(\&sub1,$_);
    # print "creating tid ", $thr[$_-1]->tid,"\n";
}

sub sub1 {
    my ($id) = @_;
    while($foo =$q->dequeue_nb){
        batch($foo);
        $left = $q->pending;
        print "In the thread $id foo=$foo left=$left\n";
        last if($left==0);
        #sleep($id);
    }
}

foreach $tt(@thr){
    print "joining ", $tt->tid,"\n";
    $tt->join();
}

sub batch{
    my ($ddir) = @_;
    chdir "../".$ddir||die "unable to enter $ddir\n";

    print system("test1.pl >test1.log "),"\n";
    print system("test2.pl >test2.log "),"\n";
}

I put test1.pl and test2.pl in each directory and ran test.thread.pl.
I then found test1.log in every directory where test1.pl is, but
test2.log in only some of the directories. As you pointed out, chdir is
not thread safe, so I guess the current directory may have changed after
the first system() call because of activity in the other thread. I will
test this a little further. Thank you very much for your direction; I
really appreciate it.
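
That guess can be checked directly with a minimal sketch, assuming a Perl
built with ithreads. The helper name cwd_after_other_thread_chdir is
hypothetical. It lets a second thread call chdir and then looks at the cwd
seen by the main thread, which never called chdir itself. As far as I know,
the threads module documentation describes the cwd as shared among threads
on all platforms except MSWin32, so the result may differ on Windows.

```perl
use strict;
use warnings;
use threads;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Hypothetical helper: let another thread chdir to $target, then report the
# cwd seen by the calling thread before and after.
sub cwd_after_other_thread_chdir {
    my ($target) = @_;
    my $before = getcwd();
    threads->create(sub { chdir $target or die "chdir $target: $!" })->join;
    my $after = getcwd();
    chdir $before or die "cannot return to $before: $!";    # restore
    return ($before, $after);
}

my $dir = tempdir(CLEANUP => 1);
my ($before, $after) = cwd_after_other_thread_chdir($dir);
print "main thread cwd before: $before\n";
print "main thread cwd after : $after\n";
print $before eq $after
    ? "cwd looks per-thread on this platform\n"
    : "cwd is shared by all threads on this platform\n";
```

If the two paths differ, a chdir in one worker thread can move the other
worker between its first and second system() call, which would match
test2.log turning up in only some of the directories.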
 
