More Forks!

  • Thread starter it_says_BALLS_on_your forehead

it_says_BALLS_on_your forehead

I am getting erratic behavior with this script...

#!/apps/webstats/bin/perl

use File::Copy;
use Parallel::ForkManager;


my $pm = Parallel::ForkManager->new(10);

$pm->run_on_start(
sub { my ($pid,$ident)=@_;
# print "** $ident started, pid: $pid\n";
}
);


# NOTE: this MUST be assigned BEFORE the $pm->run_on_finish
my @tokens = "log1" .. "log5";
# test with a 'late' file.
my @data = "file01" .. "file10";

$pm->run_on_finish(
sub {
my ($pid, $exit_code, $ident) = @_;
my ($outlog, $missingfile) = split /\|/, $_[2];
push( @tokens, $outlog );
# only push if the file is missing...HOW TO KNOW IF IT'S MISSING?? or do a next if (-e $missing
# ...but the next messes up the ForkManager b/c child process does not go to $pm->finish and so
# tries to spawn an extra process...breaks the limit
# could put $pm->finish in continue block, search in ...perl.misc Google Groups: fork ftp
unless (-e "data/$missingfile") {
push( @data, $missingfile );
}
# print "ID: $ident (pid: $pid) had exit code: $exit_code.\n";
}
);

my $counter = 0;
for (@data) {
$counter++;
if ($counter > 20 ) {
print "\n*******counter was above 20: $counter**********************\n\n";
last; # maybe this doesn't ->finish...
}
my $outfile = shift(@tokens);
$pm->start("$outfile|$_") and next;
print "$counter: ";
print "reading data/$_: writing to log -$outfile-\n";
my $func_ref = hello($_);
$func_ref->("Simon");
$pm->finish;
}

$pm->wait_all_children;
#--- subs ---

sub hello {
my ($type) = @_;

if ($type eq "file01") {
print "type: $type\n";
return \&func1;
}
else {
print "type: $type\n";
return \&func2;
}
return 0;
}

sub func1 {
my ($noun) = @_;
print "* using func1. you stink, $noun\n";
}

sub func2 {
my ($noun) = @_;
print "* * using func2. yay!!! it's $noun!!!\n";
}

#-----
RESULT 1:
[mymachine] ~/simon/1-perl > tryFork.pl
1: reading data/file01: writing to log -log1-
type: file01
* using func1. you stink, Simon
2: reading data/file02: writing to log -log2-
type: file02
* * using func2. yay!!! it's Simon!!!
3: reading data/file03: writing to log -log3-
type: file03
* * using func2. yay!!! it's Simon!!!
4: reading data/file04: writing to log -log4-
type: file04
* * using func2. yay!!! it's Simon!!!
5: reading data/file05: writing to log -log5-
type: file05
* * using func2. yay!!! it's Simon!!!
6: reading data/file06: writing to log -log1-
type: file06
* * using func2. yay!!! it's Simon!!!
7: reading data/file07: writing to log -log2-
type: file07
* * using func2. yay!!! it's Simon!!!
8: reading data/file08: writing to log -log3-
type: file08
* * using func2. yay!!! it's Simon!!!
9: reading data/file09: writing to log -log4-
type: file09
* * using func2. yay!!! it's Simon!!!
10: reading data/file10: writing to log -log5-
type: file10
* * using func2. yay!!! it's Simon!!!
11: reading data/file07: writing to log -log1-
type: file07
* * using func2. yay!!! it's Simon!!!
12: reading data/file08: writing to log -log2-
type: file08
* * using func2. yay!!! it's Simon!!!
13: reading data/file09: writing to log -log3-
type: file09
* * using func2. yay!!! it's Simon!!!
14: reading data/file10: writing to log -log4-
type: file10
* * using func2. yay!!! it's Simon!!!
15: reading data/file07: writing to log -log5-
type: file07
* * using func2. yay!!! it's Simon!!!
16: reading data/file08: writing to log -log1-
type: file08
* * using func2. yay!!! it's Simon!!!
17: reading data/file09: writing to log -log2-
type: file09
* * using func2. yay!!! it's Simon!!!
18: reading data/file10: writing to log -log3-
type: file10
* * using func2. yay!!! it's Simon!!!
19: reading data/file07: writing to log -log4-
type: file07
* * using func2. yay!!! it's Simon!!!

*******counter was above 20: 21**********************

20: reading data/file08: writing to log -log5-
type: file08
* * using func2. yay!!! it's Simon!!!

RESULT2:
[mymachine] ~/simon/1-perl > tryFork.pl
1: reading data/file01: writing to log -log1-
type: file01
* using func1. you stink, Simon
2: reading data/file02: writing to log -log2-
type: file02
* * using func2. yay!!! it's Simon!!!
3: reading data/file03: writing to log -log3-
type: file03
* * using func2. yay!!! it's Simon!!!
4: reading data/file04: writing to log -log4-
type: file04
* * using func2. yay!!! it's Simon!!!
5: reading data/file05: writing to log -log5-
type: file05
* * using func2. yay!!! it's Simon!!!
6: reading data/file06: writing to log -log1-
type: file06
* * using func2. yay!!! it's Simon!!!
7: reading data/file07: writing to log -log2-
type: file07
* * using func2. yay!!! it's Simon!!!
8: reading data/file08: writing to log -log3-
type: file08
* * using func2. yay!!! it's Simon!!!
9: reading data/file09: writing to log -log4-
type: file09
* * using func2. yay!!! it's Simon!!!
10: reading data/file10: writing to log -log5-
type: file10
* * using func2. yay!!! it's Simon!!!
11: reading data/file07: writing to log -log1-
type: file07
* * using func2. yay!!! it's Simon!!!
12: reading data/file08: writing to log -log2-
type: file08
* * using func2. yay!!! it's Simon!!!
13: reading data/file09: writing to log -log3-
type: file09
* * using func2. yay!!! it's Simon!!!
14: reading data/file10: writing to log -log4-
type: file10
* * using func2. yay!!! it's Simon!!!
15: reading data/file07: writing to log -log5-
type: file07
* * using func2. yay!!! it's Simon!!!
16: reading data/file08: writing to log -log1-
type: file08
* * using func2. yay!!! it's Simon!!!
17: reading data/file09: writing to log -log2-
type: file09
* * using func2. yay!!! it's Simon!!!

....sometimes it stops at 10, sometimes at 15...MOST of the time it gets
all the way to 20. does anyone understand this behavior? do i need to
stick a continue block around the $pm->finish? i understand that if i
do this, the parent will also call finish, but this will be a silent
no-op when called by the parent.
 

xhoster

it_says_BALLS_on_your forehead said:
> I am getting erratic behavior with this script...
>
> my $pm = Parallel::ForkManager->new(10); ....
> my @tokens = "log1" .. "log5"; ....
> for (@data) { ....
> my $outfile = shift(@tokens);

You can have up to 10 parallel processes at a time, but you try to
make them share 5 tokens. What happens if a sixth job is started before
one of the previous 5 ends, therefore @tokens is empty and $outfile gets
set to be undefined? Could that cause the problem? The number of tokens
should be at least one more than the max number of children.
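
To make the token shortage concrete, here is a minimal sketch of the worst case, assuming all 10 children are started before any of them finishes and hands its token back. The names $max_children and @assigned are made up for the illustration; only @tokens comes from the script above.

```perl
use strict;
use warnings;

my @tokens       = ("log1" .. "log5");   # same five tokens as the original script
my $max_children = 10;                   # same limit passed to ForkManager->new

# Worst case: every job takes a token before any child has returned one.
# shift on an empty array returns undef, so later jobs get no log name.
my @assigned  = map { shift @tokens } 1 .. $max_children;
my $undefined = grep { !defined } @assigned;

print "$undefined of $max_children jobs got no token\n";
# prints: 5 of 10 jobs got no token
```

In a real run the count depends on timing, which would fit the erratic behavior described.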


Your "for (@data) {" loop has a lot of stuff going on inside of it,
including sort-of-asynchronous calls. Who knows if $_ is getting stomped
on by something? Any non-trivial foreach loop should declare a "my"
variable, rather than defaulting to $_.

$pm->run_on_finish(
sub {
my ($pid, $exit_code, $ident) = @_; ....
unless (-e "data/$missingfile") {
push( @data, $missingfile );
}
}
);

This routine modifies @data while it is being iterated over with
a for statement. That could cause problems.

perldoc perlsyn:
If any part of LIST is an array, "foreach" will get very confused if
you add or remove elements within the loop body, for example with
"splice". So don't do that.

So don't use a foreach:

##for (@data) {
while (@data) { $_=shift @data;

(although really you should use a lexical variable, rather than $_)
while (@data) { my $new_var=shift @data;
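
A small sketch of why the while/shift form tolerates pushes onto the queue: the condition is re-evaluated against the current contents of the array on every pass. The retry rule here (file02 counts as "missing" on its first attempt) is invented purely for the demonstration, and no forking is involved.

```perl
use strict;
use warnings;

my @queue = ("file01" .. "file03");
my (%attempts, @processed);

while (@queue) {
    my $file = shift @queue;      # lexical variable instead of $_
    $attempts{$file}++;

    # Hypothetical rule: file02 is not ready the first time around,
    # so it goes back on the end of the queue.
    if ($file eq "file02" && $attempts{$file} == 1) {
        push @queue, $file;
        next;
    }
    push @processed, $file;
}

print "@processed\n";
# prints: file01 file03 file02
```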

This still has the problem of what happens if @data is empty, the while
statement sees that it is empty, falls through, and only then does some
job shove something back into @data, too late to be noticed? In fact, I
think this may be the root of your problem. You need to wait for all the
stragglers to have come in and shoved whatever they have back onto the
queue, then give it another go.

do {
while (@data) { $_=shift @data;
...<the rest of what used to be your for loop but is now a while loop>
};
$pm->wait_all_children;
} while @data;
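
Here is a self-contained sketch of that drain, wait, re-check shape. To keep it runnable without any CPAN module it uses plain fork and wait as a stand-in for Parallel::ForkManager; in the real script $pm->start, the run_on_finish callback and $pm->wait_all_children would play the roles of fork, reap_one and the final reaping loop. The limit of 2, the reap_one helper and the rule that file03 fails on its first try are all assumptions made for the example.

```perl
use strict;
use warnings;

my @data = ("file01" .. "file04");
my $max  = 2;                      # hypothetical limit on concurrent children
my (%tries, %running, @done);

# Reap one child; requeue its file if it exited non-zero.
sub reap_one {
    my $pid = wait();
    if ($pid == -1) { %running = (); return }   # no children left
    my $file = delete $running{$pid};
    return unless defined $file;
    if ($? >> 8) { push @data, $file }          # failed: back on the queue
    else         { push @done, $file }
}

do {
    while (@data) {
        my $file = shift @data;
        reap_one() while keys(%running) >= $max;
        $tries{$file}++;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: pretend file03 is missing on the first attempt only.
            exit( ($file eq "file03" && $tries{$file} == 1) ? 1 : 0 );
        }
        $running{$pid} = $file;
    }
    reap_one() while %running;     # wait for the stragglers
} while @data;                     # a straggler may have requeued something

print join(",", sort @done), "\n";
# prints: file01,file02,file03,file04
```

The outer do/while is what catches a file that is pushed back only after the inner while has already seen an empty queue.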


> ...sometimes it stops at 10, sometimes at 15...MOST of the time it gets
> all the way to 20. does anyone understand this behavior?

I think so.

> do i need to
> stick a continue block around the $pm->finish? i understand that if i
> do this, the parent will also call finish, but this will be a silent
> no-op when called by the parent.

I don't think that that is the problem, but why not do it anyway?

Xho
 

it_says_BALLS_on_your forehead

> You can have up to 10 parallel processes at a time, but you try to
> make them share 5 tokens. What happens if a sixth job is started before
> one of the previous 5 ends, therefore @tokens is empty and $outfile gets
> set to be undefined? Could that cause the problem? The number of tokens
> should be at least one more than the max number of children.

ahh! thank you for pointing that out.

> Your "for (@data) {" loop has a lot of stuff going on inside of it,
> including sort-of-asynchronous calls. Who knows if $_ is getting stomped
> on by something? Any non-trivial foreach loop should declare a "my"
> variable, rather than defaulting to $_.

hmm, i thought (perhaps this is a misapprehension on my part--i will
investigate) that:

for (@data)

...automatically 'lexified' $_.

> This routine modifies @data while it is being iterated over with
> a for statement. That could cause problems.

yeah, i was debating whether or not this was a good idea. i'm thinking
now that it's not.

> perldoc perlsyn:
> If any part of LIST is an array, "foreach" will get very confused if
> you add or remove elements within the loop body, for example with
> "splice". So don't do that.
>
> So don't use a foreach:
>
> ##for (@data) {
> while (@data) { $_=shift @data;
>
> (although really you should use a lexical variable, rather than $_)
> while (@data) { my $new_var=shift @data;
>
> This still has the problem of what happens if @data is empty, the while
> statement sees that it is empty, falls through, and only then does some
> job shove something back into @data, too late to be noticed? In fact, I
> think this may be the root of your problem. You need to wait for all the
> stragglers to have come in and shoved whatever they have back onto the
> queue, then give it another go.
>
> do {
> while (@data) { $_=shift @data;
> ...<the rest of what used to be your for loop but is now a while loop>
> };
> $pm->wait_all_children;
> } while @data;

this sounds sensible. i need to wrap my brain around it :)

> I think so.
>
> I don't think that that is the problem, but why not do it anyway?

i'm not too comfortable with continues. i've read that they aren't used
much in real-life code. is the use of continue blocks indicative of a
wrong way of thinking? should i re-design my code so that it is not
necessary?
 

xhoster

it_says_BALLS_on_your forehead said:
> hmm, i thought (perhaps this is a misapprehension on my part--i will
> investigate) that:
>
> for (@data)
>
> ...automatically 'lexified' $_.

It automatically localizes it, which is quite different. If it lexified
it, then only things within the foreach block's lexical scope could stomp
on $_. But because it localizes it, anything within the foreach's dynamic
scope can stomp on it. Because of your use of modules and callbacks, there
is a lot of invisible stuff within the dynamic scope.

This shows the dynamic nature:

$ perl -le 'sub foo {$_="bar"}; foreach (1..10) {foo(); print $_}'
bar
bar
bar
....
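
For contrast, a short sketch of the same stomping callee run against both kinds of loop variable. The sub name stomp and the two result arrays are made up for the illustration; the point is only that a "my" loop variable is out of reach of code that assigns to the global $_.

```perl
use strict;
use warnings;

sub stomp { $_ = "bar" }          # hypothetical callee that assigns to global $_

my @seen_global;
for (1 .. 3) {                    # loop variable is the localized global $_
    stomp();
    push @seen_global, $_;
}

my @seen_lexical;
for my $n (1 .. 3) {              # loop variable is a lexical
    stomp();
    push @seen_lexical, $n;
}

print "@seen_global\n";           # prints: bar bar bar
print "@seen_lexical\n";          # prints: 1 2 3
```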

> i'm not too comfortable with continues. i've read that they aren't used
> much in real-life code. is the use of continue blocks indicative of a
> wrong way of thinking?

I very rarely use continue blocks myself, but when I see someone else's code
with a well-placed continue I always think to myself that that saved a lot
of messy code, and that I should remember to use them more often.

The one thing, other than failure to think of it, that does stop me from
using them more often is their lexical isolation from the main loop block.

foreach my $foo (@foo) {
##...
my $bar=something();
##...
next if something_else();
##..
} continue {
##D'oh
##Can't access $bar here
}
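
A minimal sketch of what a continue block does buy you: it runs at the end of every iteration, including the ones cut short by "next". The loop variable itself is visible there, it is only variables declared with "my" inside the main block that are not. The @log array and the odd/even rule are invented for the example.

```perl
use strict;
use warnings;

my @log;
for my $n (1 .. 4) {
    next if $n % 2;               # skip the odd numbers
    push @log, "work $n";
} continue {
    # Runs after every iteration, whether or not "next" was called.
    push @log, "cleanup $n";
}

print join(", ", @log), "\n";
# prints: cleanup 1, work 2, cleanup 2, cleanup 3, work 4, cleanup 4
```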


Anyway, they probably aren't all that rare.

#sorry about the wrap

~/perl_misc]$ find /usr/lib/perl5/ -name "*.pm" -exec perl -lne 'print
"$ARGV\t$_" if /continue\s*{/' {} \;


/usr/lib/perl5/5.8.0/File/Find.pm continue {
/usr/lib/perl5/5.8.0/File/Find.pm continue {
/usr/lib/perl5/5.8.0/ExtUtils/MakeMaker.pm } continue {
/usr/lib/perl5/5.8.0/ExtUtils/MM_MacOS.pm } continue {
/usr/lib/perl5/5.8.0/ExtUtils/MM_Unix.pm } continue {
....

> should i re-design my code so that it is not
> necessary?

Um, it already isn't necessary. As far as I can tell, the only reason
one *might* erroneously think it is necessary is if you tried to
conditionally "next" out of the run_on_finish code, which you thankfully
aren't trying to do. That would be a bad thing to do in general, even if
not using ForkManager. You are supposed to "return" out of subroutines,
or just let them finish naturally, not "next" out of them. If you use
warnings, you will get warnings when you do such uncouth things.

But since you are using ForkManager, the "next" in the subroutine would not
only be bad manners, it would also be highly broken. The run_on_finish
coderef is called from within the parent, not the child (otherwise, pushing
the token back on the stack would have no effect, the useful stack lives in
the parent, not the child). If the run_on_finish code invokes "next",
it is the parent that will suffer this invocation. And who knows what
havoc that will wreak.
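
A small sketch of that effect without any forking, assuming nothing beyond core Perl: a "next" inside a subroutine acts on whatever loop happens to be calling it, and with warnings enabled Perl complains. The names bad_callback, @reached and @caught are hypothetical.

```perl
use strict;
use warnings;

my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };   # collect warnings instead of printing

# Hypothetical callback that uses "next" where it should use "return".
sub bad_callback {
    my ($n) = @_;
    next if $n == 2;              # jumps out of the sub AND skips the caller's iteration
    return;
}

my @reached;
for my $n (1 .. 3) {
    bad_callback($n);
    push @reached, $n;            # never runs for $n == 2
}

print "@reached\n";               # prints: 1 3
print scalar(@caught), " warning(s)\n";
```

The collected warning should read "Exiting subroutine via next". With a ForkManager callback the loop being skipped would be whichever one is running in the parent at that moment.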

Xho
 
