More Forks!

Discussion in 'Perl Misc' started by it_says_BALLS_on_your forehead, Nov 2, 2005.

  1. I am getting erratic behavior with this script...

    #!/apps/webstats/bin/perl

    use File::Copy;
    use Parallel::ForkManager;


    my $pm = Parallel::ForkManager->new(10);

    $pm->run_on_start(
        sub {
            my ($pid, $ident) = @_;
            # print "** $ident started, pid: $pid\n";
        }
    );


    # NOTE: this MUST be assigned BEFORE the $pm->run_on_finish
    my @tokens = "log1" .. "log5";
    # test with a 'late' file.
    my @data = "file01" .. "file10";

    $pm->run_on_finish(
        sub {
            my ($pid, $exit_code, $ident) = @_;
            my ($outlog, $missingfile) = split /\|/, $_[2];
            push( @tokens, $outlog );
            # only push if the file is missing...HOW TO KNOW IF IT'S
            # MISSING?? or do a next if (-e $missing
            # ...but the next messes up the ForkManager b/c child process
            # does not go to $pm->finish and so tries to spawn an extra
            # process...breaks the limit
            # could put $pm->finish in continue block, search in
            # ...perl.misc Google Groups: fork ftp
            unless (-e "data/$missingfile") {
                push( @data, $missingfile );
            }
            # print "ID: $ident (pid: $pid) had exit code: $exit_code.\n";
        }
    );

    my $counter = 0;
    for (@data) {
        $counter++;
        if ($counter > 20 ) {
            print "\n*******counter was above 20: $counter**********************\n\n";
            last; # maybe this doesn't ->finish...
        }
        my $outfile = shift(@tokens);
        $pm->start("$outfile|$_") and next;
        print "$counter: ";
        print "reading data/$_: writing to log -$outfile-\n";
        my $func_ref = hello($_);
        $func_ref->("Simon");
        $pm->finish;
    }

    $pm->wait_all_children;
    #--- subs ---

    sub hello {
        my ($type) = @_;

        if ($type eq "file01") {
            print "type: $type\n";
            return \&func1;
        }
        else {
            print "type: $type\n";
            return \&func2;
        }
        return 0;
    }

    sub func1 {
        my ($noun) = @_;
        print "* using func1. you stink, $noun\n";
    }

    sub func2 {
        my ($noun) = @_;
        print "* * using func2. yay!!! it's $noun!!!\n";
    }

    #-----
    RESULT 1:
    [mymachine] ~/simon/1-perl > tryFork.pl
    1: reading data/file01: writing to log -log1-
    type: file01
    * using func1. you stink, Simon
    2: reading data/file02: writing to log -log2-
    type: file02
    * * using func2. yay!!! it's Simon!!!
    3: reading data/file03: writing to log -log3-
    type: file03
    * * using func2. yay!!! it's Simon!!!
    4: reading data/file04: writing to log -log4-
    type: file04
    * * using func2. yay!!! it's Simon!!!
    5: reading data/file05: writing to log -log5-
    type: file05
    * * using func2. yay!!! it's Simon!!!
    6: reading data/file06: writing to log -log1-
    type: file06
    * * using func2. yay!!! it's Simon!!!
    7: reading data/file07: writing to log -log2-
    type: file07
    * * using func2. yay!!! it's Simon!!!
    8: reading data/file08: writing to log -log3-
    type: file08
    * * using func2. yay!!! it's Simon!!!
    9: reading data/file09: writing to log -log4-
    type: file09
    * * using func2. yay!!! it's Simon!!!
    10: reading data/file10: writing to log -log5-
    type: file10
    * * using func2. yay!!! it's Simon!!!
    11: reading data/file07: writing to log -log1-
    type: file07
    * * using func2. yay!!! it's Simon!!!
    12: reading data/file08: writing to log -log2-
    type: file08
    * * using func2. yay!!! it's Simon!!!
    13: reading data/file09: writing to log -log3-
    type: file09
    * * using func2. yay!!! it's Simon!!!
    14: reading data/file10: writing to log -log4-
    type: file10
    * * using func2. yay!!! it's Simon!!!
    15: reading data/file07: writing to log -log5-
    type: file07
    * * using func2. yay!!! it's Simon!!!
    16: reading data/file08: writing to log -log1-
    type: file08
    * * using func2. yay!!! it's Simon!!!
    17: reading data/file09: writing to log -log2-
    type: file09
    * * using func2. yay!!! it's Simon!!!
    18: reading data/file10: writing to log -log3-
    type: file10
    * * using func2. yay!!! it's Simon!!!
    19: reading data/file07: writing to log -log4-
    type: file07
    * * using func2. yay!!! it's Simon!!!

    *******counter was above 20: 21**********************

    20: reading data/file08: writing to log -log5-
    type: file08
    * * using func2. yay!!! it's Simon!!!

    RESULT2:
    [mymachine] ~/simon/1-perl > tryFork.pl
    1: reading data/file01: writing to log -log1-
    type: file01
    * using func1. you stink, Simon
    2: reading data/file02: writing to log -log2-
    type: file02
    * * using func2. yay!!! it's Simon!!!
    3: reading data/file03: writing to log -log3-
    type: file03
    * * using func2. yay!!! it's Simon!!!
    4: reading data/file04: writing to log -log4-
    type: file04
    * * using func2. yay!!! it's Simon!!!
    5: reading data/file05: writing to log -log5-
    type: file05
    * * using func2. yay!!! it's Simon!!!
    6: reading data/file06: writing to log -log1-
    type: file06
    * * using func2. yay!!! it's Simon!!!
    7: reading data/file07: writing to log -log2-
    type: file07
    * * using func2. yay!!! it's Simon!!!
    8: reading data/file08: writing to log -log3-
    type: file08
    * * using func2. yay!!! it's Simon!!!
    9: reading data/file09: writing to log -log4-
    type: file09
    * * using func2. yay!!! it's Simon!!!
    10: reading data/file10: writing to log -log5-
    type: file10
    * * using func2. yay!!! it's Simon!!!
    11: reading data/file07: writing to log -log1-
    type: file07
    * * using func2. yay!!! it's Simon!!!
    12: reading data/file08: writing to log -log2-
    type: file08
    * * using func2. yay!!! it's Simon!!!
    13: reading data/file09: writing to log -log3-
    type: file09
    * * using func2. yay!!! it's Simon!!!
    14: reading data/file10: writing to log -log4-
    type: file10
    * * using func2. yay!!! it's Simon!!!
    15: reading data/file07: writing to log -log5-
    type: file07
    * * using func2. yay!!! it's Simon!!!
    16: reading data/file08: writing to log -log1-
    type: file08
    * * using func2. yay!!! it's Simon!!!
    17: reading data/file09: writing to log -log2-
    type: file09
    * * using func2. yay!!! it's Simon!!!

    ...sometimes it stops at 10, sometimes at 15...MOST of the time it gets
    all the way to 20. does anyone understand this behavior? do i need to
    stick a continue block around the $pm->finish? i understand that if i
    do this, the parent will also call finish, but this will be a silent
    no-op when called by the parent.
     
    it_says_BALLS_on_your forehead, Nov 2, 2005
    #1

  2. it_says_BALLS_on_your forehead

    Guest

    "it_says_BALLS_on_your forehead" <> wrote:
    > I am getting erratic behavior with this script...
    >
    > my $pm = Parallel::ForkManager->new(10);

    ...
    > my @tokens = "log1" .. "log5";

    ...
    > for (@data) {

    ...
    > my $outfile = shift(@tokens);


    You can have up to 10 parallel processes at a time, but you try to
    make them share 5 tokens. What happens if a sixth job is started before
    one of the previous 5 ends, so that @tokens is empty and $outfile ends
    up undefined? Could that cause the problem? The number of tokens
    should be at least one more than the max number of children.
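
To see why "one more than the max number of children" matters, here is a rough model of the parent loop only. It forks nothing: `count_undef_tokens` is a made-up helper, and it assumes the worst case where a child finishes only once the parent is blocked waiting for a free slot. The ordering (take a token first, wait for a slot second) is copied from the posted script.

```perl
use strict;
use warnings;

# Model of the parent loop only; no real processes are forked.
# Assumption (worst case): a child finishes only when the parent blocks
# because all $max_children slots are busy.
sub count_undef_tokens {
    my ($max_children, $n_tokens, $n_jobs) = @_;
    my @tokens = map("log$_", 1 .. $n_tokens);
    my @running;            # tokens held by "children" still in flight
    my $undef_seen = 0;

    for my $job (1 .. $n_jobs) {
        # Same order as the posted script: take a token first ...
        my $outfile = shift @tokens;
        $undef_seen++ unless defined $outfile;

        # ... and only then wait for a free slot, as start() would.
        if (@running >= $max_children) {
            my $returned = shift @running;
            # Stand-in for run_on_finish handing the token back.
            push @tokens, $returned if defined $returned;
        }
        push @running, $outfile;
    }
    return $undef_seen;
}

for my $n_tokens (5, 10, 11) {
    printf "%2d tokens: %d undefined\n",
        $n_tokens, count_undef_tokens(10, $n_tokens, 20);
}
```

In this model, 10 tokens for 10 children still hands out one undefined token (the 11th iteration shifts before any child has been reaped), while 11 tokens hands out none.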


    Your "for (@data) {" loop has a lot of stuff going on inside of it,
    including sort-of-asynchronous calls. Who knows if $_ is getting stomped
    on by something? Any non-trivial foreach loop should declare a "my"
    variable, rather than defaulting to $_.


    > $pm->run_on_finish(
    > sub {
    > my ($pid, $exit_code, $ident) = @_;

    ...
    > unless (-e "data/$missingfile") {
    > push( @data, $missingfile );
    > }
    > }
    > );


    This routine modifies @data while it is being iterated over with
    a for statement. That could cause problems.

    perldoc perlsyn:
    If any part of LIST is an array, "foreach" will get very confused if
    you add or remove elements within the loop body, for example with
    "splice". So don't do that.

    So don't use a foreach:

    ##for (@data) {
    while (@data) { $_=shift @data;

    (although really you should use a lexical variable, rather than $_)
    while (@data) { my $new_var=shift @data;

    This still has the problem of what happens if @data is empty, the while
    statement sees that it is empty, falls through, and only then does some
    job shove something back into @data, too late to be noticed? In fact, I
    think this may be the root of your problem. You need to wait for all the
    stragglers to have come in and shoved whatever they have back onto the
    queue, then give it another go.

    do {
        while (@data) { $_=shift @data;
            ...<the rest of what used to be your for loop but is now a while loop>
        };
        $pm->wait_all_children;
    } while @data;
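
A minimal sketch of that drain-then-retry shape, written with plain fork/waitpid rather than Parallel::ForkManager so that it runs with core Perl only. Everything here is an assumption for illustration: `run_queue` is a made-up name, the "work" is faked (each job exits with status 1 on its first attempt and 0 on its second), and a non-zero exit status is treated as "put me back on the queue".

```perl
use strict;
use warnings;
use POSIX ();

# Plain fork/waitpid stand-in for the ForkManager loop (an assumption,
# not the posted script). Exit status 1 means "requeue this job".
sub run_queue {
    my ($max_children, @queue) = @_;
    my (%job_of, %attempts, @done);

    # Block until one child exits; requeue its job if it asked for a retry.
    my $reap = sub {
        my $pid = waitpid(-1, 0);
        die "no child to reap" if $pid <= 0;
        my $job = delete $job_of{$pid};
        return unless defined $job;
        if ($? >> 8) { push @queue, $job } else { push @done, $job }
    };

    do {
        while (@queue) {
            my $job = shift @queue;     # lexical, and safe to modify @queue
            $reap->() while scalar(keys %job_of) >= $max_children;
            my $attempt = ++$attempts{$job};
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: pretend the first attempt fails.
                POSIX::_exit($attempt < 2 ? 1 : 0);
            }
            $job_of{$pid} = $job;       # parent remembers who runs what
        }
        # Wait for the stragglers; they may push work back onto @queue.
        $reap->() while %job_of;
    } while (@queue);

    return (\@done, \%attempts);
}

my ($done, $attempts) = run_queue(3, qw(file01 file02 file03 file04 file05));
print scalar(@$done), " jobs finished\n";
```

The outer do/while is what catches jobs requeued after the inner while has already seen an empty queue.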



    >
    > ...sometimes it stops at 10, sometimes at 15...MOST of the time it gets
    > all the way to 20. does anyone understand this behavior?


    I think so.

    > do i need to
    > stick a continue block around the $pm->finish? i understand that if i
    > do this, the parent will also call finish, but this will be a silent
    > no-op when called by the parent.


    I don't think that that is the problem, but why not do it anyway?

    Xho

     
    , Nov 2, 2005
    #2

  3. wrote:
    > "it_says_BALLS_on_your forehead" <> wrote:
    > > I am getting erratic behavior with this script...
    > >
    > > my $pm = Parallel::ForkManager->new(10);

    > ...
    > > my @tokens = "log1" .. "log5";

    > ...
    > > for (@data) {

    > ...
    > > my $outfile = shift(@tokens);

    >
    > You can have up to 10 parallel processes at a time, but you try to
    > make them share 5 tokens. What happens if a sixth job is started before
    > one of the previous 5 ends, therefore @tokens is empty and $outfile gets
    > set to be undefined? Could that cause the problem? The number of tokens
    > should be at least one more than the max number of children.
    >


    ahh! thank you for pointing that out.

    >
    > Your "for (@data) {" loop has a lot of stuff going on inside of it,
    > including sort-of-asynchronous calls. Who knows if $_ is getting stomped
    > on by something? Any non-trivial foreach loop should declare a "my"
    > variable, rather than defaulting to $_.
    >


    hmm, i thought (perhaps this is a misapprehension on my part--i will
    investigate) that:

    for (@data)

    ...automatically 'lexified' $_.

    >
    > > $pm->run_on_finish(
    > > sub {
    > > my ($pid, $exit_code, $ident) = @_;

    > ...
    > > unless (-e "data/$missingfile") {
    > > push( @data, $missingfile );
    > > }
    > > }
    > > );

    >
    > This routine modifies @data while it is being iterated over with
    > a for statement. That could cause problems.
    >


    yeah, i was debating whether or not this was a good idea. i'm thinking
    now that it's not.

    > perldoc perlsyn:
    > If any part of LIST is an array, "foreach" will get very confused if
    > you add or remove elements within the loop body, for example with
    > "splice". So don't do that.
    >
    > So don't use a foreach:
    >
    > ##for (@data) {
    > while (@data) { $_=shift @data;
    >
    > (although really you should use a lexical variable, rather than $_)
    > while (@data) { my $new_var=shift @data;
    >
    > This still has the problem of what happens if @data is empty, the while
    > statement sees that it is empty, falls through, and only then does some
    > job shove something back into @data, too late to be noticed? In fact, I
    > think this may be the root of your problem. You need to wait for all the
    > stragglers to have come in and shoved whatever they have back onto the
    > queue, then give it another go.
    >
    > do {
    > while (@data) { $_=shift @data;
    > ...<the rest of what used to be your for loop but is now a while loop>
    > };
    > $pm->wait_all_children;
    > } while @data;
    >


    this sounds sensible. i need to wrap my brain around it :)

    >
    >
    > >
    > > ...sometimes it stops at 10, sometimes at 15...MOST of the time it gets
    > > all the way to 20. does anyone understand this behavior?

    >
    > I think so.
    >
    > > do i need to
    > > stick a continue block around the $pm->finish? i understand that if i
    > > do this, the parent will also call finish, but this will be a silent
    > > no-op when called by the parent.

    >
    > I don't think that that is the problem, but why not do it anyway?
    >


    i'm not too comfortable with continues. i've read that they aren't used
    much in real-life code. is the use of continue blocks indicative of a
    wrong way of thinking? should i re-design my code so that it is not
    necessary?
     
    it_says_BALLS_on_your forehead, Nov 2, 2005
    #3
  4. it_says_BALLS_on_your forehead

    Guest

    "it_says_BALLS_on_your forehead" <> wrote:
    >
    > hmm, i thought (perhaps this is a misapprehension on my part--i will
    > investigate) that:
    >
    > for (@data)
    >
    > ...automatically 'lexified' $_.


    It automatically localizes it, which is quite different. If it lexified
    it, then only things within the foreach block's lexical scope could stomp
    on $_. But because it localizes it, anything within the foreach's dynamic
    scope can stomp on it. Because of your use of modules and callbacks, there
    is a lot of invisible stuff within the dynamic scope.

    This shows the dynamic nature:

    $ perl -le 'sub foo {$_="bar"}; foreach (1..10) {foo(); print $_}'
    bar
    bar
    bar
    ...
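
The same experiment as a small script, with a lexical loop variable next to it for comparison. The sub names `with_default_var` and `with_lexical_var` are made up for this sketch; `foo` is the same careless helper as in the one-liner.

```perl
use strict;
use warnings;

# A helper that carelessly assigns to the global $_.
sub foo { $_ = "bar" }

# foreach with the default variable: $_ is only localized, so foo(),
# which runs inside the loop's dynamic scope, can overwrite it.
sub with_default_var {
    my @seen;
    for (1 .. 3) { foo(); push @seen, $_ }
    return @seen;
}

# foreach with a "my" variable: foo() cannot see or touch $n.
sub with_lexical_var {
    my @seen;
    for my $n (1 .. 3) { foo(); push @seen, $n }
    return @seen;
}

print "default: ", join(",", with_default_var()), "\n";   # bar,bar,bar
print "lexical: ", join(",", with_lexical_var()), "\n";   # 1,2,3
```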


    > > > do i need to
    > > > stick a continue block around the $pm->finish? i understand that if i
    > > > do this, the parent will also call finish, but this will be a silent
    > > > no-op when called by the parent.

    > >
    > > I don't think that that is the problem, but why not do it anyway?
    > >

    >
    > i'm not too comfortable with continues. i've read that they aren't used
    > much in real-life code. is the use of continue blocks indicative of a
    > wrong way of thinking?


    I very rarely use continue blocks myself, but when I see someone else's code
    with a well-placed continue I always think to myself that that saved a lot
    of messy code, and that I should remember to use them more often.

    The one thing, other than failure to think of it, that does stop me from
    using them more often is their lexical isolation from the main loop block.

    foreach my $foo (@foo) {
        ##...
        my $bar=something();
        ##...
        next if something_else();
        ##..
    } continue {
        ##D'oh
        ##Can't access $bar here
    }
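
One possible way around that isolation, sketched under the assumption that it is acceptable to widen the variable's scope: declare it before the loop, so both the body and the continue block can see it. `summarize` and its log strings are invented for the example.

```perl
use strict;
use warnings;

sub summarize {
    my @items = @_;
    my @log;
    my $bar;                    # declared outside so both blocks see it
    foreach my $foo (@items) {
        $bar = length $foo;
        next if $bar > 3;       # the continue block still runs after next
        push @log, "short:$foo";
    } continue {
        # The loop variable $foo is visible here, and so is $bar now.
        push @log, "seen:$foo($bar)";
    }
    return @log;
}

print join(" ", summarize("ab", "abcdef")), "\n";
# short:ab seen:ab(2) seen:abcdef(6)
```

The cost is that $bar now outlives the loop, so it has to be assigned on every iteration before anything reads it.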


    Anyway, they probably aren't all that rare.

    #sorry about the wrap

    ~/perl_misc]$ find /usr/lib/perl5/ -name "*.pm" -exec perl -lne 'print
    "$ARGV\t$_" if /continue\s*{/' {} \;


    /usr/lib/perl5/5.8.0/File/Find.pm continue {
    /usr/lib/perl5/5.8.0/File/Find.pm continue {
    /usr/lib/perl5/5.8.0/ExtUtils/MakeMaker.pm } continue {
    /usr/lib/perl5/5.8.0/ExtUtils/MM_MacOS.pm } continue {
    /usr/lib/perl5/5.8.0/ExtUtils/MM_Unix.pm } continue {
    ...


    > should i re-design my code so that it is not
    > necessary?


    Um, it already isn't necessary. As far as I can tell, the only reason
    one *might* erroneously think it is necessary is if you tried to
    conditionally "next" out of the run_on_finish code, which you thankfully
    aren't trying to do. That would be a bad thing to do in general, even if
    not using ForkManager. You are supposed to "return" out of subroutines,
    or just let them finish naturally, not "next" out of them. If you use
    warnings, you will get warnings when you do such uncouth things.

    But since you are using ForkManager, the "next" in the subroutine would not
    only be bad manners, it would also be highly broken. The run_on_finish
    coderef is called from within the parent, not the child (otherwise, pushing
    the token back on the stack would have no effect, the useful stack lives in
    the parent, not the child). If the run_on_finish code invokes "next",
    it is the parent that will suffer this invocation. And who knows what
    havoc that will wreak.
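
A small core-Perl illustration of a callback that calls "next" versus one that returns. No ForkManager is involved; `drive`, `rude_callback` and `polite_callback` are made-up names, and the loop in `drive` merely stands in for whatever loop happens to be running in the caller when the callback fires.

```perl
use strict;
use warnings;

# Jumps out of the sub and into the caller's loop on odd numbers.
sub rude_callback   { next   if $_[0] % 2; return "even" }
# Leaves the sub the normal way.
sub polite_callback { return if $_[0] % 2; return "even" }

sub drive {
    my ($callback) = @_;
    my (@after_call, @warnings);
    local $SIG{__WARN__} = sub { push @warnings, @_ };   # collect warnings
    for my $n (1 .. 4) {
        $callback->($n);
        push @after_call, $n;   # skipped whenever the callback calls next
    }
    return (\@after_call, \@warnings);
}

for my $name (qw(rude polite)) {
    my $cb = $name eq 'rude' ? \&rude_callback : \&polite_callback;
    my ($after, $warnings) = drive($cb);
    print "$name: reached after-call code for @$after; ",
        scalar(@$warnings), " warning(s)\n";
}
```

With the rude callback, the code after the call only runs for 2 and 4, and perl warns "Exiting subroutine via next"; with the polite one, it runs for all four and stays quiet.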

    Xho

     
    , Nov 2, 2005
    #4
