fork it

Discussion in 'Perl Misc' started by George Mpouras, Aug 8, 2013.

  1. The idea is to finish a "special" job as soon as possible by
    automatically splitting it and explicitly assigning its parts to
    dedicated cores. It worked with significant time benefits.
    Simplified, the idea is the following:





    my @Threads;

    foreach my $cpu (0 .. (qx[grep "processor" /proc/cpuinfo|wc -l] - 1))
    {
    my $answer = fork;
    die "Could not fork because \"$^E\"\n" unless defined $answer;

    if (0 == $answer)
    {
    print "I the thread $$ doing some parallel work\n";
    for(;;){}
    exit
    }
    else
    {
    push @Threads, $answer;
    `/bin/taskset -pc $cpu $answer`;
    }
    }

    print "Main program $$ waiting the threads: @Threads\n";
    sleep 20;
    foreach my $tid (@Threads) { kill(9,$tid) }
    George Mpouras, Aug 8, 2013
    #1

  2. George Mpouras <>
    writes:

    > The idea is to finish a "special" job as soon as possible by
    > automatically splitting it and explicitly assigning its parts to
    > dedicated cores.


    Often it will be hard to reliably split the job into a number of
    chunks that precisely fits the number of cores available.

    Most of the time I split the task into natural chunks and then maintain
    a queue of chunks to be processed. Then I fork a new process for each
    chunk with some code to ensure that I only have $N jobs running at the
    same time.

    This scheme is implemented by Parallel::ForkManager available on CPAN.

    https://metacpan.org/module/Parallel::ForkManager
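
    A minimal sketch of that pattern (untested here; $N, @chunks and
    process_chunk() are placeholders for your own limit, work list and
    worker code):

    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new($N);  # at most $N children at once

    for my $chunk (@chunks) {
        $pm->start and next;      # parent gets the child's PID, moves on
        process_chunk($chunk);    # child does the work for this chunk
        $pm->finish;              # child exits here
    }
    $pm->wait_all_children;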

    I have never cared about pinning a task to a specific CPU. Most of my
    tasks are inherently IO-bound and often run on servers doing other
    work at the same time. Both issues make pinning less important, if
    not outright harmful.

    //Makholm
    Peter Makholm, Aug 8, 2013
    #2

  3. > This scheme is implemented by Parallel::ForkManager available on CPAN.
    >
    > https://metacpan.org/module/Parallel::ForkManager


    this module is only a fork wrapper that keeps track of the threads


    > I have never cared about pinning a task to a specific CPU. Most of my
    > tasks are inherently IO-bound and often run on servers doing other
    > work at the same time. Both issues make pinning less important, if
    > not outright harmful.
    >
    > //Makholm
    >


    It can be bad or not; it depends on the scenario.
    George Mpouras, Aug 8, 2013
    #3
  4. George Mpouras <>
    writes:
    [...]
    > my @Threads;
    >
    > foreach my $cpu (0 .. (qx[grep "processor" /proc/cpuinfo|wc -l] - 1))
    > {
    > my $answer = fork;
    > die "Could not fork because \"$^E\"\n" unless defined $answer;
    >
    > if (0 == $answer)
    > {
    > print "I the thread $$ doing some parallel work\n";
    > for(;;){}
    > exit
    > }
    > else
    > {
    > push @Threads, $answer;
    > `/bin/taskset -pc $cpu $answer`;
    > }
    > }
    >
    > print "Main program $$ waiting the threads: @Threads\n";
    > sleep 20;
    > foreach my $tid (@Threads) { kill(9,$tid) }


    Consistent indentation would make this a lot easier to read.

    `/bin/taskset -pc $cpu $answer`;

    Since you're not using the output, this would be clearer as:

    system("/bin/taskset -pc $cpu $answer");

    or, even better:

    system('/bin/taskset', '-pc', $cpu, $answer);

    which avoids the overhead of invoking a shell.
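
    system also returns the command's exit status, so if you care whether
    taskset actually worked you can check that as well:

    system('/bin/taskset', '-pc', $cpu, $answer) == 0
        or warn "taskset failed: $?\n";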

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Working, but not speaking, for JetHead Development, Inc.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Aug 8, 2013
    #4
  5. On 8/8/2013 4:55 AM, George Mpouras wrote:
    > The idea is to finish a "special" job as soon as possible by
    > automatically splitting it and explicitly assigning its parts to
    > dedicated cores. It worked with significant time benefits.
    > Simplified, the idea is the following:
    > ...
    >
    > print "Main program $$ waiting the threads: @Threads\n";
    > sleep 20;
    > foreach my $tid (@Threads) { kill(9,$tid) }

    ^^^^^^^^^^^^

    Why not first try waitpid to reap the forked processes normally before
    resorting to SIGKILL...?
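
    Something along these lines (a rough sketch, assuming the children
    eventually finish their work and exit on their own):

    foreach my $pid (@Threads) {
        waitpid($pid, 0);   # block until this child exits, then reap it
    }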

    --
    Charles DeRykus
    Charles DeRykus, Aug 8, 2013
    #5
  6. On 8/8/2013 11:21 AM, Charles DeRykus wrote:
    > On 8/8/2013 4:55 AM, George Mpouras wrote:
    >> The idea is to finish a "special" job as soon as possible by
    >> automatically splitting it and explicitly assigning its parts to
    >> dedicated cores. It worked with significant time benefits.
    >> Simplified, the idea is the following:
    >> ...
    >>
    >> print "Main program $$ waiting the threads: @Threads\n";
    >> sleep 20;
    >> foreach my $tid (@Threads) { kill(9,$tid) }

    > ^^^^^^^^^^^^
    >
    > Why not first try waitpid to reap the forked processes normally before
    > resorting to SIGKILL...?
    >


    Also, it's preferable to try SIGTERM first:

    kill(TERM=>$tid) or kill(KILL=>$tid)

    --
    Charles DeRykus
    Charles DeRykus, Aug 8, 2013
    #6
  7. In article <>, says...
    >
    > George Mpouras <>
    > writes:
    >
    > > The idea is to finish a "special" job as soon as possible by
    > > automatically splitting it and explicitly assigning its parts to
    > > dedicated cores.

    >
    > Often it will be hard to reliably split the job into a number of
    > chunks that precisely fits the number of cores available.
    >
    > Most of the time I split the task into natural chunks and then maintain
    > a queue of chunks to be processed. Then I fork a new process for each
    > chunk with some code to ensure that I only have $N jobs running at the
    > same time.
    >
    > This scheme is implemented by Parallel::ForkManager available on CPAN.
    >
    > https://metacpan.org/module/Parallel::ForkManager
    >
    > I have never cared about pinning a task to a specific CPU. Most of my
    > tasks are inherently IO-bound and often run on servers doing other
    > work at the same time. Both issues make pinning less important, if
    > not outright harmful.
    >
    > //Makholm


    Can anyone describe the pros/cons of using fork here instead of the routines provided in "use
    threads"?

    http://perldoc.perl.org/threads.html

    John Black
    John Black, Aug 8, 2013
    #7
  8. John Black <> writes:
    > In article <>, says...
    >> George Mpouras <>
    >> writes:
    >>
    >> > The idea is to finish a "special" job as soon as possible by
    >> > automatically splitting it and explicitly assigning its parts to
    >> > dedicated cores.

    >>
    >> Often it will be hard to reliably split the job into a number of
    >> chunks that precisely fits the number of cores available.
    >>
    >> Most of the time I split the task into natural chunks and then maintain
    >> a queue of chunks to be processed. Then I fork a new process for each
    >> chunk with some code to ensure that I only have $N jobs running at the
    >> same time.
    >>
    >> This scheme is implemented by Parallel::ForkManager available on CPAN.
    >>
    >> https://metacpan.org/module/Parallel::ForkManager
    >>
    >> I have never cared about pinning a task to a specific CPU. Most of my
    >> tasks are inherently IO-bound and often run on servers doing other
    >> work at the same time. Both issues make pinning less important, if
    >> not outright harmful.
    >>
    >> //Makholm

    >
    > Can anyone describe the pros/cons of using fork here instead of the routines provided in "use
    > threads"?
    >
    > http://perldoc.perl.org/threads.html


    Perl threading support is based on the Windows 'fork emulation': It
    creates a new interpreter for every thread which doesn't share memory
    with any other interpreter. This means 'creating a thread' is
    expensive (comparable to forking prior to COW because it essentially
    'copies the complete process') and 'Perl threads' use a lot of
    memory. Also, inter-thread communication is (reportedly) based on tied
    variables, another 'slow' mechanism.
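
    For completeness, the interface itself is simple enough; a minimal
    sketch (only variables marked ':shared' are visible across the cloned
    interpreters):

    use threads;
    use threads::shared;

    my $done :shared = 0;             # shared (tied) between interpreters

    my $thr = threads->create(sub {   # clones the whole interpreter
        # ... parallel work ...
        $done = 1;
    });
    $thr->join;                       # wait for the thread and reap it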
    Rainer Weikusat, Aug 8, 2013
    #8
  9. On 8/8/2013 21:42, Charles DeRykus wrote:
    > kill(TERM=>$tid) or kill(KILL=>$tid)


    What I wrote was only a quick example; your recommendation of
    kill(TERM=>$tid) is very good for cases where you want to tell the
    process gently to go away.
    George Mpouras, Aug 8, 2013
    #9
  10. In article <>, says...
    > > Can anyone describe the pros/cons of using fork here instead of the routines provided in "use
    > > threads"?
    > >
    > > http://perldoc.perl.org/threads.html

    >
    > Perl threading support is based on the Windows 'fork emulation':


    Are you saying that "use threads" should not or cannot be used on non-Windows platforms? And
    am I to infer that fork cannot be used on Windows platforms?

    > It
    > creates a new interpreter for every thread which doesn't share memory
    > with any other interpreter.


    Does fork not have this problem? How would it not since Perl is interpreted?

    > This means 'creating a thread' is
    > expensive (comparable to forking prior to COW because it essentially


    What is COW?

    > 'copies the complete process') and 'Perl threads' use a lot of
    > memory. Also, inter-thread communication is (reportedly) based on tied
    > variables, another 'slow' mechanism.


    I've programmed lots in Perl but never multi-threaded yet. I have been wanting to make some
    of my programs multi-threaded but haven't gotten around to learning that yet. Ideally, I'd
    like a general solution that works for both Windows and non-Windows platforms. Is there a
    good way to do multi-threading that is platform independent?

    John Black
    John Black, Aug 9, 2013
    #10
  11. On 2013-08-09 15:20, John Black <> wrote:
    > In article <>, says...
    >> > Can anyone describe the pros/cons of using fork here instead of the
    >> > routines provided in "use threads"?
    >> >
    >> > http://perldoc.perl.org/threads.html

    >>
    >> Perl threading support is based on the Windows 'fork emulation':

    >
    > Are you saying that "use threads" should not or cannot be used on
    > non-Windows platforms? And am I to infer that fork cannot be used on
    > Windows platforms?


    The Windows OS doesn't have a fork system call. The perl implementation
    on Windows emulates it by cloning the interpreter and starting a new
    thread (basically what starting a thread with use threads does).


    >> It creates a new interpreter for every thread which doesn't share
    >> memory with any other interpreter.

    >
    > Does fork not have this problem? How would it not since Perl is
    > interpreted?


    Yes, but the OS can do this much more efficiently than the perl
    interpreter can.

    Firstly, the OS doesn't actually have to copy the process memory at all.
    It can just set all pages to read only, let both processes refer to
    those read-only pages and carry on. When one process tries to modify
    some data, it can intercept that write and create a writable copy of
    that page on the fly ("copy on write" = "COW"). That makes startup of
    the new process very fast and it often also saves time over the lifetime
    of the process since many pages will never be written to.

    Secondly, even if the OS has to (eventually) copy all the process
    memory, it can do it blindly via bulk copies, while the perl interpreter
    needs to copy each variable individually, manipulate pointers, etc.
    (because the copy will live in the same process, i.e. address space).


    >> This means 'creating a thread' is expensive (comparable to forking
    >> prior to COW because it essentially

    >
    > What is COW?


    See above.


    >> 'copies the complete process') and 'Perl threads' use a lot of
    >> memory. Also, inter-thread communication is (reportedly) based on tied
    >> variables, another 'slow' mechanism.

    >
    > I've programmed lots in Perl but never multi-threaded yet. I have
    > been wanting to make some of my programs multi-threaded but haven't
    > gotten around to learning that yet. Ideally, I'd like a general
    > solution that works for both Windows and non-Windows platforms. Is
    > there a good way to do multi-threading that is platform independent?


    Multithreading is platform-independent: it is just as horrible on Unix
    as on Windows. Fork is fast on Unix and slow on Windows (almost always
    - as Ben stated, there are some rare situations where threads are
    faster).

    If you are serious about writing an application which can exploit
    parallelism on Unix and Windows, you should probably look at
    POE for a high-level framework, ZeroMQ for relatively low-level
    building blocks or even the IPC primitives for the DIY approach.
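
    For the DIY end of that spectrum, the classic pipe-plus-fork pattern
    already goes a long way (a rough sketch, minimal error handling):

    pipe(my $reader, my $writer) or die "pipe: $!";

    my $pid = fork;
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                  # child: do the work, report back
        close $reader;
        print {$writer} "some result\n";
        exit 0;
    }

    close $writer;                    # parent: collect the result
    my $result = <$reader>;
    waitpid($pid, 0);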

    hp



    --
    _ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung:
    |_|_) | Sysadmin WSR | Man feilt solange an seinen Text um, bis
    | | | | die Satzbestandteile des Satzes nicht mehr
    __/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
    Peter J. Holzer, Aug 9, 2013
    #11
  12. John Black <> writes:
    > In article <>, says...
    >> > Can anyone describe the pros/cons of using fork here instead of the routines provided in "use
    >> > threads"?
    >> >
    >> > http://perldoc.perl.org/threads.html

    >>
    >> Perl threading support is based on the Windows 'fork emulation':

    >
    > Are you saying that "use threads" should not or cannot be used on
    > non-Windows platforms?


    Why do you think so?

    > And am I to infer that fork cannot be used on Windows platforms?


    Fork doesn't exist on VMSish platforms because Digital didn't invent
    it. Because of this, fork is emulated on Windows, cf

    The fork() emulation is implemented at the level of the Perl
    interpreter. What this means in general is that running
    fork() will actually clone the running interpreter and all its
    state, and run the cloned interpreter in a separate thread,
    beginning execution in the new thread just after the point
    where the fork() was called in the parent.
    [perldoc perlfork]

    >> It creates a new interpreter for every thread which doesn't share memory
    >> with any other interpreter. This means 'creating a thread' is
    >> expensive (comparable to forking prior to COW because it essentially


    [sensible ordering restored]

    > Does fork not have this problem? How would it not since Perl is interpreted?
    > What is COW?


    Copy-on-write. That's how fork has usually been implemented since
    System V, except on sufficiently ancient BSD-based systems (prior to
    4.4BSD), because it wasn't invented in Berkeley, and the guys who
    'invented stuff in Berkeley' (and thus got their code into the
    BSD kernel regardless of any technical merits it might have) agreed
    with the Digital tribe in one important aspect: nobody uses fork for
    multiprocessing (to this date, this is probably true for BSD because
    it faithfully preserves UNIX(*) V7 'fork failure semantics', IOW, large
    processes can't be forked[*]). Consequently, an even remotely efficient
    fork which supports actual concurrent execution isn't needed (a
    splendid example of a self-fulfilling prophecy). Back to COW: this
    means that, by default, parent and child share all 'physical memory'
    after a fork, with individual page copies being created as the need
    arises. This is beneficial to byte-compiled languages because both
    copies can share not only the interpreter code but also all 'read-only
    after compilation phase' parts of the interpreter state.

    [*] My opinion on a 'system which refuses to work because it is too
    afraid of possible future problems' is "I don't want it". YMMV.

    >> 'copies the complete process') and 'Perl threads' use a lot of
    >> memory. Also, inter-thread communication is (reportedly) based on tied
    >> variables, another 'slow' mechanism.

    >
    > I've programmed lots in Perl but never multi-threaded yet. I have
    > been wanting to make some of my programs multi-threaded but haven't
    > gotten around to learning that yet. Ideally, I'd like a general
    > solution that works for both Windows and non-Windows platforms. Is
    > there a good way to do multi-threading that is platform independent?


    If you don't mind creating code whose runtime behaviour is even more
    'UNIX(*) V7 style' than 4.4BSD, the Perl threading support is the way
    to go. If 'predominantly targeting UNIX(*)', I think fork is the
    better choice, especially as this will auto-degrade to something which
    works on Windows. OTOH, it pretty much won't work anywhere else. But
    one gets some 'nice' features in return such as "multiple threads of
    execution which can't crash each other".
    Rainer Weikusat, Aug 9, 2013
    #12
