fork it

Discussion in 'Perl Misc' started by George Mpouras, Aug 8, 2013.

  1. The idea is to finish a "special" job as soon as possible by
    automatically splitting it and explicitly assigning its parts to
    dedicated cores. It worked with significant time benefits.
    Simplified, the idea is the following:





    my @Threads;

    foreach my $cpu (0 .. (qx[grep "processor" /proc/cpuinfo|wc -l] - 1))
    {
    my $answer = fork;
    die "Could not fork because \"$^E\"\n" unless defined $answer;

    if (0 == $answer)
    {
    print "I the thread $$ doing some parallel work\n";
    for(;;){}
    exit
    }
    else
    {
    push @Threads, $answer;
    `/bin/taskset -pc $cpu $answer`;
    }
    }

    print "Main program $$ waiting the threads: @Threads\n";
    sleep 20;
    foreach my $tid (@Threads) { kill(9,$tid) }
    George Mpouras, Aug 8, 2013
    #1

  2. George Mpouras <>
    writes:

    > The idea is to finish a "special" job as soon as possible by
    > automatically splitting it and explicitly assigning its parts to
    > dedicated cores.


    Often it will be hard to reliably split the job into a number of
    chunks that precisely fits the number of cores available.

    Most of the time I split the task into natural chunks and then maintain
    a queue of chunks to be processed. Then I fork a new process for each
    chunk with some code to ensure that I only have $N jobs running at the
    same time.

    This scheme is implemented by Parallel::ForkManager available on CPAN.

    https://metacpan.org/module/Parallel::ForkManager
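
    A minimal sketch of that pattern (untested here; $N, @chunks and
    process_chunk() are placeholders for your own limit, work list and
    worker code):

    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new($N);  # at most $N children at once

    for my $chunk (@chunks) {
        $pm->start and next;      # parent gets the child's PID, moves on
        process_chunk($chunk);    # child does the work for this chunk
        $pm->finish;              # child exits here
    }
    $pm->wait_all_children;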

    I have never cared about pinning a task to a specific CPU. Most of my
    tasks are inherently IO-bound and often run on servers doing other
    work at the same time. Both issues make pinning less important, if
    not outright harmful.

    //Makholm
    Peter Makholm, Aug 8, 2013
    #2

  3. > This scheme is implemented by Parallel::ForkManager available on CPAN.
    >
    > https://metacpan.org/module/Parallel::ForkManager


    this module is only a fork wrapper that keeps track of the threads


    > I have never cared about pinning a task to a specific CPU. Most of my
    > tasks are inherently IO-bound and often run on servers doing other
    > work at the same time. Both issues make pinning less important, if
    > not outright harmful.
    >
    > //Makholm
    >


    It can be bad or not; it depends on the scenario.
    George Mpouras, Aug 8, 2013
    #3
  4. George Mpouras <>
    writes:
    [...]
    > my @Threads;
    >
    > foreach my $cpu (0 .. (qx[grep "processor" /proc/cpuinfo|wc -l] - 1))
    > {
    > my $answer = fork;
    > die "Could not fork because \"$^E\"\n" unless defined $answer;
    >
    > if (0 == $answer)
    > {
    > print "I the thread $$ doing some parallel work\n";
    > for(;;){}
    > exit
    > }
    > else
    > {
    > push @Threads, $answer;
    > `/bin/taskset -pc $cpu $answer`;
    > }
    > }
    >
    > print "Main program $$ waiting the threads: @Threads\n";
    > sleep 20;
    > foreach my $tid (@Threads) { kill(9,$tid) }


    Consistent indentation would make this a lot easier to read.

    `/bin/taskset -pc $cpu $answer`;

    Since you're not using the output, this would be clearer as:

    system("/bin/taskset -pc $cpu $answer");

    or, even better:

    system('/bin/taskset', '-pc', $cpu, $answer);

    which avoids the overhead of invoking a shell.
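
    system also returns the command's exit status, so if you care whether
    taskset actually worked you can check that as well:

    system('/bin/taskset', '-pc', $cpu, $answer) == 0
        or warn "taskset failed: $?\n";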

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Working, but not speaking, for JetHead Development, Inc.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Aug 8, 2013
    #4
  5. On 8/8/2013 4:55 AM, George Mpouras wrote:
    > The idea is to finish a "special" job as soon as possible by
    > automatically splitting it and explicitly assigning its parts to
    > dedicated cores. It worked with significant time benefits.
    > Simplified, the idea is the following:
    > ...
    >
    > print "Main program $$ waiting the threads: @Threads\n";
    > sleep 20;
    > foreach my $tid (@Threads) { kill(9,$tid) }

    ^^^^^^^^^^^^

    Why not first try waitpid to reap the forked processes normally before
    resorting to SIGKILL...?
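
    Something along these lines (a rough sketch, assuming the children
    eventually finish their work and exit on their own):

    foreach my $pid (@Threads) {
        waitpid($pid, 0);   # block until this child exits, then reap it
    }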

    --
    Charles DeRykus
    Charles DeRykus, Aug 8, 2013
    #5
  6. On 8/8/2013 11:21 AM, Charles DeRykus wrote:
    > On 8/8/2013 4:55 AM, George Mpouras wrote:
    >> The idea is to finish a "special" job as soon as possible by
    >> automatically splitting it and explicitly assigning its parts to
    >> dedicated cores. It worked with significant time benefits.
    >> Simplified, the idea is the following:
    >> ...
    >>
    >> print "Main program $$ waiting the threads: @Threads\n";
    >> sleep 20;
    >> foreach my $tid (@Threads) { kill(9,$tid) }

    > ^^^^^^^^^^^^
    >
    > Why not first try waitpid to reap the forked processes normally before
    > resorting to SIGKILL...?
    >


    Also, it's preferable to try SIGTERM first:

    kill(TERM=>$tid) or kill(KILL=>$tid)

    --
    Charles DeRykus
    Charles DeRykus, Aug 8, 2013
    #6
  7. In article <>, says...
    >
    > George Mpouras <>
    > writes:
    >
    > > The idea is to finish a "special" job as soon as possible by
    > > automatically splitting it and explicitly assigning its parts to
    > > dedicated cores.

    >
    > Often it will be hard to reliably split the job into a number of
    > chunks that precisely fits the number of cores available.
    >
    > Most of the time I split the task into natural chunks and then maintain
    > a queue of chunks to be processed. Then I fork a new process for each
    > chunk with some code to ensure that I only have $N jobs running at the
    > same time.
    >
    > This scheme is implemented by Parallel::ForkManager available on CPAN.
    >
    > https://metacpan.org/module/Parallel::ForkManager
    >
    > I have never cared about pinning a task to a specific CPU. Most of my
    > tasks are inherently IO-bound and often run on servers doing other
    > work at the same time. Both issues make pinning less important, if
    > not outright harmful.
    >
    > //Makholm


    Can anyone describe the pros/cons of using fork here instead of the routines provided in "use
    threads"?

    http://perldoc.perl.org/threads.html

    John Black
    John Black, Aug 8, 2013
    #7
  8. John Black <> writes:
    > In article <>, says...
    >> George Mpouras <>
    >> writes:
    >>
    >> > The idea is to finish a "special" job as soon as possible by
    >> > automatically splitting it and explicitly assigning its parts to
    >> > dedicated cores.

    >>
    >> Often it will be hard to reliably split the job into a number of
    >> chunks that precisely fits the number of cores available.
    >>
    >> Most of the time I split the task into natural chunks and then maintain
    >> a queue of chunks to be processed. Then I fork a new process for each
    >> chunk with some code to ensure that I only have $N jobs running at the
    >> same time.
    >>
    >> This scheme is implemented by Parallel::ForkManager available on CPAN.
    >>
    >> https://metacpan.org/module/Parallel::ForkManager
    >>
    >> I have never cared about pinning a task to a specific CPU. Most of my
    >> tasks are inherently IO-bound and often run on servers doing other
    >> work at the same time. Both issues make pinning less important, if
    >> not outright harmful.
    >>
    >> //Makholm

    >
    > Can anyone describe the pros/cons of using fork here instead of the routines provided in "use
    > threads"?
    >
    > http://perldoc.perl.org/threads.html


    Perl threading support is based on the Windows 'fork emulation': It
    creates a new interpreter for every thread which doesn't share memory
    with any other interpreter. This means 'creating a thread' is
    expensive (comparable to forking prior to COW because it essentially
    'copies the complete process') and 'Perl threads' use a lot of
    memory. Also, inter-thread communication is (reportedly) based on tied
    variables, another 'slow' mechanism.
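
    For completeness, the interface itself is simple enough; a minimal
    sketch (only variables marked ':shared' are visible across the cloned
    interpreters):

    use threads;
    use threads::shared;

    my $done :shared = 0;             # shared (tied) between interpreters

    my $thr = threads->create(sub {   # clones the whole interpreter
        # ... parallel work ...
        $done = 1;
    });
    $thr->join;                       # wait for the thread and reap it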
    Rainer Weikusat, Aug 8, 2013
    #8
  9. On 8/8/2013 21:42, Charles DeRykus wrote:
    > kill(TERM=>$tid) or kill(KILL=>$tid)


    What I wrote was only a quick example; your recommendation of
    kill(TERM=>$tid) is very good for cases where you want to tell the
    process gently to go away.
    George Mpouras, Aug 8, 2013
    #9
  10. In article <>, says...
    > > Can anyone describe the pros/cons of using fork here instead of the routines provided in "use
    > > threads"?
    > >
    > > http://perldoc.perl.org/threads.html

    >
    > Perl threading support is based on the Windows 'fork emulation':


    Are you saying that "use threads" should not or cannot be used on non-Windows platforms? And
    am I to infer that fork cannot be used on Windows platforms?

    > It
    > creates a new interpreter for every thread which doesn't share memory
    > with any other interpreter.


    Does fork not have this problem? How would it not since Perl is interpreted?

    > This means 'creating a thread' is
    > expensive (comparable to forking prior to COW because it essentially


    What is COW?

    > 'copies the complete process') and 'Perl threads' use a lot of
    > memory. Also, inter-thread communication is (reportedly) based on tied
    > variables, another 'slow' mechanism.


    I've programmed lots in Perl but never multi-threaded yet. I have been wanting to make some
    of my programs multi-threaded but haven't gotten around to learning that yet. Ideally, I'd
    like a general solution that works for both Windows and non-Windows platforms. Is there a
    good way to do multi-threading that is platform independent?

    John Black
    John Black, Aug 9, 2013
    #10
  11. On 2013-08-09 15:20, John Black <> wrote:
    > In article <>, says...
    >> > Can anyone describe the pros/cons of using fork here instead of the
    >> > routines provided in "use threads"?
    >> >
    >> > http://perldoc.perl.org/threads.html

    >>
    >> Perl threading support is based on the Windows 'fork emulation':

    >
    > Are you saying that "use threads" should not or cannot be used on
    > non-Windows platforms? And am I to infer that fork cannot be used on
    > Windows platforms?


    The Windows OS doesn't have a fork system call. The perl implementation
    on Windows emulates it by cloning the interpreter and starting a new
    thread (basically what starting a thread with use threads does).


    >> It creates a new interpreter for every thread which doesn't share
    >> memory with any other interpreter.

    >
    > Does fork not have this problem? How would it not since Perl is
    > interpreted?


    Yes, but the OS can do this much more efficiently than the perl
    interpreter can.

    Firstly, the OS doesn't actually have to copy the process memory at all.
    It can just set all pages to read only, let both processes refer to
    those read-only pages and carry on. When one process tries to modify
    some data, it can intercept that write and create a writable copy of
    that page on the fly ("copy on write" = "COW"). That makes startup of
    the new process very fast and it often also saves time over the lifetime
    of the process since many pages will never be written to.

    Secondly, even if the OS has to (eventually) copy all the process
    memory, it can do it blindly via bulk copies, while the perl interpreter
    needs to copy each variable individually, manipulate pointers, etc.
    (because the copy will live in the same process, i.e. address space).


    >> This means 'creating a thread' is expensive (comparable to forking
    >> prior to COW because it essentially

    >
    > What is COW?


    See above.


    >> 'copies the complete process') and 'Perl threads' use a lot of
    >> memory. Also, inter-thread communication is (reportedly) based on tied
    >> variables, another 'slow' mechanism.

    >
    > I've programmed lots in Perl but never multi-threaded yet. I have
    > been wanting to make some of my programs multi-threaded but haven't
    > gotten around to learning that yet. Ideally, I'd like a general
    > solution that works for both Windows and non-Windows platforms. Is
    > there a good way to do multi-threading that is platform independent?


    Multithreading is platform-independent: it is just as horrible on Unix
    as on Windows. Fork is fast on Unix and slow on Windows (almost always
    - as Ben stated, there are some rare situations where threads are
    faster).

    If you are serious about writing an application which can exploit
    parallelism on Unix and Windows, you should probably look at
    POE for a high-level framework, ZeroMQ for relatively low-level
    building blocks or even the IPC primitives for the DIY approach.
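
    For the DIY end of that spectrum, the classic pipe-plus-fork pattern
    already goes a long way (a rough sketch, minimal error handling):

    pipe(my $reader, my $writer) or die "pipe: $!";

    my $pid = fork;
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                  # child: do the work, report back
        close $reader;
        print {$writer} "some result\n";
        exit 0;
    }

    close $writer;                    # parent: collect the result
    my $result = <$reader>;
    waitpid($pid, 0);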

    hp



    --
    _ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung:
    |_|_) | Sysadmin WSR | Man feilt solange an seinen Text um, bis
    | | | | die Satzbestandteile des Satzes nicht mehr
    __/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
    Peter J. Holzer, Aug 9, 2013
    #11
  12. John Black <> writes:
    > In article <>, says...
    >> > Can anyone describe the pros/cons of using fork here instead of the routines provided in "use
    >> > threads"?
    >> >
    >> > http://perldoc.perl.org/threads.html

    >>
    >> Perl threading support is based on the Windows 'fork emulation':

    >
    > Are you saying that "use threads" should not or cannot be used on
    > non-Windows platforms?


    Why do you think so?

    > And am I to infer that fork cannot be used on Windows platforms?


    Fork doesn't exist on VMSish platforms because Digital didn't invent
    it. Because of this, fork is emulated on Windows, cf

    The fork() emulation is implemented at the level of the Perl
    interpreter. What this means in general is that running
    fork() will actually clone the running interpreter and all its
    state, and run the cloned interpreter in a separate thread,
    beginning execution in the new thread just after the point
    where the fork() was called in the parent.
    [perldoc perlfork]

    >> It creates a new interpreter for every thread which doesn't share memory
    >> with any other interpreter. This means 'creating a thread' is
    >> expensive (comparable to forking prior to COW because it essentially


    [sensible ordering restored]

    > Does fork not have this problem? How would it not since Perl is interpreted?
    > What is COW?


    Copy-on-write. That's how fork has usually been implemented since
    System V, except on sufficiently ancient BSD-based systems (prior to
    4.4BSD), because it wasn't invented in Berkeley, and the guys who
    'invented stuff in Berkeley' (and thus got their code into the
    BSD kernel regardless of any technical merits it might have) agreed
    with the Digital tribe in one important aspect: nobody uses fork for
    multiprocessing (to this date, this is probably true for BSD because
    it faithfully preserves UNIX(*) V7 'fork failure semantics', IOW, large
    processes can't be forked[*]). Consequently, an even remotely efficient
    fork which supports actual concurrent execution isn't needed (a
    splendid example of a self-fulfilling prophecy). Back to COW: this
    means that, by default, parent and child share all 'physical memory'
    after a fork, with individual page copies being created as the need
    arises. This is beneficial to byte-compiled languages because both
    copies can share not only the interpreter code but also all 'read-only
    after compilation phase' parts of the interpreter state.

    [*] My opinion on a 'system which refuses to work because it is too
    afraid of possible future problems' is "I don't want it". YMMV.

    >> 'copies the complete process') and 'Perl threads' use a lot of
    >> memory. Also, inter-thread communication is (reportedly) based on tied
    >> variables, another 'slow' mechanism.

    >
    > I've programmed lots in Perl but never multi-threaded yet. I have
    > been wanting to make some of my programs multi-threaded but haven't
    > gotten around to learning that yet. Ideally, I'd like a general
    > solution that works for both Windows and non-Windows platforms. Is
    > there a good way to do multi-threading that is platform independent?


    If you don't mind creating code whose runtime behaviour is even more
    'UNIX(*) V7 style' than 4.4BSD, the Perl threading support is the way
    to go. If 'predominantly targeting UNIX(*)', I think fork is the
    better choice, especially as this will auto-degrade to something which
    works on Windows. OTOH, it pretty much won't work anywhere else. But
    one gets some 'nice' features in return such as "multiple threads of
    execution which can't crash each other".
    Rainer Weikusat, Aug 9, 2013
    #12
