fork() slowness.

Discussion in 'Perl Misc' started by dmitrym@gmail.com, Aug 16, 2005.

  1. Guest

    Afternoon ladies and gentlemen. I have a strange problem that
    I've run into, and I'd appreciate any and all input that you can
    provide.

    Essentially, I'm writing a quick and dirty stress test script
    with WWW::Mechanize. For the purposes of development, I worked on
    the single process version first. I ran my script and got about
    0.5 seconds per transaction. To make matters more interesting, I
    modified my script so that there's an option to spawn multiple
    threads with fork(). Here's where the interesting stuff begins.
    Even with two threads, the performance fell sharply, to around
    1.5 seconds/transaction. Increasing the thread count degraded
    things further.

    Now the obvious thing would be to assume that the remote service
    is slowing down, but this is not so. I simply ran two instances
    of the single thread version of my script and received the
    initial 0.5 seconds/transaction results from each. Furthermore,
    some of the stress scripts are implemented in jmeter, which seems
    to indicate that the remote service is capable of far more
    throughput than I'm getting from my perl script.

    I combed through the script but couldn't find anything that would
    explain the delay. I then went and wrote a simple script to
    emulate what I'm doing and the results are similar.

    If I start a single process, I get an average of about 0.0001
    seconds per "transaction" (which in this case is a few hundred
    iterations of a square root).

    If I start a single process which spawns 2 threads, I get about
    0.0004 seconds per transaction.

    If I start two single thread processes they'll run at around
    0.0001 seconds/transaction.

    So why is this happening? As far as I know, fork() will just copy
    all the data (which might be slow, but I don't care about that)
    and then the two processes go their separate ways. So why isn't
    this happening, and why are my processes slowing down
    significantly with a moderate increase in the number of threads?

    Here's the script I used for playing around. Modify $threads and
    $iterations and it'll dump the average time per transaction and
    the standard deviation.

    I'd appreciate any insight.

    Thanks.

    Script is attached below:
    -------------------------

    #!/usr/bin/perl -w

    use Time::HiRes qw( usleep ualarm gettimeofday tv_interval nanosleep );
    use Statistics::Descriptive;
    use File::Temp qw/ tempfile tempdir /;

    use strict;
    use diagnostics;

    my $threads = 2;
    my $iterations = 100;
    my ($fh, $STAT) = tempfile();

    # let every thread do its thing.
    for (my $i = 0; $i < $threads; ++$i)
    {
        my $pid = fork();
        if (!$pid)
        {
            sleep(1);
            for (my $k = 0; $k < $iterations; ++$k)
            {
                my $t0 = [gettimeofday()];
                for (my $j = 0; $j < 1000; ++$j)
                {
                    my $asdf = sqrt(123234234234123);
                }
                my $elapsed = tv_interval($t0);
                #print "$elapsed\n";
                writeStat($elapsed);
            }
            exit(0);
        }
    }

    # sleep waiting for threads to finish.
    for (my $i = 0; $i < $threads; ++$i)
    {
        wait();
    }

    analyzeStats();
    exit(0);
    # done

    sub writeStat
    {
        my $time = shift;

        open (STAT, ">>$STAT");
        print STAT $time . "\n";
        close(STAT);
    }

    sub analyzeStats
    {
        print "Analyzing stats...\n";
        open (STAT, "$STAT");
        my @data;
        while (<STAT>)
        {
            chomp;
            push (@data, $_);
        }

        if (@data)
        {
            my $stat = Statistics::Descriptive::Full->new();
            $stat->add_data(@data);

            print "Mean time per transaction: " . $stat->mean() . " seconds\n";
            if (scalar(@data) > 1)
            {
                print "StdDev : " . $stat->standard_deviation() . " seconds\n";
            }
        }
        else
        {
            print "There appears to be no data.\n";
        }
    }



    PS: I originally posted this in comp.lang.perl.moderated, but I'm
    not getting many responses there.

    Cheers.
     
    , Aug 16, 2005
    #1

  2. Guest

    wrote:
    ....
    >
    > I combed through the script but couldn't find anything that would
    > explain the delay. I then went and wrote a simple script to
    > emulate what I'm doing and the results are similar.


    It does a poor job of emulating the original case. Your simple script
    is almost completely CPU bound.

    >
    > If I start a single process, I get an average of about 0.0001
    > second per "transaction" (which in this case is a few hundreds of
    > iteration of a square root).
    >
    > If I start a single process which spawns 2 threads, I get about
    > 0.0004 seconds per transaction.


    Are you sure your numbers are statistically valid? I get differences in
    timing that are that large (a factor of 4) upon running the exact same code
    several times.

    >
    > If I start two single thread processes they'll run at around
    > 0.0001 seconds/transaction.


    Apiece, or combined?

    > So why is this happening? As far as I know, fork() will just copy
    > all the data (might be slow, but I dont care about that) and then
    > the two processes go their separate ways.


    What OS are you using?

    ....
    > sub writeStat
    > {
    >     my $time = shift;
    >
    >     open (STAT, ">>$STAT");
    >     print STAT $time . "\n";
    >     close(STAT);
    > }


    File corruption. Either use separate temp files per forked process, or
    implement file locking. Or just rip it all out. All of this re-opening
    and IO is dominating the program's runtime on my machine.
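
    For what it's worth, here is a minimal sketch of the locking option
    (assuming the same shared $STAT temp file and the same writeStat()
    interface as in the posted script; untested against it):

    use Fcntl qw(:flock);

    sub writeStat
    {
        my $time = shift;

        # Take an exclusive lock around each append so lines written by
        # different forked children cannot interleave.
        open(my $out, '>>', $STAT) or die "Cannot open $STAT: $!";
        flock($out, LOCK_EX)       or die "Cannot lock $STAT: $!";
        print {$out} "$time\n";
        close($out);   # closing the handle releases the lock
    }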

    Crap, I just realized something else. Your accounting is totally Enron-ed.
    Because you are using wall time to compute your intervals, every interval
    is counted multiple times. If 5 threads all start, and then a second later
    5 threads all finish, each thread will count itself as taking a second, for
    a total of 5 seconds, even though there was only 1 second there.
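
    One way to sidestep that double counting (a sketch only; note that
    times() has coarse granularity, typically 0.01 seconds, so it has to
    be measured over the whole loop rather than per transaction) is to
    charge each child only for the CPU time it actually used:

    # Inside the child, wrapped around the entire work loop:
    my ($user0, $sys0) = (times())[0, 1];

    for (my $k = 0; $k < $iterations; ++$k)
    {
        for (my $j = 0; $j < 1000; ++$j)
        {
            my $asdf = sqrt(123234234234123);
        }
    }

    my ($user1, $sys1) = (times())[0, 1];
    my $cpu = ($user1 - $user0) + ($sys1 - $sys0);
    printf "CPU seconds per transaction: %.6f\n", $cpu / $iterations;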

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Aug 17, 2005
    #2

  3. Guest

    wrote:
    > It does a poor job of emulating the original case. Your simple script
    > is almost completely CPU bound.


    I agree, but I expected a consistent response regardless of the
    "transaction" in the middle. The results were remarkably similar to the
    real script that's giving me trouble.

    > Are you sure your numbers are statistically valid? I get differences in
    > timing that are that large (a factor of 4) upon running the exact same code
    > several times.


    I ran a few hundred thousand iterations. I have no basis for this, but
    my gut feeling says that should be enough. Regardless, the results were
    consistent.


    > > If I start two single thread processes they'll run at around
    > > 0.0001 seconds/transaction.

    >
    > Apiece, or combined?


    The 0.0001 value was an average over a large number of transactions
    with a very small standard deviation (I'll verify this tomorrow).


    > What OS are you using?


    Linux, running kernel 2.6.11.

    > > sub writeStat
    > > ...

    > File corruption. Either use separate temp files per forked process, or
    > implement file locking. Or just rip it all out. All of this re-opening
    > and IO is dominating the program's runtime on my machine.


    Corruption why? I must be missing something crucial here, but the data
    is small enough (just a few bytes) to be written reliably as a single
    atomic write. Am I wrong about this? To further confuse the issue, all
    of the expected values are written (i.e. 10 threads x 1,000 iterations
    = 10,000 values).

    > Crap, I just realized something else. Your accounting is totally Enron-ed.
    > Because you are using wall time to compute your intervals, every interval
    > is counted multiple times. If 5 threads all start, and then a second later
    > 5 threads all finish, each thread will count itself as taking a second, for
    > a total of 5 seconds, even though there was only 1 second there.


    Hmm. Frankly, I'm a little confused. The time that I try to measure is
    the time the "transaction" takes to complete. Even if wall time is
    taken, it's unique per process (thread) so the time is appropriately
    written down. If two threads (processes?) take 1 second each to
    compute, surely the time returned will not be 2 seconds. Will it? More
    things to consider tomorrow.

    Thanks for the input, you've provided some directions for me.

    Cheers.
     
    , Aug 17, 2005
    #3
  4. Guest

    wrote:
    > wrote:
    > > It does a poor job of emulating the original case. Your simple script
    > > is almost completely CPU bound.

    >
    > I agree, but I expected a consistent response regardless of the
    > "transaction" in the middle. The results were remarkably similar to the
    > real script that's giving me trouble.


    But the results may look similar for dissimilar reasons.

    > > Are you sure your numbers are statistically valid? I get differences
    > > in timing that are that large (a factor of 4) upon running the exact
    > > same code several times.

    >
    > I ran a few hundred thousand iterations. I have no basis for this, but
    > my gut feeling says that should be enough. Regardless, the results were
    > consistent.


    Apparently that is one benefit of running on a real OS :)

    >
    > > > If I start two single thread processes they'll run at around
    > > > 0.0001 seconds/transaction.

    > >
    > > Apiece, or combined?

    >
    > The 0.0001 value was an average over a large number of transactions
    > with a very small standard deviation (I'll verify this tomorrow).


    OK. For me the stdev was nearly ten times the average, because a few of
    the "transactions" were taking hundreds of times longer than the typical
    transaction. Also, any given typical interval was about the same length
    as the granularity of the measurement, which concerns me.

    > > What OS are you using?

    >
    > Linux, running kernel 2.6.11.


    Good. From your interchangeable use of "fork" and "threads", I had kind
    of assumed you were using Windows. And I had been running your code on
    Windows, just for a lark.

    >
    > > > sub writeStat
    > > > ...

    > > File corruption. Either use separate temp files per forked process, or
    > > implement file locking. Or just rip it all out. All of this
    > > re-opening and IO is dominating the program's runtime on my machine.

    >
    > Corruption why? I must be missing something crucial here, but the data
    > is small enough (just a few bytes) to be written reliably as a single
    > atomic write. Am I wrong about this?


    Maybe on Linux that is the case. On Windows I definitely saw corruption.
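
    For what it's worth, a minimal sketch of the other option (one temp
    file per forked child, collected by the parent after the children have
    been reaped; the file naming here is just illustrative):

    use File::Temp qw(tempdir);

    my $dir = tempdir();   # not auto-removed; clean up manually if desired

    # In each child: append to a file named after its own pid.
    sub writeStat
    {
        my $time = shift;
        open(my $out, '>>', "$dir/stats.$$")
            or die "Cannot open $dir/stats.$$: $!";
        print {$out} "$time\n";
        close($out);
    }

    # In the parent, after all children have exited:
    my @data;
    for my $file (glob("$dir/stats.*"))
    {
        open(my $in, '<', $file) or die "Cannot open $file: $!";
        chomp(my @lines = <$in>);
        push @data, @lines;
        close($in);
    }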

    >
    > > Crap, I just realized something else. Your accounting is totally
    > > Enron-ed. Because you are using wall time to compute your intervals,
    > > every interval is counted multiple times. If 5 threads all start, and
    > > then a second later 5 threads all finish, each thread will count itself
    > > as taking a second, for a total of 5 seconds, even though there was
    > > only 1 second there.

    >
    > Hmm. Frankly, I'm a little confused. The time that I try to measure is
    > the time the "transaction" takes to complete. Even if wall time is
    > taken, it's unique per process (thread) so the time is appropriately
    > written down. If two threads (processes?) take 1 second each to
    > compute, surely the time returned will not be 2 seconds. Will it?


    If each process takes 1 second of CPU time, and there are two processes
    which run simultaneously, then each process will take 2 seconds of wall
    time because it is only getting the CPU half the time. Each process
    reports this 2 seconds of wall time as its interval. If you take the
    average of these reported intervals, the average is 2 seconds per
    transaction. But really, you had 2 transactions in two seconds, for 1
    second per transaction. The two seconds that one process counted are
    the same two seconds that the other process counted.

    Your accounting will only give you accurate results if a given process
    is never interrupted and kicked off the CPU at any point between the
    execution of:

    my $t0 = [gettimeofday()];

    and the execution of the corresponding:

    my $elapsed = tv_interval ($t0);

    This isn't a good assumption. In fact, since your processes presumably
    spend most of their time between these two lines, that is where they
    are *most* likely to be when they get interrupted.
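
    A sketch of the accounting I would trust more (reusing the $threads and
    $iterations variables from your script): let the parent time one
    wall-clock interval around the entire run and divide by the total
    number of transactions, instead of averaging the per-child intervals:

    use Time::HiRes qw(gettimeofday tv_interval);

    my $t_start = [gettimeofday()];

    for (my $i = 0; $i < $threads; ++$i)
    {
        my $pid = fork();
        if (!$pid)
        {
            # ... child performs its $iterations transactions ...
            exit(0);
        }
    }

    wait() for 1 .. $threads;   # reap all children

    my $wall  = tv_interval($t_start);
    my $total = $threads * $iterations;
    printf "Throughput: %.2f transactions/second\n", $total / $wall;
    printf "Mean wall time per transaction: %.6f seconds\n", $wall / $total;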

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Aug 17, 2005
    #4
