fork() slowness.

Discussion in 'Perl Misc' started by dmitrym@gmail.com, Aug 16, 2005.

  1. Guest

    Afternoon ladies and gentlemen. I have a strange problem that
    I've run into, and I'd appreciate any and all input that you can
    provide.

    Essentially, I'm writing a quick and dirty stress test script
    with WWW::Mechanize. For the purposes of development, I worked on
    the single process version first. I ran my script and got about
    0.5 seconds per transaction. To make matters more interesting, I
    modified my script so that there's an option to spawn multiple
    threads with fork(). Here's where the interesting stuff begins.
    Even with two threads, the performance fell sharply, to around
    1.5 seconds/transaction. Increasing the thread count degraded
    things further.

    Now the obvious thing would be to assume that the remote service
    is slowing down, but this is not so. I simply ran two instances
    of the single thread version of my script and received the
    initial 0.5 seconds/transaction results from each. Furthermore,
    some of the stress scripts are implemented in jmeter, which seems
    to indicate that the remote service is capable of far more
    throughput than I'm getting from my perl script.

    I combed through the script but couldn't find anything that would
    explain the delay. I then went and wrote a simple script to
    emulate what I'm doing and the results are similar.

    If I start a single process, I get an average of about 0.0001
    seconds per "transaction" (which in this case is a few hundred
    iterations of a square root).

    If I start a single process which spawns 2 threads, I get about
    0.0004 seconds per transaction.

    If I start two single thread processes they'll run at around
    0.0001 seconds/transaction.

    So why is this happening? As far as I know, fork() will just copy
    all the data (which might be slow, but I don't care about that)
    and then the two processes go their separate ways. So why isn't
    this happening, and why are my processes slowing down
    significantly with a moderate increase in the number of threads?

    Here's the script I used for playing around. Modify $threads and
    $iterations and it'll dump the average time per transaction and
    the standard deviation.

    I'd appreciate any insight.

    Thanks.

    Script is attached below:
    -------------------------

    #!/usr/bin/perl -w

    use Time::HiRes qw( usleep ualarm gettimeofday tv_interval nanosleep );
    use Statistics::Descriptive;
    use File::Temp qw/ tempfile tempdir /;

    use strict;
    use diagnostics;

    my $threads = 2;
    my $iterations = 100;
    my ($fh, $STAT) = tempfile();

    # let every thread do its thing.
    for (my $i = 0; $i < $threads; ++$i)
    {
        my $pid = fork();
        if (!$pid)
        {
            sleep(1);
            for (my $k = 0; $k < $iterations; ++$k)
            {
                my $t0 = [gettimeofday()];
                for (my $j = 0; $j < 1000; ++$j)
                {
                    my $asdf = sqrt(123234234234123);
                }
                my $elapsed = tv_interval($t0);
                #print "$elapsed\n";
                writeStat($elapsed);
            }
            exit(0);
        }
    }

    # sleep waiting for threads to finish.
    for (my $i = 0; $i < $threads; ++$i)
    {
        wait();
    }

    analyzeStats();
    exit(0);
    # done

    sub writeStat
    {
        my $time = shift;

        open (STAT, ">>$STAT");
        print STAT $time . "\n";
        close(STAT);
    }

    sub analyzeStats
    {
        print "Analyzing stats...\n";
        open (STAT, "$STAT");
        my @data;
        while (<STAT>)
        {
            chomp;
            push (@data, $_);
        }

        if (@data)
        {
            my $stat = Statistics::Descriptive::Full->new();
            $stat->add_data(@data);

            print "Mean time per transaction: " . $stat->mean() . " seconds\n";
            if (scalar(@data) > 1)
            {
                print "StdDev : " . $stat->standard_deviation() . " seconds\n";
            }
        }
        else
        {
            print "There appears to be no data.\n";
        }
    }



    PS: I originally posted this in comp.lang.perl.moderated, but I'm
    not getting many responses there.

    Cheers.
     
    , Aug 16, 2005
    #1

  2. Guest

    wrote:
    ....
    >
    > I combed through the script but couldn't find anything that would
    > explain the delay. I then went and wrote a simple script to
    > emulate what I'm doing and the results are similar.


    It does a poor job of emulating the original case. Your simple script
    is almost completely CPU bound.

    >
    > If I start a single process, I get an average of about 0.0001
    > second per "transaction" (which in this case is a few hundreds of
    > iteration of a square root).
    >
    > If I start a single process which spawns 2 threads, I get about
    > 0.0004 seconds per transaction.


    Are you sure your numbers are statistically valid? I get differences in
    timing that are that large (a factor of 4) upon running the exact same code
    several times.

    >
    > If I start two single thread processes they'll run at around
    > 0.0001 seconds/transaction.


    Apiece, or combined?

    > So why is this happening? As far as I know, fork() will just copy
    > all the data (might be slow, but I dont care about that) and then
    > the two processes go their separate ways.


    What OS are you using?

    ....
    > sub writeStat
    > {
    >     my $time = shift;
    >
    >     open (STAT, ">>$STAT");
    >     print STAT $time . "\n";
    >     close(STAT);
    > }


    File corruption. Either use separate temp files per forked process, or
    implement file locking. Or just rip it all out. All of this re-opening
    and IO is dominating the program's runtime on my machine.
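
    For what it's worth, here is a minimal sketch of the locking option
    (assuming the same shared $STAT temp file and the same writeStat()
    interface as in the posted script; untested against it):

    use Fcntl qw(:flock);

    sub writeStat
    {
        my $time = shift;

        # Take an exclusive lock around each append so lines written by
        # different forked children cannot interleave.
        open(my $out, '>>', $STAT) or die "Cannot open $STAT: $!";
        flock($out, LOCK_EX)       or die "Cannot lock $STAT: $!";
        print {$out} "$time\n";
        close($out);   # closing the handle releases the lock
    }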

    Crap, I just realized something else. Your accounting is totally Enron-ed.
    Because you are using wall time to compute your intervals, every interval
    is counted multiple times. If 5 threads all start, and then a second later
    5 threads all finish, each thread will count itself as taking a second, for
    a total of 5 seconds, even though there was only 1 second there.
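
    One way to sidestep that double counting (a sketch only; note that
    times() has coarse granularity, typically 0.01 seconds, so it has to
    be measured over the whole loop rather than per transaction) is to
    charge each child only for the CPU time it actually used:

    # Inside the child, wrapped around the entire work loop:
    my ($user0, $sys0) = (times())[0, 1];

    for (my $k = 0; $k < $iterations; ++$k)
    {
        for (my $j = 0; $j < 1000; ++$j)
        {
            my $asdf = sqrt(123234234234123);
        }
    }

    my ($user1, $sys1) = (times())[0, 1];
    my $cpu = ($user1 - $user0) + ($sys1 - $sys0);
    printf "CPU seconds per transaction: %.6f\n", $cpu / $iterations;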

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Aug 17, 2005
    #2

  3. Guest

    wrote:
    > It does a poor job of emulating the original case. Your simple script
    > is almost completely CPU bound.


    I agree, but I expected a consistent response regardless of the
    "transaction" in the middle. The results were remarkably similar to the
    real script that's giving me trouble.

    > Are you sure your numbers are statistically valid? I get differences in
    > timing that are that large (a factor of 4) upon running the exact same code
    > several times.


    I ran a few hundred thousand iterations. I have no basis for this, but
    my gut feeling says that should be enough. Regardless, the results were
    consistent.


    > > If I start two single thread processes they'll run at around
    > > 0.0001 seconds/transaction.

    >
    > Apiece, or combined?


    The 0.0001 value was an average over a large number of transactions
    with a very small standard deviation (I'll verify this tomorrow).


    > What OS are you using?


    Linux, running kernel 2.6.11.

    > > sub writeStat
    > > ...

    > File corruption. Either use separate temp files per forked process, or
    > implement file locking. Or just rip it all out. All of this re-opening
    > and IO is dominating the program's runtime on my machine.


    Corruption why? I must be missing something crucial here, but the data
    is small enough (just a few bytes) to be written reliably as a single
    atomic write. Am I wrong about this? To further confuse the issue, all
    of the expected values are written (i.e. 10 threads x 1,000 iterations
    = 10,000 values).

    > Crap, I just realized something else. Your accounting is totally Enron-ed.
    > Because you are using wall time to compute your intervals, every interval
    > is counted multiple times. If 5 threads all start, and then a second later
    > 5 threads all finish, each thread will count itself as taking a second, for
    > a total of 5 seconds, even though there was only 1 second there.


    Hmm. Frankly, I'm a little confused. The time that I try to measure is
    the time the "transaction" takes to complete. Even if wall time is
    taken, it's unique per process (thread) so the time is appropriately
    written down. If two threads (processes?) take 1 second each to
    compute, surely the time returned will not be 2 seconds. Will it? More
    things to consider tomorrow.

    Thanks for the input, you've provided some directions for me.

    Cheers.
     
    , Aug 17, 2005
    #3
  4. Guest

    wrote:
    > wrote:
    > > It does a poor job of emulating the original case. Your simple script
    > > is almost completely CPU bound.

    >
    > I agree, but I expected a consistent response regardless of the
    > "transaction" in the middle. The results were remarkably similar to the
    > real script that's giving me trouble.


    But the results may look similar for dissimilar reasons.

    > > Are you sure your numbers are statistically valid? I get differences
    > > in timing that are that large (a factor of 4) upon running the exact
    > > same code several times.

    >
    > I ran a few hundred thousand iterations. I have no basis for this, but
    > my gut feeling says that should be enough. Regardless, the results were
    > consistent.


    Apparently that is one benefit of running on a real OS :)

    >
    > > > If I start two single thread processes they'll run at around
    > > > 0.0001 seconds/transaction.

    > >
    > > Apiece, or combined?

    >
    > The 0.0001 value was an average over a large number of transactions
    > with a very small standard deviation (I'll verify this tomorrow).


    OK. For me the stdev was nearly ten times the average, because a few of
    the "transactions" were taking hundreds of times longer than the typical
    transaction. Also, any given typical interval was about the same length
    as the granularity of the measurement, which concerns me.

    > > What OS are you using?

    >
    > Linux, running kernel 2.6.11.


    Good. From your interchangeable use of "fork" and "threads", I had kind
    of assumed you were using Windows. And I had been running your code on
    Windows, just for a lark.

    >
    > > > sub writeStat
    > > > ...

    > > File corruption. Either use separate temp files per forked process, or
    > > implement file locking. Or just rip it all out. All of this
    > > re-opening and IO is dominating the program's runtime on my machine.

    >
    > Corruption why? I must be missing something crucial here, but the data
    > is small enough (just a few bytes) to be written reliably as a single
    > atomic write. Am I wrong about this?


    Maybe on Linux that is the case. On Windows I definitely saw corruption.
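
    For what it's worth, a minimal sketch of the other option (one temp
    file per forked child, collected by the parent after the children have
    been reaped; the file naming here is just illustrative):

    use File::Temp qw(tempdir);

    my $dir = tempdir();   # not auto-removed; clean up manually if desired

    # In each child: append to a file named after its own pid.
    sub writeStat
    {
        my $time = shift;
        open(my $out, '>>', "$dir/stats.$$")
            or die "Cannot open $dir/stats.$$: $!";
        print {$out} "$time\n";
        close($out);
    }

    # In the parent, after all children have exited:
    my @data;
    for my $file (glob("$dir/stats.*"))
    {
        open(my $in, '<', $file) or die "Cannot open $file: $!";
        chomp(my @lines = <$in>);
        push @data, @lines;
        close($in);
    }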

    >
    > > Crap, I just realized something else. Your accounting is totally
    > > Enron-ed. Because you are using wall time to compute your intervals,
    > > every interval is counted multiple times. If 5 threads all start, and
    > > then a second later 5 threads all finish, each thread will count itself
    > > as taking a second, for a total of 5 seconds, even though there was
    > > only 1 second there.

    >
    > Hmm. Frankly, I'm a little confused. The time that I try to measure is
    > the time the "transaction" takes to complete. Even if wall time is
    > taken, it's unique per process (thread) so the time is appropriately
    > written down. If two threads (processes?) take 1 second each to
    > compute, surely the time returned will not be 2 seconds. Will it?


    If each process takes 1 second of CPU time, and there are two processes
    which run simultaneously, then each process will take 2 seconds of wall
    time because it is only getting the CPU half the time. Each process
    reports this 2 seconds of wall time as its interval. If you take the
    average of these reported intervals, the average is 2 seconds per
    transaction. But really, you had 2 transactions in two seconds, for 1
    second per transaction. The two seconds that one process counted are
    the same two seconds that the other process counted.

    Your accounting will only give you accurate results if a given process
    is never interrupted and kicked off the CPU at any point between the
    execution of:

    my $t0 = [gettimeofday()];

    and the execution of the corresponding:

    my $elapsed = tv_interval ($t0);

    This isn't a good assumption. In fact, since your processes presumably
    spend most of their time between these two lines, that is where they
    are *most* likely to be when they get interrupted.
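
    A sketch of the accounting I would trust more (reusing the $threads and
    $iterations variables from your script): let the parent time one
    wall-clock interval around the entire run and divide by the total
    number of transactions, instead of averaging the per-child intervals:

    use Time::HiRes qw(gettimeofday tv_interval);

    my $t_start = [gettimeofday()];

    for (my $i = 0; $i < $threads; ++$i)
    {
        my $pid = fork();
        if (!$pid)
        {
            # ... child performs its $iterations transactions ...
            exit(0);
        }
    }

    wait() for 1 .. $threads;   # reap all children

    my $wall  = tv_interval($t_start);
    my $total = $threads * $iterations;
    printf "Throughput: %.2f transactions/second\n", $total / $wall;
    printf "Mean wall time per transaction: %.6f seconds\n", $wall / $total;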

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Aug 17, 2005
    #4
