Strange system() slowdown when using Inline::C

Discussion in 'Perl Misc' started by rbg, Mar 5, 2009.

  1. rbg

    rbg Guest

    Before reading the code below, I want to make it clear that this code
    is just a *simple reproduction* of a bug I'm seeing in some
    computational biology code. So please don't point out things like,
    "why are you using system() to run date?!?". :)

    The problem I'm having is that when I make the array foo[] large, the
    calls to system() slow down enormously. I'm measuring this problem
    using `perl -d:DProf ...`

    Test case 1 below is slow, test case 2 is faster. The only difference
    is LEN = (1<<26) in test case 1 and LEN = (1<<10) in test case 2. I
    welcome any ideas on how to debug this bizarre problem.

    Thanks!
    Ross

    Test case 1:
    -------------------------
    use Inline C;
    use strict;
    hello();

    sub foo {
    for my $i (1..1000) {
    system('date > /dev/null');
    }
    }
    foo();
    __END__
    __C__
    #define LEN (1<<26)
    static char foo[LEN];

    void hello() {
    int i;
    printf("hello world\n");
    for (i = 0; i < LEN; i++) {
    foo = 0;
    }
    }
    -------------------------

    Test case 2
    -------------------------
    use Inline C;
    use strict;
    hello();

    sub foo {
    for my $i (1..1000) {
    system('date > /dev/null');
    }
    }
    foo();
    __END__
    __C__
    #define LEN (1<<10)
    static char foo[LEN];

    void hello() {
    int i;
    printf("hello world\n");
    for (i = 0; i < LEN; i++) {
    foo = 0;
    }
    }
    -------------------------
     
    rbg, Mar 5, 2009
    #1
    1. Advertising

  2. rbg

    rbg Guest

    I should include the DProf output for these cases.

    Test 1:

    Total Elapsed Time = 12.65265 Seconds
    User+System Time = 3.202649 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c Name
    90.5 2.900 2.900 1 2.9000 2.9000 main::foo


    Test 2:

    Total Elapsed Time = 4.697635 Seconds
    User+System Time = 0.337635 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c Name
    77.0 0.260 0.260 1 0.2600 0.2600 main::foo


    You can see that in test 2, foo() is roughly 10x faster.
     
    rbg, Mar 5, 2009
    #2
    1. Advertising

  3. rbg

    sisyphus Guest

    On Mar 6, 4:24 am, rbg <> wrote:

    > You can see that in test 2, foo() is roughly 10x faster.


    What happens if hello() is written as a perl sub (and Inline::C is not
    used at all) - something like (untested):

    use constant LEN => 1<<26;

    sub hello {
    my $i;
    my @foo;
    print "hello world\n";
    for($i = 0; $i < LEN; $i++) {
    $foo[$i] = chr(0);
    }
    }

    Does the speed with which foo() executes still vary in the same way,
    depending upon the value of LEN ?
    Or does the variation occur *only* when hello() is written as an
    Inline::C sub ?

    Cheers,
    Rob
     
    sisyphus, Mar 5, 2009
    #3
  4. rbg

    rbg Guest

    Continuing the discussion with myself, it seems that the problem is
    really much more basic.

    This script:
    --------
    use strict;
    my $bf = 'a' x (1<<26);

    sub foo {
    for my $i (1..1000) {
    system('date > /dev/null');
    }
    }
    foo();
    --------

    is much slower than this one than this one:

    --------
    use strict;
    my $bf = 'a' x (1<<10);

    sub foo {
    for my $i (1..1000) {
    system('date > /dev/null');
    }
    }
    foo();
    --------.

    I can reproduce this issue ubuntu with perl 5.10.0 (and debian with
    perl 5.8.8), but *not* on OS X with perl 5.8.8. It seems like it has
    something to do with how the perl interpreter is cloned when system()
    is called (seems that it isn't actually doing copy-on-write forking).
    Anyone have a deeper understanding of this issue?
     
    rbg, Mar 5, 2009
    #4
  5. rbg <> wrote in news:4eeea107-69d5-4e3d-9610-
    :

    > Continuing the discussion with myself, it seems that the problem is
    > really much more basic.
    >
    > This script:
    > --------
    > use strict;
    > my $bf = 'a' x (1<<26);
    >
    > sub foo {
    > for my $i (1..1000) {
    > system('date > /dev/null');
    > }
    > }
    > foo();
    > --------
    >
    > is much slower than this one than this one:
    >
    > --------
    > use strict;
    > my $bf = 'a' x (1<<10);
    >
    > sub foo {
    > for my $i (1..1000) {
    > system('date > /dev/null');
    > }
    > }
    > foo();
    > --------.


    I don't see that.

    This is perl, v5.10.0 built for MSWin32-x86-multi-thread
    (with 5 registered patches, see perl -V for more detail)

    Binary build 1004 [287188] provided by ActiveState

    C:\DOCUME~1\asu1\LOCALS~1\Temp\q> cat t.pl
    use strict;
    my $bf = 'a' x (1<<26);

    sub foo {
    for my $i (1..1000) {
    system('date /t > NUL');
    }
    }
    foo();

    TimeThis : Command Line : t.pl
    TimeThis : Start Time : Thu Mar 05 18:39:11 2009
    TimeThis : End Time : Thu Mar 05 18:40:11 2009
    TimeThis : Elapsed Time : 00:01:00.078

    C:\DOCUME~1\asu1\LOCALS~1\Temp\q> cat s.pl
    use strict;
    my $bf = 'a' x (1<<10);

    sub foo {
    for my $i (1..1000) {
    system('date /t > NUL');
    }
    }
    foo();

    TimeThis : Command Line : s.pl
    TimeThis : Start Time : Thu Mar 05 18:41:35 2009
    TimeThis : End Time : Thu Mar 05 18:42:34 2009
    TimeThis : Elapsed Time : 00:00:59.640

    C:\DOCUME~1\asu1\LOCALS~1\Temp\q> cat z.pl
    use strict;

    sub foo {
    for my $i (1..1000) {
    system('date /t > NUL');
    }
    }
    foo();

    TimeThis : Command Line : z.pl
    TimeThis : Start Time : Thu Mar 05 18:43:11 2009
    TimeThis : End Time : Thu Mar 05 18:44:10 2009
    TimeThis : Elapsed Time : 00:00:59.015

    C:\DOCUME~1\asu1\LOCALS~1\Temp\q> cat u.pl
    use strict;
    my $bf = 'a' x (1<<26);

    TimeThis : Command Line : u.pl
    TimeThis : Start Time : Thu Mar 05 18:45:05 2009
    TimeThis : End Time : Thu Mar 05 18:45:06 2009
    TimeThis : Elapsed Time : 00:00:00.390

    --
    A. Sinan Unur <>
    (remove .invalid and reverse each component for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://www.rehabitation.com/clpmisc/
     
    A. Sinan Unur, Mar 5, 2009
    #5
  6. Ben Morrow <> wrote in news:vc6686-it1.ln1
    @osiris.mauzo.dyndns.org:

    >
    > Quoth "A. Sinan Unur" <>:
    >>
    >> I don't see that.
    >>
    >> This is perl, v5.10.0 built for MSWin32-x86-multi-thread
    >> (with 5 registered patches, see perl -V for more detail)

    >
    > Well, no. Since this is probably a problem with fork(2) of a large
    > address space, and Win32 doesn't use fork(2), you wouldn't.


    OK then ... I tried it on my Eee PC 701 with ArchLinux using the stock
    perl v5.10.0 while downloading system updates.

    time shows:

    real 0m37.892s
    user 0m17.959s
    sys 0m17.882s

    For control, omitting the large assignment to $bf yields

    real 0m24.965s
    user 0m17.586s
    sys 0m6.416s

    On the other hand, using

    my $bf = 'a' x (1 << 10);

    yields

    real 0m24.748s
    user 0m17.322s
    sys 0m6.526s

    Clearly, there is a slowdown with the larger $bf, but there is not a 10x
    difference.

    PS: Apologies for not copying and pasting.

    Sinan

    --
    A. Sinan Unur <>
    (remove .invalid and reverse each component for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://www.rehabitation.com/clpmisc/
     
    A. Sinan Unur, Mar 6, 2009
    #6
  7. rbg

    rbg Guest

    Thanks for your responses.

    I guess my real question is, can this performance penalty be
    eliminated? My understanding is that linux uses copy-on-write for the
    fork() system call, so forking a process is very cheap, and
    independent of memory usage (modulo page table copying). My
    understanding at this point is that, somehow, perl has found a way to
    defeat the copy-on-write gains. This seems rather unfortunate. Is
    there some way to tell perl, "when I call system() don't copy the
    state of the interpreter"?

    thanks
    -ross
     
    rbg, Mar 6, 2009
    #7
  8. rbg

    Guest

    On Mar 6, 1:12 am, Ben Morrow <> wrote:
    > Quoth "A. Sinan Unur" <>:
    >
    > > I don't see that.

    >
    > > This is perl, v5.10.0 built for MSWin32-x86-multi-thread
    > > (with 5 registered patches, see perl -V for more detail)

    >
    > Well, no. Since this is probably a problem with fork(2) of a large
    > address space, and Win32 doesn't use fork(2), you wouldn't.


    I have looked up perldoc -f system where it says:
    "[...] Does exactly the same thing as exec LIST, except that a fork is
    done first, and the parent process waits for the child process to
    complete [...]"

    I see that system() involves a fork, but a naïve question here: Why
    does system() need to fork, can it not just simply run whatever needs
    running (a shell program presumably) without forking and return to the
    perl program to continue the program ? -- this would avoid the
    unneccessary large address space.
     
    , Mar 6, 2009
    #8
  9. rbg

    rbg Guest

    Ben -- right, you are. On the OS X system I get d_vfork='define' and
    on the linux systems I get d_vfork='undef'. Is this a compile time
    configuration option or is there a way to change it at runtime?

    Thanks,
    ross
     
    rbg, Mar 7, 2009
    #9
  10. rbg

    rbg Guest

    On Mar 7, 7:12 am, Ben Morrow <> wrote:
    > Quoth rbg <>:
    >
    > It's set at perl Configure time. You could try building perl with
    > -Dusevfork on the Configure line, but there may be a good reason for it
    > being disabled by default. Make sure you run all the tests.
    >
    > Ben


    The linux man pages caution strongly against using vfork() -- saying
    that its deprecated and should not be used because fork() is
    implemented efficiently using COW. This brings me back to the same
    question: if fork() is behaving efficiently, what else is perl doing
    on linux that is causing the execution time of system() to grow
    (significantly) with process size? Maybe I need to dig into the perl
    source to answer this question...

    -ross
     
    rbg, Mar 7, 2009
    #10
  11. rbg

    rbg Guest

    And on my linux system it's implemented with clone, not fork

    strace perl -e'system "true"'
    <snip>
    brk(0x8b9c000) = 0x8b9c000
    close(3) = 0
    pipe([3, 4]) = 0
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|
    SIGCHLD, child_tidptr=0xb7e9a908) = 16784
    --- SIGCHLD (Child exited) @ 0 (0) ---
    close(4) = 0
    rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
    rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
    waitpid(16784, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 16784
    rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
    rt_sigaction(SIGQUIT, {SIG_DFL}, NULL, 8) = 0
    </snip>

    Odd.
     
    rbg, Mar 7, 2009
    #11
  12. On 2009-03-07 15:19, Ben Morrow <> wrote:
    > Quoth rbg <>:
    >> On Mar 7, 7:12 am, Ben Morrow <> wrote:
    >> > It's set at perl Configure time. You could try building perl with
    >> > -Dusevfork on the Configure line, but there may be a good reason for it
    >> > being disabled by default. Make sure you run all the tests.

    >>
    >> The linux man pages caution strongly against using vfork() -- saying
    >> that its deprecated and should not be used because fork() is
    >> implemented efficiently using COW.

    >
    > Even with COW, fork(2) still has to copy all the page tables, which
    > makes it linear in the size of the process address space, albeit with a
    > much smaller constant than without COW. vfork(2) should be more-or-less
    > constant time (it will be linear in the number for open file
    > descriptors, since those are still cloned).


    This is correct. A simple C program which just calls (v)fork/waitpid in a
    tight loop (the child process exits immediately) shows that the
    execution time increases with the process size for fork, but is roughly
    constant for vfork.

    See http://www.hjp.at/os/fork-vs-vfork/ for a graph.

    hp
     
    Peter J. Holzer, Mar 7, 2009
    #12
  13. On 2009-03-07 15:57, rbg <> wrote:
    > And on my linux system it's implemented with clone, not fork
    >
    > strace perl -e'system "true"'
    > clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|
    > SIGCHLD, child_tidptr=0xb7e9a908) = 16784
    >
    > Odd.


    fork() and vfork() are just wrapper functions for clone() on Linux (the
    "real" fork and vfork syscalls still exist for binary compatibility but
    have been deprecated for a very long time and aren't used by any current
    libc (AFAIK)).

    hp
     
    Peter J. Holzer, Mar 7, 2009
    #13
  14. On 2009-03-07, rbg <> wrote:
    > The linux man pages caution strongly against using vfork() -- saying
    > that its deprecated and should not be used because fork() is
    > implemented efficiently using COW. This brings me back to the same
    > question: if fork() is behaving efficiently, what else is perl doing


    As you know now, using fork() for startup of a child process was one
    of the most brain-damaged decisions of designers of a certain one-user
    non-multiprocessing OS for micro-computers.

    When people say that "fork() is efficient" they mean just that it is
    "more effecient than it was when one had no COW optimization".

    Hope this helps,
    Ilya

    P.S. I wonder why people do not enable Perl to use the POSIX variant
    of spawn(). All the necessary code is already there in DOSISH
    branches of the codebase...
     
    Ilya Zakharevich, Mar 8, 2009
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. PseudoIntellectual

    Java Printing Slowdown

    PseudoIntellectual, Mar 8, 2005, in forum: Java
    Replies:
    0
    Views:
    281
    PseudoIntellectual
    Mar 8, 2005
  2. PseudoIntellectual

    Java Printing Slowdown

    PseudoIntellectual, Mar 8, 2005, in forum: Java
    Replies:
    0
    Views:
    325
    PseudoIntellectual
    Mar 8, 2005
  3. Trevor Perrin

    slowdown on looping

    Trevor Perrin, Oct 4, 2003, in forum: Python
    Replies:
    5
    Views:
    303
    Greg Ewing (using news.cis.dfn.de)
    Oct 8, 2003
  4. Hallvard B Furuseth

    slowdown with massive memory usage

    Hallvard B Furuseth, Jul 30, 2004, in forum: Python
    Replies:
    5
    Views:
    326
    Bengt Richter
    Aug 1, 2004
  5. tac-tics

    Slowdown in Jython

    tac-tics, Dec 28, 2006, in forum: Python
    Replies:
    3
    Views:
    384
    tac-tics
    Dec 29, 2006
Loading...

Share This Page