Multi-CPU and Ruby threading

Discussion in 'Ruby' started by Regis d'Aubarede, Jun 29, 2010.

  1. Hello,

    We received a new PC based on an Intel Core i7, running Windows 7.
    So I tried to compare how well each Ruby interpreter
    (JRuby, IronRuby, Ruby 1.9.1) uses the processor resources.
    I run the same (stupid) computation split across 1 to 8 threads, and
    measure the total duration.

    (The test program is attached.)

    Here are the results.

    c:\usr\ruby\local>jruby thread_bench.rb
    1.8.7, java, 2010-05-12
    1000 iterations by 1 threads , Duration = 2772 ms
    500 iterations by 2 threads , Duration = 2076 ms
    333 iterations by 3 threads , Duration = 1884 ms
    250 iterations by 4 threads , Duration = 1848 ms
    200 iterations by 5 threads , Duration = 1814 ms
    166 iterations by 6 threads , Duration = 1755 ms
    142 iterations by 7 threads , Duration = 1866 ms
    125 iterations by 8 threads , Duration = 1538 ms

    c:\usr\ruby\local>ir thread_bench.rb
    1.8.6, i386-mswin32, 2009-03-31
    1000 iterations by 1 threads , Duration = 2257 ms
    500 iterations by 2 threads , Duration = 1305 ms
    333 iterations by 3 threads , Duration = 1055 ms
    250 iterations by 4 threads , Duration = 880 ms
    200 iterations by 5 threads , Duration = 1026 ms
    166 iterations by 6 threads , Duration = 940 ms
    142 iterations by 7 threads , Duration = 989 ms
    125 iterations by 8 threads , Duration = 1098 ms

    c:\usr\ruby\local>ruby19 thread_bench.rb
    1.9.1, i386-mswin32, 2010-01-10
    1000 iterations by 1 threads , Duration = 7318 ms
    500 iterations by 2 threads , Duration = 7393 ms
    333 iterations by 3 threads , Duration = 7335 ms
    250 iterations by 4 threads , Duration = 7367 ms
    200 iterations by 5 threads , Duration = 7450 ms
    166 iterations by 6 threads , Duration = 7343 ms
    142 iterations by 7 threads , Duration = 7349 ms
    125 iterations by 8 threads , Duration = 7454 ms

    So it seems that IronRuby makes better use of the CPUs than JRuby?
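
For readers who cannot open the attachment, here is a rough sketch of a benchmark of the kind described above. It is not the original thread_bench.rb: the `work` and `bench` methods and the loop body are assumptions chosen only to reproduce the shape of the output (1000 iterations split evenly across 1 to 8 threads).

```ruby
# Hypothetical reconstruction; the real thread_bench.rb is not reproduced here.
TOTAL_ITERATIONS = 1000

# Assumed CPU-bound busy work: nested numeric loops with a little float math.
def work(iterations)
  sum = 0.0
  iterations.times do |i|
    1000.times { |j| sum += (i + j) * 0.5 }
  end
  sum
end

# Split `total` iterations across `nb_threads` threads and time the whole batch.
def bench(nb_threads, total = TOTAL_ITERATIONS)
  per_thread = total / nb_threads # integer division, e.g. 1000 / 3 => 333
  start = Time.now
  (1..nb_threads).map { Thread.new { work(per_thread) } }.each { |t| t.join }
  duration_ms = ((Time.now - start) * 1000).round
  [per_thread, duration_ms]
end

if __FILE__ == $0
  puts "#{RUBY_VERSION}, #{RUBY_PLATFORM}, #{RUBY_RELEASE_DATE}"
  (1..8).each do |n|
    per_thread, ms = bench(n)
    puts "#{per_thread} iterations by #{n} threads , Duration = #{ms} ms"
  end
end
```

If the interpreter can run Ruby threads on several cores at once, the duration should drop as the thread count rises; if it cannot, the duration stays roughly flat.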

    Attachments:
    http://www.ruby-forum.com/attachment/4825/thread_bench.rb

    --
    Posted via http://www.ruby-forum.com/.
    Regis d'Aubarede, Jun 29, 2010
    #1

  2. Regis d'Aubarede

    Roger Pack Guest

    Roger Pack, Jun 29, 2010
    #2

  3. Roger Pack wrote:
    > I've seen a bit of slowdown on jruby when using multiple threads, as
    > well.


    The results seem different on Linux. Here is the same test, on the same
    machine, under Ubuntu 10.04 in VirtualBox with 8-processor affinity:

    regis@regis-desktop:~/Ruby/local$ java -version
    java version "1.6.0_18"
    OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1)
    OpenJDK Server VM (build 14.0-b16, mixed mode)

    regis@regis-desktop:~/Ruby/local$ jruby -v
    jruby 1.5.1 (ruby 1.8.7 patchlevel 249) (2010-06-06 f3a3480) (OpenJDK
    Client VM 1.6.0_18) [i386-java]

    regis@regis-desktop:~/Ruby/local$ jruby thread_bench.rb
    1.8.7, java, 2010-06-06
    1000 iterations by 1 threads , Duration = 3930 ms
    500 iterations by 2 threads , Duration = 3723 ms
    333 iterations by 3 threads , Duration = 3490 ms
    250 iterations by 4 threads , Duration = 3470 ms
    200 iterations by 5 threads , Duration = 3353 ms
    166 iterations by 6 threads , Duration = 3378 ms
    142 iterations by 7 threads , Duration = 3455 ms
    125 iterations by 8 threads , Duration = 4032 ms

    regis@regis-desktop:~/Ruby/local$ ir -v
    IronRuby 0.9.0.0 on .NET 2.0.0.0

    regis@regis-desktop:~/Ruby/local$ ir thread_bench.rb
    1.8.6, i386-mswin32, 2008-05-28
    1000 iterations by 1 threads , Duration = 11091 ms
    500 iterations by 2 threads , Duration = 7676 ms
    333 iterations by 3 threads , Duration = 12243 ms
    250 iterations by 4 threads , Duration = 7728 ms
    200 iterations by 5 threads , Duration = 7767 ms
    166 iterations by 6 threads , Duration = 7749 ms
    142 iterations by 7 threads , Duration = 8184 ms
    125 iterations by 8 threads , Duration = 8069 ms
    regis@regis-desktop:~/Ruby/local$

    Regis d'Aubarede, Jun 29, 2010
    #3
  4. On Tue, Jun 29, 2010 at 9:52 AM, Regis d'Aubarede
    <> wrote:
    > Hello,
    >
    > We receive a new PC based on I Core 7 on Windows 7.
    > So i try to compare the use processors resources of each
    > Ruby interpretor (JRuby,IronRuby,Ruby 1.9.1 ).
    > I do the same (stupid) treatment by 1 to 8 threads, and measure
    > the global duration.
    >
    > (test program is on attachment)
    >
    > Here is the result.
    >
    > c:\usr\ruby\local>jruby thread_bench.rb
    > 1.8.7, java, 2010-05-12
    > 1000 iterations by 1 threads , Duration = 2772 ms
    > 500 iterations by 2 threads , Duration = 2076 ms
    > 333 iterations by 3 threads , Duration = 1884 ms
    > 250 iterations by 4 threads , Duration = 1848 ms
    > 200 iterations by 5 threads , Duration = 1814 ms
    > 166 iterations by 6 threads , Duration = 1755 ms
    > 142 iterations by 7 threads , Duration = 1866 ms
    > 125 iterations by 8 threads , Duration = 1538 ms


    Probably not running server VM, so pass --server. Overall times should
    be better, but depending on the algorithm the remaining bottleneck for
    JRuby may or may not be CPU-bound.

    The initial iteration's time should probably be largely discounted,
    and the whole thing should probably be run a couple times to see the
    actual perf of a longer-running app.
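
One way to apply that advice is to repeat the measurement in the same process and ignore the first pass. The sketch below is only an illustration: `timed_ms` and `warm_timings` are hypothetical helper names, and the small loop at the end stands in for the real benchmark body.

```ruby
# Time one call of the given block, in milliseconds.
def timed_ms
  start = Time.now
  yield
  ((Time.now - start) * 1000).round
end

# Run the block `passes` times and return only the timings after the first
# pass, which is treated as warm-up (JIT compilation, class loading, etc.).
def warm_timings(passes = 3, &block)
  raise ArgumentError, "need at least 2 passes" if passes < 2
  timings = Array.new(passes) { timed_ms(&block) }
  timings.drop(1)
end

# Stand-in workload (an assumption); substitute the real benchmark here.
timings = warm_timings(3) { 200_000.times { |i| i * 2 } }
puts "warm passes: #{timings.inspect} ms, best = #{timings.min} ms"
```

Reporting the best or the median of the warm passes gives a number closer to what a long-running application would see.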

    I don't have IronRuby here, but here's numbers for me on Java 6,
    server, OS X, Core 2 Duo 2.6GHz:

    (2nd time through in the same script, only the 1 and 2 processor runs):

    1000 iterations by 1 threads , Duration = 2633 ms
    500 iterations by 2 threads , Duration = 1628 ms

    If with --server on your system JRuby's still slower than IronRuby,
    there may be a bug or bottleneck we can repair. I have been meaning to
    make blocks faster in JRuby, but they still come with a higher cost
    than some other impls.

    - Charlie
    Charles Oliver Nutter, Jun 29, 2010
    #4
  5. On Tue, Jun 29, 2010 at 3:23 PM, Charles Oliver Nutter
    <> wrote:
    > If with --server on your system JRuby's still slower than IronRuby,
    > there may be a bug or bottleneck we can repair. I have been meaning to
    > make blocks faster in JRuby, but they still come with a higher cost
    > than some other impls.


    Maybe also worth showing an experimental dynopt flag for JRuby that
    seems to improve performance dramatically, but at a small cost of some
    Ruby semantics (backtraces get a little funky, for example):

    ~/projects/jruby ➔ jruby --server -J-Djruby.compile.dynopt=true thread_bench.rb
    1.8.7, java, 2010-06-17
    1000 iterations by 1 threads , Duration = 400 ms
    500 iterations by 2 threads , Duration = 188 ms
    333 iterations by 3 threads , Duration = 192 ms
    250 iterations by 4 threads , Duration = 149 ms
    200 iterations by 5 threads , Duration = 167 ms
    166 iterations by 6 threads , Duration = 214 ms
    142 iterations by 7 threads , Duration = 177 ms
    125 iterations by 8 threads , Duration = 163 ms
    1000 iterations by 1 threads , Duration = 265 ms
    500 iterations by 2 threads , Duration = 160 ms
    333 iterations by 3 threads , Duration = 186 ms
    250 iterations by 4 threads , Duration = 148 ms
    200 iterations by 5 threads , Duration = 159 ms
    166 iterations by 6 threads , Duration = 151 ms
    142 iterations by 7 threads , Duration = 150 ms
    125 iterations by 8 threads , Duration = 171 ms
    ...

    Hopefully I can land this in JRuby 1.6, but it's on master now.

    - Charlie
    Charles Oliver Nutter, Jun 29, 2010
    #5
  6. Charles Nutter wrote:
    >> If with --server on your system JRuby's still slower than IronRuby,...

    > Maybe also worth showing an experimental dynopt flag for JRuby that seem to
    > improve performance ....



    Sorry for my bad English!

    My test is meant to verify that symmetric multiprocessing (SMP) is used
    well by the VM. In this respect, raw performance is not important.
    What I care about is how the duration of the calculation decreases as
    the number of threads increases.
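
That scaling concern can be expressed as speedup (single-thread time divided by n-thread time) and efficiency (speedup divided by n). The sketch below uses the 1-, 4- and 8-thread durations from the Windows runs in the first post; the `speedup` and `efficiency` helper names are just illustrative.

```ruby
# Durations in ms copied from the Windows runs in the first post of this
# thread, keyed by thread count (only 1, 4 and 8 threads are included).
RUNS = {
  "JRuby"      => { 1 => 2772, 4 => 1848, 8 => 1538 },
  "IronRuby"   => { 1 => 2257, 4 => 880,  8 => 1098 },
  "Ruby 1.9.1" => { 1 => 7318, 4 => 7367, 8 => 7454 }
}

# How many times faster the n-thread run is than the 1-thread run.
def speedup(durations, n)
  durations[1].to_f / durations[n]
end

# Speedup per thread; 1.0 would mean perfect scaling.
def efficiency(durations, n)
  speedup(durations, n) / n
end

RUNS.each do |name, d|
  [4, 8].each do |n|
    puts format("%-10s %d threads: speedup %.2f, efficiency %.2f",
                name, n, speedup(d, n), efficiency(d, n))
  end
end
```

A speedup that stays near 1.0 regardless of thread count, as with Ruby 1.9.1 here, means the extra threads are not running in parallel.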

    (http://programmingzen.com/2010/06/28/the-great-ruby-shootout-windows-edition/
    shows that JRuby is superior to IronRuby...)

    To determine whether the issue is on the JRuby side or on the JVM side,
    I ran the same JRuby code, but had it invoke a pure Java computation:
    (1..nb_threads).map { Thread.new() { Calc.calc(p1,n1) } }
    with

    class Calc {
        // CPU-bound busy work: sums i+j+k over a*b*1000 loop iterations.
        public static long calc(int a, int b) {
            long res = 0;
            for (int i = 0; i < a; i++)
                for (int j = 0; j < b; j++)
                    for (int k = 0; k < 1000; k++)
                        res += i + j + k;
            return res;
        }
    }

    c:\usr\ruby\local>jruby thread_bench2.rb
    1.8.7, java, 2010-05-12
    1000 iterations by 1 threads , Duration = 15404 ms
    500 iterations by 2 threads , Duration = 8147 ms
    333 iterations by 3 threads , Duration = 5812 ms
    250 iterations by 4 threads , Duration = 4690 ms
    200 iterations by 5 threads , Duration = 4648 ms
    166 iterations by 6 threads , Duration = 4749 ms
    142 iterations by 7 threads , Duration = 4371 ms
    125 iterations by 8 threads , Duration = 4222 ms

    So the JVM scales correctly :)
    And my Intel Core i7 really has 4 cores...

    Attachments:
    http://www.ruby-forum.com/attachment/4829/thread_bench2.rb

    Regis d'Aubarede, Jun 30, 2010
    #6
  7. On Wed, Jun 30, 2010 at 10:20 AM, Regis d'Aubarede
    <> wrote:
    > For discrimination if the issue is in JRuby side or in JVM side, i run
    > same
    > JRubyCode, but invoke a pure Java traitement :
    >   (1..nb_threads).map { Thread.new() { Calc.calc(p1,n1) } }
    > with
    >
    > class Calc {
    >  public static long calc(int a, int b) {
    >    long res=0;
    >    for (int i=0;i<a;i++)
    >      for (int j=0;j<b;j++)
    >       for (int k=0;k<1000;k++)
    >       res+=i+j+k;
    >    return(res);
    >  }
    > }


    Yes, this result is not surprising to me. In the original case, the
    benchmark suffers mostly from all the objects being created. For
    example:

    * All the numeric loops (in JRuby) create at least one new Fixnum
    object for every iteration
    * All the math operations create Fixnum or Float objects as well

    Running an allocation profile of your benchmark (which actually runs
    pretty slow because there's *so much* allocation happening) shows the
    amount of data that's being chewed up...it's very likely that the
    bottleneck is in allocating all those closures and all those Fixnums
    for this particular case:

    ~/projects/jruby ➔ jruby -J-Xrunhprof thread_bench.rb
    1.8.7, java, 2010-06-17
    1000 iterations by 1 threads , Duration = 399267 ms
    ^CDumping Java heap ... allocation sites ... done.

    ~/projects/jruby ➔ egrep "%|objs" java.hprof.txt | head -n 11
    rank self   accum  bytes    objs   bytes      objs     trace  name
    1    65.18% 65.18% 13545024 423282 1133938432 35435576 302318 org.jruby.RubyFixnum
    2    22.61% 87.79% 4697920  146810 381348672  11917146 302867 org.jruby.RubyFloat
    3    1.32%  89.12% 274992   5350   274992     5350     300000 char[]
    4    0.62%  89.74% 128488   5341   128488     5341     300000 java.lang.String
    5    0.18%  89.92% 38184    1      38184      1        306423 short[]
    6    0.18%  90.10% 38184    1      38184      1        306428 short[]
    7    0.14%  90.24% 28720    718    29400      735      300521 java.util.WeakHashMap$Entry
    8    0.13%  90.37% 27792    70     27792      70       300000 byte[]
    9    0.13%  90.50% 26832    1118   35040      1460     300704 java.util.concurrent.ConcurrentHashMap$HashEntry
    10   0.12%  90.63% 25232    166    25232      166      300557 org.jruby.MetaClass

    Note that this is only after the 1000-iteration run, and during
    execution over 1GB of memory was allocated and released, mostly in
    Fixnum objects with a smaller amount (380MB+) in Float objects.
    Running with verbose GC:


    ~/projects/jruby ➔ jruby -J-verbose:gc thread_bench.rb
    1.8.7, java, 2010-06-17
    [GC 13184K->1128K(63936K), 0.0108696 secs]
    [GC 14312K->2124K(63936K), 0.0077762 secs]
    [GC 15308K->1445K(63936K), 0.0010409 secs]
    [GC 14629K->1246K(63936K), 0.0031958 secs]
    ...

    And adding up all the size changes (number of GC runs * difference in
    live object size) produces roughly the same estimate; for the period
    the 1000-iteration part of the bench runs, it allocates a *lot* of
    objects.
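
A rough way to do that adding-up is to parse each `[GC before->after(heap), secs]` line and sum the reclaimed size. The sketch below is an approximation, not a JVM tool: the regular expression assumes the exact line shape shown above, `reclaimed_kb` is a hypothetical helper, and only the four sample lines quoted in this post are used.

```ruby
# Matches lines such as "[GC 13184K->1128K(63936K), 0.0108696 secs]".
# Assumption: every collection is logged in exactly this format.
GC_LINE = /\[GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]/

# Sum (heap before - heap after) over all collections in the log, in KB.
# This approximates how much short-lived data was allocated and then freed.
def reclaimed_kb(log)
  log.scan(GC_LINE).inject(0) do |total, (before, after, _heap, _secs)|
    total + (before.to_i - after.to_i)
  end
end

SAMPLE = <<LOG
[GC 13184K->1128K(63936K), 0.0108696 secs]
[GC 14312K->2124K(63936K), 0.0077762 secs]
[GC 15308K->1445K(63936K), 0.0010409 secs]
[GC 14629K->1246K(63936K), 0.0031958 secs]
LOG

puts "#{reclaimed_kb(SAMPLE)} KB reclaimed across #{SAMPLE.scan(GC_LINE).size} collections"
# prints "51490 KB reclaimed across 4 collections"
```

Each collection here frees roughly 12 to 14 MB, so the total over a full run grows with the number of collections logged.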

    IronRuby may do better here if they're able to treat Fixnum objects as
    value types, which the CLR handles more efficiently than the JVM's
    "every object is on the heap". Ultimately this is largely an
    allocation-rate benchmark, at least on JRuby, since our Fixnum objects
    are "real" objects (or to put it in MRI's favor...our Fixnum objects
    are forced to be "real" objects with heap lifecycles).

    The dynopt work is part of efforts in JRuby to bring math performance
    closer to Java, largely by eliminating the excessive object churn and
    layers of noise for math operations.

    - Charlie
    Charles Oliver Nutter, Jun 30, 2010
    #7