Using multicore CPUs in parallel tasks

Discussion in 'Ruby' started by Marc Hoeppner, Oct 29, 2009.

  1. Hi,

    I've been reading around a bit but couldn't find a solution that worked,
    so here goes:

    I am running ruby 1.8 and want to make full use of a quad core CPU
    (64bit, Ubuntu) in a task that lends itself to multithreading/multicore
    use.

    It's basically an array of objects that are each used in a fairly
    CPU-intensive job, so I figured I could have 4 of them run at the same
    time, one on each CPU.

    BUT...

    The only reasonably understandable suggestion looked something like:

    ----
    threads = 4
    my_array = [something_here]

    threads.times do
    Process.fork(a_method(my_array.shift))
    end

    my_array.each do |object|
    Process.wait(0)
    Process.fork(a_method(object))
    end
    ---

    But this still only used one CPU (and looks a bit ugly..). Is that some
    limitation of ruby (v 1.8 specifically) or am I doing something wrong?

    Cheers,

    Marc
    --
    Posted via http://www.ruby-forum.com/.
     
    Marc Hoeppner, Oct 29, 2009
    #1

  2. On Thu, Oct 29, 2009 at 10:56 AM, Marc Hoeppner
    <> wrote:
    > Hi,
    >
    > I've been reading around a bit but couldn't find a solution that worked,
    > so here goes:
    >
    > I am running ruby 1.8 and want to make full use of a quad core CPU
    > (64bit, Ubuntu) in a task that lends itself to multithreading/multicore
    > use.
    >
    > It's basically an array of objects that are each use in a fairly CPU
    > intensive job, so I figured I could have 4 of them run at the same time
    > , one on each CPU.


    You might want to check out Pure and Tiamat and talk to James Lawrence
    (see links). He seems to have what you are asking for. I don't
    know much about these two projects; they came onto my radar a few days
    ago, but I think what James is working on is cool!

    == Links

    * Pure: http://purefunctional.rubyforge.org

    * Documentation: http://tiamat.rubyforge.org
    * Download: http://rubyforge.org/frs/?group_id=9145
    * Rubyforge home: http://rubyforge.org/projects/tiamat
    * Repository: http://github.com/quix/tiamat

    == Author

    * James M. Lawrence <>


    > BUT...
    >
    > The only reasonably understandably suggestion looked something like:
    >
    > ----
    > threads = 4
    > my_array = [something_here]
    >
    > threads.times do
    >  Process.fork(a_method(my_array.shift))
    > end
    >
    > my_array.each do |object|
    >  Process.wait(0)
    >  Process.fork(a_method(object))
    > end
    > ---
    >
    > But this still only used one CPU (and looks a bit ugly..). Is that some
    > limitation of ruby (v 1.8 specifically) or am I doing something wrong?
    >
    > Cheers,
    >
    > Marc
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >




    --
    Kind Regards,
    Rajinder Yadav

    http://DevMentor.org

    Do Good! - Share Freely, Enrich and Empower people to Transform their lives
     
    Rajinder Yadav, Oct 29, 2009
    #2

  3. Marc Hoeppner

    Glen Holcomb Guest

    On Thu, Oct 29, 2009 at 8:56 AM, Marc Hoeppner
    <>wrote:

    > Hi,
    >
    > I've been reading around a bit but couldn't find a solution that worked,
    > so here goes:
    >
    > I am running ruby 1.8 and want to make full use of a quad core CPU
    > (64bit, Ubuntu) in a task that lends itself to multithreading/multicore
    > use.
    >
    > It's basically an array of objects that are each use in a fairly CPU
    > intensive job, so I figured I could have 4 of them run at the same time
    > , one on each CPU.
    >
    > BUT...
    >
    > The only reasonably understandably suggestion looked something like:
    >
    > ----
    > threads = 4
    > my_array = [something_here]
    >
    > threads.times do
    > Process.fork(a_method(my_array.shift))
    > end
    >
    > my_array.each do |object|
    > Process.wait(0)
    > Process.fork(a_method(object))
    > end
    > ---
    >
    > But this still only used one CPU (and looks a bit ugly..). Is that some
    > limitation of ruby (v 1.8 specifically) or am I doing something wrong?
    >
    > Cheers,
    >
    > Marc
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >

    You are going to want Ruby 1.9 for this. In 1.8, threads are "green":
    they exist only as threads inside the VM, so you still only hit
    one core, and any blocking system I/O will block all of your threads.

    --
    "Hey brother Christian with your high and mighty errand, Your actions speak
    so loud, I can’t hear a word you’re saying."

    -Greg Graffin (Bad Religion)
     
    Glen Holcomb, Oct 29, 2009
    #3
  4. Marc Hoeppner

    Peter Booth Guest

    Marc,

    How long lived is each of these tasks? Are we talking seconds or weeks?
    Is there a user-facing aspect to this or is throughput the variable
    that you're wanting to optimize?

    When you say "fairly CPU intensive", does this mean that when one of
    these tasks runs you see (from sar/mpstat) that one of your CPUs is
    pinned?

    Peter


    On Oct 29, 2009, at 10:56 AM, Marc Hoeppner wrote:

    > Hi,
    >
    > I've been reading around a bit but couldn't find a solution that
    > worked,
    > so here goes:
    >
    > I am running ruby 1.8 and want to make full use of a quad core CPU
    > (64bit, Ubuntu) in a task that lends itself to multithreading/
    > multicore
    > use.
    >
    > It's basically an array of objects that are each use in a fairly CPU
    > intensive job, so I figured I could have 4 of them run at the same
    > time
    > , one on each CPU.
    >
    > BUT...
    >
    > The only reasonably understandably suggestion looked something like:
    >
    > ----
    > threads = 4
    > my_array = [something_here]
    >
    > threads.times do
    > Process.fork(a_method(my_array.shift))
    > end
    >
    > my_array.each do |object|
    > Process.wait(0)
    > Process.fork(a_method(object))
    > end
    > ---
    >
    > But this still only used one CPU (and looks a bit ugly..). Is that
    > some
    > limitation of ruby (v 1.8 specifically) or am I doing something wrong?
    >
    > Cheers,
    >
    > Marc
    > --
    > Posted via http://www.ruby-forum.com/.
    >
     
    Peter Booth, Oct 29, 2009
    #4
  5. Marc Hoeppner

    Tony Arcieri Guest


    On Thu, Oct 29, 2009 at 11:48 AM, Glen Holcomb <> wrote:

    > You are going to want Ruby 1.9 for this. In 1.8 threads are "green",
    > basically they only exists as threads inside the VM so you still only hit
    > one core and any blocking system I/O will block all of your threads.
    >


    Ruby 1.9 isn't going to help you when using threads to distribute
    computation across CPU cores. The Global VM Lock ensures that simultaneous
    computation is still limited to one core.

    JRuby, on the other hand, does not have this limitation. On MRI/1.9 I would
    recommend using multiple processes.
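
A rough way to see the difference on your own machine is sketched below. It is only an illustration, not anything from Pure or Tiamat: `busy_sum` is a made-up stand-in for the real CPU-bound job, and the process variant assumes a platform where `fork` is available (MRI on Linux or Mac OS X, for instance). Each child sends its result back to the parent over a pipe.

```ruby
require 'benchmark'

# Hypothetical CPU-bound stand-in for the real work (a_method in this thread).
def busy_sum(n)
  (1..n).inject(0) { |acc, i| acc + i }
end

# Run the job `workers` times in threads inside one interpreter.
def sum_with_threads(workers, n)
  workers.times.map { Thread.new { busy_sum(n) } }.map { |t| t.value }
end

# Run the job `workers` times in forked children; each child writes its
# result to a pipe that the parent reads.
def sum_with_processes(workers, n)
  readers = workers.times.map do
    reader, writer = IO.pipe
    fork do
      reader.close
      writer.write(busy_sum(n).to_s)
      writer.close
    end
    writer.close # the parent keeps only the read end
    reader
  end
  results = readers.map do |reader|
    value = reader.read.to_i
    reader.close
    value
  end
  Process.waitall
  results
end

if __FILE__ == $PROGRAM_NAME
  n = 1_000_000
  printf "threads:   %.3fs\n", Benchmark.realtime { sum_with_threads(4, n) }
  printf "processes: %.3fs\n", Benchmark.realtime { sum_with_processes(4, n) }
end
```

On an interpreter with a global lock the two timings should differ noticeably on a multicore box; watching `top` while it runs shows whether more than one core is busy.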

    --
    Tony Arcieri
    Medioh/Nagravision
     
    Tony Arcieri, Oct 29, 2009
    #5
  6. Marc Hoeppner

    Glen Holcomb Guest

    On Thu, Oct 29, 2009 at 2:04 PM, Tony Arcieri <> wrote:

    > On Thu, Oct 29, 2009 at 11:48 AM, Glen Holcomb <>
    > wrote:
    >
    > > You are going to want Ruby 1.9 for this. In 1.8 threads are "green",
    > > basically they only exists as threads inside the VM so you still only hit
    > > one core and any blocking system I/O will block all of your threads.
    > >

    >
    > Ruby 1.9 isn't going to help you when using threads to distribute
    > computation across CPU cores. The Global VM Lock ensures that simultaneous
    > computation is still limited to one core.
    >
    > JRuby, on the other hand, does not have this limitation. On MRI/1.9 I
    > would
    > recommend using multiple processes.
    >
    > --
    > Tony Arcieri
    > Medioh/Nagravision
    >


    Ah, I did not know that.

    --
    "Hey brother Christian with your high and mighty errand, Your actions speak
    so loud, I can’t hear a word you’re saying."

    -Greg Graffin (Bad Religion)
     
    Glen Holcomb, Oct 29, 2009
    #6
  7. On 10/29/2009 09:04 PM, Tony Arcieri wrote:

    > On Thu, Oct 29, 2009 at 11:48 AM, Glen Holcomb <> wrote:
    >
    >> You are going to want Ruby 1.9 for this. In 1.8 threads are "green",
    >> basically they only exists as threads inside the VM so you still only hit
    >> one core and any blocking system I/O will block all of your threads.

    >
    > Ruby 1.9 isn't going to help you when using threads to distribute
    > computation across CPU cores. The Global VM Lock ensures that simultaneous
    > computation is still limited to one core.


    Are you saying that the global VM lock even extends to several
    processes? Because Marc did not want to use threads for distribution
    but rather processes.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Oct 29, 2009
    #7
  8. On 10/29/2009 03:56 PM, Marc Hoeppner wrote:
    > Hi,
    >
    > I've been reading around a bit but couldn't find a solution that worked,
    > so here goes:
    >
    > I am running ruby 1.8 and want to make full use of a quad core CPU
    > (64bit, Ubuntu) in a task that lends itself to multithreading/multicore
    > use.
    >
    > It's basically an array of objects that are each use in a fairly CPU
    > intensive job, so I figured I could have 4 of them run at the same time
    > , one on each CPU.
    >
    > BUT...
    >
    > The only reasonably understandably suggestion looked something like:
    >
    > ----
    > threads = 4
    > my_array = [something_here]
    >
    > threads.times do
    > Process.fork(a_method(my_array.shift))
    > end
    >
    > my_array.each do |object|
    > Process.wait(0)
    > Process.fork(a_method(object))
    > end
    > ---
    >
    > But this still only used one CPU (and looks a bit ugly..). Is that some
    > limitation of ruby (v 1.8 specifically) or am I doing something wrong?


    I believe you are not using Process.fork properly. In fact, I am
    surprised that you do not get an exception:

    irb(main):001:0> Process.fork("foo")
    ArgumentError: wrong number of arguments (1 for 0)
    from (irb):1:in `fork'
    from (irb):1
    from :0

    Basically what you do is you do a calculation (a_method(object)) and
    _then_ you create a process. No surprise that only one CPU is busy.
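
To make the point concrete, here is a small sketch (the names `where_am_i` and `fork_demo` are made up for illustration, and it assumes `fork` is available on your platform). An ordinary argument expression is evaluated by the calling process before the called method starts, whereas the body of a block passed to `fork` runs only in the child.

```ruby
# Reports the pid of whichever process executes it.
def where_am_i
  Process.pid
end

def fork_demo
  parent = Process.pid

  # Argument style: the expression is evaluated right here, in the parent,
  # before any other process could exist.
  argument_pid = where_am_i

  # Block style: the block body runs only inside the forked child.
  reader, writer = IO.pipe
  child = fork do
    reader.close
    writer.write(where_am_i.to_s)
    writer.close
  end
  writer.close
  block_pid = reader.read.to_i
  reader.close
  Process.wait(child)

  { :parent => parent, :argument => argument_pid,
    :block => block_pid, :child => child }
end

p fork_demo
```

The `:argument` pid matches `:parent`, while the `:block` pid matches `:child`.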

    Here's something else that you could do

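    processes = 4

    # Round the slice size up so that at most `processes` children are started,
    # and never let it drop to 0 (each_slice(0) raises ArgumentError).
    slice_size = [(my_array.size.to_f / processes).ceil, 1].max

    my_array.each_slice slice_size do |tasks|
      fork do
        tasks.each do |task|
          a_method(task)
        end
      end
    end

    Process.waitall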
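
A side note on the slice size when chunking work like this, as a small worked example (`chunks_for` is a hypothetical helper, not part of any library): with integer division, 10 tasks and 4 processes give a slice size of 2 and therefore 5 slices, and fewer tasks than processes give a slice size of 0, which `each_slice` rejects. Rounding up keeps the number of slices at or below the number of processes.

```ruby
# Split tasks into at most `processes` chunks by rounding the chunk size up.
def chunks_for(tasks, processes)
  return [] if tasks.empty?
  size = (tasks.size.to_f / processes).ceil
  tasks.each_slice(size).to_a
end

tasks = (1..10).to_a

floor_chunks = tasks.each_slice(tasks.size / 4).to_a # chunk size 10 / 4 = 2
ceil_chunks  = chunks_for(tasks, 4)                  # chunk size ceil(2.5) = 3

puts "integer division: #{floor_chunks.size} chunks"
puts "rounded up:       #{ceil_chunks.size} chunks"
```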

    Drawback is that one of those processes might accidentally get all the
    easy tasks and you do not utilize CPUs optimally. Here's another
    solution that does not have that issue

    processes = 4
    count = 0

    my_array.each do |task|
      if count == processes
        Process.wait
        count -= 1
      end

      fork do
        a_method(task)
      end
      count += 1
    end

    Process.waitall

    You can see that it works with this example:

    processes = 4
    count = 0

    10.times do |task|
      if count == processes
        Process.wait
        count -= 1
      end

      fork do
        printf "%-20s start %4d %4d\n", Time.now, $$, task
        sleep rand(5) + 2
        printf "%-20s end %4d %4d\n", Time.now, $$, task
      end
      count += 1
    end

    Process.waitall
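
One thing the loop above does not give you is the return values: a forked child has its own memory, so whatever the block computes is gone when the child exits. If the results are needed in the parent, one possible approach is sketched here. `parallel_map` is a made-up name; the sketch assumes `fork` is available, that `limit` is at least 1, that results can be serialized with `Marshal`, and it does no error handling for children that fail.

```ruby
# Map over tasks with at most `limit` forked children at a time and return
# the results in task order. Each child marshals its result into a pipe.
def parallel_map(tasks, limit = 4)
  results = Array.new(tasks.size)
  running = {} # read end of pipe => [child pid, task index]
  pending = tasks.each_with_index.to_a

  until pending.empty? && running.empty?
    # Start children until the limit is reached.
    while running.size < limit && !pending.empty?
      task, index = pending.shift
      reader, writer = IO.pipe
      reader.binmode
      writer.binmode
      pid = fork do
        reader.close
        writer.write(Marshal.dump(yield(task)))
        writer.close
      end
      writer.close # the parent keeps only the read end
      running[reader] = [pid, index]
    end

    # Block until at least one child has written output or exited.
    ready, = IO.select(running.keys)
    ready.each do |reader|
      pid, index = running.delete(reader)
      results[index] = Marshal.load(reader.read)
      reader.close
      Process.wait(pid)
    end
  end

  results
end

p parallel_map([1, 2, 3, 4, 5, 6], 2) { |x| x * x }
```

Reading the pipe before waiting on the child avoids a stall when a result is larger than the pipe buffer.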


    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Oct 29, 2009
    #8
  9. Marc Hoeppner

    Tony Arcieri Guest


    On Thu, Oct 29, 2009 at 4:05 PM, Robert Klemme
    <>wrote:

    > On 10/29/2009 09:04 PM, Tony Arcieri wrote:
    >
    >> Ruby 1.9 isn't going to help you when using threads to distribute
    >> computation across CPU cores. The Global VM Lock ensures that
    >> simultaneous
    >> computation is still limited to one core.
    >>

    >
    > Are you saying that the global VM lock even extends to several processes?
    > Because Marc did not want to use threads for distribution but rather
    > processes.
    >


    No, if you look over my post again it specifically mentions the GVL applies
    to threads and suggests using processes.

    --
    Tony Arcieri
    Medioh/Nagravision
     
    Tony Arcieri, Oct 29, 2009
    #9
  10. 2009/10/29 Tony Arcieri <>:
    > On Thu, Oct 29, 2009 at 4:05 PM, Robert Klemme
    > <>wrote:
    >
    >> On 10/29/2009 09:04 PM, Tony Arcieri wrote:
    >>
    >>> Ruby 1.9 isn't going to help you when using threads to distribute
    >>> computation across CPU cores. The Global VM Lock ensures that
    >>> simultaneous
    >>> computation is still limited to one core.

    >>
    >> Are you saying that the global VM lock even extends to several processes?
    >> Because Marc did not want to use threads for distribution but rather
    >> processes.

    >
    > No, if you look over my post again it specifically mentions the GVL applies
    > to threads and suggests using processes.


    I figured as much. The thread discussion does not help Marc, because
    he explicitly wanted to use processes for core utilization. Basically
    Glen sent us in the wrong direction though. :)

    Cheers

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Oct 30, 2009
    #10
  11. Robert Klemme wrote:

    > I believe you are not using Process.fork properly. In fact, I am
    > surprised that you do not get an exception:
    >
    > irb(main):001:0> Process.fork("foo")
    > ArgumentError: wrong number of arguments (1 for 0)
    > from (irb):1:in `fork'
    > from (irb):1
    > from :0


    Yes, quite possible - I didn't really look up the exact code, just wrote
    it down from memory, sorry about that..

    >
    > processes = 4
    > count = 0
    >
    > my_array.each do |task|
    > if count == processes
    > Process.wait
    > count -= 1
    > end
    >
    > fork do
    > a_method(task)
    > end
    > count += 1
    > end
    >
    > Process.waitall
    >


    That works like a charm, thanks a lot!
    --
    Posted via http://www.ruby-forum.com/.
     
    Marc Hoeppner, Oct 30, 2009
    #11
  12. Robert Klemme wrote:
    > processes = 4
    > count = 0
    >
    > my_array.each do |task|
    > if count == processes
    > Process.wait
    > count -= 1
    > end
    >
    > fork do
    > a_method(task)
    > end
    > count += 1
    > end
    >
    > Process.waitall


    Another option,

    Tiamat.open_local(4) {
      pure do
        fun_map :result => my_array do |elem|
          a_method(elem)
        end
      end.compute.result
    }

    This lets you distribute across N physical machines without a change to
    the code.
    --
    Posted via http://www.ruby-forum.com/.
     
    James M. Lawrence, Oct 30, 2009
    #12
  13. Tony Arcieri wrote:
    > Ruby 1.9 isn't going to help you when using threads to distribute
    > computation across CPU cores. The Global VM Lock ensures that
    > simultaneous computation is still limited to one core.
    >
    > JRuby, on the other hand, does not have this limitation. On MRI/1.9
    > I would recommend using multiple processes.


    I'm not so sure jruby does this effectively.

    require 'tiamat/autoconfig'
    require 'pure/dsl'
    require 'benchmark'

    mod = pure do
      def total(left, right)
        left + right
      end

      def left
        (1..5_000_000).inject(0) { |acc, n| acc + n }
      end

      def right
        (1..5_000_000).inject(0) { |acc, n| acc + n }
      end
    end

    Benchmark.bmbm { |bm|
      bm.report("1 thread, 1 interpreter") {
        mod.compute(1).total
      }
      bm.report("2 threads, 1 interpreter") {
        mod.compute(2).total
      }
      # this part removed for jruby bench
      bm.report("2 threads, 2 interpreters") {
        Tiamat.open_local(2) {
          mod.compute.total
        }
      }
    }

    == ruby 1.9.2dev (2009-10-18 trunk 25393) [i386-darwin9.8.0]
    Rehearsal -------------------------------------------------------------
    1 thread, 1 interpreter 4.370000 0.020000 4.390000 ( 4.389990)
    2 threads, 1 interpreter 4.360000 0.030000 4.390000 ( 4.385111)
    2 threads, 2 interpreters 0.010000 0.010000 4.700000 ( 2.460661)
    --------------------------------------------------- total: 13.480000sec

    user system total real
    1 thread, 1 interpreter 4.360000 0.020000 4.380000 ( 4.376050)
    2 threads, 1 interpreter 4.360000 0.030000 4.390000 ( 4.380982)
    2 threads, 2 interpreters 0.010000 0.010000 4.710000 ( 2.465925)


    == jruby 1.4.0RC3 (ruby 1.8.7 patchlevel 174) (2009-10-30 1d7de2d) (Java
    HotSpot(TM) Client VM 1.5.0_20) [i386-java]
    Rehearsal ------------------------------------------------------------
    1 thread, 1 interpreter 6.060000 0.000000 6.060000 ( 6.060000)
    2 threads, 1 interpreter 7.629000 0.000000 7.629000 ( 7.629000)
    -------------------------------------------------- total: 13.689000sec

    user system total real
    1 thread, 1 interpreter 6.080000 0.000000 6.080000 ( 6.080000)
    2 threads, 1 interpreter 7.288000 0.000000 7.288000 ( 7.288000)


    --
    Posted via http://www.ruby-forum.com/.
     
    James M. Lawrence, Oct 30, 2009
    #13
  14. On Fri, Oct 30, 2009 at 11:07 AM, James M. Lawrence
    <> wrote:
    > Robert Klemme wrote:
    >> processes = 4
    >> count = 0
    >>
    >> my_array.each do |task|
    >>   if count == processes
    >>     Process.wait
    >>     count -= 1
    >>   end
    >>
    >>   fork do
    >>     a_method(task)
    >>   end
    >>   count += 1
    >> end
    >>
    >> Process.waitall

    >
    > Another option,
    >
    > Tiamat.open_local(4) {
    >   pure do
    >     fun_map :result => my_array do |elem|
    >       a_method(elem)
    >     end
    >   end.compute.result
    > }
    >
    > This lets you distribute across N physical machines without a change to
    > the code.


    This is just elegant =) ... it's funny how I observe something and then
    more of what I observed comes into the fold! I was hoping you would
    reply to the thread ;)
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >


    --
    Kind Regards,
    Rajinder Yadav

    http://DevMentor.org

    Do Good! - Share Freely, Enrich and Empower people to Transform their lives
     
    Rajinder Yadav, Oct 30, 2009
    #14
  15. Marc Hoeppner

    Glen Holcomb Guest

    On Fri, Oct 30, 2009 at 2:06 AM, Robert Klemme
    <>wrote:

    >
    >
    >
    > I figured as much. The thread discussion does not help Marc, because
    > he explicitly wanted to use processes for core utilization. Basically
    > Glen sent us in the wrong direction though. :)
    >
    >

    I've always worked best as a diversion.

    --
    "Hey brother Christian with your high and mighty errand, Your actions speak
    so loud, I can’t hear a word you’re saying."

    -Greg Graffin (Bad Religion)
     
    Glen Holcomb, Oct 30, 2009
    #15
  16. On Fri, Oct 30, 2009 at 10:14 AM, James M. Lawrence
    <> wrote:
    > == ruby 1.9.2dev (2009-10-18 trunk 25393) [i386-darwin9.8.0]
    > Rehearsal -------------------------------------------------------------
    > 1 thread, 1 interpreter     4.370000   0.020000   4.390000 (  4.389990)
    > 2 threads, 1 interpreter    4.360000   0.030000   4.390000 (  4.385111)
    > 2 threads, 2 interpreters   0.010000   0.010000   4.700000 (  2.460661)
    > --------------------------------------------------- total: 13.480000sec
    >
    >                                 user     system      total        real
    > 1 thread, 1 interpreter     4.360000   0.020000   4.380000 (  4.376050)
    > 2 threads, 1 interpreter    4.360000   0.030000   4.390000 (  4.380982)
    > 2 threads, 2 interpreters   0.010000   0.010000   4.710000 (  2.465925)
    >
    >
    > == jruby 1.4.0RC3 (ruby 1.8.7 patchlevel 174) (2009-10-30 1d7de2d) (Java
    > HotSpot(TM) Client VM 1.5.0_20) [i386-java]
    > Rehearsal ------------------------------------------------------------
    > 1 thread, 1 interpreter    6.060000   0.000000   6.060000 (  6.060000)
    > 2 threads, 1 interpreter   7.629000   0.000000   7.629000 (  7.629000)
    > -------------------------------------------------- total: 13.689000sec
    >
    >                                user     system      total        real
    > 1 thread, 1 interpreter    6.080000   0.000000   6.080000 (  6.080000)
    > 2 threads, 1 interpreter   7.288000   0.000000   7.288000 (  7.288000)

    JRuby benchmarking:

    * Use Java 6+

    Java 6 is much faster than Java 5. Java 7 is faster still in many cases.

    * Pass --server if -v output says "client" VM

    The Hotspot JVM has two modes: "server" and "client". The "server" VM
    does runtime-profiled optimizations and can be 2x or more faster than
    the "client" VM.
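
If you would rather check this from inside a benchmark script than by eye, something along these lines might do. It is only a sketch: `client_vm?` is a made-up helper, and it relies on `RUBY_DESCRIPTION` carrying the same text as the `-v` output, as in the version strings quoted in this thread.

```ruby
# True when the runtime description mentions the HotSpot "Client VM".
# By default it inspects RUBY_DESCRIPTION, the string shown by `ruby -v`.
def client_vm?(description = RUBY_DESCRIPTION)
  description.include?("Client VM")
end

puts RUBY_DESCRIPTION
puts "Client VM detected; consider passing --server" if client_vm?
```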

    Results on my system (core 2 duo 2.66GHz):

    ruby 1.9.2dev (2009-07-23 trunk 24248) [i386-darwin9.7.1]
    Rehearsal -------------------------------------------------------------
    1 thread, 1 interpreter 3.370000 0.020000 3.390000 ( 3.516261)
    2 threads, 1 interpreter 3.330000 0.020000 3.350000 ( 3.412460)
    2 threads, 2 interpreters 0.010000 0.000000 3.590000 ( 2.133313)
    --------------------------------------------------- total: 10.330000sec

    user system total real
    1 thread, 1 interpreter 3.350000 0.010000 3.360000 ( 3.415410)
    2 threads, 1 interpreter 3.350000 0.020000 3.370000 ( 3.423560)
    2 threads, 2 interpreters 0.000000 0.010000 3.630000 ( 2.302965)

    jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2009-10-30 eaa9e7f) (Java
    HotSpot(TM) 64-Bit Server VM 1.6.0_15) [x86_64-java]
    Rehearsal ------------------------------------------------------------
    1 thread, 1 interpreter 2.373000 0.000000 2.373000 ( 2.373000)
    2 threads, 1 interpreter 1.733000 0.000000 1.733000 ( 1.733000)
    --------------------------------------------------- total: 4.106000sec

    user system total real
    1 thread, 1 interpreter 2.145000 0.000000 2.145000 ( 2.145000)
    2 threads, 1 interpreter 1.840000 0.000000 1.840000 ( 1.840000)

    It would probably improve more with a longer run, but this is pretty good.

    - Charlie
     
    Charles Oliver Nutter, Nov 1, 2009
    #16
  17. Charles Nutter wrote:
    >
    > JRuby benchmarking:
    >
    > * Use Java 6+
    >
    > Java 6 is much faster than Java 5. Java 7 is faster still in many cases.
    >
    > * Pass --server if -v output says "client" VM


    I didn't consider it because the behavior I showed looks wrong for
    either Java 5 or Java 6 in either client or server mode. Indeed I
    obtained the same results with Java 6 Server VM.

    A computation split into two parallel threads takes more time than the
    same computation with one thread. 'top' reports 185% CPU and 100% CPU
    respectively.

    I was not concerned with comparing MRI and jruby. MRI was a baseline
    to demonstrate that Pure's parallelism was working in the first place.

    I was unable to find your eaa9e7f commit so I grabbed the latest
    master branch.

    jruby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2009-11-02 55366a1) (Java
    HotSpot(TM) 64-Bit Server VM 1.6.0_15) [x86_64-java]

    Core 2 Duo 1.83GHz; all apps closed except Terminal; benchmarks made
    without 'top' running.

    Rehearsal ------------------------------------------------------------
    1 thread, 1 interpreter 3.422000 0.000000 3.422000 ( 3.422000)
    2 threads, 1 interpreter 4.008000 0.000000 4.008000 ( 4.008000)
    --------------------------------------------------- total: 7.430000sec

    user system total real
    1 thread, 1 interpreter 2.942000 0.000000 2.942000 ( 2.942000)
    2 threads, 1 interpreter 3.595000 0.000000 3.595000 ( 3.595000)

    Results are the same with Pure removed:

    require 'benchmark'

    def left
      (1..10_000_000).inject(0) { |acc, n| acc + n }
    end

    def right
      (1..10_000_000).inject(0) { |acc, n| acc + n }
    end

    Benchmark.bmbm { |bm|
      bm.report("1 thread") {
        Thread.new {
          [left, right]
        }.value
      }
      bm.report("2 threads") {
        [
          Thread.new { left },
          Thread.new { right },
        ].map { |t| t.value }
      }
    }

    Rehearsal ---------------------------------------------
    1 thread 6.726000 0.000000 6.726000 ( 6.726000)
    2 threads 7.478000 0.000000 7.478000 ( 7.478000)
    ----------------------------------- total: 14.204000sec

    user system total real
    1 thread 6.636000 0.000000 6.636000 ( 6.636000)
    2 threads 8.196000 0.000000 8.196000 ( 8.196000)

    --
    Posted via http://www.ruby-forum.com/.
     
    James M. Lawrence, Nov 2, 2009
    #17
  18. On Mon, Nov 2, 2009 at 11:47 AM, James M. Lawrence
    <> wrote:
    > Rehearsal ------------------------------------------------------------
    > 1 thread, 1 interpreter    3.422000   0.000000   3.422000 (  3.422000)
    > 2 threads, 1 interpreter   4.008000   0.000000   4.008000 (  4.008000)
    > --------------------------------------------------- total: 7.430000sec
    >
    >                                user     system      total        real
    > 1 thread, 1 interpreter    2.942000   0.000000   2.942000 (  2.942000)
    > 2 threads, 1 interpreter   3.595000   0.000000   3.595000 (  3.595000)

    This does not match my results. Are you sure both cores are being used?

    > Rehearsal ---------------------------------------------
    > 1 thread    6.726000   0.000000   6.726000 (  6.726000)
    > 2 threads   7.478000   0.000000   7.478000 (  7.478000)
    > ----------------------------------- total: 14.204000sec
    >
    >                 user     system      total        real
    > 1 thread    6.636000   0.000000   6.636000 (  6.636000)
    > 2 threads   8.196000   0.000000   8.196000 (  8.196000)

    Also does not match my results:

    Rehearsal ---------------------------------------------
    1 thread 4.795000 0.000000 4.795000 ( 4.739000)
    2 threads 3.072000 0.000000 3.072000 ( 3.072000)
    ------------------------------------ total: 7.867000sec

    user system total real
    1 thread 4.081000 0.000000 4.081000 ( 4.081000)
    2 threads 2.966000 0.000000 2.966000 ( 2.966000)

    I'd love to hear from others trying this benchmark, since the results
    you've given don't match my results on any of the systems I'm testing.

    - Charlie
     
    Charles Oliver Nutter, Nov 2, 2009
    #18
  19. Charles Oliver Nutter:
    > This does not match my results. Are you sure both cores are being used?


    I am certain. I tried to head off this question when I said: all
    applications are closed save Terminal; top reports 0% CPU usage
    beforehand; top reports java at 100% CPU during the 1-thread test;
    185% CPU during the 2-thread test; top was not running during the
    posted benchmarks.

    I should also mention this is my mp3 player co-opted into a Mac dev
    machine--a Mac Mini. Maybe Java balks at the specs. System Profiler:

    Model Name: Mac mini
    Model Identifier: Macmini2,1
    Processor Name: Intel Core 2 Duo
    Processor Speed: 1.83 GHz
    Number Of Processors: 1
    Total Number Of Cores: 2
    L2 Cache: 2 MB
    Memory: 1 GB
    Bus Speed: 667 MHz

    Darwin jl.local 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01
    PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386

    It would be nice to match jruby versions. Can you try master 55366a1
    or push eaa9e7f to a remote branch?

    [quoting the rest in full due to ruby-forum gateway breakage]

    > Also does not match my results:
    >
    > Rehearsal ---------------------------------------------
    > 1 thread 4.795000 0.000000 4.795000 ( 4.739000)
    > 2 threads 3.072000 0.000000 3.072000 ( 3.072000)
    > ------------------------------------ total: 7.867000sec
    >
    > user system total real
    > 1 thread 4.081000 0.000000 4.081000 ( 4.081000)
    > 2 threads 2.966000 0.000000 2.966000 ( 2.966000)
    >
    > I'd love to hear from others trying this benchmark, since the results
    > you've given don't match my results on any of the systems I'm testing.
    >

    --
    Posted via http://www.ruby-forum.com/.
     
    James M. Lawrence, Nov 3, 2009
    #19