Why not call Thread.join?

Discussion in 'Ruby' started by thefed, Dec 31, 2007.

  1. thefed

    thefed Guest

    Take this code from the Ruby Cookbook:

    module Enumerable
    def each_simultaneously
    threads = []
    each { |e| threads << Thread.new { yield e } }
    return threads
    end
    end

    It is used on an array so that you may do this:
    [1,2,3].each_simultaneously do |i|
    sleep 5
    puts i
    end

    And it works!

    But why don't I need to call threads.each {|t| t.join }?

    And if I did, would it slow it down?

    Thanks,
    Ari
    -------------------------------------------|
    Nietzsche is my copilot
     
    thefed, Dec 31, 2007
    #1

  2. On Dec 30, 9:02 pm, thefed <> wrote:
    > Take this code from the Ruby Cookbook:
    >
    > module Enumerable
    >    def each_simultaneously
    >      threads = []
    >      each { |e| threads << Thread.new { yield e } }
    >      return threads
    >    end
    > end
    >
    > It is used on an array so that you may do this:
    > [1,2,3].each_simultaneously do |i|
    >         sleep 5
    >         puts i
    > end
    >
    > And it works!



    What did you expect to happen?
    The example you provided will do nothing but create threads and
    exit.

    > But why don't I need to call threads.each {|t| t.join }?


    Any running threads are killed when the program exits.


    > And if I did, would it slow it down?


    Generally speaking, the only thing it would slow down (stop really) is
    the execution path of the main thread.

    Now if for some reason your main thread has to do other work, a join
    would delay that, of course.
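
    A minimal sketch of the point above, not part of the original post: the sleep is shortened to 0.2 seconds and a Queue (named `done` here purely for illustration) collects the finished items. It assumes the each_simultaneously definition quoted above.

```ruby
module Enumerable
  # Same definition as quoted above: start one thread per element
  # and hand the Thread objects back to the caller.
  def each_simultaneously
    threads = []
    each { |e| threads << Thread.new { yield e } }
    threads
  end
end

done = Queue.new   # thread-safe container for finished items (illustrative name)

threads = [1, 2, 3].each_simultaneously do |i|
  sleep 0.2        # stand-in for the original `sleep 5`
  done << i
end

# Without the next line the script would reach its end here and the
# interpreter would kill the three sleeping threads on exit.
threads.each { |t| t.join }

finished = []
finished << done.pop until done.empty?
puts "finished: #{finished.sort.inspect}"
```

    Removing the join line and running this as a script (not in IRB) should show the program exiting before any item is reported.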
     
    Skye Shaw!@#$, Dec 31, 2007
    #2

  3. On 31.12.2007 06:45, Skye Shaw!@#$ wrote:
    > On Dec 30, 9:02 pm, thefed <> wrote:
    >> Take this code from the Ruby Cookbook:
    >>
    >> module Enumerable
    >> def each_simultaneously
    >> threads = []
    >> each { |e| threads << Thread.new { yield e } }
    >> return threads
    >> end
    >> end
    >>
    >> It is used on an array so that you may do this:
    >> [1,2,3].each_simultaneously do |i|
    >> sleep 5
    >> puts i
    >> end
    >>
    >> And it works!

    >
    >
    > What did you expect to happen?
    > The example you provided will do nothing but create threads and
    > exit.
    >
    >> But why don't I need to call threads.each {|t| t.join }?

    >
    > Any running threads are killed when the program exits.
    >
    >
    >> And if I did, would it slow it down?

    >
    > Generally speaking, the only thing it would slow down (stop really) is
    > the execution path of the main thread.
    >
    > Now if for some reason your main thread has to do other work, a join
    > would delay that, of course.


    Nevertheless it's good practice to join. If main has other work to do
    then you should join once that is done, i.e. at the end of the script.
    If those threads have terminated already you basically only have the
    overhead of the Threads Array iteration - but you get robustness in
    return, i.e. you ensure that all those Threads can terminate properly
    (assuming that they are written in a way to do that eventually).

    Kind regards

    robert
     
    Robert Klemme, Dec 31, 2007
    #3
  4. thefed

    thefed Guest

    On Dec 31, 2007, at 12:49 AM, Skye Shaw!@#$ wrote:

    > Generally speaking, the only thing it would slow down (stop really) is
    > the execution path of the main thread.
    >
    > Now if for some reason your main thread has to do other work, a join
    > would delay that, of course.


    OK, I understand it better. But why does each {|t| t.join} join them
    all at the same time (ish), and not wait for the first one to finish
    executing before joining the others?
     
    thefed, Dec 31, 2007
    #4
  5. On 31.12.2007 17:02, thefed wrote:
    > On Dec 31, 2007, at 12:49 AM, Skye Shaw!@#$ wrote:
    >
    >> Generally speaking, the only thing it would slow down (stop really) is
    >> the execution path of the main thread.
    >>
    >> Now if for some reason your main thread has to do other work, a join
    >> would delay that, of course.

    >
    > OK, I understand it better. But why does each {|t| t.join} join them
    > all at the same time (ish), and not wait for the first one to finish
    > executing before joining the others?


    They are not joined at the same time but one after the other.

    Cheers

    robert
     
    Robert Klemme, Dec 31, 2007
    #5
  6. Ken Bloom

    Ken Bloom Guest

    On Mon, 31 Dec 2007 00:02:10 -0500, thefed wrote:

    > Take this code from the Ruby Cookbook:
    >
    > module Enumerable
    > def each_simultaneously
    > threads = []
    > each { |e| threads << Thread.new { yield e } }
    > return threads
    > end
    > end
    >
    > It is used on an array so that you may do this:
    > [1,2,3].each_simultaneously do |i|
    > sleep 5
    > puts i
    > end


    When I ran this (not in IRB) it didn't work. The interpreter terminated
    before any of the threads finished sleeping for 5 seconds. In any case,
    you want to join each thread so that the next statement will only execute
    after all of the threads have finished their work (otherwise your next
    statement will see an undetermined intermediate view of the array).

    > OK, I understand it better. But why does each {|t| t.join} join them
    > all at the same time (ish), and not wait for the first one to finish
    > executing before joining the others?


    It joins them one at a time in order. But while your main thread is
    waiting for a specific thread to finish, any other thread is also allowed
    to execute, and possibly terminate. If thread b terminates while thread a
    is joined, then you call join on thread b, join will return immediately
    since there's nothing to wait for. Hence, each{|t| t.join} finishes
    practically immediately when the longest running thread finishes.
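
    A rough sketch of this behaviour, with made-up durations and a helper named `now` that is only an illustration: the first thread joined runs longest, so the later joins should have nothing left to wait for.

```ruby
# Monotonic clock helper (illustrative name, not from the thread).
def now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Thread a runs longest, so b and c finish while main is blocked on a.
durations = { a: 0.3, b: 0.2, c: 0.1 }
threads = durations.map { |name, secs| [name, Thread.new { sleep secs }] }

start = now
waits = threads.map do |name, t|
  before = now
  t.join                      # joins happen one at a time, in order
  [name, now - before]        # how long this particular join blocked
end
total = now - start

waits.each { |name, w| puts format("join(%s) blocked for %.2fs", name, w) }
puts format("total: %.2fs", total)
```

    The total should come out close to the longest duration (0.3 s here), not the sum of all three.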

    --Ken

    --
    Ken (Chanoch) Bloom. PhD candidate. Linguistic Cognition Laboratory.
    Department of Computer Science. Illinois Institute of Technology.
    http://www.iit.edu/~kbloom1/
     
    Ken Bloom, Dec 31, 2007
    #6
  7. thefed

    thefed Guest

    On Dec 31, 2007, at 11:15 AM, Robert Klemme wrote:

    > On 31.12.2007 17:02, thefed wrote:


    >> OK, I understand it better. But why does each {|t| t.join} join
    >> them all at the same time (ish), and not wait for the first one
    >> to finish executing before joining the others?

    >
    > They are not joined at the same time but one after the other.


    But then why doesn't this take 15 seconds? t.join is called in the
    main thread, so shouldn't the next Thread#join not get called until
    the first one finishes?

    module Enumerable
    def each_simultaneously
    threads = []
    each { |e| threads >> Thread.new { yield e } }
    return threads
    end
    end

    start_time = Time.now
    [7,8,9].each_simultaneously do |e|
    sleep(5) # Simulate a long, high-latency operation
    print "Completed operation for #{e}!\n"
    end
    # Completed operation for 8!
    # Completed operation for 7!
    # Completed operation for 9!
    Time.now - start_time # => 5.009334
     
    thefed, Dec 31, 2007
    #7
  8. thefed

    thefed Guest

    > module Enumerable
    > def each_simultaneously
    > threads = []
    > each { |e| threads << Thread.new { yield e } }
    > return threads
    > end
    > end


    Sorry all, THIS is the fixed up version of each_simultaneously. Turns
    out Ruby Cookbook has errors, too!
     
    thefed, Dec 31, 2007
    #8
  9. Craig Beck

    Craig Beck Guest

    >>> OK, I understand it better. But why does each {|t| t.join} join
    >>> them all at the same time (ish), and not wait for the first one
    >>> to finish executing before joining the others?

    >>
    >> They are not joined at the same time but one after the other.

    >
    > But then why doesn't this take 15 seconds? t.join is called in the
    > main thread, so shouldn't the next Thread#join not get called until
    > the first one finishes?
    >
    > module Enumerable
    > def each_simultaneously
    > threads = []
    > each { |e| threads >> Thread.new { yield e } }
    > return threads
    > end
    > end
    >
    > start_time = Time.now
    > [7,8,9].each_simultaneously do |e|
    > sleep(5) # Simulate a long, high-latency operation
    > print "Completed operation for #{e}!\n"
    > end
    > # Completed operation for 8!
    > # Completed operation for 7!
    > # Completed operation for 9!
    > Time.now - start_time # => 5.009334


    try looking at the crude timeline below...

    sec  0         1         2         3         4         5         6         7
         |---------|---------|---------|---------|---------|---------|---------|
    main ====@=================================================
    t[1]    ===================================================
    t[2]    ===================================================
    t[3]    ===================================================

    The @ on the main thread represents when the t.join gets called. It
    waits in this simple case for t[1] to finish its work (sleeping for 5
    seconds), then waits for t[2]. As t[2] has also been doing work all
    this time, it only blocks the main thread for another 0.1 sec before
    finishing. Same for t[3]. So in this contrived example it takes 5 seconds
    + whatever overhead for starting threads.

    You could throw more instrumentation in there if you wish and do
    things like adding additional calls to sleep to simulate extra thread
    overhead to make it more obvious.
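
    One possible way to add that instrumentation, as a sketch only: the sleeps are scaled down from 5 s to 0.2 s, and the `elapsed` lambda and the message format are illustrative choices, not from the original posts.

```ruby
module Enumerable
  # The corrected definition from earlier in the thread.
  def each_simultaneously
    threads = []
    each { |e| threads << Thread.new { yield e } }
    threads
  end
end

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# Seconds since `start`; callable from any thread.
elapsed = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) - start }

threads = [7, 8, 9].each_simultaneously do |e|
  sleep 0.2   # scaled-down stand-in for sleep(5)
  print format("%.2fs  completed operation for %d\n", elapsed.call, e)
end

# Join in order and report when each join hands control back to main.
threads.each_with_index do |t, i|
  t.join
  print format("%.2fs  join on t[%d] returned\n", elapsed.call, i + 1)
end

total = elapsed.call
puts format("total: %.2fs", total)
```

    All three joins should return at roughly the same timestamp, which is why the whole run takes about one sleep period rather than three.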
     
    Craig Beck, Dec 31, 2007
    #9
  10. thefed

    thefed Guest

    On Dec 31, 2007, at 3:46 PM, Craig Beck wrote:

    > try looking at the crude timeline below...
    >
    > sec  0         1         2         3         4         5         6         7
    >      |---------|---------|---------|---------|---------|---------|---------|
    > main ====@=================================================
    > t[1]    ===================================================
    > t[2]    ===================================================
    > t[3]    ===================================================
    >
    > The @ on the main thread represents when the t.join gets called. It
    > waits in this simple case for t[1] to finish its work (sleeping
    > for 5 seconds), then waits for t[2]. As t[2] has also been doing
    > work all this time, it only blocks the main thread for another 0.1
    > sec before finishing. Same for t[3]. So in this contrived example it
    > takes 5 seconds + whatever overhead for starting threads.
    >
    > You could throw more instrumentation in there if you wish and do
    > things like adding additional calls to sleep to simulate extra
    > thread overhead to make it more obvious.


    Thank you SO MUCH! This really clears threading up for me. In
    retrospect it was less than obvious, but evident nonetheless. But
    this timeline really made the difference for me. Thank you!


    - Ari
     
    thefed, Dec 31, 2007
    #10
  11. Ian Whitlock

    Ian Whitlock Guest

    Craig Beck wrote:
    >> module Enumerable
    >> print "Completed operation for #{e}!\n"
    >> end
    >> # Completed operation for 8!
    >> # Completed operation for 7!
    >> # Completed operation for 9!
    >> Time.now - start_time # => 5.009334

    >
    > try looking at the crude timeline below...
    >
    > sec  0         1         2         3         4         5         6         7
    >      |---------|---------|---------|---------|---------|---------|---------|
    > main ====@=================================================
    > t[1]    ===================================================
    > t[2]    ===================================================
    > t[3]    ===================================================
    >
    > The @ on the main thread represents when the t.join gets called. It
    > waits in this simple case for t[1] to finish its work (sleeping for 5
    > seconds), then waits for t[2]. As t[2] has also been doing work all
    > this time, it only blocks the main thread for another 0.1 sec before
    > finishing. Same for t[3]. So in this contrived example it takes 5 seconds
    > + whatever overhead for starting threads.
    >
    > You could throw more instrumentation in there if you wish and do
    > things like adding additional calls to sleep to simulate extra thread
    > overhead to make it more obvious.


    To me the important point in addition to the parallelism is that, when
    run in batch mode, say with SciTE, main takes less than a second and
    kills all the threads. Hence the messages are never seen. To see
    the reports you have to do something like

    start_time = Time.now
    [7,8,9].each_simultaneously do |e|
    sleep(5) # Simulate a long, high-latency operation
    print "Completed operation for #{e}!\n"
    end
    sleep 5 #######main must take at least 5 seconds!!!!
    # Completed operation for 8!
    # Completed operation for 7!
    # Completed operation for 9!
    Time.now - start_time # => 5.009334

    to guarantee that the threads have 5 seconds to finish
    their operation. Or you can use

    module Enumerable
    def each_simultaneously
    collect {|e| Thread.new {yield e}}.each {|t| t.join}
    end
    end

    which guarantees that the threads will finish before
    control is returned to main.

    In reality it is also important that threads spend a large
    part of their operation just waiting when there is only one
    CPU.

    I think the problem arose because the example on page 760
    of the Ruby Cookbook does not mention the necessity of the
    main thread lasting long enough and does not show code to
    make it happen.

    I realize that much of this may have been obvious to some
    who replied, but as a newbie it wasn't to me until I read
    the section and played with the code.

    Ian
    --
    Posted via http://www.ruby-forum.com/.
     
    Ian Whitlock, Jan 1, 2008
    #11
  12. On 01.01.2008 03:25, Ian Whitlock wrote:
    > To me the important point in addition to the parallelism is that, when
    > run in batch mode, say with SciTE, main takes less than a second and
    > kills all the threads. Hence the messages are never seen. To see
    > the reports you have to do something like
    >
    > start_time = Time.now
    > [7,8,9].each_simultaneously do |e|
    > sleep(5) # Simulate a long, high-latency operation
    > print "Completed operation for #{e}!\n"
    > end
    > sleep 5 #######main must take at least 5 seconds!!!!


    Sorry to say, but this is a bogus solution. Using sleep for this
    is not a good idea: if the tasks take longer, you will lose output
    anyway or even risk that some tasks do not finish properly; if all
    tasks finish much sooner, you waste time.

    The thread killing is the exact reason why #each_simultaneously was
    built to return an Array of Thread objects. That way you can join all
    the threads.

    > # Completed operation for 8!
    > # Completed operation for 7!
    > # Completed operation for 9!
    > Time.now - start_time # => 5.009334
    >
    > to guarantee that the threads have 5 seconds to finish
    > their operation. Or you can use
    >
    > module Enumerable
    > def each_simultaneously
    > collect {|e| Thread.new {yield e}}.each {|t| t.join}
    > end
    > end
    >
    > which guarantees that the threads will finish before
    > control is returned to main.


    I prefer the solution that does not join in the method but returns
    Threads. If you think about it, that version is significantly more
    flexible. You can join those threads immediately

    an_enum.each_simultaneously {|e| ... }.each {|th| th.join}

    but you can as well do some work in between

    threads = an_enum.each_simultaneously {|e| ... }
    do_some_work
    ....
    threads.each {|th| th.join}
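
    A runnable sketch of that second pattern. The summation is a hypothetical stand-in for do_some_work, the sleep is shortened, and Thread#value is used after the join only to show what each block returned.

```ruby
module Enumerable
  # Returns the Thread objects instead of joining inside the method.
  def each_simultaneously
    threads = []
    each { |e| threads << Thread.new { yield e } }
    threads
  end
end

threads = [1, 2, 3].each_simultaneously do |e|
  sleep 0.1     # simulated latency
  e * 10        # becomes the thread's value
end

# Hypothetical main-thread work that overlaps with the running threads.
main_result = (1..1000).reduce(:+)

# Join only once main has nothing else to do.
threads.each { |th| th.join }

# After the join, Thread#value returns each block's result without waiting.
results = threads.map { |th| th.value }
puts "main: #{main_result}, threads: #{results.inspect}"
```

    Because the method hands back the Thread objects, the caller decides when to block; a version that joins internally cannot overlap main's work with the threads.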

    > I realize that much of this may have been obvious to some
    > who replied, but as a newbie it wasn't to me until I read
    > the section and played with the code.


    When I was initially confronted with multithreading it also took me a
    while. For me at the time it was difficult to not confuse Thread
    objects with threads. This was in Java which decouples Thread object
    creation and thread execution, which probably makes it a bit easier to
    grasp the concepts.

    It is important to keep this distinction in mind: a Thread object is
    an object like any other, with the added twist that it *may* be
    associated with an independent thread of execution. In Java the object
    is not associated with a thread before the thread starts or after it
    terminates; in Ruby the association exists right from the start,
    because threads are started immediately, and it lasts until the thread
    terminates.
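
    A small Ruby sketch of that distinction, with illustrative variable names: the Thread object outlives the thread of execution it was associated with and can still be queried afterwards.

```ruby
# Thread.new starts the thread immediately; there is no separate start call.
t = Thread.new { sleep 0.1; :done }

alive_before = t.alive?   # true while the thread is running or sleeping
t.join                    # wait for the thread of execution to terminate
alive_after  = t.alive?   # false: no thread is associated any more
status_after = t.status   # false for a thread that terminated normally
value_after  = t.value    # the Thread object still carries the block's result

puts "alive before join: #{alive_before}"
puts "alive after join:  #{alive_after}"
puts "status after join: #{status_after.inspect}"
puts "value after join:  #{value_after.inspect}"
```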

    Kind regards

    robert
     
    Robert Klemme, Jan 1, 2008
    #12
  13. Ian Whitlock

    Ian Whitlock Guest

    Robert Klemme wrote:

    > I prefer the solution that does not join in the method but returns
    > Threads. If you think about it, that version is significantly more
    > flexible. You can join those threads immediately
    >
    > an_enum.each_simultaneously {|e| ... }.each {|th| th.join}
    >
    > but you can as well do some work in between
    >
    > threads = an_enum.each_simultaneously {|e| ... }
    > do_some_work
    > ...
    > threads.each {|th| th.join}
    >


    Thanks. That helps both with my understanding the significance
    of collect and threads.

    Ian
    --
    Posted via http://www.ruby-forum.com/.
     
    Ian Whitlock, Jan 1, 2008
    #13