modifying a Hash in one process when .each is running in another

Discussion in 'Ruby' started by Nathan, Apr 7, 2010.

  1. Nathan

    Nathan Guest

    I want one process to continually loop through a list of objects (in
    the form of a hash), while another process continually refreshes that
    list. I've done it the most obvious way below, but are there any
    pitfalls here? I don't really know how the each method works, so will
    it be looking for keys that may not be there anymore? Would I be
    better off doing some kind of merge of the old hash with the new,
    rather than replacing the old hash entirely?
    Any help much appreciated.
    Nathan

    tester = Thread.new {
      until terminate
        myHash.each do |key,value|
          value.test()
        end
      end
    }

    while Time.now < time_to_finish
      myHash = MyClass.generate_list_of_objects
    end

    terminate=true

    tester.join
     
    Nathan, Apr 7, 2010
    #1

  2. Nathan Macinnes wrote:
    > I want one process to continually loop through a list of objects (in
    > the form of a hash), while another process continually refreshes that
    > list. I've done it the most obvious way below, but are there any
    > pitfalls here? I don't really know how the each method works, so will
    > it be looking for keys that may not be there anymore? Would I be
    > better off doing some kind of merge of the old hash with the new,
    > rather than replacing the old hash entirely?


    Fortunately you are not replacing or modifying the hash at all (and
    modifying a hash while iterating through it is a really bad idea).

    Rather, you are updating the local variable 'myHash' to point to a new
    hash. The tester thread, once it has started iterating through the old
    hash, will continue to do so until it gets to the end.
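
To see the difference between rebinding the variable and mutating the hash, here is a small single-threaded sketch. The data and the names (my_hash, seen, mutation_error) are made up for illustration; the same rule applies when the rebinding happens in another thread.

```ruby
# Hypothetical data; all names here are illustrative only.
my_hash = { a: 1, b: 2, c: 3 }
seen = []

my_hash.each do |key, _value|
  # Rebinding the variable mid-iteration does not touch the hash
  # object that each is already walking.
  my_hash = { z: 26 } if key == :a
  seen << key
end
# seen now holds every key of the old hash; my_hash refers to the new one.

# Mutating the hash that is being iterated is a different matter:
# adding a key during each raises a RuntimeError.
mutation_error = nil
h = { a: 1 }
begin
  h.each { |_key, _value| h[:new_key] = 0 }
rescue RuntimeError => e
  mutation_error = e
end
```

So the loop in the tester thread always finishes the hash it started with, and only picks up the replacement on its next pass.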

    You could make this even more explicit by eliminating the 'terminate'
    variable:

    tester = Thread.new {
      while myHash
        myHash.each ...
      end
    }

    ...
    myHash = nil
    tester.join

    But is there a particular reason to do this using threads? It would be
    simpler like this:

    while Time.now < time_to_finish
      myHash = MyClass.generate_list_of_objects
      myHash.each do |key,value|
        value.test
      end
    end

    Note that MRI won't give you thread concurrency across multiple cores,
    although JRuby does.
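
That limit applies to CPU-bound work. Threads that are blocked waiting (on sleep or on I/O) can still overlap in MRI, which is the case that matters for slow network calls. A rough sketch, where slow_call is a hypothetical stand-in for a network operation and the timings are only approximate:

```ruby
# Hypothetical stand-in for a slow network call; sleep blocks without
# holding MRI's global lock, much as waiting on a socket does.
slow_call = lambda do |seconds|
  sleep(seconds)
  :done
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = 2.times.map { Thread.new { slow_call.call(0.3) } }
outcomes = threads.map(&:value) # value joins each thread and returns its result
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
# elapsed should be close to 0.3 seconds rather than 0.6,
# because the two waits overlap.
```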
    --
    Posted via http://www.ruby-forum.com/.
     
    Brian Candler, Apr 7, 2010
    #2

  3. Nathan

    Nathan Guest

    Thanks for the clarification... My application is network based, and
    some operations take several seconds. generate_list_of_objects takes
    anything between 2 and 15 seconds, so I wouldn't like to have my whole
    program on hold while that's happening. Each test process takes a few
    seconds too, so if the object is no longer on the list, it isn't a
    particularly bad thing if test tries to run, but it'll take up a lot
    of time unnecessarily.

    I'm trying to work out if there's another way of doing it then.
    Perhaps I'll modify the test method so that it knows if the current
    object is on the list, and only runs the test if it is.
    Nathan

     
    Nathan, Apr 7, 2010
    #3
  4. On Wed, 7 Apr 2010, Nathan wrote:

    > Thanks for the clarification... My application is network based, and
    > some operations take several seconds. generate_list_of_objects takes
    > anything between 2 and 15 seconds, so I wouldn't like to have my whole
    > program on hold while that's happening. Each test process takes a few
    > seconds too, so if the object is no longer on the list, it isn't a
    > particularly bad thing if test tries to run, but it'll take up a lot
    > of time unnecessarily.
    >
    > I'm trying to work out if there's another way of doing it then.
    > Perhaps I'll modify the test method so that it knows if the current
    > object is on the list, and only runs the test if it is.
    > Nathan


    Might a messaging queue work better? That way you don't have the
    concurrency issues, as well as the issues with changing a hash in the
    middle of iteration.
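
A minimal sketch of the queue idea, using Ruby's built-in Queue. The item names and the run_test lambda are placeholders for the real objects and their test method, and using nil as a stop signal is just one possible convention:

```ruby
require 'thread' # needed on older Rubies; harmless on current ones

# Hypothetical stand-in for the real, slow test on one object.
run_test = ->(name) { "tested #{name}" }

work = Queue.new
results = Queue.new

tester = Thread.new do
  # pop blocks until an item is available; nil is the stop signal here.
  while (item = work.pop)
    results << run_test.call(item)
  end
end

# The refreshing side only pushes items; no hash is shared between threads.
%w[a b].each { |item| work << item }
work << nil
tester.join

tested = []
tested << results.pop until results.empty?
```

Queue does its own locking, so neither side needs an explicit mutex.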

    Matt
     
    Matthew K. Williams, Apr 7, 2010
    #4
  5. Dan Drew

    Dan Drew Guest

    Depending on how memory-efficient you want to be, you could also try one of these:

    1) Memory hog method... this will have potentially two copies of your
    data at a given time

    Thread 1:
        lock_shared   # using whatever mechanism you want such as a mutex
        testHash = sharedHash
        unlock_shared
        # iterate and test

    Thread 2
        newHash = generate_hash()
        lock_shared
        sharedHash = newHash
        unlock_shared

    2) Memory efficient but more contention

    Thread 1
        lock_shared
        test_keys = sharedHash.keys
        unlock_shared
        test_keys.each do |k|
            lock_shared
            v = sharedHash[k]
            unlock_shared
            test(v) if v
        end

    Thread 2
        # for each item as it's loaded from the network
        generate_index do |k,v|
            next if sharedHash[k] == v  # Optional to avoid unnecessary contention
            lock_shared
            if v
                sharedHash[k] = v
            else
                sharedHash.delete(k)
            end
            unlock_shared
        end
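
For what it's worth, method 2 could look roughly like this with a real Mutex in place of lock_shared/unlock_shared. The data is made up, the refresher thread only simulates an entry disappearing after the key snapshot, and appending to tested stands in for the real test(v) call:

```ruby
require 'thread' # needed on older Rubies; harmless on current ones

lock = Mutex.new
shared_hash = { "a" => 1, "b" => 2, "c" => 3 } # hypothetical data
tested = []

# Snapshot the keys under the lock.
keys = lock.synchronize { shared_hash.keys }

# Simulate the refreshing thread removing an entry after the snapshot.
refresher = Thread.new do
  lock.synchronize { shared_hash.delete("b") }
end
refresher.join

keys.each do |k|
  # Look the value up again just before use, so entries that have
  # been removed in the meantime are skipped.
  v = lock.synchronize { shared_hash[k] }
  tested << k if v # stands in for test(v)
end
```

Mutex#synchronize releases the lock even if the block raises, which is easy to get wrong with separate lock and unlock calls. Note that the lock is not held while the (slow) test itself runs.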

    Dan


     
    Dan Drew, Apr 7, 2010
    #5
  6. Nathan Macinnes wrote:
    > Thanks for the clarification... My application is network based, and
    > some operations take several seconds. generate_list_of_objects takes
    > anything between 2 and 15 seconds, so I wouldn't like to have my whole
    > program on hold while that's happening. Each test process takes a few
    > seconds too, so if the object is no longer on the list, it isn't a
    > particularly bad thing if test tries to run, but it'll take up a lot
    > of time unnecessarily.


    Let's say generate_list_of_objects takes 15 seconds, and the test suite
    (iterating the objects) takes 60 seconds. Do you really want
    generate_list_of_objects to run 4 times uselessly while the test suite runs
    the original set of tests?

    Similarly, if the test suite takes 15 seconds and generate_list_of_objects
    takes 60 seconds, do you want to run the test suite 4 times identically
    with the first set of objects?

    If not, but rather you want to run the test suite exactly once for each
    set of data, then you should synchronize them. How about this
    (untested):

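    require 'thread'
    cmd = Queue.new   # hashes waiting to be tested
    res = Queue.new   # results coming back from the tester
    tester = Thread.new do
      # cmd.pop blocks until a hash arrives; nil ends the loop
      while myHash = cmd.pop
        myHash.each { ... }
        res.push [myHash, :ok]
      end
    end

    cmd.push MyClass.generate_list_of_objects
    while Time.now < time_to_finish
      # generate the next list while the previous one is being tested
      cmd.push MyClass.generate_list_of_objects
      res.pop
    end
    res.pop
    cmd.push nil
    tester.join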
     
    Brian Candler, Apr 7, 2010
    #6
  7. Brian Candler, Apr 7, 2010
    #7
  8. Nathan

    Nathan Guest

    > Do you really want generate_list_objects to run 4 times uselessly while
    > the test suite runs the original set of tests?


    I think you've misunderstood... sorry if I wasn't clear. Perhaps
    naming the method test was a bad idea. I don't mean test as in unit
    test or whatever, I mean something completely different. If an object
    is no longer on the list returned by generate_list_of_objects, that
    means it has become unavailable, and running test won't accomplish
    anything meaningful. That's why I'm interested in keeping the list as
    up-to-date as possible.

     
    Nathan, Apr 7, 2010
    #8
  9. Nathan

    Nathan Guest

    Dan, many thanks for this. Method 2 looks like it'll suit me
    perfectly. As I've explained in reply to Brian's post, I don't want to
    test objects which are no longer on the most up-to-date list, and, if
    I've understood correctly, method 1 would still do that, but method 2
    wouldn't. So that looks like the solution.
    Thanks,
    Nathan

     
    Nathan, Apr 7, 2010
    #9
  10. Nathan

    Nathan Guest

    Sorry, that should read "its source has become unavailable".
    Obviously the object itself is still there!!!

     
    Nathan, Apr 7, 2010
    #10
  11. Dan Drew

    Dan Drew Guest

    No problem, glad I could help
    Dan


     
    Dan Drew, Apr 7, 2010
    #11