modifying a Hash in one process when .each is running in another

Nathan · Apr 7, 2010

I want one process to continually loop through a list of objects (in
the form of a hash), while another process continually refreshes that
list. I've done it the most obvious way below, but are there any
pitfalls here? I don't really know how the each method works, so will
it be looking for keys that may not be there anymore? Would I be
better off doing some kind of merge of the old hash with the new,
rather than replacing the old hash entirely?
Any help much appreciated.
Nathan

terminate = false  # assign before Thread.new so the block shares these locals
myHash = {}

tester = Thread.new {
  until terminate
    myHash.each do |key, value|
      value.test()
    end
  end
}

while Time.now < time_to_finish
  myHash = MyClass.generate_list_of_objects
end

terminate = true

tester.join

Brian Candler · Apr 7, 2010

Nathan said:
I want one process to continually loop through a list of objects (in
the form of a hash), while another process continually refreshes that
list. I've done it the most obvious way below, but are there any
pitfalls here? I don't really know how the each method works, so will
it be looking for keys that may not be there anymore? Would I be
better off doing some kind of merge of the old hash with the new,
rather than replacing the old hash entirely?

Fortunately, you are not replacing or modifying the hash at all, which is
just as well, because modifying a hash while iterating through it is a
really bad idea.

Rather, you are updating the local variable 'myHash' to point to a new
hash. The tester thread, once it has started iterating through the old
hash, will continue to do so until it gets to the end.
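A minimal sketch of that distinction, assuming a current MRI release: rebinding the variable leaves an `each` that is already running on the old hash, whereas adding a key to the hash being iterated raises an error. The hash contents are purely illustrative.

```ruby
# Rebinding a variable does not affect an each that is already running.
old_hash = { a: 1, b: 2 }
current = old_hash
seen = []
current.each do |key, _value|
  current = { z: 26 } # rebinds the variable; old_hash itself is untouched
  seen << key
end
p seen    # keys of the hash the loop started on
p current # the replacement hash

# Mutating the hash being iterated is different: adding a key raises.
error = nil
h = { a: 1 }
begin
  h.each { |_key, _value| h[:b] = 2 }
rescue RuntimeError => e
  error = e
end
puts "#{error.class}: #{error.message}"
```

The receiver of `each` is evaluated once, so later assignments to `current` only change which hash the *next* pass would see.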

You could make this even more explicit by eliminating the 'terminate'
variable:

tester = Thread.new {
  while (h = myHash)   # read once per pass, so a nil set mid-pass never reaches .each
    h.each ...
  end
}

...
myHash = nil
tester.join
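This trick relies on the thread's block being a closure over the *same* local variable, so the variable has to be assigned before `Thread.new`; otherwise Ruby parses the name inside the block as a method call and raises `NameError`. A self-contained sketch, where `generate_list_of_objects` is a hypothetical stand-in for `MyClass.generate_list_of_objects`, the `sleep` calls stand in for slow network work, and a fixed three refreshes replace the `time_to_finish` loop:

```ruby
# Hypothetical stand-in for MyClass.generate_list_of_objects.
def generate_list_of_objects(generation)
  { "obj#{generation}" => generation }
end

# Assign before Thread.new so the block closes over this local variable.
my_hash = generate_list_of_objects(0)
tested = []

tester = Thread.new do
  # Copy the reference once per pass; nil is the "stop" sentinel.
  while (snapshot = my_hash)
    snapshot.each { |key, _value| tested << key }
    sleep 0.01 # stands in for the slow value.test call
  end
end

1.upto(3) do |generation|
  sleep 0.02
  my_hash = generate_list_of_objects(generation) # rebind, never mutate
end

my_hash = nil # the tester finishes its current pass, then exits
tester.join
p tested.uniq
```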

But is there a particular reason to do this using threads? It would be
simpler like this:

while Time.now < time_to_finish
  myHash = MyClass.generate_list_of_objects
  myHash.each do |key, value|
    value.test
  end
end

Note that MRI won't run Ruby threads in parallel across multiple cores,
although JRuby does.

Nathan · Apr 7, 2010

Thanks for the clarification... My application is network-based, and
some operations take several seconds. generate_list_of_objects takes
anywhere between 2 and 15 seconds, so I wouldn't like my whole
program to be on hold while that's happening. Each test also takes a few
seconds, so if an object is no longer on the list, it isn't a
particularly bad thing if test runs on it, but it wastes a lot
of time.

I'm trying to work out if there's another way of doing it then.
Perhaps I'll modify the test method so that it knows if the current
object is on the list, and only runs the test if it is.
Nathan
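A minimal sketch of that idea, under stated assumptions: the refresher publishes each new list by swapping in a whole new hash through a hypothetical `LatestList` holder, and the tester re-checks the newest list immediately before each slow test. To keep the result deterministic, a mid-pass refresh is simulated inline rather than coming from a second thread.

```ruby
# Hypothetical holder for the newest list: the refresher swaps in a whole
# new hash, the tester only reads it.
class LatestList
  def initialize(hash)
    @lock = Mutex.new
    @hash = hash
  end

  # Called by the refresher with a freshly generated hash.
  def replace(new_hash)
    @lock.synchronize { @hash = new_hash }
  end

  # Returns whichever hash is current right now.
  def current
    @lock.synchronize { @hash }
  end
end

list = LatestList.new({ "a" => 1, "b" => 2, "c" => 3 })
tested = []

# One tester pass: walk a snapshot, but skip keys the newest list dropped.
list.current.each_key do |key|
  next unless list.current.key?(key)
  tested << key # stands in for the slow value.test call
  # Simulate a refresh landing mid-pass that drops "b".
  list.replace({ "a" => 1, "c" => 3 }) if key == "a"
end

p tested
```

Because `replace` installs a new hash rather than editing the one being walked, the pass never modifies the hash it is iterating.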

Matthew K. Williams · Apr 7, 2010

Might a message queue work better? That way you avoid the
concurrency issues, as well as the issues with changing a hash in the
middle of iteration.

Matt
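A minimal sketch of the queue approach using Ruby's built-in thread-safe `Queue`, whose `pop` blocks until an item arrives. The producer loop is a hypothetical stand-in for the network scan, and `nil` is used as an end-of-work sentinel; no hash is shared or iterated across threads.

```ruby
require 'thread' # Queue is core in modern Ruby; the require is harmless

work = Queue.new
tested = []

tester = Thread.new do
  # Queue#pop blocks until an item is available; nil means "no more work".
  while (obj = work.pop)
    tested << obj # stands in for obj.test
  end
end

# Hypothetical producer: push each object as soon as the scan finds it.
%w[alpha beta gamma].each { |name| work.push(name) }
work.push(nil)
tester.join

p tested
```

With this design an object is tested once per push, so keeping stale objects out means the producer only pushes objects that are currently available.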

Dan Drew · Apr 7, 2010

Depending on how memory-efficient you want to be, you could also try one of these.

1) Memory-hog method: this can hold two copies of your data at a given time

Thread 1:
  lock_shared  # using whatever mechanism you want, such as a mutex
  testHash = sharedHash
  unlock_shared
  # iterate and test

Thread 2:
  newHash = generate_hash()
  lock_shared
  sharedHash = newHash
  unlock_shared
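`lock_shared`/`unlock_shared` above are placeholders. A sketch of method 1 with Ruby's `Mutex#synchronize` filling that role; the hash contents are hypothetical, and the main thread plays "Thread 2" with an inline stand-in for `generate_hash()`. The lock is held only for copying and swapping the reference, never during the slow iteration or generation, which is why two full hashes can be alive at once.

```ruby
lock = Mutex.new
shared_hash = { "a" => 1, "b" => 2 } # hypothetical initial data

# Thread 1: copy the reference under the lock, then iterate without it.
tester = Thread.new do
  test_hash = lock.synchronize { shared_hash }
  test_hash.map { |key, _value| key } # stands in for "iterate and test"
end

# Thread 2 (here, the main thread): build the new hash outside the lock,
# then hold the lock only for the reference swap.
new_hash = { "c" => 3 } # stands in for generate_hash()
lock.synchronize { shared_hash = new_hash }

keys_tested = tester.value # joins the thread and returns its block's value
p keys_tested # the old or the new hash's keys, depending on timing
```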

2) Memory-efficient, but more contention

Thread 1:
  lock_shared
  test_keys = sharedHash.keys
  unlock_shared
  test_keys.each do |k|
    lock_shared
    v = sharedHash[k]
    unlock_shared
    test(v) if v
  end

Thread 2:
  # for each item as it's loaded from the network
  generate_index do |k, v|
    next if sharedHash[k] == v  # optional, to avoid unnecessary contention
    lock_shared
    if v
      sharedHash[k] = v
    else
      sharedHash.delete(k)
    end
    unlock_shared
  end
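A sketch of method 2 with `Mutex#synchronize` in place of the lock placeholders. Assumptions: the data is hypothetical, `apply_update` is a hypothetical lambda standing in for one step of `generate_index`, and the scan reports a vanished item as `v = nil`. To make the output deterministic, the two threads' steps are interleaved by hand in a single thread; in a real program `apply_update` would run in the refresher thread.

```ruby
lock = Mutex.new
shared_hash = { "a" => 1, "b" => 2, "c" => 3 } # hypothetical data

# Thread 2's step for one item from the network scan.
# Assumption: the scan yields v = nil for an item that has disappeared.
apply_update = lambda do |k, v|
  next if shared_hash[k] == v # optional unlocked pre-check, as in Dan's version
  lock.synchronize do
    if v
      shared_hash[k] = v
    else
      shared_hash.delete(k)
    end
  end
end

# Thread 1: copy the keys once, then look each one up again under the lock
# right before testing it. Iterating the keys array (not the hash) means
# entries can be added or deleted meanwhile without raising.
tested = []
test_keys = lock.synchronize { shared_hash.keys }
test_keys.each do |k|
  v = lock.synchronize { shared_hash[k] }
  tested << k if v # stands in for test(v) if v
  # Hand-interleaved for a deterministic demo: updates arrive mid-pass.
  if k == "a"
    apply_update.call("b", nil) # "b" disappears and is skipped
    apply_update.call("d", 4)   # "d" appears; the next pass will test it
  end
end

p tested
p shared_hash.keys
```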

Dan

Brian Candler · Apr 7, 2010

Nathan said:
Thanks for the clarification... My application is network based, and
some operations take several seconds. generate_list_of_objects takes
anything between 2 and 15 seconds, so I wouldn't like to have my whole
program on hold while that's happening. Each test process takes a few
seconds too, so if the object is no longer on the list, it isn't a
particularly bad thing if test tries to run, but it'll take up a lot
of time unnecessarily.

Let's say generate_list_of_objects takes 15 seconds, and the test suite
(iterating the objects) takes 60 seconds. Do you really want
generate_list_of_objects to run 4 times uselessly while the test suite runs
the original set of tests?

Similarly, if the test suite takes 15 seconds and generate_list_of_objects
takes 60 seconds, do you want to run the test suite 4 times identically
with the first set of objects?

If not, and you would rather run the test suite exactly once for each
set of data, then you should synchronize the two. How about this
(untested):

require 'thread'
cmd = Queue.new
res = Queue.new
tester = Thread.new do
  while (myHash = cmd.pop)
    myHash.each { ... }
    res.push myHash   # signal that this set has been processed
  end
end

q.push MyClass.generate_list_of_objects
while Time.now < time_to_finish
  q.push MyClass.generate_list_of_objects
  res.pop
end
res.pop
q.push nil
tester.join   # Thread has no #wait method

Brian Candler · Apr 7, 2010

Replace 'q' with 'cmd' in that of course.
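With that substitution applied (and `tester.join`, since `Thread` has no `wait` method), here is a self-contained sketch of the lockstep pattern. Assumptions: a hypothetical `generate` lambda stands in for `MyClass.generate_list_of_objects`, a fixed three iterations replace the `time_to_finish` loop, and each pass reports the keys it processed.

```ruby
require 'thread'

cmd = Queue.new # hashes for the tester to process; nil means stop
res = Queue.new # one entry per finished pass

# Hypothetical stand-in for MyClass.generate_list_of_objects.
generation = 0
generate = lambda do
  generation += 1
  { "obj#{generation}" => generation }
end

tester = Thread.new do
  while (my_hash = cmd.pop)
    keys = my_hash.map { |key, _value| key } # stands in for value.test
    res.push keys                            # report that this pass is done
  end
end

passes = []
cmd.push generate.call   # prime the tester with the first list
3.times do               # stands in for `while Time.now < time_to_finish`
  cmd.push generate.call # the next list is generated while the tester works
  passes << res.pop      # wait for the previous pass to finish
end
passes << res.pop        # collect the last pass
cmd.push nil             # tell the tester to exit
tester.join

p passes
```

Because the generator waits on `res` before producing another list, each list is tested exactly once and generation never runs more than one list ahead of the pass in progress.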

Nathan · Apr 7, 2010

Brian Candler said:
Do you really want generate_list_objects to run 4 times uselessly while
the test suite runs the original set of tests?

I think you've misunderstood... sorry if I wasn't clear. Perhaps
naming the method test was a bad idea. I don't mean test as in unit
test or whatever, I mean something completely different. If an object
is no longer on the list returned by generate_list_of_objects, that
means it has become unavailable, and running test won't accomplish
anything meaningful. That's why I'm interested in keeping the list as
up-to-date as possible.

Nathan · Apr 7, 2010

Dan, many thanks for this. Method 2 looks like it'll suit me
perfectly. As I've explained in reply to Brian's post, I don't want to
test objects that are no longer on the most up-to-date list, and, if
I've understood correctly, method 1 would still do that, but method 2
wouldn't. So that looks like the solution.
Thanks,
Nathan

Nathan · Apr 7, 2010

Sorry, that should read "its source has become unavailable".
Obviously the object itself is still there!!!

Dan Drew · Apr 7, 2010

No problem, glad I could help
Dan
