Nested threading? implications to timeout()

G

Geff Geff

All,

I'm certainly not a ruby expert by any means. I'm trying to come up to
speed as fast as possible on ruby and rails. My last few months have
been reading, coding, reading, more reading, more coding, etc. :) So I
was doing some coding to try to come up to speed on ruby. I really
think it's one of the most phenominal languages I've ever found. I
never got into java, too verbose. That said I'd like to move on to the
question.

At this point, I'm trying to track down some sort of memory leak. In my
vast amount of digging through google and writing test code, I came
across the following interesting behavior. It seems that threads do not
have a parent child relationships. So when one kills a thread, ruby
doesn't actually kill any threads that were started inside that thread.
To clarify, consider the following:

irb(main):001:0> x=nil ; y = Thread.start { x = Thread.start { sleep 0.5
until false } ; sleep 0.5 until false }
=> #<Thread:0xb7abe20c run>
irb(main):002:0> x.status
=> "sleep"
irb(main):003:0> y.kill
=> #<Thread:0xb7abe20c dead>
irb(main):007:0> x.status
=> "sleep"
irb(main):008:0>

Clearly by killing, what I would call the "parent thread", did not kill
the child thread. And this would be okay, I think, except that the way
the timeout class works is by starting a thread and killing it if it
doesn't complete in the right amount of time. So if I were to wrap
anything in "timeout() {}" it's possible that something else calls
timeout inside that and then the thread is still alive, until some later
time. Is this normal? Can this cause memory leak? Am I confused? :)

Thanks much for your help and feedback,

Geff
 
A

Arnaud Bergeron

All,

I'm certainly not a ruby expert by any means. I'm trying to come up to
speed as fast as possible on ruby and rails. My last few months have
been reading, coding, reading, more reading, more coding, etc. :) So I
was doing some coding to try to come up to speed on ruby. I really
think it's one of the most phenominal languages I've ever found. I
never got into java, too verbose. That said I'd like to move on to the
question.

At this point, I'm trying to track down some sort of memory leak. In my
vast amount of digging through google and writing test code, I came
across the following interesting behavior. It seems that threads do not
have a parent child relationships. So when one kills a thread, ruby
doesn't actually kill any threads that were started inside that thread.
To clarify, consider the following:

irb(main):001:0> x=nil ; y = Thread.start { x = Thread.start { sleep 0.5
until false } ; sleep 0.5 until false }
=> #<Thread:0xb7abe20c run>
irb(main):002:0> x.status
=> "sleep"
irb(main):003:0> y.kill
=> #<Thread:0xb7abe20c dead>
[There are lines missing here...]
irb(main):007:0> x.status
=> "sleep"
irb(main):008:0>

Clearly by killing, what I would call the "parent thread", did not kill
the child thread. And this would be okay, I think, except that the way
the timeout class works is by starting a thread and killing it if it
doesn't complete in the right amount of time. So if I were to wrap
anything in "timeout() {}" it's possible that something else calls
timeout inside that and then the thread is still alive, until some later
time. Is this normal?

Reading the timeout code I see that it creates a watcher thread to
provide the timeout. The block is executed on the current thread and
interruption is provided by raising an exception (from the watcher
tread) when the timeout expires. At the end of the method there is an
ensure to kill the watcher thread in any case so it would not
encounter the case you describe.
Can this cause memory leak?

Due do what I described in the previous paragraph nested timeouts
won't cause any kind of leak.
Am I confused? :)

Yes, I believe you are confused.

By the way, reading the code of the methods you have doubts about can
be a good way to clarify something.
 
G

Geff Geff

Arnaud said:
At this point, I'm trying to track down some sort of memory leak. In my
=> "sleep"
irb(main):003:0> y.kill
=> #<Thread:0xb7abe20c dead>
[There are lines missing here...]

sorry that was just me repeating the status method multiple times. :)
 
G

Geff Geff

Francis said:
Threads don't have any kind of parent-child relationship, as processes
do.
None of the threading models that have seen wide use have a feature like
this. What was your basis for thinking that they do?

What was I thinking was the following: lets consider the case where you
try to timeout a HTTP connect with the net/http stuff. There are
already "timeouts" in the HTTP class. If my timeout times out before
the library's timeout, I end up killing the http thread that is watching
the tcpconnect thread. The tcpconnect thread lives on until it's done.
Maybe not such a big deal in thise case but the situation certainly
could have been worse. It could be kind of messy if you're using a
library in general because you don't know if it's going ot call timeout
of it's own and how that might impact resource allocation/deallocation.
As far as the programmer is concerned, he / she timed out the http class
and it should be (all) gone. But it's not. It's still hanging around.
Just like my nested sleep was.

I don't have any basis for thinking there is a parent / child
relationship. Just sorta counter intuitive in the case of a nested
timeout. Because what programmer probably wants is a timeout on the
thing that they are trying to timeout. The *WHOLE* thing, not part of
it, without verifying the libraries don't call timeout themselves.

Thoughts?

Seprately I'm having some issue with some of my activerecord objects
sticking around in a long running deamon. Not all, but some. It's odd.
I haven't found the problem yet, and I really don't know what's going
on. It's probably not timeout related because those would free up
eventually.

Thanks so much for the comments thus far.

Geff
 
F

Francis Cianfrocca

What was I thinking was the following: lets consider the case where you
try to timeout a HTTP connect with the net/http stuff. There are
already "timeouts" in the HTTP class. If my timeout times out before
the library's timeout, I end up killing the http thread that is watching
the tcpconnect thread. The tcpconnect thread lives on until it's done.

I haven't read through the net/http code as it relates to this issue,
but I did read timeout.rb and as Arnaud says, there should be no
problem. If I may assume that your thread calls timeout on a net/http
call, which in turn calls timeout on a TCP connect-attempt, then if
your timeout fires first, you'll kill the thread that is doing the TCP
connect, because it's the same thread that called net/http. The
"internal" timer thread is left running. It will hold a memory
reference to the killed thread, which can hold resources longer than
you expect, but there's no leak. When the internal timer thread that
is left running eventually times out, it will try to raise an
exception on the killed thread, and nothing will happen.
 
G

Geff Geff

Francis said:
I haven't read through the net/http code as it relates to this issue,
but I did read timeout.rb and as Arnaud says, there should be no
problem. If I may assume that your thread calls timeout on a net/http
call, which in turn calls timeout on a TCP connect-attempt, then if
your timeout fires first, you'll kill the thread that is doing the TCP
connect, because it's the same thread that called net/http. The
"internal" timer thread is left running. It will hold a memory
reference to the killed thread, which can hold resources longer than
you expect, but there's no leak. When the internal timer thread that
is left running eventually times out, it will try to raise an
exception on the killed thread, and nothing will happen.

I think if I timeout a timeout that my timeout is killing the "waiter"
thread. And since the waiter thread started the TCPconnect attempt
thread that tcpconnect stays running until it ends, if it ends. Kinda
like my sleep thread example. Is that not accurate?
 
F

Francis Cianfrocca

I think if I timeout a timeout that my timeout is killing the "waiter"
thread. And since the waiter thread started the TCPconnect attempt
thread that tcpconnect stays running until it ends, if it ends. Kinda
like my sleep thread example. Is that not accurate?

I don't think so. Try the following:

---------------------------------------
require 'timeout'
begin
timeout(2) {
timeout(4) {
sleep 6
p "I finished"
}
}
rescue Timeout::Error
p "timeout fired"
end

sleep 10
 
J

Jan Svitok

I don't think so. Try the following:

---------------------------------------
require 'timeout'
begin
timeout(2) {
timeout(4) {
sleep 6
p "I finished"
}
}
rescue Timeout::Error
p "timeout fired"
end

sleep 10
------------------------------------------
You'll see the timeout message after two seconds, and the program will
end ten seconds later without saying "I finished." You should be able
to ignore the fact that timeout is implemented with threads in the
first place. Or were you asking a different question?

FYI: Eric Hodel has written a post about nested timeouts:
http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-timeout-timeout
 
G

Geff Geff

Francis said:
require 'timeout'
begin
timeout(2) {
timeout(4) {
sleep 6
p "I finished"
}
}
rescue Timeout::Error
p "timeout fired"
end

sleep 10

Ur very right. Thanks very much. Sorry I had a bad case of HUA. I
read the timeout library code backwards, so to speak. As if the child
thread (so to speak) was the actual code block, instead of just the
sleep and the raise.

Sorry for posting something I shouldn't have wasted your time with.
Thanks for the patience and the help.

Geff
 
A

Arnaud Bergeron

I haven't read through the net/http code as it relates to this issue,
but I did read timeout.rb and as Arnaud says, there should be no
problem. If I may assume that your thread calls timeout on a net/http
call, which in turn calls timeout on a TCP connect-attempt, then if
your timeout fires first, you'll kill the thread that is doing the TCP
connect, because it's the same thread that called net/http. The
"internal" timer thread is left running. It will hold a memory
reference to the killed thread, which can hold resources longer than
you expect, but there's no leak. When the internal timer thread that
is left running eventually times out, it will try to raise an
exception on the killed thread, and nothing will happen.
The timer thread won't even remain because of the ensure at the end of
the timeout function that kills it ...
 
J

Joel VanderWerf

Jan Svitok wrote:
...

The second part of the article points out a potential danger of the
interaction between timeouts and rescue/ensure. But the proposed
solution may not be good advice.

The danger is that a timeout exception may fire during an ensure clause,
preventing the ensure clause (and any necessary cleanup code within it)
from finishing, which could cause resource leaks or other problems.

The solution proposed in the article is to wrap code within the ensure
clause in a begin..end block to handle timeout exceptions, like this:

require 'timeout'

Timeout.timeout 2 do
begin
puts "Allocating the thingy..."
sleep 1
raise RuntimeError, 'Oh no! Something went wrong!'
ensure
# Since we might time out, hold onto the timeout we caught
# so we can re-raise it when we're done cleaning up.
timeout = nil
begin # we really need to clean up
puts "Cleaning up after the thingy..."
sleep 2
puts "Cleaned up after the thingy!"
rescue Timeout::Error => e
puts "Timed out! Trying again!"
timeout = e # save that timeout then retry
retry
end
# Raise the timeout so we time out all the way to the top.
raise timeout unless timeout.nil?
end
end

However, that solution only reduces the chance of the timeout
interfering with the ensure clause. Suppose the timeout fires while the
main thread is executing the line "timeout = nil". (It's impossible in
this example, but you can get it to happen by putting a "sleep 5" just
after this line.) Then the inner begin..end clause doesn't catch the
timeout, and the cleanup doesn't happen. In general, unless you are very
sure about the timings of your code, you cannot guarantee that the
timeout won't fire at the wrong time. So it's a race condition.

Another problem is that the "retry" may cause the cleanup to happen
twice, if the timeout fires just when cleanup is finishing. That may
lead to other problems (such as closing a file twice and generating
another exception).

A solution in this case (but not in general) is to move the ensure block
*outside* of the timeout block:

require 'timeout'

begin
Timeout.timeout 2 do
puts "Allocating the thingy..."
sleep 1
raise RuntimeError, 'Oh no! Something went wrong!'
end

rescue Timeout::Error => e
puts "timed out"

ensure
puts "Cleaning up after the thingy..."
sleep 2
puts "Cleaned up after the thingy!"
end

The effect of this is to kill the timeout thread before the ensure block
starts (since the timeout block has been exited). Also, the ensure code
will be called exactly once (assuming there are no other interrupts).

But this refactoring technique cannot be used if the ensure clause
occurs in some nested method call. This has been noted before:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/113417

For example:

require 'timeout'

def do_some_work
puts "do_some_work is starting"
sleep 1
raise 'exception in do_some_work'
ensure
puts "cleaning up resources in do_some_work"
sleep 2
puts "cleaned up resources in do_some_work"
end

Timeout.timeout 2 do
do_some_work
end

This outputs:

do_some_work is starting
cleaning up resources in do_some_work
/usr/local/lib/ruby/1.8/timeout.rb:54:in `do_some_work': execution
expired (Timeout::Error)

The cleanup never finishes.

We could modify do_some_work to handle timeouts, as Eric did in his
article, but then there would still be the race condition that would
(depending on timing) allow the timeout to kill the ensure clause
anyway. And there's still the retry problem (causing multiple executions
of cleanup code). And it's not easy to modify every library method in
this way.

I wish I had an easy answer to this problem, but I don't. Are timeout
and ensure inherently incompatible?
 
K

khaines

I wish I had an easy answer to this problem, but I don't. Are timeout and
ensure inherently incompatible?

I don't have an answer, but what if you make your ensure section into a
critical section?

ensure
Thread.critical = true
# do stuff
Thread.critical = false
end


Kirk Haines
 
A

Arnaud Bergeron

I don't have an answer, but what if you make your ensure section into a
critical section?

ensure
Thread.critical = true
# do stuff
Thread.critical = false
end


Kirk Haines

There is still the race problem if the exception gets raised just
before the Thread.critical.

I'm not sure if it is really possible in this case. It depends if the
implentation allows thread switching just after entering an ensure but
before executing any code. You will have to ask gurus on this one.
 
J

Joel VanderWerf

Arnaud said:
There is still the race problem if the exception gets raised just
before the Thread.critical.

Confirmed by experiment with ruby 1.8.4:

$ cat ensure-timeout.rb
require 'timeout'

Thread.abort_on_exception = true

work_thread = Thread.new do
loop do
begin
Thread.pass
ensure
Thread.critical = true
begin
rescue Timeout::Error
puts "caught a Timeout::Error"
end
Thread.critical = false
#retry
end
end
end

loop do
work_thread.raise Timeout::Error
end

$ ruby ensure-timeout.rb
ensure-timeout.rb:22: Timeout::Error (Timeout::Error)
from ensure-timeout.rb:6
from ensure-timeout.rb:5
 
J

Joel VanderWerf

I don't have an answer, but what if you make your ensure section into a
critical section?

ensure
Thread.critical = true
# do stuff
Thread.critical = false
end

Even if this worked reliably, using a critical section would be
udesirable if the ensure clause invokes IO (e.g. waiting for some
network operation to finish). That would freeze the whole process.
 
E

Eric Hodel

Reading the timeout code I see that it creates a watcher thread to
provide the timeout. The block is executed on the current thread and
interruption is provided by raising an exception (from the watcher
tread) when the timeout expires. At the end of the method there is an
ensure to kill the watcher thread in any case so it would not
encounter the case you describe.


Due do what I described in the previous paragraph nested timeouts
won't cause any kind of leak.

Nested timeouts have a different problem:

http://blog.segment7.net/articles/2006/04/11/care-and-feeding-of-
timeout-timeout
 
E

Eric Hodel

I don't have an answer, but what if you make your ensure section
into a critical section?

ensure
Thread.critical = true
# do stuff
Thread.critical = false
end

Careful, you may want:

ensure
orig_crit = Thread.critical
Thread.critical = true
# do stuff
Thread.critical = orig_crit
end

If you may already be inside a critical section.

(or require 'thread' use Thread.exclusive)
 
A

Arnaud Bergeron

Careful, you may want:

ensure
orig_crit = Thread.critical
Thread.critical = true
# do stuff
Thread.critical = orig_crit
end

If you may already be inside a critical section.

(or require 'thread' use Thread.exclusive)

This still leaves the race problem. If you can raise an exception
between an ensure and its first instruction, NOTHING can be safe in
the ensure clause.

To solve this problem would require some changes to the language
semantics. At the very least external exceptions should not be
accepted when inside an ensure clause. This would require some
interpreter changes and certainly more thougth than what I have given
it.
 
E

Eric Hodel

Jan Svitok wrote:
...

The second part of the article points out a potential danger of the
interaction between timeouts and rescue/ensure. But the proposed
solution may not be good advice.

The danger is that a timeout exception may fire during an ensure
clause, preventing the ensure clause (and any necessary cleanup
code within it) from finishing, which could cause resource leaks or
other problems.

The solution proposed in the article is to wrap code within the
ensure clause in a begin..end block to handle timeout exceptions,
like this:

require 'timeout'

Timeout.timeout 2 do
begin
puts "Allocating the thingy..."
sleep 1
raise RuntimeError, 'Oh no! Something went wrong!'
ensure
# Since we might time out, hold onto the timeout we caught
# so we can re-raise it when we're done cleaning up.
timeout = nil
begin # we really need to clean up
puts "Cleaning up after the thingy..."
sleep 2
puts "Cleaned up after the thingy!"
rescue Timeout::Error => e
puts "Timed out! Trying again!"
timeout = e # save that timeout then retry
retry
end
# Raise the timeout so we time out all the way to the top.
raise timeout unless timeout.nil?
end
end

However, that solution only reduces the chance of the timeout
interfering with the ensure clause. Suppose the timeout fires while
the main thread is executing the line "timeout = nil". (It's
impossible in this example, but you can get it to happen by putting
a "sleep 5" just after this line.) Then the inner begin..end clause
doesn't catch the timeout, and the cleanup doesn't happen. In
general, unless you are very sure about the timings of your code,
you cannot guarantee that the timeout won't fire at the wrong time.
So it's a race condition.

From reading rb_eval, it might be possible to fix the race condition
by exploiting implementation details, but I don't have the
opportunity to examine it in depth.

rb_eval looks something like this:

rb_eval(VALUE self, NODE *node) {
switch (nd_type(node)) {
case NODE_ENSURE:
rb_eval(self, node->begin_body);
rb_eval(self, node->ensure_body);
break;
case NODE_RESCUE:
/* ... */
}
CHECK_INTS; /* check for a thread to switch to or a signal to
process */
}

So threads may be switched (and Timeout::Error raised) only when is
finished evaluating a node.

So if the first item of an ensure is begin (which creates a
NODE_RESCUE) we might be "safe":

$ parse_tree_show -f
begin
perform some work
ensure
begin
clean up after ourselves
rescue Timeout::Error
retry
end
end
(eval):2: warning: parenthesize argument(s) for future version
(eval):5: warning: parenthesize argument(s) for future version
(eval):5: warning: parenthesize argument(s) for future version
[[:ensure,
# this is the begin body.
[:fcall, :perform, [:array, [:fcall, :some, [:array,
[:vcall, :work]]]]],
# this is the ensure body
[:rescue,
# this is the inner begin body, we will always reach here after
ensure.
[:fcall,
:clean,
[:array,
[:fcall,
:up,
[:array, [:fcall, :after, [:array, [:vcall, :eek:urselves]]]]]]],
# this is the rescue statement
[:resbody, [:array, [:colon2, [:const, :Timeout], :Error]],
# this is the rescue body
[:retry]]]]]

I am on vacation starting tomorrow, so I'll probably look up this
thread when I return.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,904
Latest member
HealthyVisionsCBDPrice

Latest Threads

Top