Frustrated: System call timeouts

Mikel Lindsaar

Hello all,

I am having some (un)fun timing out database calls.

Basically I have some database calls that go out to a remote database
server on the other side of the planet (using Rails' ActiveRecord).

This all works fine, but occasionally the link gets interrupted and
you get a stale session, and the whole thing just locks up waiting for
the call to complete (which it never does).

This then hangs the rake task that is doing a periodic update through
the system cron, and it can stay jammed until you go in and reset it,
which is quite annoying.

Trying timeout.rb didn't help, as it does not handle system calls
(except I believe for ones that Ruby makes itself, like file I/O).

Trying system-timer (http://ph7spot.com/articles/system_timer) from
Philippe Hanrigou also didn't work - same hang, waiting for a return
call from the DB driver.

The DB stack is Oracle Instant Client, then OCI, then the Oracle
ActiveRecord adapter, within ActiveRecord called from a rake task (that
includes the environment), so I am basically calling from within a
full Rails stack on top of Ruby 1.8.6p36.

When the rake task starts, it checks to see if another copy is running
through a lock file and exits if so, so there is only ever one copy of
the rake task running - so it is not some race condition here.

The timeouts happen while I am finding an individual row of a table
[Model.find(id)]. That is usually a fast operation, but in the context
where I am using it, it is the slowest part of my process, and so it
seems to be where the network has the most chance to crap out. It is
probably not that bit of the code itself that fails.

Has anyone found a reliable way to time out this sort of call? Does
anyone have any idea why the system timer would _not_ be timing out
this sort of call?

The hard thing is that I am not 100% sure where it is failing. I think
(from looking at tcpdump and copious logging) that it is stalling in
that find method, but I am not 100% sure of this.

Any pointers from others that must have tackled this problem on where
to go from here? I see my options are:

1) Figure out a solution to this problem (preferred)
2) Abandon it and monitor for a zombie by tailing a log file or the
like for inactivity and then kill appropriately (sounds like a real
hack).

Mikel
 
Mikel Lindsaar

> try this:
> <snip>
> pid = Process.pid
> signaler = IO.popen "ruby -e'sleep #{ seconds };
> Process.kill(:TERM.to_s, #{ pid }) rescue nil'"
> thread = Thread.current
> handler = Signal.trap('TERM'){ thread.raise Error, seconds.to_s }
> begin
>   block.call
> ensure
>   Process.kill 'TERM', signaler.pid rescue nil
>   Signal.trap('TERM', handler)
> end

Ara, thank you _so_ much for this.

I would never have thought of spawning suicidal terminator ruby
processes to nuke my process :) But works well.

There was a bit of delay (putting out some fires here over the past
two days) but I got to your code last night and this morning, and it
basically works... except it doesn't kill off the signaler processes
fully.

This is because two processes get made: first the shell, which then
creates the ruby -e "sleep..." blah process.

The 'hack' I used to solve this is to replace the ensure block with:

ensure
  Process.kill 'TERM', signaler.pid rescue nil
  Process.kill('TERM', signaler.pid+1) rescue nil
  Signal.trap('TERM', handler)
end

But this obviously is insane as it assumes that no other processes get
started on the computer between sh starting up and it firing off the
ruby process.

The ps output looks like this:

$ ps -ef | grep ruby
rails 2153 2152 69 17:04 /usr/sbin/ruby1.8 /usr/bin/rake update:all
rails 2237 2153 69 17:04 sh -c ruby -e'sleep 40.0;?
Process.kill(:TERM.to_s, 2153) rescue nil'
rails 2238 2237 69 17:04 ruby -e'sleep 40.0;?
Process.kill(:TERM.to_s, 2153) rescue nil'

Any ideas on how to reliably find the PID of the ruby process that the
sh process created by IO.popen creates?


Mikel
 
Martin DeMello

> Ara, thank you _so_ much for this.
>
> I would never have thought of spawning suicidal terminator ruby
> processes to nuke my process :) But works well.

I agree, that was very clever :) Bookmarked in case I ever need this.

martin
 
Michal Suchanek

> Any ideas on how to reliably find the PID of the ruby process that the
> sh process created by IO.popen creates?

Since you are using popen anyway you can just have your ruby process
print its PID when it starts, and read it in your terminator.

HTH

Michal
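
Michal's suggestion might look something like the sketch below. It assumes a modern Ruby (1.9 or later, for RbConfig.ruby) rather than the 1.8.6 used in this thread. The module name SignalTimeout and its Error class are made up for illustration, since the snipped parts of Ara's helper are not shown here. The child prints its own PID first, so the parent never has to guess whether popen's pid belongs to sh or to ruby.

```ruby
require 'rbconfig'

# Hypothetical module name; a sketch, not Ara's actual library.
module SignalTimeout
  class Error < StandardError; end

  # Runs the block and raises SignalTimeout::Error if it is still
  # running when the signaler child sends TERM after `seconds`.
  def self.timeout(seconds)
    parent_pid = Process.pid
    # The child reports its own PID, sleeps, then signals the parent.
    script = "STDOUT.sync = true; puts Process.pid; sleep #{seconds.to_f}; " \
             "Process.kill(:TERM.to_s, #{parent_pid}) rescue nil"
    signaler = IO.popen("#{RbConfig.ruby} -e '#{script}'")

    # Blocks until the child is up; this is the ruby PID, not the sh PID.
    pid_line = signaler.gets
    raise 'signaler did not start' if pid_line.nil?
    signaler_pid = pid_line.to_i

    thread = Thread.current
    previous = Signal.trap('TERM') do
      thread.raise Error, "timed out after #{seconds}s"
    end
    begin
      yield
    ensure
      Process.kill('TERM', signaler_pid) rescue nil  # stop the sleeper
      signaler.close rescue nil                      # reap sh/ruby
      Signal.trap('TERM', previous || 'DEFAULT')     # restore old handler
    end
  end
end

begin
  SignalTimeout.timeout(1) { sleep 10 }
rescue SignalTimeout::Error => e
  puts "gave up: #{e.message}"
end
```

Closing the pipe in the ensure block also waits for the child, so neither the shell nor the sleeper is left behind. There is still a small window in which TERM can arrive after the block has finished but before the sleeper is killed.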
 
ara.t.howard

> Since you are using popen anyway you can just have your ruby process
> print its PID when it starts, and read it in your terminator.
>
> HTH

correct. this is basically how systemu does it, which you could use
similarly to this

require 'thread'
require 'systemu'

q = Queue.new

systemu command do |pid|
  q.push pid
end

pid = q.pop


this bizarre syntax will capture the pid but *also* wait for the
process to start. all it's doing is reading from a pipe so your
solution seems fine.

cheers.

a @ http://codeforpeople.com/
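
systemu is a gem, so as a rough standard-library analogue of the same hand-off, the sketch below starts a child with Process.spawn inside a thread and passes the PID back through a Queue. The method name spawn_and_report_pid is hypothetical, and this is not how systemu is implemented. It only illustrates why q.pop both captures the PID and waits for the process to start. It assumes a modern Ruby (1.9 or later).

```ruby
require 'rbconfig'

# Hypothetical helper: starts `argv` without a shell and returns
# [pid, waiter_thread]. The PID is handed over through a Queue.
def spawn_and_report_pid(*argv)
  q = Queue.new
  waiter = Thread.new do
    pid = Process.spawn(*argv)  # program plus separate arguments: no sh in between
    q.push pid                  # hand the PID to the caller
    Process.wait pid            # reap the child so it does not linger as a zombie
  end
  [q.pop, waiter]               # q.pop blocks until the child has been spawned
end

pid, waiter = spawn_and_report_pid(RbConfig.ruby, '-e', 'sleep 0.2')
puts "child pid: #{pid}"
waiter.join
```

Because the program and its arguments are passed separately, no shell sits between the caller and the child, so the PID that comes out of the queue is the ruby process itself.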
 
ara.t.howard

> Ara, thank you _so_ much for this.
>
> I would never have thought of spawning suicidal terminator ruby
> processes to nuke my process :) But works well.

i keep meaning to turn this into a library but have not. any other
advice - besides the pid issue - that you encountered trying to make
it live?

cheers.




a @ http://codeforpeople.com/
 
Mikel Lindsaar

> i keep meaning to turn this into a library but have not. any other advice -
> besides the pid issue - that you encountered trying to make it live?

No, the pid issue is the only thing... it sometimes misses.

A library hey?

gem install terminator

Terminate.timeout(40) do
... my block
end

:)

Mikel
 
ara.t.howard

> No, the pid issue is the only thing... it sometimes misses.
>
> A library hey?
>
> gem install terminator
>
> Terminate.timeout(40) do
> ... my block
> end
>
> :)
>
> Mikel

oh that's good! i can give you commit rights to codeforpeople and we
could release. such a great name! ;-)

a @ http://codeforpeople.com/
 
