Parallel::ForkManager, Net::HTTP and catching Timeout::Error

nvp

Hello,

I'm about to release Parallel::ForkManager 1.1, and while writing
new examples (that use Net::HTTP) for PFM 1.1 features, I get the
following error *sometimes* (maybe 10-15% of the time) when I try to
connect to a URL whose host doesn't exist or isn't reachable:

/usr/lib/ruby/1.8/timeout.rb:60:in `rbuf_fill': execution expired
(Timeout::Error)
from /usr/lib/ruby/1.8/net/protocol.rb:134:in `rbuf_fill'
from /usr/lib/ruby/1.8/net/protocol.rb:86:in `read'
from /usr/lib/ruby/1.8/net/http.rb:2212:in `read_body_0'
from /usr/lib/ruby/1.8/net/http.rb:2173:in `read_body'
from /usr/lib/ruby/1.8/net/http.rb:773:in `get'
from /usr/lib/ruby/1.8/net/http.rb:1053:in `request'
from /usr/lib/ruby/1.8/net/http.rb:2136:in `reading_body'
from /usr/lib/ruby/1.8/net/http.rb:1052:in `request'
from /usr/lib/ruby/1.8/net/http.rb:1037:in `request'
from /usr/lib/ruby/1.8/net/http.rb:543:in `start'
from /usr/lib/ruby/1.8/net/http.rb:1035:in `request'
from /usr/lib/ruby/1.8/net/http.rb:772:in `get'
from ./parallel_http_get2.rb:36
from ./lib/parallel/forkmanager.rb:232:in `call'
from ./lib/parallel/forkmanager.rb:232:in `start'
from ./lib/parallel/forkmanager.rb:232:in `fork'
from ./lib/parallel/forkmanager.rb:232:in `start'
from ./parallel_http_get2.rb:30
from ./parallel_http_get2.rb:26:in `each'
from ./parallel_http_get2.rb:26

Based on the error message, it seemed like a simple case of rescuing
'Exception', since Timeout::Error is ending up on stderr uncaught. But
when I try to rescue 'Exception' in my test program, nothing happens!
I never catch the error!
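As an aside on which class to rescue: in Ruby 1.8, Timeout::Error is a subclass of Interrupt rather than StandardError, so a bare `rescue` misses it and only `rescue Exception` or an explicit `rescue Timeout::Error` catches it (later Ruby versions derive it from RuntimeError instead). Naming the class explicitly works either way. Below is a minimal single-process sketch; `sleep` stands in for a slow Net::HTTP read, and the helper name `fetch_with_timeout` is hypothetical.

```ruby
require 'timeout'

# Hypothetical helper: runs the block under a time limit and returns
# :timed_out instead of letting Timeout::Error escape.
# The sleep in the usage line stands in for a slow Net::HTTP read.
def fetch_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error => e
  # Rescued in the same process that raised it, so this branch runs.
  warn "timed out: #{e.message}"
  :timed_out
end

result = fetch_with_timeout(0.1) { sleep 1; :done }
puts result   # prints "timed_out"
```

This only helps when the `rescue` runs in the same process that raised the error, which is the question the rest of this thread turns on.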

Here's my test code:

#!/usr/bin/env ruby

require 'net/http'
require 'lib/parallel/forkmanager'

save_dir = '/tmp'

my_urls = [
  'http://www.cnn.com/index.html',
  'http://oreilly.com/index.html',
  'http://www.cakewalk.com/index.html',
  'http://www.asdfsemicolonl.kj/index.htm'
]
my_timeout = 5 # seconds

max_proc = 20
pfm = Parallel::ForkManager.new(max_proc)

pfm.run_on_finish(
  lambda {
    |pid, exit_code, ident|
    print "** PID (#{pid}) for #{ident} exited with code #{exit_code}!\n"
  }
)

my_urls.each {
  |my_url|

  begin
    pfm.start(my_url) {
      url = URI.parse(my_url)
      out_file = save_dir + '/' + url.host + '.txt'

      http = Net::HTTP.new(url.host, url.port)
      http.open_timeout = http.read_timeout = my_timeout
      res = http.get(url.path)
      status = res.code

      if status.to_i == 200
        File.open(out_file, 'w') { |f| f.print res.body }
        # start() with a block means that we exit with a
        # status, or else it's 0 all the time.
        exit 0
      else
        exit 255
      end
    } # end pfm.start { ... }
  rescue Exception => e
    print "Arggh, exception: ", e, "\n"
    exit 255
  end
}

pfm.wait_all_children()

# end

Would somebody be able to shed some light on why I'm unable to handle
the exceptions that Net::HTTP is throwing? Am I just trying to catch
the wrong exception or is there something else? Is it because I'm
trying to handle the exception in the child?

Note that this is ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]
under Ubuntu 9.10 (Karmic).

Eric Wong

It looks like the parent is trying to rescue an exception raised in a
child, which isn't possible. Exceptions are Ruby-level objects that
normally don't get transferred between Unix processes. Normally the
parent can only get a child's exit status; anything more requires
explicit IPC mechanisms (pipes/sockets/files/shared memory/...).
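A minimal sketch of that pattern, using plain Kernel#fork, IO.pipe and Process.wait2 rather than Parallel::ForkManager (whose internals aren't shown in this thread): the error is handled in the child, the child writes a description of it to a pipe, and the parent learns the outcome only from the pipe and the exit status. The helper name `run_in_child` and the use of 255 as the failure code are assumptions for illustration.

```ruby
# Hypothetical helper: runs the block in a forked child. Any exception is
# rescued inside the child, its class and message are written to a pipe,
# and the child exits with 255. Returns [exit_status, message_or_nil].
def run_in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    begin
      yield
      exit!(0)
    rescue Exception => e
      # This rescue runs in the child, where the exception was raised.
      writer.write("#{e.class}: #{e.message}")
      writer.close
      exit!(255)
    end
  end
  writer.close                 # parent keeps only the read end
  message = reader.read        # returns once the child's write end closes
  reader.close
  _, status = Process.wait2(pid)
  [status.exitstatus, message.empty? ? nil : message]
end

status, message = run_in_child { raise IOError, "simulated read timeout" }
puts "child exited with #{status}: #{message}"
# prints "child exited with 255: IOError: simulated read timeout"
```

The child uses `exit!` rather than `exit` because `exit` raises SystemExit, which `rescue Exception` would also catch, turning a successful run into a reported failure.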

Luc Heinrich

/usr/lib/ruby/1.8/timeout.rb:60:in `rbuf_fill': execution expired
(Timeout::Error)

FWIW, I'm getting the exact same error, at roughly the same rate, in a
regular mongrel-based application that talks to a separate HTTP backend,
and I would *love* to get rid of it too :)

-- 
Luc Heinrich - (e-mail address removed)