exception in thread?

Sylvain Viart

Hi,

I'm trying to use Net::SSH::Multi (http://net-ssh.rubyforge.org/).

It seems that Ruby has difficulty catching exceptions in multi-threaded
mode. Any hints?

The docs say that with (:on_error => :warn) it shouldn't fail, but the
library's begin/rescue block did not catch the exception, here
Errno::EHOSTUNREACH.

/var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/./multi/session.rb
468 def next_session(server, force=false) #:nodoc:
[...]
482   begin
483     server.new_session
484   rescue Exception => e
485     server.fail!
486     @session_mutex.synchronize { @open_connections -= 1 }
487
488     case on_error
489     when :ignore then
490       # do nothing
491     when :warn then
492       warn("error connecting to #{server}: #{e.class} (#{e.message})")
493     when Proc then
494       go = catch(:go) { on_error.call(server); nil }
495       case go
496       when nil, :ignore then # nothing
497       when :retry then retry
498       when :raise then raise
499       else warn "unknown 'go' command: #{go.inspect}"
500       end
501     else
502       raise
503     end
504
505     return nil
506   end
[...]
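
For readers unfamiliar with the Proc branch in the listing above: it relies on catch/throw, so an on_error handler can pick an action by throwing :go with a value. The sketch below imitates only that dispatch in isolation. The dispatch method, the two handlers and the returned symbols are hypothetical and are not part of the gem.

```ruby
# Hypothetical stand-in for the Proc branch of next_session; it only
# reports which action would be taken instead of performing it.
def dispatch(on_error, server)
  # If the handler returns normally, the block's value is nil.
  # If it calls `throw :go, value`, catch returns that value instead.
  go = catch(:go) { on_error.call(server); nil }
  case go
  when nil, :ignore then :ignored
  when :retry       then :retry
  when :raise       then :raise
  else                   :unknown
  end
end

failed = []
record_only = lambda { |server| failed << server }   # returns normally
ask_retry   = lambda { |server| throw :go, :retry }  # requests a retry

recorded = dispatch(record_only, "root@server-07")
retried  = dispatch(ask_retry, "root@server-07")

puts "record_only -> #{recorded}, failed = #{failed.inspect}"
puts "ask_retry   -> #{retried}"
```

A handler that only records the failure therefore falls into the "nothing" case, which is why it cannot by itself change what happens to exceptions raised elsewhere.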


As the test below shows, we are able to catch the exception at the top
level. Could you confirm that it's thread-related?

require 'rubygems'
require 'net/ssh/multi'
Net::SSH::Multi.start(:on_error => :warn) do |session|
  # define the servers we want to use
  session.use 'root@server-04'
  session.use 'root@server-07' # doesn't exist
  session.use 'root@server-08'

  # execute commands on all servers
  begin
    session.exec( "hostname" )
  rescue Exception => e
    p "main:#{e}"
  end

  # run the aggregated event loop
  session.loop
end

ruby 1.8.5 (2006-08-25) [x86_64-linux]

Regards,
Sylvain.
 
Brian Candler

Sylvain said:
It seems that Ruby has difficulty catching exceptions in multi-threaded
mode. Any hints?

For debugging purposes, maybe you want Thread.abort_on_exception = true
(or just run ruby with -d flag)

Other than that I don't understand your problem. What behaviour do you
see when you run your test program? What behaviour do you expect? Is no
warning generated for the non-existent host?

# execute commands on all servers
begin
  session.exec( "hostname" )
rescue Exception => e
  p "main:#{e}"
end

That rescue won't catch exceptions in other threads. Each thread of
execution is responsible for catching its own exceptions. If it doesn't,
then the thread just terminates (unless Thread.abort_on_exception is
set)
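
A minimal sketch of that behaviour using plain threads, with no Net::SSH involved. The raised error is simulated and the variable names are made up for the illustration. It also shows one related fact: Thread#join re-raises a thread's unhandled exception in the thread that calls join.

```ruby
# Newer Rubies print a report when a thread dies with an exception;
# silence it here so the output stays readable.
Thread.report_on_exception = false if Thread.respond_to?(:report_on_exception=)

caught_at_creation = nil
worker = nil
begin
  # The exception is raised inside the new thread, not in this one.
  worker = Thread.new { raise Errno::EHOSTUNREACH, "simulated" }
  sleep 0.1 # give the worker time to fail
rescue Exception => e
  caught_at_creation = e # never reached
end

caught_at_join = nil
begin
  worker.join # re-raises the worker's exception in the calling thread
rescue Exception => e
  caught_at_join = e
end

puts "rescued around Thread.new: #{caught_at_creation.inspect}"
puts "rescued around join:       #{caught_at_join.class}"
```

So a rescue around the code that starts a thread sees nothing, while a rescue around the join does see the exception.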

It *is* possible for one thread to raise an exception in another thread
(Thread#raise), but this is extremely hairy asynchronous programming and
I would strongly discourage it.

It would seem reasonable for session.exec to collect the status of each
of the threads and return an array of them. I don't know if it does so.
Perhaps you can use something like this:

errs = []
...
:on_error => lambda { |server| errs << server }
...
session.exec "hostname"
unless errs.empty?
  puts "The command failed on #{errs.size} hosts"
end
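
The idea of collecting each thread's status can also be sketched with plain threads, independently of Net::SSH::Multi. This is only an illustration: fake_connect and the host names are hypothetical stand-ins, and nothing here reflects how session.exec actually behaves.

```ruby
# Hypothetical stand-in for a real connection attempt.
def fake_connect(host)
  raise Errno::EHOSTUNREACH, host if host == "server-07"
  "connected to #{host}"
end

hosts = %w[server-04 server-07 server-08]

# Each thread rescues its own exceptions and returns a [host, result]
# pair, so nothing escapes into the thread that later calls #value.
threads = hosts.map do |host|
  Thread.new(host) do |h|
    begin
      [h, fake_connect(h)]
    rescue SystemCallError => e
      [h, e]
    end
  end
end

statuses = threads.map { |t| t.value }
errs = statuses.select { |_, result| result.is_a?(Exception) }

puts "The command failed on #{errs.size} hosts" unless errs.empty?
```

Because every thread handles its own failure, one unreachable host does not stop the work on the others.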
 
Sylvain Viart

Hi Brian,

Brian Candler wrote:
For debugging purposes, maybe you want Thread.abort_on_exception = true
(or just run ruby with -d flag)

Thanks, that's good to know.

Other than that I don't understand your problem. What behaviour do you
see when you run your test program? What behaviour do you expect?

Sorry, it was late yesterday and my post was confusing.

In fact, I've run some tests and I suspect some strange behavior (or
behavior unknown to me) in exception handling.
In the lib, we have a block with

484 rescue Exception => e

which I would expect to catch anything, but it missed
Errno::EHOSTUNREACH.
I didn't find a good explanation, so I suspect that exceptions in
threads behave somewhat differently.

Strangely, if I add another rescue statement in the lib:

/var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/./multi/session.rb

506 rescue
507 puts "caught:#{$!}"
508 end

it works?? Why?

Why didn't the normally broader "rescue Exception => e" do its job?
That's why I suspect some interaction between exceptions and threaded
execution.
It seems I missed something about exceptions. :-\
Is no warning generated for the non-existent host?

Sorry about that; in fact I meant that snippet not to use the
begin/rescue.
The rescue there catches the Errno::EHOSTUNREACH, which is not caught
internally by the lib.

I should have written:

session.exec( "hostname" )

With no rescue, the program fails and no job is performed on any host.

It would seem reasonable for session.exec to collect the status of each
of the threads and return an array of them. I don't know if it does so.
Perhaps you can use something like this:

errs = []
...
:on_error => lambda { |server| errs << server }
...
session.exec "hostname"
unless errs.empty?
  puts "The command failed on #{errs.size} hosts"
end

Hmm, nice, I'm going to try it. :)
Would it catch the Errno::EHOSTUNREACH?

Thanks for your hints.
Regards,
Sylvain.
 
Brian Candler

Sylvain said:
In the lib, we have a block with

484 rescue Exception => e

which I would expect to catch anything, but it missed
Errno::EHOSTUNREACH.

Probably I should not try to answer this as I don't use Net::SSH::Multi,
but I've installed the gem now:

Successfully installed net-ssh-2.0.4
Successfully installed net-ssh-gateway-1.0.0
Successfully installed net-ssh-multi-1.0.0

I see that rescue only covers the preceding line:

begin
  server.new_session
rescue Exception => e

That is, it will catch an exception raised by server.new_session only.
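
To illustrate the scope point with a self-contained sketch (the method and its two failure points are hypothetical, not taken from the gem): a rescue clause only handles exceptions raised between its own begin and rescue, so anything raised earlier in the same method passes straight through to the caller.

```ruby
# Hypothetical method with one failure point outside the begin block
# and one inside it.
def open_session(fail_early)
  raise IOError, "raised before begin" if fail_early  # not covered

  begin
    raise IOError, "raised inside begin"              # covered
  rescue Exception => e
    return "handled: #{e.message}"
  end
end

inside = open_session(false)

outside = begin
  open_session(true)
rescue IOError => e
  "escaped: #{e.message}"
end

puts inside
puts outside
```
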
Strangely, if I add another rescue statement in the lib:

/var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/./multi/session.rb
506 rescue
507 puts "caught:#{$!}"
508 end

it works?? Why?

Possibly it's rescuing an exception which is occurring between lines 469
and 480. But since you didn't show the actual backtrace, then this is
pure guesswork.

One option is:

puts "caught:#{$!}\n#{$!.backtrace.join("\n")}"

Why the normally more open "rescue Exception => e" didn't do its job?

I don't know. But when making an extraordinary claim ("rescue is not
doing its job"), you need to provide the evidence to back it up.

Now, I can replicate something like your problem: pointing to a
non-existent host on my LAN gives an Errno::EHOSTUNREACH.

require 'rubygems'
require 'net/ssh/multi'
Net::SSH::Multi.start(:on_error => :warn) do |session|
  # define the servers we want to use
  session.use 'root@localhost'
  session.use '[email protected]' # non-existent host on local LAN

  # execute commands on all servers
  begin
    session.exec( "hostname" )
  rescue Exception => e
    puts "main:#{e}\n#{e.backtrace.join("\n")}"
  end

  # run the aggregated event loop
  session.loop
end

$ ruby test.rb
error connecting to root@localhost: Net::SSH::AuthenticationFailed (root@localhost)
main:No route to host - connect(2)
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-2.0.4/lib/net/ssh/transport/session.rb:65:in `initialize'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `join'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `sessions'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `each'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `sessions'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:81:in `open_channel'
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:120:in `exec'
test.rb:10
/usr/local/lib/ruby/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi.rb:62:in `start'
test.rb:3

I get an exception like you do. However I see no evidence at all that
lib/net/ssh/multi/session.rb is involved.

This is ruby 1.8.6p114. I don't have any specific reason why 1.8.5
wouldn't work, but I consider 1.8.6p114 to be the most "stable" Ruby
available (certainly more stable than later releases in the 1.8.6 family
:) and it may well be that Net::SSH::Multi hasn't been well tested with
1.8.5. So it could be worth a try.
 
Sylvain Viart

Hi,

Sylvain Viart wrote:
It would seem reasonable for session.exec to collect the status of
each of the threads and return an array of them. I don't know if it
does so. Perhaps you can use something like this:

errs = []
...
:on_error => lambda { |server| errs << server }
...
session.exec "hostname"
unless errs.empty?
  puts "The command failed on #{errs.size} hosts"
end

Hmm, nice, I'm going to try it. :)
Would it catch the Errno::EHOSTUNREACH?

It didn't catch the exception.

Here is a workaround that uses a closure and the
Net::SSH::Multi::DynamicServer
<http://net-ssh.rubyforge.org/multi/v1/api/classes/Net/SSH/Multi/DynamicServer.html>
behavior instead of specifying the server directly. The closure is
evaluated by attempting the SSH connection first, which means each
server is connected to twice: once during the test and again later in
the session. Note that I discard the 'options' when testing the
connection.

require 'rubygems'
require 'net/ssh/multi'

errs = []

# Returns a lambda that test-connects to the server and yields the
# server string on success, or nil (recording it in errs) on failure.
def test_server(errs, server)
  lambda do |options|
    begin
      server =~ /(.+)@(.+)/
      server_name, user = $2, $1
      puts server_name
      s = Net::SSH.start(server_name, user)
      s.close
      s = server
    rescue Errno::EHOSTUNREACH, SocketError
      puts "connection failed #{server} : #{$!}"
      errs << server
      s = nil
    end

    return s
  end
end

Net::SSH::Multi.start(:on_error => :warn) do |session|
  # define the servers we want to use
  session.use &test_server(errs, 'root@srv-04')
  session.use &test_server(errs, 'root@srv-07')
  session.use &test_server(errs, 'root@srv-08')
  session.use &test_server(errs, '(e-mail address removed)')

  # execute commands on all servers
  session.exec( "hostname" )


  # run the aggregated event loop
  session.loop
end

unless errs.empty?
  puts "The command failed on #{errs.size} hosts"
end

#srv-04
#srv-07
#connection failed root@srv-07 : No route to host - connect(2)
#srv-08
#fail-08.local
#connection failed (e-mail address removed) : getaddrinfo: Name or service not known
#[srv-04] srv-04
#[srv-08] srv-08
#The command failed on 2 hosts

It works, but I still don't know why the exceptions are not handled in
the lib Net::SSH::Multi, which may be specific to this lib.
I would still appreciate more hints.

Regards,
Sylvain.
 
Brian Candler

Sylvain said:
It works, but I still don't know why the exceptions are not handled in
the lib Net::SSH::Multi, which may be specific to this lib.

Show the backtrace! Otherwise, nobody is going to be able to help you.

That is, in your original demo code, either remove the top-level rescue
clause entirely, or change it to

rescue Exception => e
puts "main:#{e}\n#{e.backtrace.join("\n")}"
end

Then paste the full, unedited result here.
 
Sylvain Viart

Hi Brian,

Thanks a lot for your work, I really appreciate your effort. :)

Brian Candler wrote:
Show the backtrace! Otherwise, nobody is going to be able to help you.

Sorry about that, I'm not very backtrace-friendly. :-\

----------------------------8<----------------------- t3.rb
require 'rubygems'
require 'net/ssh/multi'

Net::SSH::Multi.start(:on_error => :warn) do |session|
  # define the servers we want to use
  session.use 'root@srv-04'
  session.use 'root@srv-07'
  session.use 'root@srv-08'
  session.use '(e-mail address removed)'

  # execute commands on all servers
  session.exec( "hostname" )


  # run the aggregated event loop
  session.loop
end
----------------------------8<-----------------------


ruby t3.rb
error connecting to root@srv-04: Net::SSH::AuthenticationFailed (root@srv-04)
Text will be echoed in the clear. Please install the HighLine or Termios libraries to suppress echoed text.
Password:
/var/lib/gems/1.8/gems/net-ssh-2.0.4/lib/net/ssh/transport/session.rb:65:in `initialize': No route to host - connect(2) (Errno::EHOSTUNREACH)
    from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `join'
    from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `sessions'
    from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `each'
    from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:37:in `sessions'
    from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:81:in `open_channel'
    from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi/session_actions.rb:120:in `exec'
    from t3.rb:12
    from /var/lib/gems/1.8/gems/net-ssh-multi-1.0.0/lib/net/ssh/multi.rb:62:in `start'
    from t3.rb:4

shell returned 1

I think you're right: this kind of exception is not handled by the lib.
I have to reread the lib, but its doc is confusing.

Regards,
Sylvain.
 
Brian Candler

It does seem inconsistent that "error connecting to root@srv-04:
Net::SSH::AuthenticationFailed" is caught as a warning, but
Errno::EHOSTUNREACH is not. I suggest you check for a project mailing
list or bug tracker and report it there.
http://rubyforge.org/projects/net-ssh
 
