Thread#raise, Thread#kill, and timeout.rb are unsafe

Thread starter: Charles Oliver Nutter

Paul Brannan

Tanaka Akira said:
It is because Thread.check_interrupt with blocking operations causes
race conditions.

Raising an exception during a blocking write operation makes it
impossible to know how much data was written.

Similarly, raising an exception during a blocking read operation makes
it impossible to access the data that was read.

The same is true for any operation for which partial success is
possible.

Tanaka Akira said:
So the application must choose whether to make a blocking operation
interruption-safe or uninterruptible.

* Thread.blocking_interruptible(klass) { ... }
* Thread.blocking_uninterruptible(klass) { ... }

Interesting. Which of these two is the default?

Tanaka Akira said:
Another idea is about the ensure clause. Since an ensure clause is used
to release a resource, it may delay asynchronous events as in
Thread.delay_interrupt.

Reminds me of an old C++ interview question we used to joke about:

"While digging through source code, you discover an application which
executes entirely in the body of the destructor of a global object.
Give one or more reasons why the author may have chosen to implement the
application this way."

Paul
 

MenTaLguY

Paul Brannan said:
Raising an exception during a blocking write operation makes it
impossible to know how much data was written.

Similarly, raising an exception during a blocking read operation makes
it impossible to access the data that was read.

The same is true for any operation for which partial success is
possible.

That's a good point. To some extent it is an API design and
implementation problem; an API which can partially "succeed" in that
fashion is arguably flawed.

There are really two alternatives:

1. fix the operation so that it succeeds or fails atomically

2. otherwise, make the operation always non-interruptable

(Well, there is also a third solution, used by Java's NIO for blocking
reads and writes: on interrupt, destroy the object in question so that
it doesn't matter anymore. But I don't think it's a particularly good
solution.)

-mental
 

Paul Brannan

MenTaLguY said:
There are really two alternatives:

1. fix the operation so that it succeeds or fails atomically

2. otherwise, make the operation always non-interruptable

For I/O operations, it is typically desirable for the operation to be
interruptible, but to get the result of the partial operation after it
is interrupted. At least, that's how it works at the C level with
respect to signals.

Paul
 

MenTaLguY

Paul Brannan said:
For I/O operations, it is typically desirable for the operation to be
interruptible, but to get the result of the partial operation after it
is interrupted. At least, that's how it works at the C level with
respect to signals.

Yes. For my purposes, partial success is still success as long
as the caller can tell how far it got. Even in the absence of
signals, a successful write() isn't guaranteed to write everything
that was requested. The important thing is that the operation
doesn't report total failure when it really succeeded, or vice-versa.

Historical example: under SVR4, a write() interrupted by a signal
would always fail with EINTR, even if some bytes had already
been written. POSIX later remedied this, so that write() would
always report success if any bytes were written, even if it was
interrupted by a signal. In that instance, the API was modified
to conform to alternative #1.
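
To make this concrete, here is a minimal editorial sketch (mine, not from
the thread) of the retry loop those POSIX semantics make possible, using
IO#syswrite, which maps to a single write(2) call. (String#byteslice is
1.9+ syntax; recent Rubys also retry EINTR internally, so the rescue is
just illustrating the C-level pattern.)

# Every call reports how far it got, so an interruption never loses
# the byte count.
def write_all(io, data)
  written = 0
  while written < data.bytesize
    begin
      written += io.syswrite(data.byteslice(written, data.bytesize - written))
    rescue Errno::EINTR
      # Interrupted before any bytes were written; safe to just retry.
      retry
    end
  end
  written
end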

Now, the question with IO and asynchronous exceptions in Ruby is
whether there is a good way to report a partial result.

-mental
 

MenTaLguY

Yes. For my purposes, partial success is still success as long
as the caller can tell how far it got.

(For the sake of those following along who might not have a lot of
C or Unix experience, I forgot to mention that a successful write()
always returns the number of bytes actually written. The issue with
SVR4 Unix was that interrupted writes were always treated like
failures, so no count was returned in those cases and you couldn't
tell how much you had left over in case you needed to try again.)

-mental
 

Roger Pack

I still like Ara's idea :)

However--is something like this already possible if we redefine the
Thread#raise method? For example, define it to be something like this:

class Thread
  attr_reader :currently_non_interruptible_section

  # Assumes each thread's @raise_mutex is set up when the thread is created.
  def queued_interrupts; @queued_interrupts ||= []; end

  def raise_safe(*args)
    # Sorry, couldn't think of a better way than a mutex to prohibit
    # weirdness from occurring. So that's where a problem would occur: in
    # the ensure block of the synchronize call. Ahh well. If you re-aliased
    # raise to be this method, maybe avoided.
    @raise_mutex.synchronize do
      if @currently_non_interruptible_section
        queued_interrupts << args
      else
        raise(*args) # dispatches to Thread#raise on self
      end
    end
  end

  def uninterruptible_block
    return yield if @currently_non_interruptible_section # already within one.
    # Maybe should keep a count of how deep you are. Not sure.
    @currently_non_interruptible_section = true
    begin
      yield # might want to handle queued exceptions at the end of this block
    ensure
      @raise_mutex.synchronize { @currently_non_interruptible_section = false }
    end
    # After this point the thread is subject to interruptions again.
  end
end

Then you could have a thread do something like:

Thread.current.uninterruptible_block do
  # some stuff
  # Handle the queued interrupts here if desired. Note that you could loop
  # and check "have I been interrupted yet?" once per loop or what not, if
  # desired.
end

# handle queued interrupts, if desired



timeout would possibly become:

def timeout(sec, exception = Error)
  return yield if sec.nil? or sec.zero?
  raise ThreadError, "timeout within critical session" if Thread.critical
  # This check is necessary since we do use the "real" raise this time to
  # interrupt our block's yielding. So if you ever nested these, the raise
  # of an uppermost one could interrupt a lower one in its ensure. Same
  # problem would occur. We can only have one at a time. This is a hack
  # that assumes we *only* use uninterruptible_block for calls to timeout.
  raise ThreadError, "called nested functions that want to use " \
    "uninterruptible and raise on it" if
      Thread.current.currently_non_interruptible_section

  death_mutex = Mutex.new

  Thread.current.uninterruptible_block do
    begin
      x = Thread.current
      y = Thread.start do
        sleep sec
        death_mutex.synchronize do
          # This is not raise_safe, but real raise. Still unsafe as per some
          # previous comments... but is it safer than it used to be now?
          x.raise exception, "execution expired" if x.alive?
        end
      end
      yield sec # would this code be dampered somehow?
    ensure
      death_mutex.synchronize do
        y.kill if y and y.alive?
      end
    end
  end

  # Or something like that:
  Thread.current.queued_interrupts.each { |interrupt| raise(*interrupt) }
end

Now you could "replace" the use of raise with raise_safe. Would this
work?
This would prohibit 'other threads' from interrupting a thread that is
within a timeout block until it timed out (obviously bad), but would it
be thread safe?

Thoughts?
Thank you for reading :)

Overall I'd say I'm in agreement with the OP's call to deprecate
Thread#kill, Thread#raise, and Thread#critical= as they can too easily
hurt, though something like this might help. This suggestion also seems
far less complete than Tanaka's, somehow.

Still wishing for an ensure block that was somehow 'guaranteed' to run
all the way through, delaying interruptions or ignoring them till it
ended.
Thanks.

-R
 

Charles Oliver Nutter

ara said:

I don't think we'll get anywhere solving the problem by saying "let's
just remove threads" or "don't use threads".

Threads are hard, yes. Threads are too hard for most programmers, yes.
But since they're in Ruby, they're going to get used...and providing
provably unsafe operations on threads is obviously not making matters
better.

Besides, even if *you* avoid using threads, the libraries you call might
not. So we need a global answer.

- Charlie
 

Charles Oliver Nutter

Tanaka said:
Basically, asynchronous events should be queued.

* make a queue for each thread.
* Thread#raise(exc) enqueues exc to the queue.
* Thread.check_interrupt(klass) checks the queue of the current
thread for an exception which is a kind of klass. If one exists,
the exception is raised.
* other queue-examining mechanisms, such as sigpending, etc., may
be useful.

Thread.check_interrupt is a safe point.

Agreed. This is essentially how these unsafe operations are implemented
in JRuby currently. The "raise queue" is one-element long, and
appropriate locks are acquired to modify it from outside the target
thread. All threads periodically check to see if they've been "raised"
or "killed" or whether they should sleep because another thread has
"gone critical". The checkpoints are currently at the same points during
execution that Ruby 1.8 checks whether it should run the thread scheduler.
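
For illustration, here is a hypothetical sketch of the queue-plus-checkpoint
model in plain Ruby (an editorial sketch, not JRuby's implementation; the
AsyncEvents module and its names are invented):

require 'thread'

# Per-thread queues of pending exceptions, delivered only at safe points.
module AsyncEvents
  LOCK = Mutex.new
  PENDING = Hash.new { |h, k| h[k] = [] } # thread => pending exceptions

  # The "raise" side: enqueue instead of interrupting asynchronously.
  def self.raise_in(thread, exc)
    LOCK.synchronize { PENDING[thread] << exc }
  end

  # The safe point: deliver the first pending exception of kind klass.
  def self.check_interrupt(klass = Exception)
    exc = LOCK.synchronize do
      queue = PENDING[Thread.current]
      i = queue.index { |e| e.kind_of?(klass) }
      i && queue.delete_at(i)
    end
    raise exc if exc
  end
end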

Tanaka Akira said:
However, safe points are not only Thread.check_interrupt.
Blocking operations, such as I/O, may or may not be safe
points. It is because Thread.check_interrupt with blocking
operations causes race conditions. So the application must
choose whether to make a blocking operation interruption-safe
or uninterruptible.

* Thread.blocking_interruptible(klass) { ... }
* Thread.blocking_uninterruptible(klass) { ... }

Also good. Awful names though :)

Tanaka Akira said:
Blocking operations in Thread.blocking_interruptible are
safe points.

If a thread has no resources to release, it is safe to kill
asynchronously. Other than that, safe points should be
declared explicitly. When a resource is acquired,
application should declare it.

* Thread.delay_interrupt(klass) { ... }

The safe points are those outside the outermost
Thread.delay_interrupt.

Another idea is about the ensure clause. Since an ensure
clause is used to release a resource, it may delay
asynchronous events as in Thread.delay_interrupt.

As mental has said, if threads are uninterruptible by default, this
would make ensures safe. I think that's the only reasonable option.
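
As an editorial aside, Tanaka's Thread.delay_interrupt could be layered on
the hypothetical AsyncEvents sketch above as a nesting-counted mask (again
invented names, not part of the proposal itself):

# Mask delivery while a resource is held; the first safe point after the
# outermost delay_interrupt exits delivers any queued exceptions.
# (check_interrupt above would also return early while the mask is set.)
module AsyncEvents
  MASK_DEPTH = Hash.new(0) # thread => delay_interrupt nesting depth

  def self.delay_interrupt(klass = Exception)
    LOCK.synchronize { MASK_DEPTH[Thread.current] += 1 }
    begin
      yield
    ensure
      depth = LOCK.synchronize { MASK_DEPTH[Thread.current] -= 1 }
      check_interrupt(klass) if depth.zero?
    end
  end
end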

- Charlie
 

Charles Oliver Nutter

MenTaLguY said:
Anything using Thread.critical has its own problems, particularly
when native threads are involved. It can easily result in deadlocks
if the running thread inadvertently tries to use something which another
stopped thread is using (imagine a thread stopped halfway through
releasing a lock that the ensure block needs to acquire, for example
-- something which would have worked fine with normal use of locks, but
becomes a problem once threads can "stop the world").

And to clarify, this problem isn't restricted to native threads...even a
green-threaded model could deadlock if one thread goes critical while
other threads are holding unrelated locks.

- Charlie
 

ara howard

I don't think we'll get anywhere solving the problem by saying
"let's just remove threads" or "don't use threads".

indeed. the only sane approach, i think, is the middle one whereby
threads are there, exceptions are there, but cross-thread exceptions
operate via channels and events.

a @ http://codeforpeople.com/
 

Tanaka Akira

MenTaLguY said:
This sounds very good! I hadn't considered anything like
blocking_interruptible, but it seems useful.

The idea comes from the cancellation points of pthreads.
 

Tanaka Akira

Paul Brannan said:
Raising an exception during a blocking write operation makes it
impossible to know how much data was written.

Similarly, raising an exception during a blocking read operation makes
it impossible to access the data that was read.

It is a very difficult problem.

I don't try to solve the problem perfectly.

I think data loss may be acceptable if an asynchronous
event is for termination.

If the purpose is termination, data loss may happen
anyway. For example, data from TCP is lost if a process
exits before the data arrives.

If data loss is acceptable but termination delay is not
acceptable, Thread.blocking_interruptible(klass) { I/O } can
be used.

If data loss is not acceptable but termination delay is
acceptable, Thread.blocking_uninterruptible(klass) { I/O }
can be used.

If neither data loss nor termination delay is acceptable, it
is difficult. One idea is
Thread.blocking_interruptible(klass) { IO.select } and
nonblocking I/O. But nonblocking I/O inherently produces
partial results, so there must be a read/write buffer. If
the termination procedure ignores the read buffer, data loss
occurs. If the termination procedure flushes the write
buffer, it may block. So data loss or termination delay
again. It is the difficult problem.
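
(Editorial illustration of that select-plus-nonblocking pattern, with a
hypothetical helper; IO::WaitReadable is the 1.9 spelling, Errno::EAGAIN
on 1.8:)

# Only IO.select blocks, so only select is interruptible; read_nonblock
# never blocks, so an interrupt can never strand data inside the call.
# The caller owns the buffer, so partial data survives an interrupt.
def read_exactly(io, len, buffer = '')
  while buffer.bytesize < len
    begin
      buffer << io.read_nonblock(len - buffer.bytesize)
    rescue IO::WaitReadable
      IO.select([io]) # the interruptible safe point; buffer stays intact
      retry
    rescue EOFError
      break # connection closed; return what we have
    end
  end
  buffer
end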

I hope either data loss or termination delay is acceptable.

If an asynchronous event is used for non-termination, data
loss is not acceptable in general. Assume a procedure is
called for the event. If some delay is acceptable,
Thread.blocking_uninterruptible(klass) { I/O } can be used.
If the delay is also not acceptable, it is the difficult
problem. However, if the event is caused by Thread#raise,
I'm not sure why the procedure is not called directly
instead of using Thread#raise. If the event is caused by a
signal, I have no good idea how to do it.

So I think asynchronous events should be used only for
termination. This is why I think an exception on a blocking
operation is tolerable.
 

Tanaka Akira

Charles Oliver Nutter said:
As mental has said, if threads are uninterruptible by default, this
would make ensures safe. I think that's the only reasonable option.

If threads are uninterruptible by default for all kinds of
asynchronous events, SIGINT and other signals are not
effective by default. That is not reasonable.

So the default should depend on the kind of asynchronous
event. I think it is reasonable that signals are
interruptible by default but others are uninterruptible by
default.
 

Daniel DeLorme

I think the specific problem mentioned was that asynchronous exceptions
can abort the cleanup within an ensure section. So what about a simple
solution to that specific problem, like: if an asynchronous exception is
raised inside an ensure section, the exception is queued and raised at
the end of the ensure section. Is that too naive?
 

Paul Brannan

MenTaLguY said:
Now, the question with IO and asynchronous exceptions in Ruby is
whether there is a good way to report a partial result.

The existing API already returns the number of bytes successfully
written for both IO#write and IO#syswrite. But how do we find out what
caused the operation to be interrupted? Currently IO#write gives an
exception:

irb(main):001:0> r, w = IO.pipe
=> [#<IO:0x4039f734>, #<IO:0x4039f70c>]
irb(main):002:0> t = Thread.new { w.write("HELLO" * 20000) }
=> #<Thread:0x4039a0a4 run>
irb(main):003:0> r.close
=> nil
irb(main):004:0> t.join
Errno::EPIPE: Broken pipe
from (irb):2:in `write'
from (irb):4:in `join'
from (irb):4

and IO#syswrite gives the number of bytes written:

irb(main):018:0> sock = TCPSocket.new('localhost', 2000)
=> #<TCPSocket:0x4043bf1c>
irb(main):019:0> t = Thread.new { p sock.syswrite("HELLO" * 200000) }
=> #<Thread:0x40428804 run>
irb(main):024:0> t.join
131072
=> #<Thread:0x4040d324 dead>

I think the behavior of IO#syswrite is correct (because there is only
one write(2) system call), but IO#write actually got *both* partial
success *and* an exceptional condition, so what should its behavior be?
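
One hypothetical answer (purely illustrative, not anything Ruby does):
report both, by attaching the partial count to the exception:

# Invented for illustration: an IOError subclass carrying the partial
# result, so a failed IO#write needn't discard the byte count.
class PartialWriteError < IOError
  attr_reader :bytes_written
  def initialize(message, bytes_written)
    super(message)
    @bytes_written = bytes_written
  end
end

# A caller could then recover the partial count:
#   begin
#     w.write("HELLO" * 20000)
#   rescue PartialWriteError => e
#     e.bytes_written # how far the write got before EPIPE
#   end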

Paul
 

Paul Brannan

Tanaka Akira said:
So the default should depend on the kind of asynchronous
event. I think it is reasonable that signals are
interruptible by default but others are uninterruptible by
default.

I think this is reasonable too, especially given that we can turn
signals off using trap.

Paul
 

MenTaLguY

Daniel DeLorme said:
I think the specific problem mentioned was that asynchronous exceptions
can abort the cleanup within an ensure section. So what about a simple
solution to that specific problem, like: if an asynchronous exception is
raised inside an ensure section, the exception is queued and raised at
the end of the ensure section. Is that too naive?

Maybe slightly. I don't think it would necessarily be an issue for the
current implementation of 1.9, but you do also need to do the
registration/unregistration of ensure and catch clauses in an
uninterruptible fashion.

Basically you need to have the equivalent of this:

Thread.uninterruptable do
  begin
    Thread.interruptable do
      # ...
    end
  rescue
    # ...
  ensure
    # ...
  end
end

(in cases where the thread is currently interruptable)
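
(Editorial note: Ruby 2.0 later shipped Thread.handle_interrupt, which
expresses exactly this nesting; sketched here for readers arriving after
the fact:)

# The same shape with Thread.handle_interrupt (Ruby 2.0+): defer
# asynchronous exceptions by default, opting back in around the
# interruptible region, so rescue/ensure run unmolested.
Thread.handle_interrupt(Exception => :never) do
  begin
    Thread.handle_interrupt(Exception => :immediate) do
      # ... interruptible work ...
    end
  rescue
    # ... async exceptions are delivered only inside the inner block ...
  ensure
    # ... cleanup runs with async exceptions deferred ...
  end
end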

See also section 4.2 of "Asynchronous Exceptions in Haskell":

http://citeseer.ist.psu.edu/415348.html

-mental
 

MenTaLguY

Tanaka Akira said:
So the default should depend on the kind of asynchronous
event. I think it is reasonable that signals are
interruptible by default but others are uninterruptible by
default.

This seems sensible to me, though it would be nice to also have
an alternate way to handle POSIX signals that didn't involve
interruption.

-mental
 

MenTaLguY

Tanaka Akira said:
So I think asynchronous events should be used only for
termination. This is why I think an exception on a blocking
operation is tolerable.

I think I agree. However, that still leaves us with timeouts, which
are sometimes done for operations that will be retried in the future,
so data loss is not acceptable there.

Won't we also need to augment the existing stdlib APIs to allow the
specification of timeouts for any blocking operations that can't
already be done with non-blocking and select?
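
(Pending that, a per-call timeout for a single read can be sketched with
select and non-blocking I/O; read_with_timeout is a hypothetical helper:)

# A timeout built from IO.select instead of Thread#raise: on timeout it
# returns nil rather than interrupting the thread, so no data is lost.
def read_with_timeout(io, maxlen, seconds)
  io.read_nonblock(maxlen)
rescue IO::WaitReadable # Errno::EAGAIN on 1.8
  if IO.select([io], nil, nil, seconds)
    retry # data arrived in time; re-run the read_nonblock
  else
    nil # timed out without asynchronous interruption
  end
end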

-mental
 
