How to force a thread to stop

Alex Martelli · Aug 3, 2006

H J van Rooyen said:
| > *grin* - Yes of course - if the WDT was enabled - its something that
| > I have not seen on PC's yet...
|
| They are available for PC's, as plug-in cards, at least for the ISA
| bus in the old days, and almost certainly for the PCI bus today.

That is cool, I was not aware of this. Added to a long-running server, it will
help to make the system more stable: a hardware solution to hard-to-find bugs
in software (or even stuff like soft errors in hardware - speak to the
avionics boys about neutrons). Do you know who sells them and what they are
called?

When you're talking about a bunch of (multiprocessing) machines on a
LAN, you can have a "watchdog machine" (or more than one, for
redundancy) periodically checking all others for signs of health -- and,
if needed, rebooting the sick machines via ssh (assuming the sickness is
in userland, of course -- to come back from a kernel panic _would_
require HW support)... so (in this setting) you _could_ do it in SW, and
save the $100+ per box that you'd have to spend at some shop such as
<http://www.pcwatchdog.com/> or the like...
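
A minimal sketch of such a "watchdog machine" in Python, under stated assumptions: health is judged by whether a TCP port answers, the sick box still runs sshd, and the watchdog account may run `sudo reboot` without a password. The names `service_answers`, `reboot_via_ssh` and `sweep` are made up for illustration and are not from any library.

```python
import socket
import subprocess


def service_answers(host, port, timeout=5.0):
    """Health probe (an assumption): can we open a TCP connection to the service?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reboot_via_ssh(host):
    """Ask the sick machine to reboot itself; only helps if userland sshd still works."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
           host, "sudo", "reboot"]
    try:
        subprocess.run(cmd, timeout=30, check=False)
    except (subprocess.TimeoutExpired, OSError):
        pass  # unreachable right now; a later sweep will try again


def sweep(hosts, port, failures, max_failures=3,
          probe=service_answers, reboot=reboot_via_ssh):
    """One pass over all machines.

    `failures` maps host -> consecutive failed probes and is updated in place.
    A host is rebooted once it has failed `max_failures` probes in a row.
    Returns the list of hosts for which a reboot was requested in this pass.
    """
    rebooted = []
    for host in hosts:
        if probe(host, port):
            failures[host] = 0
            continue
        failures[host] = failures.get(host, 0) + 1
        if failures[host] >= max_failures:
            reboot(host)
            failures[host] = 0  # give the machine time to come back up
            rebooted.append(host)
    return rebooted
```

One would call `sweep` from a loop with a sleep between passes. Requiring several consecutive failures before rebooting is a design choice to avoid killing a machine over one slow response.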

Alex

H J van Rooyen · Aug 3, 2006

|
| >
| > | > *grin* - Yes of course - if the WDT was enabled - its something that
| > | > I have not seen on PC's yet...
| > |
| > | They are available for PC's, as plug-in cards, at least for the ISA
| > | bus in the old days, and almost certainly for the PCI bus today.
| >
| > That is cool, I was not aware of this - added to a long running server it will
| > help to make the system more stable - a hardware solution to hard to find bugs
| > in Software - (or even stuff like soft errors in hardware - speak to the
| > Avionics boys about Neutrons) do you know who sells them and what they are
| > called? -
|
| When you're talking about a bunch of (multiprocessing) machines on a
| LAN, you can have a "watchdog machine" (or more than one, for
| redundancy) periodically checking all others for signs of health -- and,
| if needed, rebooting the sick machines via ssh (assuming the sickness is
| in userland, of course -- to come back from a kernel panic _would_
| require HW support)... so (in this setting) you _could_ do it in SW, and
| save the $100+ per box that you'd have to spend at some shop such as
| <http://www.pcwatchdog.com/> or the like...
|
|
| Alex

Thanks - will check it out - seems a lot of money for 555 functionality
though....

Especially if, like me, you have to pay for it in Rand - I have started to call
the local currency the Runt...

(Typical South African knee-jerk reaction - everything is too expensive here...
:-) )

- Hendrik

Gerhard Fiedler · Aug 3, 2006

Thanks - will check it out - seems a lot of money for 555 functionality
though....

Especially if like I, you have to pay for it with Rand - I have started
to call the local currency Runt...

Depending on what you're up to, you can make such a thing yourself
relatively easily. There are various possibilities, both for the
reset/restart part and for the kick-the-watchdog part.

Since you're talking about a "555", you know at least /some/ electronics.

Two 555s (or similar):
- One wired as a retriggerable monostable and hooked up to a control line
of a serial port. It needs to be triggered regularly in order to not
trigger the second timer.
- The other wired as a monostable and hooked up to a relay that gets
activated for a certain time when it gets triggered. That relay controls
the computer power line (if you want to stay outside the case) or the reset
switch (if you want to build it into your computer).
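
The behaviour of the two timers can be modelled in software before soldering anything. This is only a rough simulation sketch in Python: the class name `TwoTimerWatchdog`, the use of plain seconds, and the assumption that the first timer starts over when the relay pulse ends are all mine, not part of the circuit description above.

```python
class TwoTimerWatchdog:
    """Software model of the two monostables; all times are plain seconds."""

    def __init__(self, timeout, pulse):
        self.timeout = timeout    # period of the retriggerable first timer
        self.pulse = pulse        # how long the second timer holds the relay
        self.last_kick = 0.0      # first timer is assumed to start at t = 0
        self.relay_until = None   # end time of the current relay pulse, if any

    def kick(self, now):
        """A pulse on the serial control line retriggers the first timer."""
        self.last_kick = now

    def relay_active(self, now):
        """Advance the model to `now`; return True while the relay is activated."""
        if self.relay_until is not None and now >= self.relay_until:
            # Relay pulse is over: assume the first timer starts over here.
            self.last_kick = self.relay_until
            self.relay_until = None
        if self.relay_until is None and now - self.last_kick >= self.timeout:
            # First timer ran out without a kick: it triggers the second timer.
            self.relay_until = self.last_kick + self.timeout + self.pulse
        return self.relay_until is not None and now < self.relay_until


wd = TwoTimerWatchdog(timeout=10, pulse=2)
wd.kick(3)
wd.kick(8)                     # last kick at t = 8, so the deadline is t = 18
print(wd.relay_active(12))     # still inside the timeout window
print(wd.relay_active(18))     # deadline missed, relay pulse running
print(wd.relay_active(20))     # pulse finished, first timer restarted
```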

I don't do such things with 555s... I'm more a digital guy. There are many
options to do that, and all a lot cheaper than those boards, if you have
more time than money.

Gerhard

Carl J. Van Arsdall · Aug 3, 2006

Alex said:
When you're talking about a bunch of (multiprocessing) machines on a
LAN, you can have a "watchdog machine" (or more than one, for
redundancy) periodically checking all others for signs of health -- and,
if needed, rebooting the sick machines via ssh (assuming the sickness is
in userland, of course -- to come back from a kernel panic _would_
require HW support)... so (in this setting) you _could_ do it in SW, and
save the $100+ per box that you'd have to spend at some shop such as
<http://www.pcwatchdog.com/> or the like...

Yeah, there are other free solutions you might want to check out; I've
been looking at ganglia and nagios. These require constant
communication with a server; however, they are customizable in that you
can have the server take action on various events.

Cheers!

-c

--

Carl J. Van Arsdall
Build and Release
MontaVista Software

Paul Rubin · Aug 3, 2006

Carl J. Van Arsdall said:
Yea, there are other free solutions you might want to check out, I've
been looking at ganglia and nagios. These require constant
communication with a server, however they are customizable in that you
can have the server take action on various events. Cheers!

There are some pretty tricky issues with desktop-class PC hardware about
what to do if you need to reconfigure or reboot one remotely. Real
server hardware is better equipped for this but costs a lot more.

I remember something called "PC-Weasel" which was an ISA-bus plug-in
card that was basically a VGA card with an Ethernet port. That let
you see the bootup screens remotely, adjust the CMOS settings, etc. I
remember trying without success to find something like that for the
PCI bus. Without something like that, all you can really do if a PC
used as a server gets wedged is remote-reset or power-cycle it; even that
of course takes special hardware, but many colo places are already set up
for that.

H J van Rooyen · Aug 4, 2006

| On 2006-08-03 06:07:31, H J van Rooyen wrote:
|
| > Thanks - will check it out - seems a lot of money for 555 functionality
| > though....
| >
| > Especially if like I, you have to pay for it with Rand - I have started
| > to call the local currency Runt...
|
| Depending on what you're up to, you can make such a thing yourself
| relatively easily. There are various possibilities, both for the
| reset/restart part and for the kick-the-watchdog part.
|
| Since you're talking about a "555" you know at least /some/ electronics

*grin* You could say that - original degree was Physics and Maths ...

| Two 555s (or similar):
| - One wired as a retriggerable monostable and hooked up to a control line
| of a serial port. It needs to be triggered regularly in order to not
| trigger the second timer.
| - The other wired as a monostable and hooked up to a relay that gets
| activated for a certain time when it gets triggered. That relay controls
| the computer power line (if you want to stay outside the case) or the reset
| switch (if you want to build it into your computer).
|
| I don't do such things with 555s... I'm more a digital guy. There are many
| options to do that, and all a lot cheaper than those boards, if you have
| more time than money

Likewise - some 25 years of, amongst other things, designing hardware and
programming 8051 and DSP type processors in assembler...

The 555 came to mind because it has been around forever - and as someone once
said (Steve Ciarcia?) -
"My favourite programming language is solder"... - a dumb state machine
implemented in hardware beats a processor every time when it comes to
reliability - it's just a tad inflexible...

The next step above the 555 is a PIC... then you can steal power from the RS-232
line - and it's a small step from "PIC" to "PIG"...

Although this is getting a bit off topic on a language group...

;-) Hendrik

H J van Rooyen · Aug 4, 2006

Gerhard Fiedler · Aug 4, 2006

The next step above the 555 is a PIC... then you can steal power from the
RS-232 line - and its a small step from "PIC" to "PIG"...

I see... you obviously know what to do, if you want to.

But I'm not sure such a device alone is of much help in a typical server. I
think it's probably just as common that only one service hangs. To make it
useful, the trigger process has to be carefully designed, so that it
actually has a chance of failing when you need it to fail. This probably
requires either code changes to the various services (so that they each
trigger their own watchdog) or some supervisor program that only triggers
the watchdog if it receives responses from all relevant services.
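
The supervisor variant could look roughly like the Python sketch below. It assumes each service can be probed by some callable that returns True when the service responds; `supervise_once`, `checks` and `kick` are hypothetical names for illustration only.

```python
def supervise_once(checks, kick):
    """Kick the watchdog only if every service check passes.

    `checks` maps a service name to a zero-argument callable that returns
    True when that service responded. `kick` is whatever triggers the
    watchdog (for example, toggling a serial control line).
    Returns the names of the services that failed; an empty list means
    the watchdog was kicked.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a probe that blows up counts as a hung service
        if not ok:
            failed.append(name)
    if not failed:
        kick()
    return failed
```

If any one service stops answering, the kick is withheld and the hardware timer is left to run out. The probes themselves need timeouts, or a hung probe would simply hang the supervisor, which in this design has the same effect.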

Gerhard

H J van Rooyen · Aug 4, 2006

| On 2006-08-04 02:33:07, H J van Rooyen wrote:
|
| > The next step above the 555 is a PIC... then you can steal power from the
| > RS-232 line - and its a small step from "PIC" to "PIG"...
|
| I see... you obviously know what to do, if you want to

|
| But I'm not sure such a device alone is of much help in a typical server. I
| think it's probably just as common that only one service hangs. To make it
| useful, the trigger process has to be carefully designed, so that it
| actually has a chance of failing when you need it to fail. This probably
| requires either code changes to the various services (so that they each
| trigger their own watchdog) or some supervisor program that only triggers
| the watchdog if it receives responses from all relevant services.
|
| Gerhard

This is true - it's trivial to just kill the whole machine like this, but it's
kind of like using a sledgehammer to crack a nut. And as you so rightly point
out, if the process that tickles the watchdog to keep it happy is not (very)
tightly coupled to the thing you want to monitor, then it may not work at all -
especially if interrupts are involved.

In fact, something like a state machine that looks for alternate occurrences of
(at least) two things is required: the interrupt gives it a kick and sets a
flag, the application sees the flag, gives it the alternate kick and clears the
flag, and so on, with the internal tasks in the machine "passing the ball" in
this (or some other) way. That way you are (relatively) sure the thing is still
running...

But it needs careful design, or it will either kill the machine for no good
reason (when something like disk accesses slow the external (user) processes
down), or it will fail to fire if it is driven from a callback - the app may be
crazy, but the OS may still be doing callbacks and timing stuff faithfully -
you can't be too careful...
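
A rough Python model of the alternating-kick idea, as a sketch only: the source names "interrupt" and "application", the class name `AlternatingWatchdog`, and the rule that a kick from the wrong side is simply ignored are assumptions made for illustration.

```python
class AlternatingWatchdog:
    """Watchdog that is only retriggered by kicks arriving in strict alternation."""

    SOURCES = ("interrupt", "application")

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout
        self.expected = "interrupt"   # whose turn it is to "pass the ball"
        self.last_progress = now      # time of the last accepted kick

    def kick(self, source, now):
        """Accept the kick only if it comes from the side whose turn it is."""
        if source not in self.SOURCES:
            raise ValueError("unknown kick source: %r" % (source,))
        if source != self.expected:
            return False              # same side kicking twice does not count
        self.expected = "application" if source == "interrupt" else "interrupt"
        self.last_progress = now
        return True

    def expired(self, now):
        """True once no accepted kick has arrived for `timeout` seconds."""
        return now - self.last_progress >= self.timeout
```

With this rule, a timer interrupt that keeps firing faithfully while the application is wedged gets exactly one kick accepted, after which the watchdog runs out. Picking the timeout is the delicate part, since it has to tolerate things like slow disk accesses.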

- Hendrik
