How to force a thread to stop

Alex Martelli

H J van Rooyen said:
| > *grin* - Yes of course - if the WDT was enabled - its something that
| > I have not seen on PC's yet...
|
| They are available for PC's, as plug-in cards, at least for the ISA
| bus in the old days, and almost certainly for the PCI bus today.

That is cool, I was not aware of this. Added to a long-running server, it will
help to make the system more stable - a hardware solution to hard-to-find bugs
in software (or even stuff like soft errors in hardware - speak to the
avionics boys about neutrons). Do you know who sells them and what they are
called?

When you're talking about a bunch of (multiprocessing) machines on a
LAN, you can have a "watchdog machine" (or more than one, for
redundancy) periodically checking all others for signs of health -- and,
if needed, rebooting the sick machines via ssh (assuming the sickness is
in userland, of course -- to come back from a kernel panic _would_
require HW support)... so (in this setting) you _could_ do it in SW, and
save the $100+ per box that you'd have to spend at some shop such as
<http://www.pcwatchdog.com/> or the like...
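
A minimal Python sketch of the software-only "watchdog machine" idea, under some loud assumptions that are not from the post: a host counts as healthy if a TCP connection to a service port (80 here, purely illustrative) succeeds, key-based ssh logins are already set up, and the remote account may run `sudo reboot`. The names `is_healthy`, `reboot_via_ssh`, and `sweep` are hypothetical.

```python
import socket
import subprocess


def is_healthy(host, port=80, timeout=5.0):
    """Assumed health test: the host accepts a TCP connection on `port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reboot_via_ssh(host):
    """Ask a sick host to reboot itself (assumes key-based ssh and sudo rights)."""
    try:
        subprocess.run(["ssh", host, "sudo", "reboot"], timeout=30, check=False)
    except subprocess.TimeoutExpired:
        pass  # ssh itself hung; a kernel-level problem needs hardware support


def sweep(hosts, failures, max_failures=3, check=is_healthy, reboot=reboot_via_ssh):
    """Run one health sweep over `hosts`.

    `failures` maps host -> consecutive failed checks and is updated in place.
    A host is rebooted only after `max_failures` failures in a row, so a single
    slow response does not trigger a reboot. Returns the hosts that were rebooted.
    """
    rebooted = []
    for host in hosts:
        if check(host):
            failures[host] = 0
            continue
        failures[host] = failures.get(host, 0) + 1
        if failures[host] >= max_failures:
            reboot(host)
            failures[host] = 0
            rebooted.append(host)
    return rebooted
```

The watchdog machine would call `sweep` periodically (say, once a minute from a loop or cron job), keeping the same `failures` dict between calls. The `check` and `reboot` parameters are injectable so the logic can be exercised without touching real machines.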


Alex
 
H J van Rooyen

|
| >
| > | > *grin* - Yes of course - if the WDT was enabled - its something that
| > | > I have not seen on PC's yet...
| > |
| > | They are available for PC's, as plug-in cards, at least for the ISA
| > | bus in the old days, and almost certainly for the PCI bus today.
| >
| > That is cool, I was not aware of this - added to a long running server it will
| > help to make the system more stable - a hardware solution to hard to find bugs
| > in Software - (or even stuff like soft errors in hardware - speak to the
| > Avionics boys about Neutrons) do you know who sells them and what they are
| > called? -
|
| When you're talking about a bunch of (multiprocessing) machines on a
| LAN, you can have a "watchdog machine" (or more than one, for
| redundancy) periodically checking all others for signs of health -- and,
| if needed, rebooting the sick machines via ssh (assuming the sickness is
| in userland, of course -- to come back from a kernel panic _would_
| require HW support)... so (in this setting) you _could_ do it in SW, and
| save the $100+ per box that you'd have to spend at some shop such as
| <http://www.pcwatchdog.com/> or the like...
|
|
| Alex

Thanks - will check it out - seems a lot of money for 555 functionality
though....

Especially if, like me, you have to pay for it in Rand - I have started to call
the local currency Runt...

(Typical South African knee-jerk reaction - everything is too expensive here...
:-) )

- Hendrik
 
Gerhard Fiedler

Thanks - will check it out - seems a lot of money for 555 functionality
though....

Especially if like I, you have to pay for it with Rand - I have started
to call the local currency Runt...

Depending on what you're up to, you can make such a thing yourself
relatively easily. There are various possibilities, both for the
reset/restart part and for the kick-the-watchdog part.

Since you're talking about a "555" you know at least /some/ electronics :)

Two 555s (or similar):
- One wired as a retriggerable monostable and hooked up to a control line
of a serial port. It needs to be triggered regularly in order to not
trigger the second timer.
- The other wired as a monostable and hooked up to a relay that gets
activated for a certain time when it gets triggered. That relay controls
the computer power line (if you want to stay outside the case) or the reset
switch (if you want to build it into your computer).
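
To make the timing behaviour of the two timers concrete, here is a small software model in Python. It is only a simulation under assumptions of my own: time is passed in explicitly as seconds, `timeout` stands for the first timer's period, `pulse_width` for how long the relay stays activated, and the class name `TwoTimerWatchdog` is hypothetical. It says nothing about actual component values or wiring.

```python
class TwoTimerWatchdog:
    """Software model of the two-monostable watchdog.

    Timer 1 (retriggerable): every kick restarts its period. If it is allowed
    to time out, that event triggers timer 2.
    Timer 2 (one-shot): holds the relay activated for `pulse_width` seconds.
    """

    def __init__(self, timeout, pulse_width, start=0.0):
        self.timeout = timeout
        self.pulse_width = pulse_width
        self.last_kick = start   # timer 1 is considered armed at power-up
        self.pulse_start = None  # when timer 2 last fired, if ever

    def _update(self, now):
        # Timer 1 expires `timeout` seconds after the most recent kick.
        expiry = self.last_kick + self.timeout
        # Fire timer 2 at most once per expiry of timer 1.
        not_yet_fired = self.pulse_start is None or self.pulse_start <= self.last_kick
        if now >= expiry and not_yet_fired:
            self.pulse_start = expiry

    def kick(self, now):
        """The host toggles the serial control line: retrigger timer 1."""
        self._update(now)
        self.last_kick = now

    def relay_closed(self, now):
        """True while timer 2 is holding the reset/power relay activated."""
        self._update(now)
        return self.pulse_start is not None and now < self.pulse_start + self.pulse_width
```

For example, with `TwoTimerWatchdog(timeout=10, pulse_width=2)` and a last kick at t=5, the relay is idle at t=12, activated from t=15 until t=17, and idle again afterwards until the host resumes kicking and later misses another deadline.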

I don't do such things with 555s... I'm more a digital guy. There are many
options to do that, and all a lot cheaper than those boards, if you have
more time than money :)

Gerhard
 
Carl J. Van Arsdall

Alex said:
When you're talking about a bunch of (multiprocessing) machines on a
LAN, you can have a "watchdog machine" (or more than one, for
redundancy) periodically checking all others for signs of health -- and,
if needed, rebooting the sick machines via ssh (assuming the sickness is
in userland, of course -- to come back from a kernel panic _would_
require HW support)... so (in this setting) you _could_ do it in SW, and
save the $100+ per box that you'd have to spend at some shop such as
<http://www.pcwatchdog.com/> or the like...

Yeah, there are other free solutions you might want to check out; I've
been looking at ganglia and nagios. These require constant
communication with a server, but they are customizable in that you
can have the server take action on various events.

Cheers!

-c


--

Carl J. Van Arsdall
(e-mail address removed)
Build and Release
MontaVista Software
 
Paul Rubin

Carl J. Van Arsdall said:
Yea, there are other free solutions you might want to check out, I've
been looking at ganglia and nagios. These require constant
communication with a server, however they are customizable in that you
can have the server take action on various events. Cheers!

There are some pretty tricky issues with desktop-class PC hardware about
what to do if you need to reconfigure or reboot one remotely. Real
server hardware is better equipped for this but costs a lot more.

I remember something called "PC-Weasel" which was an ISA-bus plug-in
card that was basically a VGA card with an ethernet port. That let
you see the bootup screens remotely, adjust the CMOS settings, etc. I
remember trying without success to find something like that for the
PCI bus. Without something like that, all you can really do if a PC
used as a server gets wedged is remote-reset or power-cycle it; even that
of course takes special hardware, but many colo places are already set up
for that.
 
H J van Rooyen

| On 2006-08-03 06:07:31, H J van Rooyen wrote:
|
| > Thanks - will check it out - seems a lot of money for 555 functionality
| > though....
| >
| > Especially if like I, you have to pay for it with Rand - I have started
| > to call the local currency Runt...
|
| Depending on what you're up to, you can make such a thing yourself
| relatively easily. There are various possibilities, both for the
| reset/restart part and for the kick-the-watchdog part.
|
| Since you're talking about a "555" you know at least /some/ electronics :)

*grin* You could say that - original degree was Physics and Maths ...

| Two 555s (or similar):
| - One wired as a retriggerable monostable and hooked up to a control line
| of a serial port. It needs to be triggered regularly in order to not
| trigger the second timer.
| - The other wired as a monostable and hooked up to a relay that gets
| activated for a certain time when it gets triggered. That relay controls
| the computer power line (if you want to stay outside the case) or the reset
| switch (if you want to build it into your computer).
|
| I don't do such things with 555s... I'm more a digital guy. There are many
| options to do that, and all a lot cheaper than those boards, if you have
| more time than money :)

Likewise - some 25 years of, amongst other things, designing hardware and
programming 8051 and DSP type processors in assembler...

The 555 came to mind because it has been around forever - and as someone once
said (Steve Ciarcia?) -
"My favourite programming language is solder"... - a dumb state machine
implemented in hardware beats a processor every time when it comes to
reliability - it's just a tad inflexible...

The next step above the 555 is a PIC... then you can steal power from the RS-232
line - and it's a small step from "PIC" to "PIG"...

Although this is getting a bit off topic on a language group...

;-) Hendrik
 
H J van Rooyen

| Alex Martelli wrote:
| >
| >
| >>
| >> | > *grin* - Yes of course - if the WDT was enabled - its something that
| >> | > I have not seen on PC's yet...
| >> |
| >> | They are available for PC's, as plug-in cards, at least for the ISA
| >> | bus in the old days, and almost certainly for the PCI bus today.
| >>
| >> That is cool, I was not aware of this - added to a long running server it will
| >> help to make the system more stable - a hardware solution to hard to find bugs
| >> in Software - (or even stuff like soft errors in hardware - speak to the
| >> Avionics boys about Neutrons) do you know who sells them and what they are
| >> called? -
| >>
| >
| > When you're talking about a bunch of (multiprocessing) machines on a
| > LAN, you can have a "watchdog machine" (or more than one, for
| > redundancy) periodically checking all others for signs of health -- and,
| > if needed, rebooting the sick machines via ssh (assuming the sickness is
| > in userland, of course -- to come back from a kernel panic _would_
| > require HW support)... so (in this setting) you _could_ do it in SW, and
| > save the $100+ per box that you'd have to spend at some shop such as
| > <http://www.pcwatchdog.com/> or the like...
| >
| >
| >
| Yea, there are other free solutions you might want to check out, I've
| been looking at ganglia and nagios. These require constant
| communication with a server, however they are customizable in that you
| can have the server take action on various events.
|
| Cheers!
|
| -c

Thanks - will have a look - Hendrik
 
Gerhard Fiedler

The next step above the 555 is a PIC... then you can steal power from the
RS-232 line - and its a small step from "PIC" to "PIG"...

I see... you obviously know what to do, if you want to :)

But I'm not sure such a device alone is of much help in a typical server. I
think it's probably just as common that only one service hangs. To make it
useful, the trigger process has to be carefully designed, so that it
actually has a chance of failing when you need it to fail. This probably
requires either code changes to the various services (so that they each
trigger their own watchdog) or some supervisor program that only triggers
the watchdog if it receives responses from all relevant services.
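
The supervisor variant could look roughly like the Python sketch below. It is an illustration only: `supervise_once`, the `checks` mapping, and the `kick` callable are hypothetical names, and how a real service is probed (a TCP request, a status file, a pipe) or how the watchdog is actually triggered is left to the caller.

```python
def supervise_once(checks, kick):
    """Trigger the watchdog only if every relevant service responds.

    `checks` maps a service name to a zero-argument callable that returns
    True when that service answered. A check that raises counts as a failure.
    `kick` is whatever triggers the watchdog (for example, toggling a serial
    control line). Returns the names of the services that failed; an empty
    list means the watchdog was kicked.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a crashing probe must never look like a healthy service
        if not ok:
            failed.append(name)
    if not failed:
        kick()
    return failed
```

Called periodically at an interval shorter than the watchdog timeout, this stops kicking as soon as any one service hangs, so the watchdog gets its chance to fire. The returned list can be logged to see which service caused a reset.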

Gerhard
 
H J van Rooyen

| On 2006-08-04 02:33:07, H J van Rooyen wrote:
|
| > The next step above the 555 is a PIC... then you can steal power from the
| > RS-232 line - and its a small step from "PIC" to "PIG"...
|
| I see... you obviously know what to do, if you want to :)
|
| But I'm not sure such a device alone is of much help in a typical server. I
| think it's probably just as common that only one service hangs. To make it
| useful, the trigger process has to be carefully designed, so that it
| actually has a chance of failing when you need it to fail. This probably
| requires either code changes to the various services (so that they each
| trigger their own watchdog) or some supervisor program that only triggers
| the watchdog if it receives responses from all relevant services.
|
| Gerhard

This is true - it's trivial to just kill the whole machine like this, but it's
kind of like using a sledgehammer to crack a nut. And as you so rightly point
out, if the process that tickles the watchdog to keep it happy is not (very)
tightly coupled to the thing you want to monitor, then it may not work at all,
especially if interrupts are involved.

In fact, something like a state machine that looks for alternate occurrences of
(at least) two things is required: the interrupt gives it a kick and sets a
flag, the application sees the flag, gives it the alternate kick and clears the
flag, and so on, with the internal tasks in the machine "passing the ball" in
this (or some other) way. That way you are (relatively) sure the thing is still
running.

But it needs careful design, or it will either kill the machine for no good
reason (when something like disk accesses slows the external (user) processes
down), or it will fail to fire if the kick is driven from a callback - the app
may be crazy, but the OS may still be doing callbacks and timing stuff
faithfully - you can't be too careful...
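
A rough Python model of the alternating-kick idea, as a sketch only. The class and method names are hypothetical, time is passed in explicitly as seconds, and a real implementation would live partly in an interrupt handler and partly in the application rather than in one object.

```python
class AlternatingWatchdog:
    """Watchdog that only counts kicks arriving in strict alternation.

    The interrupt side sets a flag; the application side clears it. A kick
    from the side whose turn it is not changes nothing, so an interrupt (or OS
    callback) that keeps firing cannot keep the watchdog happy on its own.
    """

    def __init__(self, timeout, start=0.0):
        self.timeout = timeout
        self.last_progress = start  # time of the last kick that was in turn
        self.flag = False           # True: application's turn, False: interrupt's turn

    def interrupt_kick(self, now):
        if not self.flag:
            self.flag = True
            self.last_progress = now

    def application_kick(self, now):
        if self.flag:
            self.flag = False
            self.last_progress = now

    def expired(self, now):
        """True once neither side has made an in-turn kick for `timeout` seconds."""
        return now - self.last_progress >= self.timeout
```

With a timeout of 5, the sequence interrupt at t=1, application at t=2, interrupt at t=3 keeps the watchdog alive, but further interrupt kicks at t=4, 6 and 7 with no application kick in between are ignored, and the watchdog reports expiry from t=8. The timeout still has to be generous enough that a slow disk access does not cause a false trip.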

- Hendrik
 
