How to force a thread to stop


Gerhard Fiedler

I'd be all for using processes but setting up communication between
processes would be difficult, wouldn't it? I mean, threads have shared
memory so making sure all threads know the current system state is an
easy thing.

I'm not sure about that. Sharing data between threads or processes is never
an easy thing, especially since you are saying you can't trust your module
coders to "play nice". If you can't trust them to terminate their threads
nicely when asked to, you also can't trust them to responsibly handle
shared memory. That's exactly the reason why I suggested processes.

With processes wouldn't I have to setup some type of server/client
design, where one process has the system state and then the other
processes constantly probe the host when they need the current system
state?

Anything else is bound to fail. You need to have safeguards around any
shared data. (A semaphore is a type of server/client thing...) At the very
least you need to prevent read access while it is updated; this is very
rarely an atomic action, so there are times when the system state is
inconsistent while it is being updated. (I don't think you can consider
many Python commands as atomic WRT threads, but I'm not sure about this.)
IMO, in the situation you are describing, it is an advantage that data is
not normally accessible -- this means that your module coders need to
access the data in the way you present it to them, and so you can control
that it is being accessed correctly.
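
For example, roughly like this (an untested sketch; the state dict and its
fields are made up):

import threading

# Hypothetical shared system state, guarded by a lock so that readers
# never see a half-finished update.
state_lock = threading.Lock()
system_state = {"builds_done": 0, "current_target": None}

def update_state(target):
    # Writers hold the lock for the whole (non-atomic) update.
    with state_lock:
        system_state["current_target"] = target
        system_state["builds_done"] += 1

def read_state():
    # Readers take the same lock and copy, so they get a consistent snapshot.
    with state_lock:
        return dict(system_state)

The point is that your module coders only ever call update_state() and
read_state(); they never touch the dict directly.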

Gerhard
 

bryanjugglercryptographer

Carl said:
Unfortunately this is due to the nature of the problem I am tasked with
solving. I have a large computing farm, and these os.system calls are often
things like ssh that do work on locations remote from the initial python
task. I suppose eventually I'll end up using a framework like twisted
but, as with many projects, I got thrown into this thing and threading
is where we ended up. So now there's the rush to make things work
before we can really look at a proper solution.

I don't get what threading and Twisted would do for
you. The problem you actually have is that you sometimes
need to terminate these other processes running other programs.
Use spawn, fork/exec* or maybe one of the popens.
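
For example, with subprocess you keep a handle on each child, so you can
kill it later if you need to (untested sketch; host and command are
placeholders):

import subprocess

# Launch the remote command without blocking; keep the Popen handle around.
proc = subprocess.Popen(["ssh", "somehost", "make", "all"])

# ... later, when the job has to be abandoned:
if proc.poll() is None:     # still running?
    proc.terminate()        # sends SIGTERM to the local ssh process
    proc.wait()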

Again, the problem I'm trying to solve doesn't work like this. I've
been working on a framework to be run across a large number of
distributed nodes (here's where you throw out the "duh, use a
distributed technology" in my face). The thing is, I'm only writing the
framework, the framework will work with modules, lots of them, which
will be written by other people. It's going to be impossible to get
people to write hundreds of modules that constantly check for status
messages. So, if I want my thread to "give itself up" I have to tell it
to give up.

Threads have little to do with what you say you need.

[...]
I feel like this is something we've established multiple times. Yes, we
want the thread to kill itself. Alright, now that we agree on that,
what is the best way to do that?

Wrong. In your examples, you want to kill other processes. You
can't run external programs such as ssh as Python threads. Ending
a Python thread has essentially nothing to do with it.
Right now people keep saying we must send the thread a message.

Not me. I'm saying work the problem you actually have.
 

Carl J. Van Arsdall

bryanjugglercryptographer wrote:

I don't get what threading and Twisted would do for
you. The problem you actually have is that you sometimes
need to terminate these other processes running other programs.
Use spawn, fork/exec* or maybe one of the popens.
I have a strong need for shared memory space in a large distributed
environment. How does spawn, fork/exec allow me to meet that need?
I'll look into it, but I was under the impression having shared memory
in this situation would be pretty hairy. For example, I could fork off
50 child processes, but then I would have to set up some kind of
communication mechanism between them where the server builds up a queue
of requests from child processes and then services them in a FIFO
fashion, does that sound about right?
Threads have little to do with what you say you need.

[...]
I feel like this is something we've established multiple times. Yes, we
want the thread to kill itself. Alright, now that we agree on that,
what is the best way to do that?

Wrong. In your examples, you want to kill other processes. You
can't run external programs such as ssh as Python threads. Ending
a Python thread has essentially nothing to do with it.
There's more going on than ssh here. Since I want to run multiple
processes to multiple devices at one time and still have mass shared
memory I need to use threads. There's a mass distributed system that
needs to be controlled, that's the problem I'm trying to solve. You can
think of each ssh as a lengthy IO process that each gets its own
device. I use the threads to allow me to do IO to multiple devices at
once, ssh just happens to be the IO. The combination of threads and ssh
allowed us to have a *primitive* distributed system (and it works too,
so I *can* run external programs in python threads). I didn't say it
was the best or the correct solution, but it works and it's what I was
handed when I was thrown into this project. I'm hoping in fifteen years
or when I get an army of monkeys to fix it, it will change. I'm not
worried about killing processes, that's easy, I could kill all the sshs
or whatever else I want without batting an eye. The threads that were
created in order to allow me to do all of this work simultaneously,
that's the issue. Granted, I'm heavily looking into a way of doing this
with processes, I still don't see how threads are the wrong choice with
my present situation.
Not me. I'm saying work the problem you actually have.
The problem I have is a large distributed system, that's the reality of
it. The short summary, I need to use and control 100+ machines in a
computing farm. They all need to share memory or to actively
communicate with each other via some other mechanism. Without giving
any other details, that's the problem I have to solve. Right now I'm
working with someone else's code. Without redesigning the system from
the ground up, I have to fix it.


--

Carl J. Van Arsdall
(e-mail address removed)
Build and Release
MontaVista Software
 

Paul Rubin

Carl J. Van Arsdall said:
The problem I have is a large distributed system, that's the reality
of it. The short summary, I need to use and control 100+ machines in
a computing farm. They all need to share memory or to actively
communicate with each other via some other mechanism. Without giving
any other details, that's the problem I have to solve.

Have you looked at POSH yet? http://poshmodule.sf.net

There's also an shm module that's older and maybe more reliable.
Or you might be able to just use mmap.
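
With mmap, for instance, an anonymous map created before fork() is visible
to the children (a rough sketch, Unix only, no locking):

import mmap, os, struct

shared = mmap.mmap(-1, 8)              # anonymous mapping, shared across fork
struct.pack_into("q", shared, 0, 0)

pid = os.fork()
if pid == 0:
    struct.pack_into("q", shared, 0, 42)   # child writes a status word
    os._exit(0)

os.waitpid(pid, 0)
print(struct.unpack_from("q", shared, 0)[0])   # parent reads 42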
 

Carl J. Van Arsdall

Paul said:
Have you looked at POSH yet? http://poshmodule.sf.net

There's also an shm module that's older and maybe more reliable.
Or you might be able to just use mmap.
I'm looking at POSH, shm, and stackless right now! :)

Thanks!

-carl

--

Carl J. Van Arsdall
(e-mail address removed)
Build and Release
MontaVista Software
 

bryanjugglercryptographer

Carl said:
I have a strong need for shared memory space in a large distributed
environment.

Distributed shared memory is a tough trick; only a few systems simulate
it.
How does spawn, fork/exec allow me to meet that need?

I have no idea why you think threads or fork/exec will give you
distributed shared memory.
I'll look into it, but I was under the impression having shared memory
in this situation would be pretty hairy. For example, I could fork off
50 child processes, but then I would have to set up some kind of
communication mechanism between them where the server builds up a queue
of requests from child processes and then services them in a FIFO
fashion, does that sound about right?

That much is easy. What it has to do with what you say you require
remains a mystery.

Threads have little to do with what you say you need.

[...]
I feel like this is something we've established multiple times. Yes, we
want the thread to kill itself. Alright, now that we agree on that,
what is the best way to do that?

Wrong. In your examples, you want to kill other processes. You
can't run external programs such as ssh as Python threads. Ending
a Python thread has essentially nothing to do with it.
There's more going on than ssh here. Since I want to run multiple
processes to multiple devices at one time and still have mass shared
memory I need to use threads.

No, you would need to use something that implements shared
memory across multiple devices. Threads are multiple lines of
execution in the same address space.
There's a mass distributed system that
needs to be controlled, that's the problem I'm trying to solve. You can
think of each ssh as a lengthy IO process that each gets its own
device. I use the threads to allow me to do IO to multiple devices at
once, ssh just happens to be the IO. The combination of threads and ssh
allowed us to have a *primitive* distributed system (and it works too,
so I *can* run external programs in python threads).

No, you showed launching it from a Python thread using os.system().
It's not running in the thread; it's running in a separate process.
I didn't say it
was the best or the correct solution, but it works and it's what I was
handed when I was thrown into this project. I'm hoping in fifteen years
or when I get an army of monkeys to fix it, it will change. I'm not
worried about killing processes, that's easy, I could kill all the sshs
or whatever else I want without batting an eye.

After launching it with os.system()? Can you show the code?
 

Carl J. Van Arsdall

Distributed shared memory is a tough trick; only a few systems simulate
it.
Yea, this I understand, maybe I chose some poor words to describe what I
wanted. I think this conversation is getting hairy and confusing so I'm
going to try and paint a better picture of what's going on. Maybe this
will help you understand exactly what's going on or at least what I'm
trying to do, because I feel like we're just running in circles. After
the detailed explanation, if threads are the obvious choice or not, it
will be much easier to pick apart what I need and probably also easier
for me to see your point... so here goes... (sorry it's long, but I keep
getting dinged for not being thorough enough).

So, I have a distributed build system. The system is tasked with
building a fairly complex set of packages that form a product. The
system needs to build these packages for 50 architectures using cross
compilation as well as support for 5 different hosts. Say there are
also different versions of this with tweaks for various configurations,
so in the end I might be trying to build 200+ different things at once.
I have a computing farm of 40 machines to do this for me. That's the
high-level scenario without getting too detailed. There are also
subsystems that help us manage the machines and things; I don't want to
get into that, as I'm going to try to focus on a scenario more abstract
than cluster/resource management stuff.

Alright, so manually running builds is going to be crazy and
unmanageable. So what the people who came before me did to manage this
scenario was to fork one thread per build. The threads invoke a series
of calls that look like

os.system(ssh <host> <command>)

or for more complex operations they would just spawn a process that ran
another python script

os.system(ssh <host> <script>)

The purpose behind all this was a couple of things:

* The thread constantly needed information about the state of the
system (for example we don't want to end up building the same
architecture twice)
* We wanted a centralized point of control for an entire build
* We needed to be able to use as many machines as possible from a
central location.

Python threads worked very well for this. os.system behaves a lot like
many other IO operations in python and the interpreter gives up the
GIL. Each thread could run remote operations and we didn't really have
any problems. There wasn't much of a need to fork; all it would have
done is increase the amount of memory used by the system.

Alright, so this scheme that was first put in place kind of worked.
There were some problems; for example, when someone did something like

os.system(ssh <host> <script>)

we had no good way of knowing what the hell happened in the script. Now
granted, they used shared files to do some of it over nfs mounts, but I
really hate that. It doesn't work well, it's clunky, and difficult to
manage. There were other problems too, but I just wanted to give a
sample.

Alright, so things aren't working, I come on board, I have a boss who
wants things done immediately. What we did was create what we called a
"Python Execution Framework". The purpose of the framework was to
mitigate a number of problems we had as well as take the burden of
distribution away from the programmers by providing a few layers of
abstraction (I'm only going to focus on the distributed part of the
framework, the rest is irrelevant to the discussion). The framework
executes and threads modules (or lists of modules). Since we had
limited time, we designed the framework with "distribution environment"
in mind but realized that if we shoot for the top right away it will
take years to get anything implemented.

Since we knew we eventually wanted a distributed system that could
execute framework modules entirely on remote machines, we carefully
designed and prepared the system for this. This involves some abstraction
and some simple mechanisms. However right now each ssh call will be
executed from a thread (as they will be done concurrently, just like
before). The threads still need to know about the state of the system,
but we'd also like to be able to issue some type of control that is more
event driven -- this can be sending the thread a terminate message or
sending the thread a message regarding the completion of a dependency
(we use conditions and events to do this synchronization right now). We
hoped that in the case of a catastrophic event or a user 'kill' signal
that the system could take control of all the threads (or at least,
ask them to go away), this is what started the conversation in the first
place. We don't want to use a polling loop for these threads to check
for messages; we wanted to use something event driven (I mistakenly used
the word interrupt in earlier posts, but I think it still illustrates my
point). It's not only important that the threads die, but that they die
with grace. There's lots of cleanup work that has to be done when
things exit or things end up in an indeterminable state.

So, I feel like I have a couple options,

1) try moving everything to a process oriented configuration - we think
this would be bad, from a resource standpoint as well as it would make
things more difficult to move to a fully distributed system later, when
I get my army of code monkeys.

2) Suck it up and go straight for the distributed system now - managers
don't like this, but maybe it's easier than I think it's going to be, I dunno

3) See if we can find some other way of getting the threads to terminate.

4) Kill it and clean it up by hand or helper scripts - we don't want to
do this either; it's one of the major things we're trying to get away from.

Alright, that's still a fairly high-level description. After all that,
if threads are still stupid then I think I'll much more easily see it
but I hope this starts to clear up the confusion. I don't really need a
distributed shared memory environment, but right now I do need shared
memory and it needs to be used fairly efficiently. For a fully
distributed environment I was going to see what various technologies
offered to pass data around, I figured that they must have some
mechanism for doing it or at least accessing memory from a central
location (we're set up to do this now with threads; we just need to expand
the concept to allow nodes to do it remotely). Right now, based on what
I have to do I think threads are the right choice until I can look at a
better implementation (I hear Twisted is good at what I ultimately want
to do, but I don't know a thing about it).

Alright, if you read all that, thanks, and thanks for your input.
Whether or not I've agreed with anything, a few colleagues and I
definitely discuss each idea as it's passed to us. For that, thanks to
the python list!

-carl


--

Carl J. Van Arsdall
(e-mail address removed)
Build and Release
MontaVista Software
 

Paul Rubin

Carl J. Van Arsdall said:
Alright, so manually running builds is going to be crazy and
unmanageable. So what the people who came before me did to manage
this scenario was to fork one thread per build. The threads invoke a
series of calls that look like

os.system(ssh <host> <command>)

Instead of using os.system, maybe you want to use one of the popens or
the subprocess module. For each ssh, you'd spawn off a process that
does the ssh and communicates back to the control process through a
set of file descriptors (Unix pipe endpoints or whatever). The
control process could use either threads or polling/select to talk to
the pipes and keep track of what the subprocesses were doing.

I don't think you need anything as complex as shared memory for this.
You're just writing a special purpose chat server.
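
Something like this, roughly (untested; the host names and the remote
command are placeholders):

import select, subprocess

hosts = ["build1", "build2", "build3"]
procs = {h: subprocess.Popen(["ssh", h, "run_build_step"],
                             stdout=subprocess.PIPE)
         for h in hosts}

# One control loop waits on all the pipes at once, instead of one thread each.
while procs:
    ready, _, _ = select.select([p.stdout for p in procs.values()], [], [])
    for host, p in list(procs.items()):
        if p.stdout in ready:
            line = p.stdout.readline()
            if not line:                       # EOF: the remote command finished
                p.wait()
                print(host, "exited with", p.returncode)
                del procs[host]
            else:
                print(host, line.decode().rstrip())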
 

bryanjugglercryptographer

Paul said:
Have you looked at POSH yet? http://poshmodule.sf.net

Paul, have you used POSH? Does it work well? Any major
gotchas?

I looked at the paper... well, not all 200+ pages, but I checked
how they handle a couple of parts that I thought were hard, and they
seem to have good ideas. I didn't find the SourceForge project
so promising. The status is alpha, the ToDo's are a little scary,
and the project looks stalled. Also it's *nix only.
 

Gerhard Fiedler

Alright, if you read all that, thanks, and thanks for your input. Whether
or not I've agreed with anything, a few colleagues and I definitely
discuss each idea as it's passed to us. For that, thanks to the python
list!

I think you should spend a few hours and read up on realtime OS features
and multitasking programming techniques. Get a bit away from the bottom
level, forget about the specific features of your OS and your language and
try to come up with a set of requirements and a structure that fits them.

Regarding communicating with a thread (or process, that's about the same,
only the techniques vary), for example -- there are not that many options.
Either the thread/process polls a message queue or it goes to sleep once it
has done whatever it needed to do until something comes in through a queue
or until a semaphore gets set. What is better suited for you depends on
your requirements and overall structure. Neither seems to be too clear yet.
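
The "goes to sleep until something comes in through a queue" variant looks
roughly like this in Python (an untested sketch; the work function and the
messages are invented):

import queue, threading

STOP = object()                  # sentinel meaning "please terminate"
inbox = queue.Queue()

def do_build_step(msg):
    print("working on", msg)     # stands in for the real work

def worker():
    while True:
        msg = inbox.get()        # sleeps here until a message arrives
        if msg is STOP:
            break                # clean, cooperative shutdown
        do_build_step(msg)

t = threading.Thread(target=worker)
t.start()
inbox.put("build arm-linux")
inbox.put(STOP)                  # ask the thread to go away
t.join()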

If you have threads that take too long and need to be killed, then I'd say
fix the code that runs there...

Gerhard
 

Paul Rubin

Paul, have you used POSH? Does it work well? Any major gotchas?

I haven't used it. I've been wanting to try. I've heard it works ok
in Linux but I've heard of problems with it under Solaris.

Now that I understand what the OP is trying to do, I think POSH is
overkill, and just using pipes or sockets is fine. If he really wants
to use shared memory, hmmm, there used to be an shm module at

http://mambo.peabody.jhu.edu/omr/omi/source/shm_source/shm.html

but that site now hangs (and it's not on archive.org), and Python's
built-in mmap module doesn't support any type of locks.

I downloaded the above shm module quite a while ago, so if I can find
it I might upload it to my own site. It was a straightforward
interface to the Sys V shm calls (also *nix-only, I guess). I guess
he also could use mmap with no locks, but with separate memory regions
for reading and writing in each subprocess, using polling loops. I
sort of remember Apache's mod_mmap doing something like that if it has
to.

To really go off the deep end, there are a few different MPI libraries
with Python interfaces.
I looked at the paper... well, not all 200+ pages, but I checked
how they handle a couple of parts that I thought were hard, and they
seem to have good ideas.

200 pages?? The paper I read was fairly short, and I looked at the
code (not too carefully) and it seemed fairly straightforward. Maybe
I missed something, or am not remembering; it's been a while.
I didn't find the SourceForge project
so promising. The status is alpha, the ToDo's are a little scary,
and the project looks stalled. Also it's *nix only.

Yeah, using it for anything serious would involve being willing to fix
problems with it as they came up. But I think the delicate parts of
it are parts that aren't that important, so I'd just avoid using
those.
 

Gerhard Fiedler

Also, threading's condition and event constructs are used a lot
(I talk about it somewhere in that thing I wrote). They are easy to use
and nice and ready for me; with a server, wouldn't I have to have things
poll/wait for messages?

How would a thread receive a message, unless it polls some kind of queue or
waits for a message from a queue or at a semaphore? You can't just "push" a
message into a thread; the thread has to "pick it up", one way or another.

Gerhard
 

Carl J. Van Arsdall

Gerhard said:
How would a thread receive a message, unless it polls some kind of queue or
waits for a message from a queue or at a semaphore? You can't just "push" a
message into a thread; the thread has to "pick it up", one way or another.

Gerhard
Well, I guess I'm thinking of an event driven mechanism, kinda like
setting up signal handlers. I don't necessarily know how it works under
the hood, but I don't poll for a signal. I set up a handler; when the
signal comes, if it comes, the handler gets thrown into action. That's
what I'd be interested in doing with threads.

-c



--

Carl J. Van Arsdall
(e-mail address removed)
Build and Release
MontaVista Software
 

Gerhard Fiedler

Well, I guess I'm thinking of an event driven mechanism, kinda like
setting up signal handlers. I don't necessarily know how it works under
the hood, but I don't poll for a signal. I set up a handler; when the
signal comes, if it comes, the handler gets thrown into action. That's
what I'd be interested in doing with threads.

What you call an event handler is a routine that gets called from a message
queue polling routine. You said a few times that you don't want that.

The queue polling routine runs in the context of the thread. If any of the
actions in that thread takes too long, it will prevent the queue polling
routine from running, and therefore the event won't get handled. This is
exactly the scenario that you seem to want to avoid. Event handlers are not
anything multitask or multithread; they are simple polling mechanisms with
an event queue. It just seems that they act preemptively, when you can click
on one button and another button becomes disabled :)

There are of course also semaphores. But they also have to either get
polled like GUI events, or the thread just goes to sleep until the
semaphore wakes it up. You need to understand this basic limitation: a
processor can only execute statements. Either it is doing other things,
in which case it must, by programming, check the queue -- this is polling. Or it can
suspend itself (the thread or process) and tell the OS (or the thread
handling mechanism) to wake it up when a message arrives in a queue or a
semaphore gets active.
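
In Python terms, that "suspend until woken" variant is more or less a
threading.Event the worker sleeps on (an untested sketch; the work function
is invented):

import threading

stop_requested = threading.Event()

def do_one_step():
    pass                             # stands in for one bounded chunk of work

def worker():
    while not stop_requested.is_set():
        do_one_step()
        # Sleep up to a second; wait() returns early the moment the event
        # is set, so the thread reacts quickly without busy-polling.
        stop_requested.wait(1.0)

t = threading.Thread(target=worker)
t.start()
stop_requested.set()                 # ask the thread to finish
t.join()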

You need to look a bit under the hood, so to speak... That's why I said in
the other message that I think it would do you some good to read up a bit
on multitasking OS programming techniques in general. There are not that
many, in principle, but it helps to understand the basics.

Gerhard
 

bryanjugglercryptographer

Gerhard said:
What you call an event handler is a routine that gets called from a message
queue polling routine. You said a few times that you don't want that.

I think he's referring to Unix signal handlers. These really are called
asynchronously. When the signal comes in, the system pushes some
registers on the stack, calls the signal handler, and when the signal
handler returns it pops the registers off the stack and resumes
execution where it left off, more or less. If the signal comes while the
process is in certain system calls, the call returns with a value or
errno setting that indicates it was interrupted by a signal.
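
In Python that looks roughly like this (Unix only; note that Python runs
the handler in the main thread, which is one reason it doesn't help with
stopping other threads). A self-contained sketch using SIGALRM:

import signal, time

def on_alarm(signum, frame):
    # Called asynchronously, between bytecodes, when the signal arrives.
    print("got signal", signum, "-- cleaning up and exiting")
    raise SystemExit

signal.signal(signal.SIGALRM, on_alarm)   # register the handler
signal.alarm(2)                           # ask the kernel for SIGALRM in 2s

while True:
    time.sleep(10)    # the sleep is cut short when the signal arrives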

Unix signals are an awkward low-level relic. They used to be the only
way to do non-blocking but non-polling I/O, but current systems offer
much better ways. Today the sensible things to do upon receiving a
signal are to ignore it or terminate the process. My opinion, obviously.
 

H J van Rooyen

8<----------------------------------------------------------------


| point). Its not only important that the threads die, but that they die
| with grace. There's lots of cleanup work that has to be done when
| things exit or things end up in an indeterminable state.
|
| So, I feel like I have a couple options,
|
| 1) try moving everything to a process oriented configuration - we think
| this would be bad, from a resource standpoint as well as it would make
| things more difficult to move to a fully distributed system later, when
| I get my army of code monkeys.
|
| 2) Suck it up and go straight for the distributed system now - managers
| don't like this, but maybe its easier than I think its going to be, I dunno
|
| 3) See if we can find some other way of getting the threads to terminate.
|
| 4) Kill it and clean it up by hand or helper scripts - we don't want to
| do this either, its one of the major things we're trying to get away from.

8<-----------------------------------------------------------------------------

This may be a stupid suggestion - If I understand what you are doing, it's
essentially running a bunch of compilers with different options on various
machines around the place - so there is a fifth option - namely to do nothing -
let them finish and just throw the output away - i.e. just automate the
cleanup...

- Hendrik
 

Dennis Lee Bieber

scenario was to fork one thread per build. The threads invoke a series
of calls that look like

os.system(ssh <host> <command>)

or for more complex operations they would just spawn a process that ran
another python script

os.system(ssh <host> <script>)

Ugh... Seems to me it would be better to find some Python library
for SSH, something similar to telnetlib, rather than doing an
os.system() per command line. EACH of those os.system() calls probably
causes a full fork() operation on Linux/UNIX, and the equivalent on
Windows (along with loading a command shell interpreter to handle the
actual statement).
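
One third-party candidate (my assumption that it fits here; it's not in the
standard library) is paramiko, which keeps one SSH connection open per host
instead of forking a new client for every command. A rough sketch:

import paramiko    # third-party SSH library

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("somehost", username="builder")        # placeholder host/user

# Run the remote command over the existing connection -- no local fork needed.
stdin, stdout, stderr = client.exec_command("make all")
print(stdout.read().decode())
print("exit status:", stdout.channel.recv_exit_status())
client.close()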
* The thread constantly needed information about the state of the
system (for example we don't want to end up building the same
architecture twice)

That would seem to specify the use of some central build database of
architecture&build commands needed...
* We wanted a centralized point of control for an entire build
Same...

* We needed to be able to use as many machines as possible from a
central location.
I presume you mean "concurrently" -- otherwise a complex yet single
makefile could be pressed into use...
Python threads worked very well for this. os.system behaves a lot like
many other IO operations in python and the interpreter gives up the

Given that every os.system(), as mentioned, is creating a whole
separate process, and waiting for that process to complete -- so, of
course, the GIL is released... Even a time.sleep(0) will trigger a GIL
release/thread swap.
any problems. There wasn't much of a need to fork; all it would have
done is increase the amount of memory used by the system.
NO... You ARE doing the equivalent of a fork with each os.system(),
with the original thread doing the equivalent of a waitpid() on the
child process.

fork() does not duplicate loaded executable code. It basically
creates a new process environment and stack. Only when the child does
something that would change the code pages in memory is new memory
allocated -- and that would occur as soon as the child loads a command
shell.
place. We don't want to use a polling loop for these threads to check
for messages, we wanted to use something event driven (I mistakenly used
the word interrupt in earlier posts, but I think it still illustrates my

Do you know how event driven frameworks operate?

Behind the scenes they use a polling loop. Even twisted has an event
loop somewhere in its core. On the archaic Amiga, the windowing system
required the application programmer to even code the event loop dispatch
logic -- practically all modern GUI systems have a standard event loop
and one registers functions to be called by the event loop. I've coded
on systems where even a <ctrl-c> requires the program to perform some
sort of I/O operation to be recognized.

After registering the functions to event(name)s, you start the
"mainloop". The mainloop essentially polls, fetching each event from the
OS, determining the type of event, and scanning the list of
event/functions, calling the function (one at a time) if the event
matches.

Event handlers have to be fast and short, because while an event is
being handled, other events are being queued and held.
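
Stripped to the bone, such a mainloop is nothing more exotic than this toy
sketch (not any particular toolkit's API):

import queue

handlers = {}                 # event name -> registered callback
events = queue.Queue()        # the OS/toolkit would be feeding this

def register(name, func):
    handlers[name] = func

def mainloop():
    while True:
        name, data = events.get()      # fetch the next event
        if name == "quit":
            break
        handlers.get(name, lambda d: None)(data)   # dispatch, one at a time

register("click", lambda d: print("clicked", d))
events.put(("click", "button-1"))
events.put(("quit", None))
mainloop()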
1) try moving everything to a process oriented configuration - we think
this would be bad, from a resource standpoint as well as it would make
things more difficult to move to a fully distributed system later, when
I get my army of code monkeys.
Perhaps... perhaps not... If each process can be dispatched to run
on remote computers based upon availability...
2) Suck it up and go straight for the distributed system now - managers
don't like this, but maybe its easier than I think its going to be, I dunno
I don't think you've managed to lay out the specifications well
enough to start on this... fully distributed will require, in my mind,
defining RPC/CORBA type protocols...

--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 

Dennis Lee Bieber

Well, I guess I'm thinking of an event driven mechanism, kinda like
setting up signal handlers. I don't necessarily know how it works under
the hood, but I don't poll for a signal. I set up a handler; when the
signal comes, if it comes, the handler gets thrown into action. That's
what I'd be interested in doing with threads.
Well, first off, something in the run-time IS doing the equivalent
of polling. When IT detects the condition of a registered event has
taken place, it calls the registered handler as a normal subroutine.
Now, that polling may seem transparent if it is tied to, say, the OS I/O
operations. This means that, if the process never performs any I/O, the
"polling" of the events never happens. Basically, as part of the I/O
operation, the OS checks for whatever data signals an event (some bit
set in the process table, etc.) has happened and actively calls the
registered handler. But that handler runs as a subroutine and returns.

How would this handler terminate a thread? It doesn't run /as/ the
thread, it runs in the context of whatever invoked the I/O, and a return
is made to that context after the handler finishes.

You are back to the original problem -- such an event handler, when
called by the OS, could do nothing more than set some data item that is
part of the thread, and return... The thread is STILL responsible for
periodically testing the data item and exiting if it is set.

If your thread is waiting for one of those many os.system() calls to
return, you need to find some way to kill the child process (which may
be something running on the remote end, since you emphasize using SSH).
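
In code, that comes down to something like the following (an untested
sketch; the host and command are placeholders): the flag is set from
outside, and the thread itself notices it and kills its own child.

import subprocess, threading

stop_requested = threading.Event()

def run_remote(host, command):
    proc = subprocess.Popen(["ssh", host, command])
    while proc.poll() is None:            # child still running
        if stop_requested.wait(1.0):      # did someone ask us to stop?
            proc.terminate()              # kill the local ssh client
            proc.wait()
            break
    # graceful cleanup goes here, for both the normal and the killed case

t = threading.Thread(target=run_remote, args=("somehost", "sleep 600"))
t.start()
stop_requested.set()                      # e.g. the user hit 'kill'
t.join()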

--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 

Paul Rubin

Dennis Lee Bieber said:
Ugh... Seems to me it would be better to find some Python library
for SSH, something similar to telnetlib, rather than doing an
os.system() per command line. EACH of those os.system() calls probably
causes a full fork() operation on Linux/UNIX, and the equivalent on
Windows (along with loading a command shell interpreter to handle the
actual statement).

I think Carl is using Linux, so the awful overhead of process creation
in Windows doesn't apply. Forking in Linux isn't that big a deal.
os.system() usually forks a shell, and the shell forks the actual
command, but even two forks per ssh is no big deal. The Apache web
server usually runs with a few hundred processes, etc. Carl, just how
many of these ssh's do you need active at once? If it's a few hundred
or less, I just wouldn't worry about these optimizations you're asking
about.
 

bryanjugglercryptographer

Carl said:
Yea, this I understand, maybe I chose some poor words to describe what I
wanted.

Ya' think? Looks like you have no particular need for shared
memory, in your small distributed system.
I think this conversation is getting hairy and confusing so I'm
going to try and paint a better picture of what's going on. Maybe this
will help you understand exactly what's going on or at least what I'm
trying to do, because I feel like we're just running in circles.
[...]

So step out of the circles already. You don't have a Python thread
problem. You don't have a process overhead problem.

[...]
So, I have a distributed build system. [...]

Not a trivial problem, but let's not pretend we're pushing the
state of the art here.

Looks like the system you inherited already does some things
smartly: you have ssh set up so that a controller machine can
launch various build steps on a few dozen worker machines.

[...]
The threads invoke a series
of calls that look like

os.system(ssh <host> <command>)

or for more complex operations they would just spawn a process that ran
another python script

os.system(ssh <host> <script>) [...]
Alright, so this scheme that was first put in place kind of worked.
There were some problems, for example when someone did something like
os.system(ssh <host> <script>) we had no good way of knowing what the
hell happened in the script.

Yeah, that's one thing we've been telling you. The os.system()
function doesn't give you enough information nor enough control.
Use one of the alternatives we've suggested -- probably the
subprocess.Popen class.
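
That is, something along these lines instead of a bare os.system()
(untested sketch; the host and script path are placeholders):

import subprocess

proc = subprocess.Popen(["ssh", "somehost", "python", "/path/to/script.py"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()       # everything the remote script printed
if proc.returncode != 0:
    print("script failed, rc =", proc.returncode)
    print(err.decode())             # and here is what actually happened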

[...]
So, I feel like I have a couple options,

1) try moving everything to a process oriented configuration - we think
this would be bad, from a resource standpoint as well as it would make
things more difficult to move to a fully distributed system later, when
I get my army of code monkeys.

2) Suck it up and go straight for the distributed system now - managers
don't like this, but maybe its easier than I think its going to be, I dunno

3) See if we can find some other way of getting the threads to terminate.

4) Kill it and clean it up by hand or helper scripts - we don't want to
do this either, its one of the major things we're trying to get away from.

The more you explain, the sillier that feeling looks -- that those
are your options. Focus on the problems you actually have. Track
what build steps worked as expected; log what useful information
you have about the ones that did not.

That "resource standpoint" thing doesn't really make sense. Those
os.system() calls launch *at least* one more process. Some
implementations will launch a process to run a shell, and the
shell will launch another process to run the named command. Even
so, efficiency on the controller machine is not a problem given
the scale you have described.
 
