Recommended number of threads? (in CPython)

mk

Hello everyone,

I wrote a run-of-the-mill program for concurrent execution of an ssh command
over a large number of hosts. (Someone may ask why reinvent the wheel
when there's pssh and shmux around: I'm not happy with the working details
and the lack of some options in either program.)

The program has a work queue of threads so that no more than
maxthreads threads are created and working at any particular time.

But this raises the question: what is the recommended number of threads
working concurrently? If it depends on the task, the task is: open an ssh
connection and execute a command (then the main thread loops over the queue
and, if a thread has finished, closes its ssh connection and calls .join()
on the thread).
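For comparison, the bounded queue of threads described above maps onto the standard library's `concurrent.futures.ThreadPoolExecutor`, where `max_workers` plays the role of maxthreads. This is a minimal sketch, not mk's program: `run_remote` is a hypothetical stand-in for the paramiko connect/exec/close sequence, and the host names are made up.

```python
import concurrent.futures
import time

MAXTHREADS = 50  # hypothetical cap; plays the role of maxthreads


def run_remote(host, command):
    """Hypothetical stand-in for: open ssh connection, run command, close.

    A real version would use paramiko.SSHClient here; this stub only
    sleeps to simulate waiting on the network and returns a fake result.
    """
    time.sleep(0.01)
    return f"{host}: ran {command!r}"


def run_everywhere(hosts, command, maxthreads=MAXTHREADS):
    results = {}
    # The executor never runs more than max_workers threads at once and
    # joins all of them when the with-block exits.
    with concurrent.futures.ThreadPoolExecutor(max_workers=maxthreads) as pool:
        futures = {pool.submit(run_remote, h, command): h for h in hosts}
        # as_completed yields each future as soon as its thread finishes,
        # replacing a hand-written poll-and-join loop in the main thread.
        for fut in concurrent.futures.as_completed(futures):
            host = futures[fut]
            try:
                results[host] = fut.result()
            except Exception as exc:  # one bad host should not kill the run
                results[host] = f"{host}: error {exc!r}"
    return results


if __name__ == "__main__":
    hosts = [f"host{i:03d}" for i in range(200)]
    print(len(run_everywhere(hosts, "uptime")))  # → 200
```

Exceptions raised in a worker are re-raised by `fut.result()`, so they surface in the main thread per host instead of disappearing inside a thread.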

I found that using more than several hundred threads causes weird
exceptions to be thrown *sometimes* (rarely, actually, but it happens
from time to time). That might depend on the modules used in the
threads, though (I'm using paramiko, which is claimed to be thread-safe).
 
Falcolas

Since you're creating OS threads when doing this, your issue is
probably more related to your OS's implementation of threads than to
Python. That said, several hundred threads, whether or not they are
blocked by the GIL, sounds like a recipe for trouble on most machines,
but as usual YMMV.

If you're running into problems with a large number of connections
(not related to a socket limit), you might look into doing it
asynchronously: loop over a list of connections and do non-blocking
reads to see whether your command has completed. I've done this
successfully with pexpect and didn't run into any issues with the
underlying OS.
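The single-threaded polling approach described above can be sketched with the standard `selectors` module: start one child process per host and let the selector report which pipes have output, instead of dedicating a thread to each. This is a sketch under assumptions, not Garrick's pexpect code: it drives the system `ssh` client through `subprocess`, and the `make_argv` hook is a hypothetical parameter that lets the example run a local command instead. It is POSIX-only, since on Windows `selectors` cannot wait on pipes, and each host still costs a process and a file descriptor.

```python
import os
import selectors
import subprocess
import sys


def ssh_argv(host, command):
    # Assumes the system `ssh` client and key-based (non-interactive) login.
    return ["ssh", "-o", "BatchMode=yes", host, command]


def run_all(hosts, command, make_argv=ssh_argv):
    """Run `command` on every host at once; return {host: (exit_code, output)}."""
    sel = selectors.DefaultSelector()
    procs, output = {}, {}
    for host in hosts:
        proc = subprocess.Popen(make_argv(host, command),
                                stdin=subprocess.DEVNULL,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        procs[host] = proc
        output[host] = b""
        sel.register(proc.stdout, selectors.EVENT_READ, data=host)

    # One loop services every connection: select() reports which pipes have
    # data (or have hit EOF), so os.read() never blocks and no thread per host.
    while sel.get_map():
        for key, _events in sel.select():
            chunk = os.read(key.fd, 4096)
            if chunk:
                output[key.data] += chunk
            else:  # EOF: that host's command has finished
                sel.unregister(key.fileobj)
                key.fileobj.close()
    sel.close()
    # Reap the child processes and pair each exit status with its output.
    return {h: (procs[h].wait(), output[h]) for h in hosts}


if __name__ == "__main__":
    # Local stand-in for ssh so the sketch runs without remote hosts.
    def local_argv(host, command):
        return [sys.executable, "-c", f"print({host!r}, {command!r})"]

    print(run_all(["a", "b"], "uptime", make_argv=local_argv))
```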

Garrick
 
Neil Hodgson

mk:
I found that when using more than several hundred threads causes weird
exceptions to be thrown *sometimes* (rarely actually, but it happens
from time to time).

If you are running in a 32-bit environment, it is common to run out
of address space with many threads. Each thread allocates a stack, and
this allocation may be as large as 10 megabytes on Linux. With a 4-gigabyte
32-bit address space, this means that the maximum number of
threads is about 400. In practice, the operating system further
subdivides the address space (the kernel, program code, heap and shared
libraries all need room), so only 200 to 300 threads will be possible.
On Windows, I think the normal stack allocation is 1 megabyte.
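Neil's estimate is a division of address space by the per-thread stack reservation. A quick check in Python; the 3 GiB figure is an assumption (the common user/kernel split on 32-bit Linux), not a number from the thread:

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def max_threads(address_space, stack_reservation):
    """Upper bound on thread count if thread stacks were all that was mapped."""
    return address_space // stack_reservation


print(max_threads(4 * GiB, 10 * MiB))  # → 409, i.e. "about 400"

# Assumption: only about 3 GiB is usable by a 32-bit Linux process, and code,
# heap and shared libraries still need part of that.
print(max_threads(3 * GiB, 10 * MiB))  # → 307
```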

The allocation is only of address space, not memory, since memory can
be mapped into this space when it is needed, and many threads do not need
very much stack.
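Since it is the reservation, not the memory actually used, that exhausts a 32-bit address space, CPython lets you shrink it: `threading.stack_size()` sets the stack size for threads started afterwards and returns the previous setting. A minimal sketch; `start_workers` is a hypothetical helper and the 256 KiB value is an arbitrary assumption. Too small a stack can crash deeply recursive code or C extensions with large stack frames, so test before relying on it. Invalid sizes raise `ValueError`, and platforms that cannot change the size raise `RuntimeError`.

```python
import threading

STACK_BYTES = 256 * 1024  # assumption: enough for shallow worker code


def start_workers(target, count, stack_bytes=STACK_BYTES):
    """Start `count` threads running `target` with a reduced stack reservation."""
    old = threading.stack_size(stack_bytes)  # returns the previous setting
    try:
        threads = [threading.Thread(target=target) for _ in range(count)]
        # The size is applied when the OS thread is created, i.e. at start(),
        # so the threads must be started before the setting is restored.
        for t in threads:
            t.start()
    finally:
        threading.stack_size(old)  # restore for threads created later
    return threads


if __name__ == "__main__":
    done = threading.Event()
    workers = start_workers(done.wait, 100)  # each worker blocks on the event
    print(threading.active_count() >= 101)  # main thread plus 100 workers
    done.set()
    for t in workers:
        t.join()
```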

Neil
 
Paul Rubin

Neil Hodgson said:
If you are running on a 32-bit environment, it is common to run out
of address space with many threads. Each thread allocates a stack and
this allocation may be as large as 10 Megabytes on Linux.

I'm sure it's smaller than that under most circumstances. I run
Python programs with hundreds of threads all the time, and they don't
use gigabytes of memory.
 
Dave Angel

Paul said:
I'm sure it's smaller than that under most circumstances. I run
python programs with hundreds of threads all the time, and they don't
use gigabytes of memory.

As Neil pointed out further on in the same message you quoted, address
space is not the same as allocated memory. It's easy to run out of
allocatable address space long before you run out of virtual memory or
swap space.

Any time a buffer needs to be contiguous (such as a thread's stack),
address space for its maximum possible size must be reserved up front,
but the actual virtual memory allocations (which are what you see when
you use the system utilities to display memory usage) are done
incrementally, as needed.

It's been several years, but I believe the two terms on Windows are
"reserve" and "commit." Reserve is done in multiples of 64k, and commit
in multiples of 4k.
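The two granularities Dave mentions can be illustrated with a bit of arithmetic. This sketch covers only the rounding and does not call `VirtualAlloc`; it assumes the 64 KiB allocation granularity and 4 KiB page size of typical x86 Windows (a real program would query them with `GetSystemInfo`), and `stack_cost` is a hypothetical helper:

```python
RESERVE_GRANULARITY = 64 * 1024  # assumed allocation granularity ("reserve")
COMMIT_GRANULARITY = 4 * 1024    # assumed page size ("commit")


def round_up(n, granularity):
    """Round n up to the next multiple of granularity."""
    return -(-n // granularity) * granularity


def stack_cost(reserved_bytes, used_bytes):
    """Return (address space reserved, memory committed) for one thread stack."""
    return (round_up(reserved_bytes, RESERVE_GRANULARITY),
            round_up(used_bytes, COMMIT_GRANULARITY))


# A 1 MiB stack whose thread has only ever touched 10,000 bytes of it:
print(stack_cost(1024 * 1024, 10_000))  # → (1048576, 12288)
```

At those sizes, 300 such threads would reserve 300 MiB of address space while committing under 4 MB, which is consistent with both Paul's observation and Neil's warning.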

DaveA
 
Aahz

Given that your code is not just I/O-bound but wait-bound, I suggest
following the advice to use async code; then you could open a
connection to every single machine simultaneously. Assuming your system
setup can handle the load, that is.
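Aahz's suggestion can be sketched with the standard library's `asyncio`, driving the system `ssh` client as subprocesses from a single thread. Assumptions: an `ssh` binary with key-based login, and a hypothetical `max_open` semaphore standing in for "assuming your system setup can handle the load", since each connection still costs a process and file descriptors. The `make_argv` hook is likewise hypothetical, there only so the example can run locally.

```python
import asyncio
import sys


def ssh_argv(host, command):
    # Assumes the system `ssh` client and key-based (non-interactive) login.
    return ["ssh", "-o", "BatchMode=yes", host, command]


async def run_one(host, command, limit, make_argv):
    async with limit:  # caps how many connections are open at once
        proc = await asyncio.create_subprocess_exec(
            *make_argv(host, command),
            stdin=asyncio.subprocess.DEVNULL,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        out, _ = await proc.communicate()  # waits without blocking the loop
        return host, proc.returncode, out


async def run_all(hosts, command, max_open=1000, make_argv=ssh_argv):
    """Run `command` on every host concurrently; return {host: (exit_code, output)}."""
    limit = asyncio.Semaphore(max_open)
    results = await asyncio.gather(
        *(run_one(h, command, limit, make_argv) for h in hosts))
    return {host: (code, out) for host, code, out in results}


if __name__ == "__main__":
    # Local stand-in for ssh so the sketch runs without remote hosts.
    def local_argv(host, command):
        return [sys.executable, "-c", f"print({host!r}, {command!r})"]

    print(asyncio.run(run_all(["a", "b"], "uptime", make_argv=local_argv)))
```

Setting `max_open` at least as high as the number of hosts gives the "connection to every single machine simultaneously" behaviour; lowering it trades speed for load.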
--
Aahz <*> http://www.pythoncraft.com/

[on old computer technologies and programmers] "Fancy tail fins on a
brand new '59 Cadillac didn't mean throwing out a whole generation of
mechanics who started with model As." --Andrew Dalke