Recommended number of threads? (in CPython)

Discussion in 'Python' started by mk, Oct 29, 2009.

  1. mk

    mk Guest

    Hello everyone,

    I wrote a run-of-the-mill program for concurrent execution of an ssh
    command over a large number of hosts. (Someone may ask why reinvent the
    wheel when there's pssh and shmux around -- I'm not happy with the
    working details and the lack of some options in either program.)

    The program has a working queue of threads so that no more than
    maxthreads threads are created and working at any particular time.

    But this raises the question: what is the recommended number of threads
    working concurrently? If it depends on the task, the task is: open an
    ssh connection, execute a command (then the main thread loops over the
    queue and, if the thread is finished, it closes the ssh connection and
    does .join() on the thread).

    I found that using more than several hundred threads causes weird
    exceptions to be thrown *sometimes* (rarely actually, but it happens
    from time to time), although that might depend on the modules used in
    the threads (I'm using paramiko, which is claimed to be thread safe).
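The bounded queue of threads described above can be sketched with the standard library's thread pool. This is only an illustration, not the poster's actual program: `run_on_hosts`, `fake_ssh`, and the default of 50 workers are hypothetical names and values chosen for the example, and `fake_ssh` stands in for the real paramiko call so the sketch runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_on_hosts(hosts, task, max_threads=50):
    """Run task(host) for every host, with at most max_threads running at once."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # Map each future back to its host so results can be labelled.
        futures = {pool.submit(task, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = ("ok", future.result())
            except Exception as exc:
                # One failing host should not stop the remaining ones.
                results[host] = ("error", exc)
    return results


def fake_ssh(host):
    """Hypothetical stand-in for: open ssh connection, execute command, close."""
    if host.startswith("bad"):
        raise ConnectionError("cannot reach " + host)
    return "uptime output from " + host


if __name__ == "__main__":
    for host, outcome in sorted(run_on_hosts(["web1", "web2", "bad1"], fake_ssh).items()):
        print(host, outcome)
```

The pool reuses a fixed set of worker threads instead of creating one thread per host, so the thread count never exceeds `max_threads` regardless of how many hosts are queued.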
     
    mk, Oct 29, 2009
    #1

  2. Falcolas

    Falcolas Guest

    On Oct 29, 9:56 am, mk <> wrote:
    > Hello everyone,
    >
    > I wrote run-of-the-mill program for concurrent execution of ssh command
    > over a large number of hosts. (someone may ask why reinvent the wheel
    > when there's pssh and shmux around -- I'm not happy with working details
    > and lack of some options in either program)
    >
    > The program has a working queue of threads so that no more than
    > maxthreads number are created and working at particular time.
    >
    > But this begs the question: what is the recommended number of threads
    > working concurrently? If it's dependent on task, the task is: open ssh
    > connection, execute command (then the main thread loops over the queue
    > and if the thread is finished, it closes ssh connection and does .join()
    > on the thread)
    >
    > I found that when using more than several hundred threads causes weird
    > exceptions to be thrown *sometimes* (rarely actually, but it happens
    > from time to time). Although that might be dependent on modules used in
    > threads (I'm using paramiko, which is claimed to be thread safe).


    Since you're creating OS threads when doing this, your issue is
    probably more related to your OS' implementation of threads than
    Python. That said, several hundred threads, regardless of them being
    blocked by the GIL, sounds like a recipe for trouble on most machines,
    but as usual YMMV.

    If you're running into problems with a large number of connections
    (not related to a socket limit), you might look into doing it
    asynchronously - loop over a list of connections and do non-blocking
    reads to see if your command has completed. I've done this
    successfully with pexpect, and didn't run into any issues with the
    underlying OS.
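The asynchronous approach suggested above might look roughly like the following with asyncio, which postdates this thread. It is a sketch under assumptions rather than the pexpect code the poster used: `run_one`, `run_all`, `ssh_argv`, and the limit of 100 are hypothetical, and it assumes the system `ssh` client is driven as a subprocess instead of paramiko.

```python
import asyncio


async def run_one(argv, limit):
    """Run one command without blocking the event loop; return (rc, stdout, stderr)."""
    async with limit:  # cap the number of simultaneously running processes
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        out, err = await proc.communicate()
        return (
            proc.returncode,
            out.decode(errors="replace"),
            err.decode(errors="replace"),
        )


async def run_all(hosts, make_argv, max_concurrent=100):
    """Run make_argv(host) for every host in a single thread."""
    limit = asyncio.Semaphore(max_concurrent)
    results = await asyncio.gather(*(run_one(make_argv(h), limit) for h in hosts))
    return dict(zip(hosts, results))


def ssh_argv(host, command="uptime"):
    """Hypothetical argv builder; assumes key-based login so ssh never prompts."""
    return ["ssh", "-o", "BatchMode=yes", host, command]


if __name__ == "__main__":
    outcome = asyncio.run(run_all(["host1", "host2"], ssh_argv))
    for host, (rc, out, err) in outcome.items():
        print(host, rc, out.strip() or err.strip())
```

Everything here runs in one OS thread, so the per-thread stack cost disappears; the practical limits become open file descriptors and the number of child processes.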

    Garrick
     
    Falcolas, Oct 29, 2009
    #2

  3. Neil Hodgson

    Neil Hodgson Guest

    mk:

    > I found that when using more than several hundred threads causes weird
    > exceptions to be thrown *sometimes* (rarely actually, but it happens
    > from time to time).


    If you are running in a 32-bit environment, it is common to run out
    of address space with many threads. Each thread allocates a stack and
    this allocation may be as large as 10 Megabytes on Linux. With a 4
    Gigabyte 32-bit address space this means that the maximum number of
    threads will be about 400 (4 Gigabytes / 10 Megabytes). In practice,
    the operating system will further subdivide the address space so only
    200 to 300 threads will be possible. On Windows, I think the normal
    stack allocation is 1 Megabyte.

    The allocation is only of address space, not memory, since memory can
    be mapped into this space when it is needed and many threads do not need
    very much stack.
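The arithmetic above, and one way to shrink the per-thread reservation, can be sketched as follows. `threading.stack_size()` is a real standard-library call; the helper names and the 256 KiB figure are assumptions chosen for illustration, and some platforms reject sizes that are too small or not a multiple of the page size.

```python
import threading

KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def max_threads_by_address_space(address_space_bytes, stack_bytes):
    """Upper bound on threads if stacks were the only users of address space."""
    return address_space_bytes // stack_bytes


def run_with_small_stacks(worker, count, stack_bytes=256 * KIB):
    """Start count threads whose stacks reserve stack_bytes of address space each."""
    # stack_size() applies to threads started afterwards and returns the old value.
    previous = threading.stack_size(stack_bytes)
    try:
        threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        threading.stack_size(previous)  # restore the earlier setting


if __name__ == "__main__":
    print(max_threads_by_address_space(4 * GIB, 10 * MIB))   # 409
    print(max_threads_by_address_space(4 * GIB, 256 * KIB))  # 16384
    seen = []
    run_with_small_stacks(seen.append, 20)
    print(len(seen))
```

A smaller stack leaves less room for deep recursion inside each thread, so the value is a trade-off rather than a free win.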

    Neil
     
    Neil Hodgson, Oct 29, 2009
    #3
  4. Paul Rubin

    Paul Rubin Guest

    Neil Hodgson <> writes:
    > If you are running on a 32-bit environment, it is common to run out
    > of address space with many threads. Each thread allocates a stack and
    > this allocation may be as large as 10 Megabytes on Linux.


    I'm sure it's smaller than that under most circumstances. I run
    python programs with hundreds of threads all the time, and they don't
    use gigabytes of memory.
     
    Paul Rubin, Oct 29, 2009
    #4
  5. Dave Angel

    Dave Angel Guest

    Paul Rubin wrote:
    > Neil Hodgson <> writes:
    >
    >> If you are running on a 32-bit environment, it is common to run out
    >> of address space with many threads. Each thread allocates a stack and
    >> this allocation may be as large as 10 Megabytes on Linux.
    >>

    >
    > I'm sure it's smaller than that under most circumstances. I run
    > python programs with hundreds of threads all the time, and they don't
    > use gigabytes of memory.
    >
    >

    As Neil pointed out further on, in the same message you quoted, address
    space is not the same as allocated memory. It's easy to run out of
    allocatable address space long before you run out of virtual memory, or
    swap space.

    Any time a buffer is needed that will need to be contiguous (such as a
    return stack), the address space for the max possible size must be
    reserved, but the actual virtual memory allocations (which is what you
    see when you're using the system utilities to display memory usage) are
    done incrementally, as needed.

    It's been several years, but I believe the two terms on Windows are
    "reserve" and "commit." Reserve is done in multiples of 64k, and commit
    in multiples of 4k.

    DaveA
     
    Dave Angel, Oct 29, 2009
    #5
  6. Aahz

    Aahz Guest

    In article <>,
    mk <> wrote:
    >
    >I wrote run-of-the-mill program for concurrent execution of ssh command
    >over a large number of hosts. (someone may ask why reinvent the wheel
    >when there's pssh and shmux around -- I'm not happy with working details
    >and lack of some options in either program)
    >
    >The program has a working queue of threads so that no more than
    >maxthreads number are created and working at particular time.
    >
    >But this begs the question: what is the recommended number of threads
    >working concurrently? If it's dependent on task, the task is: open ssh
    >connection, execute command (then the main thread loops over the queue
    >and if the thread is finished, it closes ssh connection and does .join()
    >on the thread)


    Given that your code is not just I/O-bound but wait-bound, I suggest
    following the suggestion to use asynchronous code -- then you could
    open a connection to every single machine simultaneously. Assuming your
    system setup can handle the load, that is.
    --
    Aahz () <*> http://www.pythoncraft.com/

    [on old computer technologies and programmers] "Fancy tail fins on a
    brand new '59 Cadillac didn't mean throwing out a whole generation of
    mechanics who started with model As." --Andrew Dalke
     
    Aahz, Nov 2, 2009
    #6