What's the cost of using hundreds of threads?

  • Thread starter Przemysław Różycki

Przemysław Różycki

Hello,

I have written some code that creates many threads for each connection
('main connection'). The purpose of this code is to balance the load
between several connections ('pipes'). The number of spawned threads
depends on how many pipes I create (= 2*n+2, where n is the number of
pipes).

For good results I'll presumably share the main connection's load between
10 pipes, so 22 threads will be spawned. Now if about 50 connections are
forwarded, the number of threads rises to over a thousand (or several
thousand if even more connections are established).

My questions are:
- What is the cost (in memory / CPU usage) of creating this many
threads?
- Is there any 'upper boundary' that limits the number of threads? (Is
it Python- or OS-related?)
- Is this a sign of 'clumsy programming', i.e. is creating so many
threads a bad habit? (I must say that it simplified the solution of my
problem very much.)

Limiting the number of threads is possible, but would affect the
independence of data flows. (OK, I admit that a trickier algorithm
could perhaps guarantee concurrency without spawning so many threads,
but this is the simplest solution to the problem :) ).
 
wes weston

PR,
I notice there's a resource module with a
getrusage(who) that looks like it would support
a test to get what you need.
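
A rough sketch of such a test, assuming a Unix-like system (the resource
module is not available on Windows). The function name
measure_thread_cost and the use of idle threads are illustrative
choices, not anything from the original code; real worker threads will
cost more than idle ones.

```python
import resource
import threading


def measure_thread_cost(n_threads: int) -> dict:
    """Start n_threads idle threads and report resource usage deltas."""
    stop = threading.Event()
    before = resource.getrusage(resource.RUSAGE_SELF)

    # Each thread just waits on the event, so only the per-thread
    # overhead is measured, not any real work.
    threads = [threading.Thread(target=stop.wait) for _ in range(n_threads)]
    for t in threads:
        t.start()

    after = resource.getrusage(resource.RUSAGE_SELF)

    # Release the idle threads and wait for them to finish.
    stop.set()
    for t in threads:
        t.join()

    return {
        "threads": n_threads,
        # ru_maxrss is the peak resident set size (kilobytes on Linux).
        "maxrss_delta": after.ru_maxrss - before.ru_maxrss,
        # CPU time (user + system) spent while starting the threads.
        "cpu_seconds": (after.ru_utime - before.ru_utime)
        + (after.ru_stime - before.ru_stime),
    }


if __name__ == "__main__":
    print(measure_thread_cost(100))
```

Running it with a few different thread counts should show how memory
and CPU time scale on a given machine.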
wes
 
Jarek Zgoda

Przemysław Różycki wrote:
- Is there any 'upper boundary' that limits the number of threads? (Is
it Python- or OS-related?)
- Is this a sign of 'clumsy programming', i.e. is creating so many
threads a bad habit? (I must say that it simplified the solution of my
problem very much.)

I've read somewhere (I can't recall where, though; was it MSDN?) that
Windows is not well suited to running more than 32 threads per process.
Most of the code I've seen doesn't spawn more than half that many.
 
Steve Holden

Jarek said:
I've read somewhere (I can't recall where, though; was it MSDN?) that
Windows is not well suited to running more than 32 threads per process.
Most of the code I've seen doesn't spawn more than half that many.

This is apocryphal. Do you have any hard evidence for this assertion?

Apache, for example, can easily spawn more threads under Windows, and
I've written code that uses 200 threads with excellent performance.
Things seem to slow down around the 2,000 mark for some reason I'm not
familiar with.

regards
Steve
 
Aahz

Przemysław Różycki said:
I have written some code that creates many threads for each connection
('main connection'). The purpose of this code is to balance the load
between several connections ('pipes'). The number of spawned threads
depends on how many pipes I create (= 2*n+2, where n is the number of
pipes).

For good results I'll presumably share the main connection's load between
10 pipes, so 22 threads will be spawned. Now if about 50 connections are
forwarded, the number of threads rises to over a thousand (or several
thousand if even more connections are established).

I'm a bit confused by your math. Fifty connections should be 102
threads, which is quite reasonable.

My questions are:
- What is the cost (in memory / CPU usage) of creating this many
threads?
- Is there any 'upper boundary' that limits the number of threads? (Is
it Python- or OS-related?)
- Is this a sign of 'clumsy programming', i.e. is creating so many
threads a bad habit? (I must say that it simplified the solution of my
problem very much.)

Limiting the number of threads is possible, but would affect the
independence of data flows. (OK, I admit that a trickier algorithm
could perhaps guarantee concurrency without spawning so many threads,
but this is the simplest solution to the problem :) ).

My experience with lots of threads dates back to Python 1.5.2, but I
rarely saw much improvement with more than a hundred threads, even for
heavily I/O-bound applications on a multi-CPU system. However, if your
focus is algorithmic complexity, you should be able to handle a couple of
thousand threads easily enough.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
 
Cameron Laird

.
.
.
This is apocryphal. Do you have any hard evidence for this assertion?

Apache, for example, can easily spawn more threads under Windows, and
I've written code that uses 200 threads with excellent performance.
Things seem to slow down around the 2,000 mark for some reason I'm not
familiar with.
.
.
.
I'll support Mr. Zgoda's apocrypha. The thing is, as so often
obtains, you're both right--early Windows flavors could dismember
themselves entertainingly when a process launched a few dozen
threads, but WinXP vastly improves that condition.

I assert that I could substantiate my claims with appropriate
references. I choose not to do so today.
 
Przemysław Różycki

I'm a bit confused by your math. Fifty connections should be 102
threads, which is quite reasonable.

My formula applies to one forwarded ('load-balanced') connection. Every
such connection creates n further connections (pipes) which share the
load. Every pipe requires two threads to be spawned. Every 'main
connection' spawns two other threads, so my formula 2*pipes+2 gives
the number of threads spawned per 'main connection'.

Now if conn_count connections are established, the thread count
equals:
conn_count * threads_per_main_connection = conn_count * (2*pipes+2)

For 50 connections and about 10 pipes it will give 1100 threads.
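
The same arithmetic as a small Python sketch (the function names are
made up for illustration and are not taken from my program):

```python
def threads_per_main_connection(pipes: int) -> int:
    # Two threads per pipe, plus two for the main connection itself.
    return 2 * pipes + 2


def total_threads(conn_count: int, pipes: int) -> int:
    # Every forwarded connection spawns its own full set of threads.
    return conn_count * threads_per_main_connection(pipes)


print(threads_per_main_connection(10))  # 22
print(total_threads(50, 10))            # 1100
```
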
My experience with lots of threads dates back to Python 1.5.2, but I
rarely saw much improvement with more than a hundred threads, even for
heavily I/O-bound applications on a multi-CPU system. However, if your
focus is algorithmic complexity, you should be able to handle a couple of
thousand threads easily enough.

I don't spawn them for computational reasons, but because it makes my
code much simpler. I use built-in TCP features to achieve load
balancing: every flow (directed through a pipe) has its own dedicated
threads, separate for download and upload. For every 'main connection'
these threads share send and receive buffers. If any of the pipes is
congested, the corresponding threads block in their send / recv calls
without affecting the independence of the data flows.

Using threads gives me VERY simple code. To achieve this with poll /
select would be much more difficult. And to guarantee concurrency and
maximal throughput for all of the pipes I would probably have to mirror
code from the Linux TCP stack (I mean window shifting, data
acknowledgement, retransmission queues). Or perhaps I exaggerate.
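
To illustrate why the threaded version stays simple, here is a minimal
sketch of one such per-direction thread. It is not the actual program:
pump and demo are hypothetical names, socket.socketpair() stands in for
real TCP connections, and the shared buffers are left out. A pipe would
run two of these, one per direction.

```python
import socket
import threading


def pump(src, dst, bufsize=4096):
    """Copy bytes from src to dst until src reaches end-of-stream."""
    while True:
        data = src.recv(bufsize)   # blocks while nothing is available
        if not data:
            break
        dst.sendall(data)          # blocks while the receiver is congested
    dst.shutdown(socket.SHUT_WR)   # pass the end-of-stream downstream


def demo():
    # Two local socket pairs play the roles of the two TCP connections.
    client, inbound = socket.socketpair()
    outbound, server = socket.socketpair()

    t = threading.Thread(target=pump, args=(inbound, outbound))
    t.start()

    client.sendall(b"hello")
    client.shutdown(socket.SHUT_WR)

    # Read everything that was forwarded until end-of-stream.
    received = b""
    while True:
        part = server.recv(4096)
        if not part:
            break
        received += part

    t.join()
    for s in (client, inbound, outbound, server):
        s.close()
    return received
```

All flow control is left to the blocking recv / sendall calls, which is
the property described above.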
 
Przemysław Różycki

Thanks for your comments on the WinXP threads implementation. You have
confirmed my conviction that I shouldn't use Windows.
Personally I use Linux with a 2.6.10 kernel, so hopefully I don't have to
share your grief. ;)
 
Nick Coghlan

Steve said:
Apache, for example, can easily spawn more threads under Windows, and
I've written code that uses 200 threads with excellent performance.
Things seem to slow down around the 2,000 mark for some reason I'm not
familiar with.

As far as I know, the default Windows thread stack size is 1 MB. Do the math :)
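
Spelled out as a rough sketch: each thread reserves its stack from the
process's address space, so the reservation grows linearly with the
thread count. The 2 GB figure is the default user-mode address space of
a 32-bit Windows process; the stack sizes tried below are illustrative
inputs, not measurements, and the function names are made up.

```python
MB = 1024 * 1024
USER_ADDRESS_SPACE = 2 * 1024 * MB  # default for a 32-bit Windows process


def stack_reservation(n_threads: int, stack_mb: int) -> int:
    """Address space (in bytes) reserved for thread stacks alone."""
    return n_threads * stack_mb * MB


def max_threads(stack_mb: int) -> int:
    """Upper bound on threads if stacks were the only use of address space."""
    return USER_ADDRESS_SPACE // (stack_mb * MB)


for stack_mb in (1, 2):
    print(
        f"{stack_mb} MB stacks: "
        f"{stack_reservation(2000, stack_mb) // MB} MB for 2000 threads, "
        f"at most {max_threads(stack_mb)} threads"
    )
```

Either way, a couple of thousand threads is about where stack
reservations alone use up the address space.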

On NT4, beyond a couple of hundred threads a *heck* of a lot of time ends up
being spent in the kernel doing context switches (and you can kiss even vaguely
deterministic response times good-bye).

Using a more recent version of Windows improves matters significantly.

Cheers,
Nick.
 
Cameron Laird

Thanks for your comments on the WinXP threads implementation. You have
confirmed my conviction that I shouldn't use Windows.
Personally I use Linux with a 2.6.10 kernel, so hopefully I don't have to
share your grief. ;)

? !? I'm confused, and apparently I'm confusing others.
The one message I posted in this thread--largely reinforced
by others--emphasizes only that WinXP is far *better* than
earlier Win* flavors in its thread management. While I not
only agree that Windows has disadvantages, but have stopped
buying it for our company, my reasons have absolutely nothing
to do with the details of implementation of WinXP.
 
Przemysław Różycki

> ? !? I'm confused, and apparently I'm confusing others.
> The one message I posted in this thread--largely reinforced
> by others--emphasizes only that WinXP is far *better* than
> earlier Win* flavors in its thread management. While I not
> only agree that Windows has disadvantages, but have stopped
> buying it for our company, my reasons have absolutely nothing
> to do with the details of implementation of WinXP.

:) OK, perhaps my answer wasn't that precise. I wrote my post only to
say that your discussion of Windows threading performance doesn't
concern me, because my program is written for a Linux environment. And
yes, I agree that my comment could sound a bit enigmatic.
 
Aahz

Przemysław Różycki said:
I don't spawn them for computational reasons, but because it makes my
code much simpler. I use built-in TCP features to achieve load
balancing: every flow (directed through a pipe) has its own dedicated
threads, separate for download and upload. For every 'main connection'
these threads share send and receive buffers. If any of the pipes is
congested, the corresponding threads block in their send / recv calls
without affecting the independence of the data flows.

Using threads gives me VERY simple code. To achieve this with poll /
select would be much more difficult. And to guarantee concurrency and
maximal throughput for all of the pipes I would probably have to mirror
code from the Linux TCP stack (I mean window shifting, data
acknowledgement, retransmission queues). Or perhaps I exaggerate.

Maybe it would help if you explained what these "pipes" do. Based on
what you've said so far, what I'd do in your situation is create one
thread per pipe and one thread per connection, then use Queue to move
data between threads.
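
A minimal sketch of that layout, for one direction of one connection
only. All names here are hypothetical, the "send" is simulated by
appending to a list, and reassembling chunks in order on the far side
is not shown. It needs n+1 threads per connection instead of 2*n+2.

```python
import queue
import threading

SENTINEL = None  # tells a pipe thread to shut down


def connection_reader(chunks, work_queue, n_pipes):
    """One thread per connection: hand incoming chunks to the pipe threads."""
    for chunk in chunks:
        work_queue.put(chunk)
    # One sentinel per pipe thread so that every worker exits.
    for _ in range(n_pipes):
        work_queue.put(SENTINEL)


def pipe_writer(pipe_id, work_queue, sent):
    """One thread per pipe: take the next available chunk and 'send' it."""
    while True:
        chunk = work_queue.get()
        if chunk is SENTINEL:
            break
        # A real pipe would call sock.sendall(chunk) here and might block;
        # the other pipe threads keep draining the queue meanwhile.
        sent.append((pipe_id, chunk))


def forward(chunks, n_pipes=3):
    # Bounded queue: the reader blocks when all pipes are busy.
    work_queue = queue.Queue(maxsize=16)
    sent = []  # list.append is thread-safe in CPython
    workers = [
        threading.Thread(target=pipe_writer, args=(i, work_queue, sent))
        for i in range(n_pipes)
    ]
    reader = threading.Thread(
        target=connection_reader, args=(chunks, work_queue, n_pipes)
    )
    for t in workers + [reader]:
        t.start()
    for t in workers + [reader]:
        t.join()
    return sent
```

Because every pipe thread pulls from the same queue, a congested pipe
simply takes fewer chunks while the others carry on.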
 
