Thread Pool versus Dedicated Threads

Discussion in 'C++' started by 一首诗, Aug 14, 2008.

  1. 一首诗

    一首诗 Guest

    Hi all,

    I recently got a new coworker, and we disagree about a design question.

    The last company he worked for used a particular network programming
    model. They split the business logic into modules and gave each
    module a dedicated thread. Modules exchanged information through an
    in-memory message queue.

    In my opinion, such a model means very complicated asynchronous
    interaction between modules. A simple function call between modules
    would require a timer to avoid waiting forever for an answer.
    And if a module were blocked by I/O (such as a db query), the other
    modules that depend on it would have to wait for it.

    For example, if module A wants to query the db, it would

    1. save its state in a list
    2. send a message to the db-adapter module (a thread dedicated to db
    operations)
    3. start a timer
    4. if the response message arrives in time, retrieve the state from
    the list and carry on
    5. if the timer fires, log an error message, cancel the operation,
    and send an error notification to the user
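
    The five steps above can be sketched roughly as follows. This is only
    an illustration under assumptions, not the coworker's actual code: the
    names MessageQueue, ModuleA, Request, Response and demo are made up
    here, and for brevity module A blocks while it waits, whereas in the
    model described it would return to its own message loop between
    steps 3 and 4.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Minimal in-memory message queue: the only state shared between module threads.
template <typename T>
class MessageQueue {
public:
    void push(T msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(msg));
        }
        ready_.notify_one();
    }

    // Waits up to `timeout` for a message; returns false if the timer fired first.
    bool pop_for(T& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!ready_.wait_for(lock, timeout, [this] { return !items_.empty(); }))
            return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
};

struct Request  { int id; std::string query; };
struct Response { int id; std::string rows; };

// Hypothetical module A (blocking simplification of the five steps).
class ModuleA {
public:
    ModuleA(MessageQueue<Request>& to_db, MessageQueue<Response>& from_db)
        : to_db_(to_db), from_db_(from_db) {}

    std::string query(const std::string& sql, std::chrono::milliseconds timeout) {
        const int id = next_id_++;
        pending_[id] = sql;                                     // 1. save state
        to_db_.push(Request{id, sql});                          // 2. send message
        Response reply;
        const bool in_time = from_db_.pop_for(reply, timeout);  // 3. start timer
        const std::string saved = pending_[id];                 // retrieve state
        pending_.erase(id);
        if (in_time && reply.id == id)
            return reply.rows;                                  // 4. go on
        return "error: timed out on " + saved;                  // 5. timer fired
    }

private:
    MessageQueue<Request>& to_db_;
    MessageQueue<Response>& from_db_;
    std::map<int, std::string> pending_;
    int next_id_ = 1;
};

// Runs one query against a db-adapter thread that takes `db_delay` to answer.
std::string demo(std::chrono::milliseconds db_delay) {
    MessageQueue<Request> to_db;
    MessageQueue<Response> from_db;
    std::thread db_adapter([&to_db, &from_db, db_delay] {
        Request req;
        if (!to_db.pop_for(req, std::chrono::seconds(2))) return;
        std::this_thread::sleep_for(db_delay);  // stands in for the real db call
        from_db.push(Response{req.id, "rows for " + req.query});
    });
    ModuleA a(to_db, from_db);
    std::string result = a.query("SELECT 1", std::chrono::milliseconds(300));
    db_adapter.join();
    return result;
}
```

    Calling demo with a short delay returns the rows; with a delay longer
    than the 300 ms timeout it returns the error string. Note that a late
    reply stays in the queue in this sketch, so a real module would also
    have to discard responses whose id is no longer pending.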

    My new coworker has written 300,000 lines of code in this model and
    claims it is the simplest way to write a network application.
    He says we could implement a message queue in half a day and that
    messages would make the interfaces much clearer.

    I think it would be easier if modules interacted with each other
    through function calls on top of a thread/process pool model, in which
    no thread/process has a dedicated job; each one handles whatever the
    master thread gives it.

    But as I don't have much experience in this area, I am not quite
    sure.

    What do you think? Are there any successful projects that could
    prove which model is **right**?
    一首诗, Aug 14, 2008
    #1

  2. Ian Collins

    Ian Collins Guest

    一首诗 wrote:
    > Hi all,
    >

    <snip>

    While interesting, there isn't really a C++ question in there. You
    would get more insight on comp.programming.threads.

    --
    Ian Collins.
    Ian Collins, Aug 14, 2008
    #2

  3. James Kanze

    James Kanze Guest

    On Aug 14, 8:20 am, 一首诗 <> wrote:
    > Recently I had a new coworker. There is some dispute between us.


    > The last company he worked for has a special networking
    > programming model. They split the business logic into
    > different modules, and have a dedicated thread for the each
    > module. Modules exchanged info through a in-memory message
    > queue.


    > In my opinion, such a model means very complicated
    > asynchronous interaction between module.


    If there's common data, there's always a more or less
    complicated asynchronous interaction between modules. The
    dedicated thread model normally reduces the "common data" to
    just the message queue, which makes things significantly
    simpler.

    > A simple function call between modules would require a timer
    > to avoid waiting for answer forever. And if a module was
    > blocked by IO (such as db query), other modules depends on
    > would have to wait for it.


    Yup. That's the downside. The single, dedicated thread is (or
    can be) a bottleneck. Of course, such bottlenecks can occur
    anyway; if you're manipulating a shared resource, for example,
    which needs locking.

    > For example, if module A want to query db, it would


    > 1. save states in a list
    > 2 .sending a message to db-adapter module (a thread dedicated for db
    > operation)
    > 3. start a timer
    > 4. if response message arrived on time, retrieve states from the
    > list, and go on
    > 5. if timer fires, log an error message and cancel the operation,
    > send an error notify to user...


    I'd put the time-out in the DB adapter module. Other than this:
    what's the difference between putting the request in a single
    block and posting it to the message queue, and passing the
    information as arguments to a function?

    > My new coworker had written 300,000 lines of code in this
    > model and claimed this is the most simple way to write a
    > network application. He said we could implement a message
    > queue in half-a day and message would make interface much more
    > clear.


    It's typically easier to get the code right using the message
    queue, but it's not a silver bullet. You can still end up with
    deadlocks. But you're much less likely to have problems due to
    two threads accessing the same data without sufficient
    synchronization.

    > I think if module interact with each other through function
    > calls and a thread/process pool model would be more easier, in
    > which each thread/ process has no dedicated job but handle
    > whatever the master thread give it.


    A "thread/process pool model" doesn't mean anything. I'm not
    sure what real alternative you're suggesting. Most places I've
    worked at use a thread per client connection; on receiving a
    request, the thread either grabs whatever locks it needs and
    does the work, or forwards it to the dedicated thread (which
    then doesn't need any locks, because it is the only thread which
    accesses the information). Both models work. Which one is
    better depends on the application.
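
    To make the two options concrete, here is a rough sketch of both, with
    a plain integer standing in for the shared information. The names
    LockedTotal, OwnedTotal and hammer are invented for the illustration;
    nothing here comes from a real code base.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Option 1: each connection thread grabs the lock and does the work itself.
class LockedTotal {
public:
    void add(int n) {
        std::lock_guard<std::mutex> lock(mutex_);
        total_ += n;
    }
    int finish() {
        std::lock_guard<std::mutex> lock(mutex_);
        return total_;
    }

private:
    std::mutex mutex_;
    int total_ = 0;
};

// Option 2: connection threads forward the work to one dedicated thread.
// Only the queue is locked; total_ is touched by the owner thread alone.
class OwnedTotal {
public:
    OwnedTotal() : owner_([this] { run(); }) {}
    ~OwnedTotal() { finish(); }

    void add(int n) { post([this, n] { total_ += n; }); }

    // Posts an empty job as a shutdown marker, waits for the owner thread to
    // drain the queue, then reads the result (safe once the thread is joined).
    int finish() {
        if (owner_.joinable()) {
            post(std::function<void()>());
            owner_.join();
        }
        return total_;
    }

private:
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        ready_.notify_one();
    }

    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return !jobs_.empty(); });
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            if (!job) return;  // empty job is the shutdown marker
            job();             // no lock held: only this thread sees total_
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> jobs_;
    int total_ = 0;
    std::thread owner_;  // declared last so the members above exist before it starts
};

// Simulates four connection threads, each issuing 1000 requests.
template <typename Total>
int hammer(Total& total) {
    std::vector<std::thread> connections;
    for (int t = 0; t < 4; ++t)
        connections.emplace_back([&total] {
            for (int i = 0; i < 1000; ++i) total.add(1);
        });
    for (auto& c : connections) c.join();
    return total.finish();
}
```

    Both variants end up with the same total. The difference is where the
    synchronization lives: around the data itself in the first case,
    around the queue only in the second.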

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Aug 14, 2008
    #3
  4. On 2008-08-14 08:20, 一首诗 wrote:
    > Hi all,
    >
    > Recently I had a new coworker. There is some dispute between us.
    >
    > The last company he worked for has a special networking programming
    > model. They split the business logic into different modules, and have
    > a dedicated thread for the each module. Modules exchanged info
    > through a in-memory message queue.
    >
    > In my opinion, such a model means very complicated asynchronous
    > interaction between module. A simple function call between modules
    > would require a timer to avoid waiting for answer forever.
    > And if a module was blocked by IO (such as db query), other modules
    > depends on would have to wait for it.


    > What do u think about it? Is there any successful projects that could
    > prove which model is **right**?


    In general I think you might be right, but networking code usually
    has a strongly layered architecture with one-way communication
    between the layers (i.e. a lower layer passing the processed data up to
    a higher layer). In that case the message-passing model makes a lot of
    sense, since it models the actual workings very well and makes each layer
    simple to implement (if there are any packages in the in-queue, you
    process them and put the results in the out-queue; if there are none,
    you wait until there are).
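
    As an illustration of that loop, here is a small sketch with two
    layers connected by queues. The Channel class, start_layer and the two
    toy transformations (strip a one-character header, then wrap the
    payload in brackets) are assumptions made up for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// A queue between two layers. close() tells the reader no more data will come.
template <typename T>
class Channel {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(value));
        }
        ready_.notify_one();
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available; returns false once closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
    bool closed_ = false;
};

// One layer: take a package from the in-queue, process it, put the result in
// the out-queue, and wait when the in-queue is empty.
template <typename Fn>
std::thread start_layer(Channel<std::string>& in, Channel<std::string>& out, Fn fn) {
    return std::thread([&in, &out, fn] {
        std::string package;
        while (in.pop(package)) out.push(fn(package));
        out.close();  // propagate end-of-input to the next layer
    });
}

std::vector<std::string> run_pipeline(const std::vector<std::string>& frames) {
    Channel<std::string> wire, decoded, parsed;
    // Lower layer: strip a one-character header (frames must be non-empty).
    std::thread lower = start_layer(
        wire, decoded, [](const std::string& s) { return s.substr(1); });
    // Upper layer: wrap the payload.
    std::thread upper = start_layer(
        decoded, parsed, [](const std::string& s) { return "[" + s + "]"; });

    for (const auto& frame : frames) wire.push(frame);
    wire.close();

    std::vector<std::string> results;
    std::string message;
    while (parsed.pop(message)) results.push_back(message);
    lower.join();
    upper.join();
    return results;
}
```

    Because each layer has exactly one thread and the queues are FIFO, the
    packages come out in the order they went in.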

    For other kinds of tasks it might be easier to let one thread handle the
    work-package in all the steps (and modules). Of course there are other
    models and combinations, and which one is the best for a given purpose
    is not always clear until you have tried a few.

    --
    Erik Wikström
    Erik Wikström, Aug 14, 2008
    #4
  5. James Kanze

    James Kanze Guest

    On Aug 14, 4:24 pm, "Chris Becke" <> wrote:
    > "James Kanze" <> wrote:
    > > A "thread/process pool model" doesn't mean anything. I'm not
    > >sure what real alternative you're suggesting.


    > The real alternative is to create a similar message queue
    > design, but completely break the relationship of client
    > connections to threads.


    That's a valid solution if there is no client specific data.
    That's not always the case, however.

    > Client connections exist on as many or as few threads as
    > needed by the scalability of the comms library. Requests
    > coming in are packaged and posted to a message queue. A pool
    > of worker threads, proportional to the number of virtual CPUs
    > in the server (rather than the number of client connections)
    > pull requests from the queue, process the request, and then go
    > back to see if there's anything in the queue to process.


    Again, it depends. If requests constantly use shared data,
    there's no point in having more than one thread to handle them.
    If requests never use shared data, there's no point in not
    handling them immediately in the receiving thread.

    > This sort of design can ultimately be far better tuned to keep
    > the CPU cores as busy as possible, while minimising needless
    > context switches from having an "active" thread count far in
    > excess of CPU availability.


    Do you have actual measurements from a real application to
    support this claim? I doubt that it's true for most
    applications.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Aug 14, 2008
    #5
  6. 一首诗

    一首诗 Guest

    Hi all,

    Thanks for all your help! After reading your posts and reconsidering
    my coworker's arguments, I think some explanation is needed.

    1. Chris described exactly the alternative I had in mind.

    2. As James pointed out, the most valuable property of the 'dedicated
    model' is that no lock is needed, as only one thread touches the
    data.

    3. As Erik wrote, "there is usually a very layered architecture";
    whether the application should have a layered architecture is a key
    consideration in deciding whether to use the 'dedicated model'.

    4. About shared data: yes, of course there is data shared between
    clients. We are actually building a SIP server for VoIP and
    instant messaging. But in the case of a web server, isn't there
    also data shared between clients? ...

    (Sorry, I have to attend a meeting; I will explain my
    reasoning further later.)

    On Aug 14, 10:24 pm, "Chris Becke" <> wrote:
    > "James Kanze" <> wrote:
    > > A "thread/process pool model" doesn't mean anything.  I'm not
    > >sure what real alternative you're suggesting.

    >
    > The real alternative is to create a similar message queue design, but
    > completely break the relationship of client connections to threads.
    > Client connections exist on as many or as few threads as needed by the
    > scalability of the comms library. Requests coming in are packaged and
    > posted to a message queue.
    > A pool of worker threads, proportional to the number of virtual CPUs in
    > the server (rather than the number of client connections) pull requests
    > from the queue, process the request, and then go back to see if there's
    > anything in the queue to process.
    >
    > This sort of design can ultimately be far better tuned to keep the CPU
    > cores as busy as possible, while minimising needless context switches
    > from having an "active" thread count far in excess of CPU availability.
    > Given a database server that can, likewise, process multiple requests at
    > once using asynchronous file IO, this design will keep the database
    > busy, rather than continually bottlenecking in the single DB "object"
    > thread.
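
    For reference, the worker pool described in the quoted text could look
    something like the sketch below. WorkerPool and handle_requests are
    names invented for the example, and sizing the pool with
    std::thread::hardware_concurrency() is an assumption about what
    "proportional to the number of virtual CPUs" would mean in practice.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// A pool of workers sized to the machine, not to the number of connections.
class WorkerPool {
public:
    explicit WorkerPool(unsigned count = std::thread::hardware_concurrency()) {
        if (count == 0) count = 2;  // hardware_concurrency() may return 0
        for (unsigned i = 0; i < count; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the queue, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_all();
        for (auto& worker : workers_) worker.join();
    }

    // Called by whichever thread received the request from a connection.
    void post(std::function<void()> request) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(request));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> request;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // shutting down and nothing left
                request = std::move(queue_.front());
                queue_.pop_front();
            }
            request();  // process, then go back to the queue
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> queue_;
    bool done_ = false;
    std::vector<std::thread> workers_;
};

// Posts `count` trivial requests and reports how many were handled.
int handle_requests(int count) {
    std::atomic<int> handled{0};
    {
        WorkerPool pool;
        for (int i = 0; i < count; ++i)
            pool.post([&handled] { ++handled; });
    }  // destructor drains the queue and joins the workers
    return handled.load();
}
```

    Any request that touches shared mutable data still needs its own
    synchronization in this design; the pool only decouples the thread
    count from the connection count.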
    一首诗, Aug 15, 2008
    #6
  7. James Kanze

    James Kanze Guest

    On Aug 15, 10:03 am, "Chris Becke" <> wrote:
    > >> This sort of design can ultimately be far better tuned to keep
    > >> the CPU cores as busy as possible, while minimising needless
    > >> context switches from having an "active" thread count far in
    > >> excess of CPU availability.

    > >Do you have actual measurements from a real application to
    > >support this claim. I doubt that it's true for most
    > >applications.


    > Microsoft Windows needs to allocate stack space for each
    > thread created. On the 32bit version of the OS then, this
    > means an immediate scalability problem :- with only 2GB of
    > address space per process, this implies a hard limit of 2048
    > connections (threads) per server. Even on a 64bit OS the
    > working set added to the process for each thread means that
    > physical hardware limits will be reached that much faster than
    > a system that uses asynchronous IO to keep lots of connections
    > on one thread.


    That's a different problem, but yes, it does have to be taken
    into account. The cost of creating a thread can also be an
    issue, if connections are short lived (e.g. as in an HTTP
    server).
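
    For what it's worth, the 2048 figure quoted above is just the user
    address space divided by the default stack reservation, which is 1 MB
    per thread on Windows. The constant and function names below are made
    up for the illustration, and the 64 KB case is only a hypothetical
    smaller reservation to show how the bound moves. It is an upper bound
    in any case, since code, heap and DLLs also need address space.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit Windows gives each process 2 GB of user address space by default.
constexpr std::uint64_t kUserAddressSpace = 2ull << 30;
// Default stack reservation per thread: 1 MB.
constexpr std::uint64_t kDefaultStackReserve = 1ull << 20;

// Upper bound on threads if stacks were the only consumer of address space.
constexpr std::uint64_t max_threads(std::uint64_t address_space,
                                    std::uint64_t stack_reserve) {
    return address_space / stack_reserve;
}

static_assert(max_threads(kUserAddressSpace, kDefaultStackReserve) == 2048,
              "2 GB / 1 MB per stack = 2048 threads");
// A hypothetical 64 KB reservation would raise the bound considerably.
static_assert(max_threads(kUserAddressSpace, 64ull << 10) == 32768,
              "2 GB / 64 KB per stack = 32768 threads");
```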

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Aug 15, 2008
    #7
  8. James Kanze

    James Kanze Guest

    On Aug 15, 4:07 am, 一首诗 <> wrote:
    > Thanks for all your help! After I read all your posts and
    > reconsider my coworker's arguments, I think some explanation
    > may be needed.


    > 1. Chris explained exactly my real alternative solutions in
    > this post.


    > 2. Also as James pointed out, the most valuable point of a
    > 'dedicated model' is that no lock is needed as only one thread
    > would touch the data.


    > 3. As Erik wrote "there is usually a very layered
    > architecture", whether there should have a layered
    > architecture, is a key consideration whether to use a
    > 'dedicated model'.


    > 4. About shared data. Yes, of course there are shared data
    > between each client. Actually we are building an SIP server
    > for VOIP and Instant Message. But in the case of a web
    > server, isn't there are also shared data between each client?
    > ...


    Sometimes. Sometimes not. Of course, only mutable shared data
    is a problem. And it depends on who's using it, when. As I
    said, there's no silver bullet. It all depends on the
    application.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Aug 15, 2008
    #8
  9. >
    "Chris Becke" <> wrote in message
    news:...
    >>> This sort of design can ultimately be far better tuned to keep
    >>> the CPU cores as busy as possible, while minimising needless
    >>> context switches from having an "active" thread count far in
    >>> excess of CPU availability.


    >>Do you have actual measurements from a real application to
    >>support this claim. I doubt that it's true for most
    >>applications.


    >Microsoft Windows needs to allocate stack space for each thread created.
    >On the 32bit version of the OS then, this means an immediate scalability
    >problem :- with only 2GB of address space per process, this implies a
    >hard limit of 2048 connections (threads) per server. Even on a 64bit OS
    >the working set added to the process for each thread means that physical
    >hardware limits will be reached that much faster than a system that uses
    >asynchronous IO to keep lots of connections on one thread.


    I have personally created IOCP servers on Windows which can handle __well__
    over 40,000 connections; want some tips?
    Chris M. Thomasson, Aug 15, 2008
    #9
  10.

    Guest

    Can anyone explain what a thread pool and dedicated threads are?

    I have read a lot of material but still don't really understand.
    , Aug 16, 2008
    #10
  11. Ian Collins

    Ian Collins Guest

    Chris M. Thomasson wrote:
    >>

    > "Chris Becke" <> wrote:
    >
    >> Microsoft Windows needs to allocate stack space for each thread
    >> created. On the 32bit version of the OS then, this means an
    >> immediate scalability problem :- with only 2GB of address space per
    >> process, this implies a hard limit of 2048 connections (threads) per
    >> server. Even on a 64bit OS the working set added to the process for
    >> each thread means that physical hardware limits will be reached that
    >> much faster than a system that uses asynchronous IO to keep lots of
    >> connections on one thread.

    >
    > I have personally created IOCP servers on Windows which can handle
    > __well__ over 40,000 connections; want some tips?


    But I'd bet several gallons of my favourite beer that you didn't create
    40,000 threads!

    The one thread per connection model simply isn't scalable beyond a
    handful of threads per core.

    --
    Ian Collins.
    Ian Collins, Aug 16, 2008
    #11
  12. "Ian Collins" <> wrote in message
    news:...
    > Chris M. Thomasson wrote:
    >>>

    >> "Chris Becke" <> wrote:
    >>
    >>> Microsoft Windows needs to allocate stack space for each thread
    >>> created. On the 32bit version of the OS then, this means an
    >>> immediate scalability problem :- with only 2GB of address space per
    >>> process, this implies a hard limit of 2048 connections (threads) per
    >>> server. Even on a 64bit OS the working set added to the process for
    >>> each thread means that physical hardware limits will be reached that
    >>> much faster than a system that uses asynchronous IO to keep lots of
    >>> connections on one thread.

    >>
    >> I have personally created IOCP servers on Windows which can handle
    >> __well__ over 40,000 connections; want some tips?

    >
    > But I'd bet several gallons for my favourite beer that you didn't create
    > 40,000 threads!


    I only created around 2 * N threads for the IOCP threading pool, where N is
    the number of processors in the system. I did create a couple more
    threads whose only job was to perform some resource maintenance tasks...




    > The one thread per connection model simply isn't scalable beyond a
    > handful of threads per core.


    Right. Well, I guess you could use one user-thread (e.g. fiber)
    per-connection and implement your own scheduler. The question is why in the
    world would you do that on Windows when there is the wonderful and scalable
    IOCP mechanism to work with...
    Chris M. Thomasson, Aug 16, 2008
    #12
  13. James Kanze

    James Kanze Guest

    On Aug 16, 6:08 am, Ian Collins <> wrote:
    > Chris M. Thomasson wrote:


    > > "Chris Becke" <> wrote:


    > >> Microsoft Windows needs to allocate stack space for each thread
    > >> created. On the 32bit version of the OS then, this means an
    > >> immediate scalability problem :- with only 2GB of address space per
    > >> process, this implies a hard limit of 2048 connections (threads) per
    > >> server. Even on a 64bit OS the working set added to the process for
    > >> each thread means that physical hardware limits will be reached that
    > >> much faster than a system that uses asynchronous IO to keep lots of
    > >> connections on one thread.


    > > I have personally created IOCP servers on Windows which can handle
    > > __well__ over 40,000 connections; want some tips?


    > But I'd bet several gallons for my favourite beer that you
    > didn't create 40,000 threads!


    > The one thread per connection model simply isn't scalable beyond a
    > handful of threads per core.


    It depends. We're not at 40,000 connections yet, but we've
    certainly more than a handful per core. And there's no problem
    with the one thread per connection model for our application; in
    fact, it would work better with two threads per connection
    (one for push, and the other for pull). I've done a few tests
    on Solaris, and there's no problem with thousands of threads.

    It depends on what each connection is doing, and how long they
    stay connected. (In our case, connections tend to last anywhere
    between four and twelve hours. And of course, most of that
    time, they are quiescent.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Aug 16, 2008
    #13
  14. Ian Collins

    Ian Collins Guest

    James Kanze wrote:
    > On Aug 16, 6:08 am, Ian Collins <> wrote:
    >
    >> The one thread per connection model simply isn't scalable beyond a
    >> handful of threads per core.

    >
    > It depends. We're not at 40,000 connections yet, but we've
    > certainly more than a handful per core. And there's no problem
    > with the one thread per connection model for our application; in
    > fact, it would work better with two threads per connection
    > (one for push, and the other for pull). I've done a few tests
    > on Solaris, and there's no problem with thousands of threads.
    >

    That depends on what they are doing; I've been hit by a thundering herd
    with just 100 or so.

    > It depends on what each connection is doing, and how long they
    > stay connected. (In our case, connections tend to last anywhere
    > between four and twelve hours. And of course, most of that
    > time, they are quiescent.)
    >

    Ah, that explains it. I guess very few are blocking on the same
    resource and you will have a very low rate of context switches. The
    problems begin when the thread lifetime is short, the classic example
    being a web server.

    --
    Ian Collins.
    Ian Collins, Aug 16, 2008
    #14
  15. gpderetta

    gpderetta Guest

    On Aug 16, 7:47 am, "Chris M. Thomasson" <> wrote:
    > "Ian Collins" <> wrote in message
    >
    > news:...
    >
    >
    >
    > > Chris M. Thomasson wrote:

    >
    > >> "Chris Becke" <> wrote:

    >
    > >>> Microsoft Windows needs to allocate stack space for each thread
    > >>> created. On the 32bit version of the OS then, this means an
    > >>> immediate scalability problem :- with only 2GB of address space per
    > >>> process, this implies a hard limit of 2048 connections (threads) per
    > >>> server. Even on a 64bit OS the working set added to the process for
    > >>> each thread means that physical hardware limits will be reached that
    > >>> much faster than a system that uses asynchronous IO to keep lots of
    > >>> connections on one thread.

    >
    > >> I have personally created IOCP servers on Windows which can handle
    > >> __well__ over 40,000 connections; want some tips?

    >
    > > But I'd bet several gallons for my favourite beer that you didn't create
    > > 40,000 threads!

    >
    > I only created around 2 * N threads for the IOCP treading pool, where N is
    > the number of processors in the system. I did create a couple of more
    > threads whose only job was to perform some resource maintenance tasks...
    >
    > > The one thread per connection model simply isn't scalable beyond a
    > > handful of threads per core.

    >
    > Right. Well, I guess you could use one user-thread (e.g. fiber)
    > per-connection and implement your own scheduler. The question is why in the
    > world would you do that on Windows when there is the wonderful and scalable
    > IOCP mechanism to work with...


    You can of course use user-threads on top of IOCP and get the best of
    both worlds.

    BTW, a good reference on the topic of (web) server scalability:

    http://www.kegel.com/c10k.html

    (I guess many here know this page).

    HTH,

    --
    gpd
    gpderetta, Aug 16, 2008
    #15
  16. James Kanze

    James Kanze Guest

    On Aug 16, 12:47 pm, Ian Collins <> wrote:
    > James Kanze wrote:
    > > On Aug 16, 6:08 am, Ian Collins <> wrote:


    > >> The one thread per connection model simply isn't scalable beyond a
    > >> handful of threads per core.


    > > It depends. We're not at 40,000 connections yet, but we've
    > > certainly more than a handful per core. And there's no problem
    > > with the one thread per connection model for our application; in
    > > fact, it would work better with two threads per connection
    > > (one for push, and the other for pull). I've done a few tests
    > > on Solaris, and there's no problem with thousands of threads.


    > That depends what they are doing, I've been hit by a
    > thundering heard with just 100 or so.


    Exactly. If you're using threads to parallelize operations,
    then too many will be counter-productive. If you're using them
    to separate various concerns, it depends.

    > > It depends on what each connection is doing, and how long they
    > > stay connected. (In our case, connections tend to last anywhere
    > > between four and twelve hours. And of course, most of that
    > > time, they are quiescent.)


    > Ah, that explains it. I guess very few are blocking on the
    > same resource and you will have a very low rate of context
    > switches.


    Probably. In our case, clients have to remain connected,
    because we use both push and pull. And there is client
    (connection) specific state, related to things like privileges.
    There are many ways of handling this: I'm pretty sure that the
    entire application could have been written in a single thread
    without too many problems; alternatively, we could have used two
    threads per connection (one for the push, and one for the pull).
    Or any number of mixtures of this (a thread per connection for
    the pull, but a single thread for the push).

    > The problems begin when the thread lifetime is short, the
    > classic example being a web server.


    HTTP tends to be an example where a new thread per connection is
    NOT a good idea. But there are a lot of other protocols out
    there, and a lot of other client/server architectures. And I
    suspect that writing C++ code to handle HTTP connections is
    fairly rare: I would expect that most people would use existing
    server (Apache, WebSphere, etc.) software, with JSP or something
    similar for the dynamically generated contents. The C++ parts
    would be the back-end engines, where the connections wouldn't
    necessarily (but could) reflect the incoming connections.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Aug 16, 2008
    #16
  17. "gpderetta" <> wrote in message
    news:...
    > On Aug 16, 7:47 am, "Chris M. Thomasson" <> wrote:
    >> "Ian Collins" <> wrote in message
    >>
    >> news:...
    >>
    >>
    >>
    >> > Chris M. Thomasson wrote:

    >>
    >> >> "Chris Becke" <> wrote:

    >>
    >> >>> Microsoft Windows needs to allocate stack space for each thread
    >> >>> created. On the 32bit version of the OS then, this means an
    >> >>> immediate scalability problem :- with only 2GB of address space per
    >> >>> process, this implies a hard limit of 2048 connections (threads) per
    >> >>> server. Even on a 64bit OS the working set added to the process for
    >> >>> each thread means that physical hardware limits will be reached that
    >> >>> much faster than a system that uses asynchronous IO to keep lots of
    >> >>> connections on one thread.

    >>
    >> >> I have personally created IOCP servers on Windows which can handle
    >> >> __well__ over 40,000 connections; want some tips?

    >>
    >> > But I'd bet several gallons for my favourite beer that you didn't
    >> > create
    >> > 40,000 threads!

    >>
    >> I only created around 2 * N threads for the IOCP treading pool, where N
    >> is
    >> the number of processors in the system. I did create a couple of more
    >> threads whose only job was to perform some resource maintenance tasks...
    >>
    >> > The one thread per connection model simply isn't scalable beyond a
    >> > handful of threads per core.

    >>
    >> Right. Well, I guess you could use one user-thread (e.g. fiber)
    >> per-connection and implement your own scheduler. The question is why in
    >> the
    >> world would you do that on Windows when there is the wonderful and
    >> scalable
    >> IOCP mechanism to work with...

    >
    > You can of course use user-threads on top of IOCP and get the best of
    > both worlds.


    Sure. I guess you would use an IOCP thread as the actual scheduler for the
    fibers within it. When an IO completion is encountered, you extract the
    fiber context from the completion key and simply switch to that fiber. When
    the fiber does its thing, it switches back to the IOCP thread. Something
    like:

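    // pseudo-code (GQCS abbreviates GetQueuedCompletionStatus)


    /* State for one outstanding I/O operation. OVERLAPPED is the first
       member, so a per_io* can be recovered from the OVERLAPPED pointer. */
    struct per_io {
        OVERLAPPED ol;
        char buf[1024];
        DWORD bytes;
        int action;
        BOOL status;
    };

    /* State for one connection, including the two fiber contexts. */
    struct per_socket {
        SOCKET sck;
        void* fiber_socket_context;
        void* fiber_iocp_context;
        struct per_io* active_io;
    };


    /* IOCP thread: acts as the scheduler for the per-socket fibers. */
    DWORD WINAPI iocp_entry(LPVOID state) {
        for (;;) {
            struct per_io* pio = NULL;
            struct per_socket* psck = NULL;
            DWORD bytes = 0;
            /* the real API takes the completion key before the OVERLAPPED */
            BOOL status = GQCS(...,
                               &bytes,
                               (PULONG_PTR)&psck,
                               (LPOVERLAPPED*)&pio,
                               INFINITE);
            pio->status = status;
            psck->active_io = pio;
            /* run the fiber that owns this socket */
            SwitchToFiber(psck->fiber_socket_context);
        }
        return 0;
    }


    /* Per-socket fiber: handles the completion handed over by the IOCP thread. */
    VOID WINAPI per_socket_entry(LPVOID state) {
        struct per_socket* const _this = state;
        for (;;) {
            struct per_io* const pio = _this->active_io;
            switch (pio->action) {
            case ACTION_RECV:
                [...];
                break;
            case ACTION_SEND:
                [...];
                break;

            [whatever...];
            }
        }
    }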





    > BTW, a good reference on the topic of (web) server scalability:
    >
    > http://www.kegel.com/c10k.html
    >
    > (I guess many here know this page).


    Indeed.
    Chris M. Thomasson, Aug 17, 2008
    #17
  18. "Chris M. Thomasson" <> wrote in message
    news:lAMpk.5327$...
    > "gpderetta" <> wrote in message
    > news:...
    >> On Aug 16, 7:47 am, "Chris M. Thomasson" <> wrote:
    >>> "Ian Collins" <> wrote in message
    >>>
    >>> news:...
    >>>
    >>>
    >>>
    >>> > Chris M. Thomasson wrote:
    >>>
    >>> >> "Chris Becke" <> wrote:
    >>>
    >>> >>> Microsoft Windows needs to allocate stack space for each thread
    >>> >>> created. On the 32bit version of the OS then, this means an
    >>> >>> immediate scalability problem :- with only 2GB of address space per
    >>> >>> process, this implies a hard limit of 2048 connections (threads) per
    >>> >>> server. Even on a 64bit OS the working set added to the process for
    >>> >>> each thread means that physical hardware limits will be reached that
    >>> >>> much faster than a system that uses asynchronous IO to keep lots of
    >>> >>> connections on one thread.
    >>>
    >>> >> I have personally created IOCP servers on Windows which can handle
    >>> >> __well__ over 40,000 connections; want some tips?
    >>>
    >>> > But I'd bet several gallons for my favourite beer that you didn't
    >>> > create
    >>> > 40,000 threads!
    >>>
    >>> I only created around 2 * N threads for the IOCP treading pool, where N
    >>> is
    >>> the number of processors in the system. I did create a couple of more
    >>> threads whose only job was to perform some resource maintenance tasks...
    >>>
    >>> > The one thread per connection model simply isn't scalable beyond a
    >>> > handful of threads per core.
    >>>
    >>> Right. Well, I guess you could use one user-thread (e.g. fiber)
    >>> per-connection and implement your own scheduler. The question is why in
    >>> the
    >>> world would you do that on Windows when there is the wonderful and
    >>> scalable
    >>> IOCP mechanism to work with...

    >>
    >> You can of course use user-threads on top of IOCP and get the best of
    >> both worlds.

    >
    > Sure. I guess you would use an IOCP thread as the actual scheduler for the
    > fibers within it. When an I/O completion is encountered, you extract the
    > fiber context from the completion key and simply switch to that fiber.
    > When the fiber has done its thing, it switches back to the IOCP thread.
    > Something like:




    WHOOPS! I accidentally sent this too early! Stray keypress... Anyway, I
    needed to allow the per_socket fiber to switch back to the IOCP fiber!!!
    The two unquoted lines in the code below are the additions.

    >
    > // pseudo-code
    >
    >
    > struct per_io {
    >   OVERLAPPED ol;   // kept first so an OVERLAPPED* maps back to per_io
    >   char buf[1024];
    >   DWORD bytes;
    >   int action;
    >   BOOL status;
    > };
    >
    > struct per_socket {
    >   SOCKET sck;
    >   void* fiber_socket_context;
    >   void* fiber_iocp_context;
    >   struct per_io* active_io;
    > };
    >
    >
    > DWORD WINAPI iocp_entry(LPVOID state) {
    >   for (;;) {
    >     struct per_io* pio = NULL;
    >     struct per_socket* psck = NULL;
    >     DWORD bytes = 0;
    >     /* GQCS takes the completion key before the OVERLAPPED pointer */
    >     BOOL status = GQCS(...,
    >                        &bytes,
    >                        (PULONG_PTR)&psck,
    >                        (LPOVERLAPPED*)&pio,
    >                        INFINITE);
    >     pio->status = status;
    >     psck->active_io = pio;

          psck->fiber_iocp_context = state;  /* assumes state holds this thread's fiber address */

    >     SwitchToFiber(psck->fiber_socket_context);
    >   }
    >   return 0;
    > }
    >





    > VOID WINAPI per_socket_entry(LPVOID state) {
    >   struct per_socket* const _this = (struct per_socket*)state;
    >   for (;;) {
    >     struct per_io* const pio = _this->active_io;
    >     switch (pio->action) {
    >       case ACTION_RECV:
    >         [...];
    >         break;
    >       case ACTION_SEND:
    >         [...];
    >         break;
    >       [whatever...];
    >     }

          SwitchToFiber(_this->fiber_iocp_context);  /* hand control back to the IOCP fiber */

    >   }
    > }




    >> BTW, a good reference on the topic of (web) server scalability:
    >>
    >> http://www.kegel.com/c10k.html
    >>
    >> (I guess many here know this page).

    >
    > Indeed.
    Chris M. Thomasson, Aug 17, 2008
    #18
  19. "gpderetta" <> wrote in message
    news:...
    > On Aug 16, 7:47 am, "Chris M. Thomasson" <> wrote:
    >> "Ian Collins" <> wrote in message
    >>
    >> news:...
    >>
    >>
    >>
    >> > Chris M. Thomasson wrote:

    >>
    >> >> "Chris Becke" <> wrote:

    >>
    >> >>> Microsoft Windows needs to allocate stack space for each thread
    >> >>> created. On the 32-bit version of the OS, then, this means an
    >> >>> immediate scalability problem: with only 2 GB of address space per
    >> >>> process, this implies a hard limit of 2048 connections (threads) per
    >> >>> server. Even on a 64-bit OS, the working set added to the process for
    >> >>> each thread means that physical hardware limits will be reached that
    >> >>> much faster than in a system that uses asynchronous I/O to keep lots
    >> >>> of connections on one thread.

    >>
    >> >> I have personally created IOCP servers on Windows which can handle
    >> >> __well__ over 40,000 connections; want some tips?

    >>
    >> > But I'd bet several gallons of my favourite beer that you didn't
    >> > create 40,000 threads!
    >>
    >> I only created around 2 * N threads for the IOCP threading pool, where
    >> N is the number of processors in the system. I did create a couple more
    >> threads whose only job was to perform some resource maintenance tasks...
    >>
    >> > The one-thread-per-connection model simply isn't scalable beyond a
    >> > handful of threads per core.
    >>
    >> Right. Well, I guess you could use one user-thread (e.g. fiber)
    >> per-connection and implement your own scheduler. The question is why in
    >> the world would you do that on Windows when there is the wonderful and
    >> scalable IOCP mechanism to work with...

    >
    > You can of course use user-threads on top of IOCP and get the best of
    > both worlds.


    I jumped the gun here before actually working out a solution... Now that I
    think about it some more, this scheme may not work after all. The
    problem is that fibers are bound to specific threads for their lifetime.
    However, IOCP allows a socket to receive completions on different
    threads. Think about it: if a socket issues two overlapped I/O
    operations, those completions may come in on two different threads.
    How would you use fibers in this scenario? The only way I can see it working
    is if you created an IOCP handle for each I/O processing thread, which defeats
    the purpose of IOCP in the first place. Therefore, I conclude that fibers
    and IOCP will _not_ work well together as-is...

    What am I missing?

    [...]
    Chris M. Thomasson, Aug 17, 2008
    #19
  20. gpderetta

    gpderetta Guest

    On Aug 17, 5:18 am, "Chris M. Thomasson" <> wrote:
    > "gpderetta" <> wrote in message
    >
    > > You can of course use user-threads on top of IOCP and get the best of
    > > both worlds.

    >
    > I jumped the gun here before actually working out a solution... Now that I
    > think about it some more, well, this scheme may not work after all. The
    > problem is that fibers are bound to specific threads for their lifetime.


    Hmm, as far as I understand from the Win32 documentation, fibers are
    allowed to migrate from one thread to another:

    "You can call SwitchToFiber with the address of a fiber created by a
    different thread. To do this, you must have the address returned to
    the other thread when it called CreateFiber and you must use proper
    synchronization."

    (In fact, I think you also need some appropriate compiler flags to
    disable some TLS-related optimizations.)

    Of course, the problem is proper synchronization:

    > However, IOCP completions can allow a socket to receive completions on
    > different threads. Think about it. If a socket issues two overlapped io
    > operations, well, those completions may come in on two different threads.
    > How would you use fibers in this scenario?


    Hmm, don't do two overlapped operations linked to the same fiber
    then :).

    More seriously, the problem is making sure you never wake up a fiber
    that is already running, and never put one to sleep while a completed
    operation has not yet been accounted for.

    For example, for every asynchronous operation posted, you could
    increment a counter; when an operation completes, you decrement the
    counter and check whether the fiber is awake. If it is not, mark it as
    awake and run it (or put it in a ready queue). When you stop a fiber,
    you first suspend it, then atomically check for ready operations
    still pending and mark it as sleeping. If there are ready operations,
    you abort the suspend and restart the fiber. Getting things right
    without missing wakeups is not trivial.

    You need to synchronize access to the counter and fiber state, of
    course, but you would need to synchronize access to any state attached
    to the socket anyway, so IMHO it doesn't make much of a difference.
    In fact I'm sure you can come up with some scheme that actually
    doesn't require locks.
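
    A minimal lock-free sketch of the wake-up bookkeeping described above,
    in portable C++ rather than Win32. WakeGate and its member names are
    hypothetical, invented for illustration. As a simplification it counts
    completions that have not been consumed yet, instead of counting posted
    operations, and it assumes park() is only called by the scheduler after
    the fiber has already switched out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not a Win32 API): tracks whether one fiber is awake
// and how many completions are waiting for it. All atomics use the default
// sequentially consistent ordering, which the park/wake handshake relies on.
class WakeGate {
public:
    // Called by whichever thread dequeues a completion for this fiber.
    // Returns true if the caller must resume the fiber, false if the fiber
    // is already awake and will pick the completion up by itself.
    bool on_completion() {
        ready_.fetch_add(1);
        return try_wake();
    }

    // Called by the awake fiber to consume one completion, if there is one.
    bool take() {
        int n = ready_.load();
        while (n > 0) {
            if (ready_.compare_exchange_weak(n, n - 1)) return true;
        }
        return false;
    }

    // Called by the scheduler once the fiber has switched out.
    // Returns true if a completion slipped in and the caller must resume the
    // fiber again; false if it stays asleep or another thread took it over.
    bool park() {
        assert(awake_.load());  // only an awake fiber can be parked
        awake_.store(false);
        return ready_.load() > 0 && try_wake();
    }

private:
    // Exactly one caller can flip the flag from asleep to awake.
    bool try_wake() {
        bool expected = false;
        return awake_.compare_exchange_strong(expected, true);
    }

    std::atomic<int> ready_{0};
    std::atomic<bool> awake_{false};
};
```

    In an IOCP loop you would presumably call on_completion() right after
    dequeuing a completion and switch to the fiber only when it returns
    true, then call park() when the fiber switches back and resume it again
    if that returns true.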

    > The only way I can see it working
    > is if you created a IOCP handle for each io processing thread, which defeats
    > the purpose of IOCP in the first place. Therefore, I conclude that fibers
    > and IOCP will _not_ work well together as-is...
    >


    Even with one IOCP per thread, you are no worse off than with Unix
    select and friends: IOCP is still a pretty good reactor and will still
    scale way better than the one-thread-per-connection model; you lose the
    benefit of having the OS control the optimal number of running
    threads, but you gain by not needing synchronization and by getting
    better cache locality.
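
    A rough sketch of that trade-off, again in portable C++ with made-up
    names (Event, Reactor, owner_of and route are all hypothetical). Each
    connection is pinned to one reactor for its lifetime, so the reactor's
    state is only ever touched by its owning thread and needs no locks. The
    threads and the per-thread completion ports themselves are left out;
    route() merely stands in for "the completion arrives on the owner's
    port".

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// One I/O completion for one connection.
struct Event {
    int conn_id;
    std::size_t bytes;
};

// State owned by a single I/O thread. Nothing here is shared, so there is
// no mutex and no atomic.
class Reactor {
public:
    void handle(const Event& e) { bytes_by_conn_[e.conn_id] += e.bytes; }

    std::size_t bytes_for(int conn_id) const {
        auto it = bytes_by_conn_.find(conn_id);
        return it == bytes_by_conn_.end() ? 0 : it->second;
    }

private:
    std::unordered_map<int, std::size_t> bytes_by_conn_;
};

// Pin a connection to one reactor for its whole lifetime.
inline std::size_t owner_of(int conn_id, std::size_t reactor_count) {
    return static_cast<std::size_t>(conn_id) % reactor_count;
}

// Stand-in for associating a socket with its owner's completion port.
inline void route(std::vector<Reactor>& reactors, const Event& e) {
    reactors[owner_of(e.conn_id, reactors.size())].handle(e);
}
```

    The price is the one already mentioned: a busy connection cannot be
    picked up by an idle thread, because the OS no longer balances the
    completions across the pool.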

    --
    gpd
    gpderetta, Aug 22, 2008
    #20