Is there (semi-) portable IO-Completion-Port-like functionality in C++ land?


K. Frank

Hello Group!

Now that C++11 offers an atomic / synchronization / threading model,
has an IO-Completion-Port-like synchronization object been implemented?

IO Completion Ports are discussed in MSDN:

http://msdn.microsoft.com/en-us/library/aa365198(VS.85).aspx

As a reminder, the Windows IO Completion Port has the following
interesting feature: The IO Completion Port (as I understand it)
controls a LIFO queue of worker threads that services a FIFO queue of
"things to do" (the so-called IO Completion Packets). The interesting
point is that the IO Completion Port has a "concurrency value" and
will not release a worker thread to service the "packet" queue unless
the number of actively running threads (controlled by the IO
Completion Port) is less than the concurrency value. That is, the IO
Completion Port tries to keep the number of actively running threads
equal to the concurrency value.

The point of this is that you can set the concurrency value equal to
the number of processors on the machine. Now you can keep each
processor busy running its own active thread, but you don't slow
things down by gratuitously context switching to other threads in
the thread pool.

This seems like a very good approach to getting good performance out
of a multi-core machine if you can partition your workload into
"things to do" that can naturally be serviced by a thread pool.
(I think Microsoft designed IO Completion Ports with asynchronous
I/O in mind, but I view them as offering a more generally useful
approach to thread pools.)
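
For what it's worth, C++11 does give a portable way to ask for the number of processors mentioned above. This is only a minimal sketch; the helper name default_concurrency is made up for illustration, and std::thread::hardware_concurrency() is allowed to return 0 when the value can't be determined, so the sketch falls back to 1 in that case.

```cpp
#include <cstddef>
#include <thread>

// Hypothetical helper: pick a concurrency value for an
// IO-Completion-Port-like object from the hardware thread count.
std::size_t default_concurrency() {
    // hardware_concurrency() is only a hint and may be 0 if unknown.
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : static_cast<std::size_t>(n);
}
```

The result would then be handed to whatever plays the role of the concurrency value.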

Does C++11 offer the ability to implement something similar in a
semi-portable way? Are there any implementations out there?

I'm pretty sure that there is nothing like this in std::thread.
I've also looked at Boost.Asio, and didn't see anything along
these lines. (The Windows implementation of Boost.Asio appears
to use native IO Completion Ports under the hood, but appears
not to expose this functionality.)

So, are there any IO-Completion-Port-like thingies in C++ land,
or anything on the horizon?


Thanks.


K. Frank
 
Christopher Pisz

boost::asio::io_service does exactly what windows io completion ports
do, in fact it uses them to do it. What is it that you think is missing?
Or maybe you just didn't understand how to do what you want?

If you'd like to run a number of concurrent threads equal to the number
of processors or cores, then get that number like you normally would,
then create that number of boost::threads, in each thread call
io_service::run. Give the io_service a work object so that run doesn't
return when the queue runs out of work to do.

You can also use a thread_group, but I haven't tried that yet. I just
create an array of threads.
 
K. Frank

Hi Christopher!

Thank you for your reply and for pointing me to
boost::asio::io_service.

boost::asio::io_service does exactly what windows io completion ports
do, in fact it uses them to do it. What is it that you think is missing?
Or maybe you just didn't understand how to do what you want?

Looking at boost::asio::io_service, the IO Completion Port feature
that I think is missing is the "concurrency value." (More precisely,
I think io_service implements a concurrency-value feature only on
windows, and not generally across platforms. More on this, below.)

If you'd like to run a number of concurrent threads equal to the number
of processors or cores, then get that number like you normally would,
then create that number of boost::threads, in each thread call
io_service::run.

This makes sense in practice, but differs in detail from
what one can do with an IO Completion Port.

Specifically, let's say I have four processor cores. With
an IO Completion Port I can set the concurrency value to
four, and then service the IO Completion Port with a thread
pool of, say, twenty threads. As long as the threads don't
block, then the IO Completion Port will only let four of the
thread-pool threads run. Each processor keeps busy, but we
avoid unnecessary context switching. So far, this is the
same as running an io_service (or IO Completion Port) with
a thread pool of four threads.

But now suppose two of the threads block (for example, by
making a synchronous database call). If I understand things
correctly, in the io_service case only two threads will be
actively running, leaving two processor cores idle. In
contrast, the IO Completion Port will release two more threads
(from its thread pool of twenty) to work on the packet queue,
keeping all four cores busy.

To me that's the key issue. You have the ability to have
more threads in the thread pool than cores so that the cores
keep busy even if some threads block, while (usually) only
having as many actively running threads as cores, thereby
avoiding unnecessary context switching.

This feature of IO Completion Ports has always struck me as
both clever and practical.
...
You can also use a thread_group, but I haven't tried that yet. I just
create an array of threads.

Just thinking out loud here ...

So (using C++11 std::thread terminology here) if an
IO-Completion-Port-like construct used a condition_variable
to synchronize the thread pool's access to the "packet"
queue, then we would want something like:

if (
"packet added to empty queue" &&
"active threads < concurrency_value"
) cv.notify_one();

and

if (
"active threads falls below concurrency_value" &&
"queue not empty"
) cv.notify_one();

Of course, this is just suggestive. I don't pretend to know
how to implement an IO Completion Port without it already being
built into the operating system.

Coming back to io_service, it looks like it _does_ offer
concurrency_value functionality on windows. One of the
io_service constructors takes a "concurrency_hint"
argument:

io_service (std::size_t concurrency_hint);

I am willing to bet that this argument is ignored on
non-windows platforms, and that on windows platforms
io_service is implemented using an IO Completion Port,
and that concurrency_hint is passed to the IO Completion
Port as its concurrency value. (I haven't verified it,
but it looks true to me.)

This is altogether reasonable. I can use io_service across
platforms, and on windows (but only on windows) I get the
benefit of the concurrency-value optimization.

Unfortunately this also suggests that implementing the
concurrency-value functionality on top of "standard"
synchronization primitives (without os support) isn't
so easy (because otherwise the boost guys might have
done it).

So, two questions:

Is it practical to implement the IO Completion Port
concurrency-value feature using "standard" primitives?

Does anyone have any historical color or insight into the
thinking of the Boost.Asio developers on this issue?


Thanks again.


K. Frank
 
Christopher Pisz

On 4/20/2012 12:43 PM, K. Frank wrote:
SNIP
But now suppose two of the threads block (for example, by
making a synchronous database call). If I understand things
correctly, in the io_service case only two threads will be
actively running, leaving two processor cores idle. In
contrast, the IO Completion Port will release two more threads
(from its thread pool of twenty) to work on the packet queue,
keeping all four cores busy.

To me that's the key issue. You have the ability to have
more threads in the thread pool than cores so that the cores
keep busy even if some threads block, while (usually) only
having as many actively running threads as cores, thereby
avoiding unnecessary context switching.

This feature of IO Completion Ports has always struck me as
both clever and practical.

I see now.
As you say below, I would think the boost io_service would mimic that
behavior on Windows. I'd have to write a test case. Let me know what
your results are if you write one. I wouldn't expect that to be the case
on non-windows platforms.

Just thinking out loud here ...

So (using C++11 std::thread terminology here) if an
IO-Completion-Port-like construct used a condition_variable
to synchronize the thread pool's access to the "packet"
queue, then we would want something like:

if (
"packet added to empty queue" &&
"active threads < concurrency_value"
) cv.notify_one();

and

if (
"active threads falls below concurrency_value" &&
"queue not empty"
) cv.notify_one();

Of course, this is just suggestive. I don't pretend to know
how to implement an IO Completion Port without it already being
built into the operating system.

Coming back to io_service, it looks like it _does_ offer
concurrency_value functionality on windows. One of the
io_service constructors takes a "concurrency_hint"
argument:

io_service (std::size_t concurrency_hint);

I am willing to bet that this argument is ignored on
non-windows platforms, and that on windows platforms
io_service is implemented using an IO Completion Port,
and that concurrency_hint is passed to the IO Completion
Port as its concurrency value. (I haven't verified it,
but it looks true to me.)

This is altogether reasonable. I can use io_service across
platforms, and on windows (but only on windows) I get the
benefit of the concurrency-value optimization.

Unfortunately this also suggests that implementing the
concurrency-value functionality on top of "standard"
synchronization primitives (without os support) isn't
so easy (because otherwise the boost guys might have
done it).

I also believe you are correct in that I would not expect that behavior
from non-Windows platforms, because as far as I know, it is only
available on Windows. I remember when I was doing a socket library for
Linux that I couldn't find a thing.

So, two questions:

Is it practical to implement the IO Completion Port
concurrency-value feature using "standard" primitives?

I wouldn't dare try to do something similar myself, just because of
risk/cost/benefit. Maybe as a purely academic, just-for-fun exercise.

Does anyone have any historical color or insight into the
thinking of the Boost.Asio developers on this issue?

Good question for the boost mailing list. You can get to it using Gmane.
They have instructions on the boost site. The response rate isn't great
though. Most talk is about the C++11 features.
 
K. Frank

Hello Christopher!

Thanks for the follow-up.

On 4/20/2012 12:43 PM, K. Frank wrote:
SNIP
...

I also believe you are correct in that I would not expect that behavior
from non-Windows platforms, because as far as I know, it is only
available on Windows. I remember when I was doing a socket library for
Linux that I couldn't find a thing.

Yes, I've been thinking about this issue off and on for
a while. Whenever I've looked at other os'es -- primarily
linux -- for this IO-Completion-Port concurrency-value
functionality, I've come up empty.

So I figure it must be one of two things:

Either IO Completion Ports aren't really that useful in
practice (but the idea seems very sensible to me), or
they're hard to implement, absent some existing low-level
support.

(Any linux gurus out there? Why doesn't linux offer the
equivalent of IO Completion Ports? Are they not really
a good idea, or are they too hard to implement?)

I wouldn't dare try to do something similar myself, just because of
risk/cost/benefit. Maybe as a purely academic, just-for-fun exercise.

I'd certainly be interested in hearing any thoughts on how
one might go about it.

I can't see any way to get started. It seems to me that any
IO-Completion-Port-like construct would have to know somehow
which of its associated threads are running and which are
blocked. So I guess that the threads would have to notify
the construct.

Let's say that your threading model supports two kinds of
primitive (i.e., simpler than IO Completion Ports) synchronization
objects: mutexes and condition variables. It would seem to
me that the mutex and condition-variable implementation
would have to know to notify "interested parties" (i.e., the
IO Completion Port) when threads are blocked and resumed.

So, according to this reasoning, unless support for IO
Completion Ports is already built into your synchronization
primitives, you won't be able to implement IO Completion
Ports.

(Of course, this would not preclude implementing IO
Completion Ports by reimplementing the other synchronization
primitives with the necessary support. I.e., one could
imagine writing, for example, an "enhanced" pthreads library
that included IO Completion Ports, by making that library's
synchronization primitives "IO-Completion-Port-aware.")

I'm not really convinced that the above analysis is correct,
but, at the moment, that's how it looks to me.

Good question for the boost mailing list. You can get to it using Gmane.
They have instructions on the boost site. The response rate isn't great
though. Most talk is about the C++11 features.

Thanks for the pointer to the boost mailing list. I would
be curious to hear what they have to say.

I appreciate your insights.


K. Frank
 
Nobody

Why doesn't linux offer the equivalent of IO Completion Ports?

The pthreads API doesn't include thread pools, which would be a
prerequisite.

There are a number of third-party thread pool libraries. Once you have
that, I would have thought that implementing completion ports would be
straightforward enough.

OTOH, async I/O is often unnecessary if you're using threads (likewise for
non-blocking I/O or select/poll). You can normally just use blocking I/O
and threads, which has the advantage of working with files (accessing
files never "blocks", even if there's a delay) as well as demand-paging of
the process' code and data.

On Linux, the POSIX aio API is implemented by glibc in user-space using
threads.
 
Christof Meerwald

(Of course, this would not preclude implementing IO
Completion Ports by reimplementing the other synchronization
primitives with the necessary support. I.e., one could
imagine writing, for example, an "enhanced" pthreads library
that included IO Completion Ports, by making that library's
synchronization primitives "IO-Completion-Port-aware.")

That would roughly be my understanding as well - except that most of
the "enhancements" would need to be made in the kernel for pretty much
any blocking syscall (when it enters a wait state). And I suspect that
this might come with some (performance) cost - but as there is no
standard mandating that functionality, it would be hard to justify.


Christof
 
