Concurrent Containers

Scott Meyers

Although C++0x includes support for concurrency, it offers no standard
containers that support simultaneous readers and writers. Java and .NET
developers have them as part of their standard libraries (e.g., Java's
ConcurrentHashMap and ConcurrentLinkedQueue, .NET's ConcurrentQueue and
ConcurrentBag), but, from what I can tell, there don't even seem to be any de
facto standard concurrent containers for C++ developers. For example, Boost
doesn't seem to offer any.

The closest thing I can find to a portable cross-platform API for concurrent
containers is the concurrent_vector and concurrent_queue offered independently
by Intel via TBB and Microsoft via PPL, where both companies have pledged to
offer "identical concurrent STL container solutions" (per
http://tinyurl.com/2cgr22p). (That same page suggests that Intel and MS both
also offer concurrent_unordered_map, but that container is not listed at the PPL
web site (http://msdn.microsoft.com/en-us/library/dd504906.aspx).)

If I'm a cross-platform C++ developer who wants to take advantage of the work
others have done to develop, debug, and tune a concurrent container (i.e., I
don't want to write and maintain my own), what are my choices? Obviously, the
container I choose will depend on the problem I'm trying to solve, but what's on
the pre-built cross-platform concurrent container menu?

Thanks,

Scott
 
Chris M. Thomasson

Scott Meyers said:
Although C++0x includes support for concurrency, it offers no standard
containers that support simultaneous readers and writers. [...]
If I'm a cross-platform C++ developer who wants to take advantage of the
work others have done to develop, debug, and tune a concurrent container
(i.e., I don't want to write and maintain my own), what are my choices?
Obviously, the container I choose will depend on the problem I'm trying to
solve, but what's on the pre-built cross-platform concurrent container
menu?

I am VERY sorry that I simply have no time to explain the details right now,
but FWIW C++0x actually does provide a perfect medium for creating fairly
efficient garbage collected concurrent containers. For instance, here is a
full blown example of a garbage collected lock-free node-based stack that is
based on one of my proxy collector algorithms:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/a53f24de178b419f
(please try to read entire thread...)

And here is a relevant discussion on the Boost threading mailing list:

http://thread.gmane.org/gmane.comp.lib.boost.devel/197400/focus=198747
(please try to read entire thread...)



Here is a brief description of what proxy garbage collection is "all about":

http://groups.google.com/group/comp.programming.threads/msg/41f29efe33e7f124




Here is a recent implementation of one of my proxy garbage collector
algorithms in Relacy 2.3:

http://cpt.pastebin.com/f71480694




Here is where you can learn about and download a copy of Relacy 2.3:

http://groups.google.com/group/relacy

http://groups.google.com/group/relacy/browse_frm/thread/8a7ecceb5f1bf80b


IMHO, Relacy Race Detector is one of the _best_ synchronization verification
systems out there!



When I get more time I will explain how a proxy collector algorithm can
efficiently solve the reader/writer problem in _many_ diverse situations.



Here is some more info:

http://webpages.charter.net/appcore/vzoom/round-1.pdf

http://webpages.charter.net/appcore/vzoom/round-2.pdf

FWIW, those "papers" referred to my vZOOM algorithm entry in Sun
Microsystems CoolThreads contest:

https://coolthreads.dev.java.net

I was one of the finalists, and I got a brand new SunFire T2000. I was very
pleased!

;^)
 
Balog Pal

Scott Meyers said:
Although C++0x includes support for concurrency, it offers no standard
containers that support simultaneous readers and writers. Java and .NET
developers have them as part of their standard libraries (e.g., Java's
ConcurrentHashMap and ConcurrentLinkedQueue, .NET's ConcurrentQueue and
ConcurrentBag),

I'm light on Java, but what I picked up about 'concurrent' containers (and
even more generally about using threads) is pretty sour. The early
collections with everything synchronized did not solve any real-life use
case but introduced slowdown. Later the programs were full of undefined
behavior due to race conditions. Later still, people switched to the
'concurrent' stuff that converted UB into exceptions thrown here, there, and
everywhere.

In summary, I saw only different flavors of broken software, and supposed
'architects' fighting to shuffle bugs around, instead of looking at the
fundamental issue/root cause: lack of MT design, and lack of even recognizing
that one is needed. After all, the language is 'safe' and the tools all have
'synchronized' or 'concurrent' in their names.
:-(((

C/C++ at least still put up some hurdles to infesting a program with
threads, and left some sanity.
but, from what I can tell, there don't even seem to be any de facto
standard concurrent containers for C++ developers. For example, Boost
doesn't seem to offer any.

Is there such a thing at all? Now really.

I try to recall my own stuff: I used a 'concurrent' queue, in fact many
flavors of it, to communicate between threads. Despite it being ubiquitous
as a usage pattern, I hardly ever had one from a stock library, but put
together a new one in each new arena, tuned to the actual pattern of use and
the properties of the payload.

And I observed others doing likewise.

As for the standard: it failed miserably on the much simpler task of
giving us a *string*...

On this topic I'd rather keep it conservative and teach people to think MT,
and to be able to use the elements in combination. I see the chances of
messing up that way as lower than they would be from picking supposedly
pre-assembled elements.
 
Scott Meyers

I am VERY sorry that I simply have no time to explain the details right now,
but FWIW C++0x actually does provide a perfect medium for creating fairly
efficient garbage collected concurrent containers.

But that's not the question. C++ offers the capabilities to write anything I
want, but if I want to use a container that's already written and is likely
correct, I can use any of the containers in the standard library, as long as I
don't care about the ability to concurrently modify the container. Most of the
time, it's just not worth my trouble to write a container from scratch.

What I'm looking for now are portable, proven containers that do support
concurrent modification. Could I write one from scratch? Sure. But I'd rather
spend my time doing other things if at all possible.

Scott
 
Joshua Maurice

I'm light on Java, but what I picked up about 'concurrent' containers (and
even more generally about using threads) is pretty sour. The early
collections with everything synchronized did not solve any real-life use
case but introduced slowdown. Later the programs were full of undefined
behavior due to race conditions. Later still, people switched to the
'concurrent' stuff that converted UB into exceptions thrown here, there, and
everywhere.

In summary, I saw only different flavors of broken software, and supposed
'architects' fighting to shuffle bugs around, instead of looking at the
fundamental issue/root cause: lack of MT design, and lack of even recognizing
that one is needed. After all, the language is 'safe' and the tools all have
'synchronized' or 'concurrent' in their names.
:-(((

C/C++ at least still put up some hurdles to infesting a program with
threads, and left some sanity.

Agreed that the original Java "thread-safe" containers were uselessly
slow from the synchronized methods which solved nothing. However, some
of the newer Java containers, which Scott mentions, such as
ConcurrentHashMap, do offer useful functionality, such as an atomic
"put entry in map if not present". I've made use of that function
several times.
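The atomic "put entry in map if not present" that Joshua describes can be
mimicked in C++ with an ordinary map guarded by a mutex. The class and method
names below are invented for illustration, and a real concurrent map (e.g.,
TBB's) would avoid the single global lock:

```cpp
#include <map>
#include <mutex>
#include <utility>

// Hypothetical sketch of an atomic "put if absent", analogous in spirit
// to Java's ConcurrentHashMap.putIfAbsent.
template <class Key, class Value>
class guarded_map {
public:
    // Returns true if the entry was inserted, false if the key was
    // already present. The check and the insert happen under one lock,
    // so no other thread can interleave between them.
    bool put_if_absent(const Key& k, const Value& v) {
        std::lock_guard<std::mutex> guard(mtx_);
        return map_.insert(std::make_pair(k, v)).second;
    }

private:
    std::mutex mtx_;
    std::map<Key, Value> map_;
};
```

The point is the combined operation, not the locking strategy: offering the
check and the insert as one call is what removes the race, whatever the
implementation underneath.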

A blocking queue for multiple producers and multiple consumers would
be nice too. IIRC, Java has several flavors here, such as one with a
capped size based on an array -- attempts to push to a full queue
block the producer -- and another with limitless size, so the
producer is never blocked.
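The array-based, capacity-capped flavor Joshua recalls (Java's
ArrayBlockingQueue) maps naturally onto a mutex plus two condition variables
with C++0x facilities. This is a hedged single-lock sketch with invented
names, not a tuned implementation:

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical bounded multi-producer/multi-consumer queue:
// push blocks while the queue is full, pop blocks while it is empty.
template <class T>
class bounded_queue {
public:
    explicit bounded_queue(std::size_t cap) : cap_(cap) {}

    void push(const T& item) {
        std::unique_lock<std::mutex> lock(mtx_);
        not_full_.wait(lock, [this] { return q_.size() < cap_; });
        q_.push_back(item);
        not_empty_.notify_one();   // wake one waiting consumer
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mtx_);
        not_empty_.wait(lock, [this] { return !q_.empty(); });
        T item = q_.front();
        q_.pop_front();
        not_full_.notify_one();    // wake one waiting producer
        return item;
    }

private:
    const std::size_t cap_;
    std::mutex mtx_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> q_;
};
```

The "limitless" flavor is the same sketch with the not_full_ wait removed.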
 
Balog Pal

Scott Meyers said:
But that's not the question. C++ offers the capabilities to write
anything I want, but if I want to use a container that's already written
and is likely correct, I can use any of the containers in the standard
library, as long as I don't care about the ability to concurrently modify
the container. Most of the time, it's just not worth my trouble to write
a container from scratch.

What I'm looking for now are portable, proven containers that do support
concurrent modification.

Please define 'correct' for the scope of your discussion.

I recall plenty of good articles on the topic of collections vs. MT -- or
rather on what is expected from a class for it to be called 'thread-safe' and
what is not. Not sure if it was Herb Sutter or someone else. With very
good examples of how a list or vector shall protect its internal state on
some operations (those the user can't realistically cover, or even recognize
as a possible problem) -- while still leaving the obligation to lock around
others, where the collection can't really guess the intent.

In real life, a critical section protects a set of data -- and that set, IMO,
too rarely matches what we call a 'container' in C++. It is most likely a
class, a few members of a class, a few unrelated objects, or some existing
elements in a container.

As a transaction must involve them together, anything a collection can
offer is more in the way than help. Generally.
 
Balog Pal

Joshua Maurice said:
However, some
of the newer Java containers, which Scott mentions, such as
ConcurrentHashMap, do offer useful functionality, such as an atomic
"put entry in map if not present". I've made use of that function
several times.

Sure, the progress with the collections is clear -- it's just that they are
still no silver bullet (obviously) and can't replace the missing design
elements.
A blocking queue for multiple consumers and multiple consumers would
be nice too.

This far it has not fit too well with the STL approach, as the current STL is
full of copy operations -- and my queues were more likely holding pointers,
with allocation/ownership responsibilities placed with different parties. I
was many times tempted to use some smart pointers there, especially
shared_ptr, then dropped the idea.

For other cases the payload supported swap, and that was used for good.

As move semantics enter, the picture may change here.

IIRC, Java has several flavors here, such as one with a
capped size based on an array - attempts to push to a full queue
blocked the producer, and another with a limitless size so that the
producer is never blocked.

See, the proliferation starts here. And in C++ you should add a policy for
move, and a policy for allocation/destruction (which could be different from
the %$#@#%$ allocator<> we carry around). How far have we gotten in practice
with customizable things in std so far? After all, we can create a locale and
imbue it in a stream, can't we? ;-o
 
Scott Meyers

Please define 'correct' for the scope of your discussion.

Correct means what it always means: the API does what it promises (e.g., all
invariants and postconditions are satisfied). For example, if I have a
ConcurrentSet that tells me I may concurrently perform multiple inserts, after
the concurrent inserts are done, I should find exactly one copy of each thing
that was inserted and zero copies of things that were not inserted.
As a transaction must involve them together, anything a collection can offer is
more in the way than help. Generally.

Concurrent containers don't claim to solve all concurrency problems, nor do they
promise that clients will never have to worry about concurrency issues
(including transactional issues such as you mention). They simply promise that
certain kinds of operations that would not normally be safe if performed
concurrently on a non-concurrent container are safe on the concurrent container.
So a concurrent queue (useful for producer/consumer systems) permits enqueue
and dequeue operations to proceed concurrently. A concurrent hash table
(potentially useful for, e.g., a cache shared across threads) might permit
concurrent insertions.

A *lot* of effort has gone into developing concurrent containers in Java and
.NET, as well as in C++ libraries such as TBB and PPL. To suggest that all that
effort has been misguided, because there are no common use cases for such
containers, is untenable.

Are there wrong ways to write concurrent containers? Sure. Did Java employ
them in its synchronized collections? It did. But neither of those
observations reflects on the fundamental utility of the idea of concurrent
containers. That question, IMO, has been settled for some time.

Scott
 
Balog Pal

Scott Meyers said:
Correct means what it always means: the API does what it promises (e.g.,
all invariants and postconditions are satisfied). For example, if I have
a ConcurrentSet that tells me I may concurrently perform multiple inserts,
after the concurrent inserts are done, I should find exactly one copy of
each thing that was inserted and zero copies of things that were not
inserted.


Concurrent containers don't claim to solve all concurrency problems, nor
do they promise that clients will never have to worry about concurrency
issues (including transactional issues such as you mention). They simply
promise that certain kinds of operations that would not normally be safe
if performed concurrently on a non-concurrent container are safe on the
concurrent container. So a concurrent queue (useful for producer/consumer
systems) permits enqueue and dequeue operations to proceed concurrently.
A concurrent hash table (potentially useful for, e.g., a cache shared
across threads) might permit concurrent insertions.

A *lot* of effort has gone into developing concurrent containers in Java
and .NET, as well as in C++ libraries such as TBB and PPL. To suggest
that all that effort has been misguided, because there are no common use
cases for such containers, is untenable.

Appears I compressed my thoughts too far and lost essential elements.
And possibly captured the scope of your question incorrectly too.

I was thinking in the scope of a (possible) C++ standard. The problem is
exactly what you emphasize above: a *lot* of effort was put into that field,
and the results are not so bright even in their native environment. But
IMO, even if the results were perfect in Java/C#, ported to the C++
environment they would not be good enough.

Especially compared to the effort of creating them by hand.

IMO people who can pick Java will do so. Those who use C++ do so for
reasons like needing control and performance.

Having some collection that can magically withstand all kinds of operations
from random threads looks really nice. Really. Given some time I'd jump
on it to analyze how it is done and applaud.

But I doubt I'd use such a generic solution in my app, because having random
threads keep poking a shared collection sounds like blasphemy. With MT
design we fight to limit sharing as much as possible, for good reasons:
sharing of objects, sharing of operations, etc. And using a proper sync op
around what remains shared is not a problem, especially since RAII was
invented to guard the mutex lock.

And going back to the C++ standard: the people creating it have (I guess) a
fraction of the resources that the owners of Java and C# have at their
disposal.

That is why I don't expect to see it there anytime soon (read: in my
lifetime). And I see the lack of a 'de facto standard' as having similar
causes.

The Java way is to have everything and the pussycat in the lib. The lib is
written by Sun -- or was, until yesteryear. And it is considered good enough.

C++ users have no such supplier, and on the other end are more picky, both
in the width and the limits of their expectations. After all, 'you don't pay
for what you don't use' is still in place. As I visualize the implementation
of a concurrent container, and its actual use in a real program, I'm
skeptical whether that principle can hold.

To put it another way: in my view, in the C++ environment the idea is a bad
tradeoff. But I do not claim my view to be some objective truth, nor shall
it be extrapolated outside its scope.

Boost covers so much stuff and has that fine selection of developers. And it
does have components for both threading and collections -- but not for
concurrent collections. My explanation of the "why" is that people there
also found it a poor tradeoff. I'm open to hearing any other explanation.


And rereading the initial post, your question was what is on the menu, not
why it is so thin -- sorry to kind of sidetrack it.
 
tni

A *lot* of effort has gone into developing concurrent containers in Java
and .NET, as well as in C++ libraries such as TBB and PPL. To suggest
that all that effort has been misguided, because there are no common use
cases for such containers, is untenable.

Are there wrong ways to write concurrent containers? Sure. Did Java
employ them in its synchronized collections? It did. But neither of
those observations reflects on the fundamental utility of the idea of
concurrent containers. That question, IMO, has been settled for some time.

Really? There are a couple of useful concurrent containers: messaging
infrastructure (queues) and maybe maps for certain applications. Beyond
that, the value rapidly diminishes.

If the container interface doesn't allow you to perform the transactions
you need, it's useless. Very often, more than a single container is
going to be involved in a transaction.

What are those magic containers of general value that you are talking about?

\\

In general, a mutex should be cheap: if you look at Linux Pthreads or Qt, a
lock/unlock will be a single atomic operation in the uncontended case. For
most applications, a concurrent container won't do better than just using a
standard container with a mutex (even if it's some fancy lock-free whatever).
With RAII, exception safety for the locking operation can be had trivially.
 
Scott Meyers

Yes, sometimes such containers are exactly what is needed. However, given
that one can in about 20 lines create a template class which is able to
wrap any non-concurrent class and expose a locking proxy pointer for safe
concurrent access, this is not so difficult to achieve.

Can you elaborate on what you mean here? Suppose I want to take a std::map or
std::unordered_map (C++0x) and make it safe for multiple threads to
simultaneously perform insertions. What will a "locking proxy pointer" do to
enable this behavior?

Scott
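For reference, the locking-proxy technique alluded to in the quoted text is
usually implemented along the following lines (a hedged sketch with invented
names): operator-> returns a temporary whose constructor takes a lock and
whose destructor releases it, so each individual member call through the
wrapper is serialized:

```cpp
#include <map>
#include <mutex>
#include <utility>

// Hypothetical locking-proxy wrapper. Each wrapped->f() call holds the
// lock only for that one call -- which is exactly why it cannot make
// compound operations (check-then-act) atomic, as discussed elsewhere
// in this thread.
template <class T>
class locking_ptr {
public:
    class proxy {
    public:
        proxy(T& obj, std::mutex& m) : obj_(obj), lock_(m) {}
        T* operator->() { return &obj_; }  // lock_ held until proxy dies
    private:
        T& obj_;
        std::unique_lock<std::mutex> lock_;
    };

    explicit locking_ptr(T& obj) : obj_(obj) {}
    proxy operator->() { return proxy(obj_, mtx_); }  // temporary locks
private:
    T& obj_;
    std::mutex mtx_;
};

// Usage: std::map<int, int> m;
//        locking_ptr<std::map<int, int> > safe(m);
//        safe->insert(std::make_pair(1, 2));  // insert runs under the lock
```

This does make individual insertions safe, which answers the letter of the
question; whether per-call locking scales is, as the rest of the thread
argues, a separate matter.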
 
Balog Pal

Scott Meyers said:
Correct means what it always means: the API does what it promises (e.g.,
all invariants and postconditions are satisfied). For example, if I have
a ConcurrentSet that tells me I may concurrently perform multiple inserts,
after the concurrent inserts are done, I should find exactly one copy of
each thing that was inserted and zero copies of things that were not
inserted.

So, if we had a ConcurrentQueue, it would not really fit its most likely MT
use case -- which includes alerting the consumer on produce...

If you don't mind sidetracking, could you make up a real use case for the
example set you described above? Anything I can think of keeps ringing
'race condition'. If I leave the set prey to threads, how can I learn that
the postcondition/invariant actually holds? How do I know something is in
there?

Possibly my mind is too petrified to abandon 'you shall lock at least until
the if() part finishes, but more likely until the whole conditional'.

Or is it part of the supposed API to invoke the equivalent of Java's
synchronized(mySet) { }?
Concurrent containers don't claim to solve all concurrency problems, nor
do they promise that clients will never have to worry about concurrency
issues (including transactional issues such as you mention).

That sounds like pair<mutex, somecollection> where I can skip locking the
mutex some 30% of the time.

No pun intended; I just try to match the thing against the good advice 'make
interfaces easy to use correctly and hard to use incorrectly'.

Practice shows just too many incorrect uses -- so the main defense is to
avoid such situations altogether.
They simply promise that certain kinds of operations that would not
normally be safe if performed concurrently on a non-concurrent container
are safe on the concurrent container. So a concurrent queue (useful for
producer/consumer systems) permits enqueue and dequeue operations to
proceed concurrently.

As mentioned above, that is nice but redundant in most actual uses, where you
couple in some cond/event.

My designer's guess would be that the demand is not for a concurrent queue as
a generic container, but for an X-producer/Y-consumer message queue, covering
those very use cases.
A concurrent hash table (potentially useful for, e.g., a cache shared
across threads) might permit concurrent insertions.

Try to imagine this interface. It could work if:
- there is no item invalidation (though delete is allowed);
- fetch makes a copy or uses refcounting (or no delete is allowed at all,
making it one-way).

Don't you feel it is a fragile element?
 
Scott Meyers

I think this is the wrong level for design. It's gotten lost over the years,
but containers should typically be implementation details, not high-level
design objects. If you need an object that can be used from multiple threads
to store objects and search for an object, its interface might look like this:

template <class Ty>
class data_table
{
public:
    void store(const Ty&);
    const Ty& find(whatever);
private:
    // none of your business! (but see below)
};

The implementation might well be a hash table, but the public interface is far
smaller than the interface for a hash table, and it's properly synchronized for
the operations that it provides.
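One hedged way to flesh that out -- assuming a mutex-guarded hash table
inside, a string key, and find returning a copy (handing out a reference into
a concurrently modified table would be unsafe); none of these choices come
from the original sketch:

```cpp
#include <mutex>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Illustrative filling-in of the data_table idea: the public interface
// stays tiny, and the synchronization is an internal detail.
template <class Ty>
class data_table {
public:
    void store(const std::string& key, const Ty& value) {
        std::lock_guard<std::mutex> guard(mtx_);
        table_[key] = value;
    }

    // Returns a copy rather than a reference: a reference into the
    // table could be read unsynchronized after the lock is released.
    Ty find(const std::string& key) const {
        std::lock_guard<std::mutex> guard(mtx_);
        typename std::unordered_map<std::string, Ty>::const_iterator it =
            table_.find(key);
        if (it == table_.end())
            throw std::out_of_range("data_table: key not found");
        return it->second;
    }

private:
    mutable std::mutex mtx_;
    std::unordered_map<std::string, Ty> table_;
};
```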

It's too late to change now, I think, but in retrospect I should have used a
subject of "Concurrent Data Structures" for this thread, because the term
"containers" suggests to many people all the usual STL API stuff, and I'm not
married to that API. In fact I'd be surprised if concurrent containers can
offer the same API. Java's ConcurrentHashMap, for example, has a size method
that returns only an approximation of the number of elements in the map,
because, as Brian Goetz puts it in "Java Concurrency in Practice," "the result
of size could be out of date by the time it is computed."

What I'm really interested in is the availability of
already-implemented-and-proven commonly-useful concurrency-friendly data
structure types for C++. One that's been mentioned by more than one person
(other than me) in this thread is a queue, and it's probably no coincidence that
both TBB and PPL offer concurrent_queue. They also both offer concurrent_vector
and (apparently) concurrent_unordered_map, which suggests to me that uses for
those types are not particularly difficult to come by.

Scott
 
Scott Meyers

[...] don't sound like a good idea. Thread safety has to be designed in at the
application level; there's no set of tools that will make a bad design safe.

Clearly. My impression is that the primary goal of concurrent data structures
isn't thread safety per se, but rather thread safety (for some defined set of
presumably useful operations) that scales well. Locking an entire data
structure to operate on only a part of it does not scale well.

Scott
 
Pavel

Scott said:
Although C++0x includes support for concurrency, it offers no standard
containers that support simultaneous readers and writers. Java and .NET
developers have them as part of their standard libraries (e.g., Java's
ConcurrentHashMap and ConcurrentLinkedQueue, .NET's ConcurrentQueue and
ConcurrentBag), but, from what I can tell, there don't even seem to be
any de facto standard concurrent containers for C++ developers. For
example, Boost doesn't seem to offer any.

The closest thing I can find to a portable cross-platform API for
concurrent containers is the concurrent_vector and concurrent_queue offered
independently by Intel via TBB and Microsoft via PPL, where both
companies have pledged to offer "identical concurrent STL container
solutions" (per http://tinyurl.com/2cgr22p). (That same page suggests
that Intel and MS both also offer concurrent_unordered_map, but that
container is not listed at the PPL web site
(http://msdn.microsoft.com/en-us/library/dd504906.aspx).)

If I'm a cross-platform C++ developer who wants to take advantage of the
work others have done to develop, debug, and tune a concurrent container
(i.e., I don't want to write and maintain my own), what are my choices?
Obviously, the container I choose will depend on the problem I'm trying
to solve, but what's on the pre-built cross-platform concurrent
container menu?

Thanks,

Scott

If history is any guidance:

The very first version of Java (1.0.x) had all containers thread-safe
(java.util.Vector and java.util.Hashtable are probably the best-known.
They are still around). It was immediately noticed that canned
"synchronized" containers can't be used efficiently except for very
simple problems (mainly tutorial examples). The state for a real-world
problem generally consists of more than one standard-library object
(say, a Vector *and a* Hashtable, or a Vector and an int), and the whole
state has to be thread-safe. As soon as the problem grows to this
(admittedly modest) size, the thread-safety of the underlying parts becomes
a pure liability (in both performance and lost opportunities to design a
more convenient but non-thread-safe API).

Java creators learned the lesson fast. The second generation of
collection classes was all thread-unsafe and much faster. To hedge their
bets, they provided synchronized adapters for every container interface,
which client code could create, for example, as follows:

List list = Collections.synchronizedList(new ArrayList(...));
// here, ArrayList is an analog of Vector, but thread-unsafe, and the
// resulting list has a thread-safe List API

I wrote tons of Java programs, most of them multi-threaded, and have never
had a good cause to use one of these adapters (a couple of times I tried, as
a quick hack and just for the sake of it, and paid dearly for my incomplete
analysis). Other Java people whom I know agree.

Standardizing inter-thread communication mechanisms may require a
"thread-safe" container class (usually a kind of a queue). In my
experience, however, any such standardized mechanism required a slightly
different queue. Some questions to answer:
- whether you want to enqueue/dequeue single objects or batches, and how to
delineate batches;
- how to mix in system-specific asynchronous sinks and sources like file
descriptors, middleware, databases, IPC structures;
- whether you want to serialize or copy your objects;
- whether you want to pre-allocate "slots" on the queue and serialize right
in and out, or copy objects as-is, or store pointers -- and which thread
frees slot memory;
- whether you can take advantage of a single producer and/or consumer, or
have multiple producers/consumers;
- whether you are after high throughput, low average latency, or real-time;
etc.

An answer to any of the above questions will affect the optimal design of
the container (both API and implementation). And if you want threads at all,
performance is probably important to you, so you will not want to leave a
single stone unturned.

To summarize: thread-safety is best applied at the highest possible level of
application design. Memory should be shared in a controlled number of
structures, and these are to be selected/designed based on the concrete
application's requirements. The idea of standard thread-safe containers
failed before, and we know the reasons. I do not see what has changed since
then that would let it succeed if brought up again now.

-Pavel
 
Balog Pal

Scott Meyers said:
Java's ConcurrentHashMap, for example, has a size method that returns only
an approximation of the number of elements in the map, because, as Brian
Goetz puts it in "Java Concurrency in Practice," "the result of size could
be out of date by the time it is computed."

Ah, Bingo!!!

My previous question on "define correct in the context" was supposed to
bring out something like this.

My normal definition of a correct size() for a container is that it returns
*the* number. That is NOT delivered by the concurrent one. It fairly says so,
meeting what you answered a little above. But IMO this does not really meet
what programmers would "want" or expect in a real situation, rendering the
interface either ballast or a landmine in too many potential situations.

What I'm really interested in is the availability of
already-implemented-and-proven commonly-useful concurrency-friendly data
structure types for C++.

For this, as C++ gains native threading and move semantics (hopefully soon),
I expect such libraries/tools to finally emerge in the next few years.
 
Scott Meyers

Possibly my mind is too petrified to abandon 'you shall lock at least until
the if() part finishes, but more likely until the whole conditional'.

I don't know about petrified, but I think that (1) you are assuming that the API
for a concurrent data structure will look like that for the current STL data
structures (which, as I posted earlier, is perhaps my fault for wording my
subject line poorly; I've modified it above) and (2) you are assuming that the
kinds of operations and combinations of operations that are common on existing
STL containers (i.e., on concurrency-unfriendly data structures) will also be
common on concurrency-friendly data structures.

As regards API, the TBB API for concurrent_queue includes a try_pop function,
so, as the documentation puts it, what you might write like this with an STL queue,

bool b = !q.empty();
if (b) {
    x = q.front();
    q.pop();
}

you write like this with a concurrent_queue:

bool b = q.try_pop(x); // x holds the popped value if there was an element to pop

So your "you shall lock at least till the if() part finishes" advice falls by
the wayside, because there is no if() part. The if and pop are an atomic
combination.
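The essential property -- that the emptiness check and the pop form one
atomic step -- can be reproduced over a plain std::queue by holding one lock
across both. This is an illustrative stand-in for the semantics only, not for
TBB's actual (finer-grained) implementation:

```cpp
#include <mutex>
#include <queue>

// Sketch of try_pop semantics: empty() and pop() happen under one lock,
// so no other thread can drain the queue between them -- unlike the
// separate empty()/front()/pop() calls on a bare STL queue.
template <class T>
class try_pop_queue {
public:
    void push(const T& item) {
        std::lock_guard<std::mutex> guard(mtx_);
        q_.push(item);
    }

    bool try_pop(T& x) {
        std::lock_guard<std::mutex> guard(mtx_);
        if (q_.empty())
            return false;          // nothing to pop; x is untouched
        x = q_.front();
        q_.pop();
        return true;
    }

private:
    std::mutex mtx_;
    std::queue<T> q_;
};
```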

For an example application of a concurrent hash_map, take a look at
http://www.devx.com/cplus/Article/33334/1954, where an engineer from Intel (in
an article designed to make TBB scalability look good) describes how a
concurrent hash map can be used to allow multiple threads to process different
parts of a large text simultaneously. Each thread performs only inserts, in
contrast to the combination of inserts, lookups, and erasures that you seem to
think are always present. (The article doesn't mention it, but once the inserts
are finished, the application could presumably use an external mutex to restrict
access to the data structure if threads needed to modify it.)

Scott
 
Scott Meyers

Here is a simple sketch; this can be enhanced, no doubt. It essentially
locks the whole container during the operation. As stressed by numerous
other replies, this locking scope is often either too narrow or too
broad, and as such this solution does not buy you much.

Nor does it scale, which is the primary motivation for concurrent data structures.

Scott
 
Scott Meyers

It depends on what it is used for.

I don't think it does. Either something scales or it does not. What you're
saying is that some applications have no need for something that scales, in
which case a scalable design is immaterial. That's true, but orthogonal. If
you lock an entire data structure that's accessed by multiple threads, access to
that data structure will eventually become a bottleneck if you throw enough
threads at it. (The programming community has enough experience with such
designs for me to state this as a fact.) You may have an application where
there are never enough threads to reach the bottleneck, in which case an
unscalable design is fine. But the fact that it's fine for your application
does not make it scalable.
> If the contention is low, then there is no need to build something more
> fancy.

True. But orthogonal to my comment about the scalability of a design based on
locking an entire data structure for each access.
> On the other hand, if the contention is high, then it might well be a
> design mistake to have such a global data structure present.

Also true. Also orthogonal. Having a scalable data structure available does
not mean it is necessarily the best data structure for your application. But
the fact that it is not the best data structure for a particular application
does not mean that the data structure is not scalable.
> Even if it can be done 10 times faster by using some dedicated concurrent
> data structure, it still scales only 10 times better.

A design that runs 10 times faster isn't necessarily more scalable, it's just
faster. For the case we are discussing, scalability is determined by whether
performance can be maintained as the number of threads using the data structure
for some fixed mix of operations increases. If data structure A can handle 10
threads before performance drops off and data structure B can handle 100
threads before its performance drops off, B is 10 times more scalable than A.
If B merely runs 10 times faster than A, but both have performance drop-offs
starting at the same number of threads, neither is more scalable than the other.

Scott
 
B

Balog Pal

Scott Meyers said:
I don't know about petrified, but I think that (1) you are assuming that
the API for a concurrent data structure will look like that for the
current STL data structures (which, as I posted earlier, is perhaps my
fault for wording my subject line poorly; I've modified it above) and (2)
you are assuming that the kinds of operations and combinations of
operations that are common on existing STL containers (i.e., on
concurrency-unfriendly data structures) will also be common on
concurrency-friendly data structures.

Probably so. :) As I mentioned earlier, STL's interfaces are especially
hostile to these uses.

But even the usual interface for 'collection'-like stuff -- and even for
simple aggregates -- is bad for the general case.

The examples you gave are okay: mutating operations like insert are mostly
fine, and may turn out useful here and there. The problem is with read
operations, which include all kinds of data retrieval, direct or indirect,
from a non-const object. The container has no chance to cover those up.
What is worse, my experience is that too many colleagues believe that making
a thing thread-safe means protecting concurrent writes against each other,
and that reads can go without a lock. (If you recall the mass discussion of
the double-checked locking nightmare, that is just the surface.)

Misleading names don't help either. In another post you quoted the docs on
.size(). It is unfortunately not called approx_size() or outdated_size()
or bogus_size(), which would reflect the semantics, but just size(). That
normally reports, well, the count of elements -- firmly and correctly, just
like its twin IsEmpty() [empty() in STL :-(]. For a 'concurrent' thing it
reports bogus info that we in turn use to make decisions. Not good.
> As regards API, the TBB API for concurrent_queue includes a try_pop
> function, so, as the documentation puts it, what you might write like this
> with an STL queue,
>
> bool b = !q.empty();
> if (b) {
>     x = q.front();
>     q.pop();
> }
>
> you write like this with a concurrent_queue:
>
> bool b = q.try_pop(x); // x holds the popped value if there was an
>                        // element to pop

Yeah, this looks much better. (Actually at least one of my async_queue
classes has this same interface.)

But if I started fresh to create a multiproducer/single-consumer queue for
the usual interthread messaging, it would not have pop at all: only push
for the producers, plus a functor that acts as the consumer and gets called
whenever there is stuff, receiving the already-extracted message.

Not looking like a 'collection' at all, but rather like a
manager/dispatcher/...
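A push-plus-callback design of that kind might look something like the sketch
below (all names are invented for illustration, not Balog's actual classes):
producers see only push, and a single dedicated thread hands each message to
the consumer functor, so there is no pop in the public interface at all.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Sketch of a multiproducer/single-consumer dispatcher: producers only
// push; one worker thread invokes the consumer functor per message.
class dispatcher {
public:
    explicit dispatcher(std::function<void(std::string)> consume)
        : consume_(consume), done_(false),
          worker_(&dispatcher::run, this) {}

    ~dispatcher() {  // drains remaining messages, then joins the worker
        {
            std::lock_guard<std::mutex> guard(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    void push(const std::string& msg) {  // producers call only this
        {
            std::lock_guard<std::mutex> guard(m_);
            q_.push(msg);
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !q_.empty(); });
            while (!q_.empty()) {
                std::string msg = q_.front();
                q_.pop();
                lock.unlock();   // call the functor without holding the lock
                consume_(msg);
                lock.lock();
            }
            if (done_) return;
        }
    }

    std::function<void(std::string)> consume_;
    bool done_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    std::thread worker_;  // declared last so the other members exist first
};
```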
> So your "you shall lock at least till the if() part finishes" advice falls
> by the wayside, because there is no if() part.

Good; the point is that the interface must not have a single function that
can be used that way unsafely.
> The if and pop are an atomic combination.

Yeah. Though it still misses the signaling that is most often coupled with
it, and if you have to add that by hand, there will (too likely) be excess
locks and context switches.

(And my other implementations do not bother with pop -- but create/pick an
empty queue and swap it in after waking on the signal -- then process the
grabbed content at leisure.)
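The swap trick keeps the critical section tiny: the consumer exchanges the
shared queue for an empty one under the lock, then processes the grabbed
batch with no lock held at all. A sketch, with invented names:

```cpp
#include <deque>
#include <mutex>
#include <string>

// Sketch of the "grab the whole queue" consumer pattern: the lock is
// held only for an O(1) swap, then the batch is processed at leisure.
class batch_queue {
public:
    void push(const std::string& msg) {
        std::lock_guard<std::mutex> guard(m_);
        q_.push_back(msg);
    }

    // Consumer: atomically take everything queued so far.
    std::deque<std::string> grab_all() {
        std::deque<std::string> batch;
        {
            std::lock_guard<std::mutex> guard(m_);
            batch.swap(q_);  // producers are blocked only for the swap
        }
        return batch;        // process this with no lock held
    }

private:
    std::mutex m_;
    std::deque<std::string> q_;
};
```

Producers never wait behind a slow consumer, because the consumer does its
per-message work entirely outside the lock.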
> For an example application of a concurrent hash_map, take a look at
> http://www.devx.com/cplus/Article/33334/1954, where an engineer from Intel
> (in an article designed to make TBB scalability look good) describes how a
> concurrent hash map can be used to allow multiple threads to process
> different parts of a large text simultaneously. Each thread performs only
> inserts, in contrast to the combination of inserts, lookups, and erasures
> that you seem to think are always present.

Err, I surely do not think they are always present. ;-) When I design an
app, I carefully consider what operations happen where, document it, and do
the locking (and other operations) accordingly. That normally makes the
solution *not* general and not "reusable". Certainly that is not a
requirement, and the implementation itself takes a negligible share of the
time -- the hard part is the design itself.

The hash_map you refer to is supposedly a reusable component, from which we
expect an easily discoverable, reasonably safe interface. I hope we agree
that intuitive names for classes and functions are vital. From something
called concurrent_hash_map I'd probably not expect to be limited to such
asymmetric use. And a half-page description of limitations and intended
uses is too rarely expressible in names alone.

Also, if the component covers my MT work only halfway -- if I have to use
explicit synchronization incantations for at least some operations -- then
the gain on the other operations is moot.

Also, if the object does locking whose points of acquisition are not
evident, it is easy to create deadlock situations; you only need another
mutex, say from another such hash_map.
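That deadlock hazard arises whenever an operation touching one internally
locked object also needs to touch another. One standard remedy, sketched
below with invented names, is to acquire both mutexes together with
std::lock (which uses a deadlock-avoidance algorithm) instead of nesting
the objects' own locking:

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// Two internally locked maps. If thread 1 runs transfer(a, b, k) while
// thread 2 runs transfer(b, a, k), naive nested locking can deadlock;
// std::lock acquires both mutexes without risking that.
struct locked_map {
    std::mutex m;
    std::unordered_map<std::string, int> data;
};

void transfer(locked_map& from, locked_map& to, const std::string& key) {
    std::unique_lock<std::mutex> l1(from.m, std::defer_lock);
    std::unique_lock<std::mutex> l2(to.m, std::defer_lock);
    std::lock(l1, l2);  // all-or-nothing acquisition of both locks
    to.data[key] += from.data[key];
    from.data.erase(key);
}
```

A container that hides its locking gives you no way to apply such an
ordering discipline from the outside, which is Balog's point.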
 
