Concurrent bidirectional one-to-many map?

Sebastian

Does anyone know of a concurrent bidirectional one-to-many
map implementation?

By bidirectional I mean that I can look up keys by values, by
one-to-many I mean that the value end of the map is a list or
set, and by concurrent I mean that I do not need to synchronize
externally and get performance comparable to that of
java.util.concurrent.ConcurrentHashMap in both directions.

If this beast doesn't exist, how would I go about inventing it?
I must say that I am by no means any sort of concurrency guru...

-- Sebastian
 
markspace

Does anyone know of a concurrent bidirectional one-to-many
map implementation?

By bidirectional I mean that I can look up keys by values, by
one-to-many I mean that the value end of the map is a list or
set, and by concurrent I mean that I do not need to synchronize
externally and get performance comparable to that of
java.util.concurrent.ConcurrentHashMap in both directions.

If this beast doesn't exist, how would I go about inventing it?
I must say that I am by no means any sort of concurrency guru...

-- Sebastian


Can you just put both keys and values in as the key? Something like:

Map<Key, Key> myMap = new ConcurrentHashMap<Key, Key>();
// add key k and value v
Key holder = new Key( k, v, Key.KeyValue.KEY );
myMap.put( holder, holder );
holder = new Key( v, k, Key.KeyValue.VALUE );
myMap.put( holder, holder );
...

public class Key<K,V> {
    public enum KeyValue { KEY, VALUE }

    private K key;
    private V value;
    private KeyValue keyOrValue;

    public Key( K key, V value, KeyValue keyOrValue ) {
        this.key = key;
        this.value = value;
        this.keyOrValue = keyOrValue;
    }
    // must also override hashCode() and equals()...
}


This way when you add something, it gets added as both key and value.
You might want to override "put" to do that automatically. When you
look up either a key or a value, you'll find both.

There may be better ways of doing this, it's just the first thing that I
thought of. Also the code above is meant more as a sketch than a
rigorous code example.
 
Sebastian

On 06.05.2011 22:45, markspace wrote:
Can you just put both keys and values in as the key? Something like:

Map<Key, Key> myMap = new ConcurrentHashMap<Key, Key>();
// add key k and value v
Key holder = new Key( k, v, Key.KeyValue.KEY );
myMap.put( holder, holder );
holder = new Key( v, k, Key.KeyValue.VALUE );
myMap.put( holder, holder );
...

public class Key<K,V> {
    public enum KeyValue { KEY, VALUE }

    private K key;
    private V value;
    private KeyValue keyOrValue;

    public Key( K key, V value, KeyValue keyOrValue ) {
        this.key = key;
        this.value = value;
        this.keyOrValue = keyOrValue;
    }
    // must also override hashCode() and equals()...
}


This way when you add something, it gets added as both key and value.
You might want to override "put" to do that automatically. When you look
up either a key or a value, you'll find both.

There may be better ways of doing this, it's just the first thing that I
thought of. Also the code above is meant more as a sketch than a
rigorous code example.

Thanks for the idea. I do not yet see how to deal with the
one-to-many aspect of my problem.

To give an example, I'm trying to solve a problem like this:
Associate tasks with workspaces, where a workspace may hold many
tasks, but a task may be associated with at most one workspace.

Idea:

private ConcurrentMap<Item, Workspace> wsMap =
new ConcurrentHashMap<Item, Workspace>();

public boolean isAssigned( Item item ) {
return wsMap.containsKey( item );
}

public void assign( Item item, Workspace ws ) {
wsMap.putIfAbsent( item, ws );
if( !ws.equals( wsMap.get( item) ) ) {
throw new NotAssignableException();
}
}


Now I want to be able to close a workspace, releasing all tasks
to be assignable again to other workspaces.

public void closeWorkspace( Workspace ws ) {
// how to do this efficiently? iterate over all
// entries (holding a write lock) and remove it
// when the value equals ws?
}

-- Sebastian
 
Roedy Green

If this beast doesn't exist, how would I go about inventing it?
I must say that I am by no means any sort of concurrency guru...

You need a map that looks up another collection, e.g. ArrayList.
You could even use an Array.
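A minimal sketch of that shape (class and method names here are only
illustrative, not from any library), using a concurrent set rather than an
ArrayList so the inner collection needs no extra locking. It covers the
forward, one-to-many direction only; it says nothing yet about the reverse
lookup or about keeping both directions consistent.

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Forward direction only: each key maps to a thread-safe set of values.
public class OneToManyMap<K, V> {

    private final ConcurrentMap<K, Set<V>> map =
        new ConcurrentHashMap<K, Set<V>>();

    public void put( K key, V value ) {
        Set<V> values = map.get( key );
        if( values == null ) {
            Set<V> fresh =
                Collections.newSetFromMap( new ConcurrentHashMap<V, Boolean>() );
            Set<V> existing = map.putIfAbsent( key, fresh );
            values = (existing != null) ? existing : fresh;
        }
        values.add( value );
    }

    public Set<V> get( K key ) {
        Set<V> values = map.get( key );
        return values != null ? values : Collections.<V>emptySet();
    }
}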

--
Roedy Green Canadian Mind Products
http://mindprod.com
Politicians complain that Kindles and iBooks are killing jobs by
destroying the paper book industry. I see it that they have created a way
to produce books for less than a third of the cost without destroying forests
and emitting greenhouse gases in the process. They have created wealth.
They are encouraging literacy and cutting the costs of education.
 
Lew

Sebastian said:
markspace wrote:

You want Apache Commons Collections, BidiMap, perhaps.
Thanks for the idea. I do not yet see how to deal with the
one-to-many aspect of my problem.

To give an example, I'm trying to solve a problem like this:
Associate tasks with workspaces, where a workspace may hold many
tasks, but a task may be associated with at most one workspace.

Idea:

private ConcurrentMap<Item, Workspace> wsMap =
new ConcurrentHashMap<Item, Workspace>();

public boolean isAssigned( Item item ) {
return wsMap.containsKey( item );
}

public void assign( Item item, Workspace ws ) {
wsMap.putIfAbsent( item, ws );
if( !ws.equals( wsMap.get( item) ) ) {
throw new NotAssignableException();
}
}


Now I want to be able to close a workspace, releasing all tasks
to be assignable again to other workspaces.

public void closeWorkspace( Workspace ws ) {
// how to do this efficiently? iterate over all
// entries (holding a write lock) and remove it
// when the value equals ws?
}

You're closing workspaces, so presumably you're at the end of some sort of
session. How efficient does this have to be? What else is using this
'Workspace' instance at the same time? If something else is trying to access
the same 'Workspace' instance at the same time, isn't that already too bad
because you're closing the 'Workspace'? SO - just hold the lock (one you use
explicitly and for all operations on the same instance) until the method is done.

You use the word "efficiently" as if it means anything here, but you have not
thought about that meaning in context. On the face of it, it looks like you
need nothing at all to do with "efficiency".

I'm not so sure that 'ConcurrentMap' is the way to go here. You need to use
explicit locks based on what I see here.
 
markspace

Now I want to be able to close a workspace, releasing all tasks
to be assignable again to other workspaces.

public void closeWorkspace( Workspace ws ) {
// how to do this efficiently? iterate over all
// entries (holding a write lock) and remove it
// when the value equals ws?
}


Fundamentally, I don't see any solution other than Patricia's. And I
think you'll need to hold locks for atomicity over all structures
anyway: I can't think of a different way of doing it. Unless your
problem can somehow tolerate finding a task as its workspace is being
closed, you'll have to guarantee that all operations complete before the
next one is processed.

It really does depend on what "efficiently" means here. I'd just use
locks and let the JVM optimize it. It's pretty good at that.
 
Lew

You want Apache Commons Collections, BidiMap, perhaps.

Nope, sorry, that won't work after all.
"This map [org.apache.commons.collections.BidiMap] enforces the restriction
that there is a 1:1 relation between keys and values, meaning that multiple
keys cannot map to the same value."

Patricia's suggestion is both straightforward and standard.
 
Sebastian

On 07.05.2011 18:49, Lew wrote:
You want Apache Commons Collections, BidiMap, perhaps.

Nope, sorry, that won't work after all.
"This map [org.apache.commons.collections.BidiMap] enforces the
restriction that there is a 1:1 relation between keys and values,
meaning that multiple keys cannot map to the same value."

Patricia's suggestion is both straightforward and standard.

Thanks everyone for your comments. I'll follow Patricia's advice.
-- Sebastian
 
Tom Anderson

On 5/7/2011 2:43 AM, Sebastian wrote:
...
...

I'd deal with that sort of problem by having a custom data structure
that uses java.util structures in its implementation.

That's an eminently sensible course of action.

The question then is what structures does it use, and how? Sebastian was
quite specific about the behaviour he needs from this custom structure;
simply telling him he should use standard structures to build it is not
that much more helpful than telling him he should use classes and methods
to build it.

I say this because this is not - that i can see - one of those cases where
the solution is a simple matter of combining existing parts. The central
problem is actually quite a tricky one; managing a bidirectional mapping
is one thing, doing it in a threadsafe concurrent manner is not that much
harder, nor is making it one-to-many, but i think the combination of
efficient thread safety and one-to-manyness is actually pretty tricky. You
can make it work, and work safely, by using a coarse-grained lock over a
suitable agglomeration of data structures (a map of keys to sets of
values, plus a backwards map of values to keys, say), but that has very
poor concurrency.
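
For what it's worth, a minimal sketch of that coarse-grained variant (all
names invented for illustration): every operation takes the same lock, so it
is safe and simple, but all access is serialized.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Coarse-grained, thread-safe bidirectional one-to-many map. Correct but
// with poor concurrency: every operation synchronizes on the same monitor.
public class CoarseBiMultiMap<K, V> {

    private final Map<K, Set<V>> forward = new HashMap<K, Set<V>>();
    private final Map<V, K> backward = new HashMap<V, K>();

    public synchronized void put( K key, V value ) {
        K previous = backward.put( value, key );
        if( previous != null && !previous.equals( key ) ) {
            forward.get( previous ).remove( value ); // value moved to a new key
        }
        Set<V> values = forward.get( key );
        if( values == null ) {
            values = new HashSet<V>();
            forward.put( key, values );
        }
        values.add( value );
    }

    public synchronized Set<V> getValues( K key ) {
        Set<V> values = forward.get( key );
        return values == null ? new HashSet<V>() : new HashSet<V>( values );
    }

    public synchronized K getKey( V value ) {
        return backward.get( value );
    }

    // e.g. closing a workspace: drop the key and all its values at once
    public synchronized Set<V> removeKey( K key ) {
        Set<V> removed = forward.remove( key );
        if( removed == null ) {
            return new HashSet<V>();
        }
        for( V value : removed ) {
            backward.remove( value );
        }
        return removed;
    }
}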

It's not obvious to me how to make it work correctly with good
concurrency; the ConcurrentHashMap approach is to stripe the locks, so as
to partition the hashtable into independently-locked domains, but to use
the same set of stripes for the reverse mapping, you would need to be able
to tell which stripe a value belongs to - and that can only be done by
looking it up in the very reverse mapping you are worrying about locking!
Can you have a separate set of stripes for the reverse mapping? Do you
need some multi-phase approach, where you determine the stripes without
locking, then acquire the locks to do the actual lookup or insertion? Is
there some lock-free voodoo which could be used? Is there any mileage in
using a union-find structure to collapse the sets of values into a single
representative which could participate in a 1:1 key-value mapping? I am
certainly not any sort of data structure guru, but i don't have answers to
any of those questions. That makes it a problem worth discussing here, in
my book. Does anyone have any ideas on how to do this?

tom
 
Robert Klemme

That's an eminently sensible course of action.

The question then is what structures does it use, and how? Sebastian was
quite specific about the behaviour he needs from this custom structure;

Actually I am missing more information about access patterns, and maybe
an estimate of how big the "to-many" side is expected to be (at least
on average).
simply telling him he should use standard structures to build it is not
that much more helpful than telling him he should use classes and methods
to build it.

I say this because this is not - that i can see - one of those cases where
the solution is a simple matter of combining existing parts. The central
problem is actually quite a tricky one; managing a bidirectional mapping
is one thing, doing it in a threadsafe concurrent manner is not that much
harder, nor is making it one-to-many, but i think the combination of
efficient thread safety and one-to-manyness is actually pretty tricky. You
can make it work, and work safely, by using a coarse-grained lock over a
suitable agglomeration of data structures (a map of keys to sets of
values, plus a backwards map of values to keys, say), but that has very
poor concurrency.

I'd be cautious with this statement: as far as I can see we do not
know the access patterns. If reads far outweigh writes, then read-write
locking with a global lock works perfectly and has the added advantage
of being quite simple to do, as you mentioned already.
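
A sketch of that read-write variant (again with made-up names): the plain,
unsynchronized map is wrapped so that lookups share the read lock and only
mutations take the write lock. Only the locking pattern is shown here; the
guarded structure could just as well be the two-map combination mentioned
earlier.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Read-mostly variant: many threads may look up concurrently, writers are
// exclusive. The wrapped map needs no synchronization of its own.
public class ReadMostlyMapping<K, V> {

    private final Map<V, K> keyByValue = new HashMap<V, K>();
    private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

    public K keyOf( V value ) {
        rwLock.readLock().lock();
        try {
            return keyByValue.get( value );
        }
        finally {
            rwLock.readLock().unlock();
        }
    }

    public void assign( K key, V value ) {
        rwLock.writeLock().lock();
        try {
            keyByValue.put( value, key );
        }
        finally {
            rwLock.writeLock().unlock();
        }
    }
}
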
It's not obvious to me how to make it work correctly with good
concurrency; the ConcurrentHashMap approach is to stripe the locks, so as
to partition the hashtable into independently-locked domains, but to use
the same set of stripes for the reverse mapping, you would need to be able
to tell which stripe a value belongs to - and that can only be done by
looking it up in the very reverse mapping you are worrying about locking!
Can you have a separate set of stripes for the reverse mapping? Do you
need some multi-phase approach, where you determine the stripes without
locking, then acquire the locks to do the actual lookup or insertion? Is
there some lock-free voodoo which could be used? Is there any mileage in
using a union-find structure to collapse the sets of values into a single
representative which could participate in a 1:1 key-value mapping? I am
certainly not any sort of data structure guru, but i don't have answers to
any of those questions. That makes it a problem worth discussing here, in
my book. Does anyone have any ideas on how to do this?

This could be solved by standard relationship implementations: Make
workspace a member of Task and synchronize accesses on Task. Example:

https://gist.github.com/962538

By using monitors of Task and Workspace we have quite fine granularity
of locking.
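
A rough sketch of that kind of relationship implementation (hypothetical
names, not necessarily what the gist contains): the Task owns the reference
to its Workspace, assignment is guarded by the Task's monitor, and the
Workspace releases its tasks outside its own monitor so the lock ordering
stays simple.

import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class Task {
    private Workspace workspace; // guarded by this Task's monitor

    public synchronized boolean assignTo( Workspace ws ) {
        if( workspace != null ) {
            return false;            // already assigned elsewhere
        }
        if( !ws.remember( this ) ) {
            return false;            // workspace already closed
        }
        workspace = ws;
        return true;
    }

    synchronized void releaseFrom( Workspace ws ) {
        if( workspace == ws ) {
            workspace = null;        // assignable again
        }
    }

    public synchronized Workspace getWorkspace() {
        return workspace;
    }
}

class Workspace {
    private final Set<Task> tasks = new HashSet<Task>(); // guarded by this monitor
    private boolean closed = false;

    synchronized boolean remember( Task t ) {
        if( closed ) {
            return false;
        }
        tasks.add( t );
        return true;
    }

    public Collection<Task> close() {
        Set<Task> snapshot;
        synchronized( this ) {
            closed = true;
            snapshot = new HashSet<Task>( tasks );
            tasks.clear();
        }
        // release the tasks outside the Workspace monitor: a Task's monitor
        // is never acquired while this Workspace's monitor is held
        for( Task t : snapshot ) {
            t.releaseFrom( this );
        }
        return snapshot;
    }
}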

Kind regards

robert
 
Michal Kleczek

Sebastian said:
Thanks for the idea. I do not yet see how to deal with the
one-to-many aspect of my problem.

To give an example, I'm trying to solve a problem like this:
Associate tasks with workspaces, where a workspace may hold many
tasks, but a task may be associated with at most one workspace.

Idea:

private ConcurrentMap<Item, Workspace> wsMap =
new ConcurrentHashMap<Item, Workspace>();

public boolean isAssigned( Item item ) {
return wsMap.containsKey( item );
}

public void assign( Item item, Workspace ws ) {
wsMap.putIfAbsent( item, ws );
if( !ws.equals( wsMap.get( item) ) ) {
throw new NotAssignableException();
}
}


Now I want to be able to close a workspace, releasing all tasks
to be assignable again to other workspaces.

public void closeWorkspace( Workspace ws ) {
// how to do this efficiently? iterate over all
// entries (holding a write lock) and remove it
// when the value equals ws?
}

Maybe it is a dumb question but: is there any reason not to implement it
simply as

class Item {
    private Workspace workspace;
    // getters, setters
}

class Workspace {
    private Collection<Item> items;
    // getters, setters

    public void close() {
        for (final Item item : items) {
            // release item
        }
    }
}
 
Tom Anderson

This could be solved by standard relationship implementations: Make
workspace a member of Task and synchronize accesses on Task. Example:

https://gist.github.com/962538

By using monitors of Task and Workspace we have quite fine granularity
of locking.

True! The fastest map is no map at all.

If the OP can't modify Task or Workspace, perhaps he could consider
writing wrappers which refer to each other, a TaskWithWorkspace, and
WorkspaceWithTasks (perhaps with better names), and passing those around
rather than plain Tasks and Workspaces.
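
A compact sketch of what such wrappers might look like (purely illustrative;
Task and Workspace are the existing classes that cannot be changed, and the
locking would follow the same monitor-per-object pattern as the relationship
sketch above):

import java.util.HashSet;
import java.util.Set;

// The wrappers, not the wrapped objects, carry the relationship.
class TaskWithWorkspace {
    private final Task task;                  // unmodifiable original
    private WorkspaceWithTasks workspace;     // guarded by this wrapper's monitor

    TaskWithWorkspace( Task task ) { this.task = task; }

    Task unwrap() { return task; }

    synchronized WorkspaceWithTasks getWorkspace() { return workspace; }

    synchronized void setWorkspace( WorkspaceWithTasks ws ) { workspace = ws; }
}

class WorkspaceWithTasks {
    private final Workspace workspace;        // unmodifiable original
    private final Set<TaskWithWorkspace> tasks = new HashSet<TaskWithWorkspace>();

    WorkspaceWithTasks( Workspace workspace ) { this.workspace = workspace; }

    Workspace unwrap() { return workspace; }

    synchronized void add( TaskWithWorkspace t ) { tasks.add( t ); }

    synchronized Set<TaskWithWorkspace> snapshot() {
        return new HashSet<TaskWithWorkspace>( tasks );
    }
}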

tom
 
Sebastian

On 09.05.2011 19:28, Tom Anderson wrote:
True! The fastest map is no map at all.

If the OP can't modify Task or Workspace, perhaps he could consider
writing wrappers which refer to each other, a TaskWithWorkspace, and
WorkspaceWithTasks (perhaps with better names), and passing those around
rather than plain Tasks and Workspaces.

tom

I can't modify Task, or Workspace, and I won't do much "passing around"
either, as I'm writing a component that will be called remotely (over
RMI). But I suppose I could create a wrapper object for each
Item/Workspace on the first call and look them up in maps indexed on
the Item/Workspace id. Now I wonder if I just have shifted the problem
to these maps?

I'll have to brush up on "monitors" and look at the example.

As for access patterns, that will depend on worker behavior. If they are
well-behaved, they won't compete much for tasks and will finish them quickly,
so I'd hope for about as many writes as reads.

-- Sebastian
 
Lew

Sebastian said:
Tom Anderson wrote:
I can't modify Task, or Workspace, and I won't do much "passing around"
either, as I'm writing a component that will be called remotely (over
RMI). But I suppose I could create a wrapper object for each
Item/Workspace on the first call and look them up in maps indexed on
the Item/Workspace id. Now I wonder if I just have shifted the problem
to these maps?

I'll have to brush up on "monitors" and look at the example.

As for access patterns, that will depend on worker behavior. If they are
well-behaved, they won't compete much for tasks and will finish them quickly, so I'd
hope for about as many writes as reads.

"Monitor", very loosely speaking, is Javaese for "lock". More specifically,
every object in Java has a built-in lock that has the properties of a Hoare
monitor. I can't be bothered to look that up right now, but for us workaday
Java programmers it means that thing that 'synchronized' locks.

public class Lockaday
{
    private static final State state = new State();

    synchronized // class object's monitor
    public static void setState( State newState )
    {
        state.copy( newState );
    }

    synchronized // class object's monitor
    public static State getState()
    {
        return state.clone();
    }

    private final Object lock = new Object();

    private Attribute attribute;

    public void setAttribute( Attribute newAttr )
    {
        synchronized ( lock ) // monitor of instance that 'lock' references
        {
            attribute = newAttr;
        }
    }

    public Attribute getAttribute()
    {
        synchronized ( lock ) // monitor of instance that 'lock' references
        {
            return attribute;
        }
    }
}
 
Tom Anderson

On 09.05.2011 19:28, Tom Anderson wrote:

I can't modify Task, or Workspace, and I won't do much "passing around"
either, as I'm writing a component that will be called remotely (over
RMI). But I suppose I could create a wrapper object for each
Item/Workspace on the first call and look them up in maps indexed on the
Item/Workspace id. Now I wonder if I just have shifted the problem to
these maps?

I don't think you have. The contents of those maps will never change - you
will need to add a new mapping when you first see a new key, but the
mapping will never change (except to be removed), and mappings in the two
maps are independent.
I'll have to brush up on "monitors" and look at the example.

They're pretty simple. A thread can lock something, and later unlock it.
If a thread has already locked something, it can lock it again, but must
make a corresponding number of unlocks to unlock it. If another thread has
already locked something, the thread trying to lock it will wait until
it's unlocked. Acquisition and release of different locks must be nested
(ie lock A, lock B, release A, release B is impossible). While holding a
lock, a thread can wait and notify on an object, which is something you
probably won't need.
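
For instance, re-acquiring a monitor the current thread already holds is
fine; it is only fully released when the outermost synchronized block exits.
A minimal, self-contained example:

public class ReentrancyDemo {
    private final Object lock = new Object();

    public void outer() {
        synchronized( lock ) {     // first acquisition
            inner();               // the same thread may lock again
        }                          // released here, after the outermost block
    }

    private void inner() {
        synchronized( lock ) {     // reentrant acquisition, no deadlock
            System.out.println( "still holding the same monitor" );
        }
    }

    public static void main( String[] args ) {
        new ReentrancyDemo().outer();
    }
}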

For the master maps, though, i'd look at using a ConcurrentMap, which does
not involve locking the whole map. You want to look at the putIfAbsent
method in particular.
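
A sketch of that idiom for one of the master maps (class and method names
are mine, just for illustration): putIfAbsent makes "create the wrapper
exactly once per id" atomic without ever locking the whole map.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TaskWrapperRegistry {

    public static class TaskWrapper {
        private final String taskId;
        public TaskWrapper( String taskId ) { this.taskId = taskId; }
        public String getTaskId() { return taskId; }
    }

    // master map from task id to wrapper; a wrapper is created at most once per id
    private final ConcurrentMap<String, TaskWrapper> wrappers =
        new ConcurrentHashMap<String, TaskWrapper>();

    public TaskWrapper wrapperFor( String taskId ) {
        TaskWrapper existing = wrappers.get( taskId );
        if( existing != null ) {
            return existing;
        }
        TaskWrapper fresh = new TaskWrapper( taskId );
        TaskWrapper raced = wrappers.putIfAbsent( taskId, fresh );
        return raced != null ? raced : fresh; // another thread may have won
    }
}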

You probably will want to use monitors for the WorkspaceWrapper and
TaskWrapper's relationships with each other, though.

tom
 
Sebastian

On 10.05.2011 09:34, Tom Anderson wrote:
Thank you Robert for that instructive example.
I don't think you have. The contents of those maps will never change -
you will need to add a new mapping when you first see a new key, but the
mapping will never change (except to be removed), and mappings in the
two maps are independent.
...
For the master maps, though, i'd look at using a ConcurrentMap, which
does not involve locking the whole map. You want to look at the
putIfAbsent method in particular.
...

I hope I got the quoting right, and left enough context for everyone to
follow the discussion.

The maps are independent of each other, but I think they are not
independent of the rest of the code.

Let's call the class that uses Robert's TskWp as a data structure and
manages the two concurrent maps the WorkspaceManager. This class would
need to be able to look up tasks by their IDs and add them to a
workspace. It can use putIfAbsent() and get() to find the Task instance.
It would also need to clear tasks that have workspace==null from the
task map (perhaps after some timeout). In order to prevent concurrency
issues (the removal getting in between the putIfAbsent and the get),
I'd have to hold locks on the map.

It would be a bad thing to assign tasks to a workspace that is
being closed.

I show some code below to illustrate what I mean. Am I correct?

And if I have to hold these locks, how much of an advantage is left
over the original (and much simpler) suggestion by Patricia, which was:
[> On 07.05.2011 15:40, Patricia Shanahan wrote:
I'd deal with that sort of problem by having a custom data structure
that uses java.util structures in its implementation.

For example, class TaskMapping could have a Map<Task,Workspace> that
maps a task to its workspace, and a Map<Workspace,Set<Task>> that maps a
workspace to the set of tasks it currently holds.
]


You probably will want to use monitors for the WorkspaceWrapper and
TaskWrapper's relationships with each other, though.

tom

Yes, that's exactly what Robert does in his coding.

I don't feel too clever, not being able to wrap my mind around this
concurrency stuff.

-- Sebastian


Here's a bit of the WorkspaceManager code referred to above:

import clj.TskWp.Task;
import clj.TskWp.Workspace;

private ConcurrentMap<String, Task> taskMap =
    new ConcurrentHashMap<String, Task>();
private ConcurrentMap<UUID, Workspace> wpMap =
    new ConcurrentHashMap<UUID, Workspace>();

private final ReadWriteLock lock = new ReentrantReadWriteLock();
private final Lock readLock = lock.readLock();
private final Lock writeLock = lock.writeLock();

public void addTasks( UUID id, String tid ) throws IllegalStateException
{
    // have to synchronize externally because of closeWorkspace() ?
    readLock.lock(); // <<<<<<<<<<< necessary ?
    try {
        Workspace wp = wpMap.get( id );
        if( wp != null ) {
            taskMap.putIfAbsent( tid, new Task( tid ) );
            Task t = taskMap.get( tid );
            wp.addTask( t );
        }
    }
    finally {
        readLock.unlock();
    }
}

public void closeWorkspace( UUID id )
{
    writeLock.lock(); // <<<<<<<<<<< necessary ?
    try {
        Workspace wp = wpMap.remove( id );
        if( wp != null ) {
            wp.removeTasks();
        }
        // remove unassigned tasks from global map (across workspaces)
        for( Task t : taskMap.values() ) {
            if( t.getWorkspace() == null ) {
                taskMap.remove( t.getId() );
            }
        }
    }
    finally {
        writeLock.unlock();
    }
}
 
Sebastian

On 11.05.2011 10:09, Sebastian wrote:
Here's a bit of the WorkspaceManager code referred to above:

public void closeWorkspace( UUID id )
{
writeLock.lock(); // <<<<<<<<<<< necessary ?
try {
Workspace wp = wpMap.remove( id );
...

As an afterthought to my immediately preceding post,
let me add that it would be nice if we could process
unrelated workspaces in parallel. How about having a
ReentrantReadWriteLock associated with each workspace,
and using them like this:

Workspace wp = wpMap.get( id );
wp.writeLock.lock();
try {
wpMap.remove( id );
...

Wouldn't this improve things?

-- Sebastian
 
Robert Klemme

On 11.05.2011 10:09, Sebastian wrote:

As an afterthought to my immediately preceding post,
let me add that it would be nice if we could process
unrelated workspaces in parallel. How about having a
ReentrantReadWriteLock associated with each workspace,
and using them like this:

   Workspace wp = wpMap.get( id );
   wp.writeLock.lock();
   try {
     wpMap.remove( id );
     ...

Wouldn't this improve things?

You said you can't modify Workspace and Task. If you create wrapper
classes then you need synchronization when managing the mapping from
original to wrapped. Sorry, I don't have time to go into more detail,
but externally adding state in a thread-safe but still scalable manner
is tricky (unless you code Ruby that is ;-)).

Cheers

robert
 
Lew

On 11.05.2011 10:09, Sebastian wrote:

As an afterthought to my immediately preceding post,
let me add that it would be nice if we could process
unrelated workspaces in parallel. How about having a
ReentrantReadWriteLock associated with each workspace,
and using them like this:

Workspace wp = wpMap.get( id );
wp.writeLock.lock();
try {
wpMap.remove( id );
...

Wouldn't this improve things?

Why have you never answered my questions about long locks?

You seem to be going through a lot of gyrations unnecessarily. Just lock the
data you need to lock.

It's kind of rude that you ask for help but don't respond to requests for
further information.
 
Sebastian

On 11.05.2011 15:00, Lew wrote:
Why have you never answered my questions about long locks?

You seem to be going through a lot of gyrations unnecessarily. Just lock
the data you need to lock.

That Read-Write-Lock stuff does seem to be unnecessarily complex; I
should really synchronize on the workspace instance, as you already
recommended in your post dated 07.05.2011 13:59:
SO - just hold
the lock (one you use explicitly and for all operations on the same
instance) until the method is done.
It's kind of rude that you ask for help but don't respond to requests
for further information.

I'm sorry, I certainly did not intend to withhold information. To what
post are you referring? I find three previous posts in this thread that
were authored by you: The one mentioned above dated 07.05.2011 13:59,
one dated 07.05.2011 18:49 endorsing Patricia's suggestion, and one
dated 10.05.2011 00:36 kindly explaining the concept of "Monitor".
I did just reread them and did not find a question about "long locks".
(I am also not familiar with the term.)

Sorry if I'm being a dunce, but if you care to repeat your question
I'll try to answer it.

-- Sebastian
 
