Are Collections synchronized for concurrent reads?

Daniel

Hi,

I have a situation where I first populate a Collection (HashMap,
TreeMap, and friends) with data. After the Collection has been fully
populated, it will be accessed concurrently by many threads, but only
for read operations - specifically, get() and iterator(). My question
is this: do I need to synchronize access to the maps, since the data
will NEVER be structurally modified after the initialization?

My understanding is that I do not have to. To quote a piece of the
HashMap API documentation:

"If multiple threads access this map concurrently, and at least one of
the threads modifies the map structurally, it must be synchronized
externally. (A structural modification is any operation that adds or
deletes one or more mappings; merely changing the value associated
with a key that an instance already contains is not a structural
modification.)"

I read this as a guarantee that if the map is not structurally
modified, then you do not need to synchronize it. However I have seen
other places on the Java website and others that have blanket
statements such as "ALL access to a HashMap must be synchronized if it
is accessed by multiple threads." Is it safe to assume they're talking
about the case where the map might be modified by one of these
threads?

Thanks for any pointers or assurances,

Daniel

P.S. I find it interesting that the documentation for HashMap implies
that you do not need to synchronize access even when changing a value
associated with a key. Is the value assignment atomic?
 
Thomas Weidenfeller

Daniel said:
My understanding is that I do not have to. To quote a piece of the
HashMap API documentation:

"If multiple threads access this map concurrently, and at least one of
the threads modifies the map structurally, it must be synchronized
externally. (A structural modification is any operation that adds or
deletes one or more mappings; merely changing the value associated
with a key that an instance already contains is not a structural
modification.)"

I read this as a guarantee that if the map is not structurally
modified, then you do not need to synchronize it.

So would I. But my question would be, how much risk are you willing to
take? The JDK JavaDoc is notoriously bad when it comes to the
specification of the concurrency behavior and thread safety of classes
and methods. It would not be the first time that the documentation is
wrong. It would not be the first time that different JDKs/JREs
interpret the statement differently.

I would at least study the source code of the implementation in each JRE
you intend to support.

Daniel said:
P.S. I find it interesting that the documentation for HashMap implies
that you do not need to synchronize access even when changing a value
associated with a key. Is the value assignment atomic?

From looking at the source code of HashMap.put() in a 1.4 JDK, it
should work, given that the key indeed already exists in the map, and
that really no one else changes the structure at the same time. put()
does not synchronize between the lookup of the key's entry and changing
the entry. Once the entry has been found, changing the value is done by
assigning an Object reference. If you run two or more put()s for the
same existing key at the same time, the last one doing the assignment
will "win".

The big risk, however, is that you must be absolutely sure that the key
exists, otherwise put() will insert it, changing the structure.
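A minimal single-threaded sketch of that distinction, assuming a current JDK (the class and method names are made up for illustration). It uses the fail-fast iterator as a detector: a put() on an existing key leaves a running iterator alone, while a put() on a new key counts as a structural modification and makes the next call to next() throw. It says nothing about visibility between threads; it only shows which put() calls HashMap treats as structural.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class StructuralPutDemo {

    /** Returns true if put(key, ...) in mid-iteration makes the iterator fail fast. */
    static boolean putBreaksIterator(String key) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        Iterator<String> it = map.keySet().iterator();
        it.next();            // the iterator is now in mid-traversal
        map.put(key, 99);     // structural only if the key is not yet in the map
        try {
            it.next();        // the fail-fast check happens here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("existing key breaks iterator: " + putBreaksIterator("a"));
        System.out.println("new key breaks iterator:      " + putBreaksIterator("d"));
    }
}
```

The fail-fast check is documented as best effort only, so it is a debugging aid here, not something to build a locking scheme on.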


/Thomas
 
Chris Uppal

Thomas said:
So would I. But my question would be, how much risk are you willing to
take? The JDK JavaDoc is notoriously bad when it comes to the
specification of the concurrency behavior and thread safety of classes
and methods. It would not be the first time that the documentation is
wrong. It would not be the first time that different JDKs/JREs
interpret the statement differently.

In any case, even if the concurrent access didn't destroy the structure of the
HashMap, reading and writing the "value" concurrently is a definite no-no.
Just as unsynchronised reading and writing of any non-volatile variable would
be.

The HashMap doc was probably written before it had sunk in to the programming
world that Java's defined atomicity of assignments provides almost exactly no
useful guarantee.

-- chris
 
Steve Horsley

Logically, provided that no-one is modifying the thing any more, all
comers will be able to read it without conflict.

I think the problem is not really one of concurrent reading, but of the
definition of "simultaneous" and "concurrent". Memory caching can conspire
to make a sequence of events look different to different threads,
especially on multiprocessor machines. Synchronized access kills this
relativity problem by forcing cache updates. IIRC, leaving a synchronized
block forces a cache flush to main memory, and entering a synchronized
block forces a cache refresh from memory. Without both of these actions,
one thread may read stale data, or only part of some other thread's
update. So although it's a long shot, an un-synchronized read could see
an empty or partially filled, or simply corrupt, data structure.

It seems to me that the best you can safely do is for the structure to be
written inside a synchronized block, and for the REFERENCE to the
structure to be retrieved from within a synchronized block. I think the
structure can then be safely accessed in an un-synchronized way. The
second thread should not pass the reference to a third thread without
requiring that third thread also pass through a synchronized block though.
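A rough sketch of that hand-off, assuming a current JDK; PublishedMap and its methods are hypothetical names, not anything from the JDK. The writer fills the map completely and then stores the reference through a synchronized method; each reader fetches the reference through a synchronized method on the same object and then reads the map without further locking. This is a slight variant of the scheme above, in that the map is filled just before entering the lock instead of inside it, and it only holds up if nobody touches the map after publish().

```java
import java.util.HashMap;
import java.util.Map;

class PublishedMap {
    private Map<String, Integer> map;   // guarded by "this"

    /** Writer side: call once, after the map has been completely filled. */
    synchronized void publish(Map<String, Integer> fullyPopulated) {
        map = fullyPopulated;
    }

    /** Reader side: fetch the reference under the same lock (null until published). */
    synchronized Map<String, Integer> get() {
        return map;
    }

    /** Starts some reader threads and returns the sum of everything they read. */
    static int readFromThreads(int threads) {
        PublishedMap holder = new PublishedMap();
        Map<String, Integer> m = new HashMap<>();
        m.put("one", 1);
        m.put("two", 2);
        holder.publish(m);              // no writes to m after this point

        int[] results = new int[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> {
                Map<String, Integer> view = holder.get();           // synchronized fetch
                results[slot] = view.get("one") + view.get("two");  // unsynchronized reads
            });
            workers[i].start();
        }

        int total = 0;
        for (int i = 0; i < threads; i++) {
            try {
                workers[i].join();      // join also makes results[i] visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += results[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(readFromThreads(4));
    }
}
```

A run like this cannot prove the absence of a visibility bug, and in this particular demo Thread.start() already orders the writes before the readers begin. The synchronized pair matters when the reader threads are already running at the time the map is published.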

Daniel said:
P.S. I find it interesting that the documentation for HashMap implies
that you do not need to synchronize access even when changing a value
associated with a key. Is the value assignment atomic?

Yes, reference assignments are atomic. However, without synchronized
access to trigger cache updates, the reader thread may not see the change
for some time. Also, without synchronization, the reader of the new Object
in the map may not see the same contents as the writer of the Object
wrote. You have moved the problem of concurrency out of the HashMap and
into the Object that is passed via the HashMap. It's no use pushing your
peas round your plate - it won't make them go away.

Steve
 
Adam Maass

Most collection implementations do not treat reads as structural
modifications, but some do. (LinkedHashMap, for example.) If I decided to
rely on many unsynchronized concurrent reads of a Collection, I would, at a
minimum:

1. Make the read and write of the reference to the Collection synchronized;
2. Stay away from the Collection interfaces as types -- your behavior
critically depends on the specific type of the collection; you want your
maintenance programmers, 10 years down the line, to know which specific
implementations you are using and why.
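To make the LinkedHashMap case concrete, here is a small single-threaded sketch, assuming a current JDK (class and method names are made up). It calls get() while an iterator is in mid-traversal: a plain HashMap and an insertion-ordered LinkedHashMap do not care, but a LinkedHashMap built with accessOrder set to true relinks its entries on every get(), which the iterator reports as a structural modification.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class ReadIsWriteDemo {

    /** Returns true if a get() in mid-iteration makes the iterator fail fast. */
    static boolean getBreaksIterator(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        Iterator<String> it = map.keySet().iterator();
        it.next();          // the iterator is now in mid-traversal
        map.get("a");       // looks like a pure read
        try {
            it.next();      // the fail-fast check happens here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap:                        "
                + getBreaksIterator(new HashMap<>()));
        System.out.println("LinkedHashMap, insertion order: "
                + getBreaksIterator(new LinkedHashMap<>()));
        // Third constructor argument true = access order: get() moves the entry to the end.
        System.out.println("LinkedHashMap, access order:    "
                + getBreaksIterator(new LinkedHashMap<>(16, 0.75f, true)));
    }
}
```

Note that getBreaksIterator() takes a plain Map, so nothing in its signature tells you which of the three behaviours you will get. That is the argument for point 2.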


-- Adam Maass
 
