When to synchronize a list

HalcyonWild

Hi,

I wanted to know when I should synchronize a list. No, this is not the
old question of ArrayList vs. Vector. My doubts are about
synchronization itself (whether with a plain ArrayList, a synchronized
ArrayList, or a Vector).
The common answer I find is that Vector or synchronizedList(new
ArrayList()) should be used when two or more threads access the
contents of the List.

For example, consider the following code in a Java class.

Vector vec = new Vector();
for (int i = 0; i < somenumber; i++)
{
    // do something
    vec.add(new Integer(i));
    // do more
}


My thought is that whichever list you use, you would still have to
make the loop synchronized. Even if a Vector or synchronized ArrayList
is used, another piece of code might come in and modify that
vector (not run that loop, just modify vec) before the for
loop comes back around to the add. This is because a synchronized list
only means that while one thread is updating, another cannot. But when
the first thread is not updating, others freely can. This might create
inconsistent data in the List.

As far as I understand, the synchronization in the Vector is on the
instance itself. So before any method is called on the Vector instance, a
lock is obtained by the runtime. Once method execution is over, the lock is
available to any other thread.

This is more likely in servlets or JSPs. (In EJBs I do not think so,
because a separate instance is assigned to each thread calling a
method on the EJB.)

Another example would be using a List as a member inside a singleton
class. If two threads request the instance of the singleton, both will
get a reference to the same instance of the class. Say the second
thread gets an index and has to retrieve the object at that index
position in the list while the first thread is updating the vector. It
might turn out that the value retrieved by the second thread is not
what it expected. So I would have to synchronize the methods
in the singleton anyway, to get a lock on that instance before attempting
to call any method on it.

When would I use a List and when would I use a synchronized List? Does it
really make any difference whether I use Vector or ArrayList when I
have to synchronize all access to the List anyway? Please give an example
to help me understand the situation.

I have searched Google and I have also searched previous posts. I could
not find the answer to this specific question, so I am posting this. I
did find a lot of material on ArrayList vs. Vector and which is
better, but nothing on synchronization and Lists.

Thanks.
 
Thomas Weidenfeller

HalcyonWild said:
The common answer I find is that Vector or synchronizedList( new
ArrayList ) should be used when two or more threads are accessing the
contents of the List.

This is the common, but too simplistic, answer, as you have figured out
yourself.

My thought is that whether you use any list, you still would have to
make the loop synchronized.

It depends ... No really, it depends on your usage of threads, your data
model, etc. In short, you are seeing the typical problems of thread-safe
programming. A sequence of code like

v.add(something);
v.add(somethingElse);

might be thread-safe or not, depending on the relation of 'something'
and 'somethingElse'. If your algorithm requires, e.g., that your vector
always contains consecutive 'something', 'somethingElse' pairs, you can
get into trouble with this. If the particular order doesn't matter, then the
above code is thread-safe - maybe. Maybe, because you have to look at
the rest of the code, too. Whenever data is shared you have to look
carefully at what happens with the data.
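
The pair requirement described above could be sketched roughly as follows. This is only an illustration: the class name PairLog, its methods, and the "-partner" naming are assumptions made for the example, not anything from the thread. It relies on the fact that Vector's methods synchronize on the Vector instance itself, so holding that same lock across both add() calls keeps each pair adjacent.

```java
import java.util.Vector;

// Hypothetical helper (not from the thread): keeps pairs adjacent in a Vector.
final class PairLog {
    private final Vector<String> v = new Vector<>();

    // Both adds run under the Vector's own monitor, so no other thread
    // can insert an element between the two halves of a pair.
    void addPair(String something, String somethingElse) {
        synchronized (v) {
            v.add(something);
            v.add(somethingElse);
        }
    }

    int size() {
        return v.size();
    }

    // True if every even index is immediately followed by its partner.
    boolean pairsIntact() {
        synchronized (v) {
            if (v.size() % 2 != 0) return false;
            for (int i = 0; i < v.size(); i += 2) {
                if (!v.get(i + 1).equals(v.get(i) + "-partner")) return false;
            }
            return true;
        }
    }

    // Starts several writer threads, waits for them, and returns the log.
    static PairLog demo(int threads, int pairsPerThread) {
        PairLog log = new PairLog();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < pairsPerThread; i++) {
                    String s = "t" + id + "-" + i;
                    log.addPair(s, s + "-partner");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return log;
    }
}
```

Without the synchronized block in addPair, each add() would still be safe on its own, but pairs from different threads could interleave.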


/Thomas
 
Eric Sosman

HalcyonWild said:
Hi,

I wanted to know when should I synchronize a list. No, this is not the
old question regarding ArrayList Vs Vector. My doubts are more related
to synchronization ( whether a plain ArrayList or synchronized
Arraylist or Vector)
The common answer I find is that Vector or synchronizedList( new
ArrayList ) should be used when two or more threads are accessing the
contents of the List.

For example, consider following code in a Java class.

Vector vec = new Vector();
for (int i = 0; i < somenumber; i++)
{
//do something
vec.add(new Integer(i));
//do more
}


My thought is that whether you use any list, you still would have to
make the loop synchronized. Even if a vector or synchronized ArrayList
is used, another piece of code might just come in and modify that
vector (not run that loop, but just want to modify vec ) before the for
loop comes back again to that place. This is because, synchronized list
means when one thread is updating, another cannot. But if the first
thread is not updating, other freely can. This might create
inconsistent data in the List.

It depends on what your application thinks "inconsistent"
is. If you require that the values 0 through somenumber-1
appear in one contiguous stretch of the list, then yes: you
need to synchronize the entire loop so the entire loop becomes
"atomic." However, if you only require that all the numbers
get inserted into the list at some point (even though other
threads might delete some of them, insert others, or otherwise
fool around), then you don't need to synchronize the loop.
There's no one answer; it depends on your application.

Either way, the individual operations on the list must be
synchronized. This happens automatically with Vector and with
lists obtained from Collections.synchronizedList, but with a
plain ArrayList you need to synchronize explicitly.
As far as I understand, the synchronization in the Vector is on the
instance itself. So before calling any method on the Vector instance, a
lock is obtained by runtime. Once method execution is over, the lock is
available to any other thread.

Right. This makes individual operations "atomic," but
provides no guarantees for sequences of operations. For
example,

Vector v = ...;
...
Object obj = null;
if (!v.isEmpty())
    obj = v.firstElement();

... might throw NoSuchElementException in a multithreaded
program where two or more threads have access to `v', because
another thread can empty the Vector between the isEmpty()
check and the firstElement() call.
This is more possible in Servlets or JSP. ( EJB , I do not think so,
because, a separate instance is assigned to each thread calling a
method in the EJB ).

Another example, would be using a List as a member inside a singleton
class. If two threads request the instance of the singleton, both will
get the reference to the same instance of the class. Say the second
thread gets an index and has to retrieve the object at that index
position in the list, while the first thread is updating the vector. It
might turn out that the value retreived by the second thread might not
be what it expected. So I would anyway have to synchronize the methods
in the singleton, to get a lock on that instance before attempting to
call any method on that instance.

Again, it depends on what your application needs. For
example, if you're using the Vector as a stack, where one thread
pushes things onto the stack and the other removes them, the
pusher can just do v.add(obj) and rely on the synchronization
within Vector to maintain consistency.

The popper, though, can't do things quite so simply. The
problem is that the popper wants to do something special when
the Vector is empty, so it needs to query the Vector's state
before attempting to remove anything. Both the query and the
removal are synchronized, but the combination isn't -- and the
state of affairs could change between the query and the pop.
To guard against this, the popper needs to do something like

Object poppedObj;
synchronized (v) {
    if (v.isEmpty())
        poppedObj = null;
    else
        poppedObj = v.remove(v.size() - 1);
}

Note that the pusher's v.add(obj) cannot be rewritten as
v.add(v.size(), obj) -- I imagine you can figure out why.
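
Put together, the pusher and popper might look something like the sketch below. VectorStack is a hypothetical name used only for illustration. The comment in push() spells out one way the v.add(v.size(), obj) variant can go wrong: size() and add(index, obj) are two separate lock acquisitions, so the index can be stale by the time it is used.

```java
import java.util.Vector;

// Hypothetical stack on top of Vector, following the pusher/popper description.
final class VectorStack {
    private final Vector<Object> v = new Vector<>();

    // One synchronized call: Vector.add(obj) appends atomically.
    void push(Object obj) {
        v.add(obj);
        // v.add(v.size(), obj) would be two locked calls instead of one.
        // If a pop happens between them, the index is past the end and
        // Vector throws ArrayIndexOutOfBoundsException.
    }

    // Query and removal under a single hold of the Vector's lock.
    Object pop() {
        synchronized (v) {
            return v.isEmpty() ? null : v.remove(v.size() - 1);
        }
    }
}
```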
When would I use a List and when would I use synchronized List. Does it
really make any difference whether I use Vector or ArrayList when I
have to synchronize all access to the List. Please give any example to
help me understand the situation.

List is an interface implemented by several classes, some
of which synchronize their methods and some of which don't.
This is a case where the implementation details protrude through
the veneer of the interface contract. You can either rely on
knowing that Vector's methods are synchronized, or you can use
any List type (including Vector) with Collections.synchronizedList.

If two or more threads have access to the List, you must
somehow arrange for synchronization. The most convenient way to
do this is to make synchronization a property of the List itself,
either with Collections.synchronizedList or by using Vector. An
alternative is to use a "bare" List and synchronize every method
call manually, but this is both tedious and fragile.

However, as the examples above should show, synchronizing
the individual method calls is not always enough. If you're
going to traverse an entire List with an Iterator you need to
synchronize the entire traversal loop, not just the methods
you call inside it. If you're going to add something to the
List only if the List contains fewer than forty-two elements
already, you need to synchronize the entire query-and-add
"combo-operation." Your application determines what's needed.
 
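
The two compound operations mentioned above, a full traversal and a "query-and-add", could be sketched as follows. BoundedList and its method names are made up for the example. The sketch assumes a list obtained from Collections.synchronizedList, whose documentation requires callers to synchronize on the returned list while iterating.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper (illustrative names) around a synchronized list.
final class BoundedList {
    static final int LIMIT = 42;

    private final List<String> list =
            Collections.synchronizedList(new ArrayList<>());

    // Query-and-add as one step: size() and add() share one hold of the lock.
    boolean addIfBelowLimit(String s) {
        synchronized (list) {
            if (list.size() >= LIMIT) return false;
            return list.add(s);
        }
    }

    // The entire traversal sits inside the lock, not just each iterator call.
    int totalLength() {
        int total = 0;
        synchronized (list) {
            for (String s : list) total += s.length();
        }
        return total;
    }
}
```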
Thomas G. Marshall

HalcyonWild coughed up:
Hi,

I wanted to know when should I synchronize a list. No, this is not the
old question regarding ArrayList Vs Vector. My doubts are more related
to synchronization ( whether a plain ArrayList or synchronized
Arraylist or Vector)
The common answer I find is that Vector or synchronizedList( new
ArrayList ) should be used when two or more threads are accessing the
contents of the List.

For example, consider following code in a Java class.

Vector vec = new Vector();
for (int i = 0; i < somenumber; i++)
{
//do something
vec.add(new Integer(i));
//do more
}


My thought is that whether you use any list, you still would have to
make the loop synchronized.

YES. If the loop requires (as is usually the case) that its body
remain atomic, then you must mutex out all other access to the list.

Vector vec = new Vector();

[...]

synchronized (vec)
{
    for (int i = 0; i < somenumber; i++)
    {
        // do something
        vec.add(new Integer(i));
        // do more
    }
}

Try to synchronize only the minimum needed. You should not
over-synchronize, unless it is truly hairy to do otherwise. That is, it is often
important to sacrifice runtime efficiency to keep later maintainability
intact, but let's leave that idea alone for now. And I'll skip giving you a
hairy example.


Even if a vector or synchronized ArrayList
is used, another piece of code might just come in and modify that
vector (not run that loop, but just want to modify vec ) before the
for loop comes back again to that place. This is because,
synchronized list means when one thread is updating, another cannot.
But if the first thread is not updating, other freely can. This might
create inconsistent data in the List.

As far as I understand, the synchronization in the Vector is on the
instance itself. So before calling any method on the Vector instance,
a lock is obtained by runtime.

Remember that the lock is only obtained when calling methods that are
synchronized, or when some block of code is synchronized. The class
/itself/ is not synchronized per se; when that term is used, what is
meant is that the majority of its methods are synchronized.

Once method execution is over, the
lock is available to any other thread.

This is more possible in Servlets or JSP. ( EJB , I do not think so,
because, a separate instance is assigned to each thread calling a
method in the EJB ).

Another example, would be using a List as a member inside a singleton
class. If two threads request the instance of the singleton, both will
get the reference to the same instance of the class. Say the second
thread gets an index and has to retrieve the object at that index
position in the list, while the first thread is updating the vector.
It might turn out that the value retreived by the second thread might
not be what it expected. So I would anyway have to synchronize the
methods in the singleton, to get a lock on that instance before
attempting to call any method on that instance.

When would I use a List and when would I use synchronized List. Does
it really make any difference whether I use Vector or ArrayList when I
have to synchronize all access to the List. Please give any example to
help me understand the situation.

Take the example I gave above. That loop does in fact lock out any other
synchronized access to the vector while it is running. Technically speaking, if
all such accesses are externally synchronized like this, you would never have
to worry. /All/ such accesses.

But what if there is an unsynchronized call adding something to the list
that is midstream when this loop elsewhere starts to run? If you were
using an UNsynchronized ArrayList instead of a Vector or synchronized
ArrayList, then in the middle of an add(), the loop could start, grab the
lock (which does no good, since the add() never asked for it), and both the
add and the loop would be running at the same time. OI.

I have searched google and also I have searched previous posts. I
could not find the answer to this specific question so I am posting
this. I did find lot of stuff related to ArrayList Vs Vector, and
which is better , but not related to synchronization and Lists.

Having been part of several of such threads, you ought to look again. ;)


--
Unix users who vehemently argue that the "ln" command has its arguments
reversed do not understand much about the design of the utilities. "ln
arg1 arg2" sets the arguments in the same order as "mv arg1 arg2".
Existing file argument to non-existing argument. And in fact, mv
itself is implemented as a link followed by an unlink.
 
Yamin

This is really a design question that depends on your particular
situation.

I've found that trying to over-encapsulate synchronization can often
lead to more problems than it's worth.

If you are going to deal with the structure only within the confines of
add/remove or push/pop or whatever...then by all means use one of the
storage classes that contain synchronization within the methods. For
example, if you're using a fifo queue for messages. Then you probably
aren't going to need to operate on the queue itself...you'll just need
to add,remove,clear the queue...

But if, as in your example with the for loop, you actually plan to
'operate on' the structure itself, I would suggest not encapsulating
the synchronization and instead rely on an unsynchronized storage
class, with explicit synchronized blocks around code that needs to be
protected.

Just my two cents...
PS...I know I may not be clear on what 'operate on' means...but
hopefully you get the point. Basically, if any operation you plan to
perform on the structure extends beyond a single method call to the
structure, I would use explicit synchronized blocks.
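
One way to read this suggestion is sketched below: a plain ArrayList that is never touched outside a synchronized block on a single private lock. The Inbox class, its lock field, and its method names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message holder: unsynchronized storage, explicit locking.
final class Inbox {
    private final Object lock = new Object();
    private final List<String> messages = new ArrayList<>();

    // Single-step operation, still guarded by the same lock as everything else.
    void post(String message) {
        synchronized (lock) {
            messages.add(message);
        }
    }

    // Multi-step operation: copy out all messages and clear, as one unit.
    List<String> drain() {
        synchronized (lock) {
            List<String> out = new ArrayList<>(messages);
            messages.clear();
            return out;
        }
    }
}
```

The approach only holds up if every access goes through the same lock. A single unguarded call to the ArrayList elsewhere would defeat it.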

Yamin Bismilla
 
AWieminer

As far as I understand, the synchronization in the Vector is on the
instance itself. So before calling any method on the Vector instance, a
lock is obtained by runtime. Once method execution is over, the lock is
available to any other thread.
Another example, would be using a List as a member inside a singleton
class. If two threads request the instance of the singleton, both will
get the reference to the same instance of the class. Say the second
thread gets an index and has to retrieve the object at that index
position in the list, while the first thread is updating the vector. It
might turn out that the value retreived by the second thread might not
be what it expected. So I would anyway have to synchronize the methods
in the singleton, to get a lock on that instance before attempting to
call any method on that instance.
When would I use a List and when would I use synchronized List. Does it
really make any difference whether I use Vector or ArrayList when I
have to synchronize all access to the List. Please give any example to
help me understand the situation.

I never use the Vector class, always ArrayList. Then, if I have a
multithreaded app, I analyze the code. This is really easy to do if one
does not care about optimizing away unnecessary locks.

* if the list is used by a single thread, there is no point in
synchronization (method-local instance, private single-thread member, or
single-threaded app).
* if the list is read-only, then no synchronization is needed. This is a
common case for configuration values that are initialized during the app
startup phase.
* if the list is modified and read from multiple threads, then I use the
"Collections.synchronizedList" wrapper. I store the returned reference and
use it everywhere I need that list.

JRE 1.5 has more list implementations to choose from. You could use the
copy-on-write implementation (CopyOnWriteArrayList) to avoid
synchronization in multithreaded use. There all read operations are safe,
and each write operation creates a new internal list buffer.
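
A small sketch of the copy-on-write behaviour, using java.util.concurrent.CopyOnWriteArrayList. The demo class and method names are invented for the example. It adds to the list while iterating over it, which a plain ArrayList would reject with ConcurrentModificationException, and reports how many elements the iterator saw.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; only CopyOnWriteArrayList itself is a real API.
final class CopyOnWriteDemo {
    // Returns {elements visited, final size} after adding during iteration.
    static int[] iterateWhileAdding() {
        List<String> values =
                new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        int visited = 0;
        for (String s : values) {
            // Each write copies the backing array; the running iterator
            // keeps reading the snapshot it started with.
            values.add(s + "-copy");
            visited++;
        }
        return new int[] { visited, values.size() };
    }
}
```

The iterator visits the three original elements while the list grows to six. The trade-off is that every write copies the whole array, so this tends to suit lists that are read far more often than they are written.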
 
Thomas G. Marshall

AWieminer coughed up:
I never use Vector class, but always ArrayList class. Then if I have
multithreading apps I analyze the code. It's really easy to do if does
not care about the optimization issues to avoid unnecessary locks.

* if list is single thread use, no point synchronisation (method level
instance or private single thread member or single thread app).
* if list is readonly use then no syncronization needed.

*WARNING* ! This may be true for the built-in collection classes (I haven't
looked) but it is perilous to assume it is ok in general. Many accessor
methods actually change the internal state of their object, even if
slightly, and temporarily. It's not ideal, nor even good design--in fact
it's almost certainly terrible design. But you have to "worry" about it
unless you fully trust that:

1. the code doesn't do it.
2. the code is *very* unlikely to do it in the future.

Making such assumptions is the hallmark of fragility.

This is
common case for configuration values which are initialized at the app
startup phase.
* if list is modified and read from multiple threads, then I use
"Collections.synchronizedList" wrapper. Store a returned reference and
use it everywhere I need that list.

JRE1.5 has more list implementations to choose from. You could use
copyOnWrite implementation to avoid syncronization in multithread use.
Here all read operations are safe and write operation creates a new
internal list buffer.



 
Eric Sosman

Thomas said:
AWieminer coughed up:
* if list is single thread use, no point synchronisation (method level
instance or private single thread member or single thread app).
* if list is readonly use then no syncronization needed.

*WARNING* ! This may be true for the built-in collection classes (I haven't
looked) but it is perilous to assume it is ok in general. Many accessor
methods actually change the internal state of their object, even if
slightly, and temporarily. It's not ideal, nor even good design--in fact
it's almost certainly terrible design. [...]

The warning is a good one, but is "terrible design"
defensible? Self-organizing data structures (e.g. splay
trees) certainly seem to require the ability to modify
their internal state even on read accesses. There's also
the matter of instrumentation: a hash table implementation
might keep track of things like collision rates and probe
counts; such statistics would be ancillary in a sense, but
would certainly be part of the internal state.
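
As a concrete illustration of a "read" that writes, here is a minimal move-to-front list, a simple self-organizing structure. MoveToFrontList is a hypothetical class written for this example. Its contains() looks like a pure query but reorders the internal list, so it needs the same lock as the mutators.

```java
import java.util.LinkedList;

// Hypothetical self-organizing list: a successful lookup moves the hit to the front.
final class MoveToFrontList<T> {
    private final LinkedList<T> items = new LinkedList<>();

    synchronized void add(T item) {
        items.addLast(item);
    }

    // Reads as an accessor, but mutates internal order on every hit.
    synchronized boolean contains(T item) {
        if (!items.remove(item)) return false;
        items.addFirst(item);
        return true;
    }

    // Returns the current front element, or null if the list is empty.
    synchronized T first() {
        return items.isEmpty() ? null : items.getFirst();
    }
}
```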
 
Thomas Hawtin

AWieminer said:
* if list is readonly use then no syncronization needed. This is common
case for configuration values which are initialized at the app startup
phase.

At some point the list will have been written to, so something needs to
be done to ensure that uninitialised values are not seen by other threads.

In most cases the list will be created in class initialisation. So long
as the list doesn't leak to another thread during the initialisation
process, the class initialisation happens-before any use of the class
and hence list.
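
A sketch of that pattern, assuming a hypothetical AppConfig class with placeholder values: the list is filled inside the static initializer, wrapped as unmodifiable, and only then published through a static final field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical configuration holder; the host names are placeholder values.
final class AppConfig {
    // Built entirely during class initialization and never modified afterwards.
    private static final List<String> HOSTS;

    static {
        List<String> tmp = new ArrayList<>();
        tmp.add("alpha");
        tmp.add("beta");
        // tmp is not handed to any other thread before initialization ends.
        HOSTS = Collections.unmodifiableList(tmp);
    }

    static List<String> hosts() {
        return HOSTS;
    }
}
```

Wrapping the list as unmodifiable is an addition for safety in this sketch. It makes later writes fail loudly instead of silently breaking the read-only assumption.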

Tom Hawtin
 
Thomas G. Marshall

Eric Sosman coughed up:
Thomas said:
AWieminer coughed up:
* if list is single thread use, no point synchronisation (method
level instance or private single thread member or single thread
app). * if list is readonly use then no syncronization needed.

*WARNING* ! This may be true for the built-in collection classes (I
haven't looked) but it is perilous to assume it is ok in general.
Many accessor methods actually change the internal state of their
object, even if slightly, and temporarily. It's not ideal, nor even
good design--in fact it's almost certainly terrible design. [...]

The warning is a good one, but is "terrible design"
defensible? Self-organizing data structures (e.g. splay
trees) certainly seem to require the ability to modify
their internal state even on read accesses. There's also
the matter of instrumentation: a hash table implementation
might keep track of things like collision rates and probe
counts; such statistics would be ancillary in a sense, but
would certainly be part of the internal state.

Yes, you're right. I was careful to say " /almost/ certainly".

But in any case, the warning is clear. Accessors can mutate behind the
scenes.
 
Ingo R. Homann

Hi,
...
* if list is modified and read from multiple threads, then I use
"Collections.synchronizedList" wrapper. Store a returned reference and
use it everywhere I need that list.

Note that this is very often not sufficient:

for (int i = 0; i < arrayList.size(); i++) {
    // (*)
    System.out.println(arrayList.get(i));
}

Suppose another Thread removes some elements while the current
Thread is at the marked position (*). This can cause an
IndexOutOfBoundsException, whether or not the arrayList is a
synchronized List.

It is necessary to synchronize the *complete* for-loop.

Ciao,
Ingo
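
Two ways of protecting such an index loop are sketched below, under the assumption that the shared list locks on itself (as Vector and the Collections.synchronizedList wrapper do). SafeLoop and its method names are hypothetical. The second variant is an alternative not mentioned in the post: it copies the list under the lock and loops over the private copy, which keeps the lock held only briefly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers; 'shared' is assumed to be a list that locks on itself.
final class SafeLoop {
    // Option 1: hold the list's lock for the complete loop.
    static int sumLocked(List<Integer> shared) {
        int sum = 0;
        synchronized (shared) {
            for (int i = 0; i < shared.size(); i++) {
                sum += shared.get(i);
            }
        }
        return sum;
    }

    // Option 2: take a snapshot under the lock, then loop without it.
    static int sumSnapshot(List<Integer> shared) {
        List<Integer> copy;
        synchronized (shared) {
            copy = new ArrayList<>(shared);
        }
        int sum = 0;
        for (int x : copy) {
            sum += x;
        }
        return sum;
    }
}
```

The snapshot variant may work on slightly stale data, so whether it is acceptable depends on the application.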
 
HalcyonWild

HalcyonWild said:
Hi,

I wanted to know when should I synchronize a list. No, this is not the
old question regarding ArrayList Vs Vector. My doubts are more related
to synchronization ( whether a plain ArrayList or synchronized
Arraylist or Vector)
The common answer I find is that Vector or synchronizedList( new
ArrayList ) should be used when two or more threads are accessing the
contents of the List.

Helped me a lot to clear my understanding.

Thanks a lot for all replies.
Please do post if anyone has more ideas.
 
Thomas G. Marshall

Ingo R. Homann coughed up:
Hi,


Note that this is not sufficient very often:

for(int i=0;i<arrayList.size();i++) {
// (*)
System.out.println(arrayList.get(i));
}

Suppose another Thread removes some elements while the current
Thread is at the marked position (*). This can cause an
IndexOutOfBoundsException, whether or not the arrayList is a
synchronized List.

It is necessary to synchronize the *complete* for-loop.

YES! I absolutely cannot believe how very many engineers just don't
understand this. I've used the following example everywhere I can, even a
few times on Usenet:

thing.set(thing.get() + 1);

is *NOT* safe even if .set() and .get() are synchronized! In this
particular case, the nesting also misleads the eye. To many it looks as if
the set() is establishing the lock and everything else happens within it,
which is a mistake: get() runs first and releases the lock before set()
acquires it.
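
A sketch of the difference, using a hypothetical Thing class modelled on the one-liner above (the class body, increment(), and countWith() are assumptions for illustration). The unsafe variant is shown only for contrast. The safe one performs the read and the write under one hold of the lock.

```java
// Hypothetical holder mirroring the thing.get()/thing.set() example.
final class Thing {
    private int value;

    synchronized int get() {
        return value;
    }

    synchronized void set(int newValue) {
        value = newValue;
    }

    // Unsafe across threads: the lock is released between get() and set(),
    // so two threads can read the same value and one update gets lost.
    void unsafeIncrement() {
        set(get() + 1);
    }

    // Safe: the read and the write happen under a single hold of the lock.
    synchronized void increment() {
        value++;
    }

    // Runs 'threads' workers that each call increment() 'perThread' times.
    static int countWith(int threads, int perThread) {
        Thing thing = new Thing();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) thing.increment();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return thing.get();
    }
}
```

Swapping increment() for unsafeIncrement() in the workers can produce a total below the expected count, though how often that shows up depends on timing.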

FURTHERMORE, I honestly believe that most engineers out there (yes, I said
most) do not understand synchronization at all, and view it as some sort of
"magical idiom" to add to things when they flake out.
 
John Currier

Thomas said:
Ingo R. Homann coughed up:

YES! I absolutely cannot believe how very many engineers just don't
understand this. I've used the following example everywhere I can, even a
few times in usenet:

thing.set(thing.get() + 1);

is *NOT* safe even if .set() and .get() are synchronized! In this
particular case, it also messes with how you read. To many it looks as if
the set() is establishing the lock, and everything else happens within it,
which is a mistake.

FURTHERMORE, I honestly believe that most engineers out there (yes, I said
most) do not understand synchronization at all, and view it as some sort of
"magical idiom" to add to things when they flake out.

I run into that frequently when dealing with Iterators/Enumerators over
collections that are supposedly synchronized. Many developers don't
realize the exposure.

John
http://schemaspy.sourceforge.net
 
Thomas Hawtin

Thomas said:
FURTHERMORE, I honestly believe that most engineers out there (yes, I said
most) do not understand synchronization at all, and view it as some sort of
"magical idiom" to add to things when they flake out.

Yup. I've met some developers who were very proud that they had used the
synchronized keyword. Never mind the deadlocks and race conditions all
over the shop. It was so bad that even people who worked in the same
room refused to use the application.

I must admit though, in my first year of Java I was a bit out of my
depth with it. Doug Lea's book should be compulsory reading for anyone
using threads and Java.

Tom Hawtin
 
Patricia Shanahan

John said:
I run into that frequently when dealing with Iterators/Enumerators over
collections that are supposedly synchronized. Many developers don't
realize the exposure.

John
http://schemaspy.sourceforge.net

The newsgroup thread subject itself contains one of the key problems.
The question should be something like "How to design shared data
structure operations to ensure multi-thread safety?"

Patricia
 
Kenneth P. Turvey

Thomas said:
I must admit though, in my first year of Java I was a bit out of my
depth with it. Doug Lea's book should be compulsory reading for anyone
using threads and Java.

If you learned C first with POSIX or SVR4 IPC you would think Java is a
wonderful simplification. :)
 
Thomas Hawtin

Kenneth said:
If you learned C first with POSIX or SVR4 IPC you would think Java is a
wonderful simplification. :)

I got the mechanics of it, but I just didn't grok how it was really all
supposed to fit together. I understood I should synchronise even if I
was doing a single operation. I was happy using repaint events because
EventQueue.invokeLater wasn't there yet. But there was something missing
at a higher level, which I guess is much the same however the mechanics
of it are exposed.

Tom Hawtin
 
Thomas G. Marshall

Thomas Hawtin coughed up:
Yup. I've met some developers who were very proud that they had used
the synchronized keyword. Never mind the deadlocks and race
conditions all over the shop. It was so bad that even people who
worked in the same room refused to use the application.

A note on this. Many shriek in horror over it, but I strongly suggest
that most people (particularly newbies) adopt this:

public void method()
{
    synchronized (this)
    {
    }
}

Instead of the ubiquitous

public synchronized void method()
{
}

For the following reasons:

1. It firms up in their noggins just what object is holding the lock
2. It is only the barest smidgeon slower (I've never met anyone who can
measure it)
3. It allows you (and further maintainers down the road) to easily move
things not requiring protection before and after the lock.

Many times, you'll discover that the vast majority of a synchronized method
does not require the lock at all, which makes #3 above important.
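
Reason #3 might look like this in practice. HitCounter and its fields are hypothetical, invented for the example. The string clean-up touches no shared state, so it sits outside the synchronized (this) block, and only the two field updates are protected.

```java
import java.util.Locale;

// Hypothetical counter illustrating a narrow synchronized (this) block.
final class HitCounter {
    private int hits;
    private String lastPage = "";

    void record(String rawPage) {
        // Works only on the argument, so no lock is needed here.
        String page = rawPage.trim().toLowerCase(Locale.ROOT);

        // Only the shared fields are updated under the lock.
        synchronized (this) {
            hits++;
            lastPage = page;
        }
    }

    int hits() {
        synchronized (this) {
            return hits;
        }
    }

    String lastPage() {
        synchronized (this) {
            return lastPage;
        }
    }
}
```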
 
Kenneth P. Turvey

Thomas said:
Many times, you'll discover that the vast majority of a synchronized
method does not require the lock at all, which makes #3 above important.

If you are writing small, self-contained methods that only do one thing, this
shouldn't matter.
 
