Mutable Objects and Thread Boundaries

A

Alan Gutierrez

I'm implementing a log and I have a logging thread that writes log
entires to file. Below is skeletal code to illustrate a question I have
about concurrency and memory.

public class Queued {
public void main(String[] args) {
final LinkedBlockingQueue<List<Integer>> queue =
new LinkedBlockingQueue<List<Integer>>();
Thread thread = new Thread(new Runnable() {
public void run() {
for (;;) {
try {
List<Integer> list = queue.take();
if (list.get(0) != 0) {
continue;
}
} catch (InterruptedException e) {
continue;
}
}
}
});
thread.start();
List<Integer> list = new ArrayList<Integer>();
list.add(1);
queue.offer(list);
list = new ArrayList<Integer>();
list.add(0);
queue.offer(list);
thread.join();
}
}

As you can see, I'm using a normal list and passing it through a
blocking queue to a thread that pulls a list form the queue and acts on
the first element in the list.

This code appears to me to be broken. The list is not synchronized or
otherwise thread safe, so when the taker thread obtains a reference to
the list, the list may be empty.

It wouldn't be a question if I was using an immutable object, but is
there a good way to ensure that mutable objects are flushed before
sending them off to another thread? (Or does the above code work and I
just don't know it?)
 
M

markspace

Alan said:
This code appears to me to be broken. The list is not synchronized or
otherwise thread safe, so when the taker thread obtains a reference to
the list, the list may be empty.

Actually, the LinkedBlockingQueue is synchronized. Why do you say it's not?

It's direct parent, BlockingQueue, makes the memory consistency guarantee.


> try { ....
> } catch (InterruptedException e) {
> continue;
> }

Interrupts do not happen spuriously or for no reason. If you get an
interrupt, someone has requested that your thread exit. It's best to do
so. I'd recommend something like:

public void run() {
try {
for (;;) {

List<Integer> list = queue.take();
if (list.get(0) != 0) {
continue;
}
}
} catch (InterruptedException e) {
}

}

In other words, go ahead and catch the exception to prevent ugly
messages, but "handle" it by exiting. This is almost always the correct
response.
 
J

Joshua Cranmer

It wouldn't be a question if I was using an immutable object, but is
there a good way to ensure that mutable objects are flushed before
sending them off to another thread? (Or does the above code work and I
just don't know it?)

java.util.concurrent.BlockingQueue says:
Memory consistency effects: As with other concurrent collections,
actions in a thread prior to placing an object into a BlockingQueue
happen-before actions subsequent to the access or removal of that
element from the BlockingQueue in another thread.
 
A

Alan Gutierrez

Joshua said:
java.util.concurrent.BlockingQueue says:
Memory consistency effects: As with other concurrent collections,
actions in a thread prior to placing an object into a BlockingQueue
happen-before actions subsequent to the access or removal of that
element from the BlockingQueue in another thread.

Thank you. That answers my question. I'm much relieved. Something I'd
read in the recent immutability thread had me thinking that I had
everything all wrong.

It was all the talk about the importance of immutability that made me
worry that field assignments that were unsynchronized or non-volatile
could be hidden from the receiving thread. This bit of documentation
that I missed, plus a first reading of JLS Ch 17, put that to rest.
 
A

Alan Gutierrez

markspace said:
Actually, the LinkedBlockingQueue is synchronized. Why do you say it's
not?

It's direct parent, BlockingQueue, makes the memory consistency guarantee.

Joshua Cramer's snipped bit of the API docs spelled this out for me a
little more clearly. I probably would have frustrated you by accusing
you of misunderstanding my question had I not seen that first.
Interrupts do not happen spuriously or for no reason. If you get an
interrupt, someone has requested that your thread exit.

InterruptedException is a curious case for me.

It is my understanding that the only someone who will request that my
thread exit would be me. Not in this example, but in the log that I am
implementing, the logging thread is an implementation detail. I have
provided a way to terminate the log by sending a terminal value, and
this is not exposed to the client, either. Its what happens when call
shutdown on a facade, the client does not add items to the queue
directly, either, so they can't log the terminal value.

With a terminal value to shutdown the queue, and with the thread
unexposed to the client, there seems to be no way to send an interrupt
to the thread, so it becomes the classic unreachable checked exception
block. How do you handle the interrupt that never happens? I decided
that trying again after the impossible was as good as giving into the
impossible.

Unless there is some mechanism in Java that sends interrupts to threads
at shutdown of which I'm not aware. Honestly, I'm not sure how Java
stops say, a Thread.setDaemon() thread, but I always assumed it was
synonymous to kill -9.
 
J

Joshua Cranmer

It was all the talk about the importance of immutability that made me
worry that field assignments that were unsynchronized or non-volatile
could be hidden from the receiving thread. This bit of documentation
that I missed, plus a first reading of JLS Ch 17, put that to rest.

Immutability is the easiest way to guarantee safe publication: final
fields guarantee that they can be used by any thread safely. Any other
type of object has to be safely published. Java Concurrency in Practice
lists four ways of doing it:
* Initializing an object reference from a static initializer
* Storing a reference to it into a volatile field or AtomicReference
* Storing a reference to it into a final field of a properly constructed
object
* Storing a reference to it into a field that is properly guarded by a lock.

It also mentions that Java's thread-safe libraries all constitute safe
publication, because of their internal synchronization. To be clear, the
following are explicitly mentioned:
Hashtable, Collections.synchronizedMap, ConcurrentMap, Vector,
CopyOnWriteArrayList,CopyOnWriteArraySet, Collections.synchronizedList,
Collections.synchronizedSet, BlockingQueue, and ConcurrentLinkedQueue.

One key thing to note is that this safe publication only guarantees
visibility of changes made to an object before publication; anything
that happens afterwords must still be handled using regular
thread-safety techniques.
 
M

markspace

Alan said:
With a terminal value to shutdown the queue, and with the thread
unexposed to the client, there seems to be no way to send an interrupt
to the thread, so it becomes the classic unreachable checked exception
block.


See, I'd go the other way. Even though I don't personally know any way
your thread is exposed to a client either, I'd still let an interrupt
kill the thread. Two reasons for this:

1) There may be some way a client can send a SIGHUP or similar through
the console, which the JVM might propagate to all threads, for an
orderly shutdown (kill -9 is disorderly and abrupt).

2) Another programmer might assume that any thread can be killed with an
interrupt, and might be confused or frustrated when it can't. In this
light, it's just a bit of future proofing and courtesy to make your
thread work in the standard way.

This is not to say that there is never a good reason to ignore
interrupts, just that one shouldn't do so without a good reason.
 
A

Alan Gutierrez

Joshua said:
Immutability is the easiest way to guarantee safe publication: final
fields guarantee that they can be used by any thread safely. Any other
type of object has to be safely published. Java Concurrency in Practice
lists four ways of doing it:
* Initializing an object reference from a static initializer
* Storing a reference to it into a volatile field or AtomicReference
* Storing a reference to it into a final field of a properly constructed
object
* Storing a reference to it into a field that is properly guarded by a
lock.

It also mentions that Java's thread-safe libraries all constitute safe
publication, because of their internal synchronization. To be clear, the
following are explicitly mentioned:
Hashtable, Collections.synchronizedMap, ConcurrentMap, Vector,
CopyOnWriteArrayList,CopyOnWriteArraySet, Collections.synchronizedList,
Collections.synchronizedSet, BlockingQueue, and ConcurrentLinkedQueue.

One key thing to note is that this safe publication only guarantees
visibility of changes made to an object before publication; anything
that happens afterwords must still be handled using regular
thread-safety techniques.

Very cool. Thank you.

I recall seeing something like this earlier today.

public void foo(int number) {
final List<Integer> list = new ArrayList<Integer>();
list.add(number);
new Thread(new Runnable() {
final List<Integer> happensAfter = list;
public void run() {
doSomethingWithList(happensAfter);
}
}).start();
}

And I understand now that by virtue of assigning to the final in the
Runnable, that makes the call to add happen before.

If that's correct, then I feel pretty confident that I get all this.
 
C

ClassCastException

With a terminal value to shutdown the queue, and with the thread
unexposed to the client, there seems to be no way to send an interrupt
to the thread, so it becomes the classic unreachable checked exception
block. How do you handle the interrupt that never happens? I decided
that trying again after the impossible was as good as giving into the
impossible.

My usual way of handling an "unreachable" checked exception block is
"throw new Error();".

If the "unreachable" checked exception block ever IS reached, then the
code did not behave the way I anticipated it would. The code contains a
bug of some sort; it's not doing what I thought it would. Something is
allowed to happen (reaching that block) that I thought was prevented.
More things may be allowed to happen (e.g., data corruption) that I
thought was prevented, too.

Ignoring the exception, or even closing the one thread down silently,
would not raise the alarm; it would mask the symptom or, at best, replace
it with a subtler and less specific symptom (logging thread silently and
mysteriously terminates).

Throwing an Error, preferably with the InterrutpedException chained to it
as a cause, will produce a stack trace in the console for me to examine
that may reveal much about what's happening, including how an assumed
unreachable block got reached. The stack trace will also point to which
such location got reached; I can add breakpoints or other debugging code
in there and try to reproduce the bug to get more information and get a
step closer to its source, in this case, to where the
InterruptedException came from that shouldn't have been possible (by
adding an e.printStackTrace(); in the catch block, say -- indeed, I might
put that there and then throw new Error(); to begin with -- and then
looking for the throw of the InterruptedException in its stack trace).
 
A

Alan Gutierrez

Alan said:
Very cool. Thank you.

I recall seeing something like this earlier today.

public void foo(int number) {
final List<Integer> list = new ArrayList<Integer>();
list.add(number);
new Thread(new Runnable() {
final List<Integer> happensAfter = list;
public void run() {
doSomethingWithList(happensAfter);
}
}).start();
}

And I understand now that by virtue of assigning to the final in the
Runnable, that makes the call to add happen before.

If that's correct, then I feel pretty confident that I get all this.

Hmm...

JLS Ch 17 says that...

A call to start() on a thread happens-before any actions in the started
thread.

Which implies that the final member of the runnable is unnecessary since
the item was added to the array before calling Thread.start and the run
method of the runnable will happen after thread start.
 
L

Lew

Very cool. Thank you.

I recall seeing something like this earlier today.

public void foo(int number) {
final List<Integer> list = new ArrayList<Integer>();
list.add(number);
new Thread(new Runnable() {
final List<Integer> happensAfter = list;
public void run() {
doSomethingWithList(happensAfter);
}
}).start();
}

And I understand now that by virtue of assigning to the final in the
Runnable, that makes the call to add happen before.

If that's correct, then I feel pretty confident that I get all this.

Almost. Both the initialization of 'list' and the 'add()' to it
/happen-before/ the call to 'doSomethingWithList()', so you could more
compactly write:

public void foo( int number )
{
final List <Integer> list = new ArrayList <Integer> ();
list.add( number );

new Thread( new Runnable()
{
public void run()
{
doSomethingWithList( list );
}
}).start();
}

on account of
<http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.4.5>
"A call to start() on a thread happens-before any actions in the started thread."

The local variable 'list' must be final to be accessible to the inner class.

Be very careful with auto(un)boxing.
 
A

Alan Gutierrez

Lew said:
Be very careful with auto(un)boxing.

Why? I asked my friend Google and it said:

http://fupeg.blogspot.com/2007/07/auto-unboxing-in-java.html

I don't see how in the example code in this thread, auto unboxing
triggers an NPE.

public List<Integer> list(int i) {
List<Integer> list = new ArrayList<Integer>();
list.add(i);
return list;
}

Other than the silliness of it, what is wrong with the above? (Or maybe
I reduced the example too much?)
 
M

markspace

Joshua said:
Immutability is the easiest way to guarantee safe publication: final
fields guarantee that they can be used by any thread safely.


Note quite. Final fields by themselves don't guarantee immutability or
safe publication. An immutable object must:

* All fields are final.
* It is properly constructed. (No "this escapes." That bit is important.)
* Its state cannot be modified after construction.

Those three things must hold for an object to be treated as immutable by
the JVM. Mess up any one, and poof, no more immutability.

I'm re-reading JCIP because of all the discussion here, and there's
stuff in there I missed that first time around. It's intersting to
re-read that book.

Short answer is that a full discussion of immutability and thread safety
is really hard in a forum where we're limited by how much we can type.
Get the book, it's well worth it.
 
M

markspace

Peter said:
It is worth noting that the reference to the object itself need not be
the thing that is subject to synchronization. Anything that introduces
the necessary memory barrier is sufficient.


One caveat here: "anything" must be the same thing in all threads which
access the shared data in question. If thread A locks objectA, and
thread B locks objectB, they are in no way synchronized, nor is there
any happens-before relationship establish for any shared data they access.

So, for example, any of the "storing a reference to it" examples above
are actually satisfied by any data being stored in the described
location, so long as the data being published has been initialized prior
to storage _and_ when the data is later read by a different thread, that
thread first retrieves whatever data was in fact stored in the described
location.


I don't think this is correct for all of the examples Joshua listed.
For example, storing a reference to a field that is properly guarded by
a lock does NOT create a happens-before relationship for "any data"
later read by a different thread.

Actually, the more I reread your statement, the more I'm sure I have no
idea what the heck you are actually trying to say. I can't think of any
combination of events where the events you listed actually bear on each
other. Could you produce an example where that's relevant?
 
M

markspace

Peter said:
class Test
{
private volatile boolean set = false;
In the above example, if methodA() is executed on one thread, and then
methodB() is executed in a different thread, methodB() is guaranteed to
see the new value for "data".


Yes, for volatile, this works. Volatile fields have special semantics
which prevent re-ordering reads and writes by the compiler, so this is
guaranteed to work.

I said it didn't work for all of the cases Joshua listed, particularily
the synchronized one. Replace the volatile with a synchronized block
and it won't work anymore.

Just trying to let other folks know what works and what doesn't. I
think you knew this already, just that you didn't touch all the bases
when you mentioned it originally.

class Test
{
private boolean set;
private int data;

void methodA()
{
data = 5;
synchronized( this ) {
set = true;
}
}

void methodB()
{
synchronized( this ) {
if (set)
{
System.out.print(data); // may print "0"
}
}
}
}
 
M

markspace

Peter said:
I would be interested in seeing an example where your third point is
violated but the first point is not.


class Mutable {

private final List<String> names;

public Mutable() {
names = new List<String>();
names.add("Joe");
names.add("Jim");
}

public List<String> getNames() {
return names;
}
}

This class would be immutable except for the fact that the private final
variable "names" escapes and could be modified. If the getNames()
method is removed, then the class is immutable and can be published even
in a data race.

And I know what's coming next, but it really is immutable if the getter
is removed. The JLS was designed to work this way because String has to
be immutable, and String initializes itself the same way, so the JLS was
constructed to make this kind of initialization safe and immutable.
 
L

Lew

Alan said:

Because of the risk of exceptions or surprising results.
I asked my friend Google and it said:
http://fupeg.blogspot.com/2007/07/auto-unboxing-in-java.html

I don't see how in the example code in this thread, auto unboxing
triggers an NPE.

public List<Integer> list(int i) {
List<Integer> list = new ArrayList<Integer>();
list.add(i);
return list;
}

Other than the silliness of it, what is wrong with the above? (Or maybe
I reduced the example too much?)

That is not an example of the risk for NPE. Nor of the risk of autoboxing.
You need a different example.

NPE is not the only risk. Yes, it is a risk when autounboxing, because the
wrapper might be null, and unboxing will raise NPE in that case. But you can
also confuse the compiler if you mix autoboxing and varargs in certain ways.

I've also seen cases where a class refactored a method that took a 'long'
argument to one that took a 'Long' but forgot to refactor the corresponding
method in the interface implemented by the class.

Auto(un)boxing hides an operation that sometimes produces different behaviors
than one expects. Therefore one must be very careful with it.

IMHO it's not a terribly useful feature. It hides type assertions, which
diminishes one of Java's great strengths. I prefer to do all my (un)boxing
explicitly. What, I'm going to sprain my fingers with the extra typing? It's
worth the effort to improve the self-documenting nature of my code, and to
obviate the occurrence of those bugs.
 
L

Lew

Yes, for volatile, this works. Volatile fields have special semantics
which prevent re-ordering reads and writes by the compiler, so this is
guaranteed to work.

I said it didn't work for all of the cases Joshua listed, particularily
the synchronized one. Replace the volatile with a synchronized block and
it won't work anymore.

It will so, as long as they synchronize on the same lock or monitor.
Just trying to let other folks know what works and what doesn't. I think
you knew this already, just that you didn't touch all the bases when you
mentioned it originally.

class Test
{
private boolean set;
private int data;

void methodA()
{
data = 5;
synchronized( this ) {
set = true;
}
}

void methodB()
{
synchronized( this ) {
if (set)
{
System.out.print(data); // may print "0"
}
}
}
}

You're mistaken. The assignment in 'methodA()' of 'data' /happens-before/ the
write to 'set', which latter is synchronized. The read in 'methodB()' of
'data' /happens-after/ the read of 'set', which latter is synchronized on the
same monitor. Therefore the read is guaranteed to see the write.
 
M

markspace

Lew said:
You're mistaken. The assignment in 'methodA()' of 'data'
/happens-before/ the write to 'set', which latter is synchronized. The
read in 'methodB()' of 'data' /happens-after/ the read of 'set', which
latter is synchronized on the same monitor. Therefore the read is
guaranteed to see the write.


Yup, I'm mistaken.

I thought I'd read that the compiler was allowed to re-order writes and
reads around a synchronized block. "data" for example I thought might
be moved down below the synchronized block and so might not be read as
expected by methodB(). Turns out that is not the case. Synchronization
blocks provide the same memory guarantees as volatiles.

Brian Goetz calls this piggy-backing in JCIP. (He uses a volatile
variable as an example, which is why I was confused about synchronized
blocks.) And he says it's a dangerous thing to do because it's hard to
reason out these side effect correctly, and hard for maintenance
programmers to spot them as well. So to the OP and others, don't rely
on these side effects. Put all data in a synchronized block and you'll
be less likely to wonder what the heck is going on six months after your
wrote the code.

The only reason you would piggy-back memory synchronization like this is
for extreme speed optimization. The Future object in Brain Goetz's
example uses a single volatile variable and piggy-backs all its
synchronization on a single write/read of that variable. This is for
speed, because the Future object is heavily used in Executors. Don't to
the same until you are sure that you need to.
 
L

Lew

markspace said:
Brian Goetz calls this piggy-backing in JCIP.  (He uses a volatile
variable as an example, which is why I was confused about synchronized
blocks.)  And he says it's a dangerous thing to do because it's hard to
reason out these side effect correctly, and hard for maintenance
programmers to spot them as well.  So to the OP and others, don't rely
on these side effects.  Put all data in a synchronized block and you'll
be less likely to wonder what the heck is going on six months after your
wrote the code.

Like everything else in programming, there are tradeoffs that mean
that that is sometimes, but not always good advice. The
countervailing advice is to minimize the extent of critical sections
to reduce contention.

If you use a syncrhonized block, it's because you need concurrent
execution. That much is painfully obvious, but the question of how
concurrent is less obvious. I've worked on systems that run on stages
of 64-core machines running many threads for high-volume
applications. Lock contention was a major bottleneck - really major -
where the original implementors had not fully grokked the implications
of highly concurrent systems.

The more work resides in a critical section, the higher the likelihood
of thread contention for that section. Lock contention has a
cascading effect - the more threads have to wait, the worse the wait
gets, causing more threads to have to wait, worsening the wait, ...
Compounding that, modern JVMs optimize the handling of uncontended
locks.

IMO the rule to "get in, get out quickly" for critical sections trumps
the questionable notion that we cannot expect Java programmers to
reason effectively about /happens-before/. If a programmer works with
concurrent programming, they absolutely must educate themselves about
the matter or they are negligent and irresponsible. Java 5
incorporated the notion of /happens-before/ and reworked the memory
model precisely in order to reduce the effort to understand the
consequences of concurrency. We should expect and demand that
programmers of concurrent systems understand it.

Programs don't exist for the convenience of programmers, but for our
clients. Although I do believe in making life easier for the
maintenance programmer, I don't believe in making it easier for the
ignorant or incompetent maintenance programmer. Reducing the extent
of critical sections benefits the program, and thus the client, not
the less-educated programmer, but the competent programmer can deal
with that.

As Albert Einstein said, we should make things as simple as possible,
but no simpler. Face it, concurrent programming is hard no matter
what we do. It's good to keep code straightforward, but it's
inadvisable to ruin concurrent performance because we're afraid
someone will be too lazy to think carefully.
The only reason you would piggy-back memory synchronization like this is
for extreme speed optimization.  The Future object in Brain Goetz's

That's not the only reason. The other reason is for non-extreme speed
optimization, or if you will, to prevent stupid de-optimization.
example uses a single volatile variable and piggy-backs all its
synchronization on a single write/read of that variable.  This is for
speed, because the Future object is heavily used in Executors.  Don't to
the same until you are sure that you need to.

Do piggy-back on synchronization. That's why /happens-before/
exists. Do not make critical sections larger than they need to be.

The rule of thumb I recommend: Keep critical sections to the minimum
length necessary to correctly coordinate concurrent access. That will
introduce tension against markspace's advice to simplify the
concurrent code by piling more into the critical section. Balancing
the two recommendations is a matter of art.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top