In the philosophers' case, there's an even simpler solution that doesn't
involve a total order: you colour the forks red or black, and lay them
around the table in alternating colours (and if there are an odd number of
diners, you set an extra place, or give one of them chopsticks). You then
instruct the philosophers to pick up their black fork first, then their
red fork. While holding a fork, you can only ever be waiting for a red
one, and anyone holding a red fork must already hold their black one
too, so nobody who holds the fork you're after can ever be waiting for
a fork you're holding. Deadlock cannot ensue.
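A minimal sketch of that discipline, with hypothetical blackFork and
redFork monitor objects (the colours alternate around the table, so
each philosopher sees one of each):

class Philosopher implements Runnable {
    private final Object blackFork; // always acquired first
    private final Object redFork;   // always acquired second

    Philosopher(Object blackFork, Object redFork) {
        this.blackFork = blackFork;
        this.redFork = redFork;
    }

    public void run() {
        synchronized (blackFork) {   // nobody waits here while holding red
            synchronized (redFork) { // whoever holds this is already eating
                eat();
            }
        }
    }

    private void eat() { /* munch */ }
}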
The total-order approach is applicable to the list equals problem,
although the colouring approach is not.
The graph whose nodes are the objects being synchronized on, and whose
edges join all (and only) the pairs that get locked together, must be
two-colourable. This won't be true in general, though as you noted it
is true when the objects form a line, or are even in number and form a
circle, and only adjacent pairs are locked together.
static boolean order(List<?> a, List<?> b) {
    // This is the only tricky bit, but as long as it's consistent, it
    // doesn't matter how you do it. For instance, compare identity hash
    // codes -- though those aren't guaranteed unique, so a collision
    // between distinct lists could still deadlock; real code needs a
    // tie-breaker, as discussed below.
    return System.identityHashCode(a) <= System.identityHashCode(b);
}
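A hypothetical caller would then lock both lists in the order the
function dictates before comparing them:

static boolean safeEquals(List<?> a, List<?> b) {
    Object first  = order(a, b) ? a : b;
    Object second = order(a, b) ? b : a;
    synchronized (first) {       // consistent order prevents deadlock
        synchronized (second) {  // safe even when a == b: monitors reenter
            return a.equals(b);
        }
    }
}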
The evil mythical being is in the details, is it not?
Sun should add a construct
synchronized (x, y, z, ...) {
}
that accepts any number of expressions in the parentheses and always
locks a given set of objects in a deterministic order.
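In the meantime, something similar can be approximated in library
form. A minimal sketch, using identity hash codes as the ordering
(collisions make this only a stand-in for the schemes below):

import java.util.Arrays;
import java.util.Comparator;

final class MultiSync {
    // Hypothetical total order; identity hash codes can collide, so a
    // real version needs one of the tie-breaking schemes discussed below.
    private static final Comparator<Object> LOCK_ORDER =
        Comparator.comparingInt(System::identityHashCode);

    /** Locks every object in a deterministic order, runs the body, and
        releases the monitors in reverse order as the frames unwind. */
    static void synchronizedAll(Runnable body, Object... locks) {
        Object[] sorted = locks.clone();
        Arrays.sort(sorted, LOCK_ORDER);
        lockFrom(sorted, 0, body);
    }

    private static void lockFrom(Object[] sorted, int i, Runnable body) {
        if (i == sorted.length) {
            body.run();
        } else {
            synchronized (sorted[i]) {
                lockFrom(sorted, i + 1, body);
            }
        }
    }
}

Usage: MultiSync.synchronizedAll(() -> { /* body */ }, x, y, z);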
As for how this could be implemented:
One way adds four bytes (eight on 64-bit systems) to every object: a
hidden "object number" field that is globally unique. (It might also
serve as the identity hash, making those unique, possibly improving
some hash table performance.) These numbers might be issued
sequentially in allocation order, but that requires all allocations to
go through a global lock. Smarter would be to partition the space of
possible object numbers into non-overlapping ranges that are assigned
to the thread-local allocation buffers (TLABs); objects allocated from
these get numbers from the associated range. When a TLAB needs to be
replenished, it also gets a new, previously unused range. (And the
TLAB now "needs to be replenished" if it can't satisfy an allocation
*either* of memory *or* of an object number.) This keeps allocation
performance close to what it is in present HotSpot VMs.
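A rough user-level analogue of that range handout, with a hypothetical
block size and a global counter touched only once per block:

import java.util.concurrent.atomic.AtomicLong;

final class ObjectNumbers {
    private static final long RANGE_SIZE = 1 << 20; // hypothetical
    private static final AtomicLong nextRange = new AtomicLong();

    // Per-thread {next, limit} pair standing in for the TLAB's range.
    private static final ThreadLocal<long[]> range =
        ThreadLocal.withInitial(() -> new long[] {0, 0});

    /** Hands out globally unique numbers; the shared counter is touched
        only when a thread exhausts its current range. */
    static long next() {
        long[] r = range.get();
        if (r[0] == r[1]) { // range exhausted: fetch a fresh one
            r[0] = nextRange.getAndIncrement() * RANGE_SIZE;
            r[1] = r[0] + RANGE_SIZE;
        }
        return r[0]++;
    }
}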
Another way would use the object's hashCode return value, with ties
broken by assigning an "object number" on the fly to only those
objects that come up in ties (which gets reused for the same objects
if they come up again). A global lock only has to be acquired when a
particular object comes up in a tie for the first time. Tie-breaking
could proceed first by hashCode and then (if different) by
identityHashCode, further reducing (in some cases) the need to acquire
that global lock.
This method has the advantage of not needing an additional field in
every object, though it does need a way to associate a small subset of
all objects with a tie-breaking number.
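A sketch of that comparison, with a hypothetical side table for the
tie-breakers (a real implementation would want weakly referenced keys
so the table doesn't pin dead objects in memory):

import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

final class TieBreaking {
    // Touched only by objects that have actually tied at least once.
    private static final Map<Object, Long> tieBreakers =
        Collections.synchronizedMap(new IdentityHashMap<>());
    private static final AtomicLong nextNumber = new AtomicLong();

    /** Assumes hashCode values stay stable while locks are being taken. */
    static int compareForLocking(Object a, Object b) {
        if (a == b) return 0;
        int byHash = Integer.compare(a.hashCode(), b.hashCode());
        if (byHash != 0) return byHash;
        int byIdentity = Integer.compare(System.identityHashCode(a),
                                         System.identityHashCode(b));
        if (byIdentity != 0) return byIdentity;
        // Genuine tie: assign (or look up) permanent tie-breaking numbers.
        long ta = tieBreakers.computeIfAbsent(a, o -> nextNumber.getAndIncrement());
        long tb = tieBreakers.computeIfAbsent(b, o -> nextNumber.getAndIncrement());
        return Long.compare(ta, tb);
    }
}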
Personally, I'd favor ensuring that the identity hash code is globally
unique and using that, with the range-allocation-to-TLAB method to
avoid damaging the performance of allocations. Since objects *already*
have to get, and somehow encode, their identity hash code, this
shouldn't make objects any bigger or allocation any slower if done
right.
Of course, the hashCode() and System.identityHashCode() methods have a
return type of "int", not "long", so though the internal "identity
hash code" would be unique on 64-bit platforms, the values the hash
code methods return still wouldn't be (though on 32-bit platforms they
now would be). The hash code methods would have to return the low 32
bits of the internal number.
There's also a consideration with long-running systems eventually
exhausting the entire pool of numbers (though that might take an
astronomically long time in the 64-bit case). Three solutions to *that*:
* Use 64 bits no matter what, and pray to one of your culture's
deities.
* In combination with the new "G1" collector, associate ranges (and
presumably TLABs) with GC spaces within the heap; whenever a space is
free of live objects, its range of object numbers is noted as
available for reuse. Indeed, the spaces and the ranges can be
permanently associated with one another. Objects move less in G1, so
the initial address of an object can be used most of the time. When an
object has been moved and is still live, though, the same initial
address mustn't be reused. If there is an efficient way to detect that
such an object exists (with the GC presumably keeping track), the
address of a new object allocated at the same spot can be "perturbed"
to give the "object number". The lowest three bits and many of the
highest bits are all likely to be zero, so flipping bits 0, 1, 2, and
then 63 (or 31), 62 (or 30), etc. might be used to displace the
numbers of objects initially allocated in the same spot from one
another (see the sketch after this list).
* If an OS exists that can hand out huge virtual address spaces to
processes and efficiently use only as much page file and RAM as the
process uses from that address space, however distributed, then JVMs
on that platform can just allocate all objects at unique addresses!
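A sketch of that perturbation, assuming the collector supplies a count
of how many times an address has been reused for a still-live object:

final class Perturb {
    // Bits expected to be zero in a raw object address: the low three
    // (alignment) and the high ones (limited heap size). Hypothetical.
    private static final int[] SPARE_BITS =
        {0, 1, 2, 63, 62, 61, 60, 59, 58, 57};

    /** Displaces the numbers of successive objects allocated at the same
        address by mapping the reuse count's bits onto the spare bits. */
    static long objectNumber(long address, int reuseCount) {
        long n = address;
        for (int i = 0; i < SPARE_BITS.length; i++) {
            if ((reuseCount & (1 << i)) != 0) {
                n ^= 1L << SPARE_BITS[i];
            }
        }
        return n;
    }
}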
Here, I'd say the best bet is just to associate a range with a TLAB,
and when a range empties or a TLAB empties, use a new range and a new
TLAB; if the global counter used to generate new ranges wraps, so be
it. Low values are likely to be mostly unused. Wrapping should be
rare-to-inconceivable if longs are used internally.
A less efficient but robust solution is to associate with each object
a unique *ternary* string, stored using the bit-pairs 01, 10, and 11,
with 00 as a sentinel value, letting the string grow as long as is
needed. Long-running systems will slowly lose efficiency of
allocation, identity hash codes, and multi-synchronization, but never
acquire the risk of deadlock.
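A minimal sketch of that encoding, one bit-pair per byte for clarity
(a real implementation would pack four pairs per byte):

final class TernaryId {
    /** Encodes n >= 0 as bit-pairs 01/10/11 (one per ternary digit,
        least significant digit first) followed by a 00 terminator. */
    static byte[] encode(long n) {
        int len = 1;                      // digit count (at least one)
        for (long m = n; m >= 3; m /= 3) len++;
        byte[] pairs = new byte[len + 1]; // one byte per pair; last is 00
        for (int i = 0; i < len; i++) {
            pairs[i] = (byte) ((n % 3) + 1); // 01, 10, or 11
            n /= 3;
        }
        return pairs;                     // pairs[len] == 0, the sentinel
    }

    /** Total order: longer strings encode bigger numbers; equal lengths
        compare digit by digit from the most significant end. */
    static int compare(byte[] a, byte[] b) {
        if (a.length != b.length) return Integer.compare(a.length, b.length);
        for (int i = a.length - 1; i >= 0; i--) {
            int c = Byte.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return 0;
    }
}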
The ternary string with terminator exceeds 32 bits at 16 ternary
digits stored, so this becomes worse than the current situation,
space-wise, after 3^15 = 14,348,907 objects allocated. That is a
dismayingly small number. But it only becomes worse, space-wise, than
64-bit binary integers after 3^31 = 617,673,396,283,947 objects
allocated, or about six hundred trillion. (At that point, the ternary
strings jump from 64 bits to 66 bits, exceeding 64, at two bits per
ternary digit plus two sentinel zero bits.)
Speed-wise, it is slightly worse from the get-go due to the more
complex operations to be performed.
Normal speed efficiency can be restored by using integers of machine-
native size up until -1 (all Fs in hex; the "maximum" before the
counter wraps to numbers already used) and then layering on an
alternative scheme for all subsequent objects (which are marked by
having -1 in the "object number" field). Ternary strings are one such
alternative scheme. So is a second integer field, used until it too
becomes -1, then a third, and so forth; or a zero-terminated string of
bytes or words used as a bignum -- one that only has to support
order-comparison and addition, simplifying things somewhat. A sketch
of the escape-value comparison follows.
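A minimal sketch, with a hypothetical extension array playing the role
of the bignum:

final class HybridNumber {
    final long primary;     // -1 (all Fs) means "see the extension"
    final long[] extension; // little-endian words, compared as a bignum

    HybridNumber(long primary, long[] extension) {
        this.primary = primary;
        this.extension = extension;
    }

    static int compare(HybridNumber a, HybridNumber b) {
        if (a.primary != -1 || b.primary != -1) {
            // At least one still fits the native field; -1 sorts above
            // every ordinary value, matching "was issued later".
            return Long.compareUnsigned(a.primary, b.primary);
        }
        // Both overflowed: a longer extension means "was issued later".
        if (a.extension.length != b.extension.length)
            return Integer.compare(a.extension.length, b.extension.length);
        for (int i = a.extension.length - 1; i >= 0; i--) {
            int c = Long.compareUnsigned(a.extension[i], b.extension[i]);
            if (c != 0) return c;
        }
        return 0;
    }
}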
One final option, though with potential difficulties, is to free and
reallocate numbers using the garbage collector: collect garbage in
*both* memory *and* the space of numbers, marking a range free when no
live object has a number in that range. Another option involving the
GC is to use objects' memory addresses at all times (very cheap 99% of
the time, though then the address can't double as the identity hash
code, unless the latter is copied from the address and then frozen)
and do something "special" if an object is forced to move. The
deadlock risk arises if some objects are being locked in sequence by
two threads AND one of those objects is moved. Suppose A > B and locks
are taken in descending order: one thread locks A, then B is moved so
that now B > A, the other thread locks B, and now each is waiting on
the object the other holds. Oops.
Since the deadlock risk only arises while a multi-synchronized is
actually in the process of acquiring locks, though, we can do a few
tricks:
* Multi-synchronized may start by flagging all of the objects, then
proceed to get their numbers, then begin acquiring locks, then unflag
the objects. The garbage collector will refuse to move objects that
are flagged at the time, waiting until the next GC or just waiting
until the object is unflagged. A volatile boolean suffices for the
flag, obviating the need for global locking.
* A GC moving objects can cause multi-synchronizeds that are in
progress to abort -- that is, the lock-acquisition behavior is viewed
as a transaction. If it has acquired a few locks but hasn't got them
all when a GC occurs that moves an object, it releases all the
acquired locks, then begins acquiring locks all over again, in
(possibly-changed) sequence order. Since none of the objects has been
modified by the *body* of the synchronized block yet, releasing and
reacquiring locks doesn't violate thread-safety (see the sketch after
this list).
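User code can't release monitors mid-acquisition, so this sketch uses
explicit locks to stand in for them, plus a hypothetical gcEpoch()
counter that the collector would bump whenever it moves objects:

import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.locks.Lock;

final class TransactionalAcquire {
    /** Stand-in for a VM counter bumped whenever a GC moves objects. */
    static long gcEpoch() { return 0; } // hypothetical

    static void runLocked(Runnable body, Lock[] locks, Comparator<Lock> order) {
        retry:
        while (true) {
            long epoch = gcEpoch();
            Lock[] sorted = locks.clone();
            Arrays.sort(sorted, order); // the order may change between tries
            int held = 0;
            try {
                for (Lock lock : sorted) {
                    lock.lock();
                    held++;
                    if (gcEpoch() != epoch) {
                        // An object moved: abort. The finally block
                        // releases every lock acquired so far.
                        continue retry;
                    }
                }
                body.run(); // all locks held and the order stayed stable
                return;
            } finally {
                for (int i = held - 1; i >= 0; i--) {
                    sorted[i].unlock();
                }
            }
        }
    }
}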
That transactional trick might actually be the simplest of all of
these ideas to implement.
I leave it up to Sun's engineers to pick whichever one they think is
best, or come up with an alternative all their own.