Aside from the fact that the JVM optimizes lock acquisition in the
uncontended case, once a thread blocks on a monitor, all the other
threads also trying to acquire that monitor also block. As soon as one
finally gobbles the lock, the rest mill about waiting their turn while
still more threads pile up on the monitor.
Sure, the critical section might only require a few hundred clock
cycles,
but the threads can wait seconds, even minutes, as they jostle about the
revolving door trying to enter.
It reminds me of city gridlock. As rush hour commences, traffic begins to
ramp up smoothly. It also keeps flowing smoothly right up to a certain
point, and then whamo! It's like it hits a brick wall. Suddenly every
brake light in the city comes on seemingly at the same time and huge
traffic jams develop.
The reason? Once the traffic gets tight enough, people slowing to turn,
or merging in onto the highway, or whatever will trigger jams that just
keep growing and growing. If the traffic is sparse enough one person
slowing briefly doesn't slow anyone else down much of the time. If they
do, it's often for a shorter time. If it gets dense enough, one person
slowing down slows one or more others down for as long or longer. What
eventually ends it is enough people getting where they're going and
removing themselves from the roads, reducing the traffic density again.
The jams dissolve then as the cars at the front speed up and then the
ones behind them, and so on, and they all spread themselves out.
Of course there are things that can make it worse. A good concurrency
design with lock striping is like a good cloverleaf intersection; there's
some merging but no stop signs or lights. A really awful concurrency
design with a global lock that's highly contended is like having a large
city's roads all radiate from a single hub, which has a stop sign. (Nah.
I guess worse is with the hub, but without the stop sign. Screeeech!
Crash! Crunch. Race condition.)
If you can swing it, functional programming idioms with immutable data
structures are like making your whole road network 3D, so traffic flows
rarely intersect at all, only where truly necessary.