Serious concurrency problems on fast systems

  • Thread starter Kevin McMurtrie

Eric Sosman

No; the only assumption being made is that the shorter the interval since
a thread has resumed, the lower the probability of its being preempted in
the next x nanoseconds for some fairly small value of x. I don't think
that that's an unreasonable assumption; context switches are expensive
enough that a reasonably-designed OS scheduler is probably going to avoid
frequently putting two of them (on the same core) too close together in
time.

Dear Sir or Madam (as the case may be),
You may be right, at that.
Sincerely,
E.A.S.

(With apologies to George Bernard Shaw.)
 

Arne Vajhøj

Though giving a thread higher priority while it holds a shared lock
isn't exactly rocket science; VMS did it back in the early 80s. JVMs
could do a really nice job of this, noticing which monitors cause
contention and how long they tend to be held. A shame they don't.

A higher priority could reduce the problem, but would not
eliminate it.

Arne

PS: I thought DECThreads came with VMS 5.5 in 1991.
 

Arne Vajhøj

No, but someday there could be an option to let a Thread synchronized on
a monitor for which another Thread is waiting run a little longer, in
hope that it will desynchronize.

Maybe.

But I think it would be difficult to implement, because
the JVM knows about the locks but the OS manages the
threads.

But it is possible. If the OS API has:
- StartBadTimeToKickMeOff
- EndBadTimeToKickMeOff
then the JVM could call it at enter and exit.

Arne
 

Mike Schilling

Arne Vajhøj said:
A higher priority could reduce the problem, but would not
eliminate it.

Arne

PS: I thought DECThreads came with VMS 5.5 in 1991.

VMS actually gave a boost to *processes* that held locks. Close enough to
the same thing, methinks.
 

Arne Vajhøj

VMS actually gave a boost to *processes* that held locks. Close enough
to the same thing, methinks.

True.

Arne

PS: A quick glance in IDS indicates that it is locking on
mutexes, not regular $ENQ/$DEQ, that raises priority.
 

Mike Schilling

Arne Vajhøj said:
True.

Arne

PS: A quick glance in IDS indicates that it is locking on
mutexes, not regular $ENQ/$DEQ, that raises priority.

That's a great book, but I must have given away my copy at least fifteen
years ago.
 

Arne Vajhøj

To clarify a bit, this isn't hammering a shared resource. I'm talking
about 100 to 800 synchronizations on a shared object per second for a
duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
cause a complete collapse of concurrency.

But either it does or your entire problem analysis is wrong.
My older 4-core Mac Xeon can have 64 threads call getProperty(String)
on a shared Properties instance 2 million times each in only 21 real
seconds. That's one call every 164 ns. It's not as good as
ConcurrentHashMap (one per 0.30 ns) but it's no collapse.

That is a call per clock cycle.
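
For readers who want to rerun the kind of test Kevin describes, here is a rough sketch. The arithmetic behind his figure: 64 threads × 2,000,000 calls = 128,000,000 calls in 21 s of wall-clock time, or about 164 ns per call measured across all threads, which is the metric nanosPerCall() reports. The class name LookupBench and the key "some.key" are made up for illustration; there is no JIT warm-up or repetition, so treat the output as indicative only (a harness such as JMH is the better tool for real numbers).

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.function.Function;

class LookupBench {
    /**
     * Runs `threads` threads, each doing `callsPerThread` lookups through `lookup`,
     * and returns wall-clock nanoseconds per call across all threads combined.
     */
    static double nanosPerCall(int threads, int callsPerThread, Function<String, String> lookup)
            throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);       // releases all workers together
        CountDownLatch done = new CountDownLatch(threads);  // counts finished workers
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                try {
                    start.await();
                    for (int i = 0; i < callsPerThread; i++) {
                        // Use the result so the JIT cannot discard the call.
                        if (lookup.apply("some.key") == null) {
                            throw new IllegalStateException("missing key");
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        long t0 = System.nanoTime();
        start.countDown();
        done.await();
        long elapsed = System.nanoTime() - t0;
        return (double) elapsed / ((long) threads * callsPerThread);
    }

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();               // every getProperty() synchronizes
        props.setProperty("some.key", "value");
        Map<String, String> map = new ConcurrentHashMap<>();
        map.put("some.key", "value");

        // 64 threads x 2,000,000 calls mirrors the test described above.
        System.out.printf("Properties:        %.1f ns/call%n", nanosPerCall(64, 2_000_000, props::getProperty));
        System.out.printf("ConcurrentHashMap: %.1f ns/call%n", nanosPerCall(64, 2_000_000, map::get));
    }
}
```

Results will vary with hardware, JVM version and whether HotSpot's lock optimizations kick in, which is exactly the caveat raised later in the thread.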

?!?!
Many of the basic Sun Java classes are synchronized.

Practically only old ones that you should not be using anymore
anyway.

Arne
 

Arne Vajhøj

That's a great book, but I must have given away my copy at least fifteen
years ago.

I still have the VAX 3.3, VAX 5.2 and Alpha 1.5 versions on the shelf.

Arne
 

Kevin McMurtrie

Arne Vajhøj said:
But either it does or your entire problem analysis is wrong.


That is a call per clock cycle.

HotSpot has some (benchmark-driven?) optimizations for this case. It's
hard to not hit them when using simple tests on String and
ConcurrentHashMap.


?!?!


Practically only old ones that you should not be using anymore
anyway.

Arne

Properties is a biggie. A brute-force replacement of Properties caused
the system throughput to collapse to almost nothing in Spring's
ResourceBundleMessageSource. There's definitely a JVM/OS problem. The
next test is to disable hyperthreading.
 

Robert Klemme

HotSpot has some (benchmark-driven?) optimizations for this case. It's
hard to not hit them when using simple tests on String and
ConcurrentHashMap.

What exactly do you mean by that? I can't seem to get rid of the
impression that you are doing the second step (micro optimization with
JVM internals in mind) before the first (proper design and implementation).
Properties is a biggie. A brute-force replacement of Properties caused
the system throughput to collapse to almost nothing in Spring's
ResourceBundleMessageSource. There's definitely a JVM/OS problem. The
next test is to disable hyperthreading.

As someone else (Lew?) pointed out it's a bad idea to always go to
System.properties. You should rather be evaluating them on startup and
initialize some other data structure - if only to not always repeat
checking of input values over and over again.

Cheers

robert
 

Lew

Robert said:
As someone else (Lew?) pointed out it's a bad idea to always go to
System.properties. You should rather be evaluating them on startup and
initialize some other data structure - if only to not always repeat
checking of input values over and over again.

I worked on a big Java Enterprise project a while ago that had highly
concurrent deployment but made quite a number of concurrency mistakes that
hugely slowed it down.

Kevin's comments about clock cycles and all are somewhat beside the point.
There is a cascade effect once locks start undergoing contention. Aside from
the fact that the JVM optimizes lock acquisition in the uncontended case, once
a thread blocks on a monitor, all the other threads also trying to acquire
that monitor also block. As soon as one finally gobbles the lock, the rest
mill about waiting their turn while still more threads pile up on the monitor.
Sure, the critical section might only require a few hundred clock cycles,
but the threads can wait seconds, even minutes, as they jostle about the
revolving door trying to enter.

Up until threads start blocking, you can get quite good performance, but
heaven help you once contention gets heavy.

On that big project we proved this with various enterprise monitoring products
that reported on locks, wait times for locks and other performance issues.

Nothing beats immutable members for eliminating that contention. Right up
there with that is not sharing data between threads in the first place.

We did three things on that project to improve concurrency: eliminated shared
data, made shared data immutable, and used 'java.util.concurrent' classes.

'ConcurrentHashMap', for example, with its multiple lock stripes, unlimbered
one major bottleneck on synchronized 'Map' access. (I fought for eliminating
the shared 'Map' entirely, but lost that battle. You can lead a horse to
water ...)

Stop hitting System.properties altogether, except for once at static class
initialization.
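
A minimal sketch of "read once at static class initialization", assuming the values really are fixed for the life of the JVM. The class AppSettings and the property names app.timeout.millis and app.cache.enabled are hypothetical; the point is that System.getProperty() runs once, in the class initializer, and hot code only reads final fields afterwards.

```java
// Hypothetical settings holder: system properties are read once, when the class is
// initialized, and exposed as final fields that any thread can read without locking.
final class AppSettings {
    // Long.getLong(name, default) reads and parses the system property in one call.
    static final long TIMEOUT_MILLIS = Long.getLong("app.timeout.millis", 5_000L);

    static final boolean CACHE_ENABLED =
            Boolean.parseBoolean(System.getProperty("app.cache.enabled", "true"));

    private AppSettings() { }  // constants only, no instances
}
```

Callers then write something like `if (AppSettings.CACHE_ENABLED) { ... }` on the hot path; the JVM guarantees the class initializer runs exactly once and that its results are visible to every thread that uses the class.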
 

Kevin McMurtrie

Robert Klemme said:
What exactly do you mean by that? I can't seem to get rid of the
impression that you are doing the second step (micro optimization with
JVM internals in mind) before the first (proper design and implementation).


As someone else (Lew?) pointed out it's a bad idea to always go to
System.properties. You should rather be evaluating them on startup and
initialize some other data structure - if only to not always repeat
checking of input values over and over again.

Cheers

robert

The properties aren't immutable. The best feature of properties rather
than hard-coded values is being able to update them in an emergency
without server restarts. Anyways, that was fixed by overriding every
method in Properties with a high-concurrency implementation. Too bad
Properties isn't an interface.

Fixing every single shared synchronized method in every 3rd party
library could take a very, very long time.

Today's test had hyperthreading turned off. The performance drop-off
wasn't a fatal collapse like before but it was still bad. The backlog
came out of nowhere, cycled through several points of code, then went
away. It's starting to sound like a HotSpot problem. Argh. We left
Java 1.5.0_16 because of GC stalling. We left 1.5.0_21 because it
unrolled loops incorrectly. Java 1.6.0_17 optimized away monitorexit,
which is amusing but quite fatal. We're on 1.6.0_20 now so it may be
time to ask Sun/Oracle for help.
 

ClassCastException

Aside from the fact that the JVM optimizes lock acquisition in the
uncontended case, once a thread blocks on a monitor, all the other
threads also trying to acquire that monitor also block. As soon as one
finally gobbles the lock, the rest mill about waiting their turn while
still more threads pile up on the monitor.
Sure, the critical section might only require a few hundred clock cycles,
but the threads can wait seconds, even minutes, as they jostle about the
revolving door trying to enter.

It reminds me of city gridlock. As rush hour commences, traffic begins to
ramp up smoothly. It also keeps flowing smoothly right up to a certain
point, and then whamo! It's like it hits a brick wall. Suddenly every
brake light in the city comes on seemingly at the same time and huge
traffic jams develop.

The reason? Once the traffic gets tight enough, people slowing to turn,
or merging in onto the highway, or whatever will trigger jams that just
keep growing and growing. If the traffic is sparse enough one person
slowing briefly doesn't slow anyone else down much of the time. If they
do, it's often for a shorter time. If it gets dense enough, one person
slowing down slows one or more others down for as long or longer. What
eventually ends it is enough people getting where they're going and
removing themselves from the roads, reducing the traffic density again.
The jams dissolve then as the cars at the front speed up and then the
ones behind them, and so on, and they all spread themselves out.

Of course there are things that can make it worse. A good concurrency
design with lock striping is like a good cloverleaf intersection; there's
some merging but no stop signs or lights. A really awful concurrency
design with a global lock that's highly contended is like having a large
city's roads all radiate from a single hub, which has a stop sign. (Nah.
I guess worse is with the hub, but without the stop sign. Screeeech!
Crash! Crunch. Race condition.)
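
To make the cloverleaf analogy concrete, here is a minimal lock-striping sketch. StripedMap is a hypothetical class, not a library API: keys are spread over several independent segments, each guarded by its own monitor, so two threads only contend when their keys land in the same stripe. ConcurrentHashMap's real implementation is considerably more refined; this only illustrates the idea.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lock-striped map: N segments, each with its own lock object.
final class StripedMap<K, V> {
    private final Object[] locks;
    private final Map<K, V>[] segments;

    @SuppressWarnings("unchecked")
    StripedMap(int stripes) {
        locks = new Object[stripes];
        segments = (Map<K, V>[]) new Map[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
            segments[i] = new HashMap<>();
        }
    }

    private int stripeFor(Object key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), locks.length);
    }

    V get(K key) {
        int s = stripeFor(key);
        synchronized (locks[s]) {   // only threads hitting this stripe wait here
            return segments[s].get(key);
        }
    }

    V put(K key, V value) {
        int s = stripeFor(key);
        synchronized (locks[s]) {
            return segments[s].put(key, value);
        }
    }
}
```

With one stripe this degenerates into the "single hub with a stop sign"; with 16 or more stripes, threads working on unrelated keys mostly merge without stopping.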

If you can swing it, functional programming idioms with immutable data
structures are like making your whole road network 3D, so traffic flows
rarely intersect at all, only where truly necessary.
 

Martin Gregorie

The properties aren't immutable.
But it's unlikely that you'll want to change them frequently, regardless
of whether 'frequently' is defined in human or electronic terms.
The best feature of properties rather
than hard-coded values is being able to update them in an emergency
without server restarts.
So, why not design in the same sort of mechanism that is used by high
performance servers such as bind or Samba? IOW, store properties in read-
only objects and provide a method that reloads them on demand. It could
be an explicit command that's issued when a change is made or simply a
thread that scans for changes every minute or so and only makes changes
when a modification is detected.
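
A sketch of the reload-on-change design described above, under a few assumptions: the settings live in a properties file, a one-minute check is frequent enough, and the file's modification time is a good enough change signal. ReloadingConfig is a hypothetical name. Readers go through a volatile reference to an immutable map and never lock; only the reload check is synchronized, and it runs once a minute rather than on every lookup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical read-mostly configuration that reloads itself when its file changes.
final class ReloadingConfig {
    private final Path file;
    private volatile Map<String, String> snapshot;  // immutable; replaced wholesale
    private FileTime lastLoaded;                     // guarded by "this"

    ReloadingConfig(Path file) throws IOException {
        this.file = file;
        reloadIfChanged();  // lastLoaded is still null, so this always loads
    }

    String get(String key) {
        return snapshot.get(key);  // lock-free read of an immutable map
    }

    /** Re-reads the file only if its modification time changed; returns true if it reloaded. */
    synchronized boolean reloadIfChanged() throws IOException {
        FileTime stamp = Files.getLastModifiedTime(file);
        if (stamp.equals(lastLoaded)) {
            return false;
        }
        Properties p = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            p.load(in);
        }
        Map<String, String> copy = new HashMap<>();
        for (String name : p.stringPropertyNames()) {
            copy.put(name, p.getProperty(name));
        }
        snapshot = Collections.unmodifiableMap(copy);  // publish via the volatile write
        lastLoaded = stamp;
        return true;
    }

    /** Starts a daemon thread that checks the file once a minute (the interval is an assumption). */
    ScheduledExecutorService startScanner() {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "config-scanner");
            t.setDaemon(true);
            return t;
        });
        exec.scheduleWithFixedDelay(() -> {
            try {
                reloadIfChanged();
            } catch (IOException e) {
                // Keep serving the previous snapshot if the file cannot be read.
            }
        }, 1, 1, TimeUnit.MINUTES);
        return exec;
    }
}
```

The "explicit command" variant Martin mentions is the same class without startScanner(): an admin hook simply calls reloadIfChanged() after editing the file.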
 

Lew

Martin said:
But it's unlikely that you'll want to change them frequently, regardless
of whether 'frequently' is defined in human or electronic terms.

So, why not design in the same sort of mechanism that is used by high
performance servers such as bind or Samba? IOW, store properties in read-
only objects and provide a method that reloads them on demand. It could
be an explicit command that's issued when a change is made or simply a
thread that scans for changes every minute or so and only makes changes
when a modification is detected.

Why are people so deucedly afraid of "new ImmutableThingie()"?

Kevin, you asked for advice. You've gotten good advice - make shared objects
immutable.

Even non-shared objects should mostly be immutable, unless they really,
really, really, really should be mutable by their inherent nature. Most of
the time, as in your properties case, you can get by with creation of a new
immutable object.

In in-between scenarios, like if you want slowly-changing properties, you
create one immutable shared object from a synchronized factory (that in this
case copies System.properties) at the top of a long operation.
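
A minimal sketch of that in-between scenario, assuming a long operation can live with the property values as they stood when it began. PropertySnapshots is a hypothetical name. The factory copies the system properties into an immutable map once; the operation then reads its private snapshot as often as it likes without touching the synchronized Properties object again.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical synchronized factory producing immutable snapshots of System properties.
final class PropertySnapshots {
    private PropertySnapshots() { }

    static synchronized Map<String, String> snapshot() {
        Properties sys = System.getProperties();
        Map<String, String> copy = new HashMap<>();
        // stringPropertyNames() returns its own copy of the String-valued keys.
        for (String name : sys.stringPropertyNames()) {
            String value = sys.getProperty(name);
            if (value != null) {  // skip a key removed concurrently
                copy.put(name, value);
            }
        }
        return Collections.unmodifiableMap(copy);
    }
}
```

Usage at the top of a long operation: `Map<String, String> props = PropertySnapshots.snapshot();` and from then on `props.get("java.home")` and friends, with no further locking.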

System.properties is such the wrong place to put properties that change during
the life of the program. As you point out it's not designed for good
concurrency. Put those mutable properties somewhere else, like in a
ConcurrentHashMap as I've already suggested.

Sheesh.
 

Robert Klemme

The properties aren't immutable. The best feature of properties rather
than hard-coded values is being able to update them in an emergency
without server restarts. Anyways, that was fixed by overriding every
method in Properties with a high-concurrency implementation. Too bad
Properties isn't an interface.

Well, then use an immutable HashMap as Lew suggested and publish it through an
AtomicReference. You get fast concurrent access and can update it at
any time.
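
A sketch of that suggestion, assuming reads vastly outnumber updates (as with configuration). CowSettings is a hypothetical name. Readers do one AtomicReference.get() and read an immutable map with no locking; writers build a modified copy and swap it in atomically, so a reader never sees a half-updated map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical copy-on-write settings holder built on AtomicReference.
final class CowSettings {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    String get(String key) {
        return current.get().get(key);  // no lock: the map itself never changes
    }

    /** Atomically replaces the whole map, e.g. after an emergency edit. */
    void replaceAll(Map<String, String> fresh) {
        current.set(Collections.unmodifiableMap(new HashMap<>(fresh)));
    }

    /** Atomically sets one key; updateAndGet retries the copy if another writer raced in. */
    void set(String key, String value) {
        current.updateAndGet(old -> {
            Map<String, String> copy = new HashMap<>(old);
            copy.put(key, value);
            return Collections.unmodifiableMap(copy);
        });
    }
}
```

Copying the whole map on every write is wasteful for write-heavy data, but for properties updated a few times a day it costs nothing noticeable and removes the read-side monitor entirely.
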
Fixing every single shared synchronized method in every 3rd party
library could take a very, very long time.

I have no idea where you got that from. Nobody suggested fixing third
party libraries - if anything the suggestion was to use them properly.
Note that you can even use a Vector without too much performance
degradation if it is accessed by a single thread only (of course one
would use ArrayList then). It's not the libs - it's the way you use them.

Cheers

robert
 

Mike Schilling

Robert Klemme said:
On 08.06.2010 05:39, Kevin McMurtrie wrote:

I have no idea where you got that from. Nobody suggested fixing third
party libraries - if anything the suggestion was to use them properly.

What if they use system properties promiscuously? Hypothetically:

1. My application receives XML messages.
2. I use a third-party library to deserialize the XML into Java objects.
3. The third-party library uses JAXP to find an XML parser.
4. JAXP always checks for a system property that points to the parser's
class name.

Even if the details are off (I don't know whether current versions of JAXP
cache the class name), you get the idea.
 

Kevin McMurtrie

Patricia Shanahan said:
Kevin McMurtrie wrote:
...
...

Have you considered other possibilities, such as memory thrashing? The
resource does not seem heavily enough used for contention to be a big
issue, but it is about the sort of access rate that is low enough to
allow a page to be swapped out, but high enough for the time waiting for
it to matter.

Patricia

It happened today again during testing of a different server class on
the same OS and hardware. This time it was under a microscope. There
were 10 gigabytes of idle RAM, no DB contention, no tenured GC, no disk
contention, and the total CPU was around 25%. There was no gridlock
effect - it always involved one synchronized method that did not depend
on other resources to complete. Throughput dropped to ~250 calls per
second at a specific method for several seconds then it recovered. Then
it happened again elsewhere, then recovered. After several minutes the
server was at top speed again. We then pushed traffic until its 1Gbps
Ethernet link saturated and there wasn't a trace of thread contention
ever returning.
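
When a stall like this comes and goes in a few seconds, it can help to sample lock contention from inside the JVM while it happens. The sketch below uses the standard java.lang.management ThreadMXBean to take a lightweight thread dump and count BLOCKED threads per monitor and top stack frame. ContentionSampler is a hypothetical name, and sampling once a second is an assumption; the idea is to see which monitor the backlog piles up on and whether it really moves "elsewhere" over time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-process sampler: how many threads are BLOCKED on which monitor, and where.
final class ContentionSampler {
    static Map<String, Integer> blockedByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new HashMap<>();
        // (false, false): skip the costlier locked-monitor and synchronizer details.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                String lock = String.valueOf(info.getLockName());  // class@identityHash of the monitor
                StackTraceElement[] stack = info.getStackTrace();
                String where = stack.length > 0 ? stack[0].toString() : "?";
                counts.merge(lock + " @ " + where, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 60; i++) {  // one sample per second for a minute
            System.out.println(System.currentTimeMillis() + " " + blockedByLock());
            Thread.sleep(1000);
        }
    }
}
```

A dump is a safepoint operation with its own cost, so this is a diagnostic to run during a test, not something to leave enabled in production.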
 

Robert Klemme

It happened today again during testing of a different server class on
the same OS and hardware. This time it was under a microscope. There
were 10 gigabytes of idle RAM, no DB contention, no tenured GC, no disk
contention, and the total CPU was around 25%. There was no gridlock
effect - it always involved one synchronized method that did not depend
on other resources to complete. Throughput dropped to ~250 calls per
second at a specific method for several seconds then it recovered. Then
it happened again elsewhere, then recovered. After several minutes the
server was at top speed again. We then pushed traffic until its 1Gbps
Ethernet link saturated and there wasn't a trace of thread contention
ever returning.

Did you scrutinize the GC log? That is something I would definitely look
into. Other than that, it's difficult to come up with
concrete information with such a general problem description.
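
The GC log itself is enabled with JVM flags (for example -verbose:gc, plus -XX:+PrintGCDetails on JVMs of this era, or -Xlog:gc* on Java 9 and later). As a quick in-process cross-check, the sketch below reads the standard GarbageCollectorMXBean counters once a second and prints how much collection time accrued, with a timestamp that can be lined up against the throughput drop. GcWatch is a hypothetical name and the sampling interval is an assumption.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical GC watcher: prints collection time accrued per second across all collectors.
final class GcWatch {
    /** Sum of accumulated collection time (ms) across all collectors in this JVM. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();  // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalCollectionMillis();
        for (int i = 0; i < 60; i++) {
            Thread.sleep(1000);
            long now = totalCollectionMillis();
            // Timestamped so the output can be lined up against the throughput graph.
            System.out.printf("%tT  GC time in the last second: %d ms%n",
                    System.currentTimeMillis(), now - before);
            before = now;
        }
    }
}
```

If the slow seconds show little or no GC time, that points away from collection pauses and back toward the monitors themselves.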

Cheers

robert
 

Robert Klemme

What if they use system properties promiscuously? Hypothetically:

1. My application receives XML messages.
2. I use a third-party library to deserialize the XML into Java objects.
3. The third-party library uses JAXP to find an XML parser.
4. JAXP always checks for a system property that points to the parser's
class name.

Even if the details are off (I don't know whether current versions of
JAXP cache the class name), you get the idea.

In that case I would check whether the lib was being used properly and, if
not, the lib would indeed need fixing. Alternatively you would have to
replace it with something else (or a newer version, but IIRC JAXP is
part of the JDK nowadays).
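
Assuming the hot spot is the per-message factory lookup in Mike's hypothetical, "using JAXP properly" would look roughly like this sketch. XmlParsers is a hypothetical name. DocumentBuilderFactory.newInstance() is the call whose lookup procedure consults the javax.xml.parsers.DocumentBuilderFactory system property, so it is done once; and because neither the factory nor DocumentBuilder is guaranteed to be thread-safe, each thread gets its own builder.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: one factory lookup per JVM, one DocumentBuilder per thread.
final class XmlParsers {
    // The implementation lookup (system property, service files, default) happens once here.
    private static final DocumentBuilderFactory FACTORY = DocumentBuilderFactory.newInstance();

    private static final ThreadLocal<DocumentBuilder> BUILDER = ThreadLocal.withInitial(() -> {
        try {
            // The factory is not guaranteed thread-safe, so serialize builder creation.
            synchronized (FACTORY) {
                return FACTORY.newDocumentBuilder();
            }
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    });

    private XmlParsers() { }

    static Document parse(String xml) throws SAXException, IOException {
        DocumentBuilder builder = BUILDER.get();
        builder.reset();  // clear any state left over from this thread's previous parse
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

The only lock left is taken once per thread, when its builder is created; steady-state parsing touches no shared monitor and no system property.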

Kind regards

robert
 
