What's the deal with deadlocks

J

Joe Snodgrass

The general concept is simple enough, but it seems to me that you'll
need special tools to diagnose this specific problem. How do you get
the debugger to look inside threads, see that they're hung, and find
out where the problem is happening? Do the debuggers have some
features that I haven't heard of? TIA.
 
M

markspace

The general concept is simple enough, but it seems to me that you'll
need special tools to diagnose this specific problem. How do you get
the debugger to look inside threads, see that they're hung, and find
out where the problem is happening? Do the debuggers have some
features that I haven't heard of? TIA.


Probably. Goggle is your friend. Since you posted this to a Java
newsgroup, I'll supply one answer on topic for that group:

<http://netbeans.org/kb/docs/java/debug-deadlock-screencast.html>
 
I

Ian Collins

Debugging deadlocks is easier than e.g. race conditions, because when a
deadlock appears the program is effectively stopped at the point of the
error and one can easily attach the debugger and study the stack traces of
all the threads. All the debuggers I use support this. The only problem is
that if there are many threads running then finding the actual culprit may
become tedious. I am not sure if this can be automated by some tools
currently.

For avoiding deadlocks in advance one can use valgrind+helgrind and fix all
inconsistent lock order diagnostics it spits out. This way one should be
able to get rid of all potential deadlock scenarios in all code paths
covered by the test run.

There are also static deadlock analysis tools such as Sun Studio's LockLint.
 
R

Roedy Green

The general concept is simple enough, but it seems to me that you'll
need special tools to diagnose this specific problem. How do you get
the debugger to look inside threads, see that they're hung, and find
out where the problem is happening? Do the debuggers have some
features that I haven't heard of? TIA.

The primary technique is to use library code even if somewhat awkward.
That way you have an expert author, and many eyes looking for bugs. It
is much harder to find bugs involving threads since they are not as
deterministic as ordinary bugs. Even if you have single stepped every
instruction through all possible branchings, there can still be bugs
hiding.
 
J

Joshua Maurice

In multithreaded apps that require multiple synchronization primitives, if a
resource must be locked and during that block you must lock yet again, then you
must unlock in the reverse order that you locked.

I'm not sure this is correct. In fact, I'm pretty sure this is
incorrect. The unlock order is irrelevant. That can never lead to
deadlocks (at least not for what I would consider sane code). In sane
code, you can unlock in any order, and deadlocks will not arise.

Of course, you should be using stack destructors for unlocking in C++,
which will result in LIFO ordering.
 
J

Joe Snodgrass

(e-mail address removed):




Debugging deadlocks is easier than e.g. race conditions, because when a
deadlock appears the program is effectively stopped at the point of the
error and one can easily attach the debugger and study the stack traces of
all the threads. All the debuggers I use support this. The only problem is
that if there are many threads running then finding the actual culprit may
become tedious. I am not sure if this can be automated by some tools
currently.

For avoiding deadlocks in advance one can use valgrind+helgrind and fix all
inconsistent lock order diagnostics it spits out. This way one should be
able to get rid of all potential deadlock scenarios in all code paths
covered by the test run.

Here's how NASA handles race conditions.

http://tinyurl.com/42p2t5f

I searched Dr. Dobbs J, and got ten pages of matches.
 
J

Joshua Maurice

Then I refer you to the text again for a full description of what I apparently
failed to convey. (Programming with POSIX Threads, pg 64)

My description was not meant to be a complete description of a fixed locking
hierarchy. A slightly better description seems to be in order though.

Consider the list below to be a list of named resources that are shared
througout an application. Also, the use of mutex below will be in the posix
sense, not in a MS mutex sense.  The equivalent for MS would roughly bea
critical section.
A
B
C
D
E

The developer's choice of locking order should consider knowing which shared
resources must be synchronized during synchronizing other resources in the list.
In other words, the efficiency of the fixed locking hierarchy.  The greater
point is that there will be numerous cases of multiple synchronizations.  i.e.
All code that needs both mutex_a and mutex_b must always lock mutex_a andthen
mutex_b

Lets say that one thread (thread #1) needed synchronized write access to B and
D, and another thread (thread #2) needed synchronized write access to C and E.
Other threads in the application require arbitrary synchronized write access to
any and all of the other resources at other times.

To avoid a deadlock,  thread #1 would (in order) lock B,C,D and thread #2 would
(in order) lock C,D,E.

To the statement you made, " I'm not sure this is correct. In fact, I'm pretty
sure this is incorrect. The unlock order is irrelevant."

I will defer to a direct quote so that the commonality between what I explained
from memory and what the author (Butenhof is one "f" btw) coveys:

"If the code invariants permit you to unlock mutex 1 safely at this point, you
would do better to avoid owning both mutexes at the same time.  That is, unlock
mutex 1, and then lock mutex 2.  If there is a broken invariant that requires
mutex 1 to be owned, then mutex 1 cannot be released until the invariant is
restored".

Applying the same methodology to the above example implies a distinct unlock
order. In the context of the quote, unlock order seems to matter.

I'm sorry that I don't follow that at all.

In fact, I think there might be some confusion. I completely agree
that a /lock/ order can prevent deadlock. A hierarchical locking
scheme is where everyone agrees that if they're going to acquire two
or more mutexes, then they should do so in a strict global ordering.

I disagree that the /unlocking/ side matters at all in such a scheme.
Do you have a concrete example where the "wrong" unlock order can lead
to a deadlock? (Note that I might be able to contrive a case where
"unlock order contributes to a deadlock", but that wouldn't be what I
was talking about.)

For example:
lock A
lock B
lock C
unlock B
lock B
With a hierarchical locking scheme can lead to a deadlock. It
specifically can result in deadlock because at the end it's trying to
acquire B while already holding C, and that is contrary to the
hierarchical locking rules we have established. However, consider the
following two examples:

Ex:
lock A
lock B
lock C
unlock C
unlock B
unlock A
Ex:
lock A
lock B
lock C
unlock B
unlock A
unlock C

The above two are functionally equivalent. There is no way that either
unlock order could contribute to a deadlock (at least not without
additional locking in this thread contrary to the hierarchical order).
Other threads don't care what order you unlock in. They cannot. If
they also follow the order rules, then they might just have to wait a
little longer depending on your unlock order, but there's no way that
unlock order can result in thread 1 owning A and trying to acquire B,
while thread 2 owning B and trying to acquire A. The only way for that
to happen is if one thread breaks the acquire order.

I am happy to discuss if I am wrong, and I would very much appreciate
an example proving me wrong.

And again, to repeat, as a matter of good C++ programming style, I
expect (nearly) all mutex acquisition and releasing to be done in
stack based lock objects, RAII, so in good code the order will be
LIFO, but it's not required to avoid deadlocks with a hierarchical
locking scheme.
 
L

Lew

Joe said:
Here's how NASA handles race conditions.

http://tinyurl.com/42p2t5f

I searched Dr. Dobbs J, and got ten pages of matches.

That link was worth ten times the expected maximum value for a Usenet link.

Not least because it led to http://www.usingcsp.com/cspbook.pdf,
/Communicating Sequential Processes/, by C. A. R. "Tony" Hoare, with foreword
by Edsger W. Dijkstra.

--
Lew
"For a variety of reasons, this is a book eagerly awaited by all who knew it
was in the making; to say that their patience has been rewarded would be an
understatement."
- Edsger W. Dijkstra, in the foreword to /Communicating Sequential Processes/,
by C. A. R. "Tony" Hoare.
 
A

Alf P. Steinbach /Usenet

* Lew, on 18.04.2011 22:22:
That link was worth ten times the expected maximum value for a Usenet link.

Not least because it led to http://www.usingcsp.com/cspbook.pdf, /Communicating
Sequential Processes/, by C. A. R. "Tony" Hoare, with foreword by Edsger W.
Dijkstra.

I have that in hardcover.

Some other interesting old books:

Parallell Programming in ANSI Standard ADA - George W. Cherry
Lucid, the Dataflow Language - William W. Wadge & Edward A. Ashcroft
OCCAM Programming Manual - INMOS Limited

OCCAM was sort of a language designed to express more or less directly Hoare's
concepts.

Hm, I don't have anything on Linda.


Cheers & hth.,

- Alf
 
I

Ian Collins

* Lew, on 18.04.2011 22:22:

I have that in hardcover.

Some other interesting old books:

Parallell Programming in ANSI Standard ADA - George W. Cherry
Lucid, the Dataflow Language - William W. Wadge& Edward A. Ashcroft
OCCAM Programming Manual - INMOS Limited

OCCAM was sort of a language designed to express more or less directly Hoare's
concepts.

Someone else who has used OCCAM in anger? Wow I thought I was the only
one left!
 
A

Alf P. Steinbach /Usenet

* Ian Collins, on 18.04.2011 23:53:
Someone else who has used OCCAM in anger? Wow I thought I was the only one left!

You may well be the last: I never used it, I just wanted to learn about it
(within my economic means, and I think I got that manual at a "basement bargain"
shop).

Anyway, this is fun.

Discussing OCCAM, cross-posted to [comp.lang.c++] and
[comp.lang.java.programmer]. :)


Cheers,

- Alf
 
T

Tom Anderson

That link was worth ten times the expected maximum value for a Usenet link.

Not least because it led to http://www.usingcsp.com/cspbook.pdf,
/Communicating Sequential Processes/, by C. A. R. "Tony" Hoare, with foreword
by Edsger W. Dijkstra.

If you like Dijkstra - and really, who doesn't - you might enjoy dipping
into the archive of his papers:

http://www.cs.utexas.edu/users/EWD/

There's some heavily obscure stuff in there, but also dry wit, rigorous
thinking, and interesting ideas.

tom
 
N

Noah Roberts

The general concept is simple enough, but it seems to me that you'll
need special tools to diagnose this specific problem. How do you get
the debugger to look inside threads, see that they're hung, and find
out where the problem is happening? Do the debuggers have some
features that I haven't heard of? TIA.

I myself have always wondered how they could stand the smell.

It's GOT to itch like a mother too.
 
J

Joe Snodgrass

Someone else who has used OCCAM in anger?  Wow I thought I was the only
one left!

Ok, so tell me if this is how it works.

You got a concurrent programming project, let's say in C++, and you're
climbing the walls, because the results are unpredictable, the
debugger shows nothing and the code looks fine.

You say to yourself "It's gotta be a race condition, right? I mean
I've tried everything else, and the behavior is consistent with that."

The solution is to push C++ onto the back burner, whip out your credit
card and buy a copy of Occam (or Linda, or whatever the kids are using
these days.)

Your job now becomes to interface Occam into your existing C++ code
and use the Occam features to find and fix the race condition. Then
you go back to C++ and work on whatever the boss says he wants done
next.

But the key is to give up all hope of ever fixing the race condition
WITHOUT a language like Occam, because if you don't buy Occam, you're
gonna be twisting in the wind until you're so far behind schedule that
the boss fires you.

Am I correct that my description is a highly reliable portrayal of how
these situations develop?
 
L

Lew

Joe said:
Ok, so tell me if this is how it works.

I'm going to respond to your /reductio ad absurdum/ as though you were
presenting a serious argument.
You got a concurrent programming project, let's say in C++, and you're
climbing the walls, because the results are unpredictable, the
debugger shows nothing and the code looks fine.

The argument breaks down right here.

The code *looks* fine is a function of how one looks, not of the code. Buying
OCCAM or MAGICBOOJUM won't fix diddly, because the failure is in the Mark 1
Eyeball and associated neural control and analysis circuits, not the software
tools.

"The debugger shows nothing" is useless because debuggers don't diagnose
concurrency issues until they happen, and then only if you catch them in the
right place. But hey, you throw enough shit at the wall and some of it will
stick, right?

"The results are unpredictable" only because one hasn't gathered enough
information to make a prediction. The problem isn't in the results, once
again. The failure is in the one gathering the information.

So given the failure in the premises, any correctness in the conclusion will
be pure coincidence.
You say to yourself "It's gotta be a race condition, right? I mean
I've tried everything else, and the behavior is consistent with that."

Right, because incomplete information gathering combined with lame and
ridiculously insufficient and misguided analysis should always be followed
immediately by an baseless leap to an unfounded conclusion, followed by a rush
to a randomly-chosen and utterly misunderstood ameliorative strategy.
The solution is to push C++ onto the back burner, whip out your credit
card and buy a copy of Occam (or Linda, or whatever the kids are using
these days.)

Your job now becomes to interface Occam into your existing C++ code
and use the Occam features to find and fix the race condition. Then
you go back to C++ and work on whatever the boss says he wants done
next.

But the key is to give up all hope of ever fixing the race condition
WITHOUT a language like Occam, because if you don't buy Occam, you're
gonna be twisting in the wind until you're so far behind schedule that
the boss fires you.

Am I correct that my description is a highly reliable portrayal of how
these situations develop?

No. At least, there's no evidence here that you are.
 
J

Joe Snodgrass

I'm going to respond to your /reductio ad absurdum/ as though you were
presenting a serious argument.


The argument breaks down right here.

The code *looks* fine is a function of how one looks, not of the code.  Buying
OCCAM or MAGICBOOJUM won't fix diddly, because the failure is in the Mark1
Eyeball and associated neural control and analysis circuits, not the software
tools.

"The debugger shows nothing" is useless because debuggers don't diagnose
concurrency issues until they happen, and then only if you catch them in the
right place.  But hey, you throw enough shit at the wall and some of itwill
stick, right?

"The results are unpredictable" only because one hasn't gathered enough
information to make a prediction.  The problem isn't in the results, once
again.  The failure is in the one gathering the information.

So given the failure in the premises, any correctness in the conclusion will
be pure coincidence.


Right, because incomplete information gathering combined with lame and
ridiculously insufficient and misguided analysis should always be followed
immediately by an baseless leap to an unfounded conclusion, followed by arush
to a randomly-chosen and utterly misunderstood ameliorative strategy.







No.  At least, there's no evidence here that you are.

OK, so two of you have just contradicted me sarcastically, which means
my post must have been wrong. In other words, there ARE ways to find
and fix race conditions with without using a language like Occam.
What are they, and where can I read up on them?
 
L

Lew

Joe said:
OK, so two of you have just contradicted me sarcastically, which means
my post must have been wrong. In other words, there ARE ways to find
and fix race conditions with without using a language like Occam.
What are they, and where can I read up on them?

Sarcasm for sarcasm - if you set it up, don't be too astonished to receive it
back.

But the points you raised, as ours, were valid and useful if one is trained to
educe the point from Socratic discourse. I interpreted your post as a rather
brilliant diatribe against the same sort of rush to judgment I attacked in my
response, thus I believe us to be allies in championing an intelligent
approach. That is why I introduced my post, "I'm going to respond to your
/reductio ad absurdum/ as though you were presenting a serious argument." I
feel quite sure you were pointing out exactly the sort of flawed reasoning I
attacked in detail. Thus, you are providing the script for the young student,
naively thinking Occam will save the day, and I the for the crotchety teacher
attempting to impose a more engineering-oriented outlook.

That said, the universe of discourse admits of a lot of room besides "Occam is
our savior!" and "you can fix everything without Occam". I don't agree to
frame the discussion in the terms you propose because the world is not limited
to just the options you suggest. You asserted a conclusion that "there ARE
[sic] ways" based on the sole premise (and fact) that Occam is not a magic
bullet, without showing evidence or a chain of reasoning to link those
statements.

In fact, there ARE ways to find and fix race conditions, some with and some
without Occam; they comprise a whole mess of practices and approaches and
mindsets that are and likely always will be active topics of research and
discussion among partisans.

I shall forebear suggesting

http://lmgtfy.com/?q=Java+race+condition+detect+and+cure,

choosing instead to follow that link myself and come up with dozens of useful
leads that I read with pleasure.

I shall not forebear suggesting:

http://java.sun.com/docs/books/effective/
http://www.javaconcurrencyinpractice.com/

nor to scan through many of:
http://www.ibm.com/search/csass/search/?q=Goetz&Search=Search

Also, a scan upthread in this very discussion yields several answers to that
very question. I commend to your attention in particular Dr. Patricia
Shanahan's responses, which in a few short sentences lay the groundwork for
everything you need to know. In one post, she recommends preventing the bugs,
which is actually attainable by disciplined practice and thoughtful design.
In the other, she limns the essentials of troubleshooting such matters.

Your question is excellent, and seminal to the art of programming. If I may
rephrase:

How can one find, fix [and I'll add, prevent] race conditions, deadlocks, and
other such parallel-execution foobars?

Indeed, Grasshopper. Indeed.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,763
Messages
2,569,562
Members
45,039
Latest member
CasimiraVa

Latest Threads

Top