Dump complete Java VM state as core dump (not via OS) possible?


halfdog

Hi everyone,

I have a problem debugging an application.

Background: Sometimes my application gets into a very unlikely state,
which at the moment results in an error message. The stack trace alone
is of little value, since this state is caused by the interaction of
more than one thread. The state is resolved by throwing an exception,
and the program continues normally.

Goal: If I reach this state, I want to suspend the application and
dump the complete state of all Java threads, objects, ... (a complete
Java memory core dump) to analyse it later.

Question: Is there a possibility to generate such dumps?

Just thinking:

One possibility would be to send a signal from the VM to some external
program (e.g. a UDP packet); this program attaches a standard OS-level
debugger (e.g. gdb under Linux), dumps the core, resumes the app and
detaches. But I guess reconstruction of the internal VM state from a
complete dump is rather hard, isn't it?
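
For illustration, the in-VM side of that idea could look roughly like
this (DumpTrigger, the port and the payload are made up; the external
watchdog that listens for the packet and runs gdb is not shown):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Hypothetical sketch: when the bad state is detected, notify an
// external watchdog via UDP, then pause so the watchdog has time to
// attach gdb and write the core file before the app resumes.
public class DumpTrigger {
    public static void requestDump() {
        try {
            DatagramSocket socket = new DatagramSocket();
            byte[] payload = "DUMP-REQUEST".getBytes("US-ASCII");
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName("localhost"), 9999));
            socket.close();
            Thread.sleep(30000); // watchdog attaches gdb meanwhile
        } catch (Exception e) {
            // a debugging hook must never break the application itself
        }
    }
}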

I've looked at jdb to see if it has some java-core-dump functions, but
it seems it does not. Are there alternative implementations, or helper
scripts, that make jdb loop over all Java object addresses and dump
them?

Does anyone know about the jdb remote debugging interface? Is the
protocol public? Could I implement such a feature myself?
 

stixwix

halfdog said:
Goal: If I reach this state, I want to suspend the application and
dump the complete state of all Java threads, objects, ... (a complete
Java memory core dump) to analyse it later.

Question: Is there a possibility to generate such dumps?
If it stays in this state long enough for human intervention, then you
can do kill -3 [process number] under Linux, I think.
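
A related option from inside the VM (so nobody has to be around when
it happens): since Java 5, Thread.getAllStackTraces() produces a
similar thread dump programmatically. A minimal sketch (ThreadDumper
is a made-up helper name):

import java.util.Map;

// Print a stack trace of every live thread, similar in spirit to the
// dump that kill -3 (SIGQUIT) triggers on Linux.
public class ThreadDumper {
    public static void dumpAllThreads() {
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
            System.err.println("Thread " + entry.getKey().getName()
                    + " (state = " + entry.getKey().getState() + ")");
            for (StackTraceElement frame : entry.getValue()) {
                System.err.println("    at " + frame);
            }
        }
    }
}

Note that this only captures stack traces, not the object state you
are after.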
Andy
 

Razvan

halfdog said:
Goal: If I reach this state, I want to suspend the application and
dump the complete state of all Java threads, objects, ... (a complete
Java memory core dump) to analyse it later.

Just curious: what tools would you use to analyze that?

I have never used very advanced debugging techniques. For me,
"System.out.println()" and Java exceptions are more than enough.
Whenever I had a complex issue I just analyzed the whole algorithm with
extra attention. I may lose several hours just thinking, but in the end
it has always paid off for me. The truth is that I am too lazy to use
more advanced debugging techniques.



Regards,
Razvan

http://www.mihaiu.name/2004/sun_java_scjp_310_035/
 

halfdog

stixwix said:
If it stays in this state long enough for human intervention, then you
can do kill -3 [process number] under Linux, I think.
Andy

There are two problems: the state is not reproducible - it occurs at
any time of day or night. Secondly, the server is contacted by various
XML-RPC clients. If they do not get a response within
[http-client-timeout] (120 sec), they fall out of sync and the
connected clients will report errors. So halting the system for longer
than one minute is out of the question.

(Apart from that: with gdb --pid [processID] you can attach to any
running process, then call "generate-core-file x.core" and "quit", and
you have a core dump to analyse without killing the process.)

Razvan said:
Just curious: what tools would you use to analyze that?

I heard that there is a tool to analyse OS process core dumps of Java
VMs and reconstruct some of the Java object state information. I have
no information on how to use it or how good the data reconstruction
would be.
Razvan said:
I have never used very advanced debugging techniques. For me,
"System.out.println()" and Java exceptions are more than enough.
Whenever I had a complex issue I just analyzed the whole algorithm with
extra attention. I may lose several hours just thinking, but in the end
it has always paid off for me. The truth is that I am too lazy to use
more advanced debugging techniques.

I also used these very frequently until I had some strange debugging
problems to solve:

1. A very rare timing race condition: there was no way to reproduce
the bug; it occurred at a frequency of about 1:1,000,000, though a
test program calling the methods could make them fail when running
long enough:

Problem: Attaching any loggers or System.outs changed the timing of
the program (possibly through thread scheduling when doing IO ops), so
that it was never possible to trigger the error. Removing the logging
output made it reappear.

Solution: I added many otherwise unneeded synchronized{} blocks, so
that the error resulted in a deadlock, which was debuggable.

Now it was a deadlock problem, which is also impossible to debug with
System.out, but with a debugger attached it is possible to see which
threads wait for which monitors. Afterwards you can fix the buggy code.
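
A simplified variant of the same trick (FreezeOnError is a made-up
helper, not my original code): instead of letting the error path
recover, block the detecting thread forever on a private monitor, so
the process hangs in a state that jstack or an attached debugger can
inspect at leisure:

// Call freeze() at the point where the inconsistent state is detected.
public final class FreezeOnError {
    private static final Object FREEZE_LOCK = new Object();

    public static void freeze() {
        synchronized (FREEZE_LOCK) {
            try {
                FREEZE_LOCK.wait(); // never notified: blocks forever
            } catch (InterruptedException ignored) {
                // if interrupted, just let the thread continue
            }
        }
    }
}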
 

Razvan Mihaiu

Just some suggestions:

1. Take the system out of the production environment. I mean,
replicate the system somewhere else. I cannot see you working calmly
and efficiently on a system that cannot be down for more than one
minute.

2. Insert as many "System.out" statements as possible. Sometimes they
can help even in case of deadlocks - for example, you expected a
certain statement to be printed but it wasn't. This could be a good
indication that a deadlock occurred.

I agree - in case of a complex system, this is a nightmare.

3. Create a test application that also parses the log file of your
application. When a certain error occurs, instruct the test
application to stop. Also keep a log file for the test application
itself. You need to know the exact steps that were taken right before
the error.
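
A sketch of such a watcher (a simplistic "tail" in Java; the log file
path comes from the command line, and the "ERROR" marker is just a
placeholder for whatever your application prints):

import java.io.BufferedReader;
import java.io.FileReader;

// Follow the application log and stop the test run when the error
// marker shows up; everything the watcher sees goes to its own output.
public class LogWatcher {
    public static void main(String[] args) throws Exception {
        BufferedReader log = new BufferedReader(new FileReader(args[0]));
        try {
            while (true) {
                String line = log.readLine();
                if (line == null) {
                    Thread.sleep(500); // at EOF: wait for more output
                    continue;
                }
                System.out.println(line); // the watcher's own log
                if (line.indexOf("ERROR") >= 0) {
                    System.out.println("Error marker seen - stopping");
                    break;
                }
            }
        } finally {
            log.close();
        }
    }
}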

Now, it all depends on how much time you can spend on this. On most
bugs you cannot spend too much time - so what I told you might still be
impractical.


Just out of curiosity: how many threads are you speaking about?


http://www.mihaiu.name/2004/sun_java_scjp_310_035/
 

halfdog

Razvan said:
1. Take the system out of the production environment. I mean,
replicate the system somewhere else. I cannot see you working calmly
and efficiently on a system that cannot be down for more than one
minute.

The main difficulty is: the errors are not easily replicable, but may
cause severe trouble in the future. Since I'm a perfectionist, I want
to fix them after a single occurrence. All errors that I could
replicate are already fixed.

I have a test system and test programs which I run in indefinite
loops, but they only produce some of the errors (or do you have a test
program for the cases "client DNS mapping changes during a
transaction" or "HTTPS certificate expires while reading data"?).

The remaining errors have no clear cause, e.g.:

"Resource already in use": the application detects that a resource is
still in use so it cannot open it, and the client call fails (no
exception, ...). As far as I know, there should be no lock on the
resource. Which other thread still holds the lock? Or is there even no
lock owner - perhaps it is just an error in the locking system itself?
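
If the locking were built on java.util.concurrent's ReentrantLock (an
assumption - I don't know whether our locking system could be ported
to it), the error message could at least name the owner. A sketch
(DiagnosticLock is made up):

import java.util.concurrent.locks.ReentrantLock;

// ReentrantLock.getOwner() is protected, so a small subclass can
// expose the owning thread for error messages like the one above.
public class DiagnosticLock extends ReentrantLock {
    public String describe() {
        Thread owner = getOwner();
        return owner == null
                ? "no owner (perhaps a bug in the locking system itself?)"
                : "held by thread " + owner.getName();
    }
}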
Razvan said:
2. Insert as many "System.out" statements as possible. Sometimes they
can help even in case of deadlocks - for example, you expected a
certain statement to be printed but it wasn't. This could be a good
indication that a deadlock occurred.

The difficulty is: one thread can detect that the problem has
occurred, but has nothing to do with causing it (see the
resource-in-use example), so writing log output in this thread does
not make much sense. So all the other threads have to produce the
logging output (they already do; with level=ALL I get about 100 kB/s
of log info on the server when running high-throughput tests, and the
client produces about 800 kB in a 30-second transaction). When I have
a clue about an error I work through it, but it is out of the question
to enable logging on the production site.
Razvan said:
Just out of curiosity: how many threads are you speaking about?

With no load, 20 (timeout checkers, cache optimizers); then one per
client and service (if a client needs 3 services in parallel, which is
normal, it will start 3 threads).
 

Razvan Mihaiu

halfdog said:
The main difficulty is: the errors are not easily replicable, but may
cause severe trouble in the future. Since I'm a perfectionist, I want
to fix them after a single occurrence. All errors that I could
replicate are already fixed.

If you can do that, please post your technique here. I will surely
follow this thread in the future.

Maybe with the tools that you mentioned you can do something like
this. Keep the group informed.
halfdog said:
With no load, 20 (timeout checkers, cache optimizers); then one per
client and service (if a client needs 3 services in parallel, which is
normal, it will start 3 threads).

I have to admit that I have not developed such a complex network
application in Java.



Regards,
Razvan
 

halfdog

I did some more searching and found some methods in java.lang.Runtime:

void traceInstructions(boolean on) - enables/disables tracing of instructions.
void traceMethodCalls(boolean on) - enables/disables tracing of method calls.

After my "bad event", i'll enable these for some seconds to capture at
least some of the other (live) threads, but their data still stays
invisible.
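
Switching them on around the event would look like this (TraceSwitch
is my own helper, not a JDK class):

// Enable both traces for a short window, then switch them off again.
public class TraceSwitch {
    public static void traceFor(long millis) throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        rt.traceMethodCalls(true);
        rt.traceInstructions(true);
        Thread.sleep(millis); // capture a few seconds of activity
        rt.traceInstructions(false);
        rt.traceMethodCalls(false);
    }
}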

It seems that there is no automated way to capture VM state data for
debugging; if someone has one, please let me know!
 

Chris Uppal

halfdog said:
void traceInstructions(boolean on) - enables/disables tracing of instructions.
void traceMethodCalls(boolean on) - enables/disables tracing of method calls.

I don't think either of them does anything.

It seems that there is no automated way to capture VM state data for
debugging; if someone has one, please let me know!

You /might/ be able to find something here

http://java.sun.com/j2se/1.5.0/docs/tooldocs/index.html#manage

In particular, there's a link to a troubleshooting guide:

http://java.sun.com/j2se/1.5/pdf/jdk50_ts_guide.pdf

Lastly, the -Xrunhprof option to the java command has some options
which /might/ be relevant. Try java -Xrunhprof:help for a start, and
then try Google for more explanations.

-- chris
 

halfdog

Wow, thanks, you put me on the right track. Your link pointed to the
J2SE tools, which I had never heard of until now, and there it was:
jsadebugd. The tool can attach to a running Java VM, or to a core dump
of a VM, and present the result via RMI:
$ ps aux | grep java
xxx 21849 0.0 13.1 278344 51012 pts/5 S May10 0:07
/home/xxx/external_data/java/jdk1.5.0_06/bin/java
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=/home/xxx/var/tomcat/devel/conf/logging.properties
-Xdebug
-Xrunjdwp:transport=dt_socket,address=localhost:33333,server=y,suspend=n
-Djava.endorsed.dirs=/home/xxx/external_data/java/apache-tomcat-5.5.12/common/endorsed
-classpath
:/home/fiedler/external_data/java/apache-tomcat-5.5.12/bin/bootstrap.jar:/home/xxx/external_data/java/apache-tomcat-5.5.12/bin/commons-logging-api.jar
-Dcatalina.base=/home/xxx/var/tomcat/devel
-Dcatalina.home=/home/xxx/external_data/java/apache-tomcat-5.5.12
-Djava.io.tmpdir=/home/xxx/var/tomcat/devel/temp
org.apache.catalina.startup.Bootstrap start

$ gdb --pid 21849
Attaching to process 21849
(gdb) generate-core-file java.core
Saved corefile java.core
(gdb) quit

$ jsadebugd /home/xxx/external_data/java/jdk1.5.0_06/bin/java java.core DebugServer
Attaching to core java.core from executable
/home/xxx/external_data/java/jdk1.5.0_06/bin/java and starting RMI
services, please wait...
Debugger attached and RMI services started.

## Now open another console, print a stack trace
$ jstack DebugServer@localhost
Attaching to remote server DebugServer@localhost, please wait...
Debugger attached successfully.
Client compiler detected.
JVM version is 1.5.0_06-b05
Thread t@ 22418: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- java.lang.Object.wait() @bci=2, line=474 (Interpreted frame)
- xxxxxxxxxxxx.GenericCallDispatcher$CallDispatcherWorkerThread.run()
@bci=12, line=388 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)

... and so on


Currently I'm looking for the RMI interface specification of
jsadebugd, to see if there are more methods for inspection available
out there. jdb and this server seem to be incompatible at the moment,
so one small step is still missing.

PS: Bad luck, Windows users: it seems that this ultra-chaotic,
scratch-your-nose-with-your-feet geek tool is only available for Linux.
 

Chris Uppal

halfdog said:
PS: Bad luck, Windows users: it seems that this ultra-chaotic,
scratch-your-nose-with-your-feet geek tool is only available for Linux.

Thanks for the follow-up. It does sound like a rather, um, baroque way
of getting at the data you need. But if it works, it works....

-- chris
 

halfdog

With the help of a guy from JavaSoft I managed to do it all: debug a
dump of a Java VM:

$ cat > gdb.commands <<END
gcore tomcat.core
detach
quit
END
$ gdb --pid 26003 -x gdb.commands
$ jsadebugd ..../jdk1.5.0_06/bin/java tomcat.core DebugServer &
$ jdb -connect
sun.jvm.hotspot.jdi.SADebugServerAttachingConnector:debugServerName=DebugServer@localhost

You can inspect all the data (threads, stack frames, local variables,
objects) just as if you had attached to a live VM; only modifications
(step, continue, set, ...) do not work, because a VM core dump is dead.

This is a really strange way to debug a Java application; I never
thought that it could work that way.
 
