Dump complete java VM state as core dump (not via OS) possible?

Discussion in 'Java' started by halfdog, May 10, 2006.

  1. halfdog

    halfdog Guest

    Hi everyone,

    I've a problem debugging an application.

    Background: Sometimes my application reaches a very unlikely state,
    which at the moment results in an error message. The stack trace alone
    has little value, since this state is caused by the interaction of
    more than one thread. The state is resolved by throwing an exception,
    and the program continues normally.

    Goal: If I reach this state, I want to suspend the application, dump
    the complete state of all java threads, objects, ... (complete java
    memory core dump) to analyse it later.

    Question: Is there a possibility to generate such dumps?

    Just thinking:

    One possibility would be to send a notification from the VM to some
    external program (e.g. a UDP packet); this program attaches a standard
    OS-level debugger (e.g. gdb under Linux), dumps the core, resumes the
    app, and detaches. But I guess reconstructing the internal VM state
    from a complete dump is rather hard, isn't it?

    I've looked at jdb to see if it has some Java core dump functions, but
    it does not seem to. Are there alternative implementations, or helper
    scripts that make jdb loop over all Java object addresses and dump
    them?

    Does anyone know about the jdb remote debugging interface? Is the
    protocol public, so that I could implement such a feature myself?
    halfdog, May 10, 2006
    #1

  2. halfdog

    stixwix Guest

    halfdog wrote:
    >
    > Goal: If I reach this state, I want to suspend the application, dump
    > the complete state of all java threads, objects, ... (complete java
    > memory core dump) to analyse it later.
    >
    > Question: Is there a possibility to generate such dumps?
    >

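    If it stays in this state long enough for human intervention then you
    can do kill -3 [process number] under Linux I think. That sends
    SIGQUIT, which makes the VM print the stack traces of all threads to
    its standard output.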
    Andy
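
    For the case where nobody is around to send the signal, roughly the
    same thread dump can be produced from inside the VM at the moment the
    bad state is detected. A minimal sketch, assuming Java 5 or later; the
    class and method names are made up for illustration and are not part
    of halfdog's application:

```java
import java.util.Map;

// Hypothetical helper: formats the stack of every live thread, similar in
// spirit to what "kill -3" prints, but triggered by the application itself.
final class ThreadDumper {

    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        // Thread.getAllStackTraces() returns a snapshot for all live threads.
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In a real application this would be called from the error path.
        System.err.print(dumpAllThreads());
    }
}
```

    This only shows stacks, not object data, so it covers the "threads"
    part of the goal but not the "objects" part.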
    stixwix, May 10, 2006
    #2

  3. halfdog

    Razvan Guest

    > Goal: If I reach this state, I want to suspend the application, dump
    > the complete state of all java threads, objects, ... (complete java
    > memory core dump) to analyse it later.


    Just curious: what tools would you use to analyze that?

    I have never used very advanced debugging techniques. For me,
    "System.out.println()" and Java exceptions are more than enough.
    Whenever I had a complex issue I just analyzed the whole algorithm with
    extra attention. I may lose several hours just thinking, but in the end
    it has always paid off for me. The truth is that I am too lazy to use
    more advanced debugging techniques.



    Regards,
    Razvan

    http://www.mihaiu.name/2004/sun_java_scjp_310_035/


    Razvan, May 10, 2006
    #3
  4. halfdog

    halfdog Guest

    stixwix wrote:

    > If it stays in this state long enough for human intervention then you
    > can do kill -3 [process number] under Linux I think.
    > Andy


    There are two problems: First, the state is not reproducible; it can
    occur at any time of day or night. Secondly, the server is contacted by
    various xml-rpc clients. If they do not get a response within
    [http-client-timeout] (120sec) they fall out of sync and the connected
    clients will report errors. So halting the system for longer than 1min
    is out of the question.

    (Apart from that: with gdb --pid [processID] you can attach to any
    running process, then call "generate-core-file x.core", "quit" and you
    have a core dump to analyse without killing the process)


    Razvan wrote:
    > Just curious: what tools would you use to analyze that?


    I heard that there is a tool to analyse OS process core dumps from Java
    VMs and reconstruct some of the Java object state information. I have
    no information on how to use it or how good the data reconstruction
    would be.

    > I have never used very advanced debugging techniques. For me,
    > "System.out.println()" and Java exceptions are more than enough.
    > Whenever I had a complex issue I just analyzed the whole algorithm with
    > extra attention. I may lose several hours just thinking, but in the end
    > it has always paid off for me. The truth is that I am too lazy to use
    > more advanced debugging techniques.


    I also used these very frequently until I had some strange debugging
    problems to solve:

    1: A very rare race condition: There was no way to reproduce the bug
    on demand; it occurred at a frequency of about 1:1,000,000, and a test
    program calling the methods could make them fail only when running
    long enough.

    Problem: Attaching any loggers or System.out calls changed the timing
    of the program (possibly through thread scheduling when doing I/O),
    so that it was never possible to trigger the error. Removing the
    logging output made it reappear.

    Solution: I added many otherwise unneeded synchronized{} blocks, so
    that the error resulted in a deadlock, which was debuggable.

    Now it was a deadlock problem, which is also impossible to debug with
    System.out, but with a debugger attached it is possible to see which
    threads wait for which monitors. Afterwards you can fix the buggy code.
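
    The "which threads wait for monitors" check can also be done without
    an attached debugger, through the java.lang.management API that ships
    with Java 5 and later. A minimal sketch; the class name and the report
    format are assumptions for illustration only:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: asks the VM for threads that are deadlocked on
// object monitors (synchronized blocks) and describes who waits for whom.
final class DeadlockReporter {

    static String report() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Returns null when no monitor deadlock exists.
        long[] ids = threads.findMonitorDeadlockedThreads();
        if (ids == null) {
            return "no monitor deadlock detected";
        }
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            out.append('"').append(info.getThreadName()).append('"')
               .append(" waits for ").append(info.getLockName())
               .append(" held by \"").append(info.getLockOwnerName()).append("\"\n");
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

    A watchdog thread could call report() periodically and log the result,
    which avoids having to keep a debugger attached to the server.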
    halfdog, May 10, 2006
    #4
  5. Just some suggestions:

    1. Take the system out of the production environment. I mean replicate
    the system somewhere else. I cannot see you working calmly and
    efficiently on a system that cannot be down more than 1 minute.

    2. Insert as many "System.out" statements as possible. Sometimes they
    can help even in case of deadlocks - for example you expected a certain
    statement to be printed but it wasn't. This could be a good indication
    that a deadlock occurred.

    I agree - in case of a complex system, this is a nightmare.

    3. Create a test application that will also parse the log file of your
    application. When a certain error occurs instruct the test application
    to stop. Also make a log file for the test application itself. You need
    to know the exact steps that were done right before the error.

    Now, it all depends on how much time you can spend on this. On most
    bugs you cannot spend too much time - so what I told you might still be
    impractical.


    Just out of curiosity: how many threads are you speaking about?


    http://www.mihaiu.name/2004/sun_java_scjp_310_035/
    Razvan Mihaiu, May 10, 2006
    #5
  6. halfdog

    halfdog Guest

    Razvan Mihaiu wrote:

    > 1. Take the system out of the production environment. I mean replicate
    > the system somewhere else. I cannot see you working calmly and
    > efficiently on a system that cannot be down more than 1 minute.


    The main difficulty is: the errors are not easily replicable, but may
    cause serious trouble in the future. Since I'm a perfectionist, I want
    to fix them after a single occurrence. All errors that I could
    replicate are already fixed.

    I have a test system and test programs which I run in indefinite loops,
    but they only produce some errors (or do you have a test program for
    the case: client DNS mapping changes during transaction, or https
    certificate expires while reading data?)

    The remaining errors have no clear cause, e.g:

    "Resource already in use": The application detects that a resource is
    still in use, so it cannot open it, and the client call fails (no
    exception, ...). As far as I know, there should be no lock on the
    resource. Which other thread still holds the lock? Or is there no lock
    owner at all, and it is just an error in the locking system itself?
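
    One way to answer "which other thread still holds the lock?" at the
    moment of failure is to make the lock itself report its owner. The
    sketch below assumes the resource is guarded by a
    java.util.concurrent.locks.ReentrantLock (Java 5+), which the post
    does not say; OwnerReportingLock and describeOwner are hypothetical
    names:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock that can describe its current owner. ReentrantLock
// already tracks the owning thread; getOwner() is protected, so a
// subclass is needed to expose it.
final class OwnerReportingLock extends ReentrantLock {

    private static final long serialVersionUID = 1L;

    // Best-effort description: the owner may change while we look at it.
    String describeOwner() {
        Thread owner = getOwner();
        if (owner == null) {
            return "no owner";
        }
        StringBuilder out = new StringBuilder();
        out.append("held by \"").append(owner.getName()).append("\"\n");
        for (StackTraceElement frame : owner.getStackTrace()) {
            out.append("    at ").append(frame).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        OwnerReportingLock lock = new OwnerReportingLock();
        lock.lock();
        try {
            System.out.print(lock.describeOwner());
        } finally {
            lock.unlock();
        }
    }
}
```

    The detecting thread would call tryLock() and, when that fails, log
    describeOwner() before reporting "Resource already in use". If the
    answer is "no owner", that points towards the locking code itself.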

    > 2. Insert as many "System.out" statements as possible. Sometimes they
    > can help even in case of deadlocks - for example you expected a certain
    > statement to be printed but it wasn't. This could be a good indication
    > that a deadlock occurred.


    The difficulty is: one thread can detect that the problem has occurred,
    but has nothing to do with it (see the resource-in-use example), so
    writing log output in this thread does not make much sense. So all
    other threads have to produce the logging output (they already do: with
    level=ALL I get about 100kb/s of log info on the server when running
    high-throughput tests, and the client produces about 800kb in a 30sec
    transaction). When I have a clue about an error, I work through it,
    but enabling logging on the production site is out of the question.

    > Just out of curiosity: how many threads are you speaking about?


    With no load, 20 (timeout checkers, cache optimizers), plus 1 per
    client and service (if a client needs 3 services in parallel, which is
    normal, it will start 3 threads).
    halfdog, May 10, 2006
    #6
  7. > The main difficulty is: the errors are not easily replicable, but may
    > cause serious trouble in the future. Since I'm a perfectionist, I want
    > to fix them after a single occurrence. All errors that I could
    > replicate are already fixed.


    If you can do that, please post your technique here. I will surely
    follow this thread in the future.

    Maybe with the tools that you mention you can do something like this.
    Keep the group informed.

    > With no load 20 (timeout checkers, cache optimizers), then 1 per client
    > and service (if a client needs 3 services in parallel - which is
    > normal, it will start 3 threads).


    I have to admit, I have not developed such a complex network
    application in Java.



    Regards,
    Razvan
    Razvan Mihaiu, May 10, 2006
    #7
  8. halfdog

    halfdog Guest

    I did some more searching and found some methods in the class
    java.lang.Runtime:

    void traceInstructions(boolean on)  Enables/Disables tracing of instructions.
    void traceMethodCalls(boolean on)   Enables/Disables tracing of method calls.

    After my "bad event", I'll enable these for some seconds to capture at
    least some activity of the other (live) threads, but their data still
    stays invisible.

    It seems that there is no automated way to capture VM state data for
    debugging. If someone has one, please let me know!
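
    For readers on later JDKs: HotSpot VMs from Java 6 onwards expose a
    management bean that writes a heap dump on request, so the capture
    can be triggered by the application when the bad event is detected.
    This was not available in the JDK 1.5.0_06 used in this thread. A
    minimal sketch, assuming a HotSpot-based JDK 7 or later; the class
    name and the file naming are made up for illustration:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper: writes the Java heap to an .hprof file from inside
// the running VM, for later offline analysis.
final class HeapDumpOnEvent {

    // liveOnly = true dumps only reachable objects (smaller file).
    static File dumpHeap(File target, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // Fails with an IOException if the target file already exists.
            bean.dumpHeap(target.getAbsolutePath(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        File target = new File("bad-event-" + System.currentTimeMillis() + ".hprof");
        File written = dumpHeap(target, true);
        System.out.println("heap written to " + written + " (" + written.length() + " bytes)");
    }
}
```

    The VM pauses while the dump is written, and the pause grows with the
    heap size, so it would have to be measured against the client timeout
    before using this on a production server.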
    halfdog, May 11, 2006
    #8
  9. halfdog

    Chris Uppal Guest

    halfdog wrote:

    > void traceInstructions(boolean on) Enables/Disables tracing
    > of instructions.
    > void traceMethodCalls(boolean on) Enables/Disables tracing of
    > method calls.


    I don't think either of them does anything.


    > It seems that there is no automated way to capture vm state data for
    > debugging, if someone has one, pls let me know!


    You /might/ be able to find something here

    http://java.sun.com/j2se/1.5.0/docs/tooldocs/index.html#manage

    In particular there's a link to a trouble shooting guide:

    http://java.sun.com/j2se/1.5/pdf/jdk50_ts_guide.pdf

    Lastly, the -Xrunhprof option to the java command has some options which
    /might/ be relevant. Try java -Xrunhprof:help for a start, and then try
    Google for more explanations.

    -- chris
    Chris Uppal, May 11, 2006
    #9
  10. halfdog

    halfdog Guest

    Wow, thanks, you put me on the right track. Your link pointed to the
    J2SE tools, which I had never heard of until now, and there it was:
    ''jsadebugd''. The tool attaches to a running Java VM, or to a core
    dump of a VM, and presents the result via RMI.

    > ps aux | grep java

    xxx 21849 0.0 13.1 278344 51012 pts/5 S May10 0:07
    home/xxx/external_data/java/jdk1.5.0_06/bin/java
    -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
    -Djava.util.logging.config.file=/home/xxx/var/tomcat/devel/conf/logging.properties
    -Xdebug
    -Xrunjdwp:transport=dt_socket,address=localhost:33333,server=y,suspend=n
    -Djava.endorsed.dirs=/home/xxx/external_data/java/apache-tomcat-5.5.12/common/endorsed
    -classpath
    :/home/fiedler/external_data/java/apache-tomcat-5.5.12/bin/bootstrap.jar:/home/xxx/external_data/java/apache-tomcat-5.5.12/bin/commons-logging-api.jar
    -Dcatalina.base=/home/xxx/var/tomcat/devel
    -Dcatalina.home=/home/xxx/external_data/java/apache-tomcat-5.5.12
    -Djava.io.tmpdir=/home/xxx/var/tomcat/devel/temp
    org.apache.catalina.startup.Bootstrap start

    > gdb --pid 21849

    Attaching to process 21849
    (gdb) generate-core-file java.core
    Saved corefile java.core
    (gdb) quit

    > jsadebugd /home/xxx/external_data/java/jdk1.5.0_06/bin/java java.core DebugServer

    Attaching to core java.core from executable
    /home/xxx/external_data/java/jdk1.5.0_06/bin/java and starting RMI
    services, please wait...
    Debugger attached and RMI services started.

    ## Now open another console, print a stack trace

    > jstack DebugServer@localhost

    Attaching to remote server DebugServer@localhost, please wait...
    Debugger attached successfully.
    Client compiler detected.
    JVM version is 1.5.0_06-b05
    Thread t@ 22418: (state = BLOCKED)
    - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
    - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame)
    - xxxxxxxxxxxx.GenericCallDispatcher$CallDispatcherWorkerThread.run() @bci=12, line=388 (Interpreted frame)
    - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)

    ...... and so on


    Currently I'm looking for the RMI interface specification for
    jsadebugd, to see whether more inspection methods are available. jdb
    and this server seem to be incompatible, so this is only a small step
    so far.

    PS: Bad? luck, Windows users: it seems that this ultra-chaotic,
    scratch-your-nose-with-your-feet geek tool is only available for Linux.
    halfdog, May 11, 2006
    #10
  11. halfdog

    Chris Uppal Guest

    halfdog wrote:

    > PS: Bad? luck, Windows users: it seems that this ultra-chaotic,
    > scratch-your-nose-with-your-feet geek tool is only available for Linux.


    Thanks for the follow-up. It does sound like a rather, um, baroque way of
    getting at the data you need. But if it works, it works....

    -- chris
    Chris Uppal, May 14, 2006
    #11
  12. halfdog

    halfdog Guest

    With the help of a guy from javasoft I managed to do it all: Debug a
    dump of a java vm:

    $ cat << END > gdb.commands
    > gcore tomcat.core
    > detach
    > quit
    > END

    $ gdb --pid 26003 -x gdb.commands
    $ jsadebugd ..../jdk1.5.0_06/bin/java tomcat.core DebugServer &
    $ jdb -connect sun.jvm.hotspot.jdi.SADebugServerAttachingConnector:debugServerName=DebugServer@localhost

    You can inspect all the data (threads, stack frames, local variables,
    objects) just as if you had attached to a live VM. Only operations
    that modify or resume the VM (step, continue, set...) do not work,
    because a VM core dump is dead.

    This is a really strange way to debug a java application, I never
    thought that it could work that way.
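
    To close the loop with the original goal (take the dump automatically
    when the bad state is detected), the gdb step above could be started
    by the application itself. A rough sketch only: it assumes Java 9+
    for ProcessHandle, that gdb is on the PATH, and that the OS allows a
    child process to attach to the VM (some Linux systems restrict
    ptrace). CoreDumpTrigger and its methods are hypothetical names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs the same gdb commands as the manual procedure
// (gcore, detach, quit) against the current VM process.
final class CoreDumpTrigger {

    // Contents of the gdb command file, as in the post above.
    static List<String> gdbScript(String coreFile) {
        return Arrays.asList("gcore " + coreFile, "detach", "quit");
    }

    // Equivalent of: gdb --pid <pid> -x <script>
    static List<String> gdbCommand(long pid, Path script) {
        return Arrays.asList("gdb", "--pid", Long.toString(pid), "-x", script.toString());
    }

    // Returns gdb's exit code. The VM is stopped while gdb is attached and
    // continues after the detach.
    static int dumpCurrentProcess(String coreFile) throws IOException, InterruptedException {
        Path script = Files.createTempFile("gdb", ".commands");
        try {
            Files.write(script, gdbScript(coreFile));
            long pid = ProcessHandle.current().pid();
            Process gdb = new ProcessBuilder(gdbCommand(pid, script)).inheritIO().start();
            return gdb.waitFor();
        } finally {
            Files.deleteIfExists(script);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("gdb exited with " + dumpCurrentProcess("tomcat.core"));
    }
}
```

    The resulting core file can then be opened with jsadebugd and jdb as
    shown above. How long the VM stays stopped depends on the process
    size, so this would need testing against the 1min limit.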
    halfdog, May 16, 2006
    #12
