Get performance statistics?


Patricia Shanahan

I would like to collect, inside a Java application, statistics such as
the amount of CPU time used. Any idea how?

I can, of course, measure the elapsed time, but that does not tell me
how much time was spent actually computing vs. waiting for disk.

Patricia
 

Daniel Pitts

Patricia said:
I would like to collect, inside a Java application, statistics such as
the amount of CPU time used. Any idea how?

I can, of course, measure the elapsed time, but that does not tell me
how much time was spent actually computing vs. waiting for disk.

Patricia
A quick googling leads me to believe you might need to use JNI (and
therefore have a platform-specific solution)
<http://www.google.com/search?q=java+system+monitoring>

Example for CPU on Win32
<http://www.javaworld.com/javaworld/javaqa/2002-11/01-qa-1108-cpu.html>

Hope this helps.

- Daniel.
 

Robert Klemme

Daniel said:
A quick googling leads me to believe you might need to use JNI (and
therefore have a platform-specific solution)
<http://www.google.com/search?q=java+system+monitoring>

Example for CPU on Win32
<http://www.javaworld.com/javaworld/javaqa/2002-11/01-qa-1108-cpu.html>

JVMTI might also help:

http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/index.html
http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html#timers

A low-level solution would be to create a TimedInputStream and
TimedOutputStream which measure and sum up the time spent in write() and
read(). You could then subtract that from the wall-clock time for this thread.
If you want to get fancier, those streams could register themselves
with some thread-global counter so you automatically get all I/O timings,
provided you make sure every stream is replaced (not easy, though, with 3rd
party libs like JDBC drivers). It depends on what you actually want to
measure and to what level of detail.
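
A minimal sketch of what such a TimedInputStream might look like
(illustrative only: the class name and the global counter are made up,
and a real version would also wrap skip() and the other read() variants):

    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.concurrent.atomic.AtomicLong;

    // Illustrative sketch: wraps another stream and accumulates the
    // nanoseconds spent blocked in read() across all instances.
    class TimedInputStream extends FilterInputStream {
        private static final AtomicLong totalReadNanos = new AtomicLong();

        TimedInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            long start = System.nanoTime();
            try {
                return super.read();
            } finally {
                totalReadNanos.addAndGet(System.nanoTime() - start);
            }
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            long start = System.nanoTime();
            try {
                return super.read(b, off, len);
            } finally {
                totalReadNanos.addAndGet(System.nanoTime() - start);
            }
        }

        /** Total time spent in read() calls, in nanoseconds. */
        static long getTotalReadNanos() {
            return totalReadNanos.get();
        }
    }

A TimedOutputStream over FilterOutputStream would follow the same
pattern for write().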

Kind regards

robert
 

Patricia Shanahan

Robert said:
JVMTI might also help:

http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/index.html
http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html#timers

A low-level solution would be to create a TimedInputStream and
TimedOutputStream which measure and sum up the time spent in write() and
read(). You could then subtract that from the wall-clock time for this thread.
If you want to get fancier, those streams could register themselves
with some thread-global counter so you automatically get all I/O timings,
provided you make sure every stream is replaced (not easy, though, with 3rd
party libs like JDBC drivers). It depends on what you actually want to
measure and to what level of detail.

JVMTI looks interesting. The disk accesses that I'm worried about are
due to paging, not explicit requests, but JVMTI does have CPU time
collection.

Patricia
 

Patricia Shanahan

Daniel said:
A quick googling leads me to believe you might need to use JNI (and
therefore have a platform-specific solution)
<http://www.google.com/search?q=java+system+monitoring>

Example for CPU on Win32
<http://www.javaworld.com/javaworld/javaqa/2002-11/01-qa-1108-cpu.html>

Yup, this mirrors the results of my searching.

Can anyone recommend a good tutorial on making JNI code portable? I need
to make this work on MS-Windows XP, Linux with a 32-bit JVM, and Linux
with a 64-bit JVM.

Patricia
 

Daniel Pitts

Patricia said:
Yup, this mirrors the results of my searching.

Can anyone recommend a good tutorial on making JNI code portable? I need
to make this work on MS-Windows XP, Linux with a 32-bit JVM, and Linux
with a 64-bit JVM.

Patricia

My guess is that what you want specifically is only obtainable through
platform-specific means. You'll have to write the platform specifics
for finding the current disk load, but the rest can be portable.

I also think that the code could be the same for both of the Linux
versions (it would just need to be compiled for both), so you'll
probably just need two "versions". You might be able to use #ifdef
directives to have one source that compiles differently on
Windows/Linux. If you don't have much experience in C/C++, I would
suggest reading a quick primer.
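
For the portable Java side, a hedged sketch of picking the right native
library at load time; the library base names and the native method are
hypothetical, and "sun.arch.data.model" is a Sun-JVM-specific property:

    // Hypothetical loader: chooses a native library name from os.name and the
    // JVM data model, so the same Java code runs on Windows and 32/64-bit Linux.
    public final class NativeStats {
        static {
            String os = System.getProperty("os.name").toLowerCase();
            // "32" or "64" on Sun JVMs; defaulting to "32" if the property is absent.
            String bits = System.getProperty("sun.arch.data.model", "32");
            // These base names are illustrative; they must match the libraries you build.
            String lib = os.contains("windows") ? "perfstats_win32"
                                                : "perfstats_linux" + bits;
            try {
                System.loadLibrary(lib);
            } catch (UnsatisfiedLinkError e) {
                throw new ExceptionInInitializerError(
                        "Could not load native library " + lib + ": " + e.getMessage());
            }
        }

        /** Hypothetical native method, implemented once per platform. */
        public static native long getProcessCpuTimeNanos();

        private NativeStats() {
        }
    }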

Good luck,

Daniel.
 

Robert Klemme

Patricia said:
JVMTI looks interesting. The disk accesses that I'm worried about are
due to paging, not explicit requests, but JVMTI does have CPU time
collection.

If it is just for a one-time debugging exercise (i.e. not necessarily
part of a product; I am not 100% sure from what you wrote), you could use
OS-specific tools. On Windows that should be fairly easy with PerfMon,
and on Linux you can use iostat, vmstat and relatives.

Regards

robert
 

Patricia Shanahan

Robert said:
If it is just for a one-time debugging exercise (i.e. not necessarily
part of a product; I am not 100% sure from what you wrote), you could use
OS-specific tools. On Windows that should be fairly easy with PerfMon,
and on Linux you can use iostat, vmstat and relatives.

The lack of clarity about debug vs product is inherent in the nature of
the application. It is part of a CS research project. Understanding the
behavior of the program is part of the product.

Yes, there are basically two alternatives. Ideally, I would like the
data to appear in the output file, so that it is packaged with the rest
of the information about the run. However, I may go outside.

Either way, I'm afraid it is going to be less convenient than my current
lifestyle - one makefile to control the runs, one Jar file to contain my
program, and it all works on my home system, works on my university
desktop, and runs dozens of jobs in parallel on a large grid computer.

Patricia
 

Robert Klemme

Patricia said:
The lack of clarity about debug vs product is inherent in the nature of
the application. It is part of a CS research project. Understanding the
behavior of the program is part of the product.

That sounds interesting! Are you allowed to disclose more detail?

Yes, there are basically two alternatives. Ideally, I would like the
data to appear in the output file, so that it is packaged with the rest
of the information about the run. However, I may go outside.

Ah, I see.

Either way, I'm afraid it is going to be less convenient than my current
lifestyle - one makefile to control the runs, one Jar file to contain my
program, and it all works on my home system, works on my university
desktop, and runs dozens of jobs in parallel on a large grid computer.

Oh, you are using "make"? I am so glad that I have not had to touch
"make" for years now. "ant" is a really great alternative in the
Java world.

I believe there is a generic way to launch a shared lib via the java
command line. As long as you create that for all platforms involved and
make sure it's installed, you could still get away with a single makefile.

I have never looked closely at Java Web Start, but I figure it might
contain features to also install binary extensions - might be worth a look.

Good luck!

Kind regards

robert
 

Patricia Shanahan

Robert said:
That sounds interesting! Are you allowed to disclose more detail?

I'm doing research in ubiquitous computing. My adviser is Bill Griswold
- you can see some of the sort of work by looking at his home page,
http://www.cs.ucsd.edu/~wgg/

My particular line is applying machine learning to ubiquitous computing.
Machine learning algorithms can easily get into time and/or space
trouble.

My immediate problem is whether runs that take 24 hours do so
because they are thrashing, or because of sheer CPU time. It is harder
than it sounds, because the largest jobs only run on a grid computer
where I have limited access to the compute elements. However, the
performance data is something I should collect so that I can put some
statistics in papers.

Each job reads a small XML parameter file describing a simulation, sets
up and runs it, and outputs a very slightly larger XML file containing
the results. Because of the use of XML, I can add e.g. performance data
to the output file without disturbing my output analysis programs.

Ah, I see.


Oh, you are using "make"? I am so glad that I have not had to touch
"make" for years now. "ant" is a really great alternative in the
Java world.

I believe there is a generic way to launch a shared lib via the java
command line. As long as you create that for all platforms involved and
make sure it's installed, you could still get away with a single makefile.

I have never looked closely at Java Web Start, but I figure it might
contain features to also install binary extensions - might be worth a look.

I have no ability to install software on the grid I use for bulk
runs. It is shared and has its own administration team. I install Cygwin
on the Windows machines I do control (my home desktop, desktop at UCSD,
laptop, and tablet). The grid has a grid-aware make, qmake, installed.

I have an EXTREMELY simple makefile, so the usual issues don't arise. It
is just a portable way of managing runs. Depending on which command and
command line parameters I use, it can do one job at a time, or 50.

Patricia
 

Chris Uppal

Patricia said:
Either way, I'm afraid it is going to be less convenient than my current
lifestyle - one makefile to control the runs, one Jar file to contain my
program, and it all works on my home system, works on my university
desktop, and runs dozens of jobs in parallel on a large grid computer.

Then it might be easier to use the Java-native JMX interfaces to the same (I
assume) features as JVMTI. See java.lang.management.ThreadMXBean.

I have never used it myself, so I don't know what lurking problems there may
be, but I'd guess it's worth spending a little time on it in the hope of
avoiding JVMTI or (worse) OS-specific JNI code.

But -- just a thought -- I'd have expected the target grid computer(s) to have
monitoring built-in for all this kind of thing. If that already exists, then
you can presumably activate and collect it as part of your "job submission" (or
whatever the modern equivalent is); so there'd be no need to replicate it in
your other environments. The parsing code would run on, say, your home machine
just the same as on the grid machine, but it'd be reading dummy data for
testing.

-- chris
 

Patricia Shanahan

Chris said:
Then it might be easier to use the Java-native JMX interfaces to the same (I
assume) features as JVMTI. See java.lang.management.ThreadMXBean.

I have never used it myself, so I don't know what lurking problems there may
be, but I'd guess it's worth spending a little time on it in the hope of
avoiding JVMTI or (worse) OS-specific JNI code.

Thanks! That looks like just the sort of thing I was looking for. I'll
have to do tests to see whether the features I need are supported, but
since both MS-Windows and Linux have the OS features to support it, I
would expect it to work.

If that works out, I can get my ideal: a performance record in the XML
output from the program.

But -- just a thought -- I'd have expected the target grid computer(s) to have
monitoring built-in for all this kind of thing. If that already exists, then
you can presumably activate and collect it as part of your "job submission" (or
whatever the modern equivalent is); so there'd be no need to replicate it in
your other environments. The parsing code would run on, say, your home machine
just the same as on the grid machine, but it'd be reading dummy data for
testing.

I do production runs on every machine I use. If I just need a couple of
results from smallish jobs, it isn't worthwhile creating a session on
the grid's control node and transferring files around. On the other
hand, when I need 200 runs or a job that hits out-of-memory if -Xmx
specifies less than 1500m, the grid is the place to be.

Patricia
 

Patricia Shanahan

Chris said:
Then it might be easier to use the Java-native JMX interfaces to the same (I
assume) features as JVMTI. See java.lang.management.ThreadMXBean.

I have never used it myself, so I don't know what lurking problems there may
be, but I'd guess it's worth spending a little time on it in the hope of
avoiding JVMTI or (worse) OS-specific JNI code.

Yes, that's the answer. I've tested the following sample program on a
couple of my MS-Windows systems and on the grid:

package performance_stats;

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CPUTime {
    public static void main(String[] args) {
        System.out.println(getThreadCPUTime());
    }

    /** Get the CPU time used so far in this thread.
     *
     * @return CPU time in seconds
     * @throws UnsupportedOperationException CPU time either not supported
     *         or not enabled.
     */
    private static double getThreadCPUTime() throws
            UnsupportedOperationException {
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        long rawTime = threadBean.getCurrentThreadCpuTime();
        if (rawTime == -1) {
            throw new UnsupportedOperationException(
                    "Thread CPU time capture not enabled");
        }
        return rawTime / 1e9;
    }
}

It's pure Java, so I don't need to compile anything specially for the
Linux boxes, and it runs in my program, so I can put my stats in the
output file.
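
If getCurrentThreadCpuTime() ever does return -1 on some JVM, ThreadMXBean
also offers isThreadCpuTimeSupported(), isThreadCpuTimeEnabled() and
setThreadCpuTimeEnabled(), so the measurement could be switched on
explicitly at startup. A small hedged sketch (the helper class name is
just illustrative):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    // Hedged sketch: enable per-thread CPU timing where the JVM supports it
    // but does not have it switched on by default.
    class ThreadCpuTimeSetup {
        static void enableIfSupported() {
            ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
            if (threadBean.isThreadCpuTimeSupported()
                    && !threadBean.isThreadCpuTimeEnabled()) {
                threadBean.setThreadCpuTimeEnabled(true);
            }
        }
    }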

Thanks,

Patricia
 

Daniel Pitts

Patricia said:
I'm doing research in ubiquitous computing. My adviser is Bill Griswold
- you can see some of the sort of work by looking at his home page,
http://www.cs.ucsd.edu/~wgg/

My particular line is applying machine learning to ubiquitous computing.
Machine learning algorithms can easily get into time and/or space
trouble.

My immediate problem is whether runs that take 24 hours do so
because they are thrashing, or because of sheer CPU time. It is harder
than it sounds, because the largest jobs only run on a grid computer
where I have limited access to the compute elements. However, the
performance data is something I should collect so that I can put some
statistics in papers.

Each job reads a small XML parameter file describing a simulation, sets
up and runs it, and outputs a very slightly larger XML file containing
the results. Because of the use of XML, I can add e.g. performance data
to the output file without disturbing my output analysis programs.

It might be possible to at least get garbage collection stats (as well
as other JVM-specific stats).
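
A hedged sketch of pulling those numbers from the standard
java.lang.management beans (standard API since Java 5, though the
collector names reported vary by JVM):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    // Prints cumulative GC counts/times and current heap usage.
    public class GcStats {
        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc
                    : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName()
                        + ": collections=" + gc.getCollectionCount()
                        + ", time=" + gc.getCollectionTime() + " ms");
            }
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            System.out.println("Heap used: "
                    + mem.getHeapMemoryUsage().getUsed() + " bytes");
        }
    }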

If you think it is thrashing, try to execute this in a controlled
environment where you can monitor such activity with specific tools. If
you are using a lot of memory, you might also try increasing the
starting memory pool and see if that helps.
 

Andrew Thompson

Patricia Shanahan wrote:
....
package performance_stats;

As an aside. That package name threw me. I cannot recall
ever seeing a package name with an '_' character!

Does..
// my immediate pref., lacking further info.
package stats.performance;
...or..
package performance.stats;
...not 'work for you'?

Andrew T.
 

Robert Klemme

Patricia said:
Yes, that's the answer. I've tested the following sample program on a
couple of my MS-Windows systems and on the grid:
It's pure Java, so I don't need to compile anything specially for the
Linux boxes, and it runs in my program, so I can put my stats in the
output file.

Great! I'll make a mental note of this. Could be that I have use for
this soon, too.

Kind regards

robert
 

Patricia Shanahan

Andrew said:
Patricia Shanahan wrote:
...

As an aside. That package name threw me. I cannot recall
ever seeing a package name with an '_' character!

Does..
// my immediate pref., lacking further info.
package stats.performance;
..or..
package performance.stats;
..not 'work for you'?

Andrew T.

The syntax for package names is identifier syntax. Why not use
underscores? It seems to me to be a very natural way to separate words
that do not represent hierarchy levels.

I hate identifiers that just munch words together, like "datatransfer"

If I planned to have a bunch of stats packages, I would use
"stats.performance". If I planned to have several performance packages,
I would go with "performance.stats".

Patricia
 

Ed

Patricia Shanahan wrote:
The syntax for package names is identifier syntax. Why not use
underscores? It seems to me to be a very natural way to separate words
that do not represent hierarchy levels.

I hate identifiers that just munch words together, like "datatransfer"

If I planned to have a bunch of stats packages, I would use
"stats.performance". If I planned to have several performance packages,
I would go with "performance.stats".

Patricia

Yeah, why not use underscores? Can't think of a reason.

Though it seems that your "performance" is qualifying your "stats", as
though you might also have some other type of stats, in which case, as
you said, you'd use "stats.performance". So if your "performance" is not
a qualifier, why not just call your package "stats"?

..ed
 

RedGrittyBrick

Patricia said:
If I planned to have a bunch of stats packages, I would use
"stats.performance". If I planned to have several performance packages,
I would go with "performance.stats".

Couldn't you end up with ambiguous/redundant packages?
stats.foo
stats.bar
stats.performance // ***
performance.pling
performance.splat
performance.stats // ***
 

Patricia Shanahan

RedGrittyBrick said:
Couldn't you end up with ambiguous/redundant packages?
stats.foo
stats.bar
stats.performance // ***
performance.pling
performance.splat
performance.stats // ***

If I planned to have both multiple stats packages and multiple
performance packages, I would have to choose, possibly arbitrarily,
which to consider the higher level.

That is always a problem with anything that forces hierarchical
organization on data that is really a lattice.

Patricia
 
