JNI to optimize a Java application using native mathematical libraries


dimitri.ognibene

Hi,
I've built a Java application (a neural network simulation with a lot of
visualization), and now my computational needs are exceeding the power of
my system.
I use the observer pattern to listen for changes on the data model, so the
visualization only redraws when needed and isn't synchronized with the
numerical computation. However, I do read matrix data inside my paint
methods.

Can I use JNI and an optimized native library to implement the numerical
computation?
If I do, can my visualization code be preserved and reused?
Will my system perform better?

I think much of the answer depends on the granularity of the JNI interface
and on the data exchanged between the two sides, but I don't have the
experience to give you all the relevant details without a little
preliminary help.

This description is only meant to outline the problem in general; I'll
give whatever details you tell me are needed.

Thanks
 

Dimitri Ognibene

Thanks Gordon,
I plan to use the Intel Math Kernel Library to re-implement the neural
network code. I will need to extract all the data (connection weights and
activation values) every time the GUI is refreshed.
So you say:
1) make as few, and as "large", calls to native methods as possible;
2) try to reduce the amount of data you have to copy back and forth.
I expect to have only calls to train and evaluate, but I want the native
library to preserve its results, because they will be used over and over
again in the calculation, while Java only needs to send new data and read
back data to display the results.
You also say to pass only primitives (or arrays of primitives) to native
methods, to reduce their dependency on invoking Java methods or JNI
accessors to get the job done. I expect to pass only multi-dimensional
double arrays, so I have no problem there.
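Roughly, the shape of the native interface I have in mind is something
like the following sketch (the class and method names are just
placeholders, not a real MKL wrapper): keep the network state behind an
opaque handle on the native side, and let Java push inputs and pull a
snapshot only when the GUI needs one.

public class NativeNet {
    static { System.loadLibrary("nativenet"); }   // placeholder library name

    // Creates the native network and returns an opaque handle to its state.
    native long create(int inputs, int hidden, int outputs);

    // One "large" call: trains on a whole batch kept in flat double arrays.
    native void train(long handle, double[] inputs, double[] targets, int samples);

    // Evaluates a batch; results are written into the preallocated 'outputs'.
    native void evaluate(long handle, double[] inputs, double[] outputs, int samples);

    // Copies the weights and activations out only when the GUI refreshes.
    native void snapshot(long handle, double[] weights, double[] activations);

    // Releases the native resources.
    native void destroy(long handle);
}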



I'm still in doubt whether, in this situation, I'd gain more (and more
easily) by rewriting everything in C++, by using something like IBM's
Ninja classes for the calculations, or by finding an existing interface
(JNI or NIO) to optimized mathematical libraries.
Any further suggestions?
Thanks, Dimitri
 

Gordon Beaton

Will my system perform better?

I think much of the answer depends on the granularity of the JNI
interface and on the data exchanged between the two sides...

You have the right idea. JNI is not a guarantee of better performance,
but you can increase its chances of improving your performance by
following a few simple rules:

- make as few, and as "large", calls to native methods as possible.

- try to reduce the amount of data you have to copy back and forth.

- pass only primitives (or arrays of primitives) to your native
methods, in order to reduce their dependency on invoking Java methods
or JNI accessors to get the job done.

- if you need to return results from a method, use the return value
as it was intended, i.e. avoid writing void methods that pass
results back through object reference arguments or by updating
fields in the calling object.

Note that CPU-bound calculations aren't necessarily much faster in
native code than in Java, so the potential gain is easily lost if you
aren't careful.
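For example, a skeleton that follows those rules might look something
like this (the names are made up, just to show the shape of the
interface):

public final class NativeMath {
    static { System.loadLibrary("nativemath"); }   // illustrative library name

    // One "large" call: multiplies a (rows x cols) matrix, stored row-major
    // in a flat primitive array, by a vector of length cols, and hands the
    // result back through the return value rather than an output argument.
    public static native double[] multiply(double[] matrix, int rows,
                                           int cols, double[] vector);
}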

/gordon
 

Remon van Vliet

Dimitri Ognibene said:
I'm still in doubt whether, in this situation, I'd gain more (and more
easily) by rewriting everything in C++, by using something like IBM's
Ninja classes for the calculations, or by finding an existing interface
(JNI or NIO) to optimized mathematical libraries.
Any further suggestions?

Err, I sincerely doubt that multi-dimensional arrays, mathematics, neural
net computation and the like are considerably faster in C++ than in Java.
Have you actually profiled your code and checked where the hotspots are?
Have you tried running your code on the server VM? Your original post
suggests you over-designed parts of your code (e.g. implementing the
observer pattern). Basically, I don't think your C++ code will be
considerably faster than a direct port in Java. Claiming that C++
magically makes your code five times as fast is very 1998.
 

Dimitri Ognibene

Hi Remon,
I think so too, but I've already profiled a lot, and I'll do it again.
The Intel libraries are very fast and scale to multi-core CPUs, so I see
them as a possible solution. My application is currently a simulator
running on a desktop, but I hope to parallelize it as soon as possible,
and there will be big synchronization and performance issues then. Do you
have any suggestions for visualizing data without requiring
"over-designed" patterns like observer? My system is pretty complex, with
several asynchronous neural networks that interact, and I don't want to
fill my code with calls to redraw or similar; I dislike Observable.update
too, but it's the smallest evil I've found. Another problem is making as
few copies of the data as possible, but I need some buffering to let the
components interact, and to avoid corrupting my data by mistake (why are
there no final arrays in Java? :-( ).
These are the first things I will optimize, and I've seen that JNI will
increase the number of array copies...
If you have any suggestions please let me know.
Dimitri
 

Roedy Green

Claiming that C++ magically makes your code five times as fast
is very 1998.

Recent benchmarks posted here show the reverse: Java compiler
technology is now ahead of C++.
 

James Westby

Roedy said:
Recent benchmarks posted here show the reverse: Java compiler
technology is now ahead of C++.

Matlab has recently deprecated the tool that automatically converts .m
files to MEX files (a kind of JNI for Matlab), because they say that the
run-time optimisation performed by Matlab removes the need for it.

However, I have just finished moving some Matlab code from an .m file to
C. The code basically computes a random walk, and so involves numerical
integration of a function involving Gaussians. I moved from using
Matlab's randn (very useful to have a generator with a Gaussian pdf built
in) to using the GNU Scientific Library. This has sped up my code by a
staggering amount, and now makes it reasonable to run the experiments.

I'm not sure how the randn function is implemented in Matlab, so I'm not
sure where the optimisations are coming from, but they impress me
nonetheless. I also think that Matlab's JIT is not a Java JIT.

I'm not trying to say Java is slow, and I certainly don't believe it is.
This function is where the code previously spent 99% of its time; it was
executed approximately 2.5 million times per run and needed to generate
50 million random numbers in that time. And that is just the simple test
I use while developing the code; the real numbers will probably be
thousands of times bigger. I realise this is exactly the kind of hotspot
people usually have in mind when they say optimisation should be done
only where it matters, and I agree with the arguments. The optimisation I
did involved switching to highly developed code using rigorously studied
algorithms, far better than anything I could ever have implemented myself.

If I have time I will port a lot of this code over to Java, and if I do I
will post some measurements of how Sun's HotSpot fares on it, as I assume
it would be a perfect candidate for HotSpot's optimisations (short loops,
but enough in them to give the CPU something to do between branches,
highly predictable branching, few memory requirements, though I wouldn't
exactly call it a real-world application).
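For reference, the Java version of that hot loop would be something along
these lines (just a sketch using java.util.Random.nextGaussian() rather
than the GSL generator, with an arbitrary step count):

import java.util.Random;

// Toy version of the hot loop: a 1-D random walk driven by Gaussian
// increments, with the position accumulated so the work can't be
// optimised away.
public class WalkBench {
    public static void main(String[] args) {
        Random rng = new Random(42);
        int steps = 50000000;                 // roughly the randoms needed per run
        double position = 0.0;
        long start = System.nanoTime();
        for (int i = 0; i < steps; i++) {
            position += rng.nextGaussian();
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("final position = " + position
                + ", time = " + (elapsed / 1e6) + " ms");
    }
}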


James
 

Dimitri Ognibene

Thanks James,

I've reached exactly the same conclusions. The problem (my specific
problem) is not whether Java is faster than C++, but how to use optimized
libraries like the GNU Scientific Library (which I didn't know about until
your answer, thanks) or Intel MKL to replace my non-optimized code, which
I'm not really able to optimize any further. In my lab we used the MKL
exp function instead of the standard C implementation and obtained a
speed-up of 5 times in our neural network code, and we can't even link
statically!
Now, I'm sure those libraries are faster than any code I'll ever write,
but I'm not sure about interfacing them with my already written Java
system (130 classes). I suppose, as I said in my first post, that in my
specific application the use of an external library through JNI will make
the system much more complex (and difficult to maintain and debug) and
only a little faster.
I would be happier if I could find a good Java math library, even if not
as good as MKL, so that I don't have to write boring JNI glue,
array-copying methods and so on.
If I understand correctly, you are translating your entire Matlab
simulation to C. I don't want to do that at the moment (my coworker does,
but I'm the software engineer in the lab, so much of the effort and of
the decisions are mine). I would like to find a compromise, using a good
math library to optimize the code where, as you said, 90% of the CPU time
is spent: random, Gaussian, sin and similar functions, and perhaps matrix
multiplication. The use of Java multi-dimensional arrays isn't good;
I've tried to optimize using code like:

double[][] matrix1 = new double[50][900];
double[] vector2 = new double[900];
for (int i = 0; i < 50; i++) {
    final double[] matrix_col = matrix1[i];   // hoist the row reference
    for (int j = 0; j < 900; j++) {
        // ...
    }
}

But only a little speed-up is gained, and I don't have the time to hunt
for optimization tricks, so a library would perhaps be a better solution.
If you have any suggestions, please let me know.
 

Dimitri Ognibene

Thanks Gordon,
I already knew that page; it looks outdated, but it contains several
interesting links, like the Colt project. However, the library links that
I've found look outdated too. Has numerical computation in Java
disappeared? If you have ever used one of these libraries, or have any
other insight, please let me know.
P.S. I've found a project on SourceForge to interface Java to GSL,
http://sourceforge.net/projects/gsl-java; has anyone ever used it? It
looks outdated too :(

Good work,
Dimitri
 

Chris Uppal

I use the observer pattern to listen for changes on the data model, so
the visualization only redraws when needed and isn't synchronized with
the numerical computation. However, I do read matrix data inside my
paint methods.

I suspect that you should use /more/ copying of data, not less. Your
simulation engine will run best if it can ignore the possibility that something
else is reading the same data. So it runs at full speed on one thread (not
doing any synchronisation). At extremely long intervals by computer
standards -- roughly once a second, say -- it makes a copy of the current state
of the simulation, and saves it. The test for whether to do that is in the
outermost loop of the simulation, and so that will have negligible effect on
the overall speed. When it determines that it is time to make a copy, it does
so, and then (and only then) uses a synchronised method to save the new
description of the state.

The GUI, meanwhile (running on a different thread), updates the screen
display at regular intervals. To do that it uses a synchronised method to
get the most recent copy and refreshes from that. It keeps that copy
around so that it can repaint() itself as necessary.

Depending on how you've structured your existing code, making the copy may be
almost trivial. Note that you will have no display-related code in the
simulation engine at all (not even triggering notifications for any Observers).
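A minimal sketch of that arrangement (the class and method names are only
illustrative):

// The simulation thread publishes an occasional copy of its state; the
// GUI thread reads the latest copy whenever it repaints. Only the two
// methods below are synchronised.
public class StateSnapshot {
    private double[] latest;                 // last published copy, or null

    public synchronized void publish(double[] state) {
        latest = state.clone();              // one copy, roughly once a second
    }

    public synchronized double[] latest() {
        return latest;                       // the GUI keeps this for repaint()
    }
}

The simulation's outermost loop just checks a clock and calls publish()
when enough time has passed; everything else runs unsynchronised.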

Can I use JNI and an optimized native library to implement the numerical
computation?
If I do, can my visualization code be preserved and reused?
Will my system perform better?

A lot depends on how much work you do in each call to JNI. If you are just
doing something trivial like generating the next random number, then almost
certainly not. The cost of crossing the JNI barrier is pretty high, and will
swamp the gains from using (say) Intel's maths libraries. On the other hand,
if you have some slow operation (like matrix multiplication) where the time
taken is high, and -- more importantly -- the time required to copy any
necessary data across the JNI barrier is small in comparison[*] then using the
native libraries may help you.

([*] Copy is O(N) but if the operation is, say, O(N**2) then you can ignore the
cost of the copy.)

It may be that your code is dominated by a small number of slow operations
which can be implemented quickly in an external library. For instance it may
be that array multiplication dominates the time, and that the Intel library has
a particularly well-tuned implementation of that. If that applies then you may
see big gains by using JNI for array multiplication. If not then you'll have
to rewrite your code so that the bulk of the implementation /is/ in C/C++
if you want to take advantage of Intel's libraries -- e.g. make each step
of your simulation into a single call to JNI.

BTW, the way that Java represents 2D arrays is not efficient, and is probably
incompatible with what an external library would expect. If you represent a
logically two-dimensional array of doubles as a double[][] then each access
will require two indirections. A better scheme (albeit quite a bit more work)
is to represent it as a single double[] and use arithmetic combinations of the
row/column coordinates to find each element. The external library will expect
to find the data in this format anyway, so by using it internally you minimise
the messing around (and perhaps the copying too) when you cross the JNI
barrier. For instance from one of your later posts in this thread:
double[][] matrix1 = new double[50][900];
for (int i = 0; i < 50; i++) {
    final double[] matrix_col = matrix1[i];
    for (int j = 0; j < 900; j++) {
        // ...
    }
}


becomes:

double[] matrix = new double[50*900];
for (int i = 0; i < 50; i++)
{
    int start = i * 900;
    int end = start + 900;
    for (int j = start; j < end; j++)
    {
        double elem = matrix[j];
        // ...
    }
}

Some people have reported seeing useful speedups from that technique (not
huge, but useful), but the main reason for using it is so that highly
tuned external implementations of the array operations can work on the
data more or less directly.

-- chris
 

Dimitri Ognibene

Thank you for your general advice.
My simulation code is built from many components, and not all of them
change state at every step, so an update method is useful for me. Another
problem is that I'm afraid of modifying data by mistake inside the model
step; I had very little time, so I'm not sure about some pieces of code,
and therefore I copy my data between model components. Yes, it's my
mistake, and I will remove the superfluous copies as soon as possible.
Do you know of any pre-compiler tool that can verify write violations,
like the const keyword in C++?
Another thing I can't use is synchronization with the GUI, because it is
not important when displaying a large data set; only global data, a few
doubles, are synchronized, and they are obviously copied as arguments of
the updates.
The cost of crossing the JNI barrier is pretty high, and will
swamp the gains from using (say) Intel's maths libraries.
Do you know where I can find some resources on the performance of the
JNI barrier?
I have a 900x400 matrix, but its elements are the results of the previous
computation, so I'd like to leave a copy in the native library and only
extract a copy when I need one.
I'm starting to think that, if I would otherwise obtain less than a 2x
speed-up, it would be easier to rewrite everything in C++ with MKL and Qt.

I was thinking of unwinding the matrix operations as you suggested, but I
have some operations, like applying moving 2D filters, that are a little
complex. Now that I've seen them work in the simple, non-unwound form,
perhaps I can optimize them. Can you suggest a way to measure the
effective speed-up?

Thanks,
Dimitri
 

James Westby

Dimitri said:
I suppose, as I said in my first post, that in my specific application
the use of an external library through JNI will make the system much more
complex (and difficult to maintain and debug) and only a little faster.

That is a problem that you should avoid if possible. There is a
trade-off between the speedup you can get and the increased complexity
in maintaining the code.
I would be happier if I could find a good Java math library, even if not
as good as MKL, so that I don't have to write boring JNI glue.
If I understand correctly, you are translating your entire Matlab
simulation to C ... I would like to find a compromise, using a good math
library to optimize the code where, as you said, 90% of the CPU time is
spent: random, Gaussian, sin and similar functions, and perhaps matrix
multiplication.
I've only moved one small part to C, the bit that was taking all the time
when I profiled the code. Have you done that? I don't know of any maths
libraries in Java; I would like to know if there are any good ones.

Matrix multiplication is a slow operation, and probably a good candidate
for optimisation, either by you or by swapping the code for something
specialised (BLAS springs to mind, but a Google search only turns up a
small mention of jBLAS).

James
 

Dimitri Ognibene

Thanks James, I'll take a look at the jBLAS API; if it's developed by
Google it should be useful and up to date, I hope.
Thanks,
Dimitri
 

James Westby

Dimitri said:
Thanks James, I'll take a look at the jBLAS API; if it's developed by
Google it should be useful and up to date, I hope.

No, that was just a Google search result. But it looked like the API was
not fully developed yet, so it probably won't be very useful.

James
 

Chris Uppal

Dimitri said:
Do you know of any pre-compiler tool that can verify write violations,
like the const keyword in C++?

No. Sorry ;-)

Do you know where I can find some resources on the performance of the
JNI barrier?

Not offhand, and it varies according to what you are doing anyway. You'll have
to measure it yourself.

FWIW, I recently measured that on this 1.5 GHz WinXP box, the time taken for a
JNI call to a native method declared as:

static native int nothing(int i);

is about 30 nanoseconds on a 1.5.0 JVM. The actual implementation is:

JNIEXPORT jint JNICALL
Java_Test_nothing(JNIEnv *e, jclass c, jint i)
{
    return i;
}

so presumably the time is almost all JNI overhead. Other JNI operations have
different overheads.
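(If you want to reproduce that kind of measurement, the Java side is just
a tight loop around the call; here is a rough harness, with an arbitrary
iteration count and an illustrative library name:)

// Times the empty native call; the first loop warms up the call site so
// HotSpot has compiled it before the timed run.
public class Test {
    static { System.loadLibrary("test"); }     // library name is illustrative

    static native int nothing(int i);

    public static void main(String[] args) {
        int sink = 0;
        for (int i = 0; i < 1000000; i++) sink += nothing(i);    // warm-up
        int n = 10000000;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) sink += nothing(i);
        long elapsed = System.nanoTime() - start;
        System.out.println((elapsed / (double) n) + " ns per call (sink=" + sink + ")");
    }
}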

I have a 900x400 matrix, but its elements are the results of the previous
computation, so I'd like to leave a copy in the native library and only
extract a copy when I need one.

Given my point that you are probably not copying /enough/, I doubt if this is
the right way to go.

I was thinking of unwinding the matrix operations as you suggested, but I
have some operations, like applying moving 2D filters, that are a little
complex. Now that I've seen them work in the simple, non-unwound form,
perhaps I can optimize them. Can you suggest a way to measure the
effective speed-up?

Just try it. If the re-write is too difficult to be feasible as an experiment,
then it's probably too complex to use for production purposes.
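(For measuring the speed-up itself, simply timing many repetitions of the
old and new versions of the same operation is usually enough; a sketch,
with made-up method names standing in for your two implementations:)

public class SpeedupCheck {
    // Placeholders for the existing and the rewritten implementation.
    static void applyFilterOld(double[] m) { /* current version */ }
    static void applyFilterNew(double[] m) { /* unwound or native version */ }

    public static void main(String[] args) {
        double[] matrix = new double[900 * 400];

        long t0 = System.nanoTime();
        for (int i = 0; i < 1000; i++) applyFilterOld(matrix);
        long oldTime = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (int i = 0; i < 1000; i++) applyFilterNew(matrix);
        long newTime = System.nanoTime() - t1;

        System.out.println("speed-up: " + ((double) oldTime / newTime));
    }
}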

-- chris
 
