JNI to optimize a Java application using native mathematical libraries


dimitri.ognibene

Hi,
I've built a Java application (a neural network simulation with a lot of
visualization), and now my computational needs are exceeding the power of
my system.
I use the observer pattern to listen for changes on the data model, so the
visualization only redraws when needed and isn't synchronized with the
numerical computation. However, I do read matrix data inside my paint
methods.

Can I use JNI and an optimized native library to implement the numerical
computation?
If I do, can my visualization code be preserved and reused?
Will my system perform better?

I think much of the answer depends on the granularity of the JNI interface
and on the data exchanged between the two sides, but I don't have the
experience to give you all the relevant details without a little
preliminary help.

This description is only meant to outline the problem in general; I'll
give whatever details you tell me are needed.

Thanks
 

Dimitri Ognibene

Thanks Gordon,
I plan to use the Intel Math Kernel Library to re-implement the neural
network code. I will need to extract all the data (connection weights and
activation values) every time the GUI is refreshed.
So you say:
1) make as few, and as "large", calls to native methods as possible;
2) try to reduce the amount of data you have to copy back and forth.
I expect to have only calls to train and evaluate, but I want the native
library to preserve its results, because they will be used over and over
again in the calculation, while Java only needs to send new data and read
back data to display the results.
You also say to pass only primitives (or arrays of primitives) to native
methods, to reduce their dependency on invoking Java methods or JNI
accessors to get the job done. I expect to pass only multi-dimensional
double arrays, so I have no problem there.
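Roughly, the shape of the native interface I have in mind is something
like the following sketch (the class and method names are just
placeholders, not a real MKL wrapper): keep the network state behind an
opaque handle on the native side, and let Java push inputs and pull a
snapshot only when the GUI needs one.

public class NativeNet {
    static { System.loadLibrary("nativenet"); }   // placeholder library name

    // Creates the native network and returns an opaque handle to its state.
    native long create(int inputs, int hidden, int outputs);

    // One "large" call: trains on a whole batch kept in flat double arrays.
    native void train(long handle, double[] inputs, double[] targets, int samples);

    // Evaluates a batch; results are written into the preallocated 'outputs'.
    native void evaluate(long handle, double[] inputs, double[] outputs, int samples);

    // Copies the weights and activations out only when the GUI refreshes.
    native void snapshot(long handle, double[] weights, double[] activations);

    // Releases the native resources.
    native void destroy(long handle);
}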



I'm still in doubt whether, in this situation, I'd gain more (and more
easily) by rewriting everything in C++, by using something like IBM's
Ninja classes for the calculations, or by finding an existing interface
(JNI or NIO) to optimized mathematical libraries.
Any further suggestions?
Thanks, Dimitri
 

Gordon Beaton

Will my system perform better?

I think much of the answer depends on the granularity of the JNI
interface and on the data exchanged between the two sides...

You have the right idea. JNI is not a guarantee of better performance,
but you can increase its chances of improving your performance by
following a few simple rules:

- make as few, and as "large", calls to native methods as possible.

- try to reduce the amount of data you have to copy back and forth.

- pass only primitives (or arrays of primitives) to your native
methods, in order to reduce their dependency on invoking Java methods
or JNI accessors to get the job done.

- if you need to return results from a method, use the return value
as it was intended, i.e. avoid writing void methods that pass
results back through object reference arguments or by updating
fields in the calling object.

Note that CPU-bound calculations aren't necessarily much faster in
native code than in Java, so the potential gain is easily lost if you
aren't careful.
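For example, a skeleton that follows those rules might look something
like this (the names are made up, just to show the shape of the
interface):

public final class NativeMath {
    static { System.loadLibrary("nativemath"); }   // illustrative library name

    // One "large" call: multiplies a (rows x cols) matrix, stored row-major
    // in a flat primitive array, by a vector of length cols, and hands the
    // result back through the return value rather than an output argument.
    public static native double[] multiply(double[] matrix, int rows,
                                           int cols, double[] vector);
}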

/gordon
 

Remon van Vliet

Dimitri Ognibene said:
I'm still in doubt whether, in this situation, I'd gain more (and more
easily) by rewriting everything in C++, by using something like IBM's
Ninja classes for the calculations, or by finding an existing interface
(JNI or NIO) to optimized mathematical libraries.
Any further suggestions?

Err, I sincerely doubt that multi-dimensional arrays, mathematics, neural
net computation and the like are considerably faster in C++ than in Java.
Have you actually profiled your code and checked where the hotspots are?
Have you tried running your code on the server VM? Your original post
suggests you over-designed parts of your code (e.g. implementing the
observer pattern). Basically, I don't think your C++ code will be
considerably faster than a direct port in Java. Claiming that C++
magically makes your code five times as fast is very 1998.
 

Dimitri Ognibene

Hi Remon,
I think so too, but I've already profiled a lot, and I'll do it again.
The Intel libraries are very fast and scale to multi-core CPUs, so I see
them as a possible solution. My application is currently a simulator
running on a desktop, but I hope to parallelize it as soon as possible,
and there will be big synchronization and performance issues then. Do you
have any suggestions for visualizing data without requiring
"over-designed" patterns like observer? My system is pretty complex, with
several asynchronous neural networks that interact, and I don't want to
fill my code with calls to redraw or similar; I dislike Observable.update
too, but it's the smallest evil I've found. Another problem is making as
few copies of the data as possible, but I need some buffering to let the
components interact, and to avoid corrupting my data by mistake (why are
there no final arrays in Java? :-( ).
These are the first things I will optimize, and I've seen that JNI will
increase the number of array copies...
If you have any suggestions please let me know.
Dimitri
 

Roedy Green

Claiming that C++ magically makes your code five times as fast
is very 1998.

Recent benchmarks posted here show the reverse: Java compiler
technology is now ahead of C++.
 

James Westby

Roedy said:
Recent benchmarks posted here show the reverse: Java compiler
technology is now ahead of C++.

Matlab has recently deprecated the tool that automatically converts .m
files to MEX files (a kind of JNI for Matlab), because they say that the
run-time optimisation performed by Matlab removes the need for it.

However, I have just finished moving some Matlab code from an .m file to
C. The code basically computes a random walk, and so involves numerical
integration of a function involving Gaussians. I moved from using
Matlab's randn (very useful to have a generator with a Gaussian pdf built
in) to using the GNU Scientific Library. This has sped up my code by a
staggering amount, and now makes it reasonable to run the experiments.

I'm not sure how the randn function is implemented in Matlab, so I'm not
sure where the optimisations are coming from, but they impress me
nonetheless. I also think that Matlab's JIT is not a Java JIT.

I'm not trying to say Java is slow, and I certainly don't believe it is.
This function is where the code previously spent 99% of its time; it was
executed approximately 2.5 million times per run and needed to generate
50 million random numbers in that time. And that is just the simple test
I use while developing the code; the real numbers will probably be
thousands of times bigger. I realise this is exactly the kind of hotspot
people usually have in mind when they say optimisation should be done
only where it matters, and I agree with the arguments. The optimisation I
did involved switching to highly developed code using rigorously studied
algorithms, far better than anything I could ever have implemented myself.

If I have time I will port a lot of this code over to Java, and if I do I
will post some measurements of how Sun's HotSpot fares on it, as I assume
it would be a perfect candidate for HotSpot's optimisations (short loops,
but enough in them to give the CPU something to do between branches,
highly predictable branching, few memory requirements, though I wouldn't
exactly call it a real-world application).
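For reference, the Java version of that hot loop would be something along
these lines (just a sketch using java.util.Random.nextGaussian() rather
than the GSL generator, with an arbitrary step count):

import java.util.Random;

// Toy version of the hot loop: a 1-D random walk driven by Gaussian
// increments, with the position accumulated so the work can't be
// optimised away.
public class WalkBench {
    public static void main(String[] args) {
        Random rng = new Random(42);
        int steps = 50000000;                 // roughly the randoms needed per run
        double position = 0.0;
        long start = System.nanoTime();
        for (int i = 0; i < steps; i++) {
            position += rng.nextGaussian();
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("final position = " + position
                + ", time = " + (elapsed / 1e6) + " ms");
    }
}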


James
 

Dimitri Ognibene

Thanks James,

I've reached exactly the same conclusions. The problem (my specific
problem) is not whether Java is faster than C++, but how to use optimized
libraries like the GNU Scientific Library (which I didn't know about until
your answer, thanks) or Intel MKL to replace my non-optimized code, which
I'm not really able to optimize any further. In my lab we used the MKL
exp function instead of the standard C implementation and obtained a
speed-up of 5 times in our neural network code, and we can't even link
statically!
Now, I'm sure those libraries are faster than any code I'll ever write,
but I'm not sure about interfacing them with my already written Java
system (130 classes). I suppose, as I said in my first post, that in my
specific application the use of an external library through JNI will make
the system much more complex (and difficult to maintain and debug) and
only a little faster.
I would be happier if I could find a good Java math library, even if not
as good as MKL, so that I don't have to write boring JNI glue,
array-copying methods and so on.
If I understand correctly, you are translating your entire Matlab
simulation to C. I don't want to do that at the moment (my coworker does,
but I'm the software engineer in the lab, so much of the effort and of
the decisions are mine). I would like to find a compromise, using a good
math library to optimize the code where, as you said, 90% of the CPU time
is spent: random, Gaussian, sin and similar functions, and perhaps matrix
multiplication. The use of Java multi-dimensional arrays isn't good;
I've tried to optimize using code like:

double[][] matrix1 = new double[50][900];
double[] vector2 = new double[900];
for (int i = 0; i < 50; i++) {
    final double[] matrix_col = matrix1[i];   // hoist the row reference
    for (int j = 0; j < 900; j++) {
        // ...
    }
}

But only a little speed-up is gained, and I don't have the time to hunt
for optimization tricks, so a library would perhaps be a better solution.
If you have any suggestions, please let me know.
 

Dimitri Ognibene

Thanks Gordon,
I already knew that page; it looks outdated, but it contains several
interesting links, like the Colt project. However, the library links that
I've found look outdated too. Has numerical computation in Java
disappeared? If you have ever used one of these libraries, or have any
other insight, please let me know.
P.S. I've found a project on SourceForge to interface Java to GSL,
http://sourceforge.net/projects/gsl-java; has anyone ever used it? It
looks outdated too :(

Good work,
Dimitri
 

Chris Uppal

I use the observer pattern to listen for changes on the data model, so
the visualization only redraws when needed and isn't synchronized with
the numerical computation. However, I do read matrix data inside my
paint methods.

I suspect that you should use /more/ copying of data, not less. Your
simulation engine will run best if it can ignore the possibility that something
else is reading the same data. So it runs at full speed on one thread (not
doing any synchronisation). At extremely long intervals by computer
standards -- roughly once a second, say -- it makes a copy of the current state
of the simulation, and saves it. The test for whether to do that is in the
outermost loop of the simulation, and so that will have negligible effect on
the overall speed. When it determines that it is time to make a copy, it does
so, and then (and only then) uses a synchronised method to save the new
description of the state.

The GUI, meanwhile (running on a different thread), updates the screen
display at regular intervals. To do that it uses a synchronised method to
get the most recent copy and refreshes from that. It keeps that copy
around so that it can repaint() itself as necessary.

Depending on how you've structured your existing code, making the copy may be
almost trivial. Note that you will have no display-related code in the
simulation engine at all (not even triggering notifications for any Observers).
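A minimal sketch of that arrangement (the class and method names are only
illustrative):

// The simulation thread publishes an occasional copy of its state; the
// GUI thread reads the latest copy whenever it repaints. Only the two
// methods below are synchronised.
public class StateSnapshot {
    private double[] latest;                 // last published copy, or null

    public synchronized void publish(double[] state) {
        latest = state.clone();              // one copy, roughly once a second
    }

    public synchronized double[] latest() {
        return latest;                       // the GUI keeps this for repaint()
    }
}

The simulation's outermost loop just checks a clock and calls publish()
when enough time has passed; everything else runs unsynchronised.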

Can I use JNI and an optimized native library to implement the numerical
computation?
If I do, can my visualization code be preserved and reused?
Will my system perform better?

A lot depends on how much work you do in each call to JNI. If you are just
doing something trivial like generating the next random number, then almost
certainly not. The cost of crossing the JNI barrier is pretty high, and will
swamp the gains from using (say) Intel's maths libraries. On the other hand,
if you have some slow operation (like matrix multiplication) where the time
taken is high, and -- more importantly -- the time required to copy any
necessary data across the JNI barrier is small in comparison[*] then using the
native libraries may help you.

([*] Copy is O(N) but if the operation is, say, O(N**2) then you can ignore the
cost of the copy.)

It may be that your code is dominated by a small number of slow operations
which can be implemented quickly in an external library. For instance it may
be that array multiplication dominates the time, and that the Intel library has
a particularly well-tuned implementation of that. If that applies then you may
see big gains by using JNI for array multiplication. If not then you'll have
to rewrite your code so that the bulk of the implementation /is/ in C/C++
if you want to take advantage of Intel's libraries -- e.g. make each step
of your simulation into a single call to JNI.

BTW, the way that Java represents 2D arrays is not efficient, and is probably
incompatible with what an external library would expect. If you represent a
logically two-dimensional array of doubles as a double[][] then each access
will require two indirections. A better scheme (albeit quite a bit more work)
is to represent it as a single double[] and use arithmetic combinations of the
row/column coordinates to find each element. The external library will expect
to find the data in this format anyway, so by using it internally you minimise
the messing around (and perhaps the copying too) when you cross the JNI
barrier. For instance from one of your later posts in this thread:
double[][] matrix1 = new double[50][900];
for (int i = 0; i < 50; i++) {
    final double[] matrix_col = matrix1[i];
    for (int j = 0; j < 900; j++) {
        // ...
    }
}


becomes:

double[] matrix = new double[50*900];
for (int i = 0; i < 50; i++)
{
    int start = i * 900;
    int end = start + 900;
    for (int j = start; j < end; j++)
    {
        double elem = matrix[j];
        // ...
    }
}

Some people have reported seeing useful speedups from that technique (not
huge, but useful), but the main reason for using it is so that highly
tuned external implementations of the array operations can work on the
data more or less directly.

-- chris
 

Dimitri Ognibene

Thank you for your general advice.
My simulation code is built from many components, and not all of them
change state at every step, so an update method is useful for me. Another
problem is that I'm afraid of modifying data by mistake inside the model
step; I had very little time, so I'm not sure about some pieces of code,
and therefore I copy my data between model components. Yes, it's my
mistake, and I will remove the superfluous copies as soon as possible.
Do you know of any pre-compiler tool that can verify write violations,
like the const keyword in C++?
Another thing I can't use is synchronization with the GUI, because it is
not important when displaying a large data set; only global data, a few
doubles, are synchronized, and they are obviously copied as arguments of
the updates.
The cost of crossing the JNI barrier is pretty high, and will
swamp the gains from using (say) Intel's maths libraries.
Do you know where I can find some resources on the performance of the
JNI barrier?
I have a 900x400 matrix, but its elements are the results of the previous
computation, so I'd like to leave a copy in the native library and only
extract a copy when I need one.
I'm starting to think that, if I would otherwise obtain less than a 2x
speed-up, it would be easier to rewrite everything in C++ with MKL and Qt.

I was thinking of unwinding the matrix operations as you suggested, but I
have some operations, like applying moving 2D filters, that are a little
complex. Now that I've seen them work in the simple, non-unwound form,
perhaps I can optimize them. Can you suggest a way to measure the
effective speed-up?

Thanks,
Dimitri
 

James Westby

Dimitri said:
I suppose, as I said in my first post, that in my specific application
the use of an external library through JNI will make the system much more
complex (and difficult to maintain and debug) and only a little faster.

That is a problem that you should avoid if possible. There is a
trade-off between the speedup you can get and the increased complexity
in maintaining the code.
I would be happier if I could find a good Java math library, even if not
as good as MKL, so that I don't have to write boring JNI glue.
If I understand correctly, you are translating your entire Matlab
simulation to C ... I would like to find a compromise, using a good math
library to optimize the code where, as you said, 90% of the CPU time is
spent: random, Gaussian, sin and similar functions, and perhaps matrix
multiplication.
I've only moved one small part to C, the bit that was taking all the time
when I profiled the code. Have you done that? I don't know of any maths
libraries in Java; I would like to know if there are any good ones.

Matrix multiplication is a slow operation, and probably a good candidate
for optimisation, either by you or by swapping the code for something
specialised (BLAS springs to mind, but a Google search only turns up a
small mention of jBLAS).

James
 

Dimitri Ognibene

Thanks James, I'll take a look at the jBLAS API; if it's developed by
Google it should be useful and up to date, I hope.
Thanks,
Dimitri
 

James Westby

Dimitri said:
Thanks James, I'll take a look at the jBLAS API; if it's developed by
Google it should be useful and up to date, I hope.

No, that was just a Google search result. But it looked like the API was
not fully developed yet, so it probably won't be very useful.

James
 

Chris Uppal

Dimitri said:
Do you know of any pre-compiler tool that can verify write violations,
like the const keyword in C++?

No. Sorry ;-)

Do you know where I can find some resources on the performance of the
JNI barrier?

Not offhand, and it varies according to what you are doing anyway. You'll have
to measure it yourself.

FWIW, I recently measured that on this 1.5 GHz WinXP box, the time taken for a
JNI call to a native method declared as:

static native int nothing(int i);

is about 30 nanoseconds on a 1.5.0 JVM. The actual implementation is:

JNIEXPORT jint JNICALL
Java_Test_nothing(JNIEnv *e, jclass c, jint i)
{
    return i;
}

so presumably the time is almost all JNI overhead. Other JNI operations have
different overheads.
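(If you want to reproduce that kind of measurement, the Java side is just
a tight loop around the call; here is a rough harness, with an arbitrary
iteration count and an illustrative library name:)

// Times the empty native call; the first loop warms up the call site so
// HotSpot has compiled it before the timed run.
public class Test {
    static { System.loadLibrary("test"); }     // library name is illustrative

    static native int nothing(int i);

    public static void main(String[] args) {
        int sink = 0;
        for (int i = 0; i < 1000000; i++) sink += nothing(i);    // warm-up
        int n = 10000000;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) sink += nothing(i);
        long elapsed = System.nanoTime() - start;
        System.out.println((elapsed / (double) n) + " ns per call (sink=" + sink + ")");
    }
}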

I have a 900x400 matrix, but its elements are the results of the previous
computation, so I'd like to leave a copy in the native library and only
extract a copy when I need one.

Given my point that you are probably not copying /enough/, I doubt if this is
the right way to go.

I was thinking of unwinding the matrix operations as you suggested, but I
have some operations, like applying moving 2D filters, that are a little
complex. Now that I've seen them work in the simple, non-unwound form,
perhaps I can optimize them. Can you suggest a way to measure the
effective speed-up?

Just try it. If the re-write is too difficult to be feasible as an experiment,
then it's probably too complex to use for production purposes.
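(For measuring the speed-up itself, simply timing many repetitions of the
old and new versions of the same operation is usually enough; a sketch,
with made-up method names standing in for your two implementations:)

public class SpeedupCheck {
    // Placeholders for the existing and the rewritten implementation.
    static void applyFilterOld(double[] m) { /* current version */ }
    static void applyFilterNew(double[] m) { /* unwound or native version */ }

    public static void main(String[] args) {
        double[] matrix = new double[900 * 400];

        long t0 = System.nanoTime();
        for (int i = 0; i < 1000; i++) applyFilterOld(matrix);
        long oldTime = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (int i = 0; i < 1000; i++) applyFilterNew(matrix);
        long newTime = System.nanoTime() - t1;

        System.out.println("speed-up: " + ((double) oldTime / newTime));
    }
}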

-- chris
 
