Java ready for number crunching?

jhc0033

The shootout shows Java -server as being head-to-head with C, C++ and
Fortran on the numerical tests (spectral norm, planets, etc.) as far
as the running time is concerned.

Let me first say that Java has some undeniable advantages for use in
number crunching:

- It has thread semantics in the language definition.
- It is memory-safe (very important for debugging and general
  correctness) [1]
- You can programmatically generate, compile and load Java code at
  runtime (specializing the code at runtime may give a huge edge in some
  applications)

However, Java also has a very mundane, but serious flaw as far as
numerical computing is concerned: no typedefs. In C++, I like to
define my floating point type as double or float and only use that
alias. Now, when I need to switch the precision in all of the program,
I only change one line.

Another issue is having to duplicate certain functions. Let's say you
are writing a linear equation solver procedure. You might need four
versions of essentially identical code for float, double,
complex<float>, complex<double>, whereas in C++, you could just have
one template.
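To make the duplication concrete, here is a minimal sketch (hypothetical names) of what the float/double pair looks like in Java; the complex variants would repeat the same body yet again on top of a complex number class:

```java
// Hypothetical example: the same kernel written once per primitive type,
// since Java generics cannot abstract over float/double.
public class DotProduct {
    static float dot(float[] x, float[] y) {
        float s = 0f;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }

    // Identical logic; only the types differ.
    static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }

    public static void main(String[] args) {
        System.out.println(dot(new float[]{1f, 2f}, new float[]{3f, 4f}));  // 11.0
        System.out.println(dot(new double[]{1, 2}, new double[]{3, 4}));    // 11.0
    }
}
```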

I'm wondering if people who use Java for number crunching have
opinions on the above.

[1] Memory safety is guaranteed even for buggy multithreaded code with
race conditions, as far as I know.
 
Mark Space

However, Java also has a very mundane, but serious flaw as far as
numerical computing is concerned: no typedefs. In C++, I like to
define my floating point type as double or float and only use that
alias. Now, when I need to switch the precision in all of the program,
I only change one line.

The first thing I would have mentioned is that Java doesn't have operator
overloading, resulting in some very unnatural-looking algorithms. But I
guess you have a good point. Treating precision as a strategy, and
plugging in a new strategy as desired, has obvious benefits.
Another issue is having to duplicate certain functions. Let's say you
are writing a linear equation solver procedure. You might need four
versions of essentially identical code for float, double,
complex<float>, complex<double>, whereas in C++, you could just have
one template.

I wonder if something could be added to Java to handle situations like
these. Not a precompiler or C++ style templates, but say a separate
language (like Simula) that handles specialized tasks. The result
should be byte codes and classes linkable to a Java program, although
perhaps not strictly Java internally. A really fancy specialized
handler could handle equations in more than just ascii text. Proper
sigma for summation, anyone?

Actually I'd be surprised if someone hasn't tried to make something like
this already.
 
none

Mark said:
The first thing I would have mentioned is that Java doesn't have operator
overloading, resulting in some very unnatural-looking algorithms. But I
guess you have a good point. Treating precision as a strategy, and
plugging in a new strategy as desired, has obvious benefits.


I wonder if something could be added to Java to handle situations like
these. Not a precompiler or C++ style templates, but say a separate
language (like Simula) that handles specialized tasks. The result
should be byte codes and classes linkable to a Java program, although
perhaps not strictly Java internally. A really fancy specialized
handler could handle equations in more than just ascii text. Proper
sigma for summation, anyone?

Actually I'd be surprised if someone hasn't tried to make something like
this already.

Java already has Templates now. Check it out, they are on version 1.6, and
I think Templates came in around 1.5 if I recall correctly
 
Mark Thornton

none said:
Java already has Templates now. Check it out, they are on version 1.6, and
I think Templates came in around 1.5 if I recall correctly

You can't use primitive type parameters with Java's generics. It may be
possible to extend Java's generics to allow this, but the interest in
language extensions seems to be elsewhere (closures, properties, etc).
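A quick illustration of the restriction (hypothetical class name; compiles on Java 5+):

```java
import java.util.ArrayList;
import java.util.List;

public class BoxedOnly {
    // List<int> would be a compile error; a type parameter must be a
    // reference type, so primitives travel through their wrapper classes.
    static int roundTrip(int v) {
        List<Integer> xs = new ArrayList<Integer>();
        xs.add(v);         // autoboxing: int -> Integer
        return xs.get(0);  // auto-unboxing: Integer -> int
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42)); // 42
    }
}
```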

Mark Thornton
 
Arne Vajhøj

none said:
Java already has Templates now. Check it out, they are on version 1.6, and
I think Templates came in around 1.5 if I recall correctly

Not quite true.

Java generics were introduced in Java 1.5.

They share some syntax with C++ templates.

But the more you work with them the more you realize that
they are fundamentally different.
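One concrete difference is type erasure: every instantiation of a Java generic shares a single runtime class, whereas each C++ template instantiation is compiled into its own specialized type. A small demonstration:

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> a = new ArrayList<String>();
        List<Integer> b = new ArrayList<Integer>();
        // After erasure, both are plain ArrayList at runtime, so no
        // per-type specialization (or optimization) is possible.
        System.out.println(a.getClass() == b.getClass()); // true
    }
}
```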

Arne
 
Arne Vajhøj

The shootout shows Java -server as being head-to-head with C, C++ and
Fortran on the numerical tests (spectral norm, planets, etc.) as far
as the running time is concerned.

Results differ for different tests.

But with -server you should expect differences within the normal
magnitude of compiler-to-compiler variation.
Let me first say that Java has some undeniable advantages for use in
number crunching:

It has thread semantics in the language definition.

Java builtin thread support is a plus.
It is memory-safe (very important for debugging and general
correctness) [1]

For many really compute-intensive calculations, the preference will be to
verify that the code is correct and then run without the checks.

It can cost some hours to track down a memory overwrite, though.
You can programmatically generate, compile and load Java code at
runtime (specializing the code at runtime may give a huge edge in some
applications)
Possible.

However, Java also has a very mundane, but serious flaw as far as
numerical computing is concerned: no typedefs. In C++, I like to
define my floating point type as double or float and only use that
alias. Now, when I need to switch the precision in all of the program,
I only change one line.

Another issue is having to duplicate certain functions. Let's say you
are writing a linear equation solver procedure. You might need four
versions of essentially identical code for float, double,
complex<float>, complex<double>, whereas in C++, you could just have
one template.

Java is driven by business computing not scientific computing.

Those are limitations.

And I do not see them resolved in the foreseeable future.

It is not unusual for a language to have both pros and cons.

Arne
 
Roedy Green

In C++, I like to
define my floating point type as double or float and only use that
alias.

Probably a search/replace script to generate the two variants would be
the way to handle it. Each version generates quite different byte
codes. You need two totally different classes you then plug into an
interface.
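A sketch of what that plug-in arrangement might look like (all names hypothetical; the two implementations stand in for the generated float and double variants):

```java
// Hypothetical interface: callers see one API, with results widened to double.
interface Scaler {
    double[] scale(double[] v, double k);
}

// Generated float variant: arithmetic done in single precision internally.
class FloatScaler implements Scaler {
    public double[] scale(double[] v, double k) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) out[i] = (float) v[i] * (float) k;
        return out;
    }
}

// Generated double variant: the same logic in double precision.
class DoubleScaler implements Scaler {
    public double[] scale(double[] v, double k) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) out[i] = v[i] * k;
        return out;
    }
}

public class ScalerDemo {
    public static void main(String[] args) {
        Scaler s = new DoubleScaler(); // swap in FloatScaler to change precision
        System.out.println(s.scale(new double[]{1.5, 2.5}, 2.0)[1]); // 5.0
    }
}
```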
 
Monty Hall

none said:
Java already has Templates now. Check it out, they are on version 1.6, and
I think Templates came in around 1.5 if I recall correctly


While I can understand why Sun implemented templates as they did, to me
the problem w/ Java templates is that the type parameters can't be
primitives. I don't like boxing and unboxing primitives as there is a
space and speed penalty (or is there? some optimization?)

I mostly use templates for reducing cast clutter and enforcing type
safety to reduce/eliminate cast exceptions. I wasn't holding my breath
for a performance boost due to some optimization that leverages templates.


Monty
 
lscharen

Probably a search/replace script to generate the two variants would be
the way to handle it.  Each version generates quite different byte
codes.  You need two totally different classes you then plug into an
interface.

Also, there is nothing preventing one from running the C pre-processor
on a Java source file....
 
Monty Hall

Lew said:
Java does not have templates.

Sorry, I meant generics.
Study the matter, then make the claim. There is generally little to no
"penalty" for boxing and unboxing.

WOW! Now that's a claim! Please let Sun, IBM, NIST, the academy, or
any authors of java numerical software know of your insight. I know a
fair number of people who would love a generic java numerics library.
Little or no penalty casting to/fro primitive numerical types? What
libraries are you using? Is it open source and where can I get it!

but not in Java.

The following is from Sun. I would think there would be many
"performance critical inner loops" in numerical code. Study the matter,
then make the claim.

You can have the last word - I suspect you will.

Monty


///////////////////////////////////
// List adapter for primitive int array
public static List<Integer> asList(final int[] a) {
    return new AbstractList<Integer>() {
        public Integer get(int i) { return a[i]; }
        // Throws NullPointerException if val == null
        public Integer set(int i, Integer val) {
            Integer oldVal = a[i];
            a[i] = val;
            return oldVal;
        }
        public int size() { return a.length; }
    };
}

The performance of the resulting list is likely to be poor, as it boxes
or unboxes on every get or set operation. It is plenty fast enough for
occasional use, but it would be folly to use it in a performance
critical inner loop.
 
Tom Anderson

WOW! Now that's a claim! Please let Sun, IBM, NIST, the academy, or
any authors of java numerical software know of your insight. I know a
fair number of people who would love a generic java numerics library.
Little or no penalty casting to/fro primitive numerical types? What
libraries are you using? Is it open source and where can I get it!

Just for information, looping over an array of ten million ints and adding
them up took me ~30 ms. Doing the same with an array of Integers and using
auto-unboxing took me ~55 ms. So, 25 ms to unbox ten million ints, or 2.5
ns to unbox one.
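For reference, the kind of harness described above looks roughly like this (a sketch, not the original code; timings vary by machine and JVM, and the JIT should be warmed up before trusting any numbers):

```java
public class UnboxBench {
    static long sumPrim(int[] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++) s += a[i];
        return s;
    }

    static long sumBoxed(Integer[] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++) s += a[i]; // unboxes every element
        return s;
    }

    public static void main(String[] args) {
        int n = 10000000;
        int[] prim = new int[n];
        Integer[] boxed = new Integer[n];
        for (int i = 0; i < n; i++) {
            prim[i] = i & 0x7F;  // small values stay in the Integer cache
            boxed[i] = prim[i];
        }
        long t0 = System.nanoTime();
        long s1 = sumPrim(prim);
        long t1 = System.nanoTime();
        long s2 = sumBoxed(boxed);
        long t2 = System.nanoTime();
        System.out.println("int[]:     " + (t1 - t0) / 1000000 + " ms, sum " + s1);
        System.out.println("Integer[]: " + (t2 - t1) / 1000000 + " ms, sum " + s2);
    }
}
```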

In relative terms, unboxing takes five-sixths as long as it takes to
increment a loop counter, compare to a loop bound, index into an array,
and add two ints. That's not so surprising, since it's probably a
comparable sequence of operations - starting with the pointer to the
Integer, you've got to displace it by a small constant to get the index of
the value field, then index into the object. I bet the times for both
sequences are dominated by memory accesses; basically, the former makes
one access and the latter two. 2.5 ns per fetch is 400 megatransfers per
second, which is in the same ballpark as memory speed, so this seems like
a plausible explanation to me.

I also tried with an ArrayList of Integers, and couldn't get below 145 ms.
If we had a 'true' ArrayList<int>, i estimate that it would take 145 - 25
= 120 ms in this test.

145 vs 120 ms doesn't seem like a colossal difference to me - it's a 17%
saving. 145 vs 55, or 120 vs 30, on the other hand, is a big difference -
it's 300-400%. The speed penalty for a List over an array is thus much
bigger than the penalty for objects over primitives. If Sun are going to
focus on optimising anything, it should probably be access to collections
rather than unboxing.

And in the meantime, our inner loops will still be around arrays of
primitives.

tom
 
Monty Hall

Tom said:
Just for information, looping over an array of ten million ints and
adding them up took me ~30 ms. Doing the same with an array of Integers
and using auto-unboxing took me ~55 ms. So, 25 ms to unbox ten million
ints, or 2.5 ns to unbox one.

In relative terms, unboxing takes five-sixths as long as it takes to
increment a loop counter, compare to a loop bound, index into an array,
and add two ints. That's not so surprising, since it's probably a
comparable sequence of operations - starting with the pointer to the
Integer, you've got to displace it by a small constant to get the index
of the value field, then index into the object. I bet the times for both
sequences are dominated by memory accesses; basically, the former makes
one access and the latter two. 2.5 ns per fetch is 400 megatransfers per
second, which is in the same ballpark as memory speed, so this seems
like a plausible explanation to me.

I also tried with an ArrayList of Integers, and couldn't get below 145
ms. If we had a 'true' ArrayList<int>, i estimate that it would take 145
- 25 = 120 ms in this test.

145 vs 120 ms doesn't seem like a colossal difference to me - it's a 17%
saving. 145 vs 55, or 120 vs 30, on the other hand, is a big difference
- it's 300-400%. The speed penalty for a List over an array is thus much
bigger than the penalty for objects over primitives. If Sun are going to
focus on optimising anything, it should probably be access to
collections rather than unboxing.

And in the meantime, our inner loops will still be around arrays of
primitives.

tom

Primitives crush Objects - especially on doubles. Floating point would
have to be used for factorizations, etc. Server option turned on/off,
and rearranging the code (object first, primitive second, and vice versa)
to warm up the JVM made no difference. Not that I've seen a lot of Java
linear algebra packages (I use MTL), but I've yet to see one that works
w/ objects. All backing stores are flat primitive arrays for dense
structures.

10,000,000 Integers
-------------------
         Primitive   Object
a += a       57        118
a *= a       67        168


10,000,000 Doubles
 
Tom Anderson

Primitives crush Objects - especially on doubles. Floating points would have
to be used for factorizations, etc.. Server option turned on/off, &

I didn't think to try -server! The results are interesting: both the
Integer array and all the list versions take 50 ms, and the int array
version takes 10 ms. So, the difference between lists and arrays is
eliminated, but the difference between primitives and wrappers is
enhanced. My understanding is that -server applies more compiler
optimisation upfront; that suggests that Sun have done a lot of work on
optimising the object-oriented mucking about (method calls and the like)
involved in a list, but haven't (yet) done anything about eliding boxing.

I would imagine that a year from now, the int vs Integer difference will
be a lot smaller.

I'm on 1.5.0_13, on an intel Mac running OS X 10.4.11, FWIW.
rearranging code object first primitive second and vise-versa - to warm up
the JVM made no difference.

I measured different implementations in separate runs, and just looped the
measurement to warm the JVM up. I didn't notice a significant change in
time due to warming-up, so this probably isn't necessary.
10,000,000 Integers

That's consistent with what i measured - you spend about 60 ms on an unbox
and rebox. For some reason, a lot more when multiplying, which is odd.
10,000,000 Doubles

Wow.

Okay, modifying my test to look at longs and doubles, here are all my
results:

Type     -server?   foo[]   Foo[]   List<Foo>   List<Foo> optimised
int         n        30      55       250            145
long        n        60      85       270            170
double      n        45      60       270            150
int         y        10      50        50             50
long        y        25      60        50             50
double      y        20      50        50             50

I don't see the amazing slowdown with doubles that you do. I see the same
pattern with the big types as with int - lists are much slower than
arrays, -server makes them as fast, and primitives are about twice as fast
as objects.

Hang on, i'll implement your test ...

Aha! With the same inner loop as you:

Type     -server?   Time
double      n        55
double      y        55
Double      n        1261, 3659, 25530 (!), 1369, 1028
Double      y        775, 1378, 1350, 612, 1069, 1071, 612, 1069, 1582

It's garbage collection, isn't it? My code was never creating new wrapper
objects, but yours does, and then puts them in an array. It creates huge
amounts of garbage. That's what slows things down. -server does seem to be
altering the way memory is managed, though, since it's not only a lot
faster than the client VM, but avoids having the occasional massive time.

I believe this is still optimisable, at least in this microbenchmark; a
bit of escape analysis would let the compiler rewrite the Double[] as a
double[]. In the more complex real-world case, though, it might not.

tom
 
Roger Lindsjö

Tom said:
Just for information, looping over an array of ten million ints and
adding them up took me ~30 ms. Doing the same with an array of Integers
and using auto-unboxing took me ~55 ms. So, 25 ms to unbox ten million
ints, or 2.5 ns to unbox one.

I made a simple test, just looping 10,000,000 times doing assignments:

1. int = int
2. int = Integer
3. Integer = Integer
4. Integer = int

The results I got were ~0.8 ns per iteration for cases 1, 2 and 3,
while for case 4 the iterations took ~1.2 ns.

However, this was for numbers between -128 and 127. Getting outside this
range made no difference to cases 1, 2 and 3, but a huge difference to
case 4. Now each iteration took ~80 ns. This is probably due to object
creation and garbage collection.
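That boundary is easy to observe directly, since autoboxing compiles to Integer.valueOf, which caches -128..127 (a small demo; the result outside the cache assumes a default HotSpot configuration):

```java
public class CacheDemo {
    public static void main(String[] args) {
        Integer a = 127, b = 127;    // both come from the valueOf cache
        Integer c = 1000, d = 1000;  // outside the cache: fresh allocations
        System.out.println(a == b);  // true  (same cached object)
        System.out.println(c == d);  // false (distinct objects on default settings)
    }
}
```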
 
Mark Thornton

Tom said:
It's garbage collection, isn't it? My code was never creating new
wrapper objects, but yours does, and then puts them in an array. It
creates huge amounts of garbage. That's what slows things down. -server
does seem to be altering the way memory is managed, though, since it's
not only a lot faster than the client VM, but avoids having the
occasional massive time.

I believe this is still optimisable, at least in this microbenchmark; a
bit of escape analysis would let the compiler rewrite the Double[] as a
double[]. In the more complex real-world case, though, it might not.

I suspect it will be rather hard to optimise Double[] to double[] in
useful cases. Nulls and equals versus == are a particular problem as
they allow the object behaviour of Double to be seen.

Mark Thornton
 
Kevin McMurtrie

Lew said:
Java does not have templates.


Study the matter, then make the claim. There is generally little to no
"penalty" for boxing and unboxing.


but not in Java.

Are you nuts? Unboxing is not too bad because conditions are right for
it to be optimized at a low level. Boxing uses Integer.valueOf(int i),
and that's only fast for values -128 to 127. The rest are very slow.

Anyways, I don't care about boxing and unboxing. My number crunching
gripe is the lack of unsigned integer math other than char. Clumsy
workarounds are OK for a few formulas here and there but not hardcore
signal processing of gigabytes of data. Get unsigned integer math in
Java and I'll dump a ton of C codecs that need them.
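For completeness, the usual masking workarounds look like this; they work, but having to sprinkle them through every inner loop is exactly the clumsiness being complained about:

```java
public class UnsignedMasking {
    // Reinterpret a signed byte as its unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Unsigned 32-bit division by widening to long first.
    static long unsignedDiv(int a, int divisor) {
        return (a & 0xFFFFFFFFL) / divisor;
    }

    public static void main(String[] args) {
        System.out.println(toUnsigned((byte) 0xF0));     // 240, not -16
        System.out.println(unsignedDiv(0xFFFFFFF0, 3));  // 1431655760
    }
}
```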
 
Tom Anderson

Tom said:
It's garbage collection, isn't it? My code was never creating new wrapper
objects, but yours does, and then puts them in an array. It creates huge
amounts of garbage. That's what slows things down. -server does seem to be
altering the way memory is managed, though, since it's not only a lot
faster than the client VM, but avoids having the occasional massive time.

I believe this is still optimisable, at least in this microbenchmark; a bit
of escape analysis would let the compiler rewrite the Double[] as a
double[]. In the more complex real-world case, though, it might not.

I suspect it will be rather hard to optimise Double[] to double[] in
useful cases. Nulls and equals versus == are a particular problem as
they allow the object behaviour of Double to be seen.

Yes, absolutely. Basically, you need to do escape analysis to find all the
places where the Doubles can be seen, so you can determine if it's safe to
do the unboxing. Escape analysis is notoriously hard in real programs.

Still, encapsulation means this might not be as bad as you might think. If
i have a private Double[] in my class, and i never pass objects from it to
methods of other classes (and possibly if the class is final), the escape
analysis is pretty trivial. It's only when i start doing things like
having globally visible arrays or passing values out to other bits of code
it goes wrong.

tom
 
Mark Thornton

Tom said:
Still, encapsulation means this might not be as bad as you might think.
If i have a private Double[] in my class, and i never pass objects from
it to methods of other classes (and possibly if the class is final), the
escape analysis is pretty trivial. It's only when i start doing things
like having globally visible arrays or passing values out to other bits
of code it goes wrong.

Quite likely in a lot of non-trivial mathematics.

Mark Thornton
 
Arne Vajhøj

Lew said:
Tom said:
Still, encapsulation means this might not be as bad as you might
think. If i [sic] have a private Double[] in my class, and i [sic]
never pass objects from it to methods of other classes (and possibly
if the class is final), the escape analysis is pretty trivial. It's
only when i [sic] start doing things like having globally visible
arrays or passing values out to other bits of code it goes wrong.

Mark said:
Quite likely in a lot of non trivial mathematics

The mathematician should have the intelligence to employ a skilled
programmer to write the software, if the mathematician wants good
software. Mathematicians have about as much business writing software as
investment analysts or warehouse managers.

I won't try to poke holes in the recent claimed proof of Fermat's Last
Theorem, and the mathematician won't try to write well-engineered software.

There is a long tradition in a lot of sciences of researchers either
doing the coding themselves or, more frequently, paying some of their
students to do it.

Physics, chemistry, medicine, economics.

And at least a huge part of it will not fit well with traditional
software development processes as done on a typical business project.

There are no requirements, because it is research and requirements
change as results come in. The logic is very domain-specific and
very complex (not the typical CRUD stuff that is a big portion
of most business apps), so deep domain knowledge is needed. A lot
of apps are literally junked as soon as they have run (so they
do not need to think about the 10-20-30 years of maintenance
that business app code has to).

Arne
 