Java ready for number crunching?


Mark Thornton

Lew said:
Tom said:
Still, encapsulation means this might not be as bad as you might
think. If i [sic] have a private Double[] in my class, and i [sic]
never pass objects from it to methods of other classes (and possibly
if the class is final), the escape analysis is pretty trivial. It's
only when i [sic] start doing things like having globally visible
arrays or passing values out to other bits of code it goes wrong.

Mark said:
Quite likely in a lot of non-trivial mathematics.

The mathematician should have the intelligence to employ a skilled
programmer to write the software, if the mathematician wants good
software. Mathematicians have about as much business writing software as
investment analysts or warehouse managers.

IME those without a good understanding of the mathematics involved tend
to be hopeless at writing software to implement such algorithms. There
is a reasonable prospect of educating mathematicians, engineers and
physicists to write better software; on the other hand, the prospect of
educating software engineers to write competent mathematical software
appears to be near zero (unless they already have a good grounding in
mathematics). Mathematics seems a bit like natural languages in that
very few people successfully come to it late in life (where in the
context of mathematics 'late' probably means after 15 years of age).

One reason why arrays might be visible, in Java, is that it is the only way
you can get to use the natural operator []. Personally I don't usually
do it, but sometimes life is just too short to bother with a pile of
trivial accessors especially when the name is something vacuous like
'element'. This is especially true of throwaway code --- just written to
test an approach. Another reason for wanting the actual array is to pass
it to library methods like Graphics.drawPolyline, or the transform
methods on AffineTransform. Those with perfect foresight will of course
have provided methods for this on the object, but it is more common just
to provide a method to obtain the underlying array (as for example with
the wrapped forms of java.nio.*Buffer).
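
To make the trade-off concrete, here is a minimal sketch. The Polyline class and its method names are hypothetical, invented for illustration; only DoubleBuffer.wrap, hasArray and array are real library calls. It contrasts per-element accessors with handing out the backing array, and shows that an exposed array lets callers modify the object's state.

```java
import java.nio.DoubleBuffer;

// Hypothetical class for illustration; it does not come from the thread.
final class Polyline {
    private final int[] xs;
    private final int[] ys;

    Polyline(int[] xs, int[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        this.xs = xs.clone();
        this.ys = ys.clone();
    }

    // "Trivial accessor" style: encapsulated, but callers lose the [] operator.
    int x(int i) { return xs[i]; }
    int y(int i) { return ys[i]; }
    int size() { return xs.length; }

    // Shortcut style: return the backing arrays so they can be passed straight
    // to array-based APIs such as Graphics.drawPolyline(int[], int[], int).
    // The arrays now escape the class and can be modified by any caller.
    int[] xArray() { return xs; }
    int[] yArray() { return ys; }
}

class ArrayExposureDemo {
    public static void main(String[] args) {
        Polyline p = new Polyline(new int[] {0, 10, 20}, new int[] {0, 5, 0});
        p.xArray()[1] = 99;            // write through the exposed array
        System.out.println(p.x(1));    // the object's own state has changed

        // A wrapped java.nio buffer hands back the very array it was given.
        double[] data = {1.0, 2.0, 3.0};
        DoubleBuffer buf = DoubleBuffer.wrap(data);
        System.out.println(buf.hasArray() && buf.array() == data);
    }
}
```

Once xArray() exists, the array is no longer private in any useful sense, which is exactly the situation in which the escape analysis Tom describes stops being trivial.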
I won't try to poke holes in the recent claimed proof of Fermat's Last
Theorem, and the mathematician won't try to write well-engineered software.

Perhaps you need more experience of mathematical software before you can
offer meaningful advice on how it might best be created.

Mark Thornton
 

Steve Wampler

Mark said:
Perhaps you need more experience of mathematical software before you can
offer meaningful advice on how it might best be created.

No doubt that's true. But... I think Lew's point, toned down slightly,
is well taken. Both my wife and I program in a field where there is a
*lot* of code written by scientists and their students. While there are
some *very good* programmers among these two groups, the majority *vastly*
overrate their programming skills and produce code that is unmaintainable
and undecipherable. (I'll grant that may be more common among 'real'
programmers than I'd like to admit, also.)

As a simple example, it's not uncommon for my wife to be asked to help
debug a 12000 line C/Fortran program where 2000+ lines might reside in
a single function definition with few, if any, comments. Attempting to
educate them on more appropriate programming practices can be met with
a surprising amount of resistance. It's a joy when we find one who is
actually willing to learn from constructive criticism on their practice.
(And I can assure you that my wife, at least, gives such criticism
without being *at all* rude or condescending.)
 

Mark Thornton

Steve said:
No doubt that's true. But... I think Lew's point, toned down slightly,
is well taken. Both my wife and I program in a field where there is a
*lot* of code written by scientists and their students. While there are
some *very good* programmers among these two groups, the majority *vastly*
overrate their programming skills and produce code that is unmaintainable
and undecipherable. (I'll grant that may be more common among 'real'
programmers than I'd like to admit, also.)

As a simple example, it's not uncommon for my wife to be asked to help
debug a 12000 line C/Fortran program where 2000+ lines might reside in
a single function definition with few, if any comments. Attempting to

Oh yes I've seen plenty of code like that. Comments though are
problematic --- there is no good way to put formulae into a comment,
given that you already have the simplest text form in the code itself.
You could use TeX, but the audience for that is rather limited now. I
sometimes use references to articles (in books/on the web, etc). Graphs
and diagrams would also be helpful if they could be easily included. To
some extent this is possible in Java within the JavaDoc, but it has more
limited applicability in documenting the implementation as opposed to
the public interface. When I'm designing code I often have quite a bit of
'documentation' in the form of hand drawn diagrams on paper. If only
they could easily be captured and pasted into the code.

More generally though we do need such code written and written well, but
it isn't clear how to improve the situation. Those who understand the
problem domain sufficiently well to write the code will generally have
had very little formal exposure to computer science. Nor is there much
room in their well stuffed courses to fit more, or at least other topics
are given higher priority. Conversely the overwhelming majority of
computer science students appear to have a poor understanding of basic
numerical matters, never mind more advanced maths.

When people enquire if Java is suitable for numerical work, it is not
uncommon for the response to be that they should go elsewhere (and
preferably take float and double with them). Suggestions aimed at making
Java more suited for this domain are usually greeted with horror, while
at the same time welcoming other bizarre extensions (take your pick ---
plenty to choose from at the moment :)). Yet there are several features
of Java that are attractive for either practical maths or learning
purposes. The tightly specified maths has both plus and minus points.
The built in and well specified support for concurrency is very useful.

Mark Thornton
 

Steve Wampler

Mark said:
When people enquire if Java is suitable for numerical work, it is not
uncommon for the response to be that they should go elsewhere (and
preferably take float and double with them). Suggestions aimed at making
Java more suited for this domain are usually greeted with horror, while
at the same time welcoming other bizarre extensions (take your pick ---
plenty to choose from at the moment :)). Yet there are several features
of Java that are attractive for either practical maths or learning
purposes. The tightly specified maths has both plus and minus points.
The built in and well specified support for concurrency is very useful.

I agree wholeheartedly. And so does James Gosling. I attended a talk where
he pointed out that Sun has spent extra effort in the newer versions of
Java to make them more suitable for "scientific programming", recognizing,
I think, that such programming often involves more than just recasting
equations as code. And the performance these days is certainly good enough
for most such tasks.
 

Mark Thornton

Kenneth said:
I really don't think it would take much in terms of language changes to
really make Java a good language for these applications. I don't know
why Sun has decided not to target these applications.

A reasonable implementation of 'complex' numbers and some decent
multi-dimensional array classes would probably be sufficient to make
quite a few look again (even if they don't actually need those classes
it would make them more comfortable with the language). The enemy here
is probably the desire to do something more general such as full
operator overloading.

Mark Thornton
 

Mark Thornton

Kenneth said:
I actually think full operator overloading would be a good thing. I know
it is heresy in this group to say so, but I don't think it is as big a
problem as many here seem to think it is.

And besides, if you don't like it, don't use it.

It is very hard to avoid features once they get used by the standard
libraries. While I would be reasonably comfortable with full overloading
the acrimony attached to it is such that it is extremely unlikely to be
added to Java. A few special cases are the most we can hope to see.

Mark Thornton
 

kwikius

I really don't think it would take much in terms of language changes to
really make Java a good language for these applications. I don't know
why Sun has decided not to target these applications.

(Disclaimer... I'm not a Java programmer, but lurking with interest (and some
amusement) on the operator overloading and number crunching threads)

Anyway Sun seems to be attacking the high performance problem with a new
language:

http://en.wikipedia.org/wiki/Fortress_(programming_language)

IMO it has some nice ideas but a "slightly ambitious syntax", e.g. significant
whitespace, and it requires Unicode for source, which IMHO is going to be
problematic from a practical point of view. Anyway, it's early days...

regards
Andy Little
 

Tom Anderson

I actually think full operator overloading would be a good thing. I know
it is heresy in this group to say so, but I don't think it is as big a
problem as many here seem to think it is.

I'll sign you up for the heretic newsletter.

tom
 

Mark Thornton

Lew said:
Spoken like someone who never has to maintain code, only write it.



I am not saying overloaded operators will make software maintenance more
difficult, but this is the consideration that should prevail. The
question isn't whether programmer A has a choice to use the feature or
not; that's irrelevant. The question is whether those programmers who
do choose to use it are making life easier or harder for the maintainers.

Maintenance is the largest cost and importance in software design.

True, but the absence of overloading in certain cases makes maintenance
more expensive.
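
As one illustration of such a case, consider the discriminant b^2 - 4ac written once with primitives and once with java.math.BigDecimal. The class and method names below are made up for this sketch; the BigDecimal calls are the standard ones.

```java
import java.math.BigDecimal;

// Hypothetical example class; the names are for illustration only.
class DiscriminantDemo {
    // With primitives the code reads like the formula b^2 - 4ac.
    static double discriminant(double a, double b, double c) {
        return b * b - 4 * a * c;
    }

    // With an object type every operator becomes a method call, and the
    // reader has to reconstruct the formula from the call chain.
    static BigDecimal discriminant(BigDecimal a, BigDecimal b, BigDecimal c) {
        BigDecimal four = BigDecimal.valueOf(4);
        return b.multiply(b).subtract(four.multiply(a).multiply(c));
    }

    public static void main(String[] args) {
        System.out.println(discriminant(1.0, 5.0, 6.0));
        System.out.println(discriminant(
                new BigDecimal("1"), new BigDecimal("5"), new BigDecimal("6")));
    }
}
```

Both versions compute the same value. The difference is in how easily a maintainer can check the second one against the formula it came from, and that gap widens quickly with longer expressions.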

Mark Thornton
 

Monty Hall

Tom said:
Primitives crush Objects - especially on doubles. Floating points
would have to be used for factorizations, etc.. Server option turned
on/off, &

I didn't think to try -server! The results are interesting:
both the array and all list versions using Integer take 50 ms, and the
int array version takes 10 ms. So, the difference between lists and arrays
is eliminated, but the difference between primitives and wrappers is
enhanced. My understanding is that -server applies more compiler
optimisation upfront; that suggests that Sun have done a lot of work on
optimising the object-oriented mucking about (method calls and the like)
involved in a list, but haven't (yet) done anything about eliding boxing.

I would imagine that a year from now, the int vs Integer difference will
be a lot smaller.

I'm on 1.5.0_13, on an intel Mac running OS X 10.4.11, FWIW.
Rearranging the code (object first, primitive second, and vice versa) to
warm up the JVM made no difference.

I measured different implementations in separate runs, and just looped
the measurement to warm the JVM up. I didn't notice a significant change
in time due to warming-up, so this probably isn't necessary.
10,000,000 Integers

That's consistent with what i measured - you spend about 60 ms on an
unbox and rebox. For some reason, a lot more when multiplying, which is
odd.
10,000,000 Doubles

Wow.

Okay, modifying my test to look at longs and doubles, here are all my
results:

Type    -server?  foo[]  Foo[]  List<Foo>  List<Foo> optimised   (times in ms)
int     n            30     55        250                  145
long    n            60     85        270                  170
double  n            45     60        270                  150
int     y            10     50         50                   50
long    y            25     60         50                   50
double  y            20     50         50                   50

I don't see the amazing slowdown with doubles that you do. I see the
same pattern with the big types as with int - lists are much slower than
arrays, -server makes them as fast, and primitives are about twice as
fast as objects.
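
For anyone who wants to repeat this kind of measurement, here is a rough sketch of a harness. It is not the code used for the table above, which was never posted in full; the class name, the array size and the number of runs are all assumptions, and the timings it prints depend heavily on the JVM and its flags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness, not the benchmark behind the table in this post.
class SumBenchmark {
    static long sum(int[] a) {
        long s = 0;
        for (int v : a) s += v;
        return s;
    }

    static long sum(Integer[] a) {
        long s = 0;
        for (Integer v : a) s += v;   // each element is unboxed before adding
        return s;
    }

    static long sum(List<Integer> a) {
        long s = 0;
        for (Integer v : a) s += v;   // iterator call plus unboxing per element
        return s;
    }

    public static void main(String[] args) {
        final int n = 1_000_000;      // arbitrary size, small enough for a default heap
        int[] prim = new int[n];
        Integer[] boxed = new Integer[n];
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            prim[i] = i;
            boxed[i] = i;
            list.add(i);
        }
        // Repeat the measurement so later runs see JIT-compiled code.
        for (int run = 0; run < 5; run++) {
            long t0 = System.nanoTime();
            long a = sum(prim);
            long t1 = System.nanoTime();
            long b = sum(boxed);
            long t2 = System.nanoTime();
            long c = sum(list);
            long t3 = System.nanoTime();
            System.out.printf("int[] %d us, Integer[] %d us, List<Integer> %d us (sums %d %d %d)%n",
                    (t1 - t0) / 1000, (t2 - t1) / 1000, (t3 - t2) / 1000, a, b, c);
        }
    }
}
```

Printing the sums keeps the JIT from discarding the loops as dead code. Treat the output as indicative only; a careful comparison would run each variant in its own JVM, as described above.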

Hang on, i'll implement your test ...

Aha! With the same inner loop as you:

Type    -server?  Time (ms)
double  n         55
double  y         55
Double  n         1261, 3659, 25530 (!), 1369, 1028
Double  y         775, 1378, 1350, 612, 1069, 1071, 612, 1069, 1582

It's garbage collection, isn't it? My code was never creating new
wrapper objects, but yours does, and then puts them in an array. It
creates huge amounts of garbage. That's what slows things down. -server
does seem to be altering the way memory is managed, though, since it's
not only a lot faster than the client VM, but avoids having the
occasional massive time.

I believe this is still optimisable, at least in this microbenchmark; a
bit of escape analysis would let the compiler rewrite the Double[] as a
double[]. In the more complex real-world case, though, it might not.
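
The inner loop in question was not quoted in this post, so the following is only a guess at its shape: an in-place running sum, written once over double[] and once over Double[]. The class and method names are made up. The point it illustrates is that the boxed version stores a freshly boxed Double on every iteration.

```java
// Hypothetical reconstruction of the kind of loop under discussion.
class BoxedAccumulation {
    // Primitive version: pure arithmetic on the array, no allocation.
    static void accumulate(double[] values) {
        for (int i = 1; i < values.length; i++) {
            values[i] += values[i - 1];
        }
    }

    // Boxed version: each += unboxes two Doubles, adds them, and boxes the
    // result into a new Double that replaces the old element, which then
    // becomes garbage.
    static void accumulate(Double[] values) {
        for (int i = 1; i < values.length; i++) {
            values[i] += values[i - 1];
        }
    }

    public static void main(String[] args) {
        double[] prim = {1.0, 2.0, 3.0};
        Double[] boxed = {1.0, 2.0, 3.0};
        accumulate(prim);
        accumulate(boxed);
        System.out.println(prim[2] + " " + boxed[2]);
    }
}
```

The two methods are textually identical, which is what makes autoboxing costs easy to overlook: the allocation is entirely implicit in the element type.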

tom
Wrapper generation is an artifact of autoboxing that would have never
existed using primitives.

If I use a double wrapper that implements an IMutableNumber interface
declaring the four arithmetic operations, for use in "<Z extends
IMutableNumber> void accumulateIt(Z [] values)," I get better
performance. Even so, I'm two times slower on 10,000,000 elements
and 1.3 times slower for 100. In addition to the performance and space
penalty and the inability to use natural arithmetic operators, a
workaround is required, i.e. creating float, double and integer wrapper classes,
something I shouldn't have to do to use generics.
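
A minimal sketch of what such a workaround might look like follows. Only the names IMutableNumber and accumulateIt come from the description above; the method names, the self-typed generic bound, the MutableDouble class and the running-sum behaviour are assumptions made for illustration.

```java
// Hypothetical reconstruction: only IMutableNumber and accumulateIt are
// named in the post. Everything else here is assumed.
interface IMutableNumber<Z extends IMutableNumber<Z>> {
    void add(Z other);
    void subtract(Z other);
    void multiply(Z other);
    void divide(Z other);
}

// Mutable wrapper around a double; arithmetic updates the object in place,
// so no new wrapper is allocated per operation.
final class MutableDouble implements IMutableNumber<MutableDouble> {
    double value;

    MutableDouble(double value) { this.value = value; }

    @Override public void add(MutableDouble o) { value += o.value; }
    @Override public void subtract(MutableDouble o) { value -= o.value; }
    @Override public void multiply(MutableDouble o) { value *= o.value; }
    @Override public void divide(MutableDouble o) { value /= o.value; }
}

class AccumulateDemo {
    // In-place running sum: values[i] becomes values[0] + ... + values[i].
    static <Z extends IMutableNumber<Z>> void accumulateIt(Z[] values) {
        for (int i = 1; i < values.length; i++) {
            values[i].add(values[i - 1]);
        }
    }

    public static void main(String[] args) {
        MutableDouble[] values = {
            new MutableDouble(1.0), new MutableDouble(2.0), new MutableDouble(3.0)
        };
        accumulateIt(values);
        for (MutableDouble v : values) {
            System.out.println(v.value);
        }
    }
}
```

This avoids the per-operation boxing of immutable wrappers, but every element is still a separate heap object reached through a reference, and a class like MutableDouble has to be written for each numeric type.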

It remains to be seen what impact escape analysis optimization will have
on numerics. In Java's current incarnation, it couldn't optimize its
way out of a simple array accumulation using mutable and immutable objects
to get equivalent primitive performance.

At any rate, a 2x advantage is the difference between waiting 3.5 days for a
simulation vs. a whole week.

Monty


BTW: 2.13GHz Pentium M, 2GB RAM, JDK 1.6, FreeBSD 7. If your machine
is multi-core with parallel GC, I'm not sure how much advantage that yields
for GC, as I've not yet used a multi-core machine. It could possibly explain
the significant performance delta between my machine and yours using
immutable numerics (assuming we're in the same ballpark for single-core
performance).
 

Arne Vajhøj

Mark said:
True, but the absence of overloading in certain cases makes maintenance
more expensive.

Absolutely.

But it is a relatively narrow share of Java development that has
the problem.

Arne
 

Tom Anderson

I was thinking about this thread a bit and what kind of improvements
could be made to Java to handle math better and one that hadn't been
mentioned yet, that I would very much like to see in Java, is the
ability to handle objects without references.

As Lew said, this is something you should leave to the compiler; it's
something it can often do if a class is final, and only has final fields.
Introducing explicit pass-objects-by-value, as in C, leads to a huge mess
of extra complexity.

However, there are some fairly unobtrusive changes that could be made to
the language that would make it much easier for the compiler to do its
magic. The key stumbling block is, i believe, object identity: if a
compiler can't prove that an object will never be subjected to an identity
test (==) which might succeed, it can't inline/unbox it. I think the
minimum you need to achieve this is one new thing and two changes. The new
thing is the ability to declare a class to be a 'value' or 'identityless'
type, presumably with a new keyword (or reuse an old one - native? static?
transient?). This would imply or require that it's final, and that all its
fields are final. The first change is a rule that, for identity comparisons
where one or both of the operands are variables of identityless type, the
comparison is based not on object identity but on value: the two objects
are equal iff they are of the same class and their corresponding fields
are also identical (i.e. for any field x, a.x == b.x). The second change is
a rule that object identity need not be preserved across assignments to
variables of identityless type (i.e. with an identityless type 'complex', for
Object a = complex("1+2i"); complex x = (complex)a; Object b = x; boolean eq =
a == b, then eq would not necessarily be true, although it could be). Those
changes would let compilers inline, stack-allocate and generally muck
about with identityless types to their hearts' content, without having
much of an impact on the way people actually use the language.
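
To show what would change, here is how the same situations behave in Java as it stands. The Complex class below is a hypothetical stand-in (the paragraph above writes complex("1+2i"); a plain constructor is used here instead), and no 'identityless' keyword exists, so this only demonstrates the current rules that the proposal would relax.

```java
// Hypothetical value-like class: final, with only final fields, as the
// proposal would require. It is an ordinary class in today's Java.
final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Complex)) return false;
        Complex c = (Complex) o;
        return Double.compare(re, c.re) == 0 && Double.compare(im, c.im) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Double.hashCode(re) + Double.hashCode(im);
    }
}

class IdentityDemo {
    public static void main(String[] args) {
        Complex p = new Complex(1, 2);
        Complex q = new Complex(1, 2);
        System.out.println(p == q);       // false: two distinct objects
        System.out.println(p.equals(q));  // true: same field values

        Object a = p;
        Complex x = (Complex) a;
        Object b = x;
        System.out.println(a == b);       // true: identity survives the assignments
    }
}
```

Under the first proposed rule, p == q would compare fields and give true. Under the second, the last comparison would no longer be guaranteed to give true, which is what would free the VM to copy or flatten such objects.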

tom
 

Mark Thornton

Lew said:
And Hotspot. This type of thing is better done in HS than in source.
If Hotspot can't do it yet, then have the PTB improve that, rather than
clutter up the language with low-level details. Repeated use of the
same object in a method is easily optimized to the low-level equivalent
of with ( instance ) do { ... }.

Java was not intended as an assembler language. The distinction between
stack and heap is not intrinsic to OO modeling, nor inherent in Java's
object-allocation strategy even today.

Unfortunately the use of == for object identity makes it difficult to
invisibly convert real objects to 'values'. Objects which don't escape
can be held on the stack, but it is harder to merge objects into others.
The fact that reflection can see private fields means that it isn't
possible to just analyse the visible java code and verify that the field
is never exposed.

As far as I can see, with the current language specification, HotSpot
can't safely do this type of optimisation.
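
The reflection point can be demonstrated in a few lines. The Samples class and the helper method are invented for this sketch; getDeclaredField, setAccessible and Field.get are the standard java.lang.reflect calls. It assumes the code runs where setAccessible is permitted, for example both classes in the same module and no security policy forbidding it.

```java
import java.lang.reflect.Field;

// Hypothetical class whose array looks fully encapsulated in the source.
final class Samples {
    private final Double[] data = {1.0, 2.0, 3.0};

    double sum() {
        double s = 0;
        for (Double d : data) s += d;
        return s;
    }
}

class ReflectionLeakDemo {
    // Reads a private field by name; wraps checked reflection failures.
    static Object readPrivateField(Object target, String name) {
        try {
            Field f = target.getClass().getDeclaredField(name);
            f.setAccessible(true);
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Samples s = new Samples();
        Double[] leaked = (Double[]) readPrivateField(s, "data");
        leaked[0] = 100.0;             // modify the "private" array from outside
        System.out.println(s.sum());   // 105.0
    }
}
```

Nothing in the source of Samples lets the array escape, yet other code can still obtain a reference to it at run time, so an analysis of the visible code alone cannot prove the field is never exposed.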

Mark Thornton
 

Lew

Dickie said:
I was thinking about this thread a bit and what kind of improvements
could be made to Java to handle math better and one that hadn't been
mentioned yet, that I would very much like to see in Java, is the ability
to handle objects without references.

That is, I would like to be able to include objects as members
themselves, without a reference to them. I would like to be able to call
methods and pass objects into these methods, not references to these
objects.

Basically, I would like a way to reduce the number of levels of
indirection for tight loops. I believe C# already provides this, and of
course, C always has.

And Hotspot. This type of thing is better done in HS than in source. If
Hotspot can't do it yet, then have the PTB improve that, rather than clutter
up the language with low-level details. Repeated use of the same object in a
method is easily optimized to the low-level equivalent of with ( instance ) do
{ ... }.

Java was not intended as an assembler language. The distinction between stack
and heap is not intrinsic to OO modeling, nor inherent in Java's
object-allocation strategy even today.

So, basically, Java already gives you what you would like, a way to reduce the
number of levels of indirection for tight loops. It's called Hotspot.

--
Lew

 

Mark Thornton

Lew said:
Hotspot wouldn't optimize portions of the code that contained idioms
that forced a heap allocation, but in blocks that don't do the sorts of
things you mention it most certainly could.

Remember that Hotspot is dynamic, and will de-optimize code at need as
runtime conditions vary. So an object reference might be optimized to
register values in one method but not in another, even though it be the
same object.

If Java were limited to static analysis your objections would carry more
weight.

Dynamic code optimisation is one thing, but this would require dynamic
object layout changes. That is most unlikely in the foreseeable future.
It just isn't practical to back out merging an object into a parent
object. It is possible to do this for the stack because the allocation
has a well defined lifetime and you have located all references.

Mark Thornton
 

Mark Thornton

Mark said:
Dynamic code optimisation is one thing, but this would require dynamic
object layout changes. That is most unlikely in the foreseeable future.
It just isn't practical to back out merging an object into a parent
object. It is possible to do this for the stack because the allocation
has a well defined lifetime and you have located all references.

Mark Thornton

From John Rose (in a comment at
http://blogs.sun.com/jrose/entry/fixnums_in_the_vm)

"The key question is always the deoptimization cost. In this case, it
looks like a full GC could be required to reformat the affected
object(s). It's the Smalltalk "become" primitive, the nasty version
which can change an object's size and layout. I don't think I'm brave
enough to make these transformations automagically. Maybe they would be
justified in the presence of assurances from the user."

You might also look at the comment from Howard Lovatt to the same blog
entry. That refers to the language changes necessary to allow such
optimisations without any fear of having to later deoptimize them.

Mark Thornton
 

Tom Anderson

Which compiler, javac or Hotspot?
Hotspot.

Hotspot most certainly can determine whether an object is subject to an
== comparison.

No. Not in all cases. In many useful cases, it can, and can then do
unboxing (provided the handful of other restrictions are met too), but
there will be cases where the escape analysis fails, and it just can't
decide whether it will get =='d.
It doesn't need to determine that it never is subject to it, only that
it's not subject to it for a while.

Wrong.

tom
 

Tom Anderson

From John Rose (in a comment at
http://blogs.sun.com/jrose/entry/fixnums_in_the_vm)

"The key question is always the deoptimization cost. In this case, it
looks like a full GC could be required to reformat the affected
object(s). It's the Smalltalk "become" primitive, the nasty version
which can change an object's size and layout. I don't think I'm brave
enough to make these transformations automagically. Maybe they would be
justified in the presence of assurances from the user."
Bingo.

You might also look at the comment from Howard Lovatt to the same blog
entry. That refers to the language changes necessary to allow such
optimisations without any fear of having to later deoptimize them.

You mean the link to:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4617197

? He covers all the important things, but also a lot of crazy ideas.

tom
 

Arne Vajhøj

Kenneth said:
I really don't think it would take much in terms of language changes to
really make Java a good language for these applications. I don't know
why Sun has decided not to target these applications.

For better or for worse: there is more money in a bank application
than in a simulator that will discover how the universe started.

Arne
 

Arne Vajhøj

Mark said:
Unfortunately the use of == for object identity makes it difficult to
invisibly convert real objects to 'values'. Objects which don't escape
can be held on the stack, but it is harder to merge objects into others.

C# took a different approach from Java. And it seems to work
pretty well.

Arne
 
