Ruby and science ?

Ryan Davis

Ryan Davis wrote in post #968969:

Which means my problem was just finding where to look up!

gem install RubyInlineFortran

I haven't even looked at it in years... but as far as I know, NASA uses it.
 
Michel Demazure

Jarmo Pertman wrote in post #969261:
This thread reminded me of this nice blog post about the same topic:
http://allthingsprogress.com/posts/ruby-is-beautiful-but-im-moving-to-python

Unfortunately the author of that blog post seemed to end up switching
to Python. Maybe the author didn't (or doesn't) know about the
libraries available in Ruby.

@Ryan: thanks for the Fortran tip

@Jarmo: yes, it was exactly the same discussion! But ruby/gsl now
seems to solve part of the problem (at least on Ubuntu; it does not
compile on Windows for the time being).

_md
 
Charles Oliver Nutter

And JRuby is getting faster all the time. It's not clear whether one will
necessarily beat the other.

We've always emphasized compatibility and bugfixes over performance,
and so while we handily beat 1.9.1 and earlier versions of Ruby a year
ago, these days it's a bit of a toss-up with 1.9.2. Even in the JRuby
1.6 cycle, we got in a little perf work...but soon moved priorities
back to implementing remaining 1.9.2 features. One of these days we'll
have caught up on all features, or I'll just decide I need to spend my
time entirely on performance :)

In general, though, if you find something that's notably slower than
1.9.2, please file a bug. There are areas where we know we're a bit
slower, but I'm sure there are areas where we have bugs keeping us slow.

In particular, I remember hearing discussions of a command-line flag in JRuby
which one could use to disallow altering methods on the core numeric types.
This would basically make Ruby math compile down to Java math. I imagine most
scientific applications wouldn't care about altering the core numeric types,
while most scientific applications would care about fast math.

This would be the --fast flag. It used to help the performance of
small methods and math operations, but the bulk of its benefit is now
in JRuby master (1.6) by default:

~/projects/jruby ➔ ../jruby-1.5.2/bin/jruby bench/bench_tak.rb 4
      user     system      total        real
  2.601000   0.000000   2.601000 (  2.537000)
  1.805000   0.000000   1.805000 (  1.805000)
  1.790000   0.000000   1.790000 (  1.791000)
  1.807000   0.000000   1.807000 (  1.807000)

~/projects/jruby ➔ jruby -v bench/bench_tak.rb 4
jruby 1.6.0.dev (ruby 1.8.7 patchlevel 249) (2010-12-17 d2575a7) (Java
HotSpot(TM) 64-Bit Server VM 1.6.0_22) [darwin-x86_64-java]
      user     system      total        real
  1.810000   0.000000   1.810000 (  1.742000)
  1.058000   0.000000   1.058000 (  1.058000)
  1.053000   0.000000   1.053000 (  1.053000)
  1.057000   0.000000   1.057000 (  1.057000)

The next "big thing" that probably won't land in 1.6 is "dynopt",
which performs more runtime optimization of code:

~/projects/jruby ➔ jruby -v -Xcompile.dynopt=true bench/bench_tak.rb 4
jruby 1.6.0.dev (ruby 1.8.7 patchlevel 249) (2010-12-17 d2575a7) (Java
HotSpot(TM) 64-Bit Server VM 1.6.0_22) [darwin-x86_64-java]
      user     system      total        real
  0.912000   0.000000   0.912000 (  0.837000)
  0.518000   0.000000   0.518000 (  0.518000)
  0.516000   0.000000   0.516000 (  0.516000)
  0.517000   0.000000   0.517000 (  0.517000)

Both the 1.6 and the 1.6+dynopt results should consistently be faster
than 1.9 for small benchmarks.
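
The numbers above come from JRuby's own bench/bench_tak.rb, whose source isn't shown here. As a rough way to reproduce this kind of comparison yourself, here is a minimal sketch of a Takeuchi ("tak") micro-benchmark timed with Ruby's standard benchmark library. The arguments (18, 12, 6), the inner repeat count, and the meaning of the command-line argument are assumptions, not that file's actual contents. Run it under each implementation (for example MRI 1.9.2 and JRuby) and compare the rows.

```ruby
require 'benchmark'

# Takeuchi function, in the variant used by Gabriel's Lisp benchmarks.
# It is deeply recursive and does nothing but small-integer comparisons
# and subtraction, so its cost is dominated by method calls.
def tak(x, y, z)
  if y < x
    tak(tak(x - 1, y, z),
        tak(y - 1, z, x),
        tak(z - 1, x, y))
  else
    z
  end
end

# Hypothetical driver: the number of timed rounds comes from the first
# command-line argument (default 4), echoing `bench_tak.rb 4` above.
if __FILE__ == $0
  rounds = (ARGV.first || 4).to_i
  Benchmark.bm do |bm|
    rounds.times { bm.report { 5.times { tak(18, 12, 6) } } }
  end
end
```

In the runs above, the first row is consistently slower than the rest, which is consistent with the JVM still warming up; timing several rounds, as this sketch does, keeps that effect visible instead of folding it into one number.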

For large benchmarks and real applications, performance almost always
comes down to the performance of core classes like String and Array.
At that point, it's mostly a matter of figuring out where the core
classes don't perform as well...and fixing them.

About the only unintuitive thing I ever found was implementing a Java
interface, and while it's somewhat unintuitive, it's still trivial:

require 'java'
java_import java.util.Comparator
java_import java.util.PriorityQueue

# singleton comparator
comp = Class.new {
  include Comparator
  def compare a, b
    a.to_s <=> b.to_s
  end
}.new

pq = PriorityQueue.new 11, comp

If you have suggestions for how to improve it, we'd love to hear them :)

You can also do:

Comparator.impl do |name, a, b|
  # name is name of interface method, check it or not
  a.to_s <=> b.to_s
end

Or this may work too (I don't remember PriorityQueue's API):

pq = PriorityQueue.new(11) do |a, b|
  a.to_s <=> b.to_s
end

Oracle's behavior lately is making me kind of iffy about the future of Java as
a platform, but JRuby is just made of awesome.

Oracle's actions relating to Java have all been political. At the same
time people publish that they're fighting with Apache or Google, they
are also getting IBM (GPL-haters) and Apple (not big OSS contributors)
to collaborate on the GPLed OpenJDK, and making concrete plans for
OpenJDK to continue beyond Java 8.

As far as *using* Java, nothing has changed for the worse in the past year.

ruby-inline is very cool, but it's still not quite as easy as being able to
write a Java class, pretend it's a Ruby class, and have it work.

There's also java_inline, an extension to ruby_inline I made that
allows you to write Java code inline like C code in ruby_inline:

https://github.com/jruby/java-inline

require 'java_inline'

class Foo
  inline :Java do |builder|
    builder.package "org.jruby.test"
    builder.java "
      public static int fib_java(int n) {
        if (n < 2) return n;

        return fib_java(n - 2) + fib_java(n - 1);
      }
    "
  end
end

Foo.new.fib_java(45)

Fun stuff.

- Charlie
 
Charles Oliver Nutter

2. Integration with Java seems to be easy: I'm not a Java programmer, but
I've found it easy to write Java code to do the number crunching, and mostly
easy to integrate the "compiled" Java code with JRuby. (I say mostly because
at the start I couldn't find a way to compile the Java code in a way that
would reliably work with JRuby, but that was essentially me not
understanding how Java packages really worked. I still don't understand how
Java packages really work, but I've found a way to compile that reliably
works for me with JRuby!) That's a big plus because I definitely don't
understand at the moment how to compile C code and integrate that with MRI
Ruby.

Apart from perhaps MacRuby calling ObjC and IronRuby calling .NET,
JRuby calling Java is *by far* the easiest way to pull in external
libraries. C extensions and FFI are nowhere near as easy, and never
will be.

I always recommend using JRuby to take advantage of Java/JVM libraries
over C extensions and FFI, but of course I'm a bit biased :)

- Charlie
 
Charles Oliver Nutter

Of course JRuby is a fantastic tool for many use cases, but I've personally
found science to be perhaps the worst possible application of it. The
reasons are quite simple:

- Speed. When you need something to be big or fast in science, generally
even C won't cut it. Fortran is still used in maybe 80% of big weather
systems for a reason: its compilers generally produce faster floating-point
code than the equivalent C compilers. One can bridge Fortran -> C -> Ruby
quite easily (NArray does this, GSL does this, etc.), and it's a place where
JRuby actually makes the job much harder. Java, of course, isn't even in
the ballpark.

If you need C or Fortran, you need C or Fortran. I won't argue that.
Most people, however, don't.

- OS integration: the general approach to making Ruby faster is to use
parallelism. The best way is to run lots of processes. JRuby's interface
to the operating-system-level primitives for this (fork, et al.) makes this
really, really hard, close to impossible, to deal with simply. Mmap is
another great example of something you want at your fingertips in
science... Interfaces to hardware boards connected to a research device,
etc. I think any research-based science makes getting close to the metal a
requirement.

The general approach to making Ruby faster is to use a faster Ruby or
write better Ruby code. JRuby's good for the former.

If you need to parallelize, processes are only one tool, and perhaps
the most blunt tool. In-process concurrency opens up many options that
are difficult or impossible with processes. So JRuby enables one set
of methodologies for concurrency while perhaps not supporting others
well. Trade-offs.

JRuby doesn't support fork, but it supports memory-mapping (via NIO
memory-mapping, and again you don't have to write or compile a line of
C). As for interfaces to hardware boards...if you need C, you need C.
I won't argue that. Most people don't.

- Startup time. Related to the above is the fact that science tends to
lead to many small programs running very often. Map-reduce jobs, cron jobs,
process pipelines of related algorithms, toolkits made extensible via
file-based processing, and tons of processing of stdin/stdout tend to be
facts of life when algorithm writers produce systems as a side effect. It's
not pretty, but it is a fact I've seen repeated over and over.

This is how you do parallel processing for your work. It's not the
only way, and being able to pass whole in-memory object graphs over to
another thread is distinctly more elegant than having to marshal it
through a memory-mapped file or IO pipe.
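
To make that contrast concrete, here is a minimal sketch in plain Ruby. It uses fork, so it assumes MRI on a POSIX system (JRuby does not support fork, as noted in this thread). A worker thread receives a nested object graph through a Queue as the very same object, while a forked child can only be handed bytes through an IO pipe, so the graph is marshaled on one side and rebuilt on the other. The method names and sample data are illustrative.

```ruby
# Illustrative object graph: nested Hash, Array and String objects.
def sample_graph
  { name: "run-42", samples: [1.5, 2.5, 4.0], meta: { units: "K" } }
end

# In-process: the worker thread receives a reference to the very same
# object, with no copying or serialization.
def via_thread(graph)
  queue = Queue.new
  worker = Thread.new { queue.pop }   # blocks until an item arrives
  queue << graph
  worker.value                        # whatever the thread popped
end

# Cross-process: only bytes cross a pipe, so the parent marshals the
# graph, the child rebuilds an independent copy, and the answer has to
# be marshaled back through a second pipe.
def via_fork(graph)
  to_child_r, to_child_w = IO.pipe
  to_parent_r, to_parent_w = IO.pipe

  pid = fork do
    to_child_w.close
    to_parent_r.close
    copy = Marshal.load(to_child_r)   # a new object, not `graph` itself
    Marshal.dump(copy[:samples].sum, to_parent_w)
    to_parent_w.close
    exit!(0)                          # skip at_exit handlers in the child
  end

  to_child_r.close
  to_parent_w.close
  Marshal.dump(graph, to_child_w)
  to_child_w.close
  result = Marshal.load(to_parent_r)
  to_parent_r.close
  Process.wait(pid)
  result
end
```

With g = sample_graph, via_thread(g).equal?(g) is true, while via_fork(g) returns 8.0 computed from a copy; any change the child made to its copy would never reach the parent.
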

I am definitely aware of some projects which make really heavy use of Java,
and there JRuby sure would be an awesome tool, but my personal experience is
that anything related to the JVM is a total non-starter. YMMV.

Java is not a requirement for someone to want JRuby. All that's
required is wanting to avoid monkeying with native code, wanting a
really solid VM, and wanting to run concurrent threads in a robust
environment. You can do all that without ever touching a line of Java
code. Just because you don't do Java for science doesn't mean Java and
the JVM are bad options for science.

And in any case...it was based on my recommendations, after dealing
with and hearing from dozens of MRI users who have no end of problems
with native C extensions. With JRuby, you write it once, build it
once, and ship it. Perhaps it's not quite as fast as C, perhaps it
doesn't integrate with the OS as well...but it's a hell of a lot less
painful to use. Perhaps you can't fork, but you can use real
concurrent threads, which are almost certainly easier (provided you
don't share mutable data, as with processes). Perhaps it's not as
low-level and bare-metal as MRI, but it's a better experience for
many, many cases. And that's the Ruby way.

- Charlie
 
James Edward Gray II

If you need to parallelize, processes are only one tool, and perhaps
the most blunt tool. In-process concurrency opens up many options that
are difficult or impossible with processes.
This is how you do parallel processing for your work. It's not the
only way, and being able to pass whole in-memory object graphs over to
another thread is distinctly more elegant than having to marshal it
through a memory-mapped file or IO pipe.
Perhaps you can't fork, but you can use real
concurrent threads, which are almost certainly easier (provided you
don't share mutable data, as with processes).

Before I say this, I need to state that I love and use JRuby. The
reasons are that it completely rocks at some things, like Java
integration.

Of course, like anything, there are tradeoffs and JRuby sucks at other
things, like manipulating processes in a POSIX environment. I don't use
it in these scenarios and you know that I've filed bugs for the specific
problems I've run into (some of those have been partially addressed).

All that said, I think you were pretty harsh on using processes for
concurrency in general. That "blunt tool" is pretty much the core of
the Unix operating system, which I think a lot of us are fond of. I
often find it easier to work with processes than threads myself, though
obviously some programmers think the other way.

On the contrary, threading is so challenging to get right that
"threading is hard" is a popular saying:

http://www.google.com/search?q="threading+is+hard"

It bugs me that people are so harsh on fork(). I avoided it like the
plague when I was a younger programmer because everyone had me convinced
it was evil. I'm now far more dangerous because I took the time to
learn it and understand it. I strongly recommend all programmers do the
same. (By the way, ara.t.howard taught me most of what I know about
processes, directly and indirectly!)

So JRuby is good at threads and not so good at processes, in my opinion.
Processes are also not at all evil. Judge not lest ye be judged. ;)

James Edward Gray II
 
Michel Demazure

@all, esp. @ara, @james, @charles

Thanks for this enlightening discussion, which clarifies the issues I was
(quite clumsily) addressing.

_md
 
Benjamin J. Racine

I totally understand the desire to have these capabilities and the elegance
of Ruby, but I think you'd find the science and engineering community in the
Python world worth looking at in more detail.

NumPy, SciPy, matplotlib, IPython, Mayavi2 are some buzzwords to look up and
then decide for yourself.

Regards,
Ben R.


________________________________________
From: Michel Demazure
Sent: Friday, December 17, 2010 12:56 AM
To: ruby-talk ML
Subject: Re: Ruby and science ?

Phillip Gawlowski wrote in post #969006:
Not quite, but have a look at ruby-toolbox.com (IIRC), which gives an
overview of what's available for what. And there's the Ruby
Application Archive, of course.

'gsl' was not in the toolbox, and (stupid me) I did not look in the RAA!
_md
 
Michel Demazure

Benjamin J. Racine wrote in post #969482:
I totally understand the desire to have these capabilities and the
elegance of Ruby, but I think you'd find the science and engineering
community in the Python world worth looking at in more detail.

NumPy, SciPy, matplotlib, IPython, Mayavi2 are some buzzwords to look up
and then decide for yourself.

Regards,
Ben R.

Benjamin, you are moving the knife in the wound (translated from French;
do you say that in English?)
_md
 
Michel Demazure

Martin DeMello wrote in post #969546:
"Twisting the knife" in English

martin

In French it is "remuer le couteau dans la plaie" (literally, "stir the knife
in the wound"). Twisting is certainly meaner than "remuer" ("to stir") ;-)

_md
 
andrew mcelroy

I totally understand the desire to have these capabilities and the elegance of Ruby, but I think you'd find the science and engineering community in the Python world worth looking at in more detail.

NumPy, SciPy, matplotlib, IPython, Mayavi2 are some buzzwords to look up and then decide for yourself.

It depends on what kind of science you are trying to do in Ruby.

I would like to point out that there are Ruby bindings for ROOT.
http://root.cern.ch/root/HowtoRuby.html

http://root.cern.ch/ (loosely, think of it as the software behind the LHC)

Andrew McElroy
 
Charles Oliver Nutter

Of course, like anything, there are tradeoffs and JRuby sucks at other
things, like manipulating processes in a POSIX environment. I don't use
it in these scenarios and you know that I've filed bugs for the specific
problems I've run into (some of those have been partially addressed).

The JVM and the JDK APIs suck at process manipulation...not JRuby.
JRuby does the best job it can do cross-platform with the JDK APIs
provided for it. If you need to go outside those APIs, or if we "suck"
in how we utilize them, it's a trivial matter to bind native C
process-management logic via FFI and use that. It won't be as portable
as what we provide, but it will work.

Providing the excellent cross-platform experience the JVM provides
(and which JRuby provides by extension) means a lot of
platform-specific things are a bit cumbersome. Our direction has been
to provide the cross-platform experience and allow people to opt out
of portability through FFI if necessary. You may disagree with that
approach.

All that said, I think you were pretty harsh on using processes for
concurrency in general. That "blunt tool" is pretty much the core of
the Unix operating system, which I think a lot of us are fond of. I
often find it easier to work with processes than threads myself, though
obviously some programmers think the other way.

Processes for concurrency works great. The blunt tool I meant was how
you get those processes to coordinate. You basically have a handful of
cumbersome options:

* Signals, which can't communicate much data
* Streams, pipes, files, shared memory, which can only carry byte[]
data, requiring marshaling

With threads, it's possible to communicate between concurrent
processes using normal OO constructs like queues, actors, and simple
method calls. You can emulate that with processes using one of the
above mechanisms, but it's a leaky abstraction. On the other hand,
your queues, actors, and method calls across threads need to be
thread-safe. Tradeoffs.

JRuby is perfectly happy to work with a multi-process model, but you
may need to opt out of portability to get the lowest-level behaviors
of a typical UNIX environment. I personally have nothing against
processes. Threads are just easier, if you stay out of the danger
zones.

Threading is hard if you do it wrong. The problem is that it's easy to
do it wrong.

Follow these rules and threading is a very nice, very clean, very easy
way to do concurrency:

1. Don't share data
2. If you must share data, share only immutable data
3. If you must share mutable data, guarantee ACID (atomicity,
consistency, isolation, durability)
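
As a rough illustration, here is a minimal plain-Ruby sketch of those three rules; the unit-conversion workload and the Counter class are hypothetical examples. Each thread computes over its own chunk and returns the result through Thread#value (rule 1), the lookup table every thread reads is frozen (rule 2), and the only shared mutable state is updated inside a Mutex (rule 3). A lock like this covers the atomicity and isolation parts of rule 3; nothing here attempts durability.

```ruby
# Rule 2: data that every thread reads is frozen, so no thread can change it.
SCALE = { "m" => 1.0, "cm" => 0.01, "mm" => 0.001 }.freeze

# Rule 1: each thread works only on its own chunk and hands its result back
# through Thread#value; no thread writes anything another thread can see.
def total_meters(chunks)
  threads = chunks.map do |chunk|
    Thread.new(chunk) do |mine|
      mine.sum { |value, unit| value * SCALE.fetch(unit) }
    end
  end
  threads.sum(&:value)   # Thread#value joins the thread first
end

# Rule 3: the one piece of shared mutable state is only read or written
# while holding a lock, so each update is atomic and isolated.
class Counter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @count += 1 }
  end

  def value
    @lock.synchronize { @count }
  end
end
```

For example, total_meters([[[1, "m"], [50, "cm"]], [[250, "mm"]]]) comes to 1.75, and ten threads each calling increment 1,000 times on one Counter leave its value at 10,000.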

Clojure is a perfect example of an environment that uses threads
heavily by defaulting to (2) and providing software transactional
memory for (3). Other than enforcing immutability, nothing Clojure
does for concurrency could not be done in Ruby. Anyone interested in
seeing concurrency done the Clojure way with JRuby can find many
examples online.

Threads "fail" in that none of these rules are enforced at any level.
They're a very sharp tool with many dangerous paths. But I prefer
sharp tools.

It bugs me that people are so harsh on fork(). I avoided it like the
plague when I was a younger programmer because everyone had me convinced
it was evil. I'm now far more dangerous because I took the time to
learn it and understand it. I strongly recommend all programmers do the
same. (By the way, ara.t.howard taught me most of what I know about
processes, directly and indirectly!)

I have no problem with fork. If JRuby could support fork on the JVM,
we would do so. We don't only because all mainstream JVMs spin up
multiple threads, which are not carried along to forked child
processes (and even if they could be restarted, it's a very
complicated transition that might defeat much of the benefit of
forking).

So JRuby is good at threads and not so good at processes, in my opinion.
Processes are also not at all evil. Judge not lest ye be judged. ;)

It might be more correct to say that the JVM is good at threads and
not so good at processes, noting that JRuby makes it possible via FFI
to be nearly as good at processes as any POSIX application. We have
simply prioritized making JRuby work uniformly across platforms first,
while still providing the tools people need to opt out of portability
for lower-level behaviors and features.

- Charlie
 
Charles Oliver Nutter

I totally understand the desire to have these capabilities and the elegance of Ruby, but I think you'd find the science and engineering community in the Python world worth looking at in more detail.

NumPy, SciPy, matplotlib, IPython, Mayavi2 are some buzzwords to look up and then decide for yourself.

What I'd really like to see are FFI-based wrappers around key science
and math libraries, rather than more blasted C extensions that can't
be run concurrently and aren't easily portable across impls. FFI works
incredibly well for these isolated libraries (as opposed to FFI for
kernel-level features, which can have many platform-specific
differences).
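
As a rough illustration of binding one isolated math routine without writing a C extension, the sketch below calls cos() and hypot() from the C math library. It uses Fiddle, from Ruby's standard library, so that it runs on MRI with no extra gems; it stands in for the ffi gem discussed here, where the equivalent would be a module that does extend FFI::Library, ffi_lib for the math library, and attach_function :cos, [:double], :double. It assumes the math library's symbols are already loaded into the Ruby process, as is typical on Linux and macOS; the LibM module name is hypothetical.

```ruby
require 'fiddle'

module LibM
  # Search the symbols already loaded into this process (the C math
  # library is typically linked into the Ruby interpreter itself).
  HANDLE = Fiddle::Handle::DEFAULT

  # double cos(double);
  COS = Fiddle::Function.new(HANDLE['cos'],
                             [Fiddle::TYPE_DOUBLE],
                             Fiddle::TYPE_DOUBLE)

  # double hypot(double, double);
  HYPOT = Fiddle::Function.new(HANDLE['hypot'],
                               [Fiddle::TYPE_DOUBLE, Fiddle::TYPE_DOUBLE],
                               Fiddle::TYPE_DOUBLE)

  # Thin Ruby-facing wrappers around the raw C calls.
  def self.cos(x)
    COS.call(x)
  end

  def self.hypot(x, y)
    HYPOT.call(x, y)
  end
end
```

LibM.cos(0.0) returns 1.0 and LibM.hypot(3.0, 4.0) returns 5.0, up to floating-point rounding.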

C extensions are the devil.

- Charlie
 
Ryan Davis

The JVM and the JDK APIs suck at process manipulation...not JRuby.

Oh come now. If the JVM sucks at something, JRuby sucks at it too. Don't
pass the buck.
JRuby does the best job it can do cross-platform with the JDK APIs
provided for it. If you need to go outside those APIs, or if we "suck"
in how we utilize them, it's a trivial matter to bind native C
process-management logic via FFI and use that. It won't be as portable
as what we provide, but it will work.

If it were trivial, why aren't you shipping it (or at least pointing to
a JRuby-supported gem that does)? You've espoused FFI as the C-API
silver bullet time and again. I have doubts that it is that trivial, as
FFI itself seems non-portable.
 
Charles Oliver Nutter

Oh come now. If the JVM sucks at something, JRuby sucks at it too. Don't pass the buck.

You couldn't be more wrong. It's understandable since you don't
actually know anything about JRuby's implementation.

Notice I specifically called out the JDK APIs. The JDK provides very
primitive APIs for dealing with processes, providing no way to share
stdio streams with child processes, no way to let child processes run
without pumping their IO streams, no way to get actual PIDs and send
signals to them, and so on. These are APIs that haven't changed in
over a decade, designed to provide the lowest common denominator of
process-management features across many platforms. They pretty much
suck.

JRuby, in its default mode, does all its process management using
these APIs. We use a few mostly-portable tricks to get real PIDs and
to make processes appear more detached than they are, but we don't do
much more than that. However, using FFI, it's trivial to route around
those cumbersome built-in APIs and get much more modern behavior. A
perfect example is my "spoon" gem, which uses FFI to bind the
posix_spawn syscall, which allows something no other standard JDK API
can do: sharing stdio with child processes. We also ship with a set of
native bindings (across almost a dozen platforms) to POSIX functions
that have no equivalent in the JDK, ranging from process management
(waitpid, kill), to signals, to filesystem 'stat', and more.
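
For comparison, MRI exposes the POSIX-style operations listed above directly in core Ruby: Process.spawn, Process.wait2 (waitpid) and Process.kill. The sketch below assumes MRI on a POSIX system; it illustrates the behaviors in question (a child sharing the parent's stdio, reaping it, signaling it) and is not the spoon gem's API or JRuby's bundled bindings. The helper names are hypothetical.

```ruby
require 'rbconfig'

# Start a child Ruby process. With no redirection options it inherits the
# parent's stdin/stdout/stderr, so whatever it prints goes straight to the
# same terminal or pipe, with no thread in the parent pumping its output.
def run_and_wait(code)
  pid = Process.spawn(RbConfig.ruby, '-e', code)
  _, status = Process.wait2(pid)   # waitpid: reap the child, get its status
  status
end

# Start a long-running child, send it SIGTERM, then reap it.
def spawn_and_terminate
  pid = Process.spawn(RbConfig.ruby, '-e', 'sleep 30')
  Process.kill(:TERM, pid)         # kill(2)
  _, status = Process.wait2(pid)
  status
end
```

run_and_wait('exit 3').exitstatus is 3, and the status returned by spawn_and_terminate reports signaled? as true, with termsig equal to Signal.list['TERM'].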

JRuby goes above and beyond the typical JVM-based language in
supporting the POSIX features in question, and the limitations of the
JVM and JDK are often not applicable to JRuby.

If it were trivial, why aren't you shipping it (or at least pointing to a JRuby-supported gem that does)? You've espoused FFI as the C-API silver bullet time and again. I have doubts that it is that trivial, as FFI itself seems non-portable.

We don't ship anything yet because exactly one person has reported
these issues, and we provided workarounds for almost every case with
just a few lines of FFI code. I'd love to work out a complete set of
native-behaving process management APIs (for users to opt into), but
there's only so much we can do in a given cycle. Given limited
resources, we cater to the majority first. The majority of JRuby users
do not have these issues, and would prefer we work on Ruby 1.9
compatibility, user-reported bugs, and Java integration features.

Perhaps you'd like to help? I'd happily support you.

- Charlie
 
timr

Hi Michel,
I think you are completely correct in your criticism of Ruby's lack of
support for scientific computing and complex graphing. I think the
problem is that Python had a bit of a lead time (~5 yrs.) before Ruby
became popular. During that period, many of the scientists switched
from Perl to Python and began developing the tools they needed. When
Ruby was popularized with Rails, Python was already established. So
most of the talent that could solve this problem for the Ruby
community is already happily doing science with Python and has no
reason to switch to a slightly more elegantly designed language (as
some would argue). I learned some R and use Ruby to crunch the numbers
and R to plot them. I think it is a reasonable solution. If I did a
lot of data crunching/graphing, I might write a DSL that allowed R
plotting code to be generated via a nicer Ruby-like syntax, but so far I
haven't been motivated enough. Alternatively, I would switch to
Python; I think it would take a couple of weeks to get up to speed, as
the two are quite similar conceptually. I think it would be great if
we had a dedicated scientific community to build the tools in Ruby,
but I don't see it happening, because Python has already filled the
niche.
Tim
 
Ryan Davis

Don't pass the buck.
You couldn't be more wrong. It's understandable since you don't
actually know anything about JRuby's implementation.

Nice ad hominem. It's just that I don't need to know anything about
JRuby's implementation to identify someone passing the buck when they do
it.

Or... are you claiming that JRuby doesn't suck at process manipulation
and that JEG is wrong, or worse, a liar?

But JEG is right, it does suck. That's not a terrible thing. Sometimes
your dogmatic "rah rah java/jruby" thing blinds you to simple truths:
JRuby being built on the JVM gives it a lot of strengths, but as JEG
said (so succinctly), "like anything, there are tradeoffs and JRuby
sucks at other things". That's not the end of the world, but just
because you claim it isn't so doesn't make it true.
 
Martin DeMello

What I'd really like to see are FFI-based wrappers around key science
and math libraries, rather than more blasted C extensions that can't
be run concurrently and aren't easily portable across impls. FFI works
incredibly well for these isolated libraries (as opposed to FFI for
kernel-level features, which can have many platform-specific
differences).

How do FFI wrappers handle concurrent running? (As in, how do they
differ from C extensions in that respect?)

martin
 
