Lua is faster than Fortran???


sturlamolden

I was just looking at Debian's benchmarks. It seems LuaJIT is now (on
median) beating Intel Fortran!

C (gcc) is running the benchmarks faster by less than a factor of two.
Consider that Lua is a dynamically typed scripting language very
similar to Python.

LuaJIT also runs the benchmarks faster than Java 6 server, OCaml, and
SBCL.

I know it's "just a benchmark" but this has to count as insanely
impressive. Beating Intel Fortran with a dynamic scripting language,
how is that even possible? And what about all those arguments that
dynamic languages "have to be slow"?

If this keeps up we'll need a Python to Lua bytecode compiler very
soon. And LuaJIT 2 is rumoured to be much faster than the current...

Looking at median runtimes, here is what I got:

gcc 1.10

LuaJIT 1.96

Java 6 -server 2.13
Intel Fortran 2.18
OCaml 3.41
SBCL 3.66

JavaScript V8 7.57

PyPy 31.5
CPython 64.6
Perl 67.2
Ruby 1.9 71.1

The only comfort for CPython is that Ruby and Perl did even worse.
 

Steven D'Aprano

I know it's "just a benchmark" but this has to count as insanely
impressive. Beating Intel Fortran with a dynamic scripting language, how
is that even possible?

By being clever, using Just In Time compilation as much as possible, and
almost certainly using masses of memory at runtime. (The usual trade-off
between space and time.)

See the PyPy project, which aims to do the same thing for Python as
LuaJIT has done for Lua. Their ultimate aim is to beat the C compiler and
be faster than C. So far they've got a bit to go, but they're currently
about twice as fast as CPython.

And what about all those arguments that dynamic
languages "have to be slow"?

They're bullshit, of course. It depends on the nature of the dynamism.
Some things are inherently slow, but not everything.

Fast, tight, dynamic: pick any two.

If this keeps up we'll need a Python to Lua bytecode compiler very soon.

"Need" is a bit strong. There are plenty of applications where if your
code takes 0.1 millisecond to run instead of 0.001, you won't even
notice. Or applications that are limited by the speed of I/O rather than
the CPU.

But I'm nitpicking... this is a nice result, the Lua people should be
proud, and I certainly wouldn't say no to a faster Python :)

[...]
The only comfort for CPython is that Ruby and Perl did even worse.

It's not like this is a race, and speed is not the only thing which a
language is judged by. Otherwise you'd be programming in C, not Python,
right?
 

sturlamolden

"Need" is a bit strong. There are plenty of applications where if your
code takes 0.1 millisecond to run instead of 0.001, you won't even
notice. Or applications that are limited by the speed of I/O rather than
the CPU.
But I'm nitpicking... this is a nice result, the Lua people should be
proud, and I certainly wouldn't say no to a faster Python :)

Need might be too strong, sorry. I'm not a native speaker of
English :)

Don't read this as a complaint about Python being too slow. I don't
care about milliseconds either. But I do care about libraries like
Python's standard library, wxPython, NumPy, and matplotlib. And when I
need C, C++ or Fortran I know where to find it. Nobody in the
scientific community would be sad if Python was so fast that no C or
Fortran would have to be written. And I am sure Google and many other
users of Python would not mind either. And this is kind of a proof
that it can be. Considering that Lua is to Python what C is to C++
(more or less), it means that it is possible to make Python run very
fast as well.

Yes the LuaJIT team should be proud. Making a scripting language run
faster than Fortran on CPU-bound work is a superhuman result.
 

Rami Chowdhury

I was just looking at Debian's benchmarks. It seems LuaJIT is now (on
median) beating Intel Fortran!

That's amazing! Congrats to the Lua team!
If this keeps up we'll need a Python to Lua bytecode compiler very
soon. And LuaJIT 2 is rumoured to be much faster than the current...

Looking at median runtimes, here is what I got: [snip]
The only comfort for CPython is that Ruby and Perl did even worse.

Out of curiosity, does anyone know how the Unladen Swallow version of Python
does by comparison?
 

Stefan Behnel

sturlamolden, 04.07.2010 05:30:
I was just looking at Debian's benchmarks. It seems LuaJIT is now (on
median) beating Intel Fortran!

C (gcc) is running the benchmarks faster by less than a factor of two.
Consider that Lua is a dynamically typed scripting language very
similar to Python.

Sort of. One of the major differences is the "number" type, which is (by
default) a floating point type - there is no other type for numbers. The
main reason why Python is slow for arithmetic computations is its integer
type (int in Py3, int/long in Py2), which has arbitrary size and is an
immutable object. So it needs to be reallocated on each computation. If it
was easily mappable to a CPU integer, Python implementations could just do
that and be fast. But its arbitrary size makes this impossible (or requires
a noticeable overhead, at least). The floating point type is less of a
problem, e.g. Cython safely maps that to a C double already. But the
integer type is.
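
You can see the boxing cost from plain Python itself; a rough illustration
(CPython, exact sizes vary by version and platform):

import sys

# Every int is a full heap object whose size grows with its magnitude.
for n in (1, 2**30, 2**60, 2**600):
    print(n.bit_length(), sys.getsizeof(n))

# Arithmetic cannot update an int in place; each result is a freshly
# allocated object (apart from the small ints CPython caches).
x = 2**60
y = x + 0               # same value, yet a new PyLong object is allocated
print(x == y, x is y)   # True False on CPython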

So it's not actually surprising that Lua beats CPython (and the other
dynamic languages) in computational benchmarks.

It's also not surprising to me that a JIT compiler beats a static compiler.
A static compiler can only see static behaviour of the code, potentially
with an artificially constructed idea about the target data. A JIT compiler
can see the real data that flows through the code and can optimise for that.

Stefan
 

Teemu Likonen

The main reason why Python is slow for arithmetic computations is its
integer type (int in Py3, int/long in Py2), which has arbitrary size
and is an immutable object. So it needs to be reallocated on each
computation. If it was easily mappable to a CPU integer, Python
implementations could just do that and be fast. But its arbitrary size
makes this impossible (or requires a noticeable overhead, at least).
The floating point type is less of a problem, e.g. Cython safely maps
that to a C double already. But the integer type is.

You may be right. I'll just add that Common Lisp's integers are of
arbitrary size too, but the programmer can declare them as fixnums. Such
declarations are a kind of promise that the numbers really are between
most-negative-fixnum and most-positive-fixnum. The compiler can then
optimize the code to efficient machine instructions.

I guess Python might have use for some sort of

(defun foo (variable)
  (declare (type fixnum variable))
  ...)
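
Something in that spirit already exists outside the language proper: Cython
(mentioned above) accepts type declarations, even in plain Python syntax. A
hypothetical sketch (assumes Cython is installed; CPython itself ignores the
annotations, but compiling with Cython turns the loop into machine-word
arithmetic, much like a fixnum declaration):

import cython

def foo(variable: cython.long) -> cython.long:
    # When compiled with Cython, 'total' and the loop index become C longs,
    # so no Python int objects are allocated inside the loop.
    total: cython.long = 0
    for i in range(variable):
        total += i
    return total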
 

David Cournapeau

Stefan Behnel wrote:

Sort of. One of the major differences is the "number" type, which is (by
default) a floating point type - there is no other type for numbers. The
main reason why Python is slow for arithmetic computations is its integer
type (int in Py3, int/long in Py2), which has arbitrary size and is an
immutable object. So it needs to be reallocated on each computation. If it
was easily mappable to a CPU integer, Python implementations could just do
that and be fast. But its arbitrary size makes this impossible (or requires
a noticeable overhead, at least). The floating point type is less of a
problem, e.g. Cython safely maps that to a C double already. But the integer
type is.

Actually, I think the main reason why Lua is much faster than other
dynamic languages is its size. The language is small. You don't have
list, dict, tuples, etc... Making 50% of Python fast is "easy" (in the
sense that it has been done). I would not be surprised if it is
exponentially harder the closer you get to 100%. Having a small
language means that the interpreter is small - small enough to be kept
in the L1 cache, which seems to matter a lot
(http://www.reddit.com/r/programming..._2_beta_3_is_out_support_both_x32_x64/c0lrus0).

If you are interested in facts and technical details (rather than mere
speculation), this thread is interesting:
http://lambda-the-ultimate.org/node/3851. It has participation from the
LuaJIT author, the PyPy authors and Brendan Eich :)

It's also not surprising to me that a JIT compiler beats a static compiler.
A static compiler can only see static behaviour of the code, potentially
with an artificially constructed idea about the target data. A JIT compiler
can see the real data that flows through the code and can optimise for that.

Although I agree that, in theory, it is rather obvious that a JIT
compiler can do many things that static analysis cannot, this is the
first time it has happened in practice AFAIK. Even HotSpot was not
faster than Fortran and C, and it has received tons of work by people
who knew what they were doing. The only example of a dynamic language
being as fast as or faster than C that I am aware of so far is Stalin,
the aggressive compiler for Scheme (used in signal processing in
particular).

David
 

D'Arcy J.M. Cain

"Need" is a bit strong. There are plenty of applications where if your
code takes 0.1 millisecond to run instead of 0.001, you won't even
notice. Or applications that are limited by the speed of I/O rather than
the CPU.

Which is 99% of the real-world applications if you factor out the code
already written in C or other compiled languages. That's the point of
Python after all. You speed up programming rather than programs, but
allow for refactoring into C when necessary. And it's not called CPython
for nothing. Off-the-shelf benchmarks are fun but mostly useless for
choosing a language, program, OS or machine unless you know that they
check the actual things that you need, in the proportion that you need.
But I'm nitpicking... this is a nice result, the Lua people should be
proud, and I certainly wouldn't say no to a faster Python :)

Ditto, ditto, ditto and ditto.
It's not like this is a race, and speed is not the only thing which a
language is judged by. Otherwise you'd be programming in C, not Python,
right?

Or assembler.
 

David Cournapeau

Which is 99% of the real-world applications if you factor out the code
already written in C or other compiled languages.

This may be true, but there are areas where the percentage is much
lower. Not everybody uses python for web development. You can be a
python fan, be reasonably competent in the language, and have good
reasons to wish for python to be one order of magnitude faster.

I find LUA quite interesting: instead of providing a language simple
to develop in, it focuses heavily on implementation simplicity. Maybe
that's the reason why it could be done at all by a single person.

David
 

bart.c

sturlamolden said:
I was just looking at Debian's benchmarks. It seems LuaJIT is now (on
median) beating Intel Fortran!

C (gcc) is running the benchmarks faster by less than a factor of two.
Consider that Lua is a dynamically typed scripting language very
similar to Python.

LuaJIT also runs the benchmarks faster than Java 6 server, OCaml, and
SBCL.

I know it's "just a benchmark" but this has to count as insanely
impressive. Beating Intel Fortran with a dynamic scripting language,
how is that even possible? And what about all those arguments that
dynamic languages "have to be slow"?

If this keeps up we'll need a Python to Lua bytecode compiler very
soon. And LuaJIT 2 is rumoured to be much faster than the current...

Looking at median runtimes, here is what I got:

gcc 1.10

LuaJIT 1.96

Java 6 -server 2.13
Intel Fortran 2.18
OCaml 3.41
SBCL 3.66

JavaScript V8 7.57

PyPy 31.5
CPython 64.6
Perl 67.2
Ruby 1.9 71.1

The only comfort for CPython is that Ruby and Perl did even worse.

I didn't see the same figures; LuaJIT seems to be 4-5 times as slow as one of
the C's, on average. Some benchmarks were slower than that.

But I've done my own brief tests and I was quite impressed with LuaJIT which
seemed to outperform C on some tests.

I'm developing my own language and LuaJIT is a new standard to beat for this
type of language. However, Lua is quite a lightweight language with
minimalist data types, so it doesn't suit everybody.

I suspect also the Lua JIT compiler optimises some of the dynamism out of
the language (where it can see, for example, that something is always going
to be a number, and Lua only has one numeric type with a fixed range), so
that must be a big help.
 

D'Arcy J.M. Cain

This may be true, but there are areas where the percentage is much
lower. Not everybody uses python for web development. You can be a
python fan, be reasonably competent in the language, and have good
reasons to wish for python to be one order of magnitude faster.

I wish it was orders of magnitude faster for web development. I'm just
saying that in the places where we need compiled-language speed, Python
already has that in C.

But, as I said in the previous message, in the end it is up to you to
write your own benchmark based on the operations you need and the usage
patterns you predict that it will need as well. If your application
needs to calculate Pi to 100 places but only needs to do it once there
is no need to include that in your benchmark a million times. A
language that is optimized for calculating Pi shouldn't carry a lot of
weight for you.
I find LUA quite interesting: instead of providing a language simple
to develop in, it focuses heavily on implementation simplicity. Maybe
that's the reason why it could be done at all by a single person.

Is that really true about LUA? I haven't looked that closely at it but
that paragraph probably turned off most people on this list to LUA.
 

David Cournapeau

I wish it was orders of magnitude faster for web development. I'm just
saying that in the places where we need compiled-language speed, Python
already has that in C.

Well, I wish I did not have to use C, then :) For example, as a
contributor to numpy, it bothers me at a fundamental level that so
much of numpy is in C.

Also, there are some cases where using C for speed is very difficult,
because the marshalling cost almost entirely cancels out the speed
advantage - that means you have to write more in C than you
anticipated. Granted, those may be quite specific to scientific
applications, and Cython already helps quite a bit in those
cases.
But, as I said in the previous message, in the end it is up to you to
write your own benchmark based on the operations you need and the usage
patterns you predict that it will need as well.  If your application
needs to calculate Pi to 100 places but only needs to do it once there
is no need to include that in your benchmark a million times.

I would question the sanity of anyone choosing a language because it
can compute Pi to 100 places very quickly :) I am sure Google search
would beat most languages if you count implementation + running time
anyway.
Is that really true about LUA?  I haven't looked that closely at it but
that paragraph probably turned off most people on this list to LUA.

I hope I did not turn anyone off - but it is definitely not the same
set of tradeoffs as Python. The LUA runtime is way below 1 MB, for example,
which is one reason why it is so popular for video games. The
following presentation gives a good overview (by the LUA creator):

http://www.stanford.edu/class/ee380/Abstracts/100310-slides.pdf

To go back to the original topic: a good example is numeric types. In
Python, you have many different numerical types with different
semantics. In LUA, it is much simpler. This makes the implementation
simpler, and some aggressive optimizations very effective. The fact
that a LUA interpreter can fit in the L1 cache is quite impressive.
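
For illustration, here is (roughly) the numeric zoo a Python implementation
has to handle, versus Lua's single "number" type:

from fractions import Fraction
from decimal import Decimal

# Each of these has its own semantics, coercion rules and memory layout.
for value in (True, 1, 10**100, 1.5, 1 + 2j, Fraction(1, 3), Decimal("1.5")):
    print(type(value).__name__, value)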

David
 

sturlamolden

I suspect also the Lua JIT compiler optimises some of the dynamism out of
the language (where it can see, for example, that something is always going
to be a number, and Lua only has one numeric type with a fixed range), so
that must be a big help.

Python could do the same: replace int and float with a "long double".
It is 80 bits wide and has a 64-bit mantissa, so in theory it can do the
job of all the floating point types and of integers up to 64 bits
(signed and unsigned). A long double can 'duck type' all the floating
point and integer types we use.

There is really no need for more than one number type. For an
interpreted language, it's just a speed killer. Other number types
belong in e.g. the ctypes, array, struct and NumPy modules. Speed-wise,
a long double (80 bit) is the native floating point type on x86 FPUs.
There is no penalty memory-wise either; wrapping an int as a PyObject
takes more space.

For a dynamic language it can be quite clever to have just one 'native'
number type, observing that the mantissa of a floating point number is
an unsigned integer. That takes a lot of the dynamism out of the
equation. Maybe you like to have integer and floating point types in
the 'language'. But that does not mean there have to be two types in
the 'implementation' (i.e. internally in the VM). The implementation
could duck type both with a sufficiently long floating point type, and
the user would not notice it in the syntax.
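
A rough illustration of the idea, using Python's ordinary 64-bit double
(53-bit mantissa) instead of the 80-bit long double suggested above:

a = 1.0
b = 3.0
print(a + b == 4)                  # True - whole-number arithmetic on floats is exact
print(1e15 + 1.0 - 1e15)           # 1.0  - still well inside the 53-bit mantissa
x = 123456789.0 * 12345.0
print(x == 123456789 * 12345)      # True - the float holds the exact integer product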

MATLAB does the same as Lua. Native number types are always double,
you have to explicitly create the others. Previously they did not even
exist. Scientists have been doing numerical maths with MATLAB for
decades. MATLAB never prevented me from working with integers
mathematically, even if I only worked with double. If I did not know,
I would not have noticed.

a = 1; % a is a double
a = 1 + 1; % a is a double and exactly 2
a = int32(1);

Sturla
 

sturlamolden

Sort of. One of the major differences is the "number" type, which is (by
default) a floating point type - there is no other type for numbers. The
main reason why Python is slow for arithmetic computations is its integer
type (int in Py3, int/long in Py2), which has arbitrary size and is an
immutable object. So it needs to be reallocated on each computation.

That is why Lua got it right. A floating point type has a mantissa and
can duck type an integer. MATLAB does the same.

Sturla
 

sturlamolden

Actually, I think the main reason why Lua is much faster than other
dynamic languages is its size. The language is small. You don't have
list, dict, tuples, etc...

They have managed to combine list and dict into one type (table) that
does the job of both. And yes there are tuples.

There are no classes, but there are closures and other building blocks
that can be used to create any object-oriented type system (just like
CLOS is defined by Lisp, not a part of the basic Lisp syntax). So I
imagine it would be possible to define an equivalent to the Python
type system in Lua, and compile Python to Lua. Lua can be compiled to
Lua byte code. Factoring Lua out, that means we should be able to
compile Python to Lua byte code.
 

David Cournapeau

That is why Lua got it right. A floating point type has a mantissa and
can duck type an integer. MATLAB does the same.

I sincerely doubt it - where did you get the information that MATLAB uses
float to represent int? It would not be able to represent the full
range of 64-bit integers, for example.

David
 

sturlamolden

Out of curiosity, does anyone know how the Unladen Swallow version of Python
does by comparison?

Judging from their PyCon slides, it's roughly 1.5 times faster than
CPython.

That might be important to Google, but not to me.
 

sturlamolden

I sincerely doubt it - where did you get the information that MATLAB uses
float to represent int?

I've used Matlab since 1994, so I know it rather well...

Only the recent versions can do arithmetic with number types
different from double (or complex double).
It would not be able to represent the full
range of 64-bit integers, for example.

There is a 53-bit mantissa plus a sign bit. Nobody complained on 32-bit
systems. That is, when the signed 54-bit integer contained in a double
was overflowed, there was a loss of precision, but the numerical range
would still be that of a double.

You get an unsigned integer in MATLAB like this

x = uint64(0)

but until recently, MATLAB could not do any arithmetic with it. It
was there for interaction with Java and C MEX files.

A long double, however, has a 64-bit mantissa, so it can represent
signed 65-bit integers without loss of precision.
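
The corresponding limits are easy to check from Python, which uses a 64-bit
C double (53-bit mantissa) rather than a long double:

print(float(2**53) == 2**53)              # True  - exactly representable
print(float(2**53 + 1) == 2**53 + 1)      # False - first integer a double cannot hold
print(float(2**53 + 1) == float(2**53))   # True  - it collapses onto its neighbour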
 

Stefan Behnel

sturlamolden, 04.07.2010 18:37:
Judging from their PyCon slides, it's roughly 1.5 times faster than
CPython.

A number like "1.5 times faster" is meaningless without a specific
application and/or code section in mind. I'm pretty sure there are cases
where they are much faster than that, and there are cases where the net
gain is zero (or -0.x or whatever).

Stefan
 
