For performance, write it in C


Chad Perrin

I have yet to write anything in Ruby that was less than twice as fast to
code as it would have been in Bourne sh/Java/whatever, never mind
twice as fun or maintainable. I recently rewrote an 830 line Java/
Hibernate web service client as 67 lines of Ruby, in about an hour.
With that kind of productivity, performance can go to hell!

With a 92% cut in code weight, I can certainly sympathize with that
sentiment. Wow.
 

Chad Perrin

Half true. The Java VM could be called "half-compiled and half-interpreted"
at runtime for only a short time, and only if you do not consider VM
bytecodes to be a valid "compiled" state. However most bytecode is very
quickly compiled into processor-native code, making those bits fully
compiled. After a long enough runtime (not very long in actuality), all Java
code is running as native code for the target processor (with various
degrees of optimization and overhead).

True . . . but this results in fairly abysmal performance, all things
considered, for short runs. Also, see below regarding dynamic
programming.

The difference from AOT compilation with GCC or .NET is that Java's
compiler can make determinations based on runtime profiling about *how* to
compile that "last mile" in the most optimal way possible. The bytecode
compilation does, as you say, primarily speed up the interpretation process.
However it's far from the whole story, and the runtime JITing of bytecode
into native code is where the magic lives. To miss that is to miss the
greatest single feature of the JVM.

This also is true, but that benefit is entirely unusable for highly
dynamic code, unfortunately -- and, in fact, even bytecode compilation
might be a bit too much to ask for too-dynamic code. I suppose it's
something for pointier heads than mine, since I'm not actually a
compiler-writer or language-designer (yet). It's also worth noting that
this isn't accomplishing anything that isn't also accomplished by the
Perl JIT compiler.
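
(Editorial aside, not part of the original post: as a rough illustration of
the kind of dynamism at issue here -- the names below are made up -- consider
Ruby code where the method being called does not even exist until the first
call, so there is no static call target for an ahead-of-time compiler to
resolve.)

class Greeter
  def method_missing(name, *args)
    # Define the missing method on first use, then dispatch to it.
    self.class.send(:define_method, name) { "hello from #{name}" }
    send(name, *args)
  end
end

g = Greeter.new
puts g.greet   # method is created here, at runtime
puts g.greet   # now an ordinary, but runtime-defined, method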

When execution happens in Perl 5.x, on the other hand, a compiler runs

I am not familiar with Perl's compiler. Does it compile to processor-native
code or to an intermediate bytecode of some kind?

There is no intermediate bytecode step for Perl, as far as I'm aware.
It's not a question I've directly asked one of the Perl internals
maintainers, but everything I know about the Perl compiler confirms my
belief that it simply does compilation to machine code.

We're also juggling terms pretty loosely here. A compiler converts
human-readable code into machine-readable code. If the "machine" is a VM,
then you're fully compiling. If the VM code later gets compiled into "real
machine" code, that's another compile cycle. Compilation isn't as cut and
dried as you make it out to be, and claiming that, for example, Java is
"half compiled" is just plain wrong.

Let's call it "virtually compiled", then, since it's being compiled to
code that is readable by a "virtual machine" -- or, better yet, we can
call it bytecode and say that it's not fully compiled to physical
machine-readable code, which is what I was trying to explain in the
first place.

Something akin to bytecode compilation could be used to improve upon the

There are a lot of clever programmers out there.

True, of course. The problem is getting them to work on a given
problem.

Having worked heavily on a Ruby implementation, I can say for certain that
99% of Ruby code is static. There are some dynamic bits, especially within
Rails where methods are juggled about like flaming swords, but even these
dynamic bits eventually settle into mostly-static sections of code.

I love that imagery, with the flaming sword juggling. Thanks.

Compilation of Ruby code into either bytecode for a fast interpreter engine
like YARV or into bytecode for a VM like Java is therefore perfectly valid
and very effective. Preliminary compiler results for JRuby show a boost of
50% performance over previous versions, and that's without optimizing many
of the more expensive Ruby operations (call logic, block management).
Whether a VM is present (as in JRuby) or not (as may be the case with YARV),
eliminating the overhead of per-node interpretation is a big positive. JRuby
will also feature a JIT compiler to allow running arbitrary .rb files
directly, optimizing them as necessary and as seems valid based on runtime
characteristics. I don't know if YARV will do the same, but it's a good
idea.

I'm sure a VM or similar approach (and, frankly, I do prefer the
fast-interpreter approach over the VM approach) would provide ample
opportunity to improve upon Ruby's current performance, but that doesn't
necessarily mean it's better than other approaches to improving
performance. That's where I was aiming.
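
(Editorial aside, not part of the original exchange: the bytecode compilation
discussed above became visible to ordinary Ruby programmers once YARV shipped
with Ruby 1.9. A minimal sketch, runnable on Ruby 1.9 or later, of inspecting
and executing the compiled form:)

code = "sum = 0; (1..10).each { |i| sum += i }; sum"
iseq = RubyVM::InstructionSequence.compile(code)

puts iseq.disasm   # the YARV instruction listing for the snippet
puts iseq.eval     # => 55, executed by the VM rather than an AST walker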

The whole VM thing is such a small issue. Ruby itself is really just a VM,
where its instructions are the elements in its AST. The definition of a VM
is sufficiently vague to include most other interpreters in the same
family. Perhaps you are specifically referring to VMs that provide a set of
"processor-like" fine-grained operations, attempting to simulate some sort
of magical imaginary hardware? That would describe the Java VM pretty well,
though in actuality there are real processors that run Java bytecodes
natively as well. Whether or not a language runs on top of a VM is
irrelevant, especially considering JRuby is a mostly-compatible version of
Ruby running on top of a VM. It matters much more that translation to
whatever underlying machine -- virtual or otherwise -- is as optimal and
clean as possible.

A dividing line between "interpreter" and "VM" has always seemed rather
more clear to me than you make it sound. Yes, I do refer to a
simulation of an "imaginary" (or, more to the point, "virtual") machine,
as opposed to a process that interprets code. Oh, wait, there's that
really, really obvious dividing line I keep seeing.

The use (or lack) of a VM does indeed matter: it's an implementation
detail, and implementation details make a rather significant difference
in performance. The ability of the parser to quickly execute what's fed
to it is important, as you indicate, but so too is the ability of the
parser to run quickly itself -- unless your program is actually compiled
to machine-native code for the hardware, in which case the lack of need
for the parser to execute at all at runtime is significant.
 

Chad Perrin

Just out of curiosity (since I don't know much about this subject),
what do you think of the approach Microsoft took with the CLR? From
what I read it's very similar to the JVM except it compiles directly
to native code, and makes linking to native libraries easier. I
assume this is closer to JVM behaviour than Perl 5 behaviour. Is
there anything to be learnt from it for Ruby?

I'm not as familiar with what's going on under the hood of the CLR as
the JVM, but from what I do know it exhibits both advantages and
disadvantages in comparison with the Java VM. Thus far, the evidence
seems to be leaning in the direction of the CLR's advantages over the
JVM coming into play more often than the disadvantages, however, which
seems to indicate that the compromises that were made may have been the
"right" compromises, as far as this comparison goes.

In fact, the CLR seems in some ways to be a compromise between
Perl-style JIT compilation and Java-style bytecode compilation with
runtime VM-interpretation (there really needs to be a term for what a VM
does separate from either compilation or interpretation, since what it
does generally isn't strictly either of them). There may well be
something to learn from that for future Ruby implementations, though I'd
warn away from trying to take the "all languages compile to the same
intermediate bytecode" approach that the CLR takes -- it tries to be too
many things at once, basically, and ends up introducing some
inefficiencies in that sense. If you want to do everything CLR does,
with Ruby, then port Ruby to the CLR, but if you want to simply gain
performance benefits from studying up on the CLR, make sure you
cherry-pick the bits that are relevant to the task at hand.

I think Ruby would probably best benefit from something somewhere
between the Perl compiler's behavior and the CLR compiler.
Specifically, compile all the static algorithm behavior in your code to
something persistent, link in all the rest as uncompiled (though perhaps
parse-tree compiled, which is almost but not quite the same as bytecode
compiled) code, and let that be machine-code compiled at runtime. This
might even conceivably be broken into two separate compilers to minimize
the last-stage compiler size needed on client systems and to optimize
each part to operate as quickly as possible.

Run all this stuff past a true expert before charging off to implement
it, of course. I'm an armchair theorist.
 

Csaba Henk

implemented. For example, Ruby has real closures, Python doesn't. I

Even if OT, just for the sake of correctness: let me remark that Python
does have closures. Local functions (ones defined within another
function's body) are scoped lexically.

It's just sort of an anti-POLA (and inconvenient, as-is) piece of
semantics that variables get reinitialized upon assignment.

Hence:

def foo():
    x = 5
    def bar():
        x = 6
        return x
    bar()
    return x, bar

x, bar = foo()
print x, bar()   # ==> 5 6

def foo():
    _x = [5]
    def bar():
        _x[0] = 6
        return _x[0]
    bar()
    return _x[0], bar

x, bar = foo()
print x, bar()   # ==> 6 6


Regards,
Csaba
 

Chad Perrin

implemented. For example, Ruby has real closures, Python doesn't. I

Even if OT, just for the sake of correctness: let me remark that Python
does have closures. Local functions (ones defined within another
function's body) are scoped lexically.

It's just sort of an anti-POLA (and inconvenient, as-is) piece of
semantics that variables get reinitialized upon assignment.

Hence:

def foo():
    x = 5
    def bar():
        x = 6
        return x
    bar()
    return x, bar

x, bar = foo()
print x, bar()   # ==> 5 6

def foo():
    _x = [5]
    def bar():
        _x[0] = 6
        return _x[0]
    bar()
    return _x[0], bar

x, bar = foo()
print x, bar()   # ==> 6 6

Is it just me, or are there no proper closures in that example code?
 

Sean O'Halpin

You're mixing language semantics and implementation details here.
[snip]
In some ways, you're right: implementation details are being mixed up
with language definition in the preceding list of features.
[snip]

Nope - there's no mix-up. My point is that any feature of a language
that requires extra work, whether at compile time or run time, incurs a
cost. Those
features are generally there to make life easier for us programmers,
not the machine. The only way to make sure you're not paying that
price is to hand-code optimised machine code for a specific processor
and hardware context. No language translator can guarantee that it
will produce better code (regardless of the nonsense of the 'perfect
compiler').

Let me take two examples from my list. First, method lookup in OO
languages. There is no
way you can optimise this across the board to static jumps in a
language like Ruby or Smalltalk. There will always be the requirement
(imposed by the ~semantics~ of the language) to be able to find the
right method at runtime. This is part of the language design, and it
imposes costs on the implementation that assembly languages (for
example) do not have to pay. There is a cost for abstraction (a cost
which I am willing to pay by the way). Of course, you can implement
virtual methods in assembly, but you don't ~have~ to. In Ruby there is
no choice. Everything is an object. (You can optimise most of it away,
but not all).
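
(Editorial aside: a small sketch of Sean's point about method lookup -- my
example, not his. The method a call site resolves to can change while the
program runs, so the lookup cannot be lowered to a fixed jump.)

class Price
  def total; 100; end
end

p Price.new.total   # => 100

# Later, some other code (a plugin, a monkey-patch) redefines the method.
class Price
  def total; 100 * 1.2; end
end

p Price.new.total   # => 120.0 -- same call site, different method body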

Second, closures:

Chad Perrin said:
A little extra memory usage does not translate directly to performance loss.

Coming from a background where I had to make everything happen in 48K,
I have to disagree. And it's not always just 'a little extra memory
usage'. Careless use of closures can cripple an application. See the
problems the Rails team encountered.
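
(Editorial aside: a sketch of the memory point -- mine, not an example from
the Rails issue Sean refers to. A Ruby block captures its entire enclosing
scope, so a small, long-lived closure can pin a large object in memory.)

CALLBACKS = []

def register_handler
  big_buffer = "x" * 80_000_000     # large local, only needed briefly
  CALLBACKS << lambda { Time.now }  # captures the whole scope, big_buffer included
end

register_handler
# big_buffer cannot be garbage-collected while CALLBACKS holds the lambda,
# because the lambda's binding still references it.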

Charles - you say that closures are explicit - I beg to differ. By
definition, they are surely implicit. Doesn't your argument that they
can be simulated by other means contradict your statement?

As for the notion that a hardware YARV processor would make a
difference - how would that ameliorate the issues Ruby has with memory
usage? Performance isn't just about time - space also matters.

I am surprised that you think I am confusing language features with
implementation details. From my point of view, it is you who are
ignoring the fact that abstractions incur a cost.

Best regards (I'm enjoying this discussion BTW :)
Sean
 

Sean O'Halpin

implemented. For example, Ruby has real closures, Python doesn't. I

Even if OT, just for the sake of correctness: let me remark that Python
does have closures. Local functions (ones defined within another
function's body) are scoped lexically.

It's just sort of an anti-POLA (and inconvenient, as-is) piece of
semantics that variables get reinitialized upon assignment.

Hence:

def foo():
    x = 5
    def bar():
        x = 6
        return x
    bar()
    return x, bar

x, bar = foo()
print x, bar()   # ==> 5 6

def foo():
    _x = [5]
    def bar():
        _x[0] = 6
        return _x[0]
    bar()
    return _x[0], bar

x, bar = foo()
print x, bar()   # ==> 6 6

Is it just me, or are there no proper closures in that example code?

I've crossed my eyes twice and still can't see it ;)
 

Chad Perrin

Chad Perrin said:
A little extra memory usage does not translate directly to performance loss.

Coming from a background where I had to make everything happen in 48K,
I have to disagree. And it's not always just 'a little extra memory
usage'. Careless use of closures can cripple an application. See the
problems the Rails team encountered.

Careless use of ANYTHING can cripple an application. Using an extra
kilobyte of memory on a 1GB system for a closure instance or two is not
indicative of an inescapable performance cost for the mere fact of the
existence of closures. While your point about 48K of RAM is well-taken,
it's pretty much inapplicable here: I wouldn't be writing Ruby programs
to run in 48K. Hell, my operating system won't run in 48K, nor even a
hundred times that (I'm using Debian GNU/Linux Etch/Testing, if that
matters). I'm sure as heck not going to expect all my applications to
run in that environment.

Careless use of pointers can cripple not only the application, but the
whole system. Careless use of loops can crash it. Careless use of the
power cord can destroy the hardware. In the grand scheme of things,
closures are not a good example of a language feature that hinders
performance when we're talking about high-level languages such as Ruby.
 

Keith Gaughan

implemented. For example, Ruby has real closures, Python doesn't. I

Even if OT, just for the sake of correctness: let me remark that Python
does have closures. Local functions (ones defined within another
function's body) are scoped lexically.

It's just sort of an anti-POLA (and inconvenient, as-is) piece of
semantics that variables get reinitialized upon assignment.

Hence:

def foo():
    x = 5
    def bar():
        x = 6
        return x
    bar()
    return x, bar

x, bar = foo()
print x, bar()   # ==> 5 6

def foo():
    _x = [5]
    def bar():
        _x[0] = 6
        return _x[0]
    bar()
    return _x[0], bar

x, bar = foo()
print x, bar()   # ==> 6 6

Is it just me, or are there no proper closures in that example code?

No, Chad. There's closures in there. What you're not seeing is
anonymous functions, but closures are not the same as anonymous
functions.
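
(Editorial aside: the distinction Keith is drawing, sketched in Ruby. What
makes something a closure is that it captures variables from its defining
scope; whether it has a name is a separate question.)

double = lambda { |n| n * 2 }    # anonymous, but captures nothing

def make_adder(step)
  lambda { |n| n + step }        # closes over `step` from the defining scope
end

add_three = make_adder(3)
p double.call(10)      # => 20
p add_three.call(10)   # => 13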
 

Chad Perrin

No, Chad. There's closures in there. What you're not seeing is
anonymous functions, but closures are not the same as anonymous
functions.

Maybe I'm missing something critical about Python, but I don't see the
persistent code construct being passed from the function when its
lexical scope (assuming it's truly lexical, which it might not be in
this case) closes. It's only a closure if there's something persistent
that was "closed" by the scope closing.
 

Keith Gaughan

No, Chad. There's closures in there. What you're not seeing is
anonymous functions, but closures are not the same as anonymous
functions.

Clarification: like Java, Python won't let you assign to variables in
the outer scope, so you have to use an array or some other object to
hack around that if you need that functionality. I know, it sucks, but
the fact it doesn't allow you to assign to an outer scope doesn't stop
Python from having closures, just that it doesn't trust the developer
not to screw things up.

Here's a better example:

def foo():
    x = [0]
    def bar():
        x[0] += 1
        print x[0]
    return bar

baz = foo()
baz()   # -> 1
baz()   # -> 2
baz()   # -> 3

Of course, this is better implemented as a generator.
 

Thomas E Enebo

There is no intermediate bytecode step for Perl, as far as I'm aware.
It's not a question I've directly asked one of the Perl internals
maintainers, but everything I know about the Perl compiler confirms my
belief that it simply does compilation to machine code.

I have not been on the perl train for years, but I believe for Perl5
at least this is not true. I remember Malcolm Beattie's B module, which
basically exposed the intermediate bytecodes that perl normally interprets.
That was some time ago and things may have changed?

Here is some documentation on this (it could be old but it seems to
match my memory):

http://www.faqs.org/docs/perl5int/c163.html

So it looks like Perl is somewhat similar to Java (perhaps the other
way around since Perl's interpreter pre-dates Java). An analogy of the
difference would be that Perl is CISC and Java is RISC since Perl bytecode
is higher level. Maybe they JIT pieces?

-Tom
 

David Pollak

Is that heavily optimized Java vs. "normal" (untweaked) C?

No. That's heavily optimized Java vs. heavily optimized C. I spent a
fair amount of time chatting with the Excel team a while back. They
cared as much about performance as I did. They spent a lot more time
and money optimizing Excel than I did with Integer. They had far more
in terms of tools and access to compiler tools than I did (although
Sun was very helpful to me.)

What was at stake was not someone's desktop spreadsheet, but was the
financial trader's desk. Financial traders move millions (and
sometimes billions) of Dollars, Euros, etc. through their spreadsheets
every day. A 5 or 10 second advantage in calculating a spreadsheet
could mean a significant profit for a trading firm.

So, I am comparing apples to apples. A Java program can be optimized
to perform as well as a C program for *certain* tasks.
 

Chad Perrin

No. That's heavily optimized Java vs. heavily optimized C. I spent a
fair amount of time chatting with the Excel team a while back. They
cared as much about performance as I did. They spent a lot more time
and money optimizing Excel than I did with Integer. They had far more
in terms of tools and access to compiler tools than I did (although
Sun was very helpful to me.)

Excel isn't a very good point of comparison for C. For one thing, it's
not C -- it's C++. For another, it has a pretty bad case of featuritis.
 

Chad Perrin

I have not been on the perl train for years, but I believe for Perl5
at least this is not true. I remember Malcolm Beattie's B module, which
basically exposed the intermediate bytecodes that perl normally interprets.
That was some time ago and things may have changed?

Here is some documentation on this (it could be old but it seems to
match my memory):

http://www.faqs.org/docs/perl5int/c163.html

So it looks like Perl is somewhat similar to Java (perhaps the other
way around since Perl's interpreter pre-dates Java). An analogy of the
difference would be that Perl is CISC and Java is RISC since Perl bytecode
is higher level. Maybe they JIT pieces?

I believe you are correct, with regard to an intermediate code step,
after all. I've done some research on it to refresh my memory. Whether
it continues to compile to a machine-readable executable or interprets
the intermediate code form is something I haven't been able to nail down
yet. I'll keep looking. Apparently, I was wrong somewhere along the
way.
 

Keith Gaughan

Excel isn't a very good point of comparison for C. For one thing, it's
not C -- it's C++. For another, it has a pretty bad case of featuritis.

Actually, the part that counts, the calculation engine, comes in two
varieties: a slow but provably correct version, and a fast, highly
optimised version, a significant portion of which is written in _assembly
language_. MS use a battery of regression tests to ensure that the fast
one always gives the same results as the slow one.

Just because the bits that aren't performance sensitive are written in
C++ doesn't mean that the rest of it is slow and bloated.

K.
 

Chad Perrin

Actually, the part that counts, the calculation engine, comes in two
varieties: a slow but provably correct version, and a fast, highly
optimised version, a significant portion of which is written in _assembly
language_. MS use a battery of regression tests to ensure that the fast
one always gives the same results as the slow one.

That might be the "part that counts" (nice pun) for calculation, but
it's not the only part that counts. Interface rendering, interactive
operations, and so on are also fairly important performance-wise -- at
least to the user. In fact, calculation waits can be easier to overlook
as a user than waits for the application to catch up when you click on a
button.

On the other hand, if we were specifically referring to things like
column calculation speed (of which I wasn't strictly aware), then your
point is made.
 

Keith Gaughan

That might be the "part that counts" (nice pun) for calculation, but
it's not the only part that counts.

As far as Excel goes, it is. It's the single biggest time sink in the
application.

Interface rendering, interactive
operations, and so on are also fairly important performance-wise

...most of which is down to Windows itself, not Excel. Excel's
contribution to that lag isn't, I believe, all that great. So in this
regard, your complaint is more to do with GDI and so on than with Excel
itself.

at least to the user. In fact, calculation waits can be easier to overlook
as a user than waits for the application to catch up when you click on a
button.

Two points:

1. As far as I know, Excel runs its interface on one thread and the
calculation engine on another. This helps give the apprearance of
Excel being snappier than it actually is: you're able to work on the
spreadsheet while it's recalculating cells.

2. On simple spreadsheets, the lag isn't noticeable. But Excel is
designed to be able to handle big spreadsheets well. That's why so
much work is put into the calculation engine rather than having an
interface which is completely fat free: in time critical applications,
it's the calculation engine that really matters.

I use Excel a lot, and have for a few years now. Grudgingly, mind you,
because I dislike having to deal with spreadsheets. But as far as MS
applications go, I think your accusations of slowness and bloat are a
little off the mark and better targeted towards its fellow MS Office
software.

Where Excel *does* fall down in terms of speed is disc I/O. There it can
be atrociously slow.

On the other hand, if we were specifically referring to things like
column calculation speed (of which I wasn't strictly aware), then your
point is made.

Recalculating a spreadsheet is something more than just calculating
columns. Excel itself is a Turing-complete dataflow machine. Getting
something like that which is both correct *and* fast is hard.
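
(Editorial aside: a toy sketch -- the names and structure here are invented
for illustration -- of what "a Turing-complete dataflow machine" means in
practice: cells are re-evaluated in dependency order, not row by row or
column by column.)

cells = {
  :a1 => lambda { |v| 2 },
  :b1 => lambda { |v| 3 },
  :c1 => lambda { |v| v[:a1] + v[:b1] },   # depends on a1 and b1
  :d1 => lambda { |v| v[:c1] * 10 }        # depends on c1
}
deps = { :a1 => [], :b1 => [], :c1 => [:a1, :b1], :d1 => [:c1] }

values = {}
evaluate = lambda do |name|
  next values[name] if values.key?(name)
  deps[name].each { |d| evaluate.call(d) }   # resolve dependencies first
  values[name] = cells[name].call(values)
end

[:a1, :b1, :c1, :d1].each { |k| evaluate.call(k) }
p values   # => {:a1=>2, :b1=>3, :c1=>5, :d1=>50}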

K.
 

Chad Perrin

No, Chad. There's closures in there. What you're not seeing is
anonymous functions, but closures are not the same as anonymous
functions.

Clarification: like Java, Python won't let you assign to variables in
the outer scope, so you have to use an array or some other object to
hack around that if you need that functionality. I know, it sucks, but
the fact it doesn't allow you to assign to an outer scope doesn't stop
Python from having closures, just that it doesn't trust the developer
not to screw things up.

Here's a better example:

def foo():
    x = [0]
    def bar():
        x[0] += 1
        print x[0]
    return bar

baz = foo()
baz()   # -> 1
baz()   # -> 2
baz()   # -> 3

That's a bit clearer -- and it does look like a proper closure. It also
looks unnecessarily complex to implement. Thanks for the example.
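
(Editorial aside: for contrast, the same counter in Ruby, where the block can
assign directly to the variable it closes over, so no array wrapper is
needed.)

def foo
  x = 0
  lambda do
    x += 1
    x
  end
end

baz = foo
puts baz.call   # -> 1
puts baz.call   # -> 2
puts baz.call   # -> 3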
 
