For performance, write it in C

Chad Perrin

As far as Excel goes, it is. It's the single biggest time sink in the
application.

. .
I'll put it this way: it's not the only part that counts for me, and for
other spreadsheet users with whom I've discussed Excel in the past.

...most of which is down to Windows itself, not Excel. Excel's
contribution to that lag isn't, I believe, all that great. So in this
regard, your complaint is more to do with GDI and so on than with Excel
itself.

Excel doesn't run so well on Linux, so I'll just leave that lying where
it is.

Two points:

1. As far as I know, Excel runs its interface on one thread and the
calculation engine on another. This helps give the appearance of
Excel being snappier than it actually is: you're able to work on the
spreadsheet while it's recalculating cells.

2. On simple spreadsheets, the lag isn't noticeable. But Excel is
designed to be able to handle big spreadsheets well. That's why so
much work is put into the calculation engine rather than having an
interface which is completely fat free: in time critical applications,
it's the calculation engine that really matters.

. . and yet, the interface being fat-free would be awfully nice.
Instead, it gets fatter with every new version.

I use Excel a lot, and have for a few years now. Grudgingly, mind you,
because I dislike having to deal with spreadsheets. But as far as MS
applications go, I think your accusations of slowness and bloat are a
little off the mark and better targeted towards its fellow MS Office
software.

It's true that other MS Office applications are worse. That doesn't
make Excel perfect.

Where Excel *does* fall down in terms of speed is disc I/O. There it can
be atrociously slow.

I certainly won't disagree with that. Disk access seems to be another
problem -- though it's easier to overlook than some other issues, once a
spreadsheet is loaded and before it needs to be saved again.

Recalculating a spreadsheet is something more than just calculating
columns. Excel itself is a Turing-complete dataflow machine. Getting
something like that which is both correct *and* fast is hard.

I don't particularly see how that contradicts what I said. I may have
been more flippant in my reference to calculations than you'd like, but
I didn't say anything inaccurate.
 
Keith Gaughan

Excel doesn't run so well on Linux, so I'll just leave that lying where
it is.

In fairness, if you're judging its performance based on running it in
Wine, that's not really a fair comparison. :)

K.
 
Keith Gaughan

Here's a better example:

def foo():
    x = [0]
    def bar():
        x[0] += 1
        print x[0]
    return bar

baz = foo()
baz() -> 1
baz() -> 2
baz() -> 3

That's a bit clearer -- and it does look like a proper closure. It also
looks unnecessarily complex to implement. Thanks for the example.

In practice, it's not really all that bad. Most of the places where I'd
end up using closures in, say, Ruby and JavaScript, I'd end up using
generators, list comprehensions, &c. in Python. Having to name the inner
function's a bit of a pain, but generally I don't end up assigning to
the variables in the outer scope anyway, so that's not such a big deal
either.

Different strokes, and all that.

K.
 
Francis Cianfrocca

Ashley said:
I think the total data size is about 1.5GB, but the individual files
are smaller, the largest being a few hundred MB. The most rows in a
file is ~15,000,000 I think. The server I run it on has 2GB RAM (an
Athlon 3500+ running FreeBSD/amd64, so the hardware is not really an
issue)... it can get all the way through without swapping (just!)

The problem isn't the raw size of the dataset. It's the number of
objects you create and the amount of garbage that has to be cleaned up.
If you're clever about how you write, you can help Ruby by not creating
so much garbage. That can give a huge benefit.
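As a purely illustrative sketch (not code from this thread), the difference
can be as small as reusing one output buffer rather than building a
brand-new string for every row:

# Hypothetical: prepend a text column to each row while reusing a single
# String buffer, so millions of rows don't each leave a fresh string
# behind for the garbage collector.
prefix = "dataset_42,"
buf = ""
ARGF.each_line do |line|
  buf.replace(prefix)   # reuse the same String object every iteration
  buf << line           # append the row in place
  $stdout.write(buf)
end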

The processing is pretty trivial, and mainly involves incrementing
some ID columns so we can merge datasets together, adding a text
column to the start of every row, and eliminating a few duplicates.

Eliminating the dupes is the only scary thing I've seen here. What's the
absolute smallest piece of data that you need to look at in order to
distinguish a dupe? (If it's the whole line, then the answer is 16
bytes- the length of the MD5 hash ;-)) That's the critical working set.
If you can't get the Ruby version fast enough, it's cheap and easy to
sort through 15,000,000 of them in C. Then one pass through the sorted
set finds your dupes. I've never found a consistently-fastest performer
among Ruby's several different ways of storing sorted sets.
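For what it's worth, a rough Ruby sketch of the digest idea (entirely
hypothetical, and it assumes a whole line defines a dupe):

require 'digest/md5'
require 'set'

# Keep only a 16-byte MD5 digest per distinct line in memory, and skip a
# line when its digest has already been seen.
seen = Set.new
ARGF.each_line do |line|
  digest = Digest::MD5.digest(line)   # 16 raw bytes: the critical working set
  next if seen.include?(digest)
  seen.add(digest)
  print line
end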

Make sure that your inner loop doesn't allocate any new variables,
especially arrays- declare them outside your inner loop and re-use them
with Array#clear.
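In concrete terms, something like this (a hypothetical field-splitting
loop, not the OP's code):

# The Array is allocated once, outside the loop, and cleared each time
# through, rather than allocating a fresh Array per row.
fields = []
ARGF.each_line do |line|
  fields.clear                                  # re-use the same Array object
  line.chomp.scan(/[^,]+/) { |f| fields << f }  # scan with a block avoids a temporary Array
  fields[0] = (fields[0].to_i + 1).to_s         # e.g. bump an ID column
  puts fields.join(",")
end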

If that doesn't improve things, there's always the option of going
dual-core and forking to do independent files.

Obviously I haven't seen your code or your data, but if the Ruby app is
memory-bus-bound, then this approach may make your problem worse, not
better.

Good luck. I recently got a Ruby program that aggregates several LDAP
directory-pulls with about a million entries down from a few hours to a
few seconds, without having to drop into C. It can be done, and it's
kind of fun too.
 
Chad Perrin

In fairness, if you're judging its performance based on running it in
Wine, that's not really a fair comparison. :)

I'm judging it based on running it on Windows. My point is that
divorcing it from the only environment in which it runs (natively) is
less than strictly sporting of you, when trying to discuss its
performance characteristics (or lack thereof).
 
William James

Kroeger said:
Hi Peter!
Whenever the question of performance comes up with scripting languages
such as Ruby, Perl or Python there will be people whose response can be
summarised as "Write it in C". I am one such person. Some people take
offence at this and label us trolls or heretics of the true programming
language (take your pick).

The last (and only) time I called someone a troll for saying
'Write it in C' it was in response to a Rails-related question.
Further, the OP asked for configuration items and such, but maybe
that's a whole other story. (And of course you can write
C extensions for Rails... yeah, yadda, yadda :) )

..snip 52 lines Perl, some hundred lines C ...
[Latin]$ time ./Latin1.pl 5 > x5

real 473m45.370s
user 248m59.752s
sys 2m54.598s

[Latin]$ time ./Latin4.pl 5 > x5

real 12m51.178s
user 12m14.066s
sys 0m7.101s

[Latin]$ time ./c_version.sh 5

real 0m5.519s
user 0m4.585s
sys 0m0.691s

Just to show the beauty of ruby:
-----------------------------------------------------------
require 'rubygems'
require 'permutation'
require 'set'

$size = (ARGV.shift || 5).to_i

$perms = Permutation.new($size).map{|p| p.value}
$out = $perms.map{|p| p.map{|v| v+1}.join}
$filter = $perms.map do |p|
  s = SortedSet.new
  $perms.each_with_index do |o, i|
    o.each_with_index {|v, j| s.add(i) if p[j] == v}
  end && s.to_a
end

$latins = []
def search lines, possibs
  return $latins << lines if lines.size == $size
  possibs.each do |p|
    search lines + [p], (possibs -
      $filter[p]).subtract(lines.last.to_i..p)
  end
end

search [], SortedSet[*(0...$perms.size)]

$latins.each do |latin|
  $perms.each do |perm|
    perm.each{|p| puts $out[latin[p]]}
    puts
  end
end

Here's a much slower version that has no 'require'.

Wd = ARGV.pop.to_i
$board = []

# Generate all possible valid rows.
Rows = (0...Wd**Wd).map{|n| n.to_s(Wd).rjust(Wd,'0')}.
  reject{|s| s=~/(.).*\1/}.map{|s| s.split(//).map{|n| n.to_i + 1} }

def check( ary, n )
  ary[0,n+1].transpose.all?{|x| x.size == x.uniq.size }
end

def add_a_row( row_num )
  if Wd == row_num
    puts $board.map{|row| row.join}.join(':')
  else
    Rows.size.times { |i|
      $board[row_num] = Rows[i]
      if check( $board, row_num )
        add_a_row( row_num + 1 )
      end
    }
  end
end

add_a_row 0
 
Keith Gaughan

I'm judging it based on running it on Windows. My point is that
divorcing it from the only environment in which it runs (natively) is
less than strictly sporting of you, when trying to discuss its
performance characteristics (or lack thereof).

Wait... I did no such thing. All I said was that what interface
sluggishness you get from Excel can't be blamed on Excel. They're
performance characteristics that *can* be divorced from Excel (because
they're Windows' own performance characteristics, not Excel's). Argue
those points, and you're arguing about the wrong software.

But Wine is an emulator, and while it does a good job approaching the
speed of Windows, it doesn't hit it, nor can it hit it. You're not
comparing like with like. Now that's far from sporting.

Your argument is disingenuous. Consider Cygwin running on Windows
compared to FreeBSD running on the same machine. I can make this
comparison because the machine I'm currently using dual-boots such a
setup. I run many of the same applications under Cygwin as I do under
FreeBSD on the same box. Those same applications running under Cygwin
are noticeably slower than the native equivalents under FreeBSD. Do I
blame the software I'm running under Cygwin for being slow? No, because
I'm well aware that it zips along in its native environment. Do I blame
Cygwin? No, because it does an awful lot of work to trick the software
running under it into thinking it's running on a *nix. Do I blame
Windows? No, because I use some of that software--gcc being an
example--natively under Windows and it performs just as well as when
it's run natively under FreeBSD. Bringing Wine in is a red herring.
Software cannot be blamed for the environment it's executed in.

K.
 
Chad Perrin

Wait... I did no such thing. All I said was that what interface
sluggishness you get from Excel can't be blamed on Excel. They're
performance characteristics that *can* be divorced from Excel (because
they're Windows' own performance characteristics, not Excel's). Argue
those points, and you're arguing about the wrong software.

Design decisions that involve interfacing with interface software that
sucks are related to the software under discussion -- and not all of the
interface is entirely delegated to Windows, either. No software can be
evaluated for its performance characteristics separate from its
environment except insofar as it runs without that environment.

But Wine is an emulator, and while it does a good job approaching the
speed of Windows, it doesn't hit it, nor can it hit it. You're not
comparing like with like. Now that's far from sporting.

Actually, no, it's not an emulator. It's a set of libraries (or a
single library -- I'm a little sketchy on the details) that provides the
same API as Windows software finds in a Windows environment. An
emulator actually creates a faux/copy version of the environment it's
emulating. Wine is to Windows what Linux is to Unix, whereas an actual
emulator is what Cygwin is to Unix: one is a differing implementation
and the other is an emulator.

. . and, in fact, there are things that run faster via Wine on Linux
than natively on Windows.


[ snip ]
under FreeBSD. Bringing Wine in is a red herring. Software cannot be
blamed for the environment it's executed in.

I didn't bring it up. You did. I made a comment about Excel not
working in Linux as a bit of a joke, attempting to make the point that
saying Excel performance can be evaluated separately from its dependence
on Windows doesn't strike me as useful.
 
Peter Hickman

On my machine it took around 33 seconds, but I think that I can improve
it a little. Besides, I have to test the results first.
 
Ashley Moran

Eliminating the dupes is the only scary thing I've seen here. What's the
absolute smallest piece of data that you need to look at in order to
distinguish a dupe? (If it's the whole line, then the answer is 16
bytes- the length of the MD5 hash ;-)) That's the critical working set.
If you can't get the Ruby version fast enough, it's cheap and easy to
sort through 15,000,000 of them in C. Then one pass through the sorted
set finds your dupes. I've never found a consistently-fastest performer
among Ruby's several different ways of storing sorted sets.

Make sure that your inner loop doesn't allocate any new variables,
especially arrays- declare them outside your inner loop and re-use them
with Array#clear.

Nice MD5 trick! I'll remember that. Fortunately the files that need
duplicate elimination are really small, so I won't need to resort to that.
But I'll remember it for future reference.

Obviously I haven't seen your code or your data, but if the Ruby app is
memory-bus-bound, then this approach may make your problem worse, not
better.

Hadn't thought of that, good point...

Good luck. I recently got a Ruby program that aggregates several LDAP
directory-pulls with about a million entries down from a few hours to a
few seconds, without having to drop into C. It can be done, and it's
kindof fun too.

Next time I get a morning free I might apply some of the tweaks that have been
suggested. It might be interesting to see how much I can improve the
performance.

Cheers
Ashley
 
Ashley Moran

With a 92% cut in code weight, I can certainly sympathize with that
sentiment. Wow.

Even the last remaining member of the Anti-Ruby Defence League in my office
admitted (reluctantly) that it was impressive!

Ashley

--
"If you do it the stupid way, you will have to do it again"
- Gregory Chudnovsky
 
Isaac Gouy

Hal said:
JIT is the key to a lot of that. Performance depends greatly on
the compiler, the JVM, the algorithm, etc.

I won a bet once from a friend. We wrote comparable programs in
Java and C++ (some arbitrary math in a loop running a bazillion
times).

With defaults on both compiles, the Java was actually *faster*
than the C++. Even I didn't expect that. But as I said, this
sort of thing is highly dependent on many different factors.


Hal

Sometimes when we look at different workloads we can see the
performance crossover when relatively slow startup is overcome, code is
JIT'd, and adaptive optimisation kicks in:

http://shootout.alioth.debian.org/g...nbody&p1=java-0&p2=gcc-0&p3=clean-0&p4=java-0
 
Tim Bray

Sorry for coming late to the party.

Whenever the question of performance comes up with scripting
languages such as Ruby, Perl or Python there will be people whose
response can be summarised as "Write it in C".

The conclusion is wrong in the general case. Suppose that, instead
of computing permutations, your task had been to read ten million
lines of textual log files and track statistics about certain kinds
of events coded in there. I bet a version coded in perl, making
straightforward uses of regexes and hashes, would have performance
that would be very hard to match in C or any other language. Ruby
would be a little slower I bet just because Perl's regex engine is so
highly-tuned, although it's been claimed Oniguruma is faster.
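Something along these lines, say -- a hedged sketch with a made-up log
format, just to show the shape of the program: a regex and a hash doing
all the work.

# Tally event types from log lines using nothing but a regex and a hash.
# The log format and field names here are invented for illustration.
counts = Hash.new(0)
ARGF.each_line do |line|
  # e.g. "2006-08-01 12:00:01 [warn] login_failed user=bob"
  if line =~ /\[(\w+)\]\s+(\w+)/
    counts["#{$1}/#{$2}"] += 1
  end
end
counts.sort_by { |event, n| -n }.each { |event, n| puts "#{n}\t#{event}" }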

So, first gripe: C is faster than Ruby *in certain problem domains*.
In others, it's not.

Second gripe. The notion of doing a wholesale rewrite in C is almost
certainly wrong. In fact, the notion of doing any kind of serious
hacking, without doing some measuring first, is almost always wrong.
The *right* way to build software that performs well is to write a
natural, idiomatic implementation, trying to avoid stupid design
errors but not worrying too much about performance. If it's fast
enough, you're done. If it's not fast enough, don't write another
line of code till you've used a profiler and understand what the
problem is. If in fact this is the kind of a problem where C is
going to do better, chances are you only have to replace 10% of your
code to get 90% of the available speedup.
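In Ruby terms, "measure first" can be as small as this sketch (the work
being timed is a stand-in, and the script name is a placeholder):

# Time the natural implementation before rewriting anything.
require 'benchmark'

def process_everything
  (1..100_000).inject(0) { |sum, i| sum + i }   # stand-in for the real work
end

elapsed = Benchmark.realtime { process_everything }
puts "took #{elapsed} seconds"

# If that isn't fast enough, profile to find the real hot spot before
# reaching for C, e.g. with the stdlib profiler:
#   ruby -rprofile myscript.rb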

And don't forget to budget downstream maintenance time for the
memory-allocation errors and libc dependencies and so on that cause C
programs to be subject to periodic downstream crashes.

-Tim
 
John W. Kennedy

David said:
There are some applications that will never perform as in Java (e.g.,
stuff that's heavily oriented to bit manipulation.) But for many
classes of applications (e.g., spreadsheets) Java can perform as well
as C.

Actually, it's notorious that a Sieve of Eratosthenes using the
respective Bitset classes is faster in Java than in C++.

I don't get it, either, but I've verified it myself.
 
John W. Kennedy

Chad said:
The canonical example for comparison, I suppose, is the Java VM vs. the
Perl JIT compiler. In Java, the source is compiled to bytecode and
stored. In Perl, the source remains in source form, and is stored as
ASCII (or whatever). When execution happens with Java, the VM actually
interprets the bytecode. Java bytecode is compiled for a virtual
computer system (the "virtual machine"), which then runs the code as
though it were native binary compiled for this virtual machine. That
virtual machine is, from the perspective of the OS, an interpreter,
however. Thus, Java is generally half-compiled and half-interpreted,
which speeds up the interpretation process.

Huh? Small-system (PDA, etc.) Java implementations may use bytecode
interpreters, but the mainstream ones started out using JIT, and were
later upgraded to start execution by interpreting and then, if and when
they observe that a given segment is being repeatedly executed,
compiling it to native code.
 
John W. Kennedy

Chad said:
Ada, on the other hand -- for circumstances in which it is most commonly
employed (embedded systems, et cetera), it does indeed tend to kick C's
behind a bit. That may have more to do with compiler optimization than
language spec, though.

Language specs mean a good deal, actually. C semantics (except in C99,
/if/ the restrict keyword is consistently used by the coder and
seriously implemented by the optimizer) lead to unnecessarily slow
object code in many common cases. Ada's in-language support of tasking
also makes it easier for the compiler to know whether a segment of code
will or will not be multithreaded, which allows the compiler to optimize
more aggressively in many instances.

These are two of the reasons that so many C compilers include an
"optimize beyond safe limits" switch.

Indeed, many aspects of C's design cause optimization problems.
0-terminated strings were dandy on 8-bit processors, but the S/370
actually had to add some new opcodes just to make strcpy and strcmp
tolerably fast.

Ada has another advantage in that the language design strongly
encourages designing global optimization into the compiler, whereas the
C tradition of make files tends to discourage it.
 
Chad Perrin

Huh? Small-system (PDA, etc.) Java implementations may use bytecode
interpreters, but the mainstream ones started out using JIT, and were
later upgraded to start execution by interpreting and then, if and when
they observe that a given segment is being repeatedly executed,
compiling it to native code.

Wait . . . what? When some Java applet (for example) is sent over an
HTTP connection to your computer to be executed client-side, it is NOT
just source code. Similarly, when you install a Java application, it
too is NOT simply copied onto the system in source code form. It's
compiled to bytecode (or whatever the hell you want to call it) and
distributed thusly, for the JVM to run it.
 
Chad Perrin

Language specs mean a good deal, actually.

I didn't say they didn't. I said only that *in this case* it *may* be
that it's more a matter of compiler optimization than language spec. I
don't know enough about Ada to be able to comment authoritatively on the
comparison, but I certainly do know that language design can have an
effect (via the requirements it imposes on the implementation).
 
Chad Perrin

Interpretation does not necessarily mean raw source code is being processed.
Even interpreters parse raw source into a form they can understand.
Interpretation in the Java VM comes in the form of bytecode interpretation,
so called because instead of the system CPU running native operations it's
running another process that steps through the bytecodes. This is what's
typically called "interpreted mode" in the JVM. However every VM since Java
1.3 has taken the next step at run time and compiled that bytecode into
native processor instructions, so that the interpreter is no longer involved
for those compiled pieces.

Bytecode is what's distributed, yes, but it's little more than pre-parsed
and lightly optimized source code. You can convert it back to source, if you
like. Your definition of "interpreted" is too narrow.

How do you figure? You just reiterated, in different words, everything
I said, then held it up as "proof" I'm "wrong". I think you're
violently agreeing with me, or something, and don't realize it.
 
Keith Gaughan

How do you figure? You just reiterated, in different words, everything
I said, then held it up as "proof" I'm "wrong". I think you're
violently agreeing with me, or something, and don't realize it.

No, just that you left out the bit about JIT compilation into native
code.

K.
 
