python/ruby benchmark.


Austin Ziegler

Austin Ziegler wrote:
-snip-
Never once needed to implement a recursive function?

Not Ackermann-style recursion, only simple recursion. Most of the time,
I haven't even needed that. There's an important point there. Indeed, I
have sometimes gone in and changed recursive code into iterative code
because it was too expensive to implement as recursion.
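
(To make the kind of rewrite concrete, here is a minimal, hypothetical
Ruby sketch -- not code from any real project, and the method names are
invented -- showing the same computation done first recursively and
then iteratively.)

  # Recursive version: clear, but every call adds a stack frame.
  def sum_to_recursive(n)
    return 0 if n.zero?
    n + sum_to_recursive(n - 1)
  end

  # Iterative version: same result, constant stack usage.
  def sum_to_iterative(n)
    total = 0
    1.upto(n) { |i| total += i }
    total
  end

  puts sum_to_recursive(100)    #=> 5050
  puts sum_to_iterative(100)    #=> 5050
  # sum_to_recursive(1_000_000) raises SystemStackError in MRI;
  # the iterative version handles it without trouble.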

But what are you really testing with this? If you want to test stack
winding and unwinding speeds, then you're probably testing the wrong
thing here. If you're testing something else, it's not clear. And
that's the damnable thing about the whole alioth shootout -- you
refuse to take an editorial stance anywhere (well, sort of) on the
interpretation of the numbers, leaving them to stand on their own --
which leads to people making stupid assumptions about them. You state
that you refuse to take an editorial stance, but you end up doing so
-- both by what tests you accept for the shootout and by what
implementations you'll accept.

I stand by what I said, though: the alioth shootout is a waste of
everyone's time. Frankly, I think it would be better if it were just
taken down.

Some details are always specific to a problem domain and application,
but the same representations and approaches are used across problem
domains and across applications - yes, the details are from DNA
sequence analysis, but the programs process ASCII strings and hash
tables.

Perhaps. But more often than not, the approaches aren't perfectly portable.

-austin
--
Austin Ziegler * (e-mail address removed)
* Alternate: (e-mail address removed)
 

Austin Ziegler

They did. I was there; you weren't. But don't let facts get in the way.

Bully for you. Were you there when they mixed unit systems, too? Did
they do *that* with benchmarks?

Sorry, but I don't buy the argument that they did this with
benchmarks. Measurements, perhaps, of the performance of their own
programs (or even prototypes), but *not* with benchmarks. In other
words, I think you're full of it and I'm calling you on it. Provide
some evidence that the decisions were made with benchmarks -- "A
standard by which something can be measured or judged."

Tell me that they did prototyping and I'll believe you. Tell me that
they measured *performance* and I'll believe you. Tell me that
benchmarks were involved in the process and I won't believe you unless
you provide some evidence of this claim.

-austin
--
Austin Ziegler * (e-mail address removed)
* Alternate: (e-mail address removed)
 

Isaac Gouy

Austin said:
Not Ackermann-style recursion, only simple recursion. Most of the time,
I haven't even needed that. There's an important point there. Indeed, I
have sometimes gone in and changed recursive code into iterative code
because it was too expensive to implement as recursion.

And in some other language implementations that change is unnecessary -
perhaps that's the point.
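
(As an illustration of that point, with invented names: the same kind
of recursion written in tail form. An implementation that guarantees
tail-call elimination -- Scheme, for example -- runs this in constant
stack space, so the rewrite into a loop isn't needed there; stock MRI
Ruby makes no such guarantee.)

  # Tail-recursive form: the recursive call is the last thing the
  # method does, so an implementation with tail-call elimination can
  # reuse the current stack frame instead of growing the stack.
  def sum_to_tail(n, acc = 0)
    return acc if n.zero?
    sum_to_tail(n - 1, acc + n)
  end

  puts sum_to_tail(100)   #=> 5050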

-snip-
And that's the damnable thing about the whole alioth shootout -- you
refuse to take an editorial stance anywhere (well, sort of) on the
interpretation of the numbers, leaving them to stand on their own --
which leads to people making stupid assumptions about them.
-snip-

Alioth Shootout - a Rorschach test for programmers.
 

Steven Jenkins

Austin said:
Bully for you. Were you there when they mixed unit systems, too? Did
they do *that* with benchmarks?

Yes, I was there. It's irrelevant to the discussion, but I was there.
[badgering elided]

Tell me that they did prototyping and I'll believe you. Tell me that
they measured *performance* and I'll believe you. Tell me that
benchmarks were involved in the process and I won't believe you unless
you provide some evidence of this claim.

I've said what I have to say. Believe it or don't, as you see fit. We're
done.

Steve
 

Lothar Scholz

Hello Michael,


MC> How does this issue relate to *ruby*? The same charge could be levied
MC> at python, even J2EE. "What happens if you don't have the resources
MC> to speed up something slow?"

If a language is a few times faster, then the risk that this happens
is just lower. This is a management decision.

There are many things to keep in balance when you start a project.
Development time is just one of them and often not that important.

I can only speak for the application domain where I worked for years:
products for the mass market. Here you don't have something like "your
customer". In this market you have to pay a lot of attention to "your
competitor", and that makes it different. Spending a few months more on
development is fine as long as the (peak!) speed is good and
competitive.

This is a domain where Ruby is unfortunately not usable at the moment
(the lack of a good GUI toolkit is the second reason), and I would like
to see it become possible to use it there too.
 

Phil Tomson

Huh. They didn't do it with benchmarks. (And I could very easily point
out that those same people have screwed up massively -- forgetting to
convert between English units and Metric units?) Look closely at the
people who espouse benchmarks. They're mostly marketers or fools who
can't tell the difference.

There are *real* measures to deal with; they're not benchmarks. They
aren't and never have been.

For a first person shooter, the real measure is "is the game fun?" The
answer will be different for everyone, but there are some objective
things that will break the "fun" factor for just about everyone.
Frames Per Second. Load Time. These things should be as fast as they
possibly can be. Gigaflops never enters the question here. Nor does
specmark or anything else like that.

For image manipulations, they need to be quick. But not once does the
Ackermann function ever enter the question.

No, but as you've said, you need quick response from a game in order for
it to be engaging. It comes down to frames per second (as you note).

Never *once* have I needed to implement an Ackermann function. Not
once. In my entire career.

No one is saying that you would need to. The Ackermann benchmark can be
a measure of how well your language handles recursion (of course it can
be written iteratively as well). Sure, it may not be something that
you'd ever use, but if someone is perusing the benchmark results and
sees that Ruby falls way behind gawk in this particular benchmark,
they're likely to draw some conclusions. Now, you and I might think
that the conclusions they draw are not fair or accurate, but that
doesn't matter. If I've never encountered Ruby before and all I know of
it is from the alioth benchmarks, the conclusions I come to will not be
positive.
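
(For readers who haven't met it, here is a minimal Ruby rendering of
the Ackermann function -- a sketch for illustration, not the shootout's
actual submission. It is tiny to write down, but the nesting of calls
explodes, which is exactly why it stresses method-call and stack
handling.)

  # Ackermann function: definitionally simple, brutally deep recursion.
  def ack(m, n)
    if m.zero?
      n + 1
    elsif n.zero?
      ack(m - 1, 1)
    else
      ack(m - 1, ack(m, n - 1))
    end
  end

  puts ack(3, 4)   #=> 125
  # Even ack(3, 8) already takes millions of method calls, so the
  # timing is dominated by call and stack overhead, not arithmetic.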

I look at the crap that is on alioth and there's very little that
represents common use. There are some neat things -- the new DNA
transformation ones -- but exactly how many people will actually be
using that in their work? I won't. Wouldn't have in my entire career.

Not everyone is doing web development. Some people work in interesting
areas like bioinformatics. It's great that Ruby is finding a niche in
web development with Rails, but I really hope that doesn't translate
into Ruby only aspiring to be another PHP (i.e. being perceived as a
language that is only useful for web development - I'm afraid that's
already starting to happen).

The fact is that it's in various scientific areas where you'll find the
most compute-intensive applications (the other area would be gaming), so
including DNA transformation benchmarks can be very informative.

Measures that mattered to me at my last job were "how many bills can I
generate in an hour?" At my current job it's "what's the average backup
throughput for this?"

If we're not getting the performance we need, we fix the damn problem.

Good; we can all agree on this ;-)

We don't rely on "benchmarks" -- we rely on real-world measurements of
our real problems. Not on pseudo-crap like gigaflops or specmark or
the speed of an airborne swallow. Actually, strike that. The last is
useful.

Again, benchmarks don't tell the whole story, but unfortunately many
will judge Ruby based on these benchmarks - that's just reality. What
are you going to do, buy banner ads on the alioth site which read
"Don't trust these benchmarks!"? I don't think that'll work.

Benchmarks tend to be a uniform measure (we can't all trot out our
'real world' code to be translated into all the popular languages and
then tested and timed in each one - and if we did, that would just
become the next benchmark, wouldn't it?). A good set of benchmarks
needs to be easily implemented in all of the languages being compared,
and it also needs to exercise a lot of different features that would
show up in real-world programs. I'm going to give the alioth folks the
benefit of the doubt here and assume that they want to develop a 'good'
set of benchmarks that can be used to measure the relative performance
of various languages. We can either dismiss their efforts with various
pejoratives (which will make people outside of our community wonder
what we're trying to hide) or help out.



Phil
 

Austin Ziegler

Hello Michael,
If a language is a few times faster, then the risk that this happens
is just lower. This is a management decision.

Sometimes yes, sometimes no. More often than I would like, but less
often, I imagine, than you think.
There are many things to keep in balance when you start a project.
Development time is just one of them and often not that important.

Actually, in my experience -- both with customisable largeware (a
billing system for ISPs) and with consumer-oriented software --
development time is one of the top items of concern. Not *the* top,
but definitely a component in the top (which is usually support
costs, a combination of tech support time, QA time, and developer
time, the cost of each rising).
This is a domain where Ruby is unfortunately not usable at the moment
(the lack of a good GUI toolkit is the second reason), and I would like
to see it become possible to use it there too.

I'd argue that Ruby's speed is secondary to the lack of a good
cross-platform GUI kit. I, too, would like to see Ruby perform
faster than it does. But I don't think that satisfying the alioth
shootout is going to make that happen. The problem *is* known, and
the solutions are at hand, even if they're moving slowly.

-austin
--
Austin Ziegler * (e-mail address removed)
* Alternate: (e-mail address removed)
 

lypanov

you said:
If we keep telling ourselves that Ruby is 'fast
enough' for our application (and it may well be) are we going to be
sitting still while other languages improve performance?

luckily there is work on this. what i don't get is why everyone talks
about it rather than actually working on it :)

Alex
 

Austin Ziegler

So, do we call that "benchmarketing"?

You know, it's a good thing that I didn't have a drink in my hands
when I read this.

Yeah. Benchmarketing. I like that.

-austin
--
Austin Ziegler * (e-mail address removed)
* Alternate: (e-mail address removed)
 

Steven Jenkins

Ralph said:
One observation I would make is that you set up a
benchmark/test/simulation that was very relevant to your problem
domain; you didn't use some industry-standard MIPS or TPH figure or
such. That is the real problem with the type of benchmarks that spawned
the debate here: such things are interesting and I like to look at
them, but that's as far as their usefulness goes.

I said I had other examples :).

It's been a long time since I was involved in one, but I'm reasonably
confident that we use "standard" benchmarks for large procurements. When
you spend US Government money, you have to jump through a lot of hoops
to ensure a level competitive playing field. A protest from a losing
bidder can tie you up for a long time, so you try to avoid that. Using
your own benchmarks for procurement qualification invites protest.

Nobody wins just because their TPC-A or whatever is highest. A Request
for Proposal may give a particular performance threshold, and the
proposing vendors use that to decide which of their products to propose.
They don't want to propose anything more expensive than they have to,
because they're in a cost competition. It's a rough and imperfect but
vendor-neutral way to talk about classes of performance. The real key is
that, if the vendors buy into it, they can't protest on that point.

Obviously, if one vendor claims dramatically better performance than
another in the same price class, that might be worth looking into. For the
most part, however, the benchmarks just establish who's in the game, and
most of the competition is on cost.

Steve
 

Stephen Kellett

Steven Jenkins said:
It's been a long time since I was involved in one, but I'm reasonably
confident that we use "standard" benchmarks for large procurements.

I think some people have lost sight of what "benchmark" means. For
computer apps some people have been claiming it's TPS, MIPS or whatever
form of throughput they are proposing. However, take a step back and
think about "benchmark" in more general terms and you get a better idea
of what a benchmark is. This is what Steven Jenkins was identifying
with his satellite TCP/IP benchmark.

A benchmark is something, anything by which you can compare. Typically
it is the best of breed at some point or other. Here is an example:

I play various musical instruments, one of them being the Border Bagpipe
made by Jon Swayne. Jon Swayne is a legend in his own lifetime to many
dancers and many musicians in the UK. For dancers it is because he is
part of Blowzabella, a major musical force in social dancing throughout
the last 25 years. For musicians, and particularly bagpipers, it is
because he took the bagpipe, an instrument known for not typically
being in tune (and if it was, not necessarily in tune with another
bagpipe of the same type, or even by the same maker!), and created a
new standard, a new benchmark, if you will, by which other bagpipes
are judged. It's not just Jon Swayne, there are some other makers, but
they changed everyone's perception, and his pipes are the benchmark by
which others are judged (yes, they really are that good). When you
talk to pipers in the UK and mention his name there is a respect that
is accorded. You don't get that without good reason. Anyway, I digress.

The benchmark for Steven's satellite test was whether it matched the
round-trip criteria. I think Steven's example is absolutely a
benchmark. It's much looser than other benchmarks, but that's not the
point. The point is: did it serve a purpose?

For some people the benchmark will be: does it perform the test within
a given tolerance? For others it may be: how much disk space does it
use? Or: is the latency between packets between X and Y? For still
others it will be: is it faster than X?

Where Austin's point comes in is that he points out the latter test is
meaningless because you are comparing apples with oranges, when you
should really be comparing GMO-engineered (optimized) apples with
GMO-engineered (optimized) oranges to even get close to a meaningful
test. Even so, you are still comparing cores to segments, and it gets
a bit messy after that, although they both have pips.

Even so, I once worked for a GIS company (A) that wrote its software
in C with an in-house scripting language. We won the benchmarks when in
competition with other GIS companies. The competition won the business
because of clever marketing. Their customers lost (*), though, because
the competitors' software was too hard to configure and our marketing
people were not smart enough to identify this and inform the customer
of the problem.

What sort of benchmarks were being tested?
o Time to compute catchment area of potential customer base within X
minutes drive given a drive time to location.
o Time to compute catchment area of potential customer base within X
minutes drive given a drive time from location.
o Time to compute drive time to location of potential customer base
within X minutes drive given a particular post code area.
o Time to compute drive time from location of potential customer base
within X minutes drive given a particular post code area.
o Think up any other bizarre thing you want.

Times to and from a location may not be the same because of highway
on/off ramps, traffic-light network delay bias and one-way systems.
Superstores often don't care much about drive time from, but care a
lot about drive time to. For example, drive time from may be 15
minutes, but drive time to may be only 5 minutes.

As you can see, the customer requirements are highly subjective, but
the raw input data is hard data - maps and fixed road networks. The
computing time etc. is also a fixed reality given the hardware.

It's all about perception and need.

I think the benchmarketing term is quite apt for most benchmarks.

....and Steven, your story was great. I could really relate to a lot of
that.

Stephen

(*) It's a matter of debate; they also used an in-house language, and
finding non-competitor engineers who knew the language was nigh on
impossible, so they were very expensive to hire to do the
configuration. Our (A) stuff was not so configurable, but it didn't
need to be.

When were we doing this stuff? '90-'94 for me. X11 and Motif were the
cool stuff back then.
 

Ralph "PJPizza" Siegler

I said I had other examples :).

It's been a long time since I was involved in one, but I'm reasonably
confident that we use "standard" benchmarks for large procurements. When
you spend US Government money, you have to jump through a lot of hoops
to ensure a level competitive playing field. A protest from a losing
bidder can tie you up for a long time, so you try to avoid that. Using
your own benchmarks for procurement qualification invites protest.

Nobody wins just because their TPC-A or whatever is highest. A Request
for Proposal may give a particular performance threshold, and the
proposing vendors use that to decide which of their products to
propose.


I used to do some spending of U.S. D.O.E. money at Fermilab on servers/workstations/networks for CADD/CAE. As you say, the standard benchmarks were a starting point to see which vendors might be considered, but for justifications the capabilities for in-house needs were the main thing. My projects were in the $100-$200K range, surely a few orders of magnitude smaller than your NASA ones, with the procurement requirements not as burdensome.


Our group made civil engineering packages (all those tunnels and collision halls) for outside bid, and of course there the spec book that accompanied the drawings was what ruled. That could be called a set of benchmarks, I suppose; they were a mix of construction industry standards and what our engineers had calculated.




Ralph "PJPizza" Siegler
 
