a trivial array/hash benchmark


Rainer Weikusat

When running the trivial microbenchmark

-----------
use Benchmark;

my $h = { Worschtsupp => 4 };
my $a = [4];

timethese(-5,
          {
              h => sub { return $h->{Worschtsupp}; },
              a => sub { return $a->[0]; }});
------------

on 'some computer', the result of three runs was that the hash lookup
ran at about 28.31% of the speed of the array access on average.
About 10,000,000 hash lookups are needed in order to spend 1s of
processing time solely on that, while about 33,333,333 array accesses
could have been done in the same time.
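
For anyone who wants the relative figure printed directly instead of
computing it from the timethese output, Benchmark's cmpthese takes the
same arguments and adds a comparison table; a minimal variant of the
same test (not part of the original post) would be:

-----------
use Benchmark qw(cmpthese);

my $h = { Worschtsupp => 4 };
my $a = [4];

# cmpthese runs the cases like timethese and then prints each rate
# together with the percentage difference between the cases
cmpthese(-5,
         {
             h => sub { return $h->{Worschtsupp}; },
             a => sub { return $a->[0]; }});
------------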
 

Rainer Weikusat

Eli the Bearded said:
In comp.lang.perl.misc, Rainer Weikusat said:
When running the trivial microbenchmark

I love these sorts of things, so I tried to duplicate your results.
-----------
use Benchmark;

my $h = { Worschtsupp => 4 };
my $a = [4];

timethese(-5,
          {
              h => sub { return $h->{Worschtsupp}; },
              a => sub { return $a->[0]; }});
------------

Why not post actual results?

They are not really meaningful anywhere except on the computer where I
tested this _and_ with the perl version where I tested this.
When I run it three times:

a: 5 wallclock secs ( 5.30 usr + 0.00 sys = 5.30 CPU) @ 18706996.60/s (n=99147082)
h: 4 wallclock secs ( 5.24 usr + 0.00 sys = 5.24 CPU) @ 15713758.59/s (n=82340095)

a: 6 wallclock secs ( 5.57 usr + 0.00 sys = 5.57 CPU) @ 28009483.30/s (n=156012822)
h: 6 wallclock secs ( 5.27 usr + 0.00 sys = 5.27 CPU) @ 24815075.90/s (n=130775450)

a: 5 wallclock secs ( 5.21 usr + 0.00 sys = 5.21 CPU) @ 19115772.55/s (n=99593175)
h: 4 wallclock secs ( 5.03 usr + 0.00 sys = 5.03 CPU) @ 22237998.61/s (n=111857133)

Eg, this suggests that 5.14.2 does something more intelligent (for a
certain definition of 'intelligent') for a static hash lookup than 'do
it from scratch every time', possibly even because hordes of "stupid
hash users" (like herds of mooses) made it more worthwhile to put an
effort into this. This could lead to an 'interesting' arms race where
some people devise more elaborate benchmarks in order to 'catch them
red-handed' while the people supposed to be caught devise more
elaborate ways to fool benchmarks.
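
One simple way to vary the test in that direction (a variation on the
benchmark above; no claim is made here about what any particular perl
version actually does with it) is to take the key and the index out of
the source text so they can't be treated as compile-time constants:

-----------
use Benchmark;

my $h = { Worschtsupp => 4 };
my $a = [4];

# key and index supplied at run time instead of as literals,
# so a 'static lookup' shortcut can't apply as easily
my $k = 'Worschtsupp';
my $i = 0;

timethese(-5,
          {
              h => sub { return $h->{$k}; },
              a => sub { return $a->[$i]; }});
------------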

The point of this posting was mainly to provide others with a way to
determine what the difference is for them with some simple test case
and to demonstrate that there's a difference at all (when in court, at
least in a civil case, a 'good' lawyer in a losing position will
simply deny everything the other party claims[*], including that the
sun rises in the morning, because this means everything has to be
proven, which ensures that the case grinds to a halt in
technicalities 'for a long time'. This tactic is even more useful for
covering unwelcome statements on USENET with surreptitious noise so
that nobody hears them).

[*] Real-world example: I had the displeasure of being in a court case
because somebody ran his car into mine after ignoring a certain
traffic sign some ten years ago. Among the things denied by the other
party's lawyer were that a road running uphill to the next village
right at the edge of town does actually run uphill, and that the
ignored traffic sign stood there at all.
 

Rainer Weikusat

Ben Morrow said:
Quoth Rainer Weikusat:
When running the trivial microbenchmark

-----------
use Benchmark;

my $h = { Worschtsupp => 4 };
my $a = [4];

timethese(-5,
          {
              h => sub { return $h->{Worschtsupp}; },
              a => sub { return $a->[0]; }});
------------

on 'some computer', the result of three runs was that the hash lookup
ran at about 28.31% of the speed of the array access on average.
About 10,000,000 hash lookups are needed in order to spend 1s of
processing time solely on that, while about 33,333,333 array accesses
could have been done in the same time.

Like most microbenchmarks, this tells you very little about real code.
Try something a bit more realistic, like

#!/opt/perl/bin/perl

use Benchmark qw/cmpthese/;

sub one_a { my ($self) = @_; $self->[0]; }
sub one_h { my ($self) = @_; $self->{a}; }
sub two_a { my ($self) = @_; $self->one_a; }
sub two_h { my ($self) = @_; $self->one_h; }
sub if_a { my ($self) = @_; if (rand > 0.2) { $self->two_a } }
sub if_h { my ($self) = @_; if (rand > 0.2) { $self->two_h } }
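
The quote stops before the code which actually runs these subs; a
plausible way to drive them (an assumed continuation, not Ben's
original code) is to bless one arrayref and one hashref into the
package the subs live in and compare the outermost calls:

-----------
# assumed continuation of the quoted script: the subs above expect a
# blessed invocant, so bless one arrayref and one hashref into main
my $oa = bless [4],        'main';
my $oh = bless { a => 4 }, 'main';

cmpthese(-5,
         {
             if_a => sub { $oa->if_a },
             if_h => sub { $oh->if_h },
         });
------------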

This includes all kinds of unrelated effects in the test case and, as
you yourself admitted ('as I suspected ...'), even unrelated effects
you specifically included because you expected them to render the
result useless for comparing hash and array lookups. Unsurprisingly,
it became (almost) useless for this particular purpose.

[numbers deleted]
the method-call overhead completely dominates the
overhead of the hash lookup. If you can save one method call, you will
save more time than you would have saved by using an arrayref;

Putting this another way: The overhead of decoupling the so-called
'class methods' from 'the object representation' with a layer of
'more privileged core class methods' aka 'accessors' is so high that
the actual representation doesn't matter anymore, which means that the
theoretical benefit of that ('the representation could be changed
without affecting the set of general-purpose subroutines needlessly
tied to the so-called class') isn't worthwhile: The cure is worse than
the disease.
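
To put a rough number on that accessor layer in isolation, a test
along the same lines as the benchmarks above (the class and getter
here are invented for illustration) can compare a direct hash lookup
with a call through a trivial getter:

-----------
use Benchmark qw(cmpthese);

# a made-up minimal class with one attribute and a trivial getter
package Thing;
sub new  { return bless { name => 'x' }, shift }
sub name { return $_[0]{name} }

package main;
my $obj = Thing->new;

# reaching into the hash directly vs. going through the accessor
cmpthese(-5,
         {
             direct   => sub { $obj->{name} },
             accessor => sub { $obj->name },
         });
------------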
 

Rainer Weikusat

Rainer Weikusat said:
Ben Morrow said:
Quoth Rainer Weikusat <[email protected]>:
[...]
-----------
use Benchmark;

my $h = { Worschtsupp => 4 };
my $a = [4];

timethese(-5,
          {
              h => sub { return $h->{Worschtsupp}; },
              a => sub { return $a->[0]; }});
------------
[...]

Like most microbenchmarks, this tells you very little about real code.
Try something a bit more realistic, like
[...]

the method-call overhead completely dominates the
overhead of the hash lookup. If you can save one method call, you will
save more time than you would have saved by using an arrayref;

Putting this another way: The overhead of decoupling the so-called
'class methods' from 'the object representation' with a layer of
'more privileged core class methods' aka 'accessors' is so high that
the actual representation doesn't matter anymore, which means that the
theoretical benefit of that ('the representation could be changed
without affecting the set of general-purpose subroutines needlessly
tied to the so-called class') isn't worthwhile: The cure is worse than
the disease.

Two things which kept bothering me about this:

1. My statement above is a little disingenuous in this context: The
method call overhead is higher, and the 'The cure ...' remark is - at
best - an attempt to change the topic of the conversation and - at
worst - a piece of not that easily discardable nonsense (both
intentionally).

2. The 'Like most ...' statement is wrong: The example code I gave
should be identical to the code in a typical 'getter' method.
 

Rainer Weikusat

Ben Morrow said:
Quoth Rainer Weikusat <[email protected]>:
[...]
2. The 'Like most ...' statement is wrong: The example code I gave
should be identical to the code in a typical 'getter' method.

That's not the point, though. The interesting question is not 'Are array
derefs faster than hash derefs?': everyone already knows they are. The
interesting questions are 'How much difference would it make to the
overall performance of my application if I switched from hashref to
arrayref objects?' and 'Does that make it worth putting up with the
limitations of arrayref objects?'. My benchmark suggests the answer to
the first is 'maybe 2 or 3%, for an app which is CPU-bound rather than
IO-bound'; and unless the answer to the first was something huge, like a
50x speed increase, my answer to the second would be an unqualified
'No'.

This 'overall performance' idea is too simplistic. Eg, assume the
application is a server whose purpose is to make 'policy decisions'
for HTTP requests, that is, deny or allow them based on 'some
criterion' like 'URL black- or whitelisted' or 'classified as
belonging to a page of content-category ...'. Ideally, making the
policy decision shouldn't add any noticeable latency to requests AND
the measurable latency should be so small that it is possible to point
people at the fact that this can't be causing their 'internet
performance problems' even though they suspect otherwise (which they
are going to do). Also, the decision really shouldn't add any latency
because only one such question can be processed at any given time, and
this then becomes a question of 'how many requests can be processed per
second', considering that adding a latency of 1s is WAY too much. In
addition, all of this may need to run on relatively low-spec (that
is, cheap to mass-produce) hardware, and people are going to expect
that it will support at least a couple of thousand concurrent
users (this is not a contrived example).

Taking this into account, what is more worthwhile? Using data
structures with minimally faster access times (and lower memory
requirements), or 'genuflecting before the god of code bumming', that
is, "de-structuring the code" by 'inlining everything which can
conceivably be inlined'? Do I rather put up with the perceived
limitations of the former in order to avoid the latter for as long as
humanly possible, ie, am I willing to make a conscious effort to avoid
wasting time on stuff where 'wasting time' doesn't buy me anything so
that I can afford to waste time on stuff where it does buy me
something?

In line with this reflection, using anonymous arrays also enables
creating objects which retain direct access to their properties
without ad hoc hackery like

use constant NAME => __PACKAGE__.'name';

$h->{+NAME} = ...

because, just like structures in C, arrays provide an ordered set of
slots, which means that something like 'use the next five free slots'
is possible; that can't be done with hashes in a straightforward way.
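
A sketch of what 'use the next free slots' can look like in practice
(the classes and slot names here are invented for illustration): the
parent class publishes the number of slots it occupies, and a subclass
numbers its own fields starting from there:

-----------
package Animal;
use constant { NAME => 0, SOUND => 1 };
use constant N_SLOTS => 2;                 # first slot free for subclasses

sub new   { my $class = shift; return bless [@_], $class }
sub name  { return $_[0][NAME] }
sub sound { return $_[0][SOUND] }

package Dog;
our @ISA = ('Animal');
use constant OWNER => Animal::N_SLOTS();   # continue right after the parent

sub new {
    my ($class, $name, $owner) = @_;
    my $self = $class->SUPER::new($name, 'woof');
    $self->[OWNER] = $owner;
    return $self;
}
sub owner { return $_[0][OWNER] }

package main;
my $dog = Dog->new('Rex', 'Alice');
print $dog->name, ' belongs to ', $dog->owner, "\n";   # Rex belongs to Alice
------------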
Of course, *if* someone were to produce a system for building arrayref
objects which *didn't* place any limits on their flexibility, [...]
The idea of using anonymous arrays in order to enable 'nested superclass
attributes' is inherently limited to single-inheritance data
inheritance. This implies that everything which can be done in Java
can be done in this way in Perl, too, and even more easily because
contortions like 'interfaces' aren't needed. Considering that this
language isn't entirely unused, discarding single inheritance out of
hand as 'not good enough for me' seems a little preposterous. And -
since Perl isn't a "one size for all, and users are fitted to it as
necessary" language - the option to use another object representation
where it actually matters is always available.

[...]
To change the subject a little back to something more interesting,
here's a sketch of an idea I was playing with a while ago for a
completely different way of representing attributes.
[...]

package Foo;
use Object::Closures;

BUILD {
    my $att = $_[0];

    method get_att => sub { $att };
    method set_att => sub { $att = $_[0] };
};

sub new { construct @_ }

The most important disadvantages are that it makes method calls even
more expensive, [...]

According to some text I read on the web some
years ago[*], the main problem with this is that every object uses
two additional lexical pads for each of its attributes in this way.

[*] Unfortunately, I can't quote this here because doing so would
seriously upset 'certain people' who prefer keeping their heads firmly
dug into the sand wrt what is or isn't available for free on the
internet (this means they claim that this 'free availability' would
be a serious problem but - for some reason - they don't consider it
worthwhile to do something about it ...).
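
Object::Closures as used in Ben's sketch appears to be his own
experiment rather than something one can just install; the mechanism
it hints at can be shown in plain Perl (an illustration under that
assumption, not his implementation): the object is a hash of code
refs, and every attribute ends up captured by its own pair of
anonymous subs, which is roughly where the per-attribute cost
mentioned above comes from:

-----------
package Foo;

# plain-Perl sketch of the closure-object idea: the object is a hash
# mapping method names to code refs, and each getter/setter pair
# closes over one lexical attribute
sub new {
    my ($class, $att) = @_;
    my %methods = (
        get_att => sub { return $att },
        set_att => sub { $att = $_[0] },
    );
    return bless \%methods, $class;
}

# dispatch unknown methods by looking them up in the hash
sub AUTOLOAD {
    my $self = shift;
    our $AUTOLOAD;
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return if $name eq 'DESTROY';
    my $code = $self->{$name} or die "no method '$name'";
    return $code->(@_);
}

package main;
my $foo = Foo->new(42);
print $foo->get_att, "\n";    # 42
$foo->set_att(23);
print $foo->get_att, "\n";    # 23
------------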
 

Rainer Weikusat

[...]
a. Who, exactly, do you think cares about this little rant?

Since you apparently can't get over this 'efficient' habit, any
attempt at discussing anything with you is evidently a waste of time.
 

Ted Zlatanov

BM> [Rainer's] habit of continually digressing into unpleasant remarks
BM> about someone or something irrelevant is starting to get on my
BM> nerves.

Phew, I thought I was the only one annoyed by that. AWKWARD!

Ted
 
