a trivial array/hash benchmark

Discussion in 'Perl Misc' started by Rainer Weikusat, Mar 20, 2013.

  1. When running the trivial microbenchmark

    -----------
    use Benchmark;

    my $h = { Worschtsupp => 4 };
    my $a = [4];

    timethese(-5,
    {
    h => sub { return $h->{Worschtsupp}; },
    a => sub { return $a->[0]; }});
    ------------

    on 'some computer', the result of three runs was that the hash lookup
    ran at about 28.31% of the speed of the array access on average:
    roughly 10,000,000 hash lookups take 1s of processing time spent
    solely on that, while about 33,333,333 array accesses could have been
    done in the same time.
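
    For anyone who just wants the ratio printed directly, Benchmark's
    cmpthese() reports relative speeds as percentages; the variant below
    is the same test with only the reporting function changed (the exact
    output format varies between perl versions):

    -----------
    use Benchmark qw(cmpthese);

    my $h = { Worschtsupp => 4 };
    my $a = [4];

    # same cases as above; cmpthese() prints a table comparing the
    # per-second rates of 'h' and 'a' against each other
    cmpthese(-5,
    {
    h => sub { return $h->{Worschtsupp}; },
    a => sub { return $a->[0]; }});
    ------------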
    Rainer Weikusat, Mar 20, 2013
    #1

  2. Eli the Bearded <*@eli.users.panix.com> writes:
    > In comp.lang.perl.misc, Rainer Weikusat <> wrote:
    >> When running the trivial microbenchmark

    >
    > I love these sorts of things, so I tried to duplicate your results.
    >
    >> -----------
    >> use Benchmark;
    >>
    >> my $h = { Worschtsupp => 4 };
    >> my $a = [4];
    >>
    >> timethese(-5,
    >> {
    >> h => sub { return $h->{Worschtsupp}; },
    >> a => sub { return $a->[0]; }});
    >> ------------
    >>
    >> on 'some computer', the result of three runs was that the hash lookup
    >> ran at about 28.31% of the speed of the array access on average.

    >
    > Why not post actual results?


    They are not really meaningful anywhere except on the computer _and_
    with the perl version I tested this with.

    > When I run it three times:
    >
    > a: 5 wallclock secs ( 5.30 usr + 0.00 sys = 5.30 CPU) @ 18706996.60/s (n=99147082)
    > h: 4 wallclock secs ( 5.24 usr + 0.00 sys = 5.24 CPU) @ 15713758.59/s (n=82340095)
    >
    > a: 6 wallclock secs ( 5.57 usr + 0.00 sys = 5.57 CPU) @ 28009483.30/s (n=156012822)
    > h: 6 wallclock secs ( 5.27 usr + 0.00 sys = 5.27 CPU) @ 24815075.90/s (n=130775450)
    >
    > a: 5 wallclock secs ( 5.21 usr + 0.00 sys = 5.21 CPU) @ 19115772.55/s (n=99593175)
    > h: 4 wallclock secs ( 5.03 usr + 0.00 sys = 5.03 CPU) @ 22237998.61/s (n=111857133)


    E.g., this suggests that 5.14.2 does something more intelligent (for a
    certain definition of 'intelligent') for a static hash lookup than 'do
    it from scratch every time', possibly even because hordes of "stupid
    hash users" (like herds of mooses) made it worthwhile to put some
    effort into this. This could lead to an 'interesting' arms race where
    some people devise ever more elaborate benchmarks in order to 'catch
    them red-handed' while the people supposed to be caught devise ever
    more elaborate ways to fool the benchmarks.
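
    One way to check for such special-casing of a constant key (this is a
    guess at a test, not a description of what any particular perl
    actually does) is to force the key string to be built at run time, so
    that nothing about it can be precomputed at compile time:

    -----------
    use Benchmark qw(timethese);

    my $h    = { Worschtsupp => 4 };
    my @part = ('Worscht', 'supp');

    timethese(-5,
    {
    # literal key, visible to the compiler
    h_const => sub { return $h->{Worschtsupp}; },
    # key assembled at run time inside the sub
    h_built => sub { my $k = join '', @part; return $h->{$k}; }});
    ------------

    If the two rates differ noticeably, part of the cost of the 'constant
    key' case is being paid at compile time rather than per lookup.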

    The point of this posting was mainly to give others a simple test
    case for determining what the difference is for them, and to
    demonstrate that there is a difference at all. (When in court, at
    least in a civil case, a 'good' lawyer in a losing position will
    simply deny everything the other party claims[*], including that the
    sun rises in the morning, because this means everything has to be
    proven, which ensures that the case grinds to a halt in
    technicalities 'for a long time'. The same tactic is even more useful
    for covering unwelcome statements on USENET with surreptitious noise
    so that nobody hears them.)

    [*] Real-world example: I had the displeasure of being party to a
    court case because somebody ran his car into mine after ignoring a
    certain traffic sign, some ten years ago. Among the things denied by
    the other party's lawyer were that a road running uphill to the next
    village, right at the edge of the town, actually runs uphill, and
    that the ignored traffic sign was there at all.
    Rainer Weikusat, Mar 21, 2013
    #2

  3. Ben Morrow <> writes:
    > Quoth Rainer Weikusat <>:
    >> When running the trivial microbenchmark
    >>
    >> -----------
    >> use Benchmark;
    >>
    >> my $h = { Worschtsupp => 4 };
    >> my $a = [4];
    >>
    >> timethese(-5,
    >> {
    >> h => sub { return $h->{Worschtsupp}; },
    >> a => sub { return $a->[0]; }});
    >> ------------
    >>
    >> on 'some computer', the result of three runs was that the hash lookup
    >> ran at about 28.31% of the speed of the array access on average.
    >> 10,000,000 hash lookups are needed in order to spend 1s of processing
    >> time solely on that and about 33,333,333 could have been done in the
    >> same time.

    >
    > Like most microbenchmarks, this tells you very little about real code.
    > Try something a bit more realistic, like
    >
    > #!/opt/perl/bin/perl
    >
    > use Benchmark qw/cmpthese/;
    >
    > sub one_a { my ($self) = @_; $self->[0]; }
    > sub one_h { my ($self) = @_; $self->{a}; }
    > sub two_a { my ($self) = @_; $self->one_a; }
    > sub two_h { my ($self) = @_; $self->one_h; }
    > sub if_a { my ($self) = @_; if (rand > 0.2) { $self->two_a } }
    > sub if_h { my ($self) = @_; if (rand > 0.2) { $self->two_h } }
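
    (The quoted snippet omits its constructors and the cmpthese() call;
    the driver below is only a guess at how such a comparison could be
    run - the package name and constructors are made up for the sketch,
    not taken from the original post.)

    -----------
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    package Obj;

    sub new_a { my ($class) = @_; bless [42], $class }        # arrayref-backed
    sub new_h { my ($class) = @_; bless { a => 42 }, $class } # hashref-backed

    sub one_a { my ($self) = @_; $self->[0]; }
    sub one_h { my ($self) = @_; $self->{a}; }
    sub two_a { my ($self) = @_; $self->one_a; }
    sub two_h { my ($self) = @_; $self->one_h; }
    sub if_a { my ($self) = @_; if (rand > 0.2) { $self->two_a } }
    sub if_h { my ($self) = @_; if (rand > 0.2) { $self->two_h } }

    package main;

    my $oa = Obj->new_a;
    my $oh = Obj->new_h;

    cmpthese(-5,
    {
    one_a => sub { $oa->one_a }, one_h => sub { $oh->one_h },
    two_a => sub { $oa->two_a }, two_h => sub { $oh->two_h },
    if_a  => sub { $oa->if_a },  if_h  => sub { $oh->if_h }});
    ------------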


    Your benchmark includes all kinds of unrelated effects in the test
    case and, as you yourself admitted ('as I suspected ...'), even
    unrelated effects you specifically included because you expected them
    to render the result useless for comparing hash and array lookups.
    Unsurprisingly, it became (almost) useless for this particular
    purpose.

    [numbers deleted]

    > the method-call overhead completely dominates the
    > overhead of the hash lookup. If you can save one method call, you will
    > save more time than you would have saved by using an arrayref;


    Putting this another way: the overhead of decoupling the so-called
    'class methods' from 'the object representation' with a layer of
    'more privileged core class methods', aka 'accessors', is so high
    that the actual representation doesn't matter any more, which means
    that the theoretical benefit of doing so (the representation could be
    changed without affecting the set of general-purpose subroutines
    needlessly tied to the so-called class) isn't worthwhile: the cure is
    worse than the disease.
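
    A minimal illustration of the comparison being made here - direct
    access to the underlying hash versus going through an accessor method
    on the same object (class and accessor names are made up for the
    sketch):

    -----------
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    package Thing;

    sub new    { my ($class) = @_; bless { colour => 'red' }, $class }
    sub colour { my ($self) = @_; return $self->{colour}; }   # typical getter

    package main;

    my $obj = Thing->new;

    cmpthese(-5,
    {
    direct   => sub { return $obj->{colour}; },  # bypasses the accessor
    accessor => sub { return $obj->colour; }});  # method call + hash lookup
    ------------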
    Rainer Weikusat, Mar 21, 2013
    #3
  4. Rainer Weikusat <> writes:
    > Ben Morrow <> writes:
    >> Quoth Rainer Weikusat <>:


    [...]

    >>> -----------
    >>> use Benchmark;
    >>>
    >>> my $h = { Worschtsupp => 4 };
    >>> my $a = [4];
    >>>
    >>> timethese(-5,
    >>> {
    >>> h => sub { return $h->{Worschtsupp}; },
    >>> a => sub { return $a->[0]; }});
    >>> ------------


    [...]


    >> Like most microbenchmarks, this tells you very little about real code.
    >> Try something a bit more realistic, like


    [...]


    >> the method-call overhead completely dominates the
    >> overhead of the hash lookup. If you can save one method call, you will
    >> save more time than you would have saved by using an arrayref;

    >
    > Putting this another way: the overhead of decoupling the so-called
    > 'class methods' from 'the object representation' with a layer of
    > 'more privileged core class methods', aka 'accessors', is so high
    > that the actual representation doesn't matter any more, which means
    > that the theoretical benefit of doing so (the representation could
    > be changed without affecting the set of general-purpose subroutines
    > needlessly tied to the so-called class) isn't worthwhile: the cure
    > is worse than the disease.


    Two things which kept bothering me about this:

    1. My statement above is a little disingenuous in this context: the
    method call overhead is higher, and the 'The cure ...' remark is - at
    best - an attempt to change the topic of the conversation and - at
    worst - a piece of not-that-easily-discardable nonsense (both
    intentionally so).

    2. The 'Like most ...' statement is wrong: The example code I gave
    should be identical to the code in a typical 'getter' method.
    Rainer Weikusat, Mar 21, 2013
    #4
  5. Re: Object attribute representation

    Ben Morrow <> writes:
    > Quoth Rainer Weikusat <>:


    [...]

    >> 2. The 'Like most ...' statement is wrong: The example code I gave
    >> should be identical to the code in a typical 'getter' method.

    >
    > That's not the point, though. The interesting question is not 'Are array
    > derefs faster than hash derefs?': everyone already knows they are. The
    > interesting questions are 'How much difference would it make to the
    > overall performance of my application if I switched from hashref to
    > arrayref objects?' and 'Does that make it worth putting up with the
    > limitations of arrayref objects?'. My benchmark suggests the answer to
    > the first is 'maybe 2 or 3%, for an app which is CPU-bound rather than
    > IO-bound'; and unless the answer to the first was something huge, like a
    > 50x speed increase, my answer to the second would be an unqualified
    > 'No'.


    This 'overall performance' idea is too simplistic. E.g., assume the
    application is a server whose purpose is to make 'policy decisions'
    for HTTP requests, that is, deny or allow them based on 'some
    criterion' like 'URL black- or whitelisted' or 'classified as
    belonging to a page of content-category ...'. Ideally, making the
    policy decision shouldn't add any noticeable latency to requests AND
    the measurable latency should be so small that it is possible to
    point people at the fact that this can't be causing their 'internet
    performance problems' even though they suspect otherwise (which they
    are going to do). Also, the decision really shouldn't add any latency
    because only one such question can be processed at any given time, so
    this becomes a question of 'how many requests can be processed per
    second', considering that adding a latency of 1s is WAY too much. In
    addition, all of this may need to run on relatively low-spec (that
    is, cheap to mass-produce) hardware, and people are going to expect
    that it will support at least a couple of thousand concurrent users
    (this is not a contrived example).

    Taking this into account, what is more worthwhile? Using data
    structures with minimally faster access times (and smaller memory
    requirements), or 'genuflecting before the god of code bumming', that
    is, "de-structuring the code" by 'inlining everything which can
    conceivably be inlined'? Would I rather put up with the perceived
    limitations of the former in order to avoid the latter for as long as
    humanly possible, i.e., am I willing to make a conscious effort to
    avoid wasting time on stuff where 'wasting time' doesn't buy me
    anything, so that I can afford to spend time on stuff where it does
    buy me something?

    In line with this reflection, using anonymous arrays also enables
    creating objects which retain direct access to their properties
    without ad hoc hackery like

    use constant NAME => __PACKAGE__.'name';

    $h->{+NAME} = ...

    because, just like structures in C, arrays provide an ordered set of
    slots, which means that something like 'use the next five free slots'
    is possible; this can't be done with hashes in a straightforward way.
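
    A sketch of the 'ordered slots' point, assuming plain blessed arrays
    and 'use constant' for the slot numbers (package and slot names are
    purely illustrative): the base class reserves numbered slots and the
    subclass continues numbering where its parent stopped, while every
    method keeps direct access to its data.

    -----------
    use strict;
    use warnings;

    package Animal;

    use constant {
        NAME   => 0,
        SOUND  => 1,
        NSLOTS => 2,              # first slot free for a subclass
    };

    sub new {
        my ($class, $name, $sound) = @_;
        my $self = bless [], $class;
        @$self[NAME, SOUND] = ($name, $sound);
        return $self;
    }

    sub speak { my ($self) = @_; return $self->[NAME] . ' says ' . $self->[SOUND]; }

    package Dog;
    our @ISA = ('Animal');

    use constant {
        TRICKS => Animal::NSLOTS(),      # 'the next free slot' after the parent's
        NSLOTS => Animal::NSLOTS() + 1,
    };

    sub new {
        my ($class, $name, @tricks) = @_;
        my $self = $class->SUPER::new($name, 'woof');
        $self->[TRICKS] = [@tricks];
        return $self;
    }

    sub tricks { my ($self) = @_; return @{ $self->[TRICKS] }; }

    package main;

    my $dog = Dog->new('Rex', 'sit', 'fetch');
    print $dog->speak, "\n";                  # direct slot access inside the methods
    print join(', ', $dog->tricks), "\n";
    ------------

    This is also the 'nested superclass attributes' layout referred to
    below: each class in a single-inheritance chain appends its slots
    after those of its parent.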

    > Of course, *if* someone were to produce a system for building arrayref
    > objects which *didn't* place any limits on their flexibility,


    The idea of using anonymous arrays in order to enable 'nested
    superclass attributes' is inherently limited to single-inheritance
    data inheritance. This implies that everything which can be done in
    Java can be done this way in Perl, too, and even more easily, because
    contortions like 'interfaces' aren't needed. Considering that this
    language isn't exactly unused, discarding single inheritance out of
    hand as 'not good enough for me' seems a little preposterous. And -
    since Perl isn't a "one size fits all, and users are fitted to it as
    necessary" language - the option to use another object representation
    where it actually matters is always available.

    [...]

    > To change the subject a little back to something more interesting,
    > here's a sketch of an idea I was playing with a while ago for a
    > completely different way of representing attributes.


    [...]

    > package Foo;
    > use Object::Closures;
    >
    > BUILD {
    > my $att = $_[0];
    >
    > method get_att => sub { $att };
    > method set_att => sub { $att = $_[0] };
    > };
    >
    > sub new { construct @_ }
    >
    > The most important disadvantages are that it makes method calls even
    > more expensive,


    According to some text I read on the web some years ago[*], the main
    problem with this is that every object uses two additional lexical
    pads for each of its attributes handled in this way (a hand-rolled
    sketch of the pattern follows after the footnote).

    [*] Unfortunately, I can't quote it here because doing so would
    seriously upset 'certain people' who prefer to keep their heads
    firmly dug into the sand w.r.t. what is or isn't freely available on
    the internet (meaning they claim that this 'free availability' is a
    serious problem but - for some reason - don't consider it worthwhile
    to do anything about it ...).
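
    The closure-per-attribute pattern quoted above can be hand-rolled
    without the module; the sketch below (a blessed hash of code refs
    plus AUTOLOAD dispatch, with made-up names) is one guess at such an
    implementation, not Object::Closures' actual mechanism. It also shows
    where the extra pads come from: every attribute ends up captured by
    two anonymous subs.

    -----------
    use strict;
    use warnings;

    package ClosureObj;

    sub new {
        my ($class, $att) = @_;
        my %methods = (
            get_att => sub { $att },           # first closure over $att
            set_att => sub { $att = shift },   # second closure over $att
        );
        return bless \%methods, $class;
    }

    our $AUTOLOAD;
    sub AUTOLOAD {
        my $self = shift;
        (my $name = $AUTOLOAD) =~ s/.*:://;
        return if $name eq 'DESTROY';
        my $code = $self->{$name} or die "no method '$name'";
        return $code->(@_);
    }

    package main;

    my $obj = ClosureObj->new(42);
    print $obj->get_att, "\n";   # 42
    $obj->set_att(7);
    print $obj->get_att, "\n";   # 7
    ------------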
    Rainer Weikusat, Mar 21, 2013
    #5
  6. Re: Object attribute representation

    Ben Morrow <> writes:

    [...]

    > a. Who, exactly, do you think cares about this little rant?


    Since you apparently can't get over this 'efficient' habit, any
    attempt at discussing anything with you is evidently a waste of time.
    Rainer Weikusat, Mar 22, 2013
    #6
  7. Re: Object attribute representation

    On Fri, 22 Mar 2013 03:01:40 +0000, Ben Morrow <> wrote:

    BM> [Rainer's] habit of continually digressing into unpleasant remarks
    BM> about someone or something irrelevant is starting to get on my
    BM> nerves.

    Phew, I thought I was the only one annoyed by that. AWKWARD!

    Ted
    Ted Zlatanov, Mar 22, 2013
    #7
