a trival array/ hash benchmark

Discussion in 'Perl Misc' started by Rainer Weikusat, Mar 20, 2013.

  1. When running the trivial microbenchmark

    -----------
    use Benchmark;

    my $h = { Worschtsupp => 4 };
    my $a = [4];

    timethese(-5,
    {
    h => sub { return $h->{Worschtsupp}; },
    a => sub { return $a->[0]; }});
    ------------

    on 'some computer', the result of three runs was that the hash lookup
    ran at about 28.31% of the speed of the array access on average.
    10,000,000 hash lookups are needed in order to spend 1s of processing
    time solely on that and about 33,333,333 could have been done in the
    same time.
     
    Rainer Weikusat, Mar 20, 2013
    #1
    1. Advertisements

  2. They are not really meaningful anywhere except on the computer where I
    tested this _and_ with the perl version where I tested this.
    Eg, this suggests that 5.14.2 does something more intelligent (for a
    certain definition of 'intelligent') for a static hash lookup than 'do
    it from scratch every time', possibly even because hordes of "stupid
    hash users" (like herds of mooses) made it more worthwhile to put an
    effort into this. This could lead to an 'interesting' arms race where
    some people devise more elaborate benchmarks in order to 'catch them
    red-handed' while the people supposed to be caught devise more
    elaborate ways to fool benchmarks.

    The point of this posting was mainly to provide others with a way to
    determine what the difference is for them with some simple test case
    and to demonstrate that there's a difference at all (When in court, at
    least in a civil case, a 'good' lawyer in a losing position will
    simply deny everything the other party claims[*], including that the
    sun rises in the morning, because this means everything had to be
    proven which ensures that the case will grind down to a halt in
    technicalities 'for a long time'. This tactic is even more useful to
    cover unwelcome statements on USENET with surreptitious noise so that
    nobody hears them).

    [*] Real world example: I had the mispleasure to be in a court case
    because somebody ran his car into my car because he ignored certain
    traffic sign some ten years ago. Among the things which were denied
    by the lawyer of the other party was that a road running uphill to the
    next village right at the edge of the town would actually run uphill
    and that the ignored traffic sign stood there at all.
     
    Rainer Weikusat, Mar 21, 2013
    #2
    1. Advertisements

  3. This includes all kinds of unrelated effects in the test case and, as
    you yourself admitted ('as I suspected ...'), even unrelated effects
    you specifically included because you expected them to render the
    result useless for comparing hash and array lookups. Unsurprisingly,
    it became (almost) useless for this particular purpose.

    [numbers deleted]
    Putting this in another way: The overhead of decoupling the so-called
    'classs methods' from 'the object representation' with a layer of
    'more privileged core class methods' aka 'accessors' is so high that
    the actual representation doesn't matter anymore which means that the
    theoretical benefit of that ('the representation could be changed
    without affecting the 'set of general-purpose subroutines needlessly
    tied to the so-called class) isn't worthwhile: The cure is worse than
    the disease.
     
    Rainer Weikusat, Mar 21, 2013
    #3
  4. Two things which kept bothering me about this:

    1. My statement above is a little disingenious in this context: The
    method call overhead is higher and the 'The cure ...' is - at best -
    an attempt to change the topic of the converstation and - at worst -
    some piece of not that easily discardable nonsense (both
    intentionally).

    2. The 'Like most ...' statement is wrong: The example code I gave
    should be identical to the code in a typical 'getter' method.
     
    Rainer Weikusat, Mar 21, 2013
    #4
  5. This 'overall performance' idea is too simplistic. Eg, assume the
    application is a server whose purpose is to make 'policy descisions'
    for HTTP requests, that is, deny or allow them based on 'some
    criterion' like 'URL black- or whitelisted' or 'classified as
    belonging to a page of content-category ...'. Ideally, 'making the
    policy descision shouldn't add any noticeably latency to requests AND
    the measurable latency should be so small that it is possible to point
    people at the fact that this can't be causing their 'internet
    performance problems' even despite they suspect otherwise (which they
    are going to do). Also, the descision really shouldn't add any latency
    because only one such question can be processed at any given time and
    this then becomes a question of 'how many requests can be processed per
    second', considering that adding a latency of 1s is WAY too much. In
    addition, all of this may need to run on relatively low-spec (that
    is, cheap to mass-produce, hardware) and people are going to expect
    that it will support at least a couple of thousands of concurrent
    users (this is not a contrived example).

    Taking this into account, what is more worthwhile? Using
    datastructures with minimally faster access times (and less memory
    requirements) or 'genuflecting before the god of code bumming', that
    is "de-structuring the code" by 'inlining everything which can
    conceivably be inlined'. Do I rather put up with the perceived
    limitations of the former in order to avoid the latter for as long as
    humanly possible, ie, am I willing to make a concsious effort to avoid
    wasteing time on stuff where 'wasting time' doesn't buy me anything so
    that I can afford to waste time on stuff where it does by me
    something?

    In line with this reflection, using anonymous arrays also enables
    creating objects which retain direct access to their properties
    without ad hoc hackery like

    use constant NAME => __PACKAGE__.'name';

    $h->{+NAME} = ...

    because, just like structures in C, arrays provide an ordered set of
    slots which means that something like 'use the next five free slots'
    is possible which can't be done with hashes in a straight-forward way.
    The idea to use anonymous arrays in order to enable 'nested superclass
    attributes' is inherently limited to single-inheritance data
    inheritance. This implies that everything which can be done in Java
    can be done in this way in Perl, too, and even easier because
    contortions like 'interfaces' aren't needed. Considering that this
    language isn't entirely unused, discarding single-inheritance out of
    hand as 'not good enough for me' seems a little preposterous. And -
    since Perl isn't a "one size for all and users are fitted to it as
    necessary" - the option to use other object representation where it
    actually matters is always available.

    [...]
    According to some text I read on the web some
    years ago[*], the main problem with this is that every object uses
    two additional lexical pads for each of its attributes in this way.

    [*] Unfortunately, I can't quote this here because doing so would
    seriously upset 'certain people' who prefer keeping their heads firmly
    dug into the sand wrt what is or isn't available for free on the
    internet (this means they claim that this 'free availability' would
    be a serious problems but - for some reason - they don't consider it
    worthwhile to do something against it ...).
     
    Rainer Weikusat, Mar 21, 2013
    #5
  6. [...]
    Since you apparently can't get over this 'efficient' habit, any
    attempt at discussing anyhting with you is evidently a waste of time.
     
    Rainer Weikusat, Mar 22, 2013
    #6
  7. Rainer Weikusat

    Ted Zlatanov Guest

    BM> [Rainer's] habit of continually digressing into unpleasant remarks
    BM> about someone or something irrelevant is starting to get on my
    BM> nerves.

    Phew, I thought I was the only one annoyed by that. AWKWARD!

    Ted
     
    Ted Zlatanov, Mar 22, 2013
    #7
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.