make faster Richards benchmark

Discussion in 'Python' started by Duncan Lissett, May 13, 2004.

  1. I'd appreciate any suggestions on how to make faster Python
    implementations of Richards benchmark. Perhaps there are obvious
    problems that can be corrected?

    http://www.lissett.com/ben/bench1.htm
    Duncan Lissett, May 13, 2004
    #1
    1. Advertising

  2. Duncan Lissett

    Peter Maas Guest

    Duncan Lissett wrote:
    > I'd appreciate any suggestions on how to make faster Python
    > implementations of Richards benchmark. Perhaps there are obvious
    > problems that can be corrected?


    The most obvious problem: how does the Richards benchmark work?
    Care to post the source code?

    Mit freundlichen Gruessen,

    Peter Maas

    --
    -------------------------------------------------------------------
    Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
    Tel +49-241-93878-0 Fax +49-241-93878-20 eMail
    -------------------------------------------------------------------
    Peter Maas, May 13, 2004
    #2
    1. Advertising

  3. Duncan Lissett

    Josef Meile Guest

    Duncan Lissett wrote:
    > I'd appreciate any suggestions on how to make faster Python
    > implementations of Richards benchmark. Perhaps there are obvious
    > problems that can be corrected?
    >
    > http://www.lissett.com/ben/bench1.htm

    What's about including a second python implementation of the Richards
    benchmark using psyco? You don't have to modify your code, you only have
    to add two lines. It would be also interesting to see the differences
    between both source codes.

    Regards,
    Josef
    Josef Meile, May 13, 2004
    #3
  4. Hi !

    On the site, click on the word "Python #92"
    Michel Claveau/Hamster, May 13, 2004
    #4
  5. Duncan Lissett

    Wilk Guest

    (Duncan Lissett) writes:

    > I'd appreciate any suggestions on how to make faster Python
    > implementations of Richards benchmark. Perhaps there are obvious
    > problems that can be corrected?
    >
    > http://www.lissett.com/ben/bench1.htm


    import psyco
    psyco.full()

    2 times faster :)

    --
    Wilk - http://flibuste.net
    Wilk, May 13, 2004
    #5
  6. Duncan Lissett

    Duncan Booth Guest

    (Duncan Lissett) wrote in
    news::

    > I'd appreciate any suggestions on how to make faster Python
    > implementations of Richards benchmark. Perhaps there are obvious
    > problems that can be corrected?


    That should say "the Richards benchmark", named for Martin Richards.

    >
    > http://www.lissett.com/ben/bench1.htm


    Well, if I was going to do a Python implementation of this I would want to
    look at what the benchmark is trying to achieve, and then program it fresh
    from the ground up instead of trying to ape some other language. In
    particular I expect that the classes with 'run' methods would probably
    become generators with most or all of their state held as local variables.

    For example, imagining that all the 'run' methods have first been renamed
    'next', IdleTask becomes (untested):

    def IdleTask(scheduler, v1, v2):
    for i in range(v2):
    if v1 & 1:
    v1 = (v1 >> 1) ^ 0xD008
    yield scheduler.release(DEVICEB)
    else:
    v1 = v1 >> 1
    yield scheduler.release(DEVICEA)
    yield scheduler.holdCurrent()

    Next most obvious thing is to junk the linked lists and replace them with
    ordinary Python builtin lists. Pass the relevant list around instead of the
    id constants 'DEVICEA', 'DEVICEB' and you can get rid of all that lookup
    overhead. This involves quite a major restructuring of the scheduler
    though: hence my comment about understanding what the code is trying to
    achieve and starting from the ground up.

    Packet.addTo should look more like:

    def addTo(self, queue):
    queue.append(self)

    and then vapourise entirely.


    Minor points:

    Remove the pointless use of the 'global' keyword in various places. Replace
    the traceOn variable with __debug__ so you get the same benefits as
    compiled languages by optimising out the test for the trace statements.

    Remove the pointless set/get methods and just access the members directly.
    If you are writing a benchmark they will cripple performance.

    Avoiding multiple accesses to the same instance variable, or assigning to
    instance variables until you are about to return from a method: use a local
    during the execution of the method.
    Duncan Booth, May 13, 2004
    #6
  7. Duncan Lissett

    Peter Hansen Guest

    Josef Meile wrote:

    > Duncan Lissett wrote:
    >
    >> I'd appreciate any suggestions on how to make faster Python
    >> implementations of Richards benchmark. Perhaps there are obvious
    >> problems that can be corrected?
    >>
    >> http://www.lissett.com/ben/bench1.htm

    >
    > What's about including a second python implementation of the Richards
    > benchmark using psyco? You don't have to modify your code, you only have
    > to add two lines. It would be also interesting to see the differences
    > between both source codes.


    I get an immediate 38% speedup by doing "import psyco; psyco.full()"
    at the start of the benchmark.

    -Peter
    Peter Hansen, May 13, 2004
    #7
  8. Duncan Lissett

    Peter Maas Guest

    Michel Claveau/Hamster wrote:
    > On the site, click on the word "Python #92"


    Thanks. Oh, how silly of me. I didn't recognize these as links,
    still tuned to underscore links.

    Mit freundlichen Gruessen,

    Peter Maas

    --
    -------------------------------------------------------------------
    Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
    Tel +49-241-93878-0 Fax +49-241-93878-20 eMail
    -------------------------------------------------------------------
    Peter Maas, May 13, 2004
    #8
  9. Duncan Booth <> wrote in message news:<Xns94E8683D9EEF2duncanrcpcouk@127.0.0.1>...

    -snip-
    > Well, if I was going to do a Python implementation of this I would want to
    > look at what the benchmark is trying to achieve, and then program it fresh
    > from the ground up instead of trying to ape some other language. In
    > particular I expect that the classes with 'run' methods would probably
    > become generators with most or all of their state held as local variables.
    >
    > For example, imagining that all the 'run' methods have first been renamed
    > 'next', IdleTask becomes (untested):
    >
    > def IdleTask(scheduler, v1, v2):
    > for i in range(v2):
    > if v1 & 1:
    > v1 = (v1 >> 1) ^ 0xD008
    > yield scheduler.release(DEVICEB)
    > else:
    > v1 = v1 >> 1
    > yield scheduler.release(DEVICEA)
    > yield scheduler.holdCurrent()
    >
    > Next most obvious thing is to junk the linked lists and replace them with
    > ordinary Python builtin lists. Pass the relevant list around instead of the
    > id constants 'DEVICEA', 'DEVICEB' and you can get rid of all that lookup
    > overhead. This involves quite a major restructuring of the scheduler
    > though: hence my comment about understanding what the code is trying to
    > achieve and starting from the ground up.
    >
    > Packet.addTo should look more like:
    >
    > def addTo(self, queue):
    > queue.append(self)
    >
    > and then vapourise entirely.
    >
    >
    > Minor points:
    >
    > Remove the pointless use of the 'global' keyword in various places. Replace
    > the traceOn variable with __debug__ so you get the same benefits as
    > compiled languages by optimising out the test for the trace statements.


    Thanks, these are all interesting suggestions.


    > Remove the pointless set/get methods and just access the members directly.
    > If you are writing a benchmark they will cripple performance.
    >
    > Avoiding multiple accesses to the same instance variable, or assigning to
    > instance variables until you are about to return from a method: use a local
    > during the execution of the method.


    The pointless set/get methods are pointless in the other language
    implementations as well - that's the point. (And I'll cripple that
    Oberon-2 implementation real soon by enforcing privacy with modules
    and set/get procedures.)

    OTOH there's definitely a place for a truly Pythonic implementation
    here:
    http://www.lissett.com/ben/bench3.htm

    best wishes, Duncan
    Duncan Lissett, May 13, 2004
    #9
  10. Michel Claveau/Hamster, May 13, 2004
    #10
  11. Wilk <> wrote in message news:<>...
    > (Duncan Lissett) writes:
    >
    > > I'd appreciate any suggestions on how to make faster Python
    > > implementations of Richards benchmark. Perhaps there are obvious
    > > problems that can be corrected?
    > >
    > > http://www.lissett.com/ben/bench1.htm

    >
    > import psyco
    > psyco.full()
    >
    > 2 times faster :)


    And Simon: "I just tried the benchmark with Psyco, and cut the run
    time for input=10000 from 8.1 seconds to 3.4"

    Joseph, Simon, Wilk: thanks for the Psyco suggestion, and special
    thanks for going ahead and trying it - we get slightly better than x2.

    Given how little I know about Python, my assumption is that there's
    scope for x10 improvement by writing better code... a more Pythonic
    version for http://www.lissett.com/ben/bench3.htm

    Suggestions appreciated.

    best wishes, Duncan
    Duncan Lissett, May 13, 2004
    #11
  12. Duncan Lissett

    Peter Hansen Guest

    Duncan Lissett wrote:

    > Wilk <> wrote in message news:<>...
    >
    >> (Duncan Lissett) writes:
    >>
    >>
    >>>I'd appreciate any suggestions on how to make faster Python
    >>>implementations of Richards benchmark. Perhaps there are obvious
    >>>problems that can be corrected?
    >>>
    >>>http://www.lissett.com/ben/bench1.htm

    >>
    >>import psyco
    >>psyco.full()
    >>
    >>2 times faster :)

    >
    > And Simon: "I just tried the benchmark with Psyco, and cut the run
    > time for input=10000 from 8.1 seconds to 3.4"
    >
    > Joseph, Simon, Wilk: thanks for the Psyco suggestion, and special
    > thanks for going ahead and trying it - we get slightly better than x2.


    What platform is this on? On WinXP, with an AMD 2500+ chip, and lots
    of memory etc., I'm still getting only 38% speedup, nowhere near 100%...

    -Peter
    Peter Hansen, May 13, 2004
    #12
  13. On Thu, May 13, 2004 at 09:15:58AM -0700, Duncan Lissett wrote:
    > Duncan Booth <> wrote in message news:<Xns94E8683D9EEF2duncanrcpcouk@127.0.0.1>...
    > > Remove the pointless set/get methods and just access the members directly.
    > > If you are writing a benchmark they will cripple performance.
    > >
    > > Avoiding multiple accesses to the same instance variable, or assigning to
    > > instance variables until you are about to return from a method: use a local
    > > during the execution of the method.

    >
    > The pointless set/get methods are pointless in the other language
    > implementations as well - that's the point. (And I'll cripple that
    > Oberon-2 implementation real soon by enforcing privacy with modules
    > and set/get procedures.)


    I would leave python and some other languages out of the comparison.
    You can do near line-for-line translations for languages in the same class
    - say C++ and Java, or Assembly and C. Requiring the python/perl/ruby
    versions to look like a C++ program just measures how badly C++ maps to
    Python, and not much else.

    I've even seen some line-for-line translations in production use that
    use python list-of-strings where the orignal version used char arrarys.
    You can imagine what that does for performance *shudder*.

    Writing benchmarks is just hard, if you allow people to solve the problem
    in whatever way they like you end up measuring how good a coder the
    language A guy is compared to the language B submitter.

    -jackdied
    Jack Diederich, May 13, 2004
    #13
  14. Duncan Lissett

    Peter Hansen Guest

    Jack Diederich wrote:

    > On Thu, May 13, 2004 at 09:15:58AM -0700, Duncan Lissett wrote:
    >>The pointless set/get methods are pointless in the other language
    >>implementations as well - that's the point.

    >
    > I would leave python and some other languages out of the comparison.
    > You can do near line-for-line translations for languages in the same class
    > - say C++ and Java, or Assembly and C. Requiring the python/perl/ruby
    > versions to look like a C++ program just measures how badly C++ maps to
    > Python, and not much else.
    >
    > I've even seen some line-for-line translations in production use that
    > use python list-of-strings where the orignal version used char arrarys.
    > You can imagine what that does for performance *shudder*.
    >
    > Writing benchmarks is just hard, if you allow people to solve the problem
    > in whatever way they like you end up measuring how good a coder the
    > language A guy is compared to the language B submitter.


    In the case of the Richards benchmark, it's better to say "specifying
    benchmarks is hard". The problem that I saw with it is that it
    describes the problem in terms that are inappropriate for many of the
    languages it has been translated to, effectively constraining the
    implementation in ways that are more suitable to certain languages
    than others.

    One key example is the requirement for "global registers" which act
    similarly to how the low-level registers in the CPU have to be handled,
    specifically by saving and restoring them in a task-local context
    whenever a task switch occurs. This is clearly what Richards really
    wanted, but I'd argue that the approach is ill-conceived, especially
    in this century...

    On the other hand, if the specification had been more thoroughly
    thought out, or perhaps modernized is a more appropriate way of looking
    at it, then it would describe the expected *use and behaviour* of the
    tasks in terms of black box testability... in other words, it shouldn't
    matter that "global registers" are used, but merely that the
    benchmark produces certain results.

    Without those tests, it's fairly open to interpretation just how
    much deviation from the original BCPL implementation is appropriate.
    Duncan L's ideas for a generator-based approach are interesting,
    though there are probably those who would argue that it wouldn't
    follow the benchmark specs exactly. (Especially if it showed
    performance much closer to C. <wink>)

    -Peter
    Peter Hansen, May 13, 2004
    #14
  15. Duncan Booth <> wrote in message news:<Xns94E8683D9EEF2duncanrcpcouk@127.0.0.1>...

    -snip-
    > Next most obvious thing is to junk the linked lists and replace them with
    > ordinary Python builtin lists.


    After the Packet links were replace with a Python list the code was
    ~1% slower...

    > Replace the traceOn variable with __debug__ so you get the same benefits as
    > compiled languages by optimising out the test for the trace statements.


    Using __debug__ made the code ~1% faster

    > Remove the pointless set/get methods and just access the members directly.


    As a special dispensation to Python, directly accessed Packet and Tcb
    fields, ~10% faster


    > Avoiding multiple accesses to the same instance variable, or assigning to
    > instance variables until you are about to return from a method: use a local
    > during the execution of the method.


    Seems we're pushing into code optimization rather than not coding
    badly.

    Being careful about the tests in conditionals, and reordering some
    conditionals sped things up some more. (I should roll the same changes
    into the other langauge implementations.)

    The OO style implementation took 104.4s and now takes 87.1s

    I took the OO style implementation, made the methods of the scheduler
    class into functions, made the fields into globals, and made the Task
    class run methods into top-level functions.

    The C style implementation took 114.9s and the OO conversion takes
    80.3s
    http://www.lissett.com/ben/bench3.htm

    Further suggestions are welcome (yes, I will try pysco later)
    Duncan Lissett, May 14, 2004
    #15
  16. Jack Diederich <> wrote in message news:<>...

    > I would leave python and some other languages out of the comparison.
    > You can do near line-for-line translations for languages in the same class
    > - say C++ and Java, or Assembly and C. Requiring the python/perl/ruby
    > versions to look like a C++ program just measures how badly C++ maps to
    > Python, and not much else.


    There's no requirement that Python versions look like a C++ program.
    Peter Hansen chose to write a Python version closely based on the C
    implementation. I've now replaced that with a faster version initially
    based on a Smalltalk OO implementation.

    -snip-
    > Writing benchmarks is just hard, if you allow people to solve the problem
    > in whatever way they like you end up measuring how good a coder the
    > language A guy is compared to the language B submitter.


    Maybe the hardest thing about benchmarks is keeping some perspective
    on what they might and might not mean ;-)

    Yes, the fine performance of the BCPL interpreter implementation (and
    the C implementation) has a lot to do with programmer skill. But I'm
    not that sure about the other versions ;-)
    Duncan Lissett, May 14, 2004
    #16
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Peter Hansen

    Re: Richards bench benchmark

    Peter Hansen, Apr 2, 2004, in forum: Python
    Replies:
    10
    Views:
    456
    Duncan Lissett
    May 7, 2004
  2. Simon Wittber

    RE: make faster Richards benchmark

    Simon Wittber, May 14, 2004, in forum: Python
    Replies:
    4
    Views:
    305
    Duncan Lissett
    May 17, 2004
  3. Duncan Lissett

    make faster Richards benchmark

    Duncan Lissett, May 24, 2004, in forum: Ruby
    Replies:
    11
    Views:
    240
    Duncan Lissett
    Jun 1, 2004
  4. Farhad Farzaneh

    Does this benchmark make sense?

    Farhad Farzaneh, May 31, 2007, in forum: Ruby
    Replies:
    3
    Views:
    118
    Brian Candler
    May 31, 2007
  5. Melzzzzz
    Replies:
    2
    Views:
    322
    Melzzzzz
    Oct 4, 2012
Loading...

Share This Page