ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++compiler

Discussion in 'Python' started by Mark Dufour, Jul 19, 2009.

  1. Mark Dufour

    Mark Dufour Guest

    Hi all,

    I have just released version 0.2 of Shed Skin, an experimental
    (restricted) Python-to-C++ compiler (http://shedskin.googlecode.com).
    It comes with 7 new example programs (for a total of 40 example
    programs, at over 12,000 lines) and several important improvements/bug
    fixes. See http://code.google.com/p/shedskin/wiki/ReleaseNotes for the
    full changelog.

    The new example programs consist of Disco, an elegant go player (see
    http://shed-skin.blogspot.com/2009/07/disco-elegant-python-go-player.html),
    a larger Voronoi implementation at 800 lines, a TSP algorithm
    simulating ant colonies, a nicer neural network algorithm and three
    compressors (Lempel-Ziv, huffman block, and arithmetic).

    Other than bug fixes for these programs, this release adds some
    important optimizations. First and foremost, inlining was greatly
    improved, resulting in potential speedups across the board. Second,
    loops such as 'for a, b in enumerate/zip(sequence[, sequence])' should
    now be dramatically faster (also inside list comprehensions), by
    avoiding allocation of intermediate tuples. Finally, basic list
    slicing should now be much faster.


    Please try it out!
    Mark Dufour.
    --
    "One of my most productive days was throwing away 1000 lines of code"
    - Ken Thompson
     
    Mark Dufour, Jul 19, 2009
    #1
    1. Advertising

  2. Mark Dufour

    William Dode Guest

    On 19-07-2009, Mark Dufour wrote:
    > Hi all,
    >
    > I have just released version 0.2 of Shed Skin, an experimental
    > (restricted) Python-to-C++ compiler (http://shedskin.googlecode.com).


    I just tested it with a litle game, to find the places of horse on
    a board 5x5. The result is :

    c 5s
    gcj 7s
    java 7s
    shedskin 8s
    python + psyco 18s
    cython avec malloc *int 18s
    cython 55s avec [] python
    python 303s (5m3s)

    --
    William Dodé - http://flibuste.net
    Informaticien Indépendant
     
    William Dode, Jul 20, 2009
    #2
    1. Advertising

  3. William Dode wrote:
    > On 19-07-2009, Mark Dufour wrote:
    >> I have just released version 0.2 of Shed Skin, an experimental
    >> (restricted) Python-to-C++ compiler (http://shedskin.googlecode.com).

    >
    > I just tested it with a litle game, to find the places of horse on
    > a board 5x5. The result is :
    >
    > [...]
    > shedskin 8s
    > python + psyco 18s
    > cython avec malloc *int 18s
    > cython 55s avec [] python
    > python 303s (5m3s)


    Note that both Psyco and Cython make a lot less assumptions about Python
    code than Shed Skin does. Psyco has the advantage of just needing to jump
    in when it finds out that it can help, so it's the most broadly compatible
    of the three. But Cython also supports quite a large corpus of dynamic
    Python code by now. Shed Skin has a lot of restrictions, many of which are
    by design. It's not intended to compile dynamic code, and I think that's a
    good thing, because that's what makes it fast for the code that it
    supports. Getting the same speed in Cython requires a bit more explicit
    typing, simply because Cython does not assume these restrictions.

    I think that all three have their raison d'être, and it currently looks
    like all three are there to stay and to keep growing better. And I'm also
    happy to read that some optimisations jump from one to the other. ;)

    Stefan
     
    Stefan Behnel, Jul 20, 2009
    #3
  4. Mark Dufour

    Bearophile Guest

    William Dode':
    > I just tested it with a litle game, to find the places of horse on
    > a board 5x5. The result is :
    >
    > c 5s
    > gcj 7s
    > java 7s
    > shedskin 8s
    > python + psyco 18s
    > cython avec malloc *int 18s
    > cython 55s avec [] python
    > python 303s (5m3s)


    Nice timings, can you please show me the Python, Java and C code
    versions? I may do more tests.
    The purpose of all those "example programs" in ShedSkin is to find
    bugs and to find details to speedup.

    Bye,
    bearophile
     
    Bearophile, Jul 20, 2009
    #4
  5. Mark Dufour

    Guest

    William> c 5s
    William> gcj 7s
    William> java 7s
    William> shedskin 8s
    William> python + psyco 18s
    William> cython avec malloc *int 18s
    William> cython 55s avec [] python
    William> python 303s (5m3s)

    I read just enough French to know that "avec" means "with", but I don't
    understand the difference between "avec malloc *int" and "avec []". Can you
    explain please?

    Thx,

    --
    Skip Montanaro - - http://www.smontanaro.net/
    That's more than a dress. That's an Audrey Hepburn movie. -- Jerry Maguire
     
    , Jul 20, 2009
    #5
  6. Mark Dufour

    Bearophile Guest

    Skip Montanaro:
    > I read just enough French to know that "avec" means "with", but I don't
    > understand the difference between "avec malloc *int" and "avec []".  Can you
    > explain please?


    Maybe it's the time difference between using a Python list from Cython
    and using a C "array" allocated with a malloc from Cython.

    Bye,
    bearophile
     
    Bearophile, Jul 20, 2009
    #6
  7. Mark Dufour

    srepmub Guest


    > Nice timings, can you please show me the Python, Java and C code
    > versions? I may do more tests.


    also, which shedskin options did you use? did you use -bw, to disable
    bounds and wrap-around checking? this can make quite a difference for
    code that does a lot of indexing. if the code uses random numbers,
    then -r can also make a big difference, to use C rand(), instead of
    Python compatible random numbers.

    and which C++ compiler flags did you use? the default -O2, or
    something like -O3 -fomit-frame-pointer -msse2..?


    thanks,
    mark.
     
    srepmub, Jul 20, 2009
    #7
  8. Mark Dufour

    William Dode Guest

    On 20-07-2009, srepmub wrote:
    >
    >> Nice timings, can you please show me the Python, Java and C code
    >> versions? I may do more tests.


    Of course, the codes are here :

    http://hg.flibuste.net/libre/games/cheval

    Like you'll see, i tried to use exactly the same code for each langage.

    >
    > also, which shedskin options did you use? did you use -bw, to disable
    > bounds and wrap-around checking? this can make quite a difference for
    > code that does a lot of indexing. if the code uses random numbers,
    > then -r can also make a big difference, to use C rand(), instead of
    > Python compatible random numbers.
    >
    > and which C++ compiler flags did you use? the default -O2, or
    > something like -O3 -fomit-frame-pointer -msse2..?


    I used the default, shedksin cheval.py; make
    shedskin 0.2

    With -bw and -O3 -fomit-frame-pointer -msse2 i have 5.5s (instead of 8)

    Let me know if you find better.

    --
    William Dodé - http://flibuste.net
    Informaticien Indépendant
     
    William Dode, Jul 21, 2009
    #8
  9. Mark Dufour

    William Dode Guest

    On 20-07-2009, Bearophile wrote:
    > Skip Montanaro:
    >> I read just enough French to know that "avec" means "with", but I don't
    >> understand the difference between "avec malloc *int" and "avec []".  Can you
    >> explain please?

    >
    > Maybe it's the time difference between using a Python list from Cython
    > and using a C "array" allocated with a malloc from Cython.


    yes, it's this

    --
    William Dodé - http://flibuste.net
    Informaticien Indépendant
     
    William Dode, Jul 21, 2009
    #9
  10. Mark Dufour

    srepmub Guest


    > With -bw and -O3 -fomit-frame-pointer -msse2 i have 5.5s (instead of 8)
    >
    > Let me know if you find better.


    thanks. now I'm wondering how fast does the C version become with
    these flags..? :)


    mark.
     
    srepmub, Jul 21, 2009
    #10
  11. Mark Dufour

    William Dode Guest

    On 21-07-2009, srepmub wrote:
    >
    >> With -bw and -O3 -fomit-frame-pointer -msse2 i have 5.5s (instead of 8)
    >>
    >> Let me know if you find better.

    >
    > thanks. now I'm wondering how fast does the C version become with
    > these flags..? :)


    I don't see any difference...

    --
    William Dodé - http://flibuste.net
    Informaticien Indépendant
     
    William Dode, Jul 21, 2009
    #11
  12. Mark Dufour

    Bearophile Guest

    Nick Craig-Wood:
    > Can you give a hint as to how I debug this?  I presume my program has
    > some instances of non static types which is causing the problem, but
    > it is going to be a very long debugging session if it takes me an hour
    > each cycle ;-)
    >
    > The program is about 700 lines of python (excluding comments).


    You can show us the Python (SSPython) code, and we can try to find the
    problem. Sometimes there's no simple ways to solve such problems.

    Generally for not very large progrograms if SS doesn't compile in
    about a minute or so then it's gone in infinite loop (there's a
    compilation flag that avoids some infinite loops, try it).

    Bye,
    bearophile
     
    Bearophile, Jul 21, 2009
    #12
  13. Mark Dufour

    Bearophile Guest

    William Dode':
    > http://hg.flibuste.net/libre/games/cheval
    > Like you'll see, i tried to use exactly the same code for each langage.


    It's a cute solver.

    Few more versions of mine:

    #1, a Psyco version of mine:
    http://codepad.org/9m5rf7kX

    #2, unrolled Psyco version:
    http://codepad.org/gKFLu34M

    #3, a quick D (D1) version:
    http://codepad.org/Tk9FL7Xk

    Timings (no printing), seconds, best of 3:
    #1: 4.79
    #2: 3.67
    #3: 1.10
    Your Psyco version: 13.37
    Your C version, compiled with GCC 4.3.2, -s -O3 -fomit-frame-pointer:
    3.79

    I have timed the whole running time of the programs, with an external
    timer, so my timings include the start of the PythonVM and the final
    cleanup of the GC.

    Please, feel free to time my code again, so we can compare with your
    other timings (Java, etc) better.

    I have used Psyco 1.6 final and Python 2.6.2 on a Core2 CPU at 2 GHz.

    To compile the D1 code you can use the free DMD compiler for D1, I
    have used version 1.042, www.digitalmars.com/d/download.html ).

    In such benchmarks it's often better to start from the fastest low
    level code (written in an imperative language as C/D/C++, or a version
    in a functional language like Clean or OCaML) and then use it to
    create higher level versions (in Python, etc). This offers you a
    baseline timing, and usually the small differences of the higher level
    code compared to the original C/Clean code don't invalidate the test.

    I have changed only small things in the program, as you can see the
    Psyco program #2 is faster than your C code :) If you learn how to
    use it well, Psyco1.6 can often lead to very good performance. But lot
    of people don't know how to use Psyco well (I too am ignorant: I don't
    know how to profile Python programs yet).

    Bye,
    bearophile
     
    Bearophile, Jul 21, 2009
    #13
  14. Mark Dufour

    Terry Reedy Guest

    Nick Craig-Wood wrote:
    >
    > I tried it on a program I wrote to solve a puzzle (Rush Hour).
    > (Actually it solves the meta-puzzle - trying to make the hardest
    > possible Rush Hour puzzle.)
    >
    > After a bit of twiddling (remove psyco and profiling) I got it to
    > start compiling, but unfortunately it compiled for about an hour (in
    > iterative type analysis..) used up 600 MB of RAM printing an '*' every
    > 10 minutes or so then gave an error message and gave up.
    >
    > Unfortunately I shut the window by accident, but the error message was
    > something about not being able to resolve types I think with a list of
    > 20 or so unresolved types.
    >
    > Can you give a hint as to how I debug this? I presume my program has
    > some instances of non static types which is causing the problem, but
    > it is going to be a very long debugging session if it takes me an hour
    > each cycle ;-)
    >
    > The program is about 700 lines of python (excluding comments).


    Split it into pieces and compile each separately.
    Recurse.

    tjr
     
    Terry Reedy, Jul 21, 2009
    #14
  15. Mark Dufour

    greg Guest

    William Dode wrote:

    > I just tested it with a litle game, to find the places of horse on
    > a board 5x5. The result is :
    >
    > cython avec malloc *int 18s


    Posting benchmark times for Pyrex or Cython is pretty
    meaningless without showing the exact code that was
    used, since times can vary enormously depending on
    how much you C-ify things.

    --
    Greg
     
    greg, Jul 22, 2009
    #15
  16. Mark Dufour

    Bearophile Guest

    Bearophile, Jul 22, 2009
    #16
  17. Mark Dufour

    William Dode Guest

    I updated the script (python, c and java) with your unrolled version
    + somes litle thinks.

    I also tried with python3.1, unladen Q2, ironpython1.1.1

    Unfortunately it doesn't work more with shedskin, i'll see on the
    shedskin group...

    c 1.85s
    gcj 2.15s
    java 2.8s
    python2.5 + psyco 3.1s
    unladen-2009Q2 145s (2m45)
    python2.5 254s (4m14s)
    python3.1 300s (5m)
    ironpython1.1.1 680s (11m20)


    --
    William Dodé - http://flibuste.net
    Informaticien Indépendant
     
    William Dode, Jul 22, 2009
    #17
  18. Mark Dufour

    srepmub Guest

    please send any program that doesn't work with shedskin (it still is
    experimental after all) to me, or create an issue at
    shedskin.googlecode.com, and I will have a look at the problem.


    thanks,
    mark.
     
    srepmub, Jul 22, 2009
    #18
  19. Mark Dufour

    William Dode Guest

    William Dode, Jul 22, 2009
    #19
  20. On Jul 22, 7:38 am, William Dode <> wrote:

    > I updated the script (python, c and java) with your unrolled version
    > + somes litle thinks.
    >
    > I also tried with python3.1, unladen Q2, ironpython1.1.1
    >
    > Unfortunately it doesn't work more with shedskin, i'll see on the
    > shedskin group...
    >
    > c 1.85s
    > gcj 2.15s
    > java 2.8s
    > python2.5 + psyco 3.1s
    > unladen-2009Q2 145s (2m45)
    > python2.5 254s (4m14s)
    > python3.1 300s (5m)
    > ironpython1.1.1 680s (11m20)


    Cool; it would be interesting to see the numbers for Jython and Boo as
    well if it's not too much effort.

    George
     
    George Sakkis, Jul 22, 2009
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Mark Dufour
    Replies:
    14
    Views:
    753
    Mark Dufour
    Sep 18, 2005
  2. Mark Dufour
    Replies:
    0
    Views:
    226
    Mark Dufour
    Feb 24, 2008
  3. srepmub
    Replies:
    0
    Views:
    296
    srepmub
    Jun 3, 2008
  4. Mark Dufour
    Replies:
    2
    Views:
    293
    srepmub
    Oct 1, 2008
  5. Mark Dufour
    Replies:
    1
    Views:
    249
    Paul Boddie
    Dec 5, 2008
Loading...

Share This Page