ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++compiler

M

Mark Dufour

Hi all,

I have just released version 0.2 of Shed Skin, an experimental
(restricted) Python-to-C++ compiler (http://shedskin.googlecode.com).
It comes with 7 new example programs (for a total of 40 example
programs, at over 12,000 lines) and several important improvements/bug
fixes. See http://code.google.com/p/shedskin/wiki/ReleaseNotes for the
full changelog.

The new example programs consist of Disco, an elegant go player (see
http://shed-skin.blogspot.com/2009/07/disco-elegant-python-go-player.html),
a larger Voronoi implementation at 800 lines, a TSP algorithm
simulating ant colonies, a nicer neural network algorithm and three
compressors (Lempel-Ziv, huffman block, and arithmetic).

Other than bug fixes for these programs, this release adds some
important optimizations. First and foremost, inlining was greatly
improved, resulting in potential speedups across the board. Second,
loops such as 'for a, b in enumerate/zip(sequence[, sequence])' should
now be dramatically faster (also inside list comprehensions), by
avoiding allocation of intermediate tuples. Finally, basic list
slicing should now be much faster.


Please try it out!
Mark Dufour.
 
W

William Dode

Hi all,

I have just released version 0.2 of Shed Skin, an experimental
(restricted) Python-to-C++ compiler (http://shedskin.googlecode.com).

I just tested it with a litle game, to find the places of horse on
a board 5x5. The result is :

c 5s
gcj 7s
java 7s
shedskin 8s
python + psyco 18s
cython avec malloc *int 18s
cython 55s avec [] python
python 303s (5m3s)
 
S

Stefan Behnel

William said:
I have just released version 0.2 of Shed Skin, an experimental
(restricted) Python-to-C++ compiler (http://shedskin.googlecode.com).

I just tested it with a litle game, to find the places of horse on
a board 5x5. The result is :

[...]
shedskin 8s
python + psyco 18s
cython avec malloc *int 18s
cython 55s avec [] python
python 303s (5m3s)

Note that both Psyco and Cython make a lot less assumptions about Python
code than Shed Skin does. Psyco has the advantage of just needing to jump
in when it finds out that it can help, so it's the most broadly compatible
of the three. But Cython also supports quite a large corpus of dynamic
Python code by now. Shed Skin has a lot of restrictions, many of which are
by design. It's not intended to compile dynamic code, and I think that's a
good thing, because that's what makes it fast for the code that it
supports. Getting the same speed in Cython requires a bit more explicit
typing, simply because Cython does not assume these restrictions.

I think that all three have their raison d'être, and it currently looks
like all three are there to stay and to keep growing better. And I'm also
happy to read that some optimisations jump from one to the other. ;)

Stefan
 
B

Bearophile

William Dode':
I just tested it with a litle game, to find the places of horse on
a board 5x5. The result is :

c 5s
gcj 7s
java 7s
shedskin 8s
python + psyco 18s
cython avec malloc *int 18s
cython 55s avec [] python
python 303s (5m3s)

Nice timings, can you please show me the Python, Java and C code
versions? I may do more tests.
The purpose of all those "example programs" in ShedSkin is to find
bugs and to find details to speedup.

Bye,
bearophile
 
S

skip

William> c 5s
William> gcj 7s
William> java 7s
William> shedskin 8s
William> python + psyco 18s
William> cython avec malloc *int 18s
William> cython 55s avec [] python
William> python 303s (5m3s)

I read just enough French to know that "avec" means "with", but I don't
understand the difference between "avec malloc *int" and "avec []". Can you
explain please?

Thx,
 
B

Bearophile

Skip Montanaro:
I read just enough French to know that "avec" means "with", but I don't
understand the difference between "avec malloc *int" and "avec []".  Can you
explain please?

Maybe it's the time difference between using a Python list from Cython
and using a C "array" allocated with a malloc from Cython.

Bye,
bearophile
 
S

srepmub

Nice timings, can you please show me the Python, Java and C code
versions? I may do more tests.

also, which shedskin options did you use? did you use -bw, to disable
bounds and wrap-around checking? this can make quite a difference for
code that does a lot of indexing. if the code uses random numbers,
then -r can also make a big difference, to use C rand(), instead of
Python compatible random numbers.

and which C++ compiler flags did you use? the default -O2, or
something like -O3 -fomit-frame-pointer -msse2..?


thanks,
mark.
 
W

William Dode

Of course, the codes are here :

http://hg.flibuste.net/libre/games/cheval

Like you'll see, i tried to use exactly the same code for each langage.
also, which shedskin options did you use? did you use -bw, to disable
bounds and wrap-around checking? this can make quite a difference for
code that does a lot of indexing. if the code uses random numbers,
then -r can also make a big difference, to use C rand(), instead of
Python compatible random numbers.

and which C++ compiler flags did you use? the default -O2, or
something like -O3 -fomit-frame-pointer -msse2..?

I used the default, shedksin cheval.py; make
shedskin 0.2

With -bw and -O3 -fomit-frame-pointer -msse2 i have 5.5s (instead of 8)

Let me know if you find better.
 
W

William Dode

Skip Montanaro:
I read just enough French to know that "avec" means "with", but I don't
understand the difference between "avec malloc *int" and "avec []".  Can you
explain please?

Maybe it's the time difference between using a Python list from Cython
and using a C "array" allocated with a malloc from Cython.

yes, it's this
 
S

srepmub

With -bw and -O3 -fomit-frame-pointer -msse2 i have 5.5s (instead of 8)

Let me know if you find better.

thanks. now I'm wondering how fast does the C version become with
these flags..? :)


mark.
 
B

Bearophile

Nick Craig-Wood:
Can you give a hint as to how I debug this?  I presume my program has
some instances of non static types which is causing the problem, but
it is going to be a very long debugging session if it takes me an hour
each cycle ;-)

The program is about 700 lines of python (excluding comments).

You can show us the Python (SSPython) code, and we can try to find the
problem. Sometimes there's no simple ways to solve such problems.

Generally for not very large progrograms if SS doesn't compile in
about a minute or so then it's gone in infinite loop (there's a
compilation flag that avoids some infinite loops, try it).

Bye,
bearophile
 
B

Bearophile

William Dode':
http://hg.flibuste.net/libre/games/cheval
Like you'll see, i tried to use exactly the same code for each langage.

It's a cute solver.

Few more versions of mine:

#1, a Psyco version of mine:
http://codepad.org/9m5rf7kX

#2, unrolled Psyco version:
http://codepad.org/gKFLu34M

#3, a quick D (D1) version:
http://codepad.org/Tk9FL7Xk

Timings (no printing), seconds, best of 3:
#1: 4.79
#2: 3.67
#3: 1.10
Your Psyco version: 13.37
Your C version, compiled with GCC 4.3.2, -s -O3 -fomit-frame-pointer:
3.79

I have timed the whole running time of the programs, with an external
timer, so my timings include the start of the PythonVM and the final
cleanup of the GC.

Please, feel free to time my code again, so we can compare with your
other timings (Java, etc) better.

I have used Psyco 1.6 final and Python 2.6.2 on a Core2 CPU at 2 GHz.

To compile the D1 code you can use the free DMD compiler for D1, I
have used version 1.042, www.digitalmars.com/d/download.html ).

In such benchmarks it's often better to start from the fastest low
level code (written in an imperative language as C/D/C++, or a version
in a functional language like Clean or OCaML) and then use it to
create higher level versions (in Python, etc). This offers you a
baseline timing, and usually the small differences of the higher level
code compared to the original C/Clean code don't invalidate the test.

I have changed only small things in the program, as you can see the
Psyco program #2 is faster than your C code :) If you learn how to
use it well, Psyco1.6 can often lead to very good performance. But lot
of people don't know how to use Psyco well (I too am ignorant: I don't
know how to profile Python programs yet).

Bye,
bearophile
 
T

Terry Reedy

Nick said:
I tried it on a program I wrote to solve a puzzle (Rush Hour).
(Actually it solves the meta-puzzle - trying to make the hardest
possible Rush Hour puzzle.)

After a bit of twiddling (remove psyco and profiling) I got it to
start compiling, but unfortunately it compiled for about an hour (in
iterative type analysis..) used up 600 MB of RAM printing an '*' every
10 minutes or so then gave an error message and gave up.

Unfortunately I shut the window by accident, but the error message was
something about not being able to resolve types I think with a list of
20 or so unresolved types.

Can you give a hint as to how I debug this? I presume my program has
some instances of non static types which is causing the problem, but
it is going to be a very long debugging session if it takes me an hour
each cycle ;-)

The program is about 700 lines of python (excluding comments).

Split it into pieces and compile each separately.
Recurse.

tjr
 
G

greg

William said:
I just tested it with a litle game, to find the places of horse on
a board 5x5. The result is :

cython avec malloc *int 18s

Posting benchmark times for Pyrex or Cython is pretty
meaningless without showing the exact code that was
used, since times can vary enormously depending on
how much you C-ify things.
 
W

William Dode

I updated the script (python, c and java) with your unrolled version
+ somes litle thinks.

I also tried with python3.1, unladen Q2, ironpython1.1.1

Unfortunately it doesn't work more with shedskin, i'll see on the
shedskin group...

c 1.85s
gcj 2.15s
java 2.8s
python2.5 + psyco 3.1s
unladen-2009Q2 145s (2m45)
python2.5 254s (4m14s)
python3.1 300s (5m)
ironpython1.1.1 680s (11m20)
 
S

srepmub

please send any program that doesn't work with shedskin (it still is
experimental after all) to me, or create an issue at
shedskin.googlecode.com, and I will have a look at the problem.


thanks,
mark.
 
G

George Sakkis

I updated the script (python, c and java) with your unrolled version
+ somes litle thinks.

I also tried with python3.1, unladen Q2, ironpython1.1.1

Unfortunately it doesn't work more with shedskin, i'll see on the
shedskin group...

c 1.85s
gcj 2.15s
java 2.8s
python2.5 + psyco 3.1s
unladen-2009Q2 145s (2m45)
python2.5 254s (4m14s)
python3.1 300s (5m)
ironpython1.1.1 680s (11m20)

Cool; it would be interesting to see the numbers for Jython and Boo as
well if it's not too much effort.

George
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,731
Messages
2,569,432
Members
44,832
Latest member
GlennSmall

Latest Threads

Top