ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++compiler

William Dode · Jul 22, 2009

Cool; it would be interesting to see the numbers for Jython and Boo as
well if it's not too much effort.

I just tried with jython, but oddly it's faster without array.
Thanks to Mark, i could compile to shedskin again.
And add somes little improvements...

c 1.65s
gcj 1.9s
java 2.4s
python2.5 + psyco 2.9s
shedskin 3.4s
unladen-2009Q2 125s (2m05)
Jython 2.2.1 on java1.6.0_12 176s (without array, like shedskin)
Jython 2.2.1 on java1.6.0_12 334s (with array)
python2.5 215s (3m35s)
python3.1 246s (4m06s)
ironpython1.1.1 512 (8m32s)

William Dode · Jul 22, 2009

c 1.65s
gcj 1.9s
java 2.4s
python2.5 + psyco 2.9s
shedskin 3.4s

with -bw i have 2.6s

unladen-2009Q2 125s (2m05)
Jython 2.2.1 on java1.6.0_12 176s (without array, like shedskin)
Jython 2.2.1 on java1.6.0_12 334s (with array)
python2.5 215s (3m35s)
python3.1 246s (4m06s)
ironpython1.1.1 512 (8m32s)

somebody can test with ironpython on windows ?

Anyway, it's very impressive. I wonder if unladen will be so close in
the futur.

George Sakkis · Jul 22, 2009

I just tried with jython, but oddly it's faster without array.

FYI Jython 2.5 was released last month, you may want to try this
instead of 2.2.

George

Kurt Smith · Jul 22, 2009

greg:

Was this link, shown by William, not enough?
http://hg.flibuste.net/libre/games/cheval/file/46797c3a5136/chevalx.pyx#l1

I took a stab at converting the recent psyco-optimized code to cython,
and got a speedup.

gcj 4.3.3 1.39s
gcc 4.3.3 1.55s
cython 11.2 1.91s
psyco 1.94s
javac 1.5.0_19 2.00s
python 2.5.4 168.37s

It was just a matter of cdef-ing all the arrays & integers --
bearophile already did the hard work

Here's the cython code; all the others are from the repo.

#################################################
DEF NMOVES = 8
DEF SIDE = 5
DEF SQR_SIDE = SIDE * SIDE

cdef int circuit[SQR_SIDE]
cdef int nsolutions = 0

cdef int movex[NMOVES]
cdef int movey[NMOVES]
py_movex = [-1,-2,-2,-1,+1,+2,+2,+1]
py_movey = [-2,-1,+1,+2,+2,+1,-1,-2]
for i in range(NMOVES):
movex = py_movex
movey = py_movey
shift = [x * SIDE + y for x,y in zip(py_movex, py_movey)]
cdef int shift_0 = shift[0]
cdef int shift_1 = shift[1]
cdef int shift_2 = shift[2]
cdef int shift_3 = shift[3]
cdef int shift_4 = shift[4]
cdef int shift_5 = shift[5]
cdef int shift_6 = shift[6]
cdef int shift_7 = shift[7]

def showCircuit():
print
for x in xrange(SIDE):
x_SIDE = x * SIDE
for y in xrange(SIDE):
if SQR_SIDE < 100:
print "%02d " % circuit[x_SIDE + y],
else:
print "%03d " % circuit[x_SIDE + y],
print

cdef void solve(int nb, int x, int y,
int SIDE=SIDE, int SQR_SIDE=SQR_SIDE, int *circuit=circuit,
int shift_0=shift_0,
int shift_1=shift_1,
int shift_2=shift_2,
int shift_3=shift_3,
int shift_4=shift_4,
int shift_5=shift_5,
int shift_6=shift_6,
int shift_7=shift_7,
):
global nsolutions

cdef int newx, newy
cdef int pos = x * SIDE + y
circuit[pos] = nb
if nb == SQR_SIDE:
#showCircuit()
nsolutions += 1
circuit[pos] = 0
return

newx = x + -1
if newx >= 0 and newx < SIDE:
newy = y + -2
if newy >= 0 and newy < SIDE and not circuit[pos + shift_0]:
solve(nb+1, newx, newy)

newx = x + -2
if newx >= 0 and newx < SIDE:
newy = y + -1
if newy >= 0 and newy < SIDE and not circuit[pos + shift_1]:
solve(nb+1, newx, newy)

newx = x + -2
if newx >= 0 and newx < SIDE:
newy = y + 1
if newy >= 0 and newy < SIDE and not circuit[pos + shift_2]:
solve(nb+1, newx, newy)

newx = x + -1
if newx >= 0 and newx < SIDE:
newy = y + 2
if newy >= 0 and newy < SIDE and not circuit[pos + shift_3]:
solve(nb+1, newx, newy)

newx = x + 1
if newx >= 0 and newx < SIDE:
newy = y + 2
if newy >= 0 and newy < SIDE and not circuit[pos + shift_4]:
solve(nb+1, newx, newy)

newx = x + 2
if newx >= 0 and newx < SIDE:
newy = y + 1
if newy >= 0 and newy < SIDE and not circuit[pos + shift_5]:
solve(nb+1, newx, newy)

newx = x + 2
if newx >= 0 and newx < SIDE:
newy = y + -1
if newy >= 0 and newy < SIDE and not circuit[pos + shift_6]:
solve(nb+1, newx, newy)

newx = x + 1
if newx >= 0 and newx < SIDE:
newy = y + -2
if newy >= 0 and newy < SIDE and not circuit[pos + shift_7]:
solve(nb+1, newx, newy)

circuit[pos] = 0

def main():
print "Search for side=%d" % SIDE
cdef int x,y
for x in range(SIDE):
for y in range(SIDE):
solve(1, x, y);
print "\n%dx%d case, %d solutions." % (SIDE, SIDE, nsolutions)

def run():
import time
s=time.time()
main()
print time.time()-s
#################################################

William Dode · Jul 23, 2009

I took a stab at converting the recent psyco-optimized code to cython,
and got a speedup.

thanks, i updated my repo with litle more tips.

greg · Jul 24, 2009

Bearophile said:
Was this link, shown by William, not enough?
http://hg.flibuste.net/libre/games/cheval/file/46797c3a5136/chevalx.pyx#l1

Yes, sorry, I posted too soon.

Bearophile · Jul 26, 2009

William Dode':

I updated the script (python, c and java) with your unrolled version
+ somes litle thinks. [...]
c 1.85s
gcj 2.15s
java 2.8s
python2.5 + psyco 3.1s
unladen-2009Q2 145s (2m45)
python2.5 254s (4m14s)
python3.1 300s (5m)
ironpython1.1.1 680s (11m20)

Sorry for being late, I was away.

In your last C version this code is useless because the C compiler is
able to perform such simple optimization by itself (but probably
Python isn't able, so if you want the code to be the similar in all
versions it may be better to keep it):

shift_0=shift[0];
shift_1=shift[1];
shift_2=shift[2];
shift_3=shift[3];
shift_4=shift[4];
shift_5=shift[5];
shift_6=shift[6];
shift_7=shift[7];

This part in the Python code is useless:

shift_0 = shift[0]
shift_1 = shift[1]
shift_2 = shift[2]
shift_3 = shift[3]
shift_4 = shift[4]
shift_5 = shift[5]
shift_6 = shift[6]
shift_7 = shift[7]

Because later you copy values locally anyway:

def solve(nb, x, y,
SIDE=SIDE, SQR_SIDE=SQR_SIDE, circuit=circuit,
shift_0=shift_0,
shift_1=shift_1,
shift_2=shift_2,
shift_3=shift_3,
shift_4=shift_4,
shift_5=shift_5,
shift_6=shift_6,
shift_7=shift_7,
):

So doing something like this is probably enough:

def solve(nb, x, y,
SIDE=SIDE, SQR_SIDE=SQR_SIDE, circuit=circuit,
shift_0=shift[0],
shift_1=shift[1],
shift_2=shift[2],
shift_3=shift[3],
shift_4=shift[4],
shift_5=shift[5],
shift_6=shift[6],
shift_7=shift[7],
):

In low-level languages like C unrolling has to be done with care, to
avoid slowing down the code.

I have tried your latest C version using your compiler options, my
MinGW based on GCC 4.3.2 produces a crash at runtime. Using LLVM-GCC
it runs in 1.31 seconds. The D version is a bit less optimized than
your last C versions, yet using DMD it runs in 1.08-1.10 seconds.
Let's see if someone is able to write a C version faster than that D
code

Have you have compiled/read my D version? In the D version you may
have missed that I did use an extra trick: unsigned integers, so it
needs just two tests to see if a number is in the 0-5, 0-5 square

Note that Pyd, the Python-D bridge, may work with the latest DMD
version still (and it works if you use a bit older DMD compiler):
http://pyd.dsource.org/

Bye,
bearophile

William Dode · Jul 27, 2009

William Dode':

I updated the script (python, c and java) with your unrolled version
+ somes litle thinks. [...]
c 1.85s
gcj 2.15s
java 2.8s
python2.5 + psyco 3.1s
unladen-2009Q2 145s (2m45)
python2.5 254s (4m14s)
python3.1 300s (5m)
ironpython1.1.1 680s (11m20)

Click to expand...

Sorry for being late, I was away.

In your last C version this code is useless because the C compiler is
able to perform such simple optimization by itself (but probably
Python isn't able, so if you want the code to be the similar in all
versions it may be better to keep it):

I wanted so, specialy if it doesn't change a lot of the result (the
difference is so small that i cannot see it)...

....

I have tried your latest C version using your compiler options, my
MinGW based on GCC 4.3.2 produces a crash at runtime.

Maybe because of -msse2 ?

Using LLVM-GCC
it runs in 1.31 seconds. The D version is a bit less optimized than
your last C versions, yet using DMD it runs in 1.08-1.10 seconds.
Let's see if someone is able to write a C version faster than that D
code

Have you have compiled/read my D version? In the D version you may
have missed that I did use an extra trick: unsigned integers, so it
needs just two tests to see if a number is in the 0-5, 0-5 square

I didn't see, fine ! But the difference is also too small to see...

Note that Pyd, the Python-D bridge, may work with the latest DMD
version still (and it works if you use a bit older DMD compiler):
http://pyd.dsource.org/

I completly don't know anything about D... When i see the result of
psyco or shedskin, i'm affraid i'll not look somewhere else soon !

However, i'd like to see a lisp implementation of this...

bye

ANN: Shed Skin 0.0.30,an experimental (restricted-)Python-to-C++ Compiler	1	Dec 5, 2008
Shed Skin (restricted) Python-to-C++ Compiler 0.0.28	0	Jun 3, 2008
[ANN] Shed Skin (restricted) Python-to-C++ compiler 0.0.27	0	Feb 24, 2008
Shed Skin (restricted) Python-to-C++ compiler 0.0.29	2	Sep 30, 2008
Shed Skin Python-to-C++ compiler 0.0.23	4	Aug 20, 2007
Shed Skin Python-to-C++ Compiler	0	Feb 8, 2007
Shed Skin Python-to-C++ Compiler 0.0.21, Help needed	32	Mar 31, 2007
Shed Skin Python-to-C++ Compiler	0	Jan 18, 2007

ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++compiler

William Dode

William Dode

George Sakkis

Kurt Smith

William Dode

greg

Bearophile

William Dode

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads