ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++compiler


W

William Dode

Cool; it would be interesting to see the numbers for Jython and Boo as
well if it's not too much effort.

I just tried with jython, but oddly it's faster without array.
Thanks to Mark, i could compile to shedskin again.
And add somes little improvements...

c 1.65s
gcj 1.9s
java 2.4s
python2.5 + psyco 2.9s
shedskin 3.4s
unladen-2009Q2 125s (2m05)
Jython 2.2.1 on java1.6.0_12 176s (without array, like shedskin)
Jython 2.2.1 on java1.6.0_12 334s (with array)
python2.5 215s (3m35s)
python3.1 246s (4m06s)
ironpython1.1.1 512 (8m32s)
 
Ad

Advertisements

W

William Dode

c 1.65s
gcj 1.9s
java 2.4s
python2.5 + psyco 2.9s
shedskin 3.4s

with -bw i have 2.6s
unladen-2009Q2 125s (2m05)
Jython 2.2.1 on java1.6.0_12 176s (without array, like shedskin)
Jython 2.2.1 on java1.6.0_12 334s (with array)
python2.5 215s (3m35s)
python3.1 246s (4m06s)
ironpython1.1.1 512 (8m32s)

somebody can test with ironpython on windows ?

Anyway, it's very impressive. I wonder if unladen will be so close in
the futur.
 
K

Kurt Smith


I took a stab at converting the recent psyco-optimized code to cython,
and got a speedup.

gcj 4.3.3 1.39s
gcc 4.3.3 1.55s
cython 11.2 1.91s
psyco 1.94s
javac 1.5.0_19 2.00s
python 2.5.4 168.37s

It was just a matter of cdef-ing all the arrays & integers --
bearophile already did the hard work :)

Here's the cython code; all the others are from the repo.

#################################################
DEF NMOVES = 8
DEF SIDE = 5
DEF SQR_SIDE = SIDE * SIDE

cdef int circuit[SQR_SIDE]
cdef int nsolutions = 0

cdef int movex[NMOVES]
cdef int movey[NMOVES]
py_movex = [-1,-2,-2,-1,+1,+2,+2,+1]
py_movey = [-2,-1,+1,+2,+2,+1,-1,-2]
for i in range(NMOVES):
movex = py_movex
movey = py_movey
shift = [x * SIDE + y for x,y in zip(py_movex, py_movey)]
cdef int shift_0 = shift[0]
cdef int shift_1 = shift[1]
cdef int shift_2 = shift[2]
cdef int shift_3 = shift[3]
cdef int shift_4 = shift[4]
cdef int shift_5 = shift[5]
cdef int shift_6 = shift[6]
cdef int shift_7 = shift[7]

def showCircuit():
print
for x in xrange(SIDE):
x_SIDE = x * SIDE
for y in xrange(SIDE):
if SQR_SIDE < 100:
print "%02d " % circuit[x_SIDE + y],
else:
print "%03d " % circuit[x_SIDE + y],
print

cdef void solve(int nb, int x, int y,
int SIDE=SIDE, int SQR_SIDE=SQR_SIDE, int *circuit=circuit,
int shift_0=shift_0,
int shift_1=shift_1,
int shift_2=shift_2,
int shift_3=shift_3,
int shift_4=shift_4,
int shift_5=shift_5,
int shift_6=shift_6,
int shift_7=shift_7,
):
global nsolutions

cdef int newx, newy
cdef int pos = x * SIDE + y
circuit[pos] = nb
if nb == SQR_SIDE:
#showCircuit()
nsolutions += 1
circuit[pos] = 0
return

newx = x + -1
if newx >= 0 and newx < SIDE:
newy = y + -2
if newy >= 0 and newy < SIDE and not circuit[pos + shift_0]:
solve(nb+1, newx, newy)

newx = x + -2
if newx >= 0 and newx < SIDE:
newy = y + -1
if newy >= 0 and newy < SIDE and not circuit[pos + shift_1]:
solve(nb+1, newx, newy)

newx = x + -2
if newx >= 0 and newx < SIDE:
newy = y + 1
if newy >= 0 and newy < SIDE and not circuit[pos + shift_2]:
solve(nb+1, newx, newy)

newx = x + -1
if newx >= 0 and newx < SIDE:
newy = y + 2
if newy >= 0 and newy < SIDE and not circuit[pos + shift_3]:
solve(nb+1, newx, newy)

newx = x + 1
if newx >= 0 and newx < SIDE:
newy = y + 2
if newy >= 0 and newy < SIDE and not circuit[pos + shift_4]:
solve(nb+1, newx, newy)

newx = x + 2
if newx >= 0 and newx < SIDE:
newy = y + 1
if newy >= 0 and newy < SIDE and not circuit[pos + shift_5]:
solve(nb+1, newx, newy)

newx = x + 2
if newx >= 0 and newx < SIDE:
newy = y + -1
if newy >= 0 and newy < SIDE and not circuit[pos + shift_6]:
solve(nb+1, newx, newy)

newx = x + 1
if newx >= 0 and newx < SIDE:
newy = y + -2
if newy >= 0 and newy < SIDE and not circuit[pos + shift_7]:
solve(nb+1, newx, newy)

circuit[pos] = 0

def main():
print "Search for side=%d" % SIDE
cdef int x,y
for x in range(SIDE):
for y in range(SIDE):
solve(1, x, y);
print "\n%dx%d case, %d solutions." % (SIDE, SIDE, nsolutions)

def run():
import time
s=time.time()
main()
print time.time()-s
#################################################
 
Ad

Advertisements

B

Bearophile

William Dode':
I updated the script (python, c and java) with your unrolled version
+ somes litle thinks. [...]
c 1.85s
gcj 2.15s
java 2.8s
python2.5 + psyco 3.1s
unladen-2009Q2 145s (2m45)
python2.5 254s (4m14s)
python3.1 300s (5m)
ironpython1.1.1 680s (11m20)

Sorry for being late, I was away.

In your last C version this code is useless because the C compiler is
able to perform such simple optimization by itself (but probably
Python isn't able, so if you want the code to be the similar in all
versions it may be better to keep it):

shift_0=shift[0];
shift_1=shift[1];
shift_2=shift[2];
shift_3=shift[3];
shift_4=shift[4];
shift_5=shift[5];
shift_6=shift[6];
shift_7=shift[7];


This part in the Python code is useless:

shift_0 = shift[0]
shift_1 = shift[1]
shift_2 = shift[2]
shift_3 = shift[3]
shift_4 = shift[4]
shift_5 = shift[5]
shift_6 = shift[6]
shift_7 = shift[7]

Because later you copy values locally anyway:

def solve(nb, x, y,
SIDE=SIDE, SQR_SIDE=SQR_SIDE, circuit=circuit,
shift_0=shift_0,
shift_1=shift_1,
shift_2=shift_2,
shift_3=shift_3,
shift_4=shift_4,
shift_5=shift_5,
shift_6=shift_6,
shift_7=shift_7,
):

So doing something like this is probably enough:

def solve(nb, x, y,
SIDE=SIDE, SQR_SIDE=SQR_SIDE, circuit=circuit,
shift_0=shift[0],
shift_1=shift[1],
shift_2=shift[2],
shift_3=shift[3],
shift_4=shift[4],
shift_5=shift[5],
shift_6=shift[6],
shift_7=shift[7],
):

In low-level languages like C unrolling has to be done with care, to
avoid slowing down the code.

I have tried your latest C version using your compiler options, my
MinGW based on GCC 4.3.2 produces a crash at runtime. Using LLVM-GCC
it runs in 1.31 seconds. The D version is a bit less optimized than
your last C versions, yet using DMD it runs in 1.08-1.10 seconds.
Let's see if someone is able to write a C version faster than that D
code :)

Have you have compiled/read my D version? In the D version you may
have missed that I did use an extra trick: unsigned integers, so it
needs just two tests to see if a number is in the 0-5, 0-5 square :)
Note that Pyd, the Python-D bridge, may work with the latest DMD
version still (and it works if you use a bit older DMD compiler):
http://pyd.dsource.org/

Bye,
bearophile
 
Ad

Advertisements

W

William Dode

William Dode':
I updated the script (python, c and java) with your unrolled version
+ somes litle thinks. [...]
c 1.85s
gcj 2.15s
java 2.8s
python2.5 + psyco 3.1s
unladen-2009Q2 145s (2m45)
python2.5 254s (4m14s)
python3.1 300s (5m)
ironpython1.1.1 680s (11m20)

Sorry for being late, I was away.

In your last C version this code is useless because the C compiler is
able to perform such simple optimization by itself (but probably
Python isn't able, so if you want the code to be the similar in all
versions it may be better to keep it):

I wanted so, specialy if it doesn't change a lot of the result (the
difference is so small that i cannot see it)...

....
I have tried your latest C version using your compiler options, my
MinGW based on GCC 4.3.2 produces a crash at runtime.

Maybe because of -msse2 ?
Using LLVM-GCC
it runs in 1.31 seconds. The D version is a bit less optimized than
your last C versions, yet using DMD it runs in 1.08-1.10 seconds.
Let's see if someone is able to write a C version faster than that D
code :)

Have you have compiled/read my D version? In the D version you may
have missed that I did use an extra trick: unsigned integers, so it
needs just two tests to see if a number is in the 0-5, 0-5 square :)

I didn't see, fine ! But the difference is also too small to see...
Note that Pyd, the Python-D bridge, may work with the latest DMD
version still (and it works if you use a bit older DMD compiler):
http://pyd.dsource.org/

I completly don't know anything about D... When i see the result of
psyco or shedskin, i'm affraid i'll not look somewhere else soon !

However, i'd like to see a lisp implementation of this...

bye
 

Top