First release of Shed Skin, a Python-to-C++ compiler.

Discussion in 'Python' started by Mark Dufour, Sep 10, 2005.

  1. Mark Dufour

    Mark Dufour Guest

    After nine months of hard work, I am proud to introduce my baby to the
    world: an experimental Python-to-C++ compiler. It can convert many
    Python programs into optimized C++ code, without any user intervention
    such as adding type declarations. It uses rather advanced static type
    inference techniques to deduce type information by itself. In
    addition, it determines whether deduced types may be parameterized,
    and if so, it generates corresponding C++ generics. Based on deduced
    type information, it also attempts to convert heap allocation into
    stack and static preallocation (falling back to libgc in case this
    fails.)

    The compiler was motivated by the belief that in many cases it should
    be possible to automatically deduce C++ versions of Python programs,
    enabling users to enjoy both the productivity of Python and the
    efficiency of C++. It works best for Python programs written in a
    relatively static C++-style, in essence enabling users to specify C++
    programs at a higher level.
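
    As an illustration of what a "relatively static" style might look
    like, here is a minimal sketch. It is not taken from the Shed Skin
    test suite, the function names are made up, and it has not been
    checked against this release. Each variable keeps a single type
    throughout, so a type inferencer could plausibly deduce list-of-float
    and float types without any declarations. Single-argument print(...)
    is used so the sketch runs on both old and new Python interpreters.

```python
# Hypothetical example of "static-style" Python: every name has one type.

def dot(xs, ys):
    # xs and ys are always lists of floats, so total is always a float
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def scale(xs, factor):
    # list comprehension over floats; the result is again a list of floats
    return [x * factor for x in xs]


weights = [0.5, 1.5, 2.0]
inputs = scale([1.0, 2.0, 3.0], 2.0)
result = dot(weights, inputs)
print(result)
```

    Nothing here rebinds a name to a different type, which is the
    property the paragraph above is describing.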

    At the moment the compiler correctly handles 124 unit tests, six of
    which are serious programs of between 100 and 200 lines:

    -an Othello player
    -two satisfiability solvers
    -a Japanese puzzle solver
    -a sudoku solver
    -a neural network simulator

    Unfortunately I am just a single person, and much work remains to be
    done. At the moment, there are several limitations to the type of
    Python programs that the compiler accepts. Even so, there is enough of
    Python left to be able to remain highly productive in many cases.
    However, for most larger programs, there are probably some minor
    problems that need to be fixed first, and some external dependencies
    to be implemented/bridged in C++.

    With this initial release, I hope to attract other people to help me
    locate remaining problems, help implement external dependencies, and
    in the end hopefully even to contribute to the compiler itself. I
    would be very happy to receive small programs that the compiler does
    or should be able to handle. If you are a C++ template wizard, and you
    would be interested in working on the C++ implementation of builtin
    types, I would also love to get in contact with you. Actually, I'd
    like to talk to anyone even slightly interested in the compiler, as
    this would be highly motivating to me.

    The source code is available at the following site. Please check the
    README for simple installation/usage instructions. Let me know if you
    would like to create ebuild/debian packages.

    Sourceforge site: http://shedskin.sourceforge.net
    Shed Skin blog: http://shed-skin.blogspot.com

    Should you reply to this mail, please also reply to me directly. Thanks!


    Credits

    Parts of the compiler have been sponsored by Google, via its Summer of
    Code program. I am very grateful to them for keeping me motivated
    during a difficult period. I am also grateful to the Python Software
    Foundation for choosing my project for the Summer of Code. Finally, I
    would like to thank my university advisor Koen Langendoen for guiding
    this project.


    Details

    The following describes in a bit more detail various aspects of the
    compiler. Before seriously using the compiler, please make sure to
    understand especially its limitations.

    Main Features

    -very precise, efficient static type inference (iterative object
    contour splitting, where each iteration performs the cartesian product
    algorithm)
    -stack and static pre-allocation (libgc is used as a fall-back)
    -support for list comprehensions, tuple assignments, anonymous funcs
    -generation of arbitrarily complex class and function templates
    (even member templates, or generic, nested list comprehensions)
    -binary tuples are internally analyzed
    -some understanding of inheritance (e.g. list(dict/list) becomes
    list<pyiter<A>>)
    -hierarchical project support: generation of corresponding C++
    hierarchy, including (nested) Makefiles; C++ namespaces
    -annotation of source code with deduced types
    -builtin classes, functions (enumerate, sum, min, max, range, zip..)
    -polymorphic inline caches or virtual vars/calls (not well tested)
    -always unbox scalars (compiler bails out with error if scalars are
    mixed with pointer types)
    -full source code available under the MIT license

    Main Limitations/TODO's

    -Windows support (I don't have Windows, sorry)
    -reflection (getattr, hasattr), dynamic inheritance, eval, ..
    -mixing scalars with pointer types (e.g. int and None in a single variable)
    -mixing unrelated types in single container instance variable other
    than tuple-2
    -holding different types of objects in tuples with length >2;
    builtin 'zip' can only take 2 arguments.
    -exceptions, generators, nested functions, operator overloading
    -recursive types (e.g. a = []; a.append(a))
    -expect some problems when mixing floats and ints together
    -varargs (*x) are not very well supported; keyword args are not supported yet
    -arbitrary-size arithmetic
    -possible non-termination ('recursive customization', have not
    encountered it yet)
    -profiling will be required for scaling to very large programs
    -combining binary-type tuples with single-type tuples (e.g. (1,1.0)+(2,))
    -unboxing of small tuples (should form a nice speedup)
    -foreign code has to be modeled and implemented/bridged in C++
    -some builtins are not implemented yet, e.g. 'reduce' and 'map'
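
    To make one of these limitations concrete, the sketch below shows
    the "int and None in a single variable" pattern next to a possible
    rewrite that keeps the type fixed. Both functions and the NOT_FOUND
    sentinel are hypothetical names chosen for illustration; whether the
    rewrite is accepted by this release is an assumption, not something
    verified here.

```python
# Pattern from the limitations list: the result is sometimes an int and
# sometimes None, i.e. a scalar mixed with a pointer type.
def find_index_dynamic(items, target):
    for i in range(len(items)):
        if items[i] == target:
            return i
    return None


# Possible rewrite (assumption): use an int sentinel so that the function
# only ever returns ints.
NOT_FOUND = -1  # hypothetical sentinel value


def find_index_static(items, target):
    for i in range(len(items)):
        if items[i] == target:
            return i
    return NOT_FOUND


print(find_index_dynamic([3, 5, 7], 5))
print(find_index_static([3, 5, 7], 4))
```

    The same idea applies to containers: keeping every element of a list
    the same type avoids the "mixing unrelated types" restriction.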
     
    Mark Dufour, Sep 10, 2005
    #1

  2. Mark Dufour

    Paul Rubin Guest

    Wow, looks really cool. But why that instead of Pypy?
     
    Paul Rubin, Sep 11, 2005
    #2

  3. Hi!

    PyPy can currently compile Python code to C code and to LLVM bytecode.
    Note that even for LLVM bytecode the argument is void since LLVM
    (despite its name, which might lead one to think that it is Java-like)
    compiles its bytecode to native assembler.
    it's really just plain python (it completely runs on top of CPython
    after all) together with some restrictions -- which seem similar to the
    restrictions that shedskin imposes btw.

    Sorry, I can't really follow you here. In what way does PyPy have a
    Java-like arrangement?

    there is. look at the LLVM page for details: www.llvm.org


    Cheers,

    Carl Friedrich Bolz
     
    Carl Friedrich Bolz, Sep 11, 2005
    #3
  4. Mark Dufour

    Paul Boddie Guest

    I imagine that this remark was made in reference to the just-in-time
    compilation techniques that PyPy may end up using, although I was under
    the impression that most CLR implementations also use such techniques
    (and it is possible to compile Java to native code as gcj proves).

    But on the subject of LLVM: although it seems like a very interesting
    and versatile piece of software, it also seems to be fairly difficult
    to build; my last attempt made the old-style gcc bootstrapping process
    seem like double-clicking on setup.exe. Does this not worry the PyPy
    team, or did I overlook some easier approach? (Noting that a Debian
    package exists for LLVM 1.4 but not 1.5.)

    Paul
     
    Paul Boddie, Sep 11, 2005
    #4
  5. Well, you did say you want help with locating problems. One problem with
    this is it doesn't build...


    If I try and build (following your instructions), I get presented with a
    whole slew of build errors - knock-on errors from the first few:

    In file included from builtin_.cpp:1:
    builtin_.hpp:4:29: gc/gc_allocator.h: No such file or directory
    builtin_.hpp:5:23: gc/gc_cpp.h: No such file or directory
    In file included from builtin_.cpp:1:
    builtin_.hpp:89: error: syntax error before `{' token
    builtin_.hpp:93: error: virtual outside class declaration

    Which C++ libraries are you dependent on? (Stating this would be really
    useful, along with specific versions and if possible where you got them :)

    For reference, I'm building this on SuSE 9.3, under which I also have
    boehm-gc-3.3.5-5 installed. I suspect you're using the same gc library
    (having downloaded libgc from sourceforge and finding the includes don't
    match the above include names) but a different version. For reference this
    version/distribution of boehm-gc has the following file structure:

    /usr/include/gc.h
    /usr/include/gc_backptr.h
    /usr/include/gc_config_macros.h
    /usr/include/gc_cpp.h
    /usr/include/gc_local_alloc.h
    /usr/include/gc_pthread_redirects.h
    /usr/lib/libgc.a
    /usr/lib/libgc.la
    /usr/lib/libgc.so
    /usr/lib/libgc.so.1
    /usr/lib/libgc.so.1.0.1

    It's specifically the gc_cpp.h file that makes me suspect it's the same gc.

    Regards,


    Michael.
     
    Michael Sparks, Sep 11, 2005
    #5
  6. Hi Paul!

    Well, PyPy is still quite far from having a JIT built in. Plus the
    JIT-techniques will probably differ quite a bit from Java _and_ the CLR :).
    We are not that worried about this since

    a) building LLVM is not _that_ bad (you don't need to build the
    C-frontend, which is the really messy part) and

    b) the LLVM-backend is one of the more experimental backends we have
    anyway (in fact, we have discovered some bugs in LLVM with PyPy
    already). Since the C backend is quite stable we are not dependent
    solely on LLVM so this is not too big a problem. Note that this doesn't
    mean that the LLVM backend is not important: it's the only other backend
    (apart from the C one) that can successfully translate the whole PyPy
    interpreter.

    Cheers,

    Carl Friedrich
     
    Carl Friedrich Bolz, Sep 11, 2005
    #6
  7. Mark Dufour

    Paul Boddie Guest

    I found that I needed both the libgc and libgc-dev packages for my
    Kubuntu distribution - installing them fixed the include issues that
    you observed - and it does appear to be the Boehm-Demers-Weiser GC
    library, yes. The only other issue I observed was the importing of the
    profile and pstats modules which don't exist on my system, but those
    imports seemed to be redundant and could be commented out anyway.

    Paul
     
    Paul Boddie, Sep 11, 2005
    #7
  8. Mark Dufour

    A.B., Khalid Guest

    Good work.

    I have good news and bad news.

    First the good: ShedSkin (SS) more or less works on Windows. After
    patching gc6.5 for MinGW, building it, and testing it on WinXP with
    some success, and after patching my local copy of SS, I can get the
    test.py to compile from Python to C++, and it seems that I can get
    almost all the unit tests in unit.py to pass.

    Here is what I used:


    1. shedskin-0.0.1

    2. pyMinGW patched and MinGW compiled Python 2.4.1 from CVS:
    Python 2.4.1+ (#65, Aug 31 2005, 22:34:14)
    [GCC 3.4.4 (mingw special)] on win32
    Type "help", "copyright", "credits" or "license" for more information.

    3. MinGW 3.4.4:
    g++ -v
    Reading specs from
    e:/UTILIT~1/PROGRA~1/MINGW/BIN/../lib/gcc/mingw32/3.4.4/specs
    Configured with: ../gcc/configure --with-gcc --with-gnu-ld
    --with-gnu-as --host=mingw32 --target=mingw32 --prefix=/mingw
    --enable-threads --disable-nls
    --enable-languages=c,c++,f77,ada,objc,java --disable-win32-registry
    --disable-shared --enable-sjlj-exceptions --enable-libgcj
    --disable-java-awt --without-x --enable-java-gc=boehm
    --disable-libgcj-debug --enable-interpreter
    --enable-hash-synchronization --enable-libstdcxx-debug
    Thread model: win32
    gcc version 3.4.4 (mingw special)


    4. Also using:
    - mingw-runtime 3.8
    - w32api-3.3
    - binutils-2.16.91-20050827-1
    - gc6.5 (Boehm GC) locally patched



    Now the bad news. Four tests in unit.py fail; brief output is as
    follows [1].

    [SKIP 19532 lines]
    *** tests failed: 4
    [(60, '__class__ and __name__ attributes'), (85, 'ifa: mixing strings
    and lists of strings in the same list'), (122, 'neural network
    simulator XXX later: recursive customization, plus some small fixes'),
    (124, 'small factorization program by Rohit Krishna Kumar')]


    Moreover, and since the GC system you used only works in "recent
    versions of Windows", it follows that this solution will not work in
    all versions. I tested it on Win98 and both GC tests and SS's unit.py
    tests crash; although SS can still seem to compile the tests to C++.

    At any rate, if anyone is interested in the patches they can be
    downloaded from [2].


    Regards,
    Khalid


    [1] The entire output of unit.py can also be found at [2]
    [2] http://jove.prohosting.com/iwave/ipython/Patches.html
     
    A.B., Khalid, Sep 11, 2005
    #8
  9. Mark has also let me know this. Part of the problem is that the version
    of the GC shipped with SuSE 9.3 is ancient - it should be version 6.5 or
    later. Also, people compiling it from source should (at minimum) use a
    configure line along the lines of:

    ./configure --enable-cplusplus

    If you don't, you get build problems because one of the needed libraries
    isn't built by default.

    I started off with the obvious "hello world" type program :

    ----------------------------------------
    print "GAME OVER"
    ----------------------------------------

    Which compiled cleanly and worked as expected. I then read Mark's short
    paper linked from his blog "Efficient Implementation of Modern Imperative
    Languages; Application to Python", and got concerned by the comments:

    """We have decided not to investigate two types of features: [...snip...];
    and those features that may be turned off without affecting correct
    programs, e.g. array bounds checking, and exceptions"""

    That set some alarm bells ringing, largely because LBYL is being
    deprecated by many people in favour of exception-based code (which, more
    to the point, is widely used as a result).

    As a result, I tried a trivial, but obvious program that should have clear
    behaviour:

    ----------------------------------------
    x = []
    print "GAME OVER"

    x.append(5)
    print x[0]
    try:
        print x[1]
        print "This should never be seen..."
    except IndexError:
        print "It's OK, we caught it..."

    This compiles, but unfortunately has the following behaviour:

    GAME OVER
    5
    0
    This should never be seen...
    It's OK, we caught it...

    Obviously, neither the 0 nor the message following should have been
    displayed. It's a pity that this assumption was made, but given the short
    time the project's been going I can understand it, hopefully Mark will
    continue towards greater python compliance :)
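
    For anyone wanting to see the two styles side by side, here is a
    minimal sketch. The function names are made up for illustration.
    Under CPython the first function relies on IndexError being raised
    (the exception-based style), while the second checks the bound
    explicitly (LBYL). The assumption is that only the second would be
    unaffected by a compiler that leaves out bounds checking; that has
    not been verified against Shed Skin itself.

```python
def describe_eafp(x):
    # Exception-based style: depends on an out-of-range index raising
    # IndexError, as it does in CPython
    try:
        return "value: %d" % x[1]
    except IndexError:
        return "caught IndexError"


def describe_lbyl(x):
    # LBYL style: test the length first, so no exception is needed
    if len(x) > 1:
        return "value: %d" % x[1]
    return "index out of range"


x = [5]
print(describe_eafp(x))
print(describe_lbyl(x))
```

    With x = [5, 6] both functions return "value: 6", so the two styles
    only differ in how the out-of-range case is detected.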



    Michael.
     
    Michael Sparks, Sep 11, 2005
    #9
  10. Mark Dufour

    Paul Boddie Guest

    That piece of wisdom must have passed me by last time, when I probably
    heeded the scary warning from the configure script and made the mistake
    of getting the C front end. This time, the build process was virtually
    effortless, and I'll now have to investigate LLVM further.

    Thanks for the tip!

    Paul
     
    Paul Boddie, Sep 11, 2005
    #10
  11. Mark Dufour

    beliavsky Guest

    I am reluctant to attempt an arduous installation on Windows, but if
    Mr. Dufour or someone else could create a web site that would let you
    paste in Python code and see a C++ translation, I think this would
    expand the user base. Alternatively, a Windows executable would be
    nice.
     
    beliavsky, Sep 12, 2005
    #11
  12. This is great news. Congratulations!

    By the way, I read in your blog that you would be releasing a Windows
    installer soon.
    Have you, or anyone else, managed to do it?

    Cheers,
    Luis
     
    Luis M. Gonzalez, Sep 17, 2005
    #12
  13. Mark Dufour

    Mark Dufour Guest

    I just finished making a 20 MB (!) package for Windows XP (I'm not
    sure which older versions of Windows it will run on.) It includes the
    Boehm garbage collector and a C++ compiler (MinGW), which hopefully
    will make it really easy to create executables. However, I'm not
    releasing it until somebody with XP can test it for me :) If you'd
    like to try what I have so far, please download
    http://kascade.org/shedskin-0.0.2.zip, unzip it and follow some simple
    steps in the README file. I would really like to know about anything
    that doesn't work, or is unclear!

    BTW, I also fixed all OSX problems, but I'm waiting for a friend to
    give it a final test.

    What kind of program would you like to compile?


    thanks!
    mark.
     
    Mark Dufour, Sep 17, 2005
    #13
  14. Mark Dufour

    A.B., Khalid Guest


    Here is the very end of a very long output of unit.py run in Python
    2.4.1 on WinXP Pro SP2:

    [generating c++ code..]
    *** compiling & running..
    rm test.o test.exe
    g++ -O3 -IG:/Downloads/Temp/ss2/shedskin -c test.cpp
    g++ -O3 -IG:/Downloads/Temp/ss2/shedskin test.o
    G:/Downloads/Temp/ss2/shedskin/libss.a -lgc -o test
    output:
    [3, 3, 3, 1097, 70201]

    *** success: small factorization program by Rohit Krishna Kumar 124
    *** no failures, yay!


    :)

    Well done. So what was causing that crash in test '__class__ and
    __name__ attributes' after all?

    I'll also try to test it on Win98.

    Regards,
    Khalid
     
    A.B., Khalid, Sep 18, 2005
    #14
  15. Mark Dufour

    Mark Dufour Guest

    Well, I did something like this:

    class_ c(..);
    class_ *cp = &c;

    class list {
        list() {
            this->class = cp;
        }
    }

    constant_list = new list(..);

    Now, depending on the order of things, I think this->class became
    somewhat undefined. In any case, putting all initializations in the
    right order in an initialization function called from main() fixed the
    problem.

    The problem with test 85 was that it should not actually be passed to
    g++, since it is indeed incorrect code :) However, because of some
    bug in unit.py, on sys.platform 'win32' g++ would always be called.

    Thanks again for your help. Your Makefile made it very easy for me to
    create a Windows package. I'm glad to learn about MinGW too.. very
    nice.
    I think you said the GC wouldn't work in this case..? Anyway, I've had
    my share of Windows for a while.. I would be really glad if somebody
    else could look into this.. :)

    Have you tried compiling any code of your own yet..?


    thanks!
    mark.
     
    Mark Dufour, Sep 18, 2005
    #15