a basic bytecode to machine code compiler

Discussion in 'Python' started by Rouslan Korneychuk, Mar 31, 2011.

  1. I was looking at the list of bytecode instructions that Python uses and
    I noticed how much it looked like assembly. So I figured it can't be too
    hard to convert this to actual machine code, to get at least a small
    boost in speed.

    And so I whipped up a proof of concept, available at
    https://github.com/Rouslan/nativecompile

    I'm aware that PyPy already has a working JIT compiler, but I figure it
    will be a long time before they have a version of Python that is ready
    for everybody to use, so this could be useful in the meantime.

    I chose to create this for the latest stable version of Python and I
    happen to use some functionality that is only available since Python 3.2.

    The basic usage is:

    >>> import nativecompile
    >>> bcode = compile('print("Hello World!")','<string>','exec')
    >>> mcode = nativecompile.compile(bcode)
    >>> mcode()

    Hello World!


    This compiler does absolutely nothing clever. The only difference
    between the bytecode version and the compiled version is there is no
    interpreter loop and the real stack is used instead of an array.
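    The bytecode stream being translated this way can be inspected with the
    standard dis module. A quick sketch (dis output differs between Python
    versions, so the exact opcode names may vary):

```python
import dis

# Disassemble the same code object that such a compiler would translate;
# each opcode listed here is what gets mapped directly to machine code.
code = compile('print("Hello World!")', '<string>', 'exec')
dis.dis(code)
```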

    Most of it is written in Python itself. There is one module written in C
    that does the things that cannot easily be done in pure Python, such as
    get the addresses of API functions and to execute the newly created code.
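    As a rough sketch of the address-lookup half (this is not the actual
    nativecompile API; ctypes is used purely for illustration), pure Python
    can already obtain the addresses of C-API functions. It's mapping memory
    as executable and jumping into it that genuinely needs the C module:

```python
import ctypes

# Illustrative only: look up the in-process address of a CPython C-API
# function, the same kind of address the generated machine code would call.
addr = ctypes.cast(ctypes.pythonapi.PyDict_GetItem, ctypes.c_void_p).value
print(hex(addr))
```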

    So far I have only implemented a few bytecode instructions and only have
    32-bit x86-compatible support. I have only tested this on Linux. It
    might work on Windows but only if you can run programs without any sort
    of data execution prevention (I can fix that if anyone wants). And I'm
    sure more optimized machine code can be generated (such as rearranging
    the error checking code to work better with the CPU's branch predictor).

    Since so few instructions are implemented I haven't done any benchmarks.


    What do people think? Would I be wasting my time going further with this?
    Rouslan Korneychuk, Mar 31, 2011
    #1

  2. Rouslan Korneychuk, 01.04.2011 00:33:
    > I was looking at the list of bytecode instructions that Python uses and I
    > noticed how much it looked like assembly. So I figured it can't be too hard
    > to convert this to actual machine code, to get at least a small boost in
    > speed.


    I think I recall having read about something like this before, but I can't
    quite find it. In any case, you may want to take a look at both Cython and
    Psyco.

    Stefan
    Stefan Behnel, Apr 1, 2011
    #2

  3. On 3/31/2011 6:33 PM, Rouslan Korneychuk wrote:
    > I was looking at the list of bytecode instructions that Python uses and
    > I noticed how much it looked like assembly. So I figured it can't be too
    > hard to convert this to actual machine code, to get at least a small
    > boost in speed.
    >
    > And so I whipped up a proof of concept, available at
    > https://github.com/Rouslan/nativecompile
    >
    > I'm aware that PyPy already has a working JIT compiler, but I figure it
    > will be a long time before they have a version of Python that is ready
    > for everybody to use, so this could be useful in the mean time.


    I believe PyPy folk think it ready now, at least for some uses.
    Speedwise, it is now more or less comparable to CPython, winning some
    benchmarks and losing others.
    [...]
    > What do people think? Would I be wasting my time going further with this?


    Depends on whether your goal is personal (learning, fun, use) or
    usefulness to others. For the latter, you *might* do better to help with
    an existing project, such as Cython or Dufour's ShedSkin.

    --
    Terry Jan Reedy
    Terry Reedy, Apr 1, 2011
    #3
  4. On Thu, 31 Mar 2011 18:33:36 -0400, Rouslan Korneychuk wrote:

    > I'm aware that PyPy already has a working JIT compiler, but I figure it
    > will be a long time before they have a version of Python that is ready
    > for everybody to use, so this could be useful in the mean time.


    PyPy is ready to use *now*, if you are happy writing code that targets
    Python 2.5 and don't need C extensions.


    [...]
    > What do people think? Would I be wasting my time going further with
    > this?


    Depends on what your ultimate aim is. If it is to learn things yourself,
    then it is never a waste of time to learn new things.

    If your aim is to get a good working project that you can be proud to put
    on your CV, then go right ahead.

    If your aim is to contribute to a Python compiler that will actually be
    used by people other than yourself, I'm not so sure... personally, I
    expect that PyPy is the future of Python optimizing compilers, but I
    could be wrong.

    I suggest you check out the competitors:

    Shedskin is a Python to C++ compiler;
    Psyco is a JIT specialising compiler;
    Nuitka claims to be a C++ implementation that compiles to machine code;
    Berp claims to be a Haskell implementation that does the same;
    Compyler claims to be a native x86 assembly compiler;
    UnPython claims to be an experimental Python to C compiler.


    Of the six, as far as I know only Shedskin and Psyco are widely used.



    Good luck, and remember:

    Release early, release often, and let the community know when you do!


    --
    Steven
    Steven D'Aprano, Apr 1, 2011
    #4
  5. Steven D'Aprano, 01.04.2011 14:57:
    > I suggest you check out the competitors:
    >
    > Shedskin is a Python to C++ compiler;
    > Psyco is a JIT specialising compiler;
    > Nuitka claims to be a C++ implementation that compiles to machine code;
    > Berp claims to be a Haskell implementation that does the same;
    > Compyler claims to be a native x86 assembly compiler;
    > UnPython claims to be an experimental Python to C compiler.
    >
    >
    > Of the six, as far as I know only Shedskin and Psyco are widely used.


    Erm, yes, right. If you want to exclude Cython, which arguably is the only
    static Python compiler that actually has a large user base, then those may
    really be the only two that are widely used. Except that Psyco is certainly
    being used a lot more often than Shedskin, mainly because it actually
    allows you to execute Python code.

    Stefan
    Stefan Behnel, Apr 1, 2011
    #5
  6. Thanks for all the replies. I wasn't aware of some of these
    alternatives. Most of these seem to transform Python code/bytecode into
    another language. I was already well aware of Cython. On the Nuitka
    blog, I notice it says "Compiling takes a lot [sic] time, ...". Compyler
    seems to generate assembly and then parse the assembly to generate a
    Windows exe. Berp turns Python into Haskell, not directly into machine code.

    The closest thing to mine seems to be Psyco. It tries to do something
    more ambitious. It analyzes the program while it's running to create
    specialized versions of certain functions. High memory usage seems to be
    an issue with Psyco.

    My approach is to simply translate the bytecode into raw machine code as
    directly as possible, quickly and without using much memory. Basically I
    was going for a solution with no significant drawbacks. It was also
    meant to be very easy to maintain. The machine code is generated with a
    series of functions that very closely mirror AT&T syntax (the same as the
    default syntax for the GNU assembler), with some convenience functions
    that make it look like a kind of high-level assembly. For example,
    here is the implementation for LOAD_GLOBAL:

    @hasname
    def _op_LOAD_GLOBAL(f,name):
        return (
            f.stack.push_tos(True) + [
                ops.push(address_of(name)),
                ops.push(GLOBALS)
            ] + call('PyDict_GetItem') +
            if_eax_is_zero([
                discard_stack_items(1),
                ops.push(BUILTINS)
            ] + call('PyDict_GetItem') +
                if_eax_is_zero([
                    discard_stack_items(1),
                    ops.push(pyinternals.raw_addresses['GLOBAL_NAME_ERROR_MSG']),
                    ops.push(pyinternals.raw_addresses['PyExc_NameError'])
                ] + call('format_exc_check_arg') + [
                    goto(f.end)
                ])
            ) + [
                discard_stack_items(2)
            ]
        )

    To make sense of it, you just need to ignore the square brackets and
    plus signs (they are there to create a list that gets joined into one
    byte string at the very end) and imagine it's assembly code (I should
    probably just write a variadic function or use operator overloading to
    make this syntactically clearer). Any time a new version of Python is
    released, you would just run diff on Python-X.X/Python/ceval.c and see
    which op codes need updating. You wouldn't need to make a broad series
    of changes because of a minor change in Python's syntax or bytecode.

    And that's just one of the larger op code implementations. Here's the
    one for LOAD_CONST:

    @hasconst
    def _op_LOAD_CONST(f,const):
        return f.stack.push_tos() + [f.stack.push(address_of(const))]
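    As a sanity check, LOAD_CONST is the opcode CPython emits whenever a
    literal is pushed, which is easy to confirm with the standard dis module:

```python
import dis

# Confirm that assigning a literal compiles to a LOAD_CONST instruction.
instructions = list(dis.get_instructions(compile('x = 42', '<string>', 'exec')))
print([i.opname for i in instructions])
```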


    Anyway, it was certainly interesting working on this. I'll probably at
    least implement looping and arithmetic so I can have something
    meaningful to benchmark.
    Rouslan Korneychuk, Apr 1, 2011
    #6
  7. On Fri, 01 Apr 2011 17:45:39 +0200, Stefan Behnel wrote:

    > Steven D'Aprano, 01.04.2011 14:57:
    >> I suggest you check out the competitors:
    >>
    >> Shedskin is a Python to C++ compiler; Psyco is a JIT specialising
    >> compiler; Nuitka claims to be a C++ implementation that compiles to
    >> machine code; Berp claims to be a Haskell implementation that does the
    >> same; Compyler claims to be a native x86 assembly compiler; UnPython
    >> claims to be an experimental Python to C compiler.
    >>
    >>
    >> Of the six, as far as I know only Shedskin and Psyco are widely used.

    >
    > Erm, yes, right. If you want to exclude Cython, which arguably is the
    > only static Python compiler that actually has a large user base, then
    > those may really be the only two that are widely used. Except that Psyco
    > is certainly being used a lot more often than Shedskin, mainly because
    > it actually allows you to execute Python code.


    My apologies, I thought about including Cython in the list, but my
    understanding of it is that it is a derivative of Pyrex, used for
    writing C extensions in a Python-like language (Python + type
    annotations). We were talking about taking ordinary, unmodified Python
    code and compiling it to machine code, and I didn't think either Pyrex or
    Cython does that.




    --
    Steven
    Steven D'Aprano, Apr 2, 2011
    #7
  8. Steven D'Aprano, 02.04.2011 12:04:
    > On Fri, 01 Apr 2011 17:45:39 +0200, Stefan Behnel wrote:
    >
    >> Steven D'Aprano, 01.04.2011 14:57:
    >>> I suggest you check out the competitors:
    >>>
    >>> Shedskin is a Python to C++ compiler; Psyco is a JIT specialising
    >>> compiler; Nuitka claims to be a C++ implementation that compiles to
    >>> machine code; Berp claims to be a Haskell implementation that does the
    >>> same; Compyler claims to be a native x86 assembly compiler; UnPython
    >>> claims to be an experimental Python to C compiler.
    >>>
    >>>
    >>> Of the six, as far as I know only Shedskin and Psyco are widely used.

    >>
    >> Erm, yes, right. If you want to exclude Cython, which arguably is the
    >> only static Python compiler that actually has a large user base, then
    >> those may really be the only two that are widely used. Except that Psyco
    >> is certainly being used a lot more often than Shedskin, mainly because
    >> it actually allows you to execute Python code.

    >
    > My apologies, I thought about including Cython in the list, but my
    > understanding of it is that it is a derivative of Pyrex, used for
    > writing C extensions in a Python-like language (Python + type
    > annotations). We were talking about taking ordinary, unmodified Python
    > code and compiling it to machine code, and I didn't think either Pyrex or
    > Cython does that.


    Ok, no problem. Pyrex certainly doesn't play in the same league.

    Cython actually supports most Python language features now (including
    generators in the development branch), both from Python 2 and Python 3.
    Chances are that the next release will actually compile most of your Python
    code unchanged, or only with minor adaptations.

    Stefan
    Stefan Behnel, Apr 2, 2011
    #8
  9. On 4/2/2011 3:30 AM, Stefan Behnel wrote:
    > Cython actually supports most Python language features now (including
    > generators in the development branch), both from Python 2 and Python 3.
    > Chances are that the next release will actually compile most of your
    > Python code unchanged, or only with minor adaptations.


    Cython requires the user to insert type declarations, though, to
    get a speed improvement.

    Shed Skin has a good type inference system, but it insists on
    a unique type for each object (which includes "null"; a type of
    "str or null" is not acceptable). The rest of Shed Skin, outside
    the type inference system, is not very well developed.

    There's a path there to a fast Python with some restrictions.
    The Shed Skin inference engine with the Cython engine might have
    potential.

    PyPy gets some speedups, mostly by recognizing when numbers can be
    unboxed. (CPython carries around a PyObject for every integer; that's
    what's meant by "boxing".) But outside of number-crunching, PyPy
    doesn't show big gains. And the whole PyPy JIT system is far
    too complex. They try to retain all of Python's dynamism, and
    as a result, they need to be able to bail from compiled code
    to one of two different interpreters, then recompile and go
    back to compiled mode.

    Unladen Swallow failed to produce much of a speed improvement
    and Google pulled the plug. It's time to look at alternatives.

    There's no easy way to speed up Python; that's been tried.
    It needs either a very, very elaborate JIT system, more complex
    than the ones for Java or Self, or some language restrictions.
    The main restriction I would impose is to provide a call that says:
    "OK, we're done with loading, initialization, and configuration.
    Now freeze the code." At that moment, all the global
    analysis and compiling takes place. This allows getting rid
    of the GIL and getting real performance out of multithread
    CPUs.

    John Nagle
    John Nagle, Apr 2, 2011
    #9
  10. John Nagle writes:
    > There's no easy way to speed up Python; that's been tried.
    > It needs either a very, very elaborate JIT system, more complex
    > than the ones for Java or Self, or some language restrictions.


    Is it worse than Javascript? Tracemonkey and its descendants produce
    pretty fast code, I think.

    > The main restriction I would impose is to provide a call that says:
    > "OK, we're done with loading, initialization, and configuration.
    > Now freeze the code." At that moment, all the global
    > analysis and compiling takes place. This allows getting rid
    > of the GIL and getting real performance out of multithread CPUs.


    That's quite an interesting idea. I do think a lot of production Python
    code implicitly depends on the GIL and would need rework for multicore.
    For example, code that expects "n += 1" to be atomic, because the
    CPython bytecode interpreter won't switch threads in the middle of it.
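    The reason "n += 1" is not truly atomic is visible in the bytecode
    itself: the increment expands to separate load, add, and store
    instructions (exact opcode names vary by Python version), and a thread
    switch between them is what would break it on a free-threaded runtime:

```python
import dis

def bump(n):
    # Not atomic: this compiles to a load, an in-place add, and a store,
    # and a thread switch can land between any two of them.
    n += 1
    return n

ops = [i.opname for i in dis.get_instructions(bump)]
print(ops)
```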
    Paul Rubin, Apr 2, 2011
    #10
  11. On 4/2/11 2:05 PM, John Nagle wrote:

    > There's no easy way to speed up Python; that's been tried.
    > It needs either a very, very elaborate JIT system, more complex
    > than the ones for Java or Self, or some language restrictions.
    > The main restriction I would impose is to provide a call that says:
    > "OK, we're done with loading, initialization, and configuration.
    > Now freeze the code." At that moment, all the global
    > analysis and compiling takes place. This allows getting rid
    > of the GIL and getting real performance out of multithread
    > CPUs.


    That is not the reason the GIL exists, and being able to "freeze" the code
    will not, by itself, let you get rid of the GIL. CPython's GIL exists to
    protect the data structures internal to its implementation and to provide
    implicit protection to C extension modules (making them relatively easy to
    write threadsafe). Reference counts are the primary data structures being
    protected. IronPython and Jython allow just as much dynamism as CPython,
    and they have no mechanism for "freezing" the code, yet they do not have a
    GIL. I believe this has been pointed out to you before, but I don't think
    I've seen you acknowledge it.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
    Robert Kern, Apr 3, 2011
    #11
