ANN: compiler2 - Produce bytecode from Python 2.5 Abstract Syntax Trees

Michael Spencer

Announcing: compiler2
---------------------

For all you bytecode enthusiasts: 'compiler2' is an alternative to the standard
library 'compiler' package, with several advantages.

Improved pure-python compiler

- Produces identical bytecode* to the built-in compile function for all Lib
  and Lib/test modules, including 'peephole' optimizations
- Works with 2.5's 'factory-installed' ASTs, rather than 2.4's 'after-market'
  version
- Is significantly faster

* Except for the pesky stack-depth calculation
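
One way such an "identical bytecode" claim could be checked is sketched below. This is not compiler2's own test code: it is a minimal illustration written for present-day Python 3, where the builtin compile also accepts a tree from the standard ast module, and same_bytecode is a hypothetical helper name. It compares only the raw bytecode, recursing into nested code objects, and leaves co_stacksize out of the comparison, in the spirit of the footnote above.

```python
import ast


def same_bytecode(a, b):
    """Return True if two code objects carry identical bytecode, recursively.

    Hypothetical helper for illustration; co_stacksize is not compared.
    """
    if a.co_code != b.co_code:
        return False
    # Nested functions, classes and comprehensions are stored in co_consts.
    nested_a = [c for c in a.co_consts if hasattr(c, "co_code")]
    nested_b = [c for c in b.co_consts if hasattr(c, "co_code")]
    if len(nested_a) != len(nested_b):
        return False
    return all(same_bytecode(x, y) for x, y in zip(nested_a, nested_b))


SOURCE = "def double(x):\n    return x * 2\n\nresult = double(21)\n"

# Compile once straight from source, once via an explicit AST.
from_source = compile(SOURCE, "<demo>", "exec")
from_tree = compile(ast.parse(SOURCE, "<demo>"), "<demo>", "exec")
identical = same_bytecode(from_source, from_tree)
```

To test an alternative code generator, from_tree would be replaced by that generator's output for the same tree.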

Possible applications

- Understanding/documenting/verifying the compilation process
- Implementing experimental compilation features (compile-time constants,
  function in-lining anyone?)
- Whatever the old compiler package is used for ;-)
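
To give a flavour of the "compile-time constants" idea, here is a rough sketch that rewrites the tree before code generation. It assumes present-day Python 3 and the standard ast module rather than compiler2 itself; InlineConstants and compile_with_constants are hypothetical names invented for this example.

```python
import ast


class InlineConstants(ast.NodeTransformer):
    """Replace reads of selected names with literal values (illustrative only)."""

    def __init__(self, constants):
        self.constants = constants

    def visit_Name(self, node):
        # Only rewrite loads; assignments to the name are left untouched.
        if isinstance(node.ctx, ast.Load) and node.id in self.constants:
            literal = ast.Constant(value=self.constants[node.id])
            return ast.copy_location(literal, node)
        return node


def compile_with_constants(source, constants, filename="<demo>"):
    """Parse, substitute the given constants, and compile to a code object."""
    tree = ast.parse(source, filename)
    tree = InlineConstants(constants).visit(tree)
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "exec")


code = compile_with_constants("area = PI * r * r\n", {"PI": 3.14159})
namespace = {"r": 2.0}
exec(code, namespace)
```

After the rewrite, PI no longer appears among the names the code object looks up at run time; its value is baked into the bytecode as a constant.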


Getting started
---------------
Point your svn client to:
http://svn.brownspencer.com/pycompiler/branches/new_ast/

Check out to a compiler2 directory on PYTHONPATH
Test with python test/test_compiler.py


Cheers
Michael
 
Georg Brandl

Michael said:
Announcing: compiler2

Is this a rewrite from scratch, or an improved stdlib compiler package?

In the latter case, maybe you can contribute some patches to the core.

Georg
 
Michael Spencer

Georg said:
Is this a rewrite from scratch, or an improved stdlib compiler package?

In the latter case, maybe you can contribute some patches to the core.

Georg

Georg

It started as the latter (i.e., the stdlib compiler package improved) and
proceeded via incremental change with the twin goals of generating correct
object code and creating a clean compiler architecture (the latter somewhat
subjective of course). I'd be happy in principle to contribute the work, but
the sum of the changes amounts to a substantial re-write, so I don't think it
can be meaningfully submitted as a patch.

Regards

Michael
 
Georg Brandl

Michael said:
Georg

It started as the latter (i.e., the stdlib compiler package improved) and
proceeded via incremental change with the twin goals of generating correct
object code and creating a clean compiler architecture (the latter somewhat
subjective of course). I'd be happy in principle to contribute the work, but
the sum of the changes amounts to a substantial re-write, so I don't think it
can be meaningfully submitted as a patch.

Perhaps you can bring up a discussion on python-dev about your improvements
and how they could be integrated into the standard library...

Georg
 
Martin v. Löwis

Georg said:
Perhaps you can bring up a discussion on python-dev about your improvements
and how they could be integrated into the standard library...

Let me second this. The compiler package is largely unmaintained and
was known to be broken (and perhaps still is). A replacement
implementation, especially if it comes with a new maintainer, would
be welcome.

Regards,
Martin
 
Paul Boddie

Martin said:
Let me second this. The compiler package is largely unmaintained and
was known to be broken (and perhaps still is). A replacement
implementation, especially if it comes with a new maintainer, would
be welcome.

I don't agree entirely with the "broken" assessment. Although I'm not
chasing the latest language constructs, the AST construction part of
the package seems good enough to me, and apparent bugs like duplicate
parameters in function signatures are actually documented shortcomings
of the functionality provided. I certainly don't like the level of code
documentation; from the baroque compiler.visitor, for example:

# XXX should probably rename ASTVisitor to ASTWalker
# XXX can it be made even more generic?

However, a cursory scan of the bugs filed against the compiler module,
trying hard to filter out other compiler-related things, reveals that
most of the complaints are related to code generation, and the
compiler2 module appears to be squarely aimed at this domain.

I find the compiler package useful - at least the bits not related to
code generation - and despite apparent unawareness of its existence in
the community (judging from observed usage of the parser and tokenizer
modules in cases where the compiler module would have been more
appropriate), I'd be unhappy to see it dropped.

Paul
 
Martin v. Löwis

Paul said:
I don't agree entirely with the "broken" assessment.

I was primarily talking about language support. For quite some time,
the compiler package wasn't able to compile the Python standard library,
until Guido van Rossum (and others) brought it back to work at the last
PyCon. It would simply reject certain more recent language constructs.
In the process of fixing it, it was also found to deviate from the
normal language definition, i.e. it would generate bad code.

Many of these are fixed, but it wouldn't surprise me if there are
still bugs remaining.

Regards,
Martin
 
Michael Spencer

Paul said:
> Martin v. Löwis wrote:
>> ....The compiler package is largely unmaintained and
> I don't agree entirely with the "broken" assessment. Although I'm not
> chasing the latest language constructs, the AST construction part of
> the package seems good enough to me, and apparent bugs like duplicate
> parameters in function signatures are actually documented shortcomings
> of the functionality provided.

The existing package is only lightly tested, so it's hard to say whether it's
broken or not. The main test from test_compiler says

def testCompileLibrary(self):
    # A simple but large test. Compile all the code in the
    # standard library and its test suite. This doesn't verify
    # that any of the code is correct, merely the compiler is able
    # to generate some kind of code for it.
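
For readers who have not seen that kind of smoke test, the idea can be sketched roughly as follows. This is not the stdlib test itself: it assumes present-day Python 3, uses the builtin compile as a stand-in for the compiler under test, and compile_tree is a hypothetical helper name.

```python
import os
import tempfile


def compile_tree(root):
    """Try to compile every .py file under root.

    Returns a dict mapping each failing path to the exception raised.
    Like the test quoted above, it checks only that code can be
    generated, not that the generated code is correct.
    """
    failures = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                source = handle.read()
            try:
                compile(source, path, "exec")
            except (SyntaxError, ValueError) as exc:
                failures[path] = exc
    return failures


# Small demonstration on a throwaway directory instead of the real Lib tree.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "good.py"), "w") as f:
        f.write("x = 1\n")
    with open(os.path.join(tmp, "bad.py"), "w") as f:
        f.write("def broken(:\n")
    failed = sorted(os.path.basename(p) for p in compile_tree(tmp))
```

A test of this shape passes as long as nothing is rejected, which is why it says little about whether the emitted code is right.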

> I certainly don't like the level of code
> documentation; from the baroque compiler.visitor, for example:
>
> # XXX should probably rename ASTVisitor to ASTWalker
> # XXX can it be made even more generic?
>
> However, a cursory scan of the bugs filed against the compiler module,
> trying hard to filter out other compiler-related things, reveals that
> most of the complaints are related to code generation, and the
> compiler2 module appears to be squarely aimed at this domain.

That's right, compiler2 just uses the builtin compile to get the AST, then does
the code generation in Python. (ASTVisitor has disappeared in the process).
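
The two-step pipeline can be illustrated with the sketch below. It assumes present-day Python 3, where the flag is exposed as ast.PyCF_ONLY_AST (in 2.5 it lives in the _ast module) and where the builtin compile accepts a tree directly; that second call merely stands in for compiler2's pure-Python code generator, whose API is not shown in this thread.

```python
import ast

SOURCE = "total = price * quantity\n"

# Step 1: ask the builtin compile for the AST instead of a code object.
tree = compile(SOURCE, "<demo>", "exec", ast.PyCF_ONLY_AST)

# Step 2: turn the tree into bytecode. The builtin is used here only as a
# placeholder for a code generator written in Python.
code = compile(tree, "<demo>", "exec")

namespace = {"price": 3, "quantity": 4}
exec(code, namespace)
```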

> I find the compiler package useful - at least the bits not related to
> code generation

I think you're saying that you find the AST itself valuable. I agree, and I've
promoted that sort of use on this list e.g.,
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/364469

However, 2.5 ASTs are better for this purpose: they are more accurate, faster to
create, have a more consistent node structure, and are somewhat easier to
traverse. I think the only reason to continue using 2.4 ASTs is for backward
compatibility. Moreover, the 2.5 trees will be automatically maintained as part
of the core compile process. Who knows what will happen to the 2.4 version?
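
As a small illustration of traversing the newer tree format, here is a sketch that counts node types. It relies on the ast module's helpers from later Python versions (ast.parse and ast.walk), which 2.5 itself does not ship, and node_type_counts is a hypothetical name used only for this example.

```python
import ast
from collections import Counter


def node_type_counts(source):
    """Count how often each AST node class occurs in the given source."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


counts = node_type_counts("[a for a in range(3)]")
```

Every node is an instance of a class named after its grammar rule, so generic tools like this need no per-node special cases.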

> ... I'd be unhappy to see it dropped.

Sooner or later, I think we should drop the 2.4 AST format - it's confusing to
have two formats, and produces unnecessary maintenance work. I think new
AST-manipulating apps would be better done using the 2.5 AST.

Until now, it hasn't been possible for Python apps to compile 2.5 ASTs to
bytecode, but compiler2 is working on fixing that.

It would be straightforward to write a new transformer that took 2.5 ASTs and
turned them into 2.4 ASTs (and possible, though a bit harder, to go the other
way, I suspect). But I'd rather just leave compiler alone, and document the
changes required to use 2.5 trees. The compiler2 package does this (see
http://svn.brownspencer.com/pycompiler/branches/new_ast/test/pyast.py), and
the changes required to a 2.4 application are easy.

Regards

Michael
 
Michael Spencer

Martin said:
Let me second this. The compiler package is largely unmaintained and
was known to be broken (and perhaps still is). A replacement
implementation, especially if it comes with a new maintainer, would
be welcome.

Regards,
Martin

Thanks, I will raise this on python-dev soon.

Regards

Michael
 
nnorwitz

Martin said:
Many of these are fixed, but it wouldn't surprise me if there are
still bugs remaining.

There's no maybe about it. http://python.org/sf/1544277
There were also problems with the global statement and something with
keyword args. Though, both of those may have been fixed. Georg would
probably remember, I think he fixed at least one of them.

It's definitely broken unless someone fixed it when I wasn't looking. :)

I agree with Martin, a new maintainer would be nice. I plan to look at
compiler2 when I get a chance.

n
 
sébastien martini

Hi,
I was primarily talking about language support. For quite some time,
the compiler package wasn't able to compile the Python standard library,
until Guido van Rossum (and others) brought it back to work at the last
PyCon. It would simply reject certain more recent language constructs.
In the process of fixing it, it was also found to deviate from the
normal language definition, i.e. it would generate bad code.

Many of these are fixed, but it wouldn't surprise me if there are
still bugs remaining.

I don't know whether this hides any bugs, or whether the module has simply
never been updated to support this bytecode, but LIST_APPEND is never emitted.
In the compiler module, list comprehensions are implemented without
emitting this bytecode; however, the current implementation seems to be
correct in terms of both syntax and execution.

For example:
>>> src = "[a for a in range(3)]"
>>> co = compiler.compile(src, 'lc1', 'exec')
>>> co
>>> dis.dis(co)
  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 LOAD_ATTR                0 (append)
              7 STORE_NAME               1 ($append0)
             10 LOAD_NAME                2 (range)
             13 LOAD_CONST               1 (3)
             16 CALL_FUNCTION            1
             19 GET_ITER
             23 STORE_NAME               3 (a)
             26 LOAD_NAME                1 ($append0)
             29 LOAD_NAME                3 (a)
             32 CALL_FUNCTION            1
             35 POP_TOP
             36 JUMP_ABSOLUTE           20
             42 POP_TOP
             43 LOAD_CONST               0 (None)
             46 RETURN_VALUE

With the builtin compile:

  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_NAME               0 (_[1])
              7 LOAD_NAME                1 (range)
             10 LOAD_CONST               0 (3)
             13 CALL_FUNCTION            1
             16 GET_ITER
             20 STORE_NAME               2 (a)
             23 LOAD_NAME                0 (_[1])
             26 LOAD_NAME                2 (a)
             29 LIST_APPEND
             30 JUMP_ABSOLUTE           17
        >>   33 DELETE_NAME              0 (_[1])
             36 POP_TOP
             37 LOAD_CONST               1 (None)
             40 RETURN_VALUE
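
A check like this can also be automated rather than read off a listing. The sketch below is written for present-day Python 3, where the old compiler package no longer exists, so it only inspects the builtin compiler's output; opnames is a hypothetical helper name. It gathers every opcode name, including those in nested code objects, since newer interpreters may place a comprehension's body in a separate code object.

```python
import dis


def opnames(code):
    """Collect the names of all opcodes used by a code object, recursively."""
    names = set()
    for instruction in dis.get_instructions(code):
        names.add(instruction.opname)
    # Comprehensions and functions may be compiled to nested code objects.
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            names |= opnames(const)
    return names


co = compile("[a for a in range(3)]", "lc1", "exec")
emitted = opnames(co)
uses_list_append = "LIST_APPEND" in emitted
```

Running the same helper over the output of another code generator would show at a glance which opcodes it never emits.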

Cordially,

sébastien martini
 
Martin v. Löwis

sébastien martini said:
I don't know whether this hides any bugs, or whether the module has simply
never been updated to support this bytecode, but LIST_APPEND is never emitted.
In the compiler module, list comprehensions are implemented without
emitting this bytecode; however, the current implementation seems to be
correct in terms of both syntax and execution.

It's probably a matter of personal taste, but I think the compiler
library should perform the same way as the builtin compiler - except
that it might be "better" in some cases.

So feel free to report this as a bug at sf.net/projects/python.

Regards,
Martin
 
