Shed Skin Python-to-C++ Compiler 0.0.21, Help needed

mark.dufour

> I don't see how that can be--we're talking about a GCC-based compiler,
> right?

no, Shed Skin is a completely separate entity that outputs C++ code.
it's true I only use GCC to test the output, and I use some GCC-
specific extensions (__gnu_cxx::hash_map/hash_set), but people have
managed to compile the output with Visual Studio or whatever it is
called.

btw, the windows version of Shed Skin comes with GCC, so it's easy to
compile things further (two commands, 'ss program' and 'make run',
suffice to compile and run some program 'program.py')


mark dufour (Shed Skin author).
 
Kay Schluehr

> Trying to incrementally convert an old interpreter into a compiler
> is probably not going to work.

I'm talking about something that is not very different from what Psyco
does, but Psyco works at runtime and makes continuous measurements to
decide whether it can compile some bytecodes just-in-time or let the
interpreter perform their execution. You can also try a different
approach and decide statically whether you can compile some function
or have to interpret it. Then you factorize each module m into m =
m_native * m_interp. This factorization shall depend only on the
capabilities of the translator / native compiler and on the metadata
available for your functions. Since you take care of the correct
interfaces and glue code early and maintain them continually, you
never run into severe integration problems.

----------------------------------------------------------

A factorization always follows a certain pattern that preserves the
general form and creates a specialization:

def func(x, y):
    # algorithm
    ...

====>

from native import func_int_int

def func(x, y):
    if isinstance(x, int) and isinstance(y, int):
        # wrapper of the natively compiled specialized function
        return func_int_int(x, y)
    else:
        # perform the original, unmodified algorithm on the bytecode interpreter
        ...

Or in decorator notation:

from native import func_int_int

@apply_special(((int, int), func_int_int))
def func(x, y):
    # algorithm
    ...

where apply_special transforms the first version of func into the
second version.
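
For concreteness, apply_special could be written roughly as below. This
is only a minimal sketch of the dispatch idea; the decorator name and
the (types, specialization) pairs come from the example above, while the
wrapper itself is an illustration, not code from Psyco or Shed Skin:

import functools

def apply_special(*dispatch):
    """Route calls whose argument types match a recorded signature to a
    natively compiled specialization; fall back to the original,
    bytecode-interpreted function otherwise."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            for types, specialized in dispatch:
                if len(args) == len(types) and all(
                        isinstance(a, t) for a, t in zip(args, types)):
                    return specialized(*args)
            return func(*args)  # unmodified algorithm on the interpreter
        return wrapper
    return decorator

With that in place, @apply_special(((int, int), func_int_int)) behaves
exactly like the hand-written isinstance version of func above.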

Now we have the correct form, and the real and hard work can begin,
i.e. the part Mark was interested and engaged in.

> Shed Skin may be demonstrating that "annotations" are unnecessary
> cruft and need not be added to Python. Automatic type inference
> may be sufficient to get good performance.

You still dream of this, don't you? Type inference in dynamic languages
doesn't scale. It didn't scale in twenty years of research on
Smalltalk and it doesn't in Python. However, there is no no-go theorem
that prevents ambitious newbies to type theory from wasting their time
and effort.

> The Py3K annotation model is to some extent a repeat of the old
> Visual Basic model. Visual Basic started as an interpreter with one
> default type, which is now called Variant, and later added the usual
> types, Integer, String, Boolean, etc., which were then manually
> declared. That's where Py3K is going.

Read the related PEP, John. You will see that Guido's genius in that
case is that of a good project manager who knows that the community
works for him. The trade is that he supplies the syntax/interface and
the hackers around him fantasize about semantics and start
implementations. Not only are the annotations optional, but so is
their meaning. This has nothing to do with VB and it has not even much
to do with what existed before in language design.

To give an example of annotation semantics:

def func(x: int, y: int):
    # algorithm
    ...

can be translated according to the same pattern as above. The meaning
of the annotation, according to the envisioned annotation handler, is
as follows: try to specialize func on the types of the arguments and
perform local type inference. When successful, compile func for these
argument types and apply the apply_special decorator. When translation
is infeasible, emit a warning. If a type violation is detected under
this specialization, emit a warning, or an exception in strict-
checking mode.
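
As an illustration only, such a handler might look something like the
sketch below; compile_native stands in for the hard part (local type
inference plus native compilation) and is purely hypothetical:

import warnings

def annotation_handler(func, compile_native):
    """Illustrative semantics for a function annotated like
    def func(x: int, y: int). compile_native(func, types) is a
    hypothetical hook that tries to specialize and natively compile
    func for the annotated argument types; it returns None when
    translation is infeasible."""
    argnames = func.__code__.co_varnames[:func.__code__.co_argcount]
    types = tuple(func.__annotations__.get(name, object) for name in argnames)

    native = compile_native(func, types)
    if native is None:
        warnings.warn("could not specialize %s for %r" % (func.__name__, types))
        return func

    def wrapper(*args):
        # dispatch to the specialization when the annotated types match,
        # otherwise run the original function on the bytecode interpreter
        if len(args) == len(types) and all(
                isinstance(a, t) for a, t in zip(args, types)):
            return native(*args)
        return func(*args)
    return wrapper

In strict-checking mode the wrapper would raise instead of merely
warning when an argument fails the isinstance check.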

I fail to see how this violates duck typing or brings Visual Basic to
the Python community. But maybe I just underrate VB :)

Kay
 
John Nagle

Kay said:
> I'm talking about something that is not very different from what Psyco
> does, but Psyco works at runtime and makes continuous measurements to
> decide whether it can compile some bytecodes just-in-time or let the
> interpreter perform their execution.

That can work. That's how the Tamarin JIT compiler does JavaScript
inside Mozilla. The second time something is executed interpretively,
it's compiled. That's a tiny JIT engine, too; it's inside the Flash
player. Runs both JavaScript- and ActionScript-generated programs.
Might be able to run Python, with some work.
> A factorization always follows a certain pattern that preserves the
> general form and creates a specialization:
>
> def func(x, y):
>     # algorithm
>     ...
>
> ====>
>
> from native import func_int_int
>
> def func(x, y):
>     if isinstance(x, int) and isinstance(y, int):
>         # wrapper of the natively compiled specialized function
>         return func_int_int(x, y)
>     else:
>         # perform the original, unmodified algorithm on the bytecode interpreter
>         ...

You can probably offload that decision onto the linker by creating
specializations with different type signatures and letting the C++
name resolution process throw out the ones that aren't needed at
link time.
> You still dream of this, don't you? Type inference in dynamic languages
> doesn't scale. It didn't scale in twenty years of research on
> Smalltalk and it doesn't in Python.

I'll have to ask some of the Smalltalk people from the PARC era
about that one.

> However, there is no no-go theorem
> that prevents ambitious newbies to type theory from wasting their time
> and effort.

Type inference analysis of Python indicates that types really don't
change all that much. See

http://www.python.org/workshops/2000-01/proceedings/papers/aycock/aycock.html

Only a small percentage of Python variables ever experience a type change.
So type inference can work well on real Python code.

The PyPy developers don't see type annotations as a win. See Carl
Friedrich Bolz's comments in

http://www.velocityreviews.com/forums/t494368-p3-pypy-10-jit-compilers-for-free-and-more.html

where he writes:

"Also, I fail to see how type annotations can have a huge speed-advantage
versus what our JIT and Psyco are doing."

> This has nothing to do with VB and it has not even much to do
> with what existed before in language design.

Type annotations, advisory or otherwise, aren't novel. They
were tried in some LISP variants. Take a look at this
experimental work on Self, too.

http://www.cs.ucla.edu/~palsberg/paper/spe95.pdf

Visual Basic started out more or less declaration-free, and
gradually backed into having declarations. VB kept a "Variant"
type, which could hold anything and was the implicit type.
Stripped of the Python jargon, that's what's proposed for Py3K.
Just because it has a new name doesn't mean it's new.

It's common for languages to start out untyped and "simple",
then slowly become more typed as the limits of the untyped
model are reached.

Another thing that can go wrong with a language: if you get too hung
up on providing ultimate flexibility in the type and object system,
too much of the language design and machinery is devoted to features
that are very seldom used. C++ took that wrong turn a few years ago,
when the language designers became carried away with their template
mechanism, to the exclusion of fixing the real problems that drive their
user base to Java or C#.

Python, the language, is in good shape. It's the limitations
of the CPython implementation that are holding it back. It looks
like at least two projects are on track to go beyond the
limitations of that implementation. This is good.

John Nagle
 
mark.dufour

> You still dream of this, don't you? Type inference in dynamic languages
> doesn't scale. It didn't scale in twenty years of research on
> Smalltalk and it doesn't in Python. However, there is no no-go theorem

type inference sure is difficult business, and I won't deny there are
scalability issues, but the situation has improved a lot since the
smalltalk days. since then, type inference theory has not stood
still: agesen's cartesian product algorithm and plevyak's iterative
flow analysis (both published around '96) have greatly improved the
state of the art; a 1000-fold or more increase in computer speeds has
additionally made actual type inference (experimentation) much more
practical. (for anyone interested in the techniques powering shed
skin, see agesen's and plevyak's phd theses for a relatively recent
update on the field.)

but in any case, I believe there are several reasons why type
inference scalability is actually not _that_ important (as long as it
works and doesn't take infinite time):

-I don't think we want to do type inference on large Python programs.
this is indeed asking for problems, and it is not such a bad approach
to only compile critical parts of programs (why would we want to
compile PyQt code, for example?) I do think type inference scales well
enough to analyze arbitrary programs of up to, say, 5,000 lines. I'm
not there yet with Shed Skin, but I don't think it's that far away (of
course I'll need to prove this now :))

-type inference can be assisted by profiling (so dramatically fewer
iterations are necessary to arrive at a full proof). profiling doesn't
have to fully cover the code, because type inference fills in the gaps;
type inference can also be assisted by storing and reusing analysis
results, so profiling only has to be done once, or the analysis can be
made easier by running it multiple times during development. because
Shed Skin doesn't use profiling or memoization, and I know many
ways to improve the type analysis scalability, I'm confident it can
scale much further than the programs it works for now (see ss-
progs.tgz from the homepage for a collection of 27 programs, such as
ray tracers, chess engines, sat solvers, sudoku solvers, pystone and
richards..).

besides (as john points out, I think), there is a difference between
analyzing an actual dynamic language and an essentially static language
(such as the Python subset that Shed Skin accepts). it allows one to
make certain assumptions that make type inference easier.

> that prevents ambitious newbies to type theory from wasting their time
> and effort.

yes, it's probably a waste of time to try and analyze large, actually
dynamic, Python programs, but I don't think we should want this at
all. computer speeds make Python fast enough for many purposes, and
global type inference scalability would demand us to throw out many
nice Python features. a JIT compiler seems better here..
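
to make the contrast concrete, here is an illustrative example (not
taken from Shed Skin's documentation) of the kind of dynamism
whole-program inference has to give up on, next to the implicitly
static style it can handle:

# defeats whole-program inference: the list's element type depends on
# runtime data, and an attribute is invented on the fly
def load(records):
    mixed = []
    for rec in records:
        mixed.append(int(rec) if rec.isdigit() else rec)  # list of int *or* str
    return mixed

class Box:
    pass

box = Box()
box.payload = load(["1", "two"])  # attribute added at runtime

# friendly to inference: every name keeps one consistent type
def total(numbers):
    result = 0
    for n in numbers:
        result += n  # always int
    return result

print(total([1, 2, 3]))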

where I think Shed Skin and similar tools can shine is in compiling
pure Python extension modules and relatively small programs. having
worked on type inference for some time now, with modern techniques :),
I see no reason why we can't compile statically typed Python programs
of up to several thousands of lines. my analysis works pretty well
already (see ss-progs.tgz), and there are many things I can still
improve, besides adding profiling and memoization..

> Read the related PEP, John. You will see that Guido's genius in that
> case is that of a good project manager who knows that the community
> works for him. The trade is that he supplies the syntax/interface and
> the hackers around him fantasize about semantics and start
> implementations. Not only are the annotations optional, but so is
> their meaning. This has nothing to do with VB and it has not even much
> to do with what existed before in language design.

I think it's more Pythonic to just profile a program to learn about
actual types..
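
a minimal sketch of that idea (an illustration, not part of Shed Skin):
run the program under a profile hook that records the concrete argument
types of every call, and feed the recorded signatures to the
specializer afterwards:

import collections
import sys

observed = collections.defaultdict(collections.Counter)

def _record(frame, event, arg):
    """profile hook: remember the argument types of every Python-level call."""
    if event == "call":
        code = frame.f_code
        names = code.co_varnames[:code.co_argcount]
        types = tuple(type(frame.f_locals[name]).__name__ for name in names)
        observed[(code.co_filename, code.co_name)][types] += 1

def profile_types(main, *args):
    """run main(*args) with the recorder installed, then dump what was seen."""
    sys.setprofile(_record)
    try:
        main(*args)
    finally:
        sys.setprofile(None)
    for (filename, name), signatures in sorted(observed.items()):
        for types, count in signatures.most_common():
            print("%s:%s seen %d times with %s" % (filename, name, count, types))

running this over a representative workload (or a test suite) yields
exactly the kind of signatures a dispatcher like apply_special above
could specialize on.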


mark dufour (Shed Skin author).
 
John Nagle

> but in any case, I believe there are several reasons why type
> inference scalability is actually not _that_ important (as long as it
> works and doesn't take infinite time):
>
> -I don't think we want to do type inference on large Python programs.
> this is indeed asking for problems, and it is not such a bad approach
> to only compile critical parts of programs (why would we want to
> compile PyQt code, for example?) I do think type inference scales well
> enough to analyze arbitrary programs of up to, say, 5,000 lines. I'm
> not there yet with Shed Skin, but I don't think it's that far away (of
> course I'll need to prove this now :))
>
> -type inference can be assisted by profiling

Something else worth trying: type inference for separately
compiled modules, using the test cases for the modules. One
big problem with compile-time type inference is what to do
about separate compilation, where you have to make decisions
without seeing the whole program. An answer to this is to
optimize for the module's test cases. If the test cases
always use an integer value for a parameter, generate hard
code for the case where that variable is an integer. As long
as there's some way to back down, at link time, to a more general
but slower version, programs will still run. If the test
cases reflect normal use cases for the module, this should
lead to reasonable generated code for library modules.
> besides (as john points out, I think), there is a difference between
> analyzing an actual dynamic language and an essentially static language
> (such as the Python subset that Shed Skin accepts). it allows one to
> make certain assumptions that make type inference easier.

Yes. And, more than that, most programs have relatively
simple type behavior for most variables. The exotic stuff
just doesn't happen that often.

John Nagle
 
Paul Boddie

> Something else worth trying: type inference for separately
> compiled modules, using the test cases for the modules.

I mentioned such possibilities once upon a time:

http://blog.amber.org/2004/12/23/static-typing-and-python/

Note the subject of the original article, by the way. And as a
postscript, I'd advise anyone wondering what happened to Starkiller to
take a look at Shed Skin instead, since it more or less does what
Starkiller was supposed to do.
> One big problem with compile-time type inference is what to do
> about separate compilation, where you have to make decisions
> without seeing the whole program. An answer to this is to
> optimize for the module's test cases. If the test cases
> always use an integer value for a parameter, generate hard
> code for the case where that variable is an integer. As long
> as there's some way to back down, at link time, to a more general
> but slower version, programs will still run. If the test
> cases reflect normal use cases for the module, this should
> lead to reasonable generated code for library modules.

People are always going to argue that a just-in-time compiler saves
everyone from thinking too hard about these issues, but I still think
that there's a lot of mileage in deducing types at compile time for a
number of reasons. Firstly, there are some applications where it might
be desirable to avoid the overhead of a just-in-time compilation
framework - I can imagine that this is highly desirable when
developing software for embedded systems. Then, there are applications
where one wants to know more about the behaviour of the code, perhaps
for documentation purposes or to help with system refactoring, perhaps
to minimise the risk of foreseeable errors. Certainly, if this latter
area were not of interest, there wouldn't be tools like pylint.

Paul

P.S. Another aspect of the referenced article that is worth noting is
the author's frustration with the state of the standard library:
something which almost always gets mentioned in people's pet Python
hates, but something mostly ignored in the wider enthusiasm for
tidying up the language.
 
Kay Schluehr

> Something else worth trying: type inference for separately
> compiled modules, using the test cases for the modules.

Seems like we agree on this point. The idea of defining a type-
recording phase and identifying it with unit-test execution was
actually my original motivation to consider alternative frameworks for
compiling Python.

> One big problem with compile-time type inference is what to do
> about separate compilation, where you have to make decisions
> without seeing the whole program. An answer to this is to
> optimize for the module's test cases. If the test cases
> always use an integer value for a parameter, generate hard
> code for the case where that variable is an integer. As long
> as there's some way to back down, at link time, to a more general
> but slower version, programs will still run. If the test
> cases reflect normal use cases for the module, this should
> lead to reasonable generated code for library modules.

The nice thing about this idea is that your type system is compliant
with your test base. When a test succeeds, it cannot be invalidated by
a recorded type that gets annotated. You really get some metadata and
program-description elements for free. There are also some caveats
about subtyping and about using type information that is too special -
or not special enough. In the first case, one can choose the maximal
nominal type that includes the recorded type as a subtype and is still
compliant with the structural properties of the type, i.e. the
interface description of the class. I have no idea about deeper
specialization; I guess this can't be handled automatically.
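
One possible reading of the "maximal nominal type" step, as a rough
sketch rather than anything implemented: when several concrete types
have been recorded for the same argument, widen to their most specific
common base class instead of freezing the first type seen:

def common_base(recorded):
    """Return the most specific class that every recorded type inherits from.
    Falling back to object signals that the argument should stay generic,
    i.e. keep running on the interpreter."""
    mros = [t.__mro__ for t in recorded]
    for candidate in mros[0]:
        if all(candidate in mro for mro in mros[1:]):
            return candidate
    return object

class Animal: pass
class Dog(Animal): pass
class Cat(Animal): pass

print(common_base([Dog, Cat]))  # -> Animal
print(common_base([Dog, int]))  # -> object

Checking the chosen base against the methods the function actually
calls would be the "compliant with the interface description" part; as
noted, deeper specialization is unlikely to fall out automatically.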

Methodologically this means that testing and important aspects of
optimizing Python code are not separated from each other. This is
elegant and will increase overall code quality and acceptance of the
language.

Ciao,
Kay
 
bearophileHUGS

Paul Boddie:
> the author's frustration with the state of the standard library:
> something which almost always gets mentioned in people's pet Python
> hates, but something mostly ignored in the wider enthusiasm for
> tidying up the language.

There is some possibility that Python 3.1 will have what you ask for:
http://www.python.org/dev/peps/pep-3108/

Bye,
bearophile
 
Paul Boddie

Prior to that PEP being written/published, I made this proposal:

http://wiki.python.org/moin/CodingProjectIdeas/StandardLibrary/Restru...

After being brought to the attention of the PEP's author, it seems to
have been swept under the carpet on the Wiki, but it's a more radical
proposal in a number of ways than the PEP seems to be.

Paul

Kay Schluehr

Note that the conflict between putting modules at the top level or
within separate packages is not an either-or decision from the point
of view of a programmer who just wants to access those modules. A
top-level module like lib or std can be pretty virtual, since you can
create modules at runtime whenever you try to import them. I used this
strategy for a project where editing objects in separate files led
to a better overview compared to one large file containing all
definitions. However, I created one module at runtime that served as a
common access point for all these particular definitions, which were
tedious to import separately and would have required file system
lookups quite often. This might even allow multiple classifications,
but I haven't experimented with that yet.
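
A hedged sketch of that technique (the module and attribute names are
invented for illustration): build a "virtual" top-level module at
runtime, populate it from the many small files, and register it in
sys.modules so it behaves like an ordinary import target:

import sys
import types

def build_virtual_module(name, part_names):
    """Create a module object at runtime, fill it with the public names of
    the given submodules, and register it so 'import <name>' finds it."""
    virtual = types.ModuleType(name)
    for part_name in part_names:
        part = __import__(part_name, fromlist=["*"])
        public = getattr(part, "__all__", None) or [
            attr for attr in dir(part) if not attr.startswith("_")]
        for attr in public:
            setattr(virtual, attr, getattr(part, attr))
    sys.modules[name] = virtual
    return virtual

# hypothetical usage: one access point for definitions spread over small files
# build_virtual_module("std", ["defs.geometry", "defs.units", "defs.colors"])
# import std
# std.Point(1, 2)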

Kay
 
Paul Boddie

> Note that the conflict between putting modules at the top level or
> within separate packages is not an either-or decision from the point
> of view of a programmer who just wants to access those modules. A
> top-level module like lib or std can be pretty virtual, since you can
> create modules at runtime whenever you try to import them.

Or, if the subpackages/submodules are small enough, just import them
and make their contents available at higher levels in the hierarchy.
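For example (an illustrative __init__.py; the submodule and names are
made up):

# package __init__.py: expose submodule contents one level up
from .calendarutils import Calendar, parse_date   # hypothetical submodule
from . import xmltools as xml                      # alias the submodule itself
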
> I used this strategy for a project where editing objects in separate
> files led to a better overview compared to one large file containing
> all definitions. However, I created one module at runtime that served
> as a common access point for all these particular definitions, which
> were tedious to import separately and would have required file system
> lookups quite often. This might even allow multiple classifications,
> but I haven't experimented with that yet.

Yes, I tend to make aliases available quite a bit. It seems to me that
introducing a hierarchy into the standard library has a fairly limited
cost: you need to occupy some more top-level names, some unfortunately
being potentially common, but any growth in the library isn't as
likely to introduce more naming conflicts - something that anyone with
their own calendar.py, new.py or xml.py programs might find
desirable. ;-)

One problem I've become more aware of, however, is the role of C-based
modules in the library. I think there was an unresolved issue about
such modules living at non-root locations in the library hierarchy,
but aside from that, the work required to clean up any abstractions
eventually requires people to roll up their sleeves and look at non-
Python code.

Paul
 
