Why is it impossible to create a compiler that can compile Python to machine code like C?

CM

The main issue is that Python has dynamic typing. The type of object
that is referenced by a particular name can vary, and there's no way
(in general) to know at compile time what the type of object "foo" is.

That makes generating object code to manipulate "foo" very difficult.

Could you help me understand this better? For example, if you
have this line in the Python program:

foo = 'some text'
bar = {'apple':'fruit'}

If the interpreter can determine at runtime that foo is a string
and bar is a dict, why can't the compiler figure that out at
compile time? Or is the problem that if later in the program
you have this line:

foo = 12

now foo is referring to an integer object, not a string, and
compilers can't have two names referring to two different
types of objects? Something like that?

I in no way doubt that this is impossible; I just don't understand
enough about how compiling works yet to "get" why dynamic typing is a
problem for compilers.

Thanks.
 
88888 Dihedral


The dynamic typing normally lives in the higher-level components:
objects, functions, and generators.

Of course, if one can be sure of the types of the variables used in
some functions, then declaring those types is the Cython way to speed
up the execution of pure OOP Python programs.
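For instance, here is a minimal sketch in Cython's "pure Python" mode (assuming the cython package is installed; `total` is just an invented example name): the type declarations let Cython compile the loop down to C-level arithmetic, while the same file still runs unchanged under plain CPython.

import cython

def total(n: cython.int) -> cython.long:
    # With these declarations, Cython can emit a plain C loop;
    # without them, every += is a dynamic operation on Python objects.
    i: cython.int
    acc: cython.long = 0
    for i in range(n):
        acc += i * i
    return acc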
 
Benjamin Kaplan


In the case of literals, the compiler can figure out type information (and
I believe it does do some constant folding). But as soon as you let
something else get in between you and the constant, you lose all guarantees.

import random

if random.random() < 0.5:
    spam = 3
else:
    spam = "hello world"

Then you get into monkey patching, and into dealing with types that may
not be defined at compile time. The only way a Python compiler could
convert that to x86 assembly is if it generated code that would look up
the type information at runtime. You'd basically be outputting a Python
interpreter.
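To make the monkey-patching point concrete, here is a small illustrative sketch (the class `Greeter` and the function `rude` are made-up names): even when an object's type looks fixed, its behaviour can be rebound at run time, so a compiler cannot safely hard-wire the call.

class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
print(g.greet())        # hello

def rude(self):
    return "go away"

Greeter.greet = rude    # rebind the method on the class at run time
print(g.greet())        # go away -- same call site, different code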
 
Terry Reedy

> now foo is referring to an integer object, not a string, and
> compilers can't have two names referring to two different
> types of objects?

I believe you mean one name referring to multiple types.

> Something like that?

Something like that. In Python, objects are strongly typed; names do
not have types. Or, if you prefer, names are typed according to their
current binding, which can change freely. As for why this can be an
advantage, consider this simple function.

def min2(a, b):
    if a <= b:
        return a
    else:
        return b

When the def statement is executed, the names 'a' and 'b' have no
binding and therefore no type, not even a temporary type.

In a statically typed language, either you or the compiler must rewrite
that function for every pair of types that is ever passed to min2. If
the compiler does it, it either has to analyze an entire program, or it
has to compile variations on the fly, as needed. The latter is what the
Psyco module and, I believe, the PyPy JIT compiler do.
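A quick illustration of the payoff: the single definition above handles any pair of comparable arguments, with no per-type rewrite.

print(min2(3, 7))            # 3   -- integers compare numerically
print(min2("pear", "fig"))   # fig -- strings compare lexicographically
print(min2(2.5, 2))          # 2   -- mixed float/int works too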
 
Chris Angelico


Python doesn't have "variables" with "values"; it has names, which may
(or may not) point to objects. Dynamic typing just means that one name
is allowed to point to multiple different types of object at different
times.

The problem with dynamic typing is more one of polymorphism. Take this
expression as an example:

foo += bar;

In C, the compiler knows the data types of the two variables, and can
compile that to the appropriate code. If they're both integers,
that'll possibly become a single machine instruction that adds two
registers and stores the result back.

In C++, foo could be a custom class with an operator+= function. The
compiler will know, however, what function to call; unless it's a
virtual function, in which case there's a run-time check to figure out
what subclass foo is of, and then the function is called dynamically.

In Python, *everything* is a subclass of PyObject, and every function
call is virtual. That += operation is backed by the __iadd__ function,
defined by PyObject and possibly overridden by whatever type foo is.
So, at run time, the exact function is looked up.

C++ is most definitely a compiled language, at least in most
implementations I've seen. But it has the exact same issue as Python
has: true dynamism requires run-time lookups. That's really what
you're seeing here; it's nothing to do with any sort of "compiled" vs
"interpreted" dichotomy, but with "compile time" vs "run time"
lookups. In C, everything can be done at compile time; in Python, most
things are done at run time.

It's mainly a matter of degree. A more dynamic language needs to do
more work at run time.
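As a hedged illustration of that run-time lookup (the `Accumulator` class below is invented for the example), the same `foo += bar` shape dispatches to whatever the current type of foo provides:

foo = 10
foo += 5                   # integer addition
print(foo)                 # 15

foo = "abc"
foo += "def"               # string concatenation, same source shape
print(foo)                 # abcdef

class Accumulator:
    def __init__(self):
        self.items = []
    def __iadd__(self, other):   # user-defined in-place add hook
        self.items.append(other)
        return self

foo = Accumulator()
foo += "anything"          # resolves to Accumulator.__iadd__ at run time
print(foo.items)           # ['anything']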

ChrisA
 
Steven D'Aprano

> The main issue is that Python has dynamic typing. The type of object
> that is referenced by a particular name can vary, and there's no way (in
> general) to know at compile time what the type of object "foo" is.
>
> That makes generating object code to manipulate "foo" very difficult.

That's only a limitation with a static compiler that tries to generate
fast machine code at compile time. A JIT compiler can generate fast
machine code at runtime. The idea is that the time you save by running
more optimized code is greater than the time it costs to generate that
code each time you run. If not, then you just run the normal byte-code
you would have run, and you've lost very little.

This has been a very successful approach. Psyco worked very well, and
its successor PyPy seems to work even better. PyPy, on average, runs
about 5-6 times faster than CPython, and for some tasks about 50 times
faster. That makes it, broadly speaking, as fast as Java and approaching
C; in some circumstances it has even beaten static C compilers.
(Admittedly only under very restrictive circumstances.)

The downside is that JIT compilers need a lot of memory. To oversimplify,
you might take source code like this:

x = a + b

A static compiler knows what types a and b are, and can turn it into a
single fairly compact piece of machine code. Using my own crappy invented
machine code notation:

ADD a, b
STORE x


A JIT compiler has to generate a runtime check and one or more fast
branches, plus a fallback:


CASE (a and b are both ints):
    ADD a, b
    STORE x
CASE (a and b are both strings):
    COPY a, x
    CONCATENATE b, x
OTHERWISE:
    execute byte code to add two arbitrary objects

The above is probably nothing like the way PyPy actually works. Anyone
interested in how PyPy really works should spend some time reading their
website and blog.

http://pypy.org/


I can especially recommend this to give you a flavour for just how
complicated this sort of thing can become:

http://morepypy.blogspot.com.au/2011/01/loop-invariant-code-motion.html
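In Python terms, the guard-plus-fast-path idea might look something like this toy sketch (purely illustrative, and nothing like PyPy's real machinery; `add_specialized` is an invented name):

import operator

def add_specialized(a, b):
    # guard: both operands are plain ints -> specialised fast path
    if type(a) is int and type(b) is int:
        return a + b
    # guard: both operands are strings -> another fast path
    if type(a) is str and type(b) is str:
        return a + b
    # fallback: fully generic dispatch, like the interpreter's byte code
    return operator.add(a, b)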
 
