Why not a Python compiler?

• Thread starter: Santiago Romero

Steven D'Aprano

Gary Kurtz at SunCon 77 explained that it was a test to see if Obi-Wan
knew what he was doing; supposedly, Obi-Wan's expression indicated that
he knew Solo was feeding him shit.

Why the hell would the pilot care whether the passengers knew what a
parsec was? Did Concorde pilots quiz their passengers on what Mach 1 meant?

Especially a pirate like Solo, who really only cared about one question:
"can the passenger pay?".

I think Lucas didn't have a clue, myself; it's not credible that
citizens of a starfaring civilization who deliberately set out to hire a
starship wouldn't know the difference between time and distance. Occam's
razor says Lucas screwed up and doesn't want to admit it.

For sure.
 

Steven D'Aprano

Ryszard said:
I don't know the exact details, but I think the issue is that the dynamic
nature of Python makes it impossible to correctly store the various
types and changes in compiled code. Someone else will probably be
able to provide a good reason why it isn't very feasible, nor a
good idea. If you want to speed up your Python, look at Psyco.
http://psyco.sourceforge.net/

Yeah, but exactly what features make it so hard to write a compiler for
Python?
[...]

a. People tell me writing a compiler for Python is hard.

b. It's certainly way too hard for me.

c. But hey, I've heard about this neat language called Common Lisp that
has a compiler. It looks a lot like Python.

d. So why can't you brainboxes write a compiler for Python?

Please tell me if I'm missing anything from this summary of your thought
processes.


Be fair -- he's asking what specific features of Python make it hard.
That's a reasonable question.
 

Jeroen Ruigrok van der Werven

-On [20080207 22:09] said:
Errr... didn't one of the novels explain it away by describing the
kessel run as a region of space warped by black holes or other objects?
Bragging rights for crossing such a field thus centered on shortest
distance instead of time.

http://starwars.wikia.com/wiki/Kessel_Run

Han Solo claimed that his Millennium Falcon "made the Kessel Run in less
than twelve parsecs." The parsec is a unit of distance, not time. Solo was
not referring directly to his ship's speed when he made this claim. Instead,
he was referring to the shorter route he was able to travel by skirting the
nearby Maw black hole cluster, thus making the run in under the standard
distance. However, parsec relates to time in that a shorter distance equals
a shorter time at the same speed. By moving closer to the black holes, Solo
managed to cut the distance down to about 11.5 parsecs.

In the A New Hope novelization, Han says "standard time units" rather than
"parsecs". Therefore, the reduced distance of Solo's Kessel Run is most
likely a retcon to explain George Lucas's confusion of time and distance
units.
 

Erik Max Francis

Ivan said:
Gary Kurtz at SunCon 77 explained that it was a test to see if Obi-Wan
knew what he was doing; supposedly, Obi-Wan's expression indicated
that he knew Solo was feeding him shit.

I think Lucas didn't have a clue, myself; it's not credible that
citizens of a starfaring civilization who deliberately set out to hire
a starship wouldn't know the difference between time and distance.
Occam's razor says Lucas screwed up and doesn't want to admit it.

Indeed, that is precisely what I would have said. Any after-the-fact
explanation of why the error was present (but not corrected or pointed
out in the movie) is just so much rationalization.
 

Steve Holden

Steven said:
Why the hell would the pilot care whether the passengers knew what a
parsec was? Did Concorde pilots quiz their passengers on what Mach 1 meant?

They didn't need to, since the aircraft's speed was displayed inside the
cabin as a Mach number.

Especially a pirate like Solo, who really only cared about one question:
"can the passenger pay?".

For sure.

What's more, he won't pay dues so ILM can be a full sponsor member of
the PSF. It doesn't seem credible to me that an organization of that
size couldn't find $2,000 in its budget to support the language of which
it may well be the largest single user in the world. Not that I
seriously imagine George Lucas micromanages the budget down to that level.

regards
Steve
 

Hrvoje Niksic

Steven D'Aprano said:
Be fair -- he's asking what specific features of Python make it
hard. That's a reasonable question.

Indeed. The best explanation I've seen goes something like
this: imagine a hypothetical Python compiler that achieves native
compilation by compiling to Common Lisp and using CL's compiler to
produce native code. Upon encountering an expression such as:

a + b

the compiler could do little else except translate it to something
like:

(python:add a b)

In order to correctly implement Python addition, python:add needs to
do a lot of work at run-time. It needs to check for the __add__ method of
one or both operands without assuming what it does, since a
user-defined class is free to define __add__ to do whatever it
pleases. The compiler could attempt to infer the types of the operands,
but that is hard, since an expression such as "a = module.SomeClass()"
completely changes meaning if module.SomeClass or
module.SomeClass.__add__ changes. Such changes may seem improbable,
but the fact is that being able to do them is a documented part of the
language, and a lot of code makes good use of it. Assuming these
things don't happen means the compiler doesn't implement Python.
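
As a concrete (if contrived) sketch of that, with a made-up class
standing in for module.SomeClass:

class SomeClass:
    def __add__(self, other):
        return 42

a = SomeClass()
b = SomeClass()
print(a + b)    # 42, dispatched through SomeClass.__add__

# Any code, in any module, may rebind __add__ at run-time:
SomeClass.__add__ = lambda self, other: "surprise"
print(a + b)    # "surprise" -- the same expression now means something else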

This applies not only to addition; expressions such as "foo.bar",
which include any method call, would be translated to (python:getattr
foo "bar"), and so on. Most functions would have to construct actual
tuples, since a function can be replaced with one that takes *args.
Again, optimizing almost any of this away would change the semantics
of Python. From the ability to assign to classes, to modules, to
globals(), and to __dict__'s, literally anything can change at
run-time. *Some* kinds of runtime dispatches can be sped up by
setting up sophisticated caches (one such cache for methods is being
applied to CPython), but getting that right without breaking
correctness is quite tricky. Besides, the same caches could be used to
speed up CPython too, so they don't constitute an advantage of the
compiler.
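
The *args point can be sketched the same way (f is a made-up name):

def f(a, b):
    return a + b

print(f(1, 2))    # 3

# f can be rebound at run-time to a function with a different calling
# convention, so a compiler cannot fix the argument-passing strategy
# at any call site that names f:
def f(*args):
    return sum(args)

print(f(1, 2))    # still 3, but the arguments now arrive as a tuple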

The main determinant of Python's performance isn't the interpreter
overhead, but the amount of work that must be done at run-time and
cannot be moved to compile-time or optimized away.
 

Steve Holden

Steven said:
Ryszard said:
On Feb 5, 9:30 am, (e-mail address removed) wrote:

I don't know the exact details, but I think the issue is that the dynamic
nature of Python makes it impossible to correctly store the various
types and changes in compiled code. Someone else will probably be
able to provide a good reason why it isn't very feasible, nor a
good idea. If you want to speed up your Python, look at Psyco.
http://psyco.sourceforge.net/
Yeah, but exactly what features make it so hard to write a compiler for
Python?
[...]
a. People tell me writing a compiler for Python is hard.

b. It's certainly way too hard for me.

c. But hey, I've heard about this neat language called Common Lisp that
has a compiler. It looks a lot like Python.

d. So why can't you brainboxes write a compiler for Python?

Please tell me if I'm missing anything from this summary of your thought
processes.


Be fair -- he's asking what specific features of Python make it hard.
That's a reasonable question.

Bah, humbug. Maybe I should be getting more sleep ...

Fortunately someone less grumpy provided quite a decent answer.

regards
Steve
 

Arnaud Delobelle

Indeed. The best explanation I've seen goes something like
this: imagine a hypothetical Python compiler that achieves native
compilation by compiling to Common Lisp and using CL's compiler to
produce native code. Upon encountering an expression such as:

a + b

the compiler could do little else except translate it to something
like:

(python:add a b)
[snip more interesting considerations about compiling python]

Please get back on topic. This discussion is about parsecs and
wookies now.
 

Dennis Lee Bieber

Why the hell would the pilot care whether the passengers knew what a
parsec was? Did Concorde pilots quiz their passengers on what Mach 1 meant?

But Mach 1 is a flexible value, depending upon the density of the
medium one is traversing... {Makes flying the variations of the U-2 fun...
as the U-2 is not a supersonic design, but at the altitudes it flies,
Mach 1 and subsonic cruise are very close to each other -- too fast, and
the shock wave tears off the wings <G>}

A parsec is a fixed value (which, admittedly, presumes the culture
developed a 360-degree circle broken into degrees => minutes =>
seconds... or, at least, some units compatible with the concept of an
"arc second", like 400 grads of, say, 100 "minutes", each of 100
"seconds").
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 

Torsten Bronger

Hi there!

-----Original Message-----
From: [email protected] [mailto:python-
[email protected]] On Behalf Of Torsten Bronger
Sent: Thursday, February 07, 2008 3:32 PM
To: (e-mail address removed)
Subject: Re: Why not a Python compiler?
I wonder if George Lucas intended it as a joke or if he thought
a parsec was a unit of time.

The latter because it was corrected in the novelization.

Errr... didn't one of the novels explain it away by describing the
kessel run as a region of space warped by black holes or other
objects? Bragging rights for crossing such a field thus centered
on shortest distance instead of time.

Well, in the link that Grant provided, it says

In the A New Hope novelization, Han says "standard time units"
rather than "parsecs". Therefore, the reduced distance of Solo's
Kessel Run is most likely a retcon to explain George Lucas's
confusion of time and distance units.

Bye,
Torsten.
 

jack trades

Santiago Romero said:
Why not a Python COMPILER?


Check out CLPython; it has a Python compiler, though I highly doubt it is
what you are thinking of.


From http://common-lisp.net/project/clpython/manual.html

Sometimes, the generated Lisp code can be simplified because the value of
an expression is known at compile time. This is where compiler macros come
in. In this example, as 4 > 3 always holds, the compiler macro for py->
replaces (funcall #'py-> 4 3) by the Python value True. After that, the
compiler macro for py-val->lisp-bool, recognizing that True is a constant
value, replaces (py-val->lisp-bool True) by t. The Lisp compiler then
deduces that the first branch of the if expression is always taken, and
replaces the whole (cond ...) by (py-print nil (list "y") nil).
In this example the compiler macros were able to remove a lot of the Lisp
code at compile time. This results in more efficient code. However, in
practice there is often not that much that can be decided at compile time,
due to Python-the-language being very dynamic. For example, in the
expression 5 + x the value of x can be anything. As classes are able to
redefine how the + operator behaves (by means of the __add__ and __radd__
methods), the value of 5 + x can be anything as well. Unless the context
gives more information about the type of x, the Lisp code must contain a
call to the generic addition function py-+.

Nevertheless, the compiler macro will inline the "common" case, and make the
generic call only for "uncommon" arguments. If small integers (fixnums) are
common for the + operator, the compiler macro for py-+ could emit:

(if (typep x 'fixnum)
    (+ 5 x)
    (py-+ 5 x))

The check for x being a fixnum is very fast; and if x is indeed a
fixnum then the inline addition is also very fast. If x is not a fixnum, it
could be another kind of (Lisp) number, or even a Pythonic object posing as a
number. The generic py-+ will handle that.
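
The same guard-and-fallback shape can be written out in Python itself.
A rough sketch, where py_add and add_5 are made-up names standing in
for the generic addition function and the specialized code the
compiler macro would emit:

def py_add(x, y):
    # Stand-in for Python's full addition protocol (__add__ first,
    # then __radd__), heavily simplified for illustration.
    result = NotImplemented
    if hasattr(type(x), "__add__"):
        result = type(x).__add__(x, y)
    if result is NotImplemented and hasattr(type(y), "__radd__"):
        result = type(y).__radd__(y, x)
    if result is NotImplemented:
        raise TypeError("unsupported operand type(s) for +")
    return result

def add_5(x):
    if type(x) is int:     # cheap guard, like (typep x 'fixnum)
        return 5 + x       # inlined fast path
    return py_add(5, x)    # generic fallback for everything else

print(add_5(3))      # 8, via the fast path
print(add_5(2.5))    # 7.5, via the generic fallback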


Compiled vs Interpreted Code
CLPython can run Python code in two modes, interpreted or compiled. In the
latter case, the Lisp code is translated into assembly. The advantage of
interpreted code is that debugging is easier (the stack trace contains more
information); on the other hand execution of compiled code is much faster.



Jack Trades
 

Steven D'Aprano

Hi there!

-----Original Message-----
From: [email protected] [mailto:python-
[email protected]] On Behalf Of Torsten Bronger
Sent: Thursday, February 07, 2008 3:32 PM To: (e-mail address removed)
Subject: Re: Why not a Python compiler?

I wonder if George Lucas intended it as a joke or if he thought a
parsec was a unit of time.

The latter because it was corrected in the novelization.

Errr... didn't one of the novels explain it away by describing the
kessel run as a region of space warped by black holes or other objects?
Bragging rights for crossing such a field thus centered on shortest
distance instead of time.

Well, in the link that Grant provided, it says

In the A New Hope novelization, Han says "standard time units"
rather than "parsecs". Therefore, the reduced distance of Solo's
Kessel Run is most likely a retcon to explain George Lucas's
confusion of time and distance units.

Bah humbug! New Hope, new poke.

Some of us are old enough to remember when Star Wars was Star Wars: the
movie was called Star Wars, the opening credits listed it as Star Wars
(with New Hope merely a subtitle specifically to invoke the flavour of
the Saturday afternoon movies of Lucas' teen years) and the novelization
was called Star Wars.

That novel was credited to Lucas but actually ghost-written by Alan Dean
Foster. The 1976 Sphere Books edition is called Star Wars; it's subtitled
"From the Adventures of Luke Skywalker", and the inside front page refers
to the movie as Star Wars. With no subtitle.

In that book, Han refers to "twelve standard time parts", a clunky and
ugly phrase even stupider than calling it twelve parsecs, which reads and
sounds like real language.

Personally, I think Lucas should have just said that in that particular
galaxy far far away and a long time ago, "parsec" *was* a measure of
time. I'd buy that.
 

Paul Boddie

[snip more interesting considerations about compiling python]

Please get back on topic. This discussion is about parsecs and
wookies now.

Yes, it's like the lower-value parts of Wikipedia have spilled out
onto Usenet. ;-)

But I think Hrvoje's post summarises it quite well: Python operations are
not directly equivalent to the primitive operations which often employ
the same symbols (e.g. arithmetic operators); determining the
attributes of objects typically requires relatively expensive
operations (compared to optimised cases in other languages); and
predicting the outcome of such run-time operations is difficult,
because even the slightest change can have global consequences.

That said, there are some things in Python which are intentionally
predictable: names are always associated with a particular scope
(local or global), and access to such scopes is not configurable (unlike
access to instance namespaces, for example). Moreover, people have
asserted that many programs do not use the full potential of Python's
dynamic facilities. For example, how many programs do something like
this...?

class A:
    ...

class B:
    ...

for cls in A, B:
    class C(cls):
        ...

Removing the mere possibility of such stuff piece by piece, or rather
telling the programmer that it makes their programs run slow, could
make Python programs more readily inspectable and potentially more
open to optimisation. I regard this as a more interesting route than
just slapping type annotations all over the place and pretending that
Java's younger brother hasn't just been conceived.

Paul
 

Ryszard Szopa

Indeed. The best explanation I've seen goes something like
this: imagine a hypothetical Python compiler that achieves native
compilation by compiling to Common Lisp and using CL's compiler to
produce native code. Upon encountering an expression such as:

a + b

the compiler could do little else except translate it to something
like:

(python:add a b)

In order to correctly implement Python addition, python:add needs to
do a lot of work at run-time. It needs to check for the __add__ method of
one or both operands without assuming what it does, since a
user-defined class is free to define __add__ to do whatever it
pleases. The compiler could attempt to infer the types of the operands,
but that is hard, since an expression such as "a = module.SomeClass()"
completely changes meaning if module.SomeClass or
module.SomeClass.__add__ changes. Such changes may seem improbable,
but the fact is that being able to do them is a documented part of the
language, and a lot of code makes good use of it. Assuming these
things don't happen means the compiler doesn't implement Python.

This applies not only to addition; expressions such as "foo.bar",
which include any method call, would be translated to (python:getattr
foo "bar"), and so on. Most functions would have to construct actual
tuples, since a function can be replaced with one that takes *args.
Again, optimizing almost any of this away would change the semantics
of Python. From the ability to assign to classes, to modules, to
globals(), and to __dict__'s, literally anything can change at
run-time. *Some* kinds of runtime dispatches can be sped up by
setting up sophisticated caches (one such cache for methods is being
applied to CPython), but getting that right without breaking
correctness is quite tricky. Besides, the same caches could be used to
speed up CPython too, so they don't constitute an advantage of the
compiler.

In this respect, CL is similar to Python. Generic functions are even
more dynamic than Python's methods. You can add a method to a gf
whenever you please. Also, you can assign to classes, change their
structure (add or remove slots), change their metaclass and so on. As
for foo.bar and friends: in CL you have to define an accessor function
if you don't want to use (SLOT-VALUE foo 'bar) all the time (this is
usually done through a shortcut in the DEFCLASS macro).

CL objects are not hash-tables, so you will get an error if you try to
assign to a bogus (by which I mean not present in the class
definition) slot. However, you can implement a SLOT-MISSING method
to sanely handle this situation.

You cannot reset slots containing methods in CL (as they do not
exist). However, you should be able to implement
SLOT-VALUE-USING-CLASS and (SETF SLOT-VALUE-USING-CLASS), which would
emulate Python's behavior. Finally, you can use EQL specializers,
which give you object-specific (not class-specific) behavior.

The one thing that isn't so easy to emulate is assignment to modules
(though redefinition of a function in some package works as you would
expect).

The main determinant of Python's performance isn't the interpreter
overhead, but the amount of work that must be done at run-time and
cannot be moved to compile-time or optimized away.

Well, I am still not convinced that Python is intrinsically
un-compilable :).

Some optimizations that are nowadays done by hand probably could be
abstracted. Think assigning a method to a local variable before using
it in a loop... Expressing simple loops as C for loops... Tail call
optimization... (Of course, my intuitions about what would be a great
optimization for a Python compiler are vague, so please correct me if
I am drastically wrong.) Note that these are the kind of optimizations
you don't want in your interpreter. When it comes to debugging, the
lack of tail call optimization is a feature, not a bug.
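
For concreteness, the first of those hand optimizations looks like
this in today's Python:

# Unoptimized: data.append is looked up afresh on every iteration.
data = []
for i in range(1000000):
    data.append(i)

# Hand-optimized: hoist the bound method into a local first.
data = []
append = data.append
for i in range(1000000):
    append(i)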

Finally, skimming the CLPython mailing list suggests it actually
works. Presently it is slower than CPython, but after all it is a
far-from-mature, one-developer project, so you can treat it as a proof of
concept.

Anyway, I won't be very surprised if in a couple of years your average
c.l.py troll is going to be asking "So I heard that Python is an
interpreted only language, how can it be any good?", and you will be
explaining for the thousandth time: "Since version 4.2 Python has a
fast native code compiler, so...". ;-)

Cheers,

-- Richard


BTW, maybe Dylan could be a better model for Python compilation? In
many respects it is a lot more similar to Python than Common Lisp is.
It is a Lisp-1 (it has a single namespace for functions and variables),
and it seems to be more object-oriented than CL.
 

Grant Edwards

-On [20080207 22:09] said:
Errr... didn't one of the novels explain it away by describing the
kessel run as a region of space warped by black holes or other objects?
Bragging rights for crossing such a field thus centered on shortest
distance instead of time.

http://starwars.wikia.com/wiki/Kessel_Run

Han Solo claimed that his Millennium Falcon "made the Kessel Run in less
than twelve parsecs." The parsec is a unit of distance, not time. Solo was
not referring directly to his ship's speed when he made this claim. Instead,
he was referring to the shorter route he was able to travel by skirting the
nearby Maw black hole cluster, thus making the run in under the standard
distance. However, parsec relates to time in that a shorter distance equals
a shorter time at the same speed. By moving closer to the black holes, Solo
managed to cut the distance down to about 11.5 parsecs.

Um, yea, I'd have to call bullshit on that.

IIRC, he was answering a question something like "is she fast".
If you buy the above BS, he'd have to be answering a
question about his piloting skills, not about how fast the ship
is.

One could give GL the benefit of the doubt and claim that GL
intentionally miswrote the line to give the movie the feel of
the badly-written serials he was emulating. But then again, I
think GL has since proven beyond a doubt that he's just a
really bad writer who is particularly awful at dialog.
 

Grant Edwards

A parsec is a fixed value (which, admittedly, presumes the culture
developed a 360-degree circle broken into degrees => minutes =>
seconds... or, at least, some units compatible with the concept of an
"arc second", like 400 grads of, say, 100 "minutes", each of 100
"seconds").

It also presumes a standard diameter of that circle.
 

Hrvoje Niksic

Ryszard Szopa said:
Well, I am still not convinced that Python is intrinsically
un-compilable :).

It's not. My point is that it's very hard to optimize if your goal is
to implement Python as currently defined.

Some optimizations that are nowadays done by hand probably could be
abstracted. Think assigning a method to a local variable before using
it in a loop...

That's an example of what I'm talking about: it is simply not correct
to cache methods in general. If you think changing classes is a
no-no, remember that function objects can be and do get added to an
individual instance's __dict__. Of course, your compiler could
support a declaration that disables the optimization for code that
really needs to do it, but then you're no longer compatible with
Python. (And by "Python" I don't mean just CPython, but the core
language as defined by the language reference and implemented in
CPython, Jython, and IronPython.)
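
A minimal sketch of how such a cache goes stale (made-up names):

class C:
    def method(self):
        return "class version"

c = C()
cached = c.method                        # what a caching compiler might do
c.method = lambda: "instance version"    # function object in c.__dict__

print(cached())      # "class version" -- the cached lookup is now stale
print(c.method())    # "instance version"
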
Anyway, I won't be very surprised if in a couple of years your
average c.l.py troll is going to be asking "So I heard that Python
is an interpreted only language, how can it be any good?", and you
will be explaining for the thousandth time: "Since version 4.2
Python has a fast native code compiler, so...". ;-)

I'll be very happy to be proven wrong in that respect. :)
 

Grant Edwards

the compiler could do little else except translate it to something
like:

(python:add a b)
[snip more interesting considerations about compiling python]

Please get back on topic. This discussion is about parsecs and
wookies now.

What's a "wookie" a unit of?
 

Reedick, Andrew

-----Original Message-----
From: [email protected] [mailto:python-
[email protected]] On Behalf Of Grant Edwards
Sent: Friday, February 08, 2008 12:46 PM
To: (e-mail address removed)
Subject: Re: Why not a Python compiler?

the compiler could do little else except translate it to something
like:

(python:add a b)
[snip more interesting considerations about compiling python]

Please get back on topic. This discussion is about parsecs and
wookies now.

What's a "wookie" a unit of?

How many ewoks are there to a wookie? (Yes, I know the metric system is
archaic, but I really don't care for them new fangled 'standard units'.)


 
