interpreter vs. compiled

C

castironpi

Well - in IronPython user code gets compiled to in memory assemblies
which can be JIT'ed.

Michael Foord
--http://www.ironpythoninaction.com/

I don't believe so.

Take a sample interpreted language, to be compiled by a sample JIT.

a= stack( )
a.push( 0 )
a.push( 2 )
a.push( 4 )

Bytecode output is:

allocate var0
call STACK_CONSTRUCTOR
initialize var0 return
call METH0 var0 const 0
call METH0 var0 const 2
call METH0 var0 const 4

JIT output is assembly:

reg0 malloc 4
setjmp
jmp STACK_CONSTRUCTOR
popjmpv reg1
setjmp
reg1 0
jmp METH0
reg1 2
jmp METH0
reg1 4
jmp METH0

But when you compare the C code for 'x= y+ 1' to Python JIT code for
'x= y+ 1', there is no stack and no error checking. They're quite
different. In a JIT language, Python runs in native assembly, output
similar to CPython, theoretically. User's code -does- -not- -run-.
It is moved in to and out of a stack by a program that is running,
namely the interpreter, whether it came from C and a compiler,
or .NEt, bytecode, and a JIT.
 
T

Tim Roberts

castironpi said:
In CPython yes. In IronPython yes: the parts that are compiled into
machine code are the interpreter, *not user's code*.

WRONG! You are WRONG. At "compile" time, the Python code is compiled to
an intermediate language. At "run" time, the intermediate language (which
is still the user's code, just in another representation) is compiled into
machine language. It is the user's program, not the interpreter.

It's the exact same process that occurs in a C compiler. Most C compilers
translate the C program to an intermediate form before finally converting
it to machine language. The only difference is that, in a C compiler, both
steps occur within the compiler. In IronPython, the two steps are
separated in time. There is no other difference.
Without that
step, the interpreter would be running on an interpreter, but that
doesn't get the user's statement 'a= b+ 1' into registers-- it gets
'push, push, add, pop' into registers.

You have a fundamental misunderstanding of the compilation process.
 
A

alex23

I don't believe so.

Uh, you're questioning someone who is not only co-author of a book on
IronPython, but also a developer on one of the first IronPython-based
commercial applications.

I know authorship isn't always a guarantee of correctness, but what
experience do you have with IronPython that makes you so unwilling to
accept the opinion of someone with substantial knowledge of the
subject?
 
C

castironpi

Uh, you're questioning someone who is not only co-author of a book on
IronPython, but also a developer on one of the first IronPython-based
commercial applications.

I know authorship isn't always a guarantee of correctness, but what
experience do you have with IronPython that makes you so unwilling to
accept the opinion of someone with substantial knowledge of the
subject?

None, no experience, no authority, only the stated premises &
classifications, which I am generally tending to misinterpret. I'm
overstepping my bounds and trying to do it politely. (Some might call
it learning, which yes, though uncustomary, *requires questioning
authorities*, or reinventing.)

Evidently, I have a "fundamental misunderstanding of the compilation
process", which I'm trying to correct by stating what I believe. I'm
trying to elaborate, and I'm meeting with increasingly much detail.
So, perhaps I'll learn something out of this. Until then...

What I know I have is two conflicting, contradictory, inconsistent
beliefs. Maybe I've spent too much time in Python to imagine how a
dynamic language can compile.

This is from 7/22/08, same author:
I wouldn't say "can't". The current CPython VM does not compile
code. It COULD. The C#/.NET VM does.

Three big claims here that I breezed right over and didn't believe.
It COULD.

I'm evidently assuming that if it could, it would.
The current CPython VM does not compile code.

Therefore it couldn't, or the assumption is wrong. Tim says it is.
And the glaring one--

WHY NOT? Why doesn't CPython do it?

From 7/18/08, own author:#define TOP() (stack_pointer[-1])
#define BASIC_POP() (*--stack_pointer)

...(line 1159)...
w = POP();
v = TOP();
if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
/* INLINE: int + int */
register long a, b, i;
a = PyInt_AS_LONG(v);
b = PyInt_AS_LONG(w);
i = a + b;
<<

I am imagining that every Python implementation has something like
it. If IronPython does not, in particular, not have the 'POP();
TOP();' sequence, then it isn't running on a stack machine. Is the
IronPython code open source, and can someone link to it? I'm not
wading through it from scratch. What does it have instead? Does
dynamic typing still work?

<closing hostile remark>
If you're bluffing, bluff harder; I call. If you're not, I apologize;
teach me something. If you can ask better, teach me that too.
</hostile>
 
C

castironpi

WRONG!  You are WRONG.  At "compile" time, the Python code is compiled to
an intermediate language.  At "run" time, the intermediate language (which
is still the user's code, just in another representation) is compiled into
machine language.  It is the user's program, not the interpreter.

It's the exact same process that occurs in a C compiler.  Most C compilers
translate the C program to an intermediate form before finally converting
it to machine language.  The only difference is that, in a C compiler, both
steps occur within the compiler.  In IronPython, the two steps are
separated in time.  There is no other difference.


You have a fundamental misunderstanding of the compilation process.

In C, we have:

int x, y;
x= 10;
y= x+ 1;

It translates as, roughly:


8000 .data
7996 ffffffff #x
7992 ffffffff #y
7988 .end data
7984 loadi reg0 7996
7980 loadi reg1 7992
7976 loadi reg2 10
7972 loadi reg3 1
7968 storv reg2 reg0
7964 add reg0 reg1 reg2
7960 storv reg3 reg1


You are telling me that the same thing happens in IronPython. By the
time the instruction pointer gets to 'x= 10', the next 7 instructions
are the ones shown here compiled from C.

CMIIW, but the CPython implementation -does- -not-. Instead, it has,

push 10
stor x
push 1
add
stor y

which each amounts to, to give a rough figure, 5-10 machine
instructions. push 10, for example, with instruction_pointer in reg0:

loadi reg1 4 #add 4 to stack pointer (one word)
add reg0 reg1 reg2
load reg0 reg2
loadi reg2 10 #load ten
stor reg0 reg2 #store at top of stack

And this is all not to mention (i) the extra comparisons in
intobject.h,

#define PyInt_CheckExact(op) ((op)->ob_type == &PyInt_Type)

(ii) the huge case statement just to evaluate add, OR (iii)

a = PyInt_AS_LONG(v);
b = PyInt_AS_LONG(w);

because CPython -does- -not- -know- ahead of time which op it will be
executing, or what addresses (remember __coerce__), it will be
performing the op on. Does not know EVER, not until it gets there.

My point is, CPython takes more than seven steps. My question is,
does IronPython?
 
D

Dino Viehland

IronPython doesn't have an interpreter loop and therefore has no POP / TOP / etc... Instead what IronPython has is a method call Int32Ops.Add which looks like:

public static object Add(Int32 x, Int32 y) {
long result = (long) x + y;
if (Int32.MinValue <= result && result <= Int32.MaxValue) {
return Microsoft.Scripting.Runtime.RuntimeHelpers.Int32ToObject((Int32)(result));
}
return BigIntegerOps.Add((BigInteger)x, (BigInteger)y);
}

This is the implementation of int.__add__. Note that calling int.__add__ can actually return NotImplemented and that's handled by the method binder looking at the strong typing defined on Add's signature here - and then automatically generating the NotImplemented result when the arguments aren't ints. So that's why you don't see that here even though it's the full implementation of int.__add__.

Ok, next if you define a function like:

def adder(a, b):
return a + b

this turns into a .NET method, which will get JITed, which in C# would look something like like:

static object adder(object a, object b) {
return $addSite.Invoke(a, b)
}

where $addSite is a dynamically updated call site.

$addSite knows that it's performing addition and knows how to do nothing other than update the call site the 1st time it's invoked. $addSite is local to the function so if you define another function doing addition it'll have its own site instance.

So the 1st thing the call site does is a call back into the IronPython runtime which starts looking at a & b to figure out what to do. Python defines that as try __add__, maybe try __radd__, handle coercion, etc... So we go looking through finding the __add__ method - if that can return NotImplemented then we find the __radd__ method, etc... In this case we're just adding two integers and we know that the implementation of Add() won't return NotImplemented - so there's no need to call __radd__. We know we don't have to worry about NotImplemented because the Add method doesn't have the .NET attribute indicating it can return NotImplemented.

At this point we need to do two things. We need to generate the test which is going to see if future arguments are applicable to what we just figured out and then we need to generate the code which is actually going to handle this. That gets combined together into the new call site delegate and it'll look something like:

static void CallSiteStub(CallSite site, object a, object b) {
if (a != null && a.GetType() == typeof(int) && b != null && b.GetType() == typeof(int)) {
return IntOps.Add((int)a, (int)b);
}
return site.UpdateBindingAndInvoke(a, b);
}

That gets compiled down as a lightweight dynamic method which also gets JITed. The next time through the call site's Invoke body will be this method and things will go really fast if we have int's again. Also notice this is looking an awful lot like the inlined/fast-path(?) code dealing with int's that you quoted. If everything was awesome (currently it's not for a couple of reasons) the JIT would even inline the IntOps.Add call and it'd probably be near identical. And everything would be running native on the CPU.

So that's how 2 + 2 works... Finally if it's a user type then we'd generate a more complicated test like (and getting more and more pseudo code to keep things simple):

if (PythonOps.CheckTypeVersion(a, 42) && PythonOps.CheckTypeVersion(b, 42)) {
return $callSite.Invoke(__cachedAddSlot__.__get__(a), b);
}

Here $callSite is another stub which will handle doing optimal dispatch to whatever __add__.__get__ will return. It could be a Python type, it could be a user defined function, it could be the Python built-in sum function, etc... so that's the reason for the extra dynamic dispatch.

So in summary: everything is compiled to IL. At runtime we have lots of stubs all over the place which do the work to figure out the dynamic operation and then cache the result of that calculation.

Also what I've just described is how IronPython 2.0 works. IronPython 1.0 is basically the same but mostly w/o the stubs and where we use stub methods they're much less sophisticated.

Also, IronPython is open source - www.codeplex.com/IronPython

-----Original Message-----
From: [email protected] [mailto:p[email protected]] On Behalf Of castironpi
Sent: Tuesday, July 29, 2008 9:20 PM
To: (e-mail address removed)
Subject: Re: interpreter vs. compiled

Uh, you're questioning someone who is not only co-author of a book on
IronPython, but also a developer on one of the first IronPython-based
commercial applications.

I know authorship isn't always a guarantee of correctness, but what
experience do you have with IronPython that makes you so unwilling to
accept the opinion of someone with substantial knowledge of the
subject?

None, no experience, no authority, only the stated premises &
classifications, which I am generally tending to misinterpret. I'm
overstepping my bounds and trying to do it politely. (Some might call
it learning, which yes, though uncustomary, *requires questioning
authorities*, or reinventing.)

Evidently, I have a "fundamental misunderstanding of the compilation
process", which I'm trying to correct by stating what I believe. I'm
trying to elaborate, and I'm meeting with increasingly much detail.
So, perhaps I'll learn something out of this. Until then...

What I know I have is two conflicting, contradictory, inconsistent
beliefs. Maybe I've spent too much time in Python to imagine how a
dynamic language can compile.

This is from 7/22/08, same author:
I wouldn't say "can't". The current CPython VM does not compile
code. It COULD. The C#/.NET VM does.

Three big claims here that I breezed right over and didn't believe.
It COULD.

I'm evidently assuming that if it could, it would.
The current CPython VM does not compile code.

Therefore it couldn't, or the assumption is wrong. Tim says it is.
And the glaring one--

WHY NOT? Why doesn't CPython do it?
From 7/18/08, own author:
#define TOP() (stack_pointer[-1])
#define BASIC_POP() (*--stack_pointer)

....(line 1159)...
w = POP();
v = TOP();
if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
/* INLINE: int + int */
register long a, b, i;
a = PyInt_AS_LONG(v);
b = PyInt_AS_LONG(w);
i = a + b;
<<

I am imagining that every Python implementation has something like
it. If IronPython does not, in particular, not have the 'POP();
TOP();' sequence, then it isn't running on a stack machine. Is the
IronPython code open source, and can someone link to it? I'm not
wading through it from scratch. What does it have instead? Does
dynamic typing still work?

<closing hostile remark>
If you're bluffing, bluff harder; I call. If you're not, I apologize;
teach me something. If you can ask better, teach me that too.
</hostile>
 
C

castironpi

I note that IronPython and Python's pickle.dumps do not return the
same value. Perhaps this relates to the absence of interpreter loop.
IPy: '(dp0\nVb\np1\nc__builtin__\nset\np3\n((lp4\ntp5\nRp2\nsVa
\np6\nI01\ns.'
CPy: "(dp0\nS'a'\np1\nI01\nsS'b'\np2\nc__builtin__\nset
\np3\n((lp4\ntp5\nRp6\ns."

You make me think of a more elaborate example.

for k in range( 100 ):
i= j()
g= h+ i
e= f+ g
c= d+ e
a= b+ c

Here, j creates a new class dynamically, and returns an instance of
it. Addition is defined on it but the return type from it varies.

If I read you correctly, IPy can leave hundreds of different addition
stubs laying around at the end of the for-loop, each of which only
gets executed once or twice, each of which was compiled for the exact
combination of types it was called for.

I might construe this to be a degenerate case, and the majority of
times, you'll reexecute stubs enough to outweigh the length of time
the compilation step takes. If you still do the bounds checking, it
takes extra instructions (C doesn't), but operation switch-case
BINARY_ADD, (PyInt_CheckExact(v) && PyInt_CheckExact(w)), and POP and
TOP, are all handled by the selection of stubs from $addSite.

I'm read from last April:<<< http://lists.ironpython.com/pipermail/users-ironpython.com/2007-April/004773.html

It's interesting that CPython can make those gains still by using a
stack implementation.

I'll observe that IronPython has the additional dependency of the
full .NET runtime. (It was my point 7/18 about incorporating the GNU
libs, that to compile to machine-native, as a JIT does, you need the
instruction set of the machine.) Whereas, CPython can disregard
them, having already been compiled for it.

I think what I was looking for is that IronPython employs the .NET to
compile to machine instructions, once it's known what the values of
the variables are that are the operands. The trade-off is compilation
time + type checks + stub look-up.

What I want to know is, if __add__ performs an attribute look-up, is
that optimized in any way, after the IP is already in compiled code?

After all that, I don't feel so guilty about stepping on Tim's toes.
 
D

Dino Viehland

It looks like the pickle differences are due to two issues. First IronPython doesn't have ASCII strings so it serializes strings as Unicode. Second there are dictionary ordering differences. If you just do:

{ 'a': True, 'b': set( ) }

Cpy prints: {'a': True, 'b': set([])}
Ipy prints: {'b': set([]), 'a': True}

The important thing is that we interop - and indeed you can send either pickle string to either implementation and the correct results are deserialized (modulo getting Unicode strings).

For your more elaborate example you're right that there could be a problem here. But the DLR actually recognizes this sort of pattern and optimizes for it. All of the additions in your code are what I've been calling serially monomorphic call sites. That is they see the same types for a while, maybe even just once as in your example, and then they switch to a new type - never to return to the old one. When IronPython gives the DLR the code for the call site the DLR can detect when the code only differs by constants - in this case type version checks. It will then re-write the code turning the changing constants into variables. The next time through when it sees the same code again it'll re-use the existing compiled code with the new sets of constants.

That's still slower than we were in 1.x so we'll need to push on this more in the future - for example producing a general rule instead of a type-specific rule. But for the time being having the DLR automatically handle this has been working good enough for these situations.

-----Original Message-----
From: [email protected] [mailto:p[email protected]] On Behalf Of castironpi
Sent: Tuesday, July 29, 2008 11:40 PM
To: (e-mail address removed)
Subject: Re: interpreter vs. compiled

I note that IronPython and Python's pickle.dumps do not return the
same value. Perhaps this relates to the absence of interpreter loop.
IPy: '(dp0\nVb\np1\nc__builtin__\nset\np3\n((lp4\ntp5\nRp2\nsVa
\np6\nI01\ns.'
CPy: "(dp0\nS'a'\np1\nI01\nsS'b'\np2\nc__builtin__\nset
\np3\n((lp4\ntp5\nRp6\ns."

You make me think of a more elaborate example.

for k in range( 100 ):
i= j()
g= h+ i
e= f+ g
c= d+ e
a= b+ c

Here, j creates a new class dynamically, and returns an instance of
it. Addition is defined on it but the return type from it varies.

If I read you correctly, IPy can leave hundreds of different addition
stubs laying around at the end of the for-loop, each of which only
gets executed once or twice, each of which was compiled for the exact
combination of types it was called for.

I might construe this to be a degenerate case, and the majority of
times, you'll reexecute stubs enough to outweigh the length of time
the compilation step takes. If you still do the bounds checking, it
takes extra instructions (C doesn't), but operation switch-case
BINARY_ADD, (PyInt_CheckExact(v) && PyInt_CheckExact(w)), and POP and
TOP, are all handled by the selection of stubs from $addSite.

I'm read from last April:<<< http://lists.ironpython.com/pipermail/users-ironpython.com/2007-April/004773.html

It's interesting that CPython can make those gains still by using a
stack implementation.

I'll observe that IronPython has the additional dependency of the
full .NET runtime. (It was my point 7/18 about incorporating the GNU
libs, that to compile to machine-native, as a JIT does, you need the
instruction set of the machine.) Whereas, CPython can disregard
them, having already been compiled for it.

I think what I was looking for is that IronPython employs the .NET to
compile to machine instructions, once it's known what the values of
the variables are that are the operands. The trade-off is compilation
time + type checks + stub look-up.

What I want to know is, if __add__ performs an attribute look-up, is
that optimized in any way, after the IP is already in compiled code?

After all that, I don't feel so guilty about stepping on Tim's toes.
 
T

Terry Reedy

CPython compiles Python code to bytecode for its CPython *hardware
independent* VM using standard compiler methdods and tools (lexer,
parser, code generator, and optimizer). That VM (interpreter) is
written in portable-as-possible C, with machine/OS #ifdefs added as
needed.
WHY NOT? Why doesn't CPython do it?

1. Portability: The Microsoft C# JIT compiler runs under Windows .NET on
x86/amd64 and maybe it64 and what else? Just porting .NET to run 0n
Linux on the same processors was/is a bit task. Does MONO have a JIT also?

There is a JIT for Python: Psyco. It originally only worked on x86. I
am not sure what else. It originated as a PhD project, working with
CPython, and was developed further as part of PyPy, but I do not know if
there is any current progress.

Python VM runs on numerous platforms.

2. Money: C#, its JIT, and IronPython were and are funded by MS.
Getting JIT right is hard and tedious.

CPython is mostly a volunteer project. It is also the Python development
platform. So it has to be simple enough for volunteers to pick up on
its innards and for experimentation to be possible. Give the PSF more
resources and

tjr
 
T

Tim Roberts

castironpi said:
In C, we have:

int x, y;
x= 10;
y= x+ 1;

It translates as, roughly:
8000 .data
7996 ffffffff #x
7992 ffffffff #y
7988 .end data
7984 loadi reg0 7996
7980 loadi reg1 7992
7976 loadi reg2 10
7972 loadi reg3 1
7968 storv reg2 reg0
7964 add reg0 reg1 reg2
7960 storv reg3 reg1

I don't recognize that assembly language. Is that another intermediate
language?
You are telling me that the same thing happens in IronPython.

Yes, the same process happens.
By the
time the instruction pointer gets to 'x= 10', the next 7 instructions
are the ones shown here compiled from C.

I most certainly did NOT say that, as you well know. Different C compilers
produce different instruction sequences for a given chunk of code. Indeed,
a single C compiler will produce different instruction sequences based on
the different command-line options. It's unreasonable to expect a Python
compiler to produce exactly the same code as a C compiler.

However, that does disqualify the Python processor as a "compiler".
CMIIW, but the CPython implementation -does- -not-.

And again, I never said that it did. CPython is an interpreter. the
user's code is never translated into machine language.
My point is, CPython takes more than seven steps. My question is,
does IronPython?

So, if compiler B isn't as good at optimization as compiler A, does that
mean in your mind that compiler B is not a "compiler"?
 
B

Bob Martin

in 76135 20080731 090911 Dennis Lee Bieber said:
Using that definition, the UCSD P-code Pascal and Java are also not
"compilers" -- all three create files containing instructions for a
non-hardware virtual machine.

The only difference between Python, UCSD Pascal, and Java is that
Python foregoes the explicit "compiler" pass.

BASIC (classical microcomputer implementations -- like the one M$
supplied for TRS-80s) is an interpreter -- the pre-scan of the source
merely translated BASIC keywords into a byte index, not into opcodes for
any virtual machine.

You are confusing languages with implementations, as I pointed out earlier.
Java is a language.
I have used at least 2 Java compilers, ie they compiled Java source to native
machine language.
 
T

Terry Reedy

Duncan said:
Technically there is no such thing as a Microsoft C# JIT compiler: the C#
compiler targets IL and the JIT compilers convert IL to the native machine,
but C# is just one of the frontend compilers you could use.

Microsoft do JIT compilers for .Net Compact Framework that target ARM,
MIPS, SHx and x86. The Mono JIT supports:

s390, s390x (32 and 64 bits) Linux
SPARC(32) Solaris, Linux
PowerPC Linux, Mac OSX
x86 Linux, FreeBSD, OpenBSD, NetBSD, Microsoft Windows, Solaris, OS X
x86-64: AMD64 and EM64T (64 bit) Linux, Solaris
IA64 Itanium2 (64 bit) Linux
ARM: little and big endian Linux (both the old and the new ABI)
Alpha Linux
MIPS Linux
HPPA Linux

(from http://www.mono-project.com/Supported_Platforms)


So I'd say .Net scores pretty highly on the portability stakes. (Although
of course code written for .Net might not do so well).

Did you mean IL scores highly? In any case, scratch 1. portability as a
reason why CPython lacks JIT. That leave 2. $ in the multimillions.

More
3. Design difference1: I suspect that IL was designed with JIT
compilation in mind, whereas PyCode was certainly not.

4. Design difference2: The first 'killer ap' for Python was driving
compiled Fortran/C functions (early Numeric). If a 'Python' program
spend 95% of its time in compiled-to-machine-code extensions, reducing
the other 5% to nothing gains little. CPython *was* and has-been
designed for this.

The process continues. The relatively new itertools module was designed
and tested in Python (see itertools in Libary reference). But the
delivered module is compiled C.

tjr
 
C

castironpi

I don't recognize that assembly language.  Is that another intermediate
language?

I'm looking at a system word of 1's and 0's that gets executed on a
per-cycle basis in the processor. Could easily be that the designs
are tuned to JIT's these days and I'm out of date, what with
pipelining and lookahead branching and all, but no, it's what I
remember from system architecture class.
Yes, the same process happens.


I most certainly did NOT say that, as you well know.  Different C compilers
produce different instruction sequences for a given chunk of code.  Indeed,
a single C compiler will produce different instruction sequences based on
the different command-line options.  It's unreasonable to expect a Python
compiler to produce exactly the same code as a C compiler.

However, that does disqualify the Python processor as a "compiler".


And again, I never said that it did.  CPython is an interpreter.  the
user's code is never translated into machine language.


So, if compiler B isn't as good at optimization as compiler A, does that
mean in your mind that compiler B is not a "compiler"?

You can translate C code to machine code, for any given C code, for
any given machine.

You can translate Python code to machine code, for some Python code,
for any given machine.

Given the restrictions (or rather, freedoms) of Python, does there
exist code that necessarily cannot translate to machine code? In
other words, can you translate all Python code to machine code?
Similarly, I take it that the decision to make CPython a stack machine
+ VM was a design decision, not a necessity, favoring internal
simplicity over the extra 5%.

The output of the program is determined from the input by the Python
specification, regardless of implementation, but the answer still
isn't necessarily yes. But I think the only counterexample that comes
to me is the case of a dynamic grammar, not merely dynamic data type,
so for Python maybe it is. And furthermore, I think I'm getting
confused about what exactly constitutes an interpreter: it is whether
there is a process that runs product instructions, or the product
instructions can run standalone. I would take 'compiler' to mean,
something that outputs an .EXE executable binary file, and I don't
just mean bundling up the python.exe executable with a file. Python
needs to be present on the machine you run target code on, not so with
C binaries. Can .NET bring its targets to such a state, or are those
run-times requried? Are they just DLL's, or are they, for lack of
better word, driving?

(Of course, for futuristic abstract hardware designs, CPython may be
mostly native instructions, and the output of its processor on
function definition, can be stored and run directly, such as for a
stack machine architecture.)

But I don't want to restrict my question to a question of
optimization. Your compiler could output something like this:

read variables into registers
reorganize variable dictionaries
perform addition
do laundry
write variables into system memory
clean sink

and run when you write 'C:\>silly.exe', and so on. And still be
compiled, even if the live IronPython session, which you invoke with,
'C:\>ironpy.exe silly.py', outputs the same 7 MIPS instructions.
 
P

Paul Boddie

Given the restrictions (or rather, freedoms) of Python, does there
exist code that necessarily cannot translate to machine code?  In
other words, can you translate all Python code to machine code?

Given that all valid Python code can be executed somehow and that
execution takes place as the processor performs instructions which "it
gets from somewhere", meaning that those instructions can belong
either to a general interpreter or to specific code generated for a
given user program (or a combination of these things), I think that
you have to refine your question. What you seem to be asking is this:
can you translate Python code to machine code which encodes the
behaviour of the user program in a way nearing the efficiency of code
generated from other programming languages? Rephrased, the question is
this: can Python code be efficiently represented using low-level
machine instructions?

I think you've already touched upon this when thinking about integer
operations. The apparently simple case of integer addition in Python
is not completely encoded by a few machine instructions. In other
words...

a + b # in Python

...is not sufficiently represented by...

ldr r1, a
ldr r2, b
add r3, r1, r2

...in some assembly language (and the resulting machine code), mostly
because the semantics of Python addition are more complicated. Of
course, you can generate code for those semantics, which would lead to
quite a few more machine instructions than those suggested above, but
then it might be interesting to bundle those instructions in some kind
of subroutine, and we could call this subroutine BINARY_ADD. At this
point, you'd almost be back at the stage where you're writing a
bytecode interpreter again.

Of course, it's worth considering something in between these
situations (the verbose expansion of the user program vs. a bytecode
interpreter which examines virtual instructions and jumps to
subroutines), and there are apparently a few techniques which make
virtual machines more efficient (so that the processor isn't jumping
around too much in the interpreter code, for example), and there are
also going to be techniques which permit the simplification of any
verbose machine code representation (most likely by not generating
code which is never going to be executed, due to various properties of
the program).

Obviously, CPython isn't oriented towards investigating these matters
in great depth, but that doesn't mean that other implementations can't
pursue other approaches.
Similarly, I take it that the decision to make CPython a stack machine
+ VM was a design decision, not a necessity, favoring internal
simplicity over the extra 5%.

Probably: it simplifies code generation somewhat.

Paul
 
T

Terry Reedy

castironpi said:
Similarly, I take it that the decision to make CPython a stack machine
+ VM was a design decision, not a necessity, favoring internal
simplicity over the extra 5%.

Years ago, someone once started a project to write a register-based
virtual machine for (C)Python. I suspect it was abandoned for some
combination of lack of time and preliminary results showing little
speedup for the increased complication. But I never saw any 'final
report'.
And furthermore, I think I'm getting
confused about what exactly constitutes an interpreter: it is whether
there is a process that runs product instructions, or the product
instructions can run standalone. I would take 'compiler' to mean,
something that outputs an .EXE executable binary file,

This is way too restrictive. Does *nix have no compilers? In any case,
the CPython compiler uses stadard compiler components: lexer, parser,
syntax tree, code generator, and peephole optimizer. The result is a
binary file (.pyc for Python compiled) executable on a PyCode machine.
 
C

castironpi

Given that all valid Python code can be executed somehow and that
execution takes place as the processor performs instructions which "it
gets from somewhere", meaning that those instructions can belong
either to a general interpreter or to specific code generated for a
given user program (or a combination of these things), I think that
you have to refine your question. What you seem to be asking is this:
can you translate Python code to machine code which encodes the
behaviour of the user program in a way nearing the efficiency of code
generated from other programming languages? Rephrased, the question is
this: can Python code be efficiently represented using low-level
machine instructions?

I think you've already touched upon this when thinking about integer
operations. The apparently simple case of integer addition in Python
is not completely encoded by a few machine instructions. In other
words...

  a + b # in Python

...is not sufficiently represented by...

  ldr r1, a
  ldr r2, b
  add r3, r1, r2

...in some assembly language (and the resulting machine code), mostly
because the semantics of Python addition are more complicated.

No, it is not sufficiently represented. Python runs checks before and
after, to check for overflows.

test safeinteger a
test safeinteger b
ldr r1, a
ldr r2, b
add r3, r1, r2
test not overflow

However, no implementation of Python can do better, given Python's
specification.
Of
course, you can generate code for those semantics, which would lead to
quite a few more machine instructions than those suggested above, but
then it might be interesting to bundle those instructions in some kind
of subroutine, and we could call this subroutine BINARY_ADD. At this
point, you'd almost be back at the stage where you're writing a
bytecode interpreter again.

This isn't the bytecode interpreter returning, it's bounds checking,
which is part and parcel of Python, I hold.

Another factor, a and b are known to be and are always integers, in a
given C context.

int a, b;
...
a + b

The C compilation process outputs:

ldr r1, a
ldr r2, b
add r3, r1, r2

and you are correct. However, for:

string a, b;
a + b

performs a concatenation which is not that simple. The point is, C
compilation runs, and you actually have -ldr, ldr, add- lying around
in a file somewhere, which can run as three consecutive instructions
on a processor. It's already in the interpreter in Python, and you
have the -test, test, ldr, ldr, add, test- sequence somewhere in
Python.exe, specifically wherever the object code for ceval.c is
going.

Incidentally, I find 2 bytes in 16K different in a simple C program
that merely executes a + b vs. a - b. It is not in this practical
case a three-word output (12 bytes, ldr-ldr-add vs. ldr-ldr-subtr),
though it's not clear what entry points the OS requires.
Of course, it's worth considering something in between these
situations (the verbose expansion of the user program vs. a bytecode
interpreter which examines virtual instructions and jumps to
subroutines), and there are apparently a few techniques which make
virtual machines more efficient (so that the processor isn't jumping
around too much in the interpreter code, for example), and there are
also going to be techniques which permit the simplification of any
verbose machine code representation (most likely by not generating
code which is never going to be executed, due to various properties of
the program).

I think it's relevant and fair to consider -consecutive- language
instructions at this point. Working example:

int a, b, c, d;
...
a + b
c + d

The C compilation process outputs:

ldr r1, a
ldr r2, b
add r3, r1, r2
ldr r1, c
ldr r2, d
add r3, r1, r2

Whereas there is no equivalent in CPython. The actual code that runs
on the processor (summary) is:

:loop
...
:addition_sign
test safeinteger a
test safeinteger b
ldr r1, a
ldr r2, b
add r3, r1, r2
test not overflow
:goto loop

as opposed to two duplicate sections being outputted back-to-back,
even though they are -run- effectively back-to-back. For a different
type, say list concatenation, the disassembly looks like:
... []+[]
... 2 0 BUILD_LIST 0
3 BUILD_LIST 0
6 BINARY_ADD
7 POP_TOP
8 LOAD_CONST 0 (None)
11 RETURN_VALUE

the meaning of which I am not finding in ceval.c. Anyone?

Regardless, the JIT compilation process allocates a new executable
block of memory:

:loop
...
:addition_sign
output( 'ldr r1, a; ldr r1, a; ldr r2, b' )
:goto loop

which in this stage executes twice, yielding

ldr r1, a
ldr r2, b
add r3, r1, r2
ldr r1, c
ldr r2, d
add r3, r1, r2

somewhere in memory, same as C. Then it runs the block. It also has
to have already ascertained that 'a' and 'b' are necessarily integers
by the time it makes the output( ) statement. I remain unclear on the
disagreement between why JIT is not called compiling, and why it
doesn't output an executable binary, unless the practical reasons of
saving developer time and cross-platform object files, or merely
terminology, are the only difference.
 
T

Tim Roberts

Dennis Lee Bieber said:
Using that definition, the UCSD P-code Pascal and Java are also not
"compilers" -- all three create files containing instructions for a
non-hardware virtual machine.

Right. UCSD p-code Pascal was almost always implemented as an interpreter.
I would be surprised if anyone argued that it was a compiler.

However, I thought Java was usually JIT compiled to machine language. Am I
mistaken?
 
T

Tim Roberts

castironpi said:
And furthermore, I think I'm getting
confused about what exactly constitutes an interpreter: it is whether
there is a process that runs product instructions, or the product
instructions can run standalone. I would take 'compiler' to mean,
something that outputs an .EXE executable binary file, and I don't
just mean bundling up the python.exe executable with a file.

OK, let me give MY definition. I freely grant that my definition might be
different from anyone elses, but perhaps this will help you understand the
basis for my arguments.

If I run three different CPython programs, the bytes of machine language
that get executed are come from the same place: python24.dll. My user
programs are just data. That, in my mind, makes the CPython implementation
an interpreter.

If I compile and run three different C programs, the bytes of machine
language will be come from three different places. That, in my mind, makes
my C implementation a compiler.

If I compile and run three different C# programs, the JIT compiler makes
new machine language for each one. The bytes of machine language will come
from three different places. That, in my mind, makes the C# implementation
a compiler.

If I compile and run three different IronPython programs, the JIT compiler
makes new machine language for each one. The bytes of machine language
will come from three different places. That, in my mind, makes the
IronPython implementation a compiler.

All four of those scenarios require run-time library support. Even the C
progam does not run on its own. Execution starts in the run-time library,
which sets up an environment before jumping to "main". The C# and
IronPython situations are the same; it's just that there's more processing
going on before jumping to "main".
 
P

Paul Boddie

No, it is not sufficiently represented. Python runs checks before and
after, to check for overflows.

It does more than this...
test safeinteger a
test safeinteger b
ldr r1, a
ldr r2, b
add r3, r1, r2
test not overflow

However, no implementation of Python can do better, given Python's
specification.

In fact, as was probably mentioned before, it does something more like
this:

get method __add__ on a (if possible) or jump to (1)
populate an invocation frame with a and b
call the method
test the result against the special NotImplemented value
if result is not NotImplemented then jump to (3)
(1) get method __radd__ on b (if possible) or jump to (2)
populate an invocation frame with b and a
call the method
test the result against the special NotImplemented value
if result is not NotImplemented then jump to (3)
(2) raise a TypeError exception
(3) provide the result to whatever gets evaluated next

The instructions that you list, which is based on the really simple
case which I mentioned, happens in the method that gets called (eg.
__add__), and then only for integers. Note that seemingly trivial
things like getting the methods can be quite a few instructions in any
low-level code.

[...]
Another factor, a and b are known to be and are always integers, in a
given C context.

int a, b;
...
a + b

The C compilation process outputs:

ldr r1, a
ldr r2, b
add r3, r1, r2

Right. That's why some people want to have type declarations in
Python, and others want to be able to generate specialised code at run-
time for such cases.
and you are correct. However, for:

string a, b;
a + b

performs a concatenation which is not that simple. The point is, C
compilation runs, and you actually have -ldr, ldr, add- lying around
in a file somewhere, which can run as three consecutive instructions
on a processor. It's already in the interpreter in Python, and you
have the -test, test, ldr, ldr, add, test- sequence somewhere in
Python.exe, specifically wherever the object code for ceval.c is
going.

Right. The way to think about this is that due to the mechanics of
working out what kind of operations should be performed (should it be
an integer addition, a string concatentation, something else?),
there's a lot of code executed which is really "dancing around" the
actual work, and then for short bursts of instructions, the work
actually gets done. It's like having to jet around the world, sampling
drinks in different locations, rather than just lining the different
drinks up at home.

Paul
 
C

castironpi

OK, let me give MY definition. I freely grant that my definition might be
different from anyone elses, but perhaps this will help you understand the
basis for my arguments.

I understand that we're having a disagreement about terminology. I
further don't understand exactly what JIT languages are, so I can't
agree on that either.

I will observe the certain amount of corporate hype behind, and worker
base morale riding on, the notion that JIT technology compiles code.
I suspect it's an exaggeration, not outright false, but I can't prove
it until I tell you what instructions run, one right after another, on
a concrete architecture I've held in my hand, like the x86 die. Nor
can I thoroughly believe that it's true either, though, until its
creators have told me what instructions they are. So I'll proclaim
ignorance and await facts... or consistent stories about them.
If I run three different CPython programs, the bytes of machine language
that get executed are come from the same place: python24.dll. My user
programs are just data. That, in my mind, makes the CPython implementation
an interpreter.

If I compile and run three different C programs, the bytes of machine
language will be come from three different places. That, in my mind, makes
my C implementation a compiler.

True. I agree on the facts and the terms.
If I compile and run three different C# programs, the JIT compiler makes
new machine language for each one. The bytes of machine language will come
from three different places. That, in my mind, makes the C# implementation
a compiler.

If I compile and run three different IronPython programs, the JIT compiler
makes new machine language for each one. The bytes of machine language
will come from three different places. That, in my mind, makes the
IronPython implementation a compiler.

I don't know enough to attest to these for a fact, and you haven't
given enough details to corroborate them as facts. But when you do,
I'll be able to take and learn your terms for them (not that I will,
of course, but I can).
All four of those scenarios require run-time library support. Even the C
progam does not run on its own.

I disagree with this, if the C program is statically linked -- the OS
copies the binary (.EXE) from disk into memory, then jumps to a
specific offset in that block / address space. It runs all its own
bytes, then jumps back to an OS-specified point of return of control.
For the other three, though, this is true.
Execution starts in the run-time library,
which sets up an environment before jumping to "main". The C# and
IronPython situations are the same; it's just that there's more processing
going on before jumping to "main".

I want to give a concrete example of 'generating machine code' per se
(as such).

I run this program: <fiction>

bin= open( 'abinary.exe', 'w' )
bin.write( '\x09\x0f\x00\x00' )
for x in range( 10 ):
bin.write( '\x04\xA0' + chr( x ) + '\x00' )
bin.write( '\x01\x20\x00\x00' )

It outputs to 'abinary.exe':

\x09\x0f\x00\x00
\x04\xa0\x00\x00
\x04\xa0\x01\x00
\x04\xa0\x02\x00
\x04\xa0\x03\x00
\x04\xa0\x04\x00
\x04\xa0\x05\x00
\x04\xa0\x06\x00
\x04\xa0\x07\x00
\x04\xa0\x08\x00
\x04\xa0\x09\x00
\x01\x20\x00\x00

Which is 12 bytes long and runs in a millisecond. What it does is set
a memory address to successive integers 0..9, then yields. Due to the
nature of program flow control, while it runs its first steps on any
x86 machine, the yield only succeeds if on Windows 98+, and crashes
the machine, or otherwise loses control if not. (That part depends on
those OSses.)

I can try something similar dynamically.

char* mem= alloc( 48 )
setpermission( mem, EXECUTE )
memcpy( mem+ 0, "\x09\x0f\x00\x00", 4 )
for( int x= 0; x< 10; ++x ) {
memcpy( mem+ 4* (x+ 1 ), '\x04\xA0\x00\x00', 4 )
mem[ 4* (x+ 1 )+ 3 ]= (char) x
memcpy( mem+ 44, '\x01\x20\x00\x01', 4 )
setjump
goto mem

Which with some imagination produces the contents of 'abinary.exe'
above (one difference, last word) in a memory block, at address 'mem',
then jumps to it, which then jumps back, and then exits. </fiction>

I'll compare a C complation to the first example, 'abinary.exe', and a
JIT compilation to the second example, 'char* mem'. If the comparison
isn't accurate, say how, because these are places I can start from...
(yes, that is, instead of just repeating the claims).

When does a JIT do this, and what does it do in the meantime?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,680
Members
48,796
Latest member
Greg L.

Latest Threads

Top