interpreter vs. compiled


paulo.jpinto

OK, let me give MY definition.  I freely grant that my definition might be
different from anyone else's, but perhaps this will help you understand the
basis for my arguments.

I understand that we're having a disagreement about terminology.  I
further don't understand exactly what JIT languages are, so I can't
agree on that either.

I will observe that there is a certain amount of corporate hype behind,
and worker-base morale riding on, the notion that JIT technology
compiles code.  I suspect it's an exaggeration, not outright false, but
I can't prove it until I can tell you what instructions run, one right
after another, on a concrete architecture I've held in my hand, like an
x86 die.  Nor can I thoroughly believe that it's true, though, until its
creators have told me what those instructions are.  So I'll proclaim
ignorance and await facts... or consistent stories about them.
If I run three different CPython programs, the bytes of machine language
that get executed come from the same place: python24.dll.  My user
programs are just data.  That, in my mind, makes the CPython implementation
an interpreter.
If I compile and run three different C programs, the bytes of machine
language will come from three different places.  That, in my mind, makes
my C implementation a compiler.

True.  I agree on the facts and the terms.
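
To make the "my user programs are just data" point concrete, here is a minimal sketch of an interpreter loop. The opcode names are invented for the illustration and are not CPython's actual bytecode; the only point is that three different programs are all executed by the same run() function.

```python
# Toy stack-machine interpreter. The opcode names are invented for this
# sketch; they are not CPython's real bytecode.
def run(program):
    stack = []
    for op, arg in program:          # the user program is only data here
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack.pop()

# Three different "user programs", all executed by the same run() loop.
prog_a = [("PUSH", 2), ("PUSH", 3), ("ADD", None)]
prog_b = [("PUSH", 4), ("PUSH", 5), ("MUL", None)]
prog_c = [("PUSH", 1), ("PUSH", 2), ("ADD", None), ("PUSH", 3), ("MUL", None)]
results = [run(p) for p in (prog_a, prog_b, prog_c)]
```

Whichever program is passed in, the instructions the CPU executes are those of run() itself, which is the sense of "interpreter" used in the definition above.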
If I compile and run three different C# programs, the JIT compiler makes
new machine language for each one.  The bytes of machine language will come
from three different places.  That, in my mind, makes the C# implementation
a compiler.
If I compile and run three different IronPython programs, the JIT compiler
makes new machine language for each one.  The bytes of machine language
will come from three different places.  That, in my mind, makes the
IronPython implementation a compiler.

I don't know enough to attest to these for a fact, and you haven't
given enough details to corroborate them as facts.  But when you do,
I'll be able to take and learn your terms for them (not that I will,
of course, but I can).
All four of those scenarios require run-time library support.  Even the C
program does not run on its own.

I disagree with this, if the C program is statically linked -- the OS
copies the binary (.EXE) from disk into memory, then jumps to a
specific offset in that block / address space.  It runs all its own
bytes, then jumps back to an OS-specified point of return of control.
For the other three, though, this is true.
Execution starts in the run-time library,
which sets up an environment before jumping to "main".  The C# and
IronPython situations are the same; it's just that there's more processing
going on before jumping to "main".

I want to give a concrete example of 'generating machine code' per se
(as such).

I run this program: <fiction>

# Python 3: write the made-up "machine words" in binary mode.
with open( 'abinary.exe', 'wb' ) as out:
    out.write( b'\x09\x0f\x00\x00' )
    for x in range( 10 ):
        out.write( b'\x04\xA0' + bytes( [x] ) + b'\x00' )
    out.write( b'\x01\x20\x00\x00' )

It outputs to 'abinary.exe':

\x09\x0f\x00\x00
\x04\xa0\x00\x00
\x04\xa0\x01\x00
\x04\xa0\x02\x00
\x04\xa0\x03\x00
\x04\xa0\x04\x00
\x04\xa0\x05\x00
\x04\xa0\x06\x00
\x04\xa0\x07\x00
\x04\xa0\x08\x00
\x04\xa0\x09\x00
\x01\x20\x00\x00

Which is 48 bytes long (12 four-byte words) and runs in a millisecond.
What it does is set a memory address to successive integers 0..9, then
yields.  Due to the nature of program flow control, while it runs its
first steps on any x86 machine, the yield only succeeds on Windows 98+,
and crashes the machine or otherwise loses control if not.  (That part
depends on those OSes.)

I can try something similar dynamically.

char* mem= alloc( 48 );                /* fictional allocator */
setpermission( mem, EXECUTE );         /* fictional: mark block executable */
memcpy( mem+ 0, "\x09\x0f\x00\x00", 4 );
for( int x= 0; x< 10; ++x ) {
   memcpy( mem+ 4* (x+ 1 ), "\x04\xA0\x00\x00", 4 );
   mem[ 4* (x+ 1 )+ 2 ]= (char) x;     /* third byte holds x, as in the file */
}
memcpy( mem+ 44, "\x01\x20\x00\x01", 4 );
setjump
goto mem

Which with some imagination produces the contents of 'abinary.exe'
above (one difference, last word) in a memory block, at address 'mem',
then jumps to it, which then jumps back, and then exits. </fiction>
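
For comparison, a small sketch that builds both images (the file contents and the in-memory block) and checks where they differ. The byte values are the made-up ones from the fiction above, not real x86 opcodes, so the sketch only constructs the buffers and never executes them; build_words and words are names invented here.

```python
# Build the same 12 four-byte words in memory instead of in a file.
# The byte values are the fictional ones from the post, so nothing here
# is executed as machine code.
def build_words(last_word):
    buf = bytearray()
    buf += b"\x09\x0f\x00\x00"                     # fictional first word
    for x in range(10):
        buf += b"\x04\xa0" + bytes([x]) + b"\x00"  # fictional "store x" word
    buf += last_word                               # fictional final word
    return bytes(buf)

def words(image):
    # Split an image into consecutive 4-byte words.
    return [image[i:i + 4] for i in range(0, len(image), 4)]

file_image = build_words(b"\x01\x20\x00\x00")      # contents of 'abinary.exe'
memory_image = build_words(b"\x01\x20\x00\x01")    # contents of 'mem'

# Indexes of the words where the two images differ.
differing = [i for i, (a, b) in
             enumerate(zip(words(file_image), words(memory_image))) if a != b]
```

Both images are 12 words (48 bytes) long, and only the last word differs, which matches the "one difference, last word" remark.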

I'll compare a C compilation to the first example, 'abinary.exe', and a
JIT compilation to the second example, 'char* mem'.  If the comparison
isn't accurate, say how, because these are places I can start from...
(yes, that is, instead of just repeating the claims).

When does a JIT do this, and what does it do in the meantime?

The JIT works like an assembler/linker that writes to memory. It will
load the file(s) containing the bytecode and generate the required
assembly instructions into memory.

If there are dependencies on other modules, they will be loaded as
well, and compiled. Then the linker will take care that cross
references between modules, like memory addresses and branch targets,
are correct.
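
As a loose analogy for the load/generate/link steps just described, here is a sketch using Python's built-in compile() and exec(). This is an assumption-laden stand-in: compile() produces CPython bytecode rather than machine code, so it mirrors only the shape of the process (generate code at run time, keep it in memory, bind names, call it). The helper make_adder is invented for the example.

```python
# Analogy only: generate code at run time, keep it in memory, then call it.
# compile() produces CPython bytecode, not machine code, so this mirrors
# the shape of what a JIT does, not its output format.
def make_adder(n):
    source = "def add_n(x):\n    return x + %d\n" % n  # code generated on the fly
    code = compile(source, "<generated>", "exec")      # translate into a code object
    namespace = {}
    exec(code, namespace)                              # "link": bind the name add_n
    return namespace["add_n"]

add_five = make_adder(5)
result = add_five(10)
```

Nothing is written to disk at any point; the generated function exists only in the running process.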

A clever JIT might add instrumentation points, so that it can rewrite
the code using profile-guided optimizations; this means generating
optimized code using the program's behaviour as input.

This makes JIT code usually faster than normal compiled code. Although
normal native code is able to start executing faster, it only targets
a specific set of processors.

JIT code is independent of the processor, and a good JIT
implementation is able to exploit the processor better than a direct
native compiler. There is, however, a time penalty on program
startup.
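
A toy sketch of the instrumentation idea, under stated assumptions: calls are counted, and once a function is "hot" the generic implementation is swapped for a specialised one guarded by a check. The threshold, the class name Tiered, and both function bodies are invented for the illustration; a real JIT would regenerate machine code rather than swap Python functions.

```python
# Toy "tiering": count calls, and after a threshold swap the generic
# implementation for a specialised one. HOT_THRESHOLD and both function
# bodies are invented for this sketch.
HOT_THRESHOLD = 3

class Tiered:
    def __init__(self, generic, specialise):
        self.impl = generic
        self.specialise = specialise
        self.calls = 0
        self.optimised = False

    def __call__(self, *args):
        self.calls += 1                      # the instrumentation point
        if not self.optimised and self.calls > HOT_THRESHOLD:
            self.impl = self.specialise()    # "recompile" using what was observed
            self.optimised = True
        return self.impl(*args)

def generic_power(base, exp):
    result = 1
    for _ in range(exp):
        result *= base
    return result

def specialise_square():
    # Pretend profiling showed that exp is always 2.
    def square_or_fallback(base, exp):
        if exp == 2:                         # guard keeps the fast path correct
            return base * base
        return generic_power(base, exp)      # otherwise fall back
    return square_or_fallback

power = Tiered(generic_power, specialise_square)
outputs = [power(n, 2) for n in range(1, 6)]
```

The guard is what makes the speculation safe: if the observed pattern stops holding, the call still returns the right answer through the generic path.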
 

castironpi

The JIT works like an assembler/linker that writes to memory. It will
load the file(s) containing the bytecode and generate the required
assembly instructions into memory.

If there are dependencies on other modules, they will be loaded as
well, and compiled. Then the linker will take care that cross
references between modules, like memory addresses and branch targets,
are correct.

So far this is the same as any compilation, except that the first half
is already done and the output location differs, which is not a
bottleneck.
A clever JIT might add instrumentation points, so that it can rewrite
the code using profile-guided optimizations; this means generating
optimized code using the program's behaviour as input.

This makes JIT code usually faster than normal compiled code.

Here you need an example. You are suggesting that a compiler can make
better optimizations if it knows what functions are going to carry
what loads, run how many times, etc., and it can use profile
statistics as a partial indicator to do that.
Although normal native code is able to start executing faster, it only
targets a specific set of processors.

JIT code is independent of the processor, and a good JIT
implementation is able to exploit the processor better than a direct
native compiler. There is, however, a time penalty on program
startup.

Once again, you are asserting that knowing what the program has done
so far, say in the first 5 seconds ( or .5 ), can improve
performance. In this case it can make better use of what instructions
to use on the CPU. I need an example.
 

Chris Mellon

Is there a reason why you're expecting c.l.p to be your personal tutor
for Introduction to Compilers?

It's not that I want to dissuade you in your quest for
self-betterment, but you're being extremely confrontational as well as
vastly ignorant about terminology. Defining your own terms that don't
agree with formal definitions and then demanding (not even politely
asking, for goodness sake) that people justify to you, in excruciating
detail, why simple concepts are true is simply an inexcusable way to
behave. There seems to be something of a rash of this on c.l.p lately.

JIT has been around for decades now, it's well documented, well
understood, and quite common. You'd learn enough to answer every
single one of your demands in 20 minutes with Google, and if you're
seriously going to continue to argue that JIT doesn't exist (and this
is even granting your own bizarre definition of compile, which may as
well be called "purplizing") you should be able to argue from a
position of knowledge instead of stunning, jaw-dropping,
soul-shattering ignorance.
 

paulo.jpinto

Regarding exploring processor instructions.

Let's say you compile a C program targeting the x86 architecture, with
optimizations turned on for speed, and let the compiler automatically
select MMX and SSE instructions for numeric code.

I now have a program that executes very fast, and does what I want
very well. Now when I execute it on an x86 processor with the new
SSE 4 instructions, it will not matter, because the program cannot
take advantage of them.

With a JIT it is different. Assuming that the JIT is also aware of the
SSE 4 instructions, it might take advantage of this new set, if for a
given instruction sequence it is better to do so.
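
A rough sketch of that decision, made on the machine where the program runs rather than on the build machine. The feature names, the two kernels, and select_kernel are hypothetical stand-ins; a real JIT would query the CPU itself (on x86, typically via CPUID) and emit different machine code rather than pick between Python functions.

```python
# Sketch of run-time instruction-set selection. The feature names and the
# two kernels are hypothetical stand-ins for generated machine code.
def dot_baseline(a, b):
    return sum(x * y for x, y in zip(a, b))

def dot_wide(a, b):
    # Stand-in for a code path that would use newer vector instructions.
    total = 0
    for i in range(0, len(a), 4):            # process four elements per step
        total += sum(x * y for x, y in zip(a[i:i + 4], b[i:i + 4]))
    return total

def select_kernel(cpu_features):
    # Decide once, on the machine where the program actually runs.
    return dot_wide if "sse4_1" in cpu_features else dot_baseline

old_cpu = select_kernel({"mmx", "sse", "sse2"})
new_cpu = select_kernel({"mmx", "sse", "sse2", "sse4_1"})
```

An ahead-of-time binary has to fix this choice at build time (or ship several code paths and dispatch itself), whereas the JIT makes it when the code is generated.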

For the usage of profile-guided optimizations, here are a few
examples.

The JIT might find out that in a given section the vector indexes are
always correct, so no bounds verification is needed. Or, if the
language is an OOP one, it might come to the conclusion that the same
virtual method is always called, so there is no need for a VMT lookup
before calling the method; it then replaces the already generated code
with a direct call.

Or that a small method is called often enough that it would be better
to inline it instead.
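
The virtual-call case can be sketched as a toy monomorphic inline cache: remember the method found for the last receiver type and reuse it while a cheap type check holds. This is an illustration of the idea only, with invented class names; it is not how any particular JIT implements it.

```python
# Toy monomorphic inline cache: remember the method found for the last
# receiver type and reuse it while a cheap type check (the guard) holds.
# All class and attribute names are invented for this sketch.
class CallSite:
    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_method = None
        self.lookups = 0                 # how many full lookups were needed

    def call(self, receiver, *args):
        if type(receiver) is not self.cached_type:   # guard failed or first call
            self.cached_type = type(receiver)
            self.cached_method = getattr(self.cached_type, self.method_name)
            self.lookups += 1
        return self.cached_method(receiver, *args)   # direct call, no lookup

class Circle:
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3 * self.r * self.r       # crude pi keeps the arithmetic exact

class Square:
    def __init__(self, s):
        self.s = s
    def area(self):
        return self.s * self.s

site = CallSite("area")
areas = [site.call(Circle(r)) for r in (1, 2, 3)]    # same type every time
lookups_after_circles = site.lookups
areas.append(site.call(Square(2)))                   # new type: guard fails
```

While every receiver is a Circle, only one lookup is performed; the first Square fails the guard and triggers a second lookup, which corresponds to the JIT discarding its assumption and regenerating the call.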

Here are a few papers about profile guided optimizations:

http://rogue.colorado.edu/EPIC6/EPIC6-ember.pdf
http://www.cs.princeton.edu/picasso/mats/HotspotOverview.pdf

Of course most of these optimizations are only visible in applications
that you use for longer than 5 minutes.
 

Paul Boddie

JIT has been around for decades now, it's well documented, well
understood, and quite common.

Apart from Psyco, whose status is hopefully that of being revived
somewhat [1], not quite common enough to permeate the most popular
Python implementation, it would seem.
You'd learn enough to answer every
single one of your demands in 20 minutes with Google, and if you're
seriously going to continue to argue that JIT doesn't exist (and this
is even granting your own bizarre definition of compile, which may as
well be called "purplizing") you should be able to argue from a
position of knowledge instead of stunning, jaw-dropping,
soul-shattering ignorance.

Well, I'd rather that we went through the process of occasional
tuition here on comp.lang.python - a process which I think has shown
progress and remained moderately on-topic - rather than have the
endless recycling of threads on syntax polishing and the usual
furniture rearrangement, punctuated by outbursts fuelled by gross
misunderstandings which never get corrected because everyone has
offended everyone else and stormed off to agitate elsewhere.

Indeed, I'd like to see such matters discussed more in the Python
community, not less, and I imagine that I'm not alone with this
opinion. Python 3000 is a prime example of how language tidying has
had complete dominance over certain practical matters like
performance. If such discussion leads people to insights that they
otherwise wouldn't have had, thus improving the situation, then I for
one am happy to entertain the inquirer's apparent ignorance.

Paul

[1] http://www.europython.org/Talks and Themes/Abstracts#53
 

castironpi

Chris,

I looked at your profile on Google Groups. I found this quote of
yours from May:

"The messiness of the real world is *why* you should use processes,
not
a reason to avoid them."

I like it and I do not hate you. But we are definitely "off" on the
wrong "foot", for relevant values of "off" and "foot".
Is there a reason why you're expecting c.l.p to be your personal tutor
for Introduction to Compilers?

Yes. There is a reason fueling this expectation. If people want me
to understand compilers, they are making progress. If they don't,
they are pretending to. Are you expecting c.l.p. not to tutor anyone
to any extent at all? Or have I merely reached my quota of free
tuition? I could see the latter, and merely would disagree; I know of
nothing that makes either of us the end-all be-all of free tuition
quotas. I am curious to pursue this knowledge as both an end and a
means, I am willing to put in my share of the time in maintaining the
dialog, and I am not forcing anyone to disclose secret information.
So there.
It's not that I want to dissuade you in your quest for
self-betterment, but you're being extremely confrontational as well as
vastly ignorant about terminology. Defining your own terms that don't
agree with formal definitions and then demanding (not even politely
asking, for goodness sake) that people justify to you, in excruciating
detail, why simple concepts are true is simply an inexcusable way to
behave. There seems to be something of a rash of this on c.l.p lately.

I am talking like I do not understand because I do not understand. If
you think I am ignorant, you are right! The degree of my
confrontationality might be unusually high to you, unusually low to
others, but is just right to me. I am frustrated that I do not
understand yet, and if it's coming across in my temper that's my
fault. Furthermore, if I am to question everyone's motives equally,
then there is no reason to expect that everyone out there, or everyone
on this thread, will cooperate to learn; and some reason to expect
that some will detract.

I do not see myself as making demands. "Why not?" is an inflammatory
question if you're lying, and an exciting one if you're eager to
learn, study, and teach. If you do not agree with or do not know the
principle of questioning authority, I subscribe to it and will share.

My old hero's fear was to be vested with authority, and then let
people down; thencefore, I will not take any that is *not* subject to
question. Self-perception is critical to self-awareness; if I have
false beliefs about myself, I can't correct them without others'
help. Therefore, it's critical for me in pursuing that goal to seek
questions from my peers and friends, and thus common reciprocal
courtesy to offer mine to them.

I am trying to correct my own beliefs in the least harmful way I know
or can conceive.  If I fail, it's never permanent; if it's harmful,
that's something I want to know.

Would you rather I recite from rote, "JIT exists, JIT exists", or tell
you what I don't understand until I do?
 

castironpi

There are two things I can emphasize after reading the papers and your
post.

One is the benefit of distributing an incompletely compiled module,
which is that it makes machine-targeted specializations possible right
then and there, and does not require recompiling from the source on
the original author's/distributor's budget. Even if there's no
YOURSUITE.EXE built, the code is still in a state where you can make
one, by only needing JIT.LIB on your machine.

The second is the massive use of lightweight profiling in choosing
optimizations in JIT. One of the advantages is inlining common
function sequences, that you would either have to detect yourself, or
inline everything, to achieve. What are some others?
 
