compiling to python byte codes

Maurice LING

Hi,

I remembered reading a MSc thesis about compiling Perl to Java bytecodes
(as in java class files). At least, it seems that someone had compiled
scheme to java class files quite successfully. I'm wondering if
something of such had been attempted in python, as in compiling X
language into .pyc. I do not understand the schematics of .pyc files but
I assume that they are the so called python bytecode files.

Or is there any documentation or books that is the python equivalent of
"Programming for the Java Virtual Machine" by Joshua Engel?

Thanks
Maurice
--
Maurice Han Tong LING, BSc(Hons)(MCB), AdvDipComp, SSN, FIFA
Doctor of Philosophy (Science) Candidate, The University of Melbourne
mobile: +61 4 22781753
+65 96669233
mailing address: Department of Zoology, The University of Melbourne
Royal Parade, Parkville, Victoria 3010, Australia
residential address: 9/41 Dover Street
Flemington, Victoria 3031, Australia
email: (e-mail address removed)
resume: http://maurice.vodien.com/maurice_resume.pdf
www: http://www.geocities.com/beldin79/

The information contained in this message, including its attachment(s),
is CONFIDENTIAL and solely intended to its addressee(s) only. The
content of this message, including its attachment(s), may be subjected
to copyright and privacy laws. If you have received this email in error,
please let me know by returning this email, and then destroy all copies.

"I cannot discover anyone knows enough to say definitely what is
and what is not possible" -Henry Ford
"The difference between the impossible and the possible lies
in a person's determination" -Tommy Charles Lasorda
 
Martin v. Löwis

Maurice said:
I remembered reading a MSc thesis about compiling Perl to Java bytecodes
(as in java class files).

You don't have to look that far. Jython compiles Python code into Java
bytecode; IronPython compiles Python code into Microsoft intermediate
language.
I'm wondering if
something of such had been attempted in python, as in compiling X
language into .pyc.

The easiest way to create a .pyc file is to create a Python file,
and then compile that. There are various tools that compile X to
.pyc. For example, Fnorb compiles OMG IDL into .pyc files.
I do not understand the schematics of .pyc files but
I assume that they are the so called python bytecode files.

That's correct.
Or is there any documentation or books that is the python equivalent of
"Programming for the Java Virtual Machine" by Joshua Engel?

There is the dis module and its documentation. However, as I said, in
Python, you don't really *need* to create .pyc files directly, as
the Python compiler is always available through the compile() builtin
function. This is unlike Java or .NET, where the compiler is not
available in the JRE, or the .NET commercial framework.

Regards,
Martin
 
Leif K-Brooks

Maurice said:
Or is there any documentation or books that is the python equivalent of
"Programming for the Java Virtual Machine" by Joshua Engel?

Python's byte code isn't very stable, so you might have to recreate your
entire code base with every new Python version. I would suggest
generating Python code (not byte code) instead and compiling that.
 
Michael Foord

Martin v. Löwis said:
You don't have to look that far. Jython compiles Python code into Java
bytecode; IronPython compiles Python code into Microsoft intermediate
language.


The easiest way to create a .pyc file is to create a Python file,
and then compile that. There are various tools that compile X to
.pyc. For example, Fnorb compiles OMG IDL into .pyc files.


That's correct.


There is the dis module and its documentation. However, as I said, in
Python, you don't really *need* to create .pyc files directly, as
the Python compiler is always available through the compile() builtin
function. This is unlike Java or .NET, where the compiler is not
available in the JRE, or the .NET commercial framework.

Regards,
Martin


But that still doesn't answer the OP's question, which is about writing
code in another language to generate python bytecode....

Which is interesting.. but not that interesting I suppose.
Is python bytecode *that* different to Java bytecode (not in detail
but in concept ?). There's no reason why another compiler couldn't
emit python bytecode to run on the 'python virtual machine' ? (plenty
of reasons not to do it I suppose just no reasons why it shouldn't be
possible).

Regards,

Fuzzy

http://www.voidspace.org.uk/atlantibots/pythonutils.html
 
Michael Hudson

Maurice LING said:
Hi,

I remembered reading a MSc thesis about compiling Perl to Java
bytecodes (as in java class files). At least, it seems that someone
had compiled scheme to java class files quite successfully. I'm
wondering if something of such had been attempted in python, as in
compiling X language into .pyc.

Not to my knowledge. It wouldn't be very interesting: the Python
bytecode is pretty Python specific.
I do not understand the schematics of .pyc files but I assume that
they are the so called python bytecode files.

Or is there any documentation or books that is the python equivalent
of "Programming for the Java Virtual Machine" by Joshua Engel?

Nope. As others point out, the details tend to change each (major)
version of Python. The documentation for the standard library module
'dis' might help. You could also look at the 'bytecodehacks' package
(google, and make sure you get the CVS version).

Cheers,
mwh
 
Martin v. Löwis

Michael said:
But that still doesn't answer the OPs question which is about writing
code in another language to generate python bytecode....

I did. I told him about the compile() function, and indeed,

compile("2+4","<string>","eval")

generates Python bytecode.
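
A minimal sketch of what that call produces, assuming a current Python 3 interpreter (the thread predates it, and the exact opcodes printed by `dis` differ between versions). The result of `compile()` is a code object, which can be evaluated or disassembled:

```python
import dis

# compile() returns a code object; the bytecode lives inside it.
code = compile("2+4", "<string>", "eval")

kind = type(code).__name__   # the built-in "code" type
value = eval(code)           # run the compiled expression

print(kind, value)
dis.dis(code)                # listing is version-specific, so not shown here
```

The disassembly is a convenient way to see which opcodes a given Python version emits before trying to generate any by hand.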
Is python bytecode *that* different to Java bytecode (not in detail
but in concept ?).

Yes. Java bytecode is typed; Python bytecode is not.
There's no reason why another compiler couldn't
emit python bytecode to run on the 'python virtual machine' ?

It is certainly possible. Indeed, the Python compiler does generate
Python bytecode from source code, so it must be possible :)

Regards,
Martin
 
Maurice LING

Hi Martin,
I did. I told him about the compile() function, and indeed,

compile("2+4","<string>","eval")

generates Python bytecode.

Can I feed a python source file into compile(), line by line, and expect
it to generate a working .pyc file? I suppose my intended use is to be
able to handle python codes written at run time, to execute python codes
line by line, in a python program. It is somewhat like a tracer routine
that can interpret a line of python code, read out the variables, before
going to the next line of python code. Can compile() do this, or do I
have to use pexpect to run an instance of python?
Yes. Java bytecode is typed; Python bytecode is not.

I was thinking that it may be simpler to say, write a PHP-to-Python
compiler which compiles PHP into an intermediate form, which is then
converted into python bytecodes, rather than trying to automate source
code conversion from PHP to Python. Well, PHP is just an off-hand
example, it may be COBOL or Pascal. Any ideas?

Regards
Maurice
 
Maurice LING

Hi Leif,
Python's byte code isn't very stable, so you might have to recreate your
entire code base with every new Python version. I would suggest
generating Python code (not byte code) instead and compiling that.

So are you suggesting that say I want to write a compiler to compile
Pascal to Python Virtual Machine, it will be wiser to do source code
conversion from Pascal to Python?

maurice
 
Jason Lai

Maurice said:
Hi Leif,



So are you suggesting that say I want to write a compiler to compile
Pascal to Python Virtual Machine, it will be wiser to do source code
conversion from Pascal to Python?

maurice

What about generating an Abstract Syntax Tree (compiler.ast) and using
the compiler module (compiler.pycodegen) to write the bytecode?
 
Jeremy Bowers

Can I feed a python source file into compile(), line by line, and expect
it to generate a working .pyc file? I suppose my intended use is to be
able to handle python codes written at run time, to execute python codes
line by line, in a python program. It is somewhat like a tracer routine
that can interpret a line of python code, read out the variables, before
going to the next line of python code. Can compile() do this, or do I
have to use pexpect to run an instance of python?

Why don't you clearly spell out your intended use and ask about that,
instead?

If, based on your use of "I suppose" and "somewhat", you are still unclear
on your intended use, figuring that out would be step #1. :)

Many good modules exist for many things already; if you're trying to trace
for instance, there is a module for that. Let's start at the beginning:
What are you trying to do?
 
Maurice LING

Jeremy said:
Why don't you clearly spell out your intended use and ask about that,
instead?

If, based on your use of "I suppose" and "somewhat", you are still unclear
on your intended use, figuring that out would be step #1. :)

Many good modules exist for many things already; if you're trying to trace
for instance, there is a module for that. Let's start at the beginning:
What are you trying to do?

I am using SBML (system biology markup language) as a front-end
modelling language for my project. And for ease of further maintenance
of the model and interoperability purposes, my project requires me to
convert it into an intermediate form (MA), which is somewhat assembly-like in
structure, as in, each instruction takes the form of <opcode>
<operand>*. Here I am, attempting to write a virtual machine that can
run MA, using python. So, it becomes a MA virtual machine running on
python virtual machine.

My concern is, is it simpler to convert MA to python codes or python
bytecodes. What are the pros and cons? Assuming that to convert to
python source code is a choice, I'm thinking that MA virtual machine can
then read a MA instruction and output the corresponding python source
codes, but are there facilities in python to run python codes, line by
line, as it is being thrown out by MA virtual machine?

As a side note, does anyone think that this project might be suitable
enough to apply for PSF Grant?

Thanks
Maurice

 
Jeremy Bowers

I am using SBML (system biology markup language) as a front-end
modelling language for my project. And for ease of further maintenance
of the model and interoperability purposes, my project requires me to
convert it into an intermediate form (MA), which is somewhat assembly-like in
structure, as in, each instruction takes the form of <opcode>
<operand>*. Here I am, attempting to write a virtual machine that can
run MA, using python. So, it becomes a MA virtual machine running on
python virtual machine.

Hmmm, could you post an example of this assembly-like code? It might be
easiest to implement a Python interpreter directly; if the assembly-like
code is simple enough it isn't even worth a true parser.

Without knowing about your code, I can't be sure, but I would be surprised
if MA is similar enough to Python to make it worth running MA on the
Python machine directly.

Assembly language is right up there with LISP (without macros) in terms of
ease of parsing, if no opcode ever crosses multiple lines.
 
Maurice LING

Jeremy said:
Hmmm, could you post an example of this assembly-like code? It might be
easiest to implement a Python interpreter directly; if the assembly-like
code is simple enough it isn't even worth a true parser.

What do you mean by implementing a Python interpreter directly? Sorry, I
am unable to provide an example of this assembly-like code. This is
currently still unpublished work, so I'm not able to disclose much,
especially in a public forum.
Without knowing about your code, I can't be sure, but I would be surprised
if MA is similar enough to Python to make it worth running MA on the
Python machine directly.

Do you think that there is very slight chance that it is worthwhile
converting MA directly to python bytecodes? This is how I read it.
Please tell me if I've misunderstood you.
Assembly language is right up there with LISP (without macros) in terms of
ease of parsing, if no opcode ever crosses multiple lines.

Some parts of MA are still undergoing development and cleaning up but I
certainly do not see why any opcode should cross multiple lines. As far
as I can see, 70% of the opcodes can be represented by multiple
lines of python code. I've not thought hard enough on this yet.

All I can say is that MA looks similar to any assembly in structure,
with directives.

Sorry that I am not able to disclose much, but hope to get some opinions
based on what I can say.

Thank you,
Maurice
 
Leif K-Brooks

Jason Lai wrote:
[talking about compiling some language besides Python to Python bytecode]
What about generating an Abstract Syntax Tree (compiler.ast) and using
the compiler module (compiler.pycodegen) to write the bytecode?

That would certainly be possible, but it seems to me like it might be
easier to generate Python code. You're using Python logic if you use its
AST, after all.
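
For reference, a rough sketch of the AST route. The `compiler` package mentioned above is Python 2 only; this example assumes Python 3.8 or later and swaps in the standard `ast` module, which plays the same role together with the `compile()` builtin. The tree below is built by hand for the expression `2 + 4`:

```python
import ast

# Hand-built tree for the expression 2 + 4; a front end for another
# language would construct nodes like these from its own parse tree.
tree = ast.Expression(
    body=ast.BinOp(
        left=ast.Constant(value=2),
        op=ast.Add(),
        right=ast.Constant(value=4),
    )
)

# Nodes created by hand lack line/column info, which compile() requires.
ast.fix_missing_locations(tree)

code = compile(tree, "<ast>", "eval")
result = eval(code)
print(result)
```

As noted above, the node types are Python's own, so the source language still has to be mapped onto Python constructs either way.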
 
Jason Lai

Maurice said:
Some parts of MA is still undergoing development and cleaning up but I
certainly do not see why any opcode should cross multiple lines. As far
as I can see, 70% of the opcodes are able to be represented by multiple
lines of python codes. I've not thought hard enough on this yet.

All I can say is that MA looks similar to any assembly in structure,
with directives.

Well, x86/PPC assembly operates on registers. Python uses a stack.
See http://docs.python.org/lib/bytecodes.html

If you're using registers, I guess you'd have to store the values in
variables, and load/store them through the stack whenever you do an
operation -- maybe with some optimization if you can keep the result on
the stack.

Python pretty much only lets you run a block of code at once (using exec
or eval). So if you compile it line by line on the fly, your VM would
have to ask Python to run each line, and take care of unstructured jumps
itself. Python doesn't really like arbitrary gotos anyway. I assume if
you were translating to Python code, you'd have to have the whole block
for if, while, etc, ahead of time. Or only jump backwards, since you
can't jump to something that hasn't been written yet.
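
A small sketch of the "ask Python to run each line" approach, assuming Python 3 and straight-line statements only (no jumps, no multi-line blocks). The statement list and the `snapshots` bookkeeping are made up for illustration; the point is that one shared namespace dictionary carries the variables from one `exec` call to the next and can be inspected in between:

```python
# Hypothetical statements, as they might be emitted one at a time.
lines = ["x = 2", "y = x * 3", "z = x + y"]

namespace = {}   # shared state between the separately compiled lines
snapshots = []   # copy of the user-visible variables after each line

for line in lines:
    exec(compile(line, "<generated>", "exec"), namespace)
    # exec() adds __builtins__ to the namespace; filter it out.
    visible = {k: v for k, v in namespace.items() if not k.startswith("__")}
    snapshots.append(visible)

for line, snap in zip(lines, snapshots):
    print(f"{line!r:14} -> {snap}")
```

Anything resembling a goto would have to be handled by the driving loop, for example by choosing which line to `exec` next.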

- Jason
 
Maurice LING

Well, x86/PPC assembly operates on registers. Python uses a stack.
See http://docs.python.org/lib/bytecodes.html

If you're using registers, I guess you'd have to store the values in
variables, and load/store them through the stack whenever you do an
operation -- maybe with some optimization if you can keep the result on
the stack.

I don't quite get this. Since x86/PPC uses register operations,
why are virtual machines, like python's and java's, designed as stack
machines? Why not just stick to registers?

maurice
 
Martin v. Löwis

Maurice said:
Can I feed a python source file into compile(), line by line, and expect
it to generate a working .pyc file?

compile() requires the complete source code. However, it might be that
the complete source code is just a single statement, or a single
function executing a single statement.

Did you read the documentation of compile()? It would have told you that
compile() does not generate .pyc files at all. Instead, it generates
code objects, and you use the marshal module to save them into .pyc
files.
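
A hedged sketch of that split, assuming Python 3. `marshal` serialises the code object itself; a real .pyc additionally carries a version-specific header, so the example leaves that part to the standard `py_compile` module rather than writing the header by hand. The file names are placeholders:

```python
import marshal
import os
import py_compile
import tempfile

source = "answer = 6 * 7\n"
code = compile(source, "<generated>", "exec")

# Round-trip the code object through marshal (format is version-specific).
payload = marshal.dumps(code)
restored = marshal.loads(payload)

namespace = {}
exec(restored, namespace)
print(namespace["answer"])

# For a complete .pyc (header plus marshalled code), let py_compile do it.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "generated.py")
    pyc_path = os.path.join(tmp, "generated.pyc")
    with open(src_path, "w") as f:
        f.write(source)
    py_compile.compile(src_path, cfile=pyc_path, doraise=True)
    pyc_exists = os.path.exists(pyc_path)
```

Because both the marshal format and the header change between releases, files produced this way should be regenerated for each Python version.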
I suppose my intended use is to be
able to handle python codes written at run time, to execute python codes
line by line, in a python program.

Single-step execution is an issue entirely independent of generating
Python bytecode, or source code. Regardless of how you have generated
the Python bytecode (directly, or through source code), Python supports
single-stepping of byte code (on a line-per-line-of-source-code basis).

However, when you generate Python code (source or byte), and you know
you are going to need single-stepping, you should put single-stepping
*into the generated code*. I.e. if your input language reads

action 1
action 2
action 3

you generate

def program():
    starting()
    do_action_1()
    step_done()
    do_action_2()
    step_done()
    do_action_3()
    step_done()

Then, a proper implementation of step_done() will allow for user
interaction, giving you single-step capabilities.
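
The idea can be sketched end to end as follows, assuming Python 3. The generator function, the single `do_action(name)` helper (used here instead of one function per action) and the `log` list are illustrative stand-ins; a real `step_done()` might prompt the user or dump variables instead of appending to a list:

```python
def generate(actions):
    """Translate a list of action names into Python source with step hooks."""
    lines = ["def program():", "    starting()"]
    for name in actions:
        lines.append(f"    do_action({name!r})")
        lines.append("    step_done()")
    return "\n".join(lines) + "\n"

log = []

def starting():
    log.append("start")

def do_action(name):
    log.append(f"ran {name}")

def step_done():
    # Hook point: pause, inspect state, or wait for user input here.
    log.append("step")

source = generate(["1", "2", "3"])
namespace = {"starting": starting, "do_action": do_action, "step_done": step_done}

# Compile the generated source and run the resulting program() function.
exec(compile(source, "<generated>", "exec"), namespace)
namespace["program"]()
print(log)
```

The generated text is ordinary Python, so it can also be written to a file and read or debugged like any other module.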
I was thinking that it may be simpler to say, write a PHP-to-Python
compiler which compiles PHP into an intermediate form, which is then
converted into python bytecodes, rather than trying to automate source
code conversion from PHP to Python. Well, PHP is just an off-hand
example, it may be COBOL or Pascal. Any ideas?

No. Generating source code is *always* simpler. There are three reasons
why one would not generate source code even though it is simpler:
- you don't have a compiler for the source code available on the target
system. This is the compile-to-JVM example.
- the compiler for the source language gives inefficient byte code,
and you can do better. Although it is theoretically possible, it is
unlikely to happen in practice (not because the compiler is already
optimal, but because it is very difficult to do better - if it wasn't,
the authors of the compiler would have improved it already)
- certain VM opcodes are not available through source code. This sounds
theoretical, too - why would the VM include opcodes that will never
occur in practice? The real-world example is .NET, though, which
supports many languages, and thus supports constructs not available
in, say, C# (like global fields). This is not the case for Python,
though: Python uses virtually all of its opcodes.

Regards,
Martin
 
Martin v. Löwis

Maurice said:
What do you mean by implementing a Python interpreter directly? Sorry, I
am unable to provide an example of this assembly-like code. This is
currently still unpublished work, so I'm not able to disclose much,
especially in a public forum.

He didn't mean to suggest that you write an interpreter *of*
Python, but an interpreter *of* your language *in* Python. Instead
of compiling your intermediate language into Python bytecode,
directly implement the VM (if you prefer that term over "interpreter")
for MA in Python.
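
Since MA itself has not been shown, here is only a toy sketch of such an interpreter, assuming Python 3. The opcodes `SET`, `ADD` and `JLT` are invented for illustration and are not MA; all that is taken from the thread is the `<opcode> <operand>*` line format:

```python
def run(program_text):
    """Interpret lines of the form '<opcode> <operand>*' (toy opcodes)."""
    program = [ln.split() for ln in program_text.splitlines() if ln.strip()]
    env = {}   # named variables of the toy machine
    pc = 0     # index of the next instruction

    def value(token):
        # An operand is either a known variable name or an integer literal.
        return env[token] if token in env else int(token)

    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "SET":      # SET var operand
            env[args[0]] = value(args[1])
        elif op == "ADD":    # ADD var operand  (var += operand)
            env[args[0]] = value(args[0]) + value(args[1])
        elif op == "JLT":    # JLT a b target   (jump if a < b)
            if value(args[0]) < value(args[1]):
                pc = int(args[2])
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return env

demo = """
SET i 0
SET total 0
ADD total i
ADD i 1
JLT i 5 2
"""

result = run(demo)
print(result)
```

Jumps are trivial here because the interpreter owns the program counter, which is awkward to arrange when translating to Python source.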

Regards,
Martin
 
Martin v. Löwis

Maurice said:
I don't quite get this. Since x86/PPC uses register operations,
why are virtual machines, like python's and java's, designed as stack
machines? Why not just stick to registers?

I really think you should study programming language implementations
for some time before approaching your problem.

For an interpreter, what the processor does is completely irrelevant
(not completely if you have a just-in-time compiler, as that needs
to generate machine code, but totally irrelevant if you have an
interpreter). Using a stack-based implementation makes it possible to
simplify the opcodes - many opcodes need no parameters, or at most a single
parameter. This makes it possible to get by with fewer than 256 opcodes,
which is the reason these opcodes are called "byte code". That, in turn,
allows for an implementation that uses an "interpreter loop", which
consists of a "giant switch".

In x86, a single instruction has between 1 and 20 bytes, and the
decoding process (finding out what the instruction does) is
very lengthy. For a microprocessor, this doesn't matter, since it
is done in hardware, and in parallel with executing other
instructions (pipelining). For an interpreter, the decoding process
must be superfast, and therefore supersimple.
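
A toy illustration of that design, assuming Python 3. The four opcodes and their numbering are made up and have nothing to do with CPython's real instruction set; the sketch only shows one-byte opcodes, at most one operand byte, and a loop built around a chain of comparisons standing in for the "giant switch":

```python
# Hypothetical one-byte opcodes for a tiny stack machine.
PUSH, ADD, MUL, HALT = range(4)

def run(bytecode):
    """Execute the toy bytecode and return the value on top of the stack."""
    stack = []
    pc = 0
    while True:
        op = bytecode[pc]          # decoding is a single byte fetch
        pc += 1
        if op == PUSH:             # the only opcode with an operand byte
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:            # operands come from the stack
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")

# (2 + 4) * 7
program = bytes([PUSH, 2, PUSH, 4, ADD, PUSH, 7, MUL, HALT])
result = run(program)
print(result)
```

Note that `ADD` and `MUL` name no operands at all, which is what keeps each instruction down to a byte or two.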

Regards,
Martin
 
Diez B. Roggisch

Maurice said:
I don't quite get this. Since x86/PPC uses register operations,
why are virtual machines, like python's and java's, designed as stack
machines? Why not just stick to registers?

Because stacks are common to _all_ processors, whereas registers differ
from architecture to architecture - the x86 hasn't been very gifted in that
respect (not sure if that changed recently - at least the SIMD instructions
introduced registers, but you can't rely on those being available).

So resorting to stacks makes the implementation totally independent of the
actual processor architecture - and stacks are as good as registers in
terms of abstract use.

What a JIT then does is purely up to its implementors - but that's another
topic.
 
