A historical question

Duncan Booth

Jerald said:
Hi.

I'd like to know when python started working with bytecode.
It seems natural that in the first Python implementations
code was really interpreted: executed directly.

As a result, in the first days, when the py-programmer
said:

def foo():
    print 'foo'

Python stored the function body and executed it each time
foo was called. At some point it was decided to compile
this to bytecode, optimize it, and call the bytecode instead.

Is it so?

According to Google, in April 1994 Guido posted complaining about some of the
inefficiencies in the bytecode interpreter:

http://groups.google.co.uk/[email protected]

I doubt very much whether there has ever been any implementation of Python that
didn't use a bytecode of some form. It would be a very perverse way to try to
write a language.
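That bytecode is easy to inspect in any modern CPython; a minimal sketch using the standard `dis` module (this shows today's interpreter, of course, not the 1994 one):

```python
import dis

def foo():
    print('foo')

# The compiled bytecode lives on the function object itself,
# as a raw byte string.
assert len(foo.__code__.co_code) > 0

# dis renders those bytes as readable instructions.
dis.dis(foo)
```

The exact instructions printed vary between CPython versions, but a bytecode representation is always there.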
 
Peter Hansen

Jerald said:
I'd like to know when python started working with bytecode.
It seems natural that in the first Python implementations
code was really interpreted: executed directly.

Why does that seem natural to you?

-Peter
 
Larry Bates

Unless I'm mistaken, it is nearly impossible to
"execute" any software without translating the
source into some intermediate (read: bytecode) set
of tokens and operators. All interpreters must
parse the source code and create some structured
representation (even if it is only internal) that
is normally VERY different from the source code
itself. Some interpreters never save out this
"byte code", but it exists nevertheless.

Larry Bates
Syscon, Inc.
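Python makes that internal structured representation visible; a small sketch with the standard `ast` module:

```python
import ast

source = "def foo():\n    print('foo')\n"

# Parsing builds a structured tree that looks nothing like the
# source text -- the internal representation described above.
tree = ast.parse(source)
func = tree.body[0]
print(type(func).__name__, func.name)   # FunctionDef foo
```

CPython then compiles such a tree down to bytecode, but even an interpreter that "never saves out" anything would still be working from a structure like this.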
 
Paul Watson

Larry Bates said:
Unless I'm mistaken, it is nearly impossible to
"execute" any software without translating the
source into some intermediate (read: bytecode) set
of tokens and operators. All interpreters must
parse the source code and create some structured
representation (even if it is only internal) that
is normally VERY different from the source code
itself. Some interpreters never save out this
"byte code", but it exists nevertheless.

Larry Bates
Syscon, Inc.

Agreed. However, we should also consider that "compiled" executable images
in machine language are simply bytecodes to the processor microcode.

Now... If we had a processor for which we could write microcode to execute
Python or Parrot bytecode, ...
 
Carlos Ribeiro

I'd like to know when python started working with bytecode.
It seems natural that in the first Python implementations
code was really interpreted: executed directly.

I assume that you expect direct execution to be the easiest way to
start implementing a new language. However, that's far from true. It's
actually pretty difficult to execute programs in procedural languages
*without* some form of intermediate code, and almost all computer
languages are compiled at some level before execution. The only
situation where direct execution makes sense is in the case of simple
command-line interfaces; some simple shell script languages may
still be executed this way, but that's an exception to the rule. Even
old languages such as BASIC used to be compiled to some form of
intermediate code -- a similar concept to Python's bytecode, but much
simpler.

You may think that creating a virtual machine or compiler for a new
language is a hard task. But there is a huge body of theoretical
knowledge regarding all the pieces of software required to implement a
new computer language that can be used for this purpose. There is no
need to reinvent the wheel here. Concepts such as language parsers,
intermediate code generators, optimizers, etc. -- they're all quite old
and well understood. Automatic tools and code generators can be used,
given the language definition, to create a basic compiler for it. Of
course, there are a few areas with hot new advancements, but the
basics are already solidly understood.
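Python even exposes the first of those pipeline stages directly; a quick sketch with the standard `tokenize` module, showing source text becoming a classified token stream before any parsing or code generation happens:

```python
import io
import tokenize

source = "x = 1 + 2\n"

# The first stage of the pipeline: raw text becomes a stream
# of (token type, token string) pairs.
tokens = [(tokenize.tok_name[tok.type], tok.string)
          for tok in tokenize.generate_tokens(io.StringIO(source).readline)]
for tok in tokens:
    print(tok)
```

The stream includes entries like `('NAME', 'x')` and `('OP', '=')`, which the parser then assembles into a syntax tree.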

The most difficult part is not implementing the basic compiler or
virtual machine. The hardest part is coming up with a clear and
powerful language design. That's where Python really shines.

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
 
Peter Hansen

Carlos said:
I assume that you expect direct execution to be the easiest way to
start implementing a new language. However, that's far from true. It's
actually pretty difficult to execute programs in procedural languages
*without* some form of intermediate code, and almost all computer
languages are compiled at some level before execution. The only
situation where direct execution makes sense is in the case of simple
command-line interfaces; some simple shell script languages may
still be executed this way, but that's an exception to the rule. Even
old languages such as BASIC used to be compiled to some form of
intermediate code -- a similar concept to Python's bytecode, but much
simpler.

This is not, as far as I know, true. At least, not for the
general case, although certain specific implementations of
BASIC may have worked this way.

If you are thinking, for example, of how the early BASICs
on things like the Apple ][ and the PET computers worked,
you are misinterpreting (no pun intended) what actually
happened, IMHO.

The only "compilation" that went on was actually called
"tokenization", and that meant only that keywords such
as PRINT were turned into single-byte values that corresponded
directly to the keyword in the source. The rest of the
source was left as-is, including the quotation marks around
strings, variable names, etc. I think whitespace was
generally compressed (i.e. multiple spaces in places where it
wasn't syntactically meaningful were turned into one or none),
but this and the tokenization were done more for memory
conservation than for anything else.

I guess one could call this "compilation"... I wouldn't.
In fact, I think in general compilation is a process which
is not 100% reversible, whereas tokenization in the form
BASIC did it was (whitespace aside).
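A toy sketch of that reversible, BASIC-style tokenization. The byte values here are invented, not the real Applesoft or PET codes, and real interpreters also avoided tokenizing inside string literals; the point is only that keywords collapse to single bytes while everything else is stored verbatim, so the transformation round-trips:

```python
# Hypothetical keyword codes (NOT the actual historical values).
KEYWORDS = {"PRINT": 0x99, "GOTO": 0x89}
REVERSE = {v: k for k, v in KEYWORDS.items()}

def tokenize_line(line):
    out = []
    for word in line.split(" "):
        if word in KEYWORDS:
            out.append(KEYWORDS[word])          # one byte per keyword
        else:
            out.extend(word.encode("ascii"))    # everything else kept as-is
        out.append(ord(" "))
    return bytes(out[:-1])                      # drop the trailing space

def detokenize_line(data):
    return "".join(REVERSE[b] if b in REVERSE else chr(b) for b in data)

line = '10 PRINT "HELLO"'
data = tokenize_line(line)
print(len(data) < len(line))           # tokenized form is shorter
print(detokenize_line(data) == line)   # and fully reversible
```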

-Peter
 
Carlos Ribeiro

<snip>

The only "compilation" that went on was actually called
"tokenization", and that meant only that keywords such
as PRINT were turned into single-byte values that corresponded
directly to the keyword in the source.

You're right -- I oversimplified my explanation to reinforce the fact
that few systems ever run the program directly from the source code,
as the original poster implied in his question. Tokenization is only
the first step. But as a generic (and simple) explanation, its result
is conceptually one step closer to Python's (or Java's) bytecode than
the original (textual) program source.


--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
 
Jerald

Hi.

I'd like to know when python started working with bytecode.
It seems natural that in the first Python implementations
code was really interpreted: executed directly.

As a result, in the first days, when the py-programmer
said:

def foo():
    print 'foo'

Python stored the function body and executed it each time
foo was called. At some point it was decided to compile
this to bytecode, optimize it, and call the bytecode instead.

Is it so?

I am very curious.


Gerald
 
John Bauman

According to Google, in April 1994 Guido posted complaining about some of the
inefficiencies in the bytecode interpreter:

http://groups.google.co.uk/[email protected]

I doubt very much whether there has ever been any implementation of Python that
didn't use a bytecode of some form. It would be a very perverse way to try to
write a language.

From some things I read about Parrot, I'm under the impression that Ruby
(and Perl, partially) don't (yet) use bytecodes (at least internally -- they
may be used as an external representation). Instead, the program is parsed
into an abstract syntax tree and interpreted by walking the
tree. See http://en.wikipedia.org/wiki/Interpreted_language . The same
method would probably work with Python.
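A minimal sketch of such a tree-walking interpreter, here limited to arithmetic expressions and built on Python's own `ast` module: the program is parsed to a syntax tree once, then evaluated by walking the tree directly, with no bytecode step at all.

```python
import ast

def walk(node):
    """Evaluate an arithmetic expression by recursing over its AST."""
    if isinstance(node, ast.Expression):
        return walk(node.body)
    if isinstance(node, ast.BinOp):
        left, right = walk(node.left), walk(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Mult):
            return left * right
        raise NotImplementedError(type(node.op).__name__)
    if isinstance(node, ast.Constant):
        return node.value
    raise NotImplementedError(type(node).__name__)

tree = ast.parse("2 + 3 * 4", mode="eval")
print(walk(tree))   # 14
```

This is simpler to build than a bytecode compiler, but every execution pays the full cost of re-walking the tree, which is one reason bytecode interpreters tend to win on speed.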
 
Greg Ewing

Peter said:
The only "compilation" that went on was actually called
"tokenization" ... was more for memory conservation
than for anything else.

It undoubtedly helped execution speed a lot, too.
The main loop of the interpreter consisted of fetching
the next token and consulting a jump table -- much
like the switch statement in ceval().
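A toy dispatch loop in that spirit, sketched in Python with a dict as the jump table (the opcodes here are invented for illustration, not real CPython or BASIC opcodes):

```python
def run(program):
    """Fetch each opcode, look up its handler in a jump table, execute."""
    stack = []
    table = {
        "PUSH":  lambda arg: stack.append(arg),
        "ADD":   lambda _: stack.append(stack.pop() + stack.pop()),
        "PRINT": lambda _: print(stack[-1]),
    }
    for opcode, arg in program:
        table[opcode](arg)     # jump-table dispatch, like ceval's switch
    return stack

run([("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PRINT", None)])   # prints 5
```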
 
