pyvm -- faster python

  • Thread starter Stelios Xanthakis

François Pinard

[Paul Rubin]
François Pinard said:
Deep down, why or how is not having a [traditional, to-native-code]
compiler a deficiency for CPython? We already know that such a
beast would not increase speed all that significantly, while using much
more memory.
I'd say the opposite. The 4x speedup from Psyco is quite significant.
The speedup would be even greater if the language itself were more
compiler-friendly.

Psyco is not a traditional compiler. Compilation occurs at run-time, as
typing information is not available without observation.
No, of course not; there's quite a bit of overhead in interpretation in
general. Plus, having to do a dictionary lookup for 'bar' in every
call like foo.bar() adds overhead, and I wouldn't call fixing
that the same as adding static type info.

The dictionary lookup does not occur for local variables. Changes are
planned in Python so that lookups may be avoided in a few cases of global
variables. But until (and despite!) such changes occur, compilation
does not buy you much here. This has been much studied and debated.
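To make the lookup cost concrete, here is a minimal sketch (the
function and names are hypothetical, but the trick is a standard
CPython micro-optimization):

    # locals are fetched by index (LOAD_FAST), while an attribute like
    # items.append costs a dictionary lookup on every access, so hoisting
    # the bound method into a local avoids one lookup per iteration
    def fill(items, n):
        append = items.append   # one attribute lookup, done once
        for i in range(n):
            append(i)           # cheap local-variable call inside the loop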
Also, lots of times one can do type inference at compile time.

Yes, particularly in simple or small programs. I vaguely remember
that the unborn Viper (or Vyper?) was promising compile-time type
inference. It seems that so far, in many years, no one has really
succeeded in demonstrating a definite advantage in this direction;
at least, not enough for the community to follow.
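To show what the easy case looks like, a small hypothetical sketch of
code where inference is trivial:

    # every name here is provably an int, so a static compiler could in
    # principle emit an integer-only loop with no dynamic dispatch
    def total(n):
        s = 0                 # s starts as an int ...
        for i in range(n):    # ... and range() yields ints
            s += i            # ... so s stays an int throughout
        return s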
I don't understand this. What purity? Why is real CPython the
only source? There are dozens of C compilers and none of them is
the "only source". Why should Python be different?

Because it is. There are not dozens of Python compilers. A few, maybe.

Pyrex gives me a lot of speed, provided I taint my Python sources with
cdefs, ctypes, etc. Oh, I'm absolutely happy with Pyrex when I'm starving
for speed, but I know and you know that we are not writing pure Python
then. And if, to avoid touching the sources, one supplements them with
separate declarative files, my Python code may look pure, but I have to
maintain those files as well, and Python is not my only source anymore.
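As a minimal sketch of what that tainting means (the function is
hypothetical; the comments show the kind of Pyrex declarations meant):

    # pure Python as written; in Pyrex one would declare the C types,
    # e.g. "def csum(int n):" together with "cdef int i, total", so the
    # loop compiles to raw C arithmetic - fast, but no longer pure Python
    def csum(n):
        total = 0
        for i in range(n):
            total += i
        return total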

So, to get back to the origin of our argument, I'm still not tempted
to think that Python not having a compiler makes it deficient, because
having a compiler (at least in the traditional sense of the term)
would not buy it enough.
 

Rocco Moretti

Paul said:
Despite the shrieks of the "Python is not Lisp!" crowd, Python
semantics and Lisp semantics aren't THAT different, and yet compiled
Lisp implementations completely beat the pants off of interpreted
Python in terms of performance.

I know little about Lisp compilation, so I could be mistaken, but I was
under the impression that one of the differences between Python & Lisp
is directly relevant to compilation issues.

Python, as a procedural language, makes extensive use of globals and
mutable variables. Not only can the contents of a variable change, but
that change can implicitly affect a function in a "remote" part of
the program; hence the requirement for the Global Interpreter Lock.
Tracking how changes propagate in Python is non-trivial, as evidenced by
the difficulty of replacing the GIL with a less "obtrusive" alternative.
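A tiny, hypothetical example of such a "remote" effect, which is exactly
what stops a compiler from folding globals into constants:

    pi = 3.14159
    def area(r):
        return pi * r * r     # 'pi' is looked up at call time, not bound in

    print(area(1.0))          # 3.14159
    pi = 3                    # rebinding a global silently changes area()
    print(area(1.0))          # 3.0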

IIUC, in Lisp, as a functional language, "all politics is local."
Global-like variables are much rarer, and mutability is severely
limited. In this case it is much easier to track how a piece of code
alters the program's state: the propagation of those changes is
handled explicitly, and once you have a reference to a piece of data,
you don't have to worry about some other function changing the value on
you. As such, it's a lot easier to write an optimizing compiler; you
can place much greater limits on what is known at compile time.

It's been quite a while since I've looked at Lisp/functional languages,
though, so I could be misunderstanding what I remember.
 

Paul Rubin

Rocco Moretti said:
Python, as a procedural language, makes extensive use of globals &
mutable variables.... IIUC, in Lisp, as a functional language, "all
politics is local." Global-like variables are much rarer, and
mutability is severely limited.

Some people write Lisp code in a functional style, but not everyone.
Lisp provides mutable objects as much as Python does - maybe more;
for example, Lisp strings are mutable. Overall I'd say there's not
much difference between Lisp and Python in this regard. Lisp lets
the programmer supply static type declarations that the compiler
can use, but those are optional. When you use them, you get faster
output code.
 

Mike Meyer

Paul Rubin said:
I would say it counts as a compiler and that other languages have
used a similar compilation approach and gotten much better speedups.
For example, Java JIT compilers. The DEC Scheme-to-C translator
and Kyoto Common Lisp also produced C output from their compilers
and got really significant speedups. Part of the problem may be
with the actual Python language.

The DEC Scheme->C compiler (yes, it was an honest to gods compiler -
it just generated C instead of machine code) doesn't sound very
similar to generating C code from byte code. The latter sounds like a
straightforward translation of byte code to C. The Scheme->C
compiler, on the other hand, used all the optimizations known to LISP
compiler writers. It would, with the right settings, produce code that
was comparable to hand-coded C. So unless the Python compiler that
produced the byte code in the first place does all those optimizations
(which I don't know), you're not going to get results that compare
with the Scheme->C system.
Despite the shrieks of the "Python is not Lisp!" crowd, Python
semantics and Lisp semantics aren't THAT different, and yet compiled
Lisp implementations completely beat the pants off of interpreted
Python in terms of performance. I don't think Python can ever beat
carefully coded C for running speed, but it can and should aim for
parity with compiled Lisp.

There are people out there who claim that carefully coded LISP,
compiled, can compete with carefully coded C. They use such
systems in real-world, time-critical applications. The thing is, to
get that performance, they had to go back and tell the compiler what
the types of all the variables are, and disable all the runtime type
checking.

So if this is a real goal, you now have some idea of what to look
forward to in Python's future.

<mike
 

Stelios Xanthakis

Hi,

Kay said:
Why this? eval() consumes a string, produces a code object and executes
it. Whether the code object is bytecode or a chunk of machine code makes
a difference in the runtime but does not alter the high-level
behavioural description of eval(). Either way, the compile() function
behind eval is a JIT.

It makes a difference because:
1) You depend on a compiler. Especially if you want to optimize the
machine code, you depend on a big program (no tcc, for example).
2) Optimizing machine code needs a lot of code/time.
3) What about debugging info? DWARF?
4) What if there's a new architecture?

The bytecode compiler produces its bytecode *much* faster.
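A minimal sketch of the point with CPython itself (285 is just the
arithmetic result of the expression):

    # compile() goes straight from source text to a bytecode object:
    # no optimizer back-end, no DWARF debug info, no target architecture
    src = "sum(i * i for i in range(10))"
    code = compile(src, "<string>", "eval")   # bytecode, not machine code
    print(eval(code))                         # 285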


Also, for the other part of the thread, I think that bytecode may
in fact be faster than a machine-code JIT. Here is a theory:

Suppose that for each algorithm there is the "ideal implementation"
which executes at the speed limit where it can't be done any faster.
For small algorithms the speed limit may be known, but for more
complex programs it's just a theoretical limit. Now, for such very
big programs, bytecode has the advantage that it achieves very good
code re-use; everything lives in those 400 kB of the libpython core,
and that does not increase with the size of the program. In this case,
bytecode is the driver that operates the bulldozer (the bulldozer
being C). So the theory is that for very complex programs
bytecode+core-library is closer to the ideal implementation than a JIT
which produces/compiles megabytes of machine code.

Evidence for that may be that all those JIT efforts don't get any
great speedups (Psyco is different, as it exploits static-ness).

Of course, if we care about speed we'll use Pyrex to convert some
heavy routines to C. For example, if we need an FFT, it's madness to
do it in the HLL. Make it part of the bulldozer.



Stelios
 

Stelios Xanthakis

Armin said:
pyvm has that. A big part of it is written in "lightweight C++" [1].


Really? I have downloaded the lwc distribution and checked it out.
It was a surprise that none of the examples work.
I'm using SuSE 9.0 with gcc 3.3.1 ...

:(

Is there a working version of lwc???

pyvm is written in lwc-2.0, which is not yet released because
nobody's using it.


Stelios
 

Stelios Xanthakis

Stelios said:
Also, for the other part of the thread, I think that bytecode may
in fact be faster than a machine-code JIT.

Forgot to add: it depends, of course, on how good the bytecode is.
IMO Python's bytecode is pretty good for its purpose, which is
executing a dynamic language with dynamic types and namespaces.

Also, inside pyvm some things are done with internal bytecode
objects. One such thing is "list(generator)". It has proven
faster to do this in bytecode than in C.
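For anyone who wants to poke at this from the outside, a rough sketch of
how one would measure such a thing in CPython (pyvm's internal bytecode
objects can't be reproduced here; the method, not the numbers, is the
point):

    import timeit
    # compare list(generator) against a plain list comprehension
    print(timeit.timeit("list(i for i in range(1000))", number=10000))
    print(timeit.timeit("[i for i in range(1000)]", number=10000))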

Stelios
 

Armin Steinhoff

Stelios said:
Kay said:
Yes. What we are seeking - and this may be the meaning of Armin's
intentionally provocative statement about the speed of running HLLs - is
a successor to the C language, and not just another VM interpreter that
is written in C and limits all efforts to extend it in a flexible and
OO manner. Python is just the most promising dynamic OO language to
follow this target.


A bytecode engine is the best method for dynamic code execution
("exec", eval, etc). A low-level OOP language would be very suitable
for a Python VM.

pyvm has that. A big part of it is written in "lightweight C++" [1].

Really? I have downloaded the lwc distribution and checked it out.
It was a surprise that none of the examples work.
I'm using SuSE 9.0 with gcc 3.3.1 ...

Is there a working version of lwc???

Regards

Armin



That makes it less portable, as the lwc preprocessor uses GNU-C
extensions. However, they are the same extensions also used by the
Linux kernel, and AFAIK the Intel compiler supports them too.

So probably the bigger "competitor" of pyvm is boost-python.
And that's one reason the release of the source is stalled until it
gets better.


Stelios

[1] http://students.ceid.upatras.gr/~sxanth/lwc/
 

Armin Steinhoff

Stelios said:
Armin said:
pyvm has that. A big part of it is written in "lightweight C++" [1].



Really? I have downloaded the lwc distribution and checked it out.
It was a surprise that none of the examples work.
I'm using SuSE 9.0 with gcc 3.3.1 ...

:(

Is there a working version of lwc???

pyvm is written in lwc-2.0, which is not yet released because
nobody's using it.

As you mentioned it ... lwc-2.0 is used for pyvm. So it is used :)

Do you have an idea when lwc-2.0 will be released?

Everyone who is interested in pyvm will need it ...


-- Armin
 

Stelios Xanthakis

Armin said:
As you mentioned it ... lwc-2.0 is used for pyvm. So it is used :)

Do you have an idea when lwc-2.0 will be released?

Everyone who is interested in pyvm will need it ...

It will be included together with pyvm. Normally, if you just want to
compile pyvm, you do not need lwc: it's a preprocessor that generates C
from C++, and the C for pyvm will be pre-generated. You'd need lwc only
for modifying/hacking pyvm.


Stelios
 

Terry Reedy

Also, for the other part of the thread, I think that bytecode may
in fact be faster than a machine-code JIT. Here is a theory:
Suppose that for each algorithm there is the "ideal implementation"
which executes at the speed limit where it can't be done any faster. ....
Now for such very big programs bytecode has the advantage that it
achieves very good code re-use; everything lives in those 400 kB of the
libpython core, and that does not increase with the size of the program.
In this case, bytecode is the driver that operates the bulldozer (the
bulldozer being C).

This, of course, is exactly what CPython does. Bytecodes index into a
switch table that jumps to interface code that calls C routines, most of
which have been optimized, on and off, for about 15 years.
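A toy sketch of that dispatch idea, with Python functions standing in
for the optimized C routines (everything here is hypothetical, not
CPython's actual code):

    # each bytecode indexes a handler; the handler does the real work
    def op_add(stack): stack.append(stack.pop() + stack.pop())
    def op_mul(stack): stack.append(stack.pop() * stack.pop())
    HANDLERS = {0: op_add, 1: op_mul}

    def run(code, stack):
        for op in code:           # walk the "switch table"
            HANDLERS[op](stack)
        return stack

    print(run([0, 1], [2, 3, 4]))   # [(4 + 3) * 2] -> [14]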
So the theory is that for very complex programs bytecode+core-library
is closer to the ideal implementation than a JIT which produces/compiles
megabytes of machine code.

Evidence for that may be that all those JIT efforts don't get any
great speedups (Psyco is different, as it exploits static-ness).

Of course, if we care about speed we'll use Pyrex to convert some
heavy routines to C. For example, if we need an FFT, it's madness to
do it in the HLL. Make it part of the bulldozer.

And of course, it already is (replacing 'bulldozer' with 'set of
construction equipment'), under the name NumPy or Numarray. And both of
these can take advantage of an Atlas implementation of the BLAS, when
available, that is tuned to a specific platform with the intention of being
the 'ideal implementation' of the Basic Linear Algebra Subroutines for that
platform.
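For instance (a minimal sketch, assuming NumPy is installed):

    import numpy as np
    # the FFT itself runs in optimized compiled code; Python only drives it
    x = np.sin(np.linspace(0.0, 8.0 * np.pi, 1024))
    spectrum = np.fft.fft(x)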

So I congratulate you on conceptually re-inventing Python + C extensions.
But I haven't yet understood what conceptual variations you have made, or
plan to make, on the current implementation.

Current CPyVM simulates a stack machine. This has the virtue of
simplicity and clarity, even for someone like me who has never seriously
written assembler. People have made proposals to switch to simulating a
(more complex) register machine, with a new set of bytecodes, that
theoretically would be faster. Do you have any opinion on this issue? You
seem to like the current bytecode set while wanting to run faster.
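For the curious, the dis module shows the stack-machine style directly
(a small sketch):

    import dis
    # stack-machine code pushes operands and consumes them implicitly;
    # a register machine would instead name its operands in each instruction
    dis.dis(compile("a + b * c", "<expr>", "eval"))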

Terry J. Reedy
 

Michael Sparks

Stelios Xanthakis wrote:
....
- It's incompatible with CPython. Not all programs run. ....
- The demo is an x86/linux binary only. You shouldn't trust binaries;
run it in a chrooted environment, not as root!

Hope it works!

Whatever the merits of a system like this, a closed system with bugs
(read: incompatibilities with standard Python) will unfortunately be
considered to "not work" - unless people can fix (or get fixed) the
bugs they encounter.

Releasing as closed source guarantees people will come back to you to
get the code fixed, or abandon using your code in favour of CPython,
Jython or IronPython if they want speed now. (Or PyPy if they want
speed later :)

Releasing open source means that people *may* fix their own bugs, or
abandon the code.

In your release notes you state:
WHERE IS THE SOURCE?:
The source code of pyvm is not yet released. Whether it will be, and
'when', depends on the interest of the community. Right now I cannot
afford the maintenance costs (fix the source, remove/insert comments,
write docs, fix known but harmless bugs, process bug reports, etc).
I'm not making any money from pyvm, but at least I'd like to avoid
paying for it too!

Whilst that's fair enough - it's your code, your decision - if these
are your only objections, it might be worth considering this:
* Interest from the community is likely to be low unless you release
the source. It will be an interesting curio, but no more than that.
* Releasing as open source does NOT imply you have to support (or
market) the code - it's simply releasing.
* If the code isn't stable, bear in mind that the existing Python
test suite can largely be used to test your VM and improve it - if
you have something that works, and you intend at *some* point to
release the code as open source, the sooner you do so, the faster
your project *may* mature.

At the end of the day though, it's your code; you choose what to do with
it. Personally I find your project curious, and if you had fun creating
it (or it's useful in some other way), then it strikes me as a
positive thing (your release URL implies you're a student!).

Best Regards,


Michael.
--
British Broadcasting Corporation, Research and Development
Kingswood Warren, Surrey KT20 6NP

This message (and any attachments) may contain personal views
which are not the views of the BBC unless specifically stated.
 

Stelios Xanthakis

Hi Michael

[...]
Releasing open source means that people *may* fix their own bugs, or
abandon the code.
[...]

I agree with all the points made.

Moreover let me add that "code is one expression of a set of good
ideas", and ideas want to be free! ;)

I've decided to release the source code of pyvm as soon as it's ready.

Right now it *doesn't* make much sense to give out the source, because
it is still at an early development stage. Even if I did, and people
sent patches, they wouldn't apply, because I'm still making big changes
to its architecture. I'd like to keep it in this state, where I can
modify the structure of the program, until it becomes really
developer-friendly. And IMO it doesn't make sense to release incomplete
open-source projects: either give something that's good and that people
can happily hack on, or don't do it at all. Giving out the source of an
unstable project will most likely harm it (see the CherryOS incidents).

The bottom line is that I estimate pyvm will be ready by the end of
the summer.


Thanks,

Stelios
 
