Python is faster than C

Armin Rigo · Apr 3, 2004

Hi!

This is a rant against the optimization trend of the Python interpreter.

Sorting a list of 100000 integers in random order takes:

* 0.75 seconds in Python 2.1
* 0.51 seconds in Python 2.2
* 0.46 seconds in Python 2.3

Tim Peters did a great job optimizing list.sort(). If I try with a
simple, non-stable pure Python quicksort implementation, in Python 2.3:

* 4.83 seconds
* 0.21 seconds with Psyco

First step towards world domination of high-level languages

The reason that Psyco manages to outperform the C implementation is not
that gcc is a bad compiler (it is about 10 times better than Psyco's).
The reason is that the C implementation must use a generic '<' operator
to compare elements, while the Psyco version quickly figures out that it
can expect to find ints in the list; it still has to check this
assumption, but this is cheap and then the comparison is done with a
single machine instruction.

Similarily, here are some results about the heapq module, which is
rewritten in C in the CVS tree for Python 2.4:

l = [random.random() for x in range(200000)]
heapq.heapify(l)

This code executes on my laptop in:

* 1.96 seconds on Python 2.3 (pure Python)
* 0.18 seconds on Python 2.4cvs (rewritten in C)
* 0.16 seconds on Python 2.3 with Psyco

So this is not so much a plug for Psyco as a rant against the current
trend of rewriting standard modules in C. Premature optimization and
all that.

Worse, and more importantly, the optimization starts to become visible
to the programmer. Iterators, for example, are great in limited cases
but I consider their introduction a significant complication in the
language; before, you could expect that some function from which you
would expect a sequence returned a list. Python was all lists and
dicts, with dicts used as namespaces here and there. Nowadays you have
to be careful. Moreover, it is harder to explain:

zip([1,2,3], [4,5,6]) # easy to understand and explain

Click to expand...

Click to expand...

[(1, 4), (2, 5), (3, 6)]

enumerate([6,7,8,9]) # uh ?

Click to expand...

Click to expand...

<enumerate object at 0x401a102c>

I know you can always do list(_). My point is that this is a
user-visible optimization. enumerate() should return a normal list, and
it should be someone else's job to ensure that it is correctly optimized
away if possible (and I'm not even talking about Psyco, it could be done
in the current Python implementation with a reasonable amount of
effort).

Protesting-ly yours,

Armin

Michel Claveau/Hamster · Apr 3, 2004

Hi !

For the 1st April, it's finish.

Josiah Carlson · Apr 3, 2004

For the 1st April, it's finish.

That wasn't a joke, psyco does in fact work /really/ well for some things.

- Josiah

Armin Rigo · Apr 3, 2004

Michel said:
For the 1st April, it's finish.

Check it for yourself. Find yourself an Intel machine and grab Psyco
from http://psyco.sf.net. Here is the source of my test:

# Python Quicksort Written by Magnus Lie Hetland
# http://www.hetland.org/python/quicksort.html

def _partition(list, start, end):
pivot = list[end]
bottom = start-1
top = end

done = 0
while not done:

while not done:
bottom = bottom+1

if bottom == top:
done = 1
break

if pivot < list[bottom]:
list[top] = list[bottom]
break

while not done:
top = top-1

if top == bottom:
done = 1
break

if list[top] < pivot:
list[bottom] = list[top]
break

list[top] = pivot
return top

def _quicksort(list, start, end):
if start < end:
split = _partition(list, start, end)
_quicksort(list, start, split-1)
_quicksort(list, split+1, end)

def quicksort(list):
if len(list) > 1:
_quicksort(list, 0, len(list)-1)

# ____________________________________________________________

import random, time, psyco
l = range(100000)
random.shuffle(l)

#TIMEIT = "l.sort()"
#TIMEIT = "quicksort(l)"
TIMEIT = "psyco.proxy(quicksort)(l)"

print TIMEIT, ':',
t = time.time()
exec TIMEIT
print time.time() - t

assert l == range(100000)

Paul Rubin · Apr 3, 2004

Armin Rigo said:
enumerate([6,7,8,9]) # uh ?

Click to expand...

Click to expand...

<enumerate object at 0x401a102c>

I know you can always do list(_). My point is that this is a
user-visible optimization. enumerate() should return a normal list, and
it should be someone else's job to ensure that it is correctly optimized
away if possible (and I'm not even talking about Psyco, it could be done
in the current Python implementation with a reasonable amount of
effort).

I think enumerate(xrange(1000000000)) returning a normal list would
exhaust memory before some later optimizer had a chance to do anything
with it.

Joe Mason · Apr 3, 2004

The reason that Psyco manages to outperform the C implementation is not
that gcc is a bad compiler (it is about 10 times better than Psyco's).
The reason is that the C implementation must use a generic '<' operator
to compare elements, while the Psyco version quickly figures out that it
can expect to find ints in the list; it still has to check this
assumption, but this is cheap and then the comparison is done with a
single machine instruction.

Why can't the C implementation do the same thing?

Joe

Armin Rigo · Apr 3, 2004

Hello Paul,

Paul said:
I think enumerate(xrange(1000000000)) returning a normal list would
exhaust memory before some later optimizer had a chance to do anything
with it.

There are two levels: the language specification and its implementation.
My point is that there is no reason why everything that is (at the
language level) a list, should always be implemented as a plain array of
objects. The list returned by range(1000000) doesn't have to use 4MB of
memory when the same information can be encoded in a couple of ints.
The presence of xrange() is the oldest of these user-visible
optimization hack that should have been optimized in the implementation
instead. Similarily, if you do 'a'*999999999 I can think of a better
way to encode the resulting string than with 100MB of RAM.

On your specific example, we could also argue that a slightly involved
but reasonable amount of effort would allow C functions to internally
receive a context hint, which would tell enumerate() that its result
will only ever be used for iteration when this is the case, so that it
can internally return an iterable instead of a list -- but this would
only be a hack, a workaround for the limits of CPython's internal object
representations.

Armin

Paul Rubin · Apr 3, 2004

Armin Rigo said:
There are two levels: the language specification and its implementation.
My point is that there is no reason why everything that is (at the
language level) a list, should always be implemented as a plain array of
objects. The list returned by range(1000000) doesn't have to use 4MB of
memory when the same information can be encoded in a couple of ints.
The presence of xrange() is the oldest of these user-visible
optimization hack that should have been optimized in the implementation
instead.

I think you're saying that instead of having xrange return a special
object, range should return that special object instead. I'm not too
clear on the distinction. Also, how can the compiler or interpreter
know whether the program will only access the object sequentially?
It has to predict the behavior of the program, instead of being told
explicitly.

Andrew MacIntyre · Apr 3, 2004

This is a rant against the optimization trend of the Python interpreter.

Sorting a list of 100000 integers in random order takes:

* 0.75 seconds in Python 2.1
* 0.51 seconds in Python 2.2
* 0.46 seconds in Python 2.3

Tim Peters did a great job optimizing list.sort(). If I try with a
simple, non-stable pure Python quicksort implementation, in Python 2.3:

* 4.83 seconds
* 0.21 seconds with Psyco

First step towards world domination of high-level languages
{...}

So this is not so much a plug for Psyco as a rant against the current
trend of rewriting standard modules in C. Premature optimization and
all that.
{...}

Protesting-ly yours,

While I certainly appreciate pysco and what it can do, I believe your
protest to be unreasonable as it denies better performance on platforms
that psyco doesn't yet (& probably never will) support.

Moreover, your protest about iterators is also unreasonable as people are
benefitting from the reduced memory consumption iterators and their ilk
bring (quite often accompanied by performance gains from not having to
thrash large amounts of RAM through pitiful caches). As such, iterators
are a _different_ optimisation, and I hope that you can come to terms with
this and psycoise them too!

Armin Rigo · Apr 3, 2004

Paul said:
I think you're saying that instead of having xrange return a special
object, range should return that special object instead. I'm not too
clear on the distinction.

No, range should return an object that is a list, as far as you can tell
from Python, but which is represented more efficiently than an array of
objects internally. The distinction is between the language level (it
would be a list, with all operations, etc.) and the implementation
(there is no reason why all lists should be arrays of PyObjects
internally).

Another example would be 'a'*999999999: the result is a string, but
there is no reason that it takes 100MB of memory. Instead, store it
into a C structure that contains a pointer to the original string object
'a' and the repetition counter, but still give this C structure the
Python type str, so that the difference doesn't show up and the Python
language remains simple. (This is a bit difficult to implement
currently in CPython, but not impossible.)

Also, how can the compiler or interpreter
know whether the program will only access the object sequentially?
It has to predict the behavior of the program, instead of being told
explicitly.

Ideally: If you do x=range(100); x[50]='hi' then the interpreter first
builds this optimized range representation and assigns it to x; and when
in the next statement you modify this list x it says 'oops! i cannot do
that with this representation', so it reverts to an array-like
representation (i.e. it creates all 100 elements) and then changes the
50th. No gain here. If on the other hand you only ever do 'easy'
things with your list, like iterate over it or read elements, then it
can all be done with the range representation, without falling back to
the array representation.

Again I'm not saying it is trivial to implement it, but that not having
to expose for optimization purposes 'xrange' and the whole 'iterator'
part of the language would be worth it, in my opinion.

Armin

Armin Rigo · Apr 3, 2004

Hi,

Joe said:
Why can't the C implementation do the same thing?

You could, if you wanted to optimize specifically lists of integers. If
you did the result would probably be really fast. The problem is that
you can only really special-case so many types: the C code has to deal
with all cases without knowing which cases are likely. The Psyco
version quickly figures out that for this list it pays off to make a
special case for integers; with another list, the machine code would be
different, special-cased differently.

However, in most real examples, you are not sorting a list of integers
but of something more complex anyway, where the built-in sort wins
easily. My message was intended as a long-term hint that maybe, at some
point, the built-in sort will actually be more often faster than the C
one if rewritten in Python. The advantage would then be that (as Psyco
does in a limited fashion) you can specialize the code for the
particular kind of list you are dealing with.

Armin

Josiah Carlson · Apr 3, 2004

No, range should return an object that is a list, as far as you can tell

from Python, but which is represented more efficiently than an array of
objects internally. The distinction is between the language level (it
would be a list, with all operations, etc.) and the implementation
(there is no reason why all lists should be arrays of PyObjects
internally).

You can implement such a thing already. In fact, xrange up until
recently, supported basically everything that a list object did, except
for mutations. The reason it doesn't anymore is because for multiple
versions of Python, such behavior was buggy and poorly supported. If
you are bored enough, feel free to make your own xrange-like object that
supports mutation. Heck, it can even subclass 'list', though it need
not have any standard list internals.

Another example would be 'a'*999999999: the result is a string, but
there is no reason that it takes 100MB of memory. Instead, store it
into a C structure that contains a pointer to the original string object
'a' and the repetition counter, but still give this C structure the
Python type str, so that the difference doesn't show up and the Python
language remains simple. (This is a bit difficult to implement
currently in CPython, but not impossible.)

Also, you are free to implement such a thing. I believe that the
current CPython implementation doesn't follow this route (and other
suggested optimizations) is because it needlessly complicates the
implementation of CPython.

Ideally: If you do x=range(100); x[50]='hi' then the interpreter first
builds this optimized range representation and assigns it to x; and when
in the next statement you modify this list x it says 'oops! i cannot do
that with this representation', so it reverts to an array-like
representation (i.e. it creates all 100 elements) and then changes the
50th. No gain here. If on the other hand you only ever do 'easy'
things with your list, like iterate over it or read elements, then it
can all be done with the range representation, without falling back to
the array representation.

Why not instead use a dictionary-based approach for special items? It
would be far more memory efficient, and wouldn't incur the modification
penalty.

Again I'm not saying it is trivial to implement it, but that not having
to expose for optimization purposes 'xrange' and the whole 'iterator'
part of the language would be worth it, in my opinion.

I think that one of the desires of offering 'iterator' concepts to
users, both new and seasoned, is that it allows people to think in ways
they didn't before. Allowing people to make those optimizations 'by
hand', I believe (as an educator and computer scientist), allows them to
grow as programmers (or computer scientists, as the case may be).

Don't get me wrong, it would be terribly convenient for Python to
abstract away all the nastiness of optimization, but if/when someone
were to do something that had been /very/ fast before, but was now
awfully slow (think the current Queue.Queue object for small vs. large
numbers of items), they are going to jump to the conclusion that Python
is broken in some fundamental way, maybe come here to complain about
Python being broken, and those who are familliar with the internals
would say, "you are ruining Python's early optimization by this thing
that you do".

Which really gets us to the fact that you are asking for the Python
internals to be optimized. In fact, while simultaneously saying "don't
optimize early", you are optimizing early by saying that range should be
optimized, as should string multiplication, etc. Goodness man, pick a
position and stick with it.

- Josiah

Christian Tismer · Apr 3, 2004

Armin Rigo said:
Again I'm not saying it is trivial to implement it, but that not having
to expose for optimization purposes 'xrange' and the whole 'iterator'
part of the language would be worth it, in my opinion.

First of all, I'm trying to see whether I can write through this interface.
As you might have realized, sarcastically after they fooled me with that
April joke, my site was really lost, andthis is a tad.

Anyway, I'd like to add that Armin's idea can be extended (as he surely knows)
to special casing seldom assignments to and algorithmically given array.
That is, in the case of just a few assignments, a list could internally
aufment the expression. If the number of elements grows, it could be
turned into a preferred dictionary, after reaching some threshold.
And after another threshold, it could be turned into something like
a real list, or just a specialized, typed list, depending on the type.

In general, I share Armin's impression, that iterators are nothing else
but an explicit way to spell optimizations.
While explicit is better than implicit, in the case of optimizations,
I believe it is an over-specification, and almost completely in the false
direction. We have to prove this in a known project, still.

There is one thing where I think explicit iterator do make some sense,
in a way the reader might not expect.
Let me show:

if you do something like

for each in sequence:
try:
do_something_with(each)
except SomeThingWrongWithThis:
# handle exception somehow

In terms of iterators, we implicitly create an interator here and consume it.
The explicit notation of iterators gives us this advantage:

instead you can do it this way:

it = iter(sequence)
can_continue = 1
while can_continue:
try:
for each in it:
do_something_with(each)
exceptSomeThingWrongWithThis
can_continue = some_recovery(each)
continue

In words: By the help of iterators, it is possible to write exception
handlers for special cases *outside* the loop, repair the error, and
continue iterating in the loop.
I have used this pattern a lot of times now, and I'm quite happy ith it.

But I have to admit, that this is too a bad, explicit optimization, and
a compiler could be smart enough to do this for me, automatically.

cheers - chris

Paul Rubin · Apr 3, 2004

Armin Rigo said:
Ideally: If you do x=range(100); x[50]='hi' then the interpreter first
builds this optimized range representation and assigns it to x; and when
in the next statement you modify this list x it says 'oops! i cannot do
that with this representation', so it reverts to an array-like
representation (i.e. it creates all 100 elements) and then changes the
50th. No gain here. If on the other hand you only ever do 'easy'
things with your list, like iterate over it or read elements, then it
can all be done with the range representation, without falling back to
the array representation.

Maybe there is something to this.

RPM1 · Apr 3, 2004

"Armin Rigo" wrote ...

Hi!

This is a rant against the optimization trend of the Python interpreter.

Sorting a list of 100000 integers in random order takes:

* 0.75 seconds in Python 2.1
* 0.51 seconds in Python 2.2
* 0.46 seconds in Python 2.3

Tim Peters did a great job optimizing list.sort(). If I try with a
simple, non-stable pure Python quicksort implementation, in Python 2.3:

* 4.83 seconds
* 0.21 seconds with Psyco

First step towards world domination of high-level languages

The reason that Psyco manages to outperform the C implementation is not
that gcc is a bad compiler (it is about 10 times better than Psyco's).
The reason is that the C implementation must use a generic '<' operator
to compare elements, while the Psyco version quickly figures out that it
can expect to find ints in the list; it still has to check this
assumption, but this is cheap and then the comparison is done with a
single machine instruction.

I think Psyco is great! But Python + Psyco does not outperform
C overall. Try writing a chess program in Python and see how it
performs against C.

Patrick

Joe Mason · Apr 4, 2004

Another example would be 'a'*999999999: the result is a string, but
there is no reason that it takes 100MB of memory. Instead, store it
into a C structure that contains a pointer to the original string object
'a' and the repetition counter, but still give this C structure the
Python type str, so that the difference doesn't show up and the Python
language remains simple. (This is a bit difficult to implement
currently in CPython, but not impossible.)

What this does is makes the interpreter more complicated for features
that not all programs will use. Only a very few programs will have long
strings of repeated characters, and it's reasonable to ask them to
implement their own stringlike class if they really want it.

If this is built into the interpreter, then either it's an optional
feature, in which case all those programs that rely on it to be remotely
memory-efficient aren't portable, or it requires every single
implementation to include it, including the PalmOS port and the one
that's supposed to run on cell phones.

Joe

Joe Mason · Apr 4, 2004

You could, if you wanted to optimize specifically lists of integers. If
you did the result would probably be really fast. The problem is that
you can only really special-case so many types: the C code has to deal
with all cases without knowing which cases are likely. The Psyco
version quickly figures out that for this list it pays off to make a
special case for integers; with another list, the machine code would be
different, special-cased differently.

Ah, good point. (In fact, not special-casing lots of things in the C
code is exactly what I was arguing against in my other post.)

Joe

Raymond Hettinger · Apr 4, 2004

[Armin Rigo]

enumerate([6,7,8,9]) # uh ?

Click to expand...

Click to expand...

<enumerate object at 0x401a102c>

This got me thinking about how much I would like to see the contents
of an iterator at the interactive prompt.

I wonder if the read-eval-print loop could be trained to make a better
display:

# rough pseudo-code sketch
while 1:
command = raw_input()
result = eval(command)
if result is None:
continue
if is_iterator(result):
result, copy = itertools.tee(result)
print "<%s object at 0x%8x:" %
(type(result).__name__, id(result)),
for elem in itertools.islice(copy, 3):
print repr(elem),
else:
print '...',
print '>'
else:
print repr(result)
_ = result

# intended result[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'), (6, 'g'),
(7, 'h'), (8, 'i'), (9, 'j'), (10, 'k'), (11, 'l'), (12, 'm'), (13,
'n')]

Raymond Hettinger

Hye-Shik Chang · Apr 4, 2004

[Armin Rigo]

enumerate([6,7,8,9]) # uh ?

Click to expand...

<enumerate object at 0x401a102c>

Click to expand...

This got me thinking about how much I would like to see the contents
of an iterator at the interactive prompt.

I wonder if the read-eval-print loop could be trained to make a better
display: [snip]

# intended result[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'), (6, 'g'),
(7, 'h'), (8, 'i'), (9, 'j'), (10, 'k'), (11, 'l'), (12, 'm'), (13,
'n')]

Yeah! I love this idea. It may make not only enumerate() but also
reverse() and itertools internal objects more interactive-friendly.

Hye-Shik

Ville Vainio · Apr 4, 2004

Armin> Worse, and more importantly, the optimization starts to
Armin> become visible to the programmer. Iterators, for example,
Armin> are great in limited cases but I consider their
Armin> introduction a significant complication in the language;
Armin> before, you could expect that some function from which you
Armin> would expect a sequence returned a list. Python was all
Armin> lists and

Iterators are an optimization? I always considered them just a more
clean and elegant way to do the same thing

.

Lua is faster than Fortran???	42	Jul 3, 2010
Taking list as an input from Python to C	1	Jul 21, 2022
Why is Python telling me variable is local not global?	3	Sep 2, 2023
IronPython faster than CPython?	3	Dec 18, 2007
Hello I have question for a programmer it`s regarding REDDIT but this forum is the best where to put it?	0	Sep 12, 2023
JIT compilers for Python, what is the latest news?	19	Apr 4, 2013
Why is it impossible to create a compiler than can compile Python tomachinecode like C?	25	Feb 28, 2013
Why is this so much faster?	3	Jun 2, 2011

Python is faster than C

Armin Rigo

Michel Claveau/Hamster

Josiah Carlson

Armin Rigo

Paul Rubin

Joe Mason

Armin Rigo

Paul Rubin

Andrew MacIntyre

Armin Rigo

Armin Rigo

Josiah Carlson

Christian Tismer

Paul Rubin

RPM1

Joe Mason

Joe Mason

Raymond Hettinger

Hye-Shik Chang

Ville Vainio

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads