Python is faster than C

Discussion in 'Python' started by Armin Rigo, Apr 3, 2004.

  1. Armin Rigo

    Armin Rigo Guest


    This is a rant against the optimization trend of the Python interpreter.

    Sorting a list of 100000 integers in random order takes:

    * 0.75 seconds in Python 2.1
    * 0.51 seconds in Python 2.2
    * 0.46 seconds in Python 2.3

    Tim Peters did a great job optimizing list.sort(). If I try with a
    simple, non-stable pure Python quicksort implementation, in Python 2.3:

    * 4.83 seconds
    * 0.21 seconds with Psyco

    First step towards world domination of high-level languages :)

    The reason that Psyco manages to outperform the C implementation is not
    that gcc is a bad compiler (it is about 10 times better than Psyco's).
    The reason is that the C implementation must use a generic '<' operator
    to compare elements, while the Psyco version quickly figures out that it
    can expect to find ints in the list; it still has to check this
    assumption, but this is cheap and then the comparison is done with a
    single machine instruction.

    Similarily, here are some results about the heapq module, which is
    rewritten in C in the CVS tree for Python 2.4:

    l = [random.random() for x in range(200000)]

    This code executes on my laptop in:

    * 1.96 seconds on Python 2.3 (pure Python)
    * 0.18 seconds on Python 2.4cvs (rewritten in C)
    * 0.16 seconds on Python 2.3 with Psyco

    So this is not so much a plug for Psyco as a rant against the current
    trend of rewriting standard modules in C. Premature optimization and
    all that.

    Worse, and more importantly, the optimization starts to become visible
    to the programmer. Iterators, for example, are great in limited cases
    but I consider their introduction a significant complication in the
    language; before, you could expect that some function from which you
    would expect a sequence returned a list. Python was all lists and
    dicts, with dicts used as namespaces here and there. Nowadays you have
    to be careful. Moreover, it is harder to explain:
    [(1, 4), (2, 5), (3, 6)]
    <enumerate object at 0x401a102c>

    I know you can always do list(_). My point is that this is a
    user-visible optimization. enumerate() should return a normal list, and
    it should be someone else's job to ensure that it is correctly optimized
    away if possible (and I'm not even talking about Psyco, it could be done
    in the current Python implementation with a reasonable amount of

    Protesting-ly yours,

    Armin Rigo, Apr 3, 2004
    1. Advertisements

  2. Hi !

    For the 1st April, it's finish.
    Michel Claveau/Hamster, Apr 3, 2004
    1. Advertisements

  3. For the 1st April, it's finish.

    That wasn't a joke, psyco does in fact work /really/ well for some things.

    - Josiah
    Josiah Carlson, Apr 3, 2004
  4. Armin Rigo

    Armin Rigo Guest

    Check it for yourself. Find yourself an Intel machine and grab Psyco
    from Here is the source of my test:

    # Python Quicksort Written by Magnus Lie Hetland

    def _partition(list, start, end):
    pivot = list[end]
    bottom = start-1
    top = end

    done = 0
    while not done:

    while not done:
    bottom = bottom+1

    if bottom == top:
    done = 1

    if pivot < list[bottom]:
    list[top] = list[bottom]

    while not done:
    top = top-1

    if top == bottom:
    done = 1

    if list[top] < pivot:
    list[bottom] = list[top]

    list[top] = pivot
    return top

    def _quicksort(list, start, end):
    if start < end:
    split = _partition(list, start, end)
    _quicksort(list, start, split-1)
    _quicksort(list, split+1, end)

    def quicksort(list):
    if len(list) > 1:
    _quicksort(list, 0, len(list)-1)

    # ____________________________________________________________

    import random, time, psyco
    l = range(100000)

    #TIMEIT = "l.sort()"
    #TIMEIT = "quicksort(l)"
    TIMEIT = "psyco.proxy(quicksort)(l)"

    print TIMEIT, ':',
    t = time.time()
    exec TIMEIT
    print time.time() - t

    assert l == range(100000)
    Armin Rigo, Apr 3, 2004
  5. Armin Rigo

    Paul Rubin Guest

    I think enumerate(xrange(1000000000)) returning a normal list would
    exhaust memory before some later optimizer had a chance to do anything
    with it.
    Paul Rubin, Apr 3, 2004
  6. Armin Rigo

    Joe Mason Guest

    Why can't the C implementation do the same thing?

    Joe Mason, Apr 3, 2004
  7. Armin Rigo

    Armin Rigo Guest

    Hello Paul,

    There are two levels: the language specification and its implementation.
    My point is that there is no reason why everything that is (at the
    language level) a list, should always be implemented as a plain array of
    objects. The list returned by range(1000000) doesn't have to use 4MB of
    memory when the same information can be encoded in a couple of ints.
    The presence of xrange() is the oldest of these user-visible
    optimization hack that should have been optimized in the implementation
    instead. Similarily, if you do 'a'*999999999 I can think of a better
    way to encode the resulting string than with 100MB of RAM.

    On your specific example, we could also argue that a slightly involved
    but reasonable amount of effort would allow C functions to internally
    receive a context hint, which would tell enumerate() that its result
    will only ever be used for iteration when this is the case, so that it
    can internally return an iterable instead of a list -- but this would
    only be a hack, a workaround for the limits of CPython's internal object

    Armin Rigo, Apr 4, 2004
  8. Armin Rigo

    Paul Rubin Guest

    I think you're saying that instead of having xrange return a special
    object, range should return that special object instead. I'm not too
    clear on the distinction. Also, how can the compiler or interpreter
    know whether the program will only access the object sequentially?
    It has to predict the behavior of the program, instead of being told
    Paul Rubin, Apr 4, 2004
  9. While I certainly appreciate pysco and what it can do, I believe your
    protest to be unreasonable as it denies better performance on platforms
    that psyco doesn't yet (& probably never will) support.

    Moreover, your protest about iterators is also unreasonable as people are
    benefitting from the reduced memory consumption iterators and their ilk
    bring (quite often accompanied by performance gains from not having to
    thrash large amounts of RAM through pitiful caches). As such, iterators
    are a _different_ optimisation, and I hope that you can come to terms with
    this and psycoise them too!
    Andrew MacIntyre, Apr 4, 2004
  10. Armin Rigo

    Armin Rigo Guest

    No, range should return an object that is a list, as far as you can tell
    from Python, but which is represented more efficiently than an array of
    objects internally. The distinction is between the language level (it
    would be a list, with all operations, etc.) and the implementation
    (there is no reason why all lists should be arrays of PyObjects

    Another example would be 'a'*999999999: the result is a string, but
    there is no reason that it takes 100MB of memory. Instead, store it
    into a C structure that contains a pointer to the original string object
    'a' and the repetition counter, but still give this C structure the
    Python type str, so that the difference doesn't show up and the Python
    language remains simple. (This is a bit difficult to implement
    currently in CPython, but not impossible.)
    Ideally: If you do x=range(100); x[50]='hi' then the interpreter first
    builds this optimized range representation and assigns it to x; and when
    in the next statement you modify this list x it says 'oops! i cannot do
    that with this representation', so it reverts to an array-like
    representation (i.e. it creates all 100 elements) and then changes the
    50th. No gain here. If on the other hand you only ever do 'easy'
    things with your list, like iterate over it or read elements, then it
    can all be done with the range representation, without falling back to
    the array representation.

    Again I'm not saying it is trivial to implement it, but that not having
    to expose for optimization purposes 'xrange' and the whole 'iterator'
    part of the language would be worth it, in my opinion.

    Armin Rigo, Apr 4, 2004
  11. Armin Rigo

    Armin Rigo Guest


    You could, if you wanted to optimize specifically lists of integers. If
    you did the result would probably be really fast. The problem is that
    you can only really special-case so many types: the C code has to deal
    with all cases without knowing which cases are likely. The Psyco
    version quickly figures out that for this list it pays off to make a
    special case for integers; with another list, the machine code would be
    different, special-cased differently.

    However, in most real examples, you are not sorting a list of integers
    but of something more complex anyway, where the built-in sort wins
    easily. My message was intended as a long-term hint that maybe, at some
    point, the built-in sort will actually be more often faster than the C
    one if rewritten in Python. The advantage would then be that (as Psyco
    does in a limited fashion) you can specialize the code for the
    particular kind of list you are dealing with.

    Armin Rigo, Apr 4, 2004
  12. No, range should return an object that is a list, as far as you can tell
    You can implement such a thing already. In fact, xrange up until
    recently, supported basically everything that a list object did, except
    for mutations. The reason it doesn't anymore is because for multiple
    versions of Python, such behavior was buggy and poorly supported. If
    you are bored enough, feel free to make your own xrange-like object that
    supports mutation. Heck, it can even subclass 'list', though it need
    not have any standard list internals.

    Also, you are free to implement such a thing. I believe that the
    current CPython implementation doesn't follow this route (and other
    suggested optimizations) is because it needlessly complicates the
    implementation of CPython.

    Why not instead use a dictionary-based approach for special items? It
    would be far more memory efficient, and wouldn't incur the modification

    I think that one of the desires of offering 'iterator' concepts to
    users, both new and seasoned, is that it allows people to think in ways
    they didn't before. Allowing people to make those optimizations 'by
    hand', I believe (as an educator and computer scientist), allows them to
    grow as programmers (or computer scientists, as the case may be).

    Don't get me wrong, it would be terribly convenient for Python to
    abstract away all the nastiness of optimization, but if/when someone
    were to do something that had been /very/ fast before, but was now
    awfully slow (think the current Queue.Queue object for small vs. large
    numbers of items), they are going to jump to the conclusion that Python
    is broken in some fundamental way, maybe come here to complain about
    Python being broken, and those who are familliar with the internals
    would say, "you are ruining Python's early optimization by this thing
    that you do".

    Which really gets us to the fact that you are asking for the Python
    internals to be optimized. In fact, while simultaneously saying "don't
    optimize early", you are optimizing early by saying that range should be
    optimized, as should string multiplication, etc. Goodness man, pick a
    position and stick with it.

    - Josiah
    Josiah Carlson, Apr 4, 2004
  13. First of all, I'm trying to see whether I can write through this interface.
    As you might have realized, sarcastically after they fooled me with that
    April joke, my site was really lost, andthis is a tad.

    Anyway, I'd like to add that Armin's idea can be extended (as he surely knows)
    to special casing seldom assignments to and algorithmically given array.
    That is, in the case of just a few assignments, a list could internally
    aufment the expression. If the number of elements grows, it could be
    turned into a preferred dictionary, after reaching some threshold.
    And after another threshold, it could be turned into something like
    a real list, or just a specialized, typed list, depending on the type.

    In general, I share Armin's impression, that iterators are nothing else
    but an explicit way to spell optimizations.
    While explicit is better than implicit, in the case of optimizations,
    I believe it is an over-specification, and almost completely in the false
    direction. We have to prove this in a known project, still.

    There is one thing where I think explicit iterator do make some sense,
    in a way the reader might not expect.
    Let me show:

    if you do something like

    for each in sequence:
    except SomeThingWrongWithThis:
    # handle exception somehow

    In terms of iterators, we implicitly create an interator here and consume it.
    The explicit notation of iterators gives us this advantage:

    instead you can do it this way:

    it = iter(sequence)
    can_continue = 1
    while can_continue:
    for each in it:
    can_continue = some_recovery(each)

    In words: By the help of iterators, it is possible to write exception
    handlers for special cases *outside* the loop, repair the error, and
    continue iterating in the loop.
    I have used this pattern a lot of times now, and I'm quite happy ith it.

    But I have to admit, that this is too a bad, explicit optimization, and
    a compiler could be smart enough to do this for me, automatically.

    cheers - chris
    Christian Tismer, Apr 4, 2004
  14. Armin Rigo

    Paul Rubin Guest

    Maybe there is something to this.
    Paul Rubin, Apr 4, 2004
  15. Armin Rigo

    RPM1 Guest

    "Armin Rigo" wrote ...
    I think Psyco is great! But Python + Psyco does not outperform
    C overall. Try writing a chess program in Python and see how it
    performs against C.

    RPM1, Apr 4, 2004
  16. Armin Rigo

    Joe Mason Guest

    What this does is makes the interpreter more complicated for features
    that not all programs will use. Only a very few programs will have long
    strings of repeated characters, and it's reasonable to ask them to
    implement their own stringlike class if they really want it.

    If this is built into the interpreter, then either it's an optional
    feature, in which case all those programs that rely on it to be remotely
    memory-efficient aren't portable, or it requires every single
    implementation to include it, including the PalmOS port and the one
    that's supposed to run on cell phones.

    Joe Mason, Apr 4, 2004
  17. Armin Rigo

    Joe Mason Guest

    Ah, good point. (In fact, not special-casing lots of things in the C
    code is exactly what I was arguing against in my other post.)

    Joe Mason, Apr 4, 2004
  18. [Armin Rigo]
    This got me thinking about how much I would like to see the contents
    of an iterator at the interactive prompt.

    I wonder if the read-eval-print loop could be trained to make a better

    # rough pseudo-code sketch
    while 1:
    command = raw_input()
    result = eval(command)
    if result is None:
    if is_iterator(result):
    result, copy = itertools.tee(result)
    print "<%s object at 0x%8x:" %
    (type(result).__name__, id(result)),
    for elem in itertools.islice(copy, 3):
    print repr(elem),
    print '...',
    print '>'
    print repr(result)
    _ = result

    # intended result[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'), (6, 'g'),
    (7, 'h'), (8, 'i'), (9, 'j'), (10, 'k'), (11, 'l'), (12, 'm'), (13,

    Raymond Hettinger
    Raymond Hettinger, Apr 4, 2004
  19. Yeah! I love this idea. It may make not only enumerate() but also
    reverse() and itertools internal objects more interactive-friendly.

    Hye-Shik Chang, Apr 4, 2004
  20. Armin Rigo

    Ville Vainio Guest

    Armin> Worse, and more importantly, the optimization starts to
    Armin> become visible to the programmer. Iterators, for example,
    Armin> are great in limited cases but I consider their
    Armin> introduction a significant complication in the language;
    Armin> before, you could expect that some function from which you
    Armin> would expect a sequence returned a list. Python was all
    Armin> lists and

    Iterators are an optimization? I always considered them just a more
    clean and elegant way to do the same thing :).
    Ville Vainio, Apr 4, 2004
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.