Towards faster Python implementations - theory

Discussion in 'Python' started by John Nagle, May 8, 2007.

  1. John Nagle

    John Nagle Guest

    Some faster Python implementations are under development.
    JPython has been around for a while, and PyPy and ShedSkin
    continue to move forward. It's worth thinking about what slows
    down Python implementations.

    It isn't the dynamism, really. As others have pointed out
    in the Python literature, most of the time, the more elaborate
    dynamic features aren't being used for any given variable or
    object. If code has something like "x = 1.0", the odds are that
    "x" is going to stay a floating point number, and not suddenly turn
    into a list or an object reference. The type inference of Shed Skin
    builds on that assumption, adding some restrictions about changing of
    variable types.

    The Shed Skin effort indicates that explicit typing, via 'decorators'
    or otherwise, isn't really necessary. What's necessary is the avoidance
    of "surprises" in the language. In this context, a "surprise" is
    the use of a dynamic feature in a way that can't be seen at compile time.

    A typical "surprise" would be the use of "setattr" on an object from
    outside the compilation unit that defines the object. Within a module,
    "setattr" on an object in that module is no problem; the compiler can see
    it and generate the extra machinery needed to make an object dynamically
    alterable at run time. But if an object doesn't need that extra machinery
    and associated dictionary, it's a big win to discard the excess baggage
    and use a simpler fixed-size representation, comparable to a C struct,
    for the object.
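
    As an illustration (the module names here are invented):

        # mod_a.py -- defines the object
        class Point:
            def __init__(self):
                self.x = 0.0
                self.y = 0.0

        # mod_b.py -- far away from the definition
        import mod_a
        p = mod_a.Point()
        setattr(p, "color", "red")   # adds a field the compiler never saw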

    On the typing front, the neatest way to express typing is via
    initialization. With the Shed Skin restrictions, if all variables are
    initialized before use (preferably in __init__), there's no need to
    maintain an "undefined" flag for a variable. And, of course, if
    the type of a variable is simple and can't change, it doesn't have to
    be "boxed" (enclosed in an object), which is a big win.

    The point here is that we don't need language changes or declarations
    to make Python much faster. All we need are a few restrictions that
    ensure that, when you're doing something unusual, the compiler can
    tell.

    John Nagle
     
    John Nagle, May 8, 2007
    #1

  2. Paul Boddie

    Paul Boddie Guest

    On 8th May, 17:53, John Nagle <> wrote:
    > Some faster Python implementations are under development.
    > JPython


    Ahem: Jython!

    > has been around for a while, and PyPy and ShedSkin
    > continue to move forward. It's worth thinking about what slows
    > down Python implementations.


    It's the dynamicity! ;-) But things like clever memory allocation can
    pay dividends, too, and I wouldn't be surprised if this explained
    Python's better-than-expected showing when compared to other languages
    - such as that comparison you provoked with those claims of superior
    JavaScript performance. ;-)

    > It isn't the dynamism, really.


    In theory, no, but in practice CPython doesn't optimise away the
    dynamism. Recent experiments with method caches seem to have shown
    modest performance improvements, and I'd guess that such things are
    fairly established in other dynamic language implementations.

    > As others have pointed out
    > in the Python literature, most of the time, the more elaborate
    > dynamic features aren't being used for any given variable or
    > object. If code has something like "x = 1.0", the odds are that
    > "x" is going to stay a floating point number, and not suddenly turn
    > into a list or an object reference. The type inference of Shed Skin
    > builds on that assumption, adding some restrictions about changing of
    > variable types.


    The problem here, amongst many others, is knowing for sure whether the
    more elaborate features have been avoided. Approaches which attempt to
    avoid knowing such things include just-in-time compilation (you avoid
    knowing in advance) and type declarations (you give up thinking about
    whether it's possible and have the programmer do all the work).

    > The Shed Skin effort indicates that explicit typing, via 'decorators'
    > or otherwise, isn't really necessary. What's necessary is the avoidance
    > of "surprises" in the language. In this context, a "surprise" is
    > the use of a dynamic feature in a way that can't be seen at compile time.


    I concur with your assessment of the supposed necessity of explicit
    typing. However, you might be surprised as to what constitutes a
    "surprise" in Python...

    > A typical "surprise" would be the use of "setattr" on an object from
    > outside the compilation unit that defines the object. Within a module,
    > "setattr" on an object in that module is no problem; the compiler can see
    > it and generate the extra machinery needed to make an object dynamically
    > alterable at run time. But if an object doesn't need that extra machinery
    > and associated dictionary, it's a big win to discard the excess baggage
    > and use a simpler fixed-size representation, comparable to a C struct,
    > for the object.


    You don't even need to bring out setattr to make the inference
    activity a difficult one. Even straightforward attribute access needs
    to be repeatedly checked to see if the target of a normal attribute
    assignment or query has changed. Traditionally, people have shown some
    trivial function in isolation...

    def f(x):
        return x.a

    ...and said, "We don't know anything! It's all impossible!" But
    context is everything, as you know, and whole program analysis is the
    only way to go with the aforementioned goals in mind. What if the
    parameter to the above itself comes from attribute access?

    def g(y):
        return f(y.b)

    You can descend into some fairly demanding situations with respect to
    keeping track of all the possibilities.
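
    For example (invented classes, just to show the fan-out through f and
    g above):

        class Q1:
            def __init__(self):
                self.a = 1          # f will see an int via this route
        class Q2:
            def __init__(self):
                self.a = "one"      # and a string via this one
        class P:
            def __init__(self):
                self.b = Q1()
        class R:
            def __init__(self):
                self.b = Q2()

        g(P())   # f(x) returns an int here...
        g(R())   # ...and a string here, so "x.a" has no single type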

    > On the typing front, the neatest way to express typing is via
    > initialization. With the Shed Skin restrictions, if all variables are
    > initialized before use (preferably in __init__), there's no need to
    > maintain an "undefined" flag for a variable. And, of course, if
    > the type of a variable is simple and can't change, it doesn't have to
    > be "boxed" (enclosed in an object), which is a big win.


    I'm fairly sure, although my intuition unfortunately doesn't provide a
    ready example right now, that typing via initialisation, whilst an
    important tool, may either impose unacceptable restrictions if
    enforced too rigidly or limit the precision that one might desire in
    determining type information in the resulting system. But it is a
    worthwhile objective to identify fixed-size structures and unboxed
    values, in my opinion.

    > The point here is that we don't need language changes or declarations
    > to make Python much faster. All we need are a few restrictions that
    > ensure that, when you're doing something unusual, the compiler can
    > tell.


    Agreed. And I don't buy into the negative "lesser Python" labelling of
    such work, either. People seem to have forgotten how to use older,
    conservative Python features, preferring to show off with metaclasses
    and closures even for problems that could be solved using simple
    classes and objects. If that's "greater Python" then call me a
    minimalist!

    Paul
     
    Paul Boddie, May 8, 2007
    #2

  3. In <>, Paul Boddie
    wrote:

    >> On the typing front, the neatest way to express typing is via
    >> initialization. With the Shed Skin restrictions, if all variables are
    >> initialized before use (preferably in __init__), there's no need to
    >> maintain an "undefined" flag for a variable. And, of course, if
    >> the type of a variable is simple and can't change, it doesn't have to
    >> be "boxed" (enclosed in an object), which is a big win.


    A variable? :)

    Maybe that last sentence should start with: And, of course, if the type of
    objects bound to a name won't change…

    > I'm fairly sure, although my intuition unfortunately doesn't provide a
    > ready example right now, that typing via initialisation, whilst an
    > important tool, may either impose unacceptable restrictions if
    > enforced too rigidly or limit the precision that one might desire in
    > determining type information in the resulting system.


    I often bind a name to `None` first if it is going to be bound to its
    "real" value later on. So this initialization doesn't say anything about
    the type(s) that will be bound to it later.
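
    For instance:

        def lookup(key, table):
            result = None            # bound to None first...
            if key in table:
                result = table[key]  # ...and to its "real" value later; the
            return result            # None says nothing about that type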

    I don't see how this type inference for static types will work unless some
    of the dynamism of the language gets restricted. But is this really
    necessary? Isn't a JIT compiler and maybe type hinting good enough? By
    type hinting I really mean just recommendations from the programmer. So
    you can say this name should be an `int` and the JIT compiler produces
    code in advance, but it's okay to pass another object instead.
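
    Something like this invented decorator is what I have in mind (`hint`
    doesn't exist anywhere, it's purely illustrative):

        @hint(n=int)                   # recommendation only: specialise for
        def triangle(n):               # int, but any object may still be
            return n * (n + 1) // 2    # passed at run time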

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, May 8, 2007
    #3
  4. John Nagle wrote:
    > Some faster Python implementations are under development.
    > JPython has been around for a while,


    s/JP/J/

    And FWIW, I'm not sure Jython is really faster than CPython...
     
    Bruno Desthuilliers, May 8, 2007
    #4
  5. John Nagle

    John Nagle Guest

    Marc 'BlackJack' Rintsch wrote:

    > I don't see how this type inference for static types will work unless some
    > of the dynamism of the language gets restricted. But is this really
    > necessary? Isn't a JIT compiler and maybe type hinting good enough?


    Not necessarily. One of the more powerful optimizations is to optimize
    reference count updates. Often, you can hoist reference count updates
    out of loops, which is a big win. But to do that, you need to be sure
    that the code executed by the loop won't change unexpectedly.
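
    A sketch of the kind of loop I mean (ordinary Python; the optimization
    itself would live inside the implementation):

        def total(items):
            s = 0.0
            for x in items:   # CPython adjusts reference counts on s and x
                s = s + x     # every iteration; if the compiler knew the loop
            return s          # body couldn't change, most of those updates
                              # could be hoisted out of the loop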

    My point is that while all the dynamism in Python is occasionally
    useful, most of the time for most of the variables most of it isn't
    being used. If we can discard the excess baggage, unnecessary
    dictionary lookups, and unnecessary reference count updates a
    large fraction of the time, it's a big win. This requires
    "surprise-free" language semantics, so that when something unusual
    is going on, it's visibly reflected in the code.

    Surprise-free semantics also make code easier to debug.


    John Nagle
     
    John Nagle, May 9, 2007
    #5
  6. Bruno Desthuilliers <> wrote:

    > John Nagle wrote:
    > > Some faster Python implementations are under development.
    > > JPython has been around for a while,

    >
    > s/JP/J/


    These days, yes, but it WAS originally called JPython (it was renamed to
    Jython later). So, "has been around a while" is an appropriate context
    for using the good old name.


    > And FWIW, I'm not sure Jython is really faster than CPython...


    Definitely not, last I measured. Not IronPython either, overall (though
    it's much better than Jython and does get to beat CPython on some
    specific benchmarks).


    Alex
     
    Alex Martelli, May 9, 2007
    #6
  7. Kay Schluehr

    Kay Schluehr Guest

    On May 9, 1:25 pm, John Nagle <> wrote:
    > Marc 'BlackJack' Rintsch wrote:
    > > I don't see how this type inference for static types will work unless some
    > > of the dynamism of the language gets restricted. But is this really
    > > necessary? Isn't a JIT compiler and maybe type hinting good enough?

    >
    > Not necessarily. One of the more powerful optimizations is to optimize
    > reference count updates. Often, you can hoist reference count updates
    > out of loops, which is a big win. But to do that, you need to be sure
    > that the code executed by the loop won't change unexpectedly.


    The advantage of having a JIT is just that it can record data at
    runtime and respond flexibly to it. It doesn't run into the global
    static analysis problems mentioned by Paul. A "semi-dynamical"
    compromise would be to use a profile of samples of runtime data and
    assert that they reflect "typical" behaviour. Then the system needs an
    ability to fall back to usual bytecode interpretation. Psyco does that
    by analyzing the next opcode and using a very clever dispatch mechanism.
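
    For reference, Psyco's usual entry points look like this (assuming a
    32-bit x86 CPython where Psyco is available):

        import psyco

        def hot_function(n):       # stand-in for whatever profiling says is hot
            return sum(range(n))

        psyco.bind(hot_function)   # specialise just this function...
        # psyco.full()             # ...or everything, trading memory for speed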

    A more code-oriented, profile-driven approach would factor source into
    natively compilable parts and those that have to be bytecode
    interpreted. I wonder whether bridges to Pyrex, Boost.Python or the
    celerid bridge to D could be used and what the performance penalties
    are for all the argument/return value wrapping and unwrapping.
    Unfortunately ShedSkin lacks CPython integration. We talked about this
    here recently.

    Kay
     
    Kay Schluehr, May 9, 2007
    #7
  8. "John Nagle" <> wrote:

    8<---------------- summary of existing work and thinking ------------------

    > The point here is that we don't need language changes or declarations
    > to make Python much faster. All we need are a few restrictions that
    > ensure that, when you're doing something unusual, the compiler can
    > tell.
    >


    I am relatively new on this turf, and from what I have seen so far, it
    would not bother me at all to tie a name's type to its first use, so that
    the name can only be bound to objects of the same type as the type
    of the object that it was originally bound to.
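
    In other words, something like this, which is legal today, would
    become an error:

        x = 1         # x is now tied to int for good
        x = "one"     # rejected under the restriction; fine in Python today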

    But maybe I am missing the point of dynamism.

    Would an implementation of the above break lots of stuff in practice?

    It seems to me that it could have the effect of a declaration without
    the wart of actually doing it.

    - Hendrik
     
    Hendrik van Rooyen, May 9, 2007
    #8
  9. Paul Boddie

    Paul Boddie Guest

    On 9 May, 08:09, "Hendrik van Rooyen" <> wrote:
    >
    > I am relatively new on this turf, and from what I have seen so far, it
    > would not bother me at all to tie a name's type to its first use, so that
    > the name can only be bound to objects of the same type as the type
    > of the object that it was originally bound to.


    But it's interesting to consider the kinds of names you could restrict
    in this manner and what the effects would be. In Python, the only
    names that can be considered difficult to modify arbitrarily "at a
    distance" - in other words, from outside the same scope - are locals,
    and even then there are things like closures and perverse
    implementation-dependent stack hacks which can expose local namespaces
    to modification, although any reasonable "conservative Python"
    implementation would disallow the latter.

    In a local namespace you could restrict names in this way, although
    I'd argue that with the limitations on locals, you don't gain as much
    as you would by restricting other names similarly. However, by
    restricting other kinds of names (eg. instance attributes) you have to
    start thinking about polymorphism: what if attribute x on instances of
    class C can have different types? If you aren't careful, you've
    introduced interfaces as the primary mechanism permitting some kind of
    polymorphism.
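
    A trivial example of the problem:

        class C:
            pass

        c1 = C(); c1.x = 1        # x is an int on this instance
        c2 = C(); c2.x = "one"    # and a string on this one - so what is
                                  # "the type" of attribute x on C?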

    Paul
     
    Paul Boddie, May 9, 2007
    #9
  10. Terry Reedy

    Terry Reedy Guest

    "Hendrik van Rooyen" <> wrote in message
    news:013d01c79210$5e441280$03000080@hendrik...
    | I am relatively new on this turf, and from what I have seen so far, it
    | would not bother me at all to tie a name's type to its first use, so that
    | the name can only be bound to objects of the same type as the type
    | of the object that it was originally bound to.
    |
    | But maybe I am missing the point of dynamism.
    |
    | Would an implementation of the above break lots of stuff in practice?

    For function local variables, if you mean 'originally bound to' in the
    current call invocation, that would sometimes be ok (and that is sort of
    what Psyco does). But if you mean the original binding in the first
    call invocation, then that would cripple many functions.
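
    For example:

        def first(seq):
            item = seq[0]        # "item" keeps one type per invocation: fine
            return item

        first([1, 2, 3])         # item is an int in this call
        first(["a", "b"])        # and a string here - crippled if the first
                                 # call's binding were made permanent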


    tjr
     
    Terry Reedy, May 9, 2007
    #10
  11. John Nagle

    John Nagle Guest

    Paul Boddie wrote:
    > On 9 May, 08:09, "Hendrik van Rooyen" <> wrote:
    >
    >>I am relatively new on this turf, and from what I have seen so far, it
    >>would not bother me at all to tie a name's type to its first use, so that
    >>the name can only be bound to objects of the same type as the type
    >>of the object that it was originally bound to.

    >
    >
    > But it's interesting to consider the kinds of names you could restrict
    > in this manner and what the effects would be. In Python, the only kind
    > of name that can be considered difficult to arbitrarily modify "at a
    > distance" - in other words, from outside the same scope - are locals,
    > and even then there are things like closures and perverse
    > implementation-dependent stack hacks which can expose local namespaces
    > to modification, although any reasonable "conservative Python"
    > implementation would disallow the latter.


    Modifying "at a distance" is exactly what I'm getting at. That's the
    killer from an optimizing compiler standpoint. The compiler, or a
    maintenance programmer, looks at a block of code, and there doesn't seem
    to be anything unusual going on. But, if in some other section of
    code, something does a "setattr" to mess with the first block of code,
    something unusual can be happening. This is tough on both optimizing
    compilers and maintenance programmers.

    Python has that capability mostly because it's free in an
    "everything is a dictionary" implementation. ("When all you have
    is a hash, everything looks like a dictionary".) But that limits
    implementation performance. Most of the time, nobody is using
    "setattr" to mess with the internals of a function, class, or
    module from far, far away. But the cost for that flexibility is
    being paid, unnecessarily.

    I'm suggesting that the potential for "action at a distance" somehow
    has to be made more visible.

    One option might be a class "simpleobject", from which other classes
    can inherit. ("object" would become a subclass of "simpleobject").
    "simpleobject" classes would have the following restrictions:

    - New fields and functions cannot be introduced from outside
    the class. Every field and function name must explicitly appear
    at least once in the class definition. Subclassing is still
    allowed.
    - Unless the class itself uses "getattr" or "setattr" on itself,
    no external code can do so. This lets the compiler eliminate the
    object's dictionary unless the class itself needs it.

    This lets the compiler see all the field names and assign them fixed slots
    in a fixed sized object representation. Basically, this means simple objects
    have a C/C++ like internal representation, with the performance that comes
    with that representation.
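
    Usage would look like this ("simpleobject" is hypothetical, of course):

        class Point(simpleobject):
            def __init__(self, x, y):
                self.x = x    # every field name appears in the class body,
                self.y = y    # so the compiler can assign fixed slots
            def norm2(self):
                return self.x * self.x + self.y * self.y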

    With this, plus the "Shed Skin" restrictions, plus the array features of
    "numarray", it should be possible to get computationally intensive code
    written in Python up to C/C++ levels of performance. Yet all the dynamic
    machinery of Python remains available if needed.

    All that's necessary is not to surprise the compiler.

    John Nagle
     
    John Nagle, May 9, 2007
    #11
  12. Paul Boddie

    Paul Boddie Guest

    John Nagle wrote:
    >
    > Modifying "at a distance" is exactly what I'm getting at. That's the
    > killer from an optimizing compiler standpoint. The compiler, or a
    > maintenance programmer, looks at a block of code, and there doesn't seem
    > to be anything unusual going on. But, if in some other section of
    > code, something does a "setattr" to mess with the first block of code,
    > something unusual can be happening. This is tough on both optimizing
    > compilers and maintenance programmers.


    Agreed. It's tempting to look at some code and say which types one
    thinks are being manipulated, but the actual program behaviour can be
    quite different. Adding explicit type declarations "anchors" the names
    to a very restrictive set of types - typically those permitted through
    interfaces or inheritance in many object-oriented languages - and the
    challenge, as we've already discussed, is to attempt to "anchor" the
    names when they are in a sense "floating free" without any explicit
    annotations from the programmer.

    Python also presents other challenges. Unlike systems programming
    languages, almost nothing is a local operation: each Python function
    is mostly a product of function calls and the occasional conditional
    or looping construct, themselves wired up using yet more function
    calls. Whilst the expectation is that most of these will return sensible
    results (eg. a call to the iter method on a sequence returns an
    iterator), there isn't any guarantee that this will be the case, and
    even then the precise iterator involved can vary. Some might claim
    that the class of operations which could be done locally are precisely
    the ones which would benefit from identification and optimisation,
    namely those involving primitive types, yet some pretty good solutions
    for such cases exist already: Pyrex, various numeric packages, and so
    on.
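
    To illustrate the lack of guarantees:

        class Odd:
            def __iter__(self):
                return 42    # legal to write; nothing complains at compile time

        iter(Odd())          # fails only at run time, with a TypeError
                             # complaining about a non-iterator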

    > Python has that capability mostly because it's free in an
    > "everything is a dictionary" implementation. ("When all you have
    > is a hash, everything looks like a dictionary".) But that limits
    > implementation performance. Most of the time, nobody is using
    > "setattr" to mess with the internals of a function, class, or
    > module from far, far away. But the cost for that flexibility is
    > being paid, unnecessarily.


    I've heard claims that virtual machines are evolving to make
    dictionary-based access more performant, and I guess Microsoft's DLR
    and any JVM improvements might point in that direction, but Shed Skin
    shows the way by avoiding such things entirely.

    > I'm suggesting that the potential for "action at a distance" somehow
    > has to be made more visible.
    >
    > One option might be a class "simpleobject", from which other classes
    > can inherit. ("object" would become a subclass of "simpleobject").
    > "simpleobject" classes would have the following restrictions:
    >
    > - New fields and functions cannot be introduced from outside
    > the class. Every field and function name must explicitly appear
    > at least once in the class definition. Subclassing is still
    > allowed.


    I suppose one could mandate that only methods defined inside class
    blocks belong to the class and subclasses, and that various kinds of
    optimisations can then be applied to the code in those methods such
    that the self parameter's type is constrained in a way probably
    already imposed by CPython. This would forbid the adding of functions
    to classes and instances after the fact, but in my fairly conservative
    world of programming, I hardly ever do such things. I do add
    attributes to instances outside those instances (by the above
    definition) quite often, though.

    > - Unless the class itself uses "getattr" or "setattr" on itself,
    > no external code can do so. This lets the compiler eliminate the
    > object's dictionary unless the class itself needs it.


    I think this is a natural progression from determining the access
    needs of particular classes and their objects.

    > This lets the compiler see all the field names and assign them fixed slots
    > in a fixed sized object representation. Basically, this means simple objects
    > have a C/C++ like internal representation, with the performance that comes
    > with that representation.


    Yes, I agree that this would be desirable.

    > With this, plus the "Shed Skin" restrictions, plus the array features of
    > "numarray", it should be possible to get computationally intensive code
    > written in Python up to C/C++ levels of performance. Yet all the dynamic
    > machinery of Python remains available if needed.
    >
    > All that's necessary is not to surprise the compiler.


    One thing I had cause to think about again today was the cost of
    function invocations. Python not only has pervasive dynamic dispatch,
    but there are also such things as *args and **kwargs and the cost of
    dealing with them. However, such things are very useful in integrating
    components and extending class hierarchies, so I wouldn't want to see
    them vanish in their entirety.
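
    The kind of thing I'd hate to lose, even though every call through it
    pays for the packing and unpacking:

        def logged(func):
            def wrapper(*args, **kwargs):        # generic forwarding
                print "calling", func.__name__
                return func(*args, **kwargs)     # tuple/dict built per call
            return wrapper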

    Paul
     
    Paul Boddie, May 9, 2007
    #12
  13. Klaas

    Klaas Guest

    On May 9, 10:02 am, John Nagle <> wrote:

    > One option might be a class "simpleobject", from which other classes
    > can inherit. ("object" would become a subclass of "simpleobject").
    > "simpleobject" classes would have the following restrictions:
    >
    > - New fields and functions cannot be introduced from outside
    > the class. Every field and function name must explicitly appear
    > at least once in the class definition. Subclassing is still
    > allowed.
    > - Unless the class itself uses "getattr" or "setattr" on itself,
    > no external code can do so. This lets the compiler eliminate the
    > object's dictionary unless the class itself needs it.
    >
    > This lets the compiler see all the field names and assign them fixed slots
    > in a fixed sized object representation. Basically, this means simple objects
    > have a C/C++ like internal representation, with the performance that comes
    > with that representation.


    Hey look, it already exists:

    >>> class A(object):
    ...     __slots__ = 'a b c d'.split()
    ...
    >>> a = A()
    >>> a.e = 2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'A' object has no attribute 'e'
    >>> hasattr(a, '__dict__')
    False

    -Mike
     
    Klaas, May 10, 2007
    #13
  14. "Terry Reedy" <t..y@u,,.edu> wrote:

    > "Hendrik van Rooyen" <> wrote in message
    > news:013d01c79210$5e441280$03000080@hendrik...
    > | I am relatively new on this turf, and from what I have seen so far, it
    > | would not bother me at all to tie a name's type to its first use, so that
    > | the name can only be bound to objects of the same type as the type
    > | of the object that it was originally bound to.
    > |
    > | But maybe I am missing the point of dynamism.
    > |
    > | Would an implementation of the above break lots of stuff in practice?
    >
    > For function local variables, if you mean 'originally bound to' in the
    > current call invocation, that would sometimes be ok (and that is sort of
    > what Psyco does). But if you mean the original binding in the first
    > call invocation, then that would cripple many functions.
    >

    Errr - I was thinking a bit simplistically - I know I can write:

    def f(x):
        for y in x:
            print y   # using print here as short for doing something complicated

    And that would currently "work" with any iterable, as x could
    be anything.

    It seems that such functions are the problem, as something like this:

    x = [1, 2, 3, 4, 5]
    for y in x:
        print y

    does not have the same hassle for x, but can shift the problem to y:

    x = [1, 2, 3, 4, (1, 2)]
    for y in x:
        print y

    I can't see an easy way to put the patient back into his straitjacket.

    Makes you want to use predefined globals...

    - Hendrik
     
    Hendrik van Rooyen, May 10, 2007
    #14
  15. "John Nagle" <> wrote:


    > Paul Boddie wrote:
    > > On 9 May, 08:09, "Hendrik van Rooyen" <> wrote:
    > >
    > >>I am relatively new on this turf, and from what I have seen so far, it
    > >>would not bother me at all to tie a name's type to its first use, so that
    > >>the name can only be bound to objects of the same type as the type
    > >>of the object that it was originally bound to.

    > >
    > >
    > > But it's interesting to consider the kinds of names you could restrict
    > > in this manner and what the effects would be. In Python, the only kind
    > > of name that can be considered difficult to arbitrarily modify "at a
    > > distance" - in other words, from outside the same scope - are locals,
    > > and even then there are things like closures and perverse
    > > implementation-dependent stack hacks which can expose local namespaces
    > > to modification, although any reasonable "conservative Python"
    > > implementation would disallow the latter.

    >
    > Modifying "at a distance" is exactly what I'm getting at. That's the
    > killer from an optimizing compiler standpoint. The compiler, or a
    > maintenance programmer, looks at a block of code, and there doesn't seem
    > to be anything unusual going on. But, if in some other section of
    > code, something does a "setattr" to mess with the first block of code,
    > something unusual can be happening. This is tough on both optimizing
    > compilers and maintenance programmers.
    >
    > Python has that capability mostly because it's free in an
    > "everything is a dictionary" implementation. ("When all you have
    > is a hash, everything looks like a dictionary".) But that limits
    > implementation performance. Most of the time, nobody is using
    > "setattr" to mess with the internals of a function, class, or
    > module from far, far away. But the cost for that flexibility is
    > being paid, unnecessarily.
    >
    > I'm suggesting that the potential for "action at a distance" somehow
    > has to be made more visible.
    >
    > One option might be a class "simpleobject", from which other classes
    > can inherit. ("object" would become a subclass of "simpleobject").
    > "simpleobject" classes would have the following restrictions:
    >
    > - New fields and functions cannot be introduced from outside
    > the class. Every field and function name must explicitly appear
    > at least once in the class definition. Subclassing is still
    > allowed.
    > - Unless the class itself uses "getattr" or "setattr" on itself,
    > no external code can do so. This lets the compiler eliminate the
    > object's dictionary unless the class itself needs it.
    >
    > This lets the compiler see all the field names and assign them fixed slots
    > in a fixed sized object representation. Basically, this means simple objects
    > have a C/C++ like internal representation, with the performance that comes
    > with that representation.
    >
    > With this, plus the "Shed Skin" restrictions, plus the array features of
    > "numarray", it should be possible to get computationally intensive code
    > written in Python up to C/C++ levels of performance. Yet all the dynamic
    > machinery of Python remains available if needed.
    >
    > All that's necessary is not to surprise the compiler.
    >

    If this is all it takes, I would even be happy to have to declare which things
    could be surprising - some statement like:

    x can be anything

    Would that help?

    It kind of inverts the logic - and states that if you want what is now
    the default behaviour, you have to ask for it.

    - Hendrik
     
    Hendrik van Rooyen, May 10, 2007
    #15
  16. sturlamolden

    sturlamolden Guest

    On May 8, 5:53 pm, John Nagle <> wrote:

    > The point here is that we don't need language changes or declarations
    > to make Python much faster. All we need are a few restrictions that
    > ensure that, when you're doing something unusual, the compiler can
    > tell.


    Franz, CMUCL, SBCL and GCL teams made Lisp almost as fast as C. A
    dynamic language can be fast if the implementation is good.

    If you look at SBCL and GCL, no code is interpreted. It's all compiled
    on the fly to native machine code. The compiler begins with some code
    and some input data, and compiles as much as it can. Then the RT
    executes the machine code, compiles again, etc. Often long stretches
    of code can be compiled without a break, and tight performance-critical
    loops are usually compiled only once. In addition to this, one needs
    an efficient system to cache compiled code, in order to do the
    compilation work only once. Making a dynamic language fast is not
    rocket science.

    We should have something like "GPython", a Python RT on top of a GCC
    backend, similar to what the GCL team did for Lisp. There is no
    obvious reason why Lisp should have better performance than
    Python.
     
    sturlamolden, May 10, 2007
    #16
  17. Tim Golden

    Tim Golden Guest

    sturlamolden wrote:
    > On May 8, 5:53 pm, John Nagle <> wrote:
    >
    >> The point here is that we don't need language changes or declarations
    >> to make Python much faster. All we need are a few restrictions that
    >> ensure that, when you're doing something unusual, the compiler can
    >> tell.

    >
    > Franz, CMUCL, SBCL and GCL teams made Lisp almost as fast as C. A
    > dynamic language can be fast if the implementation is good.
    >
    > If you look at SBCL and GCL, no code is interpreted. It's all compiled
    > on the fly to native machine code. The compiler begins with some code
    > and some input data, and compiles as much as it can. Then the RT
    > executes the machine code, compiles again, etc. Often long stretches
    > of code can be compiled without a break, and tight performance-critical
    > loops are usually compiled only once. In addition to this, one needs
    > an efficient system to cache compiled code, in order to do the
    > compilation work only once. Making a dynamic language fast is not
    > rocket science.
    >
    > We should have something like "GPython", a Python RT on top of a GCC
    > backend, similar to what the GCL team did for Lisp. There is no
    > obvious reason why Lisp should have better performance than
    > Python.


    I doubt if anyone disputes the gist of what you're
    saying[*], viz that Python could be made faster by using
    technique (a), (b) or (c) which have been successful
    elsewhere. At least that it's worth investigating.

    But the relevant bit of your last paragraph is at the start:
    "We should...". Unless someone or someones has the time,
    inclination, money, backing, wherewithal etc. to implement
    this or any other measure of speeding-up, it's all
    pie-in-the-sky. Useful, maybe, as discussion of what
    options are viable, but a project of this magnitude
    doesn't just happen in some developer's lunchbreak.

    Many people (and I include myself) are quite
    happy with Python's speed. In fact, I'm quite happy
    with most things about Python. Others would like to
    see it faster. That's great. But unless people
    put their money where their mouths are, I don't
    see it happening.

    TJG

    [*] Actually, knowing this community, I'm sure *someone's*
    going to!
     
    Tim Golden, May 10, 2007
    #17
  18. Terry Reedy

    Terry Reedy Guest

    "sturlamolden" <> wrote in message
    news:...
    | Franz, CMUCL, SBCL and GCL teams made Lisp almost as fast as C. A
    | dynamic language can be fast if the implementation is good.
    |
    | If you look at SBCL and GCL, no code is interpreted. It's all compiled
    | on the fly to native machine code.

    Unfortunately, native machine code depends on the machine, or at least the
    machine being emulated by the hardware. Fortunately or not, the dominance
    of the x86 model makes this less of a problem.

    | The compiler begins with some code
    | and some input data, and compiles as much as it can. Then the RT
    | executes the machine code, compiles again, etc. Often long stretches
    | of code can be compiled without a break, and tight performance-critical
    | loops are usually compiled only once. In addition to this, one needs
    | an efficient system to cache compiled code, in order to do the
    | compilation work only once. Making a dynamic language fast is not
    | rocket science.

    This sounds somewhat similar to Psyco. If a code section is entered with
    different machine types, recompilation is needed. One of the problems with
    Psyco is that its efficient caching of multiple versions of compiled code
    is space-hungry.

    | We should have something like "GPython", a Python RT on top of a GCC
    | backend, similar to what the GCL team did for Lisp. There is no
    | obvious reason why Lisp should have better performance than
    | Python.

    In the 1980s, during the Lisp/AI boom, perhaps a billion dollars was put
    into making Lisp run faster. Plus decades of work in Computer Science
    departments. Other than that, I agree. Now, Python is slowly making
    inroads into academia as a subject of research and theses, and the PyPy
    group got at least one large EU grant.

    Terry Jan Reedy
     
    Terry Reedy, May 10, 2007
    #18
  19. Guest

    Guest

    On May 9, 5:02 pm, John Nagle <> wrote:
    > Paul Boddie wrote:
    >
    > Python has that capability mostly because it's free in an
    > "everything is a dictionary" implementation. ("When all you have
    > is a hash, everything looks like a dictionary".) But that limits
    > implementation performance. Most of the time, nobody is using
    > "setattr" to mess with the internals of a function, class, or
    > module from far, far away. But the cost for that flexibility is
    > being paid, unnecessarily.
    >


    I've had this idea in the back of my head for a year or so now, but
    have been too lazy or busy or both to give the implementation a try.

    The idea is to create a special dictionary for python internals that
    contains a pointer-to-a-pointer for the value side of the dictionary,
    instead of just the standard PyObject*. Then when you start to eval a
    code block, you could do a one-time lookup of the pointer-to-the-
    pointer for each method, attribute, etc; and store them in pre-
    allocated slots. The calls in the block to various GET_ and SET_
    functions can then just deal with pointer dereferencing instead of a
    dictionary lookup, and the dictionary can still get changed while this
    is going on without things blowing up.

    I think you'd get a big speed up by changing all those dictionary
    function calls to inlined pointer deref code.
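
    A rough pure-Python sketch of what I mean (the real thing would be C
    inside the interpreter):

        class Cell(object):
            __slots__ = ['value']

        namespace = {'f': Cell()}      # name -> cell, not name -> object
        namespace['f'].value = len     # binding writes through the cell

        cached = namespace['f']        # one-time lookup at block entry
        print cached.value('abc')      # later accesses just dereference: 3
        namespace['f'].value = max     # rebinding still works mid-flight...
        print cached.value(1, 2)       # ...and the cached cell sees it: 2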

    Anyone else buy this approach? Or see any fatal flaws?
     
    , May 10, 2007
    #19
  20. Terry Reedy

    Terry Reedy Guest

    <> wrote in message
    news:...
    | The idea is to create a special dictionary for python internals that
    | contains a pointer-to-a-pointer for the value side of the dictionary,
    | instead of just the standard PyObject*. Then when you start to eval a
    | code block, you could do a one-time lookup of the pointer-to-the-
    | pointer for each method, attribute, etc; and store them in pre-
    | allocated slots. The calls in the block to various GET_ and SET_
    | functions can then just deal with pointer dereferencing instead of a
    | dictionary lookup, and the dictionary can still get changed while this
    | is going on without things blowing up.
    |
    | I think you'd get a big speed up by changing all those dictionary
    | function calls to inlined pointer deref code.
    |
    | Anyone else buy this approach? Or see any fatal flaws?

    Within CPython functions, local variables are usually implemented as
    numbered slots in a C array (of PyObject pointers). That is why a) the
    compiler makes a first pass thru a function body (to determine what is
    local and what is not) and b) LOAD_FAST is faster than LOAD_GLOBAL.
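
    You can see the difference directly:

        >>> import dis
        >>> def f():
        ...     x = 1
        ...     return x + len("abc")
        ...
        >>> dis.dis(f)   # shows LOAD_FAST for "x", LOAD_GLOBAL for "len"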

    Terry Jan Reedy
     
    Terry Reedy, May 10, 2007
    #20
