Needless copying in iterations?

Discussion in 'Python' started by James Stroud, Sep 15, 2007.

  1. James Stroud

    Hello all,

    I was staring at a segment of code that looked like this today:

    for something in stuff[x:y]:
        whatever(something)

    and was wondering if the compiler really made a copy of the slice from
    stuff as the code seems to suggest, or does it find some way to produce
    an iterator without the need to make a copy (if stuff is a built-in
    sequence type)? Or would it be more efficient to do the more clumsy (in
    my opinion):

    for i in xrange(x, y):
        whatever(stuff[i])

    James
     
    James Stroud, Sep 15, 2007
    #1

  2. buffi

    On Sep 15, 11:58 pm, James Stroud <> wrote:
    > Hello all,
    >
    > I was staring at a segment of code that looked like this today:
    >
    > for something in stuff[x:y]:
    >     whatever(something)
    >
    > and was wondering if the compiler really made a copy of the slice from
    > stuff as the code seems to suggest, or does it find some way to produce
    > an iterator without the need to make a copy (if stuff is a built-in
    > sequence type)? Or would it be more efficient to do the more clumsy (in
    > my opinion):
    >
    > for i in xrange(x, y):
    >     whatever(stuff[i])
    >
    > James


    itertools.islice does what you want

    import itertools
    for something in itertools.islice(stuff, x, y):
        whatever(something)
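
    If you want to measure the difference yourself, here is a rough sketch
    using timeit (the list size and slice bounds are made up; the numbers
    will vary by machine):

    import timeit

    setup = "import itertools; stuff = range(1000000)"
    copy = timeit.Timer("for x in stuff[100:900000]: pass", setup)
    lazy = timeit.Timer(
        "for x in itertools.islice(stuff, 100, 900000): pass", setup)
    print copy.timeit(number=10)   # iterates over a copied slice
    print lazy.timeit(number=10)   # iterates lazily, no copy

    islice never materializes the sublist, so it saves the memory of the
    copy; whether it is also faster depends on the cost of the copy versus
    islice's per-item overhead.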
     
    buffi, Sep 15, 2007
    #2

  3. buffi

    On Sep 16, 12:20 am, James Stroud <> wrote:
    > buffi wrote:
    > > On Sep 15, 11:58 pm, James Stroud <> wrote:
    > >> Hello all,
    > >>
    > >> I was staring at a segment of code that looked like this today:
    > >>
    > >> for something in stuff[x:y]:
    > >>     whatever(something)
    > >>
    > >> and was wondering if the compiler really made a copy of the slice from
    > >> stuff as the code seems to suggest, or does it find some way to produce
    > >> an iterator without the need to make a copy (if stuff is a built-in
    > >> sequence type)? Or would it be more efficient to do the more clumsy (in
    > >> my opinion):
    > >>
    > >> for i in xrange(x, y):
    > >>     whatever(stuff[i])
    > >>
    > >> James
    > >
    > > itertools.islice does what you want
    > >
    > > import itertools
    > > for something in itertools.islice(stuff, x, y):
    > >     whatever(something)
    >
    > Thanks buffi!
    >
    > So I guess the interpreter does no optimization in the latter?
    >
    > James


    No, as far as I know it makes a new list out of the slice when you do
    it like:

    for something in stuff[x:y]:
        ...
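
    A quick interactive check makes the copy visible (CPython shown, but
    this is standard list semantics):

    >>> stuff = [1, 2, 3, 4]
    >>> piece = stuff[1:3]
    >>> piece is stuff
    False
    >>> piece[0] = 99   # mutate the slice...
    >>> stuff           # ...the original is untouched
    [1, 2, 3, 4]

    The slice is a brand-new list object.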

    - Björn Kempén
     
    buffi, Sep 15, 2007
    #3
  4. James Stroud

    buffi wrote:
    > On Sep 15, 11:58 pm, James Stroud <> wrote:
    >> Hello all,
    >>
    >> I was staring at a segment of code that looked like this today:
    >>
    >> for something in stuff[x:y]:
    >>     whatever(something)
    >>
    >> and was wondering if the compiler really made a copy of the slice from
    >> stuff as the code seems to suggest, or does it find some way to produce
    >> an iterator without the need to make a copy (if stuff is a built-in
    >> sequence type)? Or would it be more efficient to do the more clumsy (in
    >> my opinion):
    >>
    >> for i in xrange(x, y):
    >>     whatever(stuff[i])
    >>
    >> James
    >
    > itertools.islice does what you want
    >
    > import itertools
    > for something in itertools.islice(stuff, x, y):
    >     whatever(something)
    >


    Thanks buffi!

    So I guess the interpreter does no optimization in the latter?

    James
     
    James Stroud, Sep 15, 2007
    #4
  5. This is a case where it's up to the type involved. For example,
    xrange() slices the way you want but range() does not. Maybe a type
    would return a proxy object for slices that fetches values by index, or
    maybe it would know that it makes more sense to give you a copy because
    changes during the iteration should not be reflected in the iteration.
    It would be really easy to make a generic slicer.
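
    It might look something like this (a rough, untested sketch; the name
    "sliceview" is made up):

    def sliceview(seq, start, stop, step=1):
        """Yield seq[start:stop:step] lazily, by index, without copying."""
        for i in xrange(start, min(stop, len(seq)), step):
            yield seq[i]

    for something in sliceview(stuff, x, y):
        whatever(something)

    Because it reads by index, changes to seq during the loop *are* visible
    to the iteration, unlike iterating over a real slice.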

    On 9/15/07, James Stroud <> wrote:
    > Hello all,
    >
    > I was staring at a segment of code that looked like this today:
    >
    > for something in stuff[x:y]:
    >     whatever(something)
    >
    > and was wondering if the compiler really made a copy of the slice from
    > stuff as the code seems to suggest, or does it find some way to produce
    > an iterator without the need to make a copy (if stuff is a built-in
    > sequence type)? Or would it be more efficient to do the more clumsy (in
    > my opinion):
    >
    > for i in xrange(x, y):
    >     whatever(stuff[i])
    >
    > James



    --
    Read my blog! I depend on your acceptance of my opinion! I am interesting!
    http://ironfroggy-code.blogspot.com/
     
    Calvin Spealman, Sep 15, 2007
    #5
  6. buffi

    On Sep 16, 12:25 am, "Calvin Spealman" <> wrote:
    > This is a case where it's up to the type involved. For example,
    > xrange() slices the way you want but range() does not.


    Could you explain this?
    As far as I know you can't slice an xrange.

    - Björn Kempén
     
    buffi, Sep 15, 2007
    #6
  7. On Sat, 15 Sep 2007 14:58:15 -0700, James Stroud wrote:

    > I was staring at a segment of code that looked like this today:
    >
    > for something in stuff[x:y]:
    >     whatever(something)
    >
    > and was wondering if the compiler really made a copy of the slice from
    > stuff as the code seems to suggest, or does it find some way to produce
    > an iterator without the need to make a copy (if stuff is a built-in
    > sequence type)?


    The compiler can't "optimize" this as it would change the semantics.
    There's no way for the compiler to tell if this copy really is "needless".
    `whatever()` may change the elements of `stuff` in the range `x:y`, and
    this may lead to different results depending on whether it iterates over
    a copy or the original.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Sep 16, 2007
    #7
  8. On Sun, 16 Sep 2007 00:05:58 +0000, Marc 'BlackJack' Rintsch wrote:

    > On Sat, 15 Sep 2007 14:58:15 -0700, James Stroud wrote:
    >
    >> I was staring at a segment of code that looked like this today:
    >>
    >> for something in stuff[x:y]:
    >>     whatever(something)
    >>
    >> and was wondering if the compiler really made a copy of the slice from
    >> stuff as the code seems to suggest, or does it find some way to produce
    >> an iterator without the need to make a copy (if stuff is a built-in
    >> sequence type)?

    >
    > The compiler can't "optimize" this as it would change the semantics.
    > There's no way for the compiler to tell if this copy really is
    > "needless". `whatever()` may change the elements of `stuff` in the
    > range `x:y`, and this may lead to different results depending on
    > whether it iterates over a copy or the original.


    In *general* the compiler can't tell, but in specific cases it could. A
    (hypothetical) optimizing compiler would tell the difference between:


    for item in alist[1:5]:
        print item # no possible side-effects

    for item in alist[1:5]:
        function(alist, item) # there might be side-effects

    for item in alist[1:5]:
        alist.append(item) # side-effects DON'T matter

    for i, item in enumerate(alist[1:5]):
        alist[i+1] = function(item) # side-effects DO matter


    Of course this is all beside the point, since no such optimizing
    compiler exists today, at least not for CPython. Any volunteers?


    --
    Steven.
     
    Steven D'Aprano, Sep 16, 2007
    #8
  9. Ben Finney

    Steven D'Aprano <> writes:

    > In *general* the compiler can't tell, but in specific cases it
    > could. A (hypothetical) optimizing compiler would tell the
    > difference between:
    >
    > for item in alist[1:5]:
    >     print item # no possible side-effects


    The 'print' statement converts the 'item' to a str. That conversion
    could, in a pathological case, have a side-effect (say, if the class
    of 'item' had an overridden '__str__' method with side effects), and
    the compiler can't know this isn't a pathological case.
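
    A contrived demonstration, just to make the point concrete:

    alist = []

    class Sneaky(object):
        def __str__(self):
            alist.append("surprise")  # str() mutates the list!
            return "sneaky"

    alist.append(Sneaky())
    for item in alist[0:5]:
        print item

    Iterating over the copy 'alist[0:5]' prints once; iterating over
    'alist' itself prints twice, because the first print grows the list.
    So even a bare 'print item' can change the result.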

    > for item in alist[1:5]:
    >     alist.append(item) # side-effects DON'T matter


    The compiler doesn't know that, at the time of invocation,
    'alist.append' doesn't have side effects that matter.

    The compiler for a dynamic language like Python can give almost none of
    the absolute "no significant side-effects in this specific case"
    guarantees you're assuming, even in the cases you chose for contrast
    with the ones that *do* have significant side-effects.

    --
    \ "When we call others dogmatic, what we really object to is |
    `\ their holding dogmas that are different from our own." -- |
    _o__) Charles Issawi |
    Ben Finney
     
    Ben Finney, Sep 16, 2007
    #9
  10. On Sun, 16 Sep 2007 00:40:13 +0000, Steven D'Aprano wrote:

    > On Sun, 16 Sep 2007 00:05:58 +0000, Marc 'BlackJack' Rintsch wrote:
    >
    > In *general* the compiler can't tell, but in specific cases it could. A
    > (hypothetical) optimizing compiler would tell the difference between:
    >
    >
    > for item in alist[1:5]:
    >     print item # no possible side-effects


    To expand on what Ben said: After conversion to `str` the result is then
    given to `sys.stdout.write()` and of course `sys.stdout` could be another
    object that changes `alist`.

    > for item in alist[1:5]:
    >     alist.append(item) # side-effects DON'T matter


    Side effects do matter here, even in not-so-pathological cases. There
    are cases where you get a different result depending on whether you copy
    or not, even with plain vanilla `list`s:

    In [153]: alist = [1, 2, 3]

    In [154]: for item in alist[1:5]:
       .....:     alist.append(item)
       .....:

    In [155]: alist
    Out[155]: [1, 2, 3, 2, 3]

    In [156]: from itertools import islice

    In [157]: alist = [1, 2, 3]

    In [158]: for item in islice(alist, 1, 5):
       .....:     alist.append(item)
       .....:

    In [159]: alist
    Out[159]: [1, 2, 3, 2, 3, 2, 3]

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Sep 16, 2007
    #10
  11. On Sun, 16 Sep 2007 15:05:55 +1000, Ben Finney wrote:

    > Steven D'Aprano <> writes:
    >
    >> In *general* the compiler can't tell, but in specific cases it could. A
    >> (hypothetical) optimizing compiler would tell the difference between:
    >>
    >> for item in alist[1:5]:
    >>     print item # no possible side-effects

    >
    > The 'print' statement converts the 'item' to a str. That conversion
    > could, in a pathological case, have a side-effect (say, if the class of
    > 'item' had an overridden '__str__' method with side effects), and the
    > compiler can't know this isn't a pathological case.


    Fair enough... but I'm reminded of a rant by Joel Spolsky about GUI
    design:

    '...programmers have tended to think that if users are allowed to resize
    and move windows, they should have complete flexibility over where these
    windows go, right down to the last pixel. After all, positioning a window
    2 pixels from the top of the screen is "equally likely" as positioning a
    window exactly at the top of the screen.'

    http://www.joelonsoftware.com/uibook/fog0000000249.html

    (three quarters of the way down, in Chapter 7.)

    Maybe I'm being unfair, but it seems to me that the attitude is similar:
    'there's no point optimizing the common case of printing (say) ints
    stored in a list, Just In Case the programmer wants the incredibly rare
    case of setting sys.stdout to some wacky object that modifies the list
    he's iterating over. It could happen.'

    *shrug* But apart from the occasional smart-alec who does it just to
    demonstrate that it is possible, it probably never has.

    It seems to me that the "consenting adults" philosophy of Python doesn't
    extend to the compiler, and perhaps it should. Maybe Python could
    optimize common cases, and if developers wanted to do wacky things, let
    them turn optimization off on a module-by-module basis.

    Or even function-by-function. Wouldn't it be nice to have decorators that
    could optimize the functions they decorated?
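
    Psyco half-provides this already; a sketch, assuming the psyco package
    is installed:

    def optimized(func):
        """Return a Psyco-compiled proxy of func, or func unchanged."""
        try:
            import psyco
        except ImportError:
            return func            # no Psyco available: plain bytecode
        return psyco.proxy(func)   # machine-code specialisation

    @optimized
    def hot_loop(n):
        total = 0
        for i in xrange(n):
            total += i
        return total

    Functions that want the wacky fully-dynamic behaviour simply don't use
    the decorator, which is exactly the sort of opt-in I have in mind.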


    >> for item in alist[1:5]:
    >>     alist.append(item) # side-effects DON'T matter

    >
    > The compiler doesn't know that, at the time of invocation,
    > 'alist.append' doesn't have side effects that matter.


    It might if it knows what type alist is, and how long it is (as Marc
    Rintsch points out). That's why I used "alist" in my example rather than
    "an_arbitrary_sequence".


    > The compiler for a dynamic language like Python can give almost none of
    > the absolute "no significant side-effects in this specific case"
    > guarantees you're assuming, even in the cases you chose for contrast
    > with the ones that *do* have significant side-effects.


    Yes, in general one might not be able to make those guarantees, but still
    there are common cases where you can. That's how Psyco works.

    The point is rather moot, since CPython (and probably other Pythons) do
    almost no optimizations. But just because Python is a dynamic language
    doesn't mean there are no optimizations possible: Haskell is a dynamic
    language, and there are optimizing compilers for it. Of course, it is
    much simpler for Haskell, because of the type system it uses.

    The CPython compiler already does a couple of source-code optimizations
    (ignores some unused strings; some constant folding) and at least one
    runtime optimization (string concatenation is no longer _always_ slow):

    http://utcc.utoronto.ca/~cks/space/blog/python/ExaminingStringConcatOpt
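
    The constant folding is easy to see with the dis module (a quick check;
    assumes CPython 2.5 or later, where the peephole optimizer folds
    constant expressions):

    import dis

    def seconds_per_day():
        # 60 * 60 * 24 is folded to a single constant at compile time
        return 60 * 60 * 24

    dis.dis(seconds_per_day)   # shows LOAD_CONST 86400, RETURN_VALUE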

    Because of its very nature, optimizing Python is hard. Even something as
    simple as tail-call recursion optimization is verboten, because Guido
    considers good stack traces more important than preventing stack
    overflows in the first place:

    http://www.tratt.net/laurie/tech_articles/articles/tail_call_optimization

    Anyway, I'm not really complaining. I like Python, I'm grateful for the
    work the Python-dev people have already done, and I'm in no position to
    volunteer to write an optimizing compiler. And it may be that, regardless
    of how smart you make the compiler, Python simply can't be optimized
    because of design decisions made back in 1990-whatever when Guido first
    started his grand plan.



    --
    Steven.
     
    Steven D'Aprano, Sep 16, 2007
    #11
  12. On Sun, 16 Sep 2007 09:50:39 +0000, Steven D'Aprano wrote:

    > The point is rather moot, since CPython (and probably other Pythons) do
    > almost no optimizations. But just because Python is a dynamic language
    > doesn't mean there are no optimizations possible: Haskell is a dynamic
    > language, and there are optimizing compilers for it. Of course, it is
    > much simpler for Haskell, because of the type system it uses.


    What do you mean by Haskell is a dynamic language? It is statically and
    strictly typed and the compiler usually knows all the functions. No
    "surprises", no side effects, no duck typing.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Sep 16, 2007
    #12
  13. Kay Schluehr

    > Haskell is a dynamic language, and there are optimizing compilers for it.

    Last time I used Haskell it was statically typed but this has been
    almost a year ago and times are changing fast.

    About optimizing compilers for Python: these won't work without
    restrictions, a la Shedskin, which uses C++ as its backend. But unlike
    Shedskin, and the design mistake made in that project, we can
    learn the lesson from Psyco that there must be a fallback to bytecode
    interpretation when some argument does not fit the signature of a
    specialized function. I'm personally a bit skeptical about JIT
    compilation since I feel it's a waste of electricity to record all
    that information at runtime and perform native compilation again and
    again. That doesn't mean JIT compilation can't be effective; I just
    have a conceptual bias against it.

    Instead I expect rather stable behaviour and rather fixed nominal
    types, despite the fact that Python supports duck typing. Unfortunately
    I'm unable to recover an investigation about the effectiveness of
    adding tests to an already well-equipped test base. The result was one
    of diminishing returns: stable behaviour emerged early with good
    coverage properties, and then one had to work hard to improve it and
    find more bugs. A similar effect might also underlie the effectiveness
    of dynamic typing for catching errors, which is not easy to justify by
    purely theoretical arguments.

    -----

    Note that I worked out some pieces of a type feedback mechanism that
    records types and displays the results in a slightly modified Python
    dialect called TPython. A TPython program can be regarded as a "typed
    hypothesis" about an otherwise untyped ( or dynamically typed ) Python
    program. You can keep the TPython program and translate it completely
    or fragments of it to other languages like C++, D, Java, OCaml etc.
    Obviously this approach shall be used for incremental progress and
    smooth integration and it might work even when only small portions can
    be translated ( typically functions that keep and return only integers/
    floats which are bound in range are a good start ). So far everything
    is implemented in pure Python and the amount of source is < 1000 LOC.

    I have not published anything of it yet and I'm a rather slow
    publisher anyway. If anyone is interested in this kind of work and
    willing to contribute I might set it apart from the usual EasyExtend
    stuff (which is the foundation nevertheless) I'm working on and
    reframe the project context e.g. on SourceForge or elsewhere. Email
    address and project page are still


    www.fiber-space.de

    If no one contributes the fun is exclusively on my side and everything
    is moving slower. That's all.
     
    Kay Schluehr, Sep 16, 2007
    #13
  14. On Sun, 16 Sep 2007 10:58:07 +0000, Marc 'BlackJack' Rintsch wrote:

    > On Sun, 16 Sep 2007 09:50:39 +0000, Steven D'Aprano wrote:
    >
    >> The point is rather moot, since CPython (and probably other Pythons) do
    >> almost no optimizations. But just because Python is a dynamic language
    >> doesn't mean there are no optimizations possible: Haskell is a dynamic
    >> language, and there are optimizing compilers for it. Of course, it is
    >> much simpler for Haskell, because of the type system it uses.

    >
    > What do you mean by Haskell is a dynamic language? It is statically and
    > strictly typed and the compiler usually knows all the functions. No
    > "surprises", no side effects, no duck typing.


    Haskell's IO monad (and possibly the do monad?) allows side effects. It
    would be a pretty poor programming language that didn't allow input or
    output!

    See also http://homepages.cwi.nl/~ralf/OOHaskell/
    showing that Haskell can do object-oriented programming, complete with
    mutable objects and side-effects. Although "duck typing" is listed as a
    keyword, I couldn't see any reference to it in the paper.

    Haskell uses type inference, and has a "maybe" type for those cases where
    it can't tell what the type will be.

    If you still don't accept that Haskell is a dynamic language, for
    whatever definition of dynamic language you use, I'll withdraw the claim
    for the sake of not getting bogged down in long arguments over something
    that was really just a minor point.


    --
    Steven.
     
    Steven D'Aprano, Sep 16, 2007
    #14
  15. On Sun, 16 Sep 2007 13:31:57 +0000, Steven D'Aprano wrote:

    > On Sun, 16 Sep 2007 10:58:07 +0000, Marc 'BlackJack' Rintsch wrote:
    >
    >> On Sun, 16 Sep 2007 09:50:39 +0000, Steven D'Aprano wrote:
    >>
    >>> The point is rather moot, since CPython (and probably other Pythons) do
    >>> almost no optimizations. But just because Python is a dynamic language
    >>> doesn't mean there are no optimizations possible: Haskell is a dynamic
    >>> language, and there are optimizing compilers for it. Of course, it is
    >>> much simpler for Haskell, because of the type system it uses.

    >>
    >> What do you mean by Haskell is a dynamic language? It is statically and
    >> strictly typed and the compiler usually knows all the functions. No
    >> "surprises", no side effects, no duck typing.

    >
    > Haskell's IO monad (and possibly the do monad?) allows side effects. It
    > would be a pretty poor programming language that didn't allow input or
    > output!


    Those side effects stay in the IO monad and don't "leak" into other areas
    so the compiler still knows much more about the possible data flow than
    the compilers of most other languages.

    > Haskell uses type inference, and has a "maybe" type for those cases where
    > it can't tell what the type will be.


    Haskell is statically typed, the compiler must know the type of every name
    otherwise it stops with an error message. The `Maybe` type is for
    functions that may return something (typed) or `Nothing`. For example a
    `lookup` function can have the signature:

    lookup :: Eq a => a -> [(a,b)] -> Maybe b
    lookup key assocList = ...

    The compiler infers the concrete types of `a` and `b` at compile time.

    > If you still don't accept that Haskell is a dynamic language, for
    > whatever definition of dynamic language you use, I'll withdraw the claim
    > for the sake of not getting bogged down in long arguments over something
    > that was really just a minor point.


    You said it must be possible to optimize a dynamic language because there
    are optimizing compilers for a dynamic language like Haskell. That's a
    minor point now?

    The main problem with optimizing Python code is that the language is
    dynamically typed -- the types of objects bound to names are only checked
    at runtime and can change at nearly every point in time while executing.
    Haskell is statically typed, the types are all inferred at compile time so
    it's possible to optimize the code for those types. The compiler knows
    the exact type of every name. By what definition is that a dynamic
    language!?

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Sep 16, 2007
    #15
  16. Steve Holden

    Steven D'Aprano wrote:
    > On Sun, 16 Sep 2007 15:05:55 +1000, Ben Finney wrote:
    >
    >> Steven D'Aprano <> writes:
    >>
    >>> In *general* the compiler can't tell, but in specific cases it could. A
    >>> (hypothetical) optimizing compiler would tell the difference between:
    >>>
    >>> for item in alist[1:5]:
    >>>     print item # no possible side-effects

    >> The 'print' statement converts the 'item' to a str. That conversion
    >> could, in a pathological case, have a side-effect (say, if the class of
    >> 'item' had an overridden '__str__' method with side effects), and the
    >> compiler can't know this isn't a pathological case.

    >
    > Fair enough... but I'm reminded of a rant by Joel Spolsky about GUI
    > design:
    >
    > '...programmers have tended to think that if users are allowed to resize
    > and move windows, they should have complete flexibility over where these
    > windows go, right down to the last pixel. After all, positioning a window
    > 2 pixels from the top of the screen is "equally likely" as positioning a
    > window exactly at the top of the screen.'
    >
    > http://www.joelonsoftware.com/uibook/fog0000000249.html
    >
    > (three quarters of the way down, in Chapter 7.)
    >
    > Maybe I'm being unfair, but it seems to me that the attitude is similar:
    > 'there's no point optimizing the common case of printing (say) ints
    > stored in a list, Just In Case the programmer wants the incredibly rare
    > case of setting sys.stdout to some wacky object that modifies the list
    > he's iterating over. It could happen.'
    >
    > *shrug* But apart from the occasional smart-alec who does it just to
    > demonstrate that it is possible, it probably never has.
    >
    > It seems to me that the "consenting adults" philosophy of Python doesn't
    > extend to the compiler, and perhaps it should. Maybe Python could
    > optimize common cases, and if developers wanted to do wacky things, let
    > them turn optimization off on a module-by-module basis.
    >
    > Or even function-by-function. Wouldn't it be nice to have decorators that
    > could optimize the functions they decorated?
    >

    No, it would be disastrous, unless you could manage to implement a
    mechanism that *told* you when the optimizations were playing havoc with
    your program's execution.

    The problem is that debugging only works if you can assume a
    deterministic environment. If you are going to put funky optimizations
    in that break determinism in "little-used" corner cases, then debugging
    the situations when the optimizations *do* affect program execution will
    be a complete pain.

    >
    >>> for item in alist[1:5]:
    >>>     alist.append(item) # side-effects DON'T matter

    >> The compiler doesn't know that, at the time of invocation,
    >> 'alist.append' doesn't have side effects that matter.

    >
    > It might if it knows what type alist is, and how long it is (as Marc
    > Rintsch points out). That's why I used "alist" in my example rather than
    > "an_arbitrary_sequence".
    >

    But alist doesn't *have* a type, and static program analysis would have
    to be extensive to determine that it could only ever be of a specific type.
    >
    >> The compiler for a dynamic language like Python can give almost none of
    >> the absolute "no significant side-effects in this specific case"
    >> guarantees you're assuming, even in the cases you chose for contrast
    >> with the ones that *do* have significant side-effects.

    >
    > Yes, in general one might not be able to make those guarantees, but still
    > there are common cases where you can. That's how Psyco works.
    >

    Yes, but Psyco doesn't use static program analysis; instead it examines
    the run-time environment to determine whether specific optimizations
    can be applied.

    > The point is rather moot, since CPython (and probably other Pythons) do
    > almost no optimizations. But just because Python is a dynamic language
    > doesn't mean there are no optimizations possible: Haskell is a dynamic
    > language, and there are optimizing compilers for it. Of course, it is
    > much simpler for Haskell, because of the type system it uses.
    >
    > The CPython compiler already does a couple of source-code optimizations
    > (ignores some unused strings; some constant folding) and at least one
    > runtime optimization (string concatenation is no longer _always_ slow):
    >
    > http://utcc.utoronto.ca/~cks/space/blog/python/ExaminingStringConcatOpt
    >
    > Because of its very nature, optimizing Python is hard. Even something as
    > simple as tail-call recursion optimization is verboten, because Guido
    > considers good stack traces more important than preventing stack
    > overflows in the first place:
    >
    > http://www.tratt.net/laurie/tech_articles/articles/tail_call_optimization
    >

    And who's to say he's wrong, especially since Python is intended to be
    an environment that's friendly to beginners.

    > Anyway, I'm not really complaining. I like Python, I'm grateful for the
    > work the Python-dev people have already done, and I'm in no position to
    > volunteer to write an optimizing compiler. And it may be that, regardless
    > of how smart you make the compiler, Python simply can't be optimized
    > because of design decisions made back in 1990-whatever when Guido first
    > started his grand plan.
    >

    There is some of that. The environment is so dynamic that even something
    as innocuous as attribute lookup can actually involve properties (for
    example) reading a relational database! I'm not saying that optimization
    under these circumstances is impossible, but that it's much more difficult
    than is easily anticipated, and that the benefits to be gained might be
    less than you would think.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://del.icio.us/steve.holden

    Sorry, the dog ate my .sigline
     
    Steve Holden, Sep 16, 2007
    #16
  17. Paul Rubin

    Steven D'Aprano <> writes:
    > > What do you mean by Haskell is a dynamic language? It is statically and
    > > strictly typed and the compiler usually knows all the functions. No
    > > "surprises", no side effects, no duck typing.

    >
    > Haskell's IO monad (and possibly the do monad?) allows side effects. It
    > would be a pretty poor programming language that didn't allow input or
    > output!


    In a certain theoretical sense, input and output in the IO monad are
    not side effects. Side effects means you can somehow distinguish
    between calling f(x) twice with the same value of x, and calling it
    just once. In Haskell, the print function does an output operation
    (called an "action") but if I understand it right, it turns out you
    cannot call the print function twice with the same argument. Of
    course you can say
    main = do { print "hello"; print "hello" }
    and that prints "hello" twice, but the do statement is syntactic sugar
    that turns the above into something approximately like this (the exact
    mechanism is a bit hairier):
    x = (special token representing the state of the external world)
    y = print ("hello", x) # y is the new external world state
    z = print ("hello", y) # z is another new state
    so print receives a different arg each time. (Note that every Haskell
    function has exactly one argument, so f(x,y) means call f, giving it
    the tuple (x,y) as an argument).
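
    A toy Python model of the same idea (purely illustrative, not how
    Haskell actually implements it):

    import sys

    def io_print(text, world):
        """Consume one world state, do the output, return a new state."""
        sys.stdout.write(text + "\n")
        return world + 1   # a distinct "new world" each time

    w0 = 0                      # token for the initial external world
    w1 = io_print("hello", w0)  # this call receives world w0...
    w2 = io_print("hello", w1)  # ...and this one receives a different world

    Each call really does receive a different argument, which is the sense
    in which the output is not a "side effect".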

    > See also http://homepages.cwi.nl/~ralf/OOHaskell/
    > showing that Haskell can do object-oriented programming, complete with
    > mutable objects and side-effects. Although "duck typing" is listed as a
    > keyword, I couldn't see any reference to it in the paper.


    That describes an OO extension to Haskell, which I suppose is
    interesting, but it's not Haskell. You could also bolt a Python
    interpreter into Haskell and that would not be Haskell either.

    > Haskell uses type inference, and has a "maybe" type for those cases where
    > it can't tell what the type will be.


    Type inference just means the static type system doesn't rely on
    declarations. Like when you say 3+4 in C++ or Java, the compiler
    figures out that it is an integer expression, but if you say
    x = 3 + 4
    without an "int x" declaration, the compiler complains. Haskell goes
    a little further and lets you say
    x = 3 + 4
    and the compiler says "ok, now I know that x is an integer". If you
    later try to concatenate x with a string:
    y = x ++ "hello"
    you get a compile-time error message. In a dynamic language like
    Python you would not get the error message until runtime. That is the
    difference between a static and a dynamic language.

    The Maybe monad is still a static type. You can think of it as a
    statically typed list with a maximum of one element, i.e. a Maybe Int
    can have values like [3] (denoted "Just 3") or like [] (the empty
    list, denoted Nothing). It cannot have a value like "foo" or
    [3.14159] or [[7]] or [2,3,4]. You get compile time error messages
    if you try to assign those values to it. Of course you can have a
    Maybe Float (like [3.14159]) or a Maybe (Maybe Int) (like [[7]]).

    > If you still don't accept that Haskell is a dynamic language, for
    > whatever definition of dynamic language you use, I'll withdraw the claim
    > for the sake of not getting bogged down in long arguments over something
    > that was really just a minor point.


    Haskell is certainly not a dynamic language in the usual sense of that
    term.
     
    Paul Rubin, Sep 16, 2007
    #17
  18. On Sun, 16 Sep 2007 09:12:56 -0700, Paul Rubin wrote:

    > Steven D'Aprano <> writes:
    >> > What do you mean by Haskell is a dynamic language? It is statically
    >> > and strictly typed and the compiler usually knows all the functions.
    >> > No "surprises", no side effects, no duck typing.

    >>
    >> Haskell's IO monad (and possibly the do monad?) allows side effects. It
    >> would be a pretty poor programming language that didn't allow input or
    >> output!

    >
    > In a certain theoretical sense, input and output in the IO monad are not
    > side effects.


    Nevertheless, the authors of the paper listed below say they use the IO
    monad to get mutable objects and therefore side effects.


    >> See also http://homepages.cwi.nl/~ralf/OOHaskell/ showing that Haskell
    >> can do object-oriented programming, complete with mutable objects and
    >> side-effects. Although "duck typing" is listed as a keyword, I couldn't
    >> see any reference to it in the paper.

    >
    > That describes an OO extension to Haskell, which I suppose is
    > interesting, but it's not Haskell.


    Built entirely from standard Haskell 98. Saying that it isn't Haskell is a
    little like saying that Python plus the sets module isn't Python anymore.



    --
    Steven.
     
    Steven D'Aprano, Sep 16, 2007
    #18
  19. On Sun, 16 Sep 2007 11:49:15 -0400, Steve Holden wrote:

    >> It seems to me that the "consenting adults" philosophy of Python
    >> doesn't extend to the compiler, and perhaps it should. Maybe Python
    >> could optimize common cases, and if developers wanted to do wacky
    >> things, let them turn optimization off on a module-by-module basis.
    >>
    >> Or even function-by-function. Wouldn't it be nice to have decorators
    >> that could optimize the functions they decorated?
    >>

    > No, it would be disastrous,


    Disastrous, as in thousands dead, panic in the streets, martial law
    declared? I'm sure your program is important, but is it really *that*
    important? :)


    > unless you could manage to implement a
    > mechanism that *told* you when the optimizations were playing havoc with
    > your program's execution.


    How about "the test suite passes when I turn optimization off, but fails
    when I turn optimization on"? How is that any different from "the test
    suite passes when I use StringIO, but not if I use cStringIO"?



    > The problem is that debugging only works if you can assume a
    > deterministic environment. If you are going to put funky optimizations
    > in that break determinism in "little-used" corner cases, then debugging
    > the situations when the optimizations *do* affect program execution will
    > be a complete pain.


    Right, so we're back to the "any pixel is as likely as any other pixel"
    example from Joel On Software I mentioned earlier. To prevent some
    hypothetical developer doing some wacky thing from having to work harder
    to debug some weird corner case, *everybody else* has to miss out on
    optimizations that could lead to significantly better performance.

    You might think that's a good trade-off. I don't.


    >>>> for item in alist[1:5]:
    >>>>     alist.append(item) # side-effects DON'T matter
    >>> The compiler doesn't know that, at the time of invocation,
    >>> 'alist.append' doesn't have side effects that matter.

    >>
    >> It might if it knows what type alist is, and how long it is (as Marc
    >> Rintsch points out). That's why I used "alist" in my example rather
    >> than "an_arbitrary_sequence".
    >>

    > But alist doesn't *have* a type, and static program analysis would have
    > to be extensive to determine that it could only ever be of a specific
    > type.


    Of course alist has a type. Python is a strongly-typed language, not
    untyped.

    And I don't believe I specifically mentioned that static program analysis
    was the only optimization possible. But even if it was...

    I can *hand optimize* a function that I write, yes? I might *choose* to
    say to myself, "Screw duck typing, in *my* program function foo will only
    ever be called with an honest-to-god built-in list argument, I'm going to
    optimize it for that case". We're all consenting adults here, right?

    Well, what's the difference between me hand optimizing the function, and
    calling a built-in decorator that does it for me? Why is it okay for me
    to shoot myself in the foot, but not for me to tell the compiler to shoot
    me in the foot?


    >> Because of its very nature, optimizing Python is hard. Even something
    >> as simple as tail-call recursion optimization is verboten, because
    >> Guido considers good stack traces more important than preventing stack
    >> overflows in the first place:
    >>
    >> http://www.tratt.net/laurie/tech_articles/articles/tail_call_optimization
    >>

    > And who's to say he's wrong, especially since Python is intended to be
    > an environment that's friendly to beginners.


    I'm not saying he's *wrong*, I'm saying he's making trade-offs I would
    like to see relaxed.

    But in the specific case of recursion... have you seen the exception you
    get when the recursion limit is hit? By default, you get 1000 lines of '
    File "foo", line 2, in function_name' followed by "RuntimeError: maximum
    recursion depth exceeded". Maybe I'm missing something there, but I don't
    see how that's much of a help in debugging.


    >> Anyway, I'm not really complaining. I like Python, I'm grateful for the
    >> work the Python-dev people have already done, and I'm in no position to
    >> volunteer to write an optimizing compiler. And it may be that,
    >> regardless of how smart you make the compiler, Python simply can't be
    >> optimized because of design decisions made back in 1990-whatever when
    >> Guido first started his grand plan.
    >>

    > There is some of that. The environment is so dynamic that even something
    > as innocuous as attribute lookup can actually involve properties (for
    > example) reading a relational database! I'm not saying that optimization
    > under these circumstances is possible, but that it's much more difficult
    > than is easily anticipated, and that the benefits to be gained might be
    > less than you would think.


    I'm mostly objecting to people who say that Python's compiler "can't" be
    optimized, which is not true. As I mentioned, there's already a peephole
    optimizer. There are runtime optimizations possible. Psyco could become
    a built-in, if the right people wanted it to become a built-in. Etc. With
    a few million in funding, Python could become as heavily optimized as
    Common Lisp.

    Of course, whether that would still be the Python we know *now* is
    another question. But then, with decorators and generator expressions and
    iterators, Python 2.6 is not precisely the same as the Python we knew
    back in 1997, is it?



    --
    Steven.
     
    Steven D'Aprano, Sep 16, 2007
    #19
