Needless copying in iterations?


James Stroud

Hello all,

I was staring at a segment of code that looked like this today:

for something in stuff[x:y]:
    whatever(something)

and was wondering if the compiler really makes a copy of the slice of
stuff, as the code seems to suggest, or finds some way to produce an
iterator without making a copy (if stuff is a built-in sequence type).
Or would it be more efficient to do the clumsier (in my opinion):

for i in xrange(x, y):
    whatever(stuff[i])

James
 

buffi

Hello all,

I was staring at a segment of code that looked like this today:

for something in stuff[x:y]:
    whatever(something)

and was wondering if the compiler really makes a copy of the slice of
stuff, as the code seems to suggest, or finds some way to produce an
iterator without making a copy (if stuff is a built-in sequence type).
Or would it be more efficient to do the clumsier (in my opinion):

for i in xrange(x, y):
    whatever(stuff[i])

James


itertools.islice does what you want

import itertools
for something in itertools.islice(stuff, x, y):
    whatever(something)
 

buffi

buffi said:
Hello all,
I was staring at a segment of code that looked like this today:
for something in stuff[x:y]:
    whatever(something)
and was wondering if the compiler really makes a copy of the slice of
stuff, as the code seems to suggest, or finds some way to produce an
iterator without making a copy (if stuff is a built-in sequence type).
Or would it be more efficient to do the clumsier (in my opinion):
for i in xrange(x, y):
    whatever(stuff[i])
James

itertools.islice does what you want
import itertools
for something in itertools.islice(stuff, x, y):
    whatever(something)

Thanks buffi!

So I guess the interpreter does no optimization in the latter?

James


No, as far as I know it makes a new list out of the slice when you do
it like
for something in stuff[x:y]
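
A rough way to see the cost is to time the alternatives; a sketch (the
million-element `stuff` below is just a throwaway example, and the
numbers will vary by machine):

import timeit

setup = "from itertools import islice; stuff = range(1000000)"
for stmt in ("for x in stuff[1000:900000]: pass",
             "for x in islice(stuff, 1000, 900000): pass",
             "for i in xrange(1000, 900000): stuff[i]"):
    # Each variant walks the same range of indices ten times.
    print stmt, timeit.Timer(stmt, setup).timeit(number=10)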

- Björn Kempén
 

James Stroud

buffi said:
Hello all,

I was staring at a segment of code that looked like this today:

for something in stuff[x:y]:
    whatever(something)

and was wondering if the compiler really makes a copy of the slice of
stuff, as the code seems to suggest, or finds some way to produce an
iterator without making a copy (if stuff is a built-in sequence type).
Or would it be more efficient to do the clumsier (in my opinion):

for i in xrange(x, y):
    whatever(stuff[i])

James


itertools.islice does what you want

import itertools
for something in itertools.islice(stuff, x, y):
    whatever(something)


Thanks buffi!

So I guess the interpreter does no optimization in the latter?

James
 

Calvin Spealman

This is a case where it's up to the type involved. For example,
xrange() slices the way you want but range() does not. Maybe a type
would return a proxy object for slices that fetched values by index, or
maybe it knows that it makes more sense to give you a copy because
changes made during the iteration should not be reflected in the
iteration. It would be really easy to make a generic slicer.
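
A minimal sketch of such a slicer, assuming only that seq supports len()
and indexing:

def iterslice(seq, start, stop):
    """Yield seq[start:stop] one element at a time, without copying."""
    for i in xrange(start, min(stop, len(seq))):
        yield seq[i]

Note that, unlike a real slice, this sees any changes made to seq while
you iterate.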

Hello all,

I was staring at a segment of code that looked like this today:

for something in stuff[x:y]:
    whatever(something)

and was wondering if the compiler really makes a copy of the slice of
stuff, as the code seems to suggest, or finds some way to produce an
iterator without making a copy (if stuff is a built-in sequence type).
Or would it be more efficient to do the clumsier (in my opinion):

for i in xrange(x, y):
    whatever(stuff[i])

James
 

buffi

This is a case where it's up to the type involved. For example,
xrange() slices the way you want but range() does not.

Could you explain this?
As far as I know you can't slice an xrange

- Björn Kempén
 

Marc 'BlackJack' Rintsch

I was staring at a segment of code that looked like this today:

for something in stuff[x:y]:
    whatever(something)

and was wondering if the compiler really makes a copy of the slice of
stuff, as the code seems to suggest, or finds some way to produce an
iterator without making a copy (if stuff is a built-in sequence type)?

The compiler can't "optimize" this as it would change the semantics.
There's no way for the compiler to tell if this copy really is "needless".
`whatever()` may change `stuff` at indices within `x:y`, and this may lead
to different results depending on whether it iterates over a copy or the
original.

Ciao,
Marc 'BlackJack' Rintsch
 

Steven D'Aprano

I was staring at a segment of code that looked like this today:

for something in stuff[x:y]:
    whatever(something)

and was wondering if the compiler really makes a copy of the slice of
stuff, as the code seems to suggest, or finds some way to produce an
iterator without making a copy (if stuff is a built-in sequence type)?

The compiler can't "optimize" this as it would change the semantics.
There's no way for the compiler to tell if this copy really is
"needless". `whatever()` may change `stuff` at indices within `x:y`, and
this may lead to different results depending on whether it iterates over
a copy or the original.


In *general* the compiler can't tell, but in specific cases it could. A
(hypothetical) optimizing compiler would tell the difference between:


for item in alist[1:5]:
    print item                    # no possible side-effects

for item in alist[1:5]:
    function(alist, item)         # there might be side-effects

for item in alist[1:5]:
    alist.append(item)            # side-effects DON'T matter

for i, item in enumerate(alist[1:5]):
    alist[i+1] = function(item)   # side-effects DO matter


Of course this is all beside the point, since no such optimizing
compiler exists today, at least not for CPython. Any volunteers?
 

Ben Finney

Steven D'Aprano said:
In *general* the compiler can't tell, but in specific cases it
could. A (hypothetical) optimizing compiler would tell the
difference between:

for item in alist[1:5]:
    print item # no possible side-effects

The 'print' statement converts the 'item' to a str. That conversion
could, in a pathological case, have a side-effect (say, if the class
of 'item' had an overridden '__str__' method with side effects), and
the compiler can't know this isn't a pathological case.
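
A contrived sketch of such a pathological case (the names here are made
up purely for illustration):

alist = []

class Chatty(object):
    """An object whose str() conversion has a side effect."""
    def __str__(self):
        alist.append("surprise")  # mutates the very list being iterated over
        return "chatty"

alist.extend(Chatty() for _ in range(6))
for item in alist[1:5]:
    print item                    # each print grows alist behind our back
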
for item in alist[1:5]:
    alist.append(item) # side-effects DON'T matter

The compiler doesn't know that, at the time of invocation,
'alist.append' doesn't have side effects that matter.

The compiler for a dynamic language like Python can give very few
absolute "no significant side effect in this specific case" guarantees
of the kind you're assuming, even in the cases you chose to contrast
with the ones that *do* have significant side effects.
 

Marc 'BlackJack' Rintsch

Steven D'Aprano wrote:

In *general* the compiler can't tell, but in specific cases it could. A
(hypothetical) optimizing compiler would tell the difference between:


for item in alist[1:5]:
    print item # no possible side-effects

To expand on what Ben said: After conversion to `str` the result is then
given to `sys.stdout.write()` and of course `sys.stdout` could be another
object that changes `alist`.
for item in alist[1:5]:
    alist.append(item) # side-effects DON'T matter

Side effects do matter here, even in not-so-pathological cases. There
are cases where you get a different result depending on whether you copy
or not, even with plain vanilla `list`s:

In [152]: from itertools import islice

In [153]: alist = [1, 2, 3]

In [154]: for item in alist[1:5]:
   .....:     alist.append(item)
   .....:

In [155]: alist
Out[155]: [1, 2, 3, 2, 3]

In [156]: alist = [1, 2, 3]

In [157]: for item in islice(alist, 1, 5):
   .....:     alist.append(item)
   .....:

In [158]: alist
Out[158]: [1, 2, 3, 2, 3, 2, 3]

Ciao,
Marc 'BlackJack' Rintsch
 

Steven D'Aprano

Steven D'Aprano said:
In *general* the compiler can't tell, but in specific cases it could. A
(hypothetical) optimizing compiler would tell the difference between:

for item in alist[1:5]:
    print item # no possible side-effects

The 'print' statement converts the 'item' to a str. That conversion
could, in a pathological case, have a side-effect (say, if the class of
'item' had an overridden '__str__' method with side effects), and the
compiler can't know this isn't a pathological case.

Fair enough... but I'm reminded of a rant by Joel Spolsky about GUI
design:

'...programmers have tended to think that if users are allowed to resize
and move windows, they should have complete flexibility over where these
windows go, right down to the last pixel. After all, positioning a window
2 pixels from the top of the screen is "equally likely" as positioning a
window exactly at the top of the screen.'

http://www.joelonsoftware.com/uibook/fog0000000249.html

(three quarters of the way down, in Chapter 7.)

Maybe I'm being unfair, but it seems to me that the attitude is similar:
'there's no point optimizing the common case of printing (say) ints
stored in a list, Just In Case the programmer wants the incredibly rare
case of setting sys.stdout to some wacky object that modifies the list
he's iterating over. It could happen.'

*shrug* But apart from the occasional smart-alec who does it just to
demonstrate that it is possible, it probably never has.

It seems to me that the "consenting adults" philosophy of Python doesn't
extend to the compiler, and perhaps it should. Maybe Python could
optimize common cases, and if developers wanted to do wacky things, let
them turn optimization off on a module-by-module basis.

Or even function-by-function. Wouldn't it be nice to have decorators that
could optimize the functions they decorated?

for item in alist[1:5]:
    alist.append(item) # side-effects DON'T matter

The compiler doesn't know that, at the time of invocation,
'alist.append' doesn't have side effects that matter.

It might if it knows what type alist is, and how long it is (as Marc
Rintsch points out). That's why I used "alist" in my example rather than
"an_arbitrary_sequence".

The compiler for a dynamic language like Python can give very few absolute
"no significant side effect in this specific case" guarantees of the kind
you're assuming, even in the cases you chose to contrast with the ones
that *do* have significant side effects.

Yes, in general one might not be able to make those guarantees, but still
there are common cases where you can. That's how Psyco works.
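
Psyco, for instance, already lets you opt in per function at run time. A
minimal sketch, assuming Psyco is installed (sum_slice is just a toy
example):

import psyco

def sum_slice(alist, x, y):
    # A tight loop over built-in ints -- the sort of code Psyco specializes.
    total = 0
    for item in alist[x:y]:
        total += item
    return total

psyco.bind(sum_slice)  # only this function gets the specialized treatment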

The point is rather moot, since CPython (and probably other Pythons) do
almost no optimizations. But just because Python is a dynamic language
doesn't mean there are no optimizations possible: Haskell is a dynamic
language, and there are optimizing compilers for it. Of course, it is
much simpler for Haskell, because of the type system it uses.

The CPython compiler already does a couple of source-code optimizations
(ignores some unused strings; some constant folding) and at least one
runtime optimization (string concatenation is no longer _always_ slow):

http://utcc.utoronto.ca/~cks/space/blog/python/ExaminingStringConcatOpt
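
The concatenation case is the pattern below; whether the in-place fast
path actually kicks in depends on the CPython version and on the string
having no other references (a sketch for illustration):

parts = ["spam"] * 1000
s = ""
for part in parts:
    s += part  # CPython 2.4+ may grow s in place here instead of copying it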

Because of its very nature, optimizing Python is hard. Even something as
simple as tail-call recursion optimization is verboten, because Guido
considers good stack traces more important than preventing stack
overflows in the first place:

http://www.tratt.net/laurie/tech_articles/articles/tail_call_optimization

Anyway, I'm not really complaining. I like Python, I'm grateful for the
work the Python-dev people have already done, and I'm in no position to
volunteer to write an optimizing compiler. And it may be that, regardless
of how smart you make the compiler, Python simply can't be optimized
because of design decisions made back in 1990-whatever when Guido first
started his grand plan.
 

Marc 'BlackJack' Rintsch

The point is rather moot, since CPython (and probably other Pythons) do
almost no optimizations. But just because Python is a dynamic language
doesn't mean there are no optimizations possible: Haskell is a dynamic
language, and there are optimizing compilers for it. Of course, it is
much simpler for Haskell, because of the type system it uses.

What do you mean by Haskell being a dynamic language? It is statically
and strictly typed, and the compiler usually knows all the functions. No
"surprises", no side effects, no duck typing.

Ciao,
Marc 'BlackJack' Rintsch
 

Kay Schluehr

Haskell is a dynamic language, and there are optimizing compilers for it.

Last time I used Haskell it was statically typed, but that was almost a
year ago and times are changing fast.

About optimizing compilers for Python: these won't work without
restrictions, a la Shedskin, which uses C++ as its backend. But unlike
Shedskin, and the design mistake made in that project, we can learn the
lesson from Psyco that there must be a fallback to bytecode
interpretation when some argument does not fit the signature of a
specialized function. I'm personally a bit skeptical about JIT
compilation, since I feel it's a waste of electricity to record all that
information at runtime and perform native compilation again and again.
That doesn't mean JIT compilation can't be effective; I just have a
conceptual bias against it.

Instead I expect rather stable behaviour and rather fixed nominal types,
despite the fact that Python supports duck typing. Unfortunately I'm
unable to track down an investigation into the effectiveness of adding
tests to an already well-equipped test base. The result was one of
"diminishing returns": stable behaviour emerged early with good coverage
properties, and then one had to work hard to improve further and find
more bugs. A similar effect might also account for the effectiveness of
dynamic typing at catching errors, which is not easy to justify by
purely theoretical arguments.

-----

Note that I have worked out some pieces of a type feedback mechanism that
records types and displays the results in a slightly modified Python
dialect called TPython. A TPython program can be regarded as a "typed
hypothesis" about an otherwise untyped (or dynamically typed) Python
program. You can keep the TPython program and translate it, completely or
in fragments, to other languages like C++, D, Java, OCaml etc.
Obviously this approach is meant for incremental progress and smooth
integration, and it might work even when only small portions can be
translated (typically functions that take and return only integers/floats
bounded in range are a good start). So far everything is implemented in
pure Python and the amount of source is < 1000 LOC.

I have not published any of it yet, and I'm a rather slow publisher
anyway. If anyone is interested in this kind of work and willing to
contribute, I might split it out from the usual EasyExtend stuff (which
is the foundation nevertheless) I'm working on and reframe the project
context, e.g. on SourceForge or elsewhere. Email address and project page
are still

(e-mail address removed)
www.fiber-space.de

If no one contributes, the fun is exclusively on my side and everything
moves more slowly. That's all.
 

Steven D'Aprano

What do you mean by Haskell being a dynamic language? It is statically
and strictly typed, and the compiler usually knows all the functions. No
"surprises", no side effects, no duck typing.

Haskell's IO monad (and possibly the do monad?) allows side effects. It
would be a pretty poor programming language that didn't allow input or
output!

See also http://homepages.cwi.nl/~ralf/OOHaskell/
showing that Haskell can do object-oriented programming, complete with
mutable objects and side-effects. Although "duck typing" is listed as a
keyword, I couldn't see any reference to it in the paper.

Haskell uses type inference, and has a "maybe" type for those cases where
it can't tell what the type will be.

If you still don't accept that Haskell is a dynamic language, for
whatever definition of dynamic language you use, I'll withdraw the claim
for the sake of not getting bogged down in long arguments over something
that was really just a minor point.
 

Marc 'BlackJack' Rintsch

Haskell's IO monad (and possibly the do monad?) allows side effects. It
would be a pretty poor programming language that didn't allow input or
output!

Those side effects stay in the IO monad and don't "leak" into other
areas, so the compiler still knows much more about the possible data flow
than the compilers of most other languages.

Haskell uses type inference, and has a "maybe" type for those cases where
it can't tell what the type will be.

Haskell is statically typed: the compiler must know the type of every
name, otherwise it stops with an error message. The `Maybe` type is for
functions that may return something (typed) or `Nothing`. For example a
`lookup` function can have the signature:

lookup :: Eq a => a -> [(a,b)] -> Maybe b
lookup key assocList = ...

The compiler infers the concrete types of `a` and `b` at compile time.

If you still don't accept that Haskell is a dynamic language, for
whatever definition of dynamic language you use, I'll withdraw the claim
for the sake of not getting bogged down in long arguments over something
that was really just a minor point.

You said it must be possible to optimize a dynamic language because there
are optimizing compilers for a dynamic language like Haskell. That's a
minor point now?

The main problem with optimizing Python code is that the language is
dynamically typed -- the types of objects bound to names are only checked
at runtime and can change at nearly every point in time while executing.
Haskell is statically typed; the types are all inferred at compile time,
so it's possible to optimize the code for those types. The compiler knows
the exact type of every name. By what definition is that a dynamic
language!?
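
To illustrate the point in two lines -- nothing stops the type bound to a
name from changing while the program runs:

x = 3          # here x is bound to an int
x = "three"    # ...and now to a str; only at run time can you tell which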

Ciao,
Marc 'BlackJack' Rintsch
 

Steve Holden

Steven said:
Steven D'Aprano said:
In *general* the compiler can't tell, but in specific cases it could. A
(hypothetical) optimizing compiler would tell the difference between:

for item in alist[1:5]:
    print item # no possible side-effects
The 'print' statement converts the 'item' to a str. That conversion
could, in a pathological case, have a side-effect (say, if the class of
'item' had an overridden '__str__' method with side effects), and the
compiler can't know this isn't a pathological case.

Fair enough... but I'm reminded of a rant by Joel Spolsky about GUI
design:

'...programmers have tended to think that if users are allowed to resize
and move windows, they should have complete flexibility over where these
windows go, right down to the last pixel. After all, positioning a window
2 pixels from the top of the screen is "equally likely" as positioning a
window exactly at the top of the screen.'

http://www.joelonsoftware.com/uibook/fog0000000249.html

(three quarters of the way down, in Chapter 7.)

Maybe I'm being unfair, but it seems to me that the attitude is similar:
'there's no point optimizing the common case of printing (say) ints
stored in a list, Just In Case the programmer wants the incredibly rare
case of setting sys.stdout to some wacky object that modifies the list
he's iterating over. It could happen.'

*shrug* But apart from the occasional smart-alec who does it just to
demonstrate that it is possible, it probably never has.

It seems to me that the "consenting adults" philosophy of Python doesn't
extend to the compiler, and perhaps it should. Maybe Python could
optimize common cases, and if developers wanted to do wacky things, let
them turn optimization off on a module-by-module basis.

Or even function-by-function. Wouldn't it be nice to have decorators that
could optimize the functions they decorated?

No, it would be disastrous, unless you could manage to implement a
mechanism that *told* you when the optimizations were playing havoc with
your program's execution.

The problem is that debugging only works if you can assume a
deterministic environment. If you are going to put funky optimizations
in that break determinism in "little-used" corner cases, then debugging
the situations when the optimizations *do* affect program execution will
be a complete pain.
for item in alist[1:5]:
    alist.append(item) # side-effects DON'T matter
The compiler doesn't know that, at the time of invocation,
'alist.append' doesn't have side effects that matter.

It might if it knows what type alist is, and how long it is (as Marc
Rintsch points out). That's why I used "alist" in my example rather than
"an_arbitrary_sequence".

But alist doesn't *have* a type, and static program analysis would have
to be extensive to determine that it could only ever be of a specific type.
Yes, in general one might not be able to make those guarantees, but still
there are common cases where you can. That's how Psyco works.

Yes, but Psyco doesn't use static program analysis but instead uses
run-time environment examination to determine whether specific
optimizations can be applied.
The point is rather moot, since CPython (and probably other Pythons) do
almost no optimizations. But just because Python is a dynamic language
doesn't mean there are no optimizations possible: Haskell is a dynamic
language, and there are optimizing compilers for it. Of course, it is
much simpler for Haskell, because of the type system it uses.

The CPython compiler already does a couple of source-code optimizations
(ignores some unused strings; some constant folding) and at least one
runtime optimization (string concatenation is no longer _always_ slow):

http://utcc.utoronto.ca/~cks/space/blog/python/ExaminingStringConcatOpt

Because of its very nature, optimizing Python is hard. Even something as
simple as tail-call recursion optimization is verboten, because Guido
considers good stack traces more important than preventing stack
overflows in the first place:

http://www.tratt.net/laurie/tech_articles/articles/tail_call_optimization

And who's to say he's wrong, especially since Python is intended to be
an environment that's friendly to beginners.
Anyway, I'm not really complaining. I like Python, I'm grateful for the
work the Python-dev people have already done, and I'm in no position to
volunteer to write an optimizing compiler. And it may be that, regardless
of how smart you make the compiler, Python simply can't be optimized
because of design decisions made back in 1990-whatever when Guido first
started his grand plan.

There is some of that. The environment is so dynamic that even something
as innocuous as attribute lookup can actually involve properties (for
example) reading a relational database! I'm not saying that optimization
under these circumstances is impossible, but that it's much more difficult
than is easily anticipated, and that the benefits to be gained might be
less than you would think.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline
 

Paul Rubin

Steven D'Aprano said:
Haskell's IO monad (and possibly the do monad?) allows side effects. It
would be a pretty poor programming language that didn't allow input or
output!

In a certain theoretical sense, input and output in the IO monad are
not side effects. Having side effects means you can somehow distinguish
between calling f(x) twice with the same value of x, and calling it
just once. In Haskell, the print function performs an output operation
(called an "action"), but if I understand it right, it turns out you
cannot call the print function twice with the same argument. Of
course you can say

main = do { print "hello"; print "hello" }

and that prints "hello" twice, but the do statement is syntactic sugar
that turns the above into something approximately like (the exact
mechanism is a bit hairier):
x = (special token representing the state of the external world)
y = print ("hello", x) # y is the new external world state
z = print ("hello", y) # z is another new state
so print receives a different arg each time. (Note that every Haskell
function has exactly one argument, so f(x,y) means call f, giving it
the tuple (x,y) as an argument).
See also http://homepages.cwi.nl/~ralf/OOHaskell/
showing that Haskell can do object-oriented programming, complete with
mutable objects and side-effects. Although "duck typing" is listed as a
keyword, I couldn't see any reference to it in the paper.

That describes an OO extension to Haskell, which I suppose is
interesting, but it's not Haskell. You could also bolt a Python
interpreter into Haskell and that would not be Haskell either.
Haskell uses type inference, and has a "maybe" type for those cases where
it can't tell what the type will be.

Type inference just means the static type system doesn't rely on
declarations. Like when you say 3+4 in C++ or Java, the compiler
figures out that it is an integer expression, but if you say

x = 3 + 4

without an "int x" declaration, the compiler complains. Haskell goes
a little further and lets you say

x = 3 + 4

and the compiler says "ok, now I know that x is an integer". If you
later try to concatenate x with a string:

y = x ++ "hello"

you get a compile-time error message. In a dynamic language like
Python you would not get the error message until runtime. That is the
difference between a static and a dynamic language.
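
The same thing in Python terms (a sketch):

x = 3 + 4          # fine, x is an int
y = x + "hello"    # no complaint up front; TypeError only when this runs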

The Maybe monad is still a static type. You can think of it as a
statically typed list with a maximum of one element, i.e. a Maybe Int
can have values like [3] (denoted "Just 3") or like [] (the empty
list, denoted Nothing). It cannot have a value like "foo" or
[3.14159] or [[7]] or [2,3,4]. You get compile time error messages
if you try to assign those values to it. Of course you can have a
Maybe Float (like [3.14159]) or a Maybe (Maybe Int) (like [[7]]).
If you still don't accept that Haskell is a dynamic language, for
whatever definition of dynamic language you use, I'll withdraw the claim
for the sake of not getting bogged down in long arguments over something
that was really just a minor point.

Haskell is certainly not a dynamic language in the usual sense of that
term.
 

Steven D'Aprano

In a certain theoretical sense, input and output in the IO monad are not
side effects.

Nevertheless, the authors of the paper listed below say they use the IO
monad to get mutable objects and therefore side effects.

That describes an OO extension to Haskell, which I suppose is
interesting, but it's not Haskell.

It is built entirely from standard Haskell 98. Saying that it isn't
Haskell is a little like saying that Python plus the sets module isn't
Python any more.
 

Steven D'Aprano

No, it would be disastrous,

Disastrous, as in thousands dead, panic in the streets, martial law
declared? I'm sure your program is important, but is it really *that*
important? :)

unless you could manage to implement a
mechanism that *told* you when the optimizations were playing havoc with
your program's execution.

How about "the test suite passes when I turn optimization off, but fails
when I turn optimization on"? How is that any different from "the test
suite passes when I use StringIO, but not if I use cStringIO"?


The problem is that debugging only works if you can assume a
deterministic environment. If you are going to put funky optimizations
in that break determinism in "little-used" corner cases, then debugging
the situations when the optimizations *do* affect program execution will
be a complete pain.

Right, so we're back to the "any pixel is as likely as any other pixel"
example from Joel On Software I mentioned earlier. To prevent some
hypothetical developer doing some wacky thing from having to work harder
to debug some weird corner case, *everybody else* has to miss out on
optimizations that could lead to significantly better performance.

You might think that's a good trade-off. I don't.

for item in alist[1:5]:
    alist.append(item) # side-effects DON'T matter
The compiler doesn't know that, at the time of invocation,
'alist.append' doesn't have side effects that matter.

It might if it knows what type alist is, and how long it is (as Marc
Rintsch points out). That's why I used "alist" in my example rather
than "an_arbitrary_sequence".
But alist doesn't *have* a type, and static program analysis would have
to be extensive to determine that it could only ever be of a specific
type.

Of course alist has a type. Python is a strongly-typed language, not
untyped.

And I don't believe I specifically mentioned that static program analysis
was the only optimization possible. But even if it were...

I can *hand optimize* a function that I write, yes? I might *choose* to
say to myself, "Screw duck typing, in *my* program function foo will only
ever be called with an honest-to-god built-in list argument, I'm going to
optimize it for that case". We're all consenting adults here, right?

Well, what's the difference between me hand optimizing the function, and
calling a built-in decorator that does it for me? Why is it okay for me
to shoot myself in the foot, but not for me to tell the compiler to shoot
me in the foot?

And who's to say he's wrong, especially since Python is intended to be
an environment that's friendly to beginners.

I'm not saying he's *wrong*, I'm saying he's making trade-offs I would
like to see relaxed.

But in the specific case of recursion... have you seen the exception you
get when the recursion limit is hit? By default, you get 1000 lines of '
File "foo", line 2, in function_name' followed by "RuntimeError: maximum
recursion depth exceeded". Maybe I'm missing something there, but I don't
see how that's much of a help in debugging.

There is some of that. The environment is so dynamic that even something
as innocuous as attribute lookup can actually involve properties (for
example) reading a relational database! I'm not saying that optimization
under these circumstances is impossible, but that it's much more difficult
than is easily anticipated, and that the benefits to be gained might be
less than you would think.

I'm mostly objecting to people who say that Python's compiler "can't" be
optimized, which is not true. As I mentioned, there's already a peephole
optimizer. There are runtime optimizations possible. Psyco could become
a built-in, if the right people wanted it to become a built-in. Etc. With
a few million in funding, Python could become as heavily optimized as
Common Lisp.

Of course, whether that would still be the Python we know *now* is
another question. But then, with decorators and generator expressions and
iterators, Python 2.6 is not precisely the same as the Python we knew
back in 1997, is it?
 
