Python 3000 idea -- + on iterables -> itertools.chain

John Reese

It seems like it would be clear and mostly backwards compatible if the
+ operator on any iterables created a new iterable that iterated
through first its left operand and then its right, in the style of
itertools.chain. This would allow summation of generator expressions,
among other things, to have the obvious meaning.
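
To make the proposal concrete, here is a minimal sketch (written for a current Python 3 interpreter, which is an assumption; the thread predates it). The names evens, odds and combined are illustrative only. It shows that generator expressions do not support + today, and what the proposed + would be shorthand for:

```python
from itertools import chain

evens = (n for n in range(0, 6, 2))   # yields 0, 2, 4
odds = (n for n in range(1, 6, 2))    # yields 1, 3, 5

# Generator objects do not define +, so this raises TypeError today.
try:
    evens + odds
    plus_supported = True
except TypeError:
    plus_supported = False

# The proposed meaning of "evens + odds", spelled out explicitly.
combined = list(chain(evens, odds))
print(plus_supported, combined)
```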

Any thoughts? Has this been discussed before? I didn't see it
mentioned in PEP 3100.

The exception to the compatibility argument is, of course, iterables for
which + is already defined, like tuples and lists: some code assumes that
the result is of that same type, either explicitly or implicitly by
calling len, indexing, and so on. In those cases, you could call tuple or
list on the result. Plenty of other things in Python 3000 are switching
from lists to one-at-a-time iterators, like dict.items(), so presumably
this form of incompatibility isn't a showstopper.
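
A small sketch of that workaround, assuming itertools.chain as a stand-in for the proposed lazy + (the names left, right and lazy are made up for the example). The chained result has no length and cannot be indexed, so code that needs a real tuple would have to wrap it:

```python
from itertools import chain

left = (1, 2)
right = (3, 4)

# Stand-in for what "left + right" would return under the proposal.
lazy = chain(left, right)

# A chain object is a one-shot iterator: it does not support len().
try:
    len(lazy)
    has_len = True
except TypeError:
    has_len = False

# Code that relies on tuple behaviour converts the result explicitly.
as_tuple = tuple(lazy)
print(has_len, as_tuple, as_tuple[2])
```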
 
Fredrik Lundh

John said:
It seems like it would be clear and mostly backwards compatible if the
+ operator on any iterables created a new iterable that iterated
through first its left operand and then its right, in the style of
itertools.chain.

you do know that "iterable" is an informal interface, right? to what
class would you add this operation?

</F>
 
George Sakkis

Fredrik said:
you do know that "iterable" is an informal interface, right? to what
class would you add this operation?

</F>

The base object class would be one candidate, similarly to the way
__nonzero__ is defined to use __len__, or __contains__ to use __iter__.
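
The fallbacks mentioned here can be seen in a short sketch. It assumes a Python 3 interpreter, where __nonzero__ is spelled __bool__; the Bag class is hypothetical and defines neither __bool__ nor __contains__:

```python
class Bag:
    """Hypothetical container that defines only __len__ and __iter__."""

    def __init__(self, *items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


# Truth testing falls back to __len__ when __bool__ is absent.
empty_is_false = not Bag()
full_is_true = bool(Bag(1, 2))

# The "in" operator falls back to iterating when __contains__ is absent.
found = 2 in Bag(1, 2)
missing = 5 in Bag(1, 2)
print(empty_is_false, full_is_true, found, missing)
```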

Alternatively, iter() could be a wrapper type (or perhaps mixin)
instead of a function, something like:

from itertools import chain, tee, islice

import __builtin__
_builtin_iter = __builtin__.iter

class iter(object):

    def __init__(self, iterable):
        self._it = _builtin_iter(iterable)

    def __iter__(self):
        return self
    def next(self):
        return self._it.next()

    def __getitem__(self, index):
        if isinstance(index, int):
            try: return islice(self._it, index, index+1).next()
            except StopIteration:
                raise IndexError('Index %d out of range' % index)
        else:
            start,stop,step = index.start, index.stop, index.step
            if start is None: start = 0
            if step is None: step = 1
            return islice(self._it, start, stop, step)

    def __add__(self, other):
        return chain(self._it, other)
    def __radd__(self,other):
        return chain(other, self._it)

    def __mul__(self, num):
        return chain(*tee(self._it,num))

    __rmul__ = __mul__

__builtin__.iter = iter


if __name__ == '__main__':
    def irange(*args):
        return iter(xrange(*args))

    assert list(irange(5)[:3]) == range(5)[:3]
    assert list(irange(5)[3:]) == range(5)[3:]
    assert list(irange(5)[1:3]) == range(5)[1:3]
    assert list(irange(5)[3:1]) == range(5)[3:1]
    assert list(irange(5)[:]) == range(5)[:]
    assert irange(5)[3] == range(5)[3]

    s = range(5) + range(7,9)
    assert list(irange(5) + irange(7,9)) == s
    assert list(irange(5) + range(7,9)) == s
    assert list(range(5) + irange(7,9)) == s

    s = range(5) * 3
    assert list(irange(5) * 3) == s
    assert list(3 * irange(5)) == s
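
The listing above targets Python 2 (__builtin__, xrange, the next() method). For readers on Python 3, a rough port might look like the sketch below. It is not the code from the post: the class is renamed Iter (a hypothetical name) so the builtin iter is left alone, and + and * return Iter objects rather than bare chain objects so that results can be combined again.

```python
from itertools import chain, islice, tee


class Iter:
    """Hypothetical iterator wrapper supporting +, * and [] (sketch only)."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __getitem__(self, index):
        # Indexing and slicing consume the underlying iterator.
        if isinstance(index, int):
            try:
                return next(islice(self._it, index, index + 1))
            except StopIteration:
                raise IndexError('Index %d out of range' % index) from None
        start = 0 if index.start is None else index.start
        step = 1 if index.step is None else index.step
        return islice(self._it, start, index.stop, step)

    def __add__(self, other):
        return Iter(chain(self._it, other))

    def __radd__(self, other):
        return Iter(chain(other, self._it))

    def __mul__(self, num):
        # tee() buffers items so the sequence can be replayed num times.
        return Iter(chain(*tee(self._it, num)))

    __rmul__ = __mul__


def irange(*args):
    return Iter(range(*args))


first_three = list(irange(5)[:3])
joined = list(irange(5) + range(7, 9))
rjoined = list(list(range(5)) + irange(7, 9))
tripled = list(3 * irange(2))
print(first_three, joined, rjoined, tripled)
```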


George
 
Fredrik Lundh

George said:
The base object class would be one candidate, similarly to the way
__nonzero__ is defined to use __len__, or __contains__ to use __iter__.

Alternatively, iter() could be a wrapper type (or perhaps mixin)
instead of a function, something like:

so you're proposing to either make *all* objects respond to "+", or
introduce limited *iterator* algebra.

not sure how that matches the OP's wish for "mostly backwards
compatible" support for *iterable* algebra, really...

(iirc, GvR has shot down a few earlier "let's provide sugar for
itertools" proposals. no time to dig up the links right now, but it's in
the python-dev archives, somewhere...)

</F>
 
Georg Brandl

George said:
The base object class would be one candidate, similarly to the way
__nonzero__ is defined to use __len__, or __contains__ to use __iter__.

What has a better chance of success in my eyes is an extension to yield
all items from an iterable without using an explicit for loop: instead of

for item in iterable:
    yield item

you could write

yield from iterable

or

yield *iterable

etc.
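
For readers on a later interpreter: the first spelling was eventually adopted through PEP 380 and is available from Python 3.3 onward. The sketch below compares the explicit loop with that form; the function names and sample data are made up for the example.

```python
def flatten_with_loop(iterables):
    # The explicit spelling: re-yield every item by hand.
    for iterable in iterables:
        for item in iterable:
            yield item


def flatten_with_yield_from(iterables):
    # The delegating spelling (Python 3.3+).
    for iterable in iterables:
        yield from iterable


data = [[1, 2], (3,), "ab"]
looped = list(flatten_with_loop(data))
delegated = list(flatten_with_yield_from(data))
print(looped, delegated)
```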

Georg
 
George Sakkis

Fredrik said:
so you're proposing to either make *all* objects respond to "+", or
introduce limited *iterator* algebra.

If 'respond to "+"' means that you can get a "TypeError: iterable
argument required", as you get now when attempting "x in y" for a
non-iterable y, why not? Although I like the iterator algebra idea
better.

Fredrik said:
not sure how that matches the OP's wish for "mostly backwards
compatible" support for *iterable* algebra, really...

Given the subject of the thread, backwards compatibility is not the
main prerequisite. Besides, it's an *extension* idea; allow operations
that were not allowed before, not the other way around or modifying
existing semantics. Of course, programs that attempt forbidden
expressions on purpose so that they can catch and handle the exception
would break when suddenly no exception is raised, but I doubt there are
many of those...

George
 
Carl Banks

Georg said:
What has a better chance of success in my eyes is an extension to yield
all items from an iterable without using an explicit for loop: instead of

for item in iterable:
    yield item

you could write

yield from iterable

or

yield *iterable

Since this is nothing but an alternate way to spell a very specific
(and not-too-common) for loop, I expect this has zero chance of
success.


Carl Banks
 
Carl Banks

George said:
If by 'respond to "+"' is implied that you can get a "TypeError:
iterable argument required", as you get now for attempting "x in y" for
non-iterable y, why not ?

Bad idea on many, many levels. Don't go there.

George said:
Although I like the iterator algebra idea
better.

Given the subject of the thread, backwards compatibility is not the
main prerequisite. Besides, it's an *extension* idea; allow operations
that were not allowed before, not the other way around or modifying
existing semantics.

You missed the important word (in spite of Fredrik's emphasis):
iterable. Your iter class solution only works for *iterators* (and not
even all iterators); the OP wanted it to work for any *iterable*.

"Iterator" and "iterable" are protocols. The only way to implement
what the OP wanted is to change the iterable protocol, which means changing
the documentation to say that iterable objects must implement __add__
and that it must chain the iterables, and updating all iterable types
to do this. Besides the large amount of work that this will need,
there are other problems.

1. It increases the burden on third party iterable developers.
Protocols should be kept as simple as possible for this reason.
2. Many iterable types already implement __add__ (list, tuple, string),
so this new requirement would complicate these guys a lot.

George said:
Of course, programs that attempt forbidden
expressions on purpose so that they can catch and handle the exception
would break when suddenly no exception is raised, but I doubt there are
many of those...

3. While not breaking backwards compatibility in the strictest sense,
the adverse effect on incorrect code shouldn't be brushed aside. It
would be a bad thing if this incorrect code:

a = ["hello"]
b = "world"
a+b

suddenly started failing silently instead of raising an exception.
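
A sketch of the difference, assuming Python 3 and using itertools.chain to stand in for the proposed behaviour of +. Today the mistake is caught at once; with chaining, the string would be silently split into characters:

```python
from itertools import chain

a = ["hello"]
b = "world"

# Current behaviour: list + str is rejected with TypeError.
try:
    a + b
    raised = False
except TypeError:
    raised = True

# What a chaining "+" would produce instead of an error.
chained = list(chain(a, b))
print(raised, chained)
```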


Carl Banks
 
George Sakkis

Carl said:
Bad idea on many, many levels. Don't go there.

Do you also find the way "in" works today a bad idea?

Carl said:
You missed the important word (in spite of Fredrik's emphasis):
iterable. Your iter class solution only works for *iterators* (and not
even all iterators); the OP wanted it to work for any *iterable*.

I didn't miss the important word; I know the distinction between
iterables and iterators. That's why I said I like the iterator algebra
idea better (compared to extending the object class, which would
effectively create an iterable algebra).

Carl said:
"Iterator" and "iterable" are protocols. The only way to implement
what the OP wanted is to change iterable protocol, which means changing
the documentation to say that iterable objects must implement __add__
and that it must chain the iterables, and updating all iterable types
to do this. Besides the large amount of work that this will need,
there are other problems.

1. It increases the burden on third party iterable developers.
Protocols should be kept as simple as possible for this reason.
2. Many iterable types already implement __add__ (list, tuple, string),
so this new requirement would complicate these guys a lot.

If __add__ was ever to be part of the *iterable* protocol, it would be
silly to implement it for every new iterable type; the implementation
would always be the same (i.e. chain(self,other)), so it should be put
in a base class all iterables extend from. That would be either a
mixin class, or object. This is parallel to how __contains__ is part of
the sequence protocol, but if you (the 3rd party sequence developer)
don't define one, a default __contains__ that relies on __getitem__ is
created for you.
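
Both halves of that argument can be sketched in a few lines (Python 3 assumed). Squares is a hypothetical sequence that defines only __getitem__ yet still supports "in"; ChainableMixin is a hypothetical base class carrying the single shared __add__ described above. Neither exists in the standard library.

```python
from itertools import chain


class Squares:
    """Hypothetical sequence: only __getitem__, IndexError marks the end."""

    def __init__(self, n):
        self._n = n

    def __getitem__(self, i):
        if not 0 <= i < self._n:
            raise IndexError(i)
        return i * i


class ChainableMixin:
    """Hypothetical mixin: one __add__ shared by every iterable using it."""

    def __add__(self, other):
        return chain(self, other)

    def __radd__(self, other):
        return chain(other, self)


class Countdown(ChainableMixin):
    def __init__(self, start):
        self._start = start

    def __iter__(self):
        return iter(range(self._start, 0, -1))


# "in" works without __contains__ by indexing from 0 until IndexError.
has_nine = 9 in Squares(5)
has_ten = 10 in Squares(5)

# The mixin supplies chaining without per-class code.
combined = list(Countdown(3) + Countdown(2))
mixed = list([0] + Countdown(2))
print(has_nine, has_ten, combined, mixed)
```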

George said:
Of course, programs that attempt forbidden
expressions on purpose so that they can catch and handle the exception
would break when suddenly no exception is raised, but I doubt there are
many of those...

Carl said:
3. While not breaking backwards compatibility in the strictest sense,
the adverse effect on incorrect code shouldn't be brushed aside. It
would be a bad thing if this incorrect code:

a = ["hello"]
b = "world"
a+b

suddenly started failing silently instead of raising an exception.

That's a good example for why I prefer an iterator rather than an
iterable algebra; the latter is too implicit as "a + b" doesn't call
only __add__, but __iter__ as well. On the other hand, with a concrete
iterator type "iter(a) + iter(b)" is not any more error-prone than
'int(3) + int("2")' or 'str(3) + str("2")'.

What's the objection to an *iterator* base type and the algebra it
introduces explicitly?

George
 
Georg Brandl

Carl said:
Since this is nothing but an alternate way to spell a very specific
(and not-too-common) for loop, I expect this has zero chance of
success.

well, it could also be optimized internally, i.e. with a new opcode.

Georg
 
Carl Banks

George said:
Do you also find the way "in" works today a bad idea ?

Augh. I don't like it much, but (assuming that there are good use
cases for testing containment in iterables that don't define
__contains__) it seems to be the best way to accomplish it for
iterables in general. However, "in" isn't even comparable to "add"
here.

First of all, unlike "add", the nature of "in" more or less requires
that the second operand is some kind of collection, so surprises are
kept to a minimum. Second, testing containment is just a bit more
important, and thus deserving of a special case, than chaining
iterables.

The problem is taking a very general, already highly overloaded
operator +, and adding a special case to the interpreter for one of the
least common uses. It's just a bad idea.

Carl said:
3. While not breaking backwards compatibility in the strictest sense,
the adverse effect on incorrect code shouldn't be brushed aside. It
would be a bad thing if this incorrect code:

a = ["hello"]
b = "world"
a+b

suddenly started failing silently instead of raising an exception.

George said:
That's a good example for why I prefer an iterator rather than an
iterable algebra; the latter is too implicit as "a + b" doesn't call
only __add__, but __iter__ as well. On the other hand, with a concrete
iterator type "iter(a) + iter(b)" is not any more error-prone than
'int(3) + int("2")' or 'str(3) + str("2")'.

What's the objection to an *iterator* base type and the algebra it
introduces explicitly ?

Well, it still makes it more work to implement the iterator protocol,
which is enough reason to make me -1 on it. Anyways, I don't think it's very
useful to have it for iterators because most people write functions for
iterables. You'd have to write "iter(a)+iter(b)" to chain two
iterables, which pretty much undoes the main convenience of the +
operator (i.e., brevity). But it isn't dangerous.


Carl Banks
 
