How about adding slice notation to iterators/generators?

Eloff

I was just working with a generator for a tree, and I wanted to skip
the first result (the root node).

And it occurs to me, why do we need to do:

import sys
from itertools import islice

my_iter = islice(my_iter, 1, sys.maxint)

When we could simply add slice operations to generators?

for x in my_iter[1:]:
    pass

The way I figure it, if an object has no __getitem__ defined but does
have an __iter__ (i.e. not a list itself, though iter(list) would be
ok), then it should get a default __getitem__ that is really just
islice, with similar limitations (no negative indices).

The idea might need a bit of polish, but fundamentally it seems like
it could make it easier to deal with slicing generators?
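
As a rough sketch of the idea (the wrapper class and its name are only
illustrative, not a concrete design), the whole thing is a thin layer
over islice:

from itertools import islice

class SliceableIter(object):
    """Illustrative wrapper: slice syntax over any iterator via islice."""
    def __init__(self, iterable):
        self._it = iter(iterable)
    def __iter__(self):
        return self._it
    def __getitem__(self, index):
        if isinstance(index, slice):
            # same limitations as islice: no negative start/stop/step
            return islice(self._it, index.start, index.stop, index.step)
        return next(islice(self._it, index, index + 1))

# for x in SliceableIter(my_iter)[1:]:   # skips the first item
#     ...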

At the least, islice should default to stop=sys.maxint and step=1 and
accept keyword arguments, so you could just write islice(my_iter,
start=1); that it doesn't is an oversight imho.
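
In the meantime, a keyword-friendly form is only a two-line wrapper
away (the name isliced below is just for illustration):

from itertools import islice

def isliced(iterable, start=0, stop=None, step=1):
    # thin keyword-argument wrapper; islice itself only takes positional args
    return islice(iterable, start, stop, step)

# isliced(my_iter, start=1)   # skip the first item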
 
Terry Reedy

Eloff said:
I was just working with a generator for a tree, and I wanted to skip
the first result (the root node).

There is already an obvious standard way to do this.

it = <whatever>
next(it)  # toss first item
for item in it:
    ...
And it occurs to me, why do we need to do:

import sys
from itertools import islice

my_iter = islice(my_iter, 1, sys.maxint)

When we could simply add slice operations to generators?

for x in my_iter[1:]:
    pass

1. islice works with any iterator; a generator method would only work
with generators (example below).
2. The iterator protocol is intentionally simple.
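
For example, islice treats any iterable the same way; a method added
only to generator objects would cover just the first case below:

from itertools import islice

gen = (n * n for n in range(10))       # a generator
list_it = iter([10, 20, 30, 40, 50])   # a plain iterator over a list

list(islice(gen, 1, 4))       # [1, 4, 9]
list(islice(list_it, 1, 4))   # [20, 30, 40]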
 
Bearophile

Terry Reedy:
1. islice works with any iterator; a generator method would only work
with generators

A slice syntax that's syntactic sugar for islice(some_iter,1,None) may
be added to all iterators.

2. The iterator protocol is intentionally simple.

Slice syntax is already available for lists, tuples, strings, arrays,
numpy, etc., so adding it to iterators too doesn't look like it adds
that much information to the mind of the programmer.

Bye,
bearophile
 
Eloff

There is already an obvious standard way to do this.

it = <whatever>
next(it)  # toss first item
for item in it:
    ...

That fails if there is no first item. You're taking one corner case
and saying there's an easy way to do it, which is more or less true,
but you miss my point: I'm suggesting we make the general case
easier.
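
For example (next() has taken an optional default since Python 2.6,
which sidesteps that corner case):

it = iter([])                  # an empty iterator
try:
    next(it)                   # "toss the first item" raises StopIteration here
except StopIteration:
    pass                       # there was nothing to skip
next(iter([]), None)           # or: return None instead of raising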

By giving iterators a default, overridable __getitem__ that is just
syntactic sugar for islice, they would share a slightly larger
interface subset with the builtin container types. In a duck-typed
language like Python, that's almost always a good thing. You could use
iterators in more situations that expect something more like a list.

As long as it breaks no reasonable existing code, I can think of no
good reason not to do this in a future Python.

-Dan
 
Carl Banks

Bearophile:


A slice syntax that's syntactic sugar for islice(some_iter,1,None) may
be added to all iterators.

All custom iterators would then be burdened to support it. (That is
more than enough reason for me to oppose it.)

Slice syntax is already available for lists, tuples, strings, arrays,
numpy, etc., so adding it to iterators too doesn't look like it adds
that much information to the mind of the programmer.

Yes it would be, and it'd be an unacceptable burden.

The difference between iterators and containers is that it's a lot
more common to write custom iterators, and a lot more useful to do so.
The iterator protocol must be kept simple so that customizing is kept
simple: the more burdens you place on iterator implementors, the fewer
people will choose to write them, and the less useful iterators will be.

For objects that are not commonly used directly by the programmer,
adding a slice syntax to the protocol is absolutely ludicrous.
There's no justification for it.

Use islice(), or slice the iterable beforehand. Leave it out of
iterators.


Carl Banks
 
ryles

By giving iterators a default, overridable __getitem__ that is just
syntactic sugar for islice, they would share a slightly larger
interface subset with the builtin container types. In a duck-typed
language like Python, that's almost always a good thing. You could use
iterators in more situations that expect something more like a list.

As long as it breaks no reasonable existing code, I can think of no
good reason not to do this in a future Python.

I think Python programmers have learned to expect certain things from
objects that support __getitem__. For example, indexing and slicing
are repeatable on the same object:

a[1] == a[1]

a[1:4] == a[1:4]

If you saw the above code, would you want to think twice about whether
or not these expressions were true?

Iterators don't have a general concept of "get item" like types such
as strings, lists, etc. They have a concept of "get next item". So,
with your proposal, i[1] != i[1] and i[1:4] != i[1:4].
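
Concretely, with islice standing in for the proposed slicing (since
that is what it would sugar over):

from itertools import islice

a = [0, 1, 2, 3, 4, 5]
a[1:4] == a[1:4]              # True: slicing a list is repeatable

it = iter(a)
list(islice(it, 1, 4))        # [1, 2, 3]
list(islice(it, 1, 4))        # [5] -- the iterator has moved on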

Not only that, it's also common for types with __getitem__ to have
__len__, which we obviously can't provide.

So, ultimately, although it could afford some small conveniences, I
think trying to mix iterators with __getitem__ would cause too much
confusion.

The use of islice() is both readable and explicit. It's very clear to
the reader that you're working with iterators and that items will be
consumed (something that's not reversible).
 
Carl Banks

As long as it breaks no reasonable existing code, I can think of no
good reason not to do this in a future Python.

You would burden everyone who writes a custom iterator to provide a
__getitem__ method just because you're too lazy to type out the word
islice?


Carl Banks
 
Steven D'Aprano

I was just working with a generator for a tree, and I wanted to skip
the first result (the root node).

And it occurs to me, why do we need to do:

import sys
from itertools import islice

my_iter = islice(my_iter, 1, sys.maxint)

When we could simply add slice operations to generators?

for x in my_iter[1:]:
    pass


Feel free to extend the iterator protocol to your own iterators, but it
is a deliberately simple protocol. Many general iterators *can't* provide
random access to slices, at least not without a horrible amount of work.
Slicing support was dropped from xrange() years ago because it was
surprisingly difficult to get all the odd corner cases correct. Now
imagine trying to efficiently support this:

def digits_of_pi():
    """Yield the decimal digits of pi."""
    # left as an exercise for the reader

digits_of_pi()[99000:12309988234:7901]

Or this:

def spider(url):
    """Follow hyperlinks from url, yielding the contents of each page."""
    # also left as an exercise for the reader

spider(url)[999:154:-7]

If you want to support slicing, go right ahead, but please, I beg you,
don't force me to support it in my iterators!
 
Eloff

You would burden everyone who writes a custom iterator to provide a
__getitem__ method just because you're too lazy to type out the word
islice?

No, of course not. That would be stupid. Custom iterators are
iterators, so they would also get the default __getitem__ which would
work just perfectly for them. If they needed to override it, they
could. Remember, my proposal was to add a default __getitem__ for
iterators that is really just islice.

Ryles makes a good point though:
I think Python programmers have learned to expect certain things from
objects that support __getitem__. For example, indexing and slicing
are repeatable on the same object:

That indexing/slicing iterators is not repeatable is likely to negate
much of the advantage of supporting a larger interface (because it
would be supported inconsistently; passing an iterator to something
that expects __getitem__ could be surprising).

That does not mean it has no value when working with iterators; it
offers the same value islice does, just more conveniently.

Steven:
If you want to support slicing, go right ahead, but please, I beg you,
don't force me to support it in my iterators!

As I said to Carl Banks, forcing you to do something differently is not
my aim here. Your other points apply equally to islice, which already
exists, and do not invalidate its purpose.

The main use case I'm thinking about here is skipping elements or
limiting results from iterators, usually in for loops:

for x in my_iter[5:]:
    ...

for x in my_iter[:5]:
    ...
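
For comparison, the islice spellings of those two loops (with a
stand-in iterator so the snippet is self-contained):

from itertools import islice

my_iter = iter(range(20))             # stand-in iterator
for x in islice(my_iter, 5, None):    # the my_iter[5:] case
    pass

my_iter = iter(range(20))
for x in islice(my_iter, 5):          # the my_iter[:5] case
    pass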

Clearly that's easier and more readable than using islice. Nobody has
yet provided a concrete reason why that would be a _bad thing_. That
doesn't make it a _good thing_, either.

-Dan
 
Gabriel Genellina

No, of course not. That would be stupid. Custom iterators are
iterators, so they would also get the default __getitem__ which would
work just perfectly for them. If they needed to override it, they
could. Remember, my proposal was to add a default __getitem__ for
iterators that is really just islice.

Note that iterators don't have a common base class - so where would such
__getitem__ reside? *Anything* with a next/__next__ method is an iterator.
In some, very limited cases, you can 'inject' a __getitem__ method into an
existing iterator:

py> from itertools import islice
py>
py> def __getitem__(self, subscript):
....     if isinstance(subscript, slice):
....         return islice(self, subscript.start, subscript.stop,
....                       subscript.step)
....     elif isinstance(subscript, int):
....         return next(islice(self, subscript, subscript + 1, 1))
....
py> def add_slice_support(iterator):
....     oldtype = type(iterator)
....     newtype = type(oldtype.__name__, (oldtype,),
....                    {'__getitem__': __getitem__})
....     iterator.__class__ = newtype
....     return iterator

Unfortunately this only works for user-defined iterators:

py> add_slice_support(range(30))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in add_slice_support
TypeError: __class__ assignment: only for heap types

And even then, there are surprises:

py> class Foo(object):
....     i = 0
....     def __iter__(self): return self
....     def next(self):
....         ret, self.i = self.i, self.i + 1
....         return ret
....
py> it = add_slice_support(Foo())
py> print it[2]
2
py> for z in it[5:10]: print z
....
8 # 8???
9
10
11
12
py> for z in it[5:10]: print z
....
18 # 18???
19
20
21
22
py>
The main use case I'm thinking about here is skipping elements or
limiting results from iterators, usually in for loops:

for x in my_iter[5:]:
    ...

for x in my_iter[:5]:
    ...

Clearly that's easier and more readable than using islice. Nobody has
yet provided a concrete reason why that would be a _bad thing_. That
doesn't make it a _good thing_, either.

You'll have to remember to never reuse my_iter again, as my example above
shows. Or keep track of the past items so you can adjust the indices. But
anyway you can't retrieve those past items, unless you maintain them all
in a list and take the memory penalty. Or just use islice when needed -
only six letters, and you avoid all those problems... ;)
 
Carl Banks

No, of course not. That would be stupid. Custom iterators are
iterators, so they would also get the default __getitem__ which would
work just perfectly for them.


Ok, here's a simple (Python 2.x) custom iterator.

class Alphabet(object):
    def __init__(self):
        self.c = 65
    def __iter__(self):
        return self
    def next(self):
        if self.c > 90:
            raise StopIteration
        letter = chr(self.c)
        self.c += 1
        return letter


Let's see what happens *currently* in Python when you try to apply
slice indexing.

for x in Alphabet()[4:9]:  # loop through letters 4 through 8
    print x


This, as expected, raises a type error:

TypeError: 'Alphabet' object is unsubscriptable


Now, let's try to modify Python to fix this. All we have to do is to
define __getitem__ for the default iterator, so let's define it in the
common iterator type, which is....

Hm, wait a second, problem here. The simple iterator I defined above
isn't an instance of any common iterator type. In fact there is no
common iterator type.

You see, iterator is a *protocol*. An object is not an iterator by
virtue of being an instance of an iterator type; it's an iterator
because it supports a protocol. In fact that protocol is extremely
simple: all an iterator has to do is define __iter__() to return self,
define next() to return the next item, and raise StopIteration when
it's done.

So you see, to support slice syntax one would indeed have to burden
every custom iterator with a __getitem__ method, which would arguably
increase the implementor's burden by 30%.

If they needed to override it, they
could. Remember, my proposal was to add a default __getitem__ for
iterators that is really just islice.

There is no default __getitem__ for iterators so this proposal won't
work.


Carl Banks
 
Jaime Buelta

On 16/10/2009 3:29, Eloff wrote:
I was just working with a generator for a tree, and I wanted to skip
the first result (the root node).

And it occurs to me, why do we need to do:

import sys
from itertools import islice

my_iter = islice(my_iter, 1, sys.maxint)

When we could simply add slice operations to generators?

for x in my_iter[1:]:
    pass

The way I figure it, if an object has no __getitem__ defined but does
have an __iter__ (i.e. not a list itself, though iter(list) would be
ok), then it should get a default __getitem__ that is really just
islice, with similar limitations (no negative indices).

The idea might need a bit of polish, but fundamentally it seems like
it could make it easier to deal with slicing generators?

At the least, islice should default to stop=sys.maxint and step=1 and
accept keyword arguments, so you could just write islice(my_iter,
start=1); that it doesn't is an oversight imho.

I think the only way not to complicate the design of iterators any
further would be to make

my_iter[start:stop:step]

equivalent to an iterator, more or less this way (just syntactic sugar):

def sliced(my_iter, start, stop, step):   # hypothetical name; wrapped in a generator function so the yields are legal
    it = iter(my_iter)
    for i in range(start):                # throw away the first `start` items
        next(it)

    for i in range(start, stop):
        value = next(it)
        if (i - start) % step == 0:       # keep every `step`-th item
            yield value
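
A quick check of that sketch against islice, using the hypothetical
sliced() wrapper above:

from itertools import islice

list(sliced(range(20), 5, 15, 2)) == list(islice(range(20), 5, 15, 2))   # True: both give [5, 7, 9, 11, 13]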

Unfortunately, to get the last item you have to iterate through the
whole iterator, but those are the rules of the game...

Adding a __getitem__ seems pointless to me. Iterators are complicated
enough the way they are; they're not so simple when you see them for
the first time.

Also, I think the language must somehow distinguish between iterators
and containers, as they are different concepts. You can always do
something like

for i in list(iterator)[5:10]:
    do things

while it can be less efficient and less clean.

I don't know, it could be worth discussing... ;-)
 
Anh Hai Trinh

I've written something that is better than you could've imagined.

Get it here: <http://github.com/aht/stream.py>

It works with anything iterable, no need to alter anything.

from itertools import count
from stream import item

c = count()
c >> item[1:10:2]   # -> [1, 3, 5, 7, 9]
c >> item[:5]       # -> [10, 11, 12, 13, 14]

There is a ton more you could do with that library, e.g. piping and
lazy evaluation.
 
