Early halt for iterating a_list and iter(a_list)

Lie

When you've got a nested loop, a StopIteration in the inner loop would
break the outer loop too:

a, b, c = [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]

def looper(a, b, c):
    for a_ in a:
        for b_ in b:
            for c_ in c:
                print a_, b_, c_

looper(a, b, c)  # Intended behavior [1]
a, b, c = iter(a), b, iter(c)  # b is intentionally not iter()-ed
looper(a, b, c)  # Inner StopIteration prematurely halts outer loop [2]

[1]
1 1 1
1 1 2
.... a very long result ...
3 4 4
3 4 5

[2]
1 1 1
1 1 2
1 1 3
1 1 4
1 1 5
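To see what the outer loop actually does in case [2], here is a Python 3 sketch of a variant of looper() (the name `collect_triples` is made up for illustration) that records the triples instead of printing them and counts how often the outer loop body runs. It suggests the outer loop is not halted at all: it still runs three times, but `c` is already exhausted after its first full pass.

```python
def collect_triples(a, b, c):
    """Return (triples, outer_passes) for three nested loops over a, b, c."""
    triples = []
    outer_passes = 0
    for a_ in a:
        outer_passes += 1          # counts how often the outer body runs
        for b_ in b:
            for c_ in c:           # an exhausted iterator yields nothing here
                triples.append((a_, b_, c_))
    return triples, outer_passes


a, b, c = [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]

triples, passes = collect_triples(a, b, c)
print(len(triples), passes)        # 60 3

triples, passes = collect_triples(iter(a), b, iter(c))
print(triples, passes)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 1, 5)] 3
```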

Why does this happen? Or is it a bug?

This is a potential problem since it is possible that a function that
takes an iterable and utilizes multi-level looping could be
prematurely halted and possibly left in intermediate state just by
passing an iterator.

A similar behavior also exists in list comprehensions:
>>> a, b, c = [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]
>>> [[[(a_, b_, c_) for a_ in a] for b_ in b] for c_ in c]
[[[(1, 1, 1), (2, 1, 1), ... result snipped ... , (3, 5, 6), (4, 5, 6)]]]
>>> a, b, c = [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]
>>> a, b, c = iter(a), b, iter(c)
>>> [[[(a_, b_, c_) for a_ in a] for b_ in b] for c_ in c]
[[[(1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 1)], [], [], [], []], [[],
[], [], [], []], [[], [], [], [], []], [[], [], [], [], []], [[], [],
[], [], []], [[], [], [], [], []]]
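A smaller Python 3 sketch with made-up two-, three- and four-element inputs shows the same shape: only the very first innermost comprehension sees any items from the iterator `a`. Materialising the iterators with `list()` first restores the full result.

```python
a_items, b_items, c_items = [1, 2], [1, 2, 3], [1, 2, 3, 4]

# a and c are one-shot iterators; b is a list that can be re-iterated.
a, b, c = iter(a_items), b_items, iter(c_items)
nested = [[[(a_, b_, c_) for a_ in a] for b_ in b] for c_ in c]
print(nested)
# [[[(1, 1, 1), (2, 1, 1)], [], []], [[], [], []], [[], [], []], [[], [], []]]

# Copying the iterators into lists lets every pass see all of their items.
a_iter, c_iter = iter(a_items), iter(c_items)
a, b, c = list(a_iter), b_items, list(c_iter)
full = [[[(a_, b_, c_) for a_ in a] for b_ in b] for c_ in c]
print(sum(len(inner) for middle in full for inner in middle))  # 24
```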
 
Fredrik Lundh

Lie said:
> When you've got a nested loop, a StopIteration in the inner loop would
> break the outer loop too:
>
> a, b, c = [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]
>
> def looper(a, b, c):
>     for a_ in a:
>         for b_ in b:
>             for c_ in c:
>                 print a_, b_, c_
>
> looper(a, b, c)  # Intended behavior [1]
> a, b, c = iter(a), b, iter(c)  # b is intentionally not iter()-ed
> looper(a, b, c)  # Inner StopIteration prematurely halts outer loop [2]

iterators are once-only objects. there's nothing left in "c" when you
enter the inner loop the second time, so nothing is printed.
>>> a = range(5)
>>> a = iter(a)
>>> for i in a:
...     print i
...
0
1
2
3
4
>>> for i in a:
...     print i
...
>>>
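Why does a list survive being looped over twice while an iterator does not? A for loop calls `iter()` on its target: for a list that produces a fresh iterator each time, while for an iterator `iter()` returns the same, already-advanced object. A small Python 3 sketch:

```python
numbers = [0, 1, 2, 3, 4]

# Each call on a list returns a brand-new iterator positioned at the start.
print(iter(numbers) is iter(numbers))  # False

it = iter(numbers)
# Calling iter() on an iterator returns the iterator itself.
print(iter(it) is it)  # True

print(list(it))  # [0, 1, 2, 3, 4]
print(list(it))  # [] (the second pass finds nothing left)
```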
> This is a potential problem since it is possible that a function that
> takes an iterable and utilizes multi-level looping could be
> prematurely halted and possibly left in intermediate state just by
> passing an iterator.

it's a problem only if you confuse iterators with sequences.

</F>
 
Lie

> Lie said:
>> When you've got a nested loop, a StopIteration in the inner loop would
>> break the outer loop too:
>> a, b, c = [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5]
>> def looper(a, b, c):
>>     for a_ in a:
>>         for b_ in b:
>>             for c_ in c:
>>                 print a_, b_, c_
>> looper(a, b, c)  # Intended behavior [1]
>> a, b, c = iter(a), b, iter(c)  # b is intentionally not iter()-ed
>> looper(a, b, c)  # Inner StopIteration prematurely halts outer loop [2]
>
> iterators are once-only objects.  there's nothing left in "c" when you
> enter the inner loop the second time, so nothing is printed.

Ah, now I see. You have to "restart" the iterator if you want to use
it the second time (is it possible to do that?).
>>> a = range(10)
>>> a = range(5)
>>> a = iter(a)
>>> for i in a:
...     print i
...
0
1
2
3
4
>>> for i in a:
...     print i
...
>>>

>> This is a potential problem since it is possible that a function that
>> takes an iterable and utilizes multi-level looping could be
>> prematurely halted and possibly left in intermediate state just by
>> passing an iterator.
>
> it's a problem only if you confuse iterators with sequences.

I see, but if a function expects a sequence and is fed an iterator
instead, it would be against duck typing to check whether something is
a sequence or an iterator; yet an iterator is good for one iteration
only, while a sequence can be used multiple times. So is there a clean
way to handle this? (i.e. a design pattern that allows sequences and
iterators to be treated with the same code)

If there is no such design pattern for that problem, should one be
available? I'm thinking of one: "all iterables would have an
iterable.restart() method, which is defined as 'restarting the
iterator for iterators' or 'do nothing for sequences'."

Then both sequences and iterators can be treated like this:
for a_ in a:
    b.restart()
    for b_ in b:
        c.restart()
        for c_ in c:
            print a_, b_, c_

This is useful if b and c are huge (so copying them into lists is out
of the question) but need to be used multiple times in a nested loop.
For the general case where an iterator is used only once,
iterable.restart() wouldn't need to be called at all and no code would
need to change (in the spirit of making common things easy and rare
things possible).

Wait a minute... I've got an idea: we could use itertools.tee to copy
the iterator and iterate over the copy, like this, right?:

import itertools

for a_ in a:
    b, b_copy = itertools.tee(b)
    for b_ in b_copy:
        c, c_copy = itertools.tee(c)
        for c_ in c_copy:
            print a_, b_, c_

That meets both "requirements": it can handle sequences and iterators
with the same code, and code for the common case where iterators are
used only once wouldn't need to change. Personally though, I don't
think it's a clean solution; it looks a bit like hackery.
 
Steven D'Aprano

On Fri, 22 Aug 2008 07:23:18 -0700, Lie wrote:

[...]
> Ah, now I see. You have to "restart" the iterator if you want to use it
> the second time (is it possible to do that?).

In general, no, iterators can't be restarted. Think of it as squeezing
toothpaste out of a tube. You can't generally reverse the process.


[...]
> I see, but if a function expects a sequence and is fed an iterator
> instead, it would be against duck typing to check whether something is
> a sequence or an iterator; yet an iterator is good for one iteration
> only, while a sequence can be used multiple times. So is there a clean
> way to handle this? (i.e. a design pattern that allows sequences and
> iterators to be treated with the same code)

The only clean way to treat iterators and sequences identically is to
limit yourself to behaviour that both use. That pretty much means a
simple for loop:

for item in iterator_or_sequence:
    do_something(item)

Fortunately that's an incredibly useful pattern.

I often find myself converting sequences to iterators, so I can handle
both types identically:

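def treat_first_item_specially(iterator_or_sequence):
    # iter() turns a sequence into a fresh iterator; an iterator is returned as-is
    it = iter(iterator_or_sequence)
    try:
        first_item(it.next())  # it.next() fetches the first element
    except StopIteration:
        pass  # empty input
    else:
        for item in it:  # the loop resumes at the second element
            other_items(item)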
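A runnable Python 3 version of the function above might look like this. `first_item` and `other_items` are not defined in the post, so here they are passed in as hypothetical handlers that just record what they receive; `next(it)` is the Python 3 (and 2.6+) spelling of `it.next()`.

```python
def treat_first_item_specially(iterator_or_sequence, first_item, other_items):
    """Call first_item on the first element and other_items on the rest."""
    it = iter(iterator_or_sequence)  # works for sequences and iterators alike
    try:
        first = next(it)
    except StopIteration:
        pass  # empty input: nothing to do
    else:
        first_item(first)
        for item in it:  # continues from the second element
            other_items(item)


firsts, others = [], []
treat_first_item_specially([10, 20, 30], firsts.append, others.append)
treat_first_item_specially((x * 2 for x in range(3)), firsts.append, others.append)
print(firsts, others)  # [10, 0] [20, 30, 2, 4]
```

Calling `first_item` in the `else` clause rather than inside the `try` keeps a StopIteration raised by the handler itself from being silently swallowed.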



> If there is no such design pattern for that problem, should one be
> available? I'm thinking of one: "all iterables would have an
> iterable.restart() method, which is defined as 'restarting the iterator
> for iterators' or 'do nothing for sequences'."

But not all iterators can be restarted. Here's a contrived example:

import os

def once_only_iterator(directory):
    """Yield the names of files being deleted."""
    for filename in os.listdir(directory):
        yield filename
        os.remove(os.path.join(directory, filename))


You can't restart that one, at least not with a *lot* of effort.
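To see the side effect in action, here is a Python 3 sketch that runs the generator above against a throwaway directory created with `tempfile.mkdtemp()`; the file names are made up for the demonstration.

```python
import os
import tempfile


def once_only_iterator(directory):
    """Yield the names of files being deleted."""
    for filename in os.listdir(directory):
        yield filename
        os.remove(os.path.join(directory, filename))


tmp = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.txt"):
    with open(os.path.join(tmp, name), "w") as f:
        f.write("data")

print(sorted(once_only_iterator(tmp)))  # ['a.txt', 'b.txt', 'c.txt']
print(os.listdir(tmp))                  # [] (the files are gone)
print(list(once_only_iterator(tmp)))    # [] (a "restart" has nothing to replay)
os.rmdir(tmp)
```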

In general, the only ways to restart an arbitrary iterator are:

(1) make a copy of everything the iterator returns, then iterate over the
copy; or

(2) exploit idiosyncratic knowledge about the specific iterator in
question.

That in turn may mean: find the non-iterator data that your iterator
uses, and use it again.

e.g.

data = {'a': 1, 'b': 2, 'c': 4, 'd': 8}
def make_iterator(data):
    items = sorted(data.items())
    for item in items:
        yield item

it = make_iterator(data)
for i in it:
    print i

# Restart the iterator.
it = make_iterator(data)

That's not exactly what you were hoping for, but in the generic case of
arbitrary iterators, that's the best you're going to get.
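Taking the make_iterator() idea one step further, one possible pattern (a sketch; the class name `Reiterable` is hypothetical, not a standard library type) is an iterable, rather than an iterator, that calls a generator function afresh each time it is looped over. Nested loops then behave as they do with lists, provided the underlying data can be read again.

```python
class Reiterable(object):
    """Hypothetical wrapper: each iteration calls factory(*args) for a new iterator."""

    def __init__(self, factory, *args):
        self.factory = factory
        self.args = args

    def __iter__(self):
        # A fresh iterator per for loop, so every pass starts from the beginning.
        return iter(self.factory(*self.args))


def make_iterator(data):
    for item in sorted(data.items()):
        yield item


data = {'a': 1, 'b': 2, 'c': 4, 'd': 8}
items = Reiterable(make_iterator, data)
pairs = [(x, y) for x in items for y in items]
print(len(pairs))   # 16
print(list(items))  # [('a', 1), ('b', 2), ('c', 4), ('d', 8)]
```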

Another example of exploiting specific knowledge about the iterator is
that, starting from Python 2.5, generators became coroutines that can
accept information as well as yield it. I suggest you read this:

http://docs.python.org/whatsnew/pep-342.html

but note carefully that you can't just call send() on any arbitrary
iterator and expect it to do something sensible.

Lastly, you can write your own iterator, and give it its own restart()
method. I recommend the exercise. Once you see how much specific
knowledge of the iterator is required, you may understand why there can't
possibly be a generic restart() method that works on arbitrary iterators.
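Doing that exercise for a simple case might look like the following Python 3 sketch (the `Countdown` class is hypothetical). Its restart() works only because the iterator remembers its own starting state, which is exactly the kind of knowledge an arbitrary iterator does not expose.

```python
class Countdown(object):
    """Iterator over start, start-1, ..., 1 that can be rewound with restart()."""

    def __init__(self, start):
        self.start = start      # remembered so that restart() is possible
        self.current = start

    def __iter__(self):
        return self             # an iterator returns itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__             # Python 2 name for the same method

    def restart(self):
        self.current = self.start


cd = Countdown(3)
print(list(cd))  # [3, 2, 1]
print(list(cd))  # []
cd.restart()
print(list(cd))  # [3, 2, 1]
```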


[...]
> Wait a minute... I've got an idea: we could use itertools.tee to copy
> the iterator and iterate over the copy, like this, right?:
>
> import itertools
>
> for a_ in a:
>     b, b_copy = itertools.tee(b)
>     for b_ in b_copy:
>         c, c_copy = itertools.tee(c)
>         for c_ in c_copy:
>             print a_, b_, c_
>
> That meets both "requirements": it can handle sequences and iterators
> with the same code, and code for the common case where iterators are
> used only once wouldn't need to change. Personally though, I don't
> think it's a clean solution; it looks a bit like hackery.

itertools.tee() works by keeping a copy of the iterator's return values.
If your iterator is so huge you can't make a copy of its data, then tee()
will fail.
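A Python 3 sketch of Lie's tee() loop (the function name `tee_looper` and the input lists are made up) suggests it does produce the same triples as nested loops over plain lists. Internally tee() buffers every value that one copy has produced and the other has not yet consumed; in this nested pattern that ends up being essentially all of b and c, so memory use is comparable to calling list() on them.

```python
import itertools


def tee_looper(a, b, c):
    """Nested loops over a, b, c using Lie's itertools.tee() trick."""
    triples = []
    for a_ in a:
        b, b_copy = itertools.tee(b)      # keep one copy for the next a_
        for b_ in b_copy:
            c, c_copy = itertools.tee(c)  # keep one copy for the next b_
            for c_ in c_copy:
                triples.append((a_, b_, c_))
    return triples


tee_result = tee_looper(iter([1, 2]), iter([1, 2, 3]), iter([1, 2, 3, 4]))
expected = list(itertools.product([1, 2], [1, 2, 3], [1, 2, 3, 4]))
print(len(tee_result), tee_result == expected)  # 24 True
```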
 
