itertools.izip brokenness

Paul Rubin

Raymond Hettinger said:
Feel free to submit a feature request to the SF tracker (surprisingly,
this behavior has not been previously reported, nor have there been any
related feature requests, nor was the use case contemplated in the PEP
discussions: http://www.python.org/peps/pep-0201 ).

What do you think of my suggestion of passing an optional arg to the
StopIteration constructor, that the caller can use to resume the
iterator or take other suitable recovery steps? Maybe this could
interact with PEP 343 in some useful way.
 
Tom Anderson

Raymond Hettinger said:
That should be the point of using anything in Python. The specific goal
for izip() was for an iterator version of zip(). Unfortunately, neither
tool fits your problem. At the root of it is the iterator protocol not
having an unget() method for pushing back unused elements of the data
stream.

An unget() isn't absolutely necessary - another way of doing it would be a
hasNext() method, as in Java, or a peek(), which gets the next item but
doesn't advance the iterator.

Here's some code (pardon the old-fashioned functional style in the
iter_foo methods):

import operator

class xiterable(object):
    """This is an entirely abstract class, just to document the
    xiterable interface.

    """
    def __iter__(self):
        """As in the traditional iterable protocol, returns an
        iterator over this object. Note that this does not have to
        be an xiterator.

        """
        raise NotImplementedError
    def __xiter__(self):
        """Returns an xiterator over this object.

        """
        raise NotImplementedError

class xiterator(xiterable):
    """This is an entirely abstract class, just to document the xiter
    interface.

    The xiterable methods should return self.
    """
    def hasNext(self):
        """Returns True if calling next would return a value, or
        False if it would raise StopIteration.

        """
        raise NotImplementedError
    def next(self):
        """As in the traditional iterator protocol.

        """
        raise NotImplementedError
    def peek(self):
        """Returns the value that would be returned by a call to
        next, but does not advance the iterator - the same value
        will be returned by the next call to peek or next. If a
        call to next would raise StopIteration, this method
        raises StopIteration.

        """
        raise NotImplementedError

def xiter(iterable):
    if (hasattr(iterable, "__xiter__")):
        return iterable.__xiter__()
    else:
        return xiterwrapper(iter(iterable))

class xiterwrapper(object):
    def __init__(self, it):
        self.it = it
        self.advance()
    def hasNext(self):
        return hasattr(self, "_next")
    def next(self):
        try:
            cur = self._next
            self.advance()
            return cur
        except AttributeError:
            raise StopIteration
    def peek(self):
        try:
            return self._next
        except AttributeError:
            raise StopIteration
    def advance(self):
        try:
            self._next = self.it.next()
        except StopIteration:
            if (hasattr(self, "_next")):
                del self._next
    def __xiter__(self):
        return self
    def __iter__(self):
        return self

def izip_hasnext(*xiters):
    xiters = map(xiter, xiters)
    while True:
        if (reduce(operator.and_, map(hasnext, xiters))):
            yield tuple(map(getnext, xiters))
        else:
            raise StopIteration

def hasnext(xit):
    return xit.hasNext()

def getnext(it):
    return it.next()

def izip_peek(*xiters):
    xiters = map(xiter, xiters)
    while True:
        z = tuple(map(peek, xiters))
        map(advance, xiters)
        yield z

def peek(xit):
    return xit.peek()

def advance(xit):
    return xit.advance()

Anyway, you get the general idea.

tom
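
For anyone reading this on Python 3, the hasNext()/peek() idea above can be sketched in a few lines. This is only a minimal sketch, not Tom's code: the names Peekable, has_next and zip_no_loss are made up for illustration, and it assumes the Python 3 iterator protocol (__next__ and the built-in next()).

```python
class Peekable:
    """Wrap any iterable and add has_next() and peek() (hypothetical names)."""

    _EMPTY = object()  # private marker meaning "nothing buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = self._EMPTY

    def _fill(self):
        # Pull one item into the buffer if it is empty and the source has more.
        if self._buffer is self._EMPTY:
            self._buffer = next(self._it, self._EMPTY)

    def has_next(self):
        self._fill()
        return self._buffer is not self._EMPTY

    def peek(self):
        # Return the upcoming item without consuming it.
        if not self.has_next():
            raise StopIteration
        return self._buffer

    def __iter__(self):
        return self

    def __next__(self):
        value = self.peek()
        self._buffer = self._EMPTY
        return value


def zip_no_loss(*iterables):
    """Like zip(), but only takes a row when every source still has an item."""
    wrapped = [Peekable(it) for it in iterables]

    def rows():
        while wrapped and all(w.has_next() for w in wrapped):
            yield tuple(next(w) for w in wrapped)

    # The wrappers are returned too, so leftovers stay reachable.
    return rows(), wrapped


rows, sources = zip_no_loss("abc", [1, 2])
print(list(rows))        # [('a', 1), ('b', 2)]
print(list(sources[0]))  # ['c']
```

The point of the design is that the longer input keeps its unpaired item in the wrapper's buffer instead of losing it.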
 
rurpy

Raymond Hettinger said:
That is a reasonable use case that is not supported by zip() or izip()
as currently implemented.

I haven't thought a lot about zip because I haven't needed to.
I would phrase this as "...not supported by the itertools module...".
If it makes sense to extend izip() to provide end-of-longest
iteration, fine. If not, then add an izip_longest() to itertools
(and perhaps a corresponding imap and whatever else shares
the terminate-at-shortest behavior).

Raymond Hettinger said:
That should be the point of using anything in Python. The specific
goal for izip() was for an iterator version of zip(). Unfortunately,
neither tool fits your problem. At the root of it is the iterator
protocol not having an unget() method for pushing back unused elements
of the data stream.

I don't understand this. Why do you need look ahead? (I
mean that literally, I am not disagreeing in a veiled way.)

This is my (mis?)understanding of how izip works:
- izip is a class
- when instantiated, it returns another iterator object, call it "x".
- the x object (being an iterator) has a next method that
returns a list of the next values returned by all the iterators
given when x was created.

So why can't izip's next method collect the results of
its set of argument iterators, as I presume it does now,
except when one of them starts generating StopIteration
exceptions, an alternate value is placed in the result list.
When all the iterators start generating exceptions, izip
itself raises a StopIteration to signal that all the iterators
have reached exhaustion. This is what the code I posted
in a message last night does. Why is something like that
not acceptable?
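
The behaviour described in the paragraph above (substitute an alternate value for exhausted iterators, stop once all of them are exhausted) could look roughly like this in Python 3. It is a sketch under assumptions, not the code from the earlier message: the name zip_to_longest and the fill keyword are invented here.

```python
def zip_to_longest(*iterables, fill=None):
    """Yield tuples until every input is exhausted, padding with `fill`."""
    iterators = [iter(it) for it in iterables]
    missing = object()  # private marker, so real data can never look "exhausted"
    while iterators:
        # next(it, missing) returns the marker instead of raising StopIteration.
        row = [next(it, missing) for it in iterators]
        if all(item is missing for item in row):
            return  # every iterator is exhausted
        yield tuple(fill if item is missing else item for item in row)


print(list(zip_to_longest("ab", [1, 2, 3], fill="-")))
# [('a', 1), ('b', 2), ('-', 3)]
```

Note that this never needs look-ahead: nothing is thrown away because iteration only stops when there is nothing left to read.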

All this talk of pushbacks and returning shorter lists of
unexhausted iterators makes me think I am misunderstanding
something.

Raymond Hettinger said:
I'll add a note to the docs.

Feel free to submit a feature request to the SF tracker (surprisingly,
this behavior has not been previously reported, nor have there been any
related feature requests, nor was the use case contemplated in the PEP
discussions: http://www.python.org/peps/pep-0201 ).

Yes, this is interesting. In the "print multiple columns"
example I presented, I felt the use of izip() met the
"one obvious way" test. The resulting code was simple
and clear. The real-world case where I ran into the
problem was comparing two files until two different
lines were found. Again, izip was the "one obvious
way".

So yes it is surprising and disturbing that these use
cases were not identified. I wonder what other features
that "should" be in Python, were similarly missed?
And more importantly what needs to change, to fix
the problem?
 
rurpy

I don't understand this. Why do you need look ahead?

Just before I posted, I got it (I think) but didn't want to
rewrite everything. The need for unget() (or peek(), etc)
is to fix the thrown-away-data problem in izip(), right?

As an easier alternative, what about leaving izip() alone
and simply documenting that behavior. That is, izip()
is not appropriate for use with unequal length iterables
unless you don't care what happens after the shortest,
and the state of the iterables is undefined after izip().

Then have an izip2(), or a flag to izip(), that changes its
behavior so that iteration continues to the end of the
longest sequence.

This seems to me clean and symmetrical -- one form
for iteration up to the shortest, the other form iterates
to the longest.
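
The "iterate to the longest" form suggested here is, as far as the standard library goes, what itertools.izip_longest (Python 2.6) and itertools.zip_longest (Python 3) provide, with the padding value passed as fillvalue. A small Python 3 illustration with made-up data:

```python
from itertools import zip_longest

left = ["a", "b", "c"]
right = [1, 2]

# One form stops at the shortest input...
shortest = list(zip(left, right))
# ...the other runs to the longest and pads the gaps.
longest = list(zip_longest(left, right, fillvalue=""))

print(shortest)  # [('a', 1), ('b', 2)]
print(longest)   # [('a', 1), ('b', 2), ('c', '')]
```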
 
Raymond Hettinger

Paul said:
What do you think of my suggestion of passing an optional arg to the
StopIteration constructor, that the caller can use to resume the
iterator or take other suitable recovery steps? Maybe this could
interact with PEP 343 in some useful way.

Probably unworkable. Complex to explain and use. Makes the API
heavy. Hard to assure retro-fitting for every possible kind of
iterator or iterable.

Am not sure of the best solution:

1. Could add an optional arg to zip()/izip() with a mutable container
to hold a final, incomplete tuple: final=[]; zip(a,b,leftover=final).
This approach is kludgy and unlikely to lead to beautiful code, but it
does at least make accessible data that would otherwise be tossed.

2. Could add a new function with None fill-in -- essentially an
iterator version of map(None, a,b). Instead of None, a user specified
default value would be helpful in cases where the input data stream
could potentially have None as a valid data element. The function
would also need periodic signal checks to make it possible to break-out
if one of the inputs is infinite. How or whether such a function would be
used can likely be answered by mining real-world code for cases where
map's None fill-in feature was used.

3. Could point people to the roundrobin() recipe in the
collections.deque docs -- it solves a closely related problem but is
not exactly what the OP needed (his use case required knowing which
iterator gave birth to each datum).

4. Could punt and leave this for straight-forward while-loop coding.
Though the use case seems like it would be common, there may be a
reason this hasn't come up since zip() was introduced way back in
Py2.0.

5. Could create an iterator wrapper that remembers its last accessed
item and whether StopIteration has been raised. While less direct than
a customized zip method, the wrapper may be useful in contexts other
than zipping -- essentially, anywhere it is inconvenient to have just
consumed an iterator element. Testing the wrapper object for
StopIteration would be akin to else-clauses in a for-loop. OTOH, this
approach is at odds with the notion of side-effect free functional
programming and the purported benefits of that programming style.



Raymond Hettinger
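
To make option 1 concrete, here is a rough Python 3 sketch of a zip that hands back the final, incomplete tuple through a mutable container. The function name zip_with_leftover is invented for illustration and the keyword is borrowed from the leftover=final spelling above; nothing like this exists in the standard library.

```python
def zip_with_leftover(*iterables, leftover):
    """zip() variant that stores the final, incomplete row in `leftover`."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        row = []
        for it in iterators:
            try:
                row.append(next(it))
            except StopIteration:
                # Keep whatever was already pulled for this row.
                leftover.extend(row)
                return
        yield tuple(row)


final = []
print(list(zip_with_leftover("abc", [1, 2], leftover=final)))  # [('a', 1), ('b', 2)]
print(final)                                                    # ['c']
```

As the option itself admits, this is kludgy: the caller has to create the container up front and only learns about items that had already been consumed when the shortest input ran out.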
 
Bengt Richter

6. Could modify izip so that one could write

from itertools import izip
zipit = izip(*seqs) # bind iterator object to preserve access to its state later
for tup in zipit:
    # do something with tup as now produced
for tup in zipit.rest(sentinel):
    # tup starts with the tuple that would have been returned if all sequences
    # had been sampled and sentinel substituted where StopIteration happened.
    # continuing until but not including (sentinel,)*len(seqs)

This would seem backwards compatible, and also potentially allow you to use the rest mode
from the start, as in

for tup in izip(*seqs).rest(sentinel):
    # process tup and notice sentinel for yourself


Regards,
Bengt Richter
 
Antoon Pardon

4) If a need does arise, it can be met by __builtins__.map() or by
writing: chain(iterable, repeat(None)).

Yes, if you're a python guru. I don't even understand the
code presented in this thread that uses chain/repeat,

And it wouldn't work in this case. chain(iterable, repeat(None))
changes your iterable into an iterator that first gives you
all elements in the iterator and when these are exhausted
will continue giving the repeat parameter. e.g.

chain([3,5,8],repeat("Bye"))

Will produce 3, 5 and 8 followed by an endless stream
of "Bye".

But if you do this with all iterables, and you have to
because you don't know which one is the smaller, all
iterators will be infinite and izip will never stop.
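
A quick way to see both effects, written for Python 3 (where the built-in zip() is lazy like izip()); islice is used here only to cut the infinite streams off for printing:

```python
from itertools import chain, islice, repeat

# A padded iterable never ends: the data, then "Bye" forever.
padded = chain([3, 5, 8], repeat("Bye"))
first_six = list(islice(padded, 6))
print(first_six)  # [3, 5, 8, 'Bye', 'Bye', 'Bye']

# Pad every input and zip has no shortest input left to stop on.
a = chain([3, 5, 8], repeat("Bye"))
b = chain([11, 22], repeat("Bye"))
first_five = list(islice(zip(a, b), 5))
print(first_five)
# [(3, 11), (5, 22), (8, 'Bye'), ('Bye', 'Bye'), ('Bye', 'Bye')]
```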
 
Bengt Richter

But you can fix that (only test is what you see ;-) :
>>> from itertools import repeat, chain, izip
>>> it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")), chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
>>> for t in it: print t
...
(3, 11)
(5, 22)
(8, 'Bye')

(Feel free to generalize ;-)

Regards,
Bengt Richter
 
rurpy

Bengt said:
But here is my real question...
Why isn't something like this in itertools, or why shouldn't
it go into itertools?


But you can fix that (only test is what you see ;-) :
from itertools import repeat, chain, izip
it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")), chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
for t in it: print t
...
(3, 11)
(5, 22)
(8, 'Bye')

(Feel free to generalize ;-)

Which just reinforces my original point: if leaving
out a feature is justified by the existence of some
alternate method, then that method must be equally
obvious as the missing feature, or must be documented
as an idiom. Otherwise, the justification fails.

Is the above code as obvious as
izip([3,5,8],[11,22],sentinel='Bye')?
(where the sentinel keyword causes izip to iterate
to the longest argument.)
 
Bengt Richter

[ ... 5 options enumerated ... ]
6. Could modify izip so that one could write

from itertools import izip
zipit = izip(*seqs) # bind iterator object to preserve access to its state later
for tup in zipit:
    # do something with tup as now produced
for tup in zipit.rest(sentinel):
    # tup starts with the tuple that would have been returned if all sequences
    # had been sampled and sentinel substituted where StopIteration happened.
    # continuing until but not including (sentinel,)*len(seqs)

This would seem backwards compatible, and also potentially allow you to use the rest mode
from the start, as in

for tup in izip(*seqs).rest(sentinel):
    # process tup and notice sentinel for yourself

Demo-of-concept hack: only tested as you see below

----< izip2.py >-----------------------------------------------------
class izip2(object):
    """
    works like itertools.izip except that
    if a reference (e.g. it) to the stopped iterator is preserved,
    it.rest(sentinel) returns an iterator that will continue
    to return tuples with sentinel substituted for items from
    exhausted sequences, until all sequences are exhausted.
    """
    FIRST, FIRST_STOP, FIRST_REST, REST, REST_STOP = xrange(5)
    def __init__(self, *seqs):
        self.iters = map(iter, seqs)
        self.restmode = self.FIRST
    def __iter__(self): return self
    def next(self):
        if not self.iters: raise StopIteration
        if self.restmode == self.FIRST:
            tup=[]
            try:
                for i, it in enumerate(self.iters):
                    tup.append(it.next())
                return tuple(tup)
            except StopIteration:
                self.restmode = self.FIRST_STOP # stopped, not rest-restarted
                self.tup=tup;self.i=i
                raise
        elif self.restmode==self.FIRST_STOP: # normal part exhausted
            raise StopIteration
        elif self.restmode in (self.FIRST_REST, self.REST):
            if self.restmode == self.FIRST_REST:
                tup = self.tup # saved
                self.restmode = self.REST
            else:
                tup=[]
                for it in self.iters:
                    try: tup.append(it.next())
                    except StopIteration: tup.append(self.sentinel)
            tup = tuple(tup)
            if tup==(self.sentinel,)*len(self.iters):
                self.restmode = self.REST_STOP
                raise StopIteration
            return tuple(tup)
        elif self.restmode==self.REST_STOP: # rest part exhausted
            raise StopIteration
        else:
            raise RuntimeError('Bad restmode: %r'%self.restmode)
    def rest(self, sentinel=''):
        self.sentinel = sentinel
        if self.restmode==self.FIRST: # prior to any sequence end
            self.restmode = self.REST
            return self
        self.tup.append(sentinel)
        for it in self.iters[self.i+1:]:
            try: self.tup.append(it.next())
            except StopIteration: self.tup.append(sentinel)
        self.restmode = self.FIRST_REST
        return self

def test():
    assert list(izip2())==[]
    assert list(izip2().rest(''))==[]
    it = izip2('ab', '1')
    assert list(it)==[('a', '1')]
    assert list(it.rest())==[('b', '')]
    it = izip2('a', '12')
    assert list(it)==[('a', '1')]
    assert list(it.rest())==[('', '2')]
    it = izip2('ab', '12')
    assert list(it)==[('a', '1'), ('b', '2')]
    assert list(it.rest())==[]
    it = izip2(xrange(3), (11,22), 'abcd')
    assert list(it) == [(0, 11, 'a'), (1, 22, 'b')]
    assert list(it.rest(None)) == [(2, None, 'c'), (None, None, 'd')]
    print 'test passed'

if __name__ == '__main__': test()
---------------------------------------------------------------------

Using this, Antoon's example becomes:
>>> from izip2 import izip2
>>> it = izip2([3,5,8], [11,22])
>>> for t in it: print t
...
(3, 11)
(5, 22)
>>> for t in it.rest('Bye'): print t
...
(8, 'Bye')

Want to make an efficient C version, Raymond? ;-)

Regards,
Bengt Richter
 
Michael Spencer

Bengt Richter wrote: ....
from itertools import repeat, chain, izip
it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")), chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
for t in it: print t
...
(3, 11)
(5, 22)
(8, 'Bye')

(Feel free to generalize ;-)

Is the above code as obvious as
izip([3,5,8],[11,22],sentinel='Bye')?
(where the sentinel keyword causes izip to iterate
to the longest argument.)

How about:

from itertools import repeat

def izip2(*iterables, **kw):
    """kw:fill. An element that will pad the shorter iterable"""
    fill = repeat(kw.get("fill"))
    iterables = map(iter, iterables)
    iters = range(len(iterables))

    for i in range(10):
        result = []
        for idx in iters:
            try:
                result.append(iterables[idx].next())
            except StopIteration:
                iterables[idx] = fill
                if iterables.count(fill) == len(iterables):
                    raise
                result.append(fill.next())
        yield tuple(result)

[(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, None), (3, None, 3, None), (4, None, 4,
None), (None, None, 5, None), (None, None, 6, None), (None, None, 7, None)]
[(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 'Empty'), (3, 'Empty', 3, 'Empty'), (4,
'Empty', 4, 'Empty'), ('Empty', 'Empty', 5, 'Empty'), ('Empty', 'Empty', 6,
'Empty'), ('Empty', 'Empty', 7, 'Empty')]

Michael
 
Paul Rubin

Michael Spencer said:
for i in range(10):
    result = []
    ...

Do you mean "while True: ..."?

def izip2(*iterables, **kw):
    """kw:fill. An element that will pad the shorter iterable"""
    fill = repeat(kw.get("fill"))

Yet another attempt (untested, uses Python 2.5 conditional expression):

from itertools import chain, repeat, dropwhile
def izip2(*iterables, **kw):
    fill = kw.get('fill')
    sentinel = object()
    iterables = [chain(i, repeat(sentinel)) for i in iterables]
    while True:
        t = [i.next() for i in iterables]

        # raise StopIteration immediately if all iterators are now empty
        dropwhile(lambda v: v is sentinel, t).next()

        # map all sentinels to the fill value and yield resulting tuple
        yield tuple([(v if v is not sentinel else fill) for v in t])
 
Michael Spencer

Paul said:
Michael Spencer said:
for i in range(10):
    result = []
    ...

Do you mean "while True: ..."?

oops, yes!

so, this should have been:

from itertools import repeat

def izip2(*iterables, **kw):
    """kw:fill. An element that will pad the shorter iterable"""
    fill = repeat(kw.get("fill"))
    iterables = map(iter, iterables)
    iters = range(len(iterables))

    while True:
        result = []
        for idx in iters:
            try:
                result.append(iterables[idx].next())
            except StopIteration:
                iterables[idx] = fill
                if iterables.count(fill) == len(iterables):
                    raise
                result.append(fill.next())
        yield tuple(result)

Michael
 
rurpy

This may be getting too kludgy but by counting the
exhausted iterators you can allow for arguments
containing infinite iterators:

def izip4(*iterables, **kw):
    """kw:fill. An element that will pad the shorter iterable
    kw:infinite. Number of non-terminating iterators """
    fill = repeat(kw.get("fill"))
    iterables = map(iter, iterables)
    iters = range(len(iterables))
    finite_cnt = len(iterables) - kw.get("infinite", 0)

    while True:
        result = []
        for idx in iters:
            try:
                result.append(iterables[idx].next())
            except StopIteration:
                iterables[idx] = fill
                finite_cnt -= 1
                if finite_cnt == 0:
                    raise
                result.append(fill.next())
        yield tuple(result)

[(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 'empty'), (3, 'empty', 3, 'empty'),
(4, 'empty', 4, 'empty'), ('empty', 'empty', 5, 'empty'),
('empty', 'empty', 6, 'empty'), ('empty', 'empty', 7, 'empty')]
[(0, 'foo', 0, 0), (1, 'foo', 1, 1), (2, 'foo', 2, 2), (3, 'foo', 3, 3),
(4, 'foo', 4, 4), ('empty', 'foo', 5, 5), ('empty', 'foo', 6, 6),
('empty', 'foo', 7, 7)]
 
Paul Rubin

def izip4(*iterables, **kw):
    """kw:fill. An element that will pad the shorter iterable
    kw:infinite. Number of non-terminating iterators """

That's a really kludgy API. I'm not sure what to propose instead:
maybe some way of distinguishing which iterables are supposed to be
iterated til exhaustion (untested):

class Discardable(object): pass

def izip5(*iterables, fill=None):
    """Run until all non-discardable iterators are exhausted"""
    while True:
        # exhausted iterables will put empty tuples into t
        # non-exhausted iterables will put singleton tuples there
        t = [tuple(islice(i,1)) for i in iterables]

        # quit if only discardables are left
        dropwhile(lambda i,t: (not isinstance(i, Discardable)) and len(t)),
                  izip(t, iterables)).next()

        yield tuple([(v[0] if len(t) else fill) for v in t])

Then you'd wrap "infinite" and other iterators you don't need exhausted
in Discardable:

stream = izip5(a, b, Discardable(c), d, Discardable(e), fill='')

runs until a, b, and d are all exhausted.
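
Since the code above is explicitly untested, here is one way the Discardable idea might be made to run on Python 3. It is a sketch under assumptions, not Paul's implementation: Discardable is given an __init__ so it can actually wrap an iterable, and the function name zip_until_required_done is made up.

```python
from itertools import count


class Discardable:
    """Marks an input whose exhaustion (or lack of it) should be ignored."""

    def __init__(self, iterable):
        self.iterable = iterable


def zip_until_required_done(*sources, fill=None):
    """Run until every input NOT wrapped in Discardable is exhausted."""
    required = [not isinstance(s, Discardable) for s in sources]
    iterators = [iter(s.iterable if isinstance(s, Discardable) else s)
                 for s in sources]
    missing = object()  # private marker for "this iterator is exhausted"
    while True:
        row = [next(it, missing) for it in iterators]
        # Stop once all required inputs have run dry; discardables don't count.
        if all(item is missing for item, req in zip(row, required) if req):
            return
        yield tuple(fill if item is missing else item for item in row)


stream = zip_until_required_done("ab", Discardable(count()), "x", fill="")
print(list(stream))  # [('a', 0, 'x'), ('b', 1, '')]
```

One consequence of this shape is that each discardable input gets read one item past the last yielded row, which is acceptable only because those inputs were declared discardable.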
 
Paul Rubin

Paul Rubin said:
        # quit if only discardables are left
        dropwhile(lambda i,t: (not isinstance(i, Discardable)) and len(t)),
                  izip(t, iterables)).next()

Ehh, that should say dropwhile(lambda (t,i): ...) to use tuple
unpacking and get the args in the right order. I'm sleepy and forgot
what I was doing. Of course I'm still not sure it's right.
 
Bengt Richter

rurpy said:
Is the above code as obvious as
izip([3,5,8],[11,22],sentinel='Bye')?
(where the sentinel keyword causes izip to iterate
to the longest argument.)

You are right. I was just responding with a quick fix to the
problem Antoon noted.
For a more flexible izip including the above capability, but
also able to do the default izip with a capability of continuing iteration
in the above mode after the normal izip mode stops, see izip2.py in my other
post in this thread.

Regards,
Bengt Richter
 
Steven D'Aprano

That's a really kludgy API. I'm not sure what to propose instead:
maybe some way of distinguishing which iterables are supposed to be
iterated til exhaustion (untested):

class Discardable(object): pass

def izip5(*iterables, fill=None):

Doesn't work: keyword arguments must be listed before * and ** arguments.
File "<stdin>", line 1
def izip5(*iterables, fill=None):
^
SyntaxError: invalid syntax


Personally, I don't see anything wrong with an API of

function(*iterators [, fill]):
Perform function on one or more iterators, with an optional fill
object.

Of course, this has to be defined in code as:

def function(*iterators, **kwargs):
    # accept no keywords at all, or exactly "fill" (it is optional)
    if kwargs.keys() not in ([], ["fill"]):
        raise ValueError
    ...

It might not be the easiest API to extend, but for a special case like
this, I think it is perfectly usable.
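
For comparison, Python 3 accepts the signature that fails above: parameters listed after *args are keyword-only (PEP 3102), so no **kwargs checking is needed. A small sketch, where zip_fill is a made-up wrapper around the real itertools.zip_longest:

```python
from itertools import zip_longest


def zip_fill(*iterables, fill=None):
    """Hypothetical wrapper: `fill` can only be passed by keyword."""
    return zip_longest(*iterables, fillvalue=fill)


print(list(zip_fill("ab", "x", fill="-")))  # [('a', 'x'), ('b', '-')]

try:
    zip_fill("ab", "x", filler="-")  # misspelled keyword
except TypeError as exc:
    # The interpreter rejects unknown keywords on its own.
    print("rejected:", exc)
```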
 
Paul Rubin

Steven D'Aprano said:
Doesn't work: keyword arguments must be listed before * and ** arguments.

Eh, ok, gotta use **kw.
def function(*iterators, **kwargs):
    # accept no keywords at all, or exactly "fill" (it is optional)
    if kwargs.keys() not in ([], ["fill"]):
        raise ValueError
    ...

It might not be the easiest API to extend, but for a special case like
this, I think it is perfectly usable.

Yeah, that's what the earlier version had. I tried to bypass it but
as you pointed out, it's a syntax error. The code I posted also has a
deliberate syntax error (until Python 2.5), namely the use of the new
conditional expression syntax (PEP 308). That could be worked around
of course.
 
Antoon Pardon

Bengt Richter said:
But you can fix that (only test is what you see ;-) :

Maybe, but not with this version.
from itertools import repeat, chain, izip
it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")), chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
for t in it: print t
...
(3, 11)
(5, 22)
(8, 'Bye')

(Feel free to generalize ;-)

The problem with this version is that it will stop if for some reason
each iterable contains a 'Bye' at the same place. Now this may seem
far fetched at first. But consider that if data is collected from
experiments certain values may be missing. This can be indicated
by a special "Missing Data" value in an iterable. But this "Missing
Data" value would also be the prime candidate for a fill parameter
when an iterable is exhausted.
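
The early stop is easy to reproduce. Below is a Python 3 rendering of the iter(callable, sentinel) trick, generalized to any number of inputs; the function name zip_pad_until_all_fill and the sample readings are invented for the demonstration.

```python
from itertools import chain, repeat


def zip_pad_until_all_fill(fill, *iterables):
    """Pad every input with `fill`; stop at the first row made only of `fill`."""
    padded = zip(*(chain(it, repeat(fill)) for it in iterables))
    stop_row = (fill,) * len(iterables)
    # Two-argument iter(): call the function until it returns stop_row.
    return iter(lambda: next(padded), stop_row)


# Works when the fill value never occurs in the data...
ok = list(zip_pad_until_all_fill("Bye", [3, 5, 8], [11, 22]))
print(ok)  # [(3, 11), (5, 22), (8, 'Bye')]

# ...but stops too early when both inputs hold the fill value at the same spot.
MISSING = "n/a"
readings_a = [1, MISSING, 3]
readings_b = [4, MISSING, 6]
cut_short = list(zip_pad_until_all_fill(MISSING, readings_a, readings_b))
print(cut_short)  # [(1, 4)]
```

Detecting exhaustion with a private marker object (compared with `is`) rather than with the user's fill value avoids this, at the cost of a slightly longer function.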
 
