Split iterator into multiple streams

  • Thread starter Steven D'Aprano

Steven D'Aprano

Suppose I have an iterator that yields tuples of N items (a, b, ... n).

I want to split this into N independent iterators:

iter1 -> a, a2, a3, ...
iter2 -> b, b2, b3, ...
...
iterN -> n, n2, n3, ...

The iterator may be infinite, or at least too big to collect in a list.

My first attempt was this:


import itertools

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        # Intended to pick item i from each tuple.
        iterators.append((t[i] for t in iterator))
    return tuple(iterators)

But it doesn't work, as all the iterators see the same values:
data = [(1,2,3), (4,5,6), (7,8,9)]
a, b, c = split(data, 3)
list(a), list(b), list(c)
([3, 6, 9], [3, 6, 9], [3, 6, 9])


I tried changing the t[i] to use operator.itemgetter instead, but no
luck. Finally I got this:

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        # The default argument i=i binds the current value of i.
        f = lambda it, i=i: (t[i] for t in it)
        iterators.append(f(iterator))
    return tuple(iterators)

which seems to work:
data = [(1,2,3), (4,5,6), (7,8,9)]
a, b, c = split(data, 3)
list(a), list(b), list(c)
([1, 4, 7], [2, 5, 8], [3, 6, 9])




Is this the right approach, or have I missed something obvious?
 

Ian

Steven D'Aprano said:
My first attempt was this:

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        iterators.append((t[i] for t in iterator))
    return tuple(iterators)

But it doesn't work, as all the iterators see the same values:


Because the value of i is not evaluated until the generator is
actually run; so all the generators end up seeing only the final value
of i rather than the intended values. This is a common problem with
generator expressions that are not immediately run.
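
To make the late-binding behaviour concrete, here is a minimal sketch that is not from the thread. It assumes Python 3, and the names `rows` and `column` are made up for illustration. The loop version looks up `i` only when each generator is consumed, while the helper function binds `i` at creation time:

```python
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Late binding: each generator expression reads i when it runs,
# and by then the loop has finished with i == 2.
late = []
for i in range(3):
    late.append(t[i] for t in rows)
late_result = [list(g) for g in late]
print(late_result)   # [[3, 6, 9], [3, 6, 9], [3, 6, 9]]

# Early binding: i is a function argument, so every generator
# keeps the value it was created with.
def column(rows, i):
    return (t[i] for t in rows)

early = [column(rows, i) for i in range(3)]
early_result = [list(g) for g in early]
print(early_result)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

The `lambda it, i=i: ...` trick in the question achieves the same early binding through a default argument.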

Steven D'Aprano said:
I tried changing the t[i] to use operator.itemgetter instead, but no
luck. Finally I got this:

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        f = lambda it, i=i: (t[i] for t in it)
        iterators.append(f(iterator))
    return tuple(iterators)

which seems to work:
data = [(1,2,3), (4,5,6), (7,8,9)]
a, b, c = split(data, 3)
list(a), list(b), list(c)

([1, 4, 7], [2, 5, 8], [3, 6, 9])

Is this the right approach, or have I missed something obvious?


That avoids the generator problem, but in this case you could get the
same result a bit more straightforwardly by just using imap instead:

import itertools
import operator

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        iterators.append(itertools.imap(operator.itemgetter(i), iterator))
    return tuple(iterators)

map(list, split(data, 3))
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]

Cheers,
Ian
 

Peter Otten


Here's how to do it with operator.itemgetter():
from itertools import *
from operator import itemgetter
data = [(1,2,3), (4,5,6), (7,8,9)]
abc = [imap(itemgetter(i), t) for i, t in enumerate(tee(data, 3))]
map(list, abc)
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]

I'd say the improvement is marginal. If you want to go fancy you can
calculate n:
def split(items, n=None):
    if n is None:
        # Peek at the first tuple to infer n, then put it back.
        items = iter(items)
        first = next(items)
        n = len(first)
        items = chain((first,), items)
    return [imap(itemgetter(i), t) for i, t in enumerate(tee(items, n))]

map(list, split([(1,2,3), (4,5,6), (7,8,9)]))
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
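
For readers on Python 3, where `imap` no longer exists and the built-in `map` is already lazy, the same idea might look roughly like the sketch below. This is an adaptation, not code from the thread, and the infinite squares/cubes input is an invented example to show that nothing is collected into a list:

```python
from itertools import chain, count, islice, tee
from operator import itemgetter

def split(items, n=None):
    items = iter(items)
    if n is None:
        # Peek at the first tuple to learn its width, then put it back.
        # An empty input raises StopIteration here, so pass n explicitly
        # if the input may be empty.
        first = next(items)
        n = len(first)
        items = chain((first,), items)
    # The list comprehension runs immediately, so each itemgetter(i)
    # is created with the right i.
    return [map(itemgetter(i), t) for i, t in enumerate(tee(items, n))]

# Hypothetical infinite input: pairs of (square, cube).
squares, cubes = split((k * k, k ** 3) for k in count(1))
first_squares = list(islice(squares, 4))
first_cubes = list(islice(cubes, 4))
print(first_squares)  # [1, 4, 9, 16]
print(first_cubes)    # [1, 8, 27, 64]
```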

Peter
 

Raymond Hettinger

Steven D'Aprano said:
I tried changing the t[i] to use operator.itemgetter instead, but no
luck. Finally I got this:

def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        f = lambda it, i=i: (t[i] for t in it)
        iterators.append(f(iterator))
    return tuple(iterators)

which seems to work:
data = [(1,2,3), (4,5,6), (7,8,9)]
a, b, c = split(data, 3)
list(a), list(b), list(c)

([1, 4, 7], [2, 5, 8], [3, 6, 9])

Is this the right approach, or have I missed something obvious?



That looks about right to me.
It can be compacted a bit:

def split(iterable, n):
    return tuple(imap(itemgetter(i), it) for i, it in
                 enumerate(tee(iterable, n)))

Internally, the tee's iterators are going to accumulate a ton of data
unless they are consumed roughly in parallel. Of course, if they are
consumed *exactly* in lockstep, then you don't need to split them into
separate iterables -- just use the tuples as they come.
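
The buffering can be observed indirectly with a small experiment. The sketch below is not from the thread: it assumes Python 3 (built-in `map` in place of `imap`), and the `source` generator and `stats` counter are hypothetical helpers that count how many tuples have been pulled from the underlying iterator:

```python
from itertools import count, islice, tee
from operator import itemgetter

def split(iterable, n):
    return tuple(map(itemgetter(i), it) for i, it in enumerate(tee(iterable, n)))

stats = {"pulled": 0}

def source():
    # Infinite stream of pairs; counts every tuple handed out.
    for k in count():
        stats["pulled"] += 1
        yield (k, -k)

a, b = split(source(), 2)

# Run a far ahead of b. tee has to keep all 1000 tuples around,
# because b has not seen any of them yet.
first_a = list(islice(a, 1000))
pulled_after_a = stats["pulled"]
print(pulled_after_a)   # 1000

# b is now served from tee's internal buffer; the source is not touched.
first_b = list(islice(b, 3))
print(first_b)          # [0, -1, -2]
print(stats["pulled"])  # 1000
```

With an infinite source, letting one iterator run arbitrarily far ahead of the others means the buffer grows without bound.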


Raymond
 

Paul Rubin

Steven D'Aprano said:
def split(iterable, n):
    iterators = []
    for i, iterator in enumerate(itertools.tee(iterable, n)):
        f = lambda it, i=i: (t[i] for t in it)
        iterators.append(f(iterator))
    return tuple(iterators)

Is this the right approach, or have I missed something obvious?


I think there is no way around using tee. But the for loop looks ugly.
This looks more direct to me, if I didn't mess something up:

def split(iterable, n):
    return tuple(imap(itemgetter(i), t) for i, t in enumerate(tee(iterable, n)))
 

Arnaud Delobelle


It is quite straightforward to implement your "split" function without
itertools.tee:

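from collections import deque

def split(iterable):
    it = iter(iterable)
    # One queue per tuple position, seeded with the first tuple.
    q = [deque([x]) for x in it.next()]
    def proj(qi):
        while True:
            if not qi:
                # This queue is empty: fetch the next tuple and
                # distribute its items across all queues.
                for qj, xj in zip(q, it.next()):
                    qj.append(xj)
            yield qi.popleft()
    for qi in q:
        yield proj(qi)
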
data = [(1,2,3), (4,5,6), (7,8,9)]
a, b, c = split(data)
print list(a), list(b), list(c)
[1, 4, 7] [2, 5, 8] [3, 6, 9]

Interestingly, given "split" it is very easy to implement "tee":

def tee(iterable, n=2):
    return split(([x]*n for x in iterable))
(2, 3, 1)

In fact, split(x) is the same as zip(*x) when x is finite. The
difference is that with split(x), x is allowed to be infinite and with
zip(*x), each term of x is allowed to be infinite. It may be good to
have a function unifying the two.
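
As a rough check of the `zip(*x)` comparison, here is a Python 3 adaptation of the deque-based `split` (a sketch, not the code as posted). Under PEP 479 a `StopIteration` escaping a generator becomes a `RuntimeError`, so this version catches it explicitly. The evens/odds stream is an invented example of an infinite input:

```python
from collections import deque
from itertools import count, islice

def split(iterable):
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return  # empty input: there are no columns to yield
    queues = [deque([x]) for x in first]

    def proj(q):
        while True:
            if not q:
                # This column ran dry: pull one more tuple and
                # feed every queue with its items.
                try:
                    row = next(it)
                except StopIteration:
                    return  # source exhausted, end this column
                for qj, xj in zip(queues, row):
                    qj.append(xj)
            yield q.popleft()

    for q in queues:
        yield proj(q)

# Finite input: the columns match what zip(*data) produces.
data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
finite = [list(col) for col in split(data)]
same_as_zip = finite == [list(col) for col in zip(*data)]
print(finite)       # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(same_as_zip)  # True

# Infinite input: zip(*x) would never return, split(x) stays lazy.
evens, odds = split((2 * k, 2 * k + 1) for k in count())
first_odds = list(islice(odds, 3))
first_evens = list(islice(evens, 3))
print(first_odds)   # [1, 3, 5]
print(first_evens)  # [0, 2, 4]
```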
 
