Split iterator into multiple streams

Discussion in 'Python' started by Steven D'Aprano, Nov 6, 2010.

  1. Suppose I have an iterator that yields tuples of N items (a, b, ... n).

    I want to split this into N independent iterators:

    iter1 -> a, a2, a3, ...
    iter2 -> b, b2, b3, ...
    ....
    iterN -> n, n2, n3, ...

    The iterator may be infinite, or at least too big to collect in a list.

    My first attempt was this:


    import itertools

    def split(iterable, n):
        iterators = []
        for i, iterator in enumerate(itertools.tee(iterable, n)):
            iterators.append((t[i] for t in iterator))
        return tuple(iterators)

    But it doesn't work, as all the iterators see the same values:

    >>> data = [(1,2,3), (4,5,6), (7,8,9)]
    >>> a, b, c = split(data, 3)
    >>> list(a), list(b), list(c)

    ([3, 6, 9], [3, 6, 9], [3, 6, 9])


    I tried changing the t[i] to use operator.itemgetter instead, but no
    luck. Finally I got this:

    def split(iterable, n):
        iterators = []
        for i, iterator in enumerate(itertools.tee(iterable, n)):
            f = lambda it, i=i: (t[i] for t in it)
            iterators.append(f(iterator))
        return tuple(iterators)

    which seems to work:

    >>> data = [(1,2,3), (4,5,6), (7,8,9)]
    >>> a, b, c = split(data, 3)
    >>> list(a), list(b), list(c)

    ([1, 4, 7], [2, 5, 8], [3, 6, 9])




    Is this the right approach, or have I missed something obvious?



    --
    Steven
    Steven D'Aprano, Nov 6, 2010
    #1

  2. Ian Guest

    On Nov 6, 2:52 am, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:
    > My first attempt was this:
    >
    > def split(iterable, n):
    >     iterators = []
    >     for i, iterator in enumerate(itertools.tee(iterable, n)):
    >         iterators.append((t[i] for t in iterator))
    >     return tuple(iterators)
    >
    > But it doesn't work, as all the iterators see the same values:


    Because the value of i is not evaluated until the generator is
    actually run; so all the generators end up seeing only the final value
    of i rather than the intended values. This is a common problem with
    generator expressions that are not immediately run.
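
    A minimal sketch of that late-binding behaviour, written for Python 3.
    The names split_late, split_bound and column are made up for this
    illustration and do not appear in the thread. The first function
    repeats the bug on purpose; the second binds i by passing it as a
    function argument, which has the same effect as the i=i default.

```python
from itertools import tee

def split_late(iterable, n):
    # Buggy on purpose: `i` is looked up only when each generator runs,
    # by which time the loop has finished and i == n - 1.
    iterators = []
    for i, it in enumerate(tee(iterable, n)):
        iterators.append((t[i] for t in it))
    return tuple(iterators)

def split_bound(iterable, n):
    # Passing `i` as an argument captures its current value per iterator.
    def column(it, i):
        return (t[i] for t in it)
    return tuple(column(it, i) for i, it in enumerate(tee(iterable, n)))

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
late = [list(it) for it in split_late(data, 3)]
bound = [list(it) for it in split_bound(data, 3)]
print(late)   # [[3, 6, 9], [3, 6, 9], [3, 6, 9]]
print(bound)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```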

    > I tried changing the t[i] to use operator.itemgetter instead, but no
    > luck. Finally I got this:
    >
    > def split(iterable, n):
    >     iterators = []
    >     for i, iterator in enumerate(itertools.tee(iterable, n)):
    >         f = lambda it, i=i: (t[i] for t in it)
    >         iterators.append(f(iterator))
    >     return tuple(iterators)
    >
    > which seems to work:
    >
    > >>> data = [(1,2,3), (4,5,6), (7,8,9)]
    > >>> a, b, c = split(data, 3)
    > >>> list(a), list(b), list(c)

    >
    > ([1, 4, 7], [2, 5, 8], [3, 6, 9])
    >
    > Is this the right approach, or have I missed something obvious?


    That avoids the generator problem, but in this case you could get the
    same result a bit more straightforwardly by just using imap instead:

    import itertools
    import operator

    def split(iterable, n):
        iterators = []
        for i, iterator in enumerate(itertools.tee(iterable, n)):
            # itemgetter(i) is created here, so i is bound immediately
            iterators.append(itertools.imap(operator.itemgetter(i),
                                            iterator))
        return tuple(iterators)

    >>> map(list, split(data, 3))

    [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
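
    For readers on Python 3, where itertools.imap no longer exists and the
    built-in map is already lazy, the same idea might look like the sketch
    below. The infinite source built from count() is only an assumption
    added to show that nothing is collected into a list up front.

```python
from itertools import count, islice, tee
from operator import itemgetter

def split(iterable, n):
    # map() is lazy in Python 3, so it plays the role of itertools.imap.
    return tuple(map(itemgetter(i), it)
                 for i, it in enumerate(tee(iterable, n)))

# An endless stream of (n, n*n) pairs; only what is requested gets computed.
nums, squares = split(((n, n * n) for n in count(1)), 2)
first_squares = list(islice(squares, 3))   # [1, 4, 9]
first_nums = list(islice(nums, 2))         # [1, 2]
```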

    Cheers,
    Ian
    Ian, Nov 6, 2010
    #2

  3. Peter Otten Guest

    Steven D'Aprano wrote:

    > Suppose I have an iterator that yields tuples of N items (a, b, ... n).
    >
    > I want to split this into N independent iterators:
    >
    > iter1 -> a, a2, a3, ...
    > iter2 -> b, b2, b3, ...
    > ...
    > iterN -> n, n2, n3, ...
    >
    > The iterator may be infinite, or at least too big to collect in a list.
    >
    > My first attempt was this:
    >
    >
    > def split(iterable, n):
    >     iterators = []
    >     for i, iterator in enumerate(itertools.tee(iterable, n)):
    >         iterators.append((t[i] for t in iterator))
    >     return tuple(iterators)
    >
    > But it doesn't work, as all the iterators see the same values:
    >
    > >>> data = [(1,2,3), (4,5,6), (7,8,9)]
    > >>> a, b, c = split(data, 3)
    > >>> list(a), list(b), list(c)
    > ([3, 6, 9], [3, 6, 9], [3, 6, 9])
    >
    >
    > I tried changing the t[i] to use operator.itemgetter instead, but no
    > luck. Finally I got this:
    >
    > def split(iterable, n):
    >     iterators = []
    >     for i, iterator in enumerate(itertools.tee(iterable, n)):
    >         f = lambda it, i=i: (t[i] for t in it)
    >         iterators.append(f(iterator))
    >     return tuple(iterators)
    >
    > which seems to work:
    >
    > >>> data = [(1,2,3), (4,5,6), (7,8,9)]
    > >>> a, b, c = split(data, 3)
    > >>> list(a), list(b), list(c)
    > ([1, 4, 7], [2, 5, 8], [3, 6, 9])
    >
    >
    >
    >
    > Is this the right approach, or have I missed something obvious?


    Here's how to do it with operator.itemgetter():

    >>> from itertools import *
    >>> from operator import itemgetter
    >>> data = [(1,2,3), (4,5,6), (7,8,9)]
    >>> abc = [imap(itemgetter(i), t) for i, t in enumerate(tee(data, 3))]
    >>> map(list, abc)

    [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

    I'd say the improvement is marginal. If you want to go fancy you can
    calculate n:

    >>> def split(items, n=None):
    ...     if n is None:
    ...         items = iter(items)
    ...         first = next(items)
    ...         n = len(first)
    ...         items = chain((first,), items)
    ...     return [imap(itemgetter(i), t) for i, t in enumerate(tee(items, n))]
    ...
    >>> map(list, split([(1,2,3), (4,5,6), (7,8,9)]))

    [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
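
    A possible Python 3 rendering of the same "calculate n" idea, as a
    sketch rather than a drop-in replacement. Returning an empty list for
    an empty input is an assumption of this example; the session above
    does not cover that case.

```python
from itertools import chain, tee
from operator import itemgetter

def split(items, n=None):
    if n is None:
        items = iter(items)
        try:
            first = next(items)      # peek at the first tuple to learn its width
        except StopIteration:
            return []                # assumption: empty input means no columns
        n = len(first)
        items = chain((first,), items)   # put the peeked tuple back in front
    return [map(itemgetter(i), t) for i, t in enumerate(tee(items, n))]

columns = [list(c) for c in split([(1, 2, 3), (4, 5, 6), (7, 8, 9)])]
print(columns)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```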

    Peter
    Peter Otten, Nov 6, 2010
    #3
  4. On Nov 6, 1:52 am, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:
    > I tried changing the t[i] to use operator.itemgetter instead, but no
    > luck. Finally I got this:
    >
    > def split(iterable, n):
    >     iterators = []
    >     for i, iterator in enumerate(itertools.tee(iterable, n)):
    >         f = lambda it, i=i: (t[i] for t in it)
    >         iterators.append(f(iterator))
    >     return tuple(iterators)
    >
    > which seems to work:
    >
    > >>> data = [(1,2,3), (4,5,6), (7,8,9)]
    > >>> a, b, c = split(data, 3)
    > >>> list(a), list(b), list(c)

    >
    > ([1, 4, 7], [2, 5, 8], [3, 6, 9])
    >
    > Is this the right approach, or have I missed something obvious?



    That looks about right to me.
    It can be compacted a bit:

    def split(iterable, n):
        return tuple(imap(itemgetter(i), it) for i, it in
                     enumerate(tee(iterable, n)))

    Internally, the tee's iterators are going to accumulate a ton of data
    unless they are consumed roughly in parallel. Of course, if they are
    consumed *exactly* in lockstep, then you don't need to split them into
    separate iterables -- just use the tuples as they come.
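
    The buffering can be observed with a small Python 3 sketch. The
    source() generator and the pulled list are hypothetical helpers added
    only to record how far the underlying iterator has been read.

```python
from itertools import tee
from operator import itemgetter

pulled = []  # rows the source has actually produced so far

def source():
    for row in [(1, 2), (3, 4), (5, 6)]:
        pulled.append(row)
        yield row

a, b = [map(itemgetter(i), t) for i, t in enumerate(tee(source(), 2))]

first_col = list(a)          # drain the first column only
rows_pulled = len(pulled)    # the whole source has been read by now
second_col = list(b)         # served from what tee kept for the other branch
rows_after = len(pulled)     # unchanged: nothing new was read from the source
```

    Draining one column reads every row from the source, and tee has to
    hold on to those rows until the other column catches up.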


    Raymond
    Raymond Hettinger, Nov 6, 2010
    #4
  5. Paul Rubin Guest

    Steven D'Aprano <> writes:
    > def split(iterable, n):
    >     iterators = []
    >     for i, iterator in enumerate(itertools.tee(iterable, n)):
    >         f = lambda it, i=i: (t[i] for t in it)
    >         iterators.append(f(iterator))
    >     return tuple(iterators)
    >
    > Is this the right approach, or have I missed something obvious?


    I think there is no way around using tee. But the for loop looks ugly.
    This looks more direct to me, if I didn't mess something up:

    def split(iterable, n):
        return tuple(imap(itemgetter(i),t) for i,t in enumerate(tee(iterable,n)))
    Paul Rubin, Nov 6, 2010
    #5
  6. Steven D'Aprano <> writes:

    > Suppose I have an iterator that yields tuples of N items (a, b, ... n).
    >
    > I want to split this into N independent iterators:
    >
    > iter1 -> a, a2, a3, ...
    > iter2 -> b, b2, b3, ...
    > ...
    > iterN -> n, n2, n3, ...
    >
    > The iterator may be infinite, or at least too big to collect in a list.
    >
    > My first attempt was this:
    >
    >
    > def split(iterable, n):
    >     iterators = []
    >     for i, iterator in enumerate(itertools.tee(iterable, n)):
    >         iterators.append((t[i] for t in iterator))
    >     return tuple(iterators)
    >
    > But it doesn't work, as all the iterators see the same values:
    >
    > >>> data = [(1,2,3), (4,5,6), (7,8,9)]
    > >>> a, b, c = split(data, 3)
    > >>> list(a), list(b), list(c)
    > ([3, 6, 9], [3, 6, 9], [3, 6, 9])
    >
    >
    > I tried changing the t[i] to use operator.itemgetter instead, but no
    > luck. Finally I got this:
    >
    > def split(iterable, n):
    >     iterators = []
    >     for i, iterator in enumerate(itertools.tee(iterable, n)):
    >         f = lambda it, i=i: (t[i] for t in it)
    >         iterators.append(f(iterator))
    >     return tuple(iterators)
    >
    > which seems to work:
    >
    > >>> data = [(1,2,3), (4,5,6), (7,8,9)]
    > >>> a, b, c = split(data, 3)
    > >>> list(a), list(b), list(c)
    > ([1, 4, 7], [2, 5, 8], [3, 6, 9])
    >
    >
    >
    >
    > Is this the right approach, or have I missed something obvious?


    It is quite straightforward to implement your "split" function without
    itertools.tee:

    from collections import deque

    def split(iterable):
        it = iter(iterable)
        q = [deque([x]) for x in it.next()]
        def proj(qi):
            while True:
                if not qi:
                    for qj, xj in zip(q, it.next()):
                        qj.append(xj)
                yield qi.popleft()
        for qi in q:
            yield proj(qi)

    >>> data = [(1,2,3), (4,5,6), (7,8,9)]
    >>> a, b, c = split(data)
    >>> print list(a), list(b), list(c)

    [1, 4, 7] [2, 5, 8] [3, 6, 9]
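
    The code above targets Python 2 (it.next(), the print statement). A
    Python 3 sketch of the same queue-per-column idea follows. Two things
    here are choices of this example and not part of the post: it
    returns a tuple instead of being a generator, and it catches
    StopIteration explicitly, since letting it escape from a generator
    body is a RuntimeError from Python 3.7 on.

```python
from collections import deque
from itertools import count

def split(iterable):
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return ()                    # assumption: empty input means no columns
    queues = [deque([x]) for x in first]

    def proj(qi):
        while True:
            if not qi:
                # This column is empty: read one more row and
                # distribute its items over all the queues.
                try:
                    row = next(it)
                except StopIteration:
                    return
                for qj, xj in zip(queues, row):
                    qj.append(xj)
            yield qi.popleft()

    return tuple(proj(qi) for qi in queues)

a, b, c = split([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
result = (list(a), list(b), list(c))   # ([1, 4, 7], [2, 5, 8], [3, 6, 9])

# Works on an infinite source too, as long as consumers only take what they need.
pos, neg = split((n, -n) for n in count())
sample = (next(pos), next(pos), next(neg))   # (0, 1, 0)
```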

    Interestingly, given "split" it is very easy to implement "tee":

    def tee(iterable, n=2):
        return split(([x]*n for x in iterable))

    >>> a, b = tee(range(10), 2)
    >>> a.next(), a.next(), b.next()

    (0, 1, 0)
    >>> a.next(), a.next(), b.next()

    (2, 3, 1)

    In fact, split(x) is the same as zip(*x) when x is finite. The
    difference is that with split(x), x is allowed to be infinite and with
    zip(*x), each term of x is allowed to be infinite. It may be good to
    have a function unifying the two.
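
    A short Python 3 illustration of the zip(*x) side of that comparison.
    The variable names are made up for the example.

```python
from itertools import count, islice

# Finite x: zip(*x) transposes rows into columns, like split(x) does.
data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
columns = list(zip(*data))           # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

# x itself must be finite because * unpacks it eagerly,
# but each term of x may be an infinite iterator.
pairs = zip(*[count(0), count(100)])
first_pairs = list(islice(pairs, 3))  # [(0, 100), (1, 101), (2, 102)]
```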

    --
    Arnaud
    Arnaud Delobelle, Nov 6, 2010
    #6