Itertools

Discussion in 'Python' started by Ronald Legere, Jul 29, 2003.

  1. The new itertools stuff is pretty cool. One thing that bothers me though is
    that
    there seems to be no way to copy an iterator. How does one work around this?
    Is
    there a trick that I am missing?

    Without being able to 'split' an iterator, I can't
    think of how to do some of the cool things you can do with lazy lists in
    Haskell. It seems
    quite often you want to be able to do this 'splitting', and if I
    had more coffee I would post an example :) The ones I
    can think of are also defined recursively, which is another kettle of fish.

    But I hope that someone knows
    what I am talking about and has thought of a way...
    Cheers!
     
    Ronald Legere, Jul 29, 2003
    #1
    1. Advertising

  2. On Tue, 29 Jul 2003 08:32:14 -0400, "Ronald Legere"
    <> wrote:

    >The new itertools stuff is pretty cool. One thing that bothers me though is
    >that
    >there seems to be no way to copy an iterator. How does one work around this?
    >Is
    >there a trick that I am missing?
    >
    >Without being able to 'split' an iterator, I can't
    >think of how to do some of the cool things you can do with lazy lists in
    >Haskell. It seems
    >quite often you want to be able to do this 'splitting', and if I
    >had more coffee I would post an example :) The ones I
    >can think of are also defined recursively, which is another kettle of fish.
    >


    Iterables can be more far general than lazy lists: think of a socket
    as an iterable churning out lines of text, for example. How would one
    go about copying these general iterables?

    You can take slices of iterables, though - can't remember the name,
    islice? But since you don't know how to take copies of iterables in
    general this may not be that useful.

    > But I hope that someone knows
    >what I am talking about and has thought of a way...
    >Cheers!
    >


    With my best regards,
    G. Rodrigues
     
    Gonçalo Rodrigues, Jul 29, 2003
    #2
    1. Advertising

  3. Untested code that does basically this, but beware of any traps (read
    and understand the code before trying to use it!!!), uses Python 2.3
    functionality, but could be amended:

    --- START OF CODE ---

    import copy

    class IDup(object):
    def __init__(self,iterin):
    self.__iter = iterin
    self.__iterno = 0
    self.__iteritems = []
    self.__hasstopped = None

    def registerIter():
    iterno = self.__iterno
    self.__iterno += 1
    self.__iteritems.append([])

    def getNext(self,iterno):
    if self.__iteritems[iterno]:
    iteritem = self.__iteritems[iterno].pop(0)
    elif self.__hasstopped is not None:
    raise self.__hasstopped
    else:
    try:
    iteritem = self.__iter.next()
    except StopIteration, e:
    self.__hasstopped = e
    raise
    for id, i in enumerate(self.__iteritems):
    if id <> iterno:
    i.append(copy.deepcopy(iteritem))
    return iteritem

    class IDupped(object):
    def __init__(self,idup):
    self.__idup = idup
    self.__iterno = idup.registerIter()

    def next(self):
    return self.__idup.getNext(self.__iterno)

    def isplit(iterin,splitno=2):
    idup = IDup(iterin)
    iduppeds = []
    for i in range(splitno):
    iduppeds.append(IDupped(idup))
    return tuple(iduppeds)

    test = ["hello","how","are","you?"]
    x, y = isplit(iter(test))

    for i in x:
    print i
    for i in y:
    print i

    --- END OF CODE ---

    Pitfalls:

    1) You may not use the original iterator after it has been split.
    2) Your iterator must return values which are copyable. If you wish for
    the values to be truly equal (even in ID) or they are not copyable, then
    just remove the copy.deepcopy from the code.

    HTH!

    Heiko.
     
    Heiko Wundram, Jul 29, 2003
    #3
  4. Just tested the code, amend as stated (remove the - lines, and add
    the + lines):

    --- START OF CODE ---

    import copy

    class IDup(object):
    def __init__(self,iterin):
    self.__iter = iterin
    self.__iterno = 0
    self.__iteritems = []
    self.__hasstopped = None

    - def registerIter():
    + def registerIter(self):
    iterno = self.__iterno
    self.__iterno += 1
    self.__iteritems.append([])
    + return iterno

    def getNext(self,iterno):
    if self.__iteritems[iterno]:
    iteritem = self.__iteritems[iterno].pop(0)
    elif self.__hasstopped is not None:
    raise self.__hasstopped
    else:
    try:
    iteritem = self.__iter.next()
    except StopIteration, e:
    self.__hasstopped = e
    raise
    for id, i in enumerate(self.__iteritems):
    if id <> iterno:
    i.append(copy.deepcopy(iteritem))
    return iteritem

    class IDupped(object):
    def __init__(self,idup):
    self.__idup = idup
    self.__iterno = idup.registerIter()

    def next(self):
    return self.__idup.getNext(self.__iterno)

    + def __iter__(self):
    + return self

    def isplit(iterin,splitno=2):
    idup = IDup(iterin)
    iduppeds = []
    for i in range(splitno):
    iduppeds.append(IDupped(idup))
    return tuple(iduppeds)

    test = ["hello","how","are","you?"]
    x, y = isplit(iter(test))

    for i in x:
    print i
    for i in y:
    print i

    --- END OF CODE ---

    This works as stated.

    Heiko.
     
    Heiko Wundram, Jul 29, 2003
    #4
  5. streams (was: Re: Itertools)

    Ronald Legere wrote on 2003-07-29:

    > The new itertools stuff is pretty cool. One thing that bothers me
    > though is that there seems to be no way to copy an iterator. How
    > does one work around this? Is there a trick that I am missing?
    >
    > Without being able to 'split' an iterator, I can't think of how to
    > do some of the cool things you can do with lazy lists in Haskell. It
    > seems quite often you want to be able to do this 'splitting', and if
    > I had more coffee I would post an example :) The ones I can think of
    > are also defined recursively, which is another kettle of fish.
    >

    Strange, the need seems to come up more freqeuntly since I implemented
    it ;-). Or is it just me paying more attention to it? After a couple
    of design iterations, I've got:

    http://www.technion.ac.il/~cben/python/streams.py

    [NEWS: The last changes were addition of __nonzero__ methods, clued by
    discussion on c.l.py <wink>, and a `.force(count=None)` method to
    force computing values.]

    I concluded that it's best to separate the two abstractions, providing
    ability to convert both ways:

    - An iterable immutable (but lazily computed) linked list, with O(1)
    tail access and ability to cons in front of it.

    - An iterator object whose `.next()` method is necessarily
    destructive, on another hand. Turns out the most flexible
    implementation of the iterator is simply a pointer to a linked list;
    this way you can split the iterator and even "unyield" values back
    onto it. See the module for more details.

    In general it's most clean to separate iterables from iterators.
    Iterables should be changed by iterating.

    Any feedback welcome. I'd like to make it as Pythonic as possible.
    An perhaps make it standard. Particular open questions:

    - Terminology. There are too many names for the same things ;-).

    - The linked lists are called "streams" by me because they are lazy
    and that's what such beasts are called in functional languages.
    But I still have many refernces to them as linked lists,
    especially w.r.t. the `Cons` class which happens to be the base
    class of `Stream`. So `Stream` is defined as a lazy linked list
    node. Should I try to squash refences to "linked lists", treating
    them as "stream nodes which are already computed" to reduce
    confusion with Python lists?

    - car/cdr vs. first/rest vs. head/tail. I chose head/tail.
    car/cdr would be opaque to people without lisp background.
    first/rest are of different length :).

    - `Cons` still requires lisp background but I don't know any name
    that would be recognizable to non-lispers anyway. And it's not
    a bad name.

    - Is `StreamIter` the best name?

    - I called the operation for creating new iterator from same place
    "forking". Alternative names: `split`, `copy`, `clone`, `dup`,
    "lookahead", etc. What's best?

    - Consing in front of a StreamIter's stream is called `.unyeild`.
    The point is that it's the opposite of the ``yield`` statement
    but the name is a brave new invention. Alternatives: `prev`
    (antonym of `.next()`, probably misguided since it doesn't
    return the previous value but rather takes it from you),
    `pushback`, `unread`, `stuff`, etc.

    - Functionality:

    - Should the operations (`.head`/`.tail`, indexing) on the
    underlying stream be proxied by `StreamIter` or should the
    programmer have to spell out e.g. ``iterator.stream.head`` as now?

    - What to do about __repr__ and __str__? The streams could well be
    infinite and forcing lookahead for even a few values is
    unacceptable as this could be expensive.

    - What to do about __hash__ and __eq__?

    - Should I weakly memoize the `Stream` for each given iterable?
    That would reduce the risk of wrapping a destructive iterator
    twice with distinct streams, so that each produced item is seen by
    only one stream, with access-order-dependant interleaving,
    violating the functional intent of the laziness. OTOH, this would
    make the programmer less aware of the risk. Besides, I'm not sure
    what to return on the second attempt to wrap same iterator after
    some of it has already been consumed since the first time.

    - Would some convenience operations for lookahead, e.g.
    `.startswith()` be useful?

    - Anything else anybody needs?

    --
    Beni Cherniavsky <>

    Put a backslash at the evening to continue hacking onto the next day.
     
    Beni Cherniavsky, Jul 30, 2003
    #5
  6. Ronald Legere

    Mike Rovner Guest

    Re: streams (was: Re: Itertools)

    Beni Cherniavsky wrote:
    > Any feedback welcome. I'd like to make it as Pythonic as possible.


    > An perhaps make it standard. Particular open questions:


    > - The linked lists are called "streams" by me because they are lazy
    > and that's what such beasts are called in functional languages.


    Stream is used differently in Tcl and Unix.
    Minded separation Python from functional approach (deprecation of filter,
    map;
    doubtfulness of implementing tail recursion optimization) probably it is not
    so
    inspiring for regular python programmer.

    Lazy "streams" are a good thing provided they can be easily used for
    organizing data pathes and connections.

    > - `Cons` still requires lisp background but I don't know any name
    > that would be recognizable to non-lispers anyway. And it's not
    > a bad name.



    Construct, Glue, Merge?
    Taken readability any whole word is better.

    > - I called the operation for creating new iterator from same place
    > "forking". Alternative names: `split`, `copy`, `clone`, `dup`,
    > "lookahead", etc. What's best?


    copy is already using construct in python. Why not use it?

    > - Anything else anybody needs?


    If I got the intend of Stream correctly, I wish use them as Tcl streams,
    so I'd need combining and filtering and i/o.

    Just 0.02

    Mike
     
    Mike Rovner, Jul 30, 2003
    #6
  7. Re: streams (was: Re: Itertools)

    Mike Rovner wrote on 2003-07-30:

    > Beni Cherniavsky wrote:
    > > Any feedback welcome. I'd like to make it as Pythonic as possible.

    >
    > > An perhaps make it standard. Particular open questions:

    >
    > > - The linked lists are called "streams" by me because they are lazy
    > > and that's what such beasts are called in functional languages.

    >
    > Stream is used differently in Tcl and Unix. Minded separation
    > Python from functional approach (deprecation of filter, map;
    > doubtfulness of implementing tail recursion optimization) probably
    > it is not so inspiring for regular python programmer.
    >

    Point taken. But what do you call it then? Lazy linked lists has
    only one single-word name that I'm aware of and that is "streams";
    there is no name for it in Tcl and Unix becauit is almost never used
    in them, as far as I know.

    > Lazy "streams" are a good thing provided they can be easily used for
    > organizing data pathes and connections.
    >

    Python already provides the minimal construct for connecting producers
    and consumers of data: the iterator protocol. It is minimalistic and
    becomes inconvenient when you want to use the same value more than one
    time, for lookahead, backtracking or simply connecting the same source
    to many consumers. All this is quite easy with the head/tail access
    to streams; it's a very powerfull abstraction. Since streams are
    implemented as linked lists, they don't leak memory for infinite
    iterators, unless you keep a reference to a fixed position in the
    stream. Over streams I've built an iterator which also provides the
    same powers but more in line with Python's iterator protocol.

    > > - `Cons` still requires lisp background but I don't know any name
    > > that would be recognizable to non-lispers anyway. And it's not
    > > a bad name.

    >
    > Construct, Glue, Merge?
    > Taken readability any whole word is better.
    >

    `Construct` is too long. Compare with `str`, `int`, `dict` rather
    than `string`, `integer` and `dictionary`.

    How would `Glue` and `Merge` be meaningful here? The only
    associasions of "glue" are what TeX puts between boxes and "glue
    layers" between different languages or libraries. "Merge" sounds like
    the process of taking several sequences of values and interlevaing
    them in some way (like in merge sort; BTW, it needs a lookahead of 1
    for each stream, so it should be convenient to implement with
    streams). But this has nothing to do with what `Cons` does: creating
    a new stream by prepending given head to given tail. `Cons` is the
    only term I know of that unabigously refers to this operation, again
    coming from functional languages <wink>.

    There is another possibility: rename `Cons` to `Stream` and `Stream`
    to `LazyStream`; `Stream` would then be a factory function creating
    either depending on whether it got 1 or two arguments. Would this be
    better?

    > > - I called the operation for creating new iterator from same place
    > > "forking". Alternative names: `split`, `copy`, `clone`, `dup`,
    > > "lookahead", etc. What's best?

    >
    > copy is already using construct in python. Why not use it?
    >

    Copy was adviced against by Erik Max Francis, see:

    http://groups.google.com/groups?selm=

    However, it really copies the iterator, it's just very fast. So I
    think I will indeed name this `.copy()` (aliased as `.__copy__`).

    > > - Anything else anybody needs?

    >
    > If I got the intend of Stream correctly, I wish use them as Tcl streams,
    > so I'd need combining and filtering and i/o.
    >

    You can use a `Stream` or `StreamIter` over a file object to get
    lookahead and mulitple passes over it, without requiring the file to
    be seekable! That's one of the main uses. It also gives you ability
    to "unread" any amount of data. Note that if you simply apply it to a
    file the stream operates on a line at a time; you will need to wrap
    the file with a custom generator if you wish to operate at character
    level; working at token level might be the best compromise.

    I'm not familiar with Tcl streams, so I don't understand the
    "combining and filtering" reference - can you elaborate?

    --
    Beni Cherniavsky <>

    Put a backslash at the evening to continue hacking onto the next day.
     
    Beni Cherniavsky, Jul 31, 2003
    #7
  8. Ronald Legere

    Mike Rovner Guest

    Re: streams (was: Re: Itertools)

    Beni Cherniavsky wrote:
    > Mike Rovner wrote on 2003-07-30:


    I'm sorry for taking so long to answer (I had a project deadline to meet
    ;) - done).

    >> Beni Cherniavsky wrote:

    > Point taken. But what do you call it then? Lazy linked lists has
    > only one single-word name that I'm aware of and that is "streams";
    > there is no name for it in Tcl and Unix becauit is almost never used
    > in them, as far as I know.


    I made an intensive search on internet and convinced now that term 'streams'
    is globaly associated with i/o. I still think using this term for universal
    data
    structure will be misleading. However I can't come with a good word
    (my best is LazyList).

    >>> - `Cons` still requires lisp background but I don't know any
    >>> name that would be recognizable to non-lispers anyway. And
    >>> it's not a bad name.

    >>
    >> Construct, Glue, Merge?
    >> Taken readability any whole word is better.
    >>

    > `Construct` is too long. Compare with `str`, `int`, `dict` rather
    > than `string`, `integer` and `dictionary`.


    But they all are (long-lived) built-ins. AFAIK now there is a tendency
    to make language more readable (isinstance, enumerate to name a few
    hmm recent additions)

    > How would `Glue` and `Merge` be meaningful here? The only
    > associasions of "glue" are what TeX puts between boxes and "glue
    > layers" between different languages or libraries. "Merge" sounds like
    > the process of taking several sequences of values and interlevaing
    > them in some way (like in merge sort; BTW, it needs a lookahead of 1
    > for each stream, so it should be convenient to implement with
    > streams). But this has nothing to do with what `Cons` does: creating
    > a new stream by prepending given head to given tail. `Cons` is the
    > only term I know of that unabigously refers to this operation, again
    > coming from functional languages <wink>.
    >
    > There is another possibility: rename `Cons` to `Stream` and `Stream`
    > to `LazyStream`; `Stream` would then be a factory function creating
    > either depending on whether it got 1 or two arguments. Would this be
    > better?


    No. In discussion about list.insert() and/or range() difference in behaivor
    dependent on args was prononced A Bad Thing (IIRC).

    >>> - Anything else anybody needs?

    >>
    >> If I got the intend of Stream correctly, I wish use them as Tcl
    >> streams, so I'd need combining and filtering and i/o.
    >>

    > You can use a `Stream` or `StreamIter` over a file object to get
    > lookahead and mulitple passes over it, without requiring the file to
    > be seekable! That's one of the main uses. It also gives you ability
    > to "unread" any amount of data. Note that if you simply apply it to a
    > file the stream operates on a line at a time; you will need to wrap
    > the file with a custom generator if you wish to operate at character
    > level; working at token level might be the best compromise.
    >
    > I'm not familiar with Tcl streams, so I don't understand the
    > "combining and filtering" reference - can you elaborate?


    I didn't touch tcl for several years ;)
    But the idea is pretty simple: to pass file object throw 'filter' to get
    'another' file(-like) object.

    import compress as Z
    for line in tarfile.open('-','r',Z.open(filename)).extractiter()

    instead of

    stream = tarfile.open('-','r',Z.open(filename))
    for elem in tarfile.open('-','r',Z.open(filename)).getmembers():
    for line in stream.extract(elem):
    ...

    or something like that. I admit that later example based on new 2.3 syntax
    fulfill my (long being) expectations quite well.

    Thanks,
    Mike
     
    Mike Rovner, Aug 8, 2003
    #8
  9. Re: streams (was: Re: Itertools)

    According to Mike Rovner <>:
    > I made an intensive search on internet and convinced now that term 'streams'
    > is globaly associated with i/o. I still think using this term for universal
    > data
    > structure will be misleading. However I can't come with a good word
    > (my best is LazyList).


    Once upon a time, in the days of 386BSD, certain people claimed to be
    reinnovating Sys V streams for 386BSD, which were to be called "currents".

    Somehow this episode stuck in my mind. (I think it was the sound of a
    balloon bursting from too much hot air. ;-)


    --
    Ng Pheng Siong <>

    http://firewall.rulemaker.net -+- Manage Your Firewall Rulebase Changes
    http://www.post1.com/home/ngps -+- Open Source Python Crypto & SSL
     
    Ng Pheng Siong, Aug 8, 2003
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    0
    Views:
    898
  2. Dan Williams

    itertools.take?

    Dan Williams, Jul 15, 2003, in forum: Python
    Replies:
    1
    Views:
    375
    Raymond Hettinger
    Jul 15, 2003
  3. Steven Bethard
    Replies:
    0
    Views:
    411
    Steven Bethard
    Mar 12, 2005
  4. Raymond Hettinger
    Replies:
    17
    Views:
    577
    Simon Brunning
    Feb 18, 2008
  5. Nick Mellor
    Replies:
    35
    Views:
    394
    Paul Rubin
    Dec 6, 2012
Loading...

Share This Page