Fate of itertools.dropwhile() and itertools.takewhile()

Discussion in 'Python' started by Raymond Hettinger, Dec 29, 2007.

  1. I'm considering deprecating these two functions and would like some
    feedback from the community or from people who have a background in
    functional programming.

    * I'm concerned that use cases for the two functions are uncommon and
    can obscure code rather than clarify it.

    * I originally added them to itertools because they were found in
    other functional languages and because it seemed like they would serve
    as basic building blocks that, in combination with other itertools,
    allow construction of a variety of powerful, high-speed iterators. The
    latter may have been a false hope -- to date, I've not seen good
    recipes that depend on either function.

    * If an always-true or always-false predicate is given, it can be hard
    to break out of the function once it is running.

    * Both functions seem simple and basic until you try to explain them
    to someone else. Likewise, when reading code containing dropwhile(),
    I don't think it is self-evident that dropwhile() may have a lengthy
    start-up time.

    * Since itertools are meant to be combined together, the whole module
    becomes easier to use if there are fewer tools to choose from.
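    A minimal sketch of the always-true-predicate problem from the bullets above, written in Python 3 spelling (the thread itself uses Python 2). Bounding the output with islice() rescues takewhile(), but dropwhile() only returns once it finds an item that fails the predicate, so with an always-true predicate the bound has to go on its input instead. The 1000-item bound is an arbitrary assumption.

```python
from itertools import count, dropwhile, islice, takewhile

# takewhile() with an always-true predicate over an infinite iterator never
# stops by itself; islice() caps how many results are pulled from it.
first_five = list(islice(takewhile(lambda n: True, count()), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# dropwhile() with an always-true predicate keeps dropping inside its very
# first next() call, so an output bound cannot help; bound the input instead.
bounded_input = islice(count(), 1000)
leftover = list(dropwhile(lambda n: True, bounded_input))
print(leftover)  # []
```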

    These thoughts reflect my own experience with the itertools module.
    It may be that your experience with them has been different. Please
    let me know what you think.

    Raymond
     
    Raymond Hettinger, Dec 29, 2007
    #1

  2. On Dec 29, 6:10 pm, Raymond Hettinger <> wrote:

    > These thoughts reflect my own experience with the itertools module.
    > It may be that your experience with them has been different. Please
    > let me know what you think.


    First off, the itertools module is amazing; thanks for creating it. It
    changed the way I think about programming. In fact nowadays I start
    all my programs with:

    from itertools import *

    which may not be the best form, but I got tired of importing every
    single function individually or writing out the module name.

    Now I never needed the dropwhile() and takewhile() functions, but that
    may not mean much. For quite a while I never needed the repeat()
    function either. It even looked nonsensical to have an iterator that
    simply repeats the same thing over and over. One day I had to solve a
    problem that needed repeat(); it made me really understand what the
    function was for, and I got to marvel at just how neat the solution was.
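    Istvan does not say what his repeat() problem was. As a hedged illustration only, not his solution, a common idiom pairs repeat() with map() to supply a constant argument. This is Python 3 spelling, where map() plays the role of imap().

```python
from itertools import repeat

# repeat(2) supplies the constant exponent for every call to pow();
# map() stops when the finite range() is exhausted.
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```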

    i.
     
    Istvan Albert, Dec 30, 2007
    #2

  3. On Sat, 29 Dec 2007 15:10:24 -0800, Raymond Hettinger wrote:

    > * Both functions seem simple and basic until you try to explain them to
    > someone else.


    Oh, I don't know about that. The docstrings seem to me to do an admirable
    job. Compared to groupby(), the functions are simplicity themselves.
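    For reference, a small example of what the docstrings describe (Python 3; the sample data is arbitrary). takewhile() yields items while the predicate holds and stops at the first failure; dropwhile() skips items while the predicate holds and then yields everything that follows.

```python
from itertools import dropwhile, takewhile

data = [1, 4, 6, 4, 1]

def below_five(n):
    return n < 5

head = list(takewhile(below_five, data))  # stops at the first item >= 5
tail = list(dropwhile(below_five, data))  # later items are never re-tested
print(head)  # [1, 4]
print(tail)  # [6, 4, 1]
```

    Note that the trailing 4 and 1 appear in the dropwhile() result even though they satisfy the predicate: once the first failing item is seen, nothing after it is tested.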


    > Likewise, when reading code containing dropwhile(), I
    > don't think it is self-evident that dropwhile() may have a lengthy
    > start-up time.


    *scratches head in confusion*

    It isn't? I can understand somebody *under*estimating the start-up time
    (perhaps because they overestimate how quickly dropwhile() can iterate
    through the items). But surely it is self-evident that a function which
    drops items has to drop the items before it can start returning?
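    A sketch of the start-up cost being discussed (Python 3; the instrumented source() generator is hypothetical). The first next() on a dropwhile() object has to pull and test every leading item before it can return anything.

```python
from itertools import dropwhile

consumed = []

def source():
    """Yield 0..9, recording each item as it is pulled from the source."""
    for n in range(10):
        consumed.append(n)
        yield n

it = dropwhile(lambda n: n < 7, source())
first = next(it)
print(first)          # 7
print(len(consumed))  # 8: items 0..7 were all pulled before the first result
```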


    > * Since itertools are meant to be combined together, the whole module
    > becomes easier to use if there are fewer tools to choose from.


    True, but on the other hand a toolbox with too few tools is harder to use
    than one with too many tools.



    --
    Steven
     
    Steven D'Aprano, Dec 30, 2007
    #3
  4. Almost every day I write code that uses itertools, so I find it very
    useful, and its functions fast.
    Removing useless things and keeping things tidy is often positive, but
    I can't tell you what to remove. Here are my usages (each sub-list is
    sorted by decreasing frequency of use):

    I use often or very often:
    groupby( iterable[, key])
    imap( function, *iterables)
    izip( *iterables)
    ifilter( predicate, iterable)
    islice( iterable, [start,] stop [, step])

    I use once in a while:
    cycle( iterable)
    chain( *iterables)
    count( [n])
    repeat( object[, times])

    I have probably used once or a few times:
    starmap( function, iterable)
    tee( iterable[, n=2])
    ifilterfalse( predicate, iterable)

    Never used so far:
    dropwhile( predicate, iterable)
    takewhile( predicate, iterable)

    Bye,
    bearophile
     
    , Dec 30, 2007
    #4
  5. On Dec 30, 12:10 am, Raymond Hettinger <> wrote:
    > I'm considering deprecating these two functions and would like some
    > feedback from the community or from people who have a background in
    > functional programming.



    I am with Steven D'Aprano when he says that takewhile and dropwhile
    are clear enough. On the other hand, in my code
    base I have exactly zero occurrences of takewhile and
    dropwhile, even if I tend to use the itertools quite
    often. That should be telling. If my situation is
    common, that means that takewhile and dropwhile are
    useless in practice and should be deprecated.
    But I will wait for other respondents. It may just be
    that I never needed them. I presume you did scans of
    large code bases and you did not find occurrences of
    takewhile and dropwhile, right?


    Michele Simionato
     
    Michele Simionato, Dec 30, 2007
    #5
  6. On Sat, 29 Dec 2007 15:10:24 -0800, Raymond Hettinger wrote:

    > These thoughts reflect my own experience with the itertools module.
    > It may be that your experience with them has been different. Please
    > let me know what you think.


    I seem to be in a minority here as I use both functions from time to time.
    One "recipe" is extracting blocks from text files that are delimited by a
    special start and end line.

    from itertools import dropwhile, takewhile

    def iter_block(lines, start_marker, end_marker):
        return takewhile(lambda x: not x.startswith(end_marker),
                         dropwhile(lambda x: not x.startswith(start_marker),
                                   lines))
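    Running Marc's recipe on a small sample (Python 3; the marker strings and sample lines are made up) shows which marker lines end up in the output: the start-marker line is kept and the end-marker line is not.

```python
from itertools import dropwhile, takewhile

def iter_block(lines, start_marker, end_marker):
    # dropwhile() skips lines until the start marker; takewhile() then
    # yields lines until (but not including) the end marker.
    return takewhile(lambda x: not x.startswith(end_marker),
                     dropwhile(lambda x: not x.startswith(start_marker),
                               lines))

text = ["header", "BEGIN", "a", "b", "END", "footer"]
block = list(iter_block(text, "BEGIN", "END"))
print(block)  # ['BEGIN', 'a', 'b']
```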

    Maybe these functions don't often turn up in code that could be called
    "recipes", but they are useful on their own.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Dec 30, 2007
    #6
  7. On Dec 30, 3:29 am, Marc 'BlackJack' Rintsch <> wrote:

    > One "recipe" is extracting blocks from text files that are delimited by a
    > special start and end line.


    Neat solution!

    I actually need such functionality every once in a while.

    Takewhile + dropwhile to the rescue!

    i.
     
    Istvan Albert, Dec 30, 2007
    #7
  8. On Dec 30, 4:12 pm, Istvan Albert <> wrote:
    > On Dec 30, 3:29 am, Marc 'BlackJack' Rintsch <> wrote:
    >
    > > One "recipe" is extracting blocks from text files that are delimited by a
    > > special start and end line.

    >
    > Neat solution!
    >
    > I actually need such functionality every once in a while.
    >
    > Takewhile + dropwhile to the rescue!
    >
    > i.


    In at least one thread and one recipe for this task
    (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/521877), the
    proposed solutions involved groupby() with an appropriate key function.
    The takewhile/dropwhile solution seems shorter and (maybe) easier to
    read, but perhaps not as flexible and general. Regardless, it's a good
    example of takewhile/dropwhile.

    George
     
    George Sakkis, Dec 30, 2007
    #8
  9. [bearophile]
    > Here are my usages (every sub-list is
    > sorted by inverted frequency usage):
    >
    > I use often or very often:
    > groupby( iterable[, key])
    > imap( function, *iterables)
    > izip( *iterables)
    > ifilter( predicate, iterable)
    > islice( iterable, [start,] stop [, step])
    >
    > I use once in while:
    > cycle( iterable)
    > chain( *iterables)
    > count( [n])
    > repeat( object[, times])
    >
    > I have used probably one time or few times:
    > starmap( function, iterable)
    > tee( iterable[, n=2])
    > ifilterfalse( predicate, iterable)
    >
    > Never used so far:
    > dropwhile( predicate, iterable)
    > takewhile( predicate, iterable)


    Thank you for the useful and informative response.


    Raymond
     
    Raymond Hettinger, Dec 31, 2007
    #9
  10. [Michele Simionato]
    > in my code
    > base I have exactly zero occurrences of takewhile and
    > dropwhile, even if I tend to use the itertools quite
    > often. That should be telling.


    Thanks for the additional empirical evidence.

    > I presume you did scans of
    > large code bases and you did not find occurrences of
    > takewhile and dropwhile, right?


    Yes.


    Raymond
     
    Raymond Hettinger, Dec 31, 2007
    #10
  11. [Marc 'BlackJack' Rintsch]
    > I use both functions from time to time.
    > One "recipe" is extracting blocks from text files that are delimited by a
    > special start and end line.
    >
    > def iter_block(lines, start_marker, end_marker):
    >     return takewhile(lambda x: not x.startswith(end_marker),
    >                      dropwhile(lambda x: not x.startswith(start_marker),
    >                                lines))


    Glad to hear this came from real code instead of being contrived for
    this discussion. Thanks for the contribution.

    Looking at the code fragment, I wondered how that approach compared to
    others in terms of being easy to write, self-evidently correct,
    absence of awkward constructs, and speed. The lambda expressions are
    not as fast as straight C calls or in-lined code, and each of them
    also requires a 'not' to invert the startswith condition. The
    inversion is awkward, and it makes it less self-evident whether the
    lines with the markers are included in or excluded from the output
    (the recipe may in fact be buggy -- the line with the start marker is
    included and the line with the end marker is excluded). Your excellent
    choice of indentation helps improve the readability of the nested
    takewhile/dropwhile calls.

    In contrast, the generator version is clearer about whether the start
    and end marker lines get included and is easily modified if you want
    to change that choice. It is easy to write and more self-evident
    about how it handles the end cases. Also, it avoids the expense of
    the lambda function calls and the awkwardness of the 'not' to invert
    the sense of the test:

    def iter_block(lines, start_marker, end_marker):
        inblock = False
        for line in lines:
            if inblock:
                if line.startswith(end_marker):
                    break
                yield line
            elif line.startswith(start_marker):
                yield line
                inblock = True
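    To illustrate the claim that the generator is easy to modify, here is a hypothetical variant (iter_block_exclusive is not from the thread) that omits both marker lines. The only change is removing the yield in the start-marker branch.

```python
def iter_block_exclusive(lines, start_marker, end_marker):
    """Flag-based block extractor that leaves out both marker lines."""
    inblock = False
    for line in lines:
        if inblock:
            if line.startswith(end_marker):
                break
            yield line
        elif line.startswith(start_marker):
            # No 'yield line' here, so the start-marker line is excluded.
            inblock = True

text = ["header", "BEGIN", "a", "b", "END", "footer"]
print(list(iter_block_exclusive(text, "BEGIN", "END")))  # ['a', 'b']
```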

    And, of course, for this particular application, an approach based on
    regular expressions makes short work of the problem and runs very
    fast:

    import re
    # .*? is non-greedy, so the match stops at the first end marker
    re.search(r'(^beginmark.*?)^endmark', textblock, re.M | re.S).group(1)
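    A runnable check of the regular-expression approach on sample text (the marker words and text are assumptions). When the text holds more than one block, the greediness of .* matters: the greedy form runs to the last end marker, while the non-greedy .*? stops at the first one, which matches what the generator versions do. re.search() returns None when nothing matches, so calling .group(1) directly would then raise AttributeError.

```python
import re

textblock = ("header\n"
             "beginmark 1\nline a\nendmark\n"
             "beginmark 2\nline b\nendmark\n")

flags = re.M | re.S  # ^ matches at every line start; . also matches newlines
greedy = re.search(r'(^beginmark.*)^endmark', textblock, flags)
lazy = re.search(r'(^beginmark.*?)^endmark', textblock, flags)

print(repr(greedy.group(1)))  # spans both blocks, up to the last endmark
print(repr(lazy.group(1)))    # 'beginmark 1\nline a\n'
```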


    Raymond
     
    Raymond Hettinger, Dec 31, 2007
    #11
  12. FWIW, here is a generator version written without the state flag:

    def iter_block(lines, start_marker, end_marker):
        lines = iter(lines)
        for line in lines:
            if line.startswith(start_marker):
                yield line
                break
        for line in lines:
            if line.startswith(end_marker):
                return
            yield line
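    The flagless version works because both for loops share one iterator, so the second loop resumes where the first one broke off. A small sketch of that point (after_marker and its share_iterator switch are hypothetical): without the iter() call, a list argument would be restarted from the beginning by the second loop.

```python
def after_marker(lines, marker, share_iterator=True):
    """Return the items after the first occurrence of marker."""
    source = iter(lines) if share_iterator else lines
    for line in source:
        if line == marker:
            break
    # With a shared iterator this resumes after the marker; with a plain
    # list it starts again from the first element.
    return [line for line in source]

text = ["x", "BEGIN", "a", "b"]
print(after_marker(text, "BEGIN"))                        # ['a', 'b']
print(after_marker(text, "BEGIN", share_iterator=False))  # ['x', 'BEGIN', 'a', 'b']
```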

    Raymond
     
    Raymond Hettinger, Dec 31, 2007
    #12
  13. On Dec 31, 1:25 am, Raymond Hettinger <> wrote:
    > FWIW, here is an generator version written without the state flag:
    >
    >     def iter_block(lines, start_marker, end_marker):
    >         lines = iter(lines)
    >         for line in lines:
    >             if line.startswith(start_marker):
    >                 yield line
    >                 break
    >         for line in lines:
    >             if line.startswith(end_marker):
    >                 return
    >             yield line


    Here's a (stateful) version that generates all blocks...

    import itertools

    def iter_blocks(lines, start_marker, end_marker):
        inblock = [False]
        def line_in_block(line):
            inblock[0] = inblock[0] and not line.startswith(end_marker)
            inblock[0] = inblock[0] or line.startswith(start_marker)
            return inblock[0]
        return (block for is_in_block, block in
                itertools.groupby(lines, line_in_block) if is_in_block)

    If you just want the first block (as the original code did), you can
    just take it...

    for line in iter_blocks(lines, start_marker, end_marker).next():
        pass  # process the lines of the first block here

    I'm not happy about the way the inblock state has to be a 1-element
    list to avoid the non-local problem. Is there a nicer way to code it?
    Otherwise, I quite like this code (if I do say so myself) as it neatly
    separates out the logic of whether you're inside a block or not from
    the code that yields blocks and lines. I'd say it was quite readable
    if you're familiar with groupby.
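    One answer to the closing question, assuming Python 3, which added the nonlocal statement: the key function can rebind the flag directly, so the 1-element list is no longer needed. The logic is otherwise Paul's. As with any groupby() result, each block has to be consumed before advancing to the next, because the groups share the underlying iterator.

```python
import itertools

def iter_blocks(lines, start_marker, end_marker):
    inblock = False

    def line_in_block(line):
        nonlocal inblock  # rebinds the variable in the enclosing function
        inblock = inblock and not line.startswith(end_marker)
        inblock = inblock or line.startswith(start_marker)
        return inblock

    return (block for is_in_block, block in
            itertools.groupby(lines, line_in_block) if is_in_block)

text = ["junk", "BEGIN", "a", "END", "junk", "BEGIN", "b", "END"]
for block in iter_blocks(text, "BEGIN", "END"):
    print(list(block))
# ['BEGIN', 'a']
# ['BEGIN', 'b']
```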

    And back on topic... I use itertools regularly (and have a functional
    background), but have never needed takewhile or dropwhile. I'd be
    happy to see them deprecated.

    --
    Paul Hankin
     
    Paul Hankin, Dec 31, 2007
    #13
  14. Raymond Hettinger wrote:
    > I'm considering deprecating these two functions and would like some
    > feedback from the community or from people who have a background in
    > functional programming.


    FWIW, Google Code Search shows a few users:

    <http://www.google.com/codesearch?q=lang%3Apython+%28drop%7Ctake%29while>

    Do any of them make good use of them?
    --
     
    Matt Nordhoff, Dec 31, 2007
    #14
  15. On Dec 29 2007, 11:10 pm, Raymond Hettinger <> wrote:
    > I'm considering deprecating these two functions and would like some
    > feedback from the community or from people who have a background in
    > functional programming.


    Well, I have just this minute used dropwhile in anger, to find the next
    suitable filename when writing database dumps with names of the form
    date.count:

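    import os
    from itertools import count, dropwhile
    # now: the current date/datetime, defined elsewhere in the script
    filename = "%02d-%02d-%d" % (now.day, now.month, now.year)
    if os.path.exists(filename):
        candidates = ("%s.%d" % (filename, x) for x in count(1))
        filename = dropwhile(os.path.exists, candidates).next()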

    Much clearer than the alternatives I think, please keep dropwhile and
    takewhile in itertools ;)

    Cheers,

    Doug.
     
    , Jan 3, 2008
    #15
  16. On Jan 3, 4:39 pm, "" <> wrote:
    > On Dec 29 2007, 11:10 pm, Raymond Hettinger <> wrote:
    >
    > > I'm considering deprecating these two functions and would like some
    > > feedback from the community or from people who have a background in
    > > functional programming.

    >
    > Well I have just this minute used dropwhile in anger, to find the next
    > suitable filename when writing database dumps using date.count names:
    >
    >     filename = "%02d-%02d-%d" % (now.day, now.month, now.year)
    >     if os.path.exists(filename):
    >         candidates = ("%s.%d" % (filename, x) for x in count(1))
    >         filename = dropwhile(os.path.exists, candidates).next()
    >
    > Much clearer than the alternatives I think, please keep dropwhile and
    > takewhile in itertools ;)


    Wouldn't using ifilterfalse instead of dropwhile produce the same
    result?
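    A sketch of Arnaud's point in Python 3 spelling, where ifilterfalse() is named filterfalse(). A hypothetical in-memory set of taken names stands in for os.path.exists so the example runs anywhere. Both calls return the first candidate for which the predicate is false, so when only the first result is taken they agree; they would differ only on later items, since dropwhile() stops testing after the first failure while filterfalse() keeps filtering.

```python
from itertools import count, dropwhile, filterfalse

# Hypothetical stand-in for the files already on disk.
existing = {"03-01-2008", "03-01-2008.1", "03-01-2008.2"}

def taken(name):
    return name in existing

def candidates(base):
    return ("%s.%d" % (base, n) for n in count(1))

via_dropwhile = next(dropwhile(taken, candidates("03-01-2008")))
via_filterfalse = next(filterfalse(taken, candidates("03-01-2008")))
print(via_dropwhile, via_filterfalse)  # 03-01-2008.3 03-01-2008.3
```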

    --
    Arnaud
     
    Arnaud Delobelle, Jan 3, 2008
    #16
  17. Raymond Hettinger <> writes:
    > > I presume you did scans of
    > > large code bases and you did not find occurrences of
    > > takewhile and dropwhile, right?

    >
    > Yes.


    I think I have used them. I don't remember exactly how. Probably
    something that could have been done more generally with groupby.

    I remember a comp.lang.python thread about a takewhile gotcha: it
    consumes an extra element:

    >>> from itertools import takewhile as tw
    >>> x = range(10)
    >>> z = iter(x)
    >>> list(tw(lambda i:i<5, z))
    [0, 1, 2, 3, 4]
    >>> z.next()
    6

    I.e. I had wanted to use takewhile to split a list into the
    initial sublist satisfying some condition, and the rest of the
    list.

    This all by itself is something to at least warn about. I don't
    know if it's enough for deprecation.
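    One way to get the split Paul wanted without losing the boundary element is to push that element back onto the remaining iterator with chain(). The helper name span() is borrowed from Haskell's function of the same name; it is a hypothetical sketch in Python 3, not an itertools API.

```python
from itertools import chain

def span(pred, iterable):
    """Return (leading items satisfying pred, iterator over the rest).

    Unlike takewhile(), the first item that fails pred is not lost.
    """
    it = iter(iterable)
    head = []
    for x in it:
        if pred(x):
            head.append(x)
        else:
            # Put the boundary item back in front of the remaining items.
            return head, chain([x], it)
    return head, iter(())

head, rest = span(lambda i: i < 5, range(10))
rest_list = list(rest)
print(head)       # [0, 1, 2, 3, 4]
print(rest_list)  # [5, 6, 7, 8, 9]
```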

    I've been cooking up a scheme for iterators with lookahead, that I
    want to get around to coding and posting. It's a harder thing
    to get right than it at first appears.
     
    Paul Rubin, Jan 11, 2008
    #17
  18. On Dec 29, 2007 11:10 PM, Raymond Hettinger <> wrote:
    > I'm considering deprecating these two functions and would like some
    > feedback from the community or from people who have a background in
    > functional programming.


    Personally, I'd rather you kept them around. I have no FP background,
    and I found them easy enough to understand.

    > These thoughts reflect my own experience with the itertools module.
    > It may be that your experience with them has been different. Please
    > let me know what you think.


    FWIW, I used them only today: http://tinyurl.com/22q6cb

    Not sure if something that ugly counts as a reason for keeping them
    around, though!

    --
    Cheers,
    Simon B.

    http://www.brunningonline.net/simon/blog/
    GTalk: simon.brunning | MSN: small_values | Yahoo: smallvalues
     
    Simon Brunning, Feb 18, 2008
    #18