Re: A gnarly little python loop

Discussion in 'Python' started by Cameron Simpson, Nov 11, 2012.

  1. On 11Nov2012 08:56, Stefan Behnel <> wrote:
    | Steve Howell, 11.11.2012 04:03:
    | > On Nov 10, 2:58 pm, Roy Smith <> wrote:
    | >> page = 1
    | >> while 1:
    | >>     r = api.GetSearch(term="foo", page=page)
    | >>     if not r:
    | >>         break
    | >>     for tweet in r:
    | >>         process(tweet)
    | >>     page += 1
    | >>
    | >> It works, but it seems excessively fidgety. Is there some cleaner way
    | >> to refactor this?
    | >
    | > I think your code is perfectly readable and clean, but you can flatten
    | > it like so:
    | >
    | > def get_tweets(term, get_page):
    | >     page_nums = itertools.count(1)
    | >     pages = itertools.imap(api.getSearch, page_nums)
    | >     valid_pages = itertools.takewhile(bool, pages)
    | >     tweets = itertools.chain.from_iterable(valid_pages)
    | >     return tweets
    |
    | I'd prefer the original code ten times over this inaccessible beast.

    Me too.
    --
    Cameron Simpson <>

    In an insane society, the sane man must appear insane.
    - Keith A. Schauer <>
    Cameron Simpson, Nov 11, 2012
    #1

  2. Cameron Simpson

    Paul Rubin Guest

    Cameron Simpson <> writes:
    > | I'd prefer the original code ten times over this inaccessible beast.
    > Me too.


    Me, I like the itertools version better. There's one chunk of data
    that goes through a succession of transforms each of which
    is very straightforward.
    Paul Rubin, Nov 11, 2012
    #2

  3. Cameron Simpson

    Peter Otten Guest

    Paul Rubin wrote:

    > Cameron Simpson <> writes:
    >> | I'd prefer the original code ten times over this inaccessible beast.
    >> Me too.

    >
    > Me, I like the itertools version better. There's one chunk of data
    > that goes through a succession of transforms each of which
    > is very straightforward.


    [Steve Howell]
    > def get_tweets(term, get_page):
    >     page_nums = itertools.count(1)
    >     pages = itertools.imap(api.getSearch, page_nums)
    >     valid_pages = itertools.takewhile(bool, pages)
    >     tweets = itertools.chain.from_iterable(valid_pages)
    >     return tweets



    But did you spot the bug(s)?
    My itertools-based version would look like this

    def get_tweets(term):
        pages = (api.GetSearch(term, pageno)
                 for pageno in itertools.count(1))
        for page in itertools.takewhile(bool, pages):
            yield from page

    but I can understand that it's not everybody's cup of tea.
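
Peter does not list the bugs. One plausible reading, sketched here in Python 3 (so the built-in `map` stands in for `itertools.imap`): the first draft never uses its `term` and `get_page` parameters, it spells the method `api.getSearch` where the original post has `api.GetSearch`, and it passes each page number as the first positional argument. The `get_search` function below is a hypothetical stand-in that only records its arguments; its signature is an assumption, not the real API.

```python
import itertools

calls = []

def get_search(term=None, page=1):
    """Hypothetical stand-in for the search call: it only records its arguments."""
    calls.append((term, page))
    return ["tweet"]

# First draft: mapping the bare function over the page numbers hands each
# number to the first positional parameter, which is the search term.
buggy_pages = map(get_search, itertools.count(1))
list(itertools.islice(buggy_pages, 3))
buggy_calls = list(calls)
print(buggy_calls)   # [(1, 1), (2, 1), (3, 1)]

# Binding the term explicitly puts each page number where it belongs.
calls.clear()
fixed_pages = (get_search("foo", pageno) for pageno in itertools.count(1))
list(itertools.islice(fixed_pages, 3))
fixed_calls = list(calls)
print(fixed_calls)   # [('foo', 1), ('foo', 2), ('foo', 3)]
```

Under that assumed signature the buggy form would keep requesting page 1 with a numeric search term, so `takewhile` might never see an empty page.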
    Peter Otten, Nov 11, 2012
    #3
  4. Cameron Simpson

    Steve Howell Guest

    On Sunday, November 11, 2012 1:54:46 AM UTC-8, Peter Otten wrote:
    > Paul Rubin wrote:
    >
    > > Cameron Simpson <> writes:
    > >> | I'd prefer the original code ten times over this inaccessible beast.
    > >> Me too.
    > >
    > > Me, I like the itertools version better. There's one chunk of data
    > > that goes through a succession of transforms each of which
    > > is very straightforward.
    >
    > [Steve Howell]
    > > def get_tweets(term, get_page):
    > >     page_nums = itertools.count(1)
    > >     pages = itertools.imap(api.getSearch, page_nums)
    > >     valid_pages = itertools.takewhile(bool, pages)
    > >     tweets = itertools.chain.from_iterable(valid_pages)
    > >     return tweets
    >
    > But did you spot the bug(s)?


    My first version was sketching out the technique, and I don't have handy access to the API.

    Here is an improved version:

    def get_tweets(term):
        def get_page(page):
            return getSearch(term, page)
        page_nums = itertools.count(1)
        pages = itertools.imap(get_page, page_nums)
        valid_pages = itertools.takewhile(bool, pages)
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets

    for tweet in get_tweets("foo"):
        process(tweet)

    This is what I used to test it:


    def getSearch(term = "foo", page = 1):
        # simulate api for testing
        if page < 5:
            return [
                'page %d, tweet A for term %s' % (page, term),
                'page %d, tweet B for term %s' % (page, term),
            ]
        else:
            return None

    def process(tweet):
        print tweet
    Steve Howell, Nov 11, 2012
    #4
  6. Cameron Simpson

    Steve Howell Guest

    On Nov 11, 1:09 am, Paul Rubin <> wrote:
    > Cameron Simpson <> writes:
    > > | I'd prefer the original code ten times over this inaccessible beast.
    > > Me too.

    >
    > Me, I like the itertools version better.  There's one chunk of data
    > that goes through a succession of transforms each of which
    > is very straightforward.


    Thanks, Paul.

    Even though I supplied the "inaccessible" itertools version, I can
    understand why folks find it inaccessible. As I said to the OP, there
    was nothing wrong with the original imperative approach; I was simply
    providing an alternative.

    It took me a while to appreciate itertools, but the metaphor that
    resonates with me is a Unix pipeline. It's just a metaphor, so folks
    shouldn't be too literal, but the idea here is this:

    page_nums -> pages -> valid_pages -> tweets

    The transforms are this:

    page_nums -> pages: call API via imap
    pages -> valid_pages: take while true
    valid_pages -> tweets: use chain.from_iterable to flatten results

    Here's the code again for context:

    def get_tweets(term):
        def get_page(page):
            return getSearch(term, page)
        page_nums = itertools.count(1)
        pages = itertools.imap(get_page, page_nums)
        valid_pages = itertools.takewhile(bool, pages)
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets
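
For readers on Python 3, where `itertools.imap` no longer exists and the built-in `map` is already lazy, the same four-stage pipeline can be sketched as follows. The `get_search` mock mirrors the test stub posted earlier in the thread (four pages, then `None`); it is a hypothetical stand-in, not the real API.

```python
import itertools

def get_search(term, page):
    """Hypothetical mock: four pages of two tweets each, then None."""
    if page < 5:
        return ["page %d, tweet %s for term %s" % (page, tag, term)
                for tag in "AB"]
    return None

def get_tweets(term):
    # page_nums: 1, 2, 3, ... (an endless counter)
    page_nums = itertools.count(1)
    # pages: one API call per page number, made only when a page is requested
    pages = map(lambda page: get_search(term, page), page_nums)
    # valid_pages: stop at the first falsy page (None or an empty list)
    valid_pages = itertools.takewhile(bool, pages)
    # tweets: flatten the remaining pages into a single stream
    return itertools.chain.from_iterable(valid_pages)

tweets = list(get_tweets("foo"))
print(len(tweets))   # 8
print(tweets[0])     # page 1, tweet A for term foo
```

Each stage is lazy, so no page is fetched until the consumer asks for the next tweet.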
    Steve Howell, Nov 11, 2012
    #6
  7. Cameron Simpson

    Peter Otten Guest

    Steve Howell wrote:

    > On Nov 11, 1:09 am, Paul Rubin <> wrote:
    >> Cameron Simpson <> writes:
    >> > | I'd prefer the original code ten times over this inaccessible beast.
    >> > Me too.

    >>
    >> Me, I like the itertools version better. There's one chunk of data
    >> that goes through a succession of transforms each of which
    >> is very straightforward.

    >
    > Thanks, Paul.
    >
    > Even though I supplied the "inaccessible" itertools version, I can
    > understand why folks find it inaccessible. As I said to the OP, there
    > was nothing wrong with the original imperative approach; I was simply
    > providing an alternative.
    >
    > It took me a while to appreciate itertools, but the metaphor that
    > resonates with me is a Unix pipeline. It's just a metaphor, so folks
    > shouldn't be too literal, but the idea here is this:
    >
    > page_nums -> pages -> valid_pages -> tweets
    >
    > The transforms are this:
    >
    > page_nums -> pages: call API via imap
    > pages -> valid_pages: take while true
    > valid_pages -> tweets: use chain.from_iterable to flatten results
    >
    > Here's the code again for context:
    >
    > def get_tweets(term):
    >     def get_page(page):
    >         return getSearch(term, page)
    >     page_nums = itertools.count(1)
    >     pages = itertools.imap(get_page, page_nums)
    >     valid_pages = itertools.takewhile(bool, pages)
    >     tweets = itertools.chain.from_iterable(valid_pages)
    >     return tweets
    >


    Actually you supplied the "accessible" itertools version. For reference,
    here's the inaccessible version:

    class api:
        """Twitter search API mock-up"""
        pages = [
            ["a", "b", "c"],
            ["d", "e"],
            ]
        @staticmethod
        def GetSearch(term, page):
            assert term == "foo"
            assert page >= 1
            if page > len(api.pages):
                return []
            return api.pages[page-1]

    from collections import deque
    from functools import partial
    from itertools import chain, count, imap, takewhile

    def process(tweet):
        print tweet

    term = "foo"

    deque(
        imap(
            process,
            chain.from_iterable(
                takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
        maxlen=0)

    ;)
    Peter Otten, Nov 11, 2012
    #7
  8. Cameron Simpson

    Steve Howell Guest

    On Nov 11, 10:34 am, Peter Otten <> wrote:
    > Steve Howell wrote:
    > > On Nov 11, 1:09 am, Paul Rubin <> wrote:
    > >> Cameron Simpson <> writes:
    > >> > | I'd prefer the original code ten times over this inaccessible beast.
    > >> > Me too.

    >
    > >> Me, I like the itertools version better.  There's one chunk of data
    > >> that goes through a succession of transforms each of which
    > >> is very straightforward.

    >
    > > Thanks, Paul.

    >
    > > Even though I supplied the "inaccessible" itertools version, I can
    > > understand why folks find it inaccessible.  As I said to the OP, there
    > > was nothing wrong with the original imperative approach; I was simply
    > > providing an alternative.

    >
    > > It took me a while to appreciate itertools, but the metaphor that
    > > resonates with me is a Unix pipeline.  It's just a metaphor, so folks
    > > shouldn't be too literal, but the idea here is this:

    >
    > >   page_nums -> pages -> valid_pages -> tweets

    >
    > > The transforms are this:

    >
    > >   page_nums -> pages: call API via imap
    > >   pages -> valid_pages: take while true
    > >   valid_pages -> tweets: use chain.from_iterable to flatten results

    >
    > > Here's the code again for context:

    >
    > >     def get_tweets(term):
    > >         def get_page(page):
    > >             return getSearch(term, page)
    > >         page_nums = itertools.count(1)
    > >         pages = itertools.imap(get_page, page_nums)
    > >         valid_pages = itertools.takewhile(bool, pages)
    > >         tweets = itertools.chain.from_iterable(valid_pages)
    > >         return tweets

    >
    > Actually you supplied the "accessible" itertools version. For reference,
    > here's the inaccessible version:
    >
    > class api:
    >     """Twitter search API mock-up"""
    >     pages = [
    >         ["a", "b", "c"],
    >         ["d", "e"],
    >         ]
    >     @staticmethod
    >     def GetSearch(term, page):
    >         assert term == "foo"
    >         assert page >= 1
    >         if page > len(api.pages):
    >             return []
    >         return api.pages[page-1]
    >
    > from collections import deque
    > from functools import partial
    > from itertools import chain, count, imap, takewhile
    >
    > def process(tweet):
    >     print tweet
    >
    > term = "foo"
    >
    > deque(
    >     imap(
    >         process,
    >         chain.from_iterable(
    >             takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
    >     maxlen=0)
    >
    > ;)


    I know Peter's version is tongue in cheek, but I do think that it has
    a certain expressive power, and it highlights three mind-expanding
    Python modules.

    Here's a re-flattened take on Peter's version ("Flat is better than
    nested." -- PEP 20):

    term = "foo"
    search = partial(api.GetSearch, term)
    nums = count(1)
    paged_tweets = imap(search, nums)
    paged_tweets = takewhile(bool, paged_tweets)
    tweets = chain.from_iterable(paged_tweets)
    processed_tweets = imap(process, tweets)
    deque(processed_tweets, maxlen=0)

    The use of deque to exhaust an iterator is slightly overboard IMHO,
    but all the other lines of code can be fairly easily understood once
    you read the docs.

    partial: http://docs.python.org/2/library/functools.html
    count, imap, takewhile, chain.from_iterable:
    http://docs.python.org/2/library/itertools.html
    deque: http://docs.python.org/2/library/collections.html
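
For anyone who has not met the `deque(..., maxlen=0)` idiom: a deque with `maxlen=0` discards every item it is given, so constructing one from a lazy iterator runs the iterator to exhaustion without storing anything. A minimal Python 3 sketch, where the built-in `map` is lazy like Python 2's `imap` and `seen.append` stands in for `process`:

```python
from collections import deque

seen = []

# Nothing runs yet: map only builds a lazy iterator.
work = map(seen.append, ["a", "b", "c"])
ran_early = list(seen)          # still empty at this point

# The deque pulls every item through the iterator and keeps none of them.
leftover = deque(work, maxlen=0)

print(ran_early)       # []
print(seen)            # ['a', 'b', 'c']
print(len(leftover))   # 0
```

A plain `for` loop over the iterator has the same effect and is usually easier to read.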
    Steve Howell, Nov 11, 2012
    #8
  9. Cameron Simpson

    Roy Smith Guest

    In article <>,
    Peter Otten <> wrote:

    > deque(
    >     imap(
    >         process,
    >         chain.from_iterable(
    >             takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
    >     maxlen=0)
    >
    > ;)


    If I wanted STL, I would still be writing C++ :)
    Roy Smith, Nov 11, 2012
    #9
  10. On 11Nov2012 11:16, Steve Howell <> wrote:
    | On Nov 11, 10:34 am, Peter Otten <> wrote:
    | > Steve Howell wrote:
    | > > On Nov 11, 1:09 am, Paul Rubin <> wrote:
    | > >> Cameron Simpson <> writes:
    | > >> > | I'd prefer the original code ten times over this inaccessible beast.
    | > >> > Me too.
    | >
    | > >> Me, I like the itertools version better.  There's one chunk of data
    | > >> that goes through a succession of transforms each of which
    | > >> is very straightforward.
    | >
    | > > Thanks, Paul.
    | >
    | > > Even though I supplied the "inaccessible" itertools version, I can
    | > > understand why folks find it inaccessible.  As I said to the OP, there
    | > > was nothing wrong with the original imperative approach; I was simply
    | > > providing an alternative.
    | >
    | > > It took me a while to appreciate itertools, but the metaphor that
    | > > resonates with me is a Unix pipeline.
    [...]
    | > Actually you supplied the "accessible" itertools version. For reference,
    | > here's the inaccessible version:
    [...]
    | I know Peter's version is tongue in cheek, but I do think that it has
    | a certain expressive power, and it highlights three mind-expanding
    | Python modules.
    | Here's a re-flattened take on Peter's version ("Flat is better than
    | nested." -- PEP 20):
    [...]

    Ok, who's going to quiz the OP on his/her uptake of these techniques...
    --
    Cameron Simpson <>

    It's hard to make a man understand something when his livelihood depends
    on him not understanding it. - Upton Sinclair
    Cameron Simpson, Nov 12, 2012
    #10
  11. Cameron Simpson

    Steve Howell Guest

    On Nov 11, 4:44 pm, Cameron Simpson <> wrote:
    > On 11Nov2012 11:16, Steve Howell <> wrote:
    > | On Nov 11, 10:34 am, Peter Otten <> wrote:
    > | > Steve Howell wrote:
    > | > > On Nov 11, 1:09 am, Paul Rubin <> wrote:
    > | > >> Cameron Simpson <> writes:
    > | > >> > | I'd prefer the original code ten times over this inaccessible beast.
    > | > >> > Me too.
    > | >
    > | > >> Me, I like the itertools version better.  There's one chunk of data
    > | > >> that goes through a succession of transforms each of which
    > | > >> is very straightforward.
    > | >
    > | > > Thanks, Paul.
    > | >
    > | > > Even though I supplied the "inaccessible" itertools version, I can
    > | > > understand why folks find it inaccessible.  As I said to the OP, there
    > | > > was nothing wrong with the original imperative approach; I was simply
    > | > > providing an alternative.
    > | >
    > | > > It took me a while to appreciate itertools, but the metaphor that
    > | > > resonates with me is a Unix pipeline.
    > [...]
    > | > Actually you supplied the "accessible" itertools version. For reference,
    > | > here's the inaccessible version:
    > [...]
    > | I know Peter's version is tongue in cheek, but I do think that it has
    > | a certain expressive power, and it highlights three mind-expanding
    > | Python modules.
    > | Here's a re-flattened take on Peter's version ("Flat is better than
    > | nested." -- PEP 20):
    > [...]
    >
    > Ok, who's going to quiz the OP on his/her uptake of these techniques...


    Cameron, with all due respect, I think you're missing the point.

    Roy posted this code:

    page = 1
    while 1:
        r = api.GetSearch(term="foo", page=page)
        if not r:
            break
        for tweet in r:
            process(tweet)
        page += 1

    In his own words, he described the loop as "gnarly" and the overall
    code as "fidgety."

    One way to eliminate the "while", the "if", and the "break" statements
    is to use higher level constructs that are shipped with all modern
    versions of Python, and which are well documented and well tested (and
    fast, I might add):

    search = partial(api.GetSearch, "foo")
    paged_tweets = imap(search, count(1))
    paged_tweets = takewhile(bool, paged_tweets)
    tweets = chain.from_iterable(paged_tweets)
    for tweet in tweets:
        process(tweet)

    The moral of the story is that you can avoid brittle loops by relying
    on a well-tested library to work at a higher level of abstraction.

    For this particular use case, the imperative version is fine, but for
    more complex use cases, the loops are only gonna get more gnarly and
    fidgety.
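
A possible middle ground between the two styles, offered only as a sketch: keep the imperative loop but hide it inside a generator, so the caller still sees a flat stream of tweets. This is Python 3 (`yield from` needs 3.3 or later); `iter_tweets` and `fake_search` are hypothetical names, and `fake_search` stands in for something like `partial(api.GetSearch, "foo")`.

```python
def iter_tweets(search, first_page=1):
    """Yield results page by page until `search` returns a falsy page."""
    page = first_page
    while True:
        results = search(page)
        if not results:
            return
        yield from results
        page += 1

def fake_search(page):
    """Hypothetical stand-in: two pages of results, then an empty list."""
    pages = {1: ["a", "b", "c"], 2: ["d", "e"]}
    return pages.get(page, [])

tweets = list(iter_tweets(fake_search))
print(tweets)   # ['a', 'b', 'c', 'd', 'e']
```

The `while`, `if`, and early exit are still there, but they live in one small function, and the calling code reduces to `for tweet in iter_tweets(search): process(tweet)`.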
    Steve Howell, Nov 12, 2012
    #11