A gnarly little python loop

Discussion in 'Python' started by Roy Smith, Nov 10, 2012.

  1. Roy Smith

    Roy Smith Guest

    I'm trying to pull down tweets with one of the many twitter APIs. The
    particular one I'm using (python-twitter), has a call:

    data = api.GetSearch(term="foo", page=page)

    The way it works, you start with page=1. It returns a list of tweets.
    If the list is empty, there are no more tweets. If the list is not
    empty, you can try to get more tweets by asking for page=2, page=3, etc.
    I've got:

    page = 1
    while 1:
        r = api.GetSearch(term="foo", page=page)
        if not r:
            break
        for tweet in r:
            process(tweet)
        page += 1

    It works, but it seems excessively fidgety. Is there some cleaner way
    to refactor this?
     
    Roy Smith, Nov 10, 2012
    #1

  2. Ian Kelly

    Ian Kelly Guest

    On Sat, Nov 10, 2012 at 3:58 PM, Roy Smith <> wrote:
    > I'm trying to pull down tweets with one of the many twitter APIs. The
    > particular one I'm using (python-twitter), has a call:
    >
    > data = api.GetSearch(term="foo", page=page)
    >
    > The way it works, you start with page=1. It returns a list of tweets.
    > If the list is empty, there are no more tweets. If the list is not
    > empty, you can try to get more tweets by asking for page=2, page=3, etc.
    > I've got:
    >
    >     page = 1
    >     while 1:
    >         r = api.GetSearch(term="foo", page=page)
    >         if not r:
    >             break
    >         for tweet in r:
    >             process(tweet)
    >         page += 1
    >
    > It works, but it seems excessively fidgety. Is there some cleaner way
    > to refactor this?


    I'd do something like this:

    import itertools

    def get_tweets(term):
        # Ask for page 1, 2, 3, ... until the API returns an empty list.
        for page in itertools.count(1):
            r = api.GetSearch(term=term, page=page)
            if not r:
                break
            for tweet in r:
                yield tweet

    for tweet in get_tweets("foo"):
        process(tweet)
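
The same idea can be tried without Twitter access. The following is a minimal sketch, not the python-twitter API: `iter_paged`, `fake_search` and `FAKE_PAGES` are hypothetical names introduced here, and the page fetcher is passed in as a function so that a stub can stand in for `api.GetSearch`.

```python
import itertools

def iter_paged(fetch_page, start=1):
    """Yield items from successive pages until fetch_page returns an empty page."""
    for page in itertools.count(start):
        items = fetch_page(page)
        if not items:
            return
        for item in items:
            yield item

# Hypothetical stand-in for api.GetSearch(term="foo", page=page):
# three pages of fake tweets, then empty lists for every later page.
FAKE_PAGES = {1: ["t1", "t2"], 2: ["t3"], 3: ["t4", "t5"]}

def fake_search(page):
    return FAKE_PAGES.get(page, [])

tweets = list(iter_paged(fake_search))
```

With the real API, the call would presumably look like `iter_paged(lambda page: api.GetSearch(term="foo", page=page))`.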
     
    Ian Kelly, Nov 10, 2012
    #2

  3. On Sat, 10 Nov 2012 17:58:14 -0500, Roy Smith wrote:

    > The way it works, you start with page=1. It returns a list of tweets.
    > If the list is empty, there are no more tweets. If the list is not
    > empty, you can try to get more tweets by asking for page=2, page=3, etc.
    > I've got:
    >
    >     page = 1
    >     while 1:
    >         r = api.GetSearch(term="foo", page=page)
    >         if not r:
    >             break
    >         for tweet in r:
    >             process(tweet)
    >         page += 1
    >
    > It works, but it seems excessively fidgety. Is there some cleaner way
    > to refactor this?



    Seems clean enough to me. It does exactly what you need: loop until there
    are no more tweets, process each tweet.

    If you're allergic to nested loops, move the inner for-loop into a
    function. Also you could get rid of the "if not r: break".

    page = 1
    r = ["placeholder"]
    while r:
        r = api.GetSearch(term="foo", page=page)
        process_all(r)  # does nothing if r is empty
        page += 1


    Another way would be to use a for loop for the outer loop.

    import sys

    for page in xrange(1, sys.maxint):
        r = api.GetSearch(term="foo", page=page)
        if not r: break
        process_all(r)



    --
    Steven
     
    Steven D'Aprano, Nov 11, 2012
    #3
  4. Steve Howell

    Steve Howell Guest

    On Nov 10, 2:58 pm, Roy Smith <> wrote:
    > I'm trying to pull down tweets with one of the many twitter APIs.  The
    > particular one I'm using (python-twitter), has a call:
    >
    > data = api.GetSearch(term="foo", page=page)
    >
    > The way it works, you start with page=1.  It returns a list of tweets.
    > If the list is empty, there are no more tweets.  If the list is not
    > empty, you can try to get more tweets by asking for page=2, page=3, etc.
    > I've got:
    >
    >     page = 1
    >     while 1:
    >         r = api.GetSearch(term="foo", page=page)
    >         if not r:
    >             break
    >         for tweet in r:
    >             process(tweet)
    >         page += 1
    >
    > It works, but it seems excessively fidgety.  Is there some cleaner way
    > to refactor this?


    I think your code is perfectly readable and clean, but you can flatten
    it like so:

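    import itertools

    def get_tweets(term, get_page):
        # get_page(term, page) is expected to return a list of tweets,
        # e.g. lambda term, page: api.GetSearch(term=term, page=page)
        page_nums = itertools.count(1)
        pages = itertools.imap(lambda page: get_page(term, page), page_nums)
        valid_pages = itertools.takewhile(bool, pages)  # stop at the first empty page
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets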
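
For readers on Python 3, where `itertools.imap` no longer exists and the built-in `map` is lazy, the pipeline above might be sketched as follows. `flatten_pages` and `fake_get_page` are hypothetical names used only for this illustration; the `calls` list is there to show that no page past the first empty one is requested.

```python
import itertools

def flatten_pages(get_page):
    """Lazily flatten pages 1, 2, 3, ... until get_page returns an empty page."""
    page_nums = itertools.count(1)
    pages = map(get_page, page_nums)                 # lazy in Python 3
    valid_pages = itertools.takewhile(bool, pages)   # stops at the first empty page
    return itertools.chain.from_iterable(valid_pages)

calls = []  # records which page numbers were actually requested

def fake_get_page(page):
    # Hypothetical stub: two pages of three items each, then nothing.
    calls.append(page)
    return [page * 10 + k for k in range(3)] if page <= 2 else []

result = list(flatten_pages(fake_get_page))
```

Nothing is fetched until the result is iterated, so the caller can also stop early without paying for the remaining pages.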
     
    Steve Howell, Nov 11, 2012
    #4
  5. Steve Howell, 11.11.2012 04:03:
    > On Nov 10, 2:58 pm, Roy Smith <> wrote:
    >> I'm trying to pull down tweets with one of the many twitter APIs. The
    >> particular one I'm using (python-twitter), has a call:
    >>
    >> data = api.GetSearch(term="foo", page=page)
    >>
    >> The way it works, you start with page=1. It returns a list of tweets.
    >> If the list is empty, there are no more tweets. If the list is not
    >> empty, you can try to get more tweets by asking for page=2, page=3, etc.
    >> I've got:
    >>
    >>     page = 1
    >>     while 1:
    >>         r = api.GetSearch(term="foo", page=page)
    >>         if not r:
    >>             break
    >>         for tweet in r:
    >>             process(tweet)
    >>         page += 1
    >>
    >> It works, but it seems excessively fidgety. Is there some cleaner way
    >> to refactor this?

    >
    > I think your code is perfectly readable and clean, but you can flatten
    > it like so:
    >
    >     def get_tweets(term, get_page):
    >         page_nums = itertools.count(1)
    >         pages = itertools.imap(api.getSearch, page_nums)
    >         valid_pages = itertools.takewhile(bool, pages)
    >         tweets = itertools.chain.from_iterable(valid_pages)
    >         return tweets


    I'd prefer the original code ten times over this inaccessible beast.

    Stefan
     
    Stefan Behnel, Nov 11, 2012
    #5
  6. rusi

    rusi Guest

    On Nov 11, 3:58 am, Roy Smith <> wrote:
    > I'm trying to pull down tweets with one of the many twitter APIs.  The
    > particular one I'm using (python-twitter), has a call:
    >
    > data = api.GetSearch(term="foo", page=page)
    >
    > The way it works, you start with page=1.  It returns a list of tweets.
    > If the list is empty, there are no more tweets.  If the list is not
    > empty, you can try to get more tweets by asking for page=2, page=3, etc.
    > I've got:
    >
    >     page = 1
    >     while 1:
    >         r = api.GetSearch(term="foo", page=page)
    >         if not r:
    >             break
    >         for tweet in r:
    >             process(tweet)
    >         page += 1
    >
    > It works, but it seems excessively fidgety.  Is there some cleaner way
    > to refactor this?


    This is a classic problem -- structure clash of parallel loops -- and
    Steve Howell has given the classic solution, using the fact that
    generators in Python simulate/implement lazy lists.
    As David Beazley (http://www.dabeaz.com/coroutines/) explains,
    coroutines are more general than generators and you can use those if
    you prefer.

    The classic problem used to be stated like this:
    There is an input in cards of 80 columns.
    It needs to be copied onto printer of 132 columns.

    The structure clash arises because after reading 80 chars a new card
    has to be read; after printing 132 chars a linefeed has to be given.

    To pythonize the problem, let's replace the 80 and 132 by 3 and 4, i.e.
    take the char-square
    abc
    def
    ghi

    and produce
    abcd
    efgh
    i

    The important difference (explained nicely by Beazley) is that with
    generators the for-loop pulls values out of the generator, whereas with
    coroutines the 'generator' pushes values into the consuming coroutines.


    ---------------
    from __future__ import print_function
    s = ["abc", "def", "ghi"]

    # Coroutine infrastructure from PEP 342
    def consumer(func):
        def wrapper(*args, **kw):
            gen = func(*args, **kw)
            next(gen)  # advance to the first yield so that send() works
            return gen
        return wrapper

    @consumer
    def endStage():
        while True:
            # print four received characters, then end the output line
            for i in range(0, 4):
                print((yield), sep='', end='')
            print("\n", sep='', end='')


    def genStage(s, target):
        for line in s:
            # push the three characters of each card one at a time
            for i in range(0, 3):
                target.send(line[i])


    if __name__ == '__main__':
        genStage(s, endStage())
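
For comparison, here is a sketch of the pull-style (generator) version of the same 3-to-4 regrouping, written for Python 3. The helper names `chars` and `regroup` are invented for this example; the point is only that the consumer pulls characters out of one flat stream, so neither loop has to know about the other's line width.

```python
import itertools

def chars(cards):
    """Flatten the cards into a single stream of characters."""
    return itertools.chain.from_iterable(cards)

def regroup(stream, width):
    """Yield strings of at most `width` characters pulled from the stream."""
    it = iter(stream)
    while True:
        chunk = "".join(itertools.islice(it, width))
        if not chunk:
            return
        yield chunk

lines = list(regroup(chars(["abc", "def", "ghi"]), 4))
```

Here `lines` holds the three output rows of the char-square example, with the last row shorter than the others.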
     
    rusi, Nov 12, 2012
    #6
  7. rusi

    rusi Guest

    On Nov 12, 12:09 pm, rusi <> wrote:
    > This is a classic problem -- structure clash of parallel loops

    <rest snipped>

    Sorry wrong solution :D

    The fidgetiness is entirely due to Python not allowing C-style loops
    like these:
    >> while ((c = getchar()) != EOF) { ... }



    Putting it into coroutine form, it becomes something like the
    following [untested, since I don't have the API]. Clearly the
    fidgetiness is there as before, and now with extra coroutine plumbing.

    def genStage(term, target):
        page = 1
        while 1:
            r = api.GetSearch(term=term, page=page)
            if not r: break
            for tweet in r: target.send(tweet)
            page += 1


    @consumer
    def endStage():
        while True: process((yield))

    if __name__ == '__main__':
        genStage("foo", endStage())
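
A note for later readers: since Python 3.8, assignment expressions (the `:=` operator) permit something close to the C-style loop mentioned above. A minimal sketch, assuming Python 3.8 or newer; `process_all_pages` and the stub page source are hypothetical names, not part of any Twitter library.

```python
def process_all_pages(fetch_page, process):
    """Fetch pages 1, 2, 3, ... and process each item until a page comes back empty."""
    page = 1
    # The walrus operator assigns and tests in one expression,
    # much like while ((c = getchar()) != EOF) in C.
    while (items := fetch_page(page)):
        for item in items:
            process(item)
        page += 1
    return page - 1  # number of non-empty pages seen

# Hypothetical stand-in for the paged search call.
pages = {1: ["a", "b"], 2: ["c"]}
seen = []
page_count = process_all_pages(lambda p: pages.get(p, []), seen.append)
```

Whether this reads better than the `while 1` / `break` form is a matter of taste; it removes the explicit break but not the page counter.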
     
    rusi, Nov 12, 2012
    #7
  8. Peter Otten

    Peter Otten Guest

    rusi wrote:

    > The fidgetiness is entirely due to python not allowing C-style loops
    > like these:
    > >>> while ((c = getchar()) != EOF) { ... }


    for c in iter(getchar, EOF):
        ...

    > Clearly the fidgetiness is there as before and now with extra coroutine
    > plumbing


    Hmm, very funny...
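
The loop above relies on the two-argument form of the built-in `iter(callable, sentinel)`, which calls the callable repeatedly and stops as soon as it returns a value equal to the sentinel. A small runnable sketch, in which `getchar` and `EOF` are stand-ins built on `io.StringIO` rather than anything from the C library:

```python
import io

stream = io.StringIO("hello")
EOF = ""  # StringIO.read(1) returns an empty string at end of input

def getchar():
    """Return the next character of the stream, or EOF when it is exhausted."""
    return stream.read(1)

# iter(callable, sentinel) calls getchar() until it returns EOF.
collected = [c for c in iter(getchar, EOF)]
```

The sentinel is compared with `==`, and the sentinel value itself is never yielded.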
     
    Peter Otten, Nov 12, 2012
    #8
  9. Steve Howell

    Steve Howell Guest

    On Nov 12, 7:21 am, rusi <> wrote:
    > On Nov 12, 12:09 pm, rusi <> wrote:
    > > This is a classic problem -- structure clash of parallel loops
    >
    > <rest snipped>
    >
    > Sorry wrong solution :D
    >
    > The fidgetiness is entirely due to python not allowing C-style loops
    > like these:
    >
    > >> while ((c = getchar()) != EOF) { ... }
    > [...]


    There are actually three fidgety things going on:

    1. The API is 1-based instead of 0-based.
    2. You don't know the number of pages in advance.
    3. You want to process tweets, not pages of tweets.

    Here's yet another take on the problem:

    from itertools import chain, count

    # wrap fidgety 1-based api
    def search(i):
        return api.GetSearch(term="foo", page=i+1)

    paged_tweets = (search(i) for i in count())

    # handle sentinel: stop at the first empty page
    paged_tweets = iter(paged_tweets.next, [])

    # flatten pages
    tweets = chain.from_iterable(paged_tweets)
    for tweet in tweets:
        process(tweet)
     
    Steve Howell, Nov 12, 2012
    #9
  10. rusi

    rusi Guest

    On Nov 12, 9:09 pm, Steve Howell <> wrote:
    > On Nov 12, 7:21 am, rusi <> wrote:
    >
    > > On Nov 12, 12:09 pm, rusi <> wrote:
    > > > This is a classic problem -- structure clash of parallel loops
    > >
    > > <rest snipped>
    > >
    > > Sorry wrong solution :D
    > >
    > > The fidgetiness is entirely due to python not allowing C-style loops
    > > like these:
    > >
    > > >> while ((c = getchar()) != EOF) { ... }
    > > [...]
    >
    > There are actually three fidgety things going on:
    >
    >  1. The API is 1-based instead of 0-based.
    >  2. You don't know the number of pages in advance.
    >  3. You want to process tweets, not pages of tweets.
    >
    > Here's yet another take on the problem:
    >
    >     # wrap fidgety 1-based api
    >     def search(i):
    >         return api.GetSearch("foo", i+1)
    >
    >     paged_tweets = (search(i) for i in count())
    >
    >     # handle sentinel
    >     paged_tweets = iter(paged_tweets.next, [])
    >
    >     # flatten pages
    >     tweets = chain.from_iterable(paged_tweets)
    >     for tweet in tweets:
    >         process(tweet)


    [Steve Howell]
    Nice on the whole -- thanks.
    Could not the 1-based-ness be dealt with by using count(1)?
    i.e. use
    paged_tweets = (api.GetSearch("foo", i) for i in count(1))
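
Starting the counter at 1 does appear to work. Below is a Python 3 sketch of the combined pipeline with a stub in place of the API; `all_items` and `fake_search` are hypothetical names, and the generator method is spelled `__next__` because Python 3 renamed `.next`.

```python
from itertools import chain, count

def all_items(fetch_page):
    """Flatten 1-based pages from fetch_page until the first empty page."""
    pages = (fetch_page(i) for i in count(1))
    # iter(callable, sentinel) keeps calling until the result equals [].
    nonempty_pages = iter(pages.__next__, [])
    return chain.from_iterable(nonempty_pages)

# Hypothetical stand-in for api.GetSearch("foo", i): three pages, then empty lists.
def fake_search(i):
    return ["tweet-%d" % i] * 2 if i <= 3 else []

items = list(all_items(fake_search))
```

One caveat of the sentinel approach: it only stops if the empty page compares equal to `[]`, so an API that returned an empty tuple or `None` would need a different test, such as `itertools.takewhile(bool, pages)`.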

    [Peter]
    > >>> while ((c = getchar()) != EOF) { ... }


    for c in iter(getchar, EOF):
        ...

    Thanks. Learnt something.
     
    rusi, Nov 13, 2012
    #10