A gnarly little python loop

Roy Smith · Nov 10, 2012

I'm trying to pull down tweets with one of the many twitter APIs. The
particular one I'm using (python-twitter) has a call:

data = api.GetSearch(term="foo", page=page)

The way it works, you start with page=1. It returns a list of tweets.
If the list is empty, there are no more tweets. If the list is not
empty, you can try to get more tweets by asking for page=2, page=3, etc.
I've got:

page = 1
while 1:
    r = api.GetSearch(term="foo", page=page)
    if not r:
        break
    for tweet in r:
        process(tweet)
    page += 1

It works, but it seems excessively fidgety. Is there some cleaner way
to refactor this?

Ian Kelly · Nov 10, 2012

I'd do something like this:

import itertools

def get_tweets(term):
    # Fetch pages 1, 2, 3, ... until an empty page signals the end.
    for page in itertools.count(1):
        r = api.GetSearch(term=term, page=page)
        if not r:
            break
        for tweet in r:
            yield tweet

for tweet in get_tweets("foo"):
    process(tweet)
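
A self-contained sketch of the generator approach above, runnable without network access. FakeApi is a hypothetical stand-in for the twitter client (it is not part of python-twitter); it only mimics the behaviour described in the question, where pages are 1-based and a page past the end comes back as an empty list.

```python
import itertools


class FakeApi:
    """Hypothetical stand-in for the twitter client; serves canned pages."""

    def __init__(self, pages):
        self._pages = pages

    def GetSearch(self, term, page):
        # Pages are 1-based; anything past the last page is an empty list.
        if 1 <= page <= len(self._pages):
            return self._pages[page - 1]
        return []


def get_tweets(api, term):
    """Yield tweets one at a time, fetching pages until one comes back empty."""
    for page in itertools.count(1):
        r = api.GetSearch(term=term, page=page)
        if not r:
            break
        for tweet in r:
            yield tweet


api = FakeApi([["t1", "t2"], ["t3"]])
tweets = list(get_tweets(api, "foo"))
print(tweets)
```

Passing the client in as an argument, rather than using a global, is a choice made for this sketch so that the paging logic can be exercised against canned data.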

Steven D'Aprano · Nov 11, 2012

Seems clean enough to me. It does exactly what you need: loop until there
are no more tweets, process each tweet.

If you're allergic to nested loops, move the inner for-loop into a
function. You could also get rid of the "if not r: break":

page = 1
r = ["placeholder"]
while r:
    r = api.GetSearch(term="foo", page=page)
    process_all(r)  # does nothing if r is empty
    page += 1

Another way would be to use a for loop for the outer loop:

import sys

for page in xrange(1, sys.maxint):
    r = api.GetSearch(term="foo", page=page)
    if not r:
        break
    process_all(r)

Steve Howell · Nov 11, 2012

I think your code is perfectly readable and clean, but you can flatten
it like so:

import itertools

def get_tweets(term, get_page):
    # get_page(term, page) returns a list of tweets, empty when exhausted
    page_nums = itertools.count(1)
    pages = itertools.imap(lambda page: get_page(term, page), page_nums)
    valid_pages = itertools.takewhile(bool, pages)
    tweets = itertools.chain.from_iterable(valid_pages)
    return tweets
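
For readers on Python 3, where itertools.imap no longer exists because the built-in map is already lazy, the same pipeline might look like the sketch below. The page source fake_get_page and its CANNED data are hypothetical, standing in for a wrapper around api.GetSearch.

```python
import itertools
from functools import partial


def get_tweets(term, get_page):
    """Lazily flatten pages of tweets; get_page(term, page) returns a list."""
    page_nums = itertools.count(1)
    pages = map(partial(get_page, term), page_nums)   # lazy in Python 3
    valid_pages = itertools.takewhile(bool, pages)    # stop at first empty page
    return itertools.chain.from_iterable(valid_pages)


# Hypothetical canned results standing in for the real API.
CANNED = {1: ["a", "b"], 2: ["c"]}
calls = []


def fake_get_page(term, page):
    calls.append(page)
    return CANNED.get(page, [])


result = list(get_tweets("foo", fake_get_page))
print(result, calls)
```

Each stage is lazy, so no page is requested until the consumer asks for a tweet, and the request for the first empty page is the last one made.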

Stefan Behnel · Nov 11, 2012

Steve Howell, 11.11.2012 04:03:

> I think your code is perfectly readable and clean, but you can flatten
> it like so:
>
> def get_tweets(term, get_page):
>     page_nums = itertools.count(1)
>     pages = itertools.imap(api.getSearch, page_nums)
>     valid_pages = itertools.takewhile(bool, pages)
>     tweets = itertools.chain.from_iterable(valid_pages)
>     return tweets

I'd prefer the original code ten times over this inaccessible beast.

Stefan

rusi · Nov 12, 2012

This is a classic problem -- a structure clash of parallel loops -- and
Steve Howell has given the classic solution, using the fact that
generators in Python simulate/implement lazy lists.
As David Beazley explains (http://www.dabeaz.com/coroutines/),
coroutines are more general than generators, and you can use those if
you prefer.

The classic problem used to be stated like this:
There is an input in cards of 80 columns.
It needs to be copied onto printer of 132 columns.

The structure clash arises because after reading 80 chars a new card
has to be read; after printing 132 chars a linefeed has to be given.

To pythonize the problem, let's replace the 80 and 132 by 3 and 4,
i.e. take the char-square
abc
def
ghi

and produce
abcd
efgh
i

The important difference (explained nicely by Beazley) is that with
generators the for-loop pulls from the generator, whereas with
coroutines the 'generator' pushes to the consuming coroutines.

---------------
from __future__ import print_function
s = ["abc", "def", "ghi"]

# Coroutine infrastructure from PEP 342
def consumer(func):
    def wrapper(*args, **kw):
        gen = func(*args, **kw)
        next(gen)  # advance to the first yield so send() works
        return gen
    return wrapper

@consumer
def endStage():
    while True:
        for i in range(0, 4):
            print((yield), sep='', end='')
        print("\n", sep='', end='')

def genStage(s, target):
    for line in s:
        for i in range(0, 3):
            target.send(line[i])  # push one character at a time

if __name__ == '__main__':
    genStage(s, endStage())
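
For comparison, the same 3-to-4 re-blocking can be sketched in the pull style, where the consumer draws characters from a lazy stream instead of having them pushed into it. The function name reblock is made up for this illustration.

```python
from itertools import chain, islice


def reblock(lines, width):
    """Yield strings of `width` characters drawn from the concatenated lines."""
    chars = chain.from_iterable(lines)         # one flat stream of characters
    while True:
        block = "".join(islice(chars, width))  # pull up to `width` characters
        if not block:
            return
        yield block


rows = list(reblock(["abc", "def", "ghi"], 4))
for row in rows:
    print(row)
```

Here the input boundaries (3 characters per line) disappear in chain.from_iterable and the output boundaries (4 per row) are imposed by islice, so neither loop has to know about the other's structure.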

rusi · Nov 12, 2012

> This is a classic problem -- structure clash of parallel loops
>
> <rest snipped>

Sorry wrong solution

The fidgetiness is entirely due to Python not allowing C-style loops
like these:

while ((c = getchar()) != EOF) { ... }

Putting it into coroutine form, it becomes something like the
following [untested, since I don't have the API]. Clearly the
fidgetiness is there as before, and now with extra coroutine plumbing.

def genStage(term, target):
    page = 1
    while 1:
        r = api.GetSearch(term=term, page=page)
        if not r:
            break
        for tweet in r:
            target.send(tweet)
        page += 1

@consumer
def endStage():
    while True:
        process((yield))

if __name__ == '__main__':
    genStage("foo", endStage())

Peter Otten · Nov 12, 2012

rusi said:
> The fidgetiness is entirely due to python not allowing C-style loops
> like these:
>
> while ((c = getchar()) != EOF) { ... }

for c in iter(getchar, EOF):
    ...

> Clearly the fidgetiness is there as before and now with extra coroutine
> plumbing

Hmm, very funny...
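
The two-argument form iter(callable, sentinel) calls the callable repeatedly and stops as soon as it returns a value equal to the sentinel. A minimal runnable sketch follows; getchar and EOF are hypothetical stand-ins built on io.StringIO, since Python has no built-in getchar.

```python
import io

stream = io.StringIO("abc")


def getchar():
    """Hypothetical getchar(): the next character, or '' at end of input."""
    return stream.read(1)


EOF = ""  # read(1) returns an empty string once the stream is exhausted

chars = []
for c in iter(getchar, EOF):
    chars.append(c)
print(chars)
```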

Steve Howell · Nov 12, 2012

> <rest snipped>
>
> Sorry wrong solution
>
> The fidgetiness is entirely due to python not allowing C-style loops
> like these:
> [...]

There are actually three fidgety things going on:

1. The API is 1-based instead of 0-based.
2. You don't know the number of pages in advance.
3. You want to process tweets, not pages of tweets.

Here's yet another take on the problem:

from itertools import chain, count

# wrap fidgety 1-based api
def search(i):
    return api.GetSearch(term="foo", page=i+1)

paged_tweets = (search(i) for i in count())

# handle sentinel
paged_tweets = iter(paged_tweets.next, [])

# flatten pages
tweets = chain.from_iterable(paged_tweets)
for tweet in tweets:
    process(tweet)
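
The code above relies on the Python 2 generator method .next. A sketch of the same three steps for Python 3, using the built-in next() instead, is shown below. The search function and CANNED data are hypothetical stand-ins for the real API call, and count(1) is used here so that no index adjustment is needed.

```python
from itertools import chain, count

CANNED = {1: ["t1"], 2: ["t2", "t3"]}  # hypothetical canned results


def search(page):
    """Stand-in for the real search call; empty list past the last page."""
    return CANNED.get(page, [])


# 1-based page numbers, fetched lazily
paged_tweets = (search(i) for i in count(1))

# handle sentinel: stop at the first empty page
pages = iter(lambda: next(paged_tweets), [])

# flatten pages
tweets = list(chain.from_iterable(pages))
print(tweets)
```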

rusi · Nov 13, 2012

[Steve Howell]
> paged_tweets = (search(i) for i in count())

Nice on the whole -- thanks.
Could not the 1-based-ness be dealt with by using count(1)?
i.e. use

paged_tweets = (api.GetSearch(term="foo", page=i) for i in count(1))

[Peter]
> > while ((c = getchar()) != EOF) { ... }
>
> for c in iter(getchar, EOF):
>     ...

Thanks. Learnt something.
