Help me optimize my feed script.

bsagert

I wrote my own feed reader using feedparser.py but it takes about 14
seconds to process 7 feeds (on a windows box), which seems slow on my
DSL line. Does anyone see how I can optimize the script below? Thanks
in advance, Bill

# UTF-8
import feedparser

rss = [
'http://feeds.feedburner.com/typepad/alleyinsider/silicon_alley_insider',
'http://www.techmeme.com/index.xml',
'http://feeds.feedburner.com/slate-97504',
'http://rss.cnn.com/rss/money_mostpopular.rss',
'http://rss.news.yahoo.com/rss/tech',
'http://www.aldaily.com/rss/rss.xml',
'http://ezralevant.com/atom.xml'
]
s = '<html>\n<head>\n<title>C:/x/test.htm</title>\n'

s += '<style>\n'\
     'h3{margin:10px 0 0 0;padding:0}\n'\
     'a.x{color:black}'\
     'p{margin:5px 0 0 0;padding:0}'\
     '</style>\n'

s += '</head>\n<body>\n<br />\n'

for url in rss:
        d = feedparser.parse(url)
        title = d.feed.title
        link = d.feed.link
        s += '\n<h3><a href="'+ link +'" class="x">'+ title +'</a></h3>\n'
        # aldaily.com has weird feed
        if link.find('aldaily.com') != -1:
                description = d.entries[0].description
                s += description + '\n'
        for x in range(0,3):
                if link.find('aldaily.com') != -1:
                        continue
                title = d.entries[x].title
                link = d.entries[x].link
                s += '<a href="'+ link +'">'+ title +'</a><br />\n'

s += '<br /><br />\n</body>\n</html>'

f = open('c:/scripts/myFeeds.htm', 'w')
f.write(s)
f.close()

print
print 'myFeeds.htm written'
 
Carl Banks

I wrote my own feed reader using feedparser.py but it takes about 14
seconds to process 7 feeds (on a windows box), which seems slow on my
DSL line. Does anyone see how I can optimize the script below? Thanks
in advance, Bill

[script snipped]

Using the += operator to build up a large string is a common bottleneck
in Python programs. The first thing you should try is to get rid of it.
(Recent versions of CPython have taken steps to optimize in-place string
concatenation, but the optimization doesn't always apply; for instance,
it fails when more than one reference to the string is alive.)
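To check whether += is actually hurting in a given build, a rough
micro-benchmark (standalone; nothing here comes from the script above)
is:

```python
import timeit

pieces = ['chunk%d' % i for i in range(10000)]

def build_with_concat():
    # repeated += can copy the whole accumulated string on each step
    s = ''
    for p in pieces:
        s += p
    return s

# time 20 full builds; on interpreters where the in-place optimization
# does not apply, doubling len(pieces) roughly quadruples this number
print(timeit.timeit(build_with_concat, number=20))
```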

Instead, create a list like this:

s = []

And append substrings to the list, like this:

s.append('</head>\n<body>\n<br />\n')

Then, when writing the string out (or otherwise using it), join all
the substrings with the str.join method:

f.write(''.join(s))
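Applied to the page-building part of the script, the pattern looks like
this (a sketch with placeholder entry data standing in for the parsed
feeds):

```python
# Build the page as a list of fragments, then join once at the end.
parts = ['<html>\n<head>\n<title>myFeeds</title>\n</head>\n<body>\n']

# placeholder (link, title) pairs; the real script gets these from
# feedparser's d.entries
entries = [('http://example.com/a', 'Story A'),
           ('http://example.com/b', 'Story B')]

for link, title in entries:
    parts.append('<a href="%s">%s</a><br />\n' % (link, title))

parts.append('</body>\n</html>')
page = ''.join(parts)   # one final allocation instead of one per +=
```

Appending to a list is amortized O(1), so the total work stays linear in
the size of the output.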


Carl Banks
 
Jason Scheirer

I wrote my own feed reader using feedparser.py but it takes about 14
seconds to process 7 feeds (on a windows box), which seems slow on my
DSL line. Does anyone see how I can optimize the script below? Thanks
in advance, Bill

[script snipped]

I can 100% guarantee you that the extended run time is network I/O
bound. Investigate using a thread pool to load the feeds in parallel.
Some code you might be able to shim in:

# Extra imports
import threading
import Queue

# Function that fetches and pushes
def parse_and_put(url, queue_):
    parsed_feed = feedparser.parse(url)
    queue_.put(parsed_feed)

# Set up some variables
my_queue = Queue.Queue()
threads = []

# Set up a thread for fetching each URL
for url in rss:
    url_thread = threading.Thread(target=parse_and_put, name=url,
                                  args=(url, my_queue))
    threads.append(url_thread)
    url_thread.setDaemon(False)
    url_thread.start()

# Wait for threads to finish
for thread in threads:
    thread.join()

# Pull the results into a list
feeds_list = []
while not my_queue.empty():
    feeds_list.append(my_queue.get())

# Do what you were doing before, replacing "for url in rss:" with
# "for d in feeds_list:"
for d in feeds_list:
    title = d.feed.title
    link = d.feed.link
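As an aside for newer Pythons: on Python 3 the same thread-pool idea
needs less bookkeeping with concurrent.futures. In this sketch, fetch is
a hypothetical stand-in for feedparser.parse so the example runs
offline:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # stand-in for feedparser.parse(url); any blocking call works here
    return 'parsed:' + url

urls = ['http://example.com/a.xml', 'http://example.com/b.xml']

# map runs fetch across worker threads but yields results in input
# order, so no Queue draining is needed
with ThreadPoolExecutor(max_workers=8) as pool:
    feeds = list(pool.map(fetch, urls))
```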
 
