Help me optimize my feed script.

bsagert

I wrote my own feed reader using feedparser.py but it takes about 14
seconds to process 7 feeds (on a windows box), which seems slow on my
DSL line. Does anyone see how I can optimize the script below? Thanks
in advance, Bill

# UTF-8
import feedparser

rss = [
'http://feeds.feedburner.com/typepad/alleyinsider/silicon_alley_insider',
'http://www.techmeme.com/index.xml',
'http://feeds.feedburner.com/slate-97504',
'http://rss.cnn.com/rss/money_mostpopular.rss',
'http://rss.news.yahoo.com/rss/tech',
'http://www.aldaily.com/rss/rss.xml',
'http://ezralevant.com/atom.xml'
]
s = '<html>\n<head>\n<title>C:/x/test.htm</title>\n'

s += '<style>\n'\
     'h3{margin:10px 0 0 0;padding:0}\n'\
     'a.x{color:black}'\
     'p{margin:5px 0 0 0;padding:0}'\
     '</style>\n'

s += '</head>\n<body>\n<br />\n'

for url in rss:
        d = feedparser.parse(url)
        title = d.feed.title
        link = d.feed.link
        s += '\n<h3><a href="'+ link +'" class="x">'+ title +'</a></h3>\n'
        # aldaily.com has weird feed
        if link.find('aldaily.com') != -1:
                description = d.entries[0].description
                s += description + '\n'
        for x in range(0,3):
                if link.find('aldaily.com') != -1:
                        continue
                title = d.entries[x].title
                link = d.entries[x].link
                s += '<a href="'+ link +'">'+ title +'</a><br />\n'

s += '<br /><br />\n</body>\n</html>'

f = open('c:/scripts/myFeeds.htm', 'w')
f.write(s)
f.close()

print
print 'myFeeds.htm written'
 
Carl Banks

I wrote my own feed reader using feedparser.py but it takes about 14
seconds to process 7 feeds (on a windows box), which seems slow on my
DSL line. Does anyone see how I can optimize the script below? Thanks
in advance, Bill

[script snipped]

Using the += operator to build up a large string is a common bottleneck
in Python programs. The first thing you should try is to get rid of it.
(Recent versions of CPython have taken steps to optimize in-place string
concatenation, but the optimization doesn't always apply; for instance,
it fails when more than one reference to the string is alive.)
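To check whether += is actually hurting in a given build, a rough
micro-benchmark (standalone; nothing here comes from the script above)
is:

```python
import timeit

pieces = ['chunk%d' % i for i in range(10000)]

def build_with_concat():
    # repeated += can copy the whole accumulated string on each step
    s = ''
    for p in pieces:
        s += p
    return s

# time 20 full builds; on interpreters where the in-place optimization
# does not apply, doubling len(pieces) roughly quadruples this number
print(timeit.timeit(build_with_concat, number=20))
```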

Instead, create a list like this:

s = []

And append substrings to the list, like this:

s.append('</head>\n<body>\n<br />\n')

Then, when writing the string out (or otherwise using it), join all
the substrings with the str.join method:

f.write(''.join(s))
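Applied to the page-building part of the script, the pattern looks like
this (a sketch with placeholder entry data standing in for the parsed
feeds):

```python
# Build the page as a list of fragments, then join once at the end.
parts = ['<html>\n<head>\n<title>myFeeds</title>\n</head>\n<body>\n']

# placeholder (link, title) pairs; the real script gets these from
# feedparser's d.entries
entries = [('http://example.com/a', 'Story A'),
           ('http://example.com/b', 'Story B')]

for link, title in entries:
    parts.append('<a href="%s">%s</a><br />\n' % (link, title))

parts.append('</body>\n</html>')
page = ''.join(parts)   # one final allocation instead of one per +=
```

Appending to a list is amortized O(1), so the total work stays linear in
the size of the output.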


Carl Banks
 
Jason Scheirer

I wrote my own feed reader using feedparser.py but it takes about 14
seconds to process 7 feeds (on a windows box), which seems slow on my
DSL line. Does anyone see how I can optimize the script below? Thanks
in advance, Bill

[script snipped]

I can 100% guarantee you that the extended run time is network I/O
bound. Investigate using a thread pool to load the feeds in parallel.
Some code you might be able to shim in:

# Extra imports
import threading
import Queue

# Function that fetches and pushes
def parse_and_put(url, queue_):
    parsed_feed = feedparser.parse(url)
    queue_.put(parsed_feed)

# Set up some variables
my_queue = Queue.Queue()
threads = []

# Set up a thread for fetching each URL
for url in rss:
    url_thread = threading.Thread(target=parse_and_put, name=url,
                                  args=(url, my_queue))
    threads.append(url_thread)
    url_thread.setDaemon(False)
    url_thread.start()

# Wait for threads to finish
for thread in threads:
    thread.join()

# Pull the results into a list
feeds_list = []
while not my_queue.empty():
    feeds_list.append(my_queue.get())

# Do what you were doing before, replacing "for url in rss:" with
# "for d in feeds_list:"
for d in feeds_list:
    title = d.feed.title
    link = d.feed.link
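As an aside for newer Pythons: on Python 3 the same thread-pool idea
needs less bookkeeping with concurrent.futures. In this sketch, fetch is
a hypothetical stand-in for feedparser.parse so the example runs
offline:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # stand-in for feedparser.parse(url); any blocking call works here
    return 'parsed:' + url

urls = ['http://example.com/a.xml', 'http://example.com/b.xml']

# map runs fetch across worker threads but yields results in input
# order, so no Queue draining is needed
with ThreadPoolExecutor(max_workers=8) as pool:
    feeds = list(pool.map(fetch, urls))
```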
 
