RSS/Atom feed consuming lib?

Marcus Bristav

I have a customer (we build their intranet with Rails) that subscribes
to a number of news feeds. They want to make these feeds available on
their intranet so we need to fetch, parse and publish these feeds
(like an internal web-based feed reader).

Are there any good Ruby libs to do this that preferably support RSS 0.92,
2.0 and Atom (Atom is more of a nice to have than need to have...) and
expose them through a nice common (for all formats) API?

I looked at RAA and RubyForge but didn't find anything that really
piqued my interest (although I might have missed something).

/Marcus
 
Marcus Bristav

Thanks for the tips! I've tried feedtools and it seems to work nicely :)

Out of curiosity: Why couldn't you use feedtools?

/Marcus
 
Gustav Paul

If you're planning to go through FeedBurner, you can check out the plugin at
http://combustible.rubyforge.org/docs

It's still very young, but perhaps you'll find it useful.

Gustav

--
about me:
My greatest achievement was when all the other
kids just learnt to count from 1 to 10,
i was counting (0..9)

- gustav.paul
 
Andy Smith

Marcus said:
I have a customer (we build their intranet with Rails) that subscribes
to a number of news feeds. They want to make these feeds available on
their intranet so we need to fetch, parse and publish these feeds
(like an internal web-based feed reader).

Are there any good Ruby libs to do this that preferably support RSS 0.92,
2.0 and Atom (Atom is more of a nice to have than need to have...) and
expose them through a nice common (for all formats) API?

I looked at RAA and RubyForge but didn't find anything that really
piqued my interest (although I might have missed something).

/Marcus

You may be interested in feed-normalizer; something I pieced together to
wrap a few different Atom/RSS parsers. It outputs a normalized
object graph to represent a feed, regardless of the underlying feed format.

It currently wraps the Ruby RSS parser and Lucas Carlson's SimpleRSS,
but it can be easily extended to support more parsers. Patches welcome.

http://feed-normalizer.rubyforge.org/

Hope that helps.

Andy
 
Ray Chen

I am also working on a performance app that requires feed parsing. The
two that I have tried are feedtools and syndication. First I tried
feedtools for RSS and Atom, but that was too slow, so I switched to
syndication for both RSS and Atom. I found syndication to break on a
high percentage of Atom sites, so in the end, I sent RSS to syndication
and Atom to feedtools and took the corresponding perf hit for Atom
feeds.
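
The routing described above could be sketched roughly as follows. This is only an illustration: it guesses the format from the root element name and passes the document to whichever callable is registered for that format. The names detect_format, parse_feed and the handler lambdas are made up for the example; the actual syndication and feedtools calls would go inside the lambdas and are not shown here.

```ruby
# Heuristic sketch: guess the feed format from the root element name.
# This does not validate the document in any way.
def detect_format(xml)
  # Drop the XML declaration, processing instructions, comments and DOCTYPE
  # so that the first remaining tag is the root element.
  body = xml.gsub(/<\?.*?\?>|<!--.*?-->|<!DOCTYPE[^>]*>/m, '')
  root = body[/<([A-Za-z][\w:.-]*)/, 1]
  case root
  when 'rss', 'rdf:RDF' then :rss   # RSS 0.9x/2.0 and RSS 1.0 (RDF)
  when 'feed'           then :atom
  else :unknown
  end
end

# handlers maps a format symbol to any object that responds to #call.
def parse_feed(xml, handlers)
  format = detect_format(xml)
  handler = handlers.fetch(format) do
    raise ArgumentError, "no parser registered for #{format} feeds"
  end
  handler.call(xml)
end

# Stand-in handlers (hypothetical): replace the bodies with real parser calls.
handlers = {
  rss:  ->(xml) { "RSS document, #{xml.bytesize} bytes" },
  atom: ->(xml) { "Atom document, #{xml.bytesize} bytes" }
}

puts parse_feed('<?xml version="1.0"?><rss version="2.0"></rss>', handlers)
```

Keeping the detection separate from the handlers means either parser can be swapped out without touching the routing.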

I find this approach to be decently robust, but not very elegant. I am
going through > 10k feeds a day of all varieties.

Can someone comment on the robustness of Ruby RSS Parser and Lucas
Carlson's SimpleRSS? I am curious about Andy's feed normalizer.

HTH,
Ray
 
Andy Smith

Ray said:
I am also working on a performance app that requires feed parsing.

As previously mentioned, feed-normalizer aims to produce a 'Feed' object
that is independent of the underlying format. This means it will use
each parser (in a user-defined order) until it gets back a successful
parse and a usable object to interface with.

What this also means is that the *primary* goal of feed-normalizer is to
produce the aforementioned Feed object graph. That might mean hitting
three parsers before it gets that result, so performance isn't really a
consideration.

Of course, you could change the order of parsing so that feed-normalizer
uses the fastest parser first, and so on. feed-normalizer currently uses
most strict to most liberal as its default order. Right now, this just
happens to be fastest parser first, too :)
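
To make the "try each parser in order" idea concrete, here is a minimal sketch. It is not feed-normalizer's actual code or API: Feed, ParseError, STRICT, LIBERAL and parse_with_fallback are all hypothetical names, and the two lambdas are toy stand-ins for a strict and a liberal parser.

```ruby
# Sketch only: none of these names come from feed-normalizer or a real parser.
Feed = Struct.new(:title, :item_titles)
class ParseError < StandardError; end

# Toy strict parser: rejects anything that is not a closed <rss> document.
STRICT = lambda do |xml|
  unless xml.match?(/<rss\b.*<\/rss>\s*\z/m)
    raise ParseError, 'not a complete RSS document'
  end
  Feed.new(xml[/<title>(.*?)<\/title>/m, 1],
           xml.scan(/<item>.*?<title>(.*?)<\/title>/m).flatten)
end

# Toy liberal parser: takes whatever <title> elements it can find.
LIBERAL = lambda do |xml|
  titles = xml.scan(/<title[^>]*>(.*?)<\/title>/m).flatten
  raise ParseError, 'no titles found' if titles.empty?
  Feed.new(titles.first, titles.drop(1))
end

# Try each parser in the given order and return the first successful result,
# or nil if every parser gives up.
def parse_with_fallback(xml, parsers)
  parsers.each do |parser|
    return parser.call(xml)
  rescue ParseError
    # This parser could not handle the document; move on to the next one.
  end
  nil
end

truncated = '<rss version="2.0"><channel><title>News</title>' \
            '<item><title>First post</title></item>'
feed = parse_with_fallback(truncated, [STRICT, LIBERAL])
puts feed.title unless feed.nil?
```

Reordering the array passed to parse_with_fallback is all it takes to put the fastest or the most liberal parser first.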

Ray said:
The two that I have tried are feedtools and syndication. First I tried
feedtools for RSS and Atom, but that was too slow, so I switched to
syndication for both RSS and Atom. I found syndication to break on a
high percentage of Atom sites, so in the end, I sent RSS to syndication
and Atom to feedtools and took the corresponding perf hit for Atom
feeds.

In this case you could create a wrapper for feed-normalizer that
interfaces both syndication and feedtools, and tell feed-normalizer
which one to use first. I assume you'll probably encounter more RSS than
Atom.

Ray said:
I find this approach to be decently robust, but not very elegant. I am
going through > 10k feeds a day of all varieties.

Can someone comment on the robustness of Ruby RSS Parser and Lucas
Carlson's SimpleRSS? I am curious about Andy's feed normalizer.

I personally have found Ruby's RSS library to be very good at handling
RSS feeds that aren't broken :) What that means is the results should be
predictable, but the chance of a good parse may be lower.

SimpleRSS, on the other hand, is uber-liberal: if the feed looks anything
like an RSS or Atom document, you'll probably get a pretty good result
back, though there are sometimes small errors.

Bob Aman did an overview of both parsers, somewhere on sporkmonger.com.

Back to performance again; I did some rudimentary benchmarks[1] of both
Ruby's RSS as well as SimpleRSS. I think the results of this benchmark
really make the point for SimpleRSS being a great 'backup' parser to
have when nothing else will parse an ill-formed feed.
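
For anyone who wants to run a similar comparison on their own feeds, a rough timing harness might look like the sketch below. It does not reproduce the benchmark linked at [1]; time_parsers is a made-up helper, and the two lambdas are placeholders for real parser calls.

```ruby
require 'benchmark'

# Time each parser over the same set of documents.
# parsers maps a label to any callable that takes an XML string;
# the labels and callables used here are placeholders.
def time_parsers(parsers, documents, runs: 10)
  parsers.each_with_object({}) do |(name, parser), timings|
    timings[name] = Benchmark.realtime do
      runs.times { documents.each { |xml| parser.call(xml) } }
    end
  end
end

documents = ['<rss version="2.0"><channel><title>News</title></channel></rss>']
parsers = {
  'parser A' => ->(xml) { xml.scan(/<title>(.*?)<\/title>/m) },  # placeholder
  'parser B' => ->(xml) { xml.include?('<title>') }              # placeholder
}

time_parsers(parsers, documents).each do |name, seconds|
  puts format('%-10s %.6f s', name, seconds)
end
```

Timings from a harness like this only mean something when the document set resembles the feeds you actually consume, broken ones included.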

And of course, I'm always looking for patches and new parser wrappers
for feed-normalizer.

Hope that helps.

Andy

[1]
http://blog.andyis.textdriven.com/articles/2006/03/28/parsers-in-the-pool
 
