Atom & the Standard Library RSS Module

Discussion in 'Ruby' started by grant, Apr 4, 2008.

  1. grant

    grant Guest

    I need to parse RSS 1.0, 2.0 and ATOM feeds. I upgraded my RSS module
    to the latest version (0.2.4) to get ATOM support. The strange thing
    is that the data structure returned for ATOM feeds is ugly and wildly
    inconsistent with the nice, clean one that is returned for RSS feeds.

    I've noticed there are a couple of competing Ruby ports of Mark
    Pilgrim's Universal Feedparser. The 'rfeedparser' one looks to be the
    best and FeedTools looks interesting, but I haven't actually tried
    them yet. (I really like the Universal Feedparser for Python.)

    Does anyone have any suggestions on which direction to take?
    grant, Apr 4, 2008
    #1

  2. Kouhei Sutou

    Kouhei Sutou Guest

    Hi,

    In <>
    "Atom & the Standard Library RSS Module" on Sat, 5 Apr 2008 06:25:05 +0900,
    grant <> wrote:

    > I need to parse RSS 1.0, 2.0 and ATOM feeds. I upgraded my RSS module
    > to the latest version (0.2.4) to get ATOM support. The strange thing
    > is that the data structure returned for ATOM feeds is ugly and wildly
    > inconsistent with the nice, clean one that is returned for RSS feeds.


    Please show an example.

    --
    kou
    Kouhei Sutou, Apr 5, 2008
    #2

  3. grant

    grant Guest

    On 5 Apr, 01:29, Kouhei Sutou <> wrote:
    > Please show an example.


    Sorry, I got tangled up in my own wishes. My problem with the library
    was just in my head. I suppose I was just hoping for a more consistent
    representation of a feed than I've had to deal with in the past. I
    know that what is generated by RSS/ATOM parsing libraries is a
    reflection of the feeds themselves.

    Anyway, a few examples of what bugs me, demonstrated in irb:

    require 'rss'

    rss = 'http://www.giftedslacker.com/feed/'
    atom = 'http://oblivionation.blogspot.com/feeds/posts/default'

    rssfeed = RSS::Parser.parse(rss)
    atomfeed = RSS::Parser.parse(atom)

    #print the content of the most recent post
    puts rssfeed.items[0].content_encoded
    puts atomfeed.items[0].content.content

    #print the titles of the posts in the feed
    rssfeed.items.each {|item| puts item.title}
    atomfeed.items.each {|item| puts item.title.content}

    #print the author of the most recent post
    rssfeed.items[0].dc_creator
    atomfeed.entries[0].author.name.content

    ---

    What I'd like is a consistent interface to the commonly used elements
    in ATOM and RSS feeds, regardless of version. It must be more
    difficult than it seems to me, because I'm not aware that anyone does
    it. I might give it a try just for fun. But, being new to Ruby, I'm
    not sure where to begin.

    -grant
    grant, Apr 5, 2008
    #3
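A possible starting point for the consistent interface grant describes is a thin adapter over the RSS module. The sketch below is a minimal illustration, not an existing API. `FeedEntry`, `atom_text` and `normalize_entries` are hypothetical names. It assumes the `rss` library is installed (it moved from the standard library to a bundled gem in Ruby 3.0) and relies only on the accessors used in the thread: `title`, `content_encoded` and `dc_creator` on RSS items, and `content` on Atom elements.

```ruby
require 'rss'

# Hypothetical common shape for one post, whatever the source format.
FeedEntry = Struct.new(:title, :content, :author)

# Atom text constructs (title, content, author name) are element objects
# whose text lives in #content; nil and plain Strings pass through unchanged.
def atom_text(element)
  element.respond_to?(:content) ? element.content : element
end

# Maps a parsed feed (RSS::RDF, RSS::Rss or RSS::Atom::Feed) to FeedEntry structs.
def normalize_entries(feed)
  if feed.is_a?(RSS::Atom::Feed)
    feed.entries.map do |entry|
      author = entry.author # first author element, or nil
      FeedEntry.new(atom_text(entry.title),
                    atom_text(entry.content),
                    author && atom_text(author.name))
    end
  else
    # RSS 1.0/2.0 items already return Strings; content:encoded and
    # dc:creator are optional, so fall back to the description and nil.
    feed.items.map do |item|
      FeedEntry.new(item.title,
                    item.content_encoded || item.description,
                    item.dc_creator)
    end
  end
end

# Usage with a live feed (network access assumed):
#   require 'open-uri'
#   xml  = URI.open('http://oblivionation.blogspot.com/feeds/posts/default', &:read)
#   feed = RSS::Parser.parse(xml)
#   normalize_entries(feed).each { |e| puts "#{e.title} by #{e.author}" }
```

Atom entries that carry only a summary come back with a nil `content`, and RSS `<author>` is not consulted; both are left out to keep the sketch short.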
  4. Phillip Gawlowski

    grant wrote:
    | On 5 Apr, 01:29, Kouhei Sutou <> wrote:

    | What I'd like is a consistent interface to the commonly used elements
    | in ATOM and RSS feeds, regardless of version. It must be more
    | difficult than it seems to me, because I'm not aware that anyone does
    | it. I might give it a try just for fun. But, being new to Ruby, I'm
    | not sure where to begin.

    You could build upon/look at/submit patches to Simple RSS:
    http://simple-rss.rubyforge.org/

    I've used it, and while it is simple to use, it comes at the cost of
    limited functionality (as far as I could see; I only used it to grab
    NetBeans Ruby IDE builds off the web when the build server updated
    its RSS feed, so take my comments with a grain of salt).

    -- Phillip Gawlowski
    Phillip Gawlowski, Apr 5, 2008
    #4
  5. Kouhei Sutou

    Kouhei Sutou Guest

    Hi,

    In <>
    "Re: Atom & the Standard Library RSS Module" on Sat, 5 Apr 2008 20:45:06 +0900,
    grant <> wrote:

    > On 5 Apr, 01:29, Kouhei Sutou <> wrote:
    > > Please show an example.

    >
    > Sorry, I got tangled up in my own wishes. My problem with the library
    > was just in my head. I suppose I was just hoping for a more consistent
    > representation of a feed than I've had to deal with in the past. I
    > know that what is generated by RSS/ATOM parsing libraries is a
    > reflection of the feeds themselves.
    >
    > Anyway, a few examples of what bugs me, demonstrated in irb:
    >
    > require 'rss'
    >
    > rss = 'http://www.giftedslacker.com/feed/'
    > atom = 'http://oblivionation.blogspot.com/feeds/posts/default'
    >
    > rssfeed = RSS::Parser.parse(rss)
    > atomfeed = RSS::Parser.parse(atom)
    >

    You can normalize a parsed feed with to_rss, to_atom or
    to_xml. For example:

    rss10feed = atomfeed.to_rss("1.0")

    You may need to set default values for elements that the target
    format requires but the source feed lacks:

    rss10feed = atomfeed.to_rss("1.0") do |maker|
      maker.channel.about ||= maker.channel.link
      maker.channel.description ||= "No description"
      maker.items.each do |item|
        item.title ||= "UNKNOWN"
        item.link ||= "UNKNOWN"
      end
    end
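To try this normalization without fetching anything, here is a self-contained sketch. The small Atom document is made up and stands in for the blogspot feed from the thread; the conversion block is kou's, unchanged. It assumes the `rss` library's maker fills the remaining RSS 1.0 fields (channel title and link, item title and link) from the Atom data; other feeds may need more defaults. After `to_rss("1.0")`, each item's `title` is a plain String, as with a native RSS 1.0 feed.

```ruby
require 'rss'

# Made-up Atom feed standing in for the one from the thread, so this runs offline.
ATOM_XML = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title>Example Atom feed</title>
    <link href="http://example.com/"/>
    <id>urn:example:feed</id>
    <updated>2008-04-05T00:00:00Z</updated>
    <author><name>grant</name></author>
    <entry>
      <title>Hello Atom</title>
      <link href="http://example.com/hello"/>
      <id>urn:example:entry1</id>
      <updated>2008-04-05T00:00:00Z</updated>
      <author><name>grant</name></author>
    </entry>
  </feed>
XML

atomfeed = RSS::Parser.parse(ATOM_XML)

# kou's conversion: supply elements RSS 1.0 requires but Atom may not carry.
rss10feed = atomfeed.to_rss("1.0") do |maker|
  maker.channel.about ||= maker.channel.link
  maker.channel.description ||= "No description"
  maker.items.each do |item|
    item.title ||= "UNKNOWN"
    item.link ||= "UNKNOWN"
  end
end

# Titles are now plain Strings, no .content needed.
rss10feed.items.each { |item| puts item.title }
```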

    > #print the content of the most recent post
    > puts rssfeed.items[0].content_encoded
    > puts atomfeed.items[0].content.content
    >
    > #print the titles of the posts in the feed
    > rssfeed.items.each {|item| puts item.title}
    > atomfeed.items.each {|item| puts item.title.content}
    >
    > #print the author of the most recent post
    > rssfeed.items[0].dc_creator
    > atomfeed.entries[0].author.name.content


    The Atom specification says that every Atom element may have
    xml:base and xml:lang attributes. If
    RSS::Atom::Entry::Author#name returned a String, we couldn't
    get those attribute values. This is why
    RSS::Atom::Entry::Author#name returns an
    RSS::Atom::Entry::Author::Name, not a String.

    BTW, what about the following API?

    atomfeed.entries[0].author.name # => a String
    atomfeed.entries[0].author.name do |name|
      # name: an RSS::Atom::Entry::Author::Name
      name.content # => a String
    end # => the last evaluated value (name.content)
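The calling convention proposed here (a plain String without a block, the element object when a block is given) can be sketched in plain Ruby. `DemoName` and `DemoAuthor` are hypothetical stand-ins written only for illustration; they are not the RSS library's classes, and `xml_lang`/`xml_base` are assumed attribute names chosen to mirror the xml:lang and xml:base attributes mentioned above.

```ruby
# Hypothetical stand-in for an Atom text element such as
# RSS::Atom::Entry::Author::Name: text plus xml:lang / xml:base attributes.
class DemoName
  attr_reader :content, :xml_lang, :xml_base

  def initialize(content, xml_lang: nil, xml_base: nil)
    @content = content
    @xml_lang = xml_lang
    @xml_base = xml_base
  end
end

# Hypothetical stand-in for an author element using the proposed API.
class DemoAuthor
  def initialize(name_element)
    @name = name_element
  end

  # Without a block, return the plain String (or nil when there is no name).
  # With a block, yield the element itself and return the block's value.
  def name
    block_given? ? yield(@name) : @name&.content
  end
end

author = DemoAuthor.new(DemoName.new("grant", xml_lang: "en"))
p author.name                    # => "grant"
p author.name { |n| n.xml_lang } # => "en"
p author.name { |n| n.content }  # => "grant"
```

The block form keeps attributes such as xml:lang reachable, while the common case (just the text) stays a one-liner.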


    Thanks,
    --
    kou
    Kouhei Sutou, Apr 6, 2008
    #5

Similar Threads
  1. chlori (Replies: 1, Views: 522)
  2. "RSS/Atom library" by Prathap, Feb 11, 2009, in forum: C++ (Replies: 1, Views: 539; last post: mlimber, Feb 12, 2009)
  3. Gerald Bauer (Replies: 1, Views: 118; last post: Gerald Bauer, Jul 21, 2008)
  4. Jonathan Groll (Replies: 1, Views: 248; last post: Kouhei Sutou, Jun 27, 2009)
  5. Richard Conroy (Replies: 4, Views: 130; last post: Walton Hoops, Jun 23, 2010)