retrieving ATOM/RSS feeds

_spitFIRE · Aug 13, 2007

I'm using the feedparser library to parse ATOM/RSS feeds. However, I don't
get the entire post, only summaries. How do I retrieve the entire post?
I believe that either the parser library should support this or the
specification should detail how it can be done. Or should I simply take the
entry link and do HTML scraping?


Lawrence Oluyede · Aug 13, 2007

_spitFIRE said:
I'm using the feedparser library to parse ATOM/RSS feeds. However, I don't
get the entire post, only summaries. How do I retrieve the entire post?
I believe that either the parser library should support this or the
specification should detail how it can be done. Or should I simply take the
entry link and do HTML scraping?

If the content producer doesn't provide the full article via RSS/ATOM
there's no way you can get it from there. Search for full content feeds
if any, otherwise get the article URL and feed it to BeautifulSoup to
scrape the content.
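
A minimal sketch of the check described above, assuming the entries come from feedparser, which exposes a full body (Atom `<content>` or RSS `<content:encoded>`) as `entry["content"]` and the short text as `entry["summary"]`. The helper names `entry_body` and `report` are hypothetical, made up for this illustration. It only shows how to tell whether a given feed carries the full article. If it does not, the entry link is all that is left to work with.

```python
def entry_body(entry):
    """Return (kind, text) for a feedparser-style entry (a dict-like object).

    Prefers the full body over the summary. Falls back to the entry link
    when the feed carries neither.
    """
    # entry["content"] is a list of dicts, each with a "value" key.
    for item in entry.get("content") or []:
        value = item.get("value")
        if value:
            return "content", value
    summary = entry.get("summary")
    if summary:
        return "summary", summary
    # Nothing usable in the feed itself: the caller can fetch this URL and scrape it.
    return "link", entry.get("link", "")


def report(url):
    """Print which kind of body each entry of the feed at `url` carries."""
    import feedparser  # third-party; imported here so entry_body works without it

    for entry in feedparser.parse(url).entries:
        kind, text = entry_body(entry)
        print(entry.get("title", "(untitled)"), kind, len(text))
```

Calling `report()` on a feed URL lists, per entry, whether a full body, only a summary, or just a link is available. Only in the last two cases does scraping the article page come into play.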

_spitFIRE · Aug 13, 2007

Lawrence said:
If the content producer doesn't provide the full article via RSS/ATOM
there's no way you can get it from there. Search for full content feeds
if any, otherwise get the article URL and feed it to BeautifulSoup to
scrape the content.

For the same feed (where the content producer doesn't provide the full
article!) I was able to see the complete post in other RSS aggregators (like
Blam). I wanted to know how they were able to collect the feed!

I knew for sure that you can't do screen scraping separately for each and
every blog, and that there has to be a standard way, or at least that blogs
maintain a standard template for rendering posts. I mean, if each site
only offered partial content and the rest had to be scraped from the page,
and the page had a non-standard structure (which is more likely), then
it would become impossible, IMHO, for any aggregator to aggregate feeds!

I shall for now try with BeautifulSoup, though I'm still doubtful about it.


Lawrence Oluyede · Aug 13, 2007

_spitFIRE said:
For the same feed (where the content producer doesn't provide the full
article!) I was able to see the complete post in other RSS aggregators (like
Blam). I wanted to know how they were able to collect the feed!

Perhaps in the feed itself there's the link for the full content feed.
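
One way to look for such a link, as a rough sketch: feedparser exposes the links of a feed (and of each entry) as a list of dicts with `rel`, `type` and `href` keys, so links whose MIME type marks them as a feed can be filtered out. The helper name `feed_links` is hypothetical, and whether a given publisher advertises a full content feed this way is an assumption. Many do not.

```python
# MIME types that identify a link as pointing at a feed.
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


def feed_links(links):
    """Return the hrefs of those links whose type says they are a feed."""
    return [
        link["href"]
        for link in links
        if link.get("type") in FEED_TYPES and link.get("href")
    ]
```

With a parsed feed `d = feedparser.parse(url)`, the candidates would be `feed_links(d.feed.get("links", []))`. Each one can then be parsed in turn to see whether its entries carry full content.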

Diez B. Roggisch · Aug 13, 2007

_spitFIRE said:
For the same feed (where the content producer doesn't provide the full
article!) I was able to see the complete post in other RSS aggregators
(like Blam). I wanted to know how they were able to collect the feed!

Either it is referred to in the data - or it might be that they do have
affiliate-deals with certain partners.

Diez
