How to stuff HTML into RSS?

Discussion in 'XML' started by lkrubner@geocities.com, Dec 2, 2004.

  1. Guest

    Some friends and I are working on PHP-based templates for web
    pages. We have templates that look like this (simplified):

    <html>
    <head>
    <title>
    The green and blue design for carpentry companies
    </title>
    </head>
    <body>
    <?php showMainContent(); ?>
    <div style="width:200px; float:right">
    <?php showLinkArea(3); ?>
    </div>
    </body>
    </html>


    I'd like to publish all the templates in our database in an RSS feed so
    that they are easier to import on other sites. Does it screw things
    up if I stuff HTML into the DESCRIPTION tag of an RSS 0.91 feed?
     
    , Dec 2, 2004
    #1

  2. Andy Dingley Guest

    On 2 Dec 2004 01:03:34 -0800, wrote:

    >Does it screw things
    >up if I stuff HTML into the DESCRIPTION tag on an RSS .91 feed?


    It's not what you stuff, it's how you stuff it.

    You should encode HTML, so that

    <description><p>Some <b>HTML</b> in RSS</p></description>

    becomes this

    <description>&lt;p&gt;Some &lt;b&gt;HTML&lt;/b&gt; in
    RSS&lt;/p&gt;</description>

    Watch out as well for & (which becomes &amp;) and for named HTML
    entities such as &eacute; (turn them into the equivalent numeric
    entity, here &#233;).
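The encoding described above can be sketched in a few lines of Python. This is only an illustration, not code from the thread: the function names `named_to_numeric` and `encode_for_description` are made up for the example, and it assumes Python's standard `html.entities` table covers the named entities you use.

```python
import re
from html.entities import name2codepoint
from xml.sax.saxutils import escape

# Entities that XML itself predefines; these need no conversion.
XML_BUILTINS = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(html_text):
    """Turn named HTML entities (&eacute;) into numeric ones (&#233;)."""
    def repl(match):
        name = match.group(1)
        if name in XML_BUILTINS or name not in name2codepoint:
            return match.group(0)  # leave built-ins and unknown names alone
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, html_text)

def encode_for_description(html_text):
    """Escape &, < and > so the HTML travels as plain text inside <description>."""
    return escape(named_to_numeric(html_text))

print(encode_for_description("<p>Some <b>HTML</b> in RSS</p>"))
# → &lt;p&gt;Some &lt;b&gt;HTML&lt;/b&gt; in RSS&lt;/p&gt;
```

Note that the ampersand of a numeric entity is escaped too, so a consumer has to unescape the description once before handing it to an HTML renderer.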

    I'd also suggest that you make your HTML fragments into well-formed,
    balanced XHTML fragments before you embed them (lower case element
    names, close open elements). Although this isn't required, it can make
    life easier with XML toolsets.
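One rough way to check that a fragment is balanced before embedding it is to wrap it in a single root element and hand it to an XML parser. The sketch below uses Python's standard ElementTree; `is_balanced_fragment` is a hypothetical helper name, and the check only tests well-formedness (it does not enforce lower-case element names).

```python
import xml.etree.ElementTree as ET

def is_balanced_fragment(fragment):
    """Return True if the fragment parses as XML once wrapped in one root."""
    try:
        ET.fromstring("<div>%s</div>" % fragment)
        return True
    except ET.ParseError:
        return False

print(is_balanced_fragment("<p>Some <b>HTML</b> in RSS</p>"))  # → True
print(is_balanced_fragment("<p>Some <b>HTML</b> in RSS"))      # → False
print(is_balanced_fragment("line one<br>line two"))            # → False
```

Named HTML entities such as &eacute; also fail this check, because an XML parser without the HTML DTD does not know them.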

    This stuff isn't hard to do, but it's very poorly documented. There
    are many RSS versions, and few of them describe it fully. This
    article is a useful read:
    http://diveintomark.org/archives/2004/02/04/incompatible-rss


    I'd also avoid the obsolete RSS 0.91 in favour of RSS 1.0 (far
    better), or you might prefer the more popular RSS 2.0

    --
    Smert' spamionam
     
    Andy Dingley, Dec 2, 2004
    #2

  3. Joris Gillis Guest

    > You should encode HTML, so that
    >
    > <description><p>Some <b>HTML</b> in RSS</p></description>
    >
    > becomes this
    >
    > <description>&lt;p&gt;Some &lt;b&gt;HTML&lt;/b&gt; in
    > RSS&lt;/p&gt;</description>
    >

    Hi,

    I don't know anything about RSS, but wouldn't it be easier and more
    logical to insert the XHTML as elements using namespaces? And if that
    isn't possible yet, shouldn't it become possible?

    regards,
    --
    Joris Gillis (http://www.ticalc.org/cgi-bin/acct-view.cgi?userid=38041)
    Ceterum censeo XML omnibus esse utendum
     
    Joris Gillis, Dec 2, 2004
    #3
  4. Andy Dingley Guest

    On Thu, 02 Dec 2004 15:41:32 GMT, "Joris Gillis" <>
    wrote:

    >I don't know anything about RSS,


    I suggest you read the Dive Into Mark article. It gives a good
    explanation of the background to this.
    http://diveintomark.org/archives/2004/02/04/incompatible-rss

    RSS has suffered because of too many standards, and especially because
    these standards have generally been poorly specified. In particular
    there is no clear guidance on how to embed HTML content within an RSS
    item.

    A problem with RSS, and all such protocols that try to become an open
    publication medium, is that many creators will make content and many
    consumers will try to read it. Where the spec isn't exhaustive on how
    it _must_ be done, then a situation soon develops of de facto
    behaviour for how it _is_ done. Readers become dependent on this, and
    you diverge from it at your peril.

    > but wouldn't it be easier and more logical to insert the XHTML as elements using namespaces?


    That's an attractive option. However it's not a viable one.
    There are several reasons:

    Namespacing relies on using XHTML, and you may wish to include HTML
    _as_HTML_ not XHTML. Some consumers may be confused if they receive
    XHTML

    Namespacing relies on including a balanced fragment (i.e. one that can
    be well-formed as an XML fragment). This wasn't a requirement on the
    original RSS/HTML enclosure, so it is hard to re-impose in some
    cases (<a name="..."> is one of the more awkward cases to deal
    with).

    RSS is not an XML protocol. Successive versions of badly-written specs
    have clouded this. There are all sorts of references to "ASCII" when
    it should really be CDATA. It's commonplace to include HTML entities,
    even when these aren't valid outside the HTML DTD. Reliable parsing
    of RSS from external sources is a mess, and it often relies on
    knife-and-fork parsing with non-XML tools. It's not reliable to
    assume good support for standard XML features if you're working with
    external feeds, even though you "should" be able to do this.
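As an illustration of the entity problem, here is a small Python sketch of lenient parsing: it tries a strict XML parse first and, if that fails, rewrites named HTML entities as numeric references and tries again. The helper names and the sample feed are invented for the example, and real-world feeds can be broken in many other ways that this does not handle.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

XML_BUILTINS = {"amp", "lt", "gt", "quot", "apos"}

def fix_html_entities(xml_text):
    """Rewrite HTML-only named entities as numeric character references."""
    def repl(match):
        name = match.group(1)
        if name in XML_BUILTINS or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, xml_text)

def parse_feed_leniently(xml_text):
    """Strict parse first; on failure, repair entities and parse once more."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return ET.fromstring(fix_html_entities(xml_text))

# Hypothetical feed using &eacute;, which XML does not define on its own.
feed = ("<rss><channel><item>"
        "<description>Caf&eacute;</description>"
        "</item></channel></rss>")
root = parse_feed_leniently(feed)
print(root.find("channel/item/description").text)  # → Café
```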

    > And if that wouldn't be possible yet, shouldn't it become possible?


    RSS is old. It's post-XML, but pre-XHTML and (arguably)
    pre-namespacing. So even if a namespaced approach became widespread,
    consumers would be strongly advised to keep supporting the old way
    if they still want to accept content supplied that way.

    I use namespaced content for internal RSS feeds within my projects,
    where I always use RSS 1.0. For external work though, I encode plain
    HTML. I use balanced fragments, so I close elements like <p>...</p>,
    but I don't use the <br /> form for <br>.
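The two approaches can be put side by side in a short Python sketch. It is an assumption-laden illustration: the thread does not say which RSS element should carry the namespaced XHTML, so placing it in <description> with a wrapper <div> is a guess made for the example, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("xhtml", XHTML_NS)  # serialize with an xhtml: prefix

def escaped_description(html_text):
    """Carry the HTML as text; ElementTree escapes the markup on output."""
    desc = ET.Element("description")
    desc.text = html_text
    return ET.tostring(desc, encoding="unicode")

def namespaced_description(xhtml_fragment):
    """Carry a balanced XHTML fragment as real, namespaced child elements."""
    desc = ET.Element("description")
    wrapper = ET.fromstring(
        '<div xmlns="%s">%s</div>' % (XHTML_NS, xhtml_fragment))
    desc.append(wrapper)
    return ET.tostring(desc, encoding="unicode")

fragment = "<p>Some <b>HTML</b> in RSS</p>"
print(escaped_description(fragment))
print(namespaced_description(fragment))
```

The namespaced form raises a parse error for an unbalanced fragment, which is the balanced-fragment requirement mentioned above; the escaped form accepts any text.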

    --
    Smert' spamionam
     
    Andy Dingley, Dec 2, 2004
    #4
  5. Joris Gillis Guest

    Now that's what I call a valuable reply :)
    Thank you very much.

    --
    Joris Gillis (http://www.ticalc.org/cgi-bin/acct-view.cgi?userid=38041)
    Ceterum censeo XML omnibus esse utendum
     
    Joris Gillis, Dec 3, 2004
    #5
  6. Peter Flynn Guest

    wrote:

    > I'd like to publish all the templates in our database in an RSS feed so
    > it will be easier to import them on other sites. Does it screw things
    > up if I stuff HTML into the DESCRIPTION tag on an RSS .91 feed?


    Yes. Implementations of RSS readers are almost all hopelessly broken and
    non-conformant, and the RSS "spec" -- such as it is -- has been so kicked
    about and bastardised as to be virtually worthless except as a carrier
    format like HTML. There were plans to make a newer, better version, but
    like HTML it has now become so fossilised that it's not worth changing.

    ///Peter
    --
    "The cat in the box is both a wave and a particle"
    -- Terry Pratchett, introducing quantum physics in _The Authentic Cat_
     
    Peter Flynn, Dec 4, 2004
    #6
  7. Guest

    Thank you for your in-depth reply. I've already read Mark's article, and
    one thing I got from it was that it doesn't matter much which version of
    RSS you use: they are all broken.

    For now I'm in the lucky position of being the consumer of my own
    output. We have some HTML templates we'd like to publish, but we are
    publishing them for people who have our software, so we control both
    the source and the point of consumption. I'd love to eventually use a
    richer RSS, but I'm short on time this month, so I'd like to reuse the
    PHP code we have already written and tested. That code puts out
    valid RSS 0.91.

    To publish an HTML template in the description tag of RSS, should I
    just wrap it in a CDATA section? Or escape it, as someone above remarked?
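For reference, the two options can be compared with a short Python sketch. It assumes the consumer uses a conforming XML parser (here the standard ElementTree); `as_cdata` and `as_escaped` are hypothetical helper names, and the sample is a line from the template in the first post.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def as_cdata(html_text):
    """Wrap the HTML in a CDATA section inside <description>."""
    # "]]>" cannot appear inside CDATA, so split it across two sections.
    safe = html_text.replace("]]>", "]]]]><![CDATA[>")
    return "<description><![CDATA[%s]]></description>" % safe

def as_escaped(html_text):
    """Escape &, < and > instead of using CDATA."""
    return "<description>%s</description>" % escape(html_text)

template = ('<div style="width:200px; float:right">'
            '<?php showLinkArea(3); ?></div>')

cdata_text = ET.fromstring(as_cdata(template)).text
escaped_text = ET.fromstring(as_escaped(template)).text
print(cdata_text == escaped_text == template)  # → True
```

To an XML parser the two forms are equivalent, so the choice mostly matters for consumers that do not use one.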
     
    , Dec 8, 2004
    #7
