Problems with <link> in a 0.91 RSS

Discussion in 'XML' started by Francesco Moi, Dec 6, 2003.

  1. Hello.

    I'm trying to build an RSS feed for my website. It starts:

    ----------------//---------------------
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
    "http://my.netscape.com/publish/formats/rss-0.91.dtd">
    <rss version="0.91">
    ----------------//----------------------

    And an item could be:
    --------//--------------
    <item>
    <link>http://www.mydomain.com</link>
    <title>Foo</title>
    </item>
    ------//---------------

    If instead of 'http://www.mydomain.com', I set
    'http://www.mydomain.com/mypage.aspx?ID=1&cod=9&num=20031206'
    I get problems of validation (some RSS readers do not read it).

    Is there a problem with this kind of URL?

    Thank you very much.
     
    Francesco Moi, Dec 6, 2003
    #1

  2. Andy Dingley Guest

    On 6 Dec 2003 07:48:23 -0800, Francesco Moi wrote:

    >If instead of 'http://www.mydomain.com', I set
    >'http://www.mydomain.com/mypage.aspx?ID=1&cod=9&num=20031206'
    >I get problems of validation (some RSS readers do not read it).


    Try this instead

    http://www.mydomain.com/mypage.aspx?ID=1&amp;cod=9&amp;num=20031206


    It's an XML entity issue, not RSS
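
    A minimal Python sketch of this fix, assuming the feed is assembled by string formatting. The helper names `build_item` and `parses` are hypothetical; `escape` from the standard library's `xml.sax.saxutils` does the entity encoding, and `xml.etree.ElementTree` stands in for an RSS reader's XML parser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# URL from the question; the raw "&" characters are what break the feed.
RAW_URL = "http://www.mydomain.com/mypage.aspx?ID=1&cod=9&num=20031206"


def build_item(title, link):
    """Return an <item> string with its text content escaped for XML."""
    # escape() turns & into &amp;, < into &lt; and > into &gt;
    return "<item><link>%s</link><title>%s</title></item>" % (
        escape(link),
        escape(title),
    )


def parses(xml_text):
    """Report whether an XML parser accepts the text as well-formed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


unescaped_item = "<item><link>%s</link><title>Foo</title></item>" % RAW_URL
escaped_item = build_item("Foo", RAW_URL)

print(parses(unescaped_item))  # the bare & makes this ill-formed
print(parses(escaped_item))
# The parser hands back the original URL, with plain & characters.
print(ET.fromstring(escaped_item).find("link").text == RAW_URL)
```

    Building the feed with an XML library rather than string formatting should apply the same escaping at serialization time, so the manual `escape` call is only needed when the markup is written by hand.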
     
    Andy Dingley, Dec 6, 2003
    #2

  3. Bill Kearney Guest

    To further clarify, XML itself requires that a certain 5 characters should always
    be entity encoded if used within an element or attribute value.

    & - &amp;
    < - &lt;
    > - &gt;

    ' - &apos;
    " - &quot;

    The one thing to guard against is double encoding. Do not re-encode an already
    encoded entity. As in, don't create &amp;amp;

    This is less of an issue inside the link element than it is inside the
    descriptions.

    While many folks argue about this, the most commonly used and least disruptive
    form is a single encoding of markup. For example, take the HTML snippet "this text
    has both <b>bold</b> & <i>italic</i> text". The least harmful way to encode
    this is "this text has both &lt;b&gt;bold&lt;/b&gt; &amp; &lt;i&gt;italic&lt;/i&gt;
    text". Sure, if the generating tool can /properly/ ensure it's well-formed, it's
    perfectly reasonable to use XHTML instead. But most applications don't
    consistently guarantee that their text will be valid, let alone well-formed. In
    a perfect world it would arguably be better to avoid encoding markup at all.
    Until that time arrives (don't hold your breath), a single pass of encoding
    has shown itself to be the most workable approach all around.
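
    A short Python sketch of the single-pass encoding described above, and of the double-encoding trap. It assumes the standard library's `xml.sax.saxutils.escape` as the encoder; the `description` element and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

html_snippet = "this text has both <b>bold</b> & <i>italic</i> text"

# One pass of encoding: the markup becomes plain character data.
encoded_once = escape(html_snippet)

# A second pass re-encodes the & of every entity (the double-encoding trap).
encoded_twice = escape(encoded_once)

description = "<description>%s</description>" % encoded_once
# An XML parser undoes exactly one level, returning the original HTML.
recovered = ET.fromstring(description).text

print(encoded_once)
print(recovered == html_snippet)
print("&amp;amp;" in encoded_twice)
```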

    -Bill Kearney
    www.Syndic8.com - The world's largest directory of RSS content
     
    Bill Kearney, Dec 10, 2003
    #3
  4. Patrick TJ McPhee

    Bill Kearney wrote:

    % To further clarify, XML itself requires that a certain 5 characters should always
    % be entity encoded if used within an element or attribute value.
    %
    % & - &amp;
    % < - &lt;
    % > - &gt;
    % ' - &apos;
    % " - &quot;

    You waffle a bit there (requires ... should), but I'm going to disagree
    anyway. Except when used in a CDATA section, & and < must always be
    encoded. On the other hand, > never needs to be encoded, and ' and "
    need be encoded only in attribute values, and only when they match the
    value's delimiter. It is legal to encode any of the five outside a CDATA
    section, but not always required.
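
    These rules can be checked against a real parser. The sketch below is only an illustration: it uses Python's `xml.etree.ElementTree`, and the helper `is_well_formed` and the element name `t` are made up for the test. One caveat from the XML 1.0 specification: > does have to be escaped in content when it appears in the sequence ]]> outside a CDATA section.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc):
    """Return True if ElementTree's parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


results = {
    "bare & in content": is_well_formed("<t>a & b</t>"),
    "bare < in content": is_well_formed("<t>a < b</t>"),
    "bare > in content": is_well_formed("<t>a > b</t>"),
    "bare quotes in content": is_well_formed('<t>say "hi" and \'bye\'</t>'),
    "apostrophe in double-quoted attribute": is_well_formed('<t a="it\'s"/>'),
    "double quote in double-quoted attribute": is_well_formed('<t a="say "hi""/>'),
    "&quot; in double-quoted attribute": is_well_formed('<t a="say &quot;hi&quot;"/>'),
}

for case, ok in results.items():
    print("%-42s %s" % (case, "accepted" if ok else "rejected"))
```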

    My personal opinion is that you're better off using the predefined
    entities as little as possible. It's hard to avoid using &amp;, but
    I would always encode your example using a CDATA section

    <![CDATA[this text has both <b>bold</b> & <i>italic</i> text]]>
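
    As a rough check that the CDATA form and the entity-encoded form carry the same character data, here is a small Python sketch. The helper `to_cdata` is hypothetical; it also handles the one sequence a CDATA section cannot contain, ]]>, by closing and reopening the section.

```python
import xml.etree.ElementTree as ET

html_snippet = "this text has both <b>bold</b> & <i>italic</i> text"


def to_cdata(text):
    """Wrap text in a CDATA section, splitting any literal ]]> it contains."""
    # "]]>" would end the section early, so close it after "]]" and reopen.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


cdata_doc = "<description>%s</description>" % to_cdata(html_snippet)
entity_doc = (
    "<description>this text has both &lt;b&gt;bold&lt;/b&gt; &amp; "
    "&lt;i&gt;italic&lt;/i&gt; text</description>"
)

# Both spellings give the parser the same character data.
cdata_text = ET.fromstring(cdata_doc).text
entity_text = ET.fromstring(entity_doc).text
print(cdata_text == entity_text)

# Text containing "]]>" still round-trips through the split sections.
tricky = "a ]]> b"
tricky_text = ET.fromstring("<t>%s</t>" % to_cdata(tricky)).text
print(tricky_text == tricky)
```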

    --

    Patrick TJ McPhee
    East York Canada
     
    Patrick TJ McPhee, Dec 10, 2003
    #4
  5. Andy Dingley Guest

    On Wed, 10 Dec 2003 12:43:05 -0500, Bill Kearney wrote:

    >The one thing to guard against is double encoding. Do not re-encode an already
    >encoded entity. As in, don't create &amp;amp;


    You often can't avoid this happening, especially not in an RSS-like
    context where you're handling material that may already be encoded.

    But if it does, make sure that your de-coding and en-coding is
    balanced.
     
    Andy Dingley, Dec 10, 2003
    #5
  6. Bill Kearney Guest

    > >The one thing to guard against is double encoding. Do not re-encode an already
    > >encoded entity. As in, don't create &amp;amp;

    >
    > You often can't avoid this happening, especially not in an RSS-like
    > context where you're handling material that may already be encoded.


    Then your code better work at improving the situation. Honestly, don't pass
    along crap.

    > But if it does, make sure that your de-coding and en-coding is
    > balanced.


    Sure, the trick lies in making sure the input is decoded properly and passed
    along with the proper encoding as well.

    It's not all that hard but it can be tedious to code properly.

    -Bill Kearney
     
    Bill Kearney, Dec 11, 2003
    #6
  7. Bill Kearney Guest

    > You waffle a bit there (requires ... should), but I'm going to disagree
    > anyway. Except when used in a CDATA section, & and < must always be
    > encoded. On the other hand, > never needs to be encoded, and ' and "
    > need be encoded only in attribute values, and only when they match the
    > value's delimiter. It is legal to encode any of the five outside a CDATA
    > section, but not always required.


    Well, what's better, to worry about the if's and when's or to encode them
    consistently?

    > My personal opinion is that you're better off using the predefined
    > entities as little as possible. It's hard to avoid using &amp;, but
    > I would always encode your example using a CDATA section
    >
    > <![CDATA[this text has both <b>bold</b> & <i>italic</i> text]]>


    Sure, provided tools understand how to use CDATA properly (many don't).
     
    Bill Kearney, Dec 11, 2003
    #7
  8. Andy Dingley Guest

    On Thu, 11 Dec 2003 11:28:46 -0500, Bill Kearney wrote:

    [double encoding]

    >> You often can't avoid this happening, especially not in an RSS-like
    >> context where you're handling material that may already be encoded.


    >Then your code better work at improving the situation. Honestly, don't pass
    >along crap.


    Rubbish. The _last_ thing your code should ever do is to try and "fix
    up" content in transit. (Especially note the "in transit")

    Multiple encoding is perfectly safe, and can be decoded perfectly by
    applying the appropriate number of decodes. Where it goes wrong is
    when someone breaks this number - encoding more than they should, or
    less than they should. But I would _much_ rather receive the
    occasional bit of extra-encoded garbage (it's semantically wrong, but
    it's still well-formed XML) rather than run the risk of getting things
    which have been "smart de-encoded" by something en-route that
    "thought" it ought not to see an entity in that location and so
    decided to decode the lot. That means it's no longer well-formed, and
    that causes a lot of trouble down the line.
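
    The balance argument can be sketched in Python with the standard library's `escape` and `unescape` from `xml.sax.saxutils`. The sample text and variable names are hypothetical; the point is only that the number of decodes has to match the number of encodes.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical item text from a feed of HTML coding tips: the author
# really does want readers to see the five characters "&amp;".
original = "In HTML, write &amp; to show &"

once = escape(original)   # one hop: encode for the XML feed
twice = escape(once)      # a second hop that wraps the feed again

# Balanced: as many decodes as encodes recovers the author's text.
balanced = unescape(unescape(twice))

# Unbalanced: one decode too few leaves an extra layer of entities...
too_few = unescape(twice)
# ...and one decode too many destroys the deliberate "&amp;".
too_many = unescape(unescape(unescape(twice)))

print(balanced == original)
print(too_few)
print(too_many)
```

    The over-decoded string now contains bare & characters, so it could no longer be placed in an XML document as-is, which is the "smart de-encoding" damage described above.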

    If you're _really_ worried about never rendering "&amp;" on screen for
    the poor squeamish user, then do this in the user agent at the very
    last point, when there's _no_ risk of it being propagated further.
    This is also a good time to do it, as it's clearer (sic) here what the
    content author's original intent was (maybe they're writing an RSS
    feed of HTML coding tips and the entity is deliberate).

    Are you really part of syndic8 ? Is this their official policy ?


    --
    Die Gotterspammerung - Junkmail of the Gods
     
    Andy Dingley, Dec 11, 2003
    #8
  9. Patrick TJ McPhee

    Bill Kearney wrote:
    % > You waffle a bit there (requires ... should), but I'm going to disagree
    % > anyway. Except when used in a CDATA section, & and < must always be
    % > encoded. On the other hand, > never needs to be encoded, and ' and "
    % > need be encoded only in attribute values, and only when they match the
    % > value's delimiter. It is legal to encode any of the five outside a CDATA
    % > section, but not always required.
    %
    % Well, what's better, to worry about the if's and when's or to encode them
    % consistently?

    I guess it depends on your goals. If you're writing what's `required',
    I think it's better to be correct. If you have trouble keeping track
    of when to use pre-defined entities, then you can take comfort in
    the fact that it's always allowed, and not worry about when it's required.

    % > My personal opinion is that you're better off using the predefined
    % > entities as little as possible. It's hard to avoid using &amp;, but
    % > I would always encode your example using a CDATA section
    % >
    % > <![CDATA[this text has both <b>bold</b> & <i>italic</i> text]]>
    %
    % Sure, provided tools understand how to use CDATA properly (many don't).

    Well, why use these tools? What's the point of pretending to use XML if
    you're really spending your life worrying about whether your tools can
    support the basic syntax? It's fair enough to say that you'd prefer to
    always use the predefined entities, but lack of CDATA support doesn't
    merit consideration.



    --

    Patrick TJ McPhee
    East York Canada
     
    Patrick TJ McPhee, Dec 11, 2003
    #9
  10. Andy Dingley Guest

    On Thu, 11 Dec 2003 19:59:14 +0100 (MET), Patrick TJ McPhee wrote:

    >What's the point of pretending to use XML if
    >you're really spending your life worrying about whether your tools can
    >support the basic syntax?


    We're dealing with RSS 0.91 here. The spec for the content here is
    "ASCII", not even CDATA or PCDATA (Yes, Dave Winer's lousy
    spec-writing).

    If you do anything vaguely clever in the RSS field, it's likely to
    break other people's (broken) code all over the place. It sucks, but
    there you have it - your call.
    --
    Die Gotterspammerung - Junkmail of the Gods
     
    Andy Dingley, Dec 12, 2003
    #10