can't get my RSS to validate - Help!!

Discussion in 'XML' started by lawrence, May 5, 2004.

  1. lawrence

    lawrence Guest

    lawrence, May 5, 2004
    #1

  2. lawrence

    Rolf Magnus Guest

    lawrence wrote:

    > I'm running this page:
    >
    > http://www.krubner.com/rss/page938.xml
    >
    >
    > through this validator:
    >
    > http://rss.scripting.com/?url=http://www.krubner.com/rss/page938.xml
    >
    >
    > I can't get this page to validate. I'm getting two errors:
    >
    > 1. pubDate must be an RFC-822 date 9 9
    > 2. Undefined rss element: item 12 7
    >
    > The second one makes no sense to me. "item" is used in all the
    > examples they give for .91 RSS, so why is the validator giving me
    > grief?


    I don't know rss, but your items are direct children of rss, and
    according to the dtd, the only valid child element for rss is channel
    and the only valid parent element for item is also channel.
     
    Rolf Magnus, May 5, 2004
    #2

  3. lawrence

    Andy Dingley Guest

    (lawrence) wrote in message news:<>...
    > I'm running this page:
    >
    > http://www.krubner.com/rss/page938.xml
    >
    > through this validator:
    >
    > http://rss.scripting.com/?url=http://www.krubner.com/rss/page938.xml


    I'd believe George Bush before I trusted the accuracy of a validator
    from scripting.com -- and for similar reasons; both are always "right"
    (i.e. not obvious enough to clearly catch them at it) but they'll
    cheerfully redefine black as white to push their own agendas.

    >
    > I can't get this page to validate. I'm getting two errors:
    >
    > 1. pubDate must be an RFC-822 date 9 9


    Your <pubDate> has no content. The whole element is optional, but if
    you're going to use it, you have to give it some content.

    > 2. Undefined rss element: item 12 7


    You need to have <item> as a child of <channel>. This 0.91 feed has it
    as a child of <rss>, outside <channel> (RSS 1.0 places items directly
    under its root element, <rdf:RDF>, so that might have caused some
    confusion).

    I wouldn't suggest 0.91 either, as a format for new work. 1.0 is best
    (a hobby horse of mine), but go to 0.92 as a minimum.
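
A minimal sketch of the structure described above, written in Python rather than the PHP used for the original script (which is not shown in this thread). The function name `build_feed` and the sample titles and links are hypothetical. It puts every <item> inside <channel> and gives <pubDate> real content in the RFC 822 style. Note that `format_datetime` emits a four-digit year, which is the RFC 1123 revision of that format.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items, published):
    """Return an RSS 0.91 document as a string (illustrative helper)."""
    rss = ET.Element("rss", version="0.91")
    # <channel> is the only child of <rss>.
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    ET.SubElement(channel, "language").text = "en-us"
    # pubDate is optional, but if it is present it must not be empty.
    # usegmt=True requires a datetime whose tzinfo is timezone.utc.
    ET.SubElement(channel, "pubDate").text = format_datetime(published, usegmt=True)
    for item_title, item_link in items:
        # Each <item> is a child of <channel>, never of <rss>.
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")


# Sample data only; none of these values come from the feed in the thread.
feed_xml = build_feed(
    "Example weblog",
    "http://www.example.com/",
    "Sample channel",
    [("First post", "http://www.example.com/1")],
    datetime(2004, 5, 5, 12, 0, tzinfo=timezone.utc),
)
print(feed_xml)
```

Building the tree with a library instead of concatenating strings also means the text content is escaped for you.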
     
    Andy Dingley, May 5, 2004
    #3
  4. lawrence

    lawrence Guest

    (lawrence) wrote in message news:<>...
    > I'm running this page:
    >
    > http://www.krubner.com/rss/page938.xml
    >
    >
    > through this validator:
    >
    > http://rss.scripting.com/?url=http://www.krubner.com/rss/page938.xml
    >
    >
    > I can't get this page to validate. I'm getting two errors:
    >
    > 1. pubDate must be an RFC-822 date 9 9
    > 2. Undefined rss element: item 12 7
    >
    > The second one makes no sense to me. "item" is used in all the
    > examples they give for .91 RSS, so why is the validator giving me
    > grief?



    Thanks very much. I rearranged things and got it to validate, using
    the validator from scripting.com.

    However, it still fails validation at
    http://feeds.archive.org/validator/

    The problem is "bad characters". I'm not sure how to start to debug
    that. What is the first thing I should look for?

    I wrote the 0.91 script (a PHP script) first because I assumed it
    would be easy. I figured once I worked out the troubles at 0.91, I
    could move up and do scripts for 1.0 and 2.0. The fact that I'm having
    trouble even at the 0.91 level makes me happy that I didn't jump
    straight to the 1.0 or 2.0 level.
     
    lawrence, May 6, 2004
    #4
  5. lawrence wrote:
    >>I'm running this page:
    >>
    >>http://www.krubner.com/rss/page938.xml
    >>

    > However, it still fails validation at
    > http://feeds.archive.org/validator/
    >
    > The problem is "bad characters". I'm not sure how to start to debug
    > that. What is the first thing I should look for?


    Did you change anything? I get "This is a valid RSS feed." with
    <http://feeds.archive.org/validator/>.

    --
    Johannes Koch
    In te domine speravi; non confundar in aeternum.
    (Te Deum, 4th cent.)
     
    Johannes Koch, May 6, 2004
    #5
  6. lawrence

    Andy Dingley Guest

    > However, it still fails validation at
    > http://feeds.archive.org/validator/


    The version this morning seems OK. I suggest you keep an archive of
    these bad feeds under static URLs, so that we can more easily see the
    problems.

    > The problem is "bad characters". I'm not sure how to start to debug
    > that. What is the first thing I should look for?


    Most likely thing is the use of HTML entities (like &eacute; ) that
    aren't valid because RSS is an XML protocol and doesn't recognise
    these HTML-defined entities. A _very_ common error in RSS feeds.
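
One hedged way to deal with the entity problem described above, sketched in Python (the original script is PHP and is not shown here). The helper name `xml_safe_text` is hypothetical. It first turns HTML named entities into real characters, then escapes only the characters that XML itself treats as markup. The result can contain non-ASCII characters, so this assumes the feed is served in an encoding that can represent them, such as UTF-8.

```python
import html
from xml.sax.saxutils import escape


def xml_safe_text(text):
    """Replace HTML-only entities with characters, then escape for XML."""
    # &eacute; and friends are defined by HTML, not XML, so decode them
    # into the characters they stand for.
    decoded = html.unescape(text)
    # Escape &, < and > so the text is safe as XML character data.
    return escape(decoded)


print(xml_safe_text("caf&eacute; &amp; bar <b>"))
```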


    > I wrote the 0.91 script (a PHP script) first because I assumed it
    > would be easy.


    I have to ask why anyone needs to write RSS scripts these days?
    (Although I spent yesterday doing it myself). There are very many
    already out there, and it's a rare situation that really needs
    something written from scratch.

    I suggest that you learn a bit more detailed XML (entities for one
    thing, namespaces for another) and learn how to read a formal DTD.
    It's one thing to make a feed work once during testing, but quite
    another to make a reliable feed that handles all the data it will meet
    over its lifetime. Sadly RSS tools suffer badly from this - they run
    for a week, then crash when they meet their first accented European
    character. To get a reliable feed, you really do need to know how to
    _understand_ the specification, not just match up one example.

    As to the versions, I'd support 0.92 and 1.0. 1.0 is best; 0.92
    solves compatibility issues for some older or simpler aggregators. 2.0
    is pointless.

    You don't need to understand Dublin Core to use RSS 1.0, but you will
    if you want to really use it. It's worth the effort of studying it
    anyway (learning Dublin Core is vastly more useful than learning RSS).

    Hang in there - RSS versions don't really vary that much between
    themselves.
     
    Andy Dingley, May 6, 2004
    #6
  7. lawrence

    lawrence Guest

    (Andy Dingley) wrote in message news:<>...
    > > However, it still fails validation at
    > > http://feeds.archive.org/validator/

    >
    > The version this morning seems OK. I suggest you keep an archive of
    > these bad feeds under static URLs, so that we can more easily see the
    > problems.
    >
    > > The problem is "bad characters". I'm not sure how to start to debug
    > > that. What is the first thing I should look for?

    >
    > Most likely thing is the use of HTML entities (like &eacute; ) that
    > aren't valid because RSS is an XML protocol and doesn't recognise
    > these HTML-defined entities. A _very_ common error in RSS feeds.


    I'm stumped when I think of all the possible user inputs that can mess
    up an RSS feed. Myself and a friend have been working on some weblog
    software for a year now - the weblog entries show up in the RSS feed.
    Lots and lots of problems come up - people might copy and paste
    something from Word into their weblog and then get a Windows character
    that is not in the defined char set for the RSS feed. How do I protect
    against that? Or people can, as you say, add in some HTML entities, or
    sometimes the software does that automatically.

    It's true that I could write filters for each possible problem as it
    comes up, but it seems like it would take a mountain of code and there
    would never be an end to it - after I figure out how to convert
    unexpected European accented characters, then I'd have all the Asian
    alphabets to work on.

    I guess I'm asking if there is an easy, elegant way to convert a set
    of characters to some particular character set. I don't know of any
    myself. What strategies have other programmers used when tackling this
    problem?
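
One common strategy for the question above, sketched in Python under stated assumptions: treat everything internally as Unicode and write the feed as UTF-8, so that accented European and Asian characters need no per-alphabet filter. The helper names `to_unicode` and `to_feed_bytes` are hypothetical, and the fallback to Windows-1252 is only a guess that suits text pasted from Word; it is not something the thread confirms about this software.

```python
def to_unicode(raw: bytes) -> str:
    """Decode submitted bytes, guessing Windows-1252 when UTF-8 fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input came from a Windows application.
        # A few byte values are undefined in cp1252; replace those.
        return raw.decode("cp1252", errors="replace")


def to_feed_bytes(raw: bytes) -> bytes:
    """Normalise submitted bytes to UTF-8 for output in the feed."""
    return to_unicode(raw).encode("utf-8")


# Bytes 0x93 and 0x94 are the curly quotes Word produces in Windows-1252.
print(to_unicode(b"\x93quoted\x94"))
```

The feed then needs to declare the same encoding in its XML declaration, otherwise a validator will still report bad characters.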
     
    lawrence, May 22, 2004
    #7
  8. lawrence

    lawrence Guest

    (Andy Dingley) wrote in message >
    > I have to ask why anyone needs to write RSS scripts these days ?
    > (Although I spent yesterday doing it myself). There are very many
    > already out there, and it's a rare situation that really needs
    > something written from scratch.


    I think that is a fair question. Part of the answer would be, I guess,
    we want to be able to assert that our weblog software is "complete" in
    some sense. Without RSS, it is not complete. The other aspect of it,
    call it an institutional mission, is that we want to donate all our
    code to the public domain. That means we often have to do things from
    scratch, because other code we might use is copyrighted. There is a lot of
    good code under the GPL, but the GPL is too restrictive for our
    purposes.




    > I suggest that you learn a bit more detailed XML (entities for one
    > thing, namespaces for another) and learn how to read a formal DTD.
    > It's one thing to make a feed work once during testing, but quite
    > another to make a reliable feed that handles all the data it will meet
    > over its lifetime.


    I really would love to learn more about XML and I've bought a bunch of
    books from O'Reilly. Sadly, I'm stretched a little thin. Me and some
    friends have been working on a content management system and I've had
    to learn a lot about a lot of different stuff, and I haven't had the
    chance to become very good at anything. I'd like to know more about
    XML, XHTML, Apache, CSS 2.0, RSS, Linux, Windows XP, mime types,
    character encodings, SOAP, Java, web services, Flash, PostGreSql,
    Swing, MySql, encryption and a whole lot more. It's all just too much.
    My hope, at this point, is that there'll come a point when the basic
    abilities of our CMS are stable enough that I can go over it again and
    go deeper into certain areas when my knowledge is better. That, or
    maybe our project can pick up some programmers who are more talented
    than I.

    We're giving away our (still quite buggy) software here:

    http://www.publicdomainsoftware.org/
     
    lawrence, May 22, 2004
    #8
  9. lawrence

    Andy Dingley Guest

    (lawrence) wrote in message news:<>...

    > I'm stumped when I think of all the possible user inputs that can mess
    > up an RSS feed.


    Can't be done. Never was possible - users are just too inventive.

    Instead, look at it the other way. Find the set of all things that are
    _valid_ RSS and exclude anything else. This is a smaller set than the
    invalid stuff, and it's better documented. If you _only_ allow the
    valid sequences, then you have implicitly forbidden all the invalid
    stuff.

    Definitely read this
    http://diveintomark.org/archives/2004/02/04/incompatible-rss


    > It's true that I could write filters for each possible problem as it
    > comes up,


    No, you can't. You might possibly manage to do it for one day, but
    there will always be something new coming along tomorrow. Take it the
    other way.
     
    Andy Dingley, May 24, 2004
    #9
  10. lawrence

    lawrence Guest

    (Andy Dingley) wrote in message news:<>...
    > (lawrence) wrote in message news:<>...
    >
    > > I'm stumped when I think of all the possible user inputs that can mess
    > > up an RSS feed.

    >
    > Can't be done. Never was possible - users are just too inventive.
    >
    > Instead, look at it the other way. Find the set of all things that are
    > _valid_ RSS and exclude anything else. This is a smaller set than the
    > invalid stuff, and it's better documented. If you _only_ allow the
    > valid sequences, then you have implicitly forbidden all the invalid
    > stuff.
    >
    > Definitely read this
    > http://diveintomark.org/archives/2004/02/04/incompatible-rss


    Thanks for the link to the Mark Pilgrim article. Despite the obvious
    anti-Dave Winer bias in the article, I thought it was very good and
    very informative.

    I like your suggested style for moving forward but don't understand
    how to do it. Should I take an input (when a user posts) and take
    every character and put it into an array and then test it against an
    array full of those characters that are allowed? That seems
    cumbersome, though it only happens when the user inputs stuff, so
    being cumbersome at that point isn't lethal. But is there a more
    graceful way to validate the input?
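
A possible answer to the question above, as a hedged Python sketch rather than anything proposed in the thread: instead of comparing against an array of allowed characters, express the allowed set once as a regular expression. The ranges below follow the `Char` production of XML 1.0 (tab, line feed, carriage return, and the listed Unicode ranges); the names `INVALID_XML_CHARS` and `strip_invalid_xml_chars` are hypothetical. It assumes the input has already been decoded to Unicode.

```python
import re

# Everything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    "[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Drop characters that can never appear in an XML 1.0 document."""
    return INVALID_XML_CHARS.sub("", text)


# NUL and vertical tab are removed; ordinary text and tabs are kept.
print(repr(strip_invalid_xml_chars("ok\x00\x0b text\t")))
```

This only covers which characters are legal. Markup characters such as & and < still need escaping when the text is written into the feed.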
     
    lawrence, May 29, 2004
    #10