trying to validate my rss feed

Discussion in 'XML' started by lkrubner@geocities.com, Feb 21, 2005.

  1. Guest

    I've a client who, I think, writes his essays in Microsoft Word on a
    Macintosh, then copies and pastes them into a form to upload them to his
    weblog. The weblog then creates an RSS feed. The weblog and RSS feed
    are created using a simple PHP script I wrote.

    His RSS feed is not validating, apparently because of the Word "smart
    quotes". This is a guess on my part. I need to find out for sure what
    character is causing the RSS failure. How do I do that? Here is the
    feed:

    http://www.feedvalidator.org/check.cgi?url=http://www.alexmarshall.org/rss/page2494.xml#l25


    What I'd like to do is run a simple search-n-replace for that character
    right before the RSS is created. But I need to find a way to get that
    character. Hex value? Byte code? How do I find such a thing?

    I could teach this client not to make this mistake, but I assume (or
    rather, I dream) at some point thousands of people will be using this
    PHP script, and I can't teach all of them.
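
    One way to get the hex value of the offending character is to scan the
    raw bytes of the feed for anything that does not decode as UTF-8. The
    following is a minimal sketch in Python rather than PHP; the function
    name find_invalid_utf8 and the sample bytes are made up for illustration.
    It assumes the feed is supposed to be UTF-8, as it is labeled.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, hex) pairs for each byte run that is not valid UTF-8."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward decodes cleanly
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice we tried to decode
            bad_start = pos + err.start
            bad_end = pos + err.end
            problems.append((bad_start, data[bad_start:bad_end].hex()))
            pos = bad_end  # resume scanning after the bad bytes
    return problems


# Hypothetical sample: Windows-1252 smart quotes pasted into "UTF-8" text.
sample = b"It\x92s a \x93test\x94"
for offset, hex_value in find_invalid_utf8(sample):
    print(offset, hex_value)
```

    Bytes 0x91 to 0x94 are the curly quotes in Windows-1252, and 0xD2 to
    0xD5 are the curly quotes in MacRoman, so hits in either range would
    support the smart-quotes guess.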
     
    , Feb 21, 2005
    #1

  2. wrote:

    > I've a client who, I think, writes his essays in Microsoft Word on a
    > Macintosh, then copies and pastes them into a form to upload them to his
    > weblog. The weblog then creates an RSS feed. The weblog and RSS feed
    > are created using a simple PHP script I wrote.
    >
    > His RSS feed is not validating, apparently because of the Word "smart
    > quotes". This is a guess on my part. I need to find out for sure what
    > character is causing the RSS failure. How do I do that? Here is the
    > feed:
    >
    > http://www.feedvalidator.org/check.cgi?url=http://www.alexmarshall.org/rss/page2494.xml#l25


    How about correcting the issues that the validator raises?
    You shouldn't label something as UTF-8 encoded XML if it isn't, so if
    you want that format and encoding, you need to make sure your PHP
    script actually creates UTF-8 encoded XML.



    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Feb 21, 2005
    #2

  3. Andy Dingley Guest

    On 21 Feb 2005 10:35:06 -0800, wrote:

    >His RSS feed is not validating, apparently because of the Word "smart
    >quotes".


    Look in the output for the numeric entities for — , ’ and the
    like. They are probably still in there as UTF-16 characters.

    Your PHP is broken (which is common behaviour for PHP & XML).
    Although these characters are well-formed in XML (not _everything_
    that M$oft do is actually invalid), they need to be represented in the
    appropriate way for your encoding. As a guess, you're including UTF-16
    characters in a document that's then getting served as UTF-8.
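
    One representation that is safe whatever encoding the feed declares is
    to write every non-ASCII character as a numeric character reference.
    The following is a minimal sketch in Python rather than PHP; the
    function name to_ascii_xml_text is made up for illustration, and it
    assumes the text has already been decoded correctly into a Unicode
    string.

```python
from xml.sax.saxutils import escape


def to_ascii_xml_text(text: str) -> str:
    """Escape XML markup characters, then turn non-ASCII into &#NNNN; references."""
    escaped = escape(text)  # handles &, < and >
    # xmlcharrefreplace swaps each non-ASCII character for a decimal reference
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


# U+2019 is the right single quote, U+2014 the em dash.
print(to_ascii_xml_text("It\u2019s \u2014 A & B"))
```

    The result is pure ASCII, which is also valid UTF-8, so the encoding
    declaration on the feed stays truthful.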
     
    Andy Dingley, Feb 21, 2005
    #3
  4. Guest

    Sorry. On most sites I put a .htaccess file that tells the browser that
    the text the server is sending out is UTF-8. However, what is really
    being sent out can easily become a crazy hodgepodge of character sets
    when users start copying text from Word, WordPerfect, PDF files, Macs,
    etc., and then pasting it into the form and posting that as their weblog
    entry.

    I've had other conversations elsewhere on Usenet that suggested it's
    hopeless trying to catch every encoding that people might try to input.
    For now, that's beyond my resources.

    But I would like to capture and change the 3 most common errors that
    come up, and those are the smart quotes, double and single, from Word.
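
    A search-and-replace for the Word quotes could look like the following.
    This is a minimal sketch in Python rather than PHP; the names
    SMART_QUOTES and straighten_quotes are made up for illustration, and it
    assumes the input has already been decoded into a Unicode string, so
    the quotes appear as the code points U+2018, U+2019, U+201C and U+201D.

```python
# Map each curly quote code point to its plain ASCII equivalent.
SMART_QUOTES = {
    0x2018: "'",   # left single quotation mark
    0x2019: "'",   # right single quotation mark / apostrophe
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
}


def straighten_quotes(text: str) -> str:
    """Replace Word-style curly quotes with straight ASCII quotes."""
    return text.translate(SMART_QUOTES)


print(straighten_quotes("\u201cIt\u2019s done,\u201d he said."))
```

    If the input is still raw bytes in some other encoding, the quotes will
    be different byte values and this table will not match them until the
    text is decoded.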
     
    , Feb 22, 2005
    #4
  5. Guest

    I'm not sure how to fix the PHP. I can't serve the RSS as plain text,
    because all the RSS validators complain about that. So I have to give
    an encoding, and I decided to give it a UTF-8 encoding. (I usually do this
    with an .htaccess file). But if people write stuff in Word and then
    copy and paste it to a form and hit enter and post that as their weblog
    entry, then how can I purify their input to keep the characters really
    UTF-8?
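
    One common approach is to treat the submitted bytes as UTF-8 if they
    decode cleanly, and otherwise to assume a single fallback encoding and
    convert. The following is a minimal sketch in Python rather than PHP;
    the function name to_utf8 is made up for illustration, and the choice
    of Windows-1252 as the default fallback is an assumption (text pasted
    on an older Mac might be MacRoman instead).

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return raw re-encoded as UTF-8, guessing `fallback` if it is not UTF-8 already."""
    try:
        text = raw.decode("utf-8")  # strict: raises on any invalid byte
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is in the fallback encoding.
        # Bytes undefined in that encoding become U+FFFD instead of raising.
        text = raw.decode(fallback, errors="replace")
    return text.encode("utf-8")


print(to_utf8(b"It\x92s"))              # Windows-1252 apostrophe
print(to_utf8(b"It\xd5s", "mac_roman"))  # MacRoman apostrophe
```

    This cannot tell two single-byte encodings apart, so it only covers
    the case where one fallback fits most users.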

    I've asked this before on other newsgroups and have yet to hear an
    answer that was within my resources to tackle.

    It would help, of course, if I had a better understanding of character
    encodings. I've been trying to educate myself, but it's slow because I
    don't have much time. Are there any resources on the subject you might
    point me to?
     
    , Feb 22, 2005
    #5