trying to validate my rss feed

lkrubner

I have a client who, I think, writes his essays in Microsoft Word on a
Macintosh, then copies and pastes them into a form to upload them to his
weblog. The weblog then creates an RSS feed. Both the weblog and the RSS
feed are generated by a simple PHP script I wrote.

His RSS feed is not validating, apparently because of the Word "smart
quotes". This is a guess on my part. I need to find out for sure which
character is causing the RSS failure. How do I do that? Here is the
feed:

http://www.feedvalidator.org/check.cgi?url=http://www.alexmarshall.org/rss/page2494.xml#l25


What I'd like to do is run a simple search-n-replace for that character
right before the RSS is created. But I need to find a way to get that
character. Hex value? Byte code? How do I find such a thing?

I could teach this client not to make this mistake, but I assume (or
rather, I dream) at some point thousands of people will be using this
PHP script, and I can't teach all of them.
 
Martin Honnen

lkrubner wrote:

> I've a client who, I think, writes his essays in Microsoft Word on a
> Macintosh, then copies and pastes it to a form to upload it to his
> weblog. The weblog then creates an RSS feed. The weblog and RSS feed
> are created using a simple PHP script I wrote.
>
> His RSS feed is not validating, apparently because of the Word "smart
> quotes". This is a guess on my part. I need to find out for sure what
> character is causing the rss failure. How do I do that? Here is the
> feed:
>
> http://www.feedvalidator.org/check.cgi?url=http://www.alexmarshall.org/rss/page2494.xml#l25

How about correcting the issues that the validator raises?
You shouldn't label something as UTF-8 encoded XML if it isn't, so I
think your PHP script needs to make sure that what it creates really is
UTF-8 encoded XML, if that is the format and encoding you want.
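
To find out exactly which bytes break the UTF-8 claim, something like the
following sketch may help. It is written in Python rather than PHP purely
for illustration, and the function name find_invalid_utf8 is made up for
this example. It assumes you have the raw bytes of the feed (or of one
posted entry) and reports the offset and hex value of every byte sequence
that is not valid UTF-8.

```python
def find_invalid_utf8(data: bytes):
    """Return a list of (offset, hex_string) for each invalid UTF-8 sequence."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # If the rest of the data decodes cleanly, we are done.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            problems.append((start, data[start:end].hex()))
            pos = end  # skip the bad bytes and keep scanning
    return problems


if __name__ == "__main__":
    # Hypothetical sample: Windows-1252 double smart quotes in ASCII text.
    sample = b"He said \x93hi\x94"
    for offset, hexval in find_invalid_utf8(sample):
        print(f"offset {offset}: 0x{hexval}")
```

Bytes such as 0x91 to 0x94 in the report would point at Windows-1252
smart quotes, but that reading is an assumption to check against the
actual feed.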
 
Andy Dingley

lkrubner wrote:

> His RSS feed is not validating, apparently because of the Word "smart
> quotes".

Look for numeric entities in the output such as &#8212;, &#8217; and the
like. They are probably still in there as UTF-16 characters.

Your PHP is broken (which is common behaviour for PHP & XML).
Although these characters are well-formed in XML (not _everything_
that M$oft do is actually invalid), they need to be represented in the
appropriate way for your encoding. As a guess, you're including UTF-16
characters in a document that's then getting served as UTF-8.
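
One way to get an "appropriate representation" is sketched below, in
Python for illustration only. It assumes the stray bytes are
Windows-1252 rather than UTF-16, which is a guess and not something the
feed confirms, and the function names to_text and to_ascii_xml are
invented for this example. Input that is already valid UTF-8 is taken as
is; anything else is decoded as Windows-1252. Every non-ASCII character
is then written as a numeric character reference, so the output is valid
whatever ASCII-compatible encoding the server declares.

```python
def to_text(raw: bytes) -> str:
    """Decode posted bytes: UTF-8 if valid, otherwise assume Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input came from a Windows-1252 paste.
        # Bytes undefined in that code page become U+FFFD.
        return raw.decode("cp1252", errors="replace")


def to_ascii_xml(raw: bytes) -> bytes:
    """Return pure-ASCII bytes with non-ASCII characters as &#NNNN; references."""
    return to_text(raw).encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    print(to_ascii_xml(b"It\x92s"))                    # Windows-1252 input
    print(to_ascii_xml("It\u2019s".encode("utf-8")))   # UTF-8 input
```

This only deals with the encoding. Escaping of &, < and > for XML still
has to happen separately.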
 
lkrubner

Sorry. On most sites I put a .htaccess file that tells the browser that
the text the server is sending out is UTF-8. However, what is really
being sent out can easily become a crazy hodgepodge of character sets
when users start copying text from Word, WordPerfect, PDF files, Macs,
etc., and then pasting it into the form and posting that as their weblog
entry.

I've had other conversations elsewhere on Usenet that suggested it's
hopeless trying to catch every encoding that people might try to input.
For now, that's beyond my resources.

But I would like to capture and change the 3 most common errors that
come up, and those are the smart quotes, double and single, from Word.
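
A minimal sketch of that kind of targeted replacement, in Python for
illustration only, might look like this. It assumes the text has already
been decoded to Unicode, and the names SMART_QUOTES and straighten_quotes
are invented for the example. Replacing raw bytes 0x91 to 0x94 blindly
would be unsafe, because those byte values also occur inside valid UTF-8
sequences (an em dash, for instance, is E2 80 94).

```python
# Map Word's curly quotes to their plain ASCII equivalents.
SMART_QUOTES = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}
_TABLE = str.maketrans(SMART_QUOTES)


def straighten_quotes(text: str) -> str:
    """Replace curly single and double quotes with straight ASCII quotes."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(straighten_quotes("\u201cIt\u2019s here,\u201d he said."))
```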
 
lkrubner

I'm not sure how to fix the PHP. I can't serve the RSS as plain text;
all the RSS validators complain about that. So I have to declare an
encoding, and I decided on UTF-8. (I usually do this with an .htaccess
file.) But if people write stuff in Word, then copy and paste it into a
form, hit enter and post that as their weblog entry, how can I purify
their input so that the characters really are UTF-8?

I've asked this before on other newsgroups and have yet to hear an
answer that was within my resources to tackle.

It would help, of course, if I had a better understanding of character
encodings. I've been trying to educate myself, but it's slow because I
don't have much time. Are there any resources on the subject you might
point me to?
 
