messed up quotes/encoding: how do I fix it?

idiotprogrammer

I'm having a character encoding issue here; I don't have access to the
code in question right at the moment, so my question will be more
general than specific. How, generally, do I avoid encoding mix-up
problems?

Here's how I've messed up in the past. Previously, I would type text in
OpenOffice and cut and paste it into Dreamweaver MX. Usually the
problem was the quotation marks: cutting and pasting made them
unviewable in the Dreamweaver editor (and in the browser too). My
solution was to do a global search and replace, swapping OpenOffice's
smart quotes for quotes typed in Dreamweaver.

That solved the browser issues when the character encoding was
specified as iso-8859-1 in the HTML file. But recently I've been
migrating a few dozen static HTML pages to strict XHTML and using
utf-8, which is causing problems. I'm now editing these files in the
Oxygen XML editor and changing the declared encoding for all files to
utf-8.

I learned that there was a clash between the OpenOffice and Dreamweaver
encodings, and that if I used a special option to save the OpenOffice
files as encoded text (and then specified utf-8), the problem would go
away.

But I still had two dozen files to scrub.

NOW: files in iso-8859-1 encoding with the Dreamweaver-replaced quotes
render fine in browsers, but look awful when I switch the declared
encoding at the top of the HTML file to utf-8 in the Oxygen XML editor.

One workaround I thought about was copying text paragraphs from the
browser into the XML editor, but the problem seems to have resurfaced
(in IE, not Firefox). When I do this, the text looks perfectly fine
inside the XML editor and renders OK in Firefox, but errors show up in
IE.

When you copy and paste from an application (say a browser, or an HTML
editor), what kind of encoding are you working with? Why would text
pasted from a browser into an XML editor still have problems in IE (and
yet look OK within the XML editor)?

Any suggestions about how to avoid these encoding mismatches in the
future?

Robert Nagle http://www.imaginaryplanet.net/weblogs/idiotprogrammer/
 
idiotprogrammer

Sorry, one other thing.

The previous way I used to specify the encoding was in the <meta> tag.
Now I'm doing it in the XML declaration (<?xml ... ?>).

Reading the W3C specs, it seems that the web server sets a default
encoding (I assume you can do this in the Apache config files). However,
I've been noticing these issues when viewing local files (I haven't
checked them on the server yet). Does this sound like a problem
specific to viewing local files?

Is the safest practice just to specify the encoding both in the meta
tag and in the XML declaration?

rj

Robert Nagle

*************************************
from http://w3.org/TR/xhtml1/#C_9

In order to portably present documents with specific character
encodings, the best approach is to ensure that the web server provides
the correct headers. If this is not possible, a document that wants to
set its character encoding explicitly must include both the XML
declaration (with an encoding declaration) and a meta http-equiv
statement
(e.g., <meta http-equiv="Content-type" content="text/html;
charset=EUC-JP" />). In XHTML-conforming user agents, the value of the
encoding declaration of the XML declaration takes precedence.

Note: be aware that if a document must include the character encoding
declaration in a meta http-equiv statement, that document may always be
interpreted by HTTP servers and/or user agents as being of the internet
media type defined in that statement. If a document is to be served as
multiple media types, the HTTP server must be used to set the encoding
of the document.
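
So in practice a file that declares the encoding in both places would
start roughly like this (my own sketch based on that appendix, with
utf-8 in place of their EUC-JP example):

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <!-- the meta below repeats the charset for HTTP/legacy agents;
         per the appendix, the XML declaration wins in XHTML UAs -->
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Page title here</title>
  </head>
  <body>
    <p>Body text here.</p>
  </body>
</html>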
 
saz

I'm having a character encoding issue here; I don't have access to the
code in question right at the moment, so my question will be more
general than specific. How, generally, do I avoid encoding mix-up
problems?

[...]

Any suggestions about how to avoid these encoding mismatches in the
future?

Robert Nagle http://www.imaginaryplanet.net/weblogs/idiotprogrammer/

You are going from OpenOffice to Dreamweaver to an XML editor. Why? I
guarantee that is what is creating your problems.

You seem to be going through a lot of steps here. Why use OpenOffice
for typing text? Why not type directly into your XML or HTML editor and
then add the desired tags?

Or if you really need to use a text editor before cutting and pasting
into another editor (I still can't figure that out), use Notepad - it
adds nothing.
 
idiotprogrammer

Yes, I won't deny that in retrospect typing it first in OpenOffice
seemed incredibly stupid. I don't do that anymore. rj
 
Mitja

NOW: files in iso-8859-1 encoding with the Dreamweaver-replaced quotes
render fine in browsers, but look awful when I switch the declared
encoding at the top of the HTML file to utf-8 in the Oxygen XML editor.

Obviously "switching encoding" in Oxygen does just that - it switches
Oxygen's interpretation of the bytes that make up the file. In other
words, it doesn't convert the data. The solution is probably to change
the encoding BEFORE pasting the data into Oxygen. Or, since the data is
already there, do the following dance in Oxygen: select all - cut -
switch encoding - paste.
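
To see the difference between relabelling and converting, here's a
rough sketch (Python, nothing to do with Oxygen itself; note that smart
quotes and em dashes actually live in windows-1252, which browsers in
practice treat as interchangeable with iso-8859-1):

# Relabelling bytes vs. actually converting them (illustrative sketch).
text = u"\u201cHello\u201d \u2014 said the page"   # curly quotes + em dash

old_bytes  = text.encode("cp1252")   # how the legacy files stored it
utf8_bytes = text.encode("utf-8")    # how a utf-8 file stores the same text

# Changing the declared encoding without converting = reading the old
# bytes with the new codec. That's the "looks awful" case:
print(old_bytes.decode("utf-8", errors="replace"))

# Converting = decode with the codec the bytes were written in,
# then re-encode in the target codec:
print(old_bytes.decode("cp1252").encode("utf-8") == utf8_bytes)   # True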

When I do this, the text looks perfectly fine inside the XML editor and
renders OK in Firefox, but errors show up in IE.

Probably because IE detects the wrong encoding. Re-check your headers
etc. If you switch the encoding to UTF-8 in IE manually, does it show
up OK?

Mitja
 
idiotprogrammer

Here's what happened.

I created the text in OpenOffice and pasted it into Dreamweaver.
Inconsistencies occurred (pasting utf-8 data into a file encoded as
iso-8859-1, and vice versa).

Recently, when cleaning up the code, I changed the encoding to utf-8,
declaring it both in the XML declaration and in the meta http-equiv
tag.

What I did not realize was that the web host was serving the content
with iso-8859-1 headers. So by forcing everything to be utf-8 I was
causing a mismatch.
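
If the host is running Apache, I gather that header comes from a
server-wide default charset, and the usual fix would be an .htaccess
override along these lines (a sketch - I haven't tried it on my host
yet):

# .htaccess - ask Apache to send utf-8 in the Content-Type header
AddDefaultCharset UTF-8

# or send no charset at all and let each document's own declaration win:
# AddDefaultCharset Off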

My challenge: I want to declare utf-8 for my static files (for reasons
of portability and future compatibility) while still having the text
display correctly when it is viewed as iso-8859-1.

Two questions (before you read any further):
1. Is utf-8 simply a superset of iso-8859-1, or are they separate
beasts? If I declare utf-8 encoding but restrict myself to iso-8859-1
characters, will I be safe in both worlds? (See the quick byte check
below.)
2. Are there any tools that are good for converting back and forth?
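
For reference, here is the quick byte check I mean (a Python sketch):

# utf-8 and iso-8859-1 only agree on the ASCII range (0x00-0x7F)
print(u"plain ascii".encode("iso-8859-1") == u"plain ascii".encode("utf-8"))  # True

e_acute = u"\u00e9"                   # e-acute, a legal iso-8859-1 character
print(e_acute.encode("iso-8859-1"))   # b'\xe9'      (one byte)
print(e_acute.encode("utf-8"))        # b'\xc3\xa9'  (two bytes)

If that's right, restricting myself to plain ASCII is safe in both
worlds, but accented letters and the smart quotes/dashes are not.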

When in Dreamweaver, I saw bad characters for the quotation marks and
em dashes. I did global replaces (probably).

Because I was viewing the files locally, Firefox was accepting the
utf-8 encoding, and my latest editor, Oxygen XML, was displaying all
the utf-8 content normally. But when the content was served from a site
with iso-8859-1 headers, it showed mistakes.

The solution: cutting and pasting text from the mistake-ridden
iso-8859-1 encoded browser page into the utf-8 encoded file in the XML
editor and then doing a global search and replace. By doing that, I am
identifying and changing the characters that have the potential to
cause problems. And I can test the results by switching to iso-8859-1
in my Firefox browser.
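
The same scrub could presumably be scripted instead of done by hand - a
rough, untested sketch, assuming the bad files really are
iso-8859-1/windows-1252 throughout and that the directory name is just
a placeholder (it also doesn't touch the charset declared inside each
file, which still has to be edited):

import glob

# Re-encode every saved page from windows-1252/iso-8859-1 to utf-8.
for path in glob.glob("pages/*.html"):      # placeholder directory
    with open(path, "rb") as f:
        raw = f.read()
    text = raw.decode("cp1252")             # codec the old files were written in
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))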

Interestingly, IE resolved the encoding more easily by accepting the
file's declarations instead of the server's defaults. I don't know if
this is good or bad.

So that is my statement of a problem and a probable solution. Any
thoughts/insights/links would be appreciated.

Robert Nagle, idiotprogrammer

By the way, it looks as though if a document is to be served as
multiple media types, then I can't use the meta http-equiv statement to
set the encoding; the HTTP server has to do it:
http://w3.org/TR/xhtml1/#C_9
 
