Confused about encoding.

Call Me Tom

I am confused as to which encoding is best. I do all my work on a
Windows machine in US English. I have been setting the encoding to
ISO-8859-1 because some book said that was the standard Windows
character set. Now I read that most web pages are developed using
UTF-8. Assuming all the characters I need for developing my page
exist in either character set, is there any advantage of one over the
other? Does it make any difference to the person viewing my site?

Tom
 
Dylan Parry

Call Me Tom said:
I am confused as to which encoding is best. I do all my work on a
Windows machine in US English. I have been setting the encoding to
ISO-8859-1 because some book said that was the standard Windows
character set. Now I read that most web pages are developed using
UTF-8. Assuming all the characters I need for developing my page
exist in either character set, is there any advantage of one over the
other? Does it make any difference to the person viewing my site?

Assuming that you are writing entirely in English, or even slipping in
some Western European languages such as French or German, then
ISO-8859-1 is more than adequate and there's no reason for you to
change.

If you needed to use some letters from other character sets, such as
Cyrillic or Greek, then perhaps UTF-8 would be of some advantage, but as
it is it's not.

As you said you're using US English, and nothing else, you could
probably even get away with encoding your pages as ASCII!
 
dorayme

Call Me Tom said:
I am confused as to which encoding is best. I do all my work on a
Windows machine in US English. I have been setting the encoding to
ISO-8859-1 because some book said that was the standard Windows
character set. Now I read that most web pages are developed using
UTF-8. Assuming all the characters I need for developing my page
exist in either character set, is there any advantage of one over the
other? Does it make any difference to the person viewing my site?


How exactly have you been "setting the encoding to ISO-8859-1"?
 
Andy Dingley

I am confused as to which encoding is best.

UTF-8, because you can then do everything with one set of tools, with
one set of settings.

The other great thing about UTF-8 (no BOM) is that although it can
represent obscure characters too, a document expressed in UTF-8 that
doesn't need anything beyond ASCII will simultaneously be a valid
ASCII document. What this means is that you _can_ use UTF-8: you can
switch your in-office work to pure UTF-8 and this will still be
compatible with the needs of your customers, even though they've
probably not heard of it.
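
A minimal Python sketch of the compatibility point above (not part of the original post). It encodes an ASCII-only string and a string containing one accented letter under three encodings and compares the bytes. The sample strings and the helper name `encodings_agree` are made up for illustration.

```python
def encodings_agree(text, encodings=("ascii", "iso-8859-1", "utf-8")):
    """Return True if text encodes to identical bytes in every listed encoding."""
    encoded = []
    for name in encodings:
        try:
            encoded.append(text.encode(name))
        except UnicodeEncodeError:
            # The text contains a character this encoding cannot represent.
            return False
    return all(b == encoded[0] for b in encoded)


plain = "Copyright 2010 by FooCo"   # ASCII characters only
accented = "caf\u00e9"              # "cafe" with an acute accent on the e

print(encodings_agree(plain))          # ASCII-only text: same bytes everywhere
print(encodings_agree(accented))       # the accented e has no ASCII encoding
print(accented.encode("iso-8859-1"))   # accented e stored as one byte
print(accented.encode("utf-8"))        # accented e stored as two bytes
```

As long as a file stays within ASCII, the three labels describe the same bytes. The choice only starts to matter once a character such as the accented e appears.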

jEdit is still my favourite editor for mucking out encoding errors.
Much better than Eclipse at doing repairs.

There's some pain to getting there, particularly across teams, but
once you're there, everything Just Works and keeps on doing so.

If you're using a version control system or other file repository, you
need to keep the files in it in one standard encoding (it's possible
to set the encoding per file, but very awkward). In which case, you need
easy ways to check that everyone is putting their content in there
correctly. In particular, you need to find out that Fred's SQL editor has
its settings screwed up, so you can go and fix Fred, not just keep
fixing Fred's files afterwards.

One technique (useful if you're working in teams) is to embed a
"canary" at the tops of files. There's no copyright character in
ASCII, but © (Alt-0169) is available in Unicode and UTF-8. So taking
the usual corporate policy of "All source must have a copyright
boilerplate statement at the top" you can make this useful to you, by
embedding a standard string of "Copyright © 2010 by FooCo". If this
isn't found and there's no copyright symbol in the first 40 lines (a
pageful), then you can flag this file up as likely being an encoding
error. This is an easy regex search from a script, easy enough for it
to be automated and run under your Hudson. Otherwise it's actually
fiendishly difficult to detect encoding errors, unless you do have
some known text to search for.
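
A rough sketch of what such a canary check might look like in Python (not the poster's actual script). It assumes the repository standard is UTF-8 and that the boilerplate reads "Copyright © <year> by FooCo". The names `has_canary` and `check_file` are hypothetical.

```python
import re

COPYRIGHT_SIGN = "\u00a9"  # the canary character; it does not exist in ASCII

# Assumed boilerplate: "Copyright (c-sign) <4-digit year> by FooCo".
CANARY = re.compile("Copyright " + COPYRIGHT_SIGN + r" \d{4} by FooCo")


def has_canary(data, max_lines=40):
    """Return True if the canary survives intact in the first max_lines lines."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, e.g. saved as ISO-8859-1 with a bare 0xA9 byte.
        return False
    head = "\n".join(text.splitlines()[:max_lines])
    return CANARY.search(head) is not None


def check_file(path):
    """Read a file as raw bytes and report whether its canary is intact."""
    with open(path, "rb") as handle:
        return has_canary(handle.read())


header = "// Copyright " + COPYRIGHT_SIGN + " 2010 by FooCo\n"
good = header.encode("utf-8")
saved_as_latin1 = header.encode("iso-8859-1")
double_encoded = good.decode("iso-8859-1").encode("utf-8")

print(has_canary(good))             # file stored as UTF-8
print(has_canary(saved_as_latin1))  # file stored in the wrong encoding
print(has_canary(double_encoded))   # UTF-8 misread as Latin-1, then re-saved
```

Working on raw bytes rather than letting the script guess an encoding is deliberate: both a wrongly saved file and a double-encoded one fail the check, and those are the two usual ways an editor with bad settings damages a file.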
 
Jukka K. Korpela

Call Me Tom said:
Both in my editor (Netbeans for PHP) and in the head section of the
web page.

What matters is what the HTTP headers say. You didn't reveal your site URL,
so we cannot enlighten you on this.

But if your document only contains ASCII characters, then the encoding
doesn't really matter in practical terms, if you work in the Western world.

It only matters if you use non-ASCII characters. This is somewhat difficult,
since most people don't know what ASCII is.
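
For anyone unsure whether a page really is ASCII-only: ASCII covers code points 0 to 127, so anything above that is a non-ASCII character. A small Python sketch, with a made-up sample string and a hypothetical helper name `non_ascii_report`, that lists every such character in a piece of text:

```python
import unicodedata


def non_ascii_report(text):
    """Return (line, column, char, Unicode name) for each non-ASCII character."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # ASCII is code points 0..127
                name = unicodedata.name(ch, "UNKNOWN")
                found.append((line_no, col, ch, name))
    return found


sample = "plain text\ncaf\u00e9 \u00a9 2010\n"
for line_no, col, ch, name in non_ascii_report(sample):
    print(f"line {line_no}, column {col}: {ch!r} {name}")
```

An empty result means the declared encoding makes no practical difference for that text.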
 
Jukka K. Korpela

Call Me Tom said:
How does one determine what the HTTP headers say?

Well, by sending an HTTP request and looking at the response. If you don't
know how to talk HTTP directly, I'd recommend using Firefox with the Web
Developer extension, which lets you see the HTTP response headers in a
simple way.

That's a server's domain name, not a URL (well, it is formally a relative
URL, but where's the base?). Anyway, the URL is
http://www.corporateairamerica.com/ and the server response says
Content-Type: text/html
thereby refusing to tell the encoding, so in _this_ case the meta tag will
be used to determine the encoding.
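
The precedence described above can be sketched in Python as follows. This is a simplification under stated assumptions, not what any browser literally runs: real browsers also consider a byte order mark and use a more careful scan of the markup. The function names, the regex, and the 1024-byte scan window are illustrative choices.

```python
import re
from email.message import Message

# Loose pattern for <meta charset=...> or <meta ... content="...; charset=...">.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def effective_charset(content_type, body):
    """Prefer the HTTP header; fall back to a meta tag near the top of the page."""
    header_charset = charset_from_header(content_type)
    if header_charset:
        return header_charset
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


page = b'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
print(effective_charset("text/html", page))                 # header is silent
print(effective_charset("text/html; charset=UTF-8", page))  # header wins
```

With the bare `Content-Type: text/html` shown in the response, the header yields nothing and the meta tag decides. Once the server adds a charset parameter, the meta tag is no longer consulted.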

Since the document contains ASCII characters only, the encoding could be
specified as almost anything that people use on the Web in practice.
 
Jonathan N. Little

Jeremy said:
There are a number of ways to see the HTTP headers.

If you are using Firefox, the Firebug extension allows you to see headers.

The VERY FIRST HIT on Google took me here:
http://web-sniffer.net/

And telnet...

jonathan@zuko:~$ telnet www.corporateairamerica.com 80
Trying 99.198.119.34...
Connected to corporateairamerica.com.
Escape character is '^]'.
GET / HTTP/1.1
host: www.corporateairamerica.com

HTTP/1.1 200 OK
Date: Thu, 12 Aug 2010 20:15:02 GMT
Server: Apache/2.2.11 (Unix) mod_ssl/2.2.11 OpenSSL/0.9.8e-fips-rhel5
mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
X-Powered-By: PHP/5.2.9
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0,
pre-check=0
Pragma: no-cache
Set-Cookie: caalogin=772a09b39677ebe5bdb54c0294998464; path=/
Transfer-Encoding: chunked
Content-Type: text/html

1069
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

<html>
<... snip rest of page output ...>
 
