I am confused as to which encoding is best.
UTF-8, because you can then do everything with one set of tools and
one set of settings.
The other great thing about UTF-8 (no BOM) is that although it can
handle obscure characters too, a document expressed in UTF-8 that
doesn't need anything beyond ASCII is simultaneously a valid ASCII
document. What this means is that you _can_ use UTF-8: you can
switch your in-office work to pure UTF-8 and it will still be
compatible with the needs of your customers, even though they've
probably never heard of it.
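The ASCII-compatibility claim is easy to demonstrate: a string with
nothing beyond ASCII produces byte-for-byte identical output in either
encoding (a minimal sketch; the sample strings are my own):

```python
# A pure-ASCII string encodes to identical bytes in ASCII and UTF-8,
# so an ASCII-only file saved as UTF-8 (no BOM) is also a valid ASCII file.
text = "Plain ASCII report, nothing fancy."
assert text.encode("ascii") == text.encode("utf-8")

# Once a non-ASCII character appears, only UTF-8 can represent it;
# the copyright sign U+00A9 becomes the two-byte sequence C2 A9.
fancy = "Copyright © 2010"
assert fancy.encode("utf-8") == b"Copyright \xc2\xa9 2010"
```

So the switch costs nothing for files that were pure ASCII anyway.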
jEdit is still my favourite editor for mucking out encoding errors.
It's much better than Eclipse at doing repairs.
There's some pain to getting there, particularly across teams, but
once you're there, everything Just Works and keeps on doing so.
If you're using a version control system or other file repository, you
need to keep the files in it in a standard encoding (it's possible to
work with per-file encodings, but it's very awkward). In that case,
you need easy ways to check that everyone is putting their content in
there correctly. In particular, you need to discover that Fred's SQL
editor has its settings screwed up so that you can go and fix Fred,
not just keep fixing Fred's files afterwards.

One technique (useful if you're working in teams) is to embed a
"canary" at the top of each file. There's no copyright character in
ASCII, but © (Alt-0169) is available in Unicode and UTF-8. So take
the usual corporate policy of "All source must have a copyright
boilerplate statement at the top" and make it useful to you by
embedding a standard string like "Copyright © 2010 by FooCo". If this
string isn't found and there's no copyright symbol anywhere in the
first 40 lines (a pageful), then you can flag the file as a likely
encoding error. This is an easy regex search from a script, easy
enough to automate and run under your Hudson.

Otherwise it's actually fiendishly difficult to detect encoding
errors, unless you do have some known text to search for.
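The canary check above can be sketched in a few lines of Python (the
function and constant names here are my own; adapt the canary string
and line limit to your boilerplate):

```python
import sys

CANARY = "©"     # Alt-0169; the first non-ASCII character in the boilerplate
MAX_LINES = 40   # roughly one pageful

def check_canary(path):
    """Return True if the © canary survives in the first MAX_LINES lines.

    A missing or mangled canary suggests some tool re-saved the file
    in the wrong encoding.
    """
    try:
        with open(path, encoding="utf-8") as f:
            for lineno, line in enumerate(f):
                if lineno >= MAX_LINES:
                    break
                if CANARY in line:
                    return True   # canary intact
    except UnicodeDecodeError:
        return False   # not valid UTF-8 at all, e.g. a Latin-1 © saved as bare 0xA9
    return False       # canary missing: flag for a human to look at

if __name__ == "__main__":
    for path in sys.argv[1:]:
        if not check_canary(path):
            print(f"{path}: possible encoding error")
```

Run it over the repository from a CI job and it reports any file where
the canary has gone missing or been mangled.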