I am confused as to which encoding is best.
UTF-8, because you can then do everything with one set of tools and
one set of settings.
The other great thing about UTF-8 (no BOM) is that although it can
handle obscure characters too, a document expressed in UTF-8 that
doesn't need anything beyond ASCII is simultaneously a valid ASCII
document. What this means is that you _can_ use UTF-8: you can
switch your in-office work to pure UTF-8 and it will still be
compatible with the needs of your customers, even though they've
probably never heard of it.
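The ASCII-compatibility claim is easy to demonstrate: a string with
nothing beyond ASCII produces byte-for-byte identical output in either
encoding (a minimal sketch; the sample strings are my own):

```python
# A pure-ASCII string encodes to identical bytes in ASCII and UTF-8,
# so an ASCII-only file saved as UTF-8 (no BOM) is also a valid ASCII file.
text = "Plain ASCII report, nothing fancy."
assert text.encode("ascii") == text.encode("utf-8")

# Once a non-ASCII character appears, only UTF-8 can represent it;
# the copyright sign U+00A9 becomes the two-byte sequence C2 A9.
fancy = "Copyright © 2010"
assert fancy.encode("utf-8") == b"Copyright \xc2\xa9 2010"
```

So the switch costs nothing for files that were pure ASCII anyway.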
jEdit is still my favourite editor for mucking out encoding errors.
It's much better than Eclipse at doing repairs.
There's some pain to getting there, particularly across teams, but
once you're there, everything Just Works and keeps on doing so.
If you're using a version control system or other file repository, you
need to keep the files in it in a standard encoding (it's possible to
work with per-file encodings, but it's very awkward). In that case,
you need easy ways to check that everyone is putting their content in
there correctly. In particular, you need to discover that Fred's SQL
editor has its settings screwed up so that you can go and fix Fred,
not just keep fixing Fred's files afterwards.

One technique (useful if you're working in teams) is to embed a
"canary" at the top of each file. There's no copyright character in
ASCII, but © (Alt-0169) is available in Unicode and UTF-8. So take
the usual corporate policy of "All source must have a copyright
boilerplate statement at the top" and make it useful to you by
embedding a standard string like "Copyright © 2010 by FooCo". If this
string isn't found and there's no copyright symbol anywhere in the
first 40 lines (a pageful), then you can flag the file as a likely
encoding error. This is an easy regex search from a script, easy
enough to automate and run under your Hudson.

Otherwise it's actually fiendishly difficult to detect encoding
errors, unless you do have some known text to search for.
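The canary check above can be sketched in a few lines of Python (the
function and constant names here are my own; adapt the canary string
and line limit to your boilerplate):

```python
import sys

CANARY = "©"     # Alt-0169; the first non-ASCII character in the boilerplate
MAX_LINES = 40   # roughly one pageful

def check_canary(path):
    """Return True if the © canary survives in the first MAX_LINES lines.

    A missing or mangled canary suggests some tool re-saved the file
    in the wrong encoding.
    """
    try:
        with open(path, encoding="utf-8") as f:
            for lineno, line in enumerate(f):
                if lineno >= MAX_LINES:
                    break
                if CANARY in line:
                    return True   # canary intact
    except UnicodeDecodeError:
        return False   # not valid UTF-8 at all, e.g. a Latin-1 © saved as bare 0xA9
    return False       # canary missing: flag for a human to look at

if __name__ == "__main__":
    for path in sys.argv[1:]:
        if not check_canary(path):
            print(f"{path}: possible encoding error")
```

Run it over the repository from a CI job and it reports any file where
the canary has gone missing or been mangled.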