Email charset

Roedy Green

When sending an email with JavaMail, is it safe to use UTF-8 encoding?
Is there anyone who can't read it today?
 
Thomas Weidenfeller

Roedy said:
When sending an email with JavaMail, is it safe to use UTF-8 encoding?
Is there anyone who can't read it today?

Just half a year ago I had to use a mail reader which could only deal
with ASCII. It was on a remote site running some embedded OS. I was
happy that I even had a command line mail tool.

My suggestion would be to use the least common denominator:

- If your mail's texts only contain characters which can be encoded in
ASCII, use ASCII for the mail.

- If your mail's texts only contain characters which can be encoded
within the range of Latin 1, use ISO-8859-1.

- Only if you have other characters, too, and if you don't want to go
through the trouble of figuring out for each destination the most common
encoding which still includes your characters, then use UTF-8.
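The three rules above can be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions: the class name CharsetPicker and the method pick are hypothetical, not part of JavaMail, and the candidate list is just the three charsets named in the list. The name of the chosen charset could then be handed to something like JavaMail's MimeMessage.setText(text, charset).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: picks the narrowest charset that can encode a text.
public final class CharsetPicker {
    // Candidates in order of preference, narrowest first; UTF-8 is the catch-all.
    private static final Charset[] CANDIDATES = {
        StandardCharsets.US_ASCII,
        StandardCharsets.ISO_8859_1,
        StandardCharsets.UTF_8
    };

    private CharsetPicker() {}

    /** Returns the first candidate charset able to encode every character of text. */
    public static Charset pick(String text) {
        for (Charset candidate : CANDIDATES) {
            // Use a fresh encoder for each check; CharsetEncoder is not thread-safe.
            if (candidate.newEncoder().canEncode(text)) {
                return candidate;
            }
        }
        // Reached only for malformed input such as unpaired surrogates.
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(pick("Work is ready"));      // prints US-ASCII
        System.out.println(pick("Gr\u00fc\u00dfe"));    // prints ISO-8859-1
        System.out.println(pick("Sveiki \u0100"));      // prints UTF-8
    }
}
```

The non-ASCII test strings are written with Unicode escapes so the result does not depend on the encoding of the source file.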

/Thomas
 
Roedy Green

- If your mail's texts only contain characters which can be encoded in
ASCII, use ASCII for the mail.

- If your mail's texts only contain characters which can be encoded
within the range of Latin 1, use ISO-8859-1.

- Only if you have other characters, too, and if you don't want to go
through the trouble of figuring out for each destination the most common
encoding which still includes your characters, then use UTF-8.

My generated email alerts will be in dozens of languages. The project
I am working on is the Internationaliser, which among other things will
generate email alert messages to the various administrators,
translators, proofreaders and programmers, all speaking a hodgepodge of
languages. The Internationaliser itself will of course be
internationalised up the yin yang.

See http://mindprod.com/projects/internationaliser.html

I guess I should play it safe and allow the Charset to be
configurable, setting everyone's to UTF-8 default.
 
Thomas Weidenfeller

Roedy said:
I guess I should play it safe and allow the Charset to be
configurable, setting everyone's to UTF-8 default.

I would set the default to ASCII. And when reading in a mail's text for
distribution probably run a check if it is all 7-bit. If not, give a
warning, error, or silently switch to UTF-8.

Or work with some kind of templates, instead of raw text input. Each
text intended for distribution can contain not only the text, but e-mail
style headers which your tool then uses to prepare the actual messages.
E.g. instead of just having

bla bla, buy our void bla, great
stuff now, cheap, bla

as input, you have stuff like

Content-Type: text/plain; charset=us-ascii
Organization: Get poor quick!
From: (e-mail address removed)

bla bla, buy our void bla, great
stuff now, cheap, bla

in your input, and you copy that stuff to the outgoing mails, overriding
any defaults. That way the people who contribute the language specific
texts can set the "best" encoding for the text while providing it.
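A minimal sketch of reading such a template, assuming the simple layout shown above: header lines of the form "Name: value", one blank line, then the body. The class name MailTemplate and its parse method are hypothetical, and the sketch deliberately ignores folded (continued) header lines and treats header names as case-sensitive, which real mail headers are not.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for a template split into headers and body.
public final class MailTemplate {
    public final Map<String, String> headers;
    public final String body;

    private MailTemplate(Map<String, String> headers, String body) {
        this.headers = headers;
        this.body = body;
    }

    /** Splits a template at its first blank line into headers and body. */
    public static MailTemplate parse(String template) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Limit -1 keeps trailing empty lines so the body is preserved exactly.
        String[] lines = template.split("\r?\n", -1);
        int i = 0;
        for (; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Not a header line: " + lines[i]);
            }
            headers.put(lines[i].substring(0, colon).trim(),
                        lines[i].substring(colon + 1).trim());
        }
        // Skip the blank separator line, then rejoin the remaining lines as the body.
        StringBuilder body = new StringBuilder();
        for (int j = i + 1; j < lines.length; j++) {
            if (j > i + 1) {
                body.append('\n');
            }
            body.append(lines[j]);
        }
        return new MailTemplate(headers, body.toString());
    }
}
```

The tool would then look up, say, headers.get("Content-Type") and use it in place of its own default when building the outgoing message.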


/Thomas
 
Roedy Green

I would set the default to ASCII. And when reading in a mail's text for
distribution probably run a check if it is all 7-bit. If not, give a
warning, error, or silently switch to UTF-8.

In my case, a large proportion of the messages will not be in English.
To start, most will be in Latvian or Serbian.
 
Roedy Green

in your input, and you copy that stuff to the outgoing mails, overriding
any defaults. That way the people who contribute the language specific
texts can set the "best" encoding for the text while providing it.

My program is a tool for internationalising Java apps, and logically
it should use itself to internationalise itself, including all the
email alerts it generates. These are things like notices to
translators that work is available.

Right now I have a record for each person with the email address,
preferred locale (language/country/variant), preferred L&F, and
preferred email encoding.
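As a rough illustration, such a per-person record might look like the class below. All names here (Recipient, its fields and getters) are hypothetical, and falling back to UTF-8 when no encoding has been configured is only the default floated earlier in this thread, not a settled design.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Objects;

// Hypothetical per-person record: address, locale, look and feel, email charset.
public final class Recipient {
    private final String email;
    private final Locale locale;        // language/country/variant
    private final String lookAndFeel;   // e.g. a Swing L&F name
    private final Charset emailCharset; // preferred encoding for outgoing mail

    public Recipient(String email, Locale locale, String lookAndFeel, Charset emailCharset) {
        this.email = Objects.requireNonNull(email, "email");
        this.locale = Objects.requireNonNull(locale, "locale");
        this.lookAndFeel = lookAndFeel;
        // Assumption: UTF-8 is used when no preference has been configured.
        this.emailCharset = (emailCharset != null) ? emailCharset : StandardCharsets.UTF_8;
    }

    public String getEmail() { return email; }
    public Locale getLocale() { return locale; }
    public String getLookAndFeel() { return lookAndFeel; }
    public Charset getEmailCharset() { return emailCharset; }

    public static void main(String[] args) {
        Recipient translator = new Recipient(
                "translator@example.com", Locale.forLanguageTag("lv-LV"), "Metal", null);
        System.out.println(translator.getEmailCharset()); // prints UTF-8
    }
}
```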

If you are curious what I am up to, see
http://mindprod.com/projects/internationaliser.html

I am implementing one of my own student projects.
 
Oliver Wong

Roedy Green said:
When sending an email with JavaMail, is it safe to use UTF-8 encoding?
Is there anyone who can't read it today?

There's always *someone*, *somewhere* who can't read *something*. If
your program is multilingual (as you've mentioned later on in this thread),
then your best bet is probably Unicode (and thus UTF-8). ASCII certainly
won't cut it.

If I understand correctly, the users of your program are going to be
application translators (or more generally, computer-savvy linguists), so
they should have software that can handle UTF-8. If not, you could accompany
your software with tutorials on setting up a Unicode-enabled system for the
various popular OSes.

- Oliver
 
Roedy Green

If I understand correctly, the users of your program are going to be
application translators (or more generally, computer-savvy linguists), so
they should have software that can handle UTF-8. If not, you could accompany
your software with tutorials on setting up a Unicode-enabled system for the
various popular OSes.

What makes this program more complicated than you might expect is
that it deals with the interactions between people and simultaneous
updates of everything. It is not just a resource bundle editor.

There are four classes of people my program deals with.

1. programmers: they write Java code that they (or their bosses) want
internationalised/localised.

2. translators: people who can translate the programmer's language
(usually English) into the target languages with national and other
variants. Serbians have several dialects for example.

3. proofreaders: people who check the translators' work.

4. administrators: people who manage the projects, assigning work,
checking up on progress, configuring.

A person could wear all four hats.

At this stage, I am primarily concerned with computer-generated emails
to alert people something is ready for them, e.g. a work assignment is
ready for a translator who may work at home part time.

See http://mindprod.com/projects/internationaliser.html

It may eventually be extended to create a simple person-to-person
email that would deal with the problem of selecting a suitable
encoding for the recipient's email program. It would be a can of worms
if I tried to handle getting it translated as well.

At some point in my life I want to write something to replace regular
email that deals in a serious way with spam, spoofing and unwanted
enclosures.

See http://mindprod.com/projects/mailreadernewsreader.html
 
