Ross Ridge (Sat, 21 Feb 2009 18:06:35 -0500)
No, the original post demonstrates you don't have to include MIME headers
for ISO 8859-1 text to be properly displayed by many newsreaders. The fact
that your obscure newsreader didn't display it properly doesn't mean
that the original poster's newsreader is broken.
And how is this kind of assuming better than clearly stating the
encoding used? Does the fact that the last official Usenet RFC doesn't
mandate content-type headers mean that all bets are off and that we
should rely on guesswork to determine the correct encoding of a
message? No, it means the RFC is outdated and no longer suitable for
current needs.
HTTP requires the assumption of ISO 8859-1 in the absence of any
specified encoding.
Which is, of course, completely irrelevant for this discussion. Or are
you saying that this fact should somehow obliterate the need for
specifying encodings?
Newsreaders assuming ISO 8859-1 instead of ASCII doesn't make it a guess.
It's just a different assumption. Nor does making an assumption, ASCII
or ISO 8859-1, give you any certainty.
Assuming is another way of saying "I don't know, so I'm using this
arbitrary default", which is not that different from a completely wild
guess.
Which is reasonable given that Python is a programming language where it's
better to have a more conservative assumption about encodings so errors
can be more quickly diagnosed. A newsreader however is a different
beast, where it's better to make a less conservative assumption that's
more likely to display messages correctly to the user. Assuming ISO
8859-1 in the absence of any specified encoding allows the message to be
correctly displayed if the character set is either ISO 8859-1 or ASCII.
Doing things the "pythonic" way and assuming ASCII only allows such
messages to be displayed if ASCII is used.
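
To make the difference between the two defaults concrete, here is a minimal Python sketch. The sample text and the helper name decode_with_default are made up for illustration and are not taken from any newsreader.

```python
# Hypothetical message body: "café" as it would arrive encoded in ISO 8859-1.
raw = "café".encode("iso-8859-1")


def decode_with_default(data, default):
    """Decode bytes using an assumed default charset; return None on failure."""
    try:
        return data.decode(default)
    except UnicodeDecodeError:
        return None


# Assuming ASCII fails as soon as a byte above 0x7F (here 0xE9) shows up.
ascii_result = decode_with_default(raw, "ascii")

# Assuming ISO 8859-1 handles the same bytes.
latin1_result = decode_with_default(raw, "iso-8859-1")

# Pure ASCII text decodes under the ISO 8859-1 assumption as well.
plain_result = decode_with_default(b"cafe", "iso-8859-1")
```

Note that ISO 8859-1 maps every possible byte to some character, so this default never raises an error. It only produces the right text if the message really was ISO 8859-1 or ASCII.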
Reading this paragraph, I've begun thinking that we've misunderstood
each other. I agree that assuming ISO 8859-1 in the absence of
specification is a better guess than most (since it's more likely to
display the message correctly). However, not specifying the encoding
of a message is just asking for trouble, and assuming anything is just
an attempt to clean up someone else's mess. Unfortunately, it is impossible
to reliably detect the encoding scheme by heuristics alone, and with hundreds of
encodings in existence today, the only real solution to the problem is
clearly stating your content-type. Since MIME is the most accepted way
of doing this, it should be the preferred way, RFC'ed or not.
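
As a rough illustration of stating the charset instead of leaving it to the reader, here is a sketch using Python's standard email package. The helper names build_post and declared_charset and the fallback constant are assumptions for this example, not part of any newsreader or RFC.

```python
from email.message import EmailMessage

# Assumed fallback for messages that declare nothing (the default debated above).
FALLBACK_CHARSET = "iso-8859-1"


def build_post(body, charset):
    """Build a message whose Content-Type header states the charset explicitly."""
    msg = EmailMessage()
    msg.set_content(body, charset=charset)
    return msg


def declared_charset(msg):
    """Return the charset the message declares, or the fallback if it has none."""
    return msg.get_content_charset() or FALLBACK_CHARSET


post = build_post("café", "utf-8")  # sender states the encoding
undeclared = EmailMessage()         # no Content-Type header at all

charset_of_post = declared_charset(post)
charset_of_undeclared = declared_charset(undeclared)
```

With the header present the reader has nothing to guess. The fallback is only consulted for messages that say nothing about their encoding.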