To unicode or not to unicode

Discussion in 'Python' started by Ron Garret, Feb 20, 2009.

  1. Ron Garret Guest

    I'm writing a little wiki that I call µWiki. That's a lowercase Greek
    mu at the beginning (it's pronounced micro-wiki). It's working, except
    that I can't actually enter the name of the wiki into the wiki itself
    because the default unicode encoding on my Python installation is
    "ascii". So I'm trying to decide on a course of action. There seem to
    be three possibilities:

    1. Change the code to properly support unicode. Preliminary
    investigations indicate that this is going to be a colossal pain in the
    ass.

    2. Change the default encoding on my Python installation to be latin-1
    or UTF8. The disadvantage to this is that no one else will be able to
    run my code without making the same change to their installation, since
    you can't change default encodings once Python has started.

    3. Punt and spell it 'uwiki' instead.

    I'm feeling indecisive so I thought I'd ask other people's opinion.
    What should I do?

    rg
    Ron Garret, Feb 20, 2009
    #1

  2. Ron Garret <rNOSPAMon <at> flownet.com> writes:

    >
    > I'm writing a little wiki that I call µWiki. That's a lowercase Greek
    > mu at the beginning (it's pronounced micro-wiki). It's working, except
    > that I can't actually enter the name of the wiki into the wiki itself
    > because the default unicode encoding on my Python installation is
    > "ascii". So I'm trying to decide on a course of action. There seem to
    > be three possibilities:


    You should never have to rely on the default encoding. You should explicitly
    decode and encode data.
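    Here is a minimal Python 3 sketch of what "explicitly decode and encode" means in practice. Bytes are decoded once at input, the program works with `str` internally, and text is encoded once at output. The helper names are hypothetical, because the wiki's own I/O code isn't shown in the thread. UTF-8 is assumed as the encoding on the wire.

```python
# Sketch only: hypothetical helpers, UTF-8 assumed as the wire encoding.
WIRE_ENCODING = "utf-8"


def read_page_name(raw: bytes) -> str:
    """Decode incoming bytes explicitly instead of relying on a default."""
    return raw.decode(WIRE_ENCODING)


def render_title(name: str) -> bytes:
    """Build output as str, then encode it explicitly on the way out."""
    return ("<h1>" + name + "</h1>").encode(WIRE_ENCODING)


name = read_page_name(b"\xc2\xb5Wiki")       # UTF-8 bytes for MICRO SIGN + "Wiki"
assert name == "\u00b5Wiki"
assert render_title(name) == b"<h1>\xc2\xb5Wiki</h1>"

# Falling back on ASCII (the Python 2 default the OP ran into) fails here:
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii:", exc.reason)              # ascii: ordinal not in range(128)
```

    Python 2 follows the same pattern with `unicode` objects and explicit `.decode()`/`.encode()` calls. The point is that no step in the sketch depends on the interpreter's default encoding.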

    >
    > 1. Change the code to properly support unicode. Preliminary
    > investigations indicate that this is going to be a colossal pain in the
    > ass.


    Properly handling unicode may be painful at first, but it will surely pay off in
    the future.
    Benjamin Peterson, Feb 20, 2009
    #2

  3. * Ron Garret (Thu, 19 Feb 2009 18:57:13 -0800)
    > I'm writing a little wiki that I call µWiki. That's a lowercase Greek
    > mu at the beginning (it's pronounced micro-wiki).


    No, it's not. I suggest you start your Unicode adventure by configuring
    your newsreader.

    Thorsten
    Thorsten Kampe, Feb 20, 2009
    #3
  4. MRAB Guest

    Thorsten Kampe wrote:
    > * Ron Garret (Thu, 19 Feb 2009 18:57:13 -0800)
    >> I'm writing a little wiki that I call µWiki. That's a lowercase Greek
    >> mu at the beginning (it's pronounced micro-wiki).

    >
    > No, it's not. I suggest you start your Unicode adventure by configuring
    > your newsreader.
    >

    It looked like mu to me, but you're correct: it's "MICRO SIGN", not
    "GREEK SMALL LETTER MU".
    MRAB, Feb 20, 2009
    #4
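    MRAB's distinction is easy to check with Python 3's standard `unicodedata` module. The two characters have different code points, and only MICRO SIGN has a Latin-1 encoding.

```python
import unicodedata

micro = "\u00b5"   # MICRO SIGN, the character in "µWiki"
mu = "\u03bc"      # GREEK SMALL LETTER MU

for ch in (micro, mu):
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
# U+00B5 MICRO SIGN
# U+03BC GREEK SMALL LETTER MU

assert micro != mu                            # two distinct characters
assert micro.encode("latin-1") == b"\xb5"     # MICRO SIGN is in Latin-1...
try:
    mu.encode("latin-1")                      # ...GREEK SMALL LETTER MU is not
except UnicodeEncodeError:
    print("mu has no Latin-1 encoding")
```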
  5. Ron Garret Guest

    In article <>,
    MRAB <> wrote:

    > Thorsten Kampe wrote:
    > > * Ron Garret (Thu, 19 Feb 2009 18:57:13 -0800)
    > >> I'm writing a little wiki that I call µWiki. That's a lowercase Greek
    > >> mu at the beginning (it's pronounced micro-wiki).

    > >
    > > No, it's not. I suggest you start your Unicode adventure by configuring
    > > your newsreader.
    > >

    > It looked like mu to me, but you're correct: it's "MICRO SIGN", not
    > "GREEK SMALL LETTER MU".


    Heh, I didn't know that those two things were distinct. Learn something
    new every day.

    rg
    Ron Garret, Feb 20, 2009
    #5
  6. MRAB wrote:
    > Thorsten Kampe wrote:
    >> * Ron Garret (Thu, 19 Feb 2009 18:57:13 -0800)
    >>> I'm writing a little wiki that I call µWiki. That's a lowercase
    >>> Greek mu at the beginning (it's pronounced micro-wiki).

    >>
    >> No, it's not. I suggest you start your Unicode adventure by
    >> configuring your newsreader.
    >>

    > It looked like mu to me, but you're correct: it's "MICRO SIGN", not
    > "GREEK SMALL LETTER MU".


    I don't think that was the complaint. Instead, the complaint was
    that the OP's original message did not have a Content-type header,
    and that it was thus impossible to tell what the byte in front of
    "Wiki" meant. To properly post either MICRO SIGN or GREEK SMALL LETTER
    MU in a usenet or email message, you really must use MIME. (As both
    your article and Thorsten's did, by choosing UTF-8)
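    Martin's point about "the byte in front of Wiki" can be made concrete. Assume the OP's client sent Latin-1, which is the setting he describes later in the thread. In that case the name went out as the byte 0xB5 followed by ASCII "Wiki". The Python 3 sketch below decodes those same bytes under three charsets. `decode_as` is a hypothetical helper.

```python
import unicodedata


def decode_as(raw: bytes, charset: str):
    """Hypothetical helper: decode strictly, returning None on failure."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None


raw = b"\xb5Wiki"  # MICRO SIGN in Latin-1, then ASCII "Wiki"

for charset in ("latin-1", "iso8859-5", "utf-8"):
    text = decode_as(raw, charset)
    if text is None:
        print(charset, "-> not valid in this charset")
    else:
        print(charset, "->", unicodedata.name(text[0]), "+", text[1:])
# latin-1 -> MICRO SIGN + Wiki
# iso8859-5 -> CYRILLIC CAPITAL LETTER IE + Wiki
# utf-8 -> not valid in this charset
```

    The ASCII part decodes identically under all three. Only the byte above 127 is ambiguous, and the receiver needs a declared charset to resolve it.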

    Regards,
    Martin

    P.S. The difference between MICRO SIGN and GREEK SMALL LETTER MU
    is nit-picking, IMO:

    py> import unicodedata
    py> unicodedata.name(unicodedata.normalize("NFKC", u"\N{MICRO SIGN}"))
    'GREEK SMALL LETTER MU'
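    Martin's normalization example also suggests a practical fix for the wiki itself. Page names could be normalized before they are used as keys, so that either character finds the same page. The sketch below does this. `page_key` is a hypothetical helper, not part of the OP's code.

```python
import unicodedata


def page_key(name: str) -> str:
    """Hypothetical helper: NFKC-normalize a page name before storing or
    looking it up, so MICRO SIGN and GREEK SMALL LETTER MU compare equal."""
    return unicodedata.normalize("NFKC", name)


pages = {page_key("\u03bcWiki"): "front page"}   # stored with GREEK SMALL LETTER MU
assert page_key("\u00b5Wiki") in pages           # found again via MICRO SIGN

# NFC, by contrast, leaves MICRO SIGN alone: the mapping is a compatibility
# decomposition, not a canonical one.
assert unicodedata.normalize("NFC", "\u00b5") == "\u00b5"
```

    NFKC folds other compatibility characters too. For example, the ligature U+FB01 becomes the two letters "fi". Whether that suits page names is therefore a design decision. NFC is the more conservative choice, but it keeps the two mu characters distinct.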
    Martin v. Löwis, Feb 20, 2009
    #6
  7. Ron Garret Guest

    In article <>,
    "Martin v. Löwis" <> wrote:

    > MRAB wrote:
    > > Thorsten Kampe wrote:
    > >> * Ron Garret (Thu, 19 Feb 2009 18:57:13 -0800)
    > >>> I'm writing a little wiki that I call µWiki. That's a lowercase
    > >>> Greek mu at the beginning (it's pronounced micro-wiki).
    > >>
    > >> No, it's not. I suggest you start your Unicode adventure by
    > >> configuring your newsreader.
    > >>

    > > It looked like mu to me, but you're correct: it's "MICRO SIGN", not
    > > "GREEK SMALL LETTER MU".

    >
    > I don't think that was the complaint. Instead, the complaint was
    > that the OP's original message did not have a Content-type header,


    I'm the OP. I'm using MT-Newswatcher 3.5.1. I thought I had it
    configured properly, but I guess I didn't. Under
    Preferences->Languages->Send Messages with Encoding I had selected
    latin-1. I didn't know I also needed to have MIME turned on for that to
    work. I've turned it on now. Is this better?

    This should be a micro sign: µ

    rg
    Ron Garret, Feb 20, 2009
    #7
  8. Ron Garret wrote:
    > In article <>,
    > "Martin v. Löwis" <> wrote:
    >
    >
    > I'm the OP. I'm using MT-Newswatcher 3.5.1. I thought I had it
    > configured properly, but I guess I didn't.


    Probably you did. However, it then means that the newsreader is crap.

    > Under
    > Preferences->Languages->Send Messages with Encoding I had selected
    > latin-1.


    That sounds like the early nineties, before the invention of MIME.

    > I didn't know I also needed to have MIME turned on for that to
    > work. I've turned it on now. Is this better?
    >
    > This should be a micro sign: µ


    Not really (it's worse, from my point of view - but might be better
    for others). You are now sending in UTF-8, but there is still no
    MIME declaration in the news headers. As a consequence, my newsreader
    continues to interpret it as Latin-1 (which it assumes as the default
    encoding), and it comes out as mojibake. (In responding, my reader
    should declare the encoding properly, so you should see what I see,
    namely A-circumflex, micro sign.)
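    The A-circumflex Martin describes is exactly what happens when UTF-8 bytes are read as Latin-1. Here is a Python 3 sketch of the round trip:

```python
import unicodedata

text = "\u00b5"                        # MICRO SIGN
sent = text.encode("utf-8")            # what a UTF-8 sender puts on the wire
assert sent == b"\xc2\xb5"             # two bytes in UTF-8

seen = sent.decode("latin-1")          # a reader assuming Latin-1: one char per byte
print([unicodedata.name(c) for c in seen])
# ['LATIN CAPITAL LETTER A WITH CIRCUMFLEX', 'MICRO SIGN']

# The damage can be undone only by someone who knows which two
# encodings were confused:
assert seen.encode("latin-1").decode("utf-8") == text
```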

    If you look at the message headers / message source as sent e.g.
    by MRAB, you'll notice lines like

    MIME-Version: 1.0
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: 8bit

    These lines are missing from your posting.
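    For comparison, Python 3's standard `email` package emits the same three declarations when asked to. This only illustrates the headers. It says nothing about how MT-Newswatcher builds its articles. Passing `cte="8bit"` makes the transfer encoding explicit, to match MRAB's headers, rather than leaving the choice to the library's heuristics.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Re: To unicode or not to unicode"
# Declare the charset and the transfer encoding explicitly.
msg.set_content("This should be a micro sign: \u00b5", charset="utf-8", cte="8bit")

print("MIME-Version:", msg["MIME-Version"])                            # 1.0
print("Content-Type:", msg["Content-Type"])                            # text/plain; charset="utf-8"
print("Content-Transfer-Encoding:", msg["Content-Transfer-Encoding"])  # 8bit
```

    Python quotes the parameter as `charset="utf-8"`. RFC 2045 treats the quoted and unquoted forms as equivalent, and charset names are case-insensitive.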

    Assuming the newsreader is not crap, it might help to set the default
    send encoding to ASCII. When sending micro sign, the newsreader might
    infer that ASCII is not good enough, and use MIME - although it then
    still needs to pick an encoding.
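    The behavior Martin hopes for amounts to a tiny decision rule: stay with ASCII when it suffices, otherwise declare something richer. The sketch below uses a hypothetical `pick_charset` helper. Choosing UTF-8 as the fallback is an assumption, since Latin-1 would also cover the micro sign.

```python
def pick_charset(text: str) -> str:
    """Hypothetical rule: plain ASCII needs nothing beyond us-ascii;
    anything else gets UTF-8 (plus the MIME headers that declare it)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return "utf-8"
    return "us-ascii"


print(pick_charset("Is this better?"))                        # us-ascii
print(pick_charset("This should be a micro sign: \u00b5"))    # utf-8
```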

    Regards,
    Martin
    Martin v. Löwis, Feb 20, 2009
    #8
  9. Ross Ridge Guest

    =?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?= <> wrote:
    >I don't think that was the complaint. Instead, the complaint was
    >that the OP's original message did not have a Content-type header,
    >and that it was thus impossible to tell what the byte in front of
    >"Wiki" meant. To properly post either MICRO SIGN or GREEK SMALL LETTER
    >MU in a usenet or email message, you really must use MIME. (As both
    >your article and Thorsten's did, by choosing UTF-8)


    MIME only applies to Internet e-mail messages. RFC 1036 neither requires
    nor gives a meaning to a Content-Type header in a Usenet message, so
    there's nothing wrong with the original poster's newsreader.
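    Incidentally, the attribution line quoted in this post is itself MIME. The string `=?UTF-8?B?...?=` is an RFC 2047 "encoded-word" (MIME Part Three), which lets non-ASCII text such as Martin's name travel in 7-bit headers. Here it appears undecoded. Python 3's `email.header` module can decode it:

```python
from email.header import decode_header, make_header

attribution = "=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?="

# decode_header() yields (bytes, charset) pairs; make_header() rebuilds text.
name = str(make_header(decode_header(attribution)))
print(ascii(name))   # '"Martin v. L\xf6wis"'
```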

    In any case, what the original poster really should do is come up with
    a better name for his program.

    Ross Ridge

    --
    l/ // Ross Ridge -- The Great HTMU
    [oo][oo]
    -()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
    db //
    Ross Ridge, Feb 21, 2009
    #9
  10. * Ross Ridge (Sat, 21 Feb 2009 12:22:36 -0500)
    > =?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?= <> wrote:
    > >I don't think that was the complaint. Instead, the complaint was
    > >that the OP's original message did not have a Content-type header,
    > >and that it was thus impossible to tell what the byte in front of
    > >"Wiki" meant. To properly post either MICRO SIGN or GREEK SMALL LETTER
    > >MU in a usenet or email message, you really must use MIME. (As both
    > >your article and Thorsten's did, by choosing UTF-8)

    >
    > MIME only applies to Internet e-mail messages.


    No, it doesn't: "MIME's use, however, has grown beyond describing the
    content of e-mail to describing content type in general. [...]

    The content types defined by MIME standards are also of importance
    outside of e-mail, such as in communication protocols like HTTP [...]"

    http://en.wikipedia.org/wiki/MIME

    > RFC 1036 neither requires nor gives a meaning to a Content-Type header
    > in a Usenet message


    Well, /maybe/ the reason for that is that RFC 1036 was written in 1987
    and the first MIME RFC in 1992...? The "Son of RFC 1036" mentions MIME
    more often than you can count.

    > so there's nothing wrong with the original poster's newsreader.


    If you follow RFC 1036 (which was written before anyone even thought of
    MIME) then all content has to be ASCII. The OP used non-ASCII letters.

    It's all about declaring your charset. In Python as well as in your
    newsreader. If you don't declare your charset it's ASCII for you - in
    Python as well as in your newsreader.
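    Thorsten's "declare it or it's ASCII" rule can be written as a small decoder. It uses the charset declared in `Content-Type`. If none is declared, it falls back to a fixed default and fails loudly instead of guessing. The sketch below uses a hypothetical `decode_body` helper built on the standard `email.message.Message.get_content_charset()`. Passing `default="iso-8859-1"` would model the HTTP rule Ross mentions further down. Note that Python 3 itself changed one of the defaults discussed here: `bytes.decode()` and `sys.getdefaultencoding()` use UTF-8, not ASCII.

```python
import sys
from email.message import Message
from typing import Optional


def decode_body(raw: bytes, content_type: Optional[str] = None,
                default: str = "us-ascii") -> str:
    """Hypothetical helper: honour a declared charset, else use `default`.
    Decoding is strict, so undeclared non-ASCII bytes raise an error."""
    headers = Message()
    if content_type is not None:
        headers["Content-Type"] = content_type
    charset = headers.get_content_charset(failobj=default)
    return raw.decode(charset)


body = b"This should be a micro sign: \xc2\xb5"
print(ascii(decode_body(body, "text/plain; charset=UTF-8")))
# 'This should be a micro sign: \xb5'

try:
    decode_body(body)   # nothing declared: ASCII assumed, so it refuses
except UnicodeDecodeError as exc:
    print("undeclared:", exc.reason)   # undeclared: ordinal not in range(128)

print(sys.getdefaultencoding())        # utf-8 on Python 3
```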

    Thorsten
    Thorsten Kampe, Feb 21, 2009
    #10
  11. Ross Ridge Guest

    Thorsten Kampe <> wrote:
    >> RFC 1036 neither requires nor gives a meaning to a Content-Type header
    >> in a Usenet message

    >
    >Well, /maybe/ the reason for that is that RFC 1036 was written in 1987
    >and the first MIME RFC in 1992...?


    Obviously.

    >"Son of RFC 1036" mentions MIME more often than you can count.


    Since it was never submitted and accepted, RFC 1036 remains current.

    >> so there's nothing wrong with the original poster's newsreader.

    >
    >If you follow RFC 1036 (which was written before anyone even thought of
    >MIME) then all content has to be ASCII. The OP used non-ASCII letters.


    RFC 1036 doesn't place any restrictions on the content of the body of
    an article. On the other hand, "Son of RFC 1036" does restrict the
    characters used in the body of a message:

    Articles MUST not contain any octet with value exceeding 127,
    i.e. any octet that is not an ASCII character

    This means that merely adding a Content-Type header wouldn't be
    enough to conform to "Son of RFC 1036"; the original poster would
    also have had to either switch to a 7-bit character set or use a
    7-bit-compatible transfer encoding. If you're trying to claim that
    "Son of RFC 1036" is the new de facto standard, then that would mean
    your newsreader is broken too.
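    Under MIME, the "7-bit compatible transfer encoding" Ross mentions would typically be quoted-printable or base64. The sketch below applies Python's standard `quopri` module to UTF-8 text. A real article would also need the headers `Content-Type: text/plain; charset=UTF-8` and `Content-Transfer-Encoding: quoted-printable`, so that the reader knows how to undo both layers.

```python
import quopri

body = "This should be a micro sign: \u00b5".encode("utf-8")
wire = quopri.encodestring(body)

print(wire)                                 # b'This should be a micro sign: =C2=B5'
assert all(byte < 128 for byte in wire)     # every octet is 7-bit ASCII
assert quopri.decodestring(wire) == body    # and the encoding is reversible
```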

    >It's all about declaring your charset. In Python as well as in your
    >newsreader. If you don't declare your charset it's ASCII for you - in
    >Python as well as in your newsreader.


    Except in practice unlike Python, many newsreaders don't assume ASCII.
    The original article displayed fine for me. Google Groups displays it
    correctly too:

    http://groups.google.com/group/comp.lang.python/msg/828fefd7040238bc

    I could just as easily argue that assuming ISO 8859-1 is the de facto
    standard, and that it's your newsreader that's broken. The reality,
    however, is that RFC 1036 is the only standard for Usenet messages,
    de facto or otherwise, and so there's nothing wrong with anyone's
    newsreader.

    Ross Ridge

    Ross Ridge, Feb 21, 2009
    #11
  12. * Ross Ridge (Sat, 21 Feb 2009 14:52:09 -0500)
    > Thorsten Kampe <> wrote:
    >> It's all about declaring your charset. In Python as well as in your
    >> newsreader. If you don't declare your charset it's ASCII for you - in
    >> Python as well as in your newsreader.

    >
    > Except in practice unlike Python, many newsreaders don't assume ASCII.


    They assume ASCII - unless you declare your charset (the exception being
    Outlook Express and a few Windows newsreaders). Everything else is
    "guessing".

    > The original article displayed fine for me. Google Groups displays it
    > correctly too:
    >
    > http://groups.google.com/group/comp.lang.python/msg/828fefd7040238bc


    Your understanding of the principles of Unicode is at least as
    non-existent as the OP's.

    > I could just as easily argue that assuming ISO 8859-1 is the de facto
    > standard, and that it's your newsreader that's broken.


    There is no "standard" in regard to guessing (this is what you call
    "assuming"). The need for explicit declaration of an encoding is exactly
    the same in Python as in any Usenet article.

    > The reality, however, is that RFC 1036 is the only standard for Usenet
    > messages, de facto or otherwise, and so there's nothing wrong with
    > anyone's newsreader.


    The reality is that all non-broken newsreaders use MIME headers to
    declare and interpret the charset being used. I suggest you read at
    least http://www.joelonsoftware.com/articles/Unicode.html to get an idea
    of Unicode and associated topics.

    Thorsten
    Thorsten Kampe, Feb 21, 2009
    #12
  13. Ross Ridge Guest

    Ross Ridge (Sat, 21 Feb 2009 14:52:09 -0500)
    > Except in practice unlike Python, many newsreaders don't assume ASCII.


    Thorsten Kampe <> wrote:
    >They assume ASCII - unless you declare your charset (the exception being
    >Outlook Express and a few Windows newsreaders). Everything else is
    >"guessing".


    No, it's an assumption like the way Python by default assumes ASCII.

    >> The original article displayed fine for me. Google Groups displays it
    >> correctly too:
    >>
    >> http://groups.google.com/group/comp.lang.python/msg/828fefd7040238bc

    >
    >Your understanding of the principles of Unicode is at least as
    >non-existent as the OP's.


    The link demonstrates that Google Groups doesn't assume ASCII like
    Python does. Since popular newsreaders like Google Groups and Outlook
    Express can display the message correctly without the MIME headers,
    but your obscure one can't, there's a much stronger case to be made that
    it's your newsreader that's broken.

    >> I could just as easily argue that assuming ISO 8859-1 is the de facto
    >> standard, and that it's your newsreader that's broken.

    >
    >There is no "standard" in regard to guessing (this is what you call
    >"assuming"). The need for explicit declaration of an encoding is exactly
    >the same in Python as in any Usenet article.


    No, many newsreaders don't assume ASCII by default like Python.

    >> The reality, however, is that RFC 1036 is the only standard for Usenet
    >> messages, de facto or otherwise, and so there's nothing wrong with
    >> anyone's newsreader.

    >
    >The reality is that all non-broken newsreaders use MIME headers to
    >declare and interpret the charset being used.


    Since RFC 1036 doesn't require MIME headers, a reader that doesn't
    generate them is by definition not broken.

    Ross Ridge

    Ross Ridge, Feb 21, 2009
    #13
  14. Carl Banks Guest

    On Feb 19, 6:57 pm, Ron Garret <> wrote:
    > I'm writing a little wiki that I call µWiki.  That's a lowercase Greek
    > mu at the beginning (it's pronounced micro-wiki).  It's working, except
    > that I can't actually enter the name of the wiki into the wiki itself
    > because the default unicode encoding on my Python installation is
    > "ascii".  So I'm trying to decide on a course of action.  There seem to
    > be three possibilities:
    >
    > 1.  Change the code to properly support unicode.  Preliminary
    > investigations indicate that this is going to be a colossal pain in the
    > ass.
    >
    > 2.  Change the default encoding on my Python installation to be latin-1
    > or UTF8.  The disadvantage to this is that no one else will be able to
    > run my code without making the same change to their installation, since
    > you can't change default encodings once Python has started.
    >
    > 3.  Punt and spell it 'uwiki' instead.
    >
    > I'm feeling indecisive so I thought I'd ask other people's opinion.  
    > What should I do?
    >
    > rg
    Carl Banks, Feb 21, 2009
    #14
  15. * Ross Ridge (Sat, 21 Feb 2009 17:07:35 -0500)
    > The link demonstrates that Google Groups doesn't assume ASCII like
    > Python does. Since popular newsreaders like Google Groups and Outlook
    > Express can display the message correctly without the MIME headers,
    > but your obscure one can't, there's a much stronger case to be made that
    > it's your newsreader that's broken.


    *sigh* I give up on you. You didn't even read the "Joel on Software"
    article. The whole "why" and "what for" of Unicode and MIME will always
    be a complete mystery to you.

    T.
    Thorsten Kampe, Feb 21, 2009
    #15
  16. Ross Ridge Guest

    Ross Ridge (Sat, 21 Feb 2009 17:07:35 -0500)
    > The link demonstrates that Google Groups doesn't assume ASCII like
    > Python does. Since popular newsreaders like Google Groups and Outlook
    > Express can display the message correctly without the MIME headers,
    > but your obscure one can't, there's a much stronger case to be made that
    > it's your newsreader that's broken.


    Thorsten Kampe <> wrote:
    >*sigh* I give up on you. You didn't even read the "Joel on Software"
    >article. The whole "why" and "what for" of Unicode and MIME will always
    >be a complete mystery to you.


    I understand what Unicode and MIME are for and why they exist. Neither
    their merits nor your insults change the fact that the only current
    standard governing the content of Usenet posts doesn't require their use.

    Ross Ridge

    Ross Ridge, Feb 21, 2009
    #16
  17. * Ross Ridge (Sat, 21 Feb 2009 18:06:35 -0500)
    > > The link demonstrates that Google Groups doesn't assume ASCII like
    > > Python does. Since popular newsreaders like Google Groups and Outlook
    > > Express can display the message correctly without the MIME headers,
    > > but your obscure one can't, there's a much stronger case to be made that
    > > it's your newsreader that's broken.

    >
    > Thorsten Kampe <> wrote:
    > >*sigh* I give up on you. You didn't even read the "Joel on Software"
    > >article. The whole "why" and "what for" of Unicode and MIME will always
    > >be a complete mystery to you.

    >
    > I understand what Unicode and MIME are for and why they exist. Neither
    > their merits nor your insults change the fact that the only current
    > standard governing the content of Usenet posts doesn't require their
    > use.


    That's right. As long as you use pure ASCII you can skip this nasty step
    of informing other people which charset you are using. If you do use non
    ASCII then you have to do that. That's the way virtually all newsreaders
    work. It has nothing to do with some 21+ year old RFC. Even your Google
    Groups "newsreader" does that ('content="text/html; charset=UTF-8"').

    Being explicit about your encoding is 99% of the whole Unicode magic in
    Python and in any communication across the Internet (be it NNTP,
    SMTP or HTTP). Your Google Groups simply uses heuristics to guess the
    encoding the OP probably used. Windows newsreaders simply use the locale
    of the local host. That's guessing. You can call it assuming but it's
    still guessing. There is no way you can be sure without any declaration.
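    To see why Thorsten calls this guessing, consider one plausible heuristic: try strict UTF-8 first and fall back to Latin-1. This is not a claim about what Google Groups or any particular newsreader actually does. `guess_decode` is hypothetical.

```python
def guess_decode(raw: bytes) -> str:
    """Hypothetical heuristic: try strict UTF-8, fall back to Latin-1.
    Latin-1 accepts every byte, so this never raises, but it can be wrong."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


assert guess_decode(b"\xb5Wiki") == "\u00b5Wiki"      # Latin-1 sender: right
assert guess_decode(b"\xc2\xb5Wiki") == "\u00b5Wiki"  # UTF-8 sender: right
# A Latin-1 sender who really typed A-circumflex followed by MICRO SIGN
# sends the same two bytes, and the guess silently yields one MICRO SIGN:
assert guess_decode("\u00c2\u00b5".encode("latin-1")) == "\u00b5"
```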

    And it's unpythonic. Python "assumes" ASCII, and if the decoded/encoded
    text doesn't fit that encoding, it refuses to guess.

    T.
    Thorsten Kampe, Feb 21, 2009
    #17
  18. Ross Ridge Guest

    Ross Ridge (Sat, 21 Feb 2009 18:06:35 -0500)
    > I understand what Unicode and MIME are for and why they exist. Neither
    > their merits nor your insults change the fact that the only current
    > standard governing the content of Usenet posts doesn't require their
    > use.


    Thorsten Kampe <> wrote:
    >That's right. As long as you use pure ASCII you can skip this nasty step
    >of informing other people which charset you are using. If you do use non
    >ASCII then you have to do that. That's the way virtually all newsreaders
    >work. It has nothing to do with some 21+ year old RFC. Even your Google
    >Groups "newsreader" does that ('content="text/html; charset=UTF-8"').


    No, the original post demonstrates you don't have to include MIME headers
    for ISO 8859-1 text to be properly displayed by many newsreaders. The
    fact that your obscure newsreader didn't display it properly doesn't
    mean that the original poster's newsreader is broken.

    >Being explicit about your encoding is 99% of the whole Unicode magic in
    >Python and in any communication across the Internet (be it NNTP,
    >SMTP or HTTP).


    HTTP requires the assumption of ISO 8859-1 in the absence of any
    specified encoding.

    >Your Google Groups simply uses heuristics to guess the
    >encoding the OP probably used. Windows newsreaders simply use the locale
    >of the local host. That's guessing. You can call it assuming but it's
    >still guessing. There is no way you can be sure without any declaration.


    Newsreaders assuming ISO 8859-1 instead of ASCII doesn't make it a guess.
    It's just a different assumption. Nor does making an assumption, ASCII
    or ISO 8859-1, give you any certainty.

    >And it's unpythonic. Python "assumes" ASCII, and if the decoded/encoded
    >text doesn't fit that encoding, it refuses to guess.


    Which is reasonable, given that Python is a programming language where
    it's better to have a more conservative assumption about encodings so
    errors can be diagnosed more quickly. A newsreader, however, is a
    different beast, where it's better to make a less conservative
    assumption that's more likely to display messages correctly to the user.
    Assuming ISO 8859-1 in the absence of any specified encoding allows the
    message to be displayed correctly if the character set is either
    ISO 8859-1 or ASCII. Doing things the "pythonic" way and assuming ASCII
    only allows such messages to be displayed if ASCII is used.
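    Ross's argument rests on a property of Latin-1 that is easy to check in Python 3. Latin-1 assigns a character to every byte value and agrees with ASCII on bytes 0-127, so decoding with it never fails. Strict ASCII, by contrast, rejects any byte above 127. The last line shows the flip side: "never fails" is not the same as "always right".

```python
raw_ascii = b"plain ASCII text"
raw_latin1 = b"\xb5Wiki"                    # the OP's undeclared Latin-1 bytes

# Latin-1 covers all 256 byte values and matches ASCII on 0x00-0x7F.
assert len(bytes(range(256)).decode("latin-1")) == 256
assert raw_ascii.decode("latin-1") == raw_ascii.decode("ascii")
assert raw_latin1.decode("latin-1") == "\u00b5Wiki"

# Strict ASCII refuses anything above 0x7F.
try:
    raw_latin1.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii:", exc.reason)             # ascii: ordinal not in range(128)

# But Latin-1 also "succeeds" on UTF-8 input, producing mojibake.
assert "\u00b5".encode("utf-8").decode("latin-1") == "\u00c2\u00b5"
```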

    Ross Ridge

    Ross Ridge, Feb 22, 2009
    #18
  19. * Ross Ridge (Sat, 21 Feb 2009 19:39:42 -0500)
    > Thorsten Kampe <> wrote:
    > >That's right. As long as you use pure ASCII you can skip this nasty step
    > >of informing other people which charset you are using. If you do use non
    > >ASCII then you have to do that. That's the way virtually all newsreaders
    > >work. It has nothing to do with some 21+ year old RFC. Even your Google
    > >Groups "newsreader" does that ('content="text/html; charset=UTF-8"').

    >
    > No, the original post demonstrates you don't have to include MIME headers
    > for ISO 8859-1 text to be properly displayed by many newsreaders.


    *sigh* As you still refuse to read the article[1] I'm going to quote it
    now here:

    'The Single Most Important Fact About Encodings

    If you completely forget everything I just explained, please remember
    one extremely important fact. It does not make sense to have a string
    without knowing what encoding it uses.
    [...]
    If you have a string [...] in an email message, you have to know what
    encoding it is in or you cannot interpret it or display it to users
    correctly.

    Almost every [...] "she can't read my emails when I use accents" problem
    comes down to one naive programmer who didn't understand the simple fact
    that if you don't tell me whether a particular string is encoded using
    UTF-8 or ASCII or ISO 8859-1 (Latin 1) or Windows 1252 (Western
    European), you simply cannot display it correctly [...]. There are over
    a hundred encodings and above code point 127, all bets are off.'

    Enough said.

    > The fact that your obscure newsreader didn't display it properly
    > doesn't mean that the original poster's newsreader is broken.


    You don't even know if my "obscure newsreader" displayed it properly.
    Non-ASCII text without a declared encoding is just a bunch of bytes.
    It's not even text.

    T.

    [1] http://www.joelonsoftware.com/articles/Unicode.html
    Thorsten Kampe, Feb 22, 2009
    #19
  20. Steve Holden Guest

    Thorsten Kampe wrote:
    > * Ross Ridge (Sat, 21 Feb 2009 14:52:09 -0500)
    >> Thorsten Kampe <> wrote:
    >>> It's all about declaring your charset. In Python as well as in your
    >>> newsreader. If you don't declare your charset it's ASCII for you - in
    >>> Python as well as in your newsreader.

    >> Except in practice unlike Python, many newsreaders don't assume ASCII.

    >
    > They assume ASCII - unless you declare your charset (the exception being
    > Outlook Express and a few Windows newsreaders). Everything else is
    > "guessing".
    >
    >> The original article displayed fine for me. Google Groups displays it
    >> correctly too:
    >>
    >> http://groups.google.com/group/comp.lang.python/msg/828fefd7040238bc

    >
    > Your understanding of the principles of Unicode is at least as
    > non-existent as the OP's.
    >
    >> I could just as easily argue that assuming ISO 8859-1 is the de facto
    >> standard, and that it's your newsreader that's broken.

    >
    > There is no "standard" in regard to guessing (this is what you call
    > "assuming"). The need for explicit declaration of an encoding is exactly
    > the same in Python as in any Usenet article.
    >
    >> The reality, however, is that RFC 1036 is the only standard for Usenet
    >> messages, de facto or otherwise, and so there's nothing wrong with
    >> anyone's newsreader.

    >
    > The reality is that all non-broken newsreaders use MIME headers to
    > declare and interpret the charset being used. I suggest you read at
    > least http://www.joelonsoftware.com/articles/Unicode.html to get an idea
    > of Unicode and associated topics.
    >

    And I suggest you try to phrase your remarks in a way more respectful of
    those you are discussing these matters with. I understand that
    exasperation can lead to offensiveness, but if a lack of understanding
    does exist then it's better to simply try and remove it without
    commenting on its existence.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
    Steve Holden, Feb 22, 2009
    #20
Similar Threads
  1. Robert Mark Bram
    Replies: 0, Views: 3,906
    Robert Mark Bram, Sep 28, 2003
  2. ygao
    unicode wrap unicode object?
    ygao, Apr 8, 2006, in forum: Python
    Replies: 6, Views: 530
    "Martin v. Löwis", Apr 8, 2006
  3. Gabriele *darkbard* Farina
    Unicode digit to unicode string
    Gabriele *darkbard* Farina, May 16, 2006, in forum: Python
    Replies: 2, Views: 497
    Gabriele *darkbard* Farina, May 16, 2006
  4. gabor
    Replies: 13, Views: 535
    Leo Kislov, Nov 18, 2006
  5. Jean-Paul Calderone
    Replies: 23, Views: 655
    Leo Kislov, Nov 21, 2006