Codecs for ISO 8859-11 (Thai) and 8859-16 (Romanian)

Discussion in 'Python' started by Peter Jacobi, Jul 27, 2004.

  1. Peter Jacobi


    I've seen from the 2.4alpha announcements, that the CJK codecs made it
    into this version.

    I'd like to ask whether (or how to) add the missing ISO 8859 codes:

    ISO 8859-11 (= TIS620) for Thai
    ISO 8859-16 for Romanian

    They are easily built from the Unicode mapping files like the other
    ISO 8859 codecs and it would just be nice if they were included in
    the standard distribution.

    Peter
     
    Peter Jacobi, Jul 27, 2004
    #1

  2. Peter Jacobi wrote:
    > They are easily built from the Unicode mapping files like the other
    > ISO 8859 codecs and it would just be nice if they were included in
    > the standard distribution.


    Can you produce a patch? Please upload it to sf.net/projects/python.

    ISO-8859-11 is actually very difficult to implement, as it is unclear
    whether the characters \x80..\x9F are assigned in this character set
    or not. In fact, it is unclear whether the character set contains
    even C0.

    Regards,
    Martin
     
    Martin v. Löwis, Jul 27, 2004
    #2

  3. On 27 Jul 2004 04:10:24 -0700, rumours say that
    (Peter Jacobi) might have written:

    >I'd like to ask whether (or how to) add the missing ISO 8859 codes:


    Martin asked for a patch; it would be nice if you could provide one. On
    "how": just take any lib/encodings/iso8859_?.py and edit the dict
    argument to the decoding_map.update call.
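
    Concretely, the hand edit amounts to something like the following
    (a sketch only, with most entries elided; the full generated files
    appear later in this thread):

        # Hypothetical excerpt of a hand-edited copy of Lib/encodings/iso8859_N.py:
        # start from the identity mapping and override the upper half with the
        # values from the Unicode mapping table.
        import codecs

        decoding_map = codecs.make_identity_dict(range(256))
        decoding_map.update({
            0x00a1: 0x0e01,  # THAI CHARACTER KO KAI
            0x00a2: 0x0e02,  # THAI CHARACTER KHO KHAI
            # ... remaining upper-half entries from the mapping file ...
        })
        encoding_map = codecs.make_encoding_map(decoding_map)
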
    --
    TZOTZIOY, I speak England very best,
    "Tssss!" --Brad Pitt as Achilles in unprecedented Ancient Greek
     
    Christos TZOTZIOY Georgiou, Jul 28, 2004
    #3
  4. "Martin v. Löwis" <> wrote in message news:...

    > ISO-8859-11 is actually very difficult to implement, as it is unclear
    > whether the characters \x80..\x9F are assigned in this character set
    > or not. In fact, it is unclear whether the character set contains
    > even C0.


    That seems like a very fine distinction to me; the Unicode mapping tables
    are the same for those points as in ISO-8859-1, so what's the difference?
     
    Richard Brodie, Jul 28, 2004
    #4
  5. Richard Brodie wrote:
    >>ISO-8859-11 is actually very difficult to implement, as it is unclear
    >>whether the characters \x80..\x9F are assigned in this character set
    >>or not. In fact, it is unclear whether the character set contains
    >>even C0.

    >
    >
    > That seems like a very fine distinction to me; the Unicode mapping tables
    > are the same for those points as in ISO-8859-1, so what's the difference?


    For ISO-8859-1, I believe the standard actually says that those code
    points are C1. For ISO-8859-11, you can find various statements on the
    net, some claiming that it includes C1 and some claiming that it
    doesn't. Somebody would actually have to take a look at ISO-8859-11 to
    find out which is the case.

    The issue is complicated by two facts:
    - many sources indicate that ISO-8859-11 is derived by taking TIS-620,
    and adding NBSP into 0xa0. Now, it seems quite clear that TIS-620 does
    *not* include C1.
    - some sources indicate certain restrictions with respect to control
    functions, e.g. in

    http://www.nectec.or.th/it-standards/iso8859-11/

    which says "control functions are not used to create composite graphic
    symbols from two or more graphic characters (see 6). "
    I don't know what this means, especially as section 6 does not talk
    about control functions. Section 7 says that any control functions
    are out of scope of ISO 8859, which I believe is factually incorrect.

    Regards,
    Martin
     
    Martin v. Löwis, Jul 28, 2004
    #5
  6. Peter Jacobi


    Hi Christos, All,

    Christos "TZOTZIOY" Georgiou <> wrote in message
    > Martin asked for a patch, which would be nice if you could provide. On
    > "how": just take any lib/encodings/iso8859_?.py and edit the dict
    > argument to the decoding_map.update call.


    Thanks for the hint, but I've already succeeded in generating the
    necessary files. It's even easier than your solution, as the utility
    gencodec.py in Tools/Scripts generates these automatically from (1:1)
    Unicode mapping files (ftp://ftp.unicode.org/Public/MAPPINGS/).
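
    For the record, the core of what gencodec.py automates is roughly the
    following (a simplified sketch of my own, not the actual script, and
    assuming the usual one-byte-per-line format of the mapping files):

        # Simplified sketch of what gencodec.py automates: turn a unicode.org
        # mapping file (lines like "0xA1<TAB>0x0E01<TAB># comment") into a
        # decoding map suitable for codecs.charmap_decode().
        import codecs

        def parse_mapping(path):
            decoding_map = codecs.make_identity_dict(range(256))
            for line in open(path):
                line = line.split('#', 1)[0].strip()   # drop comments
                if not line:
                    continue
                fields = line.split()
                byte = int(fields[0], 16)
                if len(fields) > 1:
                    decoding_map[byte] = int(fields[1], 16)
                else:
                    decoding_map[byte] = None          # listed but unmapped position
            return decoding_map

        # decoding_map = parse_mapping('8859-11.TXT')
        # encoding_map = codecs.make_encoding_map(decoding_map)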

    I'll add the generated files at the end of this post.

    The remaining question, and it seems the more difficult one, is one of
    process: whether and how to add these to the normal Python
    distribution.
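
    In the meantime, the generated modules can be hooked in locally without
    touching the installed library; here is a rough sketch in the Python 2
    style of the files below (assuming iso8859_11.py is importable):

        # Rough sketch: register the generated codec at runtime instead of
        # installing it into Lib/encodings.  Assumes iso8859_11.py is on sys.path.
        import codecs
        import iso8859_11

        def _search(encoding):
            if encoding in ('iso8859-11', 'iso8859_11'):
                return iso8859_11.getregentry()   # (encoder, decoder, reader, writer)
            return None

        codecs.register(_search)

        # After this, u'\u0e01'.encode('iso8859-11') gives '\xa1'.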

    Regards,
    Peter Jacobi

    Thai:
    === iso8859_11.py ===
    """ Python Character Mapping Codec generated from '8859-11.TXT' with gencodec.py.

    Written by Marc-Andre Lemburg ().

    (c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
    (c) Copyright 2000 Guido van Rossum.

    """#"

    import codecs

    ### Codec APIs

    class Codec(codecs.Codec):

        def encode(self,input,errors='strict'):

            return codecs.charmap_encode(input,errors,encoding_map)

        def decode(self,input,errors='strict'):

            return codecs.charmap_decode(input,errors,decoding_map)

    class StreamWriter(Codec,codecs.StreamWriter):
        pass

    class StreamReader(Codec,codecs.StreamReader):
        pass

    ### encodings module API

    def getregentry():

        return (Codec().encode,Codec().decode,StreamReader,StreamWriter)

    ### Decoding Map

    decoding_map = codecs.make_identity_dict(range(256))
    decoding_map.update({
    0x00a1: 0x0e01, # THAI CHARACTER KO KAI
    0x00a2: 0x0e02, # THAI CHARACTER KHO KHAI
    0x00a3: 0x0e03, # THAI CHARACTER KHO KHUAT
    0x00a4: 0x0e04, # THAI CHARACTER KHO KHWAI
    0x00a5: 0x0e05, # THAI CHARACTER KHO KHON
    0x00a6: 0x0e06, # THAI CHARACTER KHO RAKHANG
    0x00a7: 0x0e07, # THAI CHARACTER NGO NGU
    0x00a8: 0x0e08, # THAI CHARACTER CHO CHAN
    0x00a9: 0x0e09, # THAI CHARACTER CHO CHING
    0x00aa: 0x0e0a, # THAI CHARACTER CHO CHANG
    0x00ab: 0x0e0b, # THAI CHARACTER SO SO
    0x00ac: 0x0e0c, # THAI CHARACTER CHO CHOE
    0x00ad: 0x0e0d, # THAI CHARACTER YO YING
    0x00ae: 0x0e0e, # THAI CHARACTER DO CHADA
    0x00af: 0x0e0f, # THAI CHARACTER TO PATAK
    0x00b0: 0x0e10, # THAI CHARACTER THO THAN
    0x00b1: 0x0e11, # THAI CHARACTER THO NANGMONTHO
    0x00b2: 0x0e12, # THAI CHARACTER THO PHUTHAO
    0x00b3: 0x0e13, # THAI CHARACTER NO NEN
    0x00b4: 0x0e14, # THAI CHARACTER DO DEK
    0x00b5: 0x0e15, # THAI CHARACTER TO TAO
    0x00b6: 0x0e16, # THAI CHARACTER THO THUNG
    0x00b7: 0x0e17, # THAI CHARACTER THO THAHAN
    0x00b8: 0x0e18, # THAI CHARACTER THO THONG
    0x00b9: 0x0e19, # THAI CHARACTER NO NU
    0x00ba: 0x0e1a, # THAI CHARACTER BO BAIMAI
    0x00bb: 0x0e1b, # THAI CHARACTER PO PLA
    0x00bc: 0x0e1c, # THAI CHARACTER PHO PHUNG
    0x00bd: 0x0e1d, # THAI CHARACTER FO FA
    0x00be: 0x0e1e, # THAI CHARACTER PHO PHAN
    0x00bf: 0x0e1f, # THAI CHARACTER FO FAN
    0x00c0: 0x0e20, # THAI CHARACTER PHO SAMPHAO
    0x00c1: 0x0e21, # THAI CHARACTER MO MA
    0x00c2: 0x0e22, # THAI CHARACTER YO YAK
    0x00c3: 0x0e23, # THAI CHARACTER RO RUA
    0x00c4: 0x0e24, # THAI CHARACTER RU
    0x00c5: 0x0e25, # THAI CHARACTER LO LING
    0x00c6: 0x0e26, # THAI CHARACTER LU
    0x00c7: 0x0e27, # THAI CHARACTER WO WAEN
    0x00c8: 0x0e28, # THAI CHARACTER SO SALA
    0x00c9: 0x0e29, # THAI CHARACTER SO RUSI
    0x00ca: 0x0e2a, # THAI CHARACTER SO SUA
    0x00cb: 0x0e2b, # THAI CHARACTER HO HIP
    0x00cc: 0x0e2c, # THAI CHARACTER LO CHULA
    0x00cd: 0x0e2d, # THAI CHARACTER O ANG
    0x00ce: 0x0e2e, # THAI CHARACTER HO NOKHUK
    0x00cf: 0x0e2f, # THAI CHARACTER PAIYANNOI
    0x00d0: 0x0e30, # THAI CHARACTER SARA A
    0x00d1: 0x0e31, # THAI CHARACTER MAI HAN-AKAT
    0x00d2: 0x0e32, # THAI CHARACTER SARA AA
    0x00d3: 0x0e33, # THAI CHARACTER SARA AM
    0x00d4: 0x0e34, # THAI CHARACTER SARA I
    0x00d5: 0x0e35, # THAI CHARACTER SARA II
    0x00d6: 0x0e36, # THAI CHARACTER SARA UE
    0x00d7: 0x0e37, # THAI CHARACTER SARA UEE
    0x00d8: 0x0e38, # THAI CHARACTER SARA U
    0x00d9: 0x0e39, # THAI CHARACTER SARA UU
    0x00da: 0x0e3a, # THAI CHARACTER PHINTHU
    0x00db: None,
    0x00dc: None,
    0x00dd: None,
    0x00de: None,
    0x00df: 0x0e3f, # THAI CURRENCY SYMBOL BAHT
    0x00e0: 0x0e40, # THAI CHARACTER SARA E
    0x00e1: 0x0e41, # THAI CHARACTER SARA AE
    0x00e2: 0x0e42, # THAI CHARACTER SARA O
    0x00e3: 0x0e43, # THAI CHARACTER SARA AI MAIMUAN
    0x00e4: 0x0e44, # THAI CHARACTER SARA AI MAIMALAI
    0x00e5: 0x0e45, # THAI CHARACTER LAKKHANGYAO
    0x00e6: 0x0e46, # THAI CHARACTER MAIYAMOK
    0x00e7: 0x0e47, # THAI CHARACTER MAITAIKHU
    0x00e8: 0x0e48, # THAI CHARACTER MAI EK
    0x00e9: 0x0e49, # THAI CHARACTER MAI THO
    0x00ea: 0x0e4a, # THAI CHARACTER MAI TRI
    0x00eb: 0x0e4b, # THAI CHARACTER MAI CHATTAWA
    0x00ec: 0x0e4c, # THAI CHARACTER THANTHAKHAT
    0x00ed: 0x0e4d, # THAI CHARACTER NIKHAHIT
    0x00ee: 0x0e4e, # THAI CHARACTER YAMAKKAN
    0x00ef: 0x0e4f, # THAI CHARACTER FONGMAN
    0x00f0: 0x0e50, # THAI DIGIT ZERO
    0x00f1: 0x0e51, # THAI DIGIT ONE
    0x00f2: 0x0e52, # THAI DIGIT TWO
    0x00f3: 0x0e53, # THAI DIGIT THREE
    0x00f4: 0x0e54, # THAI DIGIT FOUR
    0x00f5: 0x0e55, # THAI DIGIT FIVE
    0x00f6: 0x0e56, # THAI DIGIT SIX
    0x00f7: 0x0e57, # THAI DIGIT SEVEN
    0x00f8: 0x0e58, # THAI DIGIT EIGHT
    0x00f9: 0x0e59, # THAI DIGIT NINE
    0x00fa: 0x0e5a, # THAI CHARACTER ANGKHANKHU
    0x00fb: 0x0e5b, # THAI CHARACTER KHOMUT
    0x00fc: None,
    0x00fd: None,
    0x00fe: None,
    0x00ff: None,
    })

    ### Encoding Map

    encoding_map = codecs.make_encoding_map(decoding_map)
    === eof ===
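
    A quick sanity check against this table (hypothetical; it assumes the
    file above is saved as iso8859_11.py somewhere on the path):

        # Hypothetical sanity check for the generated Thai table.
        import codecs
        import iso8859_11

        text, consumed = codecs.charmap_decode('\xa1\xdf', 'strict',
                                               iso8859_11.decoding_map)
        assert text == u'\u0e01\u0e3f'     # KO KAI, BAHT currency symbol
        assert consumed == 2

        # Positions mapped to None (e.g. 0xDB) are rejected in 'strict' mode:
        try:
            codecs.charmap_decode('\xdb', 'strict', iso8859_11.decoding_map)
        except UnicodeDecodeError:
            pass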

    Romanian:
    === iso8859_16.py ===
    """ Python Character Mapping Codec generated from '8859-16.TXT' with gencodec.py.

    Written by Marc-Andre Lemburg ().

    (c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
    (c) Copyright 2000 Guido van Rossum.

    """#"

    import codecs

    ### Codec APIs

    class Codec(codecs.Codec):

        def encode(self,input,errors='strict'):

            return codecs.charmap_encode(input,errors,encoding_map)

        def decode(self,input,errors='strict'):

            return codecs.charmap_decode(input,errors,decoding_map)

    class StreamWriter(Codec,codecs.StreamWriter):
        pass

    class StreamReader(Codec,codecs.StreamReader):
        pass

    ### encodings module API

    def getregentry():

        return (Codec().encode,Codec().decode,StreamReader,StreamWriter)

    ### Decoding Map

    decoding_map = codecs.make_identity_dict(range(256))
    decoding_map.update({
    0x00a1: 0x0104, # LATIN CAPITAL LETTER A WITH OGONEK
    0x00a2: 0x0105, # LATIN SMALL LETTER A WITH OGONEK
    0x00a3: 0x0141, # LATIN CAPITAL LETTER L WITH STROKE
    0x00a4: 0x20ac, # EURO SIGN
    0x00a5: 0x201e, # DOUBLE LOW-9 QUOTATION MARK
    0x00a6: 0x0160, # LATIN CAPITAL LETTER S WITH CARON
    0x00a8: 0x0161, # LATIN SMALL LETTER S WITH CARON
    0x00aa: 0x0218, # LATIN CAPITAL LETTER S WITH COMMA BELOW
    0x00ac: 0x0179, # LATIN CAPITAL LETTER Z WITH ACUTE
    0x00ae: 0x017a, # LATIN SMALL LETTER Z WITH ACUTE
    0x00af: 0x017b, # LATIN CAPITAL LETTER Z WITH DOT ABOVE
    0x00b2: 0x010c, # LATIN CAPITAL LETTER C WITH CARON
    0x00b3: 0x0142, # LATIN SMALL LETTER L WITH STROKE
    0x00b4: 0x017d, # LATIN CAPITAL LETTER Z WITH CARON
    0x00b5: 0x201d, # RIGHT DOUBLE QUOTATION MARK
    0x00b8: 0x017e, # LATIN SMALL LETTER Z WITH CARON
    0x00b9: 0x010d, # LATIN SMALL LETTER C WITH CARON
    0x00ba: 0x0219, # LATIN SMALL LETTER S WITH COMMA BELOW
    0x00bc: 0x0152, # LATIN CAPITAL LIGATURE OE
    0x00bd: 0x0153, # LATIN SMALL LIGATURE OE
    0x00be: 0x0178, # LATIN CAPITAL LETTER Y WITH DIAERESIS
    0x00bf: 0x017c, # LATIN SMALL LETTER Z WITH DOT ABOVE
    0x00c3: 0x0102, # LATIN CAPITAL LETTER A WITH BREVE
    0x00c5: 0x0106, # LATIN CAPITAL LETTER C WITH ACUTE
    0x00d0: 0x0110, # LATIN CAPITAL LETTER D WITH STROKE
    0x00d1: 0x0143, # LATIN CAPITAL LETTER N WITH ACUTE
    0x00d5: 0x0150, # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
    0x00d7: 0x015a, # LATIN CAPITAL LETTER S WITH ACUTE
    0x00d8: 0x0170, # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
    0x00dd: 0x0118, # LATIN CAPITAL LETTER E WITH OGONEK
    0x00de: 0x021a, # LATIN CAPITAL LETTER T WITH COMMA BELOW
    0x00e3: 0x0103, # LATIN SMALL LETTER A WITH BREVE
    0x00e5: 0x0107, # LATIN SMALL LETTER C WITH ACUTE
    0x00f0: 0x0111, # LATIN SMALL LETTER D WITH STROKE
    0x00f1: 0x0144, # LATIN SMALL LETTER N WITH ACUTE
    0x00f5: 0x0151, # LATIN SMALL LETTER O WITH DOUBLE ACUTE
    0x00f7: 0x015b, # LATIN SMALL LETTER S WITH ACUTE
    0x00f8: 0x0171, # LATIN SMALL LETTER U WITH DOUBLE ACUTE
    0x00fd: 0x0119, # LATIN SMALL LETTER E WITH OGONEK
    0x00fe: 0x021b, # LATIN SMALL LETTER T WITH COMMA BELOW
    })

    ### Encoding Map

    encoding_map = codecs.make_encoding_map(decoding_map)
    === eof ===
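
    And the mirror check for the Romanian table, this time through the
    encoding map (again hypothetical, assuming the file above is importable
    as iso8859_16):

        # Hypothetical check of the encoding direction for the Romanian table.
        import codecs
        import iso8859_16

        data, consumed = codecs.charmap_encode(u'\u20ac\u0219', 'strict',
                                               iso8859_16.encoding_map)
        assert data == '\xa4\xba'          # EURO SIGN, small s with comma below
        assert consumed == 2

        # Characters not in the table fail in 'strict' mode, e.g. U+00A4
        # CURRENCY SIGN, whose slot 0xA4 is taken by the euro sign here:
        try:
            codecs.charmap_encode(u'\u00a4', 'strict', iso8859_16.encoding_map)
        except UnicodeEncodeError:
            pass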
     
    Peter Jacobi, Jul 31, 2004
    #6
  7. Peter Jacobi wrote:
    > The remaining question, and it seems the more difficult one, is a
    > question of process. Whether and how to add these to the normal
    > Python distribution.


    The process is actually very easy. Anybody willing to contribute them
    would have to upload them to SF (sf.net/projects/python).

    Regards,
    Martin
     
    Martin v. Löwis, Aug 1, 2004
    #7
  8. Peter Jacobi


    Hi Martin, All,

    "Martin v. Löwis" <> wrote:>
    > The process is actually very easy. Anybody willing to contribute them
    > would have to upload them to SF (sf.net/projects/python).


    Perhaps I have just misunderstood your email. I read it this way (in my own words):

    Taking into account the unanswered questions about ISO 8859-11 and
    TIS-620, whoever wants to contribute has to do further research,
    starting with, but not limited to, buying the ISO standard.

    In addition, the prospective contributor has to provide support for this
    patch and answer all questions about the details involved.

    Sorry, this is at the moment out of scope for me. I have a patch, based
    on information from a source that is reliable enough for my personal
    requirements, and the patch is now available on USENET for everyone
    who wants to investigate further.

    Regards,
    Peter Jacobi
     
    Peter Jacobi, Aug 1, 2004
    #8
  9. Peter Jacobi wrote:
    >>The process is actually very easy. Anybody willing to contribute them
    >>would have to upload them to SF (sf.net/projects/python).

    >
    >
    > Perhaps I have just misunderstood your email. I read it this way (in my own words):


    [snipped]
    No - this is indeed my view on the issue. However, this is a technical
    view; the *process* is completely independent, and very straightforward.
    Submit the patch to SF, and somebody (probably Marc-Andre
    Lemburg) will review it. The reviewer might ask questions or request
    further changes (such as adding documentation); then the patch gets
    accepted or rejected.

    I know that *I* would ask questions as to why the submitter thinks the
    patch is correct, and I would request that the submitter commits to
    maintaining the patch. If you are unwilling to make such a commitment,
    I can understand that - it just means that Python 2.4 might not have
    these codecs (and we haven't discussed 8859-16 at all).

    Regards,
    Martin
     
    Martin v. Löwis, Aug 2, 2004
    #9
  10. Peter Jacobi


    I've added an entry in the RFE tracker at http://sf.net/projects/python

    Regarding the correctness doubts, I can provide these three points so far:

    a) ISO 8859-n vs ISO-8859-n
    If the information at
    http://en.wikipedia.org/wiki/ISO_8859-1#ISO_8859-1_vs_ISO-8859-1
    is correct, Python 8859-n
    codecs do implement the ISO standard charsets ISO 8859-n
    in the specialized IANA forms ISO-8859-n (and in agreement
    with the Unicode mapping files). So any difficult C0/C1
    wording in the original ISO standard can be disregarded (a short
    check illustrating this follows after point c).

    b) libiconv ISO 8859-11
    The implementation by Bruno Haible in libiconv does agree
    with the Unicode mapping file:
    http://cvs.sourceforge.net/viewcvs.py/libiconv/libiconv/lib/

    c) IBM ICU4C
    The implementation in ICU4C does agree with the Unicode
    mapping file:
    http://oss.software.ibm.com/cvs/icu/charset/data/ucm/
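
    The short check mentioned under (a): the generated table, like the other
    Python 8859-n codecs, keeps the identity mapping for 0x00..0x9F, i.e. it
    decodes \x80..\x9F to the C1 controls U+0080..U+009F (hypothetical
    snippet, same import assumption as above):

        # The generated map leaves 0x00-0x9F untouched, so C0 and C1 positions
        # pass through as the corresponding Unicode control characters.
        import codecs
        import iso8859_11

        text, _ = codecs.charmap_decode('\x1b\x85', 'strict',
                                        iso8859_11.decoding_map)
        assert text == u'\x1b\x85'         # C0 ESC and C1 NEL survive unchanged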

    Regards,
    Peter Jacobi
     
    Peter Jacobi, Aug 2, 2004
    #10
  11. Peter Jacobi wrote:
    > a) ISO 8859-n vs ISO-8859-n
    > If the information at
    > http://en.wikipedia.org/wiki/ISO_8859-1#ISO_8859-1_vs_ISO-8859-1
    > is correct, Python 8859-n
    > codecs do implement the ISO standard charsets ISO 8859-n
    > in the specialized IANA forms ISO-8859-n (and in agreement
    > with the Unicode mapping files). So any difficult C0/C1
    > wording in the original ISO standard can be disregarded.


    I see. According to RFC 1345, this is definitely the case
    for ISO-8859-1. ISO-8859-16 is not defined in an RFC, but
    in

    http://www.iana.org/assignments/charset-reg/ISO-8859-16

    This is a confusing document, as it both refers to ISO/IEC
    8859-16:2001 (no control characters), and the Unicode character
    map (with control characters). We might interpret this as a
    mistake, and assume that it was intended to include control
    characters (as all the other ISO-8859-n).

    For ISO-8859-11, the situation is even more confusing, as
    it is not a registered IANA character set, according to

    http://www.iana.org/assignments/character-sets

    Therefore, it would be a protocol violation (strictly speaking)
    to use iso-8859-11 in, say, a MIME charset= header.

    Regards,
    Martin
     
    Martin v. Löwis, Aug 2, 2004
    #11
  12. Peter Jacobi


    More charset troubles (Re: Codecs for ISO 8859-11 (Thai) and 8859-16 (Romanian))

    Hi Martin, All,

    "Martin v. Löwis" <> wrote in message
    > Therefore, it would be a protocol violation (strictly speaking)
    > if one would use iso-8859-11 in, say, a MIME charset= header.


    Strictly speaking, there are some more dark corners to check.
    All ISO charsets should be qualified by year, and in fact there
    have been some prominent changes, e.g. in 8859-7 (Greek).
    What to do about them?

    Looking around:
    - the RFC references a fixed, older version
    - Unicode mapping files and libiconv track the newest version
    - IBM ICU4C provides all versions
    - Python (not by design, I assume) has a "middle" version with
    some features of the old mapping table (no currency signs) and some
    features of the new (0xA1=0x2018, 0xA2=0x2019)

    Weird.

    Best Regards,
    Peter Jacobi
     
    Peter Jacobi, Aug 3, 2004
    #12
    Re: More charset troubles (Re: Codecs for ISO 8859-11 (Thai) and 8859-16 (Romanian))

    Peter Jacobi wrote:
    > Looking around:
    > - the RFC references a fixed year old version
    > - Unicode mapping files and libiconv track the newest version
    > - IBM ICU4C provides all versions
    > - Python (not by planning, I assume) has a "middle" version with
    > some features of the old mapping table (no currency signs) and some
    > features of the new (0xA1=0x2018, 0xA2=0x2019)


    Indeed. Adding new codecs is not a matter of just compiling a few files
    that somebody else has produced, but requires a lot of expertise.
    Therefore, I would have preferred it if Python had not included any
    codecs, but had relied on the codecs that come with the platform
    (e.g. iconv on Unix, IE DLLs on Windows).

    Things came out differently, though, and we are now in charge of
    maintaining what we have got. This requires great care, and expert
    volunteers are always welcome. Unfortunately, in the Unicode/character
    sets/l10n world, there is no one true way, so experts need to stand up
    and voice their opinion, hoping that contributors become at least aware
    of the issues.

    In the specific case of ISO-8859-7, I was until just now unaware of the
    issue - I would not have guessed that ISO dared to ever change a part
    of 8859. If this is ever going to be changed, I would suggest the
    following approach:
    - provide two encodings: ISO-8859-7:1987, and ISO-8859-7:2003. Without
    checking, I would hope that the version in RFC 1345 is identical with
    8859-7:1987
    - Make ISO-8859-7 an alias for ISO-8859-7:1987 (a rough sketch of
    such alias wiring follows below).
    Of course, somebody should really talk to IANA and come up with a
    preferred MIME name. Apparently, ISO-8859-7-EURO and ISO-8859-7-2003
    have been proposed.
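
    A rough sketch of how such dated variants might be wired up through the
    encodings alias table (the dated codec module names are hypothetical;
    nothing like them exists in the distribution today, and already-cached
    lookups would not be affected):

        # Hypothetical alias wiring for dated ISO 8859-7 variants.  The modules
        # iso8859_7_1987 and iso8859_7_2003 would have to be generated first;
        # alias keys use the normalized form (lower case, '-' and ':' -> '_').
        from encodings import aliases

        aliases.aliases['iso_8859_7_2003'] = 'iso8859_7_2003'
        aliases.aliases['iso_8859_7_1987'] = 'iso8859_7_1987'
        aliases.aliases['iso_8859_7'] = 'iso8859_7_1987'   # bare name -> 1987 table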

    Regards,
    Martin
     
    Martin v. Löwis, Aug 3, 2004
    #13
  14. Peter Jacobi wrote:
    > a) ISO 8859-n vs ISO-8859-n
    > If the information at
    > http://en.wikipedia.org/wiki/ISO_8859-1#ISO_8859-1_vs_ISO-8859-1
    > is correct, Python 8859-n
    > codecs do implement the ISO standard charsets ISO 8859-n
    > in the specialized IANA forms ISO-8859-n (and in agreement
    > with the Unicode mapping files). So any difficult C0/C1
    > wording in the original ISO standard can be disregarded.


    I have just asked Markus Kuhn, who registered ISO-8859-16 with
    IANA, about this. He believes that his registration does not
    include control characters (neither C0 nor C1), just as the ISO
    standard does not contain any. With respect to RFC 1345, he points
    out that it is not an Internet Standard, but a private collection
    of Keld Simonsen, i.e. it is not binding.

    Regards,
    Martin
     
    Martin v. Löwis, Aug 3, 2004
    #14
