Christos TZOTZIOY Georgiou
I found at least one case where decomposing and recomposing a Unicode
character does not give back the same character (see the example at the end).
I have no extensive knowledge about Unicode, yet I believe that this
must be a problem of the Unicode 3.2 specification and not Python's.
However, I haven't found out how the decomp_data (in unicodedata_db.h)
is built, nor did I find much more information about the specifics of
Unicode 3.2, so I thought I would post here; perhaps someone more
knowledgeable could take a look.
If we find out that it's a problem with Python, I'll open a bug report
(and volunteer work).
*** Example ***
for uchar in utext:
    print ord(uchar), ud.name(uchar)
945 GREEK SMALL LETTER ALPHA
769 COMBINING ACUTE ACCENT
*** End of Example ***
I can understand where this confusion comes from: if, as I have found,
there is no COMBINING GREEK TONOS or COMBINING TONOS ACCENT in the Unicode
table, then when decomposing one has to use the 'oxeia' (acute) accent...
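
For anyone who wants to try the round trip themselves, here is a minimal
sketch. It is written for Python 3, not the Python version I used above, and
the starting character (U+03AC, GREEK SMALL LETTER ALPHA WITH TONOS) is only
an assumption chosen for illustration. The helper names describe() and
round_trip() are made up for this sketch. What recomposition returns depends
on the Unicode data bundled with your interpreter, so I print the result
instead of stating it.

```python
import unicodedata as ud


def describe(text):
    """Return a list of (code point, character name) pairs for text."""
    return [(ord(ch), ud.name(ch, "<unnamed>")) for ch in text]


def round_trip(char):
    """Decompose char with NFD, then recompose the result with NFC."""
    decomposed = ud.normalize("NFD", char)
    recomposed = ud.normalize("NFC", decomposed)
    return decomposed, recomposed


# Assumed starting character, picked only for illustration.
original = "\u03ac"
decomposed, recomposed = round_trip(original)

print("original:  ", describe(original))
print("decomposed:", describe(decomposed))
print("recomposed:", describe(recomposed))
print("same after round trip?", recomposed == original)
print("Unicode database version:", ud.unidata_version)
```

Running the same functions on other accented Greek characters (for example
those in the Greek Extended block) should show quickly whether any of them
fail to survive the round trip.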