Unicode troubles

Rodrigo Benenson

Hi!
I'm finishing a multiplatform collaborative realtime text editor (something
like SubEthaEdit, but multiplatform and open source) developed using
Python+Twisted as a plugin for Leo.

Of course, as the software runs on different platforms in different places,
text encoding compatibility is an issue.
So the obvious choice was the Tk encoding for the client GUI, Unicode for
system internals and UTF-8 for web output.
But I'm having serious trouble using Tk with Unicode internals.

The system, being a text editor, uses string lengths and positions in the text
widget as parameters of most of its critical algorithms.
Unfortunately, I discovered recently that some encodings do not provide
an equivalence between
num_of_chars/length_of_string/position_in_text_widget. As a result, each time
someone presses a non-ASCII key, the references are lost and the other clients
receive a soup of letters.

I had read on the internet that Unicode was supposed to keep the relation
num_of_chars/string_length (and thus the relation
string_length/num_of_chars/position_in_text_widget). But this relation does
not hold on all my machines.

Sometimes I get len(u"eló") = 3 (the good result) and other times
len(u"eló") = 4 (the wrong result). This seems to be independent of the OS.
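As a rough illustration of how the same visible text can report both lengths, here is a minimal sketch. It is written for a current Python 3 interpreter (where every str is Unicode), not the Python 2 of this thread, and the variable names are made up for the example. It shows two possible causes, which may or may not be the one at work in the editor: a decomposed accent, and counting UTF-8 bytes instead of characters.

```python
# Two ways to spell the same visible text "eló" (names are illustrative only).
composed = "el\u00f3"       # 'ó' as the single code point U+00F3
decomposed = "elo\u0301"    # 'o' followed by U+0301 COMBINING ACUTE ACCENT

print(len(composed))        # number of code points in the composed spelling
print(len(decomposed))      # number of code points in the decomposed spelling
print(composed == decomposed)

# Counting bytes of the UTF-8 encoding instead of characters also differs,
# because 'ó' takes two bytes in UTF-8.
utf8_bytes = composed.encode("utf-8")
print(len(utf8_bytes))
```

With these inputs the composed string has length 3, the decomposed string has length 4, the two strings do not compare equal, and the UTF-8 encoding is 4 bytes long.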

Could someone explain this issue to me? How am I supposed to manage this
problem? Do I have to compile Python with special parameters so that each
Unicode character counts as one length unit?

Thanks.
Rodrigo Benenson.
 
Michael Radziej

Rodrigo said:
Sometimes I get len(u"eló") = 3 (the good result) and other times
len(u"eló") = 4 (the wrong result). This seems to be independent of the OS.

There are different ways to express "special" characters.
E.g. you can describe "ó" as a single character,
or as "o" followed by a combining accent.
What you want is the "canonical form".
Take a look at unicodedata.normalize (note that it is
new in Python 2.3)

http://www.python.org/doc/current/lib/module-unicodedata.html
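A minimal sketch of that suggestion, assuming a current Python 3 interpreter and a hypothetical helper named to_canonical (not part of any library): normalize incoming text to NFC before measuring lengths or computing widget positions, so that every client counts the same way.

```python
import unicodedata


def to_canonical(text):
    """Return text in NFC, the composed canonical form (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


decomposed = "elo\u0301"            # 'o' + combining acute accent
canonical = to_canonical(decomposed)

print(len(decomposed), len(canonical))
print(canonical == "el\u00f3")      # compare with the precomposed spelling
```

Here the decomposed input has length 4 and the normalized result has length 3. NFC only composes sequences that have a precomposed code point, so this is not a general guarantee of one code point per visible character. It only makes the representation consistent between clients.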

Hope this helps,

Michael Radziej
 
