Tkinter / Unicode and UTF-8

Discussion in 'Python' started by Thomas, Nov 19, 2003.

  1. Thomas

    Thomas Guest

    I used to pass Unicode strings to Tk widgets. IIRC, Tcl/Tk
    expects UTF-8 encoded strings, but Tkinter took care of that.
    This worked, as long as I was using Python/Tk on Red Hat 9 Linux
    (and on earlier versions).

    Now I switched to Fedora Core 1 Linux (where Python/Tk does not
    work without fixing it - but I described that in another thread)
    and I have to pass UTF-8 encoded strings to Tk widgets (i.e. I
    cannot directly pass Unicode strings any more).

    Now I have some questions:

    - Was Tkinter changed to behave like that?
    - Will it stay like that in the future?
    - Isn't it strange that you have to pass UTF-8 encoded strings
    to Tk widgets, but the widgets return Unicode strings?

    (My versions: Python 2.2.3, Tkinter 2.2.3, Tcl/Tk 8.3.5)

    Thanks in advance for any comments and hints; I have to change
    a lot of code if passing UTF-8 encoded strings to Tk widgets
    is now the only way to do it. And before doing that, I would
    really like to know what the 'correct' way is.
     
    Thomas, Nov 19, 2003
    #1

  2. Then you fixed it incorrectly.
    No, it wasn't even changed.
    You don't have to. It works just fine with Unicode strings.
    Do not change the code.

    Regards,
    Martin
     
    Martin v. Löwis, Nov 20, 2003
    #2

  3. Thomas

    Thomas Guest

    Hi Martin!

    I just used the 'Python' and 'tkinter' RPMs from www.python.org to
    update ('rpm -U ...') the RPMs provided with the Fedora Core 1 Linux
    distribution. (The distribution's own RPMs could not run any
    Python/Tk application, because Python was compiled with the UCS4 option
    whereas Tcl/Tk uses UCS2, if I understand this point correctly.)
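
A minimal sketch of how the Python side of that UCS2/UCS4 question can be checked. It relies on `sys.maxunicode`, which is 65535 on a narrow (UCS-2) Python 2 build and 1114111 on a wide (UCS-4) build. The helper name `unicode_build_width` is made up for this illustration. This only inspects Python; it says nothing about how Tcl/Tk was compiled.

```python
import sys


def unicode_build_width(maxunicode=sys.maxunicode):
    """Classify a sys.maxunicode value as a narrow or wide Unicode build.

    Hypothetical helper for illustration, not part of Tkinter.
    """
    # Narrow builds cannot represent code points above U+FFFF directly.
    return "UCS-2" if maxunicode == 0xFFFF else "UCS-4"


print(unicode_build_width())
```

On Python 3.3 and later `sys.maxunicode` is always 1114111, so the check is only informative on the older 2.x interpreters discussed in this thread.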


    Now, the following example does not work correctly:

    from Tkinter import *
    tk = Tk()

    txt = Text(tk)
    txt.pack()
    message = u"hello"
    txt.insert('1.0', message)

    tk.mainloop()

    (The above example will display h\x00e\x00l in the Text widget.)


    But the following works (only the 'UTF-8' part is new here):

    from Tkinter import *
    tk = Tk()

    txt = Text(tk)
    txt.pack()
    message = u"hello"
    txt.insert('1.0', message.encode('UTF-8'))

    tk.mainloop()

    (I just used the Text widget as an example here; the same holds
    for many other widgets, e.g. menus.)
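
For reference, a small sketch (written for Python 3, not taken from the posts) of what the `.encode('UTF-8')` call in the second example produces. The sample strings are assumptions chosen for illustration; for pure ASCII text such as "hello" the UTF-8 bytes are identical to the ASCII bytes, and non-ASCII characters become multi-byte sequences.

```python
def to_utf8(text):
    """Return the UTF-8 byte string for a Unicode string."""
    return text.encode("UTF-8")


ascii_bytes = to_utf8(u"hello")
umlaut_bytes = to_utf8(u"L\u00f6wis")  # o-umlaut is encoded as two bytes

print(repr(ascii_bytes))
print(repr(umlaut_bytes))

# Decoding restores the original Unicode string.
assert umlaut_bytes.decode("UTF-8") == u"L\u00f6wis"
```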

    In some Tcl/Tk documentation, I read that Tk widgets expect UTF-8;
    somewhere else (don't remember the URLs), I read that _tkinter.c
    handles this (by encoding Unicode strings with UTF-8 for Tk widgets);
    that was the reason why I thought there might have been a change
    in Tkinter recently.

    But (my main problem!): I still do not understand why the first
    example does not work, while the second does!?

    Thomas
     
    Thomas, Nov 20, 2003
    #3
  4. Which RPM did you use specifically? If it is

    http://www.python.org/ftp/python/2.3.2/rpms/redhat-9/python2.3-tkinter-2.3.2-1pydotorg.i386.rpm

    then you can't use it on Fedora 1: The RPM is for Redhat 9, after all,
    not for Fedora 1.
    Because you are using incorrect binaries. You will have to build Python
    from source on Fedora 1, or wait for Redhat to fix the package.

    The pydotorg RPM assumes that Tk uses UCS-4 internally, as it does on
    Redhat 9. On Fedora 1, Tk uses UCS-2, so copying a Python Unicode string
    into a Tcl Unicode string copies twice as many characters as you have
    (and overwrites some unrelated memory in the process).
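
A simplified model of the mismatch described above, written for Python 3 and assuming little-endian byte order. It does not reproduce what `_tkinter.c` actually does (in particular it does not model the memory overwrite); it only reinterprets 4-byte-per-character data as 2-byte units, which is enough to show where the stray `\x00` characters in the widget come from. The function name is hypothetical.

```python
import struct


def reinterpret_ucs4_as_ucs2(text):
    """Read UCS-4 encoded text as if each character were a 16-bit unit.

    Illustrative model only: the reader is told there are len(text)
    characters and consumes 2 bytes for each of them.
    """
    raw = text.encode("utf-32-le")      # 4 bytes per character
    count = len(text)
    units = struct.unpack("<%dH" % count, raw[:2 * count])
    return "".join(chr(unit) for unit in units)


garbled = reinterpret_ucs4_as_ucs2("hello")
print(repr(garbled))  # 'h\x00e\x00l'
```

Every second 16-bit unit is the zero-filled upper half of a UCS-4 character, so only the first few real characters survive, interleaved with NULs.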

    There is, unfortunately, no way to detect the problem at run-time. So
    I repeat: You *have* to compile from source.

    Regards,
    Martin
     
    Martin v. Löwis, Nov 22, 2003
    #4
  5. Thomas

    Thomas Guest

    I think that will not happen (at least not for the Fedora 1 Release).
    (My experience with RedHat is: security fixes: yes; bug fixes: no).

    Thanks for the explanation! Now I got it.

    I just did that, following your advice.

    (Compiled Tcl/Tk 8.4 and Python 2.3 from the sources without
    deleting the Fedora 1 installation of Python 2.2 and Tcl/Tk 8.3;
    now - with my new Python/Tkinter 2.3, Tcl/Tk 8.4 - everything
    works as usual. Both Python and Tcl/Tk use UCS-2 now.
    I only had to rename the Python executable for Python 2.3 (so that
    it does not get executed when the system actually wants to use its
    own Python 2.2) and I have to use a startup script for Python 2.3
    setting LD_LIBRARY_PATH, because otherwise libtk8.4.so is not found
    by Python in /usr/local/lib. The LD_LIBRARY_PATH script could be
    avoided with a permanent solution (a symlink or ldconfig), but at
    the moment, this solution is OK for me.)
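
A rough sketch of what such a startup wrapper could look like, written in Python rather than as the shell script the post refers to (the actual script is not shown in the thread). The interpreter path in the usage comment is an assumption; only `/usr/local/lib` comes from the post. `build_env` and `launch` are hypothetical names.

```python
import os


def build_env(base_env, lib_dir="/usr/local/lib"):
    """Return a copy of base_env with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is a colon-separated list on Linux.
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + existing if existing else "")
    return env


def launch(interpreter, args):
    """Replace the current process with the given interpreter.

    Example (hypothetical path for the renamed Python 2.3 binary):
        launch("/usr/local/bin/python2.3", ["myapp.py"])
    """
    os.execve(interpreter, [interpreter] + list(args), build_env(os.environ))
```

The dynamic linker reads LD_LIBRARY_PATH when a process starts, which is why the variable is set for the new process via `os.execve` rather than changed inside an already running interpreter.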


    Thanks a lot for your help! This really drove me nuts :)

    Regards, Thomas.
     
    Thomas, Nov 22, 2003
    #5