Tkinter - non-ASCII characters in text widgets problem

  • Thread starter Sebastian Pająk

Sebastian Pająk

Hello

I'm writing an application in Python 2.5.4 under Windows (XP SP3, English).
I use Tkinter as the main GUI toolkit. The app is intended to be
portable (not fully, but Windows and Mac OS X are a must). It works as it
should on my system, but when I sent the program to a friend who
has a Mac, he told me the accented characters are turned into
weird symbols. Displaying them is another must: the GUI consists of widgets with
Polish characters. I always use UTF-8 encoding in Python and I save my
scripts in Notepad++ as UTF-8 without BOM. I've never experienced
similar problems under Windows with Tkinter before.

I've created a simple test script:

CODE_START >>
# -*- coding: utf-8 -*-

import sys
from Tkinter import *

root = Tk()

Label(root, text='ęóąśłżźćń').pack()
Button(root, text='ęóąśłżźćń').pack()
Entry(root).pack()

root.mainloop()
CODE_END >>

No problem on Windows, but on the Mac only the Button widget has the correct text.
Label and Entry have garbage instead of accented characters. (Mac OS X
10.5.6 and 10.4.11, both with Python 2.5.4.)

I've tried various UTF file encodings (also with a BOM), and the use of
u"text" or the unicode() function. None of this worked. Googling doesn't
turn up much:

reload(sys)
sys.setdefaultencoding("utf-8")

or:

root = Tk()
root.tk.call('encoding', 'system', 'utf-8')

After applying this, the effect remains the same - one big garbage.
I'm out of ideas: my script is 101% UTF-8; Mac and Windows both
support UTF-8, and Python also supports it, so where is the problem? How
can I show Polish characters to Mac users?

Please Help!
 
Martin v. Löwis

> I've tried various UTF file encoding (also with BOM mark), use of
> u"text" or unicode() function

Always use u"text". This should work. Everything else might not work.

> After applying this, the effect remains the same - one big garbage.

Can you please be more specific? What is "one big garbage"?

> I'm out of ideas: my script is UTF-8 in 101%; Mac and Windows both
> support UTF-8, Python also supports it - so where is the problem?

Most likely, Tk does not work correctly on your system. See whether
you can get correct results with wish.

Regards,
Martin
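
A side note on why u"text" is the safer habit: a plain '...' literal in a UTF-8 Python 2 source file is a byte string, and whoever receives those bytes has to guess their encoding. The sketch below uses Python 3 syntax (where str is already Unicode) and only simulates a wrong guess. The mac_roman codec is picked purely as an illustrative assumption, not as a diagnosis of what Tk does on the friend's Mac.

```python
# Sketch: what happens when UTF-8 bytes are decoded with the wrong codec.
# The choice of 'mac_roman' is an assumption made for illustration only.

# The nine Polish letters from the test script, written as escapes
text = u'\u0119\u00f3\u0105\u015b\u0142\u017c\u017a\u0107\u0144'  # ęóąśłżźćń
raw = text.encode('utf-8')   # roughly what a Python 2 '...' literal would hold

# Every one of these letters takes two bytes in UTF-8.
print(len(text), len(raw))

# Decoding with the right codec round-trips cleanly.
assert raw.decode('utf-8') == text

# Decoding with a wrong single-byte codec yields one bogus character per byte.
garbled = raw.decode('mac_roman', errors='replace')
print(ascii(garbled))
```

Passing a Unicode string to the widget removes this guessing step on the Python side, although it cannot fix a rendering problem inside Tk itself.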
 
Sebastian Pająk

2009/6/25 "Martin v. Löwis" said:
> Always use u"text". This should work. Everything else might not work.

But I tried this here without success.

> Can you please be more specific? What is "one big garbage"?

There is a square (or some other weird sign) in the place where a Polish
accented character should be (like "ęłąśł" etc.).
This problem occurs only on Mac OS X and it doesn't apply to the Button widget
(where the characters are correct).

> Most likely, Tk does not work correctly on your system. See whether
> you can get correct results with wish.

There is no wish. I'm talking about the built-in Tkinter (isn't Tk
built into Python?).

By the way, I'm working on Windows and my friend is on a Mac; he pointed out
the problem he has with my script. He is not a computer geek nor a
programmer; he doesn't even know what wish/Tk or Python is.


Could different endianness have something to do with this?
 
MRAB

In summary:

You're providing the same text for a Button and a Label. On Mac OSX the
Button shows the text correctly, but the Label doesn't.

Is this correct?
 
norseman

Sebastian said:
Does different endianness can have something to do here?
================
Can, but should not.
I read that the problem is when using the Polish language only.
Otherwise things work normally. Is that correct?
If so then byte swap may be a problem. Using the u'string' should solve
that. I am assuming you have the Polish alphabet working correctly on
your machine. I think I read that was so in an earlier posting.

Are there any problems with his alphabet scrambling on your machine?
If so that needs investigating. Here I assume you are reading Polish
from him on your machine and not a network translator version.


No - Tkinter is not built in. Tkinter is a module shipped with Python
for people to use (Tk interface). Use: import Tkinter (in Python 3 the
module is named tkinter)

From Google:
Tkinter Life Preserver
Tkinter is a Python interface to the Tk GUI toolkit.
This document is not designed to be an exhaustive tutorial on either Tk
or Tkinter. ...www.python.org/doc/life-preserver/

more properly Tcl/Tk
see also www.tcl.tk



Steve
 
Sebastian Pająk

> Can, but should not.
> I read that the problem is when using the Polish language only. Otherwise
> things work normally. Is that correct?

Yes, correct.

> If so then byte swap may be a problem. Using the u'string' should solve
> that. I am assuming you have the Polish alphabet working correctly on your
> machine. I think I read that was so in an earlier posting.
>
> Are there any problems with his alphabet scrambling on your machine?
> If so that needs investigating. Here I assume you are reading Polish from
> him on your machine and not a network translator version.

The original thread is here:
http://mail.python.org/pipermail/python-list/2009-June/717666.html
I've explained the problem there
 
norseman

Sebastian said:
> Yes, correct
>
> The original thread is here:
> http://mail.python.org/pipermail/python-list/2009-June/717666.html
> I've explained the problem there
================
I re-read the posting. (Thanks for the link)

You do not mention if he has sent you any Polish words and if they
appear OK on your machine.

A note here: In reading the original posting I get symbols that are not
familiar to me as alphabet.
From the line in your original:
    Label(root, text='ęóąśłżźćń').pack()
I see text='
    then an e with a goatee
    a capital O with an accent symbol on top (')
    an a with a tail on the right
    an s with an accent on top
    an I-do-not-know-what, maybe some sort of l with a
        slash through the middle
    a couple of z with accents on top
    a capital C with an accent on top
    an n with a short bar on top
I put the code into python and took a look.



I get:
cat xx

# -*- coding: utf-8 -*-

import sys
from Tkinter import *

root = Tk()

Label(root, text='\u0119ó\u0105\u015b\u0142\u017c\u017a\u0107\u0144').pack()
Button(root,
text='\u0119ó\u0105\u015b\u0142\u017c\u017a\u0107\u0144').pack()
Entry(root).pack()

root.mainloop()

Then:
python xx
File "xx", line 10
SyntaxError: Non-ASCII character '\xf3' in file xx on line 10, but no
encoding declared; see http://www.python.org/peps/pep-0263.html for details

So I did.
It notes Window$ puts things into those lines. Namely:
"To aid with platforms such as Windows, which add Unicode BOM marks
to the beginning of Unicode files, the UTF-8 signature
'\xef\xbb\xbf' will be interpreted as 'utf-8' encoding as well
(even if no magic encoding comment is given).
"

Then I took out the o with the accent and re-ran the file.

Everything works except the text is exactly as shown above. That is:
\u0119ó\u0105\u015b\u0142\u017c\u017a\u0107\u0144
(shows twice as directed, one for label, one for button, no apostrophes)

OK - now I take a look at what is actually in the file.
In MC on Linux Slackware 10.2 I read, in the mail folder,
0119 capital A with a tilde on top.
HEX readings beginning at the 0119\...
30 31 31 39 C3 B3 5C

but in the python file xx, I read:
30 31 31 39 5C
0119\...

I would have to say the mail system is screwing you up. Might try
zipping the file and sending it that way and see if problem changes.


Steve
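
A note on the literal \u0119... text showing up in the widgets: that is the expected result when \u escapes sit in a string that is not a Unicode literal. In Python 2, '\u0119' without the u prefix is just six ASCII characters. A small sketch in Python 3 syntax, where a raw string stands in for the Python 2 byte string (that stand-in is an assumption made for illustration):

```python
# Sketch: a \u escape is only interpreted inside a Unicode literal.
# In Python 2, '\u0119' (no u prefix) behaves like the raw string below.

literal = r'\u0119\u0105'        # backslash, 'u', four hex digits; twice
assert len(literal) == 12        # twelve plain ASCII characters

# Interpreting the escapes explicitly gives the two intended letters.
decoded = literal.encode('ascii').decode('unicode_escape')
assert decoded == u'\u0119\u0105'   # the letters ę and ą
print(len(literal), len(decoded))
```

So a test file containing text='\u0119...' does not exercise the same code path as the original script; text=u'\u0119...' would.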
 
Martin v. Löwis

> After applying this, the effect remains the same - one big garbage.
> There is a square (or some other weird sign)

***PLEASE*** be specific. A square box is something *completely*
different than any other weird sign. It is impossible to understand
the problem if you don't know *exactly* what happens.

> in place where polish
> accented character should be (like "ęłąśł" etc)
> This problem is only on mac os x and it doesn't apply to button widget
> (where characters are correct)

I see. So it is a font problem: if the square box is displayed, it means
that the font just doesn't have a glyph for the character you want to
display. Try using a different font in the label widget.

> There is no wish. I'm talking about build-in Tkinter

So try installing Tk separately.

> (isn't Tk build-in Python?).

Depends on where exactly you got your Python from, and what exactly
is your OSX version. Recent releases of OSX include a copy of Tcl/Tk,
and some sets of Python binaries link against the Apple Tk.

Regards,
Martin
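
Trying a different font, as suggested, comes down to passing a font option to the widget. A minimal sketch in Python 3 syntax (the module is named Tkinter in Python 2). The helper names label_options and build_demo and the candidate font list are made up for illustration, and whether any given family is installed on the target Mac is an assumption:

```python
# Sketch: build Label options with an explicit font, one Label per candidate.
# Helper names and the font list are hypothetical, for illustration only.

SAMPLE = u'\u0119\u00f3\u0105\u015b\u0142\u017c\u017a\u0107\u0144'  # ęóąśłżźćń
CANDIDATE_FONTS = ['Helvetica', 'Arial', 'Verdana']


def label_options(text, family, size=12):
    """Return keyword options for a Label that uses the given font family."""
    return {'text': u'%s: %s' % (family, text), 'font': (family, size)}


def build_demo(root):
    """Pack one Label per candidate font into an existing Tk root window."""
    import tkinter  # imported here so the helpers above work without a display
    for family in CANDIDATE_FONTS:
        tkinter.Label(root, **label_options(SAMPLE, family)).pack()
```

To try it, create a root window with tkinter.Tk(), call build_demo(root), then root.mainloop(). If every label shows boxes regardless of the family, the font is probably not the culprit.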
 
Sebastian Pająk

2009/6/26 norseman said:
> I re-read the posting. (Thanks for the link)
>
> You do not mention if he has sent you any Polish words and if they
> appear OK on your machine.

He has sent me Polish words, and they appear correct. We both have the
English version of our systems (both are set to the Polish locale: time,
dates, keyboard etc.).

> I would have to say the mail system is screwing you up. Might try zipping
> the file and sending it that way and see if problem changes.

I've tried zipping.
It looks like you didn't save the script in UTF-8. Try to run the
original script file from the attachment (UTF-8 without BOM).
PS. Do you have Mac OS X? It would be better if someone with a Mac tested it.


# -*- coding: utf-8 -*-

import sys

from Tkinter import *

root = Tk()
root.tk.call('encoding', 'system', 'utf-8')

Label(root, text=u'ęóąśłżźćń').pack()
Button(root, text=u'ęóąśłżźćń').pack()

root.mainloop()
 
Sebastian Pająk

> I see. So it is a font problem: if the square box is displayed, it means
> that the font just doesn't have a glyph for the character you want to
> display. Try using a different font in the label widget.

I've tried many fonts; the effect is always the same. Standard fonts
like Arial, Tahoma and Verdana all have Polish glyphs. The problem is that
Tkinter selects the wrong glyphs for non-ASCII characters. It's not a font
issue, as the text on the Button widget appears correctly.

> So try installing Tk separately.
>
> Depends on where exactly you got your Python from, and what exactly
> is your OSX version. Recent releases of OSX include a copy of Tcl/Tk,
> and some sets of Python binaries link against the Apple Tk.

As I said, I don't have Mac OS X. I just expect my Python script to be
portable and behave the same on both Windows and OS X, but it doesn't.
 
norseman

Scott said:
norseman said:
... A note here: In reading the original posting I get symbols that
are not
familiar to me as alphabet.
From the line in your original:
Label(root, text='ęóąśłżźćń').pack()
I see text='
then an e with a goatee
a capitol O with an accent symbol on top (')
an a with a tail on the right
a s with an accent on top
an I do no not know what - maybe some sort of l with a
slash through the middle
a couple of z with accents on top
a capitol C with an accent on top
a n with a short bar on top

Here's something to try in any future circumstances:

Python 3.1rc2 (r31rc2:73414, Jun 13 2009, 16:43:15) [MSC v.1500 32 bit
(Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
print('%3d %4x %c %s' % (ord(ch), ord(ch), ch, ud.name(ch)))


281 119 ę LATIN SMALL LETTER E WITH OGONEK
243 f3 ó LATIN SMALL LETTER O WITH ACUTE
261 105 ą LATIN SMALL LETTER A WITH OGONEK
347 15b ś LATIN SMALL LETTER S WITH ACUTE
322 142 ł LATIN SMALL LETTER L WITH STROKE
380 17c ż LATIN SMALL LETTER Z WITH DOT ABOVE
378 17a ź LATIN SMALL LETTER Z WITH ACUTE
263 107 ć LATIN SMALL LETTER C WITH ACUTE
324 144 ń LATIN SMALL LETTER N WITH ACUTE

--Scott David Daniels
(e-mail address removed)
==============
Good thought, good idea, useful tool.

BUT - compare your output to what I *see*.
And I do not see any f3 anywhere except in the doc ref I copy/duped and
in this file.

I suspect the mail handlers all have some differences.

I also suspect Window$ is still cooking its outputs. It has a long
history of saying one thing and doing another.

I used to program exclusively in assembly. I know for a fact Window$ can
and does lie. Little ones, not often, but like your f3. I don't have
one. Not from the copy/paste of the original posting, not anywhere I
have looked in reviewing possible cause/effect of the problem posted. I
do have a C3 B3 byte pair after the 30 31 31 39 (0119) and before the
5C (\) that follows the 0119. MC shows it as a CAP-A with a tilde on
top of it. Firefox shows it as a CAP-O with an accent on top. (Kids
today call the Accent a single quote.)

I do not know what Window$ program to guide you to for proper hex
listings. I'm not even sure an accurate one exists. (No doubt someone
will now list a few thousand of them. :)


Maybe zipping and transferring the .zip will help - maybe not. I would
like to know the results.



Steve
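
The f3 versus C3 B3 puzzle has a simple explanation: f3 is the Unicode code point of ó (U+00F3), while C3 B3 is how that code point is stored in a UTF-8 file. A viewer that treats the file as Latin-1 shows the C3 byte as a capital A with a tilde. A short sketch in Python 3 syntax, in the spirit of Scott's unicodedata loop (the exact layout of his loop is not preserved in the post, so this is a reconstruction under that assumption, not his code; the describe helper is hypothetical):

```python
import unicodedata

SAMPLE = u'\u0119\u00f3\u0105\u015b\u0142\u017c\u017a\u0107\u0144'  # ęóąśłżźćń


def describe(ch):
    """Return (code point in hex, UTF-8 bytes in hex, Unicode name) for ch."""
    utf8_hex = ' '.join('%02X' % b for b in bytearray(ch.encode('utf-8')))
    return '%x' % ord(ch), utf8_hex, unicodedata.name(ch)


for ch in SAMPLE:
    # ascii() keeps the output printable even on an ASCII-only terminal
    print(ascii(ch), *describe(ch))

# The same two bytes read as Latin-1 give A-with-tilde plus a superscript 3.
misread = u'\u00f3'.encode('utf-8').decode('latin-1')
```

Both readings describe the same character; they differ only in whether one looks at the code point or at its UTF-8 encoding on disk.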
 
Piet van Oostrum

Sebastian Pająk said:
SP> Maybe this picture will tell you more:
SP> http://files.getdropbox.com/u/1211593/tkinter.png
SP> May someone can confirm this osx behaviour?

Yes, I get the same. But it is a problem of the underlying Tk
implementation. I get the same strange behaviour in wish.

Text widgets seem to have the same problem, and it has to do with the use
of QuickDraw rather than ATSUI in Tcl/Tk 8.4. Tcl/Tk 8.5 uses ATSUI, but
this seems to cause other problems.

See:
http://aspn.activestate.com/ASPN/Mail/Message/tcl-mac/2862062
http://aspn.activestate.com/ASPN/Mail/Message/tcl-mac/2862807
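
Since the behaviour depends on which Tcl/Tk the Python build links against, it can help to have the Mac user report the Tk patch level. A minimal sketch in Python 3 syntax; the helper names are hypothetical, and treating "older than 8.5" as the QuickDraw case simply restates the description above rather than a verified rule:

```python
# Sketch: classify a Tk patch level string such as '8.4.19'.
# On a live system the string can be read with:
#     root = tkinter.Tk(); root.tk.call('info', 'patchlevel')
# Helper names are hypothetical.

def tk_major_minor(patchlevel):
    """Parse '8.4.19' (or '8.5b1', '8.6.12') into a (major, minor) tuple."""
    major, rest = patchlevel.split('.', 1)
    minor_digits = ''
    for c in rest:
        if not c.isdigit():
            break
        minor_digits += c
    return int(major), int(minor_digits)


def is_pre_85(patchlevel):
    """True for Tk older than 8.5, i.e. the 8.4 line discussed above."""
    return tk_major_minor(patchlevel) < (8, 5)
```

For example, is_pre_85('8.4.19') is true, which would point at the Tk 8.4 text rendering described above rather than at the script.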
 
Sebastian Pająk

2009/6/27 Piet van Oostrum said:
> Yes, I get the same. But it is a problem of the underlying Tk
> implementation. I get the same strange behaviour in wish.
>
> Text widgets seem to have the same problem and it has to do with the use
> of QuickDraw rather than ATSUI in Tcl/Tk 8.4. Whereas 8.5 uses ATSUI but
> this seems to cause other problems.

Uhhh, good to know.
It should be written somewhere in BIG letters so people like me
wouldn't be confused...

Thanks
Sebastian
 
