Problem with curses and UTF-8

Ian Ward · Feb 7, 2006

When I run the following code in a terminal with the encoding set to
UTF-8 I get garbage on the first line, but the correct output on the second.

import curses
s = curses.initscr()
s.addstr('\xc3\x85 U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE\n')
s.addstr('\xc3\xb5 U+00F5 LATIN SMALL LETTER O WITH TILDE')
s.refresh()
s.getstr()
curses.endwin()

I tested with gnome-terminal, Python 2.4 and Ubuntu breezy. The output
is correct when I run the following code:

print '\xc3\x85 U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE'
print '\xc3\xb5 U+00F5 LATIN SMALL LETTER O WITH TILDE'

Any Ideas?

Ian Ward

Martin v. Löwis · Feb 7, 2006

Ian said:
Any Ideas?

I think there is one or more ncurses bugs somewhere.

The ncurses documentation suggests that you should link with
ncurses_w instead of linking with ncurses - you might try
that as well. If it helps, please do report back.

Ultimately, somebody will need to debug ncurses to find out
what precisely happens, and why.

Regards,
Martin

Ian Ward · Feb 8, 2006

Martin said:
Ian Ward wrote:

I think there is one or more ncurses bugs somewhere.

The ncurses documentation suggests that you should link with
ncurses_w instead of linking with ncurses - you might try
that as well. If it helps, please do report back.

Ultimately, somebody will need to debug ncurses to find out
what precisely happens, and why.

Thank you for your response. I see there are other people that have run
into the same problem.

I've had to work around many curses issues while developing Urwid (a
console UI library). Even if the bugs are fixed I'm going to have to
bypass the curses module to support UTF-8 in a reliable way for all users.

I think there are enough escape sequences common to all modern terminals
so that I can build a generic curses-replacement for my library.
However, if someone is already working on something similar I don't want
to reinvent the wheel.

Ian Ward

Thomas Dickey · Feb 8, 2006

I think there is one or more ncurses bugs somewhere.

indeed. It might be nice to report them rather than jawing about it.

The ncurses documentation suggests that you should link with
ncurses_w instead of linking with ncurses - you might try
that as well. If it helps, please do report back.
ncursesw

Ultimately, somebody will need to debug ncurses to find out
what precisely happens, and why.

no need for debugging - it's a well-known problem. UTF-8 uses more than
one byte per cell, normal curses uses one byte per cell. To handle UTF-8,
you need ncursesw.
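
To make the byte-versus-cell mismatch concrete, here is a small sketch in modern Python 3 (not the Python 2.4 used in this thread). It only counts bytes and code points and does not touch curses at all; the helper name utf8_length is made up for illustration.

```python
# Each non-ASCII character below fills one screen cell but needs two
# bytes in UTF-8, which is what a byte-per-cell curses build trips over.
samples = ["\u00c5", "\u00f5", "A"]


def utf8_length(ch):
    """Return how many bytes the character needs when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    print("U+%04X: %d byte(s), 1 character" % (ord(ch), utf8_length(ch)))
```

A library that stores one byte per cell would treat the two bytes of U+00C5 as two separate cells, which matches the garbage reported in the first post.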

Thomas Dickey · Feb 8, 2006

Ian Ward said:
I've had to work around many curses issues while developing Urwid (a

hmm - I've read Urwid, and most of the comments I've read in that regard
reflect problems in Urwid. Perhaps it's time for you to do a little analysis.

(looking forward to bug reports, rather than line noise)

Grant Edwards · Feb 8, 2006

I think there are enough escape sequences common to all modern terminals
so that I can build a generic curses-replacement for my library.

Why not use termcap/terminfo?
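
For readers who want to try the terminfo route from Python, the standard curses module exposes setupterm(), tigetstr() and tparm(), so no extra wrapper is needed in current Python versions. The sketch below is only an illustration: the function name cursor_address, the default term="xterm" and the hard-coded ANSI fallback are assumptions, not something proposed in this thread.

```python
import curses


def cursor_address(row, col, term="xterm"):
    """Return the byte sequence that moves the cursor to (row, col).

    The "cup" capability is looked up in terminfo. If the lookup fails,
    fall back to the common ANSI/VT100 sequence (an assumption that holds
    for most modern terminals, but not for all of them).
    """
    try:
        # fd=1 is standard output; passed explicitly so the call does not
        # depend on what sys.stdout currently is.
        curses.setupterm(term=term, fd=1)
        cup = curses.tigetstr("cup")
        if cup:
            return curses.tparm(cup, row, col)
    except curses.error:
        pass
    # ANSI rows and columns are 1-based.
    return ("\x1b[%d;%dH" % (row + 1, col + 1)).encode("ascii")


print(repr(cursor_address(0, 0)))
```

Note that setupterm() only takes effect the first time it is called in a process, so a real program would pick the terminal name once at startup.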

Ian Ward · Feb 8, 2006

Thomas said:
hmm - I've read Urwid, and most of the comments I've read in that regard
reflect problems in Urwid. Perhaps it's time for you to do a little analysis.

(looking forward to bug reports, rather than line noise)

A fair request. My apologies for the inflammatory subject.

When trying to check for user input without waiting I use code like:
window_object.nodelay(1)
curses.cbreak()
input = window_object.getch()

Occasionally (hard to reproduce reliably) the cbreak() call will raise
an exception, but if I call it a second time before calling getch the
code will work properly. This problem might be related to a signal
interrupting the function call, I'm not sure.

Also, screen resizing only seems to be reported once by getch() even if
the user continues to resize the window. I have worked around this by
calling curses.doupdate() between calls to getch(). Maybe this is by design?

Finally, the curses escape sequence detection could be broadened. The
top part of the curses_display module in Urwid defines many escape
sequences I've run into that curses doesn't detect.

Ian Ward
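
A minimal sketch of the non-blocking input check described above, written for modern Python 3. It assumes cbreak() is called once after initscr() rather than before every getch(); that is a design guess and not a confirmed fix for the exception Ian reports. The helper names poll_key and is_resize are hypothetical.

```python
import curses


def poll_key(window):
    """Return the next key code, or None if no input is waiting."""
    window.nodelay(True)   # make getch() return immediately
    key = window.getch()   # getch() returns -1 when nothing is available
    return None if key == -1 else key


def is_resize(key):
    """True if the key code is the resize event reported by getch()."""
    return key == curses.KEY_RESIZE
```

In a real program this would run inside curses.wrapper(), which puts the terminal in cbreak mode once and restores it on exit, with poll_key(stdscr) called from the main loop.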

Ian Ward · Feb 8, 2006

Grant said:
Why not use termcap/terminfo?

That's a good idea, but I'd have to wrap the C library myself, wouldn't
I? Also, what happens when a user has an incorrect TERM setting? (I've
run into this before.)

I don't want to reimplement all the nice speed optimizations that the
curses library has, I just want something simple that should work for as
many people as possible.

Ian Ward

Grant Edwards · Feb 8, 2006

That's a good idea, but I'd have to wrap the c library myself,
wouldn't I?

Probably. I don't remember seeing a python module for them.

Also, what happens when a user has an incorrect TERM setting
(I've run into this before)

Then things (besides your program) won't work.

I don't want to reimplement all the nice speed optimizations
that the curses library has, I just want something simple that
should work for as many people as possible.

Depending on what you're trying to do, slang might be an option,
but I don't think there's a Python binding. There is a
(largely unsupported) Python binding for the "newt" widget set
that runs on top of slang. The old text-mode "red dialog
windows on a blue background" RedHat installer and admin apps
were written in Python using the newt widget library. The
"newt" Python module is called "snack".

Ian Ward · Feb 8, 2006

Thomas said:
ncursesw

I'll test it if someone would dumb down "link with ncursesw instead of
ncurses" a little for me.

I tried:
./configure --with-libs="ncursesw5"

and it failed saying:
checking size of wchar_t... configure: error: cannot compute sizeof
(wchar_t), 77

Ian Ward

Martin v. Löwis · Feb 8, 2006

Thomas said:
no need for debugging - it's a well-known problem. UTF-8 uses more than
one byte per cell, normal curses uses one byte per cell. To handle UTF-8,
you need ncursesw.

I tried that, but it didn't improve anything.

Regards,
Martin

Ian Ward · Feb 8, 2006

Grant said:
Depending on what you're trying to do, slang might be an option,

I've looked at newt and snack, but all I really need is:
- a way to position the cursor at (0,0)
- a way to hide and show the cursor
- a way to detect when the terminal is resized
- a way to query the terminal size

Ian Ward

Martin v. Löwis · Feb 8, 2006

I'll test it if someone would dumb down "link with ncursesw instead of
ncurses" a little for me.

I tried:
./configure --with-libs="ncursesw5"

and it failed saying:
checking size of wchar_t... configure: error: cannot compute sizeof
(wchar_t), 77

If that was Python's configure: don't do that. Instead, hack setup.py
to make it change the compiler/linker settings, or even edit the
compiler/linker line manually at first.

Regards,
Martin

Ian Ward · Feb 8, 2006

Martin said:
If that was Python's configure: don't do that. Instead, hack setup.py
to make it change the compiler/linker settings, or even edit the
compiler/linker line manually at first.

Ok, that compiled.

Now when I run the same test:

import curses
s = curses.initscr()
s.addstr('\xc3\x85 U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE\n')
s.addstr('\xc3\xb5 U+00F5 LATIN SMALL LETTER O WITH TILDE')
s.refresh()
s.getstr()
curses.endwin()

This is what I see:

+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
+00F5 LATIN SMALL LETTER O WITH TILDE

so, the UTF-8 characters didn't appear and the " U" at the beginning
became just " ".

Ian Ward

Thomas Dickey · Feb 8, 2006

If that was Python's configure: don't do that. Instead, hack setup.py

yes - python's configure script needs a lot of work
(alternatively, it is not the sort of script I would write).

to make it change the compiler/linker settings, or even edit the
compiler/linker line manually at first.

that works

Thomas Dickey · Feb 9, 2006

Ian Ward said:
Martin v. Löwis wrote:

Ok, that compiled.

same here - though it was not immediately clear which copy of ncurses it's
using (not the shared libraries since I installed those with tracing - a
little odd for it to use the static library, but that's what the access time
tells me).

To check on that (since I wanted to read the ncurses trace),
I ran strace and ltrace to look for clues.

Now when I run the same test:

import curses
s = curses.initscr()
s.addstr('\xc3\x85 U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE\n')
s.addstr('\xc3\xb5 U+00F5 LATIN SMALL LETTER O WITH TILDE')
s.refresh()
s.getstr()
curses.endwin()

Testing this, and looking to see what's going on, I notice that python
is doing a

setlocale(LC_ALL, "C");

before the addstr is actually called. (ncurses never sets the locale;
it calls setlocale in one place to ask what it is).

That makes ncurses think it's not really doing UTF-8, of course. What I
see on the screen is the U+00C5 comes out with a box and a "~E" (the
latter being ncurses' representation in POSIX for \x85).

This is what I see:

+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
+00F5 LATIN SMALL LETTER O WITH TILDE

so, the UTF-8 characters didn't appear and the " U" at the beginning
became just " ".

well - running in uxterm I see the second line properly. But some more
tinkering is needed to make python work properly.
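
Given the setlocale(LC_ALL, "C") observation above, one thing a script can try is to adopt the user's locale itself before calling curses.initscr(). The sketch below is modern Python 3 and only shows the locale step; whether it would have been enough for the Python 2.4 build discussed here is not established in this thread. The function name enable_user_locale is hypothetical.

```python
import locale


def enable_user_locale():
    """Adopt the locale named by the environment (LANG, LC_ALL, ...).

    Returns the locale string now in effect, or None if the environment
    names a locale that the system does not support.
    """
    try:
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        return None


current = enable_user_locale()
print("locale:", current)
print("encoding:", locale.getpreferredencoding(False))
```

With a UTF-8 locale in effect and a wide-character ncurses build, the library has what it needs to treat multi-byte sequences as single characters.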

Thomas Dickey · Feb 9, 2006

Grant Edwards said:
Depending on what you're trying to do, slang might be an option,

perhaps not - he's trying to use UTF-8. I haven't seen any plausible
comment that indicates John Davis is interested in updating newt to
work with slang2 (though of course he's welcome to show the code ;-)

Thomas Dickey · Feb 9, 2006

I've looked at newt and snack, but all I really need is:
- a way to position the cursor at (0,0)
- a way to hide and show the cursor
- a way to detect when the terminal is resized
- a way to query the terminal size

....and send UTF-8 text, keeping track of where you really are on the screen.
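
Keeping track of the real screen position means counting cells rather than bytes or code points. The following is a rough sketch in modern Python 3 using the standard unicodedata module; display_width is a made-up helper, and it deliberately ignores control characters and terminal-specific quirks.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal cells the text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the previous character's cell
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide and fullwidth characters take two cells
        else:
            width += 1
    return width


line = "\u00c5 U+00C5"
print(len(line.encode("utf-8")), "bytes,", display_width(line), "cells")
```

For the test string from this thread the byte count and the cell count differ by one, which is exactly the offset a byte-oriented screen model gets wrong.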

Thomas Dickey · Feb 9, 2006

A fair request. My apologies for the inflammatory subject.

When trying to check for user input without waiting I use code like:
window_object.nodelay(1)
curses.cbreak()
input = window_object.getch()

Occasionally (hard to reproduce reliably) the cbreak() call will raise
an exception, but if I call it a second time before calling getch the
code will work properly. This problem might be related to a signal
interrupting the function call, I'm not sure.

perhaps a more complete test-case would let me test it and see.

Also, screen resizing only seems to be reported once by getch() even if
the user continues to resize the window. I have worked around this by
calling curses.doupdate() between calls to getch(). Maybe this is by design?

Or perhaps it's some interaction with python - I don't know.
The applications that I use with resizing (and ncurses' test
programs) work smoothly enough.

Finally, the curses escape sequence detection could be broadened. The
top part of the curses_display module in Urwid defines many escape
sequences I've run into that curses doesn't detect.

That's data (terminfo). ncurses is data-driven, doesn't "detect"
features of the terminal (though it does of course use environment
variables for locale, etc.).

xterm's terminfo lists a lot of function keys, for instance.

The limit for predefined function-key names for terminfo is 60,
but ncurses can accept extended terminfo descriptions (but I like to
limit the length and style of names so it's possible to access them
from termcap). One could define names like shift_f1, but then termcap
applications couldn't see them. (The last I knew, slang doesn't either,
but that's a different thread).

That's been true for about 6 years.

Current xterm's terminfo includes these names, which apply to your
comment. The ones at the end are extended names that ncurses' tic
deduces from the terminfo file when it compiles it:

comparing xterm-new to xterm-xf86-v44.
comparing booleans.
comparing numbers.
comparing strings.
kf49: '\EO3P', NULL.
kf50: '\EO3Q', NULL.
kf51: '\EO3R', NULL.
kf52: '\EO3S', NULL.
kf53: '\E[15;3~', NULL.
kf54: '\E[17;3~', NULL.
kf55: '\E[18;3~', NULL.
kf56: '\E[19;3~', NULL.
kf57: '\E[20;3~', NULL.
kf58: '\E[21;3~', NULL.
kf59: '\E[23;3~', NULL.
kf60: '\E[24;3~', NULL.
kf61: '\EO4P', NULL.
kf62: '\EO4Q', NULL.
kf63: '\EO4R', NULL.
kind: '\E[1;2B', NULL.
kri: '\E[1;2A', NULL.
kDN: '\E[1;2B', NULL.
kDN5: '\E[1;5B', NULL.
kDN6: '\E[1;6B', NULL.
kLFT5: '\E[1;5D', NULL.
kLFT6: '\E[1;6D', NULL.
kRIT5: '\E[1;5C', NULL.
kRIT6: '\E[1;6C', NULL.
kUP: '\E[1;2A', NULL.
kUP5: '\E[1;5A', NULL.
kUP6: '\E[1;6A', NULL.
