Problem with curses and UTF-8

Discussion in 'Python' started by Ian Ward, Feb 7, 2006.

  1. Ian Ward

    When I run the following code in a terminal with the encoding set to
    UTF-8 I get garbage on the first line, but the correct output on the second.


    import curses
    s = curses.initscr()
    s.addstr('\xc3\x85 U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE\n')
    s.addstr('\xc3\xa5 U+00F5 LATIN SMALL LETTER O WITH TILDE')
    s.refresh()
    s.getstr()
    curses.endwin()


    I tested with gnome-terminal, Python 2.4 and Ubuntu breezy. The output
    is correct when I run the following code:


    print '\xc3\x85 U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE'
    print '\xc3\xa5 U+00F5 LATIN SMALL LETTER O WITH TILDE'


    Any Ideas?

    Ian Ward
     
    Ian Ward, Feb 7, 2006
    #1

  2. I think there are one or more ncurses bugs somewhere.

    The ncurses documentation suggests that you should link with
    ncursesw instead of linking with ncurses - you might try
    that as well. If it helps, please do report back.

    Ultimately, somebody will need to debug ncurses to find out
    what precisely happens, and why.

    Regards,
    Martin
     
    "Martin v. Löwis", Feb 7, 2006
    #2

  3. Ian Ward

    Thank you for your response. I see there are other people that have run
    into the same problem.

    I've had to work around many curses issues while developing Urwid (a
    console UI library). Even if the bugs are fixed I'm going to have to
    bypass the curses module to support UTF-8 in a reliable way for all users.

    I think there are enough escape sequences common to all modern terminals
    so that I can build a generic curses-replacement for my library.
    However, if someone is already working on something similar I don't want
    to reinvent the wheel.

    Ian Ward
     
    Ian Ward, Feb 8, 2006
    #3
  4. Indeed. It might be nice to report them rather than jawing about it.
    No need for debugging: it's a well-known problem. UTF-8 can use more than
    one byte per cell, while normal curses uses one byte per cell. To handle
    UTF-8, you need ncursesw.
     
    Thomas Dickey, Feb 8, 2006
    #4
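
To make the byte-versus-cell mismatch concrete, here is a small sketch. It assumes Python 3 syntax (the thread itself uses Python 2.4), and the helper name `describe` is made up for illustration. It decodes the first two bytes of the original test string and compares the byte count with the character count.

```python
def describe(raw):
    """Return (byte_count, character_count) for a UTF-8 byte string."""
    text = raw.decode("utf-8")
    return len(raw), len(text)


# The first character of the original test string: two bytes, one character.
a_ring = b"\xc3\x85"
print(describe(a_ring))
print(a_ring.decode("utf-8") == "\u00c5")
```

A library that assumes one byte per screen cell would treat those two bytes as two separate cells, which is consistent with the garbage reported in the first post.
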
  5. hmm - I've read Urwid, and most of the comments I've read in that regard
    reflect problems in Urwid. Perhaps it's time for you to do a little analysis.

    (looking forward to bug reports, rather than line noise)
     
    Thomas Dickey, Feb 8, 2006
    #5
  6. Why not use termcap/terminfo?
     
    Grant Edwards, Feb 8, 2006
    #6
  7. Ian Ward

    A fair request. My apologies for the inflammatory subject :)

    When trying to check for user input without waiting I use code like:
    window_object.nodelay(1)
    curses.cbreak()
    input = window_object.getch()

    Occasionally (hard to reproduce reliably) the cbreak() call will raise
    an exception, but if I call it a second time before calling getch the
    code will work properly. This problem might be related to a signal
    interrupting the function call, I'm not sure.

    Also, screen resizing only seems to be reported once by getch() even if
    the user continues to resize the window. I have worked around this by
    calling curses.doupdate() between calls to getch(). Maybe this is by design?

    Finally, the curses escape sequence detection could be broadened. The
    top part of the curses_display module in Urwid defines many escape
    sequences I've run into that curses doesn't detect.

    Ian Ward
     
    Ian Ward, Feb 8, 2006
    #7
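
The non-blocking input check and the "call it a second time" workaround described above could be sketched roughly as follows. This assumes Python 3; `poll_key` and `call_with_retry` are hypothetical helper names, not part of curses or Urwid, and retrying does not explain why cbreak() fails in the first place.

```python
def poll_key(window):
    """Return the next key code from `window`, or None if no key is waiting."""
    window.nodelay(True)   # make getch() return immediately
    key = window.getch()   # returns -1 (curses.ERR) when nothing is pending
    return None if key == -1 else key


def call_with_retry(func, errors, retries=1):
    """Call func(); on one of `errors`, try again up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return func()
        except errors:
            if attempt == retries:
                raise  # out of retries: let the caller see the failure
```

In a real program this might be used as `call_with_retry(curses.cbreak, curses.error)` followed by `poll_key(window_object)`.
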
  8. Ian Ward

    That's a good idea, but I'd have to wrap the C library myself, wouldn't
    I? Also, what happens when a user has an incorrect TERM setting? (I've
    run into this before.)

    I don't want to reimplement all the nice speed optimizations that the
    curses library has; I just want something simple that should work for as
    many people as possible.

    Ian Ward
     
    Ian Ward, Feb 8, 2006
    #8
  9. Probably. I don't remember seeing a Python module for them.
    If TERM is set incorrectly, then things (besides your program) won't work.
    Depending on what you're trying to do, slang might be an option,
    but I don't think there's a Python binding. There is a
    (largely unsupported) Python binding for the "newt" widget set
    that runs on top of slang. The old text-mode "red dialog
    windows on a blue background" RedHat installer and admin apps
    were written in Python using the newt widget library. The
    "newt" Python module is called "snack".
     
    Grant Edwards, Feb 8, 2006
    #9
  10. Ian Ward

    I'll test it if someone would dumb down "link with ncursesw instead of
    ncurses" a little for me.

    I tried:
    ../configure --with-libs="ncursesw5"

    and it failed saying:
    checking size of wchar_t... configure: error: cannot compute sizeof (wchar_t), 77

    Ian Ward
     
    Ian Ward, Feb 8, 2006
    #10
  11. I tried that, but it didn't improve anything.

    Regards,
    Martin
     
    "Martin v. Löwis", Feb 8, 2006
    #11
  12. Ian Ward

    I've looked at newt and snack, but all I really need is:
    - a way to position the cursor at (0,0)
    - a way to hide and show the cursor
    - a way to detect when the terminal is resized
    - a way to query the terminal size

    Ian Ward
     
    Ian Ward, Feb 8, 2006
    #12
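
The four needs listed above can be sketched without curses, under some assumptions: Python 3, a Unix system that delivers SIGWINCH, and a terminal that honours the common VT100/xterm-style sequences. All names below are made up for illustration, and a wrong TERM setting is simply ignored here because nothing is looked up in terminfo.

```python
import os
import signal

CURSOR_HOME = "\x1b[H"      # CSI H: move to the top-left cell
HIDE_CURSOR = "\x1b[?25l"   # hide the cursor
SHOW_CURSOR = "\x1b[?25h"   # show the cursor


def move_to(row, col):
    """Return the sequence that moves the cursor to a 0-based (row, col)."""
    # The escape sequence itself counts rows and columns from 1.
    return "\x1b[%d;%dH" % (row + 1, col + 1)


def terminal_size(fd=1):
    """Return (columns, lines) for the terminal on `fd`, or 80x24 as a guess."""
    try:
        size = os.get_terminal_size(fd)
    except OSError:
        return 80, 24  # fd is not a terminal (for example, a pipe)
    return size.columns, size.lines


def watch_resize(callback):
    """Call callback(columns, lines) each time the terminal is resized."""
    def handler(signum, frame):
        callback(*terminal_size())
    signal.signal(signal.SIGWINCH, handler)
```

Writing `HIDE_CURSOR + CURSOR_HOME` to standard output and flushing would hide the cursor and move it to (0,0); `SHOW_CURSOR` should be written again before the program exits.
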
  13. If that was Python's configure: don't do that. Instead, hack setup.py
    to make it change the compiler/linker settings, or even edit the
    compiler/linker line manually at first.

    Regards.
    Martin
     
    "Martin v. Löwis", Feb 8, 2006
    #13
  15. Ian Ward

    Ok, that compiled.

    Now when I run the same test:

    import curses
    s = curses.initscr()
    s.addstr('\xc3\x85 U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE\n')
    s.addstr('\xc3\xa5 U+00F5 LATIN SMALL LETTER O WITH TILDE')
    s.refresh()
    s.getstr()
    curses.endwin()


    This is what I see:

    +00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
    +00F5 LATIN SMALL LETTER O WITH TILDE


    so, the UTF-8 characters didn't appear and the " U" at the beginning
    became just " ".

    Ian Ward
     
    Ian Ward, Feb 8, 2006
    #15
  16. yes - python's configure script needs a lot of work
    (alternatively, it is not the sort of script I would write).
    that works
     
    Thomas Dickey, Feb 8, 2006
    #16
  17. same here - though it was not immediately clear which copy of ncurses it's
    using (not the shared libraries since I installed those with tracing - a
    little odd for it to use the static library, but that's what the access time
    tells me).

    To check on that (since I wanted to read the ncurses trace),
    I ran strace and ltrace to look for clues.
    Testing this, and looking to see what's going on, I notice that python
    is doing a

    setlocale(LC_ALL, "C");

    before the addstr is actually called. (ncurses never sets the locale;
    it calls setlocale in one place to ask what it is).

    That makes ncurses think it's not really doing UTF-8, of course. What I
    see on the screen is the U+00C5 comes out with a box and a "~E" (the
    latter being ncurses' representation in the POSIX locale for \x85).
    well - running in uxterm I see the second line properly. But some more
    tinkering is needed to make python work properly.
     
    Thomas Dickey, Feb 9, 2006
    #17
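
Given the observation above that the locale is still "C" when addstr is called, one thing to try is adopting the environment's locale before curses starts. The sketch below assumes Python 3 and a curses module linked against ncursesw; whether it helps the Python 2.4 build discussed in this thread is not established here. The function names are made up for illustration.

```python
import locale


def use_environment_locale():
    """Switch from the default "C" locale to the one named by LANG/LC_*.

    Returns the encoding name the locale reports.
    """
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale this system does not have;
        # stay in the current locale rather than fail.
        pass
    return locale.getpreferredencoding(False)


def draw(stdscr):
    """Write the first test line from the original post as text, not bytes."""
    stdscr.addstr("\u00c5 U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE\n")
    stdscr.refresh()
    stdscr.getch()


# In a real terminal (not executed here):
#     import curses
#     use_environment_locale()   # must happen before curses starts
#     curses.wrapper(draw)
```

The locale call comes first because ncurses only asks what the locale is; it never sets it.
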
  18. perhaps not - he's trying to use UTF-8. I haven't seen any plausible
    comment that indicates John Davis is interested in updating newt to
    work with slang2 (though of course he's welcome to show the code ;-)
     
    Thomas Dickey, Feb 9, 2006
    #18
  19. ....and send UTF-8 text, keeping track of where you really are on the screen.
     
    Thomas Dickey, Feb 9, 2006
    #19
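
Keeping track of where you really are on the screen means counting cells rather than bytes or code points. A rough approximation in Python 3 is sketched below; it is not a full wcwidth() replacement, the name `display_width` is made up for illustration, and control characters are not handled specially.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal cells `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the previous character's cell
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide and fullwidth characters take two cells
        else:
            width += 1
    return width
```

With this estimate, the two-byte character from the original test counts as one cell, so the cursor column advances by one after it is written.
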
  20. perhaps a more complete test-case would let me test it and see.
    Or perhaps it's some interaction with python - I don't know.
    The applications that I use with resizing (and ncurses' test
    programs) work smoothly enough.
    That's data (terminfo). ncurses is data-driven, doesn't "detect"
    features of the terminal (though it does of course use environment
    variables for locale, etc.).

    xterm's terminfo lists a lot of function keys, for instance.

    The limit for predefined function-key names for terminfo is 60,
    but ncurses can accept extended terminfo descriptions (but I like to
    limit the length and style of names so it's possible to access them
    from termcap). One could define names like shift_f1, but then termcap
    applications couldn't see them. (The last I knew, slang doesn't either,
    but that's a different thread).

    That's been true for about 6 years.

    Current xterm's terminfo includes these names, which apply to your
    comment. The ones at the end are extended names that ncurses' tic
    deduces from the terminfo file when it compiles it:

    comparing xterm-new to xterm-xf86-v44.
    comparing booleans.
    comparing numbers.
    comparing strings.
    kf49: '\EO3P', NULL.
    kf50: '\EO3Q', NULL.
    kf51: '\EO3R', NULL.
    kf52: '\EO3S', NULL.
    kf53: '\E[15;3~', NULL.
    kf54: '\E[17;3~', NULL.
    kf55: '\E[18;3~', NULL.
    kf56: '\E[19;3~', NULL.
    kf57: '\E[20;3~', NULL.
    kf58: '\E[21;3~', NULL.
    kf59: '\E[23;3~', NULL.
    kf60: '\E[24;3~', NULL.
    kf61: '\EO4P', NULL.
    kf62: '\EO4Q', NULL.
    kf63: '\EO4R', NULL.
    kind: '\E[1;2B', NULL.
    kri: '\E[1;2A', NULL.
    kDN: '\E[1;2B', NULL.
    kDN5: '\E[1;5B', NULL.
    kDN6: '\E[1;6B', NULL.
    kLFT5: '\E[1;5D', NULL.
    kLFT6: '\E[1;6D', NULL.
    kRIT5: '\E[1;5C', NULL.
    kRIT6: '\E[1;6C', NULL.
    kUP: '\E[1;2A', NULL.
    kUP5: '\E[1;5A', NULL.
    kUP6: '\E[1;6A', NULL.
     
    Thomas Dickey, Feb 9, 2006
    #20
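
As an illustration of the data-driven approach described above, the sketch below matches incoming input against a table of key sequences instead of hard-coding detection logic. It assumes Python 3. The table holds a few of the extended entries from the listing, typed in by hand, where a real program would read them from terminfo; `match_key` is a made-up name.

```python
# A few entries copied from the listing above ("\E" is ESC, "\x1b").
KEY_SEQUENCES = {
    "\x1b[1;2A": "kUP",
    "\x1b[1;5A": "kUP5",
    "\x1b[1;6A": "kUP6",
    "\x1b[1;2B": "kDN",
    "\x1b[1;5B": "kDN5",
    "\x1b[1;6B": "kDN6",
    "\x1b[1;5D": "kLFT5",
    "\x1b[1;6D": "kLFT6",
    "\x1b[1;5C": "kRIT5",
    "\x1b[1;6C": "kRIT6",
}


def match_key(buffer, table=KEY_SEQUENCES):
    """Split a known key sequence off the front of `buffer`.

    Returns (name, rest) on a match, or (None, buffer) if no entry matches.
    """
    # Try longer sequences first so a short entry cannot shadow a long one.
    for seq in sorted(table, key=len, reverse=True):
        if buffer.startswith(seq):
            return table[seq], buffer[len(seq):]
    return None, buffer
```

Supporting another terminal then means supplying a different table, not changing the matching code.
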
