wchar_t in Linux

Harald van Dĳk · May 7, 2011

Thanks a lot!
That changes output a bit.
\x2500 is shown as a blank
\u2500 is not shown at all

That is odd. \x2500 and \u2500 should map to the same character on
your system. I suggested the \u form because for U+1xxxx, and for
extended characters in non-wide character strings, it is far more
readable than specifying the representation, plus you get better
diagnostics if you specify an invalid character, but in this case they
should behave the same.
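
A minimal sketch of that equivalence, assuming an implementation where wchar_t holds Unicode code points (glibc announces this by defining __STDC_ISO_10646__). The helper name same_box_char is made up for illustration.

```c
#include <wchar.h>

/* Returns 1 when the universal character name and the hex escape
 * produce the same wide string, 0 otherwise. Hypothetical helper,
 * not part of any library. */
int same_box_char(void)
{
    const wchar_t *by_ucn = L"\u2500"; /* names the character U+2500 */
    const wchar_t *by_hex = L"\x2500"; /* gives the wchar_t value 0x2500 */

    return wcscmp(by_ucn, by_hex) == 0;
}
```

In a narrow string the two spellings are not interchangeable: "\u2500" is converted to the execution character set (three bytes with GCC's default of UTF-8), while "\x2500" is a single hex escape that does not fit in a char.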

I am still searching what is the right parameter for setlocale
(http://www.wolf-fuerth.de/charmap.png)

Often you can use setlocale(LC_ALL, ""), meaning "whatever locale the
user has set up", and in this case you probably should. You shouldn't
force a UTF-8 encoding: if the user is on an ISO-8859-1 terminal, and
you start outputting UTF-8 bytes, the user will just get garbage.
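
To see which encoding "whatever locale the user has set up" actually selects, something like the following sketch can help. It uses only setlocale and POSIX nl_langinfo; the function name user_codeset is an assumption for illustration.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Adopts the locale from the environment (LANG, LC_*) and returns the
 * name of the character encoding it selects, e.g. "UTF-8" or
 * "ISO-8859-1". Returns NULL if the environment names a locale that
 * cannot be set. Hypothetical helper name. */
const char *user_codeset(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

Printing the result once at startup, before any curses call, shows whether the program and the terminal agree on the encoding.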

Heinrich Wolf · May 7, 2011

That is odd. \x2500 and \u2500 should map to the same character on
your system. I suggested the \u form because for U+1xxxx, and for
extended characters in non-wide character strings, it is far more
readable than specifying the representation, plus you get better
diagnostics if you specify an invalid character, but in this case they
should behave the same.

[Heiner]
Maybe it is due to changing my source from .c to .cpp . But that would be
very strange. In my makefile I use cc . Maybe cc loads different compilers
for .cpp and .c . In .c files \u is not allowed.

I am still searching what is the right parameter for setlocale
(http://www.wolf-fuerth.de/charmap.png)

[Harald]
Often you can use setlocale(LC_ALL, ""), meaning "whatever locale the
user has set up", and in this case you probably should. You shouldn't
force a UTF-8 encoding: if the user is on an ISO-8859-1 terminal, and
you start outputting UTF-8 bytes, the user will just get garbage.

[Heiner]
My terminal setting is de_DE.utf8 . And I still do not get \u2500 displayed.
I hoped, that I could solve this problem with a call to setlocale() , but
with what charmap other than utf8 in it?

kind regards
Heiner

Heinrich Wolf · May 7, 2011

....

\x2500 is shown as a blank
\u2500 is not shown at all

In the following test program \u2500 is also shown as blank:

#include <curses.h>
const wchar_t *Holz2 = L"o\u2500\u2500\u2500";
int main(void)
{
initscr();
addwstr(Holz2);
getch();
endwin();
return 0;
}

I thought that \u2500 was not shown at all, because
{
move(y, x);
addwstr(Holz2);
}
in my full program does not move to the right position.

Heinrich Wolf · May 7, 2011

....
Often you can use setlocale(LC_ALL, ""), meaning "whatever locale the
user has set up", and in this case you probably should. You shouldn't
force a UTF-8 encoding: if the user is on an ISO-8859-1 terminal, and
you start outputting UTF-8 bytes, the user will just get garbage.

[Heiner]
If I have no setlocale() call in my program at all, \u2500 is shown as
blank.
set | grep LANG
in my terminal shows de_DE.utf8 .
In both cases if I call setlocale(LC_ALL, ""); or setlocale(LC_CTYPE,
"de_DE.utf8");
after initscr() , then
move(y, x);
addwstr(L"o\u2500\u2500\u2500");
moves to a wrong position left of the desired one.

Heinrich Wolf · May 7, 2011

....

\x2500 is shown as a blank
\u2500 is not shown at all

That is not correct.
The problem was a call to
setlocale(LC_CTYPE, "de_DE.utf8");
after initscr(); . Then
move(y, x);
addwstr(L"o\u2500\u2500\u2500");
moves to a wrong position left of the desired one.
If I have no setlocale() call in my program at all, \u2500 is shown as
blank.
set | grep LANG
in my terminal shows de_DE.utf8 .

I am still searching what is needed to display the line character from
http://www.wolf-fuerth.de/charmap.png

Ben Bacarisse · May 8, 2011

Heinrich Wolf said:
...

No no!
I am using f14 Linux and it is the charmap from gnome.

Sorry. I should have recognised it. The effect is two-fold:

(a) You can use the better \uXXXX notation; and
(b) to find out about the locale settings (both inside your program and
outside in your OS) post in comp.unix.programmer.

Heinrich Wolf · May 8, 2011

....

(a) You can use the better \uXXXX notation; and

....

In the meantime I try to use that. But it makes no difference.
\u2500 is for de_DE.utf16
My terminal has de_DE.utf8 . But maybe curses switches to de_DE.utf16 .
setlocale(LC_CTYPE, "de_DE.utf16"); makes no difference.
The desired character is L"\xE2\x94\x80" in de_DE.utf16 .
But using that string makes no difference either.
The character is always shown as a blank.
Copying the character from charmap to clipboard and pasting into the
terminal displays fine.

Heinrich Wolf · May 8, 2011

....

In the following test program \u2500 is also shown as blank:

#include <curses.h>
const wchar_t *Holz2 = L"o\u2500\u2500\u2500";
int main(void)
{
initscr();
addwstr(Holz2);
getch();
endwin();
return 0;
}
....

Here is a screenshot of page 2 of the charmap:
http://www.wolf-fuerth.de/cmap2en.png
The following test program does not improve displaying:

#include <curses.h>
#include <locale.h>
int main(void)
{
initscr();
move(0, 0);
addwstr(L"o\xE2\x94\x80\xE2\x94\x80\xE2\x94\x80");
move(1, 0);
addwstr(L"o\u2500\u2500\u2500");
setlocale(LC_ALL, "de_DE.utf16");
move(2, 0);
addwstr(L"o\u2500\u2500\u2500");
getch();
endwin();
return 0;
}

My terminal is set to de_DE.utf8 . Copying the character from charmap to
clipboard and pasting into the terminal displays fine.

Harald van Dĳk · May 8, 2011

Maybe it is due to changing my source from .c to .cpp . But that would be
very strange. In my makefile I use cc . Maybe cc loads different compilers
for .cpp and .c . In .c files \u is not allowed.

In .c files \u is allowed if you pass the -std=c99 (standard C) or
-std=gnu99 (C with GNU extensions) option to the compiler. If you don't
pass either of those options, \u still works, but you get a warning
telling you that it is only valid in C++ and in C99. You don't need to
compile your program in C++ mode for it; that just gives you a wholly
different set of problems. (C++ is fine, but compiling C code as C++
is rarely a good idea.)

My terminal setting is de_DE.utf8 . And I still do not get \u2500 displayed.
I hoped, that I could solve this problem with a call to setlocale() , but
with what charmap other than utf8 in it?

At the very start of your program, before anything else (even
initscr), call setlocale(LC_ALL, "").
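
A rough sketch of why the call has to happen at all: every C program starts in the "C" locale, and in that locale U+2500 has no multibyte representation, so the library has nothing it can send to the terminal. That the call must also precede initscr is an assumption based on the ncurses documentation, which says the library uses the locale the program has already initialized. The helper name below is hypothetical.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Returns 1 if U+2500 can be converted to a multibyte sequence in the
 * named locale, 0 if it cannot, and -1 if the locale cannot be set.
 * Note that this changes the program's locale as a side effect. */
int box_char_convertible(const char *locale_name)
{
    char buf[16];

    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    return wcstombs(buf, L"\u2500", sizeof buf) != (size_t)-1;
}
```

Calling box_char_convertible("C") reports 0, while box_char_convertible("") should report 1 when the environment selects a UTF-8 locale such as de_DE.utf8.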

Heinrich Wolf · May 8, 2011

In .c files \u is allowed if you pass the -std=c99 (standard C) or
-std=gnu99 (C with GNU extensions) option to the compiler.

[Heiner]
If I specify cc -std=c99 or cc -std=gnu99 , then I get a warning "implicit
declaration of addwstr" with whatever #include <*curses*.h> I use.
Furthermore if I specify cc -std=c99 in place of g++ that does not #define
linux , but I need #ifdef linux , because my full program shall be
multi-platform.
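
One possible way around the missing macro, assuming GCC: in strict ISO modes such as -std=c99 it stops predefining the unreserved name linux, but the reserved spelling __linux__ stays defined (and -std=gnu99 keeps both). The function name and the other branches are only illustrative.

```c
/* Reports which platform branch was compiled in. Testing __linux__
 * instead of linux works under -std=c99 as well as -std=gnu99. */
const char *platform_name(void)
{
#if defined(__linux__)
    return "linux";
#elif defined(_WIN32)
    return "windows";
#else
    return "other";
#endif
}
```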

[Harald]
If you don't
pass either of those options, \u still works, but you get a warning
telling you that it is only valid in C++ and in C99. You don't need to
compile your program in C++ mode for it; that just gives you a wholly
different set of problems. (C++ is fine, but compiling C code as C++
is rarely a good idea.)

My terminal setting is de_DE.utf8 . And I still do not get \u2500
displayed.
I hoped, that I could solve this problem with a call to setlocale() , but
with what charmap other than utf8 in it?

At the very start of your program, before anything else (even
initscr), call setlocale(LC_ALL, "").

[Heiner]
Thank you very much! That did the trick. I had already tried
setlocale(LC_ALL, ""); , but without success, because I had placed it after
initscr();

Martin Ambuhl · May 8, 2011

If I specify cc -std=c99 or cc -std=gnu99 , then I get a warning "implicit
declaration of addwstr" with whatever #include <*curses*.h> I use.

That's trivial to take care of.
After
#include <curses.h>
Add
int addwstr(const wchar_t *str);

Harald van Dĳk · May 8, 2011

That is why I suggested earlier to enable warnings. You would have
received a warning even without -std=c99, if you added -Wall to your
compiler options. The only effect -std=c99 had on that is that it
defaults to showing you the warning. I will repeat this now: please
turn on warnings. They tell you when you're doing something wrong.

That's trivial to take care of.
After
#include <curses.h>
Add
int addwstr(const wchar_t *str);

That's the wrong solution. There are more conflicts between
differently configured versions of ncurses than merely the number of
available functions. The right solution is to use the header files
belonging to the version of the library you're using. For example, if
they are not installed in /usr/include, but in /usr/include/ncursesw
(as they are on my non-Fedora system), you would add
-I/usr/include/ncursesw to the compiler options. If they are installed somewhere
else, find out where that somewhere else is.

Heinrich Wolf · May 8, 2011

....
That's trivial to take care of.
After
#include <curses.h>
Add
int addwstr(const wchar_t *str);

That's the wrong solution. There are more conflicts between
differently configured versions of ncurses than merely the number of
available functions. The right solution is to use the header files
belonging to the version of the library you're using. For example, if
they are not installed in /usr/include, but in /usr/include/ncursesw
(as they are on my non-Fedora system), you would add
-I/usr/include/ncursesw to the compiler options. If they are installed somewhere
else, find out where that somewhere else is.

[Heiner]
I also have a file /usr/include/ncursesw/cursesw.h on my Fedora 14 and if I
do
#include <cursesw.h>
cc -std=gnu99 -I /usr/include/ncursesw/ holz.c ...
I get errors due to the fact that cursesw.h is a c++ header file:
It contains extern "C" {} , which cc does not understand.
If I use g++ in place of cc, it compiles with the warning that -std=gnu99 is
not allowed in C++ .

Heinrich Wolf · May 8, 2011

....

If I specify cc -std=c99 or cc -std=gnu99 , then I get a warning "implicit
declaration of addwstr" with whatever #include <*curses*.h> I use.
Furthermore if I specify cc -std=c99 in place of g++ that does not #define
linux , but I need #ifdef linux , because my full program shall be
multi-platform.

....
addwstr seems not to be defined in curses.h , only in cursesw.h , but that
is a C++ file. However I can work without addwstr(QuickMatch); -
printw("%ls", QuickMatch); does the same.

Harald van Dĳk · May 8, 2011

addwstr seems not to be defined in curses.h , only in cursesw.h ,

addwstr will be defined in /usr/include/ncursesw/curses.h (or maybe
/usr/include/ncursesw/ncurses.h, but those two are probably the same),
just not in /usr/include/curses.h. You don't need cursesw.h, you can
continue to include <curses.h>. You just need the right <curses.h>.

but that
is a C++ file. However I can work without addwstr(QuickMatch); -
printw("%ls", QuickMatch); does the same.

It does, so yes, you can use that, but be careful: your version with
addwstr had the advantage of the linker telling you "this isn't going
to work" when linking with just -lcurses, prompting this thread. Do
make sure you keep using -lcursesw and the corresponding header files.

Heinrich Wolf · May 8, 2011

addwstr will be defined in /usr/include/ncursesw/curses.h (or maybe
/usr/include/ncursesw/ncurses.h, but those two are probably the same),
just not in /usr/include/curses.h. You don't need cursesw.h, you can
continue to include <curses.h>. You just need the right <curses.h>.

[Heiner]
#include <ncursesw/curses.h>
or
#include <ncursesw/ncurses.h>
both lead to "implicit declaration"

/usr/include/ncursesw/curses.h is a symbolic link to /usr/include/curses.h
/usr/include/ncursesw/ncurses.h is a symbolic link to /usr/include/ncurses.h
/usr/include/ncurses.h is a symbolic link to /usr/include/curses.h
Four different i-nodes for all the same file.
addwstr is hidden behind #ifdef __cplusplus

However I can work without addwstr(QuickMatch); -
printw("%ls", QuickMatch); does the same.

[Harald]
It does, so yes, you can use that, but be careful: your version with
addwstr had the advantage of the linker telling you "this isn't going
to work" when linking with just -lcurses, prompting this thread. Do
make sure you keep using -lcursesw and the corresponding header files.

[Heiner]
You are right! -lcursesw gives the desired output, -lcurses gives corrupted
output on the screen.

Harald van Dĳk · May 8, 2011

#include <ncursesw/curses.h>
or
#include <ncursesw/ncurses.h>
both lead to "implicit declaration"

/usr/include/ncursesw/curses.h is a symbolic link to /usr/include/curses.h
/usr/include/ncursesw/ncurses.h is a symbolic link to /usr/include/ncurses.h
/usr/include/ncurses.h is a symbolic link to /usr/include/curses.h
Four different i-nodes for all the same file.
addwstr is hidden behind #ifdef __cplusplus

Huh. Looking at how Fedora sets up its ncurses, yes, you don't need a
specific -I compiler option. You also don't need C++. You do, however,
need to define _XOPEN_SOURCE_EXTENDED.

At any rate, it is good that you have things working without it.

Ben Bacarisse · May 8, 2011

Heinrich Wolf said:
...
...

In the meantime I try to use that. But it makes no difference.
\u2500 is for de_DE.utf16

No it isn't. \u2500 in a wide C string specifies the character with hex
2500 as its value. When you output the string, the C library can
encode this character in a number of ways, possibly including utf16 and
certainly including utf8.
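
As a sketch of that encoding step, the function below asks the C library to convert a wide string under a UTF-8 locale. The candidate locale names are guesses, since installed names vary between systems, and encode_utf8 is a hypothetical helper.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Converts a wide string to the multibyte encoding of the first UTF-8
 * locale that can be set. Returns the number of bytes written
 * (excluding the terminator), or (size_t)-1 if no candidate locale is
 * available or the conversion fails. Changes the program's locale. */
size_t encode_utf8(const wchar_t *ws, char *out, size_t outsize)
{
    static const char *const candidates[] = {
        "C.UTF-8", "en_US.UTF-8", "de_DE.utf8"
    };
    size_t i;

    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (setlocale(LC_ALL, candidates[i]) != NULL)
            return wcstombs(out, ws, outsize);
    }
    return (size_t)-1;
}
```

For L"\u2500" the library produces the three bytes E2 94 80. Those bytes are the UTF-8 encoding of the character, generated on output, rather than something to write into the wide string yourself.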

My terminal has de_DE.utf8 . But maybe curses switches to de_DE.utf16 .
setlocale(LC_CTYPE, "de_DE.utf16"); makes no difference.

Absolutely. It's wrong and it won't make matters any worse. You need
to tell the C library what encoding your terminal uses -- not the
encoding you think you are using in your strings. Usually

setlocale(LC_ALL, "");

is all you need, but this relies on system-specific things like your
environment settings. To be sure, try

setlocale(LC_ALL, "de_DE.utf8");

in the meantime.
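
When experimenting with an explicit name, it is worth checking the return value: setlocale returns NULL if the requested locale cannot be set and leaves the previous locale in place. A small sketch, with a made-up helper name, that falls back to the environment and then to "C":

```c
#include <locale.h>
#include <stddef.h>

/* Tries an explicit locale first, then the environment's locale, then
 * the always-available "C" locale. Returns the name of the locale now
 * in effect. Hypothetical helper for experimenting. */
const char *set_locale_with_fallback(const char *preferred)
{
    const char *name = setlocale(LC_ALL, preferred);

    if (name == NULL)
        name = setlocale(LC_ALL, "");
    if (name == NULL)
        name = setlocale(LC_ALL, "C");
    return name;
}
```

Printing the returned name shows immediately whether a spelling such as "de_DE.utf8" was accepted on the machine at hand.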

The desired character is L"\xE2\x94\x80" in de_DE.utf16 .
But using that string makes no difference either.

Yup. You need to step back and re-think what you think you know about
characters and encodings. The desired character is not
L"\xE2\x94\x80". That may be some encoding of that character, but
unless the encoding happens to match the one used by your program's
output system, you should not try to use it.
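
To make the difference concrete, here is a small sketch that counts the wide characters in each spelling. The function names are invented for illustration.

```c
#include <wchar.h>

/* Number of wide characters in the byte-escape spelling: each \x
 * escape in a wide literal is one whole wchar_t, not one byte. */
size_t byte_escape_length(void)
{
    return wcslen(L"\xE2\x94\x80");
}

/* Number of wide characters in the universal-character-name spelling. */
size_t ucn_length(void)
{
    return wcslen(L"\u2500");
}
```

The first literal holds three separate wide characters with the values 0xE2, 0x94 and 0x80; the second holds the single character U+2500.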

The character you want is \u2500 (or \x2500; the difference is subtle and
not important to the current problem). The part that is not working is
getting this character output in a form that is compatible with your
terminal. Try the setlocale calls I suggested above. It certainly
works for me (my terminal uses en_GB.utf8).

<snip>

Ben Bacarisse · May 8, 2011

<stuff... snipped>

Should have read the rest of the thread. Problem already solved.

Heinrich Wolf · May 10, 2011

....
Huh. Looking at how Fedora sets up its ncurses, yes, you don't need a
specific -I compiler option. You also don't need C++. You do, however,
need to define _XOPEN_SOURCE_EXTENDED.

[Heiner]
Thank you very much! That did the trick for addwstr.
