wcout, wprintf() only print English

Ioannis Vranos

Has anyone actually managed to print non-English text by using wcout or
wprintf and the rest of standard, wide character functions?
 
Ioannis Vranos

Ioannis said:
Has anyone actually managed to print non-English text by using wcout or
wprintf and the rest of standard, wide character functions?


For example:

[john@localhost src]$ cat main.cc
#include <iostream>

int main()
{
using namespace std;

wcout<< L"Δοκιμαστικό μήνυμα\n";
}

[john@localhost src]$ ./foobar-cpp
??????????? ??????

[john@localhost src]$
 
Rolf Magnus

Ioannis said:
Ioannis said:
Has anyone actually managed to print non-English text by using wcout or
wprintf and the rest of standard, wide character functions?


For example:

[john@localhost src]$ cat main.cc
#include <iostream>

int main()
{
using namespace std;

wcout<< L"Δοκιμαστικό μήνυμα\n";

Are you sure that you stored your source file in the same encoding the
compiler expects as source character set?
}

[john@localhost src]$ ./foobar-cpp
??????????? ??????

[john@localhost src]$
 
Ioannis Vranos

Rolf said:
Ioannis said:
Ioannis said:
Has anyone actually managed to print non-English text by using wcout or
wprintf and the rest of standard, wide character functions?

For example:

[john@localhost src]$ cat main.cc
#include <iostream>

int main()
{
using namespace std;

wcout<< L"Δοκιμαστικό μήνυμα\n";

Are you sure that you stored your source file in the same encoding the
compiler expects as source character set?
}

[john@localhost src]$ ./foobar-cpp
??????????? ??????

[john@localhost src]$


Well I created the file with anjuta editor with the message being a
Greek one. The Greek message also appears the same when I display the
source file in the console.

I suppose it is saved as UTF8.


Also the code

#include <iostream>
#include <string>

int main()
{
using namespace std;

wstring s;

wcin>> s;


wcout<< s<< endl;
}


displays nothing when I enter greek text.


Should I mess with locales?
 
Ioannis Vranos

Ioannis said:
Rolf said:
Ioannis said:
Ioannis Vranos wrote:
Has anyone actually managed to print non-English text by using wcout or
wprintf and the rest of standard, wide character functions?

For example:

[john@localhost src]$ cat main.cc
#include <iostream>

int main()
{
using namespace std;

wcout<< L"Δοκιμαστικό μήνυμα\n";

Are you sure that you stored your source file in the same encoding the
compiler expects as source character set?
}

[john@localhost src]$ ./foobar-cpp
??????????? ??????

[john@localhost src]$


Well I created the file with anjuta editor with the message being a
Greek one. The Greek message also appears the same when I display the
source file in the console.

I suppose it is saved as UTF8.


Also the code

#include <iostream>
#include <string>

int main()
{
using namespace std;

wstring s;

wcin>> s;


wcout<< s<< endl;
}


displays nothing when I enter greek text.


This happens both with g++ under Linux and with VC++ 2008 Express under
Windows; the latter saved the source file as Unicode after it detected
non-English text.
 
Ioannis Vranos

Made more precise:

Ioannis said:
For example:

[john@localhost src]$ cat main.cc
#include <iostream>

int main()
{
using namespace std;

wcout<< L"Δοκιμαστικό μήνυμα\n";

Are you sure that you stored your source file in the same encoding the
compiler expects as source character set?

}

[john@localhost src]$ ./foobar-cpp
??????????? ??????

[john@localhost src]$


Well I created the file with anjuta editor with the message being a
Greek one. The Greek message also appears the same when I display the
source file in the console.

I suppose it is saved as UTF8.


Also the code

#include <iostream>
#include <string>

int main()
{
using namespace std;

wstring s;
wcin>> s;

wcout<< s<< endl;
}

displays the Greek text when I enter it, but outputs nothing. With
English text, the text is displayed both when entered and when output.


[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό

[john@localhost src]$ ./foobar-cpp
Test
Test
[john@localhost src]$
 
Jeff Schwab

Ioannis said:
Ioannis said:
Has anyone actually managed to print non-English text by using wcout
or wprintf and the rest of standard, wide character functions?


For example:

[john@localhost src]$ cat main.cc
#include <iostream>

int main()
{
using namespace std;

wcout<< L"Δοκιμαστικό μήνυμα\n";
}

[john@localhost src]$ ./foobar-cpp
??????????? ??????

[john@localhost src]$

Hmmm... I work almost entirely in English, so this error message is new
to me:

$ make
g++ -ansi -pedantic -Wall main.cc -o main
main.cc: In function 'int main()':
main.cc:4: error: converting to execution character set: Invalid or
incomplete multibyte or wide character
make: *** [main] Error 1
 
Boris

[...] displays the Greek text when I enter it, but outputs nothing. With
English text, the text is displayed both when entered and when output.

I don't remember the details anymore, but the problem has something to do
with codecvt: your wide characters are automatically converted to narrow
characters by wcout. This is something you might not want (and even if you
want it, the conversion might not work automatically the way you expect :).

Try writing to a wstringstream and converting to UTF-8 explicitly (storing
the result e.g. in a string). If your console supports UTF-8 you can print to
cout (otherwise print to a file so you can test the output in an editor).

HTH,
Boris
 
Ioannis Vranos

Jeff said:
Ioannis said:
Ioannis said:
Has anyone actually managed to print non-English text by using wcout
or wprintf and the rest of standard, wide character functions?


For example:

[john@localhost src]$ cat main.cc
#include <iostream>

int main()
{
using namespace std;

wcout<< L"Δοκιμαστικό μήνυμα\n";
}

[john@localhost src]$ ./foobar-cpp
??????????? ??????

[john@localhost src]$

Hmmm... I work almost entirely in English, so this error message is new
to me:

$ make
g++ -ansi -pedantic -Wall main.cc -o main
main.cc: In function 'int main()':
main.cc:4: error: converting to execution character set: Invalid or
incomplete multibyte or wide character
make: *** [main] Error 1


I tried the same:

[john@localhost src]$ g++ -ansi -pedantic-errors -Wall main.cc -o
foobar-cpp

[john@localhost src]$


Perhaps when you copy and paste the greek text, you copy garbage (that
is, not viewing the message in the correct character set in your
newsgroup reader).


So, I repost the code in this message which is encoded to Unicode (UTF-8):


#include <iostream>

int main()
{
using namespace std;

wcout<< L"Δοκιμαστικό μήνυμα\n";
}
 
Jeff Schwab

Ioannis said:
Jeff said:
Ioannis said:
Ioannis Vranos wrote:
Has anyone actually managed to print non-English text by using wcout
or wprintf and the rest of standard, wide character functions?


For example:

[john@localhost src]$ cat main.cc
#include <iostream>

int main()
{
using namespace std;

wcout<< L"Δοκιμαστικό μήνυμα\n";
}

[john@localhost src]$ ./foobar-cpp
??????????? ??????

[john@localhost src]$

Hmmm... I work almost entirely in English, so this error message is
new to me:

$ make
g++ -ansi -pedantic -Wall main.cc -o main
main.cc: In function 'int main()':
main.cc:4: error: converting to execution character set: Invalid or
incomplete multibyte or wide character
make: *** [main] Error 1


I tried the same:

[john@localhost src]$ g++ -ansi -pedantic-errors -Wall main.cc -o
foobar-cpp

[john@localhost src]$


Perhaps when you copy and paste the greek text, you copy garbage (that
is, not viewing the message in the correct character set in your
newsgroup reader).


So, I repost the code in this message which is encoded to Unicode (UTF-8):


#include <iostream>

int main()
{
using namespace std;

wcout<< L"Δοκιμαστικό μήνυμα\n";
}

Thanks, you were correct.

Here's what I thought was "supposed" to be the portable solution:

#include <iostream>
#include <locale>

int main() {
std::wcout.imbue(std::locale("el_GR.UTF-8"));
std::wcout << L"Δοκιμαστικό μήνυμα\n";
}

However, my system still shows question marks for this. For whatever
it's worth, here's the (probably incorrect) way that appears to work on
my system:

#include <iostream>
#include <locale>

int main() {
std::cout.imbue(std::locale(""));
std::cout << "Δοκιμαστικό μήνυμα\n";
}
 
Ioannis Vranos

Jeff said:
Thanks, you were correct.

Here's what I thought was "supposed" to be the portable solution:

#include <iostream>
#include <locale>

int main() {
std::wcout.imbue(std::locale("el_GR.UTF-8"));
std::wcout << L"Δοκιμαστικό μήνυμα\n";
}

However, my system still shows question marks for this. For whatever
it's worth, here's the (probably incorrect) way that appears to work on
my system:

#include <iostream>
#include <locale>

int main() {
std::cout.imbue(std::locale(""));
std::cout << "Δοκιμαστικό μήνυμα\n";
}


"Strangely" these also happen to my Linux box with "gcc version 4.1.2
20070626".

cout prints Greek without the L notation to the string literal.

The same with wcout prints an empty line.

The same with wcout and L notation prints question marks.


This made me think to use plain cout, and it also works:


#include <iostream>

int main()
{
std::cout << "Δοκιμαστικό μήνυμα\n";
}

also prints the Greek message.


Seeing this I am assuming char is implemented as unsigned char and this
is working because Greek is provided in the extended ASCII character set
(values 128-255) supported by my system (I have set the regional
settings under GNOME etc). However why does this also work for you?


The code


#include <iostream>
#include <limits>

int main()
{
using namespace std;

cout<< static_cast<int>( numeric_limits<char>::max() )<< endl;
}

produces in my system:

[john@localhost src]$ ./foobar-cpp
127

[john@localhost src]$


so I am wrong, char is implemented as signed char, and no extended ASCII
takes place.


Strange.
 
Ioannis Vranos

Based on the MSDN example:


// basic_ios_imbue.cpp
// compile with: /EHsc
#include <iostream>
#include <locale>

int main( )
{
using namespace std;

cout.imbue( locale( "french_france" ) );
double x = 1234567.123456;
cout << x << endl;
}


that doesn't work in my GCC, this works:

#include <iostream>
#include <locale>

int main()
{
using namespace std;

cout.imbue( locale( "greek" ) );

cout<< "Δοκιμαστικό\n";
}


This also works:

#include <iostream>
#include <locale>

int main()
{
using namespace std;

cout.imbue( locale( "en_US" ) );

cout<< "Δοκιμαστικό\n";
}




Crazy stuff.
 
Ioannis Vranos

It looks like GCC has things the wrong way round: cout, cin and string
behave the way I would expect wcout, wcin and wstring to behave, and vice
versa! Bug?



#include <iostream>

int main()
{
using namespace std;

wstring ws;

wcin>> ws;

cout<< ws.size()<< endl;
}



[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
0
[john@localhost src]$



#include <iostream>

int main()
{
using namespace std;

string s;

cin>> s;

cout<< s.size()<< endl;
}


[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
22
[john@localhost src]$


#include <iostream>

int main()
{
using namespace std;

string s;

cin>> s;

cout<< s<< endl;
}


[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
Δοκιμαστικό
[john@localhost src]$



#include <iostream>

int main()
{
using namespace std;

wstring ws;

wcin>> ws;

wcout<< ws<< endl;
}


[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό

[john@localhost src]$



#include <iostream>

int main()
{
using namespace std;

cout<< "Δοκιμαστικό-11\n";

wcout<< "Δοκιμαστικό-22\n";

cout<< L"Δοκιμαστικό-33\n";

wcout<< L"Δοκιμαστικό-44\n";
}


[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό-11
-22
0x80488c8���������-44
[john@localhost src]$



Conclusion: It appears GCC has the wide character stuff messed up, or I
am missing important knowledge.
 
Jeff Schwab

Ioannis said:
It looks like GCC has the opposite stuff, cout, cin, string work as
wcout, wcin, wstring and vice versa! Bug? ....
Conclusion: It appears GCC has the wide character stuff messed up, or I
am missing important knowledge.

You and me both. I would be very surprised if this were a GCC bug (I'm
using 4.2.4 pre-release), but I'm guessing somebody here knows a lot
more about this than we do, and is willing to enlighten us. :)
 
Alf P. Steinbach

* Jeff Schwab:
You and me both. I would be very surprised if this were a GCC bug (I'm
using 4.2.4 pre-release), but I'm guessing somebody here knows a lot
more about this than we do, and is willing to enlighten us. :)

As has been remarked else-thread, by Rolf Magnus, one issue, relevant
for literal strings, is the compiler's translation (or lack of
translation) of the source code text's character set to the execution
character set.

And as has also been remarked else-thread, by Boris, one issue, relevant
for i/o, is that the wide character streams convert to and from narrow
characters. wcout converts to narrow characters, and wcin converts from
narrow characters. They're not wide character streams, they're wide
character converters.

Assuming no issue with translation from source code character set to
execution character set, if you use only the narrow character streams
you avoid most translation. There's still translation of newlines and
possibly other characters (e.g. Ctrl Z in Windows). Thus, using UTF-8
source code and UTF-8 execution environment character set, and (mostly)
non-translating narrow character streams, everything should work swimmingly.

Another reason to avoid the wide character streams is that they're not
supported by the MingW Windows port of g++.

At least, not in the version I have.

And as I understand it, UTF-8 is the usual encoding in the *nix world.

For an interactive Windows program, you can set the console's narrow
character stream translation (to/from UCS2, which is what a console
window uses internally) temporarily to UTF-8 via Windows' console API
functions.


Disclaimer: I've never tried this for greek text + UTF-8 encoding,
because I've not had to deal with that particular issue.

Cheers, & hth.,

- Alf
 
Jeff Schwab

Alf said:
* Jeff Schwab:

As has been remarked else-thread, by Rolf Magnus, one issue, relevant
for literal strings, is the compiler's translation (or lack of
translation) of the source code text's character set to the execution
character set.

A good point. I know my source is in UTF-8. I don't know what
influences the execution character set, or how to tweak it.

And as has also been remarked else-thread, by Boris, one issue, relevant
for i/o, is that the wide character streams convert to and from narrow
characters. wcout converts to narrow characters, and wcin converts from
narrow characters. They're not wide character streams, they're wide
character converters.

Clear as mud. :)
 
Ioannis Vranos

Alf said:
* Jeff Schwab:

As has been remarked else-thread, by Rolf Magnus, one issue, relevant
for literal strings, is the compiler's translation (or lack of
translation) of the source code text's character set to the execution
character set.


There isn't such issue here, cout prints Greek literal correctly and
wcout not. Also cin and string read and store Greek text correctly while
wcin and wstring look like they do not work for Greek text input.

And as has also been remarked else-thread, by Boris, one issue, relevant
for i/o, is that the wide character streams convert to and from narrow
characters. wcout converts to narrow characters, and wcin converts from
narrow characters. They're not wide character streams, they're wide
character converters.

I am not sure I understand this.

Isn't L"some text" a wide character string literal? Don't wcout, wcin
and wstring provide operator<< and operator>> overloads for wide
characters and wide character strings?

Assuming no issue with translation from source code character set to
execution character set, if you use only the narrow character streams
you avoid most translation.


What do you mean by "narrow character" streams? char streams right?

There's still translation of newlines and
possibly other characters (e.g. Ctrl Z in Windows). Thus, using UTF-8
source code and UTF-8 execution environment character set, and (mostly)
non-translating narrow character streams, everything should work
swimmingly.

Another reason to avoid the wide character streams is that they're not
supported by the MingW Windows port of g++.


This is irrelevant. MINGW's problems are MINGW problems, I am using GCC
under Linux (Scientific Linux 5.1 which is essentially Red Hat
Enterprise Linux 5.1 source code recompiled, like CentOS - give them a try).

Also I have MS Visual C++ 2008 Express installed.

At least, not in the version I have.

And as I understand it UTF-8 is the usual in the *nix world.

For an interactive Windows program, you can set the console's narrow
character stream translation (to/from UCS2, which is what a console
window uses internally) temporarily to UTF-8 via Windows' console API
functions.


Disclaimer: I've never tried this for greek text + UTF-8 encoding,
because I've not had to deal with that particular issue.


Can you pinpoint where our code is wrong? Essentially the following:
#include <iostream>
#include <string>

int main()
{
using namespace std;

wcout<< "Give wide character input: ";

wstring ws;

wcin>> ws;

wcout<< "You gave: "<< ws << endl;
}


It produces:

[john@localhost src]$ ./foobar-cpp
Give wide character input: Δοκιμαστικό
You gave:
[john@localhost src]$



while the code:

#include <iostream>
#include <string>

int main()
{
using namespace std;

cout<< "Give wide character input: ";

string s;

cin>> s;

cout<< "You gave: "<< s << endl;
}


produces:

[john@localhost src]$ ./foobar-cpp
Give wide character input: Δοκιμαστικό
You gave: Δοκιμαστικό
[john@localhost src]$
 
Ioannis Vranos

I posted the following to c.l.c., and I think it is useful to post it
here too:


[The current message encoding is set to Unicode (UTF-8) because it
contains Greek]


The following code does not work as expected:


#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
char *p= setlocale( LC_ALL, "Greek" );

wchar_t input[50];

if (!p)
printf("NULL returned!\n");

fgetws(input, 50, stdin);

wprintf(L"%s\n", input);

return 0;
}


Under Linux:


[john@localhost src]$ ./foobar-cpp
Test
T
[john@localhost src]$


[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
�
[john@localhost src]$




Under MS Visual C++ 2008 Express:

Test
Test

Press any key to continue . . .


Δοκιμαστικό
??????ε????

Press any key to continue . . .


Am I missing something?
 
James Kanze

Ioannis said:
Ioannis said:
Has anyone actually managed to print non-English text by
using wcout or wprintf and the rest of standard, wide
character functions?
For example:
[john@localhost src]$ cat main.cc
#include <iostream>
int main()
{
using namespace std;
wcout<< L"Δοκιμαστικό μήνυμα\n";
Are you sure that you stored your source file in the same
encoding the compiler expects as source character set?

Are you sure the compiler even allows anything but US ASCII as
input? The standard makes most of this implementation defined.
(Logically, if you think about it. I wouldn't expect any of my
files to compile without being transcoded on a machine which
uses EBCDIC.)

Before going any further, we have to know 1) how the Greek
characters are encoded (probably UTF-8, since that's what my
editor is configured for, and I'm seeing them correctly), and
2) which compiler he's using, with which options, and what the
compiler documentation says about input file encodings. Most
likely, he'll have to ask in a group for his compiler what it
accepts, and how to make it accept what he's got.
 
James Kanze

Thanks, you were correct.
Here's what I thought was "supposed" to be the portable solution:
#include <iostream>
#include <locale>
int main() {
std::wcout.imbue(std::locale("el_GR.UTF-8"));
std::wcout << L"Δοκιμαστικό μήνυμα\n";
}
However, my system still shows question marks for this. For
whatever it's worth, here's the (probably incorrect) way that
appears to work on my system:
#include <iostream>
#include <locale>
int main() {
std::cout.imbue(std::locale(""));
std::cout << "Δοκιμαστικό μήνυμα\n";
}

You're still not telling us a lot of important information.
What is the actual encoding used in the source file, and what
are the bytes actually output. (FWIW: I think g++, and most
other compilers, just pass the bytes through transparently in a
narrow character string. Which means that your second code will
output whatever your editor put in the source file. If you're
using the same encoding everywhere, it will seem to work.)

Note that there isn't really any portable solution, because so
much depends on things the C++ compiler has no control over.
Run the same code in two different xterm, and it can output two
different things, completely; just specify a different font
(option -fn) with a different encoding for one of the xterm.
(And of course, it's pretty much par for the course to see one
thing when you cat to the screen, and something else when you
output the same file to the printer.)
 
