Umlaut letters in C++

Pekka Jarvela

I am using Visual Studio C++ .NET and when I try to print words with
umlaut letters, for instance

printf("Pässinpää-ääliö");

letters with dots over them, äö, will not be printed correctly on the
screen. I tried the trick

#ifdef _UNICODE
int wmain(void)
#else
int main(void)
#endif

but it didn't help. How can I get printf to produce umlaut letters
correctly?

Pekka
 
Stewart Gordon

Pekka said:
I am using Visual Studio C++ .NET and when I try to print words with
umlaut letters, for instance

printf("Pässinpää-ääliö");

letters with dots over them, äö, will not be printed correctly on the
screen.
<snip>

Assuming you mean it's printing 'different' characters, it's a character
set issue.

Windows uses the ANSI character set (with a few additions), at least
when it isn't using Unicode.

Your program is obviously running in a DOS window. DOS uses the IBM
character set (one of various versions thereof). So what you are
probably seeing is the IBM characters with the same codes as the ANSI
characters you're typing in your (presumably) Windows-based editor.

Look up the codes here:

http://www.i18nguy.com/unicode/codepages.html#ibmdos
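
To make the mismatch concrete, here is a minimal sketch of the translation that would be needed before printing to the console. It assumes the source bytes are Windows-1252 and the console uses DOS code page 437 or 850 (the letters below sit at the same positions in both). The names ansi_byte_to_oem and ansi_text_to_oem are made up for this illustration, and only a handful of letters are covered. On Windows itself, the Win32 function CharToOemA does this kind of conversion for the system's active code pages.

```cpp
#include <string>

// Hypothetical helper: translate one Windows-1252 ("ANSI") byte to the byte
// that shows the same letter in DOS code pages 437 and 850. Only the letters
// listed are handled; every other byte is returned unchanged.
inline unsigned char ansi_byte_to_oem(unsigned char c) {
    switch (c) {
        case 0xE4: return 0x84;  // a with diaeresis
        case 0xF6: return 0x94;  // o with diaeresis
        case 0xE5: return 0x86;  // a with ring
        case 0xC4: return 0x8E;  // A with diaeresis
        case 0xD6: return 0x99;  // O with diaeresis
        case 0xC5: return 0x8F;  // A with ring
        default:   return c;
    }
}

// Apply the byte translation to a whole string and return the result.
inline std::string ansi_text_to_oem(std::string text) {
    for (char& ch : text) {
        ch = static_cast<char>(ansi_byte_to_oem(static_cast<unsigned char>(ch)));
    }
    return text;
}
```

Passing the result of ansi_text_to_oem("P\344ssinp\344\344-\344\344li\366") to printf would then send the bytes that a code page 437/850 console draws as the intended letters.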

Stewart.
 
JKop

Pekka Jarvela posted:
I am using Visual Studio C++ .NET and when I try to print words with
umlaut letters, for instance

printf("Pässinpää-ääliö");

letters with dots over them, äö, will not be printed correctly on the
screen. I tried the trick

#ifdef _UNICODE
int wmain(void)
#else
int main(void)
#endif

but it didn't help. How can I get printf to produce umlaut letters
correctly?

Pekka


Firstly,

Windows 95 through Windows ME were all ANSI, i.e. 8-bit characters = 256
possible different characters. If you wanted foreign characters, e.g. Arabic
or Chinese, then you had to install a different codepage. You had to switch
between codepages and could not display them both at once.


All versions of Windows NT, including Windows 2000, were Unicode, i.e. 16-bit
characters = 65,536 possible different characters.


With Windows XP came hope: all versions are Unicode, both Home and
Professional editions.


But still, here comes a bit of irony: On my system, WinXP Professional, the
following

MessageBoxA(blah,"€6.72",blah,blah); //ANSI version


works perfectly, i.e. the euro sign _is_ displayed, but:


MessageBoxW(blah,L"€6.72",blah,blah); //Unicode version


does _not_ display the euro sign!!
 
josh

JKop said:
Pekka Jarvela posted:
I am using Visual Studio C++ .NET and when I try to print words with
umlaut letters, for instance

printf("Pässinpää-ääliö");

letters with dots over them, äö, will not be printed correctly on the
screen. I tried the trick
[...]
Windows 95 through Windows ME were all ANSI, i.e. 8-bit characters = 256
possible different characters. If you wanted foreign characters, e.g.
Arabic or Chinese, then you had to install a different codepage. You had
to switch between codepages and could not display them both at once.

Not quite. Win9x use multi-byte character sets in some locales, certainly
Chinese. So you can have more than 256 characters, but each character can
take more than one char.
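
As a rough illustration of "more than one char per character", the sketch below counts characters rather than bytes in a string encoded in a double-byte code page. It assumes Simplified Chinese code page 936, where lead bytes are taken to lie in 0x81..0xFE; the function names are invented for this example. On Windows, IsDBCSLeadByteEx is the API that answers the lead-byte question for a given code page.

```cpp
#include <cstddef>
#include <string>

// Assumption: in code page 936 a byte in 0x81..0xFE starts a two-byte character.
inline bool is_cp936_lead_byte(unsigned char c) {
    return c >= 0x81 && c <= 0xFE;
}

// Count characters, not bytes: a lead byte plus the byte after it form one
// character; any other byte is a character on its own.
inline std::size_t count_cp936_chars(const std::string& text) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (is_cp936_lead_byte(static_cast<unsigned char>(text[i])) &&
            i + 1 < text.size()) {
            ++i;  // skip the trail byte
        }
        ++count;
    }
    return count;
}
```

For the four-byte string "A\326\320B" (an ASCII letter, one two-byte Chinese character, another ASCII letter) this counts three characters.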

Also, if you've got "Microsoft Layer for Unicode" installed, you can use
Unicode on Win9x.

JKop said:
All versions of Windows NT, including Windows 2000, were Unicode, i.e.
16-bit characters = 65,536 possible different characters.

With Windows XP came hope: all versions are Unicode, both Home and
Professional editions.

XP is NT. Internally, everything is done with UCS-2, but applications
compiled for ANSI still get ANSI of some flavor.

Practically, it doesn't matter too much for the application.

JKop said:
But still, here comes a bit of irony: On my system, WinXP
Professional, the following

MessageBoxA(blah,"€6.72",blah,blah); //ANSI version

works perfectly, i.e. the euro sign _is_ displayed, but:

MessageBoxW(blah,L"€6.72",blah,blah); //Unicode version

does _not_ display the euro sign!!

This is because your source code is ANSI, so you're entering the euro
symbol using the Microsoft-specific code 128. In ANSI mode, Windows maps
that to the appropriate Unicode codepoint 0x20AC before displaying it.
I'd guess that in Unicode mode, the compiler naively maps that to Unicode
codepoint 0x0080, which is not the euro symbol.

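Try using \x20AC in place of the euro sign in the Unicode version. A \x
escape absorbs every hex digit that follows it, so keep the 6 out of its
reach, e.g. L"\x20AC" L"6.72".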
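
Here is a small sketch of the difference between the two conversions described above. It assumes the ANSI code page is Windows-1252; widen_naively, widen_cp1252 and euro_price are hypothetical names for this illustration, and widen_cp1252 handles only the euro byte (the other bytes in 0x80..0x9F would need their own table entries).

```cpp
#include <string>

// What a naive byte-to-wide conversion does: the byte value is reused as the
// code point, so 0x80 becomes U+0080 (a control character), not the euro sign.
inline wchar_t widen_naively(unsigned char byte) {
    return static_cast<wchar_t>(byte);
}

// Hypothetical helper covering only the case discussed here: Windows-1252
// stores the euro sign at 0x80, while Unicode puts it at U+20AC.
inline wchar_t widen_cp1252(unsigned char byte) {
    return byte == 0x80 ? static_cast<wchar_t>(0x20AC)
                        : static_cast<wchar_t>(byte);
}

// \u takes exactly four hex digits, so the '6' that follows is not absorbed
// into the escape the way it would be after \x20AC.
inline std::wstring euro_price() {
    return L"\u20AC6.72";
}
```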

JKop said:
Umlauted characters _are_ included in ANSI, so I presume your problemo
may simply be that the umlauted characters are _not_ in the font
you're using. Try changing font.

Or not in the ANSI codepage you're using. Actually, console windows tend
to use the OEM codepage, which will be distinct from any ANSI codepage
(in particular the one that the IDE is probably using).

I would recommend not using non-ASCII characters in source code, and in
console windows.

And if you just want to make it work now, look for a font that uses the
OEM codepage, like Terminal or Lucida ConsoleP (note the P), in your
editor. (or in charmap, since it may be hard to enter accented characters
in the OEM codepage)

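#include <cstdio>

int main()
{
    // OEM (DOS) code page 437/850: \204 = a-umlaut, \224 = o-umlaut
    std::printf("P\204ssinp\204\204-\204\204li\224\n");
    // ANSI (Windows-1252): \344 = a-umlaut, \366 = o-umlaut
    std::printf("P\344ssinp\344\344-\344\344li\366\n");
    return 0;
}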

At the console window, the first line will be correct. Piped to a file
and opened in notepad, at least with the "Windows: Western" (almost ISO-
8859-1) codepage, the second line will be correct. (The second is
identical to what Pekka Jarvela posted.)

wprintf(L"P\344ssinp\344\344-\344\344li\366\n");

should look right when opened in Unicode-capable notepad, although you'd
need to make sure your stdout was Unicode. (I'm not really familiar with
wprintf...)

-josh
 
