Umlaut letters in C++

Discussion in 'C++' started by Pekka Jarvela, Apr 28, 2004.

  1. I am using Visual Studio C++ .NET and when I try to print words with
    umlaut letters, for instance

    printf("Pässinpää-ääliö");

    letters with dots over them, äö, will not be printed correctly on the
    screen. I tried the trick

    #ifdef _UNICODE
    int wmain(void)
    #else
    int main(void)
    #endif

    but it didn't help. How can I get printf to produce umlaut letters
    correctly?

    Pekka
    Pekka Jarvela, Apr 28, 2004

  2. Pekka Jarvela wrote:
    > I am using Visual Studio C++ .NET and when I try to print words with
    > umlaut letters, for instance
    >
    > printf("Pässinpää-ääliö");
    >
    > letters with dots over them, äö, will not be printed correctly on the
    > screen.

    <snip>

    Assuming you mean it's printing 'different' characters, it's a character
    set issue.

    Windows uses the ANSI character set (for Western European languages, code
    page Windows-1252, which is ISO 8859-1 with a few additions), at least
    when it isn't using Unicode.

    Your program is obviously running in a console ("DOS") window, which uses
    the IBM PC (OEM) character set (one of various versions thereof, e.g. code
    page 437 or 850). So what you are probably seeing is the IBM characters
    with the same codes as the ANSI characters you're typing in your
    (presumably) Windows-based editor.

    Look up the codes here:

    http://www.i18nguy.com/unicode/codepages.html#ibmdos

    Stewart.

    --
    My e-mail is valid but not my primary mailbox, aside from its being the
    unfortunate victim of intensive mail-bombing at the moment. Please keep
    replies on the 'group where everyone may benefit.
    Stewart Gordon, Apr 28, 2004

  3. JKop (Guest)

    OFF TOPIC: UMLAUT

    Pekka Jarvela posted:

    > I am using Visual Studio C++ .NET and when I try to print words with
    > umlaut letters, for instance
    >
    > printf("Pässinpää-ääliö");
    >
    > letters with dots over them, äö, will not be printed correctly on the
    > screen. I tried the trick
    >
    > #ifdef _UNICODE
    > int wmain(void)
    > #else
    > int main(void)
    > #endif
    >
    > but it didn't help. How can I get printf to produce umlaut letters
    > correctly?
    >
    > Pekka



    Firstly,

    Windows 95 -> Windows ME were all ANSI, i.e. 8-bit characters = 256 possible
    different characters. If you wanted foreign characters, e.g. Arabic,
    Chinese, then you had to install a different codepage. You had to switch
    between codepages and could not display them both at once.


    All versions of Windows NT, including Windows 2000, were Unicode, i.e.
    16-bit characters = 65,536 possible different characters.


    With Windows XP came hope: all versions are Unicode, both the Home and
    Professional editions.


    But still, here comes a bit of irony: On my system, WinXP Professional, the
    following

    MessageBoxA(NULL, "€6.72", NULL, MB_OK); // ANSI version


    works perfectly, i.e. the euro sign _is_ displayed, but:


    MessageBoxW(NULL, L"€6.72", NULL, MB_OK); // Unicode version


    does _not_ display the euro sign!!

    --

    Umlauted characters _are_ included in ANSI, so I presume your problem may
    simply be that the umlauted characters are _not_ in the font you're using.
    Try changing the font.
    JKop, Apr 28, 2004
  4. josh (Guest)

    Re: OFF TOPIC: UMLAUT

    JKop wrote:

    > Pekka Jarvela posted:
    >> I am using Visual Studio C++ .NET and when I try to print words with
    >> umlaut letters, for instance
    >>
    >> printf("Pässinpää-ääliö");
    >>
    >> letters with dots over them, äö, will not be printed correctly on the
    >> screen. I tried the trick

    [...]
    > Windows 95 -> Windows ME were all ANSI, i.e. 8-bit characters = 256
    > possible different characters. If you wanted foreign characters, e.g.
    > Arabic, Chinese, then you had to install a different codepage. You had
    > to switch between codepages and could not display them both at once.


    Not quite. Win9x uses multi-byte character sets in some locales, Chinese
    certainly among them. So you can have more than 256 characters, but a
    single character may take more than one char.

    Also, if you've got "Microsoft Layer for Unicode" installed, you can use
    Unicode on Win9x.

    > All versions of Windows NT, including Windows 2000, were Unicode, i.e.
    > 16-bit characters = 65,536 possible different characters.
    >
    > With Windows XP came hope: all versions are Unicode, both the Home and
    > Professional editions.


    XP is NT. Internally, everything is done with UCS-2, but applications
    compiled for ANSI still get ANSI of some flavor.

    Practically, it doesn't matter too much for the application.

    > But still, here comes a bit of irony: On my system, WinXP
    > Professional, the following
    >
    > MessageBoxA(NULL, "€6.72", NULL, MB_OK); // ANSI version
    >
    > works perfectly, i.e. the euro sign _is_ displayed, but:
    >
    > MessageBoxW(NULL, L"€6.72", NULL, MB_OK); // Unicode version
    >
    > does _not_ display the euro sign!!


    This is because your source code is ANSI, so you're entering the euro
    symbol using the Microsoft-specific code 128. In ANSI mode, Windows maps
    that to the appropriate Unicode codepoint 0x20AC before displaying it.
    I'd guess that in Unicode mode, the compiler naively maps that to Unicode
    codepoint 0x0080, which is not the euro symbol.

    Try using the escape \x20AC in the Unicode version's wide string literal
    instead of the raw euro character.

    > Umlauted characters _are_ included in ANSI, so I presume your problem
    > may simply be that the umlauted characters are _not_ in the font
    > you're using. Try changing the font.


    Or not in the ANSI codepage you're using. Actually, console windows tend
    to use the OEM codepage, which will be distinct from any ANSI codepage
    (in particular, from the one the IDE is probably using).

    I would recommend avoiding non-ASCII characters both in source code and
    in console windows.

    And if you just want to make it work now, look for a font that uses the
    OEM codepage, like Terminal or Lucida ConsoleP (note the P), in your
    editor. (or in charmap, since it may be hard to enter accented characters
    in the OEM codepage)

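    #include <cstdio>

    int main()
    {
        // OEM code page 437/850: \204 (0x84) = ä, \224 (0x94) = ö
        std::printf("P\204ssinp\204\204-\204\204li\224\n");
        // ANSI Windows-1252 / ISO 8859-1: \344 (0xE4) = ä, \366 (0xF6) = ö
        std::printf("P\344ssinp\344\344-\344\344li\366\n");
        return 0;
    }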

    In the console window, the first line will be correct. Piped to a file
    and opened in Notepad, at least with the "Windows: Western" (almost
    ISO-8859-1) codepage, the second line will be correct. (The second is
    identical to what Pekka Jarvela posted.)

    wprintf(L"P\344ssinp\344\344-\344\344li\366\n");

    should look right when opened in Unicode-capable notepad, although you'd
    need to make sure your stdout was Unicode. (I'm not really familiar with
    wprintf...)

    -josh
    josh, Apr 29, 2004

Similar Threads
  1. MarkusPoehler

    Form Post looses Umlaut

    MarkusPoehler, Jul 25, 2005, in forum: ASP .Net
    Replies: 7, Views: 618
    Joerg Jooss, Jul 26, 2005
  2. Reinier
    Replies: 1, Views: 410
    Craig Deelsnyder, Mar 31, 2006
  3. vijay
    Replies: 10, Views: 2,176
    vijay, Apr 7, 2006
  4. Merrigan
    Replies: 4, Views: 558
    Chris, Dec 14, 2007
  5. Venugopal
    Replies: 11, Views: 1,497
    Tassilo v. Parseval, Nov 5, 2003