French characters not recognised in C?

Discussion in 'C Programming' started by Ess355, Apr 2, 2004.

  1. Ess355

    Ess355 Guest

    Hi,

    In the debugger at run time, characters like é are not recognised by
    their normal ASCII number, but something like -8615722... . I've seen
    this number before, it means "rubbish" right?

    So how can I possibly modify my program so that French characters get
    recognised?

    Thanks in advance,
    Ehsan.
     
    Ess355, Apr 2, 2004
    #1

  2. On Thu, 1 Apr 2004, Ess355 wrote:
    >
    > In the debugger at run time, characters like é are not recognised by
    > their normal ASCII number, but something like -8615722... .


    That doesn't make a whole lot of sense. What do you mean, "characters
    ... are not recognized by their normal ASCII number"? First of all,
    é doesn't *have* an ASCII number. Second, assuming you've
    picked an encoding somehow and you're expecting to see é displayed
    correctly, what's going wrong?
    Do you type é at the keyboard and your program doesn't recognize
    it?
    Do you type é in your source code and it doesn't display
    correctly?
    Do you type é in your source code and it refuses to compile at
    all?

    In general, the C programming language only deals with a very restricted
    "basic character set," which doesn't contain things like é. If
    you want to display or process that sort of input or output, you'll need
    to either find a compiler with nice language support; find a library that
    handles your national encoding(s) or Unicode; or roll your own library.
    'wchar_t' and the wchar functions might be useful to you, too; read the
    manpages for them or Google 'wchar_t manpage' for details.

    > So how can I possible modify my program so that french characters get
    > recognised?


    Depending on what exactly your problem is, you might try:

    * Posting to fr.comp.lang.c or another French-language group.
    * Getting a better compiler.
    * Using 'wchar_t' in place of 'char'.
    * Using a translation library that can convert between French encodings
    and a useful ASCII encoding of the same text, e.g.: é -> \'e

    If you post a complete, compilable, minimal program that demonstrates
    the problem, someone here might be able to help you more. But
    fr.comp.lang.c sounds like a better bet to me.

    HTH,
    -Arthur
     
    Arthur J. O'Dwyer, Apr 2, 2004
    #2

  3. On Thu, 01 Apr 2004 21:21:06 -0500, Ess355 wrote:
    > In the debugger at run time, characters like é are not recognised by
    > their normal ASCII number, but something like -8615722... . I've seen
    > this number before, it means "rubbish" right?
    >
    > So how can I possible modify my program so that french characters get
    > recognised?


    By default, most platforms (all?) will execute programs in the "C"
    locale which only supports ASCII. ASCII is a 7-bit encoding/charset that
    does not support European characters. You might try adding a call to
    setlocale like:

    setlocale(LC_CTYPE, "");

    This will check some environment variables to determine the locale
    you're running in. You can force a specific locale with a call like
    setlocale(LC_ALL, "fr_FR"), but you may or may not want to do that,
    depending on the source of the characters.

    Or you might need to run the debugger in a different locale. For example
    on Unix systems a very simple way to run a program in a different locale
    is by preceding the command with an environment variable like:

    $ LC_CTYPE=fr_CA dbug ./myproggie

    Mike
     
    Michael B Allen, Apr 2, 2004
    #3
  4. Dan Pop

    Dan Pop Guest

    Michael B Allen writes:

    >On Thu, 01 Apr 2004 21:21:06 -0500, Ess355 wrote:
    >> In the debugger at run time, characters like é are not recognised by
    >> their normal ASCII number, but something like -8615722... . I've seen
    >> this number before, it means "rubbish" right?
    >>
    >> So how can I possible modify my program so that french characters get
    >> recognised?

    >
    >By default, most platforms (all?) will execute programs in the "C"
    >locale which only supports ASCII.


    Nope. By default most platforms will use one 8-bit extension to ASCII or
    another in the "C" locale. The others will use one EBCDIC flavour (code
    page) or another. In principle, one could attach a KSR-33 to a serial
    port (and figure out how to set the speed of that port to 110 bps), just
    to prove me wrong ;-)

    This can be easily tested with a trivial program like this:

    #include <stdio.h>

    int main(void)
    {
        /* Three character codes above 127 (octal 376, 375, 374). */
        printf("\376\375\374 Hello world\n");
        return 0;
    }

    >ASCII is a 7bit encoding/charset that
    >does not support european characters. You might try adding a call to
    >setlocale like:
    >
    > setlocale(LC_CTYPE, "");


    You're really naive if you believe that this will change the character
    set used by the implementation. It will merely change the behaviour of
    certain functions that are affected by the current locale.

    In practice, it is the user's job to select a character set suitable for
    his locale and to set the default native locale accordingly.

    >This will check some environment variables to determine the locale
    >your running in. You can force a specific locale like setlocal(LC_ALL,
    >"fr_FR") but you may or may not want to do that depending on the source
    >of the characters.


    1. Where did you get the idea that "fr_FR" is a valid locale name from?
    May I have the chapter and verse?

    2. If the user has a Russian terminal, selecting a French locale won't
    make Latin-1 characters appear as intended.

    >Or you might need to run the debugger in a different locale. For example
    >on Unix systems a very simple way to run a program in a different locale
    >is by preceeding the command with an environment variable like:
    >
    > $ LC_CTYPE=fr_CA dbug ./myproggie


    Let's see:

    fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
    LC_CTYPE=fr_CA: Command not found.

    Doesn't Linux count as a Unix system any more? ;-)

    The issue is very simple in practice, but extremely difficult to describe
    in terms of what the C standard actually says. Each new C programmer
    should do a bit of experimenting, using programs like the one shown above,
    to see what happens when values above 127 (and, for pragmatic reasons,
    the range 128 - 159 should be avoided) are used as (unsigned) character
    values.

    Dan
    --
    Dan Pop
    DESY Zeuthen, RZ group
     
    Dan Pop, Apr 2, 2004
    #4
  5. On Fri, 02 Apr 2004 07:20:32 -0500, Dan Pop wrote:
    >>> In the debugger at run time, characters like é are not recognised by
    >>> their normal ASCII number, but something like -8615722... . I've seen
    >>> this number before, it means "rubbish" right?
    >>>
    >>> So how can I possible modify my program so that french characters get
    >>> recognised?

    >>
    >>By default, most platforms (all?) will execute programs in the "C"
    >>locale which only supports ASCII.

    >
    > Nope. By default most platforms will use one 8-bit extension to ASCII
    > or another in the "C" locale. The others will use one EBCDIC flavour
    > (code page) or another. In principle, one could attach a KSR-33 to a
    > serial port (and figure out how to set the speed of that port to 110
    > bps), just to prove me wrong ;-)
    >
    > This can be easily tested with a trivial program like this:
    >
    > #include <stdio.h>
    >
    > int main()
    > {
    > printf("\376\375\374 Hello world\n");
    > return 0;
    > }


    Why do you think this will give you the default behavior? If you run
    this on a fancy machine with extravagant libraries and locales available
    it will likely give you different results depending on what the default
    locale is. On my system this will print Latin1.

    >>ASCII is a 7bit encoding/charset that does not support european
    >>characters. You might try adding a call to setlocale like:
    >>
    >> setlocale(LC_CTYPE, "");

    >
    > You're really naive if you believe that this will change the character
    > set used by the implementation. It will merely change the behaviour of
    > certain functions that are affected by the current locale.


    What do you mean by "used by the implementation"? The OP said "at run
    time". On my system if I do:

    $ LANG=en_US.UTF-8 ./myproggie

    it indeed changes the behavior of how characters are interpreted
    at runtime. I said nothing about the charset or encoding used by the
    compiler or how string literals are stored in binaries.

    > Let's see:
    >
    > fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
    > LC_CTYPE=fr_CA: Command not found.
    >
    > Doesn't Linux count as a Unix system any more? ;-)


    Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
    not get too pedantic about it. You've embarrassed yourself enough by
    acknowledging you use C shell :->

    Mike
     
    Michael B Allen, Apr 2, 2004
    #5
  6. Richard Bos

    Richard Bos Guest

    Michael B Allen wrote:

    > On Fri, 02 Apr 2004 07:20:32 -0500, Dan Pop wrote:


    [ Quoting was buggered up-stream; the next bit is by Michael B Allen. ]

    > >>By default, most platforms (all?) will execute programs in the "C"
    > >>locale which only supports ASCII.

    > >
    > > Nope. By default most platforms will use one 8-bit extension to ASCII
    > > or another in the "C" locale. The others will use one EBCDIC flavour
    > > (code page) or another. In principle, one could attach a KSR-33 to a
    > > serial port (and figure out how to set the speed of that port to 110
    > > bps), just to prove me wrong ;-)
    > >
    > > This can be easily tested with a trivial program like this:
    > >
    > > #include <stdio.h>
    > >
    > > int main()
    > > {
    > > printf("\376\375\374 Hello world\n");
    > > return 0;
    > > }

    >
    > Why do you think this will give you the default behavior?


    It must, if compiled in ISO C mode. All programs start in the "C"
    locale. Even so...

    > If you run this on a fancy machine with extravagant libraries and
    > locales available it will likely give you different results depending
    > on what the default locale is. On my system this will print Latin1.


    ...even so, the char types must be at least 8-bit, which means that
    plain ASCII, being 7-bit, is out of the race from the start. Your
    default character set _must_ be either an (at least 8-bit) extension to
    ASCII, or something else entirely (most usually EBCDIC, which itself is
    rare enough, but not entirely unheard of).
    IOW, Dan's '\376' et al. must specify a valid member of the character
    set, even though they are not part of ASCII.

    > > fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
    > > LC_CTYPE=fr_CA: Command not found.
    > >
    > > Doesn't Linux count as a Unix system any more? ;-)

    >
    > Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
    > not get too pedantic about it. You've embarrassed yourself enough by
    > acknowledging you use C shell :->


    And what other shell did you expect to see used in _this_ newsgroup,
    then <g>?

    Richard
     
    Richard Bos, Apr 5, 2004
    #6
  7. Dan Pop

    Dan Pop Guest

    Michael B Allen writes:

    >On Fri, 02 Apr 2004 07:20:32 -0500, Dan Pop wrote:
    >>>> In the debugger at run time, characters like é are not recognised by
    >>>> their normal ASCII number, but something like -8615722... . I've seen
    >>>> this number before, it means "rubbish" right?
    >>>>
    >>>> So how can I possible modify my program so that french characters get
    >>>> recognised?
    >>>
    >>>By default, most platforms (all?) will execute programs in the "C"
    >>>locale which only supports ASCII.

    >>
    >> Nope. By default most platforms will use one 8-bit extension to ASCII
    >> or another in the "C" locale. The others will use one EBCDIC flavour
    >> (code page) or another. In principle, one could attach a KSR-33 to a
    >> serial port (and figure out how to set the speed of that port to 110
    >> bps), just to prove me wrong ;-)
    >>
    >> This can be easily tested with a trivial program like this:
    >>
    >> #include <stdio.h>
    >>
    >> int main()
    >> {
    >> printf("\376\375\374 Hello world\n");
    >> return 0;
    >> }

    >
    >Why do you think this will give you the default behavior? If you run
    >this on a fancy machine with extravagant libraries and locales available
    >it will likely give you different results depending on what the default
    >locale is.


    Because this program runs in the "C" locale, regardless of what the
    default locale is. It's the default font/character set that will
    determine its output, not the default locale. I can set the default
    locale to an English locale using Latin1, but if the font currently
    used by the terminal where the program generates its output is Latin2,
    I'm not going to see Latin1 output.

    >On my system this will print Latin1.


    More likely, it will simply output some character codes and let an entity
    external to the implementation decide what character set to use.

    On my system, I can switch between Latin1 and Latin2 fonts in an
    xterm window with the mouse. Therefore, I can alter the program output
    even *after* running the program, by selecting another font for that
    window. The only invariant is the character codes output by the program.
    This is *not* a locale issue at all.

    >>>ASCII is a 7bit encoding/charset that does not support european
    >>>characters. You might try adding a call to setlocale like:
    >>>
    >>> setlocale(LC_CTYPE, "");

    >>
    >> You're really naive if you believe that this will change the character
    >> set used by the implementation. It will merely change the behaviour of
    >> certain functions that are affected by the current locale.

    >
    >What do you mean by "used by the implementation"? The OP said "at run
    >time". On my system if I do:
    >
    > $ LANG=en_US.UTF-8 ./myproggie
    >
    >it indeed changes the behavior of how characters are interpreted
    >at runtime.


    But does it have *any* effect on what appears on your screen?

    >I said nothing about the charset or encoding used by the
    >compiler or how string literal are stored in binaries.
    >
    >> Let's see:
    >>
    >> fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie LC_CTYPE=fr_CA:
    >> Command not found.
    >>
    >> Doesn't Linux count as a Unix system any more? ;-)

    >
    >Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
    >not get too pedantic about it.


    Confusing Unix features and shell features is quite embarrassing, for a
    Unix user...

    >You've embarrassed yourself enough by acknowledging you use C shell :->


    I am NOT using C shell ;-)

    Dan
    --
    Dan Pop
    DESY Zeuthen, RZ group
     
    Dan Pop, Apr 5, 2004
    #7
