French characters not recognised in C?

Ess355

Hi,

In the debugger at run time, characters like é are not recognised by
their normal ASCII number, but something like -8615722... . I've seen
this number before, it means "rubbish" right?

So how can I possibly modify my program so that French characters get
recognised?

Thanks in advance,
Ehsan.
 
Arthur J. O'Dwyer

In the debugger at run time, characters like é are not recognised by
their normal ASCII number, but something like -8615722... .

That doesn't make a whole lot of sense. What do you mean, "characters
...are not recognized by their normal ASCII number"? First of all,
é doesn't *have* an ASCII number. Second, assuming you've
picked an encoding somehow and you're expecting to see é displayed
correctly, what's going wrong?
Do you type é at the keyboard and your program doesn't recognize
it?
Do you type é in your source code and it doesn't display
correctly?
Do you type é in your source code and it refuses to compile at
all?

In general, the C programming language only deals with a very restricted
"basic character set," which doesn't contain things like é. If
you want to display or process that sort of input or output, you'll need
to either find a compiler with nice language support; find a library that
handles your national encoding(s) or Unicode; or roll your own library.
'wchar_t' and the wchar functions might be useful to you, too; read the
manpages for them or Google 'wchar_t manpage' for details.

So how can I possibly modify my program so that French characters get
recognised?

Depending on what exactly your problem is, you might try:

* Posting to fr.comp.lang.c or another French-language group.
* Getting a better compiler.
* Using 'wchar_t' in place of 'char'.
* Using a translation library that can convert between French encodings
and a useful ASCII encoding of the same text, e.g.: é -> \'e

If you post a complete, compilable, minimal program that demonstrates
the problem, someone here might be able to help you more. But
fr.comp.lang.c sounds like a better bet to me.

HTH,
-Arthur
 
Michael B Allen

In the debugger at run time, characters like é are not recognised by
their normal ASCII number, but something like -8615722... . I've seen
this number before, it means "rubbish" right?

So how can I possibly modify my program so that French characters get
recognised?

By default, most platforms (all?) will execute programs in the "C"
locale which only supports ASCII. ASCII is a 7bit encoding/charset that
does not support european characters. You might try adding a call to
setlocale like:

setlocale(LC_CTYPE, "");

This will check some environment variables to determine the locale
you're running in. You can force a specific locale like setlocale(LC_ALL,
"fr_FR") but you may or may not want to do that depending on the source
of the characters.

Or you might need to run the debugger in a different locale. For example
on Unix systems a very simple way to run a program in a different locale
is by preceding the command with an environment variable like:

$ LC_CTYPE=fr_CA dbug ./myproggie

Mike
 
Dan Pop

Michael B Allen said:
By default, most platforms (all?) will execute programs in the "C"
locale which only supports ASCII.

Nope. By default most platforms will use one 8-bit extension to ASCII or
another in the "C" locale. The others will use one EBCDIC flavour (code
page) or another. In principle, one could attach a KSR-33 to a serial
port (and figure out how to set the speed of that port to 110 bps), just
to prove me wrong ;-)

This can be easily tested with a trivial program like this:

#include <stdio.h>

int main()
{
    printf("\376\375\374 Hello world\n");
    return 0;
}

ASCII is a 7bit encoding/charset that
does not support european characters. You might try adding a call to
setlocale like:

setlocale(LC_CTYPE, "");

You're really naive if you believe that this will change the character
set used by the implementation. It will merely change the behaviour of
certain functions that are affected by the current locale.

In practice, it is the user's job to select a character set suitable for
his locale and to set the default native locale accordingly.

This will check some environment variables to determine the locale
you're running in. You can force a specific locale like setlocale(LC_ALL,
"fr_FR") but you may or may not want to do that depending on the source
of the characters.

1. Where did you get the idea that "fr_FR" is a valid locale name from?
May I have the chapter and verse?

2. If the user has a Russian terminal, selecting a French locale won't
make Latin-1 characters appear as intended.

Or you might need to run the debugger in a different locale. For example
on Unix systems a very simple way to run a program in a different locale
is by preceding the command with an environment variable like:

$ LC_CTYPE=fr_CA dbug ./myproggie

Let's see:

fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
LC_CTYPE=fr_CA: Command not found.

Doesn't Linux count as a Unix system any more? ;-)

The issue is very simple in practice, but extremely difficult to describe
in terms of what the C standard actually says. Each new C programmer
should do a bit of experimenting, using programs like the one shown above,
to see what happens when values above 127 (and, for pragmatic reasons,
the range 128 - 159 should be avoided) are used as (unsigned) character
values.

Dan
 
Michael B Allen

Nope. By default most platforms will use one 8-bit extension to ASCII
or another in the "C" locale. The others will use one EBCDIC flavour
(code page) or another. In principle, one could attach a KSR-33 to a
serial port (and figure out how to set the speed of that port to 110
bps), just to prove me wrong ;-)

This can be easily tested with a trivial program like this:

#include <stdio.h>

int main()
{
    printf("\376\375\374 Hello world\n");
    return 0;
}

Why do you think this will give you the default behavior? If you run
this on a fancy machine with extravagant libraries and locales available
it will likely give you different results depending on what the default
locale is. On my system this will print Latin1.

You're really naive if you believe that this will change the character
set used by the implementation. It will merely change the behaviour of
certain functions that are affected by the current locale.

What do you mean by "used by the implementation"? The OP said "at run
time". On my system if I do:

$ LANG=en_US.UTF-8 ./myproggie

it indeed changes the behavior of how characters are interpreted
at runtime. I said nothing about the charset or encoding used by the
compiler or how string literals are stored in binaries.

Let's see:

fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
LC_CTYPE=fr_CA: Command not found.

Doesn't Linux count as a Unix system any more? ;-)

Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
not get too pedantic about it. You've embarrassed yourself enough by
acknowledging you use C shell :->

Mike
 
Richard Bos

[ Quoting was buggered up-stream; the next bit is by Michael B Allen. ]
Why do you think this will give you the default behavior?

It must, if compiled in ISO C mode. All programs start in the "C"
locale. Even so...

If you run this on a fancy machine with extravagant libraries and
locales available it will likely give you different results depending
on what the default locale is. On my system this will print Latin1.

...even so, the char types must be at least 8-bit, which means that
plain ASCII, being 7-bit, is out of the race from the start. Your
default character set _must_ be either an (at least 8-bit) extension to
ASCII, or something else entirely (most usually EBCDIC, which itself is
rare enough, but not entirely unheard of).
IOW, Dan's '\376' et al. must specify a valid member of the character
set, even though they are not part of ASCII.

Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
not get too pedantic about it. You've embarrassed yourself enough by
acknowledging you use C shell :->

And what other shell did you expect to see used in _this_ newsgroup,
then <g>?

Richard
 
Dan Pop

Michael B Allen said:
Why do you think this will give you the default behavior? If you run
this on a fancy machine with extravagant libraries and locales available
it will likely give you different results depending on what the default
locale is.

Because this program runs in the "C" locale, regardless of what the
default locale is. It's the default font/character set that will
determine its output, not the default locale. I can set the default
locale to an English locale using Latin1, but if the font currently
used by the terminal where the program generates its output is Latin2,
I'm not going to see Latin1 output.

On my system this will print Latin1.

More likely, it will simply output some character codes and let an entity
external to the implementation decide what character set to use.

On my system, I can switch between Latin1 and Latin2 fonts in an
xterm window with the mouse. Therefore, I can alter the program output
even *after* running the program, by selecting another font for that
window. The only invariant is the character codes output by the program.
This is *not* a locale issue at all.

What do you mean by "used by the implementation"? The OP said "at run
time". On my system if I do:

$ LANG=en_US.UTF-8 ./myproggie

it indeed changes the behavior of how characters are interpreted
at runtime.

But does it have *any* effect on what appears on your screen?

I said nothing about the charset or encoding used by the
compiler or how string literals are stored in binaries.


Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
not get too pedantic about it.

Confusing Unix features and shell features is quite embarrassing, for a
Unix user...

You've embarrassed yourself enough by acknowledging you use C shell :->

I am NOT using C shell ;-)

Dan
 
