ANSI C compiler character encoding

Discussion in 'C Programming' started by Andreas Lundgren, Aug 18, 2008.

  1. Hi!

    Is it guaranteed that a standard C compiler always encodes characters
    with the same character encoding? If for example the functions Foo and
    Bar are compiled by different compilers, is it unambiguous how to
    interpret the character string in Bar?

    Does string.h expect a specific string format?

    void Bar(char *inp);

    void Foo(void)
    {
        char myTextString[11] = "stuvxyzåäö";
        Bar(myTextString);
    }

    void Bar(char *inp)
    {
        /* What character set to expect? */
    }
    Andreas Lundgren, Aug 18, 2008
    #1

  2. James Kuyper

    Andreas Lundgren wrote:
    > Hi!
    >
    > Is it guaranteed that a standard C compiler always encodes characters
    > with the same character encoding?


    No.
    James Kuyper, Aug 18, 2008
    #2

  3. On 18 Aug, 14:07, James Kuyper wrote:
    > Andreas Lundgren wrote:
    > > Hi!
    > >
    > > Is it guaranteed that a standard C compiler always encodes characters
    > > with the same character encoding?
    >
    > No.


    Shit.

    /Andreas ;-)
    Andreas Lundgren, Aug 18, 2008
    #3
  4. Andreas Lundgren writes:
    > Is it guaranteed that a standard C compiler always encodes characters
    > with the same character encoding? If for example the functions Foo and
    > Bar are compiled by different compilers, is it unambiguous how to
    > interpret the character string in Bar?
    >
    > Does string.h expect a specific string format?
    >
    > void Foo(void)
    > {
    > char myTextString[11] = "stuvxyzåäö";
    > Bar(myTextString);
    > }
    >
    > void Bar(char* inp)
    > {
    > What character set to expect?
    > }


    No.

    But if the two compilers are being used on the same system, it's very
    likely that they'll use the same encoding. Since you're calling one
    function from the other, presumably you're using the compilers on the
    same system and linking the resulting code into a single executable or
    equivalent.

    Typically a given operating system will impose representations for
    certain things. Though this is outside the scope of the C standard,
    it's in the best interest of compiler writers to make their generated
    code work and play well with that of other compilers. (For example, a
    C compiler for Linux that generates code that's incompatible with code
    generated by gcc wouldn't be very useful.)

    This goes far beyond character set issues and includes things like
    integer and floating-point type representations and function calling
    conventions.

    Your later followup suggests that you're concerned about some
    real-world situation, presumably on some specific system. You should
    ask in a newsgroup that deals with that system.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Aug 18, 2008
    #4
  5. Keith Thompson wrote:
    > Andreas Lundgren writes:
    > > Is it guaranteed that a standard C compiler always encodes characters
    > > with the same character encoding? If for example the functions Foo and
    > > Bar are compiled by different compilers, is it unambiguous how to
    > > interpret the character string in Bar?
    > >
    > > Does string.h expect a specific string format?
    > >
    > > void Foo(void)
    > > {
    > > char myTextString[11] = "stuvxyzåäö";
    > > Bar(myTextString);
    > > }
    > >
    > > void Bar(char* inp)
    > > {
    > > What character set to expect?
    > > }


    > No.


    > But if the two compilers are being used on the same system, it's very
    > likely that they'll use the same encoding. Since you're calling one
    > function from the other, presumably you're using the compilers on the
    > same system and linking the resulting code into a single executable or
    > equivalent.


    Is it actually a question about the compiler at all? As far as
    I can see, the compiler will happily create a string literal with
    whatever there is in the string, not caring a bit about the
    encoding of the string. I guess the problem is much more one of
    how the source files are generated and the expectations of the
    output medium.

    Consider the case of using one editor for the first file, set
    to output files in e.g. one of the different (and incompatible)
    Russian extended ASCII code pages, and the second file generated
    with another editor, set to output in a different encoding.
    Even if you use the same compiler, this should lead to trouble.
    And if the terminal that receives the output of the program
    is then set to a third encoding, it becomes a complete mess ;-)
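
    To make the editor point concrete, here is a minimal sketch (not
    from the original post) of the bytes two editors would store for
    "åäö". The names latin1_text, utf8_text and count_bytes are
    hypothetical. The byte values are the ISO-8859-1 and UTF-8
    encodings of those letters, written as hex escapes so the example
    does not depend on how this file itself is saved.

```c
#include <stddef.h>
#include <string.h>

/* "åäö" as an editor saving in ISO-8859-1 would store it: one byte each. */
static const char latin1_text[] = "\xE5\xE4\xF6";

/* The same three letters as an editor saving in UTF-8 would store them:
   two bytes each. */
static const char utf8_text[] = "\xC3\xA5\xC3\xA4\xC3\xB6";

/* strlen() counts bytes up to the terminating NUL, not letters. */
size_t count_bytes(const char *s)
{
    return strlen(s);
}
```

    count_bytes(latin1_text) is 3 and count_bytes(utf8_text) is 6, so
    the same visible text reaches Bar as different byte sequences. It
    also means char myTextString[11] has room for "stuvxyzåäö" in
    ISO-8859-1 (10 bytes plus the terminator) but not in UTF-8
    (13 bytes plus the terminator).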

    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
    Jens Thoms Toerring, Aug 18, 2008
    #5
  6. On Aug 18, 7:48 am, Andreas Lundgren wrote:
    > Hi!
    >
    > Is it guaranteed that a standard C compiler always encodes characters
    > with the same character encoding? If for example the functions Foo and
    > Bar are compiled by different compilers, is it unambiguous how to
    > interpret the character string in Bar?


    No, it does not depend on the compiler...

    >
    > Does string.h expect a specific string format?
    >
    > void Foo(void)
    > {
    > char myTextString[11] = "stuvxyzåäö";


    Here, instead of char, try wchar_t and the
    related functions if you are using Unicode
    for your messages and your .c files

    > Bar(myTextString);
    >
    > }
    >
    > void Bar(char* inp)
    > {
    > What character set to expect?


    That depends on the user environment, but if the
    user environment is using Unicode, you can expect no
    more than an array of bytes; the other case is with
    wchar_t and related functions...

    >
    > }


    Regards,
    DMW
    Daniel Molina Wegener, Aug 18, 2008
    #6
  7. Guest

    Daniel Molina Wegener wrote:
    > On Aug 18, 7:48 am, Andreas Lundgren wrote:
    > > Hi!
    > >
    > > Is it guaranteed that a standard C compiler always encodes characters
    > > with the same character encoding? If for example the functions Foo and
    > > Bar are compiled by different compilers, is it unambiguous how to
    > > interpret the character string in Bar?
    >
    > No, it does not depend on the compiler...
    >
    > >
    > > Does string.h expect a specific string format?
    > >
    > > void Foo(void)
    > > {
    > > char myTextString[11] = "stuvxyzåäö";

    >
    > Here, instead of char, try wchar_t and the
    > related functions if you are using Unicode
    > for your messages and your .c files


    Whether or not wchar_t has anything to do with Unicode depends upon
    the compiler; the standard makes no such requirement. When it does,
    the way in which you can take advantage of that fact depends upon the
    compiler as well.
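
    One way to ask the implementation, sketched here as an illustration
    rather than as part of the original reply: an implementation that
    defines the standard macro __STDC_ISO_10646__ documents that
    wchar_t values are ISO/IEC 10646 (Unicode) code points. The
    function name wchar_is_iso10646 is hypothetical.

```c
#include <wchar.h>

/* Returns 1 if the implementation documents that wchar_t holds
   ISO/IEC 10646 (Unicode) code points, 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

    A result of 0 does not prove that wchar_t is something other than
    Unicode; it only means the implementation has not promised it.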
    , Aug 18, 2008
    #7
  8. Flash Gordon

    Daniel Molina Wegener wrote, On 18/08/08 18:29:
    > On Aug 18, 7:48 am, Andreas Lundgren wrote:
    >> Hi!
    >>
    >> Is it guaranteed that a standard C compiler always encodes characters
    >> with the same character encoding? If for example the functions Foo and
    >> Bar are compiled by different compilers, is it unambiguous how to
    >> interpret the character string in Bar?
    >
    > No, it does not depend on the compiler...


    You are wrong. See the replies others posted before you for details.

    >> Does string.h expect a specific string format?
    >>
    >> void Foo(void)
    >> {
    >> char myTextString[11] = "stuvxyzåäö";

    >
    > Here, instead of char, try wchar_t and the
    > related functions if you are using Unicode
    > for your messages and your .c files
    >
    >> Bar(myTextString);
    >>
    >> }
    >>
    >> void Bar(char* inp)
    >> {
    >> What character set to expect?

    >
    > That depends on the user environment,


    Wrong. It depends on what the function is written to expect and
    (assuming the function expects a simple C string, which is likely) on
    the encoding the implementation expects.

    Actually, the expected encodings for standard C library functions which
    handle strings and characters can be changed at run-time using the
    setlocale() function, so it could also depend on what the program has
    done before calling this function.
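
    As a small illustration of that run-time dependence (a sketch, not
    something from the thread), the helper below classifies one byte
    with isalpha() after selecting a locale. The name
    is_alpha_in_locale is hypothetical.

```c
#include <ctype.h>
#include <locale.h>

/* Classifies one byte with isalpha() under the named locale.
   Returns -1 if the locale is not available, otherwise 0 or 1.
   Note: this changes the program's LC_CTYPE locale as a side effect. */
int is_alpha_in_locale(const char *locale_name, unsigned char byte)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return isalpha(byte) != 0;
}
```

    In the "C" locale, is_alpha_in_locale("C", 0xE5) is 0. Under an
    ISO-8859-1 locale the same byte can be reported as a letter,
    because 0xE5 is å in that encoding. Locale names are
    system-specific, so a name such as "sv_SE.ISO-8859-1" is only an
    assumption and may not be installed on a given system.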

    > but if the
    > user environment is using Unicode, you can expect no
    > more than an array of bytes,


    Not necessarily.

    > other case is with
    > wchar_t and related functions...


    For a start, an array of wchar_t is not simply an array of bytes.
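
    A minimal sketch of the difference, with hypothetical names
    (wide_text, wide_length, wide_storage_bytes): the element count and
    the byte count of a wide string only coincide if sizeof(wchar_t)
    happens to be 1.

```c
#include <stddef.h>
#include <wchar.h>

/* Three wide characters plus the terminating L'\0'. */
static const wchar_t wide_text[] = L"abc";

/* Number of wide characters before the terminator. */
size_t wide_length(void)
{
    return wcslen(wide_text);
}

/* Number of bytes the array occupies, terminator included. */
size_t wide_storage_bytes(void)
{
    return sizeof wide_text;
}
```

    wide_length() is 3, while wide_storage_bytes() is
    4 * sizeof(wchar_t), and sizeof(wchar_t) differs between
    implementations.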

    >> }

    --
    Flash Gordon
    Flash Gordon, Aug 18, 2008
    #8
  9. Andreas Lundgren writes:
    > A simple example may be the letter Ö that in ASCII is represented by
    > the number 153, but in ISO-8859-1 and Unicode is represented by the
    > number 214.


    That letter is not represented in ASCII. ASCII contains the code points
    0 to 127, no more.
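
    The range check is easy to write down. This is a sketch with a
    hypothetical function name, is_ascii. The value 153 for Ö matches
    IBM code pages 437 and 850, which is most likely what was meant by
    "ASCII" above; 214 is its ISO-8859-1 value.

```c
/* ASCII defines code points 0 through 127 only. */
int is_ascii(unsigned char byte)
{
    return byte <= 127;
}
```

    Both is_ascii(153) and is_ascii(214) are 0, so neither number can
    be an ASCII code for Ö.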
    --
    dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
    home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
    Dik T. Winter, Aug 21, 2008
    #9