Questions on ISO C character constants

Discussion in 'C Programming' started by Edward Rutherford, Nov 8, 2011.

  1. There are 2 points in Sec. 6.4.4.4, describing character constants
    that are not entirely clear to me. It may be that I have not understood
    correctly the issues of character encodings.

    In p10 there is the sentence "The value of an integer character
    constant containing a single character that maps to a single-byte
    execution character is the numerical value of the representation of the
    mapped character interpreted as an integer".

    This confirms that a single character of the source set may be
    mapped to multiple bytes in the execution character set (and this is
    consistent with other parts of the standard). But still on p10 there is
    the sentence "If an integer character constant contains a single
    character or escape sequence, its value is the one that results when an
    object with type char whose value is that of the single character or
    escape sequence is converted to type int". This sentence seems to imply
    that the value corresponding to a single character (or escape sequence)
    can fit into a single object of type char, i.e., into a single byte.
    Doesn't the latter sentence contradict the former (and other
    parts of the standard)?

    On p11 there is the sentence "The value of a wide character constant
    containing a single multibyte character that maps to a member of the
    extended execution character set is the wide character corresponding to
    that multibyte character, as defined by the mbtowc function, with an
    implementation-defined current locale."

    This sentence suggests to me that the function mbtowc maps the
    multibyte encoding of a character of the *source* character set to a wide
    character.
    I find this surprising for the following reasons:
    1) the second parameter of mbtowc is a char *, so a pointer to bytes
    in the execution environment
    2) wctomb operates at runtime, so I think it converts a wide character
    to a multibyte encoding in the execution environment; I would expect
    wctomb and mbtowc to be inverses of each other

    One more question: a byte is (sec. 3.3.6) a unit of data storage of
    the execution environment. Isn't it possible that the host environment
    has units of data storage with a different number of bits?
     
    Edward Rutherford, Nov 8, 2011

  2. Edward Rutherford

    Kaz Kylheku Guest

    On 2011-11-08, Edward Rutherford wrote:
    > There are 2 points in Sec. 6.4.4.4, describing character constants
    > that are not entirely clear to me. It may be that I have not understood
    > correctly the issues of character encodings.
    >
    > In p10 there is the sentence "The value of an integer character
    > constant containing a single character that maps to a single-byte
    > execution character is the numerical value of the representation of the
    > mapped character interpreted as an integer".
    >
    > This confirms that a single character of the source set may be
    > mapped to multiple bytes in the execution character set (and this is
    > consistent with other parts of the standard).


    However, this is not true of the basic source character set described in
    5.2.1, whose members are the only characters of which strictly conforming C
    programs can be composed.

    If you have a character constant like 'X' where X is some character that
    doesn't fit into a byte, that is code which falls outside the scope of the
    above sentence, since X is not "a single character that maps to a single-byte
    execution character".

    You didn't miss this text, in the same paragraph, right?

    "The value of an integer character constant containing more than one character
    (e.g., 'ab'), or containing a character or escape sequence that does not map
    to a single-byte execution character, is implementation-defined."

    > But still on p10 there is
    > the sentence "If an integer character constant contains a single
    > character or escape sequence, its value is the one that results when an
    > object with type char whose value is that of the single character or
    > escape sequence is converted to type int".


    If the character has a value which doesn't fit into type char,
    then it is shoehorned into that type anyway (leading to an
    implementation-defined value). The resulting value converted
    to int is the value of the character constant.
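
    A minimal sketch of that rule in C, assuming a two's-complement target:
    char_constant_value is a hypothetical helper (not something from the
    standard or the library) that stores a one-byte code in a char object and
    converts that object to int, which is how p10 defines the value of a
    one-character constant.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: the value 6.4.4.4p10 gives a one-character constant.
   The code is stored in a char object, and that object is converted to int. */
static int char_constant_value(unsigned code)
{
    assert(code <= UCHAR_MAX);            /* must fit in a single byte */

    /* When plain char is signed and code > CHAR_MAX, this conversion is
       implementation-defined; two's-complement targets wrap around. */
    char c = (char)(unsigned char)code;

    return (int)c;                        /* char -> int, as p10 describes */
}
```

    With an 8-bit char, char_constant_value(0xFF) would be expected to yield
    -1 where plain char is signed and 255 where it is unsigned, matching what
    the compiler gives for '\xFF'.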

    > On p11 there is the sentence "The value of a wide character constant
    > containing a single multibyte character that maps to a member of the
    > extended execution character set is the wide character corresponding to
    > that multibyte character, as defined by the mbtowc function, with an
    > implementation-defined current locale."
    >
    > This sentence suggests to me that the function mbtowc maps the
    > multibyte encoding of a character of the *source* character set to a wide
    > character.


    All mappings are from the source, to produce a value that will be
    used in the execution environment. The multibyte character is a piece of
    the program source code.

    Which encoding is used is implementation-defined.

    An implementation could use the encoding used on the target system
    (execution environment), the one from the build machine
    (translation environment), or something else entirely.

    It looks like the standard simply does not specify a lot in this area.

    There are reasons why one might want it done in any of these ways: source code
    in character set/encoding X, the build machine in Y, and target machine in Z,
    where these can all be different, or some of them are the same.

    It's probably best not to constrain this sort of thing too much, because then
    when someone wants a particular compilation scenario, it becomes needlessly
    nonconforming.

    > I find this surprising for the following reasons:
    > 1) the second parameter of mbtowc is a char *, so a pointer to bytes
    > in the execution environment


    A C compiler could contain a function exactly like mbtowc which it can use at
    translation time.

    That is to say, just because some semantic action in the translator
    is described in terms of a C function doesn't mean that it has to be
    delayed into the execution environment (which doesn't make sense).

    6.4 Lexical elements has this in paragraph 4: ``If the input stream has been
    parsed into preprocessing tokens up to a given character, the next
    preprocessing token is the longest sequence of characters that could constitute
    a preprocessing token.'' Since they wrote ``stream'' does that mean that
    this is about a FILE * pointer in the execution environment?
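
    To make the mbtowc point concrete, here is a small sketch of the same
    mapping done at run time. first_wide_char is a hypothetical helper; it
    assumes the program is still in the default "C" locale and that the
    implementation gives basic characters the same value in both narrow and
    wide form (true of common implementations, but not guaranteed everywhere).

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: decode the first multibyte character of s with
   mbtowc, the function 6.4.4.4p11 uses to define wide character constants.
   Returns the wide character, or WEOF if s is empty or does not start with
   a valid multibyte character in the current locale. */
static wint_t first_wide_char(const char *s)
{
    wchar_t wc;
    int n;

    assert(s != NULL);
    mbtowc(NULL, NULL, 0);              /* reset any shift state */
    n = mbtowc(&wc, s, MB_CUR_MAX);     /* bytes consumed, 0 for "", or -1 */

    return n > 0 ? (wint_t)wc : WEOF;
}
```

    Under those assumptions first_wide_char("A") compares equal to L'A'. A
    translator can apply an equivalent mapping to the source text at compile
    time; nothing forces the call to happen in the running program.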

    > One more question: a byte is (sec. 3.3.6) a unit of data storage of
    > the execution environment. Isn't it possible that the host environment
    > has units of data storage with a different number of bits?


    You could have, say, a cross-compiler running on an 8-bit machine targeting a
    9-bit machine. That cross-compiler would have to do all compile-time
    calculations involving characters and bytes as 9-bit quantities, so that it
    works out the correct values, as if the calculation were being done on the
    target system. E.g. the constant expression ('\x100' >> 8) should reduce to 1
    (assuming plain char is unsigned on the target), etc.
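
    As a rough illustration of what such a cross-compiler has to do, the
    sketch below models the value of a hexadecimal-escape character constant
    for a target with a given char width. target_char_constant and its
    parameters are hypothetical names, and the signed case assumes a
    two's-complement target.

```c
#include <assert.h>

/* Hypothetical model of a cross-compiler evaluating '\x...' for a target
   whose char has target_bits bits. Returns 1 and stores the constant's
   value in *out, or returns 0 when the escape does not fit the target's
   unsigned char (which the standard treats as a constraint violation). */
static int target_char_constant(unsigned long escape, unsigned target_bits,
                                int char_is_signed, long *out)
{
    unsigned long limit;
    long value;

    assert(out != 0 && target_bits >= 8 && target_bits < 31);

    limit = 1UL << target_bits;          /* target's UCHAR_MAX + 1 */
    if (escape >= limit)
        return 0;                        /* out of range on this target */

    value = (long)escape;
    if (char_is_signed && escape >= limit / 2)
        value -= (long)limit;            /* assumed two's-complement wrap */

    *out = value;
    return 1;
}
```

    For a 9-bit unsigned char, an escape of 0x100 gives 256, and 256 >> 8 is 1.
    The same escape is rejected for an 8-bit target, and with a signed 9-bit
    char this model gives -256, for which the result of >> 8 is
    implementation-defined.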
     
    Kaz Kylheku, Nov 8, 2011
