Questions on character constants

Discussion in 'C Programming' started by Luca Forlizzi, Dec 12, 2010.

  1. There are 2 points in Sec. 6.4.4.4, describing character constants, that are
    not entirely clear to me. It may be that I am not reading the text well, or that
    I have not correctly understood the issues of character encodings.

    In p10 there is the sentence "The value of an integer character constant
    containing a single character that maps to a single-byte execution character is
    the numerical value of the representation of the mapped character interpreted
    as an integer". This confirms that a single character of the source set may be
    mapped to multiple bytes in the execution character set (and this is consistent
    with other parts of the standard). But still in p10 there is the sentence "If
    an integer character constant contains a single character or escape sequence,
    its value is the one that results when an object with type char whose value is
    that of the single character or escape sequence is converted to type int". This
    sentence seems to imply that the value corresponding to a single character (or
    escape sequence) can fit into a single object of type char, i.e., into a single
    byte. Doesn't the latter sentence contradict the former (and other parts of
    the standard)?

    In p11 there is the sentence "The value of a wide character constant containing
    a single multibyte character that maps to a member of the extended execution
    character set is the wide character corresponding to that multibyte character,
    as defined by the mbtowc function, with an implementation-defined current
    locale." This sentence suggests to me that the function mbtowc maps the
    multibyte encoding of a character of the *source* character set to a wide
    character. I find this surprising for the following reasons:
    1) the second parameter of mbtowc is a const char *, so a pointer to bytes in
    the execution environment
    2) wctomb operates at run time, so I think it converts a wide character to a
    multibyte encoding in the execution environment; I would expect wctomb and
    mbtowc to be inverses of each other

    One more question: a byte is (sec. 3.3.6) a unit of data storage of the
    execution environment. Isn't it possible that the host environment has units of
    data storage with a different number of bits?
     
    Luca Forlizzi, Dec 12, 2010
    #1

  2. Luca Forlizzi

    Eric Sosman Guest

    On 12/12/2010 10:24 AM, Luca Forlizzi wrote:
    > There are 2 points in Sec. 6.4.4.4, describing character constants, that are
    > not entirely clear to me. It may be that I am not reading the text well, or
    > that I have not correctly understood the issues of character encodings.
    >
    > In p10 there is the sentence "The value of an integer character constant
    > containing a single character that maps to a single-byte execution character
    > is the numerical value of the representation of the mapped character
    > interpreted as an integer". This confirms that a single character of the
    > source set may be mapped to multiple bytes in the execution character set
    > (and this is consistent with other parts of the standard). But still in p10
    > there is the sentence "If an integer character constant contains a single
    > character or escape sequence, its value is the one that results when an
    > object with type char whose value is that of the single character or escape
    > sequence is converted to type int". This sentence seems to imply that the
    > value corresponding to a single character (or escape sequence) can fit into
    > a single object of type char, i.e., into a single byte. Doesn't the latter
    > sentence contradict the former (and other parts of the standard)?


    "The escape sequence" refers to the source-code escape sequences,
    multi-source-character sequences like '\n' or '\xFF'. When you write
    the resulting character to a stream, the implementation might use an
    encoding scheme like Shift JIS that employs "escape sequences" of its
    own, but these "escape sequences" are not the source-level constructs
    described in 6.4.4.4.

    (I'll pass on your question about mbtowc() et al. because I have
    used them only a few times, and even then without real understanding.)

    > One more question: a byte is (sec. 3.3.6) a unit of data storage of


    ITYM 3.6.

    > the execution environment.
    > Isn't it possible that the host environment has units of data storage
    > with a different
    > number of bits?


    Yes, and in fact it's quite common. Very many C platforms support
    "units" of many sizes: bytes, halfwords, words, doublewords, pages, ...
    The crucial requirement in 3.6 is not only that this unit exist, but
    that it be an "addressable unit," because in C's view nearly every data
    object can be treated as an array of bytes. Even if this treatment is
    not "natural" for the underlying platform, the C implementation must
    somehow make the array-of-bytes view work. For example, the original
    DEC Alpha supported 32- and 64-bit units, and used shifts and masks
    to simulate byte access within those larger blobs.

    --
    Eric Sosman
     
    Eric Sosman, Dec 12, 2010
    #2

  3. On Dec 12, 9:24 am, Luca Forlizzi wrote:
    > There are 2 points in Sec. 6.4.4.4, describing character constants, that are
    > not entirely clear to me. It may be that I am not reading the text well, or
    > that I have not correctly understood the issues of character encodings.
    >
    > In p10 there is the sentence "The value of an integer character constant
    > containing a single character that maps to a single-byte execution character
    > is the numerical value of the representation of the mapped character
    > interpreted as an integer". This confirms that a single character of the
    > source set may be mapped to multiple bytes in the execution character set
    > (and this is consistent with other parts of the standard).

    [snip]

    It may be helpful, in learning C, to shift your perspective from bytes
    to words. One of C's original, primary purposes was to be fast. This means
    that the compiler is less concerned with the spatial arrangement of
    code and data in core memory, and more focused on the temporal arrangement
    of instructions as they will be executed.

    So when the standard talks about 'integer character constant's, it's
    dead fucking serious. This object is not a char. It's an int. This is
    because when it gets loaded into a register, it'll be an integer-sized
    register. It doesn't matter if there are separately-addressable byte-sized
    BH and BL registers, it's gonna use EBX.

    It may also be helpful, when reading dense matter like standards,
    to circle or highlight the nouns with all their adjectives and
    adornments.

    So in this case,

    "The
    value
    [-- perhaps they could've said 'value yielded in an expression' --]

    of an

    integer character constant
    [-- this is just the term being defined, so we don't get to assume
    anything about this beast yet except those three words --]

    containing a

    single character that maps to a single-byte execution character
    [-- so it's telling us absolutely nothing about characters that
    do not map to a 'single-byte execution character', whatever that
    may mean --]

    is the

    numerical value of the representation of the mapped character
    interpreted as an integer"
    [-- remember we're talking about the 'value' of this creature.
    Its value is a number. It's whatever number it needs to be
    to match the 'representation of the mapped character' if you
    had to give the most obvious number to it. --]

    lxt
    --
    Hopefully this comes off more useful than patronizing.
     
    luser- -droog, Dec 13, 2010
    #3
