Multi-character constants

Discussion in 'C++' started by Mirco Wahab, Jul 9, 2008.

  1. Mirco Wahab

    Mirco Wahab Guest

    After reading through some (open) Intel (CPU detection)
    C++ source (www.intel.com/cd/ids/developer/asmo-na/eng/276611.htm)
    I stumbled upon a sketchy use of multibyte characters

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    unsigned int VendorID[3] = {0, 0, 0};
    try // If CPUID instruction is supported
    {
    ...
    }
    catch (...)
    {
    ...
    }
    return (
    (VendorID[0] == 'uneG') &&
    (VendorID[1] == 'Ieni') &&
    (VendorID[2] == 'letn')
    );

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    This seems to work, gcc 4.2 emits a warning:

    "warning: multi-character character constant"

    and Visual C++ 9 says nothing at all.

    What's the matter with multibyte characters now?
    I haven't used them and would be glad to learn
    whether they are widely implemented and part of
    the standard now, or will be soon.

    gcc tells us: (http://gcc.gnu.org/onlinedocs/gcc/Characters-implementation.html)
    ...
    [Characters]
    ...
    The value of a wide character constant containing more than
    one multibyte character, or containing a multibyte character
    or escape sequence not represented in the extended execution
    character set (C90 6.1.3.4, C99 6.4.4.4).
    ...



    Regards & Thanks for clearing this

    M.
     
    Mirco Wahab, Jul 9, 2008
    #1

  2. James Kanze

    James Kanze Guest

    On Jul 9, 4:29 pm, Mirco Wahab wrote:
    > After reading through some (open) Intel (CPU detection)
    > C++ source (www.intel.com/cd/ids/developer/asmo-na/eng/276611.htm)
    > I stumbled upon a sketchy use of multibyte characters


    > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    > unsigned int VendorID[3] = {0, 0, 0};
    > try // If CPUID instruction is supported
    > {
    > ...
    > }
    > catch (...)
    > {
    > ...
    > }
    > return (
    > (VendorID[0] == 'uneG') &&
    > (VendorID[1] == 'Ieni') &&
    > (VendorID[2] == 'letn')
    > );
    > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


    > This seems to work, gcc 4.2 emits a warning:


    > "warning: multi-character character constant"


    > and Visual C++ 9 says nothing at all.


    > What's the matter with multibyte characters now?


    First, do you mean multi-byte characters (e.g. UTF-8), or
    multicharacter literals? Your example doesn't contain any
    multi-byte characters, only multicharacter literals.

    > I didn't use them and would be glad to learn if they are
    > widely implemented and part of the standard soon/now?


    Multicharacter literals are a holdover from the original C. As
    far as I can tell, they have no use, and are of no interest
    whatsoever. Their value is also implementation-defined. All
    of which is probably why g++ warns about them.
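
    For illustration only, the check quoted at the top of this post could
    be written without multicharacter literals by comparing the bytes
    against an ordinary string. This is a minimal sketch, not the Intel
    code: the function name is_genuine_intel and the use of std::uint32_t
    are assumptions, and it presumes the three words sit in memory in x86
    (little-endian) byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical replacement for the quoted comparison (name and signature
// are assumptions). words points at three 32-bit values in the order the
// quoted code tests them: "Genu", "ineI", "ntel".
bool is_genuine_intel(const std::uint32_t* words)
{
    assert(words != nullptr);
    static_assert(3 * sizeof(std::uint32_t) == 12, "expects 8-bit bytes");

    char text[12];
    // Copy the raw bytes; on a little-endian host this yields the
    // characters in reading order.
    std::memcpy(text, words, sizeof text);
    return std::memcmp(text, "GenuineIntel", sizeof text) == 0;
}
```

    Because the comparison is done on bytes, no implementation-defined
    literal value is involved and g++ has nothing to warn about.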

    Multi-byte characters are becoming more and more frequent as
    applications shift to UTF-8, for reasons of
    internationalization. True support is still spotty, but getting
    there; the next version of the standard will require it, at
    least to some degree (there still won't be functions like
    isdigit which work on them).

    > gcc tells us: (http://gcc.gnu.org/onlinedocs/gcc/Characters-implementation.html)
    > ...
    > [Characters]
    > ...
    > The value of a wide character constant containing more than
    > one multibyte character, or containing a multibyte character
    > or escape sequence not represented in the extended execution
    > character set (C90 6.1.3.4, C99 6.4.4.4).
    > ...


    Implementation-defined behavior is required to be documented by
    the implementation. In this case, you've cut the only
    significant bit, a link to the implementation-defined behavior,
    where you'll find:

    The compiler values a multi-character character constant
    a character at a time, shifting the previous value left
    by the number of bits per target character, and then
    or-ing in the bit-pattern of the new character truncated
    to the width of a target character. The final
    bit-pattern is given type int, and is therefore signed,
    regardless of whether single characters are signed or
    not (a slight change from versions 3.1 and earlier of
    GCC). If there are more characters in the constant than
    would fit in the target int the compiler issues a
    warning, and the excess leading characters are ignored.

    For example, 'ab' for a target with an 8-bit char would
    be interpreted as `(int) ((unsigned char) 'a' * 256 +
    (unsigned char) 'b')', and '\234a' as `(int) ((unsigned
    char) '\234' * 256 + (unsigned char) 'a')'.

    (Technically, this documentation only applies to C, I think.
    But I would be very surprised if C++ did differently.)

    But since this is implementation-defined, the above is only
    guaranteed for gcc (although it does seem to be a frequent
    behavior).
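
    The shift-and-or rule in the quoted documentation can be sketched as
    an ordinary function. This is a rough illustration under the
    assumptions of an 8-bit char and a 32-bit int; the helper name
    gcc_style_multichar is hypothetical and is not part of GCC.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper (not a GCC API): computes the value the quoted GCC
// documentation describes for a multi-character constant made of the n
// characters in s. Assumes CHAR_BIT is smaller than the width of int.
int gcc_style_multichar(const char* s, std::size_t n)
{
    assert(s != nullptr);
    unsigned int value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Shift the previous value left by one character width, then
        // or in the new character truncated to that width. Excess
        // leading characters fall off the top of the unsigned value.
        value = (value << CHAR_BIT) | static_cast<unsigned char>(s[i]);
    }
    return static_cast<int>(value);
}
```

    With ASCII, gcc_style_multichar("ab", 2) gives 24930, which is
    'a' * 256 + 'b', and gcc_style_multichar("uneG", 4) gives 0x756E6547.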

    --
    James Kanze (GABI Software)
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jul 10, 2008
    #2

  3. James Kanze

    James Kanze Guest

    On Jul 9, 4:39 pm, Victor Bazarov wrote:
    > Mirco Wahab wrote:


    [...]
    > > gcc tells us:
    > > (http://gcc.gnu.org/onlinedocs/gcc/Characters-implementation.html)
    > > ...
    > > [Characters]
    > > ...
    > > The value of a wide character constant containing more than
    > > one multibyte character, or containing a multibyte character
    > > or escape sequence not represented in the extended execution
    > > character set (C90 6.1.3.4, C99 6.4.4.4).
    > > ...


    > They are part of C++ since before the first Standard, IIRC.
    > The problem with them, however, is that the order of the bytes
    > in memory depends on the endianness of the system (or other
    > factors). Also, they don't have the type 'char', they have
    > the type 'int' and their representation is
    > implementation-defined (see [lex.ccon]/1).


    They were part of K&R C, where a character literal always had
    type int. Even in C, however, the only place I've seen them
    used was for generating the "magic" for certain types of files
    in very early Unix. (Presumably, the author of the code "knew"
    what his compiler did.) They're one of those misfeatures which
    we can't get rid of for reasons of backwards compatibility.
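
    The byte-order point in the quoted reply explains why the literals
    in the original post read backwards. As a hedged sketch (the helper
    names below are hypothetical, and 8-bit ASCII characters are
    assumed), the same four bytes give different integers depending on
    which end is taken as most significant:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration; none of these names come from
// the thread or from any library.

// Interprets four bytes with the first byte least significant (x86 order).
std::uint32_t load_le32(const char* p)
{
    assert(p != nullptr);
    std::uint32_t v = 0;
    for (int i = 3; i >= 0; --i)
        v = (v << 8) | static_cast<unsigned char>(p[i]);
    return v;
}

// Interprets four bytes with the first byte most significant.
std::uint32_t load_be32(const char* p)
{
    assert(p != nullptr);
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | static_cast<unsigned char>(p[i]);
    return v;
}

// Returns true if the host stores the least significant byte first.
bool host_is_little_endian()
{
    const std::uint32_t one = 1;
    unsigned char first = 0;
    std::memcpy(&first, &one, 1);
    return first == 1;
}
```

    Here load_le32("Genu") is 0x756E6547, which equals
    load_be32("uneG"). A compiler that builds the literal most
    significant character first therefore needs 'uneG' to match a
    register holding "Genu" with 'G' in its low byte.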

     
    James Kanze, Jul 10, 2008
    #3
