using character as array subscript

Discussion in 'C++' started by Ivan, Jun 16, 2008.

  1. Ivan

    Ivan Guest

    Hi,

    What is the best syntax to use a char to index into an array?

    ///////////////////////////////////
    For example

    int data[256];

    data['a'] = 1;
    data['b'] = 1;
    ///////////////////////////////////

    gcc is complaining about this syntax, so I am using static_cast on the
    character literal. Is there a better way to do this?

    Thanks,
    Ivan
    Ivan, Jun 16, 2008
    #1

  2. Ivan

    Jim Langston Guest

    "Ivan" <> wrote in message
    news:...
    > Hi,
    >
    > What is the best syntax to use a char to index into an array.
    >
    > ///////////////////////////////////
    > For example
    >
    > int data[256];
    >
    > data['a'] = 1;
    > data['b'] = 1;
    > ///////////////////////////////////
    >
    > gcc is complaining about this syntax, so i am using static cast on the
    > character literal. Is there a better way to do this?


    MSVC++ 2008 Express doesn't complain and compiles that code fine, not even
    a warning. It is well-defined behavior as long as your native char type
    is an unsigned 8-bit byte.

    On my system if I
    std::cout << typeid('a').name() << "\n";
    I get the output of
    char

    Not unsigned char. That may produce some undefined behavior for you if you
    attempt to work with characters that would be above 127 as a byte, they
    might show up negative.
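    Whether plain char is signed can be checked directly rather than inferred from typeid output. The sketch below assumes C++11 (for `constexpr` and `static_assert`, which postdate this thread); the function names are hypothetical, chosen for illustration. The cast through unsigned char is the usual way to turn any char into a non-negative index.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// True if plain char is signed on this implementation (typical for gcc
// on x86), false if it is unsigned (e.g. gcc with -funsigned-char).
constexpr bool plain_char_is_signed()
{
    return std::numeric_limits<char>::is_signed;
}

// The same question answered with <climits> only:
// CHAR_MIN is 0 exactly when plain char is unsigned.
constexpr bool plain_char_is_signed_climits()
{
    return CHAR_MIN < 0;
}

static_assert(plain_char_is_signed() == plain_char_is_signed_climits(),
              "both signedness checks must agree");

// Converts any char, including negative values of a signed char,
// to an index in 0..UCHAR_MAX.
inline int char_to_index(char c)
{
    int index = static_cast<unsigned char>(c);
    assert(index >= 0 && index <= UCHAR_MAX);
    return index;
}
```

    With this helper, `data[char_to_index(c)]` is safe for an array of `UCHAR_MAX + 1` elements whatever the signedness of char.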
    Jim Langston, Jun 17, 2008
    #2

  3. Ivan

    Daniel Pitts Guest

    Jim Langston wrote:
    > "Ivan" <> wrote in message
    > news:...
    >> Hi,
    >>
    >> What is the best syntax to use a char to index into an array.
    >>
    >> ///////////////////////////////////
    >> For example
    >>
    >> int data[256];
    >>
    >> data['a'] = 1;
    >> data['b'] = 1;
    >> ///////////////////////////////////
    >>
    >> gcc is complaining about this syntax, so i am using static cast on the
    >> character literal. Is there a better way to do this?

    >
    > MSVC++ 2008 express isn't complaining and compiles that code fine, not even
    > a warning. It is well defined behavior as long as the type of your native
    > char is unsigned 8 bit byte.
    >
    > On my system if I
    > std::cout << typeid('a').name() << "\n";
    > I get the output of
    > char
    >
    > Not unsigned char. That may produce some undefined behavior for you if you
    > attempt to work with characters that would be above 127 as a byte, they
    > might show up negative.
    >
    >

    Is it well defined? I thought it would depend on the character encoding
    used, such as ASCII vs EBCDIC. Or does the standard actually specify
    char encoding now?

    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
    Daniel Pitts, Jun 17, 2008
    #3
  4. Ivan

    James Kanze Guest

    On Jun 17, 12:48 am, Ivan <> wrote:

    > What is the best syntax to use a char to index into an array.


    It depends.

    > ///////////////////////////////////
    > For example


    > int data[256];


    > data['a'] = 1;
    > data['b'] = 1;
    > ///////////////////////////////////


    > gcc is complaining about this syntax, so i am using static
    > cast on the character literal. Is there a better way to do
    > this?


    It depends on the context.

    First, this is a warning; you can turn it off, or ignore it. In
    fact, it is a legitimate warning unless you've taken adequate
    precautions; a char may have negative values. (But then, so may
    an int. Logically, g++ shouldn't warn unless the size of the
    array is such that not all entries can be reached by a char, and
    not in the case of a character literal, in any case. But in
    fact, it does always warn, unless you turn that warning off.)

    The first case is when the array will normally be indexed by an
    int, and you're just using character literals during
    initialization; if the only indexation by a char is with a
    character literal, you can simply ignore the warning. (Note
    that this is a more or less usual idiom: you read the array with
    a return value of istream::get(), for example, after having
    checked for EOF.)
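    The istream::get() idiom can be sketched as a character histogram. This is an illustrative sketch (the function name `char_histogram` is hypothetical): get() returns an int that is either EOF or the character's value converted to unsigned char, so once EOF has been ruled out it is already a valid index in 0..UCHAR_MAX and needs no cast. Character literals can then be used to read the table.

```cpp
#include <cassert>
#include <climits>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying the function
#include <vector>

// Counts how often each character value occurs in the stream.
std::vector<long> char_histogram(std::istream& in)
{
    std::vector<long> counts(UCHAR_MAX + 1, 0L);
    int c;
    while ((c = in.get()) != std::istream::traits_type::eof()) {
        // get() yields the value as unsigned char, never negative.
        assert(c >= 0 && c <= UCHAR_MAX);
        ++counts[c];
    }
    return counts;
}
```

    For example, after `std::istringstream in("abca");` and `std::vector<long> h = char_histogram(in);`, the entry `h['a']` holds 2 on any character set, because both the stream and the literal use the native encoding.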

    If you really do want to index with arbitrary characters, there
    are three solutions:

    1. If portability isn't a large concern, you can just compile
    with -funsigned-char. This should really be the default,
    but there are historical reasons which mean that it isn't.
    Other compilers also have such an option. (It's /J for
    VC++, I think.) If you're certain that you'll never have to
    port to a compiler without this option, you can just use it,
    and be assured that plain char is unsigned.

    In this case, you'll still have to turn off the warning from
    g++. (IMHO, the warning, as it is currently implemented, is
    stupid. If they want to warn, it would be more reasonable
    to warn when the type of the index cannot encompass all of
    the possible index values, and only if the value is not a
    constant.)

    2. Otherwise, you can cast to unsigned char any time you use a
    char as an index.

    3. Or, you can rearrange the array, and use character -
    CHAR_MIN as an index.

    In the latter two cases, I'd wrap the array in a class which
    took care of the "correction" of the index.
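    A wrapper of that kind might look like the following. This is a minimal sketch, not Kanze's code: `CharTable` is a hypothetical name, it uses the unsigned char cast (option 2), and `constexpr` assumes C++11. Option 3 would instead store entries at `c - CHAR_MIN`, which places a given character at a different slot depending on whether plain char is signed.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// A table indexed directly by char. All index correction happens in
// index(): every char value, including negative ones when plain char
// is signed, maps into 0..UCHAR_MAX.
template <typename T>
class CharTable {
public:
    CharTable() : data_() {}  // value-initializes every entry (0 for int)

    T& operator[](char c) { return data_[index(c)]; }
    const T& operator[](char c) const { return data_[index(c)]; }

    static constexpr std::size_t size() { return UCHAR_MAX + 1; }

private:
    static std::size_t index(char c)
    {
        std::size_t i = static_cast<unsigned char>(c);
        assert(i < size());
        return i;
    }

    T data_[UCHAR_MAX + 1];
};
```

    Usage mirrors the original question: `CharTable<int> data; data['a'] = 1; data['b'] = 1;` compiles without casts at the call site, and a char holding a byte above 127 still lands inside the array.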

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Jun 17, 2008
    #4
  5. Ivan

    Mirco Wahab Guest

    Ivan wrote:
    > For example
    > int data[256];
    > data['a'] = 1;
    > data['b'] = 1;
    > ///////////////////////////////////
    > gcc is complaining about this syntax, so i am using static cast on the
    > character literal. Is there a better way to do this?


    Which gcc? From your example, I assumed:

    int data[256];

    int main()
    {
    data['a'] = 1;
    data['b'] = 1;
    return 0;
    }

    Compiled as C++, there was not a single warning in:
    g++-4.3 (-Wall -pedantic)
    mingw-gcc-3.4.1
    icpc (intel CC 10.1)

    Maybe you made another mistake not
    shown in your incomplete excerpt.

    Regards

    Mirco
    Mirco Wahab, Jun 17, 2008
    #5
  6. Ivan

    Daniel Pitts Guest

    Jack Klein wrote:
    > On Mon, 16 Jun 2008 18:53:05 -0700, Daniel Pitts
    > <> wrote in comp.lang.c++:
    >
    >> Jim Langston wrote:
    >>> "Ivan" <> wrote in message
    >>> news:...
    >>>> Hi,
    >>>>
    >>>> What is the best syntax to use a char to index into an array.
    >>>>
    >>>> ///////////////////////////////////
    >>>> For example
    >>>>
    >>>> int data[256];
    >>>>
    >>>> data['a'] = 1;
    >>>> data['b'] = 1;
    >>>> ///////////////////////////////////
    >>>>
    >>>> gcc is complaining about this syntax, so i am using static cast on the
    >>>> character literal. Is there a better way to do this?
    >>> MSVC++ 2008 express isn't complaining and compiles that code fine, not even
    >>> a warning. It is well defined behavior as long as the type of your native
    >>> char is unsigned 8 bit byte.
    >>>
    >>> On my system if I
    >>> std::cout << typeid('a').name() << "\n";
    >>> I get the output of
    >>> char
    >>>
    >>> Not unsigned char. That may produce some undefined behavior for you if you
    >>> attempt to work with characters that would be above 127 as a byte, they
    >>> might show up negative.
    >>>
    >>>

    >> Is it well defined? I thought it would depend on the character encoding
    >> used, such as ASCII vs EBCDIC. Or does the standard actually specify
    >> char encoding now?

    >
    > No, the standard does not specify execution character set. Or source
    > character set, for that matter. That's exactly why it is more
    > portable to use the actual characters, rather than their numerical
    > value in a particular character set.
    >
    > In fact, the OP's code could well be part of a beginner's assignment
    > to generate a histogram of characters in some input data.
    >
    > This is guaranteed to produce the correct hex digit character for the
    > lowest nibble of an unsigned int regardless of the character set:
    >
    > char hex[] = "0123456789ABCDEF";
    >
    > char hex_digit(unsigned int x)
    > {
    > return hex [x & 0xf];
    > }

    Your example only addresses the *converse* of my point, and therefore
    doesn't have any connection to the validity of my point.
    >
    > ....if you change the definition of the array to:
    >
    > char hex [17] = { 48, 48, /*... */ 69, 70, 0 };
    >
    > ....then you get exactly the same array and result on an ASCII
    > implementation, and gibberish on any other execution character set.
    >

    Right, but using 'a' as an index into an array could be a different
    index on different compilers. Considering that char could be signed and
    negative, you could have serious consequences.

    Granted, this isn't a problem in practice, but it's not portable to
    expect foo['a'] = 1 to do something specific.

    Now, if you were to get specific about vendor/platform, that's a different
    question.

    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
    Daniel Pitts, Jun 17, 2008
    #6
  7. Ivan

    Jerry Coffin Guest

    In article <4857ed5a$0$12713$>,
    says...

    [ ... ]

    > Right, but using 'a' as an index into an array could be a different
    > index on different compilers. considering that char could be signed and
    > negative, you could have serious consequences.
    >
    > Granted, this isn't a problem in practice, but its not portable that
    > foo['a'] = 1 should do something specific.


    That depends on what you mean by something specific. Basically, the
    behavior is unspecified, but NOT undefined. In particular, the C++
    standard specifies a basic execution character set that includes the
    usual English letters, base-10 digits, etc. and requires that all those
    characters have non-negative values. Since the 'a' in your expression
    must be non-negative, it has defined results if (for example) foo has
    been defined something like 'int foo[UCHAR_MAX + 1];'

    It's certainly true that you could encounter characters whose encoding
    is negative, but this isn't one of them.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
    Jerry Coffin, Jun 17, 2008
    #7
  8. Ivan

    James Kanze Guest

    On Jun 17, 6:58 pm, Daniel Pitts
    <> wrote:

    [...]
    > Right, but using 'a' as an index into an array could be a
    > different index on different compilers.


    Which, presumably, is what is wanted. You don't want the entry
    corresponding to 97 (or whatever); you want the entry
    corresponding to the encoding for the character 'a' on the
    platform in question.

    > considering that char could be signed and negative, you could
    > have serious consequences.


    That's the real problem. The OP had an array "int x[ 256 ] ;";
    indexing it with a char could definitely be a problem (and
    logically, it probably should be "int x[ UCHAR_MAX + 1 ] ;").
    But of course, we (and g++) don't know whether he intends to
    index it with a char, or with a char cast to unsigned char, or
    with an int, return value from istream::get() or fgetc(). And
    'a' *is* guaranteed to be positive, and in the range
    0...UCHAR_MAX.

    > Granted, this isn't a problem in practice, but its not
    > portable that foo['a'] = 1 should do something specific.


    Except that the language standard says that it does something
    very specific, and very useful. Issuing a warning in this case
    is simply brain damage.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Jun 18, 2008
    #8
  9. Ivan

    James Kanze Guest

    On Jun 17, 7:27 pm, Jerry Coffin <> wrote:
    > In article <4857ed5a$0$12713$>,
    > says...


    > [ ... ]


    > > Right, but using 'a' as an index into an array could be a
    > > different index on different compilers. considering that
    > > char could be signed and negative, you could have serious
    > > consequences.


    > > Granted, this isn't a problem in practice, but its not
    > > portable that foo['a'] = 1 should do something specific.


    > That depends on what you mean by something specific.
    > Basically, the behavior is unspecified, but NOT undefined.


    The behavior is exactly specified (or at least, as specified as
    anything else in C++). You index the array with the value
    corresponding to the encoding of a small a in the native
    character encoding. If the goal is to index the entry
    corresponding to the encoding of a small a, this is the only
    correct and specified way of doing it.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Jun 18, 2008
    #9
  10. Ivan

    James Kanze Guest

    On Jun 17, 12:00 pm, Mirco Wahab <-halle.de> wrote:
    > Ivan wrote:
    > > For example
    > > int data[256];
    > > data['a'] = 1;
    > > data['b'] = 1;
    > > ///////////////////////////////////
    > > gcc is complaining about this syntax, so i am using static cast on the
    > > character literal. Is there a better way to do this?


    > Which gcc? From your example, I assumed:


    > int data[256];


    > int main()
    > {
    > data['a'] = 1;
    > data['b'] = 1;
    > return 0;
    > }


    > Compiled as C++ There was not a single warning in:
    > g++-4.3 (-Wall -pedantic)


    g++ 4.1.0 (under Solaris) definitely warns in this case when
    -Wall -pedantic is used.

    > mingw-gcc-3.4.1


    So does 3.4.0 under Solaris, and the CygWin version of 3.4.4
    under Windows.

    > icpc (intel CC 10.1)


    > Maybe you made another mistake not shown in your incomplete
    > excerpt.


    I have no problem reproducing his warnings, with several
    different versions of g++, as long as -Wall is used. The actual
    warning is "char-subscripts", so adding -Wno-char-subscripts
    *after* -Wall (or not using -Wall at all, but choosing
    explicitly for each warning) will suppress it. Which you
    probably should do---this is one of those brain dead warnings of
    which every compiler seems to have a few.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Jun 18, 2008
    #10
  11. Ivan

    Mirco Wahab Guest

    James Kanze wrote:
    > g++ 4.1.0 (under Solaris) definitely warns in this case when
    > -Wall -pedantic is used.
    > ...
    > So does 3.4.0 under Solaris, and the CygWin version of 3.4.4
    > under Windows.
    > I have no problem reproducing his warnings, with several
    > different versions of g++, as long as -Wall is used. The actual
    > warning is "char-subscripts", so adding -Wno-char-subscripts
    > *after* -Wall (or not using -Wall at all, but choosing
    > explicitly for each warning) will suppress it. Which you
    > probably should do---this is one of those brain dead warnings of
    > which every compiler seems to have a few.


    OK, I checked again (-Wall, -pedantic if possible):

    1) gcc version 3.4.2 (mingw-special)
    /s/misc/charsubscr/charsubscr.cxx:6: warning: array subscript has type `char'
    /s/misc/charsubscr/charsubscr.cxx:7: warning: array subscript has type `char'

    2) gcc version 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
    charsubscr.cxx:6: warning: array subscript has type `char'
    charsubscr.cxx:7: warning: array subscript has type `char'

    3) gcc version 4.2.3 20071030 (Linux)
    (no warning)

    4) gcc version 4.3.1 20080507 [gcc-4_3-branch revision 135036] (Linux)
    (no warning)

    5) icpc Version 10.1 (Linux)
    (no warning)

    6) Visual C++ 6 (SP6), Warning Level 4 (XP/SP2)
    (no warning)

    7) Visual C++ 9 (SP0), Warning Level 4 (XP/SP2)
    (no warning)


    So the gcc < 4.x seems to be the only tool
    that emits this warning (?).

    Thanks & Regards

    Mirco
    Mirco Wahab, Jun 18, 2008
    #11
  12. Ivan

    Jerry Coffin Guest

    In article <d55ff98f-755c-4a85-86f5-8981ab3afe48
    @f36g2000hsa.googlegroups.com>, says...
    > On Jun 17, 7:27 pm, Jerry Coffin <> wrote:
    > > In article <4857ed5a$0$12713$>,
    > > says...

    >
    > > [ ... ]

    >
    > > > Right, but using 'a' as an index into an array could be a
    > > > different index on different compilers. considering that
    > > > char could be signed and negative, you could have serious
    > > > consequences.

    >
    > > > Granted, this isn't a problem in practice, but its not
    > > > portable that foo['a'] = 1 should do something specific.

    >
    > > That depends on what you mean by something specific.
    > > Basically, the behavior is unspecified, but NOT undefined.

    >
    > The behavior is exactly specified (or at least, as specified as
    > anything else in C++). You index the array with the value
    > corresponding to the encoding of a small a in the native
    > character encoding. If the goal is to index the entry
    > corresponding to the encoding of a small a, this is the only
    > correct and specified way of doing it.


    Right -- all I meant is that the order in which most of those entries
    are arranged isn't specified. IIRC, the only part that's specified is
    that the digits will be in order and contiguous.
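    The contiguity guarantee for digits is what makes the familiar `c - '0'` conversion portable. A short sketch (the function names are hypothetical; only the `c - '0'` idiom relies on the standard):

```cpp
#include <cassert>

// '0'..'9' have consecutive, increasing values in every execution
// character set (ASCII, EBCDIC, ...), so this range check and the
// subtraction below are portable. Letters carry no such guarantee:
// in EBCDIC, 'a'..'z' has gaps.
inline bool is_decimal_digit(char c)
{
    return c >= '0' && c <= '9';
}

// Returns the numeric value 0..9 of a decimal digit character.
inline int digit_value(char c)
{
    assert(is_decimal_digit(c));
    return c - '0';
}
```

    The same reasoning does not extend to `c - 'a'` for letters, which is one reason a table indexed by the character itself is the portable choice for letter data.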

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
    Jerry Coffin, Jun 18, 2008
    #12
  13. Ivan

    James Kanze Guest

    On Jun 18, 11:01 am, Mirco Wahab <-halle.de> wrote:
    > James Kanze wrote:
    > > g++ 4.1.0 (under Solaris) definitely warns in this case when
    > > -Wall -pedantic is used.
    > > ...
    > > So does 3.4.0 under Solaris, and the CygWin version of 3.4.4
    > > under Windows.
    > > I have no problem reproducing his warnings, with several
    > > different versions of g++, as long as -Wall is used. The actual
    > > warning is "char-subscripts", so adding -Wno-char-subscripts
    > > *after* -Wall (or not using -Wall at all, but choosing
    > > explicitly for each warning) will suppress it. Which you
    > > probably should do---this is one of those brain dead warnings of
    > > which every compiler seems to have a few.


    > OK, I checked again (-Wall, -pedantic if possible):


    [...]
    > So the gcc < 4.x seems to be the only tool that emits this
    > warning (?).


    I get it with g++ 4.1. So maybe they realized how stupid it
    was, and got rid of it (or at least dropped it from -Wall).

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Jun 18, 2008
    #13
  14. Ivan

    Pascal J. Bourguignon Guest

    Jerry Coffin <> writes:

    > In article <d55ff98f-755c-4a85-86f5-8981ab3afe48
    > @f36g2000hsa.googlegroups.com>, says...
    >> On Jun 17, 7:27 pm, Jerry Coffin <> wrote:
    >> > In article <4857ed5a$0$12713$>,
    >> > says...

    >>
    >> > [ ... ]

    >>
    >> > > Right, but using 'a' as an index into an array could be a
    >> > > different index on different compilers. considering that
    >> > > char could be signed and negative, you could have serious
    >> > > consequences.

    >>
    >> > > Granted, this isn't a problem in practice, but its not
    >> > > portable that foo['a'] = 1 should do something specific.

    >>
    >> > That depends on what you mean by something specific.
    >> > Basically, the behavior is unspecified, but NOT undefined.

    >>
    >> The behavior is exactly specified (or at least, as specified as
    >> anything else in C++). You index the array with the value
    >> corresponding to the encoding of a small a in the native
    >> character encoding. If the goal is to index the entry
    >> corresponding to the encoding of a small a, this is the only
    >> correct and specified way of doing it.

    >
    > Right -- all I meant is that the order in which most of those entries
    > are arranged isn't specified. IIRC, the only part that's specified is
    > that the digits will be in order and contiguous.


    The order is the least of the problems we have with a['a']. The main
    problem is that 'a' is of type char, and char is often signed char,
    therefore 'a' might be negative 0, and 'à' will most probably be
    negative.

    So you can use bytes to index arrays, but be careful:

    #include <climits>   // UCHAR_MAX

    int a[UCHAR_MAX+1];

    void demo()
    {
        char i=42;
        if(0<=i){
            a[i]; // ok
        }

        char j='a';
        a[(unsigned char)j]; // ok

        unsigned char k='a';
        a[k]; // best
    }

    --
    __Pascal Bourguignon__
    Pascal J. Bourguignon, Jun 19, 2008
    #14
  15. Ivan

    Guest

    On 17 Jun., 00:48, Ivan <> wrote:
    > Hi,
    >
    > What is the best syntax to use a char to index into an array.
    >
    > ///////////////////////////////////
    > For example
    >
    > int data[256];
    >
    > data['a'] = 1;
    > data['b'] = 1;
    > ///////////////////////////////////
    >
    > gcc is complaining about this syntax, so i am using static cast on the
    > character literal. Is there a better way to do this?


    It would be helpful to also post the gcc warnings (complaints).

    Greetings Thomas Mertes

    Seed7 Homepage: http://seed7.sourceforge.net
    Seed7 - The extensible programming language: User defined statements
    and operators, abstract data types, templates without special
    syntax, OO with interfaces and multiple dispatch, statically typed,
    interpreted or compiled, portable, runs under linux/unix/windows.
    , Jun 19, 2008
    #15
  16. Ivan

    Jerry Coffin Guest

    In article <>,
    says...

    [ ... ]

    > The order is the least of the problems we have with a['a']. The main
    > problem is that 'a' is of type char, and char is often signed char,
    > therefore 'a' might be negative 0, and 'à' will most probably be
    > negative.


    The standard specifically requires that all members of the basic
    execution character set be nonnegative and 'a' is a member of the basic
    execution character set, so it will never be negative.

    ONLY characters that are NOT members of the basic execution character
    set can be encoded with negative values. That includes a lot, but there
    ARE limits.
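    The guarantee can be expressed as a compile-time check. This is a minimal sketch assuming C++11 for `static_assert` and an 8-bit char for the 0xE0 example; `kCharIsSigned` and `is_negative_as_char` are hypothetical names. The static_assert holds on any conforming implementation, while whether 0xE0 ('à' in ISO 8859-1) reads back as negative depends on the signedness of plain char. Before C++20, converting an out-of-range value to a signed char is implementation-defined; gcc documents wrapping modulo 2^N.

```cpp
#include <cassert>
#include <climits>

// Members of the basic execution character set are never negative.
static_assert('a' >= 0 && 'z' >= 0 && 'A' >= 0 && '0' >= 0,
              "basic execution characters are non-negative");

// CHAR_MIN is negative exactly when plain char is signed.
constexpr bool kCharIsSigned = CHAR_MIN < 0;

// Reports whether a byte value, stored in a plain char, reads back
// as negative on this implementation.
inline bool is_negative_as_char(unsigned char byte)
{
    char c = static_cast<char>(byte);
    assert(!(c < 0) || kCharIsSigned);  // an unsigned char is never negative
    return c < 0;
}
```

    On a typical x86 gcc build, `is_negative_as_char(0xE0)` is true while `is_negative_as_char('a')` is false, which is exactly the split Jerry describes.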

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
    Jerry Coffin, Jun 19, 2008
    #16
  17. Ivan

    Pascal J. Bourguignon Guest

    Pete Becker <> writes:

    > On 2008-06-19 07:02:05 -0400, Pete Becker <> said:
    >
    >> On 2008-06-19 05:18:46 -0400, (Pascal
    >> J. Bourguignon) said:
    >>
    >>> The order is the least of the problems we have with a['a']. The
    >>> main
    >>> problem is that 'a' is of type char, and char is often signed char,
    >>> therefore 'a' might be negative 0, and 'à' will most probably be
    >>> negative.
    >>>

    >> Phew, I knew it had to be there somewhere, and I just found it:
    >> [lex.charset]/3: "For each basic execution character set, the
    >> values of the members shall be non-negative and distinct from one another."
    >> So, in particular, 'a' cannot be negative.

    >
    > Sorry, missed the accent over that last 'a'. That character is not in
    > the basic execution character set, so its value can be negative.


    Yes, but thanks for the reference, at least 'a' is not negative.

    --
    __Pascal Bourguignon__
    Pascal J. Bourguignon, Jun 20, 2008
    #17

Similar Threads
  1. Tom Page
    Replies:
    4
    Views:
    461
    Victor Bazarov
    Feb 17, 2004
  2. Richard Delorme

    out of range array subscript

    Richard Delorme, May 3, 2004, in forum: C Programming
    Replies:
    5
    Views:
    476
    Chris Torek
    May 15, 2004
  3. Pedro Graca

    array subscript type cannot be `char`?

    Pedro Graca, Mar 22, 2006, in forum: C Programming
    Replies:
    51
    Views:
    1,467
    Jordan Abel
    Mar 28, 2006
  4. DaVinci
    Replies:
    5
    Views:
    753
    Axter
    May 10, 2006
  5. Yehuda Berlinger

    Using undef as an array subscript

    Yehuda Berlinger, Jul 1, 2003, in forum: Perl Misc
    Replies:
    8
    Views:
    125
    Greg Bacon
    Jul 1, 2003