using character as array subscript

Ivan

Hi,

What is the best syntax to use a char to index into an array?

///////////////////////////////////
For example

int data[256];

data['a'] = 1;
data['b'] = 1;
///////////////////////////////////

gcc is complaining about this syntax, so I am using static_cast on the
character literal. Is there a better way to do this?

Thanks,
Ivan
 
Jim Langston

Ivan said:
Hi,

What is the best syntax to use a char to index into an array.

///////////////////////////////////
For example

int data[256];

data['a'] = 1;
data['b'] = 1;
///////////////////////////////////

gcc is complaining about this syntax, so i am using static cast on the
character literal. Is there a better way to do this?

MSVC++ 2008 express isn't complaining and compiles that code fine, not even
a warning. It is well defined behavior as long as the type of your native
char is unsigned 8 bit byte.

On my system if I
std::cout << typeid('a').name() << "\n";
I get the output of
char

Not unsigned char. That may produce some undefined behavior for you if you
attempt to work with characters that would be above 127 as a byte, they
might show up negative.
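
To make the signedness point concrete, here is a minimal sketch. It assumes only the standard library; the helper names `plain_char_is_signed` and `to_index` are hypothetical and exist purely for illustration.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Whether plain char is signed is implementation-defined;
// this reports what the current compiler chose.
inline bool plain_char_is_signed()
{
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: map any char to an index in 0..UCHAR_MAX.
// Converting through unsigned char turns a negative char value
// into the corresponding value above CHAR_MAX.
inline std::size_t to_index(char c)
{
    return static_cast<unsigned char>(c);
}
```

With a helper like this, `data[to_index(c)]` stays inside an array of `UCHAR_MAX + 1` elements whether or not plain char is signed.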
 
Daniel Pitts

Jim said:
Ivan said:
Hi,

What is the best syntax to use a char to index into an array.

///////////////////////////////////
For example

int data[256];

data['a'] = 1;
data['b'] = 1;
///////////////////////////////////

gcc is complaining about this syntax, so i am using static cast on the
character literal. Is there a better way to do this?

MSVC++ 2008 express isn't complaining and compiles that code fine, not even
a warning. It is well defined behavior as long as the type of your native
char is unsigned 8 bit byte.

On my system if I
std::cout << typeid('a').name() << "\n";
I get the output of
char

Not unsigned char. That may produce some undefined behavior for you if you
attempt to work with characters that would be above 127 as a byte, they
might show up negative.
Is it well defined? I thought it would depend on the character encoding
used, such as ASCII vs EBCDIC. Or does the standard actually specify
char encoding now?
 
James Kanze

What is the best syntax to use a char to index into an array.

It depends.
///////////////////////////////////
For example
int data[256];
data['a'] = 1;
data['b'] = 1;
///////////////////////////////////
gcc is complaining about this syntax, so i am using static
cast on the character literal. Is there a better way to do
this?

It depends on the context.

First, this is a warning; you can turn it off, or ignore it. In
fact, it is a legitimate warning unless you've taken adequate
precautions; a char may have negative values. (But then, so may
an int. Logically, g++ shouldn't warn unless the size of the
array is such that not all entries can be reached by a char, and
not in the case of a character literal, in any case. But in
fact, it does always warn, unless you turn that warning off.)

The first case is when the array will normally be indexed by an
int, and you're just using character literals during
initialization; if the only indexation by a char is with a
character literal, you can simply ignore the warning. (Note
that this is a more or less usual idiom: you read the array with
a return value of istream::get(), for example, after having
checked for EOF.)

If you really do want to index with arbitrary characters, there
are three solutions:

1. If portability isn't a large concern, you can just compile
with -funsigned-char. This should really be the default,
but there are historical reasons which mean that it isn't.
Other compilers also have such an option. (It's /J for
VC++, I think.) If you're certain that you'll never have to
port to a compiler without this option, you can just use it,
and be assured that plain char is unsigned.

In this case, you'll still have to turn off the warning from
g++. (IMHO, the warning, as it is currently implemented, is
stupid. If they want to warn, it would be more reasonable
to warn when the type of the index cannot encompass all of
the possible index values, and only if the value is not a
constant.)

2. Otherwise, you can cast to unsigned char anytime you use a
char as an index.

3. Or, you can rearrange the array, and use character -
CHAR_MIN as an index.

In the latter two cases, I'd wrap the array in a class which
took care of the "correction" of the index.
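
A minimal sketch of what such a wrapper might look like, assuming an int table with one entry per possible char value. The names `CharTable` and `offset_index` are hypothetical; the class uses the cast from solution 2, and the free function shows the CHAR_MIN offset from solution 3.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical wrapper: one int per possible char value.
// The index correction lives in one place instead of at every call site.
class CharTable
{
public:
    CharTable() : data_() {}   // value-initialize: every entry starts at zero

    // Solution 2: convert through unsigned char before indexing.
    int& operator[](char c)
    {
        return data_[static_cast<unsigned char>(c)];
    }

    int operator[](char c) const
    {
        return data_[static_cast<unsigned char>(c)];
    }

private:
    int data_[UCHAR_MAX + 1];
};

// Solution 3: shift by CHAR_MIN so the smallest char value maps to
// index 0. The subtraction happens in int, so it cannot go negative.
inline std::size_t offset_index(char c)
{
    return static_cast<std::size_t>(c - CHAR_MIN);
}
```

Client code can then write `table['a'] = 1;` or `table[c]` for an arbitrary char without repeating the cast at each use.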
 
Mirco Wahab

Ivan said:
For example
int data[256];
data['a'] = 1;
data['b'] = 1;
///////////////////////////////////
gcc is complaining about this syntax, so i am using static cast on the
character literal. Is there a better way to do this?

Which gcc? From your example, I assumed:

int data[256];

int main()
{
    data['a'] = 1;
    data['b'] = 1;
    return 0;
}

Compiled as C++, there was not a single warning in:
g++-4.3 (-Wall -pedantic)
mingw-gcc-3.4.1
icpc (intel CC 10.1)

Maybe you made another mistake not
shown in your incomplete excerpt.

Regards

Mirco
 
Daniel Pitts

Jack said:
Jim said:
Hi,

What is the best syntax to use a char to index into an array.

///////////////////////////////////
For example

int data[256];

data['a'] = 1;
data['b'] = 1;
///////////////////////////////////

gcc is complaining about this syntax, so i am using static cast on the
character literal. Is there a better way to do this?
MSVC++ 2008 express isn't complaining and compiles that code fine, not even
a warning. It is well defined behavior as long as the type of your native
char is unsigned 8 bit byte.

On my system if I
std::cout << typeid('a').name() << "\n";
I get the output of
char

Not unsigned char. That may produce some undefined behavior for you if you
attempt to work with characters that would be above 127 as a byte, they
might show up negative.
Is it well defined? I thought it would depend on the character encoding
used, such as ASCII vs EBCDIC. Or does the standard actually specify
char encoding now?

No, the standard does not specify execution character set. Or source
character set, for that matter. That's exactly why it is more
portable to use the actual characters, rather than their numerical
value in a particular character set.

In fact, the OP's code could well be part of a beginner's assignment
to generate a histogram of characters in some input data.

This is guaranteed to produce the correct hex digit character for the
lowest nibble of an unsigned int regardless of the character set:

char hex[] = "0123456789ABCDEF";

char hex_digit(unsigned int x)
{
return hex [x & 0xf];
}
Your example only addresses the *converse* of my point, and therefore
doesn't have any connection to the validity of my point.
....if you change the definition of the array to:

char hex [17] = { 48, 48, /*... */ 69, 70, 0 };

....then you get exactly the same array and result on an ASCII
implementation, and gibberish on any other execution character set.
Right, but using 'a' as an index into an array could be a different
index on different compilers. Considering that char could be signed and
negative, you could have serious consequences.

Granted, this isn't a problem in practice, but it's not portable that
foo['a'] = 1 should do something specific.

Now, if you were to get specific with vendor/platform, that's a different
question.
 
Jerry Coffin

[ ... ]
Right, but using 'a' as an index into an array could be a different
index on different compilers. considering that char could be signed and
negative, you could have serious consequences.

Granted, this isn't a problem in practice, but its not portable that
foo['a'] = 1 should do something specific.

That depends on what you mean by something specific. Basically, the
behavior is unspecified, but NOT undefined. In particular, the C++
standard specifies a basic execution character set that includes the
usual English letters, base-10 digits, etc. and requires that all those
characters have non-negative values. Since the 'a' in your expression
must be non-negative, it has defined results if (for example) foo has
been defined something like 'int foo[UCHAR_MAX];'

It's certainly true that you could encounter characters whose encoding
is negative, but this isn't one of them.
 
James Kanze

On Jun 17, 6:58 pm, Daniel Pitts

[...]
Right, but using 'a' as an index into an array could be a
different index on different compilers.

Which, presumably, is what is wanted. You don't want the entry
corresponding to 97 (or whatever); you want the entry
corresponding to the encoding for the character 'a' on the
platform in question.
considering that char could be signed and negative, you could
have serious consequences.

That's the real problem. The OP had an array "int x[ 256 ] ;";
indexing it with a char could definitely be a problem (and
logically, it probably should be "int x[ UCHAR_MAX + 1 ] ;").
But of course, we (and g++) don't know whether he intends to
index it with a char, or with a char cast to unsigned char, or
with an int, return value from istream::get() or fgetc(). And
'a' *is* guaranteed to be positive, and in the range
0...UCHAR_MAX.
Granted, this isn't a problem in practice, but its not
portable that foo['a'] = 1 should do something specific.

Except that the language standard says that it does something
very specific, and very useful. Issuing a warning in this case
is simply brain damage.
 
James Kanze

[ ... ]
Right, but using 'a' as an index into an array could be a
different index on different compilers. considering that
char could be signed and negative, you could have serious
consequences.
Granted, this isn't a problem in practice, but its not
portable that foo['a'] = 1 should do something specific.
That depends on what you mean by something specific.
Basically, the behavior is unspecified, but NOT undefined.

The behavior is exactly specified (or at least, as specified as
anything else in C++). You index the array with the value
corresponding to the encoding of a small a in the native
character encoding. If the goal is to index the entry
corresponding to the encoding of a small a, this is the only
correct and specified way of doing it.
 
James Kanze

Ivan said:
For example
int data[256];
data['a'] = 1;
data['b'] = 1;
///////////////////////////////////
gcc is complaining about this syntax, so i am using static cast on the
character literal. Is there a better way to do this?
Which gcc? From your example, I assumed:
int data[256];
int main()
{
data['a'] = 1;
data['b'] = 1;
return 0;
}
Compiled as C++ There was not a single warning in:
g++-4.3 (-Wall -pedantic)

g++ 4.1.0 (under Solaris) definitely warns in this case when
-Wall -pedantic is used.
mingw-gcc-3.4.1

So does 3.4.0 under Solaris, and the CygWin version of 3.4.4
under Windows.
icpc (intel CC 10.1)
Maybe you made another mistake not shown in your incomplete
excerpt.

I have no problem reproducing his warnings, with several
different versions of g++, as long as -Wall is used. The actual
warning is "char-subscripts", so adding -Wno-char-subscripts
*after* -Wall (or not using -Wall at all, but choosing
explicitly for each warning) will suppress it. Which you
probably should do---this is one of those brain dead warnings of
which every compiler seems to have a few.
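
If changing the build flags is not an option, a GCC-specific alternative (not something proposed in this thread) is to silence the warning around only the code that needs it. The sketch below assumes a GCC recent enough to support `#pragma GCC diagnostic push` and `pop`; the function name `mark_letters` is hypothetical.

```cpp
// GCC-specific sketch: silence -Wchar-subscripts for one region only.
// Compilers that do not define __GNUC__ skip the pragmas entirely.
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wchar-subscripts"
#endif

int data[256];   // zero-initialized because it has static storage

inline void mark_letters()
{
    // 'a' and 'b' belong to the basic execution character set,
    // so their values are guaranteed to be non-negative.
    data['a'] = 1;
    data['b'] = 1;
}

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

The trade-off is that the suppression is visible in the source, next to the code it applies to, instead of hidden in a makefile.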
 
Mirco Wahab

James said:
g++ 4.1.0 (under Solaris) definitely warns in this case when
-Wall -pedantic is used.
...
So does 3.4.0 under Solaris, and the CygWin version of 3.4.4
under Windows.
I have no problem reproducing his warnings, with several
different versions of g++, as long as -Wall is used. The actual
warning is "char-subscripts", so adding -Wno-char-subscripts
*after* -Wall (or not using -Wall at all, but choosing
explicitly for each warning) will suppress it. Which you
probably should do---this is one of those brain dead warnings of
which every compiler seems to have a few.

OK, I checked again (-Wall, -pedantic if possible):

1) gcc version 3.4.2 (mingw-special)
/s/misc/charsubscr/charsubscr.cxx:6: warning: array subscript has type `char'
/s/misc/charsubscr/charsubscr.cxx:7: warning: array subscript has type `char'

2) gcc version 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
charsubscr.cxx:6: warning: array subscript has type `char'
charsubscr.cxx:7: warning: array subscript has type `char'

3) gcc version 4.2.3 20071030 (Linux)
(no warning)

4) gcc version 4.3.1 20080507 [gcc-4_3-branch revision 135036] (Linux)
(no warning)

5) icpc Version 10.1 (Linux)
(no warning)

6) Visual C++ 6 (SP6), Warning Level 4 (XP/SP2)
(no warning)

7) Visual C++ 9 (SP0), Warning Level 4 (XP/SP2)
(no warning)


So the gcc < 4.x seems to be the only tool
that emits this warning (?).

Thanks & Regards

Mirco
 
Jerry Coffin

[ ... ]
Right, but using 'a' as an index into an array could be a
different index on different compilers. considering that
char could be signed and negative, you could have serious
consequences.
Granted, this isn't a problem in practice, but its not
portable that foo['a'] = 1 should do something specific.
That depends on what you mean by something specific.
Basically, the behavior is unspecified, but NOT undefined.

The behavior is exactly specified (or at least, as specified as
anything else in C++). You index the array with the value
corresponding to the encoding of a small a in the native
character encoding. If the goal is to index the entry
corresponding to the encoding of a small a, this is the only
correct and specified way of doing it.

Right -- all I meant is that the order in which most of those entries
are arranged isn't specified. IIRC, the only part that's specified is
that the digits will be in order and contiguous.
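
A small sketch of what that guarantee does and does not allow. Both function names are hypothetical. The digit conversion relies only on '0' through '9' being contiguous; the letter lookup avoids arithmetic such as `c - 'a'`, which works in ASCII but is not guaranteed in general.

```cpp
// Convert a decimal digit character to its numeric value, or return -1
// if the character is not a digit. Relies only on the guarantee that
// '0'..'9' are contiguous and in increasing order.
inline int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

// Letters carry no such guarantee, so find the position by searching
// a string literal instead of subtracting 'a'.
inline int letter_position(char c)
{
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
    for (int i = 0; letters[i] != '\0'; ++i) {
        if (letters[i] == c) {
            return i;
        }
    }
    return -1;
}
```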
 
James Kanze

OK, I checked again (-Wall, -pedantic if possible):

[...]
So the gcc < 4.x seems to be the only tool that emits this
warning (?).

I get it with g++ 4.1. So maybe they realized how stupid it
was, and got rid of it (or at least dropped it from -Wall).
 
Pascal J. Bourguignon

Jerry Coffin said:
Right, but using 'a' as an index into an array could be a
different index on different compilers. considering that
char could be signed and negative, you could have serious
consequences.
Granted, this isn't a problem in practice, but its not
portable that foo['a'] = 1 should do something specific.
That depends on what you mean by something specific.
Basically, the behavior is unspecified, but NOT undefined.

The behavior is exactly specified (or at least, as specified as
anything else in C++). You index the array with the value
corresponding to the encoding of a small a in the native
character encoding. If the goal is to index the entry
corresponding to the encoding of a small a, this is the only
correct and specified way of doing it.

Right -- all I meant is that the order in which most of those entries
are arranged isn't specified. IIRC, the only part that's specified is
that the digits will be in order and contiguous.

The order is the least of the problems we have with a['a']. The main
problem is that 'a' is of type char, and char is often signed char,
therefore 'a' might be negative 0, and 'à' will most probably be
negative.

So you can use bytes to index arrays, but be careful:

int a[UCHAR_MAX+1];

char i=42;
if(0<=i){
    a[i]; // ok
}

char j='a';
a[(unsigned char)j]; // ok

unsigned char k='a';
a[k]; // best
 
thomas.mertes

Hi,

What is the best syntax to use a char to index into an array.

///////////////////////////////////
For example

int data[256];

data['a'] = 1;
data['b'] = 1;
///////////////////////////////////

gcc is complaining about this syntax, so i am using static cast on the
character literal. Is there a better way to do this?

It would be helpful, to post also the gcc warnings (complaints).

Greetings Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
 
Jerry Coffin

[ ... ]
The order is the least of the problems we have with a['a']. The main
problem is that 'a' is of type char, and char is often signed char,
therefore 'a' might be negative 0, and 'à' will most probably be
negative.

The standard specifically requires that all members of the basic
execution character set be nonnegative and 'a' is a member of the basic
execution character set, so it will never be negative.

ONLY characters that are NOT members of the basic execution character
set can be encoded with negative values. That includes a lot, but there
ARE limits.
 
Pascal J. Bourguignon

Pete Becker said:
The order is the least of the problems we have with a['a']. The
main
problem is that 'a' is of type char, and char is often signed char,
therefore 'a' might be negative 0, and 'à' will most probably be
negative.
Phew, I knew it had to be there somewhere, and I just found it:
[lex.charset]/3: "For each basic execution character set, the
values of the members shall be non-negative and distinct from one another."
So, in particular, 'a' cannot be negative.

Sorry, missed the accent over that last 'a'. That character is not in
the basic execution character set, so its value can be negative.

Yes, but thanks for the reference, at least 'a' is not negative.
 
