QoI issue: silencing a warning


glen herrmannsfeldt

In comp.lang.c Ian Collins said:
Noob wrote:
(snip)
$ gcc -Wall -c tutu.c
tutu.c: In function 'foo':
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
Whatever the implementation of isdigit, why would a char subscript be
worthy of a diagnostic? Yes char is usually signed, but so are short,
int and long.

A signed char overflows to a negative value sooner.

Until not so many years ago, you would run past the end of memory
before an int subscript wrapped to negative values. Not so with
short or char.

Note that Java avoids this problem, as char is an unsigned 16-bit type;
all the other integer types in Java are signed.

-- glen
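A common fix for the warning quoted above - a sketch of mine, not from the thread - is to cast the argument to unsigned char before calling the <ctype.h> classification functions, since passing them a negative value other than EOF is undefined behaviour:

```c
#include <ctype.h>

/* Variant of the thread's foo() with the usual fix applied: the cast
 * maps any negative char value into 0..UCHAR_MAX, the range the
 * <ctype.h> functions are defined for. */
int foo(const char *ext)
{
    return isdigit((unsigned char)ext[1])
        && isdigit((unsigned char)ext[2])
        && isdigit((unsigned char)ext[3]);
}
```

With the casts in place, gcc's -Wchar-subscripts warning goes away on implementations where isdigit is a table lookup.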
 

David Brown

glen herrmannsfeldt said:
(snip)

Seems to me the best suggestion. When you could reasonably rely
on character data being plain ASCII, there might have been some
argument against it, especially on processors that included a way
to convert signed char to int that didn't conveniently also allow
unsigned char to int.

I certainly have never found a use for a signed plain char - when you
want to put a number in a char, it should be explicitly signed or
unsigned. In my own code, I use int8_t and uint8_t extensively - "char"
is only for characters in strings. (And if the code will be run on cpus
without 8-bit chars, which I occasionally use, then int_least8_t and
uint_least8_t are the types to use - these types have a sensible meaning
in their names, unlike "signed char".)

But the normal ABI on x86 uses signed plain chars, so it is possible
that some code relies on that - so be careful.
 

Keith Thompson

glen herrmannsfeldt said:
Seems to me the best suggestion. When you could reasonably rely
on character data being plain ASCII, there might have been some
argument against it, especially on processors that included a way
to convert signed char to int that didn't conveniently also allow
unsigned char to int.

Making plain char unsigned is sensible.

Making plain char unsigned in code that needs to interoperate with other
code written with *signed* plain char could cause problems.
 

Stephen Sprunk

x86 has a complicated history that traces back at least to the 8080.
The 8086 16-bit registers (AX, BX, CX, DX) could also be referenced
as two 8-bit registers (AL, AH, BL, BH, etc). In that case, I believe
it would be more natural (easier) to zero the high byte than to sign
extend it, but it is about as easy either way.

A simple MOV to either AL or AH does not modify the other; to get either
sign- or zero-extension, you have to MOVSX or MOVZX to AX.

IIRC, a simple MOV to AX does not modify the top half of EAX; to get
either sign- or zero-extension, you have to MOVSX or MOVZX to EAX.
There is no way to assign to the top half only.

Assigning to EAX _is_ zero-extended to the top half of RAX, which
prevents partial register stalls, and among other things makes 32-bit
immediates rather useful for pointers even in 64-bit mode.

There is no way to access just the top half of SI, DI, SP or BP, but in
64-bit mode you can access just the lower half via SIL, DIL, SPL or BPL;
the same rules apply to those as for AL, BL, CL and DL.

What a mess.

S
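The sign- versus zero-extension Stephen describes can be shown in C terms - an illustration of mine, not from the thread, assuming the ubiquitous 8-bit two's-complement char:

```c
/* Widening the byte 0x80 through signed char copies the high bit
 * upward (what MOVSX does); widening through unsigned char clears
 * the high bits (what MOVZX does). */
static int widen_signed(unsigned char byte)
{
    return (signed char)byte;   /* sign-extended, like MOVSX */
}

static int widen_unsigned(unsigned char byte)
{
    return byte;                /* zero-extended, like MOVZX */
}
```

For the byte 0x80, the first yields -128 and the second 128 on two's-complement implementations.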
 

BartC

David Brown said:
But the normal ABI on x86 uses signed plain chars, so it is possible that
some code relies on that - so be careful.

What does that mean? In binary, a signed char looks exactly like an unsigned
one. There is a difference when a char is sign-extended, but that only
happens when the value being passed is wider, so not a char type.
 

David Brown

BartC said:
What does that mean? In binary, a signed char looks exactly like an
unsigned one. There is a difference when a char is sign-extended, but that
only happens when the value being passed is wider, so not a char type.

There is no problem when you link pre-compiled code - as you say, there
is no difference at the binary level. But you might find source code
that has been tested to work on x86 and which makes the assumption that
"char" is signed - that assumption is valid on x86 because signed char
is the standard on that processor. So the risk is that enabling
"-funsigned-char" on gcc could break existing working code (albeit
non-portable code).
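A hypothetical example (not from the thread) of the kind of code David Brown means - testing "c < 0" to spot bytes with the high bit set, such as non-ASCII bytes in UTF-8 text:

```c
#include <limits.h>

/* Works where plain char is signed (the normal x86 ABI) and
 * silently stops matching anything under gcc's -funsigned-char. */
static int has_high_bit(char c)
{
    return c < 0;   /* relies on plain char being signed */
}

/* Portable spelling of the same test, independent of char's
 * signedness: */
static int has_high_bit_portable(char c)
{
    return ((unsigned char)c & 0x80) != 0;
}
```

Only the second version behaves identically whichever way plain char goes.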
 

glen herrmannsfeldt

David Brown said:
(snip)
There is no problem when you link pre-compiled code - as you say, there
is no difference at the binary level. But you might find source code
that has been tested to work on x86 and which makes the assumption that
"char" is signed - that assumption is valid on x86 because signed char
is the standard on that processor. So the risk is that enabling
"-funsigned-char" on gcc could break existing working code (albeit
non-portable code).

It might also fix already broken code. There might be a lot of code
using char for subscripts with the warning disabled, and assuming that
the source is ASCII.

Conveniently a problem that never occurs in the EBCDIC world.
(EBCDIC letters and digits all have the high bit of a byte set.)

-- glen
 

Malcolm McLean

Probably because a programmer using a char as a subscript very likely
assumed he could use it to implement a lookup table covering the entire
range of char values. (Which is exactly what happened here, though
indirectly.)
Also, int is not usually signed - it's always signed. So if the code is
broken because of a negative subscript, it will break on the test platform.
Of course there should be a warning if someone redefines a basic integer
type with a typedef and then uses that type as a subscript, because you
have a similar potential problem.
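The lookup-table pattern Malcolm describes can be sketched as follows - my example, not his. A 256-entry table covers every byte value, but it must be indexed through unsigned char, or any byte above 0x7F becomes a negative subscript on a signed-char platform:

```c
/* 256-entry classification table (C99 designated initializers). */
static int is_vowel[256] = {
    ['a'] = 1, ['e'] = 1, ['i'] = 1, ['o'] = 1, ['u'] = 1
};

static int count_vowels(const char *s)
{
    int n = 0;
    for (; *s; s++)
        n += is_vowel[(unsigned char)*s];  /* safe for all byte values */
    return n;
}
```

Writing `is_vowel[*s]` instead would read before the start of the table for high-bit bytes wherever plain char is signed - exactly the bug class -Wchar-subscripts flags.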
 

JohnF

Malcolm McLean said:
Also, int is not usually signed - it's always signed. So if the code is
broken because of a negative subscript, it will break on the test platform.
Of course there should be a warning if someone redefines a basic integer
type with a typedef and then uses that type as a subscript, because you
have a similar potential problem.

Just curious -- what's necessarily wrong with negative subscripts?
For example, kinda silly, but just to make the point,
unsigned char *mem = malloc(10000), *p = mem+5000;
int i=0;
for ( i=(-4000); i<=4000; i++ ) p[i]='\000';
That a problem?
 

Keith Thompson

JohnF said:
Malcolm McLean said:
Also, int is not usually signed - it's always signed. So if the code is
broken because of a negative subscript, it will break on the test platform.
Of course there should be a warning if someone redefines a basic integer
type with a typedef and then uses that type as a subscript, because you
have a similar potential problem.

Just curious -- what's necessarily wrong with negative subscripts?
For example, kinda silly, but just to make the point,
unsigned char *mem = malloc(10000), *p = mem+5000;
int i=0;
for ( i=(-4000); i<=4000; i++ ) p[i]='\000';
That a problem?


No, that's perfectly valid -- but it's unusual enough that it doesn't
justify *not* issuing a warning.
 

glen herrmannsfeldt

(snip, someone wrote)
(snip)

Just curious -- what's necessarily wrong with negative subscripts?
For example, kinda silly, but just to make the point,
unsigned char *mem = malloc(10000), *p = mem+5000;
int i=0;
for ( i=(-4000); i<=4000; i++ ) p[i]='\000';
That a problem?


In that case, the compiler might not warn about it.

If it is an array (that is, declared with [] and a length, not with *)
the compiler knows that, and can take that into consideration.

Also, it is only a warning, but I agree that perhaps you should not
get the warning in the pointer case.

If you did something like:

unsigned char mem[10000], *p = mem+5000;
int i=0;
for ( i=(-4000); i<=4000; i++ ) p[i]='\000';

the compiler might realize that p has an offset.

The warning previously indicated was for signed char, which overflows
(and often wraps) sooner than larger signed integer types.

-- glen
 

Tim Rentsch

Ian Collins said:
Noob said:
[ NOTE : cross-posted to comp.lang.c and comp.unix.programmer,
please trim as you see fit ]

Hello,

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
    int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
    return ext_NNN;
}

$ gcc -Wall -c tutu.c
tutu.c: In function 'foo':
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]

Whatever the implementation of isdigit, why would a char subscript be
worthy of a diagnostic? Yes char is usually signed, but so are short,
int and long.

The type char is unique[*] among all the standard integer types
in that it is signed on some implementations and unsigned on
others. That's my take on the warning anyway.

[*] Technically there is one other, in that an 'int' bitfield is
allowed to be unsigned instead of signed, at the implementation's
whim. So a compiler flag warning on 'int' bitfields also seems
like a good option to have.
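The footnote's bit-field point can be sketched as follows (my example, not Tim's): whether a plain "int" bit-field is signed is up to the implementation, so spelling out "signed" or "unsigned" removes the ambiguity.

```c
/* The two explicit fields behave the same everywhere; the plain
 * "int" field's signedness is implementation-defined. */
struct flags {
    signed int   s : 3;   /* definitely signed: holds -4..3 on
                             two's-complement implementations */
    unsigned int u : 3;   /* definitely unsigned: holds 0..7 */
    int          p : 3;   /* signedness is implementation-defined */
};
```

A compiler flag warning on plain "int" bit-fields would flag uses of `p` here but leave `s` and `u` alone.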
 
