QoI issue: silencing a warning


glen herrmannsfeldt

In comp.lang.c Ian Collins said:
Noob wrote:
(snip)
$ gcc -Wall -c tutu.c
tutu.c: In function 'foo':
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
Whatever the implementation of isdigit, why would a char subscript be
worthy of a diagnostic? Yes char is usually signed, but so are short,
int and long.

A signed char overflows to a negative value sooner.

Until not so many years ago, you would run past the end of memory
before an int subscript wrapped to negative values. Not so with
short or char.

Note that Java avoids this problem, as char is an unsigned 16-bit type;
all the other integer types in Java are signed.

-- glen
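A common fix for the warning quoted above - a sketch of mine, not from the thread - is to cast the argument to unsigned char before calling the <ctype.h> classification functions, since passing them a negative value other than EOF is undefined behaviour:

```c
#include <ctype.h>

/* Variant of the thread's foo() with the usual fix applied: the cast
 * maps any negative char value into 0..UCHAR_MAX, the range the
 * <ctype.h> functions are defined for. */
int foo(const char *ext)
{
    return isdigit((unsigned char)ext[1])
        && isdigit((unsigned char)ext[2])
        && isdigit((unsigned char)ext[3]);
}
```

With the casts in place, gcc's -Wchar-subscripts warning goes away on implementations where isdigit is a table lookup.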
 

David Brown

glen herrmannsfeldt said:
(snip)

Seems to me the best suggestion. When you could reasonably rely
on character data being plain ASCII, there might have been some
argument against it, especially on processors that included a way
to convert signed char to int that didn't conveniently also allow
unsigned char to int.

I certainly have never found a use for a signed plain char - when you
want to put a number in a char, it should be explicitly signed or
unsigned. In my own code, I use int8_t and uint8_t extensively - "char"
is only for characters in strings. (And if the code will be run on cpus
without 8-bit chars, which I occasionally use, then int_least8_t and
uint_least8_t are the types to use - these types have a sensible meaning
in their names, unlike "signed char".)

But the normal ABI on x86 uses signed plain chars, so it is possible
that some code relies on that - so be careful.
 

Keith Thompson

glen herrmannsfeldt said:
Seems to me the best suggestion. When you could reasonably rely
on character data being plain ASCII, there might have been some
argument against it, especially on processors that included a way
to convert signed char to int that didn't conveniently also allow
unsigned char to int.

Making plain char unsigned is sensible.

Making plain char unsigned in code that needs to interoperate with other
code written with *signed* plain char could cause problems.
 

Stephen Sprunk

x86 has a complicated history that traces back at least to the 8080.
The 8086 16-bit registers (AX, BX, CX, DX) could also be referenced
as two 8-bit registers (AL, AH, BL, BH, etc). In that case, I believe
it would be more natural (easier) to zero the high byte than to sign
extend it, but it is about as easy either way.

A simple MOV to either AL or AH does not modify the other; to get either
sign- or zero-extension, you have to MOVSX or MOVZX to AX.

IIRC, a simple MOV to AX does not modify the top half of EAX; to get
either sign- or zero-extension, you have to MOVSX or MOVZX to EAX.
There is no way to assign to the top half only.

Assigning to EAX _is_ zero-extended to the top half of RAX, which
prevents partial register stalls, and among other things makes 32-bit
immediates rather useful for pointers even in 64-bit mode.

There is no way to access just the top half of SI, DI, SP or BP, but in
64-bit mode you can access just the lower half via SIL, DIL, SPL or BPL;
the same rules apply to those as for AL, BL, CL and DL.

What a mess.

S
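The sign- versus zero-extension Stephen describes can be shown in C terms - an illustration of mine, not from the thread, assuming the ubiquitous 8-bit two's-complement char:

```c
/* Widening the byte 0x80 through signed char copies the high bit
 * upward (what MOVSX does); widening through unsigned char clears
 * the high bits (what MOVZX does). */
static int widen_signed(unsigned char byte)
{
    return (signed char)byte;   /* sign-extended, like MOVSX */
}

static int widen_unsigned(unsigned char byte)
{
    return byte;                /* zero-extended, like MOVZX */
}
```

For the byte 0x80, the first yields -128 and the second 128 on two's-complement implementations.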
 

BartC

David Brown said:
But the normal ABI on x86 uses signed plain chars, so it is possible that
some code relies on that - so be careful.

What does that mean? In binary, a signed char looks exactly like an unsigned
one. There is a difference when a char is sign-extended, but that only
happens when the value being passed is wider, so not a char type.
 

David Brown

BartC said:
What does that mean? In binary, a signed char looks exactly like an
unsigned one. There is a difference when a char is sign-extended, but that
only happens when the value being passed is wider, so not a char type.

There is no problem when you link pre-compiled code - as you say, there
is no difference at the binary level. But you might find source code
that has been tested to work on x86 and which makes the assumption that
"char" is signed - that assumption is valid on x86 because signed char
is the standard on that processor. So the risk is that enabling
"-funsigned-char" on gcc could break existing working code (albeit
non-portable code).
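A hypothetical example (not from the thread) of the kind of code David Brown means - testing "c < 0" to spot bytes with the high bit set, such as non-ASCII bytes in UTF-8 text:

```c
#include <limits.h>

/* Works where plain char is signed (the normal x86 ABI) and
 * silently stops matching anything under gcc's -funsigned-char. */
static int has_high_bit(char c)
{
    return c < 0;   /* relies on plain char being signed */
}

/* Portable spelling of the same test, independent of char's
 * signedness: */
static int has_high_bit_portable(char c)
{
    return ((unsigned char)c & 0x80) != 0;
}
```

Only the second version behaves identically whichever way plain char goes.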
 

glen herrmannsfeldt

David Brown said:
(snip)
There is no problem when you link pre-compiled code - as you say, there
is no difference at the binary level. But you might find source code
that has been tested to work on x86 and which makes the assumption that
"char" is signed - that assumption is valid on x86 because signed char
is the standard on that processor. So the risk is that enabling
"-funsigned-char" on gcc could break existing working code (albeit
non-portable code).

It might also fix already broken code. There might be a lot of code
using char for subscripts with the warning disabled, and assuming that
the source is ASCII.

Conveniently a problem that never occurs in the EBCDIC world.
(EBCDIC letters and digits all have the high bit of a byte set.)

-- glen
 

Malcolm McLean

Probably because a programmer using a char as a subscript very likely
assumed he could use it to implement a lookup table covering the entire
range of char values. (Which is exactly what happened here, though
indirectly.)
Also, int is not usually signed - it's always signed. So if the code is
broken because of a negative subscript, it will break on the test platform.
Of course there should be a warning if someone redefines a basic integer
type with a typedef and then uses that type as a subscript, because you
have a similar potential problem.
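The lookup-table pattern Malcolm describes can be sketched as follows - my example, not his. A 256-entry table covers every byte value, but it must be indexed through unsigned char, or any byte above 0x7F becomes a negative subscript on a signed-char platform:

```c
/* 256-entry classification table (C99 designated initializers). */
static int is_vowel[256] = {
    ['a'] = 1, ['e'] = 1, ['i'] = 1, ['o'] = 1, ['u'] = 1
};

static int count_vowels(const char *s)
{
    int n = 0;
    for (; *s; s++)
        n += is_vowel[(unsigned char)*s];  /* safe for all byte values */
    return n;
}
```

Writing `is_vowel[*s]` instead would read before the start of the table for high-bit bytes wherever plain char is signed - exactly the bug class -Wchar-subscripts flags.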
 

JohnF

Malcolm McLean said:
Also, int is not usually signed - it's always signed. So if the code is
broken because of a negative subscript, it will break on the test platform.
Of course there should be a warning if someone redefines a basic integer
type with a typedef and then uses that type as a subscript, because you
have a similar potential problem.

Just curious -- what's necessarily wrong with negative subscripts?
For example, kinda silly, but just to make the point,
unsigned char *mem = malloc(10000), *p = mem+5000;
int i=0;
for ( i=(-4000); i<=4000; i++ ) p[i]='\000';
That a problem?
 

Keith Thompson

JohnF said:
Malcolm McLean said:
Also, int is not usually signed - it's always signed. So if the code is
broken because of a negative subscript, it will break on the test platform.
Of course there should be a warning if someone redefines a basic integer
type with a typedef and then uses that type as a subscript, because you
have a similar potential problem.

Just curious -- what's necessarily wrong with negative subscripts?
For example, kinda silly, but just to make the point,
unsigned char *mem = malloc(10000), *p = mem+5000;
int i=0;
for ( i=(-4000); i<=4000; i++ ) p[i]='\000';
That a problem?


No, that's perfectly valid -- but it's unusual enough that it doesn't
justify *not* issuing a warning.
 

glen herrmannsfeldt

(snip, someone wrote)
(snip)

Just curious -- what's necessarily wrong with negative subscripts?
For example, kinda silly, but just to make the point,
unsigned char *mem = malloc(10000), *p = mem+5000;
int i=0;
for ( i=(-4000); i<=4000; i++ ) p[i]='\000';
That a problem?


In that case, the compiler might not warn about it.

If it is an array (that is, declared with [] and a length, not with *)
the compiler knows that, and can take that into consideration.

Also, it is only a warning, but I agree that perhaps you should not
get the warning in the pointer case.

If you did something like:

unsigned char mem[10000], *p = mem+5000;
int i=0;
for ( i=(-4000); i<=4000; i++ ) p[i]='\000';

the compiler might realize that p has an offset.

The warning previously indicated was for signed char, which overflows
(and often wraps) sooner than larger signed integer types.

-- glen
 

Tim Rentsch

Ian Collins said:
Noob said:
[ NOTE : cross-posted to comp.lang.c and comp.unix.programmer,
please trim as you see fit ]

Hello,

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
    int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
    return ext_NNN;
}

$ gcc -Wall -c tutu.c
tutu.c: In function 'foo':
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]

Whatever the implementation of isdigit, why would a char subscript be
worthy of a diagnostic? Yes char is usually signed, but so are short,
int and long.

The type char is unique[*] among all the standard integer types
in that it is signed on some implementations and unsigned on
others. That's my take on the warning anyway.

[*] Technically there is one other, in that an 'int' bitfield is
allowed to be unsigned instead of signed, at the implementation's
whim. So a compiler flag warning on 'int' bitfields also seems
like a good option to have.
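The footnote's bit-field point can be sketched as follows (my example, not Tim's): whether a plain "int" bit-field is signed is up to the implementation, so spelling out "signed" or "unsigned" removes the ambiguity.

```c
/* The two explicit fields behave the same everywhere; the plain
 * "int" field's signedness is implementation-defined. */
struct flags {
    signed int   s : 3;   /* definitely signed: holds -4..3 on
                             two's-complement implementations */
    unsigned int u : 3;   /* definitely unsigned: holds 0..7 */
    int          p : 3;   /* signedness is implementation-defined */
};
```

A compiler flag warning on plain "int" bit-fields would flag uses of `p` here but leave `s` and `u` alone.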
 
