signed and unsigned char

  • Thread starter Christopher Benson-Manica

Christopher Benson-Manica

Given

signed char str_a[]="Hello, world!\n";
unsigned char str_b[]="Hello, world!\n";

what is the difference, if any, between the following two statements?

printf( "%s", str_a );
printf( "%s", str_b );

If there is a difference, what is the best way to compare *str_a with
0xFF? (On my implementation, unadorned char is signed, and so I'm
using

if( *str_a == (signed char)0xFF ) ...

to quiet compiler warnings.)
 
pete

Christopher said:
Given

signed char str_a[]="Hello, world!\n";
unsigned char str_b[]="Hello, world!\n";

what is the difference, if any, between the following two statements?

printf( "%s", str_a );
printf( "%s", str_b );

If there is a difference, what is the best way to compare *str_a with
0xFF?

char type arguments are converted to int.
That result is converted to unsigned char for stdio output.
There shouldn't be a problem there, regardless of sign of char.
(On my implementation, unadorned char is signed, and so I'm
using

if( *str_a == (signed char)0xFF ) ...

if( *(unsigned char*)str_a == -1 )

/* assuming 0xff is meant to be all bits set */
 
Leor Zolman

Given

signed char str_a[]="Hello, world!\n";
unsigned char str_b[]="Hello, world!\n";

what is the difference, if any, between the following two statements?

printf( "%s", str_a );
printf( "%s", str_b );

Other than that they take different arguments? None that I can see.
There are no conversions to worry about, and once the pointer values
land in printf, there's no way for printf to be able to tell the
difference anyway (and no reason for it to care).
If there is a difference, what is the best way to compare *str_a with
0xFF? (On my implementation, unadorned char is signed, and so I'm
using

if( *str_a == (signed char)0xFF ) ...

to quiet compiler warnings.)

I'm not sure how the 2nd question is dependent upon the first... What
you've done is clearly showing your intent, which is a Good Thing. One
alternative is
if (*str_a == -1) ...
but I like yours better.




Leor Zolman
BD Software
(e-mail address removed)
www.bdsoft.com -- On-Site Training in C/C++, Java, Perl & Unix
C++ users: Download BD Software's free STL Error Message
Decryptor at www.bdsoft.com/tools/stlfilt.html
 
Martin Johansen

Christopher Benson-Manica said:
Given

signed char str_a[]="Hello, world!\n";
unsigned char str_b[]="Hello, world!\n";

what is the difference, if any, between the following two statements?

printf( "%s", str_a );
printf( "%s", str_b );

'^' != (unsigned char)'^' for example.

Extended ascii chars have negative values (since they are higher than 127).
 
Christopher Benson-Manica

pete said:
if( *(unsigned char*)str_a == -1 )

1) Is that better than ... (unsigned char)*str_a ... ?
2) This relies on -1's representation being all bits set, yes?
 
Eric Sosman

pete said:
Given

signed char str_a[]="Hello, world!\n";
unsigned char str_b[]="Hello, world!\n";

what is the difference, if any, between the following two statements?

printf( "%s", str_a );
printf( "%s", str_b );

If there is a difference, what is the best way to compare *str_a with
0xFF?

char type arguments are converted to int.

True (usually), but irrelevant: the arguments are not
any kind of `char', but are pointers.
That result is converted to unsigned char for stdio output.

Either the Standard doesn't say so, or I've overlooked
the spot where it does.
There shouldn't be a problem there, regardless of sign of char.

There isn't a problem, because the "%s" specifier is defined
to work with any of `char*', `unsigned char*', and `signed char*'.
Something of an oddity, really: Most conversion specifiers are
very strict about the type of the corresponding argument, yet
here's one that accepts arguments of three distinct types.
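
A minimal sketch of that point, assuming a hosted C99 implementation. The helper name same_s_output is made up for illustration. It formats both arrays with "%s" through snprintf and compares the results. Some compilers may warn about passing a non-plain char pointer to "%s", even though the Standard allows it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if "%s" renders both arrays identically. */
int same_s_output(void)
{
    signed char   str_a[] = "Hello, world!\n";
    unsigned char str_b[] = "Hello, world!\n";
    char out_a[32];
    char out_b[32];

    /* "%s" reads bytes up to the terminating null in either case. */
    snprintf(out_a, sizeof out_a, "%s", str_a);
    snprintf(out_b, sizeof out_b, "%s", str_b);

    return strcmp(out_a, out_b) == 0;
}
```

Calling same_s_output() should report that the two buffers match, since both arrays hold the same bytes.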

Undefined behavior, I think. You might do better with

if ( (unsigned char)*str_a == 0xFF )
 
pete

Christopher said:
1) Is that better than ... (unsigned char)*str_a ... ?

No. Now, I like
if( (unsigned char)*str_a == (unsigned char)-1)
or
if( (unsigned char)*str_a == (unsigned char)0xFF)
best.
2) This relies on -1's representation being all bits set, yes?

If str_a points to an all bits set byte,
then *(unsigned char*)str_a will equal ((unsigned char)-1)

then the question becomes, is
((unsigned char)-1) equal to (-1) ?
and it isn't, so I was wrong.

if( (unsigned char)*str_a == (unsigned char)0xFF )

The conversion of out-of-range values to signed char
is implementation-defined, so I would avoid it.
The conversion of any value to unsigned char is well defined.

0xff is of type int.
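
A short sketch of why the type of 0xff matters, assuming nothing beyond standard C. The function names naive_is_ff and cast_is_ff are invented for this example. In the first, the signed char operand is promoted to int before the comparison, so it can never equal 255 when CHAR_BIT is 8. In the second, both sides are converted to unsigned char first.

```c
#include <limits.h>

/* Compares in int: c is promoted, so the result can only be 1
   if SCHAR_MAX >= 0xFF, which is never the case when CHAR_BIT == 8.
   Some compilers warn that this comparison is always false. */
int naive_is_ff(signed char c)
{
    return c == 0xFF;
}

/* Both operands are converted to unsigned char first; that conversion
   is well defined for every value. */
int cast_is_ff(signed char c)
{
    return (unsigned char)c == (unsigned char)0xFF;
}
```

With 8-bit chars, cast_is_ff(-1) is true because -1 converts to UCHAR_MAX, while naive_is_ff(-1) compares -1 with 255 and is false.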
 
Jeremy Yallop

Eric said:
Either the Standard doesn't say so, or I've overlooked
the spot where it does.

I think pete's right. "Byte output" functions, printf included, are
defined in terms of fputc:

The byte output functions write characters to the stream as if by
successive calls to the fputc function. [7.19.3#12]

and the description of fputc says:

The fputc function writes the character specified by c (converted
to an unsigned char) to the output stream pointed to by stream
[...] [7.19.7.3#2]

Jeremy.
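
The conversion described in the fputc wording can be observed with a round trip through a temporary file. This is only a sketch: the helper name fputc_roundtrip and the -2 failure code are assumptions made for the example, and tmpfile may be unavailable in some environments.

```c
#include <stdio.h>

/* Hypothetical helper: writes c with fputc to a temporary file and
   returns what fgetc reads back, or -2 if the file could not be used. */
int fputc_roundtrip(int c)
{
    FILE *fp = tmpfile();
    int got;

    if (fp == NULL)
        return -2;
    if (fputc(c, fp) == EOF) {
        fclose(fp);
        return -2;
    }
    rewind(fp);          /* reposition before switching from writing to reading */
    got = fgetc(fp);     /* the byte comes back as unsigned char converted to int */
    fclose(fp);
    return got == EOF ? -2 : got;
}
```

If the temporary file works, fputc_roundtrip(-1) should return the same value as (unsigned char)-1, because fputc converts its int argument to unsigned char before writing.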
 
Keith Thompson

Martin Johansen said:
Christopher Benson-Manica said:
Given

signed char str_a[]="Hello, world!\n";
unsigned char str_b[]="Hello, world!\n";

what is the difference, if any, between the following two statements?

printf( "%s", str_a );
printf( "%s", str_b );

'^' != (unsigned char)'^' for example.

Extended ascii chars have negative values (since they are higher than 127).

<Slightly OT>
Actually the ASCII value of '^' is 94 (unless my newsreader mangled
whatever extended ASCII character you actually wrote).
</Slightly OT>

All the characters in the string literals above are in the "basic
execution character set". C99 6.2.5p3 says:

If a member of the basic execution character set is stored in a
char object, its value is guaranteed to be positive.

In ASCII, all such characters happen to have values in the range
32..126. In EBCDIC, if I recall correctly, some basic characters have
codes greater than 127; I think this implies that in an implementation
that uses EBCDIC as its execution character set, type char must be
unsigned (assuming CHAR_BIT==8).
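
The guarantee quoted above can be checked for the string in question with a small loop. The function name all_positive is made up for this sketch, and the check applies to plain char only.

```c
/* Returns 1 if every character before the terminating null has a
   positive value when read through plain char, 0 otherwise. */
int all_positive(const char *s)
{
    for (; *s != '\0'; ++s) {
        if (*s <= 0)
            return 0;
    }
    return 1;
}
```

For "Hello, world!\n" this should return 1, since every character in it belongs to the basic execution character set.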
 
Richard Bos

pete said:
Christopher said:
signed char str_a[]="Hello, world!\n";
unsigned char str_b[]="Hello, world!\n";

what is the difference, if any, between the following two statements?

printf( "%s", str_a );
printf( "%s", str_b );

If there is a difference, what is the best way to compare *str_a with
0xFF?

char type arguments are converted to int.

They're not char type arguments, they're _pointers_ to char.

AFAICT pointers to signed and unsigned types must behave the same, btw.

Not perfectly guaranteed to work. Conversion of unsigned values to
signed types, when they're out of range, is implementation-defined. It's
not undefined, which means that at least it's guaranteed not to give
false negatives (0xFF must be converted to _some_ signed char value, not
to random junk), but it may give false positives (AFAICT, it's legal for
some other value to convert to the same signed char as 0xFF).

if ((unsigned char)*str_a == 0xFF)

should be perfectly defined, but will not do what you expect if chars
are more than eight bits. OTOH,

if ((unsigned char)*str_a == UCHAR_MAX)

is also well-defined, but will not compare to 0xFF when CHAR_BIT > 8.

Richard
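
A sketch of the two points above, with invented function names. narrow_to_schar performs the conversion whose result is implementation-defined when the value does not fit. ff_is_uchar_max reports whether 0xFF and UCHAR_MAX are the same value, which holds only when CHAR_BIT is 8.

```c
#include <limits.h>

/* The result is implementation-defined when v is outside the range
   SCHAR_MIN..SCHAR_MAX (C99 also permits a signal here). */
signed char narrow_to_schar(int v)
{
    return (signed char)v;
}

/* 0xFF and UCHAR_MAX name the same value only when CHAR_BIT == 8. */
int ff_is_uchar_max(void)
{
    return UCHAR_MAX == 0xFF;
}
```

Many two's complement implementations produce -1 for narrow_to_schar(0xFF), but portable code cannot rely on that. Only in-range values such as 42 are guaranteed to survive unchanged.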
 
pete

Richard said:
pete said:
Christopher said:
signed char str_a[]="Hello, world!\n";
unsigned char str_b[]="Hello, world!\n";

what is the difference, if any, between the following two statements?

printf( "%s", str_a );
printf( "%s", str_b );

If there is a difference, what is the best way to compare *str_a with
0xFF?

char type arguments are converted to int.

They're not char type arguments, they're _pointers_ to char.

I was considering printf, as a byte output function,
defined in terms of fputc as per Jeremy Yallop's
last post in this thread.

http://groups.google.com/[email protected]
 
Christopher Benson-Manica

pete said:
If str_a points to an all bits set byte,
then *(unsigned char*)str_a will equal ((unsigned char)-1)

Is that guaranteed? (non-rhetorical question)
 
Dan Pop

Christopher said:
signed char str_a[]="Hello, world!\n";
unsigned char str_b[]="Hello, world!\n";

what is the difference, if any, between the following two statements?

printf( "%s", str_a );
printf( "%s", str_b );

None.

    s   If no l length modifier is present, the argument shall
        be a pointer to the initial element of an array of
        character type.
        ^^^^^^^^^^^^^^
Both str_a and str_b are arrays of character type.
If there is a difference, what is the best way to compare *str_a with
0xFF? (On my implementation, unadorned char is signed, and so I'm
using

if( *str_a == (signed char)0xFF ) ...

to quiet compiler warnings.)

It's not clear what exactly you want to achieve here. If you want to see
if the respective character value has a certain representation, the most
portable approach is to use a pointer to unsigned char:

if( *(unsigned char *)str_a == 0xFF ) ...

This works even if this pattern is a trap representation for the type
signed char.

OTOH, if you want to check that your character has a certain value,
simply compare against that value:

if( *str_a == -1 ) ...

Comparing an object with a value it cannot possibly take, as in your
example, doesn't make much sense a priori, so you have to explain your
exact intentions.
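
The two intentions can be kept apart in code. This is a sketch only; all four function names are hypothetical. has_pattern_ff inspects the stored byte through an unsigned char pointer, while has_value_minus_one compares the signed value.

```c
#include <string.h>

/* Representation test: reads the byte as unsigned char. */
int has_pattern_ff(const signed char *p)
{
    return *(const unsigned char *)p == 0xFF;
}

/* Value test: compares the signed value itself. */
int has_value_minus_one(const signed char *p)
{
    return *p == -1;
}

/* Demo: store the byte value 0xFF with memset, then inspect the byte
   without ever reading it as a signed char. */
int pattern_demo(void)
{
    signed char c;
    memset(&c, 0xFF, sizeof c);
    return has_pattern_ff(&c);
}

/* Demo: store the value -1, then test for that value. */
int value_demo(void)
{
    signed char c = -1;
    return has_value_minus_one(&c);
}
```

On a two's complement implementation with 8-bit chars the two tests agree for the same byte. On other representations they need not, which is why it matters which one is meant.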

BTW, if str_a were an array of plain char, you had the following solution:

if( *str_a == '\xff' ) ...

but still not guaranteed to work if this bit pattern is a trap
representation for plain char.

As other people have already mentioned, (signed char)0xFF is useless for
your purpose, in a portability context, because the result need not be
the signed char value corresponding to that bit pattern. Casts really
are *conversion* operators and not devices for silencing the compilers.

Dan
 
pete

Christopher said:
Is that guaranteed? (non-rhetorical question)

Yes.
For the type unsigned char, there are only value bits;
there are no padding bits and there is no sign bit.
The expression
*(unsigned char*)str_a
will cause the byte at str_a
to be evaluated by its bit pattern,
according to the rules for the representation of unsigned char.

(-1) cast to any unsigned type,
is the MAX value for that unsigned type,
which is, all value bits, set to one.
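
The last claim follows from the conversion rule for unsigned types (C99 6.3.1.3p2) and can be checked directly. The function name minus_one_is_max is invented for this sketch.

```c
#include <limits.h>

/* Returns 1 if converting -1 yields the maximum value of each
   unsigned type checked, as the conversion rules require. */
int minus_one_is_max(void)
{
    return (unsigned char)-1  == UCHAR_MAX
        && (unsigned short)-1 == USHRT_MAX
        && (unsigned int)-1   == UINT_MAX
        && (unsigned long)-1  == ULONG_MAX;
}
```

This should return 1 on any conforming implementation, regardless of how negative numbers are represented.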
 
Eric Sosman

Richard said:
Implementation-defined, surely?

Harrumph. I guess so, but the distinction seems not
to be very important. When `(signed char)0xFF' is evaluated

"... either the result is implementation-defined or an
implementation-defined signal is raised." (6.3.1.3/3)

On the face of it, that's implementation-defined behavior and
not undefined behavior. But what if the implementation takes
the second alternative and raises a signal? If a function has
been installed to handle the signal

"If and when the function returns, if the value of _sig_
is [...] or any other implementation-defined value
corresponding to a computational exception, the behavior
is undefined; [...]" (7.14.1.1/3)

So if there's a handler, it cannot return without invoking
undefined behavior. I guess that means it must call abort()
or _Exit() or run an infinite loop; all of these have defined
effects, but are sufficiently unfortunate that they ought to
be avoided just about as strenuously as undefined behavior.
No nasal demons, surely, but no happy outcome either.

If there's not a handler, the implementation-defined signal
is treated as if one of SIG_IGN or SIG_DFL had been set up (the
choice is implementation-defined). If the handling is equivalent
to SIG_IGN, I think we're back in U.B. territory again: we're
told that we'll get *either* a result *or* a signal, not both.
Thus, we can't count on getting a result of any kind if a signal
is raised and ignored; the Standard doesn't specify any behavior,
so the behavior is undefined by omission (cf. 3.4.3).

In the SIG_DFL case, the handling of the implementation-defined
signal is implementation-defined, not undefined. But right here
in the documentation I see

"The default handling for SIGBITROT causes demons
to fly out of your nose." (DS 9000 programmer's
manual, courtesy Armed Response Technologies)

... which is not undefined behavior, but might seem so to a
casual observer. ;-)

Summary:

- You're right: `(signed char)0xFF' produces implementation-
defined, not undefined, behavior. My apologies.

- ... but since the I.B. is just about as unpredictable as
U.B., the programmer would be well-advised to avoid it.

- The *real* solution, I think, is to use `unsigned' types
whenever you want to deal with bits as bits. To ask the
question "Does this byte have all its bits set?", one
should not use potentially signed arithmetic. To answer
the question "Does this byte have the value 42?", either
signed or unsigned arithmetic will do.

- And, of course, all this is just another c.l.c exercise
in taking a census on a pinhead. We know perfectly well
that two's complement has won the game and extinguished
its competitors, right? And we're certain that it's the
ultimate in integer representations, and will never ever
be supplanted, right? Computer design is immune to the
vagaries of fashion, right?

(Ahem.) "Right?"

(I know you're out there; I can hear you breathing. C'mon,
stand up and be counted -- in two's complement ...)
 
Dan Pop

Eric said:
Harrumph. I guess so, but the distinction seems not
to be very important. When `(signed char)0xFF' is evaluated

"... either the result is implementation-defined or an
implementation-defined signal is raised." (6.3.1.3/3)

On the face of it, that's implementation-defined behavior and
not undefined behavior. But what if the implementation takes
the second alternative and raises a signal?

It won't, for backward compatibility with C89, which doesn't allow any
signal to be raised because of this. Breaking perfectly correct C89
code is not an option any serious implementor is going to adopt, *if* it
can be avoided.

This is a typical case where C99 fixed something that wasn't broken in
C89. And the person responsible for it couldn't produce a *convincing*
rationale...

Dan
 
Michael Wojcik

In ASCII, all such characters happen to have values in the range
32..126. In EBCDIC, if I recall correctly, some basic characters have
codes greater than 127; I think this implies that in an implementation
that uses EBCDIC as its execution character set, type char must be
unsigned (assuming CHAR_BIT==8).

You recall correctly; the EBCDIC decimal digits, for example, are 0xF0
through 0xF9. It hadn't occurred to me earlier that this implied that
an EBCDIC implementation where CHAR_BIT==8 would have to make plain
char unsigned, but I suppose it would.

(I could check an EBCDIC implementation or two if anyone's curious, but
of course that wouldn't prove anything one way or the other.)
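
For anyone who does want to check a particular implementation, the signedness of plain char can be read from <limits.h>. A minimal sketch; the function name plain_char_is_unsigned is an assumption for the example.

```c
#include <limits.h>

/* Returns 1 if plain char is unsigned on this implementation:
   CHAR_MIN is 0 for an unsigned char type and negative otherwise. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}
```

As noted, the answer only describes the implementation it runs on and proves nothing about what the Standard requires.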
 
Larry Jones

Michael Wojcik said:
(I could check an EBCDIC implementation or two if anyone's curious, but
of course that wouldn't prove anything one way or the other.)

For what it's worth, every EBCDIC implementation I've ever seen -- and
I've seen a few -- has had plain char unsigned.

-Larry Jones

The problem with the future is that it keeps turning into the present.
-- Hobbes
 
