[There was much snippage and I am not 100% sure that the attributions
here are correct, but I think:]
somenath said:
I have changed the code as below
#include <stdio.h>

int main(void)
{
    int x = 0x7fff;
    signed char y;

    y = (signed char) x;
    printf("%hhx\n", y);
    return 0;
}
Now is it guaranteed that y will hold ff which is the last byte of x ?
No, unless ... well, I will get to that in a moment.
Now the output is ffffffff
Why it is not ff only ?
The "hh" modifier is new in C99; C89 does not have it, and if you are
using a C89 system (rather than a C99 one -- and C99 systems are still
rather rare, so even if your compiler supports *some* C99 features, it
probably is not actually a C99 implementation, and may not support the
hh modifier), the result is unpredictable.
In C99, "%hhx" in printf() says that the argument is an int or unsigned
int resulting from widening an unsigned char, and should be narrowed
back to an unsigned char and then printed otherwise the same as "%x".
In C89, most printf engines are likely to implement this as "%hx" or
just plain "%x", and yours appears to do the latter.
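If you are stuck on a C89 system, the portable workaround is to do
the narrowing yourself and then use plain "%x". A minimal sketch,
assuming an ordinary 8-bit "char":

    #include <stdio.h>

    int main(void)
    {
        signed char y = -1;

        /*
         * Converting to unsigned char by hand does in source code
         * what C99's "hh" does inside printf(); this prints "ff"
         * on an 8-bit-char system.
         */
        printf("%x\n", (unsigned int)(unsigned char)y);
        return 0;
    }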
Let me go back to the original version for a moment:
    int x = 0xFFFFFFF0;
    signed char y;
    y = x;
and let us further consider two actual implementations, one on a PDP-11
(16-bit int, 32-bit long, two's complement) and one on a Univac (18-bit
int, 36-bit long, ones' complement).
The expression
    0xFFFFFFF0
has type "unsigned long" on the PDP-11, because a hexadecimal
constant takes the first of "int", "unsigned int", "long", and
"unsigned long" that can represent its value. The value here is
4294967280, which exceeds INT_MAX (32767), UINT_MAX (65535), and
LONG_MAX (2147483647), but not ULONG_MAX (4294967295). It has type
"long" on the Univac, because the value exceeds INT_MAX (131071)
and UINT_MAX (262143) but not LONG_MAX (34359738367).
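If you want to see which way your own implementation goes, here is
one way (a sketch, not something for real code): the usual
arithmetic conversions do the comparison below in the constant's
type, so its signedness decides the result.

    #include <stdio.h>

    int main(void)
    {
        /*
         * If 0xFFFFFFF0 gets an unsigned type -- e.g., unsigned
         * int on a 32-bit-int machine -- the -1 converts to that
         * unsigned type (becoming its maximum value) and the test
         * is false. If it gets a signed type, as on the Univac,
         * the test is true.
         */
        if (0xFFFFFFF0 > -1)
            printf("0xFFFFFFF0 has a signed type here\n");
        else
            printf("0xFFFFFFF0 has an unsigned type here\n");
        return 0;
    }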
This value does not fit in an "int" on either machine, but both
happen to merely "chop off" excess bits in assignment (I think --
I know this is how the PDP-11 works; the Univac compiler is more
opaque). On the PDP-11, then, assigning this value to the 16-bit
signed int named "x" results in setting the bits of x to 0xfff0,
which represents -16. On the Univac, it sets the 18-bit signed
int to 0x3fff0, which represents -15. (See
<http://web.torek.net/torek/c/numbers.html> for an explanation of
ones' and two's complement.)
Luckily, the values -16 and -15 are both always in range for a
"signed char", which must be able to hold values between -127 and
127 inclusive. So, on the PDP-11, the assignment to y sets y to
-16, and on the Univac, it sets it to -15. In y itself these have
bit patterns 0xf0 and 0x1f0; promoted to int, as happens when you
pass them to printf(), they become 0xfff0 and 0x3fff0. If you then
print these values using the "%x" format, you will see fff0 (on
the PDP-11) and 3fff0 (on the Univac).
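For comparison, here is the same sequence on a typical modern
machine with 32-bit two's complement "int"s (a sketch; both the
initialization and the final output are merely typical, not
guaranteed):

    #include <stdio.h>

    int main(void)
    {
        int x = 0xFFFFFFF0;  /* unsigned int constant; converting
                                it to int is implementation-defined,
                                but typically stores -16 */
        signed char y;

        y = x;               /* -16 is in range for any signed char */
        printf("%x\n", y);   /* y promotes to int; typically prints
                                fffffff0 */
        return 0;
    }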
(Aside: "%x" expects an unsigned int, but a signed char converts
to a signed int on all C systems. The C standards have text that
implies that this is "supposed to work" -- you are supposed to be
able to pass values of types that are correct in all but signedness
-- but does not come right out and demand that implementations
*make* it work. It may be wise to avoid depending on it, at least
in some situations.)
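If you would rather not depend on it, convert explicitly, so that
the argument really is the unsigned int that "%x" expects. A
minimal sketch:

    #include <stdio.h>

    int main(void)
    {
        signed char y = -16;

        /*
         * The cast removes the signedness mismatch; the output
         * (fffffff0 on a 32-bit two's complement machine) is the
         * same bit pattern as before.
         */
        printf("%x\n", (unsigned int)y);
        return 0;
    }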
[Note that a C implementation is allowed, but not required, to
catch the fact that 4294967280 does not fit in an ordinary "int".
So it is possible that the code will not compile, or that if it
does compile, it might not run. In general, however, people who
buy computers are most interested in getting the wrong answer as
fast as possible, so they tend to overlook things like
reliability and bug-detection in favor of whatever system has the
most gigahertz or teraflops. Computer-system-makers usually indulge
these customers: there is no point in working long and hard to
build a system that no one will buy. In some cases, like flight
control systems on airplanes or computers inside medical devices,
people are actually willing to pay for correctness. More seriously,
it is a matter of trade-offs: correctness is not so important in
a handheld game machine, but incorrect operation of the brakes in
your car could be disastrous. Unfortunately for those who *want*
reliability, "fast because we omitted all the correctness-checking"
tends to be the default -- we have to add our own checking.]
In the second, modified version, the code now reads:
    int x = 0x7fff;
    signed char y;
    y = (signed char) x;
The constant 0x7fff has value 32767. All C systems are required
to have INT_MAX be at least 32767 (as, e.g., on the PDP-11; it may
be larger, as, e.g., on most Intel-based systems like the one you
are no doubt using). So 32767 has type "int" and fits in "x",
eliminating a lot of concerns.
The "signed char" type, however, need only hold values in -127 to
127. Chances are that your system, whatever it is, holds values
in -128 to 127. On the Univac, which has 9-bit "char"s, it holds
values in -255 to 255 inclusive (not -256, just -255). Conversion
from plain (signed) int to signed char can produce implementation-defined
results, if the value of the "int" does not fit in the "signed
char" (as is generally the case here). (I seem to recall that the
wording for casts is slightly different from that for ordinary
assignments, but cannot find the text in the Standard at the moment.)
Thus, there is no guarantee that y will hold "ff" (as a bit pattern)
-- and on the Univac, it probably holds 0x1ff as a bit pattern,
which represents the value -0 (negative zero). Whether you consider
a 9-bit byte a "byte" is also not clear to me (but I note that the
C Standard does: it says that a "byte" is a char, however many bits
that may be).
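If what you are really after is the low-order eight bits of x, the
portable route is unsigned arithmetic, where narrowing conversions
are fully defined (they reduce modulo one more than the type's
maximum value). A minimal sketch:

    #include <stdio.h>

    int main(void)
    {
        int x = 0x7fff;
        unsigned int low8;

        /*
         * Mask first: a bare (unsigned char)x would give 0x1ff on
         * a 9-bit-char machine like the Univac.
         */
        low8 = (unsigned int)x & 0xff;
        printf("%x\n", low8);  /* prints "ff" on any implementation */
        return 0;
    }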
Finally, consider the phrasing of this question:
Now is it guaranteed that y will hold ff which is the last byte of x ?
The whole concept of "last byte" is rather fuzzy: which byte(s)
are "first" and which are "last"? On an 8-bit little-endian machine,
like the Intel based systems most people are most familiar with,
the least significant byte comes *first* in numerical order, not
last. I believe it is better to think not in terms of "machine
byte order" -- which is something you can only control by picking
which machines you use -- but rather to think in terms of values
and representations. As a C programmer, you have a great deal of
control of values, and if you use "unsigned" types, you have complete
control of representations. For instance, you can read a 10-bit
two's complement value from a stdio stream, with the first input
char giving the uppermost 2 bits, using "unsigned int" this way:
/*
 * Read one 2-bit value and one 8-bit value from the given stream,
 * and compose a signed 10-bit value (in the range [-512..+511])
 * from those bits.
 */
int get_signed_10_bit_value(FILE *fp) {
    int c0, c1;
    unsigned int val;

    c0 = getc(fp);
    if (c0 == EOF)
        ... handle error ...
    c1 = getc(fp);
    if (c1 == EOF)
        ... handle error ...
    val = ((c0 & 0x03) << 8) | (c1 & 0xff);
    /*
     * Flip the sign bit and subtract it back out. The (int) cast
     * keeps the subtraction in signed arithmetic, where the result
     * is always representable, instead of wrapping around in
     * unsigned arithmetic.
     */
    return (int)(val ^ 0x200) - 0x200;
}
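To try it out, you can feed the function two bytes through a
temporary file (a sketch, assuming the function above is in scope
and its error-handling blanks have been filled in):

    #include <stdio.h>

    int main(void)
    {
        FILE *fp = tmpfile();

        if (fp == NULL)
            return 1;
        /*
         * The bytes 0x03 and 0xf0 carry the ten-bit pattern
         * 11 1111 0000, which should come back as the value -16.
         */
        putc(0x03, fp);
        putc(0xf0, fp);
        rewind(fp);
        printf("%d\n", get_signed_10_bit_value(fp));  /* prints -16 */
        fclose(fp);
        return 0;
    }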
(Note that when you go to read more than 15 bits, you need to be
careful with intermediate values, since plain int may have as few
as 15 non-sign "value bits", and unsigned int may have as few as
16. You will need to convert values to "unsigned long", using
temporary variables, casts, or both.)
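For instance, a 24-bit version of the same function might look like
this (a sketch: the name and the LONG_MIN error convention are my
own inventions, and note the unsigned long cast on every shifted
operand):

    #include <limits.h>
    #include <stdio.h>

    /*
     * Read three 8-bit values from the given stream, most
     * significant first, and compose a signed 24-bit value (in the
     * range [-8388608..+8388607]) from those bits. Returns
     * LONG_MIN on EOF.
     */
    long get_signed_24_bit_value(FILE *fp) {
        int c0, c1, c2;
        unsigned long val;

        c0 = getc(fp);
        c1 = getc(fp);
        c2 = getc(fp);
        if (c0 == EOF || c1 == EOF || c2 == EOF)
            return LONG_MIN;
        /*
         * Do the shifts in unsigned long, so that no intermediate
         * value overflows a 16-bit int or unsigned int.
         */
        val = ((unsigned long)(c0 & 0xff) << 16) |
              ((unsigned long)(c1 & 0xff) << 8) |
              (unsigned long)(c2 & 0xff);
        return (long)(val ^ 0x800000L) - 0x800000L;
    }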
This xor-and-subtract trick works on all implementations, including
ones' complement machines like the Univac. Its only real limitation
is that the final (signed) value has to fit in the types available:
a 16-bit two's complement machine has a -32768, but a 16-bit ones'
complement machine bottoms out at -32767. (As it happens, though,
anything other than two's complement is rare today, so you probably
need not worry about this very much.)