Behavior of the code


Joe Wright

jacob said:
I was speaking about the second version, and did not see that you
have changed it AGAIN. After that change I get ffff again.

The bug I was referring to was for this code
#include <stdio.h
int main(void)
{
int x = 0x7fff;
signed char y;
y =(signed char) x;
printf("%hhx\n", y);
return 0;


}
Bug? I reformatted the above code and fixed the stdio.h invocation..

#include <stdio.h>
int main(void)
{
int x = 0x7fff;
signed char y;
y = (signed char) x;
printf("%hhx\n", y);
return 0;
}

...and I still get ffff as I expect. If you get ff please explain why and
also explain the "%hhx" format string.

Joyeux Noel
 

jacob navia

Joe said:
Bug? I reformatted the above code and fixed the stdio.h invocation..

#include <stdio.h>
int main(void)
{
int x = 0x7fff;
signed char y;
y = (signed char) x;
printf("%hhx\n", y);
return 0;
}

..and I still get ffff as I expect. If you get ff please explain why and
also explain the "%hhx" format string.

Joyeux Noel

[root@gateway tmp]# cat thh.c
#include <stdio.h>
int main(void)
{
int x = 0x7fff;
signed char y;
y =(signed char) x;
printf("%hhx\n", y);
return 0;
}
[root@gateway tmp]# gcc thh.c
[root@gateway tmp]# ./a.out
ff
[root@gateway tmp]# gcc -v
Reading specs from /usr/lib/gcc-lib/i586-mandrake-linux-gnu/2.96/specs
gcc version 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)
 

Thomas Lumley

santosh wrote:
Because of format specifier mismatch. You are supplying a char and
telling it to look for an int. As a result, printf accesses more bytes
than it should and prints whatever happens to be in the extra bytes.
Use the 'hhx' format specifier and try again. Also use a cast for the
assignment to 'y'.

A clarification: this sounds as though you mean that printf() is
accessing memory it isn't supposed to (which would be undefined
behaviour). The argument to printf() *is* an int because of the
default argument promotions, and so %x is a perfectly valid
specifier, it just doesn't give the output that the OP expected.

-thomas
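
[A minimal sketch, not part of the original post, illustrating the default
argument promotions Thomas describes: the signed char is promoted to int
before the call, so "%x" reads a whole promoted int rather than stray
memory. Strictly, "%x" wants an unsigned int; passing a plain int here is
the common practice under discussion, not a guarantee.]

#include <stdio.h>
int main(void)
{
    signed char y = -1;               /* bit pattern ff on 8-bit two's complement chars */
    printf("%x\n", y);                /* y is promoted to int -1; typically prints ffffffff */
    printf("%x\n", (unsigned char)y); /* promoted from unsigned char 255; typically prints ff */
    return 0;
}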
 

Joe Wright

jacob said:
Joe said:
Bug? I reformatted the above code and fixed the stdio.h invocation..

#include <stdio.h>
int main(void)
{
int x = 0x7fff;
signed char y;
y = (signed char) x;
printf("%hhx\n", y);
return 0;
}

..and I still get ffff as I expect. If you get ff please explain why
and also explain the "%hhx" format string.

Joyeux Noel

[root@gateway tmp]# cat thh.c
#include <stdio.h>
int main(void)
{
int x = 0x7fff;
signed char y;
y =(signed char) x;
printf("%hhx\n", y);
return 0;
}
[root@gateway tmp]# gcc thh.c
[root@gateway tmp]# ./a.out
ff
[root@gateway tmp]# gcc -v
Reading specs from /usr/lib/gcc-lib/i586-mandrake-linux-gnu/2.96/specs
gcc version 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)
I can do that too.

C:\work\c\clc>cat xx1.c
#include <stdio.h>
int main(void)
{
int x = 0x7fff;
signed char y;
y = (signed char) x;
printf("%hhx\n", y);
return 0;
}

C:\work\c\clc>gcc xx1.c -o xx1.exe

C:\work\c\clc>xx1
ffff

C:\work\c\clc>gcc --version
gcc.exe (GCC) 3.1
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Do you think 2.96 is broken? I think ffff is right and you think ff is
right. I ask you again to explain why.
 

jacob navia

Joe said:
Do you think 2.96 is broken? I think ffff is right and you think ff is
right. I ask you again to explain why.

From the C standard's description of the fprintf function:
<quote>
hh
Specifies that a following d, i, o, u, x, or X conversion specifier
applies to a signed char or unsigned char argument (the argument will
have been promoted according to the integer promotions, but its value
shall be converted to signed char or unsigned char before printing); or
that a following n conversion specifier applies to a pointer to a signed
char argument.
<end quote>

Since you specify the "x" format, the value is
interpreted as unsigned, and it is converted to an
unsigned char.

It is not lcc-win that is wrong but gcc 3.1

The 2.96 version apparently wasn't broken.
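
[A short sketch, not part of the original post, of a portable way to get the
two-digit result even where the library predates C99's hh modifier: do the
conversion to unsigned char yourself and print with plain "%x".]

#include <stdio.h>
int main(void)
{
    int x = 0x7fff;
    signed char y = (signed char)x;   /* out of range: implementation-defined result, commonly -1 */
    printf("%x\n", (unsigned char)y); /* commonly prints ff, with or without hh support */
    return 0;
}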
 

Joe Wright

jacob said:
From the C standard's description of the fprintf function:
<quote>
hh
Specifies that a following d, i, o, u, x, or X conversion specifier
applies to a signed char or unsigned char argument (the argument will
have been promoted according to the integer promotions, but its value
shall be converted to signed char or unsigned char before printing); or
that a following n conversion specifier applies to a pointer to a signed
char argument.
<end quote>

Since you specify the "x" format, the value is
interpreted as unsigned, and it is converted to an
unsigned char.

It is not lcc-win that is wrong but gcc 3.1

The 2.96 version apparently wasn't broken.
Thanks. The "%hhx" had escaped me until now. I'm going to have to ask DJ
about this now. Thanks again.
 

Chris Torek

[There was much snippage and I am not 100% sure that the attributions
here are correct, but I think:]

I have changed the code as below

#include <stdio.h>
int main(void)
{
int x = 0x7fff;
signed char y;
y =(signed char) x;
printf("%hhx\n", y);
return 0;


}
Now is it guaranteed that y will hold ff, which is the last byte of x?

No, unless ... well, I will get to that in a moment.
Now the output is ffffffff
Why is it not ff only?

The "hh" modifier is new in C99; C89 does not have it, and if you are
using a C89 system (rather than a C99 one -- and C99 systems are still
rather rare, so even if your compiler supports *some* C99 features, it
probably is not actually a C99 implementation, and may not support the
hh modifier), the result is unpredictable.

In C99, "%hhx" in printf() says that the argument is an int or unsigned
int resulting from widening an unsigned char, and should be narrowed
back to an unsigned char and then printed otherwise the same as "%x".
In C89, most printf engines are likely to implement this as "%hx" or
just plain "%x", and yours appears to do the latter.

Let me go back to the original version for a moment:

int x = 0xFFFFFFF0;
signed char y;
y = x;

and let us further consider two actual implementations, one on a PDP-11
(16-bit int, 32-bit long, two's complement) and one on a Univac (18-bit
int, 36-bit long, ones' complement).

The expression

0xFFFFFFF0

has type "unsigned long" on the PDP-11, because the value is
4294967280, which exceeds INT_MAX (32767) and LONG_MAX (2147483647)
but not ULONG_MAX (4294967295). It has type "long" on the Univac,
because the value exceeds INT_MAX (131071) but not LONG_MAX
(34359738367).

This value does not fit in an "int" on either machine, but both
happen to merely "chop off" excess bits in assignment (I think --
I know this is how the PDP-11 works; the Univac compiler is more
opaque). On the PDP-11, then, assigning this value to the 16-bit
signed int named "x" results in setting the bits of x to 0xfff0,
which represents -16. On the Univac, it sets the 18-bit signed
int to 0x3fff0, which represents -15. (See
<http://web.torek.net/torek/c/numbers.html> for an explanation of
ones' and two's complement.)

Luckily, the values -16 and -15 are both always in range for a
"signed char", which must be able to hold values between -127 and
127 inclusive. So, on the PDP-11, the assignment to y sets y to
-16, and on the Univac, it sets it to -15. These have bit patterns
0xfff0 and 0x3fff0 respectively. If you then pass these values to
printf() using the "%x" format, you will see fff0 (on the PDP-11)
and 3fff0 (on the Univac).
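
[A small sketch, not part of the original post, that mimics the PDP-11 case
on a typical modern system, using short to stand in for a 16-bit int. The
narrowing conversion is implementation-defined, but the usual two's
complement implementations simply keep the low 16 bits.]

#include <stdio.h>
int main(void)
{
    short x = (short)0xFFFFFFF0UL;               /* implementation-defined; usually keeps 0xfff0 */
    printf("%d\n", x);                           /* usually prints -16 */
    printf("%x\n", (unsigned)(unsigned short)x); /* usually prints fff0 */
    return 0;
}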

(Aside: "%x" expects an unsigned int, but a signed char converts
to a signed int on all C systems. The C standards have text that
implies that this is "supposed to work" -- you are supposed to be
able to pass values of types that are correct in all but signedness
-- but does not come right out and demand that implementations
*make* it work. It may be wise to avoid depending on it, at least
in some situations.)

[Note that a C implementation is allowed, but not required, to
catch the fact that 4294967280 does not fit in an ordinary "int".
So it is possible that the code will not compile, or that if it
does compile, it might not run. In general, however, people who
buy computers are most interested in getting the wrong answer as
fast as possible :) , so they tend to overlook things like
reliability and bug-detection in favor of whatever system has the
most gigahertz or teraflops. Computer-system-makers usually indulge
these customers: there is no point in working long and hard to
build a system that no one will buy. In some cases, like flight
control systems on airplanes or computers inside medical devices,
people are actually willing to pay for correctness. More seriously,
it is a matter of trade-offs: correctness is not so important in
a handheld game machine, but incorrect operation of the brakes in
your car could be disastrous. Unfortunately for those who *want*
reliability, "fast because we omitted all the correctness-checking"
tends to be the default -- we have to add our own checking.]

In the second, modified version, the code now reads:

int x = 0x7fff;
signed char y;
y =(signed char) x;

The constant 0x7fff has value 32767. All C systems are required
to have INT_MAX be at least 32767 (as, e.g., on the PDP-11; it may
be larger, as, e.g., on most Intel-based systems like the one you
are no doubt using). So 32767 has type "int" and fits in "x",
eliminating a lot of concerns.

The "signed char" type, however, need only hold values in -127 to
127. Chances are that your system, whatever it is, holds values
in -128 to 127. On the Univac, which has 9-bit "char"s, it holds
values in -255 to 255 inclusive (not -256, just -255). Conversion
from plain (signed) int to signed char can produce implementation-defined
results, if the value of the "int" does not fit in the "signed
char" (as is generally the case here). (I seem to recall that the
wording for casts is slightly different from that for ordinary
assignments, but cannot find the text in the Standard at the moment.)

Thus, there is no guarantee that y will hold "ff" (as a bit pattern)
-- and on the Univac, it probably holds 0x1ff as a bit pattern,
which represents the value -0 (negative zero). Whether you consider
a 9-bit byte a "byte" is also not clear to me (but I note that the
C Standard does: it says that a "byte" is a char, however many bits
that may be).
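
[A sketch, not part of the original post, of the portable alternative: a
conversion to an unsigned type is fully defined -- the value is reduced
modulo UCHAR_MAX+1 -- so if what you really want is "the low-order byte as
a value", use unsigned char.]

#include <stdio.h>
int main(void)
{
    int x = 0x7fff;
    unsigned char y = (unsigned char)x; /* well-defined: reduced modulo UCHAR_MAX+1 */
    printf("%x\n", y);                  /* ff where CHAR_BIT is 8; 1ff on the 9-bit Univac */
    return 0;
}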

Finally, consider the phrasing of this question:
Now is it guaranteed that y will hold ff, which is the last byte of x?

The whole concept of "last byte" is rather fuzzy: which byte(s)
are "first" and which are "last"? On an 8-bit little-endian machine,
like the Intel based systems most people are most familiar with,
the least significant byte comes *first* in numerical order, not
last. I believe it is better to think not in terms of "machine
byte order" -- which is something you can only control by picking
which machines you use -- but rather to think in terms of values
and representations. As a C programmer, you have a great deal of
control of values, and if you use "unsigned" types, you have complete
control of representations. For instance, you can read a 10-bit
two's complement value from a stdio stream, with the first input
char giving the uppermost 2 bits, using "unsigned int" this way:

/*
* Read one 2-bit value and one 8-bit value from the given stream,
* and compose a signed 10-bit value (in the range [-512..+511])
* from those bits.
*/
int get_signed_10_bit_value(FILE *fp) {
int c0, c1;
unsigned int val;

c0 = getc(fp);
if (c0 == EOF) ... handle error ...
c1 = getc(fp);
if (c1 == EOF) ... handle error ...
val = ((c0 & 0x03) << 8) | (c1 & 0xff);
return (val ^ 0x200) - 0x200;
}

(Note that when you go to read more than 15 bits, you need to be
careful with intermediate values, since plain int may have as few
as 15 non-sign "value bits", and unsigned int may have as few as
16. You will need to convert values to "unsigned long", using
temporary variables, casts, or both.)

This xor-and-subtract trick works on all implementations, including
ones' complement machines like the Univac. Its only real limitation
is that the final (signed) value has to fit in the types available:
a 16-bit two's complement machine has a -32768 but a 16-bit ones'
complement machine bottoms out at -32767. (As it happens, though,
anything other than two's complement is rare today, so you probably
need not worry about this very much.)
 

somenath

Chris Torek said:
[Chris's full reply, quoted above -- snipped]

While going through this article I got one question, i.e. how does the
compiler decide that it is a negative number?
For example, in the below code
#include <stdio.h>

int main(int argc, char *argv[])
{
signed char c = 128;

printf("value of C = %d\n",c);
return 0;
}

Output of the program is
value of C = -128

My question is how the compiler understands that C is equal to -128.

The binary representation of 128 is 10000000
In a two's complement system 10000000 also means 128, so how does the
compiler assign the value -128 to c?

I guessed one possibility. Please correct me if I am wrong.
The compiler knows the size of char and SCHAR_MAX, and it found that
128 > SCHAR_MAX. So it treats the binary representation of 128 as a
negative number representation, and incidentally that is -128, so it
assigns -128 to C.
Is it not correct?
 

James Kuyper

somenath wrote:
....
While going through this article I got one question, i.e. how does the
compiler decide that it is a negative number?
For example, in the below code
#include <stdio.h>

int main(int argc, char *argv[])
{
signed char c = 128;

printf("value of C = %d\n",c);
return 0;
}

Output of the program is
value of C = -128

My question is how the compiler understands that C is equal to -128.

That's up to the implementation; you'll have to read the
implementation's documentation to find out the method it uses. The
standard doesn't cover this.
The binary representation of 128 is 10000000
In a two's complement system 10000000 also means 128, so how does the
compiler assign the value -128 to c?

I guessed one possibility. Please correct me if I am wrong.
The compiler knows the size of char and SCHAR_MAX, and it found that
128 > SCHAR_MAX. So it treats the binary representation of 128 as a
negative number representation, and incidentally that is -128, so it
assigns -128 to C.
Is it not correct?

The result you see is possible only if 128 > SCHAR_MAX. In that case,
the token 128 is interpreted as an integer literal of type 'int' with a
value of 128. When this value is converted to signed char, because 128 >
SCHAR_MAX, an implementation is allowed to either raise an
implementation-defined signal, or to produce an implementation-defined
result. As a result you should NEVER do this unless you're writing code
that is deliberately and with good justification intended to be
non-portable. In that case, you would have to look at the implementation's
documentation to determine what the results would be.

If the implementation raises a signal on signed overflow, you'll need to
write a signal handler, and register it as handling that specific
signal, in order for such code to serve any (debatably) useful purpose.

If the implementation produces an implementation-defined result, you'll
need to find out what it is, and decide whether or not it's useful for
your purposes. However, if it is useful to write 128 and have it convert
to signed char as -128 (as the implementation you're using apparently
does), then surely you would be better off writing -128 in the first place?
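
[A sketch, not part of James's post, of one defensive alternative: check the
value against SCHAR_MIN and SCHAR_MAX before narrowing, so the code never
relies on the implementation-defined out-of-range conversion at all.]

#include <limits.h>
#include <stdio.h>
int main(void)
{
    int v = 128;
    signed char c;

    if (v >= SCHAR_MIN && v <= SCHAR_MAX) {
        c = (signed char)v;      /* in range: the value is preserved exactly */
        printf("stored %d\n", c);
    } else {
        printf("%d does not fit in signed char\n", v);
    }
    return 0;
}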
 

Ben Bacarisse

somenath said:
[There was much snippage and I am not 100% sure that the attributions
here are correct, but I think:]
somenath wrote:
int x = 0xFFFFFFF0;
signed char y;
y = x;

While going through this article I got one question, i.e. how does the
compiler decide that it is a negative number?
For example, in the below code
#include <stdio.h>

int main(int argc, char *argv[])
{
signed char c = 128;

printf("value of C = %d\n",c);
return 0;
}

Output of the program is
value of C = -128

My question is how the compiler understands that C is equal to -128.

The binary representation of 128 is 10000000

You will get into a lot of trouble if you think too much about
representations and bit patterns. Representations matter, but much
less than most beginners think they do.

In C, the literal 128 means 128. It never means anything else. You
can't write a negative integer constant in C. The type of an integer
constant depends on the sizes of some of the integer types in your
implementation, but in this case we can be sure it will be of type
'int' because C guarantees that int is always big enough to hold 128,
and int is the smallest integer type that the compiler will "try" (if
the value is too big it will be the first of 'long int' or 'long long
int' that is big enough).

[All this is about C99 and excludes the possibility of extended
integer types that some implementations provide. For the "full
monty" you need to read up on how octal and hex constants are
treated and how adding l, ll and u suffixes can change things, but
let's keep this simple.]

OK, so 128 means 128. There are then two issues to understand: what
happens when 128 is assigned to a variable of type 'signed char', and
what happens in the printf call. I *think* these have been explained
well enough, but I am happy to have a go if you think otherwise!
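
[A small sketch, not part of Ben's post, that makes the "smallest type that
fits" rule visible through sizeof; the sizes shown assume a typical system
with 32-bit int and 64-bit long long.]

#include <stdio.h>
int main(void)
{
    printf("%zu\n", sizeof 128);        /* 128 has type int: typically 4 */
    printf("%zu\n", sizeof 2147483648); /* too big for a 32-bit int: long or long long, typically 8 */
    printf("%zu\n", sizeof -128);       /* still int: -128 is unary minus applied to the constant 128 */
    return 0;
}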
 

Army1987

Harald van Dijk wrote:
[...] 7.18.1.1 now reads
"These types are optional. However, if an implementation provides integer
types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for
the signed types) that have a two's complement representation, it shall
define the corresponding typedef names." [...]
-- but, if CHAR_BIT == 8, unsigned char is still an unsigned integer type
with a width of 8 bits and no padding, meaning uint8_t is required to be
provided. And when uint8_t is provided, int8_t is required as well.
IOW the standard unintentionally requires signed char to use 2's
complement whenever CHAR_BIT is 8. Or does it?
(An implementation *can* have CHAR_BIT == 8 and signed char using e.g.
sign and magnitude, but it would need to have another 1-byte signed type
for int8_t to be typedef'd to. Since this was clearly not the intent, the
requirement to define uint..._t if and only if int..._t is defined ought
to be dropped.)
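
[A sketch, not part of the original post, of how the choice of representation
shows up in <limits.h>: with CHAR_BIT == 8, only a two's complement signed
char can reach -128; sign-magnitude and ones' complement stop at -SCHAR_MAX,
i.e. -127.]

#include <limits.h>
#include <stdio.h>
int main(void)
{
    printf("CHAR_BIT  = %d\n", CHAR_BIT);
    printf("SCHAR_MIN = %d, SCHAR_MAX = %d\n", SCHAR_MIN, SCHAR_MAX);
#if CHAR_BIT == 8 && SCHAR_MIN == -128
    puts("signed char is an 8-bit two's complement type, so int8_t can simply be signed char");
#else
    puts("signed char is not an 8-bit two's complement type");
#endif
    return 0;
}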
 

Army1987

santosh said:
Because of format specifier mismatch. You are supplying a char and
telling it to look for an int.
Isn't that char supposed to be promoted to int (or, in case CHAR_MAX >
INT_MAX, to unsigned int) according to the integer promotion?
 

Army1987

jacob said:
From the C standard's description of the fprintf function:
<quote>
hh
Specifies that a following d, i, o, u, x, or X conversion specifier
applies to a signed char or unsigned char argument (the argument will
have been promoted according to the integer promotions, but its value
shall be converted to signed char or unsigned char before printing); or
that a following n conversion specifier applies to a pointer to a signed
char argument.
<end quote>

Since you specify the "x" format, the value is
interpreted as unsigned, and it is converted to an
unsigned char.
And -1, which is the most likely value for (signed char)0x7fff (though
that's implementation defined) should become 0xff when converted to
unsigned char. Implementations which don't support %hhx and treat it as
%hx will print ffff.
 

jacob navia

Army1987 said:
And -1, which is the most likely value for (signed char)0x7fff (though
that's implementation defined) should become 0xff when converted to
unsigned char. Implementations which don't support %hhx and treat it as
%hx will print ffff.

That was the whole problem. hhx is defined in C99 only.
What is surprising is that the gcc printf somehow recognizes hh, even
if it doesn't support it. Strange.

Maybe we should try to write hhhhhhhhh to see what happens!

:)
 

Harald van Dijk

Army1987 said:
Harald van Dijk wrote:
[...] 7.18.1.1 now reads
"These types are optional. However, if an implementation provides
integer
types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for
the signed types) that have a two's complement representation, it
shall define the corresponding typedef names." [...]
-- but, if CHAR_BIT == 8, unsigned char is still an unsigned integer
type with a width of 8 bits and no padding, meaning uint8_t is required
to be provided. And when uint8_t is provided, int8_t is required as
well.
IOW the standard unintentionally requires signed char to use 2's
complement whenever CHAR_BIT is 8. Or does it? (An implementation *can*
have CHAR_BIT == 8 and signed char using e.g. sign and magnitude, but it
would need to have another 1-byte signed type for int8_t to be typedef'd
to.
Exactly.

Since this was clearly not the intent, the requirement to define
uint..._t if and only if int..._t is defined ought to be dropped.)

Either that, or the requirement to define uint8_t should exist only if a
type matching the description of int8_t also exists. This would allow
CHAR_BIT==8 without two's complement by simply not defining (u)int8_t,
even though a type matching uint8_t's requirements exists.
 

Chris Torek

That was the whole problem. hhx is defined in C99 only.
What is surprising is that the gcc printf somehow recognizes hh, even
if it doesn't support it. Strange.

It is not really surprising, or at least, *should* not be, if you
think about it. The GNU compiler collection contains a "C compiler"
(of sorts), but not a complete *implementation* of C, because it
uses whatever libraries are provided by the underlying system.

The compiler front-end, which reads the source code and turns
syntax and semantics into instruction sequences, is all part of
this "compiler collection". While this part does not actually
implement C99, it comes close enough to "understand" %hhx. So
the part of the compiler that emits diagnostics will "read" the
formatting directives to printf, see "%hhx", and check that the
argument has the correct type, all while *assuming* that the
system-provided library will "do the right thing" with it.

Later, at link time, when you combine the "compiled" code (object
files and/or libraries) with the system-provided library to get
the final executable -- Translation Phase 8 in the C99 standard
(TP7 in C89, if I remember right) -- you get the system's actual
implementation of printf(), which is often "less C99-ish" (as it
were) than even GCC.

It is not practical for GCC to provide the implementation of
printf() itself: The bottom levels of stdio are full of
"implementation-specific magic" (such as handling all the RMS
formats on VMS, or the multiple kinds of file formats on IBM
mainframes or DOS/Windows systems) that varies too much from one
implementation to the next. You can, of course, just choose to
use a system whose system-provided C library supports %hhx. :)

(The C Standards really address only complete systems, not divided-up
half-implementations[%] like the GNU Compiler Collection. So one
cannot even say that gcc implements C89, much less that it implements
C99. If you combine gcc with a particular set of libraries, you
can get a conformant C89 implementation, but gcc is approaching
C99 in the front-end only asymptotically. The biggest sticking
point appears to be various GNU extensions that are incompatible
with C99: making the front end C99-conformant would break those.
Given that the GNU folks have broken their own extensions before,
I am not sure how "sticky" a sticking point this really is, but it
is definitely "sticky". :) )

[% Actually, I tend to think of gcc as a "3/4ths or so" implementation.
Compilers are usually divided, in compiler circles at least, into
three parts. Only two of these are thought of as "the compiler":
the "front end", which reads syntax and "understands" semantics,
and the "back end", which does code-generation and final (peephole)
optimization. The main body of optimization is either part of the
"front end" or, in many cases now, a "middle end" that -- like the
back end -- is shared between multiple languages. That is, one
might have a front end for Ada, another for C, a third for C++, a
fourth for Fortran, and a fifth for Pascal; these would all produce
some sort of internal tree or list representation that feeds through
a shared optimizer and shared back-end. The "middle end" optimization
needed tends to vary quite a bit from one language to the next,
though, so it can be more efficient, in some senses, to paste
different "middle ends" onto the various front-end parts. Back
when 8 megabytes was a lot of RAM, this kind of efficiency was
more important; nowadays the gigantic shared middle end, that uses
a gigabyte of RAM to run, seems to be in vogue. :)

In any case, after the final code comes out of the "back end"
of the compiler -- and in some cases, is turned into linkable
object code by a separate "assembler" -- the object code and
libraries are handled by a piece usually called a "linker". The
"linker" is normally completely separate from the compiler, and
tends to be used on its own now and then, e.g., to build
libraries.

(There are also "globally optimizing" compilers that defer at least
some of the optimization and code-generation phases. In this case,
instead of generating object code and linking that, the front end
simply saves a "digested form" of the code in the "object" files
-- which are no longer object files at all -- and the optimization,
code generation, and final stages are all run when you "link" the
pieces together. This gives the optimizer a view of the entire
program, so that it can do a much better job. The drawback is that
the "final link phase", which is usually pretty fast, now contains
most of the real work, and can take hours or even days for large
programs.)

The GNU Compiler Collection provides front and back ends, but
not the linker. There *is* a GNU linker, and using it buys you
some advantages, especially in languages other than C, but gcc
can be built for systems that use the native non-GNU linker,
using an auxiliary step they call "collect2".]
 

Army1987

Chris said:
It is not really surprising, or at least, *should* not be, if you
think about it. The GNU compiler collection contains a "C compiler"
(of sorts), but not a complete *implementation* of C, because it
uses whatever libraries are provided by the underlying system.

The compiler front-end, which reads the source code and turns
syntax and semantics into instruction sequences, is all part of
this "compiler collection". While this part does not actually
implement C99, it comes close enough to "understand" %hhx. So
the part of the compiler that emits diagnostics will "read" the
formatting directives to printf, see "%hhx", and check that the
argument has the correct type, all while *assuming* that the
system-provided library will "do the right thing" with it.

Later, at link time, when you combine the "compiled" code (object
files and/or libraries) with the system-provided library to get
the final executable -- Translation Phase 8 in the C99 standard
(TP7 in C89, if I remember right) -- you get the system's actual
implementation of printf(), which is often "less C99-ish" (as it
were) than even GCC.
Support for hh was added in glibc 2.1.
With gcc 4.1.2 and glibc 2.5,
#include <stdio.h>
int main(void)
{
printf("%hhx %hx\n", -1, -1);
return 0;
}
prints "ff ffff".
 

Dan Henry

[...]
I believe it is better to think not in terms of "machine
byte order" -- which is something you can only control by picking
which machines you use -- but rather to think in terms of values
and representations. As a C programmer, you have a great deal of
control of values, and if you use "unsigned" types, you have complete
control of representations. For instance, you can read a 10-bit
two's complement value from a stdio stream, with the first input
char giving the uppermost 2 bits, using "unsigned int" this way:

/*
* Read one 2-bit value and one 8-bit value from the given stream,
* and compose a signed 10-bit value (in the range [-512..+511])
* from those bits.
*/
int get_signed_10_bit_value(FILE *fp) {
int c0, c1;
unsigned int val;

c0 = getc(fp);
if (c0 == EOF) ... handle error ...
c1 = getc(fp);
if (c1 == EOF) ... handle error ...
val = ((c0 & 0x03) << 8) | (c1 & 0xff);
return (val ^ 0x200) - 0x200;
}

(Note that when you go to read more than 15 bits, you need to be
careful with intermediate values, since plain int may have as few
as 15 non-sign "value bits", and unsigned int may have as few as
16. You will need to convert values to "unsigned long", using
temporary variables, casts, or both.)

This xor-and-subtract trick works on all implementations, including
ones' complement machines like the Univac. Its only real limitation
is that the final (signed) value has to fit in the types available:
a 16-bit two's complement machine has a -32768 but a 16-bit ones'
complement machine bottoms out at -32767. (As it happens, though,
anything other than two's complement is rare today, so you probably
need not worry about this very much.)

I wonder if the holidays have left me the victim of some form of
limited thinking. I have been admonished occasionally here to (in so
many words) think less about representations (i.e., bit patterns) and
more about values. Thinking about values with the sign extension code
above, with a negative signed 10-bit value (e.g., 'val' is 0x202
before the return), I'd have thought that (assuming 16-bit int and
unsigned int) the return expression would yield the unsigned *value*
0xFE02, which is also the two's complement *representation* of the expected
int return value. However, my new and improved, value-oriented self
is now confused. What allows the conversion, which I thought was
value-to-value, of an unsigned value that a 16-bit signed int can't
have? There has got to be a clause in the standard that I am
overlooking.

Remember, I already said that I suffer from limited thinking. Would
someone kindly point me where I should be reading?
 

Chris Torek

I wonder if the holidays have left me the victim of some form of
limited thinking. I have been admonished occasionally here to (in so
many words) think less about representations (i.e., bit patterns) and
more about values.

Well, in this case, we have a specification that talks about
representations: c0 and c1 (read from a stdio stream) hold bits
for a 10-bit signed two's complement representation, and our goal
in this code fragment is to turn those into whatever *our* machine
uses to represent the numbers extracted from the stream. So for
this particular part of the problem, we do have to care about
representations.
Thinking about values with the sign extension code
above, with a negative signed 10-bit value (e.g., 'val' is 0x202
before the return), I'd have thought that (assuming 16-bit int and
unsigned int) the return expression would yield the unsigned *value*
0xFE02

Indeed. The problem is, I goofed. As you say, if val is 0x202 (and
has type "unsigned int"), we have:

(val ^ 0x200) - 0x200

which is:

(0x202U ^ 0x200) - 0x200

which mixes unsigned and signed. In pre-ANSI C, it was easy to
predict the result: mix unsigned with signed, you got unsigned.
In C89 and C99, what you get depends on type_MAX vs Utype_MAX,
where the <type>s are the types of the signed and unsigned values
involved. In this case, <type> is int (each time), and in general
we have UINT_MAX > INT_MAX, so the xor is done by converting 0x200
to unsigned (i.e., 512U or 0x200U). Since 0x200U ^ 0x202U is 2U,
the result is 2U. Then we have the same unsigned-vs-signed problem,
and again 0x200 is converted to 0x200U, so the value is in fact
(2U - 512U), which is indeed 0xfe02 or 0xfffffe02 on common
implementations.

The code *should* have read:

(int)(val ^ 0x200U) - 512

(I used a decimal constant for clarity this time, although it may
actually be less clear. When I wrote the text in ">>" above I
actually tried putting in 512, but switched back to 0x200.)

By converting to int after xor-ing with 0x200, we would now get the
(signed int) value 2, and (2 - 512) is -510, which is what we wanted.
So:

return (val ^ 0x200) - 0x200;

should read, instead, something more like:

return (int)(val ^ 0x200U) - (int)0x200;
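
[A sketch, not part of the original posts, with Chris's corrected return
expression folded back into the full function. The error handling is one
arbitrary, hypothetical choice -- returning INT_MIN on EOF -- where the
original deliberately left it open.]

#include <limits.h>
#include <stdio.h>

int get_signed_10_bit_value(FILE *fp)
{
    int c0, c1;
    unsigned int val;

    c0 = getc(fp);
    if (c0 == EOF)
        return INT_MIN;   /* hypothetical error convention */
    c1 = getc(fp);
    if (c1 == EOF)
        return INT_MIN;
    val = ((unsigned int)(c0 & 0x03) << 8) | (unsigned int)(c1 & 0xff);
    return (int)(val ^ 0x200U) - 0x200;   /* convert to int before subtracting, as corrected above */
}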
 

Dan Henry

On 25 Dec 2007 23:58:34 GMT, Chris Torek <[email protected]> wrote:
[where "val" has type "unsigned int" and c0 and c1 are ordinary "int"]
I wonder if the holidays have left me the victim of some form of
limited thinking. I have been admonished occasionally here to (in so
many words) think less about representations (i.e., bit patterns) and
more about values.

Well, in this case, we have a specification that talks about
representations: c0 and c1 (read from a stdio stream) hold bits
for a 10-bit signed two's complement representation, and our goal
in this code fragment is to turn those into whatever *our* machine
uses to represent the numbers extracted from the stream. So for
this particular part of the problem, we do have to care about
representations.

Chris,

Thank you for your reply. I had no problem or confusion with the
representation aspects of everything above the 'return' line. My
issue was entirely regarding the 'return' expression and its
conversion to the returned value.
Indeed. The problem is, I goofed. As you say, if val is 0x202 (and
has type "unsigned int"), we have:

(val ^ 0x200) - 0x200

which is:

(0x202U ^ 0x200) - 0x200

which mixes unsigned and signed. In pre-ANSI C, it was easy to
predict the result: mix unsigned with signed, you got unsigned.
In C89 and C99, what you get depends on type_MAX vs Utype_MAX,
where the <type>s are the types of the signed and unsigned values
involved. In this case, <type> is int (each time), and in general
we have UINT_MAX > INT_MAX, so the xor is done by converting 0x200
to unsigned (i.e., 512U or 0x200U). Since 0x200U ^ 0x202U is 2U,
the result is 2U. Then we have the same unsigned-vs-signed problem,
and again 0x200 is converted to 0x200U, so the value is in fact
(2U - 512U), which is indeed 0xfe02 or 0xfffffe02 on common
implementations.

The code *should* have read:

(int)(val ^ 0x200U) - 512

(I used a decimal constant for clarity this time, although it may
actually be less clear. When I wrote the text in ">>" above I
actually tried putting in 512, but switched back to 0x200.)

0x200 seems just fine to me.
By converting to int after xor-ing with 0x200, we would now get the
(signed int) value 2, and (2 - 512) is -510, which is what we wanted.
So:

return (val ^ 0x200) - 0x200;

should read, instead, something more like:

return (int)(val ^ 0x200U) - (int)0x200;

It is exactly the coercion of (val ^ 0x200) to an int that I thought
would be necessary.

Thanks again.
 
