FAQ Related - why cast?

Martin

Two questions relating to FAQ answer 12.42.

(1) In the statement

s.i16 |= (unsigned)(getc(fp) << 8);

i16 is declared int. The reason for casting to (unsigned) is explained as
guarding against sign extension. But left-shifting will always fill vacated
bits with zero (assuming the right operand is nonnegative and less than the
number of bits in the left expression's type). So how is the cast useful?

(2) I am puzzled by the cast to unsigned in the following statement:

putc((unsigned)((s.i32 >> 24) & 0xff), fp);

i32 is declared long int.

As I understand it the usual arithmetic conversions will ensure the type of
the expression (s.i32 >> 24) & 0xff will be long int. That long int will be
cast to unsigned int, but what is the point? putc() expects its first
argument to be of type int. So at the moment it's going through

long int -> unsigned int -> int

whereas without the cast it would be

long int -> int
 
Jack Klein

Two questions relating to FAQ answer 12.42.

You must have the book version of the FAQ, since 12.42 is not in the
online version.

(1) In the statement

s.i16 |= (unsigned)(getc(fp) << 8);

i16 is declared int. The reason for casting to (unsigned) is explained as
guarding against sign extension. But left-shifting will always fill vacated
bits with zero (assuming the right operand is nonnegative and less than the
number of bits in the left expression's type). So how is the cast useful?

If the int returned by getc() is negative, left shifting it produces
undefined behavior. If the int returned by getc() has a value greater
than 255, left shifting it produces undefined behavior. Converting
either of these out-of-range values to unsigned int avoids the
undefined behavior.

(2) I am puzzled by the cast to unsigned in the following statement:

putc((unsigned)((s.i32 >> 24) & 0xff), fp);

i32 is declared long int.

As I understand it the usual arithmetic conversions will ensure the type of
the expression (s.i32 >> 24) & 0xff will be long int. That long int will be
cast to unsigned int, but what is the point? putc() expects its first
argument to be of type int. So at the moment it's going through

long int -> unsigned int -> int

whereas without the cast it would be

long int -> int

This is somewhat sloppy coding. Generally, bit shifts should not be
used on signed integer types. There are too many potential surprises
(read defects, when the program does not do what the programmer
expected). If s.i32 is negative, the result of the shift is
implementation defined. It would actually make more sense to cast
s.i32 to unsigned long before the shift.
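
A minimal sketch of that suggestion, assuming 8-bit bytes in the file format. The helper name `byte_of` is hypothetical and does not come from the FAQ; the point is only that the conversion to unsigned long happens before the shift, so a negative s.i32 never reaches the `>>` operator.

```c
/* Hypothetical helper: return byte n of value, where n = 0 is the
   least significant byte. Callers are assumed to pass n in 0..3. */
int byte_of(long value, int n)
{
    /* Convert first: right-shifting an unsigned type is fully defined,
       whereas right-shifting a negative long is implementation-defined. */
    unsigned long u = (unsigned long)value;

    return (int)((u >> (8 * n)) & 0xFFu);
}
```

With a helper like this, the first write could be spelled putc(byte_of(s.i32, 3), fp), and the result is already an int in the range 0..255, which is what putc() wants.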
 
Derrick Coetzee

Jack said:
If the int returned by getc() is negative, left shifting it produces
undefined behavior. If the int returned by getc() has a value greater
than 255, left shifting it produces undefined behavior.

The shift is done before the cast, though. To avoid undefined behaviour
you would want to do:

s.i16 |= ((unsigned)getc(fp)) << 8;

Also, getc cannot possibly return a value exceeding 255, because it
always returns either an unsigned char value (sizeof(unsigned char) is
1) or a negative value (EOF is negative).
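
To make the ordering concrete, here is a small sketch. The name `combine16` is made up for illustration, and the function assumes both arguments have already been checked against EOF and lie in 0..255 (that is, 8-bit bytes).

```c
/* Hypothetical helper: build a 16-bit quantity from two bytes,
   most significant byte first. Assumes 0 <= hi, lo <= 255. */
unsigned combine16(int hi, int lo)
{
    /* The cast applies to hi BEFORE the shift, so the shift is done
       in unsigned arithmetic and cannot overflow a 16-bit int. */
    return ((unsigned)hi << 8) | (unsigned)lo;
}
```

Writing (unsigned)(hi << 8) instead would shift in int first and convert afterwards, which is the ordering being objected to above.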
 
Micah Cowan

Derrick said:
The shift is done before the cast, though. To avoid undefined behaviour
you would want to do:

s.i16 |= ((unsigned)getc(fp)) << 8;

Also, getc cannot possibly return a value exceeding 255, because it
always returns either an unsigned char value (sizeof(unsigned char) is
1) or a negative value (EOF is negative).

I'm not sure why Jack thought that a value of greater than 255
could not be left-shifted, but be assured that it is entirely
possible for getc() to return a value exceeding 255, on systems
with more than 8 bits to a byte. There are people here who have
worked on such implementations.
 
Dietmar Schindler

Martin said:
Two questions relating to FAQ answer 12.42.

(1) In the statement

s.i16 |= (unsigned)(getc(fp) << 8);

i16 is declared int. The reason for casting to (unsigned) is explained as
guarding against sign extension. ...

Provided that you stated the FAQ answer correctly, the explanation is
nonsense (the left hand side of the assignment expression is of type
int, and without the cast, the right hand side is also of type int; so
there is no extension).
 
CBFalconer

Jack said:
You must have the book version of the FAQ, since 12.42 is not in
the online version.


If the int returned by getc() is negative, left shifting it
produces undefined behavior. If the int returned by getc() has
a value greater than 255, left shifting it produces undefined
behavior. Converting either of these out-of-range values to
unsigned int avoids the undefined behavior.

Disagree. getc returns the integer value of an unsigned char
(positive) or EOF. The code is faulty since it doesn't handle EOF
anyway. That integer needs to be coerced into an unsigned to allow
the left shift. So the statement should be:

s.i16 |= ((unsigned)getc(fp)) << 8;

which may still not fit into an int, if the int is 16 bits. i16
should have been declared as unsigned.
 
Martin

Dietmar Schindler said:
Provided that you stated the FAQ answer correctly, the explanation is
nonsense (the left hand side of the assignment expression is of type
int, and without the cast, the right hand side is also of type int; so
there is no extension).


To ensure the partial quote I gave in my initial post was not misleading,
this is the question and answer from the book (c)1996 by Addison-Wesley
Publishing Company, Inc.

Question: How can I write code to conform to these old, binary data file
formats?

Answer: It's difficult because of word size and byte-order differences,
floating-point formats, and structure padding. To get the control you need
over these particulars, you may have to read and write things a byte at a
time, shuffling and rearranging as you go. (This isn't always as bad as it
sounds and gives you both code portability and complete
control.) For example, suppose that you want to read a data structure,
consisting of a character, a 32-bit integer, and a 16-bit integer, from the
stream fp into the C structure

struct mystruct {
    char c;
    long int i32;
    int i16;
};

You might use code like this:

s.c = getc(fp);

s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (unsigned)(getc(fp) << 8);
s.i32 |= getc(fp);

s.i16 = getc(fp) << 8;
s.i16 |= getc(fp);

This code assumes that getc reads 8-bit characters and that the data is
stored most significant byte first ("big endian"). The casts to (long)
ensure that the 16- and 24-bit shifts operate on long values (see question
3.14), and the cast to (unsigned) guards against sign extension. (In
general, it's safer to use all unsigned types when writing code like this,
but see question 3.19.)

The corresponding code to write the structure might look like:

putc(s.c, fp);
putc((unsigned)((s.i32 >> 24) & 0xff), fp);
putc((unsigned)((s.i32 >> 16) & 0xff), fp);
putc((unsigned)((s.i32 >> 8) & 0xff), fp);
putc((unsigned)(s.i32 & 0xff), fp);
putc(s.i16 >> 8) & 0xff, fp);
putc(s.i16 & 0xff, fp);

See also questions 2.12, 12.38, 16.7, and 20.5.
 
Eric Sosman

Martin said:
To ensure the partial quote I gave in my initial post was not misleading,
this is the question and answer from the book (c)1996 by Addison-Wesley
Publishing Company, Inc.

Question: How can I write code to conform to these old, binary data file
formats?

Answer: It's difficult because of word size and byte-order differences,
floating-point formats, and structure padding. To get the control you need
over these particulars, you may have to read and write things a byte at a
time, shuffling and rearranging as you go. (This isn't always as bad as it
sounds and gives you both code portability and complete
control.) For example, suppose that you want to read a data structure,
consisting of a character, a 32-bit integer, and a 16-bit integer, from the
stream fp into the C structure

struct mystruct {
    char c;
    long int i32;
    int i16;
};

You might use code like this:

s.c = getc(fp);

s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (unsigned)(getc(fp) << 8);
s.i32 |= getc(fp);

s.i16 = getc(fp) << 8;
s.i16 |= getc(fp);

This code assumes that getc reads 8-bit characters and that the data is
stored most significant byte first ("big endian"). The casts to (long)
ensure that the 16- and 24-bit shifts operate on long values (see question
3.14), and the cast to (unsigned) guards against sign extension. (In
general, it's safer to use all unsigned types when writing code like this,
but see question 3.19.)

This code seems to arise from an odd combination of
caution, carelessness, and micro-optimization. The design
considerations may have evolved along these lines:

Caution: Since an `int' could be as narrow as 16 bits,
use `long' to store the final value, safe in the knowledge
that `long' is at least 32 bits wide. For the same reason,
convert the first two getc() results from `int' to `long'
before shifting, since the shifts might be too wide for a
narrow `int'.

Optimization: The third getc() result is shifted only
8 bits, so it will fit in an `int' even if `int' is only
16 bits wide. Doing arithmetic on an `int' may be a hair
faster than on a `long', so shift first and convert later.

Carelessness: If `int' is only 16 bits wide, this
shift may slide a high-order 1-bit from the getc() result
into the sign position of the `int'. This will cause no
harm on most machines, but the C language doesn't actually
specify what will happen. (The same carelessness afflicts
the shifting of the first byte, too.)

Caution: If the shift did in fact slide a 1-bit into
the sign position of a 16-bit `int' and thereby make it
negative, converting this `int' to `long' will propagate
the sign bit leftward and the subsequent `|' will clobber
the two bytes already processed. Hence the `unsigned' cast:
if `int' is 16 bits wide it will be zero-extended instead of
sign-extended, and if `int' is wider it won't be negative
anyhow.

Optimization: Since the fourth getc() result is non-
negative and doesn't get shifted, this sign bit is zero and
conversion to `long' will not "smear" the first three bytes.
The conversion can go straight from `int' to `long' safely.

Carelessness: Of course, all these getc() calls can fail,
and the results should be checked against EOF before being
used. I assume Mr. Summit omitted the checks for brevity.
(Alternatively, the individual checks could be omitted if
tests of feof() and ferror() followed the whole sequence.)
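
The sign-extension step described in the second "Caution" point can be seen in isolation with the sketch below. It assumes the optional C99 types int16_t and uint16_t are available and uses int16_t as a stand-in for a 16-bit `int'; the function names are invented for illustration.

```c
#include <stdint.h>

/* Widen a 16-bit signed value directly: a negative value stays
   negative, so every higher-order bit of the long is set. */
long widen_signed(int16_t v)
{
    return (long)v;
}

/* Widen via the unsigned type of the same width: the conversion to
   uint16_t is modulo 65536, and the result is then zero-extended. */
long widen_unsigned(int16_t v)
{
    return (long)(uint16_t)v;
}
```

For the bit pattern 0xFF00 held in a 16-bit signed type (the value -256), the first function yields -256 and the second yields 65280. OR-ing -256 into a long that already holds the two high bytes wipes them out on a two's complement machine, while OR-ing 65280 leaves them alone.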

The optimizations seem pointless to me. If there is any
speed advantage for shift-convert over convert-shift, that
advantage will be tiny compared to the I/O activity that
provides the incoming bytes. Suppose a disk read takes 10ms
to fetch 64KB of input: that's ~150ns per byte, or about 450
processing cycles on a 3GHz machine. If shift-then-convert
saves two cycles, say, you have saved a whopping four-tenths
of one percent -- it seems likely that almost any program you
can name presents more significant optimization opportunities
elsewhere. (The other way to think about this is to note that
64KB per 10ms means bytes arrive at a rate of 6.5MHz, which is
peanuts compared to even a 1GHz=1000MHz machine.)

If we throw out the pointless optimizations, we get
something like

s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (long)getc(fp) << 8;
s.i32 |= (long)getc(fp) << 0;

... which, I submit, makes up in clarity what little it gives
away in efficiency.

The corresponding code to write the structure might look like:

putc(s.c, fp);
putc((unsigned)((s.i32 >> 24) & 0xff), fp);
putc((unsigned)((s.i32 >> 16) & 0xff), fp);
putc((unsigned)((s.i32 >> 8) & 0xff), fp);
putc((unsigned)(s.i32 & 0xff), fp);
putc(s.i16 >> 8) & 0xff, fp);
putc(s.i16 & 0xff, fp);

I'm afraid this baffles me. I could understand, e.g.

putc( ((unsigned)(s.i32 >> 24)) & 0xFF, fp);

on the grounds of avoiding the need for a `long' version of
0xFF, but as written I simply don't get it. (Besides, the
next-to-last line is missing a parenthesis.) You'd better
address your question to Mr. Summit directly.
 
Martin

Eric said:
(Besides, the next-to-last line is missing a parenthesis.)
You'd better address your question to Mr. Summit directly.


Thanks for that response. The penultimate line should be

putc((s.i16 >> 8) & 0xff, fp);

as you point out.

Martin
 
CBFalconer

Martin said:
.... snip ...

For example, suppose that you want to read a data structure,
consisting of a character, a 32-bit integer, and a 16-bit
integer, from the stream fp into the C structure

struct mystruct {
    char c;
    long int i32;
    int i16;
};

You might use code like this:

s.c = getc(fp);

s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (unsigned)(getc(fp) << 8);
s.i32 |= getc(fp);

I hope not. What if CHAR_BIT is greater than 8? What about EOF?
What you might do (assuming hi byte first in the stream) is:

#include <limits.h>
unsigned long u;
int i;

for (i = 0, u = 0; i < 4; i++) {
    /* you may want to include error traps for getc
       returning anything larger than 255 or EOF */
    u = u * 256 + (getc(fp) & 0xff);
}
if (u <= LONG_MAX) s.i32 = u;
else {
    /* take corrective action on overflow */
    /* creating a neg. value is system dependent */
}

and if you really need the obfuscation you can use "<< 8" in place
of "* 256".

Notice how the standard network assumption of hi byte first eases
the translation of an incoming stream, and does not hamper
generation of an output stream. You can also settle the possible
negations etc. on the initial input byte, and make any following
code bulletproof.
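
For what it's worth, here is one way the loop above might look with the EOF trap filled in. This is only a sketch under stated assumptions: the file holds 8-bit bytes, high byte first, and values above 0x7FFFFFFF are meant to be read as 32-bit two's complement. The names `from_u32` and `read_i32_be` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: interpret the low 32 bits of u as a two's
   complement value without converting an out-of-range number to long. */
long from_u32(unsigned long u)
{
    u &= 0xFFFFFFFFul;
    if (u <= 0x7FFFFFFFul)
        return (long)u;
    /* 0xFFFFFFFF - u is at most 0x7FFFFFFF, so the cast is safe. */
    return -(long)(0xFFFFFFFFul - u) - 1;
}

/* Hypothetical reader: returns 0 on success, -1 if getc() hits
   end-of-file or an error before four bytes have been read. */
int read_i32_be(FILE *fp, long *out)
{
    unsigned long u = 0;
    int i;

    for (i = 0; i < 4; i++) {
        int c = getc(fp);
        if (c == EOF)
            return -1;
        /* Accumulate in unsigned long; mask in case CHAR_BIT > 8. */
        u = (u << 8) | ((unsigned long)c & 0xFFul);
    }
    *out = from_u32(u);
    return 0;
}
```

A caller would write something like: if (read_i32_be(fp, &s.i32) != 0) and then handle the short read. The most negative 32-bit value comes out as -2147483647 - 1, which needs a `long' that can represent it (true of two's complement implementations).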
 
Peter Nilsson

In C90 perhaps, but not always in C99.

Actually, anything over 127 is problematic.

Derrick said:
The shift is done before the cast, though. To avoid undefined behaviour
you would want to do:

s.i16 |= ((unsigned)getc(fp)) << 8;

Also, getc cannot possibly return a value exceeding 255, because it
always returns either an unsigned char value (sizeof(unsigned char) is
1)

sizeof(unsigned char) == 1 does not limit the upper bound of unsigned
char. CHAR_BIT may be more than 8.
 
Derrick Coetzee

Micah said:
Derrick said:
Also, getc cannot possibly return a value exceeding 255, because it
always returns either an unsigned char value (sizeof(unsigned char) is
1) or a negative value (EOF is negative).

[ . . . ] be assured that it is entirely possible for getc()
to return a value exceeding 255, on systems with more than 8 bits to a
byte. There are people here who have worked on such implementations.

Oops, I should've said, "exceeding UCHAR_MAX" - I keep assuming CHAR_BIT
is 8.
 
Micah Cowan

Peter said:
In C90 perhaps, but not always in C99.

Yes, always. Read 6.5.7#4:

The result of E1 << E2 is E1 left-shifted E2 bit positions;
vacated bits are filled with zeros.
If E1 has an unsigned type.... If E1 has a signed
type and nonnegative value... otherwise, the behavior is
undefined.
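
Spelled out as a check, the rule for a signed left operand might look like the sketch below. The function name `shl_is_defined` is made up, and the sketch assumes `int' has no padding bits, so that its width is sizeof(int) * CHAR_BIT. The elided part of the paragraph requires that E1 times 2 to the power E2 be representable in the result type, which is what the last comparison tests.

```c
#include <limits.h>

/* Hypothetical checker: returns 1 if e1 << e2 has defined behaviour
   for int operands under C99 6.5.7, and 0 otherwise. */
int shl_is_defined(int e1, int e2)
{
    int width = (int)(sizeof(int) * CHAR_BIT);

    if (e2 < 0 || e2 >= width)
        return 0;                   /* shift count out of range */
    if (e1 < 0)
        return 0;                   /* negative left operand */
    return e1 <= (INT_MAX >> e2);   /* e1 * 2^e2 must fit in int */
}
```

So a nonnegative value is fine to shift as long as no 1-bit reaches or passes the sign position; on a 16-bit `int' that rules out shifting anything above 127 left by 8.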
 
