Martin said:
To ensure the partial quote I gave in my initial post was not misleading,
this is the question and answer from the book (c)1996 by Addison-Wesley
Publishing Company, Inc.
Question: How can I write code to conform to these old, binary data file
formats?
Answer: It's difficult because of word size and byte-order differences,
floating-point formats, and structure padding. To get the control you need
over these particulars, you may have to read and write things a byte at a
time, shuffling and rearranging as you go. (This isn't always as bad as it
sounds and gives you both code portability and complete
control.) For example, suppose that you want to read a data structure,
consisting of a character, a 32-bit integer, and a 16-bit integer, from the
stream fp into the C structure
struct mystruct {
char c;
long int i32;
int i16;
};
You might use code like this:
s.c = getc(fp);
s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (unsigned)(getc(fp) << 8);
s.i32 |= getc(fp);
s.i16 = getc(fp) << 8;
s.i16 |= getc(fp);
This code assumes that getc reads 8-bit characters and that the data is
stored most significant byte first ("big endian"). The casts to (long)
ensure that the 16- and 24-bit shifts operate on long values (see question
3.14), and the cast to (unsigned) guards against sign extension. (In
general, it's safer to use all unsigned types when writing code like this,
but see question 3.19.)
This code seems to arise from an odd combination of
caution, carelessness, and micro-optimization. The design
considerations may have evolved along these lines:
Caution: Since an `int' could be as narrow as 16 bits,
use `long' to store the final value, safe in the knowledge
that `long' is at least 32 bits wide. For the same reason,
convert the first two getc() results from `int' to `long'
before shifting, since the shifts might be too wide for a
narrow `int'.
Optimization: The third getc() result is shifted only
8 bits, so it will fit in an `int' even if `int' is only
16 bits wide. Doing arithmetic on an `int' may be a hair
faster than on a `long', so shift first and convert later.
Carelessness: If `int' is only 16 bits wide, this
shift may slide a high-order 1-bit from the getc() result
into the sign position of the `int'. This will cause no
harm on most machines, but the C language doesn't actually
specify what will happen. (The same carelessness afflicts
the shifting of the first byte, too.)
Caution: If the shift did in fact slide a 1-bit into
the sign position of a 16-bit `int' and thereby make it
negative, converting this `int' to `long' will propagate
the sign bit leftward and the subsequent `|' will clobber
the two bytes already processed. Hence the `unsigned' cast:
if `int' is 16 bits wide it will be zero-extended instead of
sign-extended, and if `int' is wider it won't be negative
anyhow.
Optimization: Since the fourth getc() result is non-
negative and doesn't get shifted, its sign bit is zero and
conversion to `long' will not "smear" the first three bytes.
The conversion can go straight from `int' to `long' safely.
Carelessness: Of course, all these getc() calls can fail,
and the results should be checked against EOF before being
used. I assume Mr. Summit omitted the checks for brevity.
(Alternatively, the individual checks could be omitted if
tests of feof() and ferror() followed the whole sequence.)
The optimizations seem pointless to me. If there is any
speed advantage for shift-convert over convert-shift, that
advantage will be tiny compared to the I/O activity that
provides the incoming bytes. Suppose a disk read takes 10ms
to fetch 64KB of input: that's ~150ns per byte, or about 450
processing cycles on a 3GHz machine. If shift-then-convert
saves two cycles, say, you have saved a whopping four-tenths
of one percent -- it seems likely that almost any program you
can name presents more significant optimization opportunities
elsewhere. (The other way to think about this is to note that
64KB per 10ms means bytes arrive at a rate of 6.5MHz, which is
peanuts compared to even a 1GHz=1000MHz machine.)
If we throw out the pointless optimizations, we get
something like
s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (long)getc(fp) << 8;
s.i32 |= (long)getc(fp) << 0;
.... which, I submit, makes up in clarity what little it gives
away in efficiency.
The corresponding code to write the structure might look like:
putc(s.c, fp);
putc((unsigned)((s.i32 >> 24) & 0xff), fp);
putc((unsigned)((s.i32 >> 16) & 0xff), fp);
putc((unsigned)((s.i32 >> 8) & 0xff), fp);
putc((unsigned)(s.i32 & 0xff), fp);
putc(s.i16 >> 8) & 0xff, fp);
putc(s.i16 & 0xff, fp);
I'm afraid this baffles me. I could understand, e.g.
putc( ((unsigned)(s.i32 >> 24)) & 0xFF, fp);
on the grounds of avoiding the need for a `long' version of
0xFF, but as written I simply don't get it. (Besides, the
next-to-last line is missing a parenthesis.) You'd better
address your question to Mr. Summit directly.