William said:
Yes. And to see a real world example of such a program failing because of
this (not that undefined means it must fail), try this compiler
http://fabrice.bellard.free.fr/tcc/
w/ your code, using the -b switch (bounds checker). I never heeded the
standard on this point until I started using TCC to improve my code
portability.
I must say there are some circumstances where it is indeed desirable to
iterate backwards. For example, I had to tweak many places in a memory
pool library because I would iterate backwards from a given pointer
reading bookkeeping information until I hit a terminator bit. Took me hours to
figure out why my program would crash using TCC:
/*
* Beginning from *p, work backwards reconstructing the value of an
* rbitsint_t integer. Stop when the highest order bit of *p is set, which
* should have been previously preserved as a marker. Return the
* reconstructed value, setting *end to the last position used of p.
*/
static inline rbitsint_t rbits_get(unsigned char *p, unsigned char **end) {
rbitsint_t i = 0; /* currently typedef to size_t */
int n = 0;
do {
i |= (*p & ~(1 << (CHAR_BIT - 1))) << (n++ * (CHAR_BIT - 1));
Two problems with this line.
First, ~(1 << (CHAR_BIT - 1)) is a questionable expression. 1 << (CHAR_BIT -
1) is a signed integer, to which ~ is applied. The value of the result is
implementation-defined. This will happen to work here because *p is at most
as wide as the bitmask, so the irrelevant bits will be masked out and the
actual value of the expression doesn't matter. Still, this is a habit best
unlearned. Use U<FOO>_MAX >> 1 for the all-ones-except-the-MSB bitmask,
where U<FOO>_MAX can be defined if necessary as ((unsigned foo) -1).
Second, (*p & ~(1 << (CHAR_BIT - 1))) is an int. You then shift this int by
(n++ * (CHAR_BIT - 1)), which has potential for undefined behavior since it
can exceed the width of an int. What you want is to shift an rbitsint_t, not
an int. (Unlike the previous issue, this can be a real problem -- try
typedef'ing rbitsint_t as a 64-bit type on a 32-bit architecture and reading
more than 32 value bits into it to see what I mean.)
This should fix both issues:
i |= (rbitsint_t) (*p & (UCHAR_MAX >> 1)) << (n++ * (CHAR_BIT - 1));
} while (!(*(p--) & (1 << (CHAR_BIT - 1))));
*end = p + 1;
return i;
} /* rbits_get() */
Not inspired by the overflow observation, but here's my take:
#define LOMASK (UCHAR_MAX >> 1)
#define MSB (LOMASK + 1)
static inline rbitsint_t rbits_get(unsigned char *p, unsigned char **end) {
rbitsint_t i;
unsigned char *q = p;
while (!(*q & MSB)) --q;
*end = q;
i = *q & LOMASK;
while (++q <= p) {
i = i << (CHAR_BIT - 1) | *q;
}
return i;
}
Yes, it's more lines; yes, I go backwards just to go forwards again. If the
operations involved are expensive, this is not a good idea, but here the
operations aren't expensive. I personally find this easier to read, and for
my compiler and my machine (YMMV) it's actually faster.
Of course, I assume you've rewritten your code since discovering the
one-before-the-beginning bug, possibly even along these lines, so my point
may be moot.
S.