My thinking is perhaps colored by too many years of assembly coding and
instruction sets that include "decrement and branch if nonzero":
Interesting where the differences lead us. I did some x86 assembly way
back - not much but enough to get the general idea. As for any other
architectures - I've used them but haven't coded their assembly.
So my thinking is based mainly on my own concepts of an 'abstract machine'
and the operations it would perform with my own idea of the abstract cost
of those operations.
In practice your thinking is likely to lead to code that is efficient on a
given set of architectures.
My approach will be more efficient on a machine in my head.
Whether
that translates into efficiency on real machines depends on
a) the natural fit of my abstract machine to the real hardware
b) of the ability of the compiler to optimise my semantics for the real
hardware.
There is another dependency - optimising my semantics for the abstract
machine. But that's a step I deem it upon myself to take and prefer not
to rely on the compiler to provide. And that's the point where according
to my abstract concepts, the 'maximum pointer test' is cheaper than the
separate counter.
But it's only valid for my personal abstract machine. Another
programmer might have their own personal abstract machine with a
different cost weighting. And your cost calculations are valid for the
machines you know have that type of instruction.
In the end when we're talking about non-platform-specific C as on this
forum, apart from blazingly obvious inefficiencies, and representations
that can only be implemented by hardware in one particular way, things
like this are a matter of personal preference and style, would you not
agree?
[snip assembly and explanation]
Modern CPUs have "branch on register not zero" but also may have "branch
on r1 not equal to r2". MIPS and SPARCv9 lack the latter, so the
counted loop is still better; but on SPARCv9 at least, you really want
to implement large block-copy operations using the special
cache-conserving streaming load operations, through the floating point
registers. (But that assembly code is far too large to post
off-topically to comp.lang.c
)
From vague memory I know x86 has the jump on zero, jump on less than etc
but I have no recollection of whether this can be combined with a
decrement in a single instruction.
We're wandering somewhat off-topic here but these are things to consider
in the context of writing non-specific code and being aware that it must
ultimately run on an implementation which will have strengths and
weaknesses. How much should you abstract those implementation-specific
strengths and weaknesses into generic ones and code for those? Is there
any validity to the concept of a generic strength or weakness for an
abstract machine?