|In article <[email protected]>,
|:Consider the following statement:
|:n+i, where i = 1 or 0.
|:Is there a faster method for computing n+i than computing that sum directly?
|It depends on the costs you assign to the various operations -- a
|matter which is architecture dependent.
There is a possibility that would be slower on any real
architecture I've ever heard of, but which could be faster
under very narrow circumstances:
(n&1) ? (n+i) : (n|i)
The narrow circumstances under which this could be faster are:
- this is within a tight loop that fits within the processor's
primary instruction cache
- the processor has a "move conditional" operation that
avoids taking an actual branch when the operations are
simple enough and the result is being used arithmetically
instead of to control a branch
- at the microcode level, the processor "runs free"
when working from instruction cache, processing each
instruction as fast as possible instead of working
on a bus-cycle system (which is needed in most cases
when anything outside the primary cache is being referenced)
- the combined cost of the bitwise AND operation, the
  comparison to 0, and the bitwise OR operation is less
  than the cost of a full addition
I have heard of one architecture (I don't recall which)
that had a "move conditional" operation that took
a test condition and two arithmetic operations as operands,
and would start doing the two arithmetic operations in
parallel at the same time it was doing the test; when the
result of the test was available, it would abort the false
branch if it was not already finished, with the result
being whichever of the arithmetic expressions was selected
by the condition.
Please note how narrow these conditions are: you would
have to know a LOT about your processor to make this kind
of optimization; the expression I give above will be slower
than a straight addition on nearly every architecture.
Addition is usually hard-coded through a series of
transistors, with the carry circuit taking most of the
landscape. It's hard to beat transistor-level speeds
by using multiple instructions.
I have heard that some architectures internally
optimize +0 and +1; there would be no way to beat that...
but again you would need to know intimate details of
the architecture.