Richard Tobin wrote, On 25/06/07 09:15:
> It might use a multiply-and-add instruction which does them both at
> once.
> (I assume multiply-and-add works something like this: to do 12*23+45
> it does 10*23 + 2*23 + 45, so that the multiplication might at least
> not be finished before the addition.)
On the processors I've used with a MAC instruction, the add uses the
result of the previous multiply, not the current one, so it can be done
in parallel with the multiply. So if you had
a * b + a * c + a * d;
you could implement it as:
ZAC   ; Zero the accumulator
LT a  ; Load a into the T register
MPY b ; Multiply b by the T register
MAC c ; Add the product to the accumulator and multiply c * T
MAC d ; Add the product to the accumulator and multiply d * T
APAC  ; Add the final product to the accumulator
So here an add always occurs at the same time (same clock cycle) as a
multiply, but the add always uses the result of an earlier multiply.
The above was for the Texas Instruments TMS320C2x series of processors.
What I have done with hand-coded assembler, which is even screwier, was
using some of the address registers for add/subtract whilst using the
accumulator and multiplier for other things. Because the addressing was
done in an earlier pipeline stage than the ALU, the multiplier, and the
transfers to/from the address registers, instructions using data in the
address registers could appear *after* the instructions overwriting
that data, although the parts actually executed in the correct order.
This was on a TMS320C80, and I would not have written code like that if
it were not for the fact that, even using every trick I could come up
with, we still did not have enough processing power (I did it so it
would work as much as possible with what we had).
I don't know how much the compilers make use of these tricks. I know the
TMS320C80 C compiler was bad, but I generally found the TMS320C2x
compiler did a good enough job. However, the compiler writers *could* do
screwy things like this if they were clever enough.