Brian K. Michalk
I have a Perl app that calculates the standard deviation of a
4000-element array of 16-bit integers with a large dynamic range,
i.e., the values span a significant portion of the 16 bits.
I am trying to increase the performance of this critical loop, and
I've found that the sums exceed 32 bits, causing Perl to
switch to an infinite precision math library. I've rewritten the loop
such that I now have only one term that overflows 32 bits.
I am at the point of inlining assembler to speed up this loop.
However, the MMX terminology is confusing. I see there are 64-bit MMX
registers that can do multiply and accumulate, but I don't see any
64-bit add instructions. My loop is something like this:
for (i = 0; i < 4000; i++) { sum += array[i]; }
Of course my loop is slightly bigger than this, but even this toy
example has terrible performance on real data when sum exceeds 32
bits.
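For reference, here is a minimal sketch of what the accumulation could look like in plain C with 64-bit accumulators, before any MMX is involved. The function name variance_i16 and the choice of population variance are assumptions made for illustration, not the actual loop from the app. With 4000 samples of at most 16 bits each, the sum of squares stays below about 4.3e12, which fits comfortably in an int64_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original app): population variance
 * of n signed 16-bit samples, using 64-bit accumulators so that neither
 * the running sum nor the running sum of squares can overflow for
 * arrays of this size. The standard deviation is the square root of
 * the returned value. */
double variance_i16(const int16_t *samples, size_t n)
{
    int64_t sum = 0;     /* running sum of samples */
    int64_t sum_sq = 0;  /* running sum of squared samples */

    assert(samples != NULL || n == 0);
    if (n == 0)
        return 0.0;

    for (size_t i = 0; i < n; i++) {
        int64_t v = samples[i];  /* widen before multiplying */
        sum += v;
        sum_sq += v * v;
    }

    double mean = (double)sum / (double)n;
    double var = (double)sum_sq / (double)n - mean * mean;

    /* Rounding can leave a tiny negative value when all samples are
     * nearly equal; clamp it to zero. */
    return var > 0.0 ? var : 0.0;
}
```

Calling sqrt() from <math.h> on the result gives the standard deviation. Whether this beats the Perl loop once the call overhead is included would have to be measured.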
Before I spend the time inlining C (and assembler): are the MMX
registers capable of accumulating into a 64-bit register by adding a
16- (or 32-) bit operand?
By the way, I've already tried the GMP library, and it did not help.
I also recompiled everything with the MMX optimizations turned on. Oh
yeah, this is on Linux.