Lisa Simpson
I am writing a program (precisely an image processing filter) which computes
several multiplications of floating-point numbers by constant powers of 2.
This means adding an integer constant to the exponent without changing the
mantissa, so I believed it would be simpler than a full floating-point
multiplication. However, when I write the program using the C "ldexp"
function, it runs much slower while producing the same output.
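To be concrete, the two variants I am comparing look roughly like this (a minimal sketch, not my actual filter code; the scale factor 2^(-3) is just an example):

```c
#include <math.h>

/* Scale by a constant power of two via ldexp: conceptually this just
 * adds -3 to the exponent field without touching the mantissa. */
double scale_ldexp(double x) {
    return ldexp(x, -3);      /* x * 2^(-3) */
}

/* Plain multiplication by the same constant; the compiler folds
 * 0.125 into a constant operand, so this is a single fmul. */
double scale_mul(double x) {
    return x * 0.125;         /* x * 2^(-3) */
}
```

Both functions return identical results for my inputs, yet on my machine the ldexp version is much slower.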
Since the program should run on a Pentium 4 processor, I downloaded a manual
from Intel's website
http://www.intel.com/design/pentium4/manuals/248966.htm
and indeed floating-point multiplication ("fmul" in assembly) has a latency
of 8 clock cycles, while scaling the exponent ("fscale" in assembly) has a
latency of 60 cycles. This makes no sense to me.
I would like to ask if anyone knows a fast method for multiplying a
floating-point number by a constant power of 2, or if I should use standard
multiplication. I must use floating-point arithmetic because the input data
can span several orders of magnitude and the program also computes square
roots and exponentials. Speed is important, because the program should
process high-resolution video.
I tried to add an integer to the exponent using a union, but this is still
a bit slower than pure floating-point multiplication. I guess this happens
because the processor uses different registers for integer and
floating-point operations, so some clock cycles are wasted moving
data around. Also, this approach mishandles underflow: for example,
"multiplying" 0.0 by 2^(-1) gives -Inf. Maybe I could gain some speed by
using SSE instructions, which (as far as I know) use the same set of
registers for both integer and floating-point operations, but this will not
solve the underflow problem.
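My union attempt looked roughly like this (a reconstructed sketch; the bit layout assumes an IEEE-754 double, with the 11-bit biased exponent in bits 52..62):

```c
#include <stdint.h>
#include <math.h>

/* "Multiply" x by 2^k by adding k directly to the biased exponent field.
 * This is only valid for normal, nonzero finite x: for zeros, denormals,
 * Inf and NaN the addition wraps into the sign bit or past the Inf
 * encoding instead of saturating the way real FP arithmetic would. */
double scale_bits(double x, int k) {
    union { double d; uint64_t u; } v;
    v.d = x;
    v.u += (uint64_t)(int64_t)k << 52;  /* exponent field: bits 52..62 */
    return v.d;
}
```

With x = 0.0 and k = -1 this is exactly the failure I described: the all-zero bit pattern of 0.0 borrows through the exponent into the sign bit, and the resulting pattern 0xFFF0000000000000 decodes as -Inf.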
I hope someone can help me. Thanks in advance.
An Electronics Engineering PhD student