Hello all,

I'm facing the all-time favorite design issue of creating an implementation of floating-point division (for hobby, not work). A straightforward face-value implementation shouldn't cause too much trouble, but I wonder if anyone knows of optimized algorithms for this problem.

Also: part of the division always seems to be a subtraction (18-22 bits wide in this case). A single-clock subtraction of numbers that wide will probably run at no more than 100-150 MHz. Should I consider breaking the subtraction up over 2 clock cycles, or are there better algorithms that use a lower logic depth? Are synthesis tools like Synopsys DC capable of optimizing this, or should I delve into subtraction algorithms myself?

Regards,

Pieter Hulshoff
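For what it's worth, one well-known family of "optimized" division algorithms replaces the digit-recurrence (repeated subtraction) loop with Newton-Raphson reciprocal iteration, trading the wide subtractor for multiplies. Below is a minimal software sketch of the idea, not HDL and not anyone's production implementation; the seed constants and the function name are just illustrative:

```python
import math

def nr_divide(n, d, iterations=4):
    """Approximate n / d via Newton-Raphson reciprocal iteration.

    Each iteration roughly doubles the number of correct bits, so a
    crude initial guess plus a few multiply/subtract steps reaches
    full precision -- the classic alternative to a subtract-per-bit
    (digit-recurrence) divider.
    """
    # Scale d into a mantissa m in [0.5, 1), with d = m * 2**e.
    m, e = math.frexp(d)
    # Linear minimax seed for 1/m on [0.5, 1); in hardware this would
    # typically come from a small lookup table instead.
    x = 48/17 - (32/17) * m
    for _ in range(iterations):
        x = x * (2.0 - m * x)   # Newton step: x converges quadratically to 1/m
    # Undo the scaling: 1/d = (1/m) * 2**-e, then multiply by n.
    return n * math.ldexp(x, -e)
```

In hardware the multiplies dominate, and each Newton step pipelines naturally across clock cycles, which may sidestep the single-cycle wide-subtraction problem you describe.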