Fixed Point Rounding

Tricky · May 14, 2010

I'm using the new fixed/float library, and I'm getting a bit confused by the
resize function.

I'm using the following code:

video_us : out ufixed(7 downto 0)
...
variable weighted_pix0 : ufixed(7 downto -12);
variable weighted_pix1 : ufixed(7 downto -12);
.......
video_us <= resize(weighted_pix0 + weighted_pix1, 7, 0);

Now, when the sum of the two weighted pixels is X.5, the rounded output
(round style is defaulting to fixed_round) is always X, not X+1. Is
there a rule I'm missing? Is this not what the resize function is for?
Otherwise, what's the point of the "round_style" and "overflow_style"
function arguments? Do I have to go back to the old way of rounding,
that is, adding 0.5 and then truncating?

Thanks in advance for any help.

KJ · May 15, 2010

> I'm using the new fixed/float library, and I'm getting a bit confused by the
> resize function.
>
> I'm using the following code:
>
> video_us : out ufixed(7 downto 0)
> ..
> variable weighted_pix0 : ufixed(7 downto -12);
> variable weighted_pix1 : ufixed(7 downto -12);
> ......
> video_us <= resize(weighted_pix0 + weighted_pix1, 7, 0);
>
> Now, when the sum of the 2 weighted pixels is X.5, the rounded output
> (round style is defaulting to fixed_round) is always X, not X+1. Is
> there a rule I'm missing?

What you're missing is that whether X.5 rounds up or down depends on
what X is.

From the user's guide...
"round_style" defaults to fixed_round (true) that turns on the
rounding routines. If false (fixed_truncate), the number is truncated.
Rounding is done by first looking to see if the MSB of the remainder
is a “1”, AND the LSB of the unrounded result is a “1” or the lower
bits of the remainder include a “1”, the result will be rounded. This
is similar to the floating-point “round_nearest” style. The down side
is that ALL of the bits are included in the decision to round
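
As a rough illustration of the rule quoted above, here is a minimal sketch of the rounding decision written against plain numeric_std rather than the fixed-point package itself. The entity name round_rule_tb, the helper round_sketch, and the fixed 8.12 bit split (matching the 20-bit operands in this thread, and ignoring the extra carry bit of the sum) are assumptions made for illustration. It is not the package's actual source, and the saturation at 255 is likewise an assumption about the default overflow style.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity round_rule_tb is
end entity round_rule_tb;

architecture sim of round_rule_tb is

  -- Hypothetical helper, not part of fixed_pkg: rounds an unsigned value
  -- with 8 integer bits and 12 fraction bits down to its 8 integer bits.
  function round_sketch (x : unsigned(19 downto 0)) return unsigned is
    variable result  : unsigned(7 downto 0);
    variable msb_rem : boolean;  -- MSB of the remainder (the 0.5 bit)
    variable lsb_res : boolean;  -- LSB of the unrounded result
    variable sticky  : boolean;  -- any lower remainder bit is set
  begin
    result  := x(19 downto 12);
    msb_rem := x(11) = '1';
    lsb_res := x(12) = '1';
    sticky  := x(10 downto 0) /= 0;
    -- Round up when the remainder is above 0.5, or exactly 0.5 with an odd
    -- integer part. Assumed: saturate at 255 instead of wrapping.
    if msb_rem and (lsb_res or sticky) and result /= 255 then
      result := result + 1;
    end if;
    return result;
  end function round_sketch;

begin

  check : process
  begin
    -- 4.5 is a tie with an even integer part, so it stays at 4
    assert round_sketch(to_unsigned(4*4096 + 2048, 20)) = 4
      report "4.5 should round to 4" severity failure;
    -- 5.5 is a tie with an odd integer part, so it goes up to 6
    assert round_sketch(to_unsigned(5*4096 + 2048, 20)) = 6
      report "5.5 should round to 6" severity failure;
    -- one LSB above 4.5 is no longer a tie, so it goes up to 5
    assert round_sketch(to_unsigned(4*4096 + 2049, 20)) = 5
      report "4.5 plus one LSB should round to 5" severity failure;
    wait;
  end process check;

end architecture sim;
```

The point of the sketch is only that a tie (remainder exactly 0.5) is resolved by the LSB of the integer part, which is why X.5 sometimes yields X and sometimes X+1.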

> do I have to go back to old way
> of rounding that is +0.5 and then truncating?

I use the "+0.5 and then truncating" approach because it takes less
logic to implement (hence the 'down side' mentioned in the user's
guide) and my designs so far haven't needed the floating-point
“round_nearest” style.

Kevin Jennings

KJ · May 15, 2010

Should've said "I *have* used +0.5...". I've also used 'fixed_round'.

> Which causes data forking.
> Take the numbers 1.5 to 8.5 and round by your method and add up the
> error.
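
The comparison suggested in the quote above can be worked through with a small self-checking testbench. This is a sketch only: the entity name rounding_bias_tb and the trick of storing each value doubled (so that X.5 becomes the odd integer 2X+1) are assumptions made here to keep everything in integer arithmetic. "Half up" stands for the +0.5-and-truncate method, and "half even" for the tie behaviour the user's guide describes for fixed_round.

```vhdl
entity rounding_bias_tb is
end entity rounding_bias_tb;

architecture sim of rounding_bias_tb is
begin

  compare : process
    variable x2        : integer;       -- input value times two (X.5 -> 2X+1)
    variable half_up   : integer;       -- result of +0.5 and truncate
    variable half_even : integer;       -- result of round to nearest even
    variable err_up2   : integer := 0;  -- accumulated error, times two
    variable err_even2 : integer := 0;  -- accumulated error, times two
  begin
    for x in 1 to 8 loop
      x2 := 2*x + 1;                    -- represents x + 0.5

      -- "+0.5 and truncate": every X.5 goes up to X+1
      half_up := (x2 + 1) / 2;

      -- nearest even: a tie goes to whichever neighbour is even
      if x mod 2 = 1 then
        half_even := x + 1;
      else
        half_even := x;
      end if;

      err_up2   := err_up2   + (2*half_up   - x2);
      err_even2 := err_even2 + (2*half_even - x2);
    end loop;

    report "half up, total error times two:   " & integer'image(err_up2);
    report "half even, total error times two: " & integer'image(err_even2);

    -- Eight inputs, each rounded up by 0.5: total error of +4.0
    assert err_up2 = 8 report "unexpected half-up total" severity failure;
    -- Four ties go up and four go down, so the errors cancel
    assert err_even2 = 0 report "unexpected half-even total" severity failure;
    wait;
  end process compare;

end architecture sim;
```

Under these assumptions the half-up method accumulates +4.0 over the eight inputs while the nearest-even method sums to zero, which appears to be the bias the quoted post is pointing at.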

Depending on the application though, this additional error for certain
input combinations might still be acceptable. "+0.5 and truncate" is
generally intermediate between 'fixed_truncate' and 'fixed_round' both
in error and logic resources to implement. Which of the three methods
is 'best' will depend on the accuracy requirements of the particular
application.

"+0.5 and truncate" is a design tradeoff that should be evaluated
versus 'fixed_truncate' and 'fixed_round'...it's just another tool in
the toolbox.

As an example of resource usage, I took Tricky's code (actual code
posted below) and ran it through Quartus 9.0 to produce the following
results:

Rounding method    Logic resources
===============    ===============
fixed_round        44
+.5_and_trunc      39
fixed_truncate     29

Kevin Jennings

--- START OF CODE
library ieee_proposed;
use ieee_proposed.math_utility_pkg.all;
use ieee_proposed.fixed_pkg.all;

entity Resizer_Adder is port(
  weighted_pix0 : in  ufixed(7 downto -12);
  weighted_pix1 : in  ufixed(7 downto -12);
  video_us      : out ufixed(7 downto 0));
end Resizer_Adder;

architecture rtl of Resizer_Adder is
begin
  -- Uncomment the line you would like to evaluate
  video_us <= resize(weighted_pix0 + weighted_pix1, 7, 0,
                     fixed_overflow_style, fixed_round);
  -- video_us <= resize(weighted_pix0 + weighted_pix1 + to_ufixed(0.5, -1, -1), 7, 0,
  --                    fixed_overflow_style, fixed_truncate);
  -- video_us <= resize(weighted_pix0 + weighted_pix1, 7, 0,
  --                    fixed_overflow_style, fixed_truncate);
end rtl;
--- END OF CODE

Tricky · May 17, 2010

> KJ is correct.
>
> 4.5 rounds to 4
> 5.5 rounds to 6
>
> Though carrying 12 bits of decimal is a bit overkill.

> Which causes data forking.
> Take the numbers 1.5 to 8.5 and round by your method and add up the
> error. Then do the same for my method (I wrote the fixed point packages).

For me, data forking (if I understand correctly, is that like
compounding errors?) shouldn't be an issue because this is the final
output. The 12 bits are carried only because I have a previous divide
(by a constant 2^n, with n as a generic) with input data carrying only
4 fractional bits. 12 bits covers the worst possible case of n, and I
expect the synthesiser to clear up anything that's overkill. And
actually, for my output, I require X.5 to always round to X+1, so
there is no error.
