About floating number arithmetic

Bo Yang · May 3, 2009

Hi,
I have spent some time searching the web for a definitive guide to
how floating-point numbers are calculated in a computer. Take
double for example. I mean, when the FPU processes any arithmetic on two
doubles, it must use a wider register to store the intermediate
result. Since the double's fraction width is "52" bits, how wide is the
intermediate register usually? And more importantly, what rounding
method does the FPU use? Round-to-nearest or round-to-even? Thanks!

Regards!
Bo

杨 波 · May 3, 2009

Typical hardware also has a mode with 64 bits of mantissa, so
it might actually use that. I seem to recall, though, that if you
start off with double precision and intend to round the result
to that precision, you only need 3 extra bits.

From the point of view of the hardware, the rounding method is typically a
software-settable option (it may be available from C in a particular
C99 implementation). However, round-to-nearest *IS* round-to-even: ties
go to the neighbor whose last bit is zero.
The other options you get are round UP, round DOWN, and round towards
zero.

Thank you very much, Gordon.
On rounding, suppose we keep only the first five digits:

Round-even:
10001100 --> 10010
10000100 --> 10000

Round-up:
10001100 --> 10010
10000100 --> 10001

Am I right? And so round-to-even in binary is not the same as the
usual round-half-up method in decimal?

Regards!
Bo

user923005 · May 4, 2009

Hi,
I have spent some time searching the web for a definitive guide to
how floating-point numbers are calculated in a computer.

You are searching for a document that does not exist.

Take
double for example. I mean, when the FPU processes any arithmetic on two
doubles, it must use a wider register to store the intermediate
result.

There are literally no rules for how this must take place. For
instance, C does not demand the use of IEEE arithmetic. Even with
IEEE arithmetic, you can have more than one rounding mode. The C
compiler vendor is free to choose any data representation and model
that they like, so long as it meets certain minimal requirements. For
the most part, C's floating point model does not even have any
accuracy requirements.

Since the double's fraction width is "52" bits, how wide is the
intermediate register usually?

Double's fraction width is not always 52 bits. OpenVMS D-FLOAT (one
8-byte double format), for instance, has 55 mantissa bits.

And more importantly, what rounding
method does the FPU use? Round-to-nearest or round-to-even?

Hopefully, your compiler's documentation will tell you. In the
meantime, you might enjoy reading this:
http://dlc.sun.com/pdf/800-7895/800-7895.pdf

Guest · May 5, 2009

I have spent some time searching the web for a definitive guide to
how floating-point numbers are calculated in a computer.

you could start here
http://en.wikipedia.org/wiki/Floating_point

most introductory CS books should discuss FP and a book
on numerical analysis will give more detail. Knuth's
"The Art of Computer Programming Vol II" might be a bit heavy.

Take double for example,

this narrows the subject of discourse a little

I mean, when the FPU

you are assuming there *is* an FPU. Though conceptually
a bunch of software to do FP arithmetic is, in a sense, an FPU.

processes any arithmetic on two
doubles, it must use a wider register to store the intermediate
result.

probably a good idea, but there has been some very strange FP over
the years.

Sin[c]e the double's fraction width is "52" bits,

not necessarily

how wide is the
intermediate register usually? And more importantly, what rounding
method does the FPU use? Round-to-nearest or round-to-even? Thanks!

you might want to look at this
http://en.wikipedia.org/wiki/IEEE_754

which discusses a particular (and very common) standard for FP.

Why do you want to know?
