About floating-point arithmetic

Bo Yang

Hi,
I have spent some time searching the web for a definitive guide
to how floating-point numbers are calculated in a computer. Take
double for example: when the FPU performs arithmetic on two
doubles, it must use a wider register to store the intermediate
result. Since the double's fraction width is 52 bits, how wide is the
intermediate register usually? And more importantly, what rounding
method does the FPU use? Round-to-nearest or round-to-even? Thanks!


Regards!
Bo
 
杨 波

Typical hardware also has a mode with 64 bits of mantissa, so
it might actually use that. I seem to recall, though, that if you
start off with double precision and intend to round the result
to that precision, you only need 3 extra bits.
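The "3 extra bits" remark can be illustrated for the final rounding step alone. The sketch below is an assumption-laden illustration, not a description of any particular FPU: it treats a result as an unsigned integer, calls the three extra bits guard, round and sticky (the names commonly used for them), and checks that rounding from those three bits agrees with rounding from every discarded bit. All function names are made up for this example.

```python
def round_nearest_even_full(value: int, drop: int) -> int:
    """Drop the low `drop` bits of `value`, rounding to nearest, ties to even."""
    if drop == 0:
        return value
    kept = value >> drop
    tail = value & ((1 << drop) - 1)      # all discarded bits
    half = 1 << (drop - 1)                # exactly halfway between neighbours
    if tail > half or (tail == half and kept & 1):
        kept += 1
    return kept


def compress_to_three_bits(value: int, drop: int) -> int:
    """Keep the result bits plus guard, round and sticky bits (needs drop >= 2)."""
    if drop < 2:
        raise ValueError("this sketch assumes at least two discarded bits")
    head = value >> (drop - 2)                  # result bits + guard + round
    rest = value & ((1 << (drop - 2)) - 1)      # everything below the round bit
    sticky = 1 if rest else 0                   # OR of all remaining bits
    return (head << 1) | sticky


def round_nearest_even_grs(value: int, drop: int) -> int:
    """Round using only the three extra bits kept by compress_to_three_bits."""
    return round_nearest_even_full(compress_to_three_bits(value, drop), 3)


if __name__ == "__main__":
    value, drop = 0b1010_1000001, 7   # tail is just above the halfway point
    print(round_nearest_even_full(value, drop), round_nearest_even_grs(value, drop))
```

In this example the sticky bit is what separates "just above halfway" from an exact tie; without it the tail would look like a tie and the even neighbour would be chosen instead.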


From the point of view of the hardware, the rounding method is typically a
software-settable option (C99 exposes it through fesetround() in <fenv.h>,
where the implementation supports it). Note, however, that round-to-nearest
*IS* round-to-even: the result is the nearest representable value, with ties
going to the even one. The other options you get are round UP, round DOWN,
and round towards zero.

Thank you very much, Gordon.
On rounding, if we keep only the first five digits:

Round-to-even:
10001100 --> 10010
10000100 --> 10000

Round-up:
10001100 --> 10010
10000100 --> 10001

Am I right? And is round-to-even in binary not the same as the usual
round-half-up method in decimal?

Regards!
Bo
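The bit patterns in the post above can be checked mechanically. This is a minimal sketch that assumes unsigned values written as binary strings; the function name and the mode names are invented for the example. It also includes a "half-up" mode (ties always go up, as in schoolbook decimal rounding) to show where that differs from ties-to-even.

```python
def round_bits(bits: str, keep: int, mode: str) -> str:
    """Round an unsigned binary string to its first `keep` digits.

    A carry out of the top digit (e.g. 11111 + 1) yields keep + 1 digits;
    real hardware would renormalize, which this sketch does not model.
    """
    kept = int(bits[:keep], 2)
    tail = bits[keep:]                       # discarded digits
    if not tail:
        return bits
    half = "1" + "0" * (len(tail) - 1)       # exactly halfway
    if mode == "nearest-even":
        bump = tail > half or (tail == half and kept % 2 == 1)
    elif mode == "nearest-half-up":
        bump = tail >= half
    elif mode == "up":
        bump = "1" in tail                   # any nonzero remainder rounds up
    elif mode in ("down", "toward-zero"):    # identical for unsigned values
        bump = False
    else:
        raise ValueError(f"unknown mode: {mode}")
    return format(kept + bump, f"0{keep}b")


if __name__ == "__main__":
    for pattern in ("10001100", "10000100"):
        for mode in ("nearest-even", "nearest-half-up", "up", "toward-zero"):
            print(pattern, mode, round_bits(pattern, 5, mode))
```

Both patterns are exact ties, so ties-to-even and half-up agree on the first (10001 is odd, so it goes up) and disagree on the second (10000 is already even, so it stays).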
 
user923005

Hi,
   I have spent some time searching the web for a definitive guide
to how floating-point numbers are calculated in a computer.

You are searching for a document that does not exist.
Take
double for example: when the FPU performs arithmetic on two
doubles, it must use a wider register to store the intermediate
result.

There are literally no rules for how this must take place. For
instance, C does not demand the use of IEEE arithmetic. Even with
IEEE arithmetic, you can have more than one rounding mode. The C
compiler vendor is free to choose any data representation and model
that they like, so long as it meets certain minimal requirements. For
the most part, C's floating point model does not even have any
accuracy requirements.
Since the double's fraction width is 52 bits, how wide is the
intermediate register usually?

Double's fraction width is not always 52 bits. OpenVMS D-FLOAT (one 8
byte double format), for instance, has 55 mantissa bits.
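Since the format is implementation-defined, it is easy enough to ask the platform what it actually uses. The following is a small sketch in Python, whose float is the underlying C double; the helper names are made up for this example. sys.float_info reports the radix, the number of significand digits and the FLT_ROUNDS value, and struct's standard-size "d" code always packs IEEE 754 binary64, so the field split below applies to that format only.

```python
import struct
import sys


def describe_float() -> dict:
    """Report what the platform's double looks like, as seen from Python."""
    info = sys.float_info
    return {
        "radix": info.radix,        # base of the exponent (FLT_RADIX)
        "mant_dig": info.mant_dig,  # significand digits, implicit bit included
        "rounds": info.rounds,      # FLT_ROUNDS value sampled at interpreter start
    }


def split_binary64(x: float) -> tuple:
    """Split x, packed as IEEE 754 binary64, into (sign, exponent, fraction)."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = raw >> 63
    exponent = (raw >> 52) & 0x7FF          # 11-bit biased exponent
    fraction = raw & ((1 << 52) - 1)        # 52 explicitly stored fraction bits
    return sign, exponent, fraction


if __name__ == "__main__":
    print(describe_float())
    print(split_binary64(1.0))
    print(split_binary64(-1.5))
```

On an IEEE 754 platform mant_dig is one more than the 52 stored fraction bits, because the leading bit of a normalized significand is implicit.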
And more importantly, what rounding
method does the FPU use? Round-to-nearest or round-to-even?

Hopefully, your compiler's documentation will tell you. In the
meantime, you might enjoy reading this:
http://dlc.sun.com/pdf/800-7895/800-7895.pdf
 
Guest

   I have spent some time searching the web for a definitive guide
to how floating-point numbers are calculated in a computer.

you could start here
http://en.wikipedia.org/wiki/Floating_point

most introductory CS books should discuss FP and a book
on numerical analysis will give more detail. Knuth's
"The Art of Computer Programming Vol II" might be a bit heavy.
Take double for example,

this narrows the subject of discourse a little
I mean, when the FPU

you are assuming there *is* an FPU. Though conceptually
a bunch of software to do FP arithmetic is, in a sense, an FPU.
performs arithmetic on two
doubles, it must use a wider register to store the intermediate
result.

probably a good idea, but there has been some very strange FP over
the years.

Sin[c]e the double's fraction width is "52" bits,

not necessarily

how wide is the
intermediate register usually? And more importantly, what rounding
method does the FPU use? Round-to-nearest or round-to-even? Thanks!

you might want to look at this
http://en.wikipedia.org/wiki/IEEE_754

which discusses a particular (and very common) standard for FP.
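If the platform's doubles are IEEE 754 binary64 in the default rounding mode, the tie-breaking rule can be observed directly. A rough sketch follows (the helper name is made up): above 2**53 only even integers are representable, so an odd integer sits exactly halfway between two doubles and has to be rounded one way or the other.

```python
BASE = 2 ** 53  # from here up to 2**54, doubles are spaced 2 apart


def nearest_double(n: int) -> int:
    """Convert an integer to a double and back, exposing how it was rounded."""
    return int(float(n))


if __name__ == "__main__":
    for offset in (1, 2, 3, 5):
        n = BASE + offset
        print(offset, nearest_double(n) - BASE)
    # The same tie-breaking applies to an addition carried out in doubles.
    print(float(BASE) + 1.0 == float(BASE))
```

Each tie goes to the neighbour whose last significand bit is zero, which is sometimes below and sometimes above the exact value. A rule that always rounded ties upward would never return BASE for BASE + 1.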

Why do you want to know?
 
