Dilip
Hello, wise men.
This question is purely academic, as I am just trying to understand this
stuff a little better. Therefore please feel free to point me to other,
more suitable NGs if necessary.
Recently I started a thread (at comp.lang.c) on converting a
single-precision floating-point number to double precision after running
into a problem at work. Little did I realize it would spawn close to 100
replies. Needless to say, I learnt a lot from the discussion.
Considering I will probably run into many such problems in the future, I
set out to study the IEEE 754 representation of these numbers. I have
understood it to an extent, but some gaps remain. I would be extremely
grateful if one of you more knowledgeable types could set me straight.
What I understand so far: (for single precision)
IEEE says the following is the bit pattern of such numbers:
<1 bit for sign><8 bits for exponent><23 bits for significand>
Mathematically, a normalized binary floating-point number is represented
as:
(-1)^s * 1.f * 2^(e-127)
where s is the sign bit,
f is the 23 bits of significand **fraction**,
and e is the biased exponent, which can range from 0 to 255 (with 0 and
255 reserved for zeros/denormals and infinities/NaNs, so normalized
numbers have e from 1 to 254)
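To make that layout concrete, here is a minimal C sketch that pulls the
three fields out of a float. (This assumes float is 32 bits and uses the
IEEE 754 format, which the C standard itself does not guarantee.)

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    float f = 262144.03125f;  /* the example value worked through below */
    uint32_t bits;

    /* copy out the object representation; avoids aliasing problems */
    memcpy(&bits, &f, sizeof bits);

    unsigned sign     = bits >> 31;            /* 1 bit            */
    unsigned exponent = (bits >> 23) & 0xFFu;  /* 8 bits, bias 127 */
    unsigned fraction = bits & 0x7FFFFFu;      /* 23 bits          */

    printf("sign=%u exponent=%u (unbiased %d) fraction=0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);
    return 0;
}

On my reading this should print sign=0 exponent=145 (unbiased 18)
fraction=0x000001 for this value, matching the example below.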
So given a number that is represented like this:
1.00000000000000000000001 * 2^18
which has 22 zeros and then a 1 after the binary point (23 significand
bits in all).
we turn that into:
1000000000000000000.00001
To convert that into decimal:
1 * 2^18 + 0 * 2^17 + ... + 0 * 2^-4 + 1 * 2^-5, which is 262144 +
0.03125, which comes to
262,144.03125
In other words the bit pattern for such a number will look like this:
0 10010001 00000000000000000000001
(sign bit, exponent and significand fraction) [exponent part is
10010001 because e-127 = 18 and hence e = 145 in decimal]
which in hex is 0x48800001
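As a sanity check, this little sketch (same caveat as above: a 32-bit
IEEE 754 float is assumed) stuffs the bit pattern 0x48800001 into a
float and prints it; it should come out as 262144.03125:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    uint32_t bits = 0x48800001;   /* 0 10010001 00000000000000000000001 */
    float f;

    memcpy(&f, &bits, sizeof f);  /* reinterpret the bits as a float */
    printf("%.5f\n", f);          /* expecting 262144.03125 */
    return 0;
}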
So far so good?
Now my question is: how do I go in the reverse direction?
Given a number like 59.889999, how do I work backwards to what its
binary representation looks like?
I seem to be missing a step in between.
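For what it is worth, here is a minimal C sketch (again assuming a
32-bit IEEE 754 float) that dumps the stored bits, which at least lets
me check pencil-and-paper work against what the machine holds:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    float f = 59.889999f;
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);

    /* print the 32 stored bits, sign bit first */
    for (int i = 31; i >= 0; i--)
        putchar(((bits >> i) & 1u) ? '1' : '0');
    putchar('\n');
    return 0;
}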
thanks!