convert 32bit numbers to 64bit (or float to double)


Sebastian Gibb

Hello,

a long time ago I had to use an external C application to generate some
numbers. This application saves the numbers as 32-bit (float) values in a
file.
I had to use R (www.r-project.org) to read the files. R imports the
values as 64-bit doubles (R only knows doubles), and spurious digits
appear after the decimal point.
Here is an example in C++ that does nearly the same thing:
#include <iostream>
#include <iomanip>
#include <limits>

using namespace std;

int main() {
    float myFloat = 1234.56;    // the double literal 1234.56 is rounded to float here
    double myDouble = myFloat;  // widening float -> double

    cout << setprecision(numeric_limits<float>::digits10) << "myFloat: "
         << myFloat << endl;
    cout << setprecision(numeric_limits<double>::digits10) << "myDouble: "
         << myDouble << endl;

    return 0;
}

output:
myFloat: 1234.56
myDouble: 1234.56005859375

As you can see, from the fifth decimal place on, the digits are spurious.
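The spurious digits are not added by the widening. They are the exact value of the float nearest to 1234.56, printed with more digits. A short worked check, assuming 'float' is IEEE 754 single precision with a 24-bit significand; 'nearestFloatByHand' is a hypothetical name:

```cpp
#include <cassert>
#include <cmath>

// Worked check (hypothetical helper): round 1234.56 to a 24-bit significand
// by hand. 1234.56 lies in [2^10, 2^11), so adjacent floats there are
// 2^(10-23) = 2^-13 apart; multiplying by 2^13 = 8192 turns the question
// into "which integer is nearest?".
double nearestFloatByHand() {
    double scaled = 1234.56 * 8192.0;         // about 10113515.52
    double rounded = std::nearbyint(scaled);  // nearest integer: 10113516
    assert(rounded == 10113516.0);            // sanity check of the hand computation
    return std::ldexp(rounded, -13);          // 10113516 * 2^-13 = 1234.56005859375
}
```

1234.56005859375 has exactly 15 significant digits, which is why printing with numeric_limits<double>::digits10 (15) shows it in full.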
Now I am able to replace the external C application with my own R
application (the algorithm uses double values, too).
For compatibility reasons I want to get the same values in R as with the
old C application. So far I have been using a C binding for R that does the
following:

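double precision32(double value) {
    float x = value;   // rounds to the nearest float (default rounding mode)
    return (double)x;  // exact: every float value is also a double value
}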

I want to know what happens when I write "double myDouble=myFloat", and how
I can simulate something like that using only double values.

Kind regards,

Sebastian
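On the first question: "double myDouble = myFloat" is an exact conversion. Every float value is also a double value, so widening only re-encodes the same number; the information was already lost earlier, when the double was rounded to the nearest float (in 'precision32', that is the line "float x = value;"). To reproduce this with doubles only, one can round directly to 24 significant bits. A minimal sketch, assuming IEEE 754 types, the default round-to-nearest-even mode, and an input that is zero or inside the normal float range (float subnormals and overflow to infinity are not modelled); 'simulateFloatRounding' is a hypothetical name:

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper: emulate double -> float -> double using doubles only.
// Assumes IEEE 754 types, the default rounding mode (to nearest, ties to
// even) and x == 0 or FLT_MIN <= |x| <= FLT_MAX (normal float range).
double simulateFloatRounding(double x) {
    if (x == 0.0)
        return x;
    assert(std::fabs(x) >= FLT_MIN && std::fabs(x) <= FLT_MAX);
    int exp = 0;
    double m = std::frexp(x, &exp);     // x = m * 2^exp with 0.5 <= |m| < 1
    double scaled = std::ldexp(m, 24);  // exact: the 24 kept bits become the integer part
    double r = std::nearbyint(scaled);  // the only rounding step
    return std::ldexp(r, exp - 24);     // exact: undo the scaling
}
```

frexp and ldexp only change the exponent, so nearbyint is the single place where rounding happens, just as the float conversion rounds once.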
 

Sebastian Gibb

Victor said:
[...]
I want to know what happens when I write "double myDouble=myFloat", and how
I can simulate something like that using only double values.

I think you will benefit from studying this article:

http://docs.sun.com/source/806-3568/ncg_goldberg.html

When you comprehend everything it presents, review your code and your
approaches, and if you still have some questions, come and ask them.

V
Hello,

after reading the article "What Every Computer Scientist Should Know About
Floating-Point Arithmetic" by Mr. Goldberg, I updated my code.
I don't understand why it only works partially.
I tested some floating-point numbers, and only half of them are converted
correctly:
18.4 -> correct
999.4813232421875 -> correct
1/3 -> not working
0.1 -> not working

What am I doing wrong?

Kind regards,

Sebastian

You can find my code at: http://pastebin.com/yczrW8br
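One way to see why exactly these values behave differently is to look at the first bit of each number that no longer fits into a float's 24-bit significand (the guard bit) and at whether any later bit is set (the sticky flag). A conversion that simply truncates agrees with the cast whenever the guard bit is 0, and disagrees whenever the guard bit is 1 and a later bit is set. A minimal diagnostic sketch, assuming IEEE 754 doubles and ignoring the float exponent range; 'DroppedBits' and 'droppedBitsForFloat' are hypothetical names:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical diagnostic: which bits of x are lost when it is squeezed into
// a 24-bit significand (IEEE 754 single precision)?
struct DroppedBits {
    int guard;    // first bit that does not fit (0 or 1)
    bool sticky;  // true if any bit after the guard bit is 1
};

DroppedBits droppedBitsForFloat(double x) {
    assert(x != 0.0 && std::isfinite(x));
    int exp = 0;
    double m = std::fabs(std::frexp(x, &exp));  // |x| = m * 2^exp, 0.5 <= m < 1
    double scaled = std::ldexp(m, 24);          // the 24 kept bits form the integer part
    double rest = scaled - std::floor(scaled);  // discarded bits as a fraction in [0, 1)
    DroppedBits d;
    d.guard = rest >= 0.5 ? 1 : 0;
    d.sticky = rest != 0.0 && rest != 0.5;      // anything besides the guard bit
    return d;
}
```

For the four test values this gives guard bit 0 for 18.4 and 999.4813232421875 (the latter fits into a float exactly) and guard bit 1 with later bits set for 1/3 and 0.1, which matches the split between working and failing cases.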
 

Sebastian Gibb

Hello,
Shorten it to the minimum and post it here. Most of us don't click on
links. At least I don't.

Sorry, I thought nobody would read long code without syntax highlighting.
#include <cmath>
#include <iomanip>
#include <iostream>
#include <vector>

using namespace std;

// some constants from IEEE 754
const int nBitsSingleMantissa = 23;
const int nBitsSingleExzess = 8;
const int nBitsDoubleMantissa = 52;
const int nBitsDoubleExzess = 11;

// old method used by the other C++ application;
// this is my reference method
double convertWithCast(double value) {
    float x = value;
    return (double)x;
}

// try to simulate the same behaviour without using floats
struct IEEEBinary {
    int signedBit;
    vector<int > exzess;
    vector<int > mantissa;
};

vector<int > swapVectorOrder(const vector<int >& x) {
    vector<int > y;
    for (int i=x.size()-1; i >= 0; --i) {
        y.push_back(x[i]);
    }
    return y;
}

double calcExzess(int nEBits) {
    return pow(2, nEBits-1)-1;
}

IEEEBinary double2binary(double x, int nMBits, int nEBits) {
    // calculate mantissa
    // before point
    int pre = floor(abs(x));
    vector<int > preMantissa;

    while (pre != 0) {
        preMantissa.push_back(pre % 2);
        pre = floor(pre/2.0);
    }

    if (preMantissa.size() > 1)
        preMantissa = swapVectorOrder(preMantissa);

    // after point
    double post = x - floor(x);
    vector<int > postMantissa;
    for (unsigned int i=0; i<2*nMBits; ++i) {
        post = post * 2;
        int pre = floor(post);
        postMantissa.push_back(pre);
        post -= pre;
    }

    vector<int > mantissa = preMantissa;
    mantissa.insert(mantissa.end(), postMantissa.begin(), postMantissa.end());

    // normalize
    vector<int >::iterator it;

    for (it = mantissa.begin(); it != mantissa.end(); ++it) {
        if (*it == 1)
            break;
    }
    // save size for exzess calc
    unsigned int sMantissa = mantissa.size();
    // remove leading zeros and first 1
    it = mantissa.erase(mantissa.begin(), (it+1));
    // save new size for exzess calc
    unsigned int sMantissa2 = mantissa.size();

    // round
    if (mantissa.at(nMBits+1) == 1) {
        mantissa.at(nMBits) = 1;
    }

    // cut
    mantissa.erase(it+nMBits, mantissa.end());
    //mantissa.erase(mantissa.end());

    // exzess
    int ex = calcExzess(nEBits) + preMantissa.size() - (sMantissa-sMantissa2);

    vector<int > exzess;

    while (ex != 0) {
        exzess.push_back(ex % 2);
        ex = floor(ex/2.0);
    }

    // append zeros to exzess
    if (exzess.size() < nEBits) {
        for (unsigned int i=exzess.size(); i<nEBits; ++i)
            exzess.push_back(0);
    }

    exzess = swapVectorOrder(exzess);

    // signed bit
    int signedBit = 0;

    if (x < 0) {
        signedBit = 1;
    }

    // build binary struct
    IEEEBinary bin;
    bin.signedBit = signedBit;
    bin.mantissa = mantissa;
    bin.exzess = exzess;

    return bin;
}

double binary2double(const IEEEBinary& binary) {
    int exzess = 0;

    for (unsigned int i = 0; i < binary.exzess.size(); ++i)
        exzess += binary.exzess[i]*pow(2, binary.exzess.size()-(i+1));

    exzess -= calcExzess(binary.exzess.size());

    double value = pow(2, exzess);

    for (unsigned int i = 0; i < binary.mantissa.size(); ++i) {
        value += binary.mantissa[i]*pow(2, exzess-(int)(i+1));
    }

    if (binary.signedBit == 1)
        value *= (-1);

    return value;
}

// wrapper function
double convertWithoutCast(double value) {
    return binary2double(double2binary(value, nBitsSingleMantissa,
                                       nBitsSingleExzess));
}


int main() {
    vector<double > testValues;
    testValues.push_back(1.0/3.0);
    testValues.push_back(18.4);
    testValues.push_back(0.1);
    testValues.push_back(999.4813232421875);

    for (vector<double >::iterator it=testValues.begin();
         it != testValues.end(); ++it) {
        double oldConv = convertWithCast(*it);
        double newConv = convertWithoutCast(*it);

        if (oldConv != newConv) {
            cout << setprecision(22) << *it << ": " << oldConv << " != "
                 << newConv << endl;
        }
    }

    return 0;
}

// the output:
0.3333333333333333148296: 0.3333333432674407958984 != 0.3333333134651184082031
0.1000000000000000055511: 0.1000000014901161193848 != 0.09999999403953552246094

// it works for 18.4 and 999.4813232421875

I think I am doing something wrong, because the old method with the C-style
cast returns a different value than my new method without the cast.

Kind regards,

Sebastian
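A likely cause, judging from the posted double2binary: the "round" step writes a 1 into mantissa.at(nMBits), but the "cut" step right after it erases everything from position nMBits on, so that bit is thrown away and the result is plain truncation. (The test also reads mantissa.at(nMBits+1), which is the second discarded bit rather than the first.) Truncation matches the cast for 18.4 and 999.4813232421875, but for 1/3 and 0.1 it yields the float just below the correctly rounded one, as in the output above. A minimal sketch of a replacement for the round and cut steps, working on the same kind of bit vector (fraction bits, most significant first, leading 1 already removed) and using round-to-nearest, ties-to-even, which is what the cast does in the default rounding mode. The name 'roundFractionBits' is hypothetical, and the sticky information is only as complete as the bits the caller generated:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical replacement for the "round" and "cut" steps.
// bits: fraction bits after the leading 1, most significant first.
// keep: number of bits to keep (23 for a float).
// Rounds to nearest, ties to even. Returns true if rounding carried out of
// the first kept bit (1.111...1 + ulp = 10.000...0); the kept bits are then
// all 0 and the caller must add 1 to the exponent.
bool roundFractionBits(std::vector<int>& bits, std::size_t keep) {
    assert(keep > 0 && bits.size() > keep);
    bool guard = bits[keep] == 1;         // first discarded bit
    bool sticky = false;                  // any 1 among the later discarded bits
    for (std::size_t i = keep + 1; i < bits.size(); ++i) {
        if (bits[i] == 1) {
            sticky = true;
            break;
        }
    }
    bool lastKeptIsOdd = bits[keep - 1] == 1;
    bits.resize(keep);                    // cut
    if (!guard || (!sticky && !lastKeptIsOdd))
        return false;                     // round down: truncation is correct
    // Round up: add 1 at the last kept position and propagate the carry.
    for (std::size_t i = keep; i-- > 0;) {
        if (bits[i] == 0) {
            bits[i] = 1;
            return false;
        }
        bits[i] = 0;
    }
    return true;                          // every kept bit was 1
}
```

Separately, 'post = x - floor(x)' yields 1 minus the fraction for negative non-integer x (for -0.1 it gives 0.9), so negative inputs would need fabs(x) there as well, consistent with the integer part.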
 

Sebastian Gibb

Hello,

Victor said:
Apparently it either contains hardware-specific code (which I don't see
right away) or contains a logical error (for which, while on vacation, I
really don't care to search) - when I took your code and tried debugging
it with VC10, I got first of all some errors I needed to correct (mostly
the use of an ambiguous 'pow'), and second of all, a debugging assertion
failed in one of the functions, the iterator was out of bounds.
I use g++ 4.4.1 and get no warnings caused by 'pow'. (g++ -Wall ...)
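Regarding the ambiguous 'pow': calls such as pow(2, nEBits-1) pass two ints, and a library that offers several floating-point overloads of pow cannot choose between them, which is consistent with the VC10 errors Victor describes. Writing pow(2.0, nEBits-1) resolves the call; a pow-free alternative for the bias is std::ldexp(1.0, n), which computes 2^n exactly. A minimal sketch that keeps the original name calcExzess and assumes nEBits > 0:

```cpp
#include <cassert>
#include <cmath>

// Bias ("exzess") for an exponent field of nEBits bits: 2^(nEBits-1) - 1.
// std::ldexp(1.0, n) returns exactly 2^n for the small n used here, and
// with a double first argument the overload choice is unambiguous.
double calcExzess(int nEBits) {
    assert(nEBits > 0);
    return std::ldexp(1.0, nEBits - 1) - 1.0;
}
```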
Your code is overly complex, I believe. And it doesn't seem to contain
any test cases. Consider writing test cases, like expecting a zeroed
mantissa with a power of 2, and a particular mantissa. When you split
your number into the mantissa and "exzess" (exponent), you really need
to make sure your splitting code works right before relying on it for
your "conversion".
Thanks for your advice. I will add some test cases and hope to find the
logical error.

Kind regards,

Sebastian
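Following Victor's suggestion, one source of expected values for such test cases is the float produced by the cast itself. A minimal sketch, assuming float is a 32-bit IEEE 754 type; 'FloatFields' and 'floatFields' are hypothetical names. It copies the float's bits into an integer with memcpy and splits them into sign, biased exponent and the 23 stored mantissa bits:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical reference for test cases: the sign, biased exponent and
// 23-bit mantissa field that the hardware produces for (float)x.
// Assumes float is a 32-bit IEEE 754 binary32 type.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t mantissa;  // 23 bits, implicit leading 1 not stored
};

FloatFields floatFields(double x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    float f = static_cast<float>(x);
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to read the bits
    FloatFields r;
    r.sign = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}
```

A power of two such as 8.0 should give a zeroed mantissa (exponent 130), while 0.1 and 1/3 give the mantissas 0x4CCCCD and 0x2AAAAB, whose last bit was rounded up. Comparing double2binary against these fields would test the splitting step separately from the conversion back.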
 

Jorgen Grahn

Hello,


I use g++ 4.4.1 and get no warnings caused by 'pow'. (g++ -Wall ...)

Note that g++ -Wall does NOT mean "enable all warnings". See your
documentation for details.

That said, all I got from gcc 4.3.2 was a few signedness warnings, no
matter which flags I added.

/Jorgen
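The signedness warnings most likely come from comparisons that mix int with unsigned values in the posted code, for example 'exzess.size() < nEBits' and 'i<nEBits' or 'i<2*nMBits' with an unsigned i. A hedged sketch of two helpers rewritten so that no signed/unsigned mixing remains; swapVectorOrder keeps the posted name, while 'padWithZeros' is a hypothetical name for the "append zeros to exzess" step:

```cpp
#include <cstddef>
#include <vector>

// Reverse with reverse iterators instead of the signed index
// 'int i = x.size()-1' counting down.
std::vector<int> swapVectorOrder(const std::vector<int>& x) {
    return std::vector<int>(x.rbegin(), x.rend());
}

// Hypothetical helper: append zeros until 'bits' has nEBits entries, as the
// posted loop does. The explicit cast makes the size comparison unsigned on
// both sides, and the nEBits > 0 check keeps that cast meaningful.
void padWithZeros(std::vector<int>& bits, int nEBits) {
    if (nEBits > 0 && bits.size() < static_cast<std::size_t>(nEBits))
        bits.resize(static_cast<std::size_t>(nEBits), 0);
}
```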
 
