binary for floating point numbers - smaller?


suresh

Hi,
When one stores real numbers (doubles) in a file, could the file size
be smaller if the data is stored in binary rather than in ASCII? I
wrote a small program to verify this, and both files come out the same
size. For integer data, however, binary files seem to get smaller
relative to text as the integer values grow, which is understandable.
I need to store a really large square matrix of floating-point numbers
(1 million x 1 million) in a file; what is the best format to keep the
file size small?

The program I used to test the ASCII and binary file sizes is given
below.
Thanks
suresh

#include <fstream>

using namespace std;

int main() {
    double x[] = {1000.234, 2000.345, 20000000.567};

    // Text file: each value takes as many characters as its decimal form needs.
    ofstream outfileA("ascii.txt");
    for (size_t i = 0; i < sizeof(x) / sizeof(double); i++)
        outfileA << x[i] << " ";   // was "x", which printed the array's address
    outfileA.close();

    // Binary file: each value takes exactly sizeof(double) == 8 bytes.
    ofstream outfileB("bin.bin", ios::binary);
    for (size_t i = 0; i < sizeof(x) / sizeof(double); i++)
        outfileB.write(reinterpret_cast<char*>(&x[i]), sizeof(double));   // was "&x", which wrote element 0 every iteration
}
 

Ulrich Eckhardt

suresh said:
When one stores real numbers (doubles) in a file, could the file size
be smaller if the data is stored in binary rather than in ASCII?

Yes. The in-memory representation (which I assume you just write to disk)
is typically 8 bytes, regardless of the value. The representation as a
decimal string can range from two bytes (a single-digit number plus a
separating space) to many bytes for numbers written with many digits.

I wrote a small program to verify this, and both files come out the
same size. For integer data, however, binary files seem to get smaller
relative to text as the integer values grow, which is understandable.

The same thing applies to floating-point numbers as to integers.
double x[] = {1000.234,2000.345,20000000.567};

Have you tested values like 0.5, too? That one should be significantly
smaller in ASCII.
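
For example, a quick check along these lines (a minimal sketch; the file
names are arbitrary) makes the difference visible:

#include <fstream>

int main() {
    double x = 0.5;

    // As text, "0.5" plus a separating space is 4 bytes.
    std::ofstream txt("half.txt");
    txt << x << " ";
    txt.close();

    // As binary, a double is always 8 bytes, whatever its value.
    std::ofstream bin("half.bin", std::ios::binary);
    bin.write(reinterpret_cast<const char*>(&x), sizeof x);
}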

I need to store a really large square matrix of floating-point numbers
(1 million x 1 million) in a file; what is the best format to keep the
file size small?

That would be 10^12 values at 8 bytes each, i.e. about 8 TB (roughly
7.3 TiB) of data for a packed binary format.


Actually, with that amount of data, I'd say there are other things to
consider that are also important:
1. You can't memory-map such a file unless you have a 64-bit OS. On a 32-
bit OS you could map a part of it, but that won't help unless you know
the precise position of a number in the file, which requires either an
index or a fixed field size (see the first sketch after this list).
2. How many significant digits do you need? Your test numbers above use up
to eleven significant digits. Such precision typically doesn't come up in
practice (though your case may be special), so using float instead of
double, which halves the file size, would be an alternative.
3. You probably don't only want to store the data but also access it and
work with it. The way you store it should reflect the usage pattern. I
could imagine e.g. a matrix of matrices (blocks of maybe 10k by 10k
values each), each within its own file, in order to quickly locate the
values.
4. The layout of floating-point values in memory is not guaranteed. Most
CPUs use IEEE 754, but some are little-endian and some big-endian. This
affects the portability/readability of your files on different machines.
A portable format would be ASCII, but that has a large overhead, so you
may have to make a compromise here. You could also convert between the
machine-dependent format and a machine-independent representation (see
the second sketch below).
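
To illustrate point 1: with a fixed field size and a row-major layout, the
byte position of element (row, col) is a closed-form expression, so you can
seek to it directly without any index. A minimal sketch, assuming a file
matrix.bin that holds the raw doubles of an n-by-n matrix:

#include <cstdint>
#include <fstream>
#include <iostream>

int main() {
    const std::uint64_t n = 1000000;          // matrix is n x n, row-major
    std::uint64_t row = 12345, col = 67890;   // element to look up

    // Fixed field size: every element occupies sizeof(double) bytes,
    // so its offset is (row * n + col) * 8 -- no index needed.
    std::uint64_t offset = (row * n + col) * sizeof(double);

    std::ifstream in("matrix.bin", std::ios::binary);
    in.seekg(static_cast<std::streamoff>(offset));

    double value;
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    if (in)
        std::cout << "x(" << row << "," << col << ") = " << value << '\n';
}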
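
For points 2 and 4, here is a sketch that narrows each value to float
(about 7 significant decimal digits, halving the storage) and writes it in
a fixed byte order, so the file reads back identically on little- and
big-endian machines. It assumes the host uses 32-bit IEEE 754 floats,
which nearly all current CPUs do:

#include <cstdint>
#include <cstring>
#include <fstream>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes 32-bit IEEE 754 floats");

// Write one float as 4 bytes in little-endian order, regardless of
// the host's native byte order.
void write_le_float(std::ofstream& out, float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // grab the raw bit pattern
    char buf[4];
    for (int i = 0; i < 4; ++i)            // least significant byte first
        buf[i] = static_cast<char>((bits >> (8 * i)) & 0xFF);
    out.write(buf, sizeof buf);
}

int main() {
    std::ofstream out("matrix_f32.bin", std::ios::binary);
    double value = 1000.234;
    write_le_float(out, static_cast<float>(value));
}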


Interesting question in any case, I'd like to hear more about this!

Happy weekend!

Uli
 
