how to convert an array of int to an array of float?

bwv539

I have to read a binary file with some signed int (32 bit) data and
re-write the same data into another file in floating point format, 32
bit.

The loop where I do this is this:



int inINT[1024];
float inFLOAT[1024];
int idx;

while(...some control...) {
    fread (&inINT[0], sizeof(int), 1024, fin);
    if(feof(fin)) {
        break;
    }
    for( idx = 0; idx < 1024; idx++) {
        inFLOAT[idx] = (float) inINT[idx];
    }
    fwrite (inFLOAT, sizeof(float), 1024, fout);
}



I read data in blocks of 1024. I am wondering if there is any way of
getting rid of the for loop; for this application I need speed (I have
many TB to convert).
For example, I tried declaring a pointer to a union:

union FI_IN {
    int intval;
    float fval;
};

union FI_IN* fi_in;



But the following

fread (&inINT[0], sizeof(int), 1024, fin);
fi_in = (union FI_IN*)inINT;

does not work: if I access the union members, the ints are correct but
the floats are garbage.

Any hint?

Thank you.
 
Eric Sosman

I have to read a binary file with some signed int (32 bit) data and
re-write the same data into another file in floating point format, 32
bit.

The loop where I do this is this:



int inINT[1024];
float inFLOAT[1024];
int idx;

while(...some control...) {
fread (&inINT[0], sizeof(int), 1024, fin);
if(feof(fin)) {
break;
}
for( idx = 0; idx < 1024; idx++) {

Side-issue: This is risky. When you're near the end of the
input and there are fewer than 1024 ints remaining, fread() will
read as much as it can but will not fill the entire array. You
may "know" that this "cannot happen," but it costs almost nothing
to note the value fread() returns and use that value in the loop
instead of the hard-wired 1024.

For that matter, fread() may fail due to an I/O error. The
feof() test would not detect this ("I didn't stop at end-of-input;
I stopped at head crash"). Again, the thing to do is to inspect
the value returned by fread().

inFLOAT[idx] = (float) inINT[idx];
}
fwrite (inFLOAT, sizeof(float), 1024, fout);
}

I read data in blocks of 1024. I am wondering if there is any way of
getting rid of the for loop, for this application I need speed (I have
many TB to convert).

No. A conversion is unavoidable, because an int is not a float.
On most systems (including yours, it appears), the mapping from int
values to float values is not even one-to-one: You will find that
there are many sets of distinct int values that all convert to the
same float value. This should convince you that there's no sleight-
of-hand that can let you somehow "do nothing" and have things come
out right.
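To make the many-to-one point concrete, here is a minimal sketch
(assuming a typical implementation with 32-bit int and IEEE-754 32-bit
float, which is what the original post describes) showing two distinct
int values that convert to the same float:

#include <stdio.h>

int main(void)
{
    /* A 32-bit float has only 24 significand bits, so 2^24 + 1 is the
       smallest positive int it cannot represent exactly. */
    int a = 16777216;   /* 2^24     */
    int b = 16777217;   /* 2^24 + 1 */
    float fa = a;
    float fb = b;       /* rounds back down to 16777216.0f */

    printf("%d -> %.1f\n", a, fa);
    printf("%d -> %.1f\n", b, fb);
    printf("equal as floats: %s\n", fa == fb ? "yes" : "no");
    return 0;
}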

In any case, you almost certainly needn't worry about the loop.
Ask yourself two questions: (1) How many bytes per second can your
I/O devices read and write, and (2) how many bytes per second can
your RAM read and write? You needn't be really careful about read-
ahead and write-behind, or the effects of L1/2/3 cache, or anything
complicated: We're just looking for "back of the envelope" figures.
Get those figures, compare them, and ponder.
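As a rough illustration with assumed, not measured, figures: a single
spinning disk that streams on the order of 100 MB/s is one to two
orders of magnitude slower than main memory, which is typically
measured in GB/s, so the int-to-float loop is very unlikely to be what
limits throughput here.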
 
Ben Bacarisse

bwv539 said:
I have to read a bynary file with some signed int (32 bit) data and re-
write the same data into another file in floating point format, 32
bit.

If this is "throwaway" code for a one-time conversion then your
assumption that int is 32 bits and float the same is fine. If the code
may live longer than you expect or move between implementations you
might want to at least test this assumption in the code (you might be
doing this already, of course, you posted only a fragment).
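One cheap way to encode that check is a compile-time assertion. This is
a sketch only (the negative-array-size trick is a common pre-C11 idiom,
and sizeof(int) == 4 is a stand-in for "32-bit int"):

/* Force a compile-time error if the size assumptions do not hold. */
typedef char check_int_size[sizeof(int) == 4 ? 1 : -1];
typedef char check_float_size[sizeof(float) == 4 ? 1 : -1];

/* With C11 or later you could write instead:
   _Static_assert(sizeof(int) == 4, "code assumes 32-bit int");
   _Static_assert(sizeof(float) == 4, "code assumes 32-bit float"); */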

You know, presumably, that not all 32-bit ints can be exactly
represented by 32-bit floats.
The loop where I do this is this:

int inINT[1024];
float inFLOAT[1024];
int idx;

while(...some control...) {
fread (&inINT[0], sizeof(int), 1024, fin);

Do you mind if fewer than 1024 of them are read?
if(feof(fin)) {
break;
}

I'd also worry about read errors. Because of that, I almost always
write input loops so that they are driven by success rather than
terminated by failure. For example if (as your code suggests) there are
always multiple of 1024 ints you could write:

while (/* any other conditions && */
       fread(inINT, sizeof inINT, 1, fin) == 1) { /* ... */ }

[This also avoids the need to repeat the number of elements in the array.]
for( idx = 0; idx < 1024; idx++) {
inFLOAT[idx] = (float) inINT[idx];
}
fwrite (inFLOAT, sizeof(float), 1024, fout);
}

I read data in blocks of 1024. I am wondering if there is any way of
getting rid of the for loop,

No; short of using the very specific vector instructions that some
machines have, the conversion must be a loop.
for this application I need speed (I have
many TB to convert).
For example, I tried declaring a pointer to union:

union FI_IN {
int intval;
float fval;
};

union FI_IN* fi_in;

But, the following

fread (&inINT[0], sizeof(int), 1024, fin);
fi_in = (union FI_IN*)inINT;

That's not the best way to use a union. Mind you, the union idea won't
work so it makes no difference. You might as well have written:

float *fp = (float *)inINT;
fwrite(fp, sizeof(float), 1024, fout);

No conversion happens in this case nor does it in yours. You need to
have code that converts an int to a float and that needs a loop. BTW,
the cast to float in your original code is not needed.
does not work: if I access union members, ints are correct but float
are garbage.

As you found, all you are doing is reinterpreting the int as if it were
a float. But ints and floats are stored using different representations,
so you don't get a float with the same value as the int -- you get
whatever float corresponds to that specific set of bits (and there may
not even be one).
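A minimal sketch of what that reinterpretation actually does (assuming
32-bit int and IEEE-754 float; the exact "garbage" value is
implementation-specific):

#include <stdio.h>
#include <string.h>

int main(void)
{
    int i = 1;
    float f;

    /* Copy the bit pattern of the int into the float unchanged;
       this is all the union/pointer trick does. */
    memcpy(&f, &i, sizeof f);

    /* On a typical IEEE-754 system the pattern 0x00000001 is a tiny
       subnormal float (about 1.4e-45), not 1.0f. */
    printf("int 1 reinterpreted: %g\n", f);
    printf("int 1 converted:     %g\n", (float)i);
    return 0;
}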
 
Francois Grieu

On 02/06/2010 14:49, bwv539 wrote:
I have to read a binary file with some signed int (32 bit) data and re-
write the same data into another file in floating point format, 32
bit.

The loop where I do this is this:



int inINT[1024];
float inFLOAT[1024];
int idx;

while(...some control...) {
    fread (&inINT[0], sizeof(int), 1024, fin);
    if(feof(fin)) {
        break;
    }
    for( idx = 0; idx < 1024; idx++) {
        inFLOAT[idx] = (float) inINT[idx];
    }
    fwrite (inFLOAT, sizeof(float), 1024, fout);
}



I read data in blocks of 1024. I am wondering if there is any way of
getting rid of the for loop, for this application I need speed (I have
many TB to convert).

The work performed by the "for" loop cannot be portably replaced by
some type or pointer fiddling.

It is possible to get rid of the for loop (e.g. by replacing it with
1024 individual assignments) but that is unlikely to improve
performance much (and might even hurt it).

Avenues for optimization:

Make 1024 a named constant and properly handle the case where fread
reads only a partial buffer; that's easy. Then increase the constant so
that I/O happens in bigger chunks (see the sketch after this list).

Check that the bottleneck is the loop (try removing it and see if the
program runs faster); otherwise ignore the rest of this post.

Activate whatever compiler option turns the conversion into something
intrinsic to the CPU used, if that's possible; all modern x86 CPUs and
compilers can do that. If not possible, and portability is not an issue,
code the conversion yourself (in inline assembly, assembly, or even C);
you may need to know the internal format of floats.

Partially unroll the loop.

If the inINT[] are all (or mostly) within a relatively small range,
maybe a table lookup would help.
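For the first avenue, a minimal sketch (BUF_ELEMS and the helper
function are illustrative assumptions, not part of the original
program):

#include <stdio.h>

#define BUF_ELEMS 65536   /* bigger chunks than 1024; tune as needed */

/* Convert only as many items as fread actually delivered, so a short
   final block at end of file is handled correctly. */
static void convert_stream(FILE *fin, FILE *fout)
{
    static int   in[BUF_ELEMS];
    static float out[BUF_ELEMS];
    size_t n, i;

    while ((n = fread(in, sizeof in[0], BUF_ELEMS, fin)) > 0) {
        for (i = 0; i < n; i++)
            out[i] = in[i];
        fwrite(out, sizeof out[0], n, fout);
    }
}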


François Grieu
 
James Waldby

....
In any case, you almost certainly needn't worry about the loop.
Ask yourself two questions: (1) How many bytes per second can your I/O
devices read and write, and (2) how many bytes per second can your RAM
read and write? You needn't be really careful about read- ahead and
write-behind, or the effects of L1/2/3 cache, or anything complicated:
We're just looking for "back of the envelope" figures. Get those
figures, compare them, and ponder.

[OT] If you (the OP, that is) are a skilled programmer, you could
use 3 threads -- a reader thread, a converter thread, and a writer
thread. Run some tests on your system and see if things speed up;
if not, just use the simple form of read/process/write as in the
program you posted, although possibly with buffers 100 times bigger.

This is OT in c.l.c, so for further help, post in comp.programming
or comp.programming.threads instead.
 
Keith Thompson

Geoff said:
Why not write:

while(...some control...) {
    fread (&inINT, sizeof(int), 1, fin);
    if(feof(fin)) {
        break;
    }
    inFLOAT = (float) inINT;
    fwrite (&inFLOAT, sizeof(float), 1, fout);
}

Don't use feof() to detect end of input. If there's an error,
ferror(fin) becomes true but feof(fin) doesn't, and you've got
yourself an infinite loop.

Check the value returned by fread(). After it's returned 0,
indicating that there's no more input, you can use feof() and/or
ferror() to find out why there's no more input.
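A minimal sketch of that order of checks (the function and buffer names
are illustrative, not from the posted code):

#include <stdio.h>

/* Sketch only: read a stream in blocks, checking fread's return value
   first and consulting feof()/ferror() only once it stops. */
static void read_all(FILE *fin)
{
    int buf[1024];
    size_t n;

    while ((n = fread(buf, sizeof buf[0], 1024, fin)) > 0) {
        /* process exactly n items */
    }
    if (ferror(fin))
        fprintf(stderr, "read error\n");
    /* otherwise feof(fin) is set: normal end of input */
}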
 
Malcolm McLean

Any hint?
Firstly, it is likely that the conversion time is trivial in
comparison to your I/O, as others have noted.

If this is not the case, it is sometimes possible to do fast integer
to float conversion, by accessing the bits of the floating point
number directly. You can also sometimes pipeline the units so that the
floating point unit is doing half the conversions and the integer unit
the other half.
However these are very non-portable, hacker's techniques, and you only
try them as a last resort.
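For what it's worth, here is a minimal sketch of the sort of bit-level
trick being hinted at. It is a last-resort, non-portable technique: it
assumes IEEE-754 single precision, a 32-bit unsigned int, and input
values limited to the range 0 to 2^23 - 1, and on modern hardware it is
unlikely to beat the compiler's own conversion instruction:

#include <stdio.h>
#include <string.h>

/* 0x4B000000 is the bit pattern of 8388608.0f (2^23). OR-ing n into
   the significand yields a float whose value is exactly 2^23 + n, so
   subtracting 8388608.0f leaves (float)n. Only valid for n < 2^23. */
static float small_uint_to_float(unsigned n)
{
    unsigned bits = 0x4B000000u | n;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f - 8388608.0f;
}

int main(void)
{
    unsigned n = 123456;
    printf("%f %f\n", small_uint_to_float(n), (float)n);
    return 0;
}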
 
Ben Bacarisse

Geoff said:
I would probably have written a more robust function something like
this:

void int2double(void)
{
    int ival[BUFF_SIZE];
    double fval[BUFF_SIZE];
    int idx;

    while(1)
    {
        if (!fread (&ival[0], sizeof(int), BUFF_SIZE, fin)) {
            if(feof(fin)) {
                break;
            }
            else if(ferror(fin)) {
                printf("Error %i reading input file\n", ferror(fin));
                break;
            }
        }

        for(idx = 0; idx < BUFF_SIZE; idx++) {
            fval[idx] = (double) ival[idx];
        }
        fwrite (&fval, sizeof(double), BUFF_SIZE, fout);
    }
}

I don't think that's more robust. It tries to detect errors as well as
EOF but it fails to do both in what I'd call a robust way. Both EOF and
a read error can cause fread to return a short count (not zero). If you
get an error you'd want to report it, and in both cases you'd want to
either process the items that were read or (at least) not try to
process a full buffer's worth.

Many of these problems come from working backwards. Why write while (1)
and then try to detect a problem? I'd loop while there is data to be
processed and report the reasons for stopping later:

int ival[BUFF_SIZE];
double fval[BUFF_SIZE];
size_t items;

while ((items = fread(ival, sizeof(int), BUFF_SIZE, fin)) > 0) {
    size_t idx;
    for (idx = 0; idx < items; idx++)
        fval[idx] = ival[idx];
    fwrite(fval, sizeof(double), items, fout);
}
if (ferror(fin))
    fprintf(stderr, "Error reading input file\n");

This has the advantage that EOF can simply be ignored, and we can be
certain that the correct number of items get processed (modulo typos of
course).

I've made a bunch of other changes. For example, ferror does not report
anything interesting in its return value (other than its being zero or
not, of course) and error messages are usually better written to stderr.
Since we are counting objects, size_t seems the best counter type, and
the explicit cast in the conversion is redundant. I also like to give
variables as small a scope as possible (e.g. idx). Most of these other
changes are cosmetic.
 
