how to convert an array of int to an array of float?

Discussion in 'C Programming' started by bwv539, Jun 2, 2010.

  1. bwv539

    bwv539 Guest

    I have to read a binary file with some signed int (32 bit) data and
    rewrite the same data into another file in floating point format, 32
    bit.

    The loop where I do this is this:



    int inINT[1024];
    float inFLOAT[1024];
    int idx;

    while(...some control...) {
        fread (&inINT[0], sizeof(int), 1024, fin);
        if(feof(fin)) {
            break;
        }
        for( idx = 0; idx < 1024; idx++) {
            inFLOAT[idx] = (float) inINT[idx];
        }
        fwrite (inFLOAT, sizeof(float), 1024, fout);
    }



    I read data in blocks of 1024. I am wondering if there is any way of
    getting rid of the for loop; for this application I need speed (I have
    many TB to convert).
    For example, I tried declaring a pointer to a union:

    union FI_IN {
        int intval;
        float fval;
    };

    union FI_IN* fi_in;



    But the following

    fread (&inINT[0], sizeof(int), 1024, fin);
    fi_in = (union FI_IN*)inINT;

    does not work: if I access the union members, the ints are correct but
    the floats are garbage.

    Any hint?

    Thank you.
     
    bwv539, Jun 2, 2010
    #1

  2. Eric Sosman

    Eric Sosman Guest

    On 6/2/2010 8:49 AM, bwv539 wrote:
    > I have to read a binary file with some signed int (32 bit) data and
    > rewrite the same data into another file in floating point format, 32
    > bit.
    >
    > The loop where I do this is this:
    >
    >
    >
    > int inINT[1024];
    > float inFLOAT[1024];
    > int idx;
    >
    > while(...some control...) {
    > fread (&inINT[0], sizeof(int), 1024, fin);
    > if(feof(fin)) {
    > break;
    > }
    > for( idx = 0; idx< 1024; idx++) {


    Side-issue: This is risky. When you're near the end of the
    input and there are fewer than 1024 ints remaining, fread() will
    read as much as it can but will not fill the entire array. You
    may "know" that this "cannot happen," but it costs almost nothing
    to note the value fread() returns and use that value in the loop
    instead of the hard-wired 1024.

    For that matter, fread() may fail due to an I/O error. The
    feof() test would not detect this ("I didn't stop at end-of-input;
    I stopped at head crash"). Again, the thing to do is to inspect
    the value returned by fread().
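
    A minimal sketch of this advice, assuming the same kind of fin/fout
    streams as the original fragment. The names convert_block and BLOCK are
    illustrative and do not appear in the original post.

```c
#include <stdio.h>

#define BLOCK 1024  /* illustrative block size, as in the original post */

/* Reads up to BLOCK ints from fin, converts them, and writes the floats
 * to fout. Returns the number of items converted, or 0 when nothing
 * more could be read or the write came up short. */
size_t convert_block(FILE *fin, FILE *fout)
{
    int inINT[BLOCK];
    float inFLOAT[BLOCK];
    size_t n, idx;

    n = fread(inINT, sizeof inINT[0], BLOCK, fin);  /* may be < BLOCK */
    for (idx = 0; idx < n; idx++)   /* loop over what was actually read */
        inFLOAT[idx] = inINT[idx];

    if (fwrite(inFLOAT, sizeof inFLOAT[0], n, fout) != n)
        return 0;  /* short write: treat as failure */
    return n;
}
```

    A caller would loop with while (convert_block(fin, fout) > 0) and then
    inspect ferror(fin) and ferror(fout) to find out why the loop ended.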


    > inFLOAT[idx] = (float) inINT[idx];
    > }
    > fwrite (inFLOAT, sizeof(float), 1024, fout);
    > }
    >
    > I read data in blocks of 1024. I am wondering if there is any way of
    > getting rid of the for loop, for this application I need speed (I have
    > many TB to convert).


    No. A conversion is unavoidable, because an int is not a float.
    On most systems (including yours, it appears), the mapping from int
    values to float values is not even one-to-one: You will find that
    there are many sets of distinct int values that all convert to the
    same float value. This should convince you that there's no sleight-
    of-hand that can let you somehow "do nothing" and have things come
    out right.
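
    To make the "not one-to-one" point concrete, here is a small sketch. It
    assumes an IEEE 754 single-precision float (24-bit significand) and the
    default round-to-nearest mode; the function name same_float is
    illustrative.

```c
/* Returns 1 if the two ints convert to the same float value, else 0. */
int same_float(int a, int b)
{
    /* volatile forces each result to be rounded to float precision,
     * even on implementations that evaluate in a wider format */
    volatile float fa = (float)a;
    volatile float fb = (float)b;
    return fa == fb;
}
```

    Under those assumptions, same_float(16777216, 16777217) yields 1,
    because 2^24 + 1 has no exact float representation and rounds to 2^24,
    while same_float(1, 2) yields 0.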

    In any case, you almost certainly needn't worry about the loop.
    Ask yourself two questions: (1) How many bytes per second can your
    I/O devices read and write, and (2) how many bytes per second can
    your RAM read and write? You needn't be really careful about read-
    ahead and write-behind, or the effects of L1/2/3 cache, or anything
    complicated: We're just looking for "back of the envelope" figures.
    Get those figures, compare them, and ponder.

    --
    Eric Sosman
     
    Eric Sosman, Jun 2, 2010
    #2

  3. bwv539 writes:

    > I have to read a binary file with some signed int (32 bit) data and
    > rewrite the same data into another file in floating point format, 32
    > bit.


    If this is "throwaway" code for a one-time conversion then your
    assumption that int is 32 bits and float the same is fine. If the code
    may live longer than you expect or move between implementations you
    might want to at least test this assumption in the code (you might be
    doing this already, of course, you posted only a fragment).
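
    One way to test the assumption at compile time is sketched below. It
    relies on static_assert from C11, which is newer than this thread, so
    treat it as an option for compilers that support it. It checks sizes
    only and says nothing about the float format itself.

```c
#include <assert.h>  /* static_assert (C11) */
#include <limits.h>  /* CHAR_BIT */

/* Compile-time checks: the build fails if the size assumptions do not hold. */
static_assert(CHAR_BIT == 8, "expected 8-bit bytes");
static_assert(sizeof(int) * CHAR_BIT == 32, "expected 32-bit int");
static_assert(sizeof(float) * CHAR_BIT == 32, "expected 32-bit float");
```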

    You know, presumably, that not all 32-bit ints can be exactly
    represented by 32-bit floats.

    > The loop where I do this is this:
    >
    > int inINT[1024];
    > float inFLOAT[1024];
    > int idx;
    >
    > while(...some control...) {
    > fread (&inINT[0], sizeof(int), 1024, fin);


    Do you mind if fewer than 1024 of them could be read?

    > if(feof(fin)) {
    > break;
    > }


    I'd also worry about read errors. Because of that, I almost always
    write input loops so that they are driven by success rather than
    terminated by failure. For example, if (as your code suggests) there is
    always a multiple of 1024 ints, you could write:

    while (/* any other conditions && */
           fread(inINT, sizeof inINT, 1, fin) == 1) { /* ... */ }

    [This also avoids the need to repeat the number of elements in the array.]

    > for( idx = 0; idx < 1024; idx++) {
    > inFLOAT[idx] = (float) inINT[idx];
    > }
    > fwrite (inFLOAT, sizeof(float), 1024, fout);
    > }
    >
    > I read data in blocks of 1024. I am wondering if there is any way of
    > getting rid of the for loop,


    No. Short of using some very specific vector instructions that some
    machines have, the conversion must be a loop.
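
    As an illustration of what such vector instructions can look like, here
    is a sketch using x86 SSE2 intrinsics, with a plain loop as the fallback
    on other targets. The function name ints_to_floats is illustrative, and
    the code assumes a 32-bit int. An optimizing compiler may already
    generate similar code from the plain loop, so measure before adopting it.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Converts n ints to floats, four at a time where SSE2 is available. */
void ints_to_floats(const int *in, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        /* unaligned load of four 32-bit ints */
        __m128i v = _mm_loadu_si128((const __m128i *)(in + i));
        /* convert all four to float and store them */
        _mm_storeu_ps(out + i, _mm_cvtepi32_ps(v));
    }
#endif
    for (; i < n; i++)  /* remainder, or everything without SSE2 */
        out[i] = in[i];
}
```

    Note that there is still a loop; it simply handles four values per
    iteration.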

    > for this application I need speed (I have
    > many TB to convert).
    > For example, I tried declaring a pointer to union:
    >
    > union FI_IN {
    > int intval;
    > float fval;
    > };
    >
    > union FI_IN* fi_in;
    >
    > But, the following
    >
    > fread (&inINT[0], sizeof(int), 1024, fin);
    > fi_in = (union FI_IN*)inINT;


    That's not the best way to use a union. Mind you, the union idea won't
    work, so it makes no difference. You might as well have written:

    float *fp = (float *)inINT;
    fwrite(fp, sizeof(float), 1024, fout);

    No conversion happens in this case, nor does it in yours. You need
    code that converts an int to a float, and that needs a loop. BTW,
    the cast to float in your original code is not needed.

    > does not work: if I access union members, ints are correct but float
    > are garbage.


    As you found, all you are doing is reinterpreting the int as if it were
    a float. But ints and floats are stored using different representations,
    so you don't get a float with the same value as the int; you get
    whatever float corresponds to that specific set of bits (and there may
    not even be one).
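
    The difference between reinterpreting and converting can be sketched as
    follows. The function names are illustrative, and the example assumes
    that int and float have the same size and that float is IEEE 754 single
    precision.

```c
#include <string.h>

/* Reinterprets the bytes of an int as a float: no conversion happens. */
float reinterpret_bits(int i)
{
    float f;
    memcpy(&f, &i, sizeof f);  /* assumes sizeof(int) == sizeof(float) */
    return f;
}

/* Converts the value: the result is the float nearest to i. */
float convert_value(int i)
{
    return (float)i;
}
```

    Under those assumptions, convert_value(1) is 1.0f, whereas
    reinterpret_bits(1) is a tiny subnormal number (about 1.4e-45). The int
    whose bits read back as 1.0f is 0x3F800000.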

    --
    Ben.
     
    Ben Bacarisse, Jun 2, 2010
    #3
  4. On 02/06/2010 14:49, bwv539 wrote:
    > I have to read a binary file with some signed int (32 bit) data and re-
    > write the same data into another file in floating point format, 32
    > bit.
    >
    > The loop where I do this is this:
    >
    >
    >
    > int inINT[1024];
    >
    > float inFLOAT[1024];
    >
    > int idx;
    >
    >
    >
    > while(...some control...) {
    >
    > fread (&inINT[0], sizeof(int), 1024, fin);
    >
    > if(feof(fin)) {
    >
    > break;
    >
    > }
    >
    > for( idx = 0; idx < 1024; idx++) {
    >
    > inFLOAT[idx] = (float) inINT[idx];
    >
    > }
    >
    > fwrite (inFLOAT, sizeof(float), 1024, fout);
    >
    > }
    >
    >
    >
    > I read data in blocks of 1024. I am wondering if there is any way of
    > getting rid of the for loop, for this application I need speed (I have
    > many TB to convert).


    The work performed by the "for" loop cannot be portably replaced by
    some type or pointer fiddling.

    It is possible to get rid of the for loop (e.g. by replacing it with
    1024 individual assignments), but that is unlikely to improve
    performance much (and might well hurt it).

    Avenues for optimization:

    Make 1024 a constant, and properly handle the case where fread only read
    a partial buffer, that's easy. Then increase the constant, so that I/O
    is by bigger chunks.

    Check that the bottleneck is the loop (try removing it and see if the
    program runs faster); if it is not, ignore the rest of this post.

    Activate whatever compiler option turns the conversion into something
    intrinsic to the CPU used, if that's possible; all modern x86 CPUs and
    compilers can do that. If not possible, and portability is not an issue,
    code the conversion yourself (in inline assembly, assembly, or even C);
    you may need to know the internal format of floats.

    Partially unroll the loop.

    If the inINT[] are all (or mostly) within a relatively small range,
    maybe a table lookup would help.
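
    Partial unrolling, one of the avenues listed above, could look like the
    sketch below. The function name convert_unrolled and the unroll factor
    of 4 are arbitrary choices for illustration. Optimizing compilers often
    unroll loops on their own, so whether this helps has to be measured.

```c
#include <stddef.h>

/* Converts n ints to floats, four per iteration, then the remainder. */
void convert_unrolled(const int *in, float *out, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        out[i]     = in[i];
        out[i + 1] = in[i + 1];
        out[i + 2] = in[i + 2];
        out[i + 3] = in[i + 3];
    }
    for (; i < n; i++)  /* leftover 0..3 items */
        out[i] = in[i];
}
```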


    François Grieu
     
    Francois Grieu, Jun 2, 2010
    #4
  5. James Waldby

    James Waldby Guest

    On Wed, 02 Jun 2010 09:21:01 -0400, Eric Sosman wrote:

    > On 6/2/2010 8:49 AM, bwv539 wrote:
    >> I have to read a binary file with some signed int (32 bit) data and
    >> rewrite the same data into another file in floating point format, 32 bit.

    ....
    > In any case, you almost certainly needn't worry about the loop.
    > Ask yourself two questions: (1) How many bytes per second can your I/O
    > devices read and write, and (2) how many bytes per second can your RAM
    > read and write? You needn't be really careful about read-ahead and
    > write-behind, or the effects of L1/2/3 cache, or anything complicated:
    > We're just looking for "back of the envelope" figures. Get those
    > figures, compare them, and ponder.


    [OT] If you (the OP, that is) are a skilled programmer, you could
    use 3 threads -- a reader thread, a converter thread, and a writer
    thread. Run some tests on your system and see if things speed up;
    if not, just use the simple form of read/process/write as in the
    program you posted, although possibly with buffers 100 times bigger.

    This is OT in c.l.c, so for further help, post in comp.programming
    or comp.programming.threads instead.

    --
    jiw
     
    James Waldby, Jun 5, 2010
    #5
  6. Geoff writes:
    [...]
    > Why not write:
    >
    > while(...some control...) {
    >
    > fread (&inINT, sizeof(int), 1, fin);
    >
    > if(feof(fin)) {
    > break;
    > }
    >
    > inFLOAT = (float) inINT;
    > fwrite (&inFLOAT, sizeof(float), 1, fout);
    >
    > }


    Don't use feof() to detect end of input. If there's an error,
    ferror(fin) becomes true but feof(fin) doesn't, and you've got
    yourself an infinite loop.

    Check the value returned by fread(). After it's returned 0,
    indicating that there's no more input, you can use feof() and/or
    ferror() to find out why there's no more input.
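
    A sketch of that pattern applied to the one-item-at-a-time loop quoted
    above. The function name convert_until_done and its return convention
    are illustrative, not part of the quoted code.

```c
#include <stdio.h>

/* Copies ints from fin to fout as floats, one at a time.
 * Returns 0 on clean end of input, -1 on a read or write error. */
int convert_until_done(FILE *fin, FILE *fout)
{
    int inINT;
    float inFLOAT;

    /* The loop is driven by fread() succeeding, not by feof(). */
    while (fread(&inINT, sizeof inINT, 1, fin) == 1) {
        inFLOAT = inINT;
        if (fwrite(&inFLOAT, sizeof inFLOAT, 1, fout) != 1)
            return -1;
    }
    /* fread() returned 0: now ask why. */
    if (ferror(fin))
        return -1;
    return 0;  /* no error, so the loop stopped at end of input */
}
```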

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Jun 6, 2010
    #6
  7. On Jun 2, 3:49 pm, bwv539 wrote:
    >
    > Any hint?
    >

    Firstly, it is likely that the conversion time is trivial in
    comparison to your IO, as others have noted.

    If this is not the case, it is sometimes possible to do fast integer
    to float conversion by accessing the bits of the floating point
    number directly. You can also sometimes pipeline the units so that the
    floating point unit is doing half the conversions and the integer unit
    the other half.
    However, these are very non-portable hacker's techniques, and you should
    only try them as a last resort.
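
    One well-known trick of this kind is sketched below; it is not
    necessarily the one meant here. It works only for values in the range
    0 <= i < 2^23 and assumes IEEE 754 single precision with float and
    uint32_t sharing the same byte order. The function name is illustrative,
    and whether it beats the hardware conversion on a given machine is
    something only a measurement can tell.

```c
#include <stdint.h>
#include <string.h>

/* Converts i to float for 0 <= i < 2^23 only, by building the bit pattern
 * of the float 2^23 + i and then subtracting 2^23. */
float small_uint_to_float(uint32_t i)
{
    /* 0x4B000000 is the bit pattern of 8388608.0f (2^23); OR-ing i into
     * the low 23 bits fills the significand, giving 2^23 + i exactly */
    uint32_t bits = UINT32_C(0x4B000000) | i;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f - 8388608.0f;
}
```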
     
    Malcolm McLean, Jun 6, 2010
    #7
  8. Geoff writes:
    <snip>
    > I would probably have written a more robust function something like
    > this:
    >
    > void int2double(void)
    > {
    > int ival[BUFF_SIZE];
    > double fval[BUFF_SIZE];
    > int idx;
    >
    > while(1)
    > {
    > if (!fread (&ival[0], sizeof(int), BUFF_SIZE, fin)) {
    > if(feof(fin)) {
    > break;
    > }
    > else if(ferror(fin)) {
    > printf("Error %i reading input file\n", ferror(fin));
    > break;
    > }
    > }
    >
    > for(idx = 0; idx < BUFF_SIZE; idx++) {
    > fval[idx] = (double) ival[idx];
    > }
    > fwrite (&fval, sizeof(double), BUFF_SIZE, fout);
    > }
    > }


    I don't think that's more robust. It tries to detect errors as well as
    EOF, but it fails to do either in what I'd call a robust way. Both EOF and
    a read error can cause fread to return a short count (not zero). If you
    get an error you'd want to report it, and in both cases you'd want to
    either process the items that were read or (at least) not try to
    process a full buffer's worth.

    Many of these problems come from working backwards. Why write while (1)
    and then try to detect a problem? I'd loop while there is data to be
    processed and report the reasons for stopping later:

    int ival[BUFF_SIZE];
    double fval[BUFF_SIZE];
    size_t items;

    /* Loop while fread delivers data; items may be less than BUFF_SIZE. */
    while ((items = fread(ival, sizeof(int), BUFF_SIZE, fin)) != 0) {
        size_t idx;
        for (idx = 0; idx < items; idx++)
            fval[idx] = ival[idx];
        fwrite(fval, sizeof(double), items, fout);
    }
    if (ferror(fin))
        fprintf(stderr, "Error reading input file\n");

    This has the advantage that EOF can simply be ignored, and we can be
    certain that the correct number of items get processed (modulo typos of
    course).

    I've made a bunch of other changes. For example, ferror does not report
    anything interesting in its return value (other than its being or not
    being zero, of course), and error messages are usually better written to
    stderr. Since we are counting objects, size_t seems the best
    counter type, and the cast to double is redundant. I also like to give
    variables as small a scope as possible (e.g. idx). Most of these other
    changes are cosmetic.

    --
    Ben.
     
    Ben Bacarisse, Jun 6, 2010
    #8