Re: typecast internals

Discussion in 'C Programming' started by Chris Torek, May 6, 2004.

  1. Chris Torek


    In article <news:>
    (Niklaus) asks about typecasts (using C as
    the language in which they are written, but I will adjust for that):

    >1) What i would like to know is the about the
    > internals when a cast is applied ? Say we have
    > int i = 3;
    > double j;
    > j = (double) i;
    > What happens in the above statement ? Can someone
    > explain me at bit level or a considerable explantion ?

    [and similarly for double => int conversions]

    In article <news:hgdmc.37445$kh4.1876161@attbi_s52>
    glen herrmannsfeldt <> wrote:
    >The question has different meaning in different groups. comp.arch
    >deals with computer hardware, so the question has to do with how the
    >hardware to perform such casts works.


    Since this *is* (or was) comp.arch, let us take a look at a high(ish)
    level of how the hardware does the trick. (I am going to cross-post
    this back to comp.lang.c, because I will touch on requirements
    imposed by Standard C too.)

    The first problem is that we have to pick a floating-point format.
    Most modern CPUs use IEEE FP, which is a binary system with implied
    leading "1" bits and biased exponents. Here a floating point number
    is generally represented as:

    (-1)**sign x 1.fff...ff x 2**(exp - bias)

    The "f"s here represent fraction bits. For instance, because 2.5 is
    1.01 x 2**1 -- 1 times 2, plus 0 times 1, plus 1 times one-half --
    the "f" bit sequence would be "01000000000...". To get 3, the "f"
    sequence would be "1000..."; to get 3.5, it would be "11000...";
    to get 3.75 it would be "111000...", and so on.

    In today's typical "double", there are 52 fraction bits, 11 exponent
    bits, and one sign bit (52+11+1 thus using up all 64 bits of an
    8-byte double stored in 8-bit-bytes). Because fractions are
    "1.fff..." rather than "f.fff...", these 52 bits hold 53 bits of
    binary representation. (There are exceptions for "denormalized"
    numbers, and special cases like infinities, NaN, and zeros; the
    53-bits rule is only for "normalized" floating point values.)

    (The "bias" in this case is 1023, not that this is particularly
    interesting. For explanatory purposes it is usually easiest to
    just assume the bias does not even exist, and talk about 2**-4 for
    values between 1/16th and 1/8th, for instance. The "(-1)**sign"
    part is just a way of saying "if sign then negative else positive",
    without using the word "if". :) )
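    To make the field layout concrete, here is a minimal C sketch that
    pulls the sign, biased exponent and fraction out of a double. It
    assumes IEEE 754 binary64 doubles stored with the same byte order as
    uint64_t (true of common current platforms); the names split_double
    and struct dbl_fields are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* The three IEEE 754 binary64 fields, as described above. */
struct dbl_fields {
    unsigned sign;      /* 1 bit: 0 = positive, 1 = negative */
    unsigned exponent;  /* 11 bits, biased by 1023 */
    uint64_t fraction;  /* 52 bits; the implied leading 1 is not stored */
};

static struct dbl_fields split_double(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* copy the bits, do not convert the value */

    struct dbl_fields f;
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

    For 2.5 it should report exponent 1024 (1 once the bias of 1023 is
    removed) and a fraction with only its second-highest bit set, the
    "01000..." pattern above.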

    The C cast construct here acts as a request to convert from an
    int (represented in ordinary binary) to a double-precision floating
    point number (represented as above). At the CPU instruction level,
    the expression "(double)i" might (or might not) do something like:

    store reg_holding_i => memory
    load_fpu_with_integer memory => fpu_register_pair

    The FPU will then have to find the largest power of two that does
    not exceed the integer's value. Since the value is 3, that power of
    two is 2**1: 2**1 is 2, and 2**2 is 4, and 3 is squarely in between
    2 and 4. So the (unbiased) exponent is 1. Having found that number,
    the remaining task is to figure out which fraction bits to set. We
    start by subtracting 2**1 from the initial value:

    3 - 2 = 1

    The 2 we just subtracted accounts for the implied "1" in front of
    the fraction. Now we need whatever additional bits it takes to
    represent the remaining 1 in the fraction part. We start with
    2**(exp-1): since the exponent is 1, and 1-1 is 0, the first value
    we consider is 2**0, or 1. If the remainder is at least 1, we set
    that bit and subtract 1.

    Now we have 0, so we are done. A more interesting example occurs
    when we try to represent 2.375:

    BEGIN:
    - 2.375 is greater than or equal to 2 but less than 4, so
    the exponent is 1 (and 2**1 = 2).

    - Subtract 2**1 from 2.375, giving 0.375. This is our
    "working fraction". Subtract 1 from the exponent 1, giving 0.
    This is our working exponent.

    LOOP (or iterate through gates etc):
    - Is 2**(working exponent) less than or equal to the working
    fraction? (2**0 is 1, which is not <= 0.375, so the answer
    is no, initially.) If no: use a 0 for the next fraction bit.
    If yes: use a 1 for the next fraction bit, and subtract this
    power of two from the working fraction.
    - Decrement working exponent and continue loop. Repeat loop
    until working fraction goes to 0 or 52 fraction bits are
    generated, whichever occurs first.

    In this case, with the working fraction initially 0.375, we have:

    FIRST ITERATION OF LOOP: 2**0 is 1 -- too big. Generate 0.
    [result so far: 0]

    NEXT ITERATION OF LOOP: 2**-1 is 0.5 -- too big. Generate 0.
    [result so far: 00]

    NEXT ITERATION OF LOOP: 2**-2 is 0.25 -- not too big. Generate 1
    and subtract off 0.25, leaving working fraction set to 0.125.
    [result so far: 001]

    NEXT ITERATION OF LOOP: 2**-3 is 0.125 -- not too big. Generate 1
    and subtract off 0.125, leaving working fraction set to 0.
    [result so far: 0011]

    STOP, because working fraction is now 0.
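    The loop above can be sketched in C as follows. This only
    illustrates the bit-by-bit procedure; it is not how any particular
    FPU is built, since real hardware works on the integer directly and
    in parallel. The hypothetical function gen_fraction_bits assumes a
    finite value >= 1 that a double already holds exactly, and it only
    ever subtracts powers of two, so every step is exact.

```c
#include <assert.h>

/* Returns the unbiased exponent, writes the fraction bits as '0'/'1'
   characters into out (which needs max_bits + 1 bytes), and sets
   *inexact if it ran out of bits before the working fraction hit 0. */
static int gen_fraction_bits(double value, int max_bits, char *out, int *inexact)
{
    assert(value >= 1.0);

    /* Find the largest power of two that is <= value. */
    int e = 0;
    double pow2 = 1.0;
    while (pow2 * 2.0 <= value) {
        pow2 *= 2.0;
        e++;
    }

    double work = value - pow2;     /* pow2 is the implied leading "1" */
    int n = 0;
    while (work != 0.0 && n < max_bits) {
        pow2 /= 2.0;                /* 2**(e-1), then 2**(e-2), ... */
        if (pow2 <= work) {
            out[n++] = '1';
            work -= pow2;
        } else {
            out[n++] = '0';
        }
    }
    out[n] = '\0';
    *inexact = (work != 0.0);
    return e;
}
```

    Called as gen_fraction_bits(2.375, 52, buf, &inexact), it should
    return exponent 1 with buf holding "0011", matching the iterations
    above.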

    Note that if the working fraction is not 0 but we stop because we
    have done all 52 fraction bits, the final result is inexact. If
    inexact traps are enabled, we should take such a trap. If the
    rounding mode (or, under round-to-nearest, the value of the
    discarded bits) requires rounding this value *away* from zero
    (towards +infinity if positive, or towards -infinity if negative),
    we should increment the entire fraction (which carries over into
    the exponent if every fraction bit is 1, so this is a bit tricky).

    So, in this case, we get:

    sign = 0
    1.fraction = 1.0011 (fraction bits = 001100...000).
    exponent = 1

    Hence the value is (-1)**0 x 1.0011 [base 2] x 2**1, or 2.375.
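    Going the other way, the three fields can be packed back into a
    double. A minimal sketch, again assuming IEEE 754 binary64 with the
    same byte order as uint64_t and handling only normalized values;
    pack_double is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Assemble a normalized double from sign, unbiased exponent and the
   52 stored fraction bits. No range checking is done. */
static double pack_double(unsigned sign, int unbiased_exp, uint64_t fraction)
{
    uint64_t bits = ((uint64_t)(sign & 1) << 63)
                  | ((uint64_t)(unbiased_exp + 1023) << 52)   /* add the bias */
                  | (fraction & ((UINT64_C(1) << 52) - 1));
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

    pack_double(0, 1, UINT64_C(3) << 48) places the fraction bits 0011
    at the top of the 52-bit field and should yield exactly 2.375.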

    How the bits are actually laid out by the hardware -- in FPU
    registers or in memory (which may well use different internal layouts)
    -- is architecture-dependent. (Indeed, IEEE float is already
    architecture-dependent; IBM S/370 systems still use "hex-float"
    and older systems used other methods.)

    To convert from double to int for C, just compute the value, stopping
    as soon as the exponent goes below 0 (bits weighted 2**-1 and below
    form the fractional part, which C discards): 1.0011 times 2**1 is
    "one times two-to-the-1, plus 0 times two-to-the-0, and then stop",
    giving 2. Many architectures do not have an instruction that does
    just this -- they often insist on doing more work having to do with
    rounding modes -- so the C code tends to compile to multiple machine
    instructions ("force rounding to round-to-zero, THEN convert") or a
    subroutine call.

    Conversion from a 32-bit int, whose maximum magnitude is 2147483648
    (+2147483647, -2147483648 => biggest magnitude is for negative
    numbers), to an IEEE "double" never has any trouble, because 53
    "mantissa" bits (52 from the "fraction" part plus 1 for the implied
    leading 1) can represent every integer up to 2**53 in magnitude, or
    +/- 9007199254740992.

    Conversion from such an integer to a 32-bit "float" can result in
    inexact values, however:

    - float: 23 fraction bits, 8 exponent bits, 1 sign bit

    - 24 "mantissa" bits, 1.fff... (23 fraction bits) => exact
    integer range is +/- 16777216 (2**24); 16777217 is the first
    integer that cannot be represented exactly.
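    These limits can be checked with a round trip: convert to the
    floating type and back, and see whether the integer survived. A
    sketch assuming IEEE binary64 double and binary32 float; the helper
    names are hypothetical, and inputs must stay well inside the integer
    type's range, because converting an out-of-range floating value back
    to an integer is undefined behavior in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v converts to double exactly. Assumes |v| < 2**62. */
static bool exact_in_double(int64_t v)
{
    double d = (double)v;        /* rounds if v needs more than 53 bits */
    return (int64_t)d == v;
}

/* True if v converts to float exactly. Assumes |v| < 2**30. */
static bool exact_in_float(int32_t v)
{
    float f = (float)v;          /* rounds if v needs more than 24 bits */
    return (int32_t)f == v;
}
```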

    Suppose we try to convert 20971523 to "float". The first step is
    the same as before: find a power of 2 that allows numbers in this
    range. 2**24 is 16777216 and 2**25 is 33554432, so we use 2**24
    for the exponent. We subtract 16777216 from our initial value to
    get our working fraction: 20971523 - 16777216 = 4194307. The
    initial working exponent is 24-1 or 23.

    Now we loop, stopping after 23 iterations, generating fraction
    bits:

    FIRST ITERATION: 2**23 is 8388608 -- too big. Generate 0.
    [result so far: 0]

    NEXT ITERATION: 2**22 is 4194304 -- not too big. Generate
    1 and subtract, leaving (4194307-4194304) = 3.
    [result so far: 01]

    NEXT ITERATION: 2**21 is 2097152 -- too big.
    [result so far: 010]

    2**20 is 1048576 -- too big.

    2**19 is 524288 -- too big.
    ...
    2**2 is 4 -- too big.
    [result so far: 0100000000000000000000]
    2**1 is 2 -- not too big. Generate 1, subtract, leaving 3-2 = 1.
    [result so far: 01000000000000000000001]

    STOP, because we have completed 23 iterations. Working fraction
    is not yet 0 -- we have an inexact result, a "rounding error".

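    The result -- 1.01000000000000000000001 x 2**24 -- represents not
    20971523 but rather 20971522 (the working fraction kept is 4194306,
    not 4194307). Depending on rounding mode, the FPU might (or might
    not) change this to 1.01000000000000000000010 x 2**24, or 20971524;
    under the IEEE default, round-to-nearest-even, this exact tie does
    round up to 20971524. The C language does not specify which result
    is required: it only says the result is the nearest lower or nearest
    higher representable value, chosen in an implementation-defined
    manner (unlike floating-point-to-integer, where fractional parts
    must be discarded, not rounded).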
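    The rounding step, including the carry out of the fraction when
    every bit is 1, can be sketched as a software integer-to-float
    conversion. This version assumes IEEE 754 binary32 output and
    implements only round-to-nearest, ties-to-even (the IEEE default);
    u32_to_float_bits is a hypothetical name, and it returns the raw bit
    pattern rather than a float.

```c
#include <stdint.h>

/* Software uint32_t -> IEEE binary32, returned as a bit pattern.
   Rounds to nearest, ties to even; no other rounding modes. */
static uint32_t u32_to_float_bits(uint32_t v)
{
    if (v == 0)
        return 0;                                   /* +0.0f */

    int e = 31;                                     /* find the leading 1 */
    while (!(v & (UINT32_C(1) << e)))
        e--;

    uint32_t frac;
    if (e <= 23) {
        /* Fits in the 24-bit significand: exact, just shift into place. */
        frac = (v << (23 - e)) & 0x7FFFFF;
    } else {
        int drop = e - 23;                          /* bits that do not fit */
        uint32_t mant = v >> drop;                  /* 24 bits, leading 1 included */
        uint32_t rem = v & ((UINT32_C(1) << drop) - 1);
        uint32_t half = UINT32_C(1) << (drop - 1);

        /* Inexact when rem != 0: round up past half, or at half if odd. */
        if (rem > half || (rem == half && (mant & 1)))
            mant++;

        /* Incrementing an all-ones significand carries into the exponent. */
        if (mant == (UINT32_C(1) << 24)) {
            mant >>= 1;
            e++;
        }
        frac = mant & 0x7FFFFF;
    }
    return ((uint32_t)(e + 127) << 23) | frac;
}
```

    For 20971523 the discarded remainder is exactly half a unit and the
    truncated significand is odd, so it rounds up to 20971524 (bit
    pattern 0x4BA00002). For 33554431 the increment carries out of the
    24-bit significand and bumps the exponent, giving exactly 2**25.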

    Returning to Glen's remarks:

    >In the context of C, casts can do two different things.
    >
    >Casts like (int), (float), or (double) convert the value of the
    >operand to the same value in the appropriate type. The
    >result should be the same as assigning to a variable of
    >the appropriate type.
    >
    >Casts like (int*) or (float*) tell the compiler to assume
    >(if it can) that the value is of the appropriate pointer type,
    >but not actually change the value of the pointer, or the
    >value that the pointer points to.


    This is not really true. Pointer casts are, conceptually, just as
    much of a conversion operation as arithmetic-type casts. It is
    just that the machines on which this conversion requires
    machine-instructions are now rare. An example machine that *does*
    require instructions is the Data General Eclipse, where one uses
    a shift instruction to convert between byte and word pointers.

    As I recall, the old Burroughs A-series machines used floating-point
    formats to store integers, so on the Burroughs, a cast from int to
    double would use no instructions. It would still be a conversion,
    though.
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
     
    Chris Torek, May 6, 2004

  2. glen herrmannsfeldt

    Chris Torek wrote:

    (snip)

    > Since this *is* (or was) comp.arch, let us take a look at a high(ish)
    > level of how the hardware does the trick. (I am going to cross-post
    > this back to comp.lang.c, because I will touch on requirements
    > imposed by Standard C too.)


    (snip of description of floating point formats)

    > The C cast construct here acts as as request to convert from an
    > int (represented in ordinary binary) to a double-precision floating
    > point number (represented as above). At the CPU instruction level,
    > the expression "(double)i" might (or might not) do something like:


    > store reg_holding_i => memory
    > load_fpu_with_integer memory => fpu_register_pair


    Or consider S/360 and S/370, which didn't have any instructions
    to do the conversion. It was done by storing the fixed point
    value in memory, supplying an appropriate high-order word with
    the right exponent, loading the pair into a floating point register,
    and then correcting for possible negative numbers.
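    The S/360 sequence worked with hexadecimal floating point. A loosely
    analogous trick for IEEE doubles (a sketch, not the IBM code) puts
    the integer, biased to be non-negative, into the low bits of a
    double whose exponent makes each low fraction bit worth exactly 1,
    and then subtracts a constant. It assumes IEEE 754 binary64 and the
    same byte order for uint64_t and double; int_to_double_magic is a
    made-up name.

```c
#include <stdint.h>
#include <string.h>

/* int32_t -> double without an int-to-float instruction. */
static double int_to_double_magic(int32_t i)
{
    /* Flip the sign bit: maps -2**31 .. 2**31-1 onto 0 .. 2**32-1. */
    uint32_t biased = (uint32_t)i ^ 0x80000000u;

    /* Exponent field 0x433 (1075 = 1023 + 52) makes the lowest fraction
       bit weigh exactly 1, so this double's value is 2**52 + biased. */
    uint64_t bits = UINT64_C(0x4330000000000000) | biased;

    double d;
    memcpy(&d, &bits, sizeof d);

    /* Remove 2**52 and the 2**31 bias; this subtraction is exact. */
    return d - 4503601774854144.0;   /* 2**52 + 2**31 */
}
```

    The sign-bit flip and the final subtraction together play the role
    of "correcting for possible negative numbers".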

    (snip)

    > How the bits are actually laid-out by the hardware -- in FPU
    > registers or in memory (which may well different internal layouts)
    > -- is architecture-dependent. (Indeed, IEEE float is already
    > architechture-dependent; IBM S/370 systems still use "hex-float"
    > and older systems used other methods.)


    Note that there is also the different endianness even for the
    same format, and especially the VAX floating point formats, which
    are neither big- nor little-endian.

    (snip of more details of fix/float conversions)

    >>Casts like (int*) or (float*) tell the compiler to assume
    >>(if it can) that the value is of the appropriate pointer type,
    >>but not actually change the value of the pointer, or the
    >>value that the pointer points to.


    > This is not really true. Pointer casts are, conceptually, just as
    > much of a conversion operation as arithmetic-type casts. It is
    > just that the machines on which this conversion requires
    > machine-instructions are now rare. An example machine that *does*
    > require instructions is the Data General Eclipse, where one uses
    > a shift instruction to convert between byte and word pointers.


    Reading it again, after the fact, I said it doesn't change the
    value, not that it doesn't change the bits. The value (that it
    points somewhere in memory) still doesn't change, even though
    the bits change. I do agree that it is a strange distinction
    to make, though.

    > As I recall, the old Burroughs A-series machines used floating-point
    > formats to store integers, so on the Burroughs, a cast from int to
    > double would use no instructions. It would still be a conversion,
    > though.


    This is true. Consider, though, that it would have been possible
    for the C language to define the (int) and (float) casts as
    converting data types without converting bits, such as the Java
    functions (oops, methods) floatToIntBits() and doubleToLongBits().
    These exist in Java/JVM as native methods because the tricks used
    in other languages don't work in Java.

    It is generally more common (and more portable) to convert the value,
    so for example I can do something like printf("%f",sqrt((double)i));
    somewhat like the Fortran DFLOAT() function. One could imagine a C
    where casts didn't convert the data but functions like DFLOAT() did.
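    The two meanings are easy to see side by side in C, where a cast
    converts the value and memcpy copies the bits. A sketch, assuming
    32-bit IEEE floats; float_bits_to_int behaves like Java's
    Float.floatToRawIntBits() (floatToIntBits() additionally collapses
    all NaNs to one canonical pattern). Both function names are
    hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: what the C (int) cast does; fraction discarded. */
static int32_t float_value_to_int(float f)
{
    return (int32_t)f;
}

/* Bit reinterpretation: same 32 bits, no conversion of the value. */
static int32_t float_bits_to_int(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);   /* assumes sizeof(float) == 4 */
    return i;
}
```

    For 1.0f the first returns 1 and the second returns 0x3F800000.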

    One could also imagine a pointer cast that would create a pointer
    pointing to converted data. Consider what PL/I does with a
    fixed point array used as an argument to the FLOAT function:
    it should create a whole new array containing the converted data.

    It seems to me, then, that the C way is not obviously the only
    possible way; it is the one that fit most closely with the C
    language when it was being developed.

    It did take me a little while to get used to casts, after using
    languages, such as Fortran and PL/I, with conversion functions.

    To continue on, PL/I pointers are typeless, somewhat like the (void*)
    pointer in C. They are only given a type when they are dereferenced,
    similar to structure pointers in C. Especially interesting is that the
    operator -> is used to dereference PL/I pointers. (Is that where
    the C operator came from?)

    -- glen
     
    glen herrmannsfeldt, May 7, 2004