Derivation of the C Standard's Formula for FLT_DIG, DBL_DIG, LDBL_DIG

Discussion in 'C Programming' started by John Reye, Jul 7, 2012.

  1. John Reye

    John Reye Guest

    Hi,

    have you seen the C Standards formula for FLT_DIG, DBL_DIG, LDBL_DIG?

    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#page=49 (pdf-
    page 49)

    If you have a capable browser, it can be displayed like this:

    ⎧ p log10 b if b is a power of 10
    q = ⎨
    ⎩ ⎣( p − 1) log10 b⎦ otherwise



    How does one derive this formula? (And: is it correct, in the sense of
    really defining the maximum for q ?)

    I'm particularly interested in the "otherwise" case, which states q <=
    floor((p-1)*log10(b))

    In particular: cannot q be somewhat larger, as in q <=
    floor(p*log10(b)) = LIM ??? I believe that also results in no rounding
    loss; and an exact representation (of the original decimal number)
    with p radix-b digits.

    (((In fact... q might even be able to be larger still (than LIM), for
    cases where there is a non-exact representation of the original number
    with p radix-b digits; but rounding back to base-10, would still yield
    the original number.
    But pondering on it... I think this case is not possible, since we are
    seeking the maximum of q... and any non-exact representation, would
    only makes sense if it provides "more digits", i.e. more resolution...
    but this contradicts the search for the maximum q. Hence I currently
    suspect LIM to be the maximum for q.
    )))

    Thanks.
    J.
     
    John Reye, Jul 7, 2012
    #1
    1. Advertising

  2. John Reye

    John Reye Guest

    On 7 Jul., 10:05, John Reye <> wrote:
    > Hi,
    >
    > have you seen the C Standards formula for FLT_DIG, DBL_DIG, LDBL_DIG?
    >
    > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#page=49(pdf-
    > page 49)
    >
    > If you have a capable browser, it can be displayed like this:
    >
    >       ⎧    p log10 b               if b is a power of 10
    > q =   ⎨
    >       ⎩    ⎣( p − 1) log10 b⎦      otherwise
    >
    > How does one derive this formula? (And: is it correct, in the sense of
    > really defining the maximum for q ?)
    >
    > I'm particularly interested in the "otherwise" case, which states q <=
    > floor((p-1)*log10(b))
    >
    > In particular: cannot q be somewhat larger, as in q <=
    > floor(p*log10(b)) = LIM ??? I believe that also results in no rounding
    > loss; and an exact representation (of the original decimal number)
    > with p radix-b digits.


    Yip, I'm pretty sure I'm correct here.

    Here's a proof by example.
    Consider the following two cases:
    a) b = 100
    b) b = 101

    In a):
    Here b = 100; then:
    q <= p*log10(b) = p*2

    In b):
    Here b = 101; then:
    According to the C standard:
    q <= floor((p-1)*log10(b)) = (p-1)*2 = p*2 - 2

    Note the -2 !!

    This shows how ludicrous the formula in the C standard actually is!
    If you increase the resolution of the p digits, by increasing b by
    1... we end up getting 2 less "base-10-digits" (i.e. 2 less q)!!
    This is simply false.

    The correct max of q (as far as I can see) is given by
    q <= floor(p*log10(b)) = p*2
    Note: no ridiculous -2 here!
     
    John Reye, Jul 7, 2012
    #2
    1. Advertising

  3. John Reye <> writes:

    > On 7 Jul., 10:05, John Reye <> wrote:
    >> Hi,
    >>
    >> have you seen the C Standards formula for FLT_DIG, DBL_DIG, LDBL_DIG?
    >>
    >> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#page=49(pdf-
    >> page 49)
    >>
    >> If you have a capable browser, it can be displayed like this:
    >>
    >>       ⎧    p log10 b               if b is a power of 10
    >> q =   ⎨
    >>       ⎩    ⎣( p − 1) log10 b⎦      otherwise
    >>
    >> How does one derive this formula? (And: is it correct, in the sense of
    >> really defining the maximum for q ?)
    >>
    >> I'm particularly interested in the "otherwise" case, which states q <=
    >> floor((p-1)*log10(b))
    >>
    >> In particular: cannot q be somewhat larger, as in q <=
    >> floor(p*log10(b)) = LIM ??? I believe that also results in no rounding
    >> loss; and an exact representation (of the original decimal number)
    >> with p radix-b digits.

    >
    > Yip, I'm pretty sure I'm correct here.
    >
    > Here's a proof by example.
    > Consider the following two cases:
    > a) b = 100
    > b) b = 101
    >
    > In a):
    > Here b = 100; then:
    > q <= p*log10(b) = p*2
    >
    > In b):
    > Here b = 101; then:
    > According to the C standard:
    > q <= floor((p-1)*log10(b)) = (p-1)*2 = p*2 - 2
    >
    > Note the -2 !!
    >
    > This shows how ludicrous the formula in the C standard actually is!


    For it to be wrong (let alone ludicrous) you'd have to show that all
    numbers with q > floor((p-1)*log10(b)) decimal digits can be "rounded
    into a floating-point number [...] and back again without change [...]".

    For example pick, say, p=2. q is 2 meaning that there are 3-digit
    decimal numbers that can't be converted to and from 2-digit base 101
    floating point representation. Consider 100 -- how do you represent 100
    as a 2-digit base 101 number so that it converts back to 100?

    > If you increase the resolution of the p digits, by increasing b by
    > 1... we end up getting 2 less "base-10-digits" (i.e. 2 less q)!!
    > This is simply false.


    That needs a proof, not a blank statement.

    > The correct max of q (as far as I can see) is given by
    > q <= floor(p*log10(b)) = p*2
    > Note: no ridiculous -2 here!


    It's not true that floor(p*log10(b)) = p*2. It's not true even in the
    specific case when b = 101.

    --
    Ben.
     
    Ben Bacarisse, Jul 7, 2012
    #3
  4. John Reye

    John Reye Guest

    On Jul 7, 5:34 pm, Ben Bacarisse <> wrote:
    > For it to be wrong (let alone ludicrous) you'd have to show that all
    > numbers with q > floor((p-1)*log10(b)) decimal digits can be "rounded
    > into a floating-point number [...] and back again without change [...]".

    Wrong. I'm claiming it for all positive integer numbers, q where q <=
    floor(p*log10(b))

    > For example pick, say, p=2.  q is 2 meaning that there are 3-digit
    > decimal numbers that can't be converted to and from 2-digit base 101
    > floating point representation.  Consider 100 -- how do you represent 100
    > as a 2-digit base 101 number so that it converts back to 100?

    What about 0G, where G is the single base-101 digit representing 100?
    Simple!! ;)

    > > If you increase the resolution of the p digits, by increasing b by
    > > 1... we end up getting 2 less "base-10-digits" (i.e. 2 less q)!!
    > > This is simply false.

    >
    > That needs a proof, not a blank statement.

    Well I have a proof. Maby if I find a nice math2ascii (or
    math2unicode) converter, I'll show it.

    > It's not true that floor(p*log10(b)) = p*2.  It's not true even in the
    > specific case when b = 101.

    I think the previous post is clear enough, for showing a simple
    counter-example.
    But yes, if you must have it, I can add that floor(p*log10(101)) =
    p*2,
    for p <= 100
     
    John Reye, Jul 7, 2012
    #4
  5. John Reye

    John Reye Guest

    On Jul 7, 6:48 pm, John Reye <> wrote:
    > On Jul 7, 5:34 pm, Ben Bacarisse <> wrote:> For it to be wrong (let alone ludicrous) you'd have to show that all
    > > numbers with q > floor((p-1)*log10(b)) decimal digits can be "rounded
    > > into a floating-point number [...] and back again without change [...]"..

    >
    > Wrong. I'm claiming it for all positive integer numbers, q
    > where q <= floor(p*log10(b))


    I'm no longer sure of this statement, since I've found something that
    I forgot in my proof-attempt.

    I'll work at it some more and then post the results as good as I get
    them...
    Maby someone can then fill in the missing details, and maby we can
    proof the C Standard's formula to be true. ;)
     
    John Reye, Jul 7, 2012
    #5
  6. John Reye

    John Reye Guest

    OK here's my attempt at a proof, that
    q <= p*log10(b)


    I hope I've considered everything. Maby I made some careless error
    that invalidates it. Please explain it to me, if you find one!
    Otherwise... here we go:





    The decimal number is:

    q
    ====
    \ -i
    X = > ( G * 10 )
    / i
    ====
    i=1



    The float number representation will have the form:

    p
    ====
    \ -k
    Y = > ( F * b )
    / k
    ====
    k=1





    To round z to the n'th digit after the comma, for a number having base
    r, use:


    Round(n, r, z)

    =
    n n
    floor(z * r + 0.5) / (r )




    So what now?

    We round X to the form Y.
    Like this:
    Y = Round(p, b, X)

    And then we seek the maximum q, such that when rounding back, the
    following holds true:

    Round(q, 10, Round(p, b, X)) = X

    |
    \
    = Y


    Anyway, lets look at the rounding of X to the form Y:

    Y = Round(p, b, X)
    =

    q
    ====
    \ -i p p
    floor( > ( G * 10 ) * b + 0.5) / (b ))
    / i
    ====
    i=1


    Now have a look at the following SUBEXPRESSION from above

    -i p
    ( G * 10 ) * b
    i



    If (for the subexpression) all G digits ...
    i

    ....end up being left of the comma, then we can basically ignore
    "adding 0.5 and the floor operation" above.

    So lets do that:
    Let's get the rightmost digit
    G
    q
    ....to end up being left of the comma (in base-b representation)!
    (we're considering the subexpression here!)


    -q p
    This is when 10 * b >= 1 .

    THIS IS THE MAIN CONDITION!!

    If this holds true, then X can be represented exactly as Y, and vice
    versa.
    (Question to experts: is this guaranteed to be true?)

    Now simply solve the MAIN CONDITION, for q:

    q <= p*log10(b)



    One hmmm, what if the G - digits when converted to base-b digits
    (SUBEXPRESSION), end up having more than p digits!???
    That's something that I have not considered previously.

    The maximum possible value of SUBEXPRESSION, in base-b format is
    defined by p base-b digits: b^p - 1
    So what happens if we insert the maximum base-10 format, which is when
    G is always 9:
    Our subexpression becomes:
    -i p
    ( 9 * 10 ) * b

    If we sum it over the q digits, then we get
    (10^q - 1) / 10^q * b^p


    We must have b^p - 1 >= (10^q - 1) / 10^q * b^p
    in order for X to be representable as Y and not shoot out over Y's
    range!

    OK, now solve for q and (woha!) we're safe:
    q <= p*log10(b)



    So in both cases I get
    q <= p*log10(b)

    If this holds true, then X can be exactly expressed as Y, and it can
    then obviously be exactly represented as (rounded to) X again.


    Please tell me if there's something important that I've missed.
    Thanks for your help.
    J.
     
    John Reye, Jul 7, 2012
    #6
  7. John Reye

    Tim Rentsch Guest

    John Reye <> writes:

    > Hi,
    >
    > have you seen the C Standards formula for FLT_DIG, DBL_DIG, LDBL_DIG?
    >
    > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#page=49
    > (pdf- page 49)
    >
    > If you have a capable browser, it can be displayed like this: [snip]


    Possibly true, but many people (myself included) don't read news
    postings in a browser. Just fyi.


    > How does one derive this formula? (And: is it correct, in the
    > sense of really defining the maximum for q ?)


    I don't know how to derive the formula, nor do I know how to go
    about deriving it. Despite that, I believe the formula is
    correct.


    > I'm particularly interested in the "otherwise" case, which
    > states q <= floor((p-1)*log10(b))
    >
    > In particular: cannot q be somewhat larger, as in q <=
    > floor(p*log10(b)) = LIM ??? I believe that also results in no
    > rounding loss; and an exact representation (of the original
    > decimal number) with p radix-b digits. [snip elaboration]


    There may be particular cases where q can be larger than the
    Standard's formula dictates, but certainly it cannot be in all
    cases. For example, consider the floating value 97e-4. Naively,
    we might guess seven bits would be enough for conversion of all
    two decimal digit FP numbers (because 10**2 < 2**7). That would
    correspond to q (==2) = floor (7 * log10 2). However, seven bits
    is not enough. Calculating with 0.0097 = 0.97 * 10**-2, we get:

    0.97 * 10**-2 = 0.000 000 100 111 101 ... (base 2)
    = 0.100 111 101 ... (base 2) * 2**-6

    Rounding the right hand side to 7 bits (after the binary
    point) gives

    = 0.100 111 1 (base 2) * 2**-6

    which is approximately equal to 0.009643, or

    0.9643 * 10**-2

    which when rounded to two decimal places is 0.0096, and
    not 0.0097. So here we have an example where the
    formula q = floor(p * log10(b)) does not work for p=7, but
    the formula q = floor( (p-1) * log10(b) ) does work for
    p=8; and indeed, using eight bits does suffice to convert
    0.0097 to binary and back again (rounding in both cases)
    and get 0.0097 back again.
     
    Tim Rentsch, Jul 7, 2012
    #7
  8. John Reye <> writes:

    > On Jul 7, 5:34 pm, Ben Bacarisse <> wrote:
    >> For it to be wrong (let alone ludicrous) you'd have to show that all
    >> numbers with q > floor((p-1)*log10(b)) decimal digits can be "rounded
    >> into a floating-point number [...] and back again without change [...]".

    > Wrong. I'm claiming it for all positive integer numbers, q where q <=
    > floor(p*log10(b))


    Yes you are but that's not my point. My point was, though very poorly
    stated, that for the formula to be wrong you have to show that there is
    *some* q > floor((p-1)*log10(b)) for which all q-digit numbers can be
    round-trip converted. Can you show such a thing?

    >> For example pick, say, p=2.  q is 2 meaning that there are 3-digit
    >> decimal numbers that can't be converted to and from 2-digit base 101
    >> floating point representation.  Consider 100 -- how do you represent 100
    >> as a 2-digit base 101 number so that it converts back to 100?

    > What about 0G, where G is the single base-101 digit representing 100?
    > Simple!! ;)


    True. And it's obviously true for all integers between 0 and 9999. If
    there is a problem it will be with fractional numbers or large numbers.

    >> > If you increase the resolution of the p digits, by increasing b by
    >> > 1... we end up getting 2 less "base-10-digits" (i.e. 2 less q)!!
    >> > This is simply false.

    >>
    >> That needs a proof, not a blank statement.

    > Well I have a proof. Maby if I find a nice math2ascii (or
    > math2unicode) converter, I'll show it.
    >
    >> It's not true that floor(p*log10(b)) = p*2.  It's not true even in the
    >> specific case when b = 101.

    > I think the previous post is clear enough, for showing a simple
    > counter-example.


    Well, that was my point. I did not see a counter example just a
    statement that the bound in the standard is ludicrous.

    To make my search for a counter example easier, lets consider 2-digit
    base 32 floating point numbers. (In base 32, the difference between the
    standard's formula and yours is exaggerated when p=2.)

    Your formula says that all floating point numbers with three decimal
    digits can be converted to a 2-digit base 32 floating point number (and
    back) without change. I don't think this is true for 6.24 x 10^-2
    (0.0624). The closest base 32 representation is

    0.(2)(0) = 2/32 = 0.0625

    and the next smallest is

    0.(1)(31) = 1/32 + 31/1024 = 0.061523

    To take a non-fractional example, 203 x 10^1 would be represented as

    (1)(31) x 32^1 = 1*32*32 + 31*32 = 2016

    The round-trip conversion will yield 202 x 10^1. The next largest is
    2*32*32 = 2048 which rounds to 205 x 10^1.

    <snip>
    --
    Ben.
     
    Ben Bacarisse, Jul 7, 2012
    #8
  9. John Reye

    John Reye Guest

    Thanks Tim and Ben for your good replies!

    I've found a big flaw in my "proof" above.
    I never consider the exponent!

    But it is vital, particularly when b is not a power of 10.

    The reason is: the exponent, changes the bit-pattern of the machine
    representation!!

    Example (from Tim):
    0.97 * 10**-2 = 0.000 000 100 111 101 ... (base 2)
    = 0.100 111 101 ... (base 2) * 2**-6

    But
    0.97 * 10**0 = 0.111 110 000 ... (base 2)

    The important thing to note is that the binary bit pattern in the
    mantissa is completely different!
    ;)


    I'm hoping someone can come up with a nice exact proof for the limit
    on q.
    (Maby the flawed proof can be a starting point, or inspiration).

    Thanks.
     
    John Reye, Jul 7, 2012
    #9
  10. John Reye

    John Reye Guest

    Hi there!

    Here is a detailed analysis of the problem.
    I end up with something I term "THE MAIN RESULT" below.
    A good discrete mathematician can probably use this to determine the
    exact tightest highest bound for q.

    The bound given in the C standard is not the tighest max. bound.
    I believe I've shown this in the 2nd post on this thread (dated Sat, 7
    Jul 2012 06:50:02 -0700 (PDT)).

    Anyway... below I also go ahead and proof the bound found in the C
    standard.
    Here we go:


    The decimal number is:


    X = Xmantissa * 10^ex

    q
    ====
    \ -i
    X = > ( G * 10 ) * 10^ex
    / i
    ====
    i=1


    The constraint on Xmantissa and then X is:

    10^(-1) <= Xmantissa < 1

    Thus since X = Xmantissa * 10^ex, we get

    10^(ex-1) <= X < 10^e




    We want to represent X as a floating point number with radix (base) b.
    We'll call that representation Y.

    Y will have this form:

    Y = Ymantissa * b^ey

    p
    ====
    \ -k
    Y = > ( F * b ) * b^ey
    / k
    ====
    k=1

    The constraint on Y is:

    b^(-1) <= Ymantissa < 1

    Thus since Y = Ymantissa * b^ey, we get

    b^(ey-1) <= Y < b^ey


    How do we get X into the form Y, with the constraints on the mantissa.
    Well, we have to scale X, by multiplication by b^n, such that we get
    the standard form for Ymantissa, which corresponds to its constraint
    of
    b^(-1) <= Ymantissa < 1

    Thus
    b^(-1) <= X * b^n < 1
    Solving for n:
    -1 + log_b(X^(-1)) <= n < log_b(X^(-1))

    Since n must be an integer, we set:
    n = ceil(-1 + log_b(X^(-1))

    Thus X in terms of Y becomes:
    Yexact = (X * b^n) * b^(-n)
    ---------
    Ymantissa


    But this is an exact Y, and we want a true floating point Y, with it's
    p digits (base b) or precision.


    We'll use this rounding scheme:
    To round z to the m'th digit after the comma, for a number having base
    r, use:

    Round(m, r, z)

    =
    m m
    floor(z * r + 0.5) / (r )



    For rounding X to Y we'll do:
    Y = Round(p, b, X*b^n)*b^(-n)
    -----

    The line shows our Ymantissa = X*b^n, which needs to undergo rounding,
    to the correct number of digits, p.
    For simplicity, let's call
    Round(p, b, X*b^n) = YmantissaRound

    For rounding back to form X, we need to scale Y according the the
    constraint of Xmantissa, and then round to the q digits.
    We have to scale Y by multiplication of 10^something, such that
    10^(-1) <= Xmantissa < 1
    We know that we'll scale by 10^(-ex) since this will be consistant,
    with the very first equation of this proof.
    Thus
    XmantissaBack = (Y * 10^(-ex))
    so that (compare with 1st equation of proof):
    Xback = (Y * 10^(-ex)) * 10^ex


    So the rounding of X to Y and back to Xback, will be:
    Xback = Round(q, 10, XmantissaBack) * 10^ex
    or
    Xback = Round(q, 10, (Y * 10^(-ex))) * 10^ex
    or
    Xback = Round(q, 10, (YmantissaRound * b^(-n) * 10^(-ex))) * 10^ex
    or
    Xback = Round(q, 10, (Round(p, b, X*b^n) * b^(-n) * 10^(-ex))) * 10^ex

    Ultimately, we seek the maximum q such that the above line, when
    evaluated yields:
    Xback == X

    Thus we seek the maximum q, such that:

    Round(q, 10, (Round(p, b, X*b^n) * b^(-n) * 10^(-ex))) * 10^ex == X

    where
    n = ceil(-1 + log_b(X^(-1))
    over all possible variations in X (according to our format of X) and
    over all possible variations in ex (according the the limits of our
    exponent ex_MIN <= ex <= ex_MAX)

    i.e.
    **********THE MAIN RESULT******************************
    We seek the maximum q, such that:

    (floor(((floor(X*b^n * b^p + 0.5) / (b^p)) * b^(-n) * 10^(-ex)) * 10^q
    + 0.5) / (10^q)) * 10^ex == X

    where
    n = ceil(-1 + log_b(X^(-1))
    over all possible variations in X (according to our format of X) and
    over all possible variations in ex (according the the limits of our
    exponent ex_MIN <= ex <= ex_MAX)



    I suspect that the exact solution to this problem, is something for
    the good discrete mathematician.




    How can one simplify it?
    a)
    I've found some interesting constaints.

    b)
    I've found a way to find the bound for q, as given by the C standard.
    However, this is not the best possible (i.e. max) bound. (If you
    mangage to work out a tighter larger bound, please share the details,
    if you like ;) )


    Anyway... here goes:
    a)

    X is the sum of coefficients. Lets just consider the least significant
    coefficient of X which is:

    -q ex
    G * 10 * 10 = G_q * 10^(ex-q)
    q

    Putting it into the formula:

    (floor(((floor(G_q*10^(ex-q)*b^n * b^p + 0.5) / (b^p)) * b^(-n) * 10^(-
    ex)) * 10^q + 0.5) / (10^q)) * 10^ex == G_q*10^(ex-q)

    Then set G_q to it's lowest value, which is 1. (Reason: we seek the
    limit, where the floor cuts off important stuff. For this, we need to
    consider the lowest positive value.)

    This then simplfies to:

    (floor(((floor(10^(ex-q)*b^n * b^p + 0.5) / (b^p)) * b^(-n) * 10^(-
    ex)) * 10^q + 0.5)) == 1

    Call this the same as:
    (floor(A + 0.5)) == 1

    Then we have
    0.5 <= A < 1.5

    Then
    0.5 * b^(n+p) * 10^(ex-q) <= floor(b^(n+p) * 10^(ex-q) + 0.5) < 1.5 *
    b^(n+p) * 10^(ex-q)

    Call this:
    0.5 * S <= floor(S + 0.5) < 1.5 * S

    The middle floor must yield an integer, so the following constraint
    (considering the outer bounds) must hold:
    ceil(0.5 * S) <= S + 0.5 <= floor(1.5 * S)


    b)
    Here's how to prove the C standard's bound on q.

    Consider the following SUBEXPRESSION, from THE MAIN RESULT (see above)
    (floor(X*b^n * b^p + 0.5) / (b^p))

    The least significant coefficient of X is:
    G_q*10^(ex-q)

    Set G_q to 1 (the lowest possible value)
    Then the contribution of this least coefficient to the subexpression
    is:

    (floor(10^(ex-q)*b^n * b^p + 0.5) / (b^p))

    For the contribution to be visible and rounded correctly (and not
    chopped off by floor), we require:

    10^(ex-q)*b^n * b^p >= 1
    Call this the CONTRIBUTION_REQUIREMENT.

    To handle n, we'll need to consider the 2 mantissa constraints, which
    were introduced almost right at the top:

    -1 + log_b(X^(-1)) <= n < log_b(X^(-1))
    10^(ex-1) <= X < 10^e

    We can use the 2nd constraint to determine the range of
    log_b(X^(-1)) which appears in the 1st constraint.

    Applying ()^-1 and then log_b to the 2nd constraint yields:
    log_b(10^(-ex+1)) >= log_b(X^(-1)) > log_b(10^(-ex))
    which - after reformatting - is:
    log_b(10^(-ex)) < log_b(X^(-1)) <= log_b(10^(-ex+1))

    This allows us to get a range for the 1st constraint, i.e. a range for
    n:

    -1 + log_b(10^(-ex)) < -1 + log_b(X^(-1)) <= n < log_b(X^(-1)) <=
    log_b(10^(-ex+1))

    Now we'll apply this bound to the CONTRIBUTION_REQUIREMENT. We use the
    lower bound (which is the "worst case" for the
    CONTRIBUTION_REQUIREMENT) which will yield the value for q, which
    *guarantees* the CONTRIBUTION_REQUIREMENT

    Thus
    10^(ex-q)*b^n * b^p >= 1
    becomes
    10^(ex-q)*b^(-1 + log_b(10^(-ex))) * b^p >= 1
    or
    10^(ex-q)*b^(-1) * b^(log_b(10^(-ex))) * b^p >= 1
    or
    10^(ex-q)*b^(-1) * 10^(-ex) * b^p >= 1
    or
    10^(-q)*b^(-1+p) >= 1
    or
    q <= log10(b^(-1+p)) = (p-1)*log10(b)

    So:
    q <= (p-1)*log10(b)

    which proves the bound found in the C standard:
    q <= floor((p-1)*log10(b))


    Regards,
    J.
     
    John Reye, Jul 8, 2012
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. derivation

    , Apr 24, 2004, in forum: Perl
    Replies:
    0
    Views:
    545
  2. Benjamin Michotte

    derivation, restriction, extension, ...

    Benjamin Michotte, Sep 5, 2003, in forum: XML
    Replies:
    1
    Views:
    510
    Bob Foster
    Sep 5, 2003
  3. Steven T. Hatton
    Replies:
    12
    Views:
    1,709
    Jonathan Turkanis
    Aug 20, 2004
  4. Malcolm

    stringise DBL_DIG

    Malcolm, Feb 1, 2004, in forum: C Programming
    Replies:
    6
    Views:
    886
    Peter Nilsson
    Feb 3, 2004
  5. John Reye
    Replies:
    0
    Views:
    349
    John Reye
    Jul 7, 2012
Loading...

Share This Page