long double precision

Discussion in 'C++' started by vi, Nov 16, 2007.

  1. vi

    vi Guest

    I have a question concerning the precision of long double. It may
    be a stupid question; I apologize if so.

    Here is a piece of code:

    #include <iomanip>
    #include <iostream>
    using namespace std;
    int main () {
        long double toto=0.123456789123456789123456789123456789;
        cout << sizeof(long double) << endl;
        cout << setprecision(21) << toto << endl;
        double titi=0.123456789123456789123456789123456789;
        cout << sizeof(double) << endl;
        cout << setprecision(21) << titi << endl;
        return 0;
    }

    and the result

    I don't understand why long double and double have the same precision
    in the output.
    They seem to be different in memory, so does the problem come from the
    initialisation or from the writing to the output?

    Thanks in advance for your reply,
    vi, Nov 16, 2007

  2. Markus Moll

    Markus Moll Guest

    The above literal is a double literal. Therefore, the same value is assigned
    to both toto and titi.

    Use 0.123456...L or 0.123456...l to denote that the literal is a long double
    literal. (Unlike with integers, the type to be chosen is not immediately
    clear. 0.1 is likely not representable in any of the floating point types,
    but you would expect its type to be double, not the type with the greatest
    precision.)
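A minimal sketch of the point about suffixes, using the same digits as the original post. The function names are made up for illustration. Whether the two values actually differ depends on whether long double is wider than double on the implementation at hand.

```cpp
#include <limits>

// The same decimal digits written as a double literal and as a long double
// literal. Without a suffix the literal is rounded to double first and only
// then widened; the L suffix rounds directly to long double.
inline long double from_double_literal() {
    return 0.123456789123456789123456789123456789;
}

inline long double from_long_double_literal() {
    return 0.123456789123456789123456789123456789L;
}

// True when long double has more significand bits than double here.
inline bool long_double_is_wider() {
    return std::numeric_limits<long double>::digits >
           std::numeric_limits<double>::digits;
}
```

Printing both results with setprecision(21) should show the extra digits only for the L version, and only where long_double_is_wider() returns true.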
    Markus Moll, Nov 16, 2007

  3. The problem may actually be simpler: the Standard does not guarantee
    that 'long double' has more precision than 'double'. BTW, the two have
    the same precision with Microsoft Visual C++ on Windows, for example.
    Victor Bazarov, Nov 16, 2007
  4. Markus Moll

    Markus Moll Guest


    Phew... as the OP said that his long double was twice the size of a double,
    I assumed that the precision would also be greater. However, of course it's
    possible that all the space is wasted or used for the exponent (or for
    redundant sign-bits for error-correction or something like this :p)

    What does MSVC++ say about sizeof(long double) vs sizeof(double)?

    Markus Moll, Nov 16, 2007
  5. 8 vs 8

    Victor Bazarov, Nov 16, 2007
  6. vi

    vi Guest

    Great, it works with L!

    vi, Nov 16, 2007
  7. MSVC++ has all kinds of odd settings which are standard-conforming, but
    different from any other compiler. Another one is that, if I'm not mistaken,
    long is 32 bits (sizeof(long) == 4) even on 64-bit platforms when compiling
    a 64-bit binary. (So if you ever programmed assuming 'long' will be 64 bits
    on a 64-bit system, then you are in for a surprise.)
    Makes one wonder how you seek in a file larger than 4GB, given that fseek
    takes a long as parameter.

    (Btw, *why* does it take a long as parameter? Shouldn't it take
    size_t? It's not like what MSVC++ does is wrong or against the standard.
    It just makes it impossible to seek large files with standard code.)
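A small sketch for checking the width of long on a given compiler and whether a particular byte offset can be handed to fseek at all. Both helper names are hypothetical; only CHAR_BIT and std::numeric_limits come from the standard library.

```cpp
#include <climits>
#include <limits>

// Number of bits in the object representation of long on this implementation.
inline int long_bits() {
    return static_cast<int>(sizeof(long) * CHAR_BIT);
}

// Can this byte offset be passed to fseek, whose offset parameter is a long?
inline bool offset_fits_in_long(unsigned long long offset) {
    return offset <=
           static_cast<unsigned long long>(std::numeric_limits<long>::max());
}
```

With a 32-bit long, offset_fits_in_long returns false for anything above 2^31-1, however large the file on disk may be.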
    Juha Nieminen, Nov 16, 2007
  8. 2GB, actually. 'long' is signed, so the largest value is 2^31-1. You
    must be thinking of 'unsigned long', but that's not what 'fseek'
    takes (as you correctly pointed out).

    (a) It takes 'long' because when the C Library was standardised (1989)
    there was probably no concern about files larger than what 'long'
    can address, and besides, as the files grow, so will 'long', right?
    [Well, Microsoft told them all, didn't it?]
    (b) If you need to seek in files larger than 'long' allows, use either
    'fsetpos' or some OS-specific means.
    (c) size_t is not a very suitable thing for that, since 'size_t' is for
    the sizes of objects. I would rather think that 'ptrdiff_t' is a
    better choice.
    (d) Don't use the C Library for file I/O, use the C++ Library; there
    you'll deal with the special type for the position,
    'std::basic_streambuf::pos_type'. And if it's not large enough,
    complain to the compiler vendor.
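As a rough illustration of option (d), the sketch below seeks through the C++ stream interface, where the offset is a std::streamoff rather than a long. The helpers byte_at and demo_third_byte are invented for this example. How wide std::streamoff is remains up to the implementation, so this does not by itself guarantee large-file support.

```cpp
#include <ios>
#include <istream>
#include <sstream>

// Read one byte at an absolute offset using the stream's own offset type.
// Returns -1 if the seek or the read fails.
inline int byte_at(std::istream& in, std::streamoff offset) {
    in.clear();                        // drop any earlier eof/fail state
    in.seekg(offset, std::ios::beg);   // offset is std::streamoff, not long
    if (!in) return -1;
    const std::istream::int_type c = in.get();
    if (!in) return -1;                // nothing to read at that position
    return static_cast<int>(c);
}

// Small in-memory demonstration: the byte at offset 2 of "abcdef".
inline int demo_third_byte() {
    std::istringstream in("abcdef");
    return byte_at(in, 2);
}
```

The same function works unchanged with a std::ifstream opened in binary mode.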

    Victor Bazarov, Nov 16, 2007
  9. BobR

    BobR Guest

    Markus Moll wrote in message...
    // #include <iostream>, <limits>
    std::cout << " dbl digits ="
              << (std::numeric_limits<double>::digits) << std::endl;
    std::cout << " LD digits ="
              << (std::numeric_limits<long double>::digits) << std::endl;

    /* - output - (GCC(MinGW), win98se)
    dbl digits =53
    LD digits =64
    */
    See what you get from those lines.
    I asked My Second Virtual Cousin (twice added), and he said nothing! <G>

    In Assembler, I used to use eight-byte(dd) and ten-byte(dt) types. That's
    not even close to "twice the size" (If we're talking number of bits).
    [ assembler == a386 ]
    BobR, Nov 16, 2007
  10. James Kanze

    James Kanze Guest

    I don't think that's true. It's been a while, and maybe I'm
    remembering wrong, but I think the problem with using long was
    known already back then. I *think* (that is, I'm far from sure)
    that the "answer" was supposed to be fgetpos and fsetpos; fseek,
    with long, was maintained for reasons of compatibility with
    existing code.

    Whatever the case, fsetpos and fgetpos didn't take; people
    continued using fseek. And C++ went in yet another direction,
    and ended up requiring the impossible in the standard. (The
    standard requires round-trip conversions between streamoff and
    streampos, but it also requires streampos to contain more

    IMHO, the real problem is more fundamental: text files and seek
    simply don't mix, and any attempts by the standard to make it
    work are bound to have problems. C (and indirectly C++) sort of
    addresses those problems by limiting the possibilities of
    seeking in a file opened in text mode. The fact that filebuf
    does code translation even in binary mode reintroduces them in
    C++. And somewhere in all that, implementations seem to have
    forgotten that neither streampos nor streamoff are required to
    be integral types. (Or perhaps rather, they don't dare change
    them from their historical types for fear of breaking existing
    code.)

    With regards to size_t: size_t is related to memory size or
    addressability, not file size: there's certainly nothing
    impossible about a 16 bit system allowing files of more than 4
    GB. Posix uses off_t in its standard (but requires it to be an
    integral type---of course, Posix systems have to support long
    long as well). The logical solution is a different type(def).
    Like in fgetpos and fsetpos.
    The compiler vendor also has to deal with existing code :). How many
    times have we seen people implicitly converting streampos (i.e.
    std::streambuf::pos_type) to some integral type?

    Systems have the same problem, with regards to existing code,
    and Sun, for example, offers three or four different options to
    handle it at the Posix level (not all of which are strictly
    Posix-conforming, obviously).
    James Kanze, Nov 17, 2007
  11. James Kanze

    James Kanze Guest

    And g++ on a PC, at least in some configurations, uses 8 and 12.

    At the hardware level, there are 10 bytes of information in an
    Intel long double. But 10 bytes results in some awkward
    alignments. Microsoft, from what I understand, punts on the
    question by ignoring the hardware long double type (which is
    conforming, if not very useful). G++ originally chose to use 12
    bytes (with 2 garbage bytes) according to alignment
    considerations on some older machines: with modern hardware,
    unless you have 16-byte alignment, you might as well go with
    10. (I'm not sure, but there may also be options to control

    Once again, there is no right answer.
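To see how much of a type's storage is actually precision on a given compiler, one can compare sizeof against std::numeric_limits. This is only a sketch, and the two template names are made up for illustration.

```cpp
#include <climits>
#include <limits>

// Bits occupied in memory by T, padding included.
template <typename T>
int storage_bits() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Significand bits T provides (counting an implicit leading bit, if any).
template <typename T>
int significand_bits() {
    return std::numeric_limits<T>::digits;
}
```

Typical results for long double would be 96 or 128 storage bits with 64 significand bits under g++ on x86, and 64 storage bits with 53 significand bits under MSVC, but the numbers are whatever the implementation chooses.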
    James Kanze, Nov 17, 2007
  12. True, with some additional considerations. The commonly used IEEE 754
    floating point formats are

    single precision: 32 bits including 1 sign bit, 23 significand bits
    (with an implicit leading 1, for 24 total), and 8 exponent bits

    double precision: 64 bits including 1 sign bit, 52 significand bits
    (with an implicit leading 1, for 53 total), and 11 exponent bits

    double extended precision: 80 bits including 1 sign bit, 64 significand
    bits (no implicit leading 1), and 15 exponent bits.
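The arithmetic behind the "total" figures can be written down and checked at compile time. The sketch below assumes a C++11 compiler; precision_bits and has_precision are hypothetical helper names, and has_precision only reports what the current implementation claims through std::numeric_limits.

```cpp
#include <limits>

// Precision in bits = explicitly stored significand bits
// plus the implicit leading 1, if the format has one.
constexpr int precision_bits(int stored_bits, bool implicit_leading_one) {
    return stored_bits + (implicit_leading_one ? 1 : 0);
}

static_assert(precision_bits(23, true) == 24, "single precision");
static_assert(precision_bits(52, true) == 53, "double precision");
static_assert(precision_bits(64, false) == 64, "x87 double extended");

// Does type T on this implementation claim IEEE behaviour with that precision?
template <typename T>
constexpr bool has_precision(int bits) {
    return std::numeric_limits<T>::is_iec559 &&
           std::numeric_limits<T>::digits == bits;
}
```

For example, has_precision<long double>(64) would be expected to hold where long double maps to the x87 extended format and to fail where it is the same as double.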

    The native format of the x87 FPU is "double extended". We run into this
    from time to time when floating point computations compiled with more
    optimization give slightly different results. The issue is that one
    optimization is to hold intermediate results in 80-bit FPU registers
    instead of rounding them down to fit in 64-bit memory locations. gcc
    offers the -ffloat-store option to suppress this optimization for that
    very reason.
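One way to find out how a given compiler and set of flags treats intermediate results is the FLT_EVAL_METHOD macro from <cfloat> (available since C++11). The sketch below merely translates its value into words; both function names are made up for this example.

```cpp
#include <cfloat>
#include <string>

// Translate a FLT_EVAL_METHOD value into words.
inline std::string describe_eval_method(int method) {
    switch (method) {
        case 0:  return "operations are evaluated in the precision of their own type";
        case 1:  return "float and double operations are evaluated as double";
        case 2:  return "all operations are evaluated as long double";
        default: return "evaluation method is indeterminable or implementation-defined";
    }
}

// The value reported by the current compiler and flags.
inline int current_eval_method() {
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;   // macro not provided; treat as indeterminable
#endif
}
```

A value of 2 corresponds to the x87 behaviour described above, where intermediates are kept in extended precision; a value of 0 is what one would expect when the compiler uses SSE2 for double arithmetic.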

    In addition, the x87 control register allows one to select between
    extended (the default), double and single precision. Try this (on
    Linux/gcc/glibc), for example:

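    #include <fpu_control.h>

    void set_double(void)
    {
        unsigned short cw;
        _FPU_GETCW(cw);   /* read the current x87 control word */
        cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
        _FPU_SETCW(cw);   /* write it back with double precision selected */
    }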

    and similarly for "set_extended". Note that this will make all kinds of
    trouble for you because libm depends on having the FPU in extended
    precision mode.

    Now comes the real kicker: the SSE/SSE2/SSE3 vector co-processors do not
    support double extended precision; only single and double precision.
    gcc and icc are both using these co-processors pretty extensively now,
    so you're not guaranteed to get even intermediate results done in
    double-extended arithmetic.

    Going back on-topic, if you want to peek at the binary representations
    of IEEE floating-point numbers, you might enjoy the template included
    below. I supply typedefs for single and double precision; writing the
    typedef for double extended precision is left as an exercise to the
    reader.
    // IEEE Floating-Point template
    // Copyright (C) 2007 Charles M. "Chip" Coldwell <>

    // This program is free software: you can redistribute it and/or modify
    // it under the terms of the GNU General Public License as published by
    // the Free Software Foundation, either version 3 of the License, or
    // (at your option) any later version.

    // This program is distributed in the hope that it will be useful,
    // but WITHOUT ANY WARRANTY; without even the implied warranty of
    // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    // GNU General Public License for more details.

    // You should have received a copy of the GNU General Public License
    // along with this program. If not, see <http://www.gnu.org/licenses/>.

    #ifndef IEEEFLOAT_HH
    #define IEEEFLOAT_HH

    template
    <typename _float_t, typename _sint_t, typename _uint_t, int _mbits, int _ebits>
    class ieee_float {
    public:
        typedef _uint_t uint_t;
        typedef _sint_t sint_t;
        typedef _float_t float_t;
        enum { mbits = _mbits, ebits = _ebits };

    private:
        // Bit-field order follows the byte order of the target.
    #ifdef __BIG_ENDIAN__
        uint_t s:1;
        uint_t e:ebits;
        uint_t m:mbits;
    #else
        uint_t m:mbits;
        uint_t e:ebits;
        uint_t s:1;
    #endif

    public:
        static const uint_t mdenom = ((uint_t)1 << mbits);
        static const uint_t ebias = ((uint_t)1 << (ebits - 1)) - 1;

        sint_t sign(void) const { return 1 - 2*s; }
        sint_t exponent(void) const { return e ? e - ebias : -(ebias - 1); }
        uint_t mantissa(void) const { return ((uint_t)(!!e) << mbits) | m; }
        bool infinity(void) const { return (e == ((1 << ebits) - 1)) && (m == 0); }
        bool nan(void) const { return (e == ((1 << ebits) - 1)) && (m != 0); }
        bool denormal(void) const { return (e == 0) && (m != 0); }

        ieee_float(float_t f) { *reinterpret_cast<float_t *>(this) = f; }
        ieee_float(uint_t u) { *reinterpret_cast<uint_t *>(this) = u; }

        operator float_t() const { return *reinterpret_cast<const float_t *>(this); }
        operator uint_t() const { return *reinterpret_cast<const uint_t *>(this); }
    };

    // The typedef names were lost from the post as archived; ieee_single and
    // ieee_double are placeholders.
    typedef ieee_float<float, int, unsigned, 23, 8> ieee_single;
    typedef ieee_float<double, long long, unsigned long long, 52, 11> ieee_double;

    #endif // IEEEFLOAT_HH
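For comparison, here is a smaller sketch that pulls the three fields out of a double with std::memcpy and shifts instead of bit-fields and reinterpret_cast. It is not the author's template. It assumes a C++11 compiler and a 64-bit IEEE 754 double, and the names double_fields and decode are invented for this example.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

struct double_fields {
    unsigned sign;             // 1 bit
    unsigned biased_exponent;  // 11 bits, bias 1023
    std::uint64_t fraction;    // 52 stored significand bits
};

inline double_fields decode(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "expects a 64-bit double");
    static_assert(std::numeric_limits<double>::is_iec559,
                  "expects an IEEE 754 double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);   // copy the object representation
    double_fields f;
    f.sign = static_cast<unsigned>(bits >> 63);
    f.biased_exponent = static_cast<unsigned>((bits >> 52) & 0x7FF);
    f.fraction = bits & ((std::uint64_t(1) << 52) - 1);
    return f;
}
```

Copying through memcpy sidesteps both the aliasing questions raised by casting `this` and the dependence on bit-field layout, at the cost of hard-coding the field widths for one format.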


    Charles Coldwell, Apr 20, 2008