# long double precision

Discussion in 'C++' started by vi, Nov 16, 2007.

1. ### vi

Hello
I have a question concerning the precision of long double. It may be a
stupid question; I apologize if it is so.

Here is a piece of code:

```cpp
#include <iomanip>
#include <iostream>
using namespace std;

int main() {
    long double toto = 0.123456789123456789123456789123456789;
    cout << sizeof(long double) << endl;
    cout << setprecision(21) << toto << endl;
    double titi = 0.123456789123456789123456789123456789;
    cout << sizeof(double) << endl;
    cout << setprecision(21) << titi << endl;
    return 0;
}
```

And the result:

```
16
0.1234567891234567838
8
0.1234567891234567838
```

I don't understand why long double and double show the same precision
in the output. They seem to be different in memory, so does the problem
come from the initialization or from writing the output?

vi, Nov 16, 2007

2. ### Markus Moll

Hi

vi wrote:

> #include <iomanip>
> #include <iostream>
> using namespace::std;
> int main () {
> long double toto=0.123456789123456789123456789123456789;

The above literal is a double literal. Therefore, the same value is assigned
to both toto and titi.

Use 0.123456...L or 0.123456...l to denote that the literal is a long double
literal. (Unlike with integers, the type to choose is not immediately
clear: 0.1 is likely not exactly representable in any of the floating-point
types, but you would expect its type to be double, not the type with the
greatest precision.)

> cout << sizeof(long double) << endl;
> cout << setprecision(21) << toto << endl;
> double titi=0.123456789123456789123456789123456789;
> cout << sizeof(double) << endl;
> cout << setprecision(21) << titi << endl;
> return 0;
> }

Markus

Markus Moll, Nov 16, 2007

3. ### Victor Bazarov

Markus Moll wrote:
> vi wrote:
>
>> #include <iomanip>
>> #include <iostream>
>> using namespace::std;
>> int main () {
>> long double toto=0.123456789123456789123456789123456789;

>
> The above literal is a double literal. Therefore, the same value is
> assigned to both toto and titi.
>
> Use 0.123456...L or 0.123456...l to denote that the literal is a long
> double literal (unlike with integers, the type to be chosen is not
> immediately clear. 0.1 is likely not representable in any of the
> floating point types, but you would expect its type to be double, not
> the type with the greatest precision).

The problem may actually be simpler: the Standard does not guarantee
that 'long double' has more precision than 'double', only that it has at
least as much. With Microsoft Visual C++ on Windows, for example, the two
have the same precision.

>
>> cout << sizeof(long double) << endl;
>> cout << setprecision(21) << toto << endl;
>> double titi=0.123456789123456789123456789123456789;
>> cout << sizeof(double) << endl;
>> cout << setprecision(21) << titi << endl;
>> return 0;
>> }

>
> Markus

V
--

Victor Bazarov, Nov 16, 2007
4. ### Markus Moll

Hi

Victor Bazarov wrote:

> Markus Moll wrote:
>> Use 0.123456...L or 0.123456...l to denote that the literal is a long
>> double literal (unlike with integers, the type to be chosen is not
>> immediately clear. 0.1 is likely not representable in any of the
>> floating point types, but you would expect its type to be double, not
>> the type with the greatest precision).

>
> The problem may actually be simpler: the Standard does not guarantee
> that 'long double' has more precision than 'double'. BTW, it is the
> case with Microsoft Visual C++ on Windows, for example.

Phew... as the OP said that his long double was twice the size of a double,
I assumed that the precision would also be greater. However, of course it's
possible that all the extra space is wasted or used for the exponent (or for
redundant sign bits for error correction or something like this).

What does MSVC++ say about sizeof(long double) vs sizeof(double)?

Markus

Markus Moll, Nov 16, 2007
5. ### Victor Bazarov

Markus Moll wrote:
> [..]
> What does MSVC++ say about sizeof(long double) vs sizeof(double)?

8 vs 8

V
--

Victor Bazarov, Nov 16, 2007
6. ### vi

Hello
Great, it works with L!
Thanks

vi, Nov 16, 2007
7. ### Juha Nieminen

Victor Bazarov wrote:
> Markus Moll wrote:
>> [..]
>> What does MSVC++ say about sizeof(long double) vs sizeof(double)?

>
> 8 vs 8

MSVC++ has all kinds of odd settings which are standard-conforming, but
different from any other compiler. Another one is that, if I'm not mistaken,
'long' is 32 bits (sizeof(long) == 4) even on 64-bit platforms when compiling
a 64-bit binary. (So if you ever programmed assuming 'long' will be 64 bits
on a 64-bit system, you are in for a surprise.)
Makes one wonder how you seek in a file larger than 4GB, given that fseek
takes a long as parameter.

(Btw, *why* does it take a long as parameter? Shouldn't it take
size_t? It's not like what MSVC++ does is wrong or against the standard.
It just makes it impossible to seek in large files with standard code.)

Juha Nieminen, Nov 16, 2007
8. ### Victor Bazarov

Juha Nieminen wrote:
> Victor Bazarov wrote:
>> Markus Moll wrote:
>>> [..]
>>> What does MSVC++ say about sizeof(long double) vs sizeof(double)?

>>
>> 8 vs 8

>
> MSVC++ has all kinds of odd settings which are standard-conforming, but
> different from any other compiler. Another one is that, if I'm not
> mistaken, 'long' is 32 bits (sizeof(long) == 4) even on 64-bit platforms
> when compiling a 64-bit binary. (So if you ever programmed assuming
> 'long' will be 64 bits on a 64-bit system, you are in for a surprise.)
> Makes one wonder how you seek a file larger than 4GB,

2GB, actually. 'long' is signed, so its largest value is 2^31-1. You
must be thinking of 'unsigned long', but that's not what 'fseek' is
taking (as you correctly pointed out).

> given that
> fseek takes a long as parameter.
>
> (Btw, *why* does it take a long as parameter? Shouldn't it take
> size_t? It's not like what MSVC++ does is wrong or against the
> standard. It just makes it impossible to seek large files with
> standard code.)

(a) It takes 'long' because when the C library was standardised (1989)
there was probably no concern about files larger than what 'long'
can handle, and besides, as files grow, so will 'long', right?
[Well, Microsoft told them all, didn't it?] (b) If you need to seek
in files larger than 'long' allows, use either 'fsetpos' or some
OS-specific means. (c) 'size_t' is not very suitable for that,
since 'size_t' is for the sizes of objects; I would rather think
that 'ptrdiff_t' is a better choice. (d) Don't use the C library for
file I/O; use the C++ library, where you'll deal with a special type
for the position, 'std::basic_streambuf::pos_type'. And if that's
not large enough, complain to the compiler vendor.

V
--

Victor Bazarov, Nov 16, 2007
9. ### BobR

Markus Moll wrote in message...
> Victor Bazarov wrote:
> > [snip]
> > The problem may actually be simpler: the Standard does not guarantee
> > that 'long double' has more precision than 'double'. BTW, it is the
> > case with Microsoft Visual C++ on Windows, for example.

>
> Phew... as the OP said that his long double was twice the size of a double,
> I assumed that the precision would also be greater. However, of course it's
> possible that all the space is wasted or used for the exponent (or for
> redundant sign-bits for error-correction or something like this )

```cpp
#include <iostream>
#include <limits>

int main() {
    // numeric_limits<T>::digits: significand digits in the type's radix (bits, for radix 2)
    std::cout << " dbl digits ="
              << std::numeric_limits<double>::digits << std::endl;
    std::cout << " LD digits ="
              << std::numeric_limits<long double>::digits << std::endl;
}

/* - output - (GCC(MinGW), win98se)
 dbl digits =53
 LD digits =64
*/
```
See what you get from those lines.

>
> What does MSVC++ say about sizeof(long double) vs sizeof(double)?

I asked My Second Virtual Cousin (twice added), and he said nothing! <G>

In Assembler, I used to use eight-byte (dd) and ten-byte (dt) types. That's
not even close to "twice the size" (if we're talking number of bits).
[ assembler == a386 ]

--
Bob R
POVrookie

BobR, Nov 16, 2007
10. ### James Kanze

On Nov 16, 9:55 pm, "Victor Bazarov" <> wrote:
> Juha Nieminen wrote:

> > (Btw, *why* does it take a long as parameter? Shouldn't it take
> > size_t? It's not like what MSVC++ does is wrong or against the
> > standard. It just makes it impossible to seek large files with
> > standard code.)

> (a) It takes 'long' because when the C library was standardised (1989)
> there was probably no concern about files larger than what 'long'
> can handle, and besides, as files grow, so will 'long', right?

I don't think that's true. It's been a while, and maybe I'm
remembering wrong, but I think the problem with using long was
known already back then. I *think* (that is, I'm far from sure)
that the "answer" was supposed to be fgetpos and fsetpos; fseek,
with long, was maintained for reasons of compatibility with
existing code.

Whatever the case, fsetpos and fgetpos didn't take; people
continued using fseek. And C++ went in yet another direction,
and ended up requiring the impossible in the standard. (The
standard requires round-trip conversions between streamoff and
streampos, but it also requires streampos to contain more
information.)

IMHO, the real problem is more fundamental: text files and seek
simply don't mix, and any attempts by the standard to make it
work are bound to have problems. C (and indirectly C++) sort of
addresses those problems by limiting the possibilities of
seeking in a file opened in text mode. The fact that filebuf
does code translation even in binary mode reintroduces them in
C++. And somewhere in all that, implementations seem to have
forgotten that neither streampos nor streamoff are required to
be integral types. (Or perhaps rather, they don't dare change
them from their historical types for fear of breaking existing
code.)

With regard to size_t: size_t is related to memory size or
addressability, not file size: there's certainly nothing
impossible about a 16-bit system allowing files of more than 4
GB. Posix uses off_t in its standard (but requires it to be an
integral type; of course, Posix systems have to support long
long as well). The logical solution is a different type(def),
as with fgetpos and fsetpos.

> [Well, Microsoft told them all, didn't it?] (b) If you need to seek
> in files larger than 'long' allows, use either 'fsetpos' or some
> OS-specific means. (c) 'size_t' is not very suitable for that,
> since 'size_t' is for the sizes of objects; I would rather think
> that 'ptrdiff_t' is a better choice. (d) Don't use the C library for
> file I/O; use the C++ library, where you'll deal with a special type
> for the position, 'std::basic_streambuf::pos_type'. And if that's
> not large enough, complain to the compiler vendor.

Who also has to deal with existing code. How many times have
we seen people implicitly converting streampos (i.e.
std::streambuf::pos_type) to some integral type?

Systems have the same problem with regard to existing code,
and Sun, for example, offers three or four different options to
handle it at the Posix level (not all of which are strictly
Posix-conforming, obviously).

--
James Kanze (GABI Software) email:
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Kanze, Nov 17, 2007
11. ### James Kanze

On Nov 16, 10:55 pm, "BobR" <> wrote:
> Markus Moll wrote in message...
> > What does MSVC++ say about sizeof(long double) vs sizeof(double)?

> I asked My Second Virtual Cousin (twice added), and he said nothing! <G>

> In Assembler, I used to use eight-byte(dd) and ten-byte(dt) types. That's
> not even close to "twice the size" (If we're talking number of bits).
> [ assembler == a386 ]

And g++ on a PC, at least in some configurations, uses 8 and 12
bytes.

At the hardware level, there are 10 bytes of information in an
Intel long double. But 10 bytes results in some awkward
alignments. Microsoft, from what I understand, punts on the
question by ignoring the hardware long double type (which is
conforming, if not very useful). G++ originally chose to use 12
bytes (with 2 garbage bytes) because of alignment
considerations on some older machines: with modern hardware,
unless you have 16-byte alignment, you might as well go with
10. (I'm not sure, but there may also be options to control
this.)

Once again, there is no right answer.


James Kanze, Nov 17, 2007
12. ### Charles Coldwell

James Kanze <> writes:

> On Nov 16, 10:55 pm, "BobR" <> wrote:
>> Markus Moll wrote in message...
>> > What does MSVC++ say about sizeof(long double) vs sizeof(double)?

>
>> I asked My Second Virtual Cousin (twice added), and he said nothing! <G>

>
>> In Assembler, I used to use eight-byte(dd) and ten-byte(dt) types. That's
>> not even close to "twice the size" (If we're talking number of bits).
>> [ assembler == a386 ]

>
> And g++ on a PC, at least in some configurations, uses 8 and 12
> bytes.
>
> At the hardware level, there are 10 bytes of information in an
> Intel long double.

True, with some additional considerations. The commonly used IEEE 754
floating point formats are

single precision: 32 bits including 1 sign bit, 23 significand bits
(with an implicit leading 1, for 24 total), and 8 exponent bits

double precision: 64 bits including 1 sign bit, 52 significand bits
(with an implicit leading 1, for 53 total), and 11 exponent bits

double extended precision: 80 bits including 1 sign bit, 64 significand
bits (no implicit leading 1), and 15 exponent bits.

The native format of the x87 FPU is "double extended". We run into this
from time to time when floating-point computations compiled with more
optimization give slightly different results. The issue is that one
optimization is to hold intermediate results in 80-bit FPU registers
instead of rounding them down to fit in 64-bit memory locations. gcc
offers the -ffloat-store option to suppress this optimization for that
very reason.

In addition, the x87 control register allows one to select between
extended (the default), double and single precision. Try this (on
Linux/gcc/glibc), for example:

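```c
#include <fpu_control.h>

/* Switch the x87 precision-control field to double (53-bit) precision.
   _FPU_EXTENDED covers both precision-control bits, so masking it out
   clears the field before _FPU_DOUBLE is set. */
void set_double(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
    _FPU_SETCW(cw);
}
```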

and similarly for "set_extended". Note that this will make all kinds of
trouble for you because libm depends on having the FPU in extended
precision mode.

Now comes the real kicker: the SSE/SSE2/SSE3 vector co-processors do not
support double extended precision; only single and double precision.
gcc and icc are both using these co-processors pretty extensively now,
so you're not guaranteed to get even intermediate results done in
double-extended arithmetic.

Going back on-topic: if you want to peek at the binary representations
of IEEE floating-point numbers, you might enjoy the template included
below. I supply typedefs for single and double precision; writing the
typedef for double extended precision is left as an exercise to the
reader.
```cpp
// IEEE Floating-Point template
// Copyright (C) 2007 Charles M. "Chip" Coldwell <>

// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.

// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.

// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.

#ifndef IEEEFLOAT_HH
#define IEEEFLOAT_HH

#include <cstring>

// Bit-field view of an IEEE 754 value. Bit-field order within a word is
// implementation-defined, so the layout below must be verified for each
// compiler. Template parameter names avoid the reserved _Uppercase form.
template <typename FloatT, typename SIntT, typename UIntT, int MBits, int EBits>
class ieee_float {
public:
    typedef UIntT uint_t;
    typedef SIntT sint_t;
    typedef FloatT float_t;
    enum { mbits = MBits, ebits = EBits };

#ifdef __BIG_ENDIAN__
    uint_t s:1;
    uint_t e:ebits;
    uint_t m:mbits;
#else
    uint_t m:mbits;
    uint_t e:ebits;
    uint_t s:1;
#endif

    static const uint_t mdenom = ((uint_t)1 << mbits);
    static const uint_t ebias = ((uint_t)1 << (ebits - 1)) - 1;

    sint_t sign(void) const { return s ? -1 : 1; }
    // Unbiased exponent; zero and denormals use the minimum exponent 1 - bias.
    sint_t exponent(void) const { return e ? sint_t(e) - sint_t(ebias) : sint_t(1) - sint_t(ebias); }
    uint_t mantissa(void) const { return ((uint_t)(!!e) << mbits) | m; }
    bool infinity(void) const { return (e == ((1 << ebits) - 1)) && (m == 0); }
    bool nan(void) const { return (e == ((1 << ebits) - 1)) && (m != 0); }
    bool denormal(void) const { return (e == 0) && (m != 0); }

    // std::memcpy copies the bytes without the strict-aliasing violation
    // of casting the object pointer with reinterpret_cast.
    ieee_float(float_t f) { std::memcpy(this, &f, sizeof f); }
    ieee_float(uint_t u) { std::memcpy(this, &u, sizeof u); }

    operator float_t() const { float_t f; std::memcpy(&f, this, sizeof f); return f; }
    operator uint_t() const { uint_t u; std::memcpy(&u, this, sizeof u); return u; }
};

typedef ieee_float<float, int, unsigned, 23, 8>
        single_precision;
typedef ieee_float<double, long long, unsigned long long, 52, 11>
        double_precision;

// The bit-fields must pack exactly into the floating-point type's size.
static_assert(sizeof(single_precision) == sizeof(float), "unexpected padding");
static_assert(sizeof(double_precision) == sizeof(double), "unexpected padding");

#endif
```

Chip

--
Charles M. "Chip" Coldwell