long double precision

vi

Hello
I have a question concerning the precision of long double. It may
be a stupid question; I apologize if it is so.

Here is a piece of code:


#include <iomanip>
#include <iostream>
using namespace::std;
int main () {
long double toto=0.123456789123456789123456789123456789;
cout << sizeof(long double) << endl;
cout << setprecision(21) << toto << endl;
double titi=0.123456789123456789123456789123456789;
cout << sizeof(double) << endl;
cout << setprecision(21) << titi << endl;
return 0;
}

and the result
16
0.1234567891234567838
8
0.1234567891234567838

I don't understand why long double and double have the same precision
in the output.
They seem to be different in memory, so does the problem come from the
initialisation or from the writing to the output?

Thanks in advance for your reply,
 
Markus Moll

Hi
vi said:
long double toto=0.123456789123456789123456789123456789;

The above literal is a double literal. Therefore, the same value is assigned
to both toto and titi.

Use 0.123456...L or 0.123456...l to denote that the literal is a long double
literal (unlike with integers, the type to be chosen is not immediately
clear. 0.1 is likely not representable in any of the floating point types,
but you would expect its type to be double, not the type with the greatest
precision).

Markus
 
Victor Bazarov

Markus said:
The above literal is a double literal. Therefore, the same value is
assigned to both toto and titi.

Use 0.123456...L or 0.123456...l to denote that the literal is a long
double literal (unlike with integers, the type to be chosen is not
immediately clear. 0.1 is likely not representable in any of the
floating point types, but you would expect its type to be double, not
the type with the greatest precision).

The problem may actually be simpler: the Standard does not guarantee
that 'long double' has more precision than 'double'. With Microsoft
Visual C++ on Windows, for example, the two types have the same precision.

V
 
Markus Moll

Hi

Victor said:
The problem may actually be simpler: the Standard does not guarantee
that 'long double' has more precision than 'double'. BTW, it is the
case with Microsoft Visual C++ on Windows, for example.

Phew... as the OP said that his long double was twice the size of a double,
I assumed that the precision would also be greater. However, of course it's
possible that all the space is wasted or used for the exponent (or for
redundant sign-bits for error-correction or something like this :p)

What does MSVC++ say about sizeof(long double) vs sizeof(double)?

Markus
 
vi

Hello
Great, it works with L!
Thanks


 
Juha Nieminen

Victor said:
Markus said:
[..]
What does MSVC++ say about sizeof(long double) vs sizeof(double)?

8 vs 8

MSVC++ has all kinds of odd settings which are standard-conforming, but
different from any other compiler. Another one is that, if I'm not mistaken,
long is 32 bits (sizeof(long) == 4) even on 64-bit platforms when compiling
a 64-bit binary. (So if you ever programmed assuming 'long' will be 64 bits
on a 64-bit system, then you are in for a surprise.)
Makes one wonder how you seek in a file larger than 4GB, given that fseek
takes a long as parameter.

(Btw, *why* does it take a long as parameter? Shouldn't it take
size_t? It's not like what MSVC++ does is wrong or against the standard.
It just makes it impossible to seek in large files with standard code.)
 
Victor Bazarov

Juha said:
Victor said:
Markus said:
[..]
What does MSVC++ say about sizeof(long double) vs sizeof(double)?

8 vs 8

MSVC++ has all kinds of odd settings which are standard-conforming, but
different from any other compiler. Another one is that, if I'm not
mistaken, long is 32 bits (sizeof(long) == 4) even on 64-bit platforms
when compiling a 64-bit binary. (So if you ever programmed assuming 'long'
will be 64 bits on a 64-bit system, then you are in for a surprise.)
Makes one wonder how you seek a file larger than 4GB,

2GB, actually. 'long' is signed, the largest value is 2^31-1. You
must be thinking 'unsigned long', but that's not what 'fseek' is
taking (as you correctly pointed out).
given that
fseek takes a long as parameter.

(Btw, *why* does it take a long as parameter? Shouldn't it take
size_t? It's not like what MSVC++ does is wrong or against the
standard. It just makes it impossible to seek large files with
standard code.)

(a) It takes 'long' because when C Library was standardised (1989)
there was no concern probably with the files larger than what 'long'
can service, and besides, as the files grow, so will 'long', right?
[Well, Microsoft told them all, didn't it?] (b) If you need to seek
in files larger than 'long' allows, use either 'fsetpos' or some OS
specific means. (c) size_t is not a very suitable thing for that,
since 'size_t' is for the sizes of objects. I would rather think
that 'ptrdiff_t' is a better choice. (d) Don't use C Library for
file I/O, use C++ Library, there you'll deal with the special type
for the position, 'std::basic_streambuf::pos_type'. And if it's
not large enough, complain to the compiler vendor.

V
 
BobR

Markus Moll wrote in message...
Victor said:
[snip]
The problem may actually be simpler: the Standard does not guarantee
that 'long double' has more precision than 'double'. BTW, it is the
case with Microsoft Visual C++ on Windows, for example.

Phew... as the OP said that his long double was twice the size of a double,
I assumed that the precision would also be greater. However, of course it's
possible that all the space is wasted or used for the exponent (or for
redundant sign-bits for error-correction or something like this :p)

#include <iostream>
#include <limits>

int main() {
    std::cout << " dbl digits ="
              << std::numeric_limits<double>::digits << std::endl;
    std::cout << " LD digits ="
              << std::numeric_limits<long double>::digits << std::endl;
}

/* - output - (GCC(MinGW), win98se)
dbl digits =53
LD digits =64
*/
See what you get from those lines.
What does MSVC++ say about sizeof(long double) vs sizeof(double)?

I asked My Second Virtual Cousin (twice added), and he said nothing! <G>

In Assembler, I used to use eight-byte(dd) and ten-byte(dt) types. That's
not even close to "twice the size" (If we're talking number of bits).
[ assembler == a386 ]
 
James Kanze

Victor Bazarov wrote:
(a) It takes 'long' because when C Library was standardised (1989)
there was no concern probably with the files larger than what 'long'
can service, and besides, as the files grow, so will 'long', right?

I don't think that's true. It's been a while, and maybe I'm
remembering wrong, but I think the problem with using long was
known already back then. I *think* (that is, I'm far from sure)
that the "answer" was supposed to be fgetpos and fsetpos; fseek,
with long, was maintained for reasons of compatibility with
existing code.

Whatever the case, fsetpos and fgetpos didn't take; people
continued using fseek. And C++ went in yet another direction,
and ended up requiring the impossible in the standard. (The
standard requires round-trip conversions between streamoff and
streampos, but it also requires streampos to contain more
information.)

IMHO, the real problem is more fundamental: text files and seek
simply don't mix, and any attempts by the standard to make it
work are bound to have problems. C (and indirectly C++) sort of
addresses those problems by limiting the possibilities of
seeking in a file opened in text mode. The fact that filebuf
does code translation even in binary mode reintroduces them in
C++. And somewhere in all that, implementations seem to have
forgotten that neither streampos nor streamoff are required to
be integral types. (Or perhaps rather, they don't dare change
them from their historical types for fear of breaking existing
code.)

With regards to size_t: size_t is related to memory size or
addressability, not file size: there's certainly nothing
impossible about a 16 bit system allowing files of more than 4
GB. Posix uses off_t in its standard (but requires it to be an
integral type---of course, Posix systems have to support long
long as well). The logical solution is a different type(def).
Like in fgetpos and fsetpos.
[Well, Microsoft told them all, didn't it?] (b) If you need to seek
in files larger than 'long' allows, use either 'fsetpos' or some OS
specific means. (c) size_t is not a very suitable thing for that,
since 'size_t' is for the sizes of objects. I would rather think
that 'ptrdiff_t' is a better choice. (d) Don't use C Library for
file I/O, use C++ Library, there you'll deal with the special type
for the position, 'std::basic_streambuf::pos_type'. And if it's
not large enough, complain to the compiler vendor.

Who also has to deal with existing code:). How many times have
we seen people implicitly converting streampos (i.e.
std::streambuf::pos_type) to some integral type?

Systems have the same problem, with regards to existing code,
and Sun, for example, offers three or four different options to
handle it at the Posix level (not all of which are strictly
Posix conformant, obviously).
 
James Kanze

Markus Moll wrote in message...
I asked My Second Virtual Cousin (twice added), and he said nothing! <G>
In Assembler, I used to use eight-byte(dd) and ten-byte(dt) types. That's
not even close to "twice the size" (If we're talking number of bits).
[ assembler == a386 ]

And g++ on a PC, at least in some configurations, uses 8 and 12
bytes.

At the hardware level, there are 10 bytes of information in an
Intel long double. But 10 bytes results in some awkward
alignments. Microsoft, from what I understand, punts on the
question, by ignoring the hardware long double type (which is
conforming, if not very useful). G++ originally chose to use 12
bytes (with 2 garbage bytes) according to alignment
considerations on some older machines: with modern hardware,
unless you have 16-byte alignment, you might as well go with
10. (I'm not sure, but there may also be options to control
this.)

Once again, there is no right answer.
 
Charles Coldwell

James Kanze said:
Markus Moll wrote in message...
I asked My Second Virtual Cousin (twice added), and he said nothing! <G>
In Assembler, I used to use eight-byte(dd) and ten-byte(dt) types. That's
not even close to "twice the size" (If we're talking number of bits).
[ assembler == a386 ]

And g++ on a PC, at least in some configurations, uses 8 and 12
bytes.

At the hardware level, there are 10 bytes of information in an
Intel long double.

True, with some additional considerations. The commonly used IEEE 754
floating point formats are

single precision: 32 bits including 1 sign bit, 23 significand bits
(with an implicit leading 1, for 24 total), and 8 exponent bits

double precision: 64 bits including 1 sign bit, 52 significand bits
(with an implicit leading 1, for 53 total), and 11 exponent bits

double extended precision: 80 bits including 1 sign bit, 64 significand
bits (no implicit leading 1), and 15 exponent bits.

The native format of the x87 FPU is "double extended". We run into this
from time to time when floating point computations compiled with more
optimization give slightly different results. The issue is that one
optimization is to hold intermediate results in 80-bit FPU registers
instead of rounding them down to fit in 64-bit memory locations. gcc
offers the -ffloat-store option to suppress this optimization for that
very reason.

In addition, the x87 control register allows one to select between
extended (the default), double and single precision. Try this (on
Linux/gcc/glibc), for example:

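#include <fpu_control.h>

/* Switch the x87 FPU from extended to double precision. */
void set_double(void)
{
    fpu_control_t cw;
    _FPU_GETCW(cw);
    /* Clear the precision-control bits, then select double. */
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
    _FPU_SETCW(cw);
}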

and similarly for "set_extended". Note that this will make all kinds of
trouble for you because libm depends on having the FPU in extended
precision mode.

Now comes the real kicker: the SSE/SSE2/SSE3 vector co-processors do not
support double extended precision; only single and double precision.
gcc and icc are both using these co-processors pretty extensively now,
so you're not guaranteed to get even intermediate results done in
double-extended arithmetic.

Going back on-topic, if you want to peek at the binary representations
of IEEE floating-point numbers, you might enjoy the template included
below. I supply typedefs for single and double precision, writing the
typedef for double extended precision is left as an exercise to the
reader.

// IEEE Floating-Point template
// Copyright (C) 2007 Charles M. "Chip" Coldwell <[email protected]>

// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.

// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.

// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.

#ifndef IEEEFLOAT_HH
#define IEEEFLOAT_HH

template
<typename _float_t, typename _sint_t, typename _uint_t, int _mbits, int _ebits>
class ieee_float {
public:
    typedef _uint_t uint_t;
    typedef _sint_t sint_t;
    typedef _float_t float_t;
    enum { mbits = _mbits, ebits = _ebits };

#ifdef __BIG_ENDIAN__
    uint_t s:1;
    uint_t e:ebits;
    uint_t m:mbits;
#else
    uint_t m:mbits;
    uint_t e:ebits;
    uint_t s:1;
#endif

    static const uint_t mdenom = ((uint_t)1 << mbits);
    static const uint_t ebias = ((uint_t)1 << (ebits - 1)) - 1;

    sint_t sign(void) const { return 1 - 2*s; }
    sint_t exponent(void) const { return e ? e - ebias : -(ebias - 1); }
    uint_t mantissa(void) const { return ((uint_t)(!!e) << mbits) | m; }
    bool infinity(void) const { return (e == ((1 << ebits) - 1)) && (m == 0); }
    bool nan(void) const { return (e == ((1 << ebits) - 1)) && (m != 0); }
    bool denormal(void) const { return (e == 0) && (m != 0); }

    ieee_float(float_t f) { *reinterpret_cast<float_t *>(this) = f; }
    ieee_float(uint_t u) { *reinterpret_cast<uint_t *>(this) = u; }

    operator float_t() const { return *reinterpret_cast<const float_t *>(this); }
    operator uint_t() const { return *reinterpret_cast<const uint_t *>(this); }
};

typedef ieee_float<float, int, unsigned, 23, 8>
    single_precision;
typedef ieee_float<double, long long, unsigned long long, 52, 11>
    double_precision;

#endif

Chip
 
