floating point conversions && how to read standards

Stanley Rice

Hi all, I got confused while reading 4.8/1 in draft n3242.

4.8 Floating point conversions
A prvalue of floating point type can be converted to a prvalue of
another floating point type. If the source value can be exactly
represented in the destination type, the result of the conversion is
that exact representation. If the source value is between two
adjacent destination values, the result of the conversion is an
implementation-defined choice of either of those values. Otherwise,
the behavior is undefined.

I do not know exactly what 'destination type' and 'two adjacent
destination values' mean. Could you explain them and show me an
example if possible?

What's more, I find reading the standard quite difficult. There
are so many new terms that I have never heard before. I have to keep
cross-referencing, and in the process I forget where I started. Could
anyone experienced give me some suggestions on how to read
standards?
 
Juha Nieminen

A variable of type 'float' is (usually) smaller than one of type 'double'
(which means that fewer bits are reserved for the mantissa and for the
exponent in the former than in the latter; fewer bits in the mantissa
means in practice that the values that can be represented are less
precise, as not as many digits can be stored in the variable).

Since a 'float' has less precision than a 'double', there are many
values of type 'double' that cannot be exactly represented with a
'float'. Such a value lies between two adjacent values that can be
represented with 'float'.

Thus if you convert such a value of type 'double' to type 'float',
it's up to the implementation to decide whether to round it down or
up to one of those two neighbouring representable values.
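
As a rough illustration of the paragraphs above, here is a minimal C++ sketch that finds the two adjacent 'float' values around a 'double' and shows which one the implementation picked. The names FloatNeighbors, bracket and print_bracket are made up for this example; std::nextafter and std::numeric_limits are standard library facilities. It assumes the source value lies within the finite range of 'float', since otherwise the conversion falls into the "behavior is undefined" case of the quoted paragraph.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <limits>

// The two float values that bracket a double, plus the one the
// implementation actually picked when converting.
struct FloatNeighbors {
    float lower;   // largest float that is <= source
    float upper;   // smallest float that is >= source
    float chosen;  // result of the double -> float conversion
};

// Hypothetical helper: finds the "two adjacent destination values".
FloatNeighbors bracket(double source) {
    // Outside float's finite range the conversion is the "otherwise"
    // case of 4.8/1, so this sketch refuses such inputs.
    assert(std::fabs(source) <= std::numeric_limits<float>::max());

    const float chosen = static_cast<float>(source);  // the 4.8/1 conversion
    const float inf = std::numeric_limits<float>::infinity();
    FloatNeighbors n{chosen, chosen, chosen};  // exact case: all three equal
    if (static_cast<double>(chosen) > source) {
        // The implementation rounded up, so the other neighbour is below.
        n.lower = std::nextafter(chosen, -inf);
    } else if (static_cast<double>(chosen) < source) {
        // The implementation rounded down, so the other neighbour is above.
        n.upper = std::nextafter(chosen, inf);
    }
    return n;
}

// Prints the bracket for one value, for example print_bracket(0.1).
void print_bracket(double source) {
    const FloatNeighbors n = bracket(source);
    std::printf("source=%.17g lower=%.9g upper=%.9g chosen=%.9g\n",
                source, n.lower, n.upper, n.chosen);
}
```

Calling print_bracket(0.1) from main shows that 0.1 as a 'double' sits strictly between two 'float' values, while print_bracket(0.5) reports the same value three times because 0.5 is exactly representable in both types.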
 
Bart van Ingen Schenau

Let's assume, for the purpose of this explanation, that you have a system
where double can represent 3 decimal digits of precision and float can
represent 2 decimal digits of precision.
On such a system, if you have the initialisation

float pi = 3.14;

the constant 3.14 (of type double) has to be converted to type float.
This makes float the 'destination type' of clause 4.8.
As float cannot represent the value 3.14 exactly (given that float can only
represent 2 digits), the compiler has to round the value to one that can be
represented in float. This would be either 3.1f or 3.2f, which are the two
nearest representable values, i.e. the 'two adjacent destination values'.
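
The toy system above can be sketched in a few lines of C++. This is only a model of the hypothetical 3-digit/2-digit machine, not of any real floating point format: values are stored as integer hundredths so that the arithmetic stays exact, and the names ToyNeighbors, toy_adjacent and print_toy are invented for the illustration.

```cpp
#include <cassert>
#include <cstdio>

// Toy model of the hypothetical system: values are stored as integer
// hundredths, so the 3-digit "double" 3.14 is 314. The 2-digit "float"
// can only hold multiples of 10 hundredths (3.1 is 310, 3.2 is 320).
struct ToyNeighbors {
    int lower;  // largest 2-digit value <= source, in hundredths
    int upper;  // smallest 2-digit value >= source, in hundredths
};

ToyNeighbors toy_adjacent(int hundredths) {
    // Only d.dd values with a nonzero leading digit are modelled.
    assert(hundredths >= 100 && hundredths <= 999);
    const int lower = hundredths / 10 * 10;  // drop the third digit
    const int upper = (hundredths % 10 == 0) ? lower : lower + 10;
    return ToyNeighbors{lower, upper};
}

// Prints the source value and its two candidate results.
void print_toy(int hundredths) {
    const ToyNeighbors n = toy_adjacent(hundredths);
    std::printf("%d.%02d -> %d.%d or %d.%d\n",
                hundredths / 100, hundredths % 100,
                n.lower / 100, n.lower % 100 / 10,
                n.upper / 100, n.upper % 100 / 10);
}
```

Calling print_toy(314) prints "3.14 -> 3.1 or 3.2". Which of the two candidates the conversion actually yields is the implementation-defined part.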

Reading standards is not for the faint of heart. They are not meant as a
teaching aid, but read more like a legal text.
The only advice I can give is to keep practicing. With practice, the common
terms and the phraseology become more familiar.

Bart v Ingen Schenau
 
Stanley Rice

Thanks for your reply; I get the idea now. I think I was stuck on
the phrase "floating point types" at first. "Floating point types"
means the types that can hold values with a decimal point, and it
includes both type *float* and type *double*, right? It is not just
the type *float*.

Similarly, "unsigned integer type" covers any of *unsigned char*,
*unsigned int*, *unsigned long*, etc. It is not just the type
*unsigned int*, right?
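
One way to check this reading is with the standard <type_traits> header. The sketch below is only an approximation: the two helper names are invented for the example, and std::is_unsigned is also true for bool (and for char on platforms where char is unsigned), so it is somewhat broader than the standard's own list of "unsigned integer types".

```cpp
#include <type_traits>

// Hypothetical helper: true for every floating point type.
template <typename T>
constexpr bool is_floating_point_type() {
    return std::is_floating_point<T>::value;
}

// Hypothetical helper: true for integral types without a sign.
// Note: this also accepts bool, which the standard does not list
// among the unsigned integer types.
template <typename T>
constexpr bool is_unsigned_integral_type() {
    return std::is_integral<T>::value && std::is_unsigned<T>::value;
}

// "Floating point types" is a category, not just the type float.
static_assert(is_floating_point_type<float>(), "float");
static_assert(is_floating_point_type<double>(), "double");
static_assert(is_floating_point_type<long double>(), "long double");
static_assert(!is_floating_point_type<int>(), "int is not floating point");

// Likewise, several types are unsigned integer types.
static_assert(is_unsigned_integral_type<unsigned char>(), "unsigned char");
static_assert(is_unsigned_integral_type<unsigned int>(), "unsigned int");
static_assert(is_unsigned_integral_type<unsigned long>(), "unsigned long");
static_assert(!is_unsigned_integral_type<int>(), "int is signed");
```

If this translation unit compiles, every assertion holds, which matches the reading that both terms name a family of types.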

I am not a native English speaker, and I am eager to get familiar with
the standard. Thanks again for your reply.
 
Stanley Rice

Many thanks for your detailed explanation. There is no way around it
but to keep reading. Practice makes perfect. I suppose I will keep
posting questions here, and I hope you will be generous enough to help me.
 
