float to string to float, with first float == second float

Discussion in 'C++' started by Carsten Fuchs, Oct 6, 2009.

  1. Dear group,

    I would like to serialize a float f1 to a string s, then unserialize s back to a float f2 again,
    such that:

    * s is minimal (use only the precision that is required)
    and preferably in decimal notation,
    * f1==f2 (exact same value after the roundtrip)

    (The first property is for human readers and file size, the second is for data integrity.)
    Here is my implementation:


    #include <sstream>
    #include <string>

    std::string serialize(float f1)
    {
        std::string s;

        for (unsigned int prec=6; prec<10; prec++)
        {
            std::stringstream ss;

            ss.precision(prec);
            ss << f1;

            s=ss.str();

            float f2;
            ss >> f2;

            if (f2==f1) break;
        }

        return s;
    }


    It seems to work very well, and I found that in my application 65% to 90% of the float numbers
    serialized this way exit the loop in the first iteration.

    However, I was wondering if there is a shorter and/or more elegant way of implementing this,
    using either C++ streams, C printf-functions, or any other technique available in C++.
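For comparison, here is a minimal sketch of the same loop written with the C functions mentioned above (`snprintf` and `strtof`). It assumes C++11 (for `std::numeric_limits<float>::max_digits10`, which is 9 for IEEE 754 single precision) and a C library whose decimal conversions are correctly rounded. Like the original, it starts at precision 6, so typical values stay in plain decimal notation, and it tests the round trip with `==`.

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Sketch: tries "%.6g", "%.7g", ... up to max_digits10 and stops at the
// first string that strtof() converts back to exactly f1. With 9
// significant digits every finite float round-trips, so for finite input
// the loop always ends on a usable string; a NaN never compares equal to
// itself and simply comes out with the maximum precision.
std::string serialize(float f1)
{
    const int max_prec = std::numeric_limits<float>::max_digits10;
    char buf[32] = "";

    for (int prec = 6; prec <= max_prec; ++prec)
    {
        // %g drops trailing zeros, so 0.5f prints as "0.5", not "0.500000".
        std::snprintf(buf, sizeof buf, "%.*g", prec, static_cast<double>(f1));

        if (std::strtof(buf, nullptr) == f1)
            break;
    }

    return buf;
}
```

Using `snprintf` into a small buffer avoids constructing a new `std::stringstream` on every iteration; the logic is otherwise the same as the stream version.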

    I'd be very grateful for any feedback and advice!

    Thank you very much, and best regards,
    Carsten



    --
    Cafu - The Game and Graphics Engine for
    multiplayer, cross-platform, real-time 3D Action
    Learn more at www.cafu.de
    Carsten Fuchs, Oct 6, 2009
    #1

  2. On 6 Ott, 11:16, Carsten Fuchs <> wrote:
    > [...]
    > However, I was wondering if there is a shorter and/or more elegant way of implementing this,
    > using either C++ streams, C printf-functions, or any other technique available in C++.


    There are a couple of problems with your code.

    First, you should never compare floats or doubles for equality that
    way.

    Secondarily, extracting (>>) non-character data (integers, doubles and
    so on) from streams can hang.

    There is another thread running more or less on the same issues. I
    cannot add further details without spoiling the other one, consider
    having a look there (the topic is: "Good way to read tuple or tripel
    from text file?".)

    You can also reach a lot of useful information from the last two links
    of my signature.

    Cheers,
    Francesco

    --
    Francesco S. Carta, http://fscode.altervista.org
    First time here? Read the 'Welcome' and the 'FAQ'
    Welcome: http://www.slack.net/~shiva/welcome.txt
    C++ FAQ: http://www.parashift.com/c++-faq-lite
    Francesco S. Carta, Oct 6, 2009
    #2

  3. On 6 Ott, 11:34, "Francesco S. Carta" <> wrote:
    > On 6 Ott, 11:16, Carsten Fuchs <> wrote:
    > [...]
    > Secondarily, extracting (>>) non-character data (integers, doubles and
    > so on) from streams can hang.


    I noticed that I misused the word "hang": I meant to say that it can
    misbehave. Of course, this wasn't the case with your code, but I had
    to mention it.
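What "misbehave" can mean in practice: a plain `ss >> f2` fails quietly on text such as "abc", and depending on the library it may accept just the leading "1.5" of "1.5abc" and leave the rest unread. A hypothetical checked counterpart (the name `deserialize` is illustrative, not from the thread) accepts the string only if extraction succeeds and nothing but whitespace follows the number:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Checked parse: succeeds only if the whole string (ignoring surrounding
// whitespace) is a single float. On failure, out is left unchanged.
bool deserialize(const std::string& s, float& out)
{
    std::istringstream ss(s);
    float f = 0.0f;
    if (!(ss >> f))
        return false;      // no number at all, or out of range for float
    ss >> std::ws;         // skip trailing whitespace
    if (!ss.eof())
        return false;      // trailing characters, e.g. "1.5abc"
    out = f;
    return true;
}
```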

    Francesco S. Carta, Oct 6, 2009
    #3
  4. Francesco S. Carta wrote:
    > On 6 Ott, 11:16, Carsten Fuchs <> wrote:
    >> [...]
    >
    > There are a couple of problems with your code.
    >
    > First, you should never compare floats or doubles for equality that
    > way.


    Why not? One of the requirements is "exact same value after the roundtrip".
    What is a better way to compare for exact same value?
    Fred Zwarts, Oct 6, 2009
    #4
  5. On 6 Ott, 11:55, "Fred Zwarts" <> wrote:
    > Francesco S. Carta wrote:
    > > On 6 Ott, 11:16, Carsten Fuchs <> wrote:
    > >> [...]
    >
    > > There are a couple of problems with your code.
    >
    > > First, you should never compare floats or doubles for equality that
    > > way.
    >
    > Why not? One of the requirements is "exact same value after the roundtrip".
    > What is a better way to compare for exact same value?


    The equality operator applied to floating point types simply isn't
    affordable.
    It may work in such a particular case, but you're not guaranteed it
    will.

    Built-in floating point types simply don't give such kind of
    certainty, hence I think there is no unequivocal answer to the
    question you're asking me.

    Francesco S. Carta, Oct 6, 2009
    #5
  6. Francesco S. Carta wrote:
    > On 6 Ott, 11:55, "Fred Zwarts" <> wrote:
    >> Francesco S. Carta wrote:
    >>> On 6 Ott, 11:16, Carsten Fuchs <> wrote:
    >>>> [...]
    >>
    >>> There are a couple of problems with your code.
    >>
    >>> First, you should never compare floats or doubles for equality that
    >>> way.
    >>
    >> Why not? One of the requirements is "exact same value after the
    >> roundtrip". What is a better way to compare for exact same value?
    >
    > The equality operator applied to floating point types simply isn't
    > affordable.
    > It may work in such a particular case, but you're not guaranteed it
    > will.

    What do you mean? Would it give wrong results?
    Return false if the operands are equal
    or return true if operands are different?

    > Built-in floating point types simply don't give such kind of
    > certainty, hence I think there is no unequivocal answer to the
    > question you're asking me.


    As far as I know floating point types are deterministic well defined
    types with binary representations for, exactly what it says, floating point
    values (not to be mixed with the notion of mathematical real numbers).
    The outcome of operations of floating point types can be
    predicted with 100% certainty and 100% accuracy.
    Also the equality operator is well defined
    If you know how floating point types work,
    the equality operator can be used very well.
    If you do not understand how they work,
    the equality operation may give some surprises,
    but don't blame the equality operator.
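Fred's point can be made concrete. For IEEE 754 values, `==` is an exact comparison, which is what the round-trip test needs; it disagrees with bit-for-bit identity only for signed zeros (`0.0f == -0.0f`) and for a NaN compared with itself (always unequal). The sketch below assumes IEEE 754 single precision in 32 bits; the name `same_bits` is illustrative.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes IEEE 754 single precision in 32 bits");

// True if a and b have exactly the same bit pattern. This disagrees with
// a == b only for +0.0f versus -0.0f (equal values, different bits) and
// for a NaN compared with a bit-identical NaN (unequal values, same bits).
bool same_bits(float a, float b)
{
    std::uint32_t ua = 0;
    std::uint32_t ub = 0;
    std::memcpy(&ua, &a, sizeof a); // memcpy avoids the aliasing problems of pointer casts
    std::memcpy(&ub, &b, sizeof b);
    return ua == ub;
}
```

For the round trip in the original post, `==` is therefore the right test unless the sign of zero or NaN payloads must also survive.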
    Fred Zwarts, Oct 6, 2009
    #6
  7. On 6 Okt, 11:16, Carsten Fuchs <> wrote:
    > Dear group,
    >
    > I would like to serialize a float f1 to a string s, then unserialize s back to a float f2 again,
    > such that:
    >
    >         * s is minimal (use only the precision that is required)
    >           and preferably in decimal notation,
    >         * f1==f2 (exact same value after the roundtrip)
    >
    > (The first property is for human readers and file size, the second is for data integrity.)


    You should choose between *either* human readability *or*
    data integrity. You can't have both.

    To demonstrate the point, try

    -----------
    #include <iostream>

    int main()
    {
        float a=0.3;
        double b = a;

        std::cout << "a = " << a << std::endl;
        std::cout.precision(12);
        std::cout << "b = " << b << std::endl;

        return 0;
    }
    -----------

    Output:

    a = 0.3
    b = 0.300000011921

    There are two problems here:

    1) Numbers that are exact in decimal notation
    have no exact floating-point representation,
    only an approximation.
    2) One cannot tell the 'true' number of bits or
    digits needed to represent any one number.

    Whenever approximations occur (0.3 can't be represented
    exactly, only approximately, as single precision floating
    point), approximation errors creep in. Whenever the
    accuracy of the representation is expanded, those errors
    cannot be avoided. It doesn't matter in the case above
    that 0.3 on double precision format can be represented
    correctly to some 14 significant digits: Since the numeric
    value is constrained by the original single precision format,
    b is incorrect from the 8th significant digit onwards.

    This is the game you are playing when you ask that
    the precision is no larger than 'necessary'.

    Some arguments why *not* to do what you want:

    1) If data integrity is a priority, store the data in a
    binary file format: While imperfect, the binary format
    is consistent. Of course, if the usual number formats
    are unacceptable to you, you could use some binary encoded
    decimal format.
    2) If size is a priority, store the data in a binary file
    format: Single precision floating point numbers require
    32 bits of storage on most platforms available today.
    A text-based format requires some 8-12 8-bit ASCII characters.
    Only a small number of such characters are actually used
    (10 digits, 1 decimal separator, 2 signs, 1-2 field separators),
    giving an entropy in the neighbourhood of 4 bits per character,
    which means the naive text-based format is vastly inefficient,
    files being about a factor 2 larger than necessary.
    3) If speed is a priority, use a binary file format: It takes a
    long time to convert numerical data back and forth between text
    formats and internal binary formats. Expect 30-60 seconds per
    100 MBytes of text-format numerals.
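The binary storage suggested in points 1) to 3) can be sketched as follows, assuming IEEE 754 single precision. The helper names `to_bytes` and `from_bytes` are illustrative; fixing the byte order explicitly (least significant byte first here) keeps the data portable between machines that agree on the float format but not on endianness.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes IEEE 754 single precision");

// Pack a float into 4 bytes, least significant byte first.
std::array<unsigned char, 4> to_bytes(float f)
{
    std::uint32_t u = 0;
    std::memcpy(&u, &f, sizeof u);
    std::array<unsigned char, 4> b{};
    for (int i = 0; i < 4; ++i)
        b[i] = static_cast<unsigned char>((u >> (8 * i)) & 0xFFu);
    return b;
}

// Inverse of to_bytes: reassemble the bit pattern and copy it into a float.
float from_bytes(const std::array<unsigned char, 4>& b)
{
    std::uint32_t u = 0;
    for (int i = 0; i < 4; ++i)
        u |= static_cast<std::uint32_t>(b[i]) << (8 * i);
    float f = 0.0f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

Four bytes per value, and the round trip is exact by construction, since no decimal conversion is involved.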

    Of course, Binary Encoded Decimal formats might be used to
    bring in the human readability aspect, but the question then
    becomes security and consistency: Some time sooner or later
    somebody down the line will forget / disregard / ignore the
    fact that the numbers are BEDs and not standard floats.

    With misery and mayhem to follow.

    Rune
    Rune Allnor, Oct 6, 2009
    #7
  8. On 6 Oct, 12:39, Rune Allnor <> wrote:
    > [...]
    >
    > There are two problems here:
    >
    > 1) Numbers that are exact in decimal notation
    >    have no exact floating-point representation,
    >    only an approximation.


    try 0.5

    :)


    > [...]
    >
    > Of course, Binary Encoded Decimal formats might be used to
    > bring in the human readability aspect, but the question then
    > becomes security and consistency: Some time sooner or later
    > somebody down the line will forget / disregard / ignore the
    > fact that the numbers are BEDs and not standard floats.
    >
    > With misery and mayhem to follow.


    There are hex floating point formats which, I believe,
    preserve precision and human (well, computer programmer) readability.
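Nick's suggestion can be tried with the C99 `%a` conversion, which C++11 also provides through `<cstdio>`: it prints the exact binary value as a hexadecimal significand with a binary exponent, and `strtof` reads it back. The exact spelling varies between C libraries (glibc, for instance, prints 0.1f as `0x1.99999ap-4`). A minimal sketch, with illustrative function names:

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Lossless text form via "%a": hexadecimal significand and binary
// exponent, so no decimal rounding is involved. The float is promoted
// to double for printf, which is exact.
std::string to_hex(float f)
{
    char buf[40];
    std::snprintf(buf, sizeof buf, "%a", static_cast<double>(f));
    return buf;
}

// strtof() accepts hexadecimal floating-point strings (C99, C++11).
float from_hex(const std::string& s)
{
    return std::strtof(s.c_str(), nullptr);
}
```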
    Nick Keighley, Oct 6, 2009
    #8
  9. On 6 Okt, 14:22, Nick Keighley <> wrote:
    > On 6 Oct, 12:39, Rune Allnor <> wrote:
    > > [...]
    >
    > > 1) Numbers that are exact in decimal notation
    > >    have no exact floating-point representation,
    > >    only an approximation.
    >
    > try 0.5
    >
    > :)


    That's one of the few decimal numbers that can be represented
    exactly in binary format. In fact, decimal numbers of the form

    x = 2^n

    where n is selected from a certain subset of integers, can be
    represented exactly, as binary numbers. If the OP can accept
    such a constraint on the decimal numbers he wants to work with,
    then by all means, disregard what I said. But if he wants to
    work with arbitrary numbers, he is in for trouble.

    Rune
    Rune Allnor, Oct 6, 2009
    #9
  10. On 6 Okt., 13:39, Rune Allnor <> wrote:
    >
    > You should choose between *either* human readability *or*
    > data integrity. You can't have both.


    In theory you can have both. The decimal representation just needs to
    be "close enough". A mapping can be inverted (on its image) if it is
    injective, i.e. no two inputs map to the same output. If you generate
    enough decimal digits in the double->string mapping, it will become
    injective.

    In practice I wouldn't know how to implement this. I recently used the
    scientific manipulator in combination with the setprecision
    manipulator and relied on the implementation's quality. As precision I
    used std::numeric_limits<double>::digits10+1, hoping for the best.
    But I don't really expect the roundtrip double->string->double to be
    lossless. A little test suggests that the implementation of libstdc++
    is not bad at all:

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <iomanip>
    #include <limits>
    #include <cstdlib>

    using namespace std;

    string d2s(double x)
    {
        stringstream ss;
        ss << setprecision(numeric_limits<double>::digits10+1);
        ss << scientific;
        ss << x;
        return ss.str();
    }

    double s2d(string s)
    {
        stringstream ss(s);
        double x;
        ss >> x;
        return x;
    }

    int main()
    {
        for (int pass=0; pass<20; ++pass) {
            double x = double(rand())/RAND_MAX + 1.0;
            string s = d2s(x);
            double y = s2d(s);
            std::cout << s << ", delta = " << (x-y) << '\n';
        }
    }

    Output:

    1.0012512588885158e+000, delta = 0
    1.5635853144932401e+000, delta = 0
    1.1933042390209663e+000, delta = 0
    1.8087405011139257e+000, delta = 0
    1.5850093081453902e+000, delta = 0
    1.4798730430005798e+000, delta = 0
    1.3502914517654958e+000, delta = 0
    1.8959624011963256e+000, delta = 0
    1.8228400524918362e+000, delta = 0
    1.7466048158207954e+000, delta = 0
    1.1741080965605639e+000, delta = 0
    1.8589434492019410e+000, delta = 0
    1.7105014191106906e+000, delta = 0
    1.5135349589526048e+000, delta = 0
    1.3039948728904081e+000, delta = 0
    1.0149845881527146e+000, delta = 0
    1.0914029358806117e+000, delta = 0
    1.3644520401623585e+000, delta = 0
    1.1473128452406385e+000, delta = 0
    1.1658986175115207e+000, delta = 0

    If 100% certainty is required you probably have to do it yourself
    (somehow) or store the numbers in binary.
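Since C++11 the number of significant digits that guarantees a lossless round trip is available as `std::numeric_limits<T>::max_digits10` (17 for IEEE double, 9 for float). SG's `digits10+1` combined with `scientific` already yields 17 significant digits, because in scientific notation the precision counts the digits after the decimal point (the output above shows one digit before the point and sixteen after). The guarantee still assumes the library converts correctly rounded in both directions. A sketch of the same pair of functions in the general format, where the bool-returning `s2d` is an illustrative variation that reports parse failures:

```cpp
#include <limits>
#include <sstream>
#include <string>

// max_digits10 significant digits (17 for IEEE double): enough for every
// finite value to round-trip, given correctly rounded conversions.
std::string d2s(double x)
{
    std::ostringstream ss;
    ss.precision(std::numeric_limits<double>::max_digits10);
    ss << x;                                 // general (%g-like) format
    return ss.str();
}

// Returns false instead of silently producing a bogus value.
bool s2d(const std::string& s, double& x)
{
    std::istringstream ss(s);
    return static_cast<bool>(ss >> x);
}
```

The general format also drops trailing zeros, so short values such as 0.5 stay short.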


    Cheers,
    SG
    SG, Oct 6, 2009
    #10
  11. On 6 Ott, 13:18, "Fred Zwarts" <> wrote:
    > Francesco S. Carta wrote:
    > > On 6 Ott, 11:55, "Fred Zwarts" <> wrote:
    > >> [...]
    >
    > > The equality operator applied to floating point types simply isn't
    > > affordable.
    > > It may work in such a particular case, but you're not guaranteed it
    > > will.
    >
    > What do you mean? Would it give wrong results?
    > Return false if the operands are equal
    > or return true if operands are different?
    >
    > > Built-in floating point types simply don't give such kind of
    > > certainty, hence I think there is no unequivocal answer to the
    > > question you're asking me.
    >
    > As far as I know floating point types are deterministic well defined
    > types with binary representations for, exactly what it says, floating point
    > values (not to be mixed with the notion of mathematical real numbers).
    > The outcome of operations of floating point types can be
    > predicted with 100% certainty and 100% accuracy.
    > Also the equality operator is well defined
    > If you know how floating point types work,
    > the equality operator can be used very well.
    > If you do not understand how they work,
    > the equality operation may give some surprises,
    > but don't blame the equality operator.


    You are obviously right; I failed to express myself correctly.

    I meant to write: "It may work in such a particular case, but you're
    not guaranteed it will *in some other*".

    My other paragraph was equally too vague: of course, the equality
    operator works fine and in a well-determined manner.

    My point was just to highlight the "trickery" of dealing with floating
    point values and their comparability in general. See Rune's post here
    for an example, although I'm well aware that you fully know what we
    are speaking about.

    If the OP was already as aware of these issues as we are, I apologize
    for having pointed them out, although I'm happy I did: every time the
    equality operator gets applied to floats, such caveats should be made,
    if only for the occasional reader's sake.

    Thanks for straightening out my bad wording.

    Francesco S. Carta, Oct 6, 2009
    #11
  12. Rune Allnor <> writes:
    > With misery and mayhem to follow.


    However, there's another way to represent the floating point in ASCII
    that would not lose information, would use decimal positional
    representation and would be 'somewhat' readable.

    A floating point number f is represented as:

    f = s * m * B^e

    s being the sign (+1, -1),

    m being the mantissa, usually only the fractional digits, usually in
    base B, but possibly in a base C different from B (eg. some floating
    point formats used B=16 C=2).

    B being the base, and

    e being the exponent.


    m = 0.abc...xyz = abc...xyz * C^-p = M * C^-p

    with:

    M being the digits of the mantissa, expressed as an integer,

    C being the base of the mantissa (usually 2, but could be 10 if BCD
    floating points are used).

    p being the precision of the mantissa, that is the number of digits of
    base C comprising the mantissa.


    Therefore we can write our floating point number f as an expression of
    integers:

    f = s * M * C^-p * B^e

    if C=B then we can even simplify it:

    f = s * M * B^(e-p)


    With a high level programming language, there are primitives allowing
    you to directly convert between f and s, M, B, (e-p):

    (defun print-float (f)
      (multiple-value-bind (M e-p s) (integer-decode-float f)
        (let ((B (float-radix f)))
          (format t "(~A)(~A*~A*~A^~A)" (type-of f) s M B e-p))))

    (defun deserialize-float (type s M B e-p)
      (assert (= B (float-radix (coerce 1.0 type))))
      (* s (scale-float (coerce M type) e-p)))


    C/USER[82]> (PRINT-FLOAT 0.123456)
    (SINGLE-FLOAT)(1*16569984*2^-27)
    NIL
    C/USER[83]> (coerce (* 16569984 (expt 2 -27)) 'single-float)
    0.123456

    C/USER[78]> (PRINT-FLOAT pi)
    (LONG-FLOAT)(1*14488038916154245685*2^-62)
    NIL
    C/USER[79]> (* 14488038916154245685 (expt 2 -62))
    14488038916154245685/4611686018427387904
    C/USER[80]> (coerce (* 14488038916154245685 (expt 2 -62)) 'long-float)
    3.1415926535897932385L0
    C/USER[81]> (= pi (coerce (* 14488038916154245685 (expt 2 -62)) 'long-float))
    T

    C/USER[89]> (deserialize-float 'single-float 1 16569984 2 -27)
    0.123456
    C/USER[90]> (deserialize-float 'long-float 1 14488038916154245685 2 -62)
    3.1415926535897932385L0



    I assume that, knowing your specific machine representation of floating
    point, you could do some bit munging in C++ to implement similar
    functions.
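In C++ the decomposition can in fact be done portably without bit munging: `std::frexp` and `std::ldexp` from `<cmath>` split and rebuild a float much like `integer-decode-float` and `scale-float`. The sketch below assumes a radix-2 float; the `M*2^E` text format and the function names are illustrative.

```cpp
#include <cmath>
#include <cstdio>
#include <limits>
#include <string>

static_assert(std::numeric_limits<float>::radix == 2, "sketch assumes binary floats");

// Write a finite float f as "M*2^E" with integers M and E such that
// f == M * 2^E exactly. (The sign of a negative zero is not preserved.)
std::string encode_float(float f)
{
    int e = 0;
    const float m = std::frexp(f, &e);                  // f == m * 2^e, 0.5 <= |m| < 1
    const int p = std::numeric_limits<float>::digits;   // 24 significand bits
    const long M = static_cast<long>(std::ldexp(m, p)); // exact integer, |M| < 2^24
    return std::to_string(M) + "*2^" + std::to_string(e - p);
}

// Parse "M*2^E" and rebuild the float; returns false on malformed input.
bool decode_float(const std::string& s, float& out)
{
    long M = 0;
    int E = 0;
    if (std::sscanf(s.c_str(), "%ld*2^%d", &M, &E) != 2)
        return false;
    out = std::ldexp(static_cast<float>(M), E);         // exact: the value is representable
    return true;
}
```

For 0.123456f this yields `16569984*2^-27`, the same decomposition as the CLISP session above.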

    --
    __Pascal Bourguignon__
    Pascal J. Bourguignon, Oct 6, 2009
    #12
  13. Dear Francesco,

    thanks for your reply!

    Francesco S. Carta wrote:
    > First, you should never compare floats or doubles for equality that
    > way.


    Why?
    I'm not trying to deal with round-off errors here, but to serialize a float with enough digits to be
    able to restore the exact same value from the string later.
    Can you suggest a working alternative?
    Also please see http://www.open-std.org/JTC1/sc22/wg21/docs/papers/2006/n2005.pdf for some background.

    > Secondarily, extracting (>>) non-character data (integers, doubles and
    > so on) from streams can hang.


    Sure, the check for failure is missing, but besides that, what is the problem?
    What's your suggested better alternative?

    > There is another thread running more or less on the same issues. I
    > cannot add further details without spoiling the other one, consider
    > having a look there (the topic is: "Good way to read tuple or tripel
    > from text file?".)


    Sorry, but the thread you're referring to is about file formats and parsing.
    My question is about serializing float numbers in a way so that unserializing them yields the
    original value.

    > You can also reach a lot of useful information from the last two links
    > of my signature.


    Yes, thanks, I even own (and have read) the book's paper edition.

    Best regards,
    Carsten



    Carsten Fuchs, Oct 6, 2009
    #13
  14. SG Guest

    Rune Allnor <> wrote:
    > > try 0.5
    > > :)

    >
    > That's one of the few decimal numbers that can be represented
    > exactly on binary format. In fact, decimal numbers on the form
    >
    > x = 2^n


    Yes, they can. Not to mention some linear combinations of these numbers,
    as long as their exponents don't differ too much: 10 = 2^3 + 2^1,
    0.75 = 2^-1 + 2^-2, etc.

    So yes, you can't represent every decimal number in binary. But the converse
    is true: every binary floating-point value can be represented exactly
    in decimal with a finite string of digits.
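    This is easy to see: a binary float equals M * 2^-k for integers M and k, and since 2^-k = 5^k / 10^k, its decimal expansion ends after at most k fractional digits. The hypothetical helper below produces those digits by repeatedly multiplying the fractional part by 10. It assumes IEEE single precision and, to stay within 64-bit integers, only handles part of the exponent range:

```cpp
#include <cmath>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: exact decimal expansion of a non-negative float.
// Assumes IEEE single precision (24-bit significand) and only supports values
// x = M * 2^-k with -40 <= k <= 60, so every intermediate fits in uint64_t.
std::string exact_decimal(float x)
{
    if (!(x >= 0.0f) || std::isinf(x))
        throw std::domain_error("finite, non-negative values only");
    if (x == 0.0f)
        return "0";

    int e = 0;
    const float f = std::frexp(x, &e);  // x = f * 2^e, 0.5 <= f < 1
    const std::uint64_t M = static_cast<std::uint64_t>(std::ldexp(f, 24));
    const int k = 24 - e;               // x = M * 2^-k exactly

    if (k <= 0) {                       // x is an integer
        if (k < -40) throw std::domain_error("value too large for this sketch");
        return std::to_string(M << -k);
    }
    if (k > 60) throw std::domain_error("value too small for this sketch");

    const std::uint64_t mask = (std::uint64_t{1} << k) - 1;
    std::string s = std::to_string(M >> k);         // integer part
    std::uint64_t frac = M & mask;                  // fractional part = frac / 2^k
    if (frac != 0) s += '.';
    while (frac != 0) {                             // ends after at most k digits
        frac *= 10;                                 // frac < 2^k, so this stays below 10 * 2^60
        s += static_cast<char>('0' + (frac >> k));  // next decimal digit
        frac &= mask;
    }
    return s;
}
```

    For example, 0.3f expands to 0.300000011920928955078125, the exact value behind the "b = 0.300000011921" output quoted in this thread.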

    Cheers,
    SG
    SG, Oct 6, 2009
    #14
  15. Rune Allnor wrote:
    > On 6 Okt, 14:22, Nick Keighley <>
    > wrote:
    >> On 6 Oct, 12:39, Rune Allnor <> wrote:
    >>
    >>
    >>
    >>
    >>
    >>> On 6 Okt, 11:16, Carsten Fuchs <> wrote:
    >>>> I would like to serialize a float f1 to a string s, then unserialize s back to a float f2
    >>>> again, such that:
    >>>> * s is minimal (use only the precision that is required)
    >>>> and preferably in decimal notation,
    >>>> * f1==f2 (exact same value after the roundtrip)
    >>>> (The first property is for human readers and file size, the second is for data integrity.)
    >>> You should choose between *either* human readability *or*
    >>> data integrity. You can't have both.
    >>> To demonstrate the point, try
    >>> -----------
    >>> #include <iostream>
    >>> int main()
    >>> {
    >>> float a=0.3;
    >>> double b = a;
    >>> std::cout << "a = " << a << std::endl;
    >>> std::cout.precision(12);
    >>> std::cout << "b = " << b << std::endl;
    >>> return 0;}
    >>> -----------
    >>> Output:
    >>> a = 0.3
    >>> b = 0.300000011921
    >>> There are two problems here:
    >>> 1) Numbers that are exact in decimal notation
    >>> have no exact floating-point representation,
    >>> only an approximation.

    >> try 0.5


    Or 0.25, 0.125, 0.0625, etc., or *any combination thereof*. Then factor
    in the exponents.

    >>
    >> :)

    >
    > That's one of the few


    *Few*? You're kidding, of course, aren't you? Using the IEEE
    definition of "single precision float", there are *about* (2^32 - 2^24)
    decimal values that can be represented *exactly*. Each of the other
    values (infinitely many of them, of course) is rounded to one of those
    *more than three billion* representations (my arithmetic may be off a tad).

    > decimal numbers that can be represented
    > exactly on binary format. In fact, decimal numbers on the form
    >
    > x = 2^n
    >
    > where n is selected from a certain subset of integers, can be
    > represented exactly, as binary numbers. If the OP can accept
    > such a constraint on the decimal numbers he wants to work with,
    > then by all means, disregard what I said. But if he wants to
    > work with arbitrary numbers, he is in for trouble.


    No, he is not "in for trouble". He just needs to realise that the computer
    representation of floating-point numbers has limitations, and *stay
    within those limitations*. The computer *will* force the programmer to
    "accept such a constraint". There are well-known ways to overcome
    those, like the use of rational numbers, double precision
    representation, or even arbitrary precision. Those methods
    don't really *eliminate* the limitations, only *reduce* them. There is
    still the limitation of computing power (memory size being the most
    significant one).
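    The count of bit patterns is easy to check, assuming IEEE 754 single precision: whether a pattern is finite depends only on its 8-bit exponent field, so it suffices to test one pattern per exponent value and multiply by 2 signs and 2^23 fraction patterns. The function name is illustrative:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Counts the finite bit patterns of a 32-bit IEEE 754 float.
// Assumes float is binary32: 1 sign bit, 8 exponent bits, 23 fraction bits.
std::uint64_t count_finite_float_patterns()
{
    static_assert(sizeof(float) == 4 && std::numeric_limits<float>::is_iec559,
                  "IEEE 754 single precision assumed");
    std::uint64_t finite_exponents = 0;
    for (std::uint32_t exp_field = 0; exp_field < 256; ++exp_field) {
        const std::uint32_t bits = exp_field << 23;  // sign 0, fraction 0
        float f;
        std::memcpy(&f, &bits, sizeof f);            // well-defined type pun
        if (std::isfinite(f))
            ++finite_exponents;                      // every field except 255
    }
    // times 2 signs times 2^23 fraction patterns per exponent
    return finite_exponents * 2 * (std::uint64_t{1} << 23);
}
```

    This returns 4278190080, exactly 2^32 - 2^24 (about 4.3 billion). Since +0 and -0 are two patterns for the same value, the number of distinct finite values is one less.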

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Oct 6, 2009
    #15
  16. Fred Zwarts Guest

    Carsten Fuchs wrote:
    > Dear group,
    >
    > I would like to serialize a float f1 to a string s, then unserialize
    > s back to a float f2 again, such that:
    >
    > * s is minimal (use only the precision that is required)
    > and preferably in decimal notation,
    > * f1==f2 (exact same value after the roundtrip)
    >
    > (The first property is for human readers and file size, the second is
    > for data integrity.)
    > Here is my implementation:
    >
    >
    > std::string serialize(float f1)
    > {
    > std::string s;
    >
    > for (unsigned int prec=6; prec<10; prec++)
    > {
    > std::stringstream ss;
    >
    > ss.precision(prec);
    > ss << f1;
    >
    > s=ss.str();
    >
    > float f2;
    > ss >> f2;
    >
    > if (f2==f1) break;
    > }
    >
    > return s;
    > }
    >
    >
    > It seems to work very well, and I found that in my application 65% to
    > 90% of the float numbers serialized this way exit the loop in the
    > first iteration.
    >
    > However, I was wondering if there is a shorter and/or more elegant
    > way of implementing this, using either C++ streams, C
    > printf-functions, or any other technique available in C++.
    >
    > I'd be very grateful for any feedback and advice!
    >
    > Thank you very much, and best regards,


    I think your approach should work in principle.
    The limit prec<10 may be too small, though.
    First, it depends on the platform you run the program on.
    Second, can you prove that with 10 decimal digits you
    are always close enough to the binary value of the float that
    it can be converted back correctly? I am not sure what the
    standard says, but if both << and >> round down when
    converting between decimal and binary representations,
    no finite number of digits may be enough in all cases.
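    For IEEE 754 single precision with correctly rounded conversions there is a known bound: 9 significant decimal digits always suffice for an exact round trip (C++11 later exposed this as std::numeric_limits<float>::max_digits10, C11 as FLT_DECIMAL_DIG). The sketch below is a spot check, not a proof. It tests every stride-th positive finite bit pattern with snprintf and strtof, and it assumes binary32 floats and a C library that rounds conversions correctly, as common implementations do:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <limits>

// True if printing x with `digits` significant digits and parsing it back
// yields the identical float. Assumes correctly rounded printf/strtof.
bool roundtrips(float x, int digits)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, static_cast<double>(x));
    return std::strtof(buf, nullptr) == x;
}

// Checks every `stride`-th positive finite bit pattern (assumes binary32 floats)
// and returns the number of values that do not survive the round trip.
std::uint64_t count_failures(std::uint32_t stride,
                             int digits = std::numeric_limits<float>::max_digits10)
{
    std::uint64_t failures = 0;
    for (std::uint64_t bits = 1; bits < 0x7F800000u; bits += stride) {  // below +inf
        const std::uint32_t b = static_cast<std::uint32_t>(bits);
        float x;
        std::memcpy(&x, &b, sizeof x);
        if (!roundtrips(x, digits))
            ++failures;
    }
    return failures;
}
```

    Under these assumptions the 9-digit check reports no failures while 6 digits fail for many values, so the bound prec<10 in the original loop already reaches the 9 digits that are needed.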
    Fred Zwarts, Oct 6, 2009
    #16
  17. Rune Allnor Guest

    On 6 Okt, 14:54, (Pascal J. Bourguignon) wrote:
    > Rune Allnor <> writes:
    > > With misery and mayhem to follow.

    >
    > However, there's another way to represent the floating point in ASCII
    > that would not lose information, would use decimal positionnal
    > representation and would be 'somewhat' readable.
    >
    > A floating point number f is represented as:
    >
    >    f = s * m * B^e
    >
    > s being the sign (+1, -1),
    >
    > m being the mantissa, usually only the fractional digits, usually in
    > base B, but possibly in a base C different from B (eg. some floating
    > point formats used B=16 C=2).
    >
    > B being the base, and
    >
    > e being the exponent.
    >
    > m = 0.abc...xyz = abc...xyz * C^-p = M * C^-p
    >
    > with:
    >
    > M being the digits of the mantissa, expressed as an integer,
    >
    > C being the base of the mantissa (usually 2, but could be 10 if BCD
    > floating points are used).
    >
    > p being the precision of the mantissa, that is the number of digits of
    > base C comprising the mantissa.
    >
    > Therefore we can write our floating point number f as an expression of
    > integers:
    >
    >    f = s * M * C^-p * B^e


    ...so you just reproduce, in ASCII, the floating-point representation
    that is usually implemented at the binary level?

    What would be the point?

    True, you can get an exact representation PROVIDED you use
    a base-2 representation. As I demonstrated elsewhere, exactness
    is lost due to incompatibilities of decimal and binary
    representations of numbers.

    The base-2 numbers would not make much sense to the human
    reader, though. Once that advantage has been lost, one can just
    as well store the HEX representation of the binary pattern.
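    For comparison, the hex alternative is short to write. A minimal sketch, assuming a 32-bit float whose byte order matches that of uint32_t (true on common platforms); float_to_hex and hex_to_float are illustrative names:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Sketch of the "store the bit pattern as hex" alternative. The result is exact
// but not human-readable, and it is tied to the platform's float format.
static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float assumed");

std::string float_to_hex(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the object representation
    char buf[9];                          // 8 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "%08lx", static_cast<unsigned long>(bits));
    return buf;
}

float hex_to_float(const std::string& s)
{
    // std::stoul throws std::invalid_argument / std::out_of_range on bad input
    const std::uint32_t bits = static_cast<std::uint32_t>(std::stoul(s, nullptr, 16));
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

    The output is exact but opaque: 0.3f becomes 3e99999a. C99's "%a" printf conversion, which strtof reads back, is a related exact format that keeps the sign and exponent readable.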

    Rune
    Rune Allnor, Oct 6, 2009
    #17
  18. Rune Allnor Guest

    On 6 Okt, 15:04, Victor Bazarov <> wrote:
    > Rune Allnor wrote:
    > > On 6 Okt, 14:22, Nick Keighley <>
    > > wrote:
    > >> On 6 Oct, 12:39, Rune Allnor <> wrote:

    >
    > >>> On 6 Okt, 11:16, Carsten Fuchs <> wrote:
    > >>>> I would like to serialize a float f1 to a string s, then unserialize s back to a float f2
    > >>>> again, such that:
    > >>>>         * s is minimal (use only the precision that is required)
    > >>>>           and preferably in decimal notation,
    > >>>>         * f1==f2 (exact same value after the roundtrip)
    > >>>> (The first property is for human readers and file size, the second is for data integrity.)
    > >>> You should choose between *either* human readability *or*
    > >>> data integrity. You can't have both.
    > >>> To demonstrate the point, try
    > >>> -----------
    > >>> #include <iostream>
    > >>> int main()
    > >>> {
    > >>>         float a=0.3;
    > >>>         double b = a;
    > >>>         std::cout << "a = " << a << std::endl;
    > >>>         std::cout.precision(12);
    > >>>         std::cout << "b = " << b << std::endl;
    > >>>         return 0;}
    > >>> -----------
    > >>> Output:
    > >>> a = 0.3
    > >>> b = 0.300000011921
    > >>> There are two problems here:
    > >>> 1) Numbers that are exact in decimal notation
    > >>>    have no exact floating-point representation,
    > >>>    only an approximation.
    > >> try 0.5

    >
    > Or 0.25, 0.125, 0.0625, etc., or *any combination thereof*.  Then factor
    > in the exponents.
    >
    >
    >
    > >> :)

    >
    > > That's one of the few

    >
    > *Few*?  You're kidding, of course, aren't you?


    No. There are infinitely many real numbers between
    any two consecutive FP numbers.

    >  Using the IEEE
    > definition of "single precision float", there are *about* (2^32 - 2^24)
    > decimal values that can be represented *exactly*.  Each of all the other
    > values (infinite number of them, of course) are rounded to one of the
    > *more than three billion* representations (my arithmetic may be off a tad).


    Three billion is still a finite number of representations.
    True, it is large enough to be useful, but it is still finite.

    >  > decimal numbers that can be represented
    >
    > > exactly on binary format. In fact, decimal numbers on the form

    >
    > > x = 2^n

    >
    > > where n is selected from a certain subset of integers, can be
    > > represented exactly, as binary numbers. If the OP can accept
    > > such a constraint on the decimal numbers he wants to work with,
    > > then by all means, disregard what I said. But if he wants to
    > > work with arbitrary numbers, he is in for trouble.

    >
    > No, he is not "in for trouble".  He just needs to realise that computer
    > representation of the floating point numbers have limitations, and *stay
    > within those limitations*.


    As I understand the question, the OP wants to break out
    of those limitations: Exact conversions between base-10
    and base-2 numbers, eliminating approximation errors etc.

    >  The computer *will* force the programmer to
    > "accept such a constraint".  There are ways to overcome those, and they
    > are well-known, like the use of rational numbers, use of double
    > precision representation, or even arbitrary precision.  Those methods
    > don't really *eliminate* the limitations, only *reduce* them.  There is
    > still the limitation of the computing power (memory size is the most
    > significant one).


    Agreed.

    Rune
    Rune Allnor, Oct 6, 2009
    #18
  19. Dear Rune,

    thank you very much for your reply!

    Rune Allnor wrote:
    > Output:
    >
    > a = 0.3
    > b = 0.300000011921
    >
    > There are two problems here:
    >
    > 1) Numbers that are exact in decimal notation
    > have no exact floating-point representation,
    > only an approximation.


    I understand this, but all I'm looking for is a serialization of "a" that, when converted back to a
    float, yields the same bits for "a" again.

    That is, I don't care about the fact that
    float a=0.3;
    doesn't assign the exact decimal value 0.3 to "a". Due to the inherent limits of floating point
    representations, exactly as you pointed out, the "true" value of "a" will *not* be 0.3.

    But again, this is not what I'm after.
    Instead, I'm looking for
    float a2 = unserialize(serialize(a));
    such that a2==a (fully intentionally using the == comparison with floats).

    If the string that is required for that purpose happens to be "0.3" - very well. If it happens to be
    "0.300000011921", also ok. If only a2==a, I don't care.

    > 2) One can not tell the 'true' number of bits or
    > digits needed to represent any one number.


    Hmmm. In the above context, I disagree. From my point of view, the number of digits is the number
    that is required to obtain a2 such that a2==a.

    Sooo... if I happened to grossly misunderstand your post, please tell me; but afaics, we *can* have
    both human readability *and* data integrity, as demonstrated by the code in my initial post, which
    doesn't fail to meet both requirements. ;-)
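    For reference, the loop from the initial post can be tightened a little. The sketch below keeps its structure (including the start at 6 digits), replaces the hard-coded bound with std::numeric_limits<float>::max_digits10 (C++11, 9 for IEEE single precision) and checks that extraction succeeded. It assumes a finite input and correctly rounded stream conversions:

```cpp
#include <limits>
#include <sstream>
#include <string>

// Tries increasing precision, starting at digits10 (6) as in the original post,
// and returns the first string that reads back as exactly f1. Assumes f1 is
// finite; with correctly rounded conversions the loop breaks by max_digits10.
std::string serialize(float f1)
{
    const int min_prec = std::numeric_limits<float>::digits10;      // 6 for IEEE single
    const int max_prec = std::numeric_limits<float>::max_digits10;  // 9 for IEEE single
    std::string s;
    for (int prec = min_prec; prec <= max_prec; ++prec) {
        std::ostringstream os;
        os.precision(prec);
        os << f1;  // default (general) format, like %g
        s = os.str();

        std::istringstream is(s);
        float f2;
        if ((is >> f2) && f2 == f1)  // exact comparison is intended here
            break;
    }
    return s;
}
```

    Under these assumptions serialize(0.3f) gives "0.3", and only values that need more digits go through further iterations.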

    Best regards,
    Carsten



    Carsten Fuchs, Oct 6, 2009
    #19
  20. Hi,

    SG wrote:
    > In practice I wouldn't know how to implement this. I recently used the
    > scientific manipulator in combination with the setprecision
    > manipulator and rely on the implementations quality. As precision I
    > used std::numerical_limits<double>::digits10+1 hoping for the best.
    > But I don't really expect the roundtrip double->string->double to be
    > lossless.


    Please see
    http://www.open-std.org/JTC1/sc22/wg21/docs/papers/2006/n2005.pdf

    This is what got me started towards this issue!
    In other words, if I understand this document correctly, we *can* have both human readability and
    data integrity, in the sense described in my other posts in this thread.
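    The gap between digits10 + 1 (16 for IEEE double) and max_digits10 (17 for IEEE double, added in C++11 and proposed in N2005) shows up with a classic value. A sketch assuming IEEE 754 doubles and correctly rounded stream conversions; roundtrip and lossless_at_max_digits10 are illustrative names:

```cpp
#include <limits>
#include <sstream>
#include <string>

// Writes d with the given number of significant digits and reads it back.
double roundtrip(double d, int precision)
{
    std::ostringstream os;
    os.precision(precision);
    os << d;  // default (general) format, like %g
    std::istringstream is(os.str());
    double back = 0.0;
    is >> back;
    return back;
}

// True if d survives a round trip at max_digits10 (17 for IEEE double).
bool lossless_at_max_digits10(double d)
{
    return roundtrip(d, std::numeric_limits<double>::max_digits10) == d;
}
```

    With 16 digits, 0.1 + 0.2 prints as 0.3 and reads back as a different double; with 17 digits it prints as 0.30000000000000004 and survives the round trip.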

    Best regards,
    Carsten



    Carsten Fuchs, Oct 6, 2009
    #20