Number formatting

Discussion in 'C++' started by Chris Theis, Oct 11, 2006.

  1. Chris Theis

    Chris Theis Guest

    Hi all,

    I'm currently facing something quite annoying, and perhaps one of you has
    an idea of how to solve it efficiently. I have some software (over which I
    have no influence!) which delivers data in scientific notation, and I have
    to read it. This is fairly simple, but here is the tricky thing. The
    software is written in FORTRAN and shows the following feature, which IMHO
    is rather a bug than a feature: if numbers get very small, like
    7.0614E-238, it starts writing them out as 7.0614-238. So when I parse the
    file, what I get is 7.0614, because the minus is seen as a separator. Of
    course I could read all the data as strings, tokenize them and check for
    this rather quirky behavior, but this would slow down the process of
    reading the data, which can be really huge! Does any of you have an idea
    of how to "fix" this problem? I cannot change the software which delivers
    these values, which IMHO are corrupted even though they are FORTRAN
    standard compliant.

    Cheers
    Chris
    Chris Theis, Oct 11, 2006
    #1

  2. Chris Theis

    Phlip Guest

    Chris Theis wrote:

    > I'm currently facing something which is quite annoying and probably one of
    > you might have an idea of how to solve it efficiently. I have some
    > software (upon which I have no influence!!!!) which delivers data in
    > scientific notation and I have to read it. This is fairly simple, but here
    > is the tricky thing. This software is written in FORTRAN and shows the
    > following feature, which IMHO is rather a bug than a feature. If numbers
    > get very small like 7.0614E-238 it starts writing them out as 7.0614-238.
    > So when I parse the file what I get is 7.0614 because the minus is seen as
    > a separator. Of course I could start reading all the data as strings,
    > tokenizing them and start checking for this rather quirky behavior, but
    > this would slow down the process of reading the data which can be really
    > huge!


    How do you know that parsing the - would slow the program down?

    Here's a reprehensibly simple parser:

    http://c2.com/cgi/wiki?MsWindowsResourceLint

    Here's one of its member functions:

    string const &
    pullNextToken()
    {
        m_priorToken = m_currentToken;
        extractNextToken();
        return m_currentToken;
    }

    Here's a unit test on that function:

    TEST_(TestCase, pullNextToken)
    {
        Source aSource("a b\nc\n d");

        string token = aSource.pullNextToken();
        CPPUNIT_ASSERT_EQUAL("a", token);
        token = aSource.pullNextToken();
        CPPUNIT_ASSERT_EQUAL("b", token);
        token = aSource.pullNextToken();
        CPPUNIT_ASSERT_EQUAL("c", token);
        token = aSource.pullNextToken();
        CPPUNIT_ASSERT_EQUAL("d", token);
        token = aSource.pullNextToken();
        CPPUNIT_ASSERT_EQUAL("", token);  // EOF!
    }

    Now imagine if you wrote a dirt-simple parser, using fstream goodies, and
    you also wrote unit tests like that. You could add a test that calls a hard
    function ten thousand times, and then asserts that the CPU time didn't
    exceed some obvious limit, like a thousandth of a second.

    You will probably discover that your parser is not slow. If you only stream
    characters, and never buffer strings into std::string (possibly slow), then
    all your code might run inside the CPU's cache, without excessive data
    motion on the main bus.

    Never guess what could be slow; measure.

    --
    Phlip
    http://www.greencheese.us/ZeekLand <-- NOT a blog!!!
    Phlip, Oct 11, 2006
    #2

  3. Chris Theis

    F.J.K. Guest

    Chris Theis wrote:
    > Hi all,
    >
    > I'm currently facing something which is quite annoying and probably one of
    > you might have an idea of how to solve it efficiently. I have some software
    > (upon which I have no influence!!!!) which delivers data in scientific
    > notation and I have to read it. This is fairly simple, but here is the
    > tricky thing. This software is written in FORTRAN and shows the following
    > feature, which IMHO is rather a bug than a feature. If numbers get very
    > small like 7.0614E-238 it starts writing them out as 7.0614-238. So when I
    > parse the file what I get is 7.0614 because the minus is seen as a
    > separator. Of course I could start reading all the data as strings,
    > tokenizing them and start checking for this rather quirky behavior, but this
    > would slow down the process of reading the data which can be really huge!
    > Does anybody of you have an idea on how to "fix" this problem because I
    > cannot change the software which delivers these , IMHO corrupted values,
    > which are FORTRAN standard compliant.
    >
    > Cheers
    > Chris


    Pretty much every programmer of scientific code has had that "joy". I'd
    be interested myself, whether there's some secret "Fortran locale",
    that would make all of this obsolete. Looking at LC_NUMERIC and co. I
    doubt so :(

    In C++ I use code like the following. If you really, really need to go
    for speed, you'll have to roll your parser yourself. However, if speed
    was an absolute issue, you'd be reading/writing binary data anyways, so
    there's no point. Btw, it would be pretty easy to fix this problem from
    the fortran side.

    #include <cmath>
    #include <iostream>
    #include <sstream>

    // Wrapper around double whose operator>> also accepts Fortran's
    // E-less exponent form, e.g. 1.2344-200.
    struct fortran_double {
        fortran_double() : value(0.0) {}
        fortran_double& operator=(const double d) {
            value = d;
            return *this;
        }
        operator double() const { return value; }
        friend std::istream& operator>>(std::istream& in, fortran_double& fd);
    private:
        double value;
    };

    std::istream& operator>>(std::istream& in, fortran_double& fd) {
        double d;
        if (!(in >> d))
            return in;                  // nothing parsed, leave fd untouched
        // Only peek while the stream is still good: peeking after EOF
        // would set failbit on an otherwise successful read.
        if (in.good()) {
            const int ch = in.peek();   // int, so EOF is representable
            if (ch == '+' || ch == '-') {
                int exponent;           // signed exponent glued to the mantissa
                if (!(in >> exponent))
                    return in;
                d *= std::pow(10.0, exponent);
            }
        }
        fd = d;
        return in;
    }

    int main() {
        double x = 0;
        fortran_double fd;
        std::istringstream in("1.2344-200");
        in >> fd;
        x = fd;
        std::cout << x << "\n";
    }
    F.J.K., Oct 11, 2006
    #3
  4. Chris Theis

    Chris Theis Guest

    "Phlip" <> wrote in message
    news:b%6Xg.15412$...
    > Chris Theis wrote:
    >
    >> I'm currently facing something which is quite annoying and probably one
    >> of you might have an idea of how to solve it efficiently. I have some
    >> software (upon which I have no influence!!!!) which delivers data in
    >> scientific notation and I have to read it. This is fairly simple, but
    >> here is the tricky thing. This software is written in FORTRAN and shows
    >> the following feature, which IMHO is rather a bug than a feature. If
    >> numbers get very small like 7.0614E-238 it starts writing them out as
    >> 7.0614-238. So when I parse the file what I get is 7.0614 because the
    >> minus is seen as a separator. Of course I could start reading all the
    >> data as strings, tokenizing them and start checking for this rather quirky
    >> behavior, but this would slow down the process of reading the data which
    >> can be really huge!

    >
    > How do you know that parsing the - would slow the program down?
    >
    > Here's a reprehensibly simple parser:
    >
    > http://c2.com/cgi/wiki?MsWindowsResourceLint
    >
    > Here's one of its member functions:
    >
    > string const &
    > pullNextToken()
    > {
    > m_priorToken = m_currentToken;
    > extractNextToken();
    > return m_currentToken;
    > }
    >
    > Here's a unit test on that function:
    >
    > TEST_(TestCase, pullNextToken)
    > {
    >
    > Source aSource("a b\nc\n d");
    >
    > string
    > token = aSource.pullNextToken();
    > CPPUNIT_ASSERT_EQUAL("a", token);
    > token = aSource.pullNextToken();
    > CPPUNIT_ASSERT_EQUAL("b", token);
    > token = aSource.pullNextToken();
    > CPPUNIT_ASSERT_EQUAL("c", token);
    > token = aSource.pullNextToken();
    > CPPUNIT_ASSERT_EQUAL("d", token);
    > token = aSource.pullNextToken();
    > CPPUNIT_ASSERT_EQUAL("" , token); // EOF!
    >
    > }
    >
    > Now imagine if you wrote a dirt-simple parser, using fstream goodies, and
    > you also wrote unit tests like that. You could add a test that calls a
    > hard function ten thousand times, and then asserts that the CPU time
    > didn't exceed some obvious limit, like a thousandth of a second.
    >
    > You will probably discover that your parser is not slow. If you only
    > stream characters, and never buffer strings into std::string (possibly
    > slow), then all your code might run inside the CPU's cache, without
    > excessive data motion on the main bus.
    >
    > Never guess what could be slow; measure.


    Hi Phlip,

    now you're actually guessing that I didn't measure, aren't you? ;-) Well,
    the thing is that at some point I would have to use strings to assemble the
    complete number and finally convert it into a double. All of this is more
    work than simply reading and storing a value. Therefore, I was looking for
    a solution which doesn't need to re-assemble numbers via strings/characters
    but rather some way to emulate this quirky FORTRAN format. However, I
    increasingly get the impression that this simply doesn't work, and that I
    will have to try to convince the responsible people to adjust their format
    specifiers, as it's just a couple of keystrokes for them, whereas I would
    have to invest quite some time to solve this.

    Thanks
    Chris
    Chris Theis, Oct 11, 2006
    #4
  5. Chris Theis

    Howard Guest

    "Phlip" <> wrote in message
    news:b%6Xg.15412$...
    > Chris Theis wrote:


    >
    > Here's a unit test on that function:
    >


    > Now imagine if you wrote a dirt-simple parser, using fstream goodies, and
    > you also wrote unit tests like that.


    You've got a little crush on that "unit test" thingie, don't you? C'mon,
    fess up, you know you like it...

    ;-)
    Howard, Oct 11, 2006
    #5
  6. Chris Theis

    Phlip Guest

    Howard wrote:

    > You've got a little crush on that "unit test" thingie, don't you? C'mon,
    > fess up, you know you like it...


    A "crush"? You might also call it a marriage...

    Chris Theis wrote:

    >> How do you know that parsing the - would slow the program down?


    > now you're actually guessing that I didn't measure, aren't you? ;-)


    I answer "premature optimization is the root of all evil" too often here...

    > Well, the thing is that at one point I would have to use strings to
    > assemble the total number and finally convert it into a double.


    At the bottom of my post I hinted that dealing in streams instead of strings
    would be faster, and more like a parser.

    So if you put my technique together with F.J.K.'s, you could use his main()
    as your first unit test.

    > All of this is more work than simply reading and storing a value.


    More coding for you or more work for the CPU? F.J.K.'s solution shows how to
    parse and treat each number as you get it, without putting the numbers into
    separate std::string objects or anything like that.

    > ...Although, I more and more get the impression that this simply doesn't
    > work and I will have to try to convince the responsible people to
    > adjust their format specifiers, as it's just a couple of key punches for
    > them, whereas I would have to invest quite some time to solve this.


    And in terms of process, one fixes a bug as close as possible to its source.
    Don't output a bug, then detect it and clean up after it with extra
    statements.

    --
    Phlip
    http://www.greencheese.us/ZeekLand <-- NOT a blog!!!
    Phlip, Oct 12, 2006
    #6
  7. Chris Theis

    Chris Theis Guest

    Hi there,

    > Pretty much every programmer of scientific code has had that "joy". I'd
    > be interested myself, whether there's some secret "Fortran locale",
    > that would make all of this obsolete. Looking at LC_NUMERIC and co. I
    > doubt so :(


    I did some research but I honestly doubt so too :-(

    >
    > In C++ I use code like the following. If you really, really need to go
    > for speed, you'll have to roll your parser yourself. However, if speed
    > was an absolute issue, you'd be reading/writing binary data anyways, so
    > there's no point.


    Binary is a little complicated as we have to remain portable for a lot of
    platforms and there are some backwards compatibility issues with the program
    delivering the data already. So this topic is unfortunately a little touchy
    and beyond my influence.

    > Btw, it would be pretty easy to fix this problem from
    > the fortran side.


    Yes, that's for sure - it would be adding "E3" to the format string and
    that's it. But the tricky thing is to convince the person responsible, a
    hardcore FORTRAN developer, to acknowledge that something like 7.0631-236
    is an expression and not a proper scientific notation for a value ;-)

    Thanks for the code - it's pretty much what I finally came up with and
    implemented as a first work-around.

    Thanks a lot guys
    Chris
    Chris Theis, Oct 12, 2006
    #7
