slow complex<double>'s

Discussion in 'C++' started by Greg Buchholz, Mar 5, 2006.

  1. /*
    While writing a C++ version of the Mandelbrot benchmark over at the
    "The Great Computer Language Shootout"...


    http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all

    ...I've come across the issue that complex<double>'s seem quite slow
    unless compiled with -ffast-math. Of course doing that results in
    incorrect answers because of rounding issues. The speed difference for
    the program below is between 5x-8x depending on the version of g++. It
    is also about 5 times slower than the corresponding gcc version at...

    http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=2

    ...I'd be interested in learning the reason for the speed difference.
    Sure, the C version is slightly more optimized, but I was thinking that
    the C++ code should only be 20-50% slower, not 750% slower like I get
    with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
    compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
    do with temporaries not being optimized away, or somesuch? A
    limitation of the x87 instruction set? Is it inherent in the way the
    C++ Standard requires complex<double>'s to be calculated? My bad coding
    style? Limitations imposed by g++?

    Curious,

    Greg Buchholz
    */

    // Takes an integer argument "n" on the command line and generates a
    // PBM bitmap of the Mandelbrot set on stdout.
    // see also: ( http://sleepingsquirrel.org/cpp/mandelbrot.cpp.html )

    #include<iostream>
    #include<complex>
    #include<cstdlib>   // for std::atoi

    int main (int argc, char **argv)
    {
        char bit_num = 0, byte_acc = 0;
        const int iter = 50;
        const double limit_sqr = 2.0 * 2.0;

        std::ios_base::sync_with_stdio(false);
        int n = std::atoi(argv[1]);

        std::cout << "P4\n" << n << " " << n << std::endl;

        for(int y=0; y<n; ++y)
            for(int x=0; x<n; ++x)
            {
                std::complex<double> Z(0.0,0.0);
                std::complex<double> C(2*(double)x/n - 1.5,
                                       2*(double)y/n - 1.0);

                for(int i=0; i<iter and norm(Z) <= limit_sqr; ++i)
                    Z = Z*Z + C;

                byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00 : 0x01);

                if(++bit_num == 8)
                {
                    std::cout << byte_acc;
                    bit_num = byte_acc = 0;
                }
                else if(x == n-1)
                {
                    byte_acc <<= (8 - n%8);
                    std::cout << byte_acc;
                    bit_num = byte_acc = 0;
                }
            }
    }
     
    Greg Buchholz, Mar 5, 2006
    #1

  2. Jerry Coffin Guest

    In article <1141588925.900212.137230@z34g2000cwc.googlegroups.com>, says...

    [ ... ]

    > ...I'd be interested in learning the reason for the speed difference.
    > Sure, the C version is slightly more optimized, but I was thinking that
    > the C++ code should only be 20-50% slower, not 750% slower like I get
    > with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
    > compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
    > do with temporaries not being optimized away, or somesuch? A
    > limitation of the x87 instruction set? Is it inherent in the way the
    > C++ Standard requires complex<double>'s to be calculated? My bad coding
    > style? Limitations imposed by g++?


    [ ... ]

    > for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) Z = Z*Z + C;


    Hmm...try this:

    for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
        Z *= Z; Z += C;
    }

    No guarantee, but I think it's worth a shot.
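    Spelled out as a complete (hypothetical) helper, the variant I have in
    mind is:

```cpp
#include <complex>

// Hypothetical helper: escape-time count for one point, using the
// compound-assignment form (Z *= Z; Z += C) instead of Z = Z*Z + C,
// in the hope of avoiding a temporary complex<double>.
int escape_count(std::complex<double> C, int iter, double limit_sqr)
{
    std::complex<double> Z(0.0, 0.0);
    int i = 0;
    for (; i < iter && norm(Z) <= limit_sqr; ++i) {
        Z *= Z;
        Z += C;
    }
    return i;
}
```

    It computes the same escape counts as the original loop; only the way
    the update is written changes.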

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Mar 5, 2006
    #2

  3. Jerry Coffin wrote:
    > Hmm...try this:
    >
    > for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
    > Z *= Z; Z += C; }
    >
    > No guarantee, but I think it's worth a shot.


    Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.

    Greg Buchholz
     
    Greg Buchholz, Mar 5, 2006
    #3
  4. Fei Liu Guest

    Greg Buchholz wrote:
    > Jerry Coffin wrote:
    > > Hmm...try this:
    > >
    > > for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
    > > Z *= Z; Z += C; }
    > >
    > > No guarantee, but I think it's worth a shot.

    >
    > Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.
    >
    > Greg Buchholz


    Profile your code and find out what's causing the slowdown...
     
    Fei Liu, Mar 6, 2006
    #4
  5. peter koch Guest

    Greg Buchholz wrote:
    > /*
    > While writing a C++ version of the Mandelbrot benchmark over at the
    > "The Great Computer Language Shootout"...
    >
    >
    > http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all
    >
    > ...I've come across the issue that complex<double>'s seem quite slow
    > unless compiled with -ffast-math. Of course doing that results in
    > incorrect answers because of rounding issues. The speed difference for
    > the program below is between 5x-8x depending on the version of g++. It
    > is also about 5 times slower than the corresponding gcc version at...
    >
    > http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=2
    >
    > ...I'd be interested in learning the reason for the speed difference.
    > Sure, the C version is slightly more optimized, but I was thinking that
    > the C++ code should only be 20-50% slower, not 750% slower like I get
    > with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
    > compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
    > do with temporaries not being optimized away, or somesuch? A
    > limitation of the x87 instruction set? Is it inherent in the way the
    > C++ Standard requires complex<double>'s to be calculated? My bad coding
    > style? Limitations imposed by g++?
    >
    > Curious,
    >
    > Greg Buchholz
    > */

    [snip]
    >
    > for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) Z = Z*Z + C;
    >
    > byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00:0x01);

    [snip]

    Could it be that a large share of the time is spent calculating "norm"?
    I doubt that the C version does so - and it should not be necessary.
    Some simple test should fit the bill (or at least avoid calling norm
    all the time).
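    A sketch of that idea - track the components directly, so the escape
    test reuses the squares the iteration needs anyway (names hypothetical):

```cpp
// Hypothetical rewrite of the inner loop on raw doubles: Zr/Zi are the
// components of Z, and the escape test Zr2 + Zi2 reuses the squares
// already needed for the next iteration, so norm() is never called.
int escape_count_raw(double Cr, double Ci, int iter, double limit_sqr)
{
    double Zr = 0.0, Zi = 0.0;
    for (int i = 0; i < iter; ++i) {
        double Zr2 = Zr * Zr;
        double Zi2 = Zi * Zi;
        if (Zr2 + Zi2 > limit_sqr)
            return i;                  // escaped after i iterations
        Zi = 2.0 * Zr * Zi + Ci;       // Im(Z*Z + C)
        Zr = Zr2 - Zi2 + Cr;           // Re(Z*Z + C)
    }
    return iter;                       // never escaped within iter steps
}
```

    This is essentially what the C entry on the shootout does.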

    /Peter
     
    peter koch, Mar 6, 2006
    #5
  6. Jerry Coffin Guest

    In article <1141602729.936509.269620@e56g2000cwe.googlegroups.com>, says...
    >
    > Jerry Coffin wrote:
    > > Hmm...try this:
    > >
    > > for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
    > > Z *= Z; Z += C; }
    > >
    > > No guarantee, but I think it's worth a shot.

    >
    > Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.


    Well, that takes out the possibility that seemed most
    obvious to me. Depending on your bent, your next step
    would be either a profiler or examining the code the
    compiler's producing. For large chunks of code, the
    former works well, but for small amounts that you want to
    examine in maximum detail the latter can be useful as
    well.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Mar 6, 2006
    #6
  7. Greg Buchholz wrote:
    > ...I've come across the issue that complex<double>'s seem quite slow
    > unless compiled with -ffast-math. Of course doing that results in
    > incorrect answers because of rounding issues. The speed difference for
    > the program below is between 5x-8x depending on the version of g++.


    Looks like the problem can be solved by manually inlining the
    definition of "norm"...

    // manually inlining "norm" results in a 5x-7x speedup on g++
    for(int i=0; i<iter and
          (Z.real()*Z.real() + Z.imag()*Z.imag()) <= limit_sqr; ++i)
        Z = Z*Z + C;

    ...For some reason g++ must not have been able to inline it (or does so
    after common subexpression elimination or somesuch).
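    As a sanity check that the hand-written expression computes the same
    thing as norm (a sketch; the helper names are mine):

```cpp
#include <complex>
#include <cmath>

// The squared magnitude written out by hand, exactly as in the
// hand-inlined loop condition above.
double norm_manual(const std::complex<double>& z)
{
    return z.real() * z.real() + z.imag() * z.imag();
}

// std::norm should agree with the hand-written expression; any speed
// difference is purely about whether the call gets inlined.
bool norms_agree(const std::complex<double>& z)
{
    double n = norm_manual(z);
    return std::abs(std::norm(z) - n) <= 1e-12 * (1.0 + n);
}
```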

    Greg Buchholz
     
    Greg Buchholz, Mar 6, 2006
    #7
  8. "Greg Buchholz" <> wrote in message news:...
    > /*
    > While writing a C++ version of the Mandelbrot benchmark over at the
    > "The Great Computer Language Shootout"...

    [snip]

    ------------------------------------------------
    Hi Greg,
    I have had similar problems when using the <complex> library for
    Microsoft VC6. It ran at about half the expected speed. After looking
    through the header file I saw that the C++ structure was somewhat
    involved, with a base class and several derived classes. I ended up
    writing my own very simple complex class looking like

    namespace std
    {
    template <class Tc>
    class ppcomplex
    {
    public:
        Tc re;
        Tc im;

        ppcomplex() { re = 0; im = 0; }
        ppcomplex(const Tc& r, const Tc& i) : re(r), im(i) {}
        ppcomplex(const Tc& r) : re(r), im((Tc)0) {}

        Tc real() const { return re; }
        Tc imag() const { return im; }

        Tc real(const Tc& x) { return (re = x); }
        Tc imag(const Tc& x) { return (im = x); }

        // the usual copy and assignment operators
        ppcomplex(const ppcomplex<Tc>& z)
        { this->re = z.re; this->im = z.im; }

        ppcomplex<Tc>& operator =(const ppcomplex<Tc>& y)
        {
            if (this != &y) { this->re = y.re; this->im = y.im; }
            return *this;
        }

        ppcomplex<Tc>& operator =(const Tc& r)
        { this->re = r; this->im = (Tc)0; return *this; }

        // etc --- etc --- etc more stuff here

        // updating by a real constant
        ppcomplex<Tc>& operator +=(const Tc& y)
        { re += y; return *this; }

        // more stuff

    }; // end of class ppcomplex

    This ran twice as fast! So maybe you have the same problem, i.e. your
    complex class is just too complicated?

    Regards....Bill
     
    Bill Shortall, Mar 6, 2006
    #8
  9. Marcus Kwok Guest

    Bill Shortall <> wrote:
    > namespace std
    > {
    > template <class Tc>
    > class ppcomplex
    > {


    You are not allowed to introduce your own names to namespace std. IIRC,
    you are only allowed to add specializations of the standard template
    classes, when specializing on user-defined classes.
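    A minimal sketch of what is allowed instead - the same idea in a
    namespace of your own (all names here are hypothetical):

```cpp
// Hypothetical sketch: a minimal complex class in its own namespace
// rather than std, which is what the standard permits.
namespace pp {

template <class T>
struct complex {
    T re, im;
    complex(T r = T(), T i = T()) : re(r), im(i) {}
};

template <class T>
complex<T> operator*(const complex<T>& a, const complex<T>& b)
{
    // (a.re + a.im*i) * (b.re + b.im*i)
    return complex<T>(a.re * b.re - a.im * b.im,
                      a.re * b.im + a.im * b.re);
}

template <class T>
complex<T> operator+(const complex<T>& a, const complex<T>& b)
{
    return complex<T>(a.re + b.re, a.im + b.im);
}

// squared magnitude, found by argument-dependent lookup just as
// std::norm is for std::complex
template <class T>
T norm(const complex<T>& z) { return z.re * z.re + z.im * z.im; }

} // namespace pp
```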

    --
    Marcus Kwok
     
    Marcus Kwok, Mar 6, 2006
    #9
  10. "Marcus Kwok" <> wrote in message
    news:dui7sh$pd7$...
    > Bill Shortall <> wrote:
    > > namespace std
    > > {
    > > template <class Tc>
    > > class ppcomplex
    > > {

    >
    > You are not allowed to introduce your own names to namespace std. IIRC,
    > you are only allowed to add specializations of the standard template
    > classes, when specializing on user-defined classes.
    >
    > --
    > Marcus Kwok


    OK -- change ppcomplex to complex, call its header file <complex>, and
    remove the old one.
     
    Bill Shortall, Mar 7, 2006
    #10