slow complex<double>'s

Discussion in 'C++' started by Greg Buchholz, Mar 5, 2006.

  1. /*
    While writing a C++ version of the Mandelbrot benchmark over at the
    "The Great Computer Language Shootout"...


    http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all

    ...I've come across the issue that complex<double>'s seem quite slow
    unless compiled with -ffast-math. Of course doing that results in
    incorrect answers because of rounding issues. The speed difference for
    the program below is between 5x-8x depending on the version of g++. It
    is also about 5 times slower than the corresponding gcc version at...

    http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=2

    ...I'd be interested in learning the reason for the speed difference.
    Sure, the C version is slightly more optimized, but I was thinking that
    the C++ code should only be 20-50% slower, not 750% slower like I get
    with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
    compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
    do with temporaries not being optimized away, or somesuch? A
    limitation of the x87 instruction set? Is it inherent in the way the
    C++ Standard requires complex<double>'s to be calculated? My bad coding
    style? Limitations imposed by g++?

    Curious,

    Greg Buchholz
    */

    // Takes an integer argument "n" on the command line and generates a
    // PBM bitmap of the Mandelbrot set on stdout.
    // see also: ( http://sleepingsquirrel.org/cpp/mandelbrot.cpp.html )

    #include<iostream>
    #include<complex>
    #include<cstdlib>   // for std::atoi

    int main (int argc, char **argv)
    {
        char bit_num = 0, byte_acc = 0;
        const int iter = 50;
        const double limit_sqr = 2.0 * 2.0;

        std::ios_base::sync_with_stdio(false);
        int n = std::atoi(argv[1]);

        std::cout << "P4\n" << n << " " << n << std::endl;

        for(int y=0; y<n; ++y)
            for(int x=0; x<n; ++x)
            {
                std::complex<double> Z(0.0,0.0);
                std::complex<double> C(2*(double)x/n - 1.5,
                                       2*(double)y/n - 1.0);

                for(int i=0; i<iter and norm(Z) <= limit_sqr; ++i)
                    Z = Z*Z + C;

                byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00 : 0x01);

                if(++bit_num == 8)
                {
                    std::cout << byte_acc;
                    bit_num = byte_acc = 0;
                }
                else if(x == n-1)
                {
                    byte_acc <<= (8 - n%8);
                    std::cout << byte_acc;
                    bit_num = byte_acc = 0;
                }
            }
    }
     
    Greg Buchholz, Mar 5, 2006
    #1

  2. Jerry Coffin Guest

    In article <1141588925.900212.137230@z34g2000cwc.googlegroups.com>, says...

    [ ... ]

    > ...I'd be interested in learning the reason for the speed difference.
    > Sure, the C version is slightly more optimized, but I was thinking that
    > the C++ code should only be 20-50% slower, not 750% slower like I get
    > with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
    > compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
    > do with temporaries not being optimized away, or somesuch? A
    > limitation of the x87 instruction set? Is it inherent in the way the
    > C++ Standard requires complex<double>'s to be calculated? My bad coding
    > style? Limitations imposed by g++?


    [ ... ]

    > for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) Z = Z*Z + C;


    Hmm...try this:

    for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
        Z *= Z; Z += C;
    }

    No guarantee, but I think it's worth a shot.
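    Spelled out as a complete (hypothetical) helper, the variant I have in
    mind is:

```cpp
#include <complex>

// Hypothetical helper: escape-time count for one point, using the
// compound-assignment form (Z *= Z; Z += C) instead of Z = Z*Z + C,
// in the hope of avoiding a temporary complex<double>.
int escape_count(std::complex<double> C, int iter, double limit_sqr)
{
    std::complex<double> Z(0.0, 0.0);
    int i = 0;
    for (; i < iter && norm(Z) <= limit_sqr; ++i) {
        Z *= Z;
        Z += C;
    }
    return i;
}
```

    It computes the same escape counts as the original loop; only the way
    the update is written changes.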

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Mar 5, 2006
    #2

  3. Jerry Coffin wrote:
    > Hmm...try this:
    >
    > for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
    > Z *= Z; Z += C; }
    >
    > No guarantee, but I think it's worth a shot.


    Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.

    Greg Buchholz
     
    Greg Buchholz, Mar 5, 2006
    #3
  4. Fei Liu Guest

    Greg Buchholz wrote:
    > Jerry Coffin wrote:
    > > Hmm...try this:
    > >
    > > for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
    > > Z *= Z; Z += C; }
    > >
    > > No guarantee, but I think it's worth a shot.

    >
    > Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.
    >
    > Greg Buchholz


    Profile your code and find out what's causing the slowdown...
     
    Fei Liu, Mar 6, 2006
    #4
  5. peter koch Guest

    Greg Buchholz wrote:
    > /*
    > While writing a C++ version of the Mandelbrot benchmark over at the
    > "The Great Computer Language Shootout"...
    >
    >
    > http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all
    >
    > ...I've come across the issue that complex<double>'s seem quite slow
    > unless compiled with -ffast-math. Of course doing that results in
    > incorrect answers because of rounding issues. The speed difference for
    > the program below is between 5x-8x depending on the version of g++. It
    > is also about 5 times slower than the corresponding gcc version at...
    >
    > http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=2
    >
    > ...I'd be interested in learning the reason for the speed difference.
    > Sure, the C version is slightly more optimized, but I was thinking that
    > the C++ code should only be 20-50% slower, not 750% slower like I get
    > with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
    > compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
    > do with temporaries not being optimized away, or somesuch? A
    > limitation of the x87 instruction set? Is it inherent in the way the
    > C++ Standard requires complex<double>'s to be calculated? My bad coding
    > style? Limitations imposed by g++?
    >
    > Curious,
    >
    > Greg Buchholz
    > */

    [snip]
    >
    > for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) Z = Z*Z + C;
    >
    > byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00:0x01);

    [snip]

    Could it be that a large share of the time is spent calculating "norm"?
    I doubt that the C version does so - and it should not be necessary.
    Some simple test should fit the bill (or at least avoid calling norm
    all the time).
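    A sketch of that idea - track the components directly, so the escape
    test reuses the squares the iteration needs anyway (names hypothetical):

```cpp
// Hypothetical rewrite of the inner loop on raw doubles: Zr/Zi are the
// components of Z, and the escape test Zr2 + Zi2 reuses the squares
// already needed for the next iteration, so norm() is never called.
int escape_count_raw(double Cr, double Ci, int iter, double limit_sqr)
{
    double Zr = 0.0, Zi = 0.0;
    for (int i = 0; i < iter; ++i) {
        double Zr2 = Zr * Zr;
        double Zi2 = Zi * Zi;
        if (Zr2 + Zi2 > limit_sqr)
            return i;                  // escaped after i iterations
        Zi = 2.0 * Zr * Zi + Ci;       // Im(Z*Z + C)
        Zr = Zr2 - Zi2 + Cr;           // Re(Z*Z + C)
    }
    return iter;                       // never escaped within iter steps
}
```

    This is essentially what the C entry on the shootout does.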

    /Peter
     
    peter koch, Mar 6, 2006
    #5
  6. Jerry Coffin Guest

    In article <1141602729.936509.269620@e56g2000cwe.googlegroups.com>, says...
    >
    > Jerry Coffin wrote:
    > > Hmm...try this:
    > >
    > > for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
    > > Z *= Z; Z += C; }
    > >
    > > No guarantee, but I think it's worth a shot.

    >
    > Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.


    Well, that takes out the possibility that seemed most
    obvious to me. Depending on your bent, your next step
    would be either a profiler or examining the code the
    compiler's producing. For large chunks of code, the
    former works well, but for small amounts that you want to
    examine in maximum detail the latter can be useful as
    well.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Mar 6, 2006
    #6
  7. Greg Buchholz wrote:
    > ...I've come across the issue that complex<double>'s seem quite slow
    > unless compiled with -ffast-math. Of course doing that results in
    > incorrect answers because of rounding issues. The speed difference for
    > the program below is between 5x-8x depending on the version of g++.


    Looks like the problem can be solved by manually inlining the
    definition of "norm"...

    // manually inlining "norm" results in a 5x-7x speedup on g++
    for(int i=0; i<iter and
          (Z.real()*Z.real() + Z.imag()*Z.imag()) <= limit_sqr; ++i)
        Z = Z*Z + C;

    ...For some reason g++ must not have been able to inline it (or does so
    after common subexpression elimination or somesuch).
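    As a sanity check that the hand-written expression computes the same
    thing as norm (a sketch; the helper names are mine):

```cpp
#include <complex>
#include <cmath>

// The squared magnitude written out by hand, exactly as in the
// hand-inlined loop condition above.
double norm_manual(const std::complex<double>& z)
{
    return z.real() * z.real() + z.imag() * z.imag();
}

// std::norm should agree with the hand-written expression; any speed
// difference is purely about whether the call gets inlined.
bool norms_agree(const std::complex<double>& z)
{
    double n = norm_manual(z);
    return std::abs(std::norm(z) - n) <= 1e-12 * (1.0 + n);
}
```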

    Greg Buchholz
     
    Greg Buchholz, Mar 6, 2006
    #7
  8. "Greg Buchholz" <> wrote in message news:...
    > /*
    > While writing a C++ version of the Mandelbrot benchmark over at the
    > "The Great Computer Language Shootout"...

    [snip]

    ------------------------------------------------
    Hi Greg,
    I have had similar problems when using the <complex> library for
    Microsoft VC6. It ran at about half the expected speed. After looking
    through the header file I saw that the C++ structure was somewhat
    involved, with a base class and several derived classes. I ended up
    writing my own very simple complex class looking like

    namespace std
    {
    template <class Tc>
    class ppcomplex
    {
    public:
        Tc re;
        Tc im;

        ppcomplex() { re = 0; im = 0; }
        ppcomplex(const Tc& r, const Tc& i) : re(r), im(i) {}
        ppcomplex(const Tc& r) : re(r), im((Tc)0) {}

        Tc real() const { return re; }
        Tc imag() const { return im; }

        Tc real(const Tc& x) { return (re = x); }
        Tc imag(const Tc& x) { return (im = x); }

        // the usual copy and assignment operators
        ppcomplex(const ppcomplex<Tc>& z)
        { this->re = z.re; this->im = z.im; }

        ppcomplex<Tc>& operator =(const ppcomplex<Tc>& y)
        {
            if (this != &y) { this->re = y.re; this->im = y.im; }
            return *this;
        }

        ppcomplex<Tc>& operator =(const Tc& r)
        { this->re = r; this->im = (Tc)0; return *this; }

        // etc --- etc --- etc more stuff here

        // updating by a real constant
        ppcomplex<Tc>& operator +=(const Tc& y)
        { re += y; return *this; }

        // more stuff

    }; // end of class ppcomplex

    This ran twice as fast! So maybe you have the same problem, i.e. your
    complex class is just too complicated?

    Regards....Bill
     
    Bill Shortall, Mar 6, 2006
    #8
  9. Marcus Kwok Guest

    Bill Shortall <> wrote:
    > namespace std
    > {
    > template <class Tc>
    > class ppcomplex
    > {


    You are not allowed to introduce your own names to namespace std. IIRC,
    you are only allowed to add specializations of the standard template
    classes, when specializing on user-defined classes.
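    A minimal sketch of what is allowed instead - the same idea in a
    namespace of your own (all names here are hypothetical):

```cpp
// Hypothetical sketch: a minimal complex class in its own namespace
// rather than std, which is what the standard permits.
namespace pp {

template <class T>
struct complex {
    T re, im;
    complex(T r = T(), T i = T()) : re(r), im(i) {}
};

template <class T>
complex<T> operator*(const complex<T>& a, const complex<T>& b)
{
    // (a.re + a.im*i) * (b.re + b.im*i)
    return complex<T>(a.re * b.re - a.im * b.im,
                      a.re * b.im + a.im * b.re);
}

template <class T>
complex<T> operator+(const complex<T>& a, const complex<T>& b)
{
    return complex<T>(a.re + b.re, a.im + b.im);
}

// squared magnitude, found by argument-dependent lookup just as
// std::norm is for std::complex
template <class T>
T norm(const complex<T>& z) { return z.re * z.re + z.im * z.im; }

} // namespace pp
```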

    --
    Marcus Kwok
     
    Marcus Kwok, Mar 6, 2006
    #9
  10. "Marcus Kwok" <> wrote in message
    news:dui7sh$pd7$...
    > Bill Shortall <> wrote:
    > > namespace std
    > > {
    > > template <class Tc>
    > > class ppcomplex
    > > {

    >
    > You are not allowed to introduce your own names to namespace std. IIRC,
    > you are only allowed to add specializations of the standard template
    > classes, when specializing on user-defined classes.
    >
    > --
    > Marcus Kwok


    OK -- change ppcomplex to complex, call its header file <complex>, and
    remove the old one.
     
    Bill Shortall, Mar 7, 2006
    #10