Newbie: capabilities of C++ in numerical computation

  • Thread starter Kamaraju Kusumanchi

Kamaraju Kusumanchi

Hi gurus,
I am relatively new to C++ programming for numerical
computation, and I am looking for advice from someone who has used C++
for numerical computation. To be specific, here are my questions.
Any references or links would be useful.

1) Does object-oriented programming introduce significant overhead
compared with using plain arrays? I know this is a very subjective issue
that depends on the operating system, compiler, etc. What I have in mind is:

class vector {
public:
    double x[3];
    // some member functions here
};

class node {
public:
    vector position, velocity;
    double temp, pres, density;
    // some member functions here
};

Now say I have:

node domain[128][128][128]; // 3-dimensional array

Usually a given function will access just one quantity (say,
velocity) at all nodes in the domain.

Every time I access velocity, the function first has to find which
node the velocity belongs to, then call the function to get the velocity,
then determine which component of velocity is needed. I am looking for
some numbers on how this overhead compares with the traditional way of
doing things. (By traditional I mean no classes, just 3-D arrays of
xvelocity, yvelocity, zvelocity, pressure, etc.)

For the above model, can I quantify the overhead of each such call (say,
1-5%)? A ballpark figure would be sufficient. I am using
g++ 3.3.3 as my compiler on Debian testing.

2) Does object-oriented programming significantly affect parallelizing
the code? Is MPI support in C++ good enough, or is there a better
parallel programming language?

3) Does OOP overhead significantly affect the performance of code that
calls the FFTW 3.0 library?

As you have probably guessed by now, I am a Fortran programmer
trying to see if it is worth shifting to OOP for numerical computation.
Should I stick to Fortran 90, or is it worth switching to C++?

Any references or helpful hints are appreciated.

thanks
raju
 

Victor Bazarov

Kamaraju Kusumanchi said:
Hi gurus,
I am relatively new to C++ programming for numerical
computation, and I am looking for advice from someone who has used C++
for numerical computation. To be specific, here are my questions.
Any references or links would be useful.

Get a good book on C++.

1) Does object-oriented programming introduce significant overhead
compared with using plain arrays?

No.

2) Does object-oriented programming significantly affect parallelizing
the code? Is MPI support in C++ good enough, or is there a better
parallel programming language?

There is no MPI support in C++. It's all done by libraries outside
the language proper.

3) Does OOP overhead significantly affect the performance of code that
calls the FFTW 3.0 library?

I don't know what fftw(3.0) is, but see the answer to question 1.

Victor
 

Cy Edmunds

I would say you would do better as a good Fortran programmer than as
a bad C++ programmer. If you are going to use C++, be prepared to take
some time to learn it. It's kind of complicated, and although it has great
power, it also has a lot of traps. Read the best textbooks you can find and
work on it for a few months. If you do, you won't be tempted by Fortran
again.

Study the way the standard library (particularly the part formerly known as
the Standard Template Library, or STL) works. Use existing containers rather
than writing your own. For high performance, use highly optimized C library
functions (FFTW being a perfect example) with pointers obtained from the
C++ containers. This way the dreaded "OOP overhead" can be pretty much
eliminated.

That's what I do and I was a Fortran programmer for many years.
 

Mark Ng

Kamaraju Kusumanchi said:
1) Does object-oriented programming introduce significant overhead
compared with using plain arrays? I know this is a very subjective issue
that depends on the operating system, compiler, etc. What I have in mind is:

Generally no. Your code incurs OOP overhead only if you
use the OOP features (but you probably end up with other
overheads when you don't use OOP).

class vector {
public:
    double x[3];
    // some member functions here
};

class node {
public:
    vector position, velocity;
    double temp, pres, density;
    // some member functions here
};

Now say I have:

node domain[128][128][128]; // 3-dimensional array

Usually a given function will access just one quantity (say,
velocity) at all nodes in the domain.

Every time I access velocity, the function first has to find which
node the velocity belongs to, then call the function to get the velocity,
then determine which component of velocity is needed. I am looking for some numbers on

This is done at compile time, not at run time.

how this overhead compares with the traditional way of doing things. (By
traditional I mean no classes, just 3-D arrays of
xvelocity, yvelocity, zvelocity, pressure, etc.)

It sounds more like you are concerned about locality, which I'd argue
is more of an OS question than a language one (I assume you are talking
about separate arrays of xvelocity, pressure, etc.). The thing
is, you'd probably be able to implement either case in most languages.
Can't you do the class thing in Fortran? Maybe not exactly with
member functions, but perhaps with functions taking those structs
(Fortran 90 derived types) as arguments. I think what you are describing
is not specific to C++.
2) Does object-oriented programming significantly affect parallelizing
the code? Is MPI support in C++ good enough, or is there a better
parallel programming language?

I have used MPI with C++ myself. Other than the fact that the
implementation I used was not MT-safe (ack! and the documentation
neglected to mention that!), I had no complaints.

As you have probably guessed by now, I am a Fortran programmer
trying to see if it is worth shifting to OOP for numerical computation.
Should I stick to Fortran 90, or is it worth switching to C++?

A very subjective question, don't you think?
Obviously you don't want to move to C++ just for the sake of
moving to C++. You haven't mentioned why you'd want to use C++
(what features of the language you'd take advantage of), so there's
no way for me to say, I guess.
 

Benoit Mathieu

1) Does object-oriented programming introduce significant overhead
compared with using plain arrays? I know this is a very subjective issue
that depends on the operating system, compiler, etc. What I have in mind is:

class vector {
public:
    double x[3];
    // some member functions here
};

class node {
public:
    vector position, velocity;
    double temp, pres, density;
    // some member functions here
};

Now say I have:

node domain[128][128][128]; // 3-dimensional array

Usually a given function will access just one quantity (say,
velocity) at all nodes in the domain.

Doing this, you change the way you represent data at the
lower level. In Fortran, you have to choose at the beginning
of the project how to represent an array of coordinates:
COORD[n][3]
or COORD[3][n]

The most efficient representation depends on the way data is
accessed in the program, cache memory considerations,
whether you work on a vector computer or not...

C++ allows you to hide these details so that you can change
the representation afterwards. The higher level routines in
your code will access data through

domain.position(node, coord);

I personally think that arrays of doubles are better at the low
level than arrays of structures. This allows you to call
Fortran or C library routines from your code. Take care to
encapsulate those calls in the lower-level routines, so that
you don't have to worry about the following in the higher-level
routines:
- index translation (Fortran arrays start at index 1 by default, and so on),
- type checking,
- const and non-const casts (when the library function
prototypes do not express the logical constness of the
parameters),
- aliasing (you must check for it when calling Fortran routines,
because Fortran forbids it for arguments that are modified).

Also, before calling library routines, performing
prerequisite checks on array sizes and so on with assert()
is a great idea.

C++ can have a huge performance impact in some cases:
- if you create too fine-grained classes, like one class to
represent a 3D vector {x,y,z}, and put some virtual
functions inside (32 bytes per element instead of 24; well,
in that particular case, it might in some cases be faster
with 16 bytes because of cache alignment...)
- if you use virtual functions to access data at fine grain:

double sum = 0.;
for (int i = 0; i < n_nodes; i++)
    for (int j = 0; j < dimension; j++)
        sum += domain.velocity(i,j) * domain.velocity(i,j);

if velocity(i,j) is a virtual function, you can multiply
your execution time by 2, 3 or more, because virtual
functions called through a pointer or reference usually
cannot be inlined.

Computing the norm of a vector is typically something you do
with a dedicated routine:
norm = domain.velocity().norm();
or norm(domain.velocity()), as you wish...
And this routine will use low-level optimized routines (BLAS
or similar).

MPI has a C and a Fortran interface. We encapsulate MPI
calls in communication classes, so we can perform
prerequisite checking, toggle debug information, switch to
PVM easily...
Encapsulation means more function calls, but with a typical
network latency of 5 microseconds, this is not a
significant overhead. If you are on SMP machines, this
might, perhaps, become a concern...

When working on arrays of more than 32 elements, and provided
that you don't have to convert the data structure because the
data is stored differently in your code, the overhead is
less than a few percent. It is important that your data
structures match those of the most commonly used library
functions, the ones where you spend the most CPU
time. That way, you have a chance of vectorizing
the code on vector machines...

I would say you would do better as a good Fortran programmer than as
a bad C++ programmer. If you are going to use C++, be prepared to take
some time to learn it. It's kind of complicated, and although it has great
power, it also has a lot of traps. Read the best textbooks you can find and
work on it for a few months. If you do, you won't be tempted by Fortran
again.

Study the way the standard library (particularly the part formerly known as
the Standard Template Library, or STL) works. Use existing containers rather
than writing your own. For high performance, use highly optimized C library
functions (FFTW being a perfect example) with pointers obtained from the
C++ containers. This way the dreaded "OOP overhead" can be pretty much
eliminated.

That's what I do and I was a Fortran programmer for many years.

I agree with that.

One advantage of C++ is that there is g++, which is a
good, free and widely used compiler. Vendor C++ compilers,
especially on supercomputers, might be less efficient than
the corresponding vendor Fortran compilers, particularly on
vector computers...

C++ makes sense for large projects, but it takes years to
write an efficient core that is easy to use. For us, the
most important thing is that people new to the project can
write code quickly. That requires both solid experience in C++ to
write the core classes (so that you encapsulate things
correctly, with correct constness checking and asserts where
they are needed, and prevent people from doing weird things) and
a good knowledge of the "big picture" of your computing
program, so that your classes match what you need.

One thing that is very difficult to achieve is hiding the
data representation enough that you can change your mind
afterwards without changing all the code (this is one
advantage of C++), but not so much that you increase
data access overhead and the overall
complexity of the code. Moreover, you never know in advance
how you will change your mind later, and it is impossible to
design the code so that *any* change is easy.

One last thing: be sure to always have a skilled C++
programmer maintain the code...

Hope this helps,

Benoit
 
