How to write optimized/efficient C programs?

kr

Hi people,
Please contribute to a good list of tips/methods/suggestions for writing optimized C code. I think it will be helpful for the whole programming community. Please share the tips/tricks that you feel should be used to write code that is as optimal as the code produced by an optimizing compiler.

I will start it like this:

1) When we don't need double-precision arithmetic, float quantities will serve the purpose, so we should explicitly mark quantities as float, since unsuffixed constants are double by default.

float f1;
f1 = f1 + 2.5f;

instead of:

float f1;
f1 = f1 + 2.5; // 2.5 is a double constant here

which leads to double-precision arithmetic being done by the processor and hence wasted time.


Thanks.
 
Martin Wells

kr:
Hi people,
Please contribute to a good list of tips/methods/suggestions for writing optimized C code. I think it will be helpful for the whole programming community. Please share the tips/tricks that you feel should be used to write code that is as optimal as the code produced by an optimizing compiler.


There are a few little things I do... but for a lot of optimisations there is a counter-argument showing that there's a better way of doing it on a different system. For example, take the following function:

#include <assert.h>
#include <stddef.h>

/* Add 5 to each of the len elements starting at p. */
void AddFiveToEveryElement(int *p, size_t len)
{
    assert(p); assert(len);

    do *p++ += 5;
    while (--len);
}

Now I would think that that's quite efficient, but another method
might work better on a different system. Something like:

void AddFiveToEveryElement(int *const p, size_t len)
{
    assert(p); assert(len);

    /* Index from the end, so a compiler can use base+offset addressing. */
    do p[--len] += 5;
    while (len);
}

This might work better on a machine that has an instruction which
takes both a pointer and an offset.
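
For what it's worth, either version is called the same way; a quick usage sketch (assuming one of the definitions above is in scope):

#include <stdio.h>

int main(void)
{
    int a[4] = {1, 2, 3, 4};

    AddFiveToEveryElement(a, sizeof a / sizeof a[0]);
    printf("%d %d %d %d\n", a[0], a[1], a[2], a[3]); /* prints 6 7 8 9 */
    return 0;
}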

Martin
 
santosh

kr said:
Hi people,
Please contribute to a good list of tips/methods/suggestions for writing optimized C code. I think it will be helpful for the whole programming community. Please share the tips/tricks that you feel should be used to write code that is as optimal as the code produced by an optimizing compiler.

I will start it like this:

1) When we don't need double-precision arithmetic, float quantities will serve the purpose, so we should explicitly mark quantities as float, since unsuffixed constants are double by default.

float f1;
f1 = f1 + 2.5f;

instead of:

float f1;
f1 = f1 + 2.5; // 2.5 is a double constant here

which leads to double-precision arithmetic being done by the processor and hence wasted time.

On many modern processors, calculations involving doubles might actually
turn out faster than those involving floats. The extra range and precision
offered by double is almost always more important than speed
considerations.
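
The only way to settle it for a given implementation is to time it. Here is a minimal, unscientific timing sketch using the standard clock() function (the loop count is an arbitrary assumption, and volatile is there only to stop the compiler deleting the loops):

#include <stdio.h>
#include <time.h>

int main(void)
{
    volatile float f = 0.0f;
    volatile double d = 0.0;
    long i;
    clock_t t0, t1, t2;

    t0 = clock();
    for (i = 0; i < 100000000L; i++)
        f += 2.5f;                  /* single-precision adds */
    t1 = clock();
    for (i = 0; i < 100000000L; i++)
        d += 2.5;                   /* double-precision adds */
    t2 = clock();

    printf("float:  %g s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("double: %g s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}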
 
Flash Gordon

kr wrote, on 29/09/07 12:34:
Hi people,
Please contribute to a good list of tips/methods/suggestions for writing optimized C code. I think it will be helpful for the whole programming community.

First, nail down the requirements so you don't write code to do things that are not required.
Second, select the best algorithm for the job and tune it.
Third, design the program so it is not doing things it does not need to do.
Fourth, write clear code to implement your design.
Please share the tips/tricks that you feel should be used to write code that is as optimal as the code produced by an optimizing compiler.

The best way to do this is to use an optimising compiler.
I will start it like this:

1) When we don't need double-precision arithmetic, float quantities will serve the purpose, so we should explicitly mark quantities as float, since unsuffixed constants are double by default.

Not if you want efficient code on a lot of implementations. Often using
float is *slower* than using double.
float f1;
f1 = f1 + 2.5f;

instead of:

float f1;
f1 = f1 + 2.5; // 2.5 is a double constant here

which leads to double-precision arithmetic being done by the processor and hence wasted time.

On a lot of implementations the fastest thing would be

double d1;
/* Code which sets d1 */
d1 += 2.5;

Of course, using "d1 = d1 + 2.5" is likely to be just as fast, but it is more error-prone, since you might type "d1 = e1 + 1" by mistake.

Generally, you are far more likely to get a program doing the correct
thing fast enough if you write your code to be clear than if you try to
micro-optimise.
 
William Hughes

kr wrote:
Hi people,
Please contribute to a good list of tips/methods/suggestions for writing optimized C code. I think it will be helpful for the whole programming community. Please share the tips/tricks that you feel should be used to write code that is as optimal as the code produced by an optimizing compiler.

I will start it like this:

1) When we don't need double-precision arithmetic, float quantities will serve the purpose, so we should explicitly mark quantities as float, since unsuffixed constants are double by default.

float f1;
f1 = f1 + 2.5f;

instead of:

float f1;
f1 = f1 + 2.5; // 2.5 is a double constant here

which leads to double-precision arithmetic being done by the processor and hence wasted time.

This is a perfect example of why these kinds of "tips/methods/suggestions" are a bad idea. In fact, on many processors double precision is native, so double-precision calculations are faster than float calculations (to do a float calculation, the float values are converted to double, the calculation is done, and the answer is converted back to float). On other processors the opposite is the case.

But what if you have a big (several hundred megabyte) matrix of floating-point values? What matters here is not how fast you can multiply, but how fast you can get the information to and from memory. Using float may be a good idea, even if float calculations take a bit longer. But there are cache and register considerations as well...


My Tips


Get a good optimizing compiler. Many useful general methods (and many very processor-specific methods) will be known and used by the compiler.

Write clear code. Not only does this make things easier for you, it makes things easier for the optimizer (whether a compiler or some other programmer).

Use libraries for things like matrix operations and FFT. (Getting these things to work fast is a hard job.)


**** GET IT WORKING ****

then, if it is not fast enough (note that a hardware cycle may have gone by while you were getting the code working),

**** PROFILE ****

to find out what is taking so much time (the result will probably surprise you).

Then, and only then, worry about making the code faster.


- William Hughes
 
Malcolm McLean

kr said:
Please contribute to a good list of tips/methods/suggestions for writing optimized C code. I think it will be helpful for the whole programming community. Please share the tips/tricks that you feel should be used to write code that is as optimal as the code produced by an optimizing compiler.

I will start it like this:

1) When we don't need double-precision arithmetic, float quantities will serve the purpose, so we should explicitly mark quantities as float, since unsuffixed constants are double by default.
That's called micro-optimisation. Sometimes you can get a very significant speedup with such techniques, but what you cannot do is reduce the order analysis of the algorithm. Nor can you, normally, strip out layers of data copying and reformatting.

In practice, when a program runs too slowly, either changing the algorithm or stripping out layers of "gift-wrapping" will fix it, or nothing will fix it. The number of times you can convert unacceptable into acceptable performance through micro-optimisation is small.

The bottleneck in software development is usually the amount of code the programmer can write, debug, and interface to other code. Micro-optimisation can and does make this worse. For instance, one of my bugbears is the number of different integer types. Reals are not so bad; there are only two formats in wide use. Still, it is intensely irritating when code fragment one works on float *'s and fragment two works on double *'s. You end up either rewriting functional code or writing little interface functions to allocate buffers of doubles and convert them from floats. Needless to say, the interfacing code tends to cost more than the advantage of using floats in the first place.
 
pete

kr said:
Hi people,
Please contribute to a good list of tips/methods/suggestions for writing optimized C code. I think it will be helpful for the whole programming community. Please share the tips/tricks that you feel should be used to write code that is as optimal as the code produced by an optimizing compiler.

I will start it like this:

1) When we don't need double-precision arithmetic, float quantities will serve the purpose, so we should explicitly mark quantities as float, since unsuffixed constants are double by default.

I advocate double as the default choice for a floating-point type to use in code. float is only for when you want the smallest type; long double is for when you want the greatest range and/or the greatest precision.

There's no reason to assume that operations on type float are faster than operations on type double.

Arguments of type float are subject to "the default argument promotions".
 
Ben Pfaff

pete said:
I advocate double as the default choice for a floating-point type to use in code. float is only for when you want the smallest type; long double is for when you want the greatest range and/or the greatest precision.

I agree.

Furthermore, it makes your life a lot easier if you just go with
double, since you don't have to be careful about casting or
adding suffixes to numeric constants, and you don't have to check
whether your implementation offers the C99 math library functions
on float and long double.
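
For instance, C99 gives each math function float and long double variants (sqrtf and sqrtl alongside sqrt), and keeping the suffixes straight is a chore that simply disappears if everything is double. A small sketch of the difference:

#include <math.h>   /* link with -lm on POSIX systems */
#include <stdio.h>

int main(void)
{
    float f = 2.0f;
    double d = 2.0;

    printf("%f\n", sqrtf(f)); /* C99 float variant */
    printf("%f\n", sqrt(d));  /* plain double version */
    return 0;
}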
 
Mike Wahler

kr said:
Hi people,
Please contribute to a good list of tips/methods/suggestions for writing optimized C code.

1. Use a quality optimizing compiler (the research
for determining quality is your responsibility)

End of List
I think it will be helpful for the whole programming community. Please share the tips/tricks that you feel should be used to write code that is as optimal as the code produced by an optimizing compiler.

If someone's already done the work, why should I do it again?
I will start it like this:
1) When we don't need double-precision arithmetic, float quantities will serve the purpose, so we should explicitly mark quantities as float, since unsuffixed constants are double by default.

float f1;
f1 = f1 + 2.5f;

instead of:

float f1;
f1 = f1 + 2.5; // 2.5 is a double constant here

which leads to double-precision arithmetic being done by the processor and hence wasted time.

You've made a HUGE assumption here. Type 'double' operations aren't
automatically slower than 'float' operations (the case could be the
exact opposite on certain platforms).

1. If you want to really *know* about performance, you must *measure*.
2. Measurements can and do vary for identical code on different platforms.

-Mike
 
Malcolm McLean

Ben Pfaff said:
Furthermore, it makes your life a lot easier if you just go with
double, since you don't have to be careful about casting or
adding suffixes to numeric constants, and you don't have to check
whether your implementation offers the C99 math library functions
on float and long double.
Except that float is traditional for 3D geometry. Almost never do you need more precision for coordinates. For instance, proteins, which I am working on at present, cannot be resolved to finer than about one Angstrom anyway, and are typically a hundred or so Angstroms across. So there is no point pretending to have double precision in their representation. Also, 3D meshes can get very large, so the memory footprint is significant.
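
To put a rough number on the memory point (the struct names here are purely illustrative): a mesh of a million vertices takes about 12 MB with float coordinates and about 24 MB with double:

#include <stdio.h>

typedef struct { float x, y, z; } Vec3f;   /* typically 12 bytes */
typedef struct { double x, y, z; } Vec3d;  /* typically 24 bytes */

int main(void)
{
    size_t n = 1000000; /* a million vertices */

    printf("float mesh:  %zu bytes\n", n * sizeof (Vec3f));
    printf("double mesh: %zu bytes\n", n * sizeof (Vec3d));
    return 0;
}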
 
Tor Rustad

kr said:
Hi people,
Please contribute to a good list of tips/methods/suggestions for writing optimized C code.

<snip float code>

I would rather give advice on writing "correct" code; there are many programmers who are quite clueless about floating-point calculations.

Tons of scientific computations based on single precision are bad; some results are pure nonsense. Pick up a textbook on numerical analysis and study the effect of cancellation. Doing floating-point calculations without proper error analysis is asking for trouble.
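
A minimal illustration of cancellation, with arbitrarily chosen numbers: subtracting two nearly equal values wipes out most of the significant digits, and float has only about seven to start with.

#include <stdio.h>

int main(void)
{
    float a = 1.0000001f;
    float b = 1.0000000f;

    /* The true difference is 1e-7, but after rounding each value
       to float, the computed difference is off by nearly 20%. */
    printf("float:  %.10g\n", (double)(a - b));
    printf("double: %.10g\n", 1.0000001 - 1.0000000);
    return 0;
}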

I'm not doing heavy floating-point calculations these days, but I wouldn't be surprised if the FPU of my IA-32 machine operates by default with IEEE 754 80-bit precision (at native speed). So whether you store the result outside the FPU in 32-bit or 64-bit memory locations need not matter much speed-wise.
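
C99 actually lets a program ask about this: the FLT_EVAL_METHOD macro in <float.h> reports whether intermediate results are kept in a wider format, as the x87 does. A small sketch:

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* 0: evaluate in each type's own precision
       1: evaluate float operations in double
       2: evaluate everything in long double (x87-style) */
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    printf("sizeof (long double) = %zu\n", sizeof (long double));
    return 0;
}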

If you really worry about FLOPS, get an AMD64 or IA-64 CPU, add the number of CPUs needed, and use a good compiler and scientific libraries.
 
Tor Rustad

Malcolm McLean wrote:

[...]
cannot be resolved to finer than about one Angstrom anyway, and are typically a hundred or so Angstroms across. So there is no point pretending to have double precision in their representation.

That depends very much on the calculations you do. With single-precision computations, the numerical error in the result may grow surprisingly fast.

Also, 3D meshes can get very large, so the memory footprint is significant.

Even for a supercomputer?
 
pete

Malcolm said:
Except that float is traditional for 3D geometry. Almost never do you need more precision for coordinates. For instance, proteins, which I am working on at present, cannot be resolved to finer than about one Angstrom anyway, and are typically a hundred or so Angstroms across. So there is no point pretending to have double precision in their representation. Also, 3D meshes can get very large, so the memory footprint is significant.

Large arrays are the only situation in which I can imagine
that the space saving from using floats would be significant.

Also, arrays are just about the only situation
in which I would use a lower ranking type than int.
 
Malcolm McLean

Tor Rustad said:
Malcolm McLean wrote:

[...]
cannot be resolved to finer than about one Angstrom anyway, and are typically a hundred or so Angstroms across. So there is no point pretending to have double precision in their representation.

That depends very much on the calculations you do. With single-precision computations, the numerical error in the result may grow surprisingly fast.
That's true. For instance, we always represent rotations as rotations from position zero, rather than incrementing Cartesian coordinates by a delta. One problem where I did actually use doubles was when a protein backbone is represented by torsion angles between the atoms. If the chain is large enough, then a tiny inaccuracy in a torsion angle in the middle can affect the position of the whole quite severely. However, I then converted the atoms back to single precision for the rest of the calculations.
Even for a supercomputer?
A Beowulf cluster only has 2 GB of core on each node, although ours has over a hundred nodes. However, it is virtually all yours; there's no nasty Windows Vista to gobble lots of megs. That still means it can handle a very big protein, except that one of our algorithms needs to store as many conformations as possible.
 
Richard Tobin

Tor Rustad wrote:
If you really worry about FLOPS, get an AMD64 or IA-64 CPU, add the number of CPUs needed, and use a good compiler and scientific libraries.

Hidden in that paragraph is a practical piece of advice on how to write your code: there's no point adding CPUs unless your code is designed to be divided between multiple CPUs. And you need to consider this early on: it affects your choice of algorithms as well as how you code them.

-- Richard
 
jacob navia

kr said:
Hi people,
Please contribute to a good list of tips/methods/suggestions for writing optimized C code. I think it will be helpful for the whole programming community. Please share the tips/tricks that you feel should be used to write code that is as optimal as the code produced by an optimizing compiler.

I will start it like this:

1) When we don't need double-precision arithmetic, float quantities will serve the purpose, so we should explicitly mark quantities as float, since unsuffixed constants are double by default.

float f1;
f1 = f1 + 2.5f;

instead of:

float f1;
f1 = f1 + 2.5; // 2.5 is a double constant here

which leads to double-precision arithmetic being done by the processor and hence wasted time.


Thanks.
http://www.codeproject.com/tips/optimizationenemy.asp
 
Tor Rustad

Malcolm said:
Tor Rustad said:
Malcolm McLean wrote:

[...]
cannot be resolved to finer than about one Angstrom anyway, and are typically a hundred or so Angstroms across. So there is no point pretending to have double precision in their representation.

That depends very much on the calculations you do. With single-precision computations, the numerical error in the result may grow surprisingly fast.
That's true. For instance, we always represent rotations as rotations from position zero, rather than incrementing Cartesian coordinates by a delta. One problem where I did actually use doubles was when a protein backbone is represented by torsion angles between the atoms. If the chain is large enough, then a tiny inaccuracy in a torsion angle in the middle can affect the position of the whole quite severely. However, I then converted the atoms back to single precision for the rest of the calculations.

Yes, the point is that when using single precision, the programmer needs to know what he/she is doing; those who don't should stay with DP.

One area to watch out for is solving inverse problems numerically.

A Beowulf cluster only has 2 GB of core on each node, although ours has over a hundred nodes. However, it is virtually all yours; there's no nasty Windows Vista to gobble lots of megs. That still means it can handle a very big protein, except that one of our algorithms needs to store as many conformations as possible.

Hmm... 2 GB of memory per node doesn't sound like much these days. I would expect more on a Top 500 HPC cluster, more likely in the range of 16 GB to 32 GB per node. If you are located in the UK or US, there are lots of HPC clusters out there, and you could try to get some CPU time elsewhere.


What kind of computations are you doing on these proteins? E.g. which types of equations are involved, and what results are obtained when solving them?

The dynamics of many-particle problems are rather complex; even the "simple" three-particle problem in classical physics has no analytical solution.
 
Tor Rustad

Richard said:
Hidden in that paragraph is a practical piece of advice on how to write your code: there's no point adding CPUs unless your code is designed to be divided between multiple CPUs. And you need to consider this early on: it affects your choice of algorithms as well as how you code them.

Yes.

For the future, I guess ordinary programmers will need to re-think how they write and design programs. For shared-memory systems, the OS and compilers could perhaps do some of this for us (to some extent), but distributed-memory systems would require much more. Single-core, single-CPU systems are a thing of the past.

If going parallel, one piece of advice I hear is: split data, not code.
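
A minimal sketch of what splitting data looks like in practice, assuming an OpenMP-capable compiler (e.g. gcc -fopenmp); every thread runs the same loop body on its own slice of the array:

#include <stdio.h>

int main(void)
{
    enum { N = 1000000 };
    static double a[N];
    double sum = 0.0;
    int i;

    for (i = 0; i < N; i++)
        a[i] = (double)i;

    /* The iteration space (the data) is divided among the threads;
       without OpenMP the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i] * a[i];

    printf("%g\n", sum);
    return 0;
}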

--
Tor <torust [at] online [dot] no>

"Hello everybody out there using minix - I'm doing a (free) operating
system (just a hobby, won't be big and professional like gnu) for
386(486) AT clones" -Linus 1991
 
Malcolm McLean

Tor Rustad said:
What kind of computations are you doing on these proteins? E.g. which types of equations are involved, and what results are obtained when solving them?

The dynamics of many-particle problems are rather complex; even the "simple" three-particle problem in classical physics has no analytical solution.
We don't do dynamics. We calculate the free energy of lots of conformations and try to build up an ensemble that matches the states the protein will adopt in solution.
The plan is to try to model an amyloid fibre and understand why some sequences form amyloids much more readily than others.
 
