How to write optimized/efficient C programs....?

Discussion in 'C Programming' started by kr, Sep 29, 2007.

  1. kr

    kr Guest

    Hi people,
    Please contribute to making a good list of tips/methods/suggestions for
    writing optimized C code. I think it will be helpful for the entire
    programming community. Please share the tips/tricks that you feel
    should be used to write code as optimal as that produced by an
    optimizing compiler.

    I will start it like this-

    1) When we don't need double precision arithmetic, float
    quantities alone will serve the purpose,
    so we should explicitly mark floating constants as float, since they
    are double by default.

    float f1;
    f1 = f1+2.5f;

    instead of--

    float f1;
    f1 = f1+2.5; // 2.5 is a double constant here

    which leads to double precision arithmetic being done by the processor
    and hence wasted time.


    Thanks.
     
    kr, Sep 29, 2007
    #1

  2. kr

    Martin Wells Guest

    kr:

    > Hi people,
    > Please contribute to making a good list of tips/methods/suggestions for
    > writing optimized C code. I think it will be helpful for the entire
    > programming community. Please share the tips/tricks that you feel
    > should be used to write code as optimal as that produced by an
    > optimizing compiler.



    There are a few little things I do... but with a lot of optimisations
    there comes a counter-argument showing that there's a better way of
    doing it on a different system. For example, take the following
    function:

    #include <assert.h>
    #include <stddef.h>

    void AddFiveToEveryElement(int *p, size_t len)
    {
        assert(p); assert(len);

        do *p++ += 5;
        while (--len);
    }

    Now I would think that that's quite efficient, but another method
    might work better on a different system. Something like:

    void AddFiveToEveryElement(int *const p, size_t len)
    {
        assert(p); assert(len);

        do p[--len] += 5; while (len);
    }

    This might work better on a machine that has an instruction which
    takes both a pointer and an offset.
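
    For illustration, here is a minimal, self-contained usage sketch of
    either version (the array contents and the printing are made up, not
    part of Martin's post; the first definition is repeated so the example
    compiles on its own):

    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    void AddFiveToEveryElement(int *p, size_t len)
    {
        assert(p); assert(len);
        do *p++ += 5;
        while (--len);
    }

    int main(void)
    {
        int data[] = {1, 2, 3, 4};
        size_t n = sizeof data / sizeof data[0];

        AddFiveToEveryElement(data, n);
        for (size_t i = 0; i < n; i++)
            printf("%d ", data[i]);      /* prints: 6 7 8 9 */
        putchar('\n');
        return 0;
    }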

    Martin
     
    Martin Wells, Sep 29, 2007
    #2

  3. kr

    santosh Guest

    kr wrote:

    > Hi people,
    > Please contribute to making a good list of tips/methods/suggestions for
    > writing optimized C code. I think it will be helpful for the entire
    > programming community. Please share the tips/tricks that you feel
    > should be used to write code as optimal as that produced by an
    > optimizing compiler.
    >
    > I will start it like this-
    >
    > 1) When we don't need double precision arithmetic, float
    > quantities alone will serve the purpose,
    > so we should explicitly mark floating constants as float, since they
    > are double by default.
    >
    > float f1;
    > f1 = f1+2.5f;
    > instead of--
    >
    > float f1;
    > f1 = f1+2.5; // 2.5 is a double constant here
    > which leads to double precision arithmetic being done by the processor
    > and hence wasted time.


    On many modern processors, calculations involving doubles might actually
    turn out faster than those involving floats. The extra range and precision
    offered by double is almost always more important than speed
    considerations.
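
    As a rough, hedged way to check this claim on a given machine (my own
    sketch, not part of santosh's post), one can time the same loop in both
    types with clock(); the iteration count is arbitrary and the results
    depend heavily on compiler, flags and CPU:

    #include <stdio.h>
    #include <time.h>

    #define N 100000000L

    int main(void)
    {
        /* volatile keeps the compiler from optimizing the loops away */
        volatile float  fsum = 0.0f;
        volatile double dsum = 0.0;
        clock_t t0, t1;

        t0 = clock();
        for (long i = 0; i < N; i++)
            fsum += 2.5f;
        t1 = clock();
        printf("float : %.2f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

        t0 = clock();
        for (long i = 0; i < N; i++)
            dsum += 2.5;
        t1 = clock();
        printf("double: %.2f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

        return 0;
    }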
     
    santosh, Sep 29, 2007
    #3
  4. kr

    Flash Gordon Guest

    kr wrote, On 29/09/07 12:34:
    > Hi people,
    > Please contribute to making a good list of tips/methods/suggestions for
    > writing optimized C code. I think it will be helpful for the entire
    > programming community.


    First, nail down the requirements so you don't write code to do things
    that are not required.
    Second, select the best algorithm for the job and tune it.
    Third, design the program so it is not doing things it does not need to do.
    Fourth, write clear code to implement your design.

    > Please share the tips/tricks that you feel
    > should be used to write code as optimal as that produced by an
    > optimizing compiler.


    The best way to do this is to use an optimising compiler.

    > I will start it like this-
    >
    > 1) When we don't need double precision arithmetic, float
    > quantities alone will serve the purpose,
    > so we should explicitly mark floating constants as float, since they
    > are double by default.


    Not if you want efficient code on a lot of implementations. Often using
    float is *slower* than using double.

    > float f1;
    > f1 = f1+2.5f;
    > instead of--
    >
    > float f1;
    > f1 = f1+2.5; // 2.5 is a double constant here
    > which leads to double precision arithmetic being done by the processor
    > and hence wasted time.


    On a lot of implementations the fastest thing would be
    double d1;
    /* Code which sets d1 */
    d1 += 2.5;

    Of course, using "d1 = d1 + 2.5" is likely to be just as fast, but is
    more error prone since you might type "d1 = e1 + 1" by mistake.

    Generally, you are far more likely to get a program doing the correct
    thing fast enough if you write your code to be clear than if you try to
    micro-optimise.
    --
    Flash Gordon
     
    Flash Gordon, Sep 29, 2007
    #4
  5. On Sep 29, 7:34 am, kr <> wrote:
    > Hi people,
    > Please contribute to making a good list of tips/methods/suggestions for
    > writing optimized C code. I think it will be helpful for the entire
    > programming community. Please share the tips/tricks that you feel
    > should be used to write code as optimal as that produced by an
    > optimizing compiler.
    >
    > I will start it like this-
    >
    > 1) When we don't need double precision arithmetic, float
    > quantities alone will serve the purpose,
    > so we should explicitly mark floating constants as float, since they
    > are double by default.
    >
    > float f1;
    > f1 = f1+2.5f;
    > instead of--
    >
    > float f1;
    > f1 = f1+2.5; // 2.5 is a double constant here
    > which leads to double precision arithmetic being done by the processor
    > and hence wasted time.
    >


    This is a perfect example of why these types of
    "tips/methods/suggestions" are a bad idea.
    In fact, on many processors double precision is
    native, so double precision calculations
    are faster than float calculations (to
    do a float calculation the float values are
    converted to double, the calculation is done,
    and the answer converted back to float).
    On other processors the opposite is the case.
    But what if you have a big (several hundred megabyte)
    matrix of floating point values? What matters
    here is not how fast you can multiply, but how
    fast you can get the information to and from
    memory. Using float may be a good idea, even
    if float calculations take a bit longer.
    But there are cache and register considerations
    ...


    My Tips


    Get a good optimizing compiler. Many
    useful general methods (and many
    very processor-specific methods) will be
    known and used by
    the compiler.

    Write clear code. Not only does this make
    things easier for you, it makes things easier
    for the optimizer (whether a compiler
    or another programmer).

    Use libraries for things like matrix
    operations and FFT. (Getting these things
    to work fast is a hard job.)


    **** GET IT WORKING ****

    then, if it is not fast enough (note a
    hardware cycle may have gone by while you
    were getting the code working),

    **** PROFILE ****

    to find what is taking so much time
    (the result will probably surprise you).

    Then, and only then, worry about making
    the code faster.


    - William Hughes
     
    William Hughes, Sep 29, 2007
    #5
  6. "kr" <> wrote in message
    >
    > Please contribute to making a good list of tips/methods/suggestions for
    > writing optimized C code. I think it will be helpful for the entire
    > programming community. Please share the tips/tricks that you feel
    > should be used to write code as optimal as that produced by an
    > optimizing compiler.
    >
    > I will start it like this-
    >
    > 1) When we don't need double precision arithmetic, float
    > quantities alone will serve the purpose,
    > so we should explicitly mark floating constants as float, since they
    > are double by default.
    >

    That's called micro-optimisation.
    Sometimes you can get a very significant speedup with such techniques, but
    what you cannot do is reduce the asymptotic complexity of the algorithm.
    Nor can you, normally, strip out layers of data copying and reformatting.

    In practice, when a program runs too slowly, either changing the algorithm
    or stripping out layers of "gift-wrapping" will fix it, or nothing will fix
    it. The number of times you can convert unacceptable into acceptable
    performance through micro-optimisation is small.

    The bottleneck in software development is usually the amount of code the
    programmer can write, debug, and interface to other code. Micro-optimisation
    can and does make this worse. For instance, one of my bugbears is the number
    of different integer types. Reals are not so bad; there are only two formats
    in wide use. Still, it is intensely irritating when code fragment one works
    on float *s and fragment two works on double *s. You end up either
    rewriting functional code or writing little interface functions to allocate
    buffers of doubles and convert them from floats. Needless to say, the
    interfacing code tends to cost more than the advantage of using floats in
    the first place.
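
    A hedged sketch of the kind of little interface shim being described
    (the function name is hypothetical): a float buffer has to be copied
    into a freshly allocated double buffer before it can be handed to code
    written for double *.

    #include <stdlib.h>

    /* Allocate a double buffer and widen a float buffer into it.
       The caller owns the returned memory and must free() it. */
    double *floats_to_doubles(const float *src, size_t n)
    {
        double *dst = malloc(n * sizeof *dst);
        if (dst != NULL)
            for (size_t i = 0; i < n; i++)
                dst[i] = src[i];
        return dst;
    }

    Every such call costs an allocation and a full pass over the data, which
    is exactly the overhead that can outweigh whatever was saved by storing
    floats in the first place.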

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
     
    Malcolm McLean, Sep 29, 2007
    #6
  7. kr

    pete Guest

    kr wrote:
    >
    > Hi people,
    > Please contribute to making a good list of tips/methods/suggestions for
    > writing optimized C code. I think it will be helpful for the entire
    > programming community. Please share the tips/tricks that you feel
    > should be used to write code as optimal as that produced by an
    > optimizing compiler.
    >
    > I will start it like this-
    >
    > 1) When we don't need double precision arithmetic, float
    > quantities alone will serve the purpose,
    > so we should explicitly mark floating constants as float, since they
    > are double by default.


    I advocate double as the default choice
    for a floating point type to use in code.
    float is only for when you want the smallest type.
    long double is for when you want the greatest range
    and/or the greatest precision.

    There's no reason to assume that operations
    on type float are faster than operations on type double.

    Arguments of type float are subject to
    "the default argument promotions" (they are passed as double
    to variadic and unprototyped functions).

    --
    pete
     
    pete, Sep 29, 2007
    #7
  8. kr

    Ben Pfaff Guest

    pete <> writes:

    > I advocate double as the default choice
    > for a floating point type to use in code.
    > float is only for when you want the smallest type.
    > long double is for when you want the greatest range
    > and or the greatest precision.


    I agree.

    Furthermore, it makes your life a lot easier if you just go with
    double, since you don't have to be careful about casting or
    adding suffixes to numeric constants, and you don't have to check
    whether your implementation offers the C99 math library functions
    on float and long double.
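
    A brief sketch of that last point (my example, not Ben's): with double
    you simply call sqrt(), whereas sticking to float or long double means
    remembering the C99 sqrtf()/sqrtl() variants, or accepting conversions
    to and from double. (Linking may need -lm on some systems.)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double d = sqrt(2.0);        /* classic double version          */
        float  f = sqrtf(2.0f);      /* float version, added in C99     */
        long double l = sqrtl(2.0L); /* long double version, C99        */

        printf("%f %f %Lf\n", d, f, l);
        return 0;
    }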
    --
    Ben Pfaff
    http://benpfaff.org
     
    Ben Pfaff, Sep 29, 2007
    #8
  9. kr

    Mike Wahler Guest

    "kr" <> wrote in message
    news:...
    > Hi people,
    > Please contribute to making a good list of tips/methods/suggestions for
    > writing optimized C code.


    1. Use a quality optimizing compiler (the research
    for determining quality is your responsibility)

    End of List

    > I think it will be helpful for the entire
    > programming community. Please share the tips/tricks that you feel
    > should be used to write code as optimal as that produced by an
    > optimizing compiler.


    If someone's already done the work, why should I do it again?

    >
    > I will start it like this-


    > 1) When we don't need double precision arithmetic, float
    > quantities alone will serve the purpose,
    > so we should explicitly mark floating constants as float, since they
    > are double by default.
    >
    > float f1;
    > f1 = f1+2.5f;
    > instead of--
    >
    > float f1;
    > f1 = f1+2.5; // 2.5 is a double constant here
    > which leads to double precision arithmetic being done by the processor
    > and hence wasted time.


    You've made a HUGE assumption here. Type 'double' operations aren't
    automatically slower than 'float' operations (the case could be the
    exact opposite on certain platforms).

    1. If you want to really *know* about performance, you must *measure*.
    2. Measurements can and do vary for identical code on different platforms.

    -Mike
     
    Mike Wahler, Sep 29, 2007
    #9
  10. "Ben Pfaff" <> wrote in message
    > Furthermore, it makes your life a lot easier if you just go with
    > double, since you don't have to be careful about casting or
    > adding suffixes to numeric constants, and you don't have to check
    > whether your implementation offers the C99 math library functions
    > on float and long double.
    >

    Except that float is traditional for 3D geometry. Almost never do you need
    more precision for coordinates. For instance, proteins, which I am working on
    presently, cannot be resolved to finer than about one Angstrom unit anyway,
    and are typically a hundred or so Angstroms across. So there is no point
    pretending to have double precision in their representation. Also, 3D meshes
    can get very large, so the memory footprint is significant.
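
    To put a rough number on the memory argument (my sketch, with made-up
    struct names and mesh size):

    #include <stdio.h>

    struct VertexF { float x, y, z; };    /* typically 12 bytes */
    struct VertexD { double x, y, z; };   /* typically 24 bytes */

    int main(void)
    {
        size_t n = 10000000;   /* a 10-million-vertex mesh */
        printf("float  mesh: %zu bytes\n", n * sizeof(struct VertexF));
        printf("double mesh: %zu bytes\n", n * sizeof(struct VertexD));
        return 0;
    }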

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
     
    Malcolm McLean, Sep 29, 2007
    #10
  11. kr

    Tor Rustad Guest

    kr wrote:
    > Hi people,
    > Please contribute to making a good list of tips/methods/suggestions for
    > writing optimized C code.


    <snip float code>

    I would rather give advice on writing "correct" code; there are many
    programmers who are quite clueless about floating-point calculations.

    Tons of scientific computations based on single precision are bad; some
    results are pure nonsense. Pick up a textbook on numerical analysis,
    and study the effect of cancellation. Doing floating-point calculations
    without proper error analysis is asking for trouble.
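
    A tiny illustration of cancellation (my example, not Tor's): subtracting
    two nearly equal numbers exposes the rounding error made earlier, and in
    single precision there are far fewer digits to lose.

    #include <stdio.h>

    int main(void)
    {
        float  xf = 1.0e8f;
        double xd = 1.0e8;

        /* (x + 1) - x should be 1, but in float the 1 is rounded away
           before the subtraction even happens. */
        float  fs = xf + 1.0f;
        double ds = xd + 1.0;

        printf("float : %g\n", fs - xf);   /* typically prints 0 */
        printf("double: %g\n", ds - xd);   /* prints 1 */
        return 0;
    }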

    I'm not doing heavy floating-point calculations these days, but I
    wouldn't be surprised if the FPU of my IA-32 by default operates with
    IEEE 754 80-bit precision (at native speed). So whether you store the
    result outside the FPU in 32-bit or 64-bit memory locations needn't
    matter much speed-wise.

    If you really worry about FLOPS speed, get an AMD-64 or IA-64 CPU. Add
    the number of CPUs needed, and use a good compiler and scientific libraries.


    --
    Tor <torust [at] online [dot] no>
    "One of the main causes of the fall of the Roman Empire was that,
    lacking zero, they had no way to indicate successful termination of
    their C programs"
     
    Tor Rustad, Sep 30, 2007
    #11
  12. kr

    Tor Rustad Guest

    Malcolm McLean wrote:

    [...]

    > cannot be resolved to finer than about one
    > Angstrom unit anyway, and are typically a hundred or so Angstroms
    > across. So there is no point pretending to have double precision in
    > their representation.


    That depends very much on the calculations you do. With single-precision
    computations, the numerical error in the result may grow surprisingly fast.


    > Also 3D meshes can get very large, so the memory take is significant.


    Even for a super-computer?

    --
    Tor <torust [at] online [dot] no>
    "One of the main causes of the fall of the Roman Empire was that,
    lacking zero, they had no way to indicate successful termination of
    their C programs"
     
    Tor Rustad, Sep 30, 2007
    #12
  13. kr

    pete Guest

    Malcolm McLean wrote:
    >
    > "Ben Pfaff" <> wrote in message
    > > Furthermore, it makes your life a lot easier if you just go with
    > > double, since you don't have to be careful about casting or
    > > adding suffixes to numeric constants, and you don't have to check
    > > whether your implementation offers the C99 math library functions
    > > on float and long double.
    > >

    > Except that float is traditional for 3D geometry.
    > Almost never do you need more precision for coordinates.
    > For instance proteins, which I am working on presently,
    > cannot be resolved to finer than about one Angstrom unit anyway,
    > and are typically a hundred or so Angstroms across.
    > So there is no point
    > pretending to have double precision in their representation.
    > Also 3D meshes can get very large,
    > so the memory take is significant.


    Large arrays are the only situation in which I can imagine
    that the space saving from using floats would be significant.

    Also, arrays are just about the only situation
    in which I would use a lower ranking type than int.

    --
    pete
     
    pete, Sep 30, 2007
    #13
  14. "Tor Rustad" <> wrote in message
    news:...
    > Malcolm McLean wrote:
    >
    > [...]
    >
    >> cannot be resolved to finer than about one Angstrom unit anyway, and are
    >> typically a hundred or so Angstroms across. So there is no point
    >> pretending to have double precision in their representation.

    >
    > That depends very much on the calculations you do. With single-precision
    > computations, the numerical error in the result may grow surprisingly
    > fast.
    >

    That's true. For instance, we always represent rotations as rotations from
    position zero, rather than incrementing Cartesian coordinates by a delta.
    One problem, where I did actually use doubles, was when a protein backbone
    is represented by torsion angles between the atoms. If the chain is large
    enough then a tiny inaccuracy in a torsion angle in the middle can affect the
    position of the whole quite severely. However, I then converted the atoms back
    to single precision for the rest of the calculations.
    >
    >> Also 3D meshes can get very large, so the memory take is significant.

    >
    > Even for a super-computer?
    >

    A Beowulf cluster only has 2GB of core on each node, although ours has over
    a hundred nodes. However, it is virtually all yours, with no nasty Windows
    Vista to gobble lots of megs.
    That still means it can handle a very big protein, except that one of our
    algorithms needs to store as many conformations as possible.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
     
    Malcolm McLean, Sep 30, 2007
    #14
  15. In article <>,
    Tor Rustad <> wrote:
    >If you really worry about FLOPS speed, get an AMD-64 or IA-64 CPU. Add
    >the number of CPUs needed, and use a good compiler and scientific libraries.


    Hidden in that paragraph is a practical piece of advice on how to
    write your code: there's no point adding CPUs unless your code is
    designed to be divided between multiple CPUs. And you need to
    consider this early on: it affects your choice of algorithms as well
    as how you code them.

    -- Richard
    --
    "Consideration shall be given to the need for as many as 32 characters
    in some alphabets" - X3.4, 1963.
     
    Richard Tobin, Sep 30, 2007
    #15
  16. kr

    jacob navia Guest

    kr wrote:
    > Hi people,
    > Please contribute to making a good list of tips/methods/suggestions for
    > writing optimized C code. I think it will be helpful for the entire
    > programming community. Please share the tips/tricks that you feel
    > should be used to write code as optimal as that produced by an
    > optimizing compiler.
    >
    > I will start it like this-
    >
    > 1) When we don't need double precision arithmetic, float
    > quantities alone will serve the purpose,
    > so we should explicitly mark floating constants as float, since they
    > are double by default.
    >
    > float f1;
    > f1 = f1+2.5f;
    > instead of--
    >
    > float f1;
    > f1 = f1+2.5; // 2.5 is a double constant here
    > which leads to double precision arithmetic being done by the processor
    > and hence wasted time.
    >
    >
    > Thanks.
    >

    http://www.codeproject.com/tips/optimizationenemy.asp

    --
    jacob navia
    jacob at jacob point remcomp point fr
    logiciels/informatique
    http://www.cs.virginia.edu/~lcc-win32
     
    jacob navia, Sep 30, 2007
    #16
  17. kr

    Tor Rustad Guest

    Malcolm McLean wrote:
    > "Tor Rustad" <> wrote in message
    > news:...
    >> Malcolm McLean wrote:
    >>
    >> [...]
    >>
    >>> cannot be resolved to finer than about one Angstrom unit anyway, and
    >>> are typically a hundred or so Angstroms across. So there is no point
    >>> pretending to have double precision in their representation.

    >>
    >> That depends very much on the calculations you do. With
    >> single-precision computations, the numerical error in the result may
    >> grow surprisingly fast.
    >>

    > That's true. For instance, we always represent rotations as rotations
    > from position zero, rather than incrementing Cartesian coordinates by a
    > delta.
    > One problem, where I did actually use doubles, was when a protein
    > backbone is represented by torsion angles between the atoms. If the
    > chain is large enough then a tiny inaccuracy in a torsion angle in the
    > middle can affect the position of the whole quite severely. However, I
    > then converted the atoms back to single precision for the rest of the
    > calculations.


    Yes, the point is, if using single precision, the programmer needs to
    know what he/she is doing; those who don't should rather stay with DP.

    One area to watch out for is solving inverse problems numerically.


    >>> Also 3D meshes can get very large, so the memory take is significant.

    >>
    >> Even for a super-computer?
    >>

    > A Beowulf cluster only has 2GB of core on each node, although ours has
    > over a hundred nodes. However, it is virtually all yours, with no nasty
    > Windows Vista to gobble lots of megs.
    > That still means it can handle a very big protein, except that one of
    > our algorithms needs to store as many conformations as possible.


    Hmm... 2 GB of memory per node doesn't sound like much these days. I would
    expect more on a Top 500 HPC cluster, rather in the range of 16 GB - 32
    GB per node. If you are located in the UK or US, there are lots of HPC
    clusters out there, and you could try to get some CPU time elsewhere.


    What kind of computations are you doing on these proteins? E.g. which
    type of equations are you talking about, and what results are obtained
    when solving them?

    The dynamics of many-particle problems are rather complex; even the
    "simple" 3-particle problem in classical physics has no analytical
    solution.


    --
    Tor <torust [at] online [dot] no>

    "To this day, many C programmers believe that 'strong typing' just means
    pounding extra hard on the keyboard"
     
    Tor Rustad, Oct 1, 2007
    #17
  18. kr

    Tor Rustad Guest

    Richard Tobin wrote:
    > In article <>,
    > Tor Rustad <> wrote:
    >> If you really worry about FLOPS speed, get an AMD-64 or IA-64 CPU. Add
    >> the number of CPUs needed, and use a good compiler and scientific libraries.

    >
    > Hidden in that paragraph is a practical piece of advice on how to
    > write your code: there's no point adding CPUs unless your code is
    > designed to be divided between multiple CPUs. And you need to
    > consider this early on: it affects your choice of algorithms as well
    > as how you code them.


    Yes.

    For the future, I guess normal programmers need to re-think how they
    write and design programs. For shared-memory systems, the OS and
    compilers could perhaps do some of it for us (to some extent), but for
    distributed-memory systems that would require much more. Single-core,
    single-CPU systems are a thing of the past.

    If going parallel, one piece of advice I hear is: split data, not code.
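
    To illustrate "split data, not code" in the shared-memory case (my
    sketch, not Tor's): each thread gets its own slice of the index range
    while all threads run the same loop body. The pragma below assumes an
    OpenMP-capable compiler (e.g. gcc -fopenmp) and is simply ignored
    otherwise.

    #include <stdio.h>

    #define N 1000000

    static double a[N], b[N], c[N];

    int main(void)
    {
        for (long i = 0; i < N; i++) {
            a[i] = (double)i;
            b[i] = 2.0 * i;
        }

        /* The iteration space is divided among the available threads;
           every thread executes the same statement on its own chunk. */
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[N-1] = %f\n", c[N - 1]);
        return 0;
    }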

    --
    Tor <torust [at] online [dot] no>

    "Hello everybody out there using minix - I'm doing a (free) operating
    system (just a hobby, won't be big and professional like gnu) for
    386(486) AT clones" -Linus 1991
     
    Tor Rustad, Oct 1, 2007
    #18
  19. "Tor Rustad" <> wrote in message
    >
    > What kind of computations are you doing on these proteins? E.g. which type
    > of equations are you talking about, and what results are obtained when
    > solving them?
    >
    > The dynamics of many-particle problems are rather complex; even the
    > "simple" 3-particle problem in classical physics has no analytical
    > solution.
    >

    We don't do dynamics. We calculate the free energy of lots of conformations
    and try to build up an ensemble that matches the states the protein will
    adopt in solution.
    The plan is to try to model an amyloid fibre and understand why some
    sequences form amyloids much more readily than others.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
     
    Malcolm McLean, Oct 1, 2007
    #19
