Re: Code optimisation

Discussion in 'C++' started by Peter van Merkerk, Aug 27, 2003.

  1. > Fyi, test does indeed contain either 0 or 1.
    >
    > I did think about gathering the non-zero terms but I don't believe this
    > will be efficient because in reality there are several tests and they
    > change during the iteration.


    Rather than building a test array first and then gathering the non-zero
    terms, maybe it is also possible to gather the non-zero terms directly,
    without filling a test array?
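
A minimal sketch of that idea, assuming the condition that would have filled the test array can be evaluated per index. The names `gather_active`, `sum_gathered` and `is_active` are hypothetical and do not come from the original code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: collect the indices of the active terms directly,
// without first filling a 0/1 test array. `is_active` stands in for
// whatever condition would have set test[i] to 1.
template <typename Pred>
std::vector<std::size_t> gather_active(std::size_t n, Pred is_active)
{
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < n; ++i)
        if (is_active(i))
            idx.push_back(i);
    return idx;
}

// Sum only the gathered terms; this loop needs no per-element test.
double sum_gathered(const std::vector<double>& in,
                    const std::vector<std::size_t>& idx)
{
    double out = 0.0;
    for (std::size_t i : idx)
        out += in[i];
    return out;
}
```

Whether this pays off depends on how often the set of active terms changes compared with how often it is summed; if it changes on every iteration, the gather step may cost as much as it saves.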

    > I like the idea of putting
    >
    > out += test ? in : 0;
    >
    > but I suspect (as one response noted) that most compilers will do
    > something like this anyway.


    The compiler won't care how you write it, but you do. So make the code as
    clear as possible.

    > As a matter of interest, are compilers smart enough not to bother adding
    > zero - or is it faster to add than to test?


    Only if the compiler can determine at compile time that it will always be
    adding zero, e.g.: a += 0; will most likely be optimized away. Integer
    addition is very fast (often the fastest instruction) on all processors I
    know of. Floating-point addition is also pretty fast on many modern
    processors. Branches, however, seem to get slower with every new processor
    generation. In most, if not all, cases testing for zero before doing an add
    would actually slow things down.
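
To make the two alternatives concrete, here is a small sketch of both forms. The function names and the use of `std::vector` are assumptions for illustration; the original code is not shown in this thread. Which one is faster depends on the compiler and the CPU, so measure before choosing:

```cpp
#include <cstddef>
#include <vector>

// Branching form: test each element before adding it.
double sum_with_test(const std::vector<double>& in,
                     const std::vector<int>& test)
{
    double out = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i)
        if (test[i] != 0)
            out += in[i];
    return out;
}

// Unconditional form: test[i] is assumed to be exactly 0 or 1, so the
// product is either 0 or in[i] and no branch is needed in the loop body.
// Note: for infinite or NaN inputs 0 * in[i] is NaN, so the two forms
// only agree on finite data.
double sum_unconditional(const std::vector<double>& in,
                         const std::vector<int>& test)
{
    double out = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i)
        out += test[i] * in[i];
    return out;
}
```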

    Rather than tweaking your code here and there, I think your best bet is to
    reconsider the algorithm you are using. My experience is that tweaking code
    typically improves performance by no more than 10%, if you are lucky. A
    better algorithm can often improve performance many times over, sometimes
    by more than an order of magnitude. If you are working on large data sets,
    you might also want to consider the memory access patterns, though I would
    take a good look at the algorithm first, as optimizing memory accesses can
    get ugly.
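
As an illustration of what "memory access patterns" can mean, the sketch below sums the same matrix in two traversal orders. The flat row-major layout and the function names are assumptions, not something taken from the original code. Both functions visit every element once; on large matrices the first usually makes better use of the cache because it reads consecutive addresses:

```cpp
#include <cstddef>
#include <vector>

// The matrix is stored row by row in one flat vector:
// element (r, c) lives at m[r * cols + c].

// Row order: the inner loop walks consecutive memory locations.
double sum_row_order(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols)
{
    double s = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += m[r * cols + c];
    return s;
}

// Column order: the inner loop jumps `cols` elements on every step.
double sum_column_order(const std::vector<double>& m,
                        std::size_t rows, std::size_t cols)
{
    double s = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            s += m[r * cols + c];
    return s;
}
```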
    --
    Peter van Merkerk
    peter.van.merkerk(at)dse.nl
    Peter van Merkerk, Aug 27, 2003
    #1

  2. Peter van Merkerk

    Alan Sung Guest

    Many modern CPU architectures now provide predicate registers, so that the
    code:

    if (test)
        out = 4*in;
    else
        out = 0.0;

    does not contain any branches and does not require any pipeline flushes.
    This turns into something like (pseudo code obviously):

    PredReg1 = CMP test
    MOVE-IF-TRUE PredReg1 out <-- 4*in
    MOVE-IF-FALSE PredReg1 out <-- 0.0

    The 2 MOVE-IF instructions are often executed in parallel.
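
In C++ the same choice can be written as a single conditional expression, which compilers may turn into a predicated or conditional-move sequence on targets that have one. This is only a sketch; the function name is hypothetical, and whether a branch is actually avoided depends on the compiler, the optimization level and the target CPU:

```cpp
// Select form of the if/else above: both outcomes are simple values, so
// the compiler is free to compute 4*in and pick the result without a jump.
double scaled_or_zero(bool test, double in)
{
    return test ? 4 * in : 0.0;
}
```

Inspecting the generated assembly is the only reliable way to see which form a given compiler picked.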

    Changing to:

    out = test*(4*in);

    might introduce a multiply instruction, which might actually slow things
    down on some CPUs. So do as Peter van Merkerk says: "Rather than tweaking
    your code here and there, I think your best bet is to reconsider the
    algorithm you are using."

    -al sung
    Rapid Realm Technology, Inc.
    Hopkinton, MA
    Alan Sung, Aug 27, 2003
    #2
