Incrementing/decrementing for speed

Discussion in 'C++' started by J. Campbell, Aug 18, 2003.

  1. J. Campbell

    J. Campbell Guest

    Hi everyone. I'm sure that this is common knowledge for many/most
    here. However, it was rather illuminating for me (someone learning
    c++) and I thought it might be helpful for someone else. I'm sure I'll
    get scolded for an OT post, but I think issues of this type should be
    of interest to programmers until they get a handle on them.

    I was reading some old threads about the relative advantages of using
    ++i vs i++ (I won't rehash the many nice discussions that predate this
    post) and I decided to do some tests. I'm using the dev-cpp IDE (v
    4.9.8.0) and the included compiler on a wintel machine. These results
    were obtained using plain integers, not a class type with overloaded
    operators.

    What I did was create an empty loop of the following type to determine
    the time required to count up to, or down from, 2,000,000,000.
    e.g.
    ...
    int temp1, temp2;
    int i;
    temp1 = clock();
    for(i = 0; i < 2000000000; i++);
    temp2 = clock();
    cout << temp2-temp1;
    ...

    I found several things.
    1) Using different optimization settings can cause dramatically
    varying results.
    2) The "best" optimization setting does not necessarily produce the
    fastest code. In fact, in these tests, using the "third worst"
    optimization setting produced the best result (that is, "minor
    optimizations" = "On" and "further optimizations"-->"optimize" = "On").
    3) Using the "++" or "--" operator is not faster than using "i+=b".
    4) When a variable is used as the increment amount, or in the
    comparison, a speed increase occurs compared to using a literal
    constant.
    5) However, a special case which gives a speed boost (only with the
    mentioned optimization settings) is when the comparison evaluates to
    "i!=0" (using the literal constant) or simply "i".

    So what am I talking about? Here are some times I got.

    for(i=0; i!=2000000000; ++i); //2.7 secs opt 3
    for(i=0; i!=2000000000; i++); //2.7 secs opt 3
    for(i=0; i!=2000000000; i+=1); //2.7 secs opt 3
    int b = 1; for(i=0; i!=2000000000; i+=b); //2.0 secs opt 2
    int z=2000000000; int b = 1; for(i=0; i!=z; i+=1); //2.0 secs opt 4
    int z=-2000000000; int b=1; for(i=z; i!=0; i+=b); //1.6 secs opt 2
    int z=-2000000000; int b=1; int y=0; for(i=z; i!=y; i+=b); //2.0 secs opt 2
    int z=-2000000000; int b=1; for(i=z; i!=0; ++i); //2.0 secs opt 3
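
    For completeness, here's a minimal self-contained version of the
    harness (a sketch: the conversion to seconds via CLOCKS_PER_SEC and
    the volatile sink are additions not shown in my snippets above; the
    sink keeps a smarter compiler from simply deleting the empty loop):

    #include <ctime>
    #include <iostream>

    int main()
    {
        volatile int sink = 0;                 // defeats dead-loop elimination
        std::clock_t start = std::clock();
        for(int i = 0; i != 2000000000; ++i)
            sink = i;                          // the "work" being timed
        std::clock_t stop = std::clock();
        std::cout << double(stop - start) / CLOCKS_PER_SEC << " secs\n";
    }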


    So...I guess my main conclusion is...that using a variable instead of
    a literal constant in the "b=1; i+=b;" form can be faster than
    "++i". And...that the labels on the optimization settings should be
    taken with a grain of salt. BTW...the times I show were the best
    times after trying all 5 optimization settings that the compiler
    offered for each increment type.
    I found no speed difference in the increment vs decrement
    times...other than the incidental one that occurs when you count down
    to 0...then a substantial speed gain occurs because this uses the
    special !=0 case.

    Again...sorry if this is taken as OT. The comp.compiler forum
    moderator refers specific compiler inquiries to the appropriate
    comp.lang forum...kinda a circular problem...
     
    J. Campbell, Aug 18, 2003
    #1

  2. <snip>
    > for(i=0; i!=2000000000; ++i); //2.7 secs opt 3
    > for(i=0; i!=2000000000; i++); //2.7 secs opt 3
    > for(i=0; i!=2000000000; i+=1); //2.7 secs opt 3
    > int b = 1; for(i=0; i!=2000000000; i+=b); //2.0 secs opt 2
    > int z=2000000000; int b = 1; for(i=0; i!=z; i+=1); //2.0 secs opt 4
    > int z=-2000000000; int b=1; for(i=z; i!=0; i+=b); //1.6 secs opt 2
    > int z=-2000000000; int b=1; int y=0; for(i=z; i!=y; i+=b); //2.0 secs opt 2
    > int z=-2000000000; int b=1; for(i=z; i!=0; ++i); //2.0 secs opt 3
    >
    > So...I guess my main conclusion is...that using a variable instead of
    > a literal constant in the "b=1; i+=b;" form can be faster than
    > "++i".


    Any outcome is possible. I would not be surprised if the difference in
    speed is caused by better or worse memory alignment. Though other
    explanations are possible as well, such as the value of 'b' being held
    in a processor register, which is faster than explicitly loading '1'
    from memory to perform the addition. With a good optimizer there is no
    inherent reason why the version with the 'b' variable should be faster
    than the one without.

    The only conclusion that can be drawn from your test is that performance
    is difficult to predict, and that the only way to know for sure is to
    measure two or more cases. Note that memory access times (influenced by
    alignment and caches) can have a significant impact. The results from
    your test depend heavily on many variables, the compiler being a major
    one. I'm surprised that your compiler did not completely optimize the
    for loop away; MSVC would have.

    > And...that the labels on the optimization settings should be
    > taken with a grain of salt. BTW...the times I show were the best
    > times after trying all 5 optimization settings that the compiler
    > offered for each increment type.


    Many compilers let you choose between optimizing for speed or size.
    However, if the code size becomes too big the likelihood of cache misses
    becomes larger. Also, the branch prediction logic found in many modern
    processors works better if the processor has executed a piece of code
    before. Therefore, inlining functions does not always help performance.

    > I found no speed difference in the increment vs decrement
    > times...other than the incidental one that occurs when you count down
    > to 0...then a substantial speed gain occurs because this uses the
    > special !=0 case.
    >
    > Again...sorry if this is taken as OT. The comp.compiler forum
    > moderator refers specific compiler inquiries to the appropriate
    > comp.lang forum...kinda a circular problem...


    I don't think this is really OT since the issues you are describing
    apply to other C++ compilers as well.

    --
    Peter van Merkerk
    peter.van.merkerk(at)dse.nl
     
    Peter van Merkerk, Aug 18, 2003
    #2

  3. Peter van Merkerk wrote:
    ....
    >
    > The only conclusion that can be drawn from your test is that performance
    > is difficult to predict, and that the only way to know for sure is to
    > measure two or more cases. Note that memory access times (influenced by
    > alignment and caches) can have a significant impact.


    Not in this case, as everything should execute in cache and there are no
    accesses to data in the loop.

    > The results from
    > your test depend heavily on many variables, the compiler being a major
    > one. I'm surprised that your compiler did not completely optimize the
    > for loop away; MSVC would have.


    Yes, most newer compilers would have optimized the entire loop away.

    Which leads to : *get a better compiler*.
     
    Gianni Mariani, Aug 18, 2003
    #3
  4. Axter

    Axter Guest

    Using integers for an increment vs decrement test is not a valid test.

    Most modern compilers will treat increment and decrement the same with
    built-in types, by optimizing away the difference.

    This is especially true if they are used in a for() loop like the
    example you posted.

    To see the difference you need to use a non-built-in type.

    Then you would have a more accurate test.
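
    To illustrate (a sketch; "Counter" is an illustrative name, not
    something from this thread): for a class type the canonical operator
    pair looks like this, and the postfix form must copy the old value
    while the prefix form does not.

    struct Counter {
        int v;
        Counter& operator++()       // prefix: just increment, no temporary
        {
            ++v;
            return *this;
        }
        Counter operator++(int)     // postfix: must copy the old value
        {
            Counter old(*this);
            ++v;
            return old;
        }
    };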


    --
    Top ten Expert at Experts-Exchange


    Posted via http://dbforums.com
     
    Axter, Aug 19, 2003
    #4
  5. > Both you and the other respondent suggested that a more modern
    > compiler would have "optimized the whole loop away". While I'd
    > certainly like to be made aware of such a potential speedup, I
    > personally *want* the processor to do each instruction that I code.


    The problem is that you don't write instructions, you write C++ code. If
    you want to make sure that the processor executes certain instructions
    you should code in assembler.

    > I think I'd be bummed if my compiler refused to do extra work just
    > because it deemed the output of the work "not important".


    The compiler is free to optimize any way it likes as long as it does
    not affect the observable behaviour. Since 2000000000 * nothing ==
    0 * nothing, optimizing the loop away is a valid optimization. In fact
    the C++ standard even allows, in some cases, optimizations that do
    change the observable behaviour. A 'volatile' variable can act as a
    hint to the compiler that it should avoid aggressive optimizations.
    However, practically speaking, 'volatile' variables are not useful
    unless you fully understand what they do on your particular
    implementation.
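
    For example (a sketch, not from the original post): making the loop
    counter volatile forces every load and store of 'i' to actually
    happen, so the compiler may no longer delete the loop.

    volatile int i;
    for(i = 0; i != 2000000000; i += 1)
        ;                           // empty body, but the loop now survives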

    > Anyway...is there a consensus that the gnu-compiler isn't a good
    > option for someone learning standard c++?


    I think the GNU 3.x compiler is a good choice for learning C++ as its
    conformance to the C++ standard is quite good. Whether the compiler
    produces fast code or not should not be relevant if you are learning
    C++.

    > I looked at speed
    > comparisons published by Intel for their compiler v7 vs the gnu
    > compiler...it looked like the intel-compiled code would execute
    > circa 25% faster on a pentium4...on average...at a cost of $400.


    The GNU compiler does not generate the fastest code, but that does not
    need to be a problem. For example if your application is I/O bound, a
    compiler that produces faster code would make little difference. And if
    you use it to learn C++, performance should be a non-issue anyway.

    $400 for a good C++ compiler seems to be a reasonable price if its
    benefits are relevant for a project. The total cost of an average
    project is often several orders of magnitude higher.

    Also note that benchmarks can be (and often are) misleading. You
    always have to know how the benchmark is performed and whether the test
    is representative of your kind of applications. One compiler may produce
    much faster code on one piece of code while performing worse on other
    pieces of code. It is not unlikely that a compiler is specifically tuned
    for a popular benchmark suite like SPECint. I remember one benchmark
    that claimed that Java is just as fast as C++. The benchmark consisted
    of reading a 20 MByte file; unfortunately the author did not realize
    that he was testing the speed of his hard disk rather than the execution
    speed of the Java and C++ code.
    --
    Peter van Merkerk
    peter.van.merkerk(at)dse.nl
     
    Peter van Merkerk, Aug 19, 2003
    #5
  6. J. Campbell wrote:

    > <snip>


    I believe this is an invalid test.
    The compiler can optimize the expressions so they are
    equivalent:
    ++i {for integral types}
    i++
    i+=1
    These expressions all increment the variable. A good compiler
    should recognize these and the final executable code should
    be the same. If it's not, get a better compiler.

    To find out about the optimizations, print out the assembly
    code for each loop. See if the code differs. Also, your
    test is compiler dependent. Run the same test using different
    compilers with the same optimization settings.
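
    With the GNU compiler that Dev-C++ ships this can be done with the -S
    switch (assuming the source file is called loop.cpp; the name is only
    illustrative):

    g++ -O2 -S loop.cpp -o loop.s

    Then compare the .s files generated for each loop variant and each
    optimization level.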

    About loops, decrementing or counting down loops may be
    faster than counting up loops. This is due to the fact that
    many processors have a single instruction for comparing
    or branching on a zero condition. But hey, that is one
    instruction. If your program has been profiled to prove
    this is the bottleneck, I say there's not much to do but
    unroll the loop.
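
    For concreteness, a sketch of both ideas ('do_work' is a hypothetical
    stand-in for the loop body; the unrolled version assumes the trip
    count is a multiple of 4):

    for(int i = 2000000000; i != 0; --i)      // count down: decrement and
        do_work();                            // branch-on-zero in one step

    for(int i = 0; i != 2000000000; i += 4) { // manual 4x unroll: one
        do_work(); do_work();                 // compare-and-branch per
        do_work(); do_work();                 // four units of work
    }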

    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c++-faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.raos.demon.uk/acllc-c++/faq.html
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
     
    Thomas Matthews, Aug 19, 2003
    #6
  7. > About loops, decrementing or counting down loops may be
    > faster than counting up loops. This is due to the fact that
    > many processors have a single instruction for comparing
    > or branching on a zero condition.


    The compiler may perform this optimization for you. E.g. MSVC replaces
    an incrementing loop counter with a decrementing loop counter if the
    value of the counter is not used inside the loop.
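
    Schematically (a sketch; 'do_work' is a hypothetical body that does
    not use the counter, and 'n' is the trip count):

    for(int i = 0; i != n; ++i)   // what you write
        do_work();

    for(int i = n; i != 0; --i)   // what the compiler may emit instead
        do_work();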

    > But hey, that is one
    > instruction. If your program has been profiled to prove
    > this is the bottleneck, I say there's not much to do but
    > unroll the loop.


    A compiler may perform such an optimization for you as well. Loop
    unrolling makes most sense on VLIW or EPIC architecture processors. On
    other processors the increased code size may cause cache misses, which
    could be more detrimental to performance than an extra instruction
    in the loop.

    The compiler may even replace the loop with non-looping code that has
    the same effect. For example:

    int fun(int c)
    {
        for(int i = 0; i < 2000; ++i)
        {
            c++;
        }
        return c;
    }

    can be optimized to the equivalent of:

    int fun(int c)
    {
        return c + 2000;
    }

    MSVC performs this optimization.

    One should code for clarity first and, if and only if there is a
    performance problem, try to hand-optimize the code. When tweaking code
    for speed it is important to do measurements before and after to see if
    those tweaks have the desired effect. I have seen too many times
    optimization attempts (often based on assumptions rather than facts)
    resulting in code that is not only harder to understand but actually
    slower than the original version!

    --
    Peter van Merkerk
    peter.van.merkerk(at)dse.nl
     
    Peter van Merkerk, Aug 19, 2003
    #7
