Incrementing/decrementing for speed

Discussion in 'C++' started by J. Campbell, Aug 18, 2003.

  1. J. Campbell

    J. Campbell Guest

    Hi everyone. I'm sure that this is common knowledge for many/most
    here. However, it was rather illuminating for me (someone learning
    c++) and I thought it might be helpful for someone else. I'm sure I'll
    get scolded for an OT post, but I think issues of this type should be
    of interest to programmers until they get a handle on them.

    I was reading some old threads about the relative advantages of using
    ++i vs i++ (I won't rehash the many nice discussions that predate this
    post) and I decided to do some tests. I'm using the dev-cpp IDE (v
    4.9.8.0) and the included compiler on a wintel machine. These results
    were obtained using plain integers, not a class type with overloaded
    operators.

    What I did was create an empty loop of the following type to determine
    the time required to count up to, or down from, 2,000,000,000.
    e.g.
    ...
    int temp1, temp2;
    int i;
    temp1 = clock();
    for(i = 0; i < 2000000000; i++);
    temp2 = clock();
    cout << temp2-temp1;
    ...

    I found several things.
    1) Using different optimization settings can cause dramatically
    varying results.
    2) The "best" optimization setting does not necessarily produce the
    fastest code. In fact, in these tests, using the "third worst"
    optimization setting produced the best result (that is, "minor
    optimizations" = "On" and "further optimizations"-->"optimize" = "On").
    3) Using the "++" or "--" operator is not faster than using "i+=b".
    4) When a variable is used as the increment amount, or in the
    comparison, a speed increase occurs compared to using a literal
    constant.
    5) However, a special case which gives a speed boost (only with the
    mentioned optimization settings) is when the comparison evaluates to
    "i!=0" (using the literal constant) or simply "i".

    So what am I talking about? Here are some times I got.

    for(i=0; i!=2000000000; ++i); //2.7 secs opt 3
    for(i=0; i!=2000000000; i++); //2.7 secs opt 3
    for(i=0; i!=2000000000; i+=1); //2.7 secs opt 3
    int b = 1; for(i=0; i!=2000000000; i+=b); //2.0 secs opt 2
    int z=2000000000; int b = 1; for(i=0; i!=z; i+=1); //2.0 secs opt 4
    int z=-2000000000; int b=1; for(i=z; i!=0; i+=b); //1.6 secs opt 2
    int z=-2000000000; int b=1; int y=0; for(i=z; i!=y; i+=b); //2.0 secs opt 2
    int z=-2000000000; int b=1; for(i=z; i!=0; ++i); //2.0 secs opt 3
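
    For completeness, here's a minimal self-contained version of the
    harness (a sketch: the conversion to seconds via CLOCKS_PER_SEC and
    the volatile sink are additions not shown in my snippets above; the
    sink keeps a smarter compiler from simply deleting the empty loop):

    #include <ctime>
    #include <iostream>

    int main()
    {
        volatile int sink = 0;                 // defeats dead-loop elimination
        std::clock_t start = std::clock();
        for(int i = 0; i != 2000000000; ++i)
            sink = i;                          // the "work" being timed
        std::clock_t stop = std::clock();
        std::cout << double(stop - start) / CLOCKS_PER_SEC << " secs\n";
    }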


    So...I guess my main conclusion is...that using a variable instead of
    a literal constant in the "b=1; i+=b;" form can be faster than
    "++i". And...that the labels on the optimization settings should be
    taken with a grain of salt. BTW...the times I show were the best
    times after trying all 5 optimization settings that the compiler
    offered for each increment type.
    I found no speed difference in the increment vs decrement
    times...other than the incidental one that occurs when you count down
    to 0...then a substantial speed gain occurs because this uses the
    special !=0 case.

    Again...sorry if this is taken as OT. The comp.compiler forum
    moderator refers specific compiler inquiries to the appropriate
    comp.lang forum...kinda a circular problem...
     
    J. Campbell, Aug 18, 2003
    #1

  2. <snip>
    > for(i=0; i!=2000000000; ++i); //2.7 secs opt 3
    > for(i=0; i!=2000000000; i++); //2.7 secs opt 3
    > for(i=0; i!=2000000000; i+=1); //2.7 secs opt 3
    > int b = 1; for(i=0; i!=2000000000; i+=b); //2.0 secs opt 2
    > int z=2000000000; int b = 1; for(i=0; i!=z; i+=1); //2.0 secs opt 4
    > int z=-2000000000; int b=1; for(i=z; i!=0; i+=b); //1.6 secs opt 2
    > int z=-2000000000; int b=1; int y=0; for(i=z; i!=y; i+=b); //2.0 secs opt 2
    > int z=-2000000000; int b=1; for(i=z; i!=0; ++i); //2.0 secs opt 3
    >
    > So...I guess my main conclusion is...that using a variable instead of
    > a literal constant in the "b=1; i+=b;" form can be faster than
    > "++i".


    Any outcome is possible. I would not be surprised if the difference in
    speed is caused by better or worse memory alignment. Though other
    explanations are possible as well, such as the value of 'b' being held
    in a processor register, which is faster than explicitly loading '1'
    from memory to perform the addition. With a good optimizer there is no
    inherent reason why the version with the 'b' variable should be faster
    than the one without.

    The only conclusion that can be drawn from your test is that performance
    is difficult to predict, and that the only way to know for sure is to
    measure two or more cases. Note that memory access times (influenced by
    alignment and caches) can have a significant impact. The results from
    your test depend heavily on many variables, the compiler being a major
    one. I'm surprised that your compiler did not completely optimize the
    for loop away; MSVC would have.

    > And...that the labels on the optimization settings should be
    > taken with a grain of salt. BTW...the times I show were the best
    > times after trying all 5 optimization settings that the compiler
    > offered for each increment type.


    Many compilers let you choose between optimizing for speed or size.
    However, if the code size becomes too big the likelihood of cache misses
    becomes larger. Also, the branch prediction logic found in many modern
    processors works better if the processor has executed a piece of code
    before. Therefore, inlining functions does not always help performance.

    > I found no speed difference in the increment vs decrement
    > times...other than the incidental one that occurs when you count down
    > to 0...then a substantial speed gain occurs because this uses the
    > special !=0 case.
    >
    > Again...sorry if this is taken as OT. The comp.compiler forum
    > moderator refers specific compiler inquiries to the appropriate
    > comp.lang forum...kinda a circular problem...


    I don't think this is really OT since the issues you are describing
    apply to other C++ compilers as well.

    --
    Peter van Merkerk
    peter.van.merkerk(at)dse.nl
     
    Peter van Merkerk, Aug 18, 2003
    #2

  3. Peter van Merkerk wrote:
    ....
    >
    > The only conclusion that can be drawn from your test is that performance
    > is difficult to predict, and that the only way to know for sure is to
    > measure two or more cases. Note that memory access times (influenced by
    > alignment and caches) can have a significant impact.


    Not in this case, as everything should execute in cache and there are no
    accesses to data in the loop.

    > The results from
    > your test depend heavily on many variables, the compiler being a major
    > one. I'm surprised that your compiler did not completely optimize the
    > for loop away; MSVC would have.


    Yes, most newer compilers would have optimized the entire loop away.

    Which leads to : *get a better compiler*.
     
    Gianni Mariani, Aug 18, 2003
    #3
  4. Axter

    Axter Guest

    Using integers for an increment vs decrement test is not a valid test.

    Most modern compilers will treat increment and decrement the same with
    built-in types, by optimizing away the difference.

    This is especially true if they are used in a for() loop like the
    example you posted.

    To see the difference you need to use a non-built-in type.

    Then you would have a more accurate test.
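
    To illustrate (a sketch; "Counter" is an illustrative name, not
    something from this thread): for a class type the canonical operator
    pair looks like this, and the postfix form must copy the old value
    while the prefix form does not.

    struct Counter {
        int v;
        Counter& operator++()       // prefix: just increment, no temporary
        {
            ++v;
            return *this;
        }
        Counter operator++(int)     // postfix: must copy the old value
        {
            Counter old(*this);
            ++v;
            return old;
        }
    };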


    --
    Top ten Expert at Experts-Exchange


    Posted via http://dbforums.com
     
    Axter, Aug 19, 2003
    #4
  5. > Both you and the other respondent suggested that a more modern
    > compiler would have "optimized the whole loop away". While I'd
    > certainly like to be made aware of such a potential speedup, I
    > personally *want* the processor to do each instruction that I code.


    The problem is that you don't write instructions, you write C++ code. If
    you want to make sure that the processor executes certain instructions
    you should code in assembler.

    > I think I'd be bummed if my compiler refused to do extra work just
    > because it deemed the output of the work "not important".


    The compiler is free to optimize any way it likes as long as it does
    not affect the observable behaviour. Since 2000000000 * nothing ==
    0 * nothing, optimizing the loop away is a valid optimization. In fact
    the C++ standard even allows, in some cases, optimizations that do
    change the observable behaviour. A 'volatile' variable can act as a
    hint to the compiler that it should avoid aggressive optimizations.
    However, practically speaking, 'volatile' variables are not useful
    unless you fully understand what they do on your particular
    implementation.
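
    For example (a sketch, not from the original post): making the loop
    counter volatile forces every load and store of 'i' to actually
    happen, so the compiler may no longer delete the loop.

    volatile int i;
    for(i = 0; i != 2000000000; i += 1)
        ;                           // empty body, but the loop now survives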

    > Anyway...is there a consensus that the gnu-compiler isn't a good
    > option for someone learning standard c++?


    I think the GNU 3.x compiler is a good choice for learning C++ as its
    conformance to the C++ standard is quite good. Whether the compiler
    produces fast code or not should not be relevant if you are learning
    C++.

    > I looked at speed
    > comparisons published by Intel for their compiler v7 vs the gnu
    > compiler...it looked like the intel-compiled code would execute
    > circa 25% faster on a pentium4...on average...at a cost of $400.


    The GNU compiler does not generate the fastest code, but that does not
    need to be a problem. For example if your application is I/O bound, a
    compiler that produces faster code would make little difference. And if
    you use it to learn C++, performance should be a non-issue anyway.

    $400 for a good C++ compiler seems to be a reasonable price if its
    benefits are relevant for a project. The total cost of an average
    project is often several orders of magnitude higher.

    Also note that benchmarks can be (and often are) misleading. You
    always have to know how the benchmark is performed and whether the test
    is representative of your kind of applications. One compiler may produce
    much faster code on one piece of code while performing worse on other
    pieces of code. It is not unlikely that a compiler is specifically tuned
    for a popular benchmark suite like SPECint. I remember one benchmark
    that claimed that Java is just as fast as C++. The benchmark consisted
    of reading a 20 MByte file; unfortunately the author did not realize
    that he was testing the speed of his hard disk rather than the execution
    speed of the Java and C++ code.
    --
    Peter van Merkerk
    peter.van.merkerk(at)dse.nl
     
    Peter van Merkerk, Aug 19, 2003
    #5
  6. J. Campbell wrote:

    > <snip>


    I believe this is an invalid test.
    The compiler can optimize the expressions so they are
    equivalent:
    ++i {for integral types}
    i++
    i+=1
    These expressions all increment the variable. A good compiler
    should recognize these and the final executable code should
    be the same. If it's not, get a better compiler.

    To find out about the optimizations, print out the assembly
    code for each loop. See if the code differs. Also, your
    test is compiler dependent. Run the same test using different
    compilers with the same optimization settings.
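
    With the GNU compiler that Dev-C++ ships this can be done with the -S
    switch (assuming the source file is called loop.cpp; the name is only
    illustrative):

    g++ -O2 -S loop.cpp -o loop.s

    Then compare the .s files generated for each loop variant and each
    optimization level.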

    About loops, decrementing or counting down loops may be
    faster than counting up loops. This is due to the fact that
    many processors have a single instruction for comparing
    or branching on a zero condition. But hey, that is one
    instruction. If your program has been profiled to prove
    this is the bottleneck, I say there's not much to do but
    unroll the loop.
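
    For concreteness, a sketch of both ideas ('do_work' is a hypothetical
    stand-in for the loop body; the unrolled version assumes the trip
    count is a multiple of 4):

    for(int i = 2000000000; i != 0; --i)      // count down: decrement and
        do_work();                            // branch-on-zero in one step

    for(int i = 0; i != 2000000000; i += 4) { // manual 4x unroll: one
        do_work(); do_work();                 // compare-and-branch per
        do_work(); do_work();                 // four units of work
    }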

    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c++-faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.raos.demon.uk/acllc-c++/faq.html
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
     
    Thomas Matthews, Aug 19, 2003
    #6
  7. > About loops, decrementing or counting down loops may be
    > faster than counting up loops. This is due to the fact that
    > many processors have a single instruction for comparing
    > or branching on a zero condition.


    The compiler may perform this optimization for you. E.g. MSVC replaces
    an incrementing loop counter with a decrementing loop counter if the
    value of the counter is not used inside the loop.
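
    Schematically (a sketch; 'do_work' is a hypothetical body that does
    not use the counter, and 'n' is the trip count):

    for(int i = 0; i != n; ++i)   // what you write
        do_work();

    for(int i = n; i != 0; --i)   // what the compiler may emit instead
        do_work();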

    > But hey, that is one
    > instruction. If your program has been profiled to prove
    > this is the bottleneck, I say there's not much to do but
    > unroll the loop.


    A compiler may perform such an optimization for you as well. Loop
    unrolling makes most sense on VLIW or EPIC architecture processors. On
    other processors the increased code size may cause cache misses, which
    could be more detrimental to performance than an extra instruction
    in the loop.

    The compiler may even replace the loop with non-looping code that has
    the same effect. For example:

    int fun(int c)
    {
        for(int i = 0; i < 2000; ++i)
        {
            c++;
        }
        return c;
    }

    can be optimized to the equivalent of:

    int fun(int c)
    {
        return c + 2000;
    }

    MSVC performs this optimization.

    One should code for clarity first and, if and only if there is a
    performance problem, try to hand-optimize the code. When tweaking code
    for speed it is important to do measurements before and after to see if
    those tweaks have the desired effect. I have seen too many times
    optimization attempts (often based on assumptions rather than facts)
    resulting in code that is not only harder to understand but actually
    slower than the original version!

    --
    Peter van Merkerk
    peter.van.merkerk(at)dse.nl
     
    Peter van Merkerk, Aug 19, 2003
    #7
