ultra-fast loop unrolling with g++ -O3

mark · Jun 12, 2008

Why does the following excerpt of trivial code execute so quickly?

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]){
    static const int SIZE = 1000000;
    long nops = 0;
    int i, j;
    long int outer = atol(argv[1]);
    for(i=0; i < outer; i++){
        for(j=0; j < SIZE; j++){
            ++nops;
            // arr[j] = arr[j] + 1;
        } //for j
    } //for i
    printf("ran %ld ops\n", nops);
} //main

I compiled this with g++ -O3.
When run with 50000000000 as an argument, the nops variable should be updated
50000000000000000 times. Including loop logic, this should take
forever on my 2 GHz computer, yet it runs instantly. I take the input from
the command line so that nops isn't simply pre-calculated at compile time.

This came about when trying to speed-test C arrays with C++ vectors;
originally the code had an array-update line in the center of the
loops. The vector version was crawling versus the C array (both
compiled with -O3).

What compile/hardware magic is going on, and is it possible to speed
up the vector with it?

Johannes Bauer · Jun 12, 2008

mark said:
What compile/hardware magic is going on, and is it possible to speed
up the vector with it?

1. Your loop is optimized away.
2. Probably not, unless you use vectors which don't do anything useful.
Then yes.

Regards,
Johannes

santosh · Jun 12, 2008

mark said:
Why does the following excerpt of trivial code execute so quickly?

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]){
static const int SIZE = 1000000;

Better to use long to be safe.

long nops = 0;

unsigned long will give you even more range.

int i, j;
long int outer = atol(argv[1]);
for(i=0; i < outer; i++){

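What if 'i' overflows? 'outer' is a long, so an int counter may never reach it, and signed overflow is undefined behaviour. Make 'i' and 'j' unsigned long or long.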
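
A minimal sketch of the counter-width point, assuming a platform where long is wider than int (as on typical 64-bit Linux). The helper names int_counter_can_reach and count_up_to are made up for illustration and are not part of the original program.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if an int loop counter can count up to
 * `bound` without overflowing first. */
static bool int_counter_can_reach(long bound)
{
    return bound <= INT_MAX;
}

/* Counts iterations with a long counter, so any non-negative bound
 * that fits in a long terminates normally. */
static long count_up_to(long bound)
{
    long n = 0;
    for (long i = 0; i < bound; i++)
        n = n + 1;
    return n;
}
```

With the argument from the original post (50000000000), int_counter_can_reach would report false on such a platform, which is why the counters should be widened.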

for(j=0; j < SIZE; j++){
++nops;
// arr[j] = arr[j] + 1;
} //for j
} //for i
printf("ran %ld ops\n", nops);
} //main

I compiled this with g++ -O3.
When run with 50000000000 as an argument, the nops variable should be updated
50000000000000000 times. Including loop logic, this should take
forever on my 2 GHz computer, yet it runs instantly. I take the input from
the command line so that nops isn't simply pre-calculated at compile time.

The compiler has optimised the loop away. It simply computes SIZE *
outer and assigns the product to nops. The loop will be left untouched
if you qualify nops with volatile.
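
To make that concrete, here is a rough sketch of the two forms side by side. The function names ops_closed_form and ops_volatile_loop are invented for this example, and whether a given compiler actually performs the reduction depends on its version and flags. The first function is what the nested loops can legally be reduced to; the second keeps every increment, because each access to a volatile object must be performed.

```c
#include <assert.h>

/* What the optimiser may reduce the nested counting loops to:
 * a single multiplication, with no iteration at all. */
static long ops_closed_form(long outer, long size)
{
    assert(outer >= 0 && size >= 0);
    return outer * size;
}

/* The same count with a volatile accumulator. Every read and write of
 * nops has to happen, so the loops cannot be folded away. */
static long ops_volatile_loop(long outer, long size)
{
    volatile long nops = 0;
    for (long i = 0; i < outer; i++)
        for (long j = 0; j < size; j++)
            nops = nops + 1;
    return nops;
}
```

Both return the same value for the same arguments; only the running time differs, which is the effect seen in the original program.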

What compile/hardware magic is going on, and is it possible to speed
up the vector with it?

You'll have to ask in comp.lang.c++.
