ultra-fast loop unrolling with g++ -O3

mark

Why does the following excerpt of trivial code execute so quickly?

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]){
static const int SIZE = 1000000;
long nops = 0;
int i, j;
long int outer = atol(argv[1]);
for(i=0; i < outer; i++){
for(j=0; j < SIZE; j++){
++nops;
// arr[j] = arr[j] + 1;
} //for j
} //for i
printf("ran %ld ops\n", nops);
} //main

I compiled this with g++ -O3.
When run with 50000000000 as an argument, the nops variable is updated
50000000000000000 times. Including loop logic this should take
forever on my 2 GHz computer. Yet it runs instantly. I took the input from
the command line so that nops can't simply be pre-calculated.

This came about when trying to speed-test C arrays with C++ vectors;
originally the code had an array-update line in the center of the
loops. The vector version was crawling versus the C array (both
compiled with -O3).

What compile/hardware magic is going on, and is it possible to speed
up the vector with it?
 
Johannes Bauer

mark said:
What compile/hardware magic is going on, and is it possible to speed
up the vector with it?

1. Your loop is optimized away.
2. Probably not, unless you use vectors which don't do anything useful.
Then yes.

Regards,
Johannes
 
santosh

mark said:
Why does the following excerpt of trivial code execute so quickly?

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]){
static const int SIZE = 1000000;

Better to use long to be safe.
long nops = 0;

unsigned long will give you even more range.
int i, j;
long int outer = atol(argv[1]);
for(i=0; i < outer; i++){

What if 'i' overflows? 'outer' is a long, so an int 'i' may never reach it. Make 'i' and 'j' unsigned long or long.
for(j=0; j < SIZE; j++){
++nops;
// arr[j] = arr[j] + 1;
} //for j
} //for i
printf("ran %ld ops\n", nops);
} //main

I compiled this with g++ -O3.
When run with 50000000000 as an argument, the nops variable is updated
50000000000000000 times. Including loop logic this should take
forever on my 2 GHz computer. Yet it runs instantly. I took the input from
the command line so that nops can't simply be pre-calculated.

The compiler has optimised the loops away. Their only effect is to
increment nops, so the compiler can, in effect, compute SIZE * outer
and assign the product to nops. The loops will be left in place if you
qualify nops with volatile.
What compile/hardware magic is going on, and is it possible to speed
up the vector with it?

You'll have to ask in comp.lang.c++.
 
