More optimisation


user923005

Eric Sosman wrote:
Kaz said:
Ben Pfaff wrote:
Kelsey Bjarnason wrote:
[...]
Frankly, I'd be surprised if there were any discernible difference
other than possibly on an implementation designed to be
pathological.
    In my one-off test, the second method ran 2.5 times faster than
the first.  I guess that makes gcc and Opteron "pathological."
Is this on Linux with glibc?
    Alas, no: Windows Vaster.  (My excuse: I'm working with another
company's product, one that requires a Windows-based management
station even when running on a different platform entirely.  So a
Windows desktop is pretty much forced upon me.)  The C implementation
is the gcc-based DJGPP environment, originally developed for MS-DOS.

I just compiled and profiled it as well, with gcc 4.3.3 on GNU/Linux
Ubuntu, on a regular Intel machine.
Method 2 won:

   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  62.50      0.05     0.05        1    50.00    50.00  method1
  37.50      0.08     0.03        1    30.00    30.00  method2

BUT

This first benchmark was a bit simplistic.

So I made a real one, where method1 and method2 are successively called
50 times, and then their calls are interlaced 50 times.

Here is the gprof result (initarrays is just what its name says):

   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  51.46      2.99     2.99      100    29.90    29.90  method2
  47.50      5.75     2.76      100    27.60    27.60  method1
   1.03      5.81     0.06        1    60.00    60.00  initarrays

So we can say that the difference becomes negligible in the real world.

If you're interested, here is my benchmark, full of bugs and
inaccuracies, of course, and no doubt someone will point out that
this is *not* real world... But was the original problem real world? ;)

==========================
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#define N (1024*1024)

void initarrays(double *a, double *b, double *c)
{
        long int i;
        for(i=0; i<N; i++) {
                a[i] = 2;
                b[i] = 2;
                c[i] = 2;
        }

}

void method1(double *a, double *b, double *c)
{
        long int i;

        for(i=0; i<N; i++) {
                a[i] *= 1;
                b[i] *= 2;
                c[i] *= 3;
        }

}

void method2(double *a, double *b, double *c)
{
        long int i;

        for(i=0; i<N; i++) {
                a[i] *= 1;
        }
        for(i=0; i<N; i++) {
                b[i] *= 2;
        }
        for(i=0; i<N; i++) {
                c[i] *= 3;
        }

}

int main(int argc, char *argv[])
{
        time_t start, end;
        int i;
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);

        initarrays(a, b, c);

        printf("Running method 1 50 times...\n");
        start = time(NULL);    
        for(i=0; i<50; i++) {
                method1(a, b, c);
        }
        end = time(NULL);
        printf("%ld seconds\n", (unsigned long int)(end - start));
        printf("\n");

        printf("Running method 2 50 times...\n");
        start = time(NULL);    
        for(i=0; i<50; i++) {
                method2(a, b, c);
        }
        end = time(NULL);
        printf("%ld seconds\n", (unsigned long int)(end - start));

        printf("\n");

        printf("Interlacing method 1 & 2 50 times...\n");
        start = time(NULL);
        for(i=0; i<50; i++) {
                method1(a, b, c);
                method2(a, b, c);
        }
        end = time(NULL);
        printf("%ld seconds\n", (unsigned long int)(end - start));

        free(a);
        free(b);
        free(c);

        return 0;
}

==========================


Here is my adaptation and the default values that I got:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

static size_t N = 1024 * 1024;
static unsigned Iterations = 500;

void initarrays(double *a, double *b, double *c)
{
        long int i;
        for (i = 0; i < N; i++) {
                a[i] = 2;
                b[i] = 2;
                c[i] = 2;
        }
}

void method1(double *a, double *b, double *c)
{
        long int i;
        for (i = 0; i < N; i++) {
                a[i] *= 1;
                b[i] *= 2;
                c[i] *= 3;
        }
}

void method2(double *a, double *b, double *c)
{
        long int i;
        for (i = 0; i < N; i++) {
                a[i] *= 1;
        }
        for (i = 0; i < N; i++) {
                b[i] *= 2;
        }
        for (i = 0; i < N; i++) {
                c[i] *= 3;
        }
}

int main(int argc, char *argv[])
{
        time_t start, end;
        int i;
        double *a;
        double *b;
        double *c;
        size_t v = 0;

        if (argc > 1)
                v = atof(argv[1]);

        if (v > 0)
                N = v;

        a = malloc(N * sizeof *a);
        b = malloc(N * sizeof *b);
        c = malloc(N * sizeof *c);
        initarrays(a, b, c);
        printf("Running method 1 %u times...\n", Iterations);
        start = time(NULL);
        for (i = 0; i < Iterations; i++) {
                method1(a, b, c);
        }
        end = time(NULL);
        printf("%lu seconds\n", (unsigned long int)(end - start));
        printf("\n");
        printf("Running method 2 %u times...\n", Iterations);
        start = time(NULL);
        for (i = 0; i < Iterations; i++) {
                method2(a, b, c);
        }
        end = time(NULL);
        printf("%lu seconds\n", (unsigned long int)(end - start));
        printf("\n");
        printf("Interlacing method 1 & 2 %u times...\n", Iterations);
        start = time(NULL);
        for (i = 0; i < Iterations; i++) {
                method1(a, b, c);
                method2(a, b, c);
        }
        end = time(NULL);
        printf("%lu seconds\n", (unsigned long int)(end - start));
        free(a);
        free(b);
        free(c);
        return 0;
}
/*
Running method 1 500 times...
3 seconds

Running method 2 500 times...
4 seconds

Interlacing method 1 & 2 500 times...
8 seconds
*/
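
A caveat on these numbers: time(NULL) has one-second resolution, so short
runs round heavily. A minimal sketch of a finer-grained timer using the
standard clock() function (CPU time, in CLOCKS_PER_SEC ticks); the helper
name is hypothetical:

#include <time.h>

/* Hypothetical helper: time one call of a method with clock(). */
double time_one_run(void (*method)(double *, double *, double *),
                    double *a, double *b, double *c)
{
        clock_t t0 = clock();
        method(a, b, c);
        return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

Called as time_one_run(method1, a, b, c), it returns fractional seconds
instead of whole ones.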
 

Tim Prince

user923005 said:
[ snip benchmark code ]



After restrict qualifiers are added, icc turns method 2 into the same as
method 1. gcc and Sun cc don't optimize either method.
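
For reference, the change Tim describes is just adding C99 restrict to the
parameter declarations, promising the compiler that the three arrays never
overlap; a minimal sketch (bodies unchanged from the benchmark above):

/* With restrict, the compiler may assume a, b and c never alias, so it is
   free to fuse method2's three loops into method1's single loop (or split
   method1's), whichever it judges faster. */
void method1(double * restrict a, double * restrict b, double * restrict c);
void method2(double * restrict a, double * restrict b, double * restrict c);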
 

BartC

Richard Harter said:
Eric Sosman said:
Kelsey Bjarnason wrote:
[...]

Frankly, I'd be surprised if there were any discernible difference
other than possibly on an implementation designed to be pathological.

In my one-off test, the second method ran 2.5 times faster than
the first. I guess that makes gcc and Opteron "pathological."

In my tests method 2 ran 0 to 100% slower than method 1, in accordance
with /my/ theory that method 2 has 3 times as much loop control as
method 1.

What optimization level did you use? At higher levels the
optimizer should do loop unrolling for which the loop control cost
is insignificant.

Using gcc under Windows, with arrays of a million elements and the loop
repeated 100 times, the timings were something like:

Int arrays:
No optimisation options: 1200ms and 2000ms
With -O3: 900ms and 1100ms (I've also seen both 900ms)

Double arrays, set to 0.0 or 1.0:
No optimisation options: 1800ms and 3200ms
With -O3: 2000ms and 2800ms

I never saw method 2 faster.
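
For anyone wondering what the unrolling Richard mentions amounts to, here
is a hand-unrolled sketch of method1's loop (illustrative only, assuming N
is a multiple of 4; a compiler at high optimization levels may do something
similar automatically):

void method1_unrolled(double *a, double *b, double *c)
{
        long int i;

        /* Four elements per iteration, so the loop test and increment
           cost a quarter of what they did per element. */
        for (i = 0; i < N; i += 4) {
                a[i]   *= 1;  b[i]   *= 2;  c[i]   *= 3;
                a[i+1] *= 1;  b[i+1] *= 2;  c[i+1] *= 3;
                a[i+2] *= 1;  b[i+2] *= 2;  c[i+2] *= 3;
                a[i+3] *= 1;  b[i+3] *= 2;  c[i+3] *= 3;
        }
}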
 

Edwin van den Oetelaar

Hamiral said:
So I made a real one, where method1 and method2 are successively called
50 times, and then their calls are interlaced 50 times.

Here is the gprof result (initarrays is just what its name says):

   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  51.46      2.99     2.99      100    29.90    29.90  method2
  47.50      5.75     2.76      100    27.60    27.60  method1
   1.03      5.81     0.06        1    60.00    60.00  initarrays

So we can say that the difference becomes negligible in the real world.

I can confirm, using your benchmark with 5000 iterations on my PC
(dual-CPU Athlon). It does not make a difference.

oetelaar@debian:~$ gcc -Wall -O3 benchmark.c -o benchmark1
oetelaar@debian:~$ ./benchmark1
Running method 1 5000 times...
69 seconds

Running method 2 5000 times...
71 seconds

Interlacing method 1 & 2 5000 times...
142 seconds

[ snip benchmark code ]
 

James Dow Allen

It might be fun to start a thread on "Unusual Optimizations";
less-obvious techniques some of us have encountered to enhance
speed.

kid said:
Method 1:
for (long int i = 0; i < n; i++) {
        a[i] *= 2;
        b[i] *= 3;
        c[i] *= 4;
}


... I went
ahead and measured it on *my* system for n=1024*1024 and three arrays
of mallocated `double': Method 2 took 40% as much time as Method 1.


Others suggested this anomaly was caused by some sort of cache
collision. I suppose the mallocated addresses differed by
1024*1024+4 or so?
Did you experiment with a substitution like 1111*1111 in the malloc()
to avert cache collision? If the cache collision hypothesis
is confirmed, an "unusual optimization" is suggested, e.g.

int a[1024*1024];
/* We stick the foos between a/b hoping to avoid a/b cache collision.
... there are better ways to achieve this.
*/
struct foo bar[NUM_FOOS];
int b[1024*1024];
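
For the dynamically allocated case, the 1111*1111 substitution above
amounts to making the three allocation sizes differ, so the blocks are no
longer spaced an exact power of two apart. A minimal sketch with an
arbitrary padding constant (illustrative, not tuned; whether it actually
shifts the relative alignment depends on the allocator):

#define PAD 1111                /* extra doubles per array, arbitrary */

double *a = malloc(N * sizeof *a);
double *b = malloc((N + PAD) * sizeof *b);
double *c = malloc((N + 2 * PAD) * sizeof *c);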

Note that the problem can arise in a less contrived example, e.g.
a[i] = b[i] * c[i];

I did something like this in World's Fastest Jpeg Compressor(tm),
making inner-loop functions contiguous and arranging that the
primary data work area had different low-order address bits.
Although I *do* try to err on the side of Pointless Perfectionism,
that wasn't even a Premature Optimization: I'd observed a serious
timing anomaly, as I reported on Usenet:
http://groups.google.com/group/alt.sys.sun/msg/0bfdcb0c000f3e97

It will be fun to compare notes on other non-obvious optimizations.
Ideally they should have sped up a real application significantly
(though "pointless" micro-optimizations can be fun to talk about too).

James Dow Allen
 
