Superbasic efficiency question

aaronfude

I'm working on a scientific computing application. I need a class
called Element which is no more than a collection of integers, or
"nodes", and has only one method: int getNode(int i).

I would like to implement it in the most efficient way possible. So I
summoned up my programming intellect and asked myself: Do I want to
have members such as int a, b, c, d, e or a single member such as int
a[5]? So I wrote the following snippet and compiled it with the -O3 flag:

int main(int argc, char *argv[]) {

    /*
    int a, b, c, d, e;
    for (int i = 0; i < 1000000000; i++) {
        a = 1;
        b = 2;
        c = 3;
        d = 4;
        e = 5;
    }
    */

    int a[5];
    for (int i = 0; i < 1000000000; i++) {
        a[0] = 1;
        a[1] = 2;
        a[2] = 3;
        a[3] = 4;
        a[4] = 5;
    }

    return 0;
}

The first (commented out) version ran twice as fast. (For doubles
instead of ints, it was a factor of 4). So the simpleton part of me
thinks that that answers my question. The remaining part tells me that
it is never that simple. Finally, the cynical part of me thinks that it
all doesn't matter and other parts of the program are bound to be far
more time consuming.

Please help me resolve my internal struggle.
Many thanks in advance!

Aaron Fude
 

Joseph Turian

Aaron,

Performance on the toy code snippet is not informative of anything more
than performance on the toy code snippet.
Please help me resolve my internal struggle.

10 Write the code in the clearest, most maintainable way possible.
20 Run the code under a profiler.
30 Optimize the actual performance bottleneck (as opposed to, say,
imaginary performance bottlenecks).
40 GOTO 20

If you have any problems with step 30, get back to the group. Until then,
we're all just shooting in the dark.

Joseph
 

David White

Joseph Turian said:
Aaron,

Performance on the toy code snippet is not informative of anything more
than performance on the toy code snippet.


10 Write the code in the clearest, most maintainable way possible.
20 Run the code under a profiler.
30 Optimize the actual performance bottleneck (as opposed to, say,
imaginary performance bottlenecks).

Optimizing might require significant changes, even major design changes. If
you know from the outset that speed is important, then consider it from the
outset. Toy code snippets can help you determine early on how you'll need to
go about doing certain things.
40 GOTO 20

If you have any problems with step 30, get back to the group. Until then,
we're all just shooting in the dark.

DW
 

Sethalicious

I'm thinking that you're getting a silent error in the code. I don't
believe an integer can handle the value 1000000000 in the following:
for (int i = 0; i < 1000000000; i++)

So what's happening is your integer "i" is overflowing and getting set
to an unpredictable value (possibly negative). You might have better
luck using an "unsigned long" or even "std::size_t".

I'm not sure what the standard says, but I think most compilers
implement 32 bit integers. So an "unsigned" int is 0 to 2^32, and the
"signed" int is -2^16 to 2^16. (Subtract or add a 1 where needed).

Check your compiler's docs to see the sizes of the integral data types.


Otherwise, there should be no real difference in the performance of the
code with optimizations on. When you measure the time, remember to
record CPU clock cycles instead of time in seconds.
Hope this helps!

Chris J. (aka Sethalicious)
 

davidrubin

My feeling is that when you try to implement 'getNode', you will find
that the array-based implementation is faster than a switch statement.
You might be able to squeeze out some more performance though with a
variable-based implementation if you templatize 'getNode' on i, and
specialize for values of i. (Presumably there is some more generic code
in your application which makes 'getNodeA', 'getNodeB', etc.,
prohibitive.)
 

Gianni Mariani

I'm working on a scientific computing application. I need a class
called Element which is no more than a collection of integers, or
"nodes", and has only one method: int getNode(int i).

I would like to implement it in the most efficient way possible. So I
summoned up my programming intellect and asked myself: Do I want to
have members such as int a, b, c, d, e or a single member such as int
a[5]? So I wrote the following snippet and compiled it with the -O3 flag:

int main(int argc, char *argv[]) {

    /*
    int a, b, c, d, e;
    for (int i = 0; i < 1000000000; i++) {
        a = 1;
        b = 2;
        c = 3;
        d = 4;
        e = 5;
    }
    */

    int a[5];
    for (int i = 0; i < 1000000000; i++) {
        a[0] = 1;
        a[1] = 2;
        a[2] = 3;
        a[3] = 4;
        a[4] = 5;
    }

    return 0;
}

The first (commented out) version ran twice as fast. (For doubles
instead of ints, it was a factor of 4). So the simpleton part of me
thinks that that answers my question. The remaining part tells me that
it is never that simple. Finally, the cynical part of me thinks that it
all doesn't matter and other parts of the program are bound to be far
more time consuming.

It is more than likely that the compiler re-arranged your code to something like:

int a, b, c, d, e;
a = 1;
b = 2;
c = 3;
d = 4;
e = 5;
for (int i = 0; i < 1000000000; i++) {
}

Or, perhaps the compiler placed the values in registers.

There is a deeper design question for you.

Are these values really related? Do you operate on them in
tandem? Would you ever want to write a template function taking a
"pointer to member" to one of these values?

I would go with the 5 separate values if they are truly separate. That
way it will be harder to run into problems such as going past the array
bounds or using the wrong index.

Anyhow, below is an example where the compiler can't (easily) make the
optimization above. The results are essentially identical with:
gcc version 4.0.0 20050102 (experimental)


#include <ostream>
#include <iostream>

struct X
{
    virtual void F() = 0; // hard for the compiler to optimize this away
    virtual ~X() {}
};

struct A
{
    int a, b, c, d, e;
};

struct B
{
    int a[5];
};


struct Av
    : A, X
{
    Av()
        : A()
    {
    }

    virtual void F()
    {
        a = 1;
        b = 2;
        c = 3;
        d = 4;
        e = 5;
    }
};

struct Bv
    : B, X
{
    Bv()
        : B()
    {
    }

    virtual void F()
    {
        a[0] = 1;
        a[1] = 2;
        a[2] = 3;
        a[3] = 4;
        a[4] = 5;
    }
};

int main( int argc, char ** argv )
{
    X * x;
    if ( argc >= 2 )
    {
        std::cout << "Making an A\n";
        x = new Av;
    }
    else
    {
        std::cout << "Making a B\n";
        x = new Bv;
    }

    for (int i = 0; i < 1000000000; i++)
    {
        x->F();
    }

    delete x;
}


$ g++ -O3 -o ./optimal_array_or_members ./optimal_array_or_members.cpp
$ time ./optimal_array_or_members
Making a B
6.900u 0.000s 0:06.92 99.7% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members d
Making an A
6.770u 0.000s 0:06.78 99.8% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members
Making a B
6.960u 0.010s 0:06.96 100.1% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members s
Making an A
6.920u 0.000s 0:06.92 100.0% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members
Making a B
7.010u 0.000s 0:07.00 100.1% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members s
Making an A
6.950u 0.000s 0:06.95 100.0% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members
Making a B
6.770u 0.000s 0:06.76 100.1% 0+0k 0+0io 216pf+0w
$ time ./optimal_array_or_members s
Making an A
6.850u 0.000s 0:06.84 100.1% 0+0k 0+0io 216pf+0w
 

David White

[snip]
int main(int argc, char *argv[]) {

    /*
    int a, b, c, d, e;
    for (int i = 0; i < 1000000000; i++) {
        a = 1;
        b = 2;
        c = 3;
        d = 4;
        e = 5;
    }
    */

    int a[5];
    for (int i = 0; i < 1000000000; i++) {
        a[0] = 1;
        a[1] = 2;
        a[2] = 3;
        a[3] = 4;
        a[4] = 5;
    }

    return 0;
}

The first (commented out) version ran twice as fast. (For doubles
instead of ints, it was a factor of 4). So the simpleton part of me
thinks that that answers my question. The remaining part tells me that
it is never that simple. Finally, the cynical part of me thinks that it
all doesn't matter and other parts of the program are bound to be far
more time consuming.

The most likely reason for the difference is that the compiler can keep the
individual variables in registers but places the array in memory. I don't
see how this test helps with your particular problem, though. You need your
collection to be indexable, so you don't really have the option of using
individual named variables. Although test programs like this can be useful,
you have to determine the reason for the performance difference and consider
whether the result will hold for your program. If the reason is register
use, then the test probably doesn't mean much.

DW
 
 
 
 

E. Robert Tisdale

I'm working on a scientific computing application.
I need a class called Element
which is no more than a collection of integers, or "nodes",
and has only one method: int getNode(int i).
I would like to implement it in the most efficient way possible. So I
summoned up my programming intellect and asked myself,
Do I want to have members such as int a, b, c, d and e?
Or a single member such as int a[5]?
So I wrote the following snippet and compiled it with the -O3 flag:
cat main.cc
#include <cstdlib>
#include <iostream>

int main(int argc, char* argv[]) {

    if (1 < argc) {
        const std::size_t n = std::atoi(argv[1]);
#ifdef INNER
        int a[5];
        for (std::size_t j = 0; j < n; ++j) {
            a[0] = 1;
            a[1] = 2;
            a[2] = 3;
            a[3] = 4;
            a[4] = 5;
        }
#else //INNER
        int a, b, c, d, e;
        for (std::size_t j = 0; j < n; ++j) {
            a = 1;
            b = 2;
            c = 3;
            d = 4;
            e = 5;
        }
#endif//INNER
    }
    else {
        std::cerr << argv[0] << " <iterations>" << std::endl;
    }
    return 0;
}
g++ --version
g++ (GCC) 3.4.1
uname -srmpio
Linux 2.6.10-1.9_FC2smp i686 i686 i386 GNU/Linux
g++ -DINNER -Wall -ansi -pedantic -O3 -o main main.cc
time ./main 1000000000
3.362u 0.004s 0:03.36 100.0% 0+0k 0+0io 0pf+0w
time ./main 1000000000
3.357u 0.003s 0:03.36 99.7% 0+0k 0+0io 0pf+0w
time ./main 1000000000
3.388u 0.003s 0:03.39 99.7% 0+0k 0+0io 0pf+0w
time ./main 1000000000
3.388u 0.003s 0:03.39 99.7% 0+0k 0+0io 0pf+0w
time ./main 1000000000
3.396u 0.002s 0:03.40 99.7% 0+0k 0+0io 0pf+0w
g++ -Wall -ansi -pedantic -O3 -o main main.cc

time ./main 1000000000
3.396u 0.005s 0:03.40 99.7% 0+0k 0+0io 0pf+0w
time ./main 1000000000
3.360u 0.003s 0:03.37 99.7% 0+0k 0+0io 0pf+0w
time ./main 1000000000
3.371u 0.004s 0:03.37 100.0% 0+0k 0+0io 0pf+0w
time ./main 1000000000
3.388u 0.003s 0:03.39 99.7% 0+0k 0+0io 0pf+0w
time ./main 1000000000
3.371u 0.003s 0:03.37 100.0% 0+0k 0+0io 0pf+0w
The first (commented out) version ran twice as fast.

Mine doesn't.
(For doubles instead of ints, it was a factor of 4).
So the simpleton part of me thinks that that answers my question.
The remaining part tells me that it is never that simple.

We call that good intuition.
Finally, the [skeptical] part of me thinks that it all doesn't matter
and other parts of the program are bound to be far more time consuming.

Probably correct.
Please help me resolve my internal struggle.

Try upgrading your compiler.
 

Gianni Mariani

Sethalicious said:
I'm thinking that you're getting a silent error in the code. I don't
believe an integer can handle the value 1000000000 in the following:
for (int i = 0; i < 1000000000; i++)

So what's happening is your integer "i" is overflowing and getting set
to an unpredictable value (possibly negative). You might have better
luck using an "unsigned long" or even "std::size_t".

Not in this case - 1 billion is less than 2^31.
I'm not sure what the standard says, but I think most compilers
implement 32 bit integers. So an "unsigned" int is 0 to 2^32, and the
"signed" int is -2^16 to 2^16. (Subtract or add a 1 where needed).

Think again.

int has a range from -2^31 to (2^31 - 1).
 

Gianni Mariani

My feeling is that when you try to implement 'getNode', you will find
that the array-based implementation is faster than a switch statement.

That's true. A pointer to member might be an alternative.
You might be able to squeeze out some more performance though with a
variable-based implementation if you templatize 'getNode' on i, and
specialize for values of i. (Presumably there is some more generic code
in your application which makes 'getNodeA', 'getNodeB', etc.,
prohibitive.)

Templates can also take pointers to members.
 
 

Jonathan Turkanis

I'm working on a scientific computing application. I need a class
called Element which is no more than a collection of integers, or
"nodes", and has only one method: int getNode(int i).

I would like to implement it in the most efficient way possible. So I
summoned up my programming intellect and asked myself: Do I want to
have members such as int a, b, c, d, e or a single member such as int
a[5]? So I wrote the following snippet and compiled it with the -O3
flag:

Make sure that Element is well encapsulated so that you can switch easily
between different implementations. Then, take Joseph's advice and don't worry
about which implementation is better until you can do performance measurements
in real-world scenarios.

Jonathan
 
 

Sethalicious

Think again.
int has a range from - 2^31 to (2^31-1)

Whoops. Totally forgot about 2's complement. :-o I stand corrected!
Thanks!


Chris J. (aka Sethalicious)
 
