# number of machine instructions - for algorithm measuring

Discussion in 'C Programming' started by ben, Feb 22, 2005.

1. ### benGuest

hello,

i'm trying to answer a "prove an upper bound on the number of machine
instructions required to process M connections on N objects" exercise
from an algorithms book. i don't understand a very basic detail of how
to go about it, so here's an adapted version of the question and an
example programme, to hopefully get at what i'd like to know:

Prove an upper bound on the number of machine instructions required to
process N objects using the below programme. You may
assume, for example, that any C assignment statement always requires
less than c instructions, for some fixed constant c.

/* a very simple and silly example "algorithm" */
#include <stdio.h>
#define N 10000

int main(void)
{
    int n = 0;
    int x[3] = { 1, 4, 9 };
    long unsigned total = 0;

    while( n < N ) {
        total += x[ n % 3 ];
        n++;
    }

    printf("%lu\n", total);
    return 0;
}

i know the "upper bound" aspect of the question isn't particularly
meaningful with this programme: once the value of N has been decided
there's nothing variable about the number of times the loop is run --
the upper and lower bounds are the same, a single value fixed by N. but
i wanted to leave the question as unchanged as possible.

to answer that exercise, is it a case of assigning a certain number of
machine instructions to each and every C statement, or operation, in
the loop, adding those up, and then multiplying that by the number of
loops?

it's the "You may assume, for example, that any C assignment statement
always requires less than c instructions, for some fixed constant c."
part i'm unsure on (setting aside the silliness of the example).

thanks, ben.

ben, Feb 22, 2005

2. ### Tom St DenisGuest

ben wrote:
> hello,

What do you do in big-Oh with constants?

e.g. an inner loop takes O(35n) [35 opcodes per pass, not 35 statements
though] -- that's O(n) time. then if the outer loop is O(nlogn), the
total is the product, O(n^2logn) time.

... so the question is really moot.

If you want really precise timings you're gonna have to sample.
Compilers vary too much and they have too many options to enumerate all
possible results.

Tom

Tom St Denis, Feb 22, 2005

3. ### benGuest

In article <>, Tom
St Denis <> wrote:

> e.g. inner loop takes O(35n) [35 opcodes per pass, not 35 statements
> though]

yes, that's the part that i'm asking about and hoping to find out about:

"You may assume, for example, that any C assignment statement always
requires less than c instructions, for some fixed constant c."

how would you carry out that part with the example loop? working it
through for just one pass of the loop is all that's needed.

the higher level aspect of the exercise question that you're talking
about Tom, i understand -- that's why i gave a very simple bit of code
and labelled it silly and stated "after the value of N has been
decided there's nothing variable about the number of times the loop is
run -- upper and lower bound the same -- one fixed, by N, value. but i
wanted to leave the question as unchanged as possible" -- there's no
need for the variable run aspect to get at what i'm asking about.

so basically how do you apply "You may assume, for example, that any C
assignment statement always requires less than c instructions, for some
fixed constant c." to

while( n < N ) {
    total += x[ n % 3 ];
    n++;
}

?

thanks, ben.

ben, Feb 22, 2005
4. ### Michael MairGuest

ben wrote:
> In article <>, Tom
> St Denis <> wrote:
>
>>e.g. inner loop takes O(35n) [35 opcodes per pass, not 35 statements
>>though]

>
> yes, that's the part that i'm asking about and hoping to find out
>
> "You may assume, for example, that any C assignment statement always
> requires less than c instructions, for some fixed constant c."
>
> how would you carry out that part with the example loop -- just for one
> loop is all that's needed.

I am not sure how this is meant -- plainly just looking for assignments
seems stupid. "Operation" is more meaningful.

[snip]

> so basically how do you apply "You may assume, for example, that any C
> assignment statement always requires less than c instructions, for some
> fixed constant c." to
>
> while( n < N ) {
>     total += x[ n % 3 ];
>     n++;
> }

Assignments: 2 per loop -> 2*c*N. Complete nonsense.

Operations:
n % 3 -> temp1
x + temp1 -> temp2
total + *temp2 -> total
or:
*temp2 -> temp3
total + temp3 -> total

n + 1 -> n

and of course
n < N -> stay in the loop
else -> leave the loop
or intermediate assignment of n<N -> temp0 and test afterwards.

Would be [5..7]*c*N.

IMO, the exercise should be stated in a clearer way.

Cheers
Michael
--
E-Mail: Mine is a gmx dot de address.

Michael Mair, Feb 22, 2005
5. ### benGuest

In article <>, Michael Mair
<> wrote:

> ben wrote:
> > In article <>, Tom
> > St Denis <> wrote:
> >
> >>e.g. inner loop takes O(35n) [35 opcodes per pass, not 35 statements
> >>though]

> >
> > yes, that's the part that i'm asking about and hoping to find out
> >
> > "You may assume, for example, that any C assignment statement always
> > requires less than c instructions, for some fixed constant c."
> >
> > how would you carry out that part with the example loop -- just for one
> > loop is all that's needed.

>
> I am not sure how this is meant -- plainly just looking for assignments
> seems stupid. "Operation" is more meaningful
>
> [snip]
>
> > so basically how do you apply "You may assume, for example, that any C
> > assignment statement always requires less than c instructions, for some
> > fixed constant c." to
> >
> > while( n < N ) {
> >     total += x[ n % 3 ];
> >     n++;
> > }

>
> Assignments: 2 per loop -> 2*c*N. Complete nonsense.
>
> Operations:
> n % 3 -> temp1
> x + temp1 -> temp2
> total + *temp2 -> total
> or:
> *temp2 -> temp3
> total + temp3 -> total
>
> n + 1 -> n
>
> and of course
> n < N -> stay in the loop
> else -> leave the loop
> or intermediate assignment of n<N -> temp0 and test afterwards.
>
> Would be [5..7]*c*N.

right. great.

>
> IMO, the exercise should be stated in a clearer way.

great. it was unclear to me also, partly for the 'assignment' reason,
which i also thought was odd, and partly through not understanding
exactly how it should be broken down. so i wasn't sure if it was unclear
to me through my lack of knowledge, or due to the exercise being badly
written -- a bit of both was the answer to that.

thanks very much for the reply Michael,

ben.

ben, Feb 22, 2005
6. ### Mark F. HaighGuest

ben wrote:

<snip>

>
>
> Prove an upper bound on the number of machine instructions required to
> process N objects using the below programme. You may
> assume, for example, that any C assignment statement always requires
> less than c instructions, for some fixed constant c.
>
> /* a very simple and silly example "algorithm" */
> #include <stdio.h>
> #define N 10000
>
> int main(void)
> {
>     int n = 0;
>     int x[3] = { 1, 4, 9 };
>     long unsigned total = 0;
>
>     while( n < N ) {
>         total += x[ n % 3 ];
>         n++;
>     }
>
>     printf("%lu\n", total);
>     return 0;
> }
>

In my opinion, this is a waste of time. Instruction count no longer
has much of a bearing on how much time a program takes to execute. To
use your example program, with N set to 0x4FFFFFFFu:

[mark@icepick ~]$ gcc -save-temps -Wall -O3 -o bar bar.c && time ./bar
&& wc -l bar.s
1968526669

real 0m16.683s
user 0m16.682s
sys 0m0.002s
44 bar.s <- lines of assembly

[mark@icepick ~]$ gcc -save-temps -Wall -O3 -march=pentium4 -o bar
bar.c && time ./bar && wc -l bar.s
1968526669

real 0m4.313s
user 0m4.312s
sys 0m0.001s
78 bar.s <- lines of assembly

As you can see, a change in an optimization setting roughly doubled the
assembly output, but resulted in a 4x speed improvement. I picked the
quickest of 3 trials each, for the record.

It may be an interesting academic exercise, but its real world
applicability is questionable. Probably not the answer you wanted, but

Mark F. Haigh

Mark F. Haigh, Feb 22, 2005
7. ### benGuest

In article <>,
Mark F. Haigh <> wrote:

> ben wrote:
>
> <snip>
>
> >
> > <snip exercise and programme>

>
> In my opinion, this is a waste of time. Instruction count no longer
> has much of a bearing on how much time a program takes to execute. To
> use your example program, with N set to 0x4FFFFFFFu:
>
>
> [mark@icepick ~]$ gcc -save-temps -Wall -O3 -o bar bar.c && time ./bar
> && wc -l bar.s
> 1968526669
>
> real 0m16.683s
> user 0m16.682s
> sys 0m0.002s
> 44 bar.s <- lines of assembly
>
>
> [mark@icepick ~]$ gcc -save-temps -Wall -O3 -march=pentium4 -o bar
> bar.c && time ./bar && wc -l bar.s
> 1968526669
>
> real 0m4.313s
> user 0m4.312s
> sys 0m0.001s
> 78 bar.s <- lines of assembly
>
>
> As you can see, a change in an optimization setting roughly doubled the
> assembly output, but resulted in a 4x speed improvement. I picked the
> quickest of 3 trials each, for the record.
>
> It may be an interesting academic exercise, but its real world
> applicability is questionable. Probably not the answer you wanted, but

Mark,

well, it sounds like a good point. if you're right then it's a very
good answer and one i'd definitely want. the exercise is just from a
book. i don't have to do it as stated by the exercise. if the way
that's detailed isn't so useful and there's a much more useful way then
i'd prefer to learn and do it a way that's useful. i do intend to use
this stuff in the real world -- that's what i'm learning it for.

i'm just wondering though: if you're comparing operation counts of code
compiled using the same settings, and the different programmes are
written in a similar fashion (that is, for example, not comparing one
programme that unrolls a loop against another that doesn't -- code
written in the same style), then is the number of operations a
reasonable way to compare? basically, use the same base settings and
style. (although "code written in the same style" is probably pretty
ambiguous)

further thought: different operations take different amounts of time,
don't they? i think so. so that'd make operation counting less useful.
for instance, i think accessing a word of memory that isn't in the
cpu's cache can take several clock cycles, whereas if that data is in
cache the access might take just one clock cycle. correct?

so basically you're saying timing is the way to measure algorithms?

thanks, ben

ben, Feb 23, 2005
8. ### Mark F. HaighGuest

ben wrote:
>
> well, it sounds like a good point. if you're right then it's a very
> good answer and one i'd definitely want. the exercise is just from a
> book. i don't have to do it as stated by the exercise. if the way
> that's detailed isn't so useful and there's a much more useful way then
> i'd prefer to learn and do it a way that's useful. i do intend to use
> this stuff in the real world -- that's what i'm learning it for.

Good decision. Can't remember who said "In theory, theory and practice
are the same, in practice, they're different."

>
> i'm just wondering though, if you're comparing operation counts of code
> compiled using the same settings, and the different programmes are
> written in a similar fashion (that is, for example, not comparing one
> programme that unrolls a loop and another that doesn't -- code written
> in the same style) then the number of operations is a reasonable way to
> compare? basically, use the same base settings and style. (although
> "code written in the same style" is probably pretty ambiguous)

These days, it would be unwise to manually unroll a loop in code
without timings to back it up. Compilers typically understand the
associated branching costs (whether cheap or expensive, cost of
mispredicted branches, etc) and pipeline scheduling better than most
people do.

>
> further thought: different operations take different amounts of time
> don't they? -- i think so. so that'd make operation counting less
> useful.

Yes, although it depends on the processor architecture exactly how
different the numbers can be. Some instructions can be executed in
parallel. Some instructions may need to be internally broken down into
smaller micro-ops, each of which is individually executed.

> for instance i think accessing a word of memory that isn't in
> the cpu's cache can take several clock cycles, whereas if that data is
> in cache the access might take just one clock cycle. correct?

Worse, much of the time, unless you're dealing with older or embedded
processors. It can take hundreds or even thousands of cycles on newer
fast machines. Typically it's not one cycle from cache, either, and it
will depend on which cache it's coming from.

For the sake of illustration, let's revisit a previous topic -- loop
unrolling. Let's say you manually unrolled a bunch of loops all over
your code. Are you sure that this new bulk of code does not force some
other critical part of code out of cache, resulting in several hundred
cycles of wastage every time it has to be re-fetched? Now the several
cycles you saved look like nothing. Think of the compiler as your
friend that helps you find the sweet spot amongst many different
competing factors.

>
> so basically you're saying timing is the way to measure algorithms?
>

Absolutely. First, optimize algorithmic complexity. Second, implement
it cleanly. Third, play with your compiler settings/hints and profile
it. Fourth and last, which is rarely needed, micro-optimize.

The fastest linked list in the world will get creamed by a slow hash or
tree if it's used inappropriately. Compilers do a better job of
optimizing if the code is implemented cleanly. Profiling shows you
where the bulk of _time_ is spent, so you don't go chasing false leads
and trying to optimize things that will give you a .00001% increase.

Mark F. Haigh

Mark F. Haigh, Feb 23, 2005
9. ### benGuest

Mark,

ok, thanks for the info

ben.

ben, Feb 24, 2005