number of machine instructions - for algorithm measuring

Discussion in 'C Programming' started by ben, Feb 22, 2005.

  1. ben

    ben Guest

    hello,

    i'm trying to answer a "prove an upper bound on the number of machine
    instructions required to process M connections on N objects" exercise
    from an algorithms book. i don't understand a very basic detail of how
    i should go about answering the exercise. here's a simplified example
    question and example programme to hopefully get at what i'd like to
    know:



    Prove an upper bound on the number of machine instructions required to
    process N objects using the below programme. You may
    assume, for example, that any C assignment statement always requires
    less than c instructions, for some fixed constant c.

    /* a very simple and silly example "algorithm" */
    #include <stdio.h>
    #define N 10000

    int main(void)
    {
        int n = 0;
        int x[3] = { 1, 4, 9 };
        long unsigned total = 0;

        while (n < N) {
            total += x[n % 3];
            n++;
        }

        printf("%lu\n", total);
        return 0;
    }



    i know the "upper bound" aspect of the question isn't particularly
    meaningful with this programme, as after the value of N has been
    decided there's nothing variable about the number of times the loop is
    run -- upper and lower bound the same -- one fixed, by N, value. but i
    wanted to leave the question as unchanged as possible.

    to answer that exercise, is it a case of assigning a certain number of
    machine instructions to each and every C statement, or operation, in
    the loop, adding those up, and then multiplying that by the number of
    loops?

    it's the "You may assume, for example, that any C assignment statement
    always requires less than c instructions, for some fixed constant c."
    part i'm unsure on.

    how exactly would you go about answering the above exercise (ignoring
    the silliness part of it)?

    thanks, ben.
    ben, Feb 22, 2005
    #1

  2. ben

    Tom St Denis Guest

    ben wrote:
    > hello,


    Not about CLC...

    What do you do in big-Oh with constants?

    e.g. inner loop takes O(35n) [35 opcodes per pass, not 35 statements
    though] that's O(n) time... then if the outer loop runs O(n log n)
    times, the total is their product, O(n^2 log n) time.

    ... so the question of an exact instruction count is really moot.

    If you want really precise timings you're gonna have to sample.
    Compilers vary too much, and have too many options, to enumerate all
    possible results.

    Tom
    Tom St Denis, Feb 22, 2005
    #2

  3. ben

    ben Guest

    In article <>, Tom
    St Denis <> wrote:

    > e.g. inner loop takes O(35n) [35 opcodes per pass, not 35 statements
    > though]


    yes, that's the part that i'm asking about and hoping to find out
    about. my question is about this part of the exercise question:

    "You may assume, for example, that any C assignment statement always
    requires less than c instructions, for some fixed constant c."

    how would you carry out that part with the example loop -- just for one
    loop is all that's needed.

    the higher level aspect of the exercise question that you're talking
    about Tom, i understand -- that's why i gave a very simple bit of code
    and labelled it silly and stated "after the value of N has been
    decided there's nothing variable about the number of times the loop is
    run -- upper and lower bound the same -- one fixed, by N, value. but i
    wanted to leave the question as unchanged as possible" -- there's no
    need for the variable run aspect to get at what i'm asking about.





    so basically how do you apply "You may assume, for example, that any C
    assignment statement always requires less than c instructions, for some
    fixed constant c." to

    while (n < N) {
        total += x[n % 3];
        n++;
    }

    ?

    thanks, ben.
    ben, Feb 22, 2005
    #3
  4. ben

    Michael Mair Guest

    ben wrote:
    > In article <>, Tom
    > St Denis <> wrote:
    >
    >>e.g. inner loop takes O(35n) [35 opcodes per pass, not 35 statements
    >>though]

    >
    > yes, that's the part that i'm asking about and hoping to find out
    > about. my question is about this part of the exercise question:
    >
    > "You may assume, for example, that any C assignment statement always
    > requires less than c instructions, for some fixed constant c."
    >
    > how would you carry out that part with the example loop -- just for one
    > loop is all that's needed.


    I am not sure how this is meant -- plainly just looking for assignments
    seems stupid. "Operation" is more meaningful.

    [snip]

    > so basically how do you apply "You may assume, for example, that any C
    > assignment statement always requires less than c instructions, for some
    > fixed constant c." to
    >
    > while( n < N ) {
    > total += x[ n % 3 ];
    > n++;
    > }


    Assignments: 2 per loop -> 2*c*N. Complete nonsense.

    Operations:
    n % 3 -> temp1
    x + temp1 -> temp2
    total + *temp2 -> total
    or:
    *temp2 -> temp3
    total + temp3 -> total

    n + 1 -> n

    and of course
    n < N -> stay in the loop
    else -> leave the loop
    or intermediate assignment of n<N -> temp0 and test afterwards.

    Would be [5..7]*c*N.

    IMO, the exercise should be stated in a clearer way.


    Cheers
    Michael
    --
    E-Mail: Mine is a gmx dot de address.
    Michael Mair, Feb 22, 2005
    #4
  5. ben

    ben Guest

    In article <>, Michael Mair
    <> wrote:

    > ben wrote:
    > > In article <>, Tom
    > > St Denis <> wrote:
    > >
    > >>e.g. inner loop takes O(35n) [35 opcodes per pass, not 35 statements
    > >>though]

    > >
    > > yes, that's the part that i'm asking about and hoping to find out
    > > about. my question is about this part of the exercise question:
    > >
    > > "You may assume, for example, that any C assignment statement always
    > > requires less than c instructions, for some fixed constant c."
    > >
    > > how would you carry out that part with the example loop -- just for one
    > > loop is all that's needed.

    >
    > I am not sure how this is meant -- plainly just looking for assignments
    > seems stupid. "Operation" is more meaningful
    >
    > [snip]
    >
    > > so basically how do you apply "You may assume, for example, that any C
    > > assignment statement always requires less than c instructions, for some
    > > fixed constant c." to
    > >
    > > while( n < N ) {
    > > total += x[ n % 3 ];
    > > n++;
    > > }

    >
    > Assignments: 2 per loop -> 2*c*N. Complete nonsense.
    >
    > Operations:
    > n % 3 -> temp1
    > x + temp1 -> temp2
    > total + *temp2 -> total
    > or:
    > *temp2 -> temp3
    > total + temp3 -> total
    >
    > n + 1 -> n
    >
    > and of course
    > n < N -> stay in the loop
    > else -> leave the loop
    > or intermediate assignment of n<N -> temp0 and test afterwards.
    >
    > Would be [5..7]*c*N.


    right. great.

    >
    > IMO, the exercise should be stated in a clearer way.


    great. it was unclear to me also, partly for the 'assignment' reason,
    which i also thought was odd, and partly from not understanding exactly
    how it should be broken down. so i wasn't sure if it was unclear to me
    through my lack of knowledge, or due to the exercise being badly
    written -- a bit of both was the answer to that.

    thanks very much for the reply Michael,

    ben.
    ben, Feb 22, 2005
    #5
  6. ben

    Mark F. Haigh Guest

    ben wrote:

    <snip>

    >
    >
    > Prove an upper bound on the number of machine instructions required
    > to process N objects using the below programme. You may
    > assume, for example, that any C assignment statement always requires
    > less than c instructions, for some fixed constant c.
    >
    > /* a very simple and silly example "algorithm" */
    > #include <stdio.h>
    > #define N 10000
    >
    > int main(void)
    > {
    > int n = 0;
    > int x[3] = { 1, 4, 9 };
    > long unsigned total = 0;
    >
    > while( n < N ) {
    > total += x[ n % 3 ];
    > n++;
    > }
    >
    > printf("%lu\n", total);
    > return 0;
    > }
    >


    In my opinion, this is a waste of time. Instruction count no longer
    has much of a bearing on how much time a program takes to execute. To
    use your example program, with N set to 0x4FFFFFFFu:


    [mark@icepick ~]$ gcc -save-temps -Wall -O3 -o bar bar.c && time ./bar
    && wc -l bar.s
    1968526669

    real 0m16.683s
    user 0m16.682s
    sys 0m0.002s
    44 bar.s <- lines of assembly


    [mark@icepick ~]$ gcc -save-temps -Wall -O3 -march=pentium4 -o bar
    bar.c && time ./bar && wc -l bar.s
    1968526669

    real 0m4.313s
    user 0m4.312s
    sys 0m0.001s
    78 bar.s <- lines of assembly


    As you can see, a change in an optimization setting roughly doubled the
    assembly output, but resulted in a 4x speed improvement. I picked the
    quickest of 3 trials each, for the record.

    It may be an interesting academic exercise, but its real world
    applicability is questionable. Probably not the answer you wanted, but
    an answer nonetheless.


    Mark F. Haigh
    Mark F. Haigh, Feb 22, 2005
    #6
  7. ben

    ben Guest

    In article <>,
    Mark F. Haigh <> wrote:

    > ben wrote:
    >
    > <snip>
    >
    > >
    > >
    > > Prove an upper bound on the number of machine instructions required
    > > to process N objects using the below programme. You may
    > > assume, for example, that any C assignment statement always requires
    > > less than c instructions, for some fixed constant c.
    > >
    > > /* a very simple and silly example "algorithm" */
    > > #include <stdio.h>
    > > #define N 10000
    > >
    > > int main(void)
    > > {
    > > int n = 0;
    > > int x[3] = { 1, 4, 9 };
    > > long unsigned total = 0;
    > >
    > > while( n < N ) {
    > > total += x[ n % 3 ];
    > > n++;
    > > }
    > >
    > > printf("%lu\n", total);
    > > return 0;
    > > }
    > >

    >
    > In my opinion, this is a waste of time. Instruction count no longer
    > has much of a bearing on how much time a program takes to execute. To
    > use your example program, with N set to 0x4FFFFFFFu:
    >
    >
    > [mark@icepick ~]$ gcc -save-temps -Wall -O3 -o bar bar.c && time ./bar
    > && wc -l bar.s
    > 1968526669
    >
    > real 0m16.683s
    > user 0m16.682s
    > sys 0m0.002s
    > 44 bar.s <- lines of assembly
    >
    >
    > [mark@icepick ~]$ gcc -save-temps -Wall -O3 -march=pentium4 -o bar
    > bar.c && time ./bar && wc -l bar.s
    > 1968526669
    >
    > real 0m4.313s
    > user 0m4.312s
    > sys 0m0.001s
    > 78 bar.s <- lines of assembly
    >
    >
    > As you can see, a change in an optimization setting roughly doubled the
    > assembly output, but resulted in a 4x speed improvement. I picked the
    > quickest of 3 trials each, for the record.
    >
    > It may be an interesting academic exercise, but its real world
    > applicability is questionable. Probably not the answer you wanted, but
    > an answer nonetheless.


    Mark,

    well, it sounds like a good point. if you're right then it's a very
    good answer and one i'd definitely want. the exercise is just from a
    book. i don't have to do it as stated by the exercise. if the way
    that's detailed isn't so useful and there's a much more useful way then
    i'd prefer to learn and do it a way that's useful. i do intend to use
    this stuff in the real world -- that's what i'm learning it for.

    i'm just wondering though, if you're comparing operation counts of code
    compiled using the same settings, and the different programmes are
    written in a similar fashion (that is, for example, not comparing one
    programme that unrolls a loop and another that doesn't -- code written
    in the same style) then the number of operations is a reasonable way to
    compare? basically, use the same base settings and style. (although
    "code written in the same style" is probably pretty ambiguous)

    further thought: different operations take different amounts of time
    don't they? -- i think so. so that'd make operation counting less
    useful. for instance i think accessing a word of memory that isn't in
    the cpu's cache can take several clock cycles, whereas if that data is
    in cache the access might take just one clock cycle. correct?

    so basically you're saying timing is the way to measure algorithms?

    thanks, ben
    ben, Feb 23, 2005
    #7
  8. ben

    Mark F. Haigh Guest

    ben wrote:
    >
    > well, it sounds like a good point. if you're right then it's a very
    > good answer and one i'd definitely want. the exercise is just from a
    > book. i don't have to do it as stated by the exercise. if the way
    > that's detailed isn't so useful and there's a much more useful way then
    > i'd prefer to learn and do it a way that's useful. i do intend to use
    > this stuff in the real world -- that's what i'm learning it for.


    Good decision. Can't remember who said "In theory, theory and practice
    are the same; in practice, they're different."

    >
    > i'm just wondering though, if you're comparing operation counts of code
    > compiled using the same settings, and the different programmes are
    > written in a similar fashion (that is, for example, not comparing one
    > programme that unrolls a loop and another that doesn't -- code written
    > in the same style) then the number of operations is a reasonable way to
    > compare? basically, use the same base settings and style. (although
    > "code written in the same style" is probably pretty ambiguous)


    These days, it would be unwise to manually unroll a loop in code
    without timings to back it up. Compilers typically understand the
    associated branching costs (whether cheap or expensive, cost of
    mispredicted branches, etc) and pipeline scheduling better than most
    people do.

    >
    > further thought: different operations take different amounts of time
    > don't they? -- i think so. so that'd make operation counting less
    > useful.


    Yes, although it depends on the processor architecture exactly how
    different the numbers can be. Some instructions can be executed in
    parallel. Some instructions may need to be internally broken down into
    smaller micro-ops, each of which is individually executed.

    > for instance i think accessing a word of memory that isn't in
    > the cpu's cache can take several clock cycles, whereas if that data is
    > in cache the access might take just one clock cycle. correct?


    Worse, much of the time, unless you're dealing with older or embedded
    processors. It can take hundreds or even thousands of cycles on newer
    fast machines. Typically it's not one cycle from cache, either, and it
    will depend on which cache it's coming from.

    For the sake of illustration, let's revisit a previous topic -- loop
    unrolling. Let's say you manually unrolled a bunch of loops all over
    your code. Are you sure that this new bulk of code does not force some
    other critical part of code out of cache, resulting in several hundred
    cycles of wastage every time it has to be re-fetched? Now the several
    cycles you saved look like nothing. Think of the compiler as your
    friend that helps you find the sweet spot amongst many different
    competing factors.

    >
    > so basically you're saying timing is the way to measure algorithms?
    >


    Absolutely. First, optimize algorithmic complexity. Second, implement
    it cleanly. Third, play with your compiler settings/hints and profile
    it. Fourth and last, which is rarely needed, micro-optimize.

    The fastest linked list in the world will get creamed by a slow hash or
    tree if it's used inappropriately. Compilers do a better job of
    optimizing if the code is implemented cleanly. Profiling shows you
    where the bulk of _time_ is spent, so you don't go chasing false leads
    and trying to optimize things that will give you a .00001% increase.


    Mark F. Haigh
    Mark F. Haigh, Feb 23, 2005
    #8
  9. ben

    ben Guest

    Mark,

    ok, thanks for the info

    ben.



    In article <>,
    Mark F. Haigh <> wrote:

    <snip>
    ben, Feb 24, 2005
    #9
