Superbasic efficiency question

Discussion in 'C++' started by aaronfude@gmail.com, Jan 24, 2005.

  1. Guest

    I'm working on a scientific computing application. I need a class
    called Element which is no more than a collection of integers, or
    "nodes", and has only one method, int getNode(int i).

    I would like to implement it in the most efficient way possible. So I
    summoned up my programming intellect and asked myself: do I want to
    have members such as int a, b, c, d, e, or a single member such as
    int a[5]? So I wrote the following snippet and compiled it with the -O3 flag:

    int main(int argc, char *argv[]) {

    /*
        int a, b, c, d, e;
        for (int i = 0; i < 1000000000; i++) {
            a = 1;
            b = 2;
            c = 3;
            d = 4;
            e = 5;
        }
    */

        int a[5];
        for (int i = 0; i < 1000000000; i++) {
            a[0] = 1;
            a[1] = 2;
            a[2] = 3;
            a[3] = 4;
            a[4] = 5;
        }

        return 0;
    }

    The first (commented out) version ran twice as fast. (For doubles
    instead of ints, it was a factor of 4). So the simpleton part of me
    thinks that that answers my question. The remaining part tells me that
    it is never that simple. Finally, the cynical part of me thinks that it
    all doesn't matter and other parts of the program are bound to be far
    more time consuming.

    Please help me resolve my internal struggle.
    Many thanks in advance!

    Aaron Fude
     
    , Jan 24, 2005
    #1

  2. Aaron,

    Performance on the toy code snippet is not informative of anything more
    than performance on the toy code snippet.

    > Please help me resolve my internal struggle.


    10 Write the code in the clearest, most maintainable way possible.
    20 Run the code under a profiler.
    30 Optimize the actual performance bottleneck (as opposed to, say,
    imaginary performance bottlenecks).
    40 GOTO 20

    If you have any problems with 30, get back to the group. Until then,
    both you and us are just shooting in the dark.

    Joseph
     
    Joseph Turian, Jan 24, 2005
    #2

  3. David White Guest

    "Joseph Turian" <> wrote in message
    news:...
    > Aaron,
    >
    > Performance on the toy code snippet is not informative of anything more
    > than performance on the toy code snippet.
    >
    > > Please help me resolve my internal struggle.

    >
    > 10 Write the code in the clearest, most maintainable way possible.
    > 20 Run the code under a profiler.
    > 30 Optimize the actual performance bottleneck (as opposed to, say,
    > imaginary performance bottlenecks).


    Optimizing might require significant changes, even major design changes. If
    you know from the outset that speed is important, then consider it from the
    outset. Toy code snippets can help you determine early on how you'll need to
    go about doing certain things.

    > 40 GOTO 20
    >
    > If you have any problems with 30, get back to the group. Until then,
    > both you and us are just shooting in the dark.


    DW
     
    David White, Jan 24, 2005
    #3
  4. Sethalicious Guest

    I'm thinking that you're getting a silent error in the code. I don't
    believe an integer can handle the value 1000000000 in the following:

    > for (int i = 0; i < 1000000000; i++)


    So what's happening is your integer "i" is overflowing and getting set
    to an unpredictable value (possibly negative). You might have better
    luck using an "unsigned long" or even "std::size_t".

    I'm not sure what the standard says, but I think most compilers
    implement 32 bit integers. So an "unsigned" int is 0 to 2^32, and the
    "signed" int is -2^16 to 2^16. (Subtract or add a 1 where needed).

    Check your compiler's docs to see the sizes of the integral data types.


    Otherwise, there should be no real difference in performance of the
    code with optimizations on. When you measure the time, remember to
    record cpu clock cycles instead of time in seconds.
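
    For example, here is a minimal sketch (my own toy, not the original
    test) that times the loop with std::clock() from <ctime>, which counts
    CPU clock ticks rather than wall-clock seconds; the volatile is only
    there as a blunt way to keep the stores from being optimized away:

        #include <cstdio>
        #include <ctime>

        int main() {
            volatile int a[5];                  // volatile: the stores must actually happen
            const std::clock_t start = std::clock();

            for (long i = 0; i < 1000000000L; ++i) {
                a[0] = 1;
                a[1] = 2;
                a[2] = 3;
                a[3] = 4;
                a[4] = 5;
            }

            const std::clock_t end = std::clock();
            std::printf("CPU ticks: %ld (%.2f s)\n",
                        static_cast<long>(end - start),
                        (end - start) / static_cast<double>(CLOCKS_PER_SEC));
            return 0;
        }
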
    Hope this helps!

    Chris J. (aka Sethalicious)
     
    Sethalicious, Jan 24, 2005
    #4
  5. Guest

    My feeling is that when you try to implement 'getNode', you will find
    that the array-based implementation is faster than a switch statement.
    You might be able to squeeze out some more performance though with a
    variable-based implementation if you templatize 'getNode' on i, and
    specialize for values of i. (Presumably there is some more generic code
    in your application which makes 'getNodeA', 'getNodeB', etc.,
    prohibitive.)
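
    To make that concrete, here is a rough sketch of the flavours being
    compared (class and member names are just placeholders):

        struct ElementArray {
            int a[5];
            int getNode(int i) const { return a[i]; }   // array-based: one indexed load
        };

        struct ElementMembers {
            int a, b, c, d, e;

            int getNode(int i) const {                   // switch-based dispatch
                switch (i) {
                    case 0: return a;
                    case 1: return b;
                    case 2: return c;
                    case 3: return d;
                    default: return e;
                }
            }

            template <int I> int getNode() const;        // compile-time index
        };

        // Specializations resolve the member at compile time, so getNode<2>()
        // can inline to a single load of 'c'.
        template <> inline int ElementMembers::getNode<0>() const { return a; }
        template <> inline int ElementMembers::getNode<1>() const { return b; }
        template <> inline int ElementMembers::getNode<2>() const { return c; }
        // ...and so on for 3 and 4.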
     
    , Jan 24, 2005
    #5
  6. wrote:
    > I'm working on a scientific computing application. I need a class
    > called Element which is no more than a collection of integers, or
    > "nodes" and has only on method int getNode(int i).
    >
    > I would like to implement it in the most efficient way possible. So I
    > summoned up my programming intellect and asked myself: Do I want to
    > have members such as int a, b, c, d, e or a single member such as int
    > a[5]. So I wrote the following snippet and compiled it with a -O3 flag:
    >
    > int main(char *argv[], int argc) {
    >
    > /*
    > int a, b, c, d, e;
    > for (int i = 0; i < 1000000000; i++) {
    > a = 1;
    > b = 2;
    > c = 3;
    > d = 4;
    > e = 5;
    > }
    >
    > */
    >
    >
    > int a[5];
    > for (int i = 0; i < 1000000000; i++) {
    > a[0] = 1;
    > a[1] = 2;
    > a[2] = 3;
    > a[3] = 4;
    > a[4] = 5;
    > }
    >
    > return 0;
    > }
    >
    > The first (commented out) version ran twice as fast. (For doubles
    > instead of ints, it was a factor of 4). So the simpleton part of me
    > thinks that that answers my question. The remaining part tells me that
    > it is never that simple. Finally, the cynical part of me thinks that it
    > all doesn't matter and other parts of the program are bound to be far
    > more time consuming.


    It is more than likely that the compiler re-arranged your code to
    something like:

        int a, b, c, d, e;
        a = 1;
        b = 2;
        c = 3;
        d = 4;
        e = 5;
        for (int i = 0; i < 1000000000; i++) {
        }

    Or, perhaps the compiler placed the values in registers.

    There is a deeper design question for you.

    Are these values really related? Do you do operations on them in
    tandem? Would you ever think that it might be interesting to write a
    template function with a "pointer to member" of one of these values?

    I would go with the 5 separate values if they are truly separate. That
    way it will be harder to run into other problems like going past array
    bounds or issues with using the wrong index etc.

    Anyhow, below is an example where the compiler can't (easily) make the
    optimization above. The results are essentially identical with:
    gcc version 4.0.0 20050102 (experimental)


    #include <ostream>
    #include <iostream>

    struct X
    {
        virtual void F() = 0; // hard for compiler to optimize this
    };

    struct A
    {
        int a, b, c, d, e;    // five named members
    };

    struct B
    {
        int a[5];             // one five-element array
    };

    struct Av
        : A, X
    {
        Av()
            : A()
        {
        }

        virtual void F()
        {
            a = 1;
            b = 2;
            c = 3;
            d = 4;
            e = 5;
        }
    };

    struct Bv
        : B, X
    {
        Bv()
            : B()
        {
        }

        virtual void F()
        {
            a[0] = 1;
            a[1] = 2;
            a[2] = 3;
            a[3] = 4;
            a[4] = 5;
        }
    };

    int main( int argc, char ** argv )
    {
        X * x;
        if ( argc >= 2 )
        {
            std::cout << "Making an A\n";
            x = new Av;
        }
        else
        {
            std::cout << "Making a B\n";
            x = new Bv;
        }

        for (int i = 0; i < 1000000000; i++)
        {
            x->F();
        }
    }


    $ g++ -O3 -o ./optimal_array_or_members ./optimal_array_or_members.cpp
    $ time ./optimal_array_or_members
    Making a B
    6.900u 0.000s 0:06.92 99.7% 0+0k 0+0io 216pf+0w
    $ time ./optimal_array_or_members d
    Making an A
    6.770u 0.000s 0:06.78 99.8% 0+0k 0+0io 216pf+0w
    $ time ./optimal_array_or_members
    Making a B
    6.960u 0.010s 0:06.96 100.1% 0+0k 0+0io 216pf+0w
    $ time ./optimal_array_or_members s
    Making an A
    6.920u 0.000s 0:06.92 100.0% 0+0k 0+0io 216pf+0w
    $ time ./optimal_array_or_members
    Making a B
    7.010u 0.000s 0:07.00 100.1% 0+0k 0+0io 216pf+0w
    $ time ./optimal_array_or_members s
    Making an A
    6.950u 0.000s 0:06.95 100.0% 0+0k 0+0io 216pf+0w
    $ time ./optimal_array_or_members
    Making a B
    6.770u 0.000s 0:06.76 100.1% 0+0k 0+0io 216pf+0w
    $ time ./optimal_array_or_members s
    Making an A
    6.850u 0.000s 0:06.84 100.1% 0+0k 0+0io 216pf+0w
     
    Gianni Mariani, Jan 24, 2005
    #6
  7. David White Guest

    <> wrote in message
    news:...

    [snip]

    > int main(char *argv[], int argc) {
    >
    > /*
    > int a, b, c, d, e;
    > for (int i = 0; i < 1000000000; i++) {
    > a = 1;
    > b = 2;
    > c = 3;
    > d = 4;
    > e = 5;
    > }
    >
    > */
    >
    >
    > int a[5];
    > for (int i = 0; i < 1000000000; i++) {
    > a[0] = 1;
    > a[1] = 2;
    > a[2] = 3;
    > a[3] = 4;
    > a[4] = 5;
    > }
    >
    > return 0;
    > }
    >
    > The first (commented out) version ran twice as fast. (For doubles
    > instead of ints, it was a factor of 4). So the simpleton part of me
    > thinks that that answers my question. The remaining part tells me that
    > it is never that simple. Finally, the cynical part of me thinks that it
    > all doesn't matter and other parts of the program are bound to be far
    > more time consuming.


    The most likely reason for the difference is that the compiler automatically
    places arrays in main memory but individual objects in registers when
    possible. I don't see how this test helps for your particular problem. You
    need your collection to be indexable, so you don't really have the option of
    using individual named variables. Although test programs like this can be
    useful, you have to determine the reason for the performance difference and
    consider whether the results will hold for your program. If the reason is
    register use, then the test probably doesn't mean much.
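
    One way to find out is to look at the generated assembly (g++ -S), or to
    rewrite the toy so the compiler cannot throw the loop away. A rough
    sketch of the latter (the details are mine, not part of the original
    test): feed the values in from the command line and use them, so the
    work is genuinely performed at run time.

        #include <cstdio>
        #include <cstdlib>

        int main(int argc, char* argv[]) {
            // Values come from outside the program, so they are unknown at compile time.
            int a[5];
            for (int k = 0; k < 5; ++k)
                a[k] = (k + 1 < argc) ? std::atoi(argv[k + 1]) : k;

            unsigned long sum = 0;
            for (long i = 0; i < 1000000000L; ++i)
                sum += a[i % 5];            // run-time varying index into run-time data

            std::printf("%lu\n", sum);      // observable result keeps the loop alive
            return 0;
        }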

    DW
     
    David White, Jan 24, 2005
    #7
  9. Guest

    My feeling is that when you try to implement 'getNode', you will find
    that an array-based implementation is faster than a switch statement.
    OTOH, you might find that templatizing 'getNode' on i, and then
    specializing for values of i is faster yet. Presumably, there is some
    more generic part of your application that makes 'getNodeA',
    'getNodeB', etc., prohibitive. /david
     
    , Jan 24, 2005
    #9
  14. wrote:

    > I'm working on a scientific computing application.
    > I need a class called Element
    > which is no more than a collection of integers, or "nodes"
    > and has only one method, int getNode(int i).


    > I would like to implement it in the most efficient way possible. So I
    > summoned up my programming intellect and asked myself,
    > Do I want to have members such as int a, b, c, d and e?
    > Or a single member such as int a[5]?
    > So I wrote the following snippet and compiled it with a -O3 flag:


    > cat main.cc

    #include <cstdlib>   // std::atoi, std::size_t
    #include <iostream>

    int main(int argc, char* argv[]) {

        if (1 < argc) {
            const std::size_t n = std::atoi(argv[1]);
    #ifdef INNER
            int a[5];
            for (std::size_t j = 0; j < n; ++j) {
                a[0] = 1;
                a[1] = 2;
                a[2] = 3;
                a[3] = 4;
                a[4] = 5;
            }
    #else //INNER
            int a, b, c, d, e;
            for (std::size_t j = 0; j < n; ++j) {
                a = 1;
                b = 2;
                c = 3;
                d = 4;
                e = 5;
            }
    #endif//INNER
        }
        else {
            std::cerr << argv[0] << " <iterations>" << std::endl;
        }
        return 0;
    }

    > g++ --version

    g++ (GCC) 3.4.1
    > uname -srmpio

    Linux 2.6.10-1.9_FC2smp i686 i686 i386 GNU/Linux
    >
    > g++ -DINNER -Wall -ansi -pedantic -O3 -o main main.cc
    > time ./main 1000000000

    3.362u 0.004s 0:03.36 100.0% 0+0k 0+0io 0pf+0w
    > time ./main 1000000000

    3.357u 0.003s 0:03.36 99.7% 0+0k 0+0io 0pf+0w
    > time ./main 1000000000

    3.388u 0.003s 0:03.39 99.7% 0+0k 0+0io 0pf+0w
    > time ./main 1000000000

    3.388u 0.003s 0:03.39 99.7% 0+0k 0+0io 0pf+0w
    > time ./main 1000000000

    3.396u 0.002s 0:03.40 99.7% 0+0k 0+0io 0pf+0w
    > g++ -Wall -ansi -pedantic -O3 -o main main.cc
    >
    > time ./main 1000000000

    3.396u 0.005s 0:03.40 99.7% 0+0k 0+0io 0pf+0w
    > time ./main 1000000000

    3.360u 0.003s 0:03.37 99.7% 0+0k 0+0io 0pf+0w
    > time ./main 1000000000

    3.371u 0.004s 0:03.37 100.0% 0+0k 0+0io 0pf+0w
    > time ./main 1000000000

    3.388u 0.003s 0:03.39 99.7% 0+0k 0+0io 0pf+0w
    > time ./main 1000000000

    3.371u 0.003s 0:03.37 100.0% 0+0k 0+0io 0pf+0w

    > The first (commented out) version ran twice as fast.


    Mine doesn't.

    > (For doubles instead of ints, it was a factor of 4).
    > So the simpleton part of me thinks that that answers my question.
    > The remaining part tells me that it is never that simple.


    We call that good intuition.

    > Finally, the [skeptical] part of me thinks that it all doesn't matter
    > and other parts of the program are bound to be far more time consuming.


    Probably correct.

    > Please help me resolve my internal struggle.


    Try upgrading your compiler.
     
    E. Robert Tisdale, Jan 24, 2005
    #14
  15. Sethalicious wrote:
    > I'm thinking that your getting a silent error on the code. I don't
    > believe an integer can handle the value 1000000000 in the following:
    >
    >
    >>for (int i = 0; i < 1000000000; i++)

    >
    >
    > So what's happening is your integer "i" is overflowing and getting set
    > to an unpredictable value (possibly negative). You might have better
    > luck using an "unsigned long" or even "std::size_t".


    Not in this case - 1 billion is less than 2^31.
    >
    > I'm not sure what the standard says, but I think most compilers
    > implement 32 bit integers. So an "unsigned" int is 0 to 2^32, and the
    > "signed" int is -2^16 to 2^16. (Subtract or add a 1 where needed).


    Think again.

    A 32-bit int has a range from -2^31 to 2^31 - 1.
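
    If in doubt, std::numeric_limits will report the exact range your
    implementation uses:

        #include <iostream>
        #include <limits>

        int main() {
            std::cout << "int: " << std::numeric_limits<int>::min()
                      << " to " << std::numeric_limits<int>::max() << '\n';
            return 0;
        }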
     
    Gianni Mariani, Jan 24, 2005
    #15
  16. wrote:
    > My feeling is that when you try to implement 'getNode', you will find
    > that the array-based implementation is faster than a switch statement.


    That's true. A pointer to member might be an alternative.

    > You might be able to squeeze out some more performance though with a
    > variable-based implementation if you templatize 'getNode' on i, and
    > specialize for values of i. (Presumably there is some more generic code
    > in your application which makes 'getNodeA', 'getNodeB', etc.,
    > prohibitive.)
    >


    Templates can also take pointers to members.
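
    For example, a rough sketch (names made up) of a getNode fixed at
    compile time through a pointer-to-member template parameter:

        struct Element {
            int a, b, c, d, e;
        };

        // The member to read is a non-type template parameter, so the call
        // can inline down to a single load of that member.
        template <int Element::*Member>
        int getNode(const Element& e) {
            return e.*Member;
        }

        // usage:
        //   Element e = { 1, 2, 3, 4, 5 };
        //   int second = getNode<&Element::b>(e);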
     
    Gianni Mariani, Jan 24, 2005
    #16
  18. wrote:
    > I'm working on a scientific computing application. I need a class
    > called Element which is no more than a collection of integers, or
    > "nodes" and has only on method int getNode(int i).
    >
    > I would like to implement it in the most efficient way possible. So I
    > summoned up my programming intellect and asked myself: Do I want to
    > have members such as int a, b, c, d, e or a single member such as int
    > a[5]. So I wrote the following snippet and compiled it with a -O3
    > flag:


    Make sure that Element is well encapsulated so that you can switch easily
    between different implementations. Then, take Joseph's advice and don't worry
    about which implementation is better until you can do performance measurements
    in real-world scenarios.
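
    For instance, a minimal sketch of that kind of encapsulation (the
    details are only illustrative): callers see getNode and nothing else,
    so the representation behind it can be swapped later.

        class Element {
        public:
            explicit Element(const int (&nodes)[5]) {
                for (int i = 0; i < 5; ++i)
                    nodes_[i] = nodes[i];
            }

            int getNode(int i) const { return nodes_[i]; }

        private:
            // Currently an array; could just as well become five named
            // members without touching any code that uses Element.
            int nodes_[5];
        };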

    Jonathan
     
    Jonathan Turkanis, Jan 24, 2005
    #18
  20. Sethalicious Guest

    > Think again.
    >
    > int has a range from - 2^31 to (2^31-1)


    Whoops. Totally forgot about 2's complement. :-o I stand corrected!
    Thanks!


    Chris J. (aka Sethalicious)
     
    Sethalicious, Jan 24, 2005
    #20
