Optimization with local vs. global arrays

Discussion in 'C++' started by Jim West, Apr 12, 2007.

  1. Jim West

    Jim West Guest

    The execution speed of the following code is dramatically faster if I
    declare some arrays globally rather than locally. That is

    FOO a[10], b[10], c[10];

    void bar() {
    ...
    }

    runs much faster (up to 33%) than

    void bar() {
    FOO a[10], b[10], c[10];
    ...
    }

    There is considerable work being performed in the ... section.
    This is on a Linux Itanium II system, compiled both with the Intel C++
    compiler (V9.1) with interprocedural optimization enabled, and with the
    GNU C V 3.3.5 compiler with -O3 optimization. (The performance change is
    more dramatic with the Intel Compiler.) I tried declaring the local
    FOO arrays static with

    static FOO a[10], b[10], c[10];

    which helped with the GNU compiler but was actually worse with the Intel
    compiler. I also tried

    FOO d[30];
    FOO *a = d, *b = d+10, *c = d+20;

    with a local d array, but that had no effect.

    Is this just a compiler issue, or am I missing something? I want to avoid
    the external arrays, obviously, but that code compiled by the Intel
    compiler gives the fastest execution speed by far. I'd like to get the
    equivalent performance with something less dangerous than global arrays.
    Jim West, Apr 12, 2007
    #1
    1. Advertising

  2. Jim West

    GeekBoy Guest

    "Jim West" <> wrote in message
    news:9FfTh.98640$...
    > The execution speed of the following code is dramatically faster if I
    > declare some arrays globally rather than locally. That is
    >
    > FOO a[10], b[10], c[10];
    >
    > void bar() {
    > ...
    > }
    >
    > runs much faster (up to 33%) than
    >
    > void bar() {
    > FOO a[10], b[10], c[10];
    > ...
    > }
    >
    > There is considerable work being performed in the ... section.
    > This is on a Linux Itanium II system, compiled both with the Intel C++
    > compiler (V9.1) with interprocedural optimization enabled, and with the
    > GNU C V 3.3.5 compiler with -O3 optimization. (The performance change is
    > more dramatic with the Intel Compiler.) I tried declaring the local
    > FOO arrays static with
    >
    > static FOO a[10], b[10], c[10];
    >
    > which helped with the GNU compiler but was actually worse with the Intel
    > compiler. I also tried
    >
    > FOO d[30];
    > FOO *a = d, *b = d+10, *c = d+20;
    >
    > with a local d array, but that had no effect.
    >
    > Is this just a compiler issue, or am I missing something? I want to avoid
    > the external arrays, obviously, but that code compiled by the Intel
    > compiler gives the fastest execution speed by far. I'd like to get the
    > equivalent performance with something less dangerous than global arrays.


    Faster processor?
    GeekBoy, Apr 12, 2007
    #2
    1. Advertising

  3. Jim West

    Mark P Guest

    Jim West wrote:
    > The execution speed of the following code is dramatically faster if I
    > declare some arrays globally rather than locally. That is
    >
    > FOO a[10], b[10], c[10];
    >
    > void bar() {
    > ...
    > }
    >
    > runs much faster (up to 33%) than
    >
    > void bar() {
    > FOO a[10], b[10], c[10];
    > ...
    > }
    >
    > There is considerable work being performed in the ... section.
    > This is on a Linux Itanium II system, compiled both with the Intel C++
    > compiler (V9.1) with interprocedural optimization enabled, and with the
    > GNU C V 3.3.5 compiler with -O3 optimization. (The performance change is
    > more dramatic with the Intel Compiler.) I tried declaring the local
    > FOO arrays static with
    >
    > static FOO a[10], b[10], c[10];
    >
    > which helped with the GNU compiler but was actually worse with the Intel
    > compiler. I also tried
    >
    > FOO d[30];
    > FOO *a = d, *b = d+10, *c = d+20;
    >
    > with a local d array, but that had no effect.
    >
    > Is this just a compiler issue, or am I missing something? I want to avoid
    > the external arrays, obviously, but that code compiled by the Intel
    > compiler gives the fastest execution speed by far. I'd like to get the
    > equivalent performance with something less dangerous than global arrays.


    It's not especially surprising that the local arrays, which may be
    pushed on the stack with each invocation of bar, would be slower than
    the global arrays. If you want something "safer" you could try moving
    the arrays to a namespace.

    Mark
    Mark P, Apr 12, 2007
    #3
  4. Jim West

    Jim West Guest

    On 2007-04-12, GeekBoy <> wrote:
    >
    > Faster processor?


    No, all are run on the same system, OS etc. It is compiled with
    the Intel compiler using

    icc -O3 -ip -c foo.cc

    and with the GNU compiler using

    g++ -O3 -c foo.cc
    Jim West, Apr 12, 2007
    #4
  5. Jim West

    Jim West Guest

    On 2007-04-12, Mark P <> wrote:
    > It's not especially surprising that the local arrays, which may be
    > pushed on the stack with each invocation of bar, would be slower than
    > the global arrays. If you want something "safer" you could try moving
    > the arrays to a namespace.


    OK, I had thought that the time needed to push the small arrays on the
    stack (FOO isn't a very large class) would be small compared to the
    heavy number crunching I do in the bar() routine. Guess not!

    The namespace solution is what I needed, since some of the array names
    are reused through-out the code. Seems obvious once it was pointed out.
    :)

    Thanks for the help.
    Jim West, Apr 12, 2007
    #5
  6. Jim West

    Ian Collins Guest

    Jim West wrote:
    > The execution speed of the following code is dramatically faster if I
    > declare some arrays globally rather than locally. That is
    >
    > FOO a[10], b[10], c[10];
    >
    > void bar() {
    > ...
    > }
    >
    > runs much faster (up to 33%) than
    >
    > void bar() {
    > FOO a[10], b[10], c[10];
    > ...
    > }
    >

    What is a FOO?

    Does it require construction?

    Do you call bar() in a loop?

    --
    Ian Collins.
    Ian Collins, Apr 12, 2007
    #6
  7. Jim West

    Jim West Guest

    On 2007-04-12, Ian Collins <> wrote:
    > Jim West wrote:
    >> The execution speed of the following code is dramatically faster if I
    >> declare some arrays globally rather than locally. That is
    >>
    >> FOO a[10], b[10], c[10];
    >>
    >> void bar() {
    >> ...
    >> }
    >>
    >> runs much faster (up to 33%) than
    >>
    >> void bar() {
    >> FOO a[10], b[10], c[10];
    >> ...
    >> }
    >>

    > What is a FOO?
    >
    > Does it require construction?
    >
    > Do you call bar() in a loop?



    FOO is actually a three-dimensional space vector:

    class FOO {
    float x, y, z;
    FOO() : x_(0), y_(0), z_(0) { };
    FOO(float x, float y, float z) : x_(x), y_(y), z_(z) { };
    inline FOO& operator+=(const FOO& a);
    /* Many more inline operators and member functions included */
    };

    bar() is called many times in a loop.
    Jim West, Apr 12, 2007
    #7
  8. Jim West

    Ian Collins Guest

    Jim West wrote:
    > On 2007-04-12, Ian Collins <> wrote:
    >
    >>Jim West wrote:
    >>
    >>>The execution speed of the following code is dramatically faster if I
    >>>declare some arrays globally rather than locally. That is
    >>>
    >>>FOO a[10], b[10], c[10];
    >>>
    >>>void bar() {
    >>> ...
    >>>}
    >>>
    >>>runs much faster (up to 33%) than
    >>>
    >>>void bar() {
    >>> FOO a[10], b[10], c[10];
    >>> ...
    >>>}
    >>>

    >>
    >>What is a FOO?
    >>
    >>Does it require construction?
    >>
    >>Do you call bar() in a loop?

    >
    >
    >
    > FOO is actually a three-dimensional space vector:
    >
    > class FOO {
    > float x, y, z;
    > FOO() : x_(0), y_(0), z_(0) { };
    > FOO(float x, float y, float z) : x_(x), y_(y), z_(z) { };
    > inline FOO& operator+=(const FOO& a);
    > /* Many more inline operators and member functions included */
    > };
    >
    > bar() is called many times in a loop.


    So there's your reason - FOO() gets called 30 times for each call of bar().

    --
    Ian Collins.
    Ian Collins, Apr 12, 2007
    #8
  9. Jim West

    GeekBoy Guest

    "Ian Collins" <> wrote in message
    news:...
    > Jim West wrote:
    >> The execution speed of the following code is dramatically faster if I
    >> declare some arrays globally rather than locally. That is
    >>
    >> FOO a[10], b[10], c[10];
    >>
    >> void bar() {
    >> ...
    >> }
    >>
    >> runs much faster (up to 33%) than
    >>
    >> void bar() {
    >> FOO a[10], b[10], c[10];
    >> ...
    >> }
    >>

    > What is a FOO?
    >

    Foobar is a universal variable understood to represent whatever is being
    discussed.
    It's usually used in examples that illustrate concepts and ideas in computer
    science.


    For instance, a computer science professor may be discussing different file
    formats. In this case, he would call the generic-example file foo or foobar,
    then list the extensions associated with the file formats (e.g. foobar.txt,
    foobar.gif, foobar.exe, foobar.tar).

    When foo or foobar is used, everyone understands that these are just
    examples, and they don't really exist.


    Programmers and administrators also use foo and foobar in a similar context.
    Files or program s named with foo or foobar are understood not to be
    permanent and will be changed or deleted at anytime.


    Foo, bar, and the compound foobar were commonly used at MIT, Stanford and
    the Helsinki University of Technology, Finland. Other generic variables are
    used other places, but only these three are considered universal.


    > Does it require construction?
    >
    > Do you call bar() in a loop?
    >
    > --
    > Ian Collins.
    GeekBoy, Apr 12, 2007
    #9
  10. Jim West

    Ian Collins Guest

    GeekBoy wrote:
    > "Ian Collins" <> wrote in message
    > news:...
    >>
    >>What is a FOO?
    >>

    >
    > When foo or foobar is used, everyone understands that these are just
    > examples, and they don't really exist.
    >

    Not in this case, if you read the OP's reply.
    >
    > Foo, bar, and the compound foobar were commonly used at MIT, Stanford and
    > the Helsinki University of Technology, Finland. Other generic variables are
    > used other places, but only these three are considered universal.
    >

    If you haven't done so already, research the origin of the term.
    >>
    >>--
    >>Ian Collins.

    >

    *Please* don't quote signatures.

    --
    Ian Collins.
    Ian Collins, Apr 12, 2007
    #10
  11. Jim West

    Pete Becker Guest

    Jim West wrote:
    > The execution speed of the following code is dramatically faster if I
    > declare some arrays globally rather than locally. That is
    >
    > FOO a[10], b[10], c[10];
    >
    > void bar() {
    > ...
    > }
    >
    > runs much faster (up to 33%) than
    >
    > void bar() {
    > FOO a[10], b[10], c[10];
    > ...
    > }
    >


    If, as seems to be the case, the culprit is the constructor for FOO, you
    can just remove it. Since the version with the global array apparently
    works correctly, the rest of the code isn't relying on having the FOO
    values initialized on entry into bar.

    --

    -- Pete
    Roundhouse Consulting, Ltd. (www.versatilecoding.com)
    Author of "The Standard C++ Library Extensions: a Tutorial and
    Reference." (www.petebecker.com/tr1book)
    Pete Becker, Apr 12, 2007
    #11
  12. Jim West

    Howard Guest

    "Pete Becker" <> wrote in message
    news:...
    > Jim West wrote:
    >> The execution speed of the following code is dramatically faster if I
    >> declare some arrays globally rather than locally. That is
    >>
    >> FOO a[10], b[10], c[10];
    >>
    >> void bar() {
    >> ...
    >> }
    >>
    >> runs much faster (up to 33%) than
    >>
    >> void bar() {
    >> FOO a[10], b[10], c[10];
    >> ...
    >> }
    >>

    >
    > If, as seems to be the case, the culprit is the constructor for FOO, you
    > can just remove it. Since the version with the global array apparently
    > works correctly, the rest of the code isn't relying on having the FOO
    > values initialized on entry into bar.
    >


    In another part of the thread, he shows two constructors, one default and
    one parameterized. He can't remove the default constructor because the
    array definitions require one, and the compiler can't create one if there's
    a parameterized version, right? (Of course, he may be able to remove the
    parameterized version as well; I don't know.) But won't the
    compiler-generated constructor perform the same default initializations, and
    still get called 30 times for each call to bar? Or am I misunderstanding
    something here?

    In any case, what I might do in his case is declare the arrays in the
    function which calls bar, prior to the loop that [apparently] calls bar
    repeatedly, and simply pass pointers to those arrays to bar.

    -Howard
    Howard, Apr 12, 2007
    #12
  13. Jim West

    Greg Comeau Guest

    In article <hPrTh.288385$>,
    Howard <> wrote:
    >"Pete Becker" <> wrote in message
    >news:...
    >> Jim West wrote:
    >>> The execution speed of the following code is dramatically faster if I
    >>> declare some arrays globally rather than locally. That is
    >>>
    >>> FOO a[10], b[10], c[10];
    >>>
    >>> void bar() {
    >>> ...
    >>> }
    >>>
    >>> runs much faster (up to 33%) than
    >>>
    >>> void bar() {
    >>> FOO a[10], b[10], c[10];
    >>> ...
    >>> }
    >>>

    >>
    >> If, as seems to be the case, the culprit is the constructor for FOO, you
    >> can just remove it. Since the version with the global array apparently
    >> works correctly, the rest of the code isn't relying on having the FOO
    >> values initialized on entry into bar.
    >>

    >
    >In another part of the thread, he shows two constructors, one default and
    >one parameterized. He can't remove the default constructor because the
    >array definitions require one, and the compiler can't create one if there's
    >a parameterized version, right? (Of course, he may be able to remove the
    >parameterized version as well; I don't know.) But won't the
    >compiler-generated constructor perform the same default initializations, and
    >still get called 30 times for each call to bar? Or am I misunderstanding
    >something here?
    >
    >In any case, what I might do in his case is declare the arrays in the
    >function which calls bar, prior to the loop that [apparently] calls bar
    >repeatedly, and simply pass pointers to those arrays to bar.


    That was added later in the thread. Pete's point was probably
    that (even in light of any new info) that since he is worried
    about speed that the global array's (even if tossed into a named
    or named namespace) are zero-initialized first because they are static
    before other intialization on them occurs.

    BTW, this can be different from having static's inside the function.
    Actually, a bit of this, including the timing of the zero'ing of the
    gloabals, is up to the compiler [system].

    Lastly to Jim, if this is utterly crucial, there is probably
    some other solution possible even faster than the globals,
    but we probably don't know enough about what you're doing
    to offer that at this point.
    --
    Greg Comeau / 4.3.9 with C++0xisms now in beta!
    Comeau C/C++ ONLINE ==> http://www.comeaucomputing.com/tryitout
    World Class Compilers: Breathtaking C++, Amazing C99, Fabulous C90.
    Comeau C/C++ with Dinkumware's Libraries... Have you tried it?
    Greg Comeau, Apr 12, 2007
    #13
  14. Jim West

    Pete Becker Guest

    Howard wrote:
    > "Pete Becker" <> wrote in message
    > news:...
    >> Jim West wrote:
    >>> The execution speed of the following code is dramatically faster if I
    >>> declare some arrays globally rather than locally. That is
    >>>
    >>> FOO a[10], b[10], c[10];
    >>>
    >>> void bar() {
    >>> ...
    >>> }
    >>>
    >>> runs much faster (up to 33%) than
    >>>
    >>> void bar() {
    >>> FOO a[10], b[10], c[10];
    >>> ...
    >>> }
    >>>

    >> If, as seems to be the case, the culprit is the constructor for FOO, you
    >> can just remove it. Since the version with the global array apparently
    >> works correctly, the rest of the code isn't relying on having the FOO
    >> values initialized on entry into bar.
    >>

    >
    > In another part of the thread, he shows two constructors, one default and
    > one parameterized. He can't remove the default constructor because the
    > array definitions require one, and the compiler can't create one if there's
    > a parameterized version, right?


    Shrug. Write an empty default constructor, or refactor. I'm not
    particularly interested in getting into implementation details. The
    point is that the initialization apparently isn't actually required, so
    doesn't belong in the class.

    (Of course, he may be able to remove the
    > parameterized version as well; I don't know.) But won't the
    > compiler-generated constructor perform the same default initializations, and
    > still get called 30 times for each call to bar? Or am I misunderstanding
    > something here?


    The compiler-generated constructor uses the default initializer for each
    of the float fields, and that initializer does nothing. Any reasonable
    compiler will generate no code.

    >
    > In any case, what I might do in his case is declare the arrays in the
    > function which calls bar, prior to the loop that [apparently] calls bar
    > repeatedly, and simply pass pointers to those arrays to bar.
    >


    But that unnecessarily increases coupling because every caller now has
    to provide scratch data for bar.

    --

    -- Pete
    Roundhouse Consulting, Ltd. (www.versatilecoding.com)
    Author of "The Standard C++ Library Extensions: a Tutorial and
    Reference." (www.petebecker.com/tr1book)
    Pete Becker, Apr 12, 2007
    #14
  15. Jim West

    Pete Becker Guest

    Greg Comeau wrote:
    >
    > That was added later in the thread. Pete's point was probably
    > that (even in light of any new info) that since he is worried
    > about speed that the global array's (even if tossed into a named
    > or named namespace) are zero-initialized first because they are static
    > before other intialization on them occurs.
    >


    Not quite. My point was that the global arrays demonstrate that
    initialization is irrelevant, because on calls to bar after the first
    one they have whatever junk was left over from the previous call. If
    that works, then skip the initialization entirely.

    --

    -- Pete
    Roundhouse Consulting, Ltd. (www.versatilecoding.com)
    Author of "The Standard C++ Library Extensions: a Tutorial and
    Reference." (www.petebecker.com/tr1book)
    Pete Becker, Apr 12, 2007
    #15
  16. Jim West

    Howard Guest

    "Pete Becker" <> wrote in message
    news:...

    >>
    >> In any case, what I might do in his case is declare the arrays in the
    >> function which calls bar, prior to the loop that [apparently] calls bar
    >> repeatedly, and simply pass pointers to those arrays to bar.
    >>

    >
    > But that unnecessarily increases coupling because every caller now has to
    > provide scratch data for bar.
    >


    True. I hadn't thought of that. I guess without knowing more details of
    how bar is called and what it's used for, it's tough to come up with the
    "best" solution.

    -Howard
    Howard, Apr 12, 2007
    #16
  17. Jim West

    Jim West Guest

    On 2007-04-12, Howard <> wrote:
    >
    > "Pete Becker" <> wrote in message
    > news:...
    >
    >>>
    >>> In any case, what I might do in his case is declare the arrays in the
    >>> function which calls bar, prior to the loop that [apparently] calls bar
    >>> repeatedly, and simply pass pointers to those arrays to bar.
    >>>

    >>
    >> But that unnecessarily increases coupling because every caller now has to
    >> provide scratch data for bar.
    >>

    >
    > True. I hadn't thought of that. I guess without knowing more details of
    > how bar is called and what it's used for, it's tough to come up with the
    > "best" solution.


    Following up on my situation...I unfortunately can't post to newsgroups
    at work, but I followed the discussions on google groups. While the FOO
    class need not be initialized in this particular case, it is part of a
    numerical library that my group has put together over the years, and any
    changes to the default constructor would potentially break a lot of
    code. Initializing all components to zero seemed like a Good Idea(tm)
    when I first wrote it a long time ago, but in hindsight it was clearly a
    mistake. In any event, I copied the header to this particular file and
    changed FOO to FOO2 everywhere and changed the default constructor to

    FOO2() { };

    This did not give any improvement, surprisingly. (At least with the
    Intel compiler...I forgot to try g++.)

    Allocating the arrays in the calling routines would work, but bar() is
    indeed called in several different places and would require more editing
    than I want to do at this point.

    So, my solution is to use the namespace idea that Mark P (I believe)
    suggested (already implemented). I need to protect them because their
    real names are

    FOO Observation_Points[10], Source_Points[10], Basis_Functions[10];

    names which get reused in a lot of different places throughout the code.
    If I get in the habit of declaring things globally but not protected by
    namespaces I'm afraid it would only be a matter of time before bar7()
    calls bar8() and changes values unexpectedly.

    Finally, I think that this change is going to be sufficient. This is a
    computational electromagnetics code, and the complex math involved
    within bar() is quite intensive (it performs a four-dimensional numerical
    quadrature to yield an electromagnetic field), and should overwhelm
    any more minor tweaking that is possible. (Famous last words...I thought
    the same about the constructor!)

    Thanks for the help everyone. I really appreciate it, and I have learned
    quite a bit about the cost of constructors that seem really, really
    simple on the surface!

    - Jim
    Jim West, Apr 13, 2007
    #17
  18. Jim West

    Jim West Guest

    On 2007-04-13, Jim West <> wrote:
    > code. Initializing all components to zero seemed like a Good Idea(tm)
    > when I first wrote it a long time ago, but in hindsight it was clearly a
    > mistake. In any event, I copied the header to this particular file and
    > changed FOO to FOO2 everywhere and changed the default constructor to
    >
    > FOO2() { };
    >
    > This did not give any improvement, surprisingly.


    I take that back. I mis-read 33.8 seconds as 38 seconds (I must be
    really tired). 33.8 is only slightly slower than I get with the
    global arrays. I'll have to re-consider changing the class default
    constructor and see how many things really do break.

    I think I'll go to bed now before I hurt myself.
    Jim West, Apr 13, 2007
    #18
  19. Jim West

    Pete Becker Guest

    Jim West wrote:
    > On 2007-04-13, Jim West <> wrote:
    >> code. Initializing all components to zero seemed like a Good Idea(tm)
    >> when I first wrote it a long time ago, but in hindsight it was clearly a
    >> mistake. In any event, I copied the header to this particular file and
    >> changed FOO to FOO2 everywhere and changed the default constructor to
    >>
    >> FOO2() { };
    >>
    >> This did not give any improvement, surprisingly.

    >
    > I take that back. I mis-read 33.8 seconds as 38 seconds (I must be
    > really tired). 33.8 is only slightly slower than I get with the
    > global arrays. I'll have to re-consider changing the class default
    > constructor and see how many things really do break.
    >
    > I think I'll go to bed now before I hurt myself.


    Benchmarking sure is fun!

    --

    -- Pete
    Roundhouse Consulting, Ltd. (www.versatilecoding.com)
    Author of "The Standard C++ Library Extensions: a Tutorial and
    Reference." (www.petebecker.com/tr1book)
    Pete Becker, Apr 13, 2007
    #19
  20. Jim West

    Ian Collins Guest

    Jim West wrote:
    > On 2007-04-13, Jim West <> wrote:
    >
    >>code. Initializing all components to zero seemed like a Good Idea(tm)
    >>when I first wrote it a long time ago, but in hindsight it was clearly a
    >>mistake. In any event, I copied the header to this particular file and
    >>changed FOO to FOO2 everywhere and changed the default constructor to
    >>
    >> FOO2() { };
    >>
    >>This did not give any improvement, surprisingly.

    >
    >
    > I take that back. I mis-read 33.8 seconds as 38 seconds (I must be
    > really tired). 33.8 is only slightly slower than I get with the
    > global arrays. I'll have to re-consider changing the class default
    > constructor and see how many things really do break.
    >

    I suggest you investigate profilers for your platform/tools, I'm
    guessing your platform is Linux, if so, Sun Studio has some excellent
    profiling and analysis tools you could use.

    --
    Ian Collins.
    Ian Collins, Apr 13, 2007
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. karim
    Replies:
    1
    Views:
    756
    George Ter-Saakov
    Jun 26, 2003
  2. Chris Berg
    Replies:
    9
    Views:
    342
    =?ISO-8859-1?Q?Daniel_Sj=F6blom?=
    Dec 8, 2004
  3. Raymond Hettinger
    Replies:
    7
    Views:
    286
    Michael Hudson
    Apr 19, 2004
  4. Ravikiran

    Zero Optimization and Sign Optimization???

    Ravikiran, Nov 17, 2008, in forum: C Programming
    Replies:
    22
    Views:
    842
    Thad Smith
    Nov 24, 2008
  5. Philipp
    Replies:
    21
    Views:
    1,105
    Philipp
    Jan 20, 2009
Loading...

Share This Page