Array class and pointer aliasing problems

Discussion in 'C++' started by Mathieu Benoit, Jan 18, 2004.

  1. (I'm having trouble to post this, sorry if this mail comes
    several times)

    I'm trying to optimize the use of an integer array class,
    and I found that one major problem is pointer aliasing.

    I consider the following example:

    --------------------------------------------------------
    class IntTab
    {
    public:
    IntTab(int n, int m) :
    data(new int[n*m]), dim0(n), dim1(m) {
    }
    ~IntTab() {
    delete data;
    }
    int ni() const {
    return dim0;
    }
    int nj() const {
    return dim1;
    }
    int & operator()(int i, int j) {
    return data[i*dim1+j];
    }
    const int & operator()(int i, int j) const {
    return data[i*dim1+j];
    }
    void myjob(IntTab & a) const;
    private:
    int * const data;
    const int dim0;
    const int dim1;
    };

    void myjob(const IntTab & a, IntTab & b)
    {
    int i, j, k;
    int ni = b.ni();
    int nj = b.nj();
    for (i=0, k=0; i<ni; i++)
    for (j=0; j<nj; j++, k++)
    b(k,0) = a(i,j);
    }

    void IntTab::myjob(IntTab & a) const
    {
    int a_j = a.dim1;
    int ni = dim0;
    int nj = dim1;
    int i, j, k;
    int * a_data = a.data;
    int * my_data = data;
    for (i=0, k=0; i<ni; i++)
    for (j=0; j<nj; j++, k++)
    a_data[k*a_j] = my_data[i*nj+j];
    }
    ---------------------------------------------------------

    "myjob" is poorly optimized by the compiler (g++ 3.2)
    because it assumes that in the process of writing b, I could
    silently update the pointers (a.data and b.data) and the
    index counters (a.dim1 and b.dim1) which are used in the
    loop. Indeed, making these variables local as in
    IntTab::myjob is much better but of course I prefer to write
    my algorithms with the operator() which allows to easily put
    some asserts for debugging.

    Strangely enough, changing my array for an array of double
    leads to the same problem, even with "-fstrict-aliasing".

    Moreover, the "restrict" keyword might help here but it is
    not supported by g++. Is there another way to tell the
    compiler that caching the pointers is safe while preserving
    readability ?

    Is there something special with g++3.2 ?

    Thanks,
    Benoit MATHIEU
     
    Mathieu Benoit, Jan 18, 2004
    #1
    1. Advertising

  2. Mathieu Benoit

    Jumbo Guest

    "Mathieu Benoit" <> wrote in message
    news:400a7ccb$0$22298$...
    > (I'm having trouble to post this, sorry if this mail comes
    > several times)
    >
    > I'm trying to optimize the use of an integer array class,
    > and I found that one major problem is pointer aliasing.
    >
    > I consider the following example:
    >
    > --------------------------------------------------------
    > class IntTab
    > {
    > public:
    > IntTab(int n, int m) :
    > data(new int[n*m]), dim0(n), dim1(m) {
    > }
    > ~IntTab() {
    > delete data;
    > }
    > int ni() const {
    > return dim0;
    > }
    > int nj() const {
    > return dim1;
    > }
    > int & operator()(int i, int j) {
    > return data[i*dim1+j];
    > }
    > const int & operator()(int i, int j) const {
    > return data[i*dim1+j];
    > }
    > void myjob(IntTab & a) const;
    > private:
    > int * const data;
    > const int dim0;
    > const int dim1;
    > };
    >
    > void myjob(const IntTab & a, IntTab & b)
    > {
    > int i, j, k;
    > int ni = b.ni();
    > int nj = b.nj();
    > for (i=0, k=0; i<ni; i++)
    > for (j=0; j<nj; j++, k++)
    > b(k,0) = a(i,j);
    > }
    >
    > void IntTab::myjob(IntTab & a) const
    > {
    > int a_j = a.dim1;
    > int ni = dim0;
    > int nj = dim1;
    > int i, j, k;
    > int * a_data = a.data;
    > int * my_data = data;
    > for (i=0, k=0; i<ni; i++)
    > for (j=0; j<nj; j++, k++)
    > a_data[k*a_j] = my_data[i*nj+j];
    > }
    > ---------------------------------------------------------
    >
    > "myjob" is poorly optimized by the compiler (g++ 3.2)
    > because it assumes that in the process of writing b, I could
    > silently update the pointers (a.data and b.data) and the
    > index counters (a.dim1 and b.dim1) which are used in the
    > loop. Indeed, making these variables local as in
    > IntTab::myjob is much better but of course I prefer to write
    > my algorithms with the operator() which allows to easily put
    > some asserts for debugging.
    >
    > Strangely enough, changing my array for an array of double
    > leads to the same problem, even with "-fstrict-aliasing".
    >
    > Moreover, the "restrict" keyword might help here but it is
    > not supported by g++. Is there another way to tell the
    > compiler that caching the pointers is safe while preserving
    > readability ?
    >
    > Is there something special with g++3.2 ?
    >
    > Thanks,
    > Benoit MATHIEU
    >

    Why don't you pass the dimensions into the function and call it like this:
    myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())
     
    Jumbo, Jan 18, 2004
    #2
    1. Advertising

  3. Re: Array class and pointer aliasing problems (well, just performanceissues in fact)

    >>I'm trying to optimize an integer array class,
    >>and I found that one major problem is pointer aliasing.
    >>--------------------------------------------------------
    >>class IntTab
    >>{
    >>public:
    >> IntTab(int n, int m) :
    >> data(new int[n*m]), dim0(n), dim1(m) {
    >> }
    >> ~IntTab() {
    >> delete data;
    >> }
    >> int ni() const {
    >> return dim0;
    >> }
    >> int nj() const {
    >> return dim1;
    >> }
    >> int & operator()(int i, int j) {
    >> return data[i*dim1+j];
    >> }
    >> const int & operator()(int i, int j) const {
    >> return data[i*dim1+j];
    >> }
    >> void myjob(IntTab & a) const;
    >>private:
    >> int * const data;
    >> const int dim0;
    >> const int dim1;
    >>};
    >>
    >>void myjob(const IntTab & a, IntTab & b)
    >>{
    >> int i, j, k;
    >> int ni = b.ni();
    >> int nj = b.nj();
    >> for (i=0, k=0; i<ni; i++)
    >> for (j=0; j<nj; j++, k++)
    >> b(k,0) = a(i,j);

    <snip>
    >>"myjob" is poorly optimized by the compiler (g++ 3.2)
    >>because it assumes that in the process of writing b, I could
    >>silently update the pointers (a.data and b.data) and the
    >>index counters (a.dim1 and b.dim1) which are used in the
    >>loop.

    <snip>

    > Why don't you pass the dimensions into the function and call it like this:
    > myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())


    Would this really help ? I already made a local copy of ni
    and nj... I can see in the assembly output that ni and nj
    and handled efficiently, only data and dim1 are reloaded
    every time...

    Maybe this issue is strongly related to the implementation
    details in g++ (which might be particularly conservative
    about aliasing) and I should ask in another group. But if
    this is a g++ issue and the answer is g++ specific, then I
    will probably not use it...

    I would be more interested if somebody knows about a C++
    construct which clearly tells the compiler that the actual
    arrays will never alias these indexes and pointers... Does
    the standard say something about that ?
     
    Benoit Mathieu, Jan 19, 2004
    #3
  4. Mathieu Benoit

    Jumbo Guest

    Re: Array class and pointer aliasing problems (well, just performance issues in fact)

    "Benoit Mathieu" <> wrote in message
    news:400c51cb$0$1167$...
    > >>I'm trying to optimize an integer array class,
    > >>and I found that one major problem is pointer aliasing.
    > >>--------------------------------------------------------
    > >>class IntTab
    > >>{
    > >>public:
    > >> IntTab(int n, int m) :
    > >> data(new int[n*m]), dim0(n), dim1(m) {
    > >> }
    > >> ~IntTab() {
    > >> delete data;
    > >> }
    > >> int ni() const {
    > >> return dim0;
    > >> }
    > >> int nj() const {
    > >> return dim1;
    > >> }
    > >> int & operator()(int i, int j) {
    > >> return data[i*dim1+j];
    > >> }
    > >> const int & operator()(int i, int j) const {
    > >> return data[i*dim1+j];
    > >> }
    > >> void myjob(IntTab & a) const;
    > >>private:
    > >> int * const data;
    > >> const int dim0;
    > >> const int dim1;
    > >>};
    > >>
    > >>void myjob(const IntTab & a, IntTab & b)
    > >>{
    > >> int i, j, k;
    > >> int ni = b.ni();
    > >> int nj = b.nj();
    > >> for (i=0, k=0; i<ni; i++)
    > >> for (j=0; j<nj; j++, k++)
    > >> b(k,0) = a(i,j);

    > <snip>
    > >>"myjob" is poorly optimized by the compiler (g++ 3.2)
    > >>because it assumes that in the process of writing b, I could
    > >>silently update the pointers (a.data and b.data) and the
    > >>index counters (a.dim1 and b.dim1) which are used in the
    > >>loop.

    > <snip>
    >
    > > Why don't you pass the dimensions into the function and call it like

    this:
    > > myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())

    >
    > Would this really help ? I already made a local copy of ni
    > and nj... I can see in the assembly output that ni and nj
    > and handled efficiently, only data and dim1 are reloaded
    > every time...
    >
    > Maybe this issue is strongly related to the implementation
    > details in g++ (which might be particularly conservative
    > about aliasing) and I should ask in another group. But if
    > this is a g++ issue and the answer is g++ specific, then I
    > will probably not use it...
    >
    > I would be more interested if somebody knows about a C++
    > construct which clearly tells the compiler that the actual
    > arrays will never alias these indexes and pointers... Does
    > the standard say something about that ?
    >

    Sorry I don't know.
    What about doing it an an asm block, or something like that?
     
    Jumbo, Jan 20, 2004
    #4
  5. Re: Array class and pointer aliasing problems (well, justperformanceissues in fact)

    Benoit Mathieu wrote:
    >
    > Maybe this issue is strongly related to the implementation
    > details in g++ (which might be particularly conservative
    > about aliasing) and I should ask in another group. But if
    > this is a g++ issue and the answer is g++ specific, then I
    > will probably not use it...
    >
    > I would be more interested if somebody knows about a C++
    > construct which clearly tells the compiler that the actual
    > arrays will never alias these indexes and pointers... Does
    > the standard say something about that ?


    What do you want to achieve?
    As I see it, you want the compiler to replace the index
    calculation in

    for (j=0; j<nj; j++, k++)
    b(k,0) = a(i,j);

    with some sort of pointer magic. The same thing it can
    do when you write

    for( i = 0; i < size; ++i )
    m = ...

    where the compiler can replace the whole thing with

    basePtr = m;
    for( i = 0; i < size; ++i, basePtr++ )
    *basePtr = ...

    If your compiler is unable to do this transformation in
    your current case, then clearly it is a quality of implementation
    issue (meaning: g++ specific).

    But what can you do?
    One thing you can do in any case is to look how others solve this
    (or a similar) problem. What immediatly comes to my mind is the
    STL. How do they do it? The STL has introduced a concept called
    iterators. There are functions for getting the first iterator
    and incrementing such an iterator. Well, in the above, transformed
    example, basePtr acts as an iterator! It is assigned a starting value
    and after that the pointer (aehh: iterator) is incremented and evaluated.
    Hmm.

    So you could add functions to your class IntTab:

    class intTab
    {
    int* GetFirstColumn( int Row ) { return &data[ Row*dim1 ]; }
    int GetNextColumnValue( int*& Pos ) { return *(Pos++); }

    ...
    };

    And your looping construct modifies to:

    for( i = 0, k = 0; i < n1; ++i ) {
    int* PosA = a.GetFirstColumn( i );

    for( j = 0; j < nj; j++, k++ )
    b(k,0) = a.GetNextColumnValue( PosA );
    }

    Try it and see, if the compiler is able to optimize this in a better
    way.

    Of course one could polish up the above by introducing a seperate
    iterator class etc., but for a quick test to see if your compiler can
    do a better job the above should be sufficient.

    Yes I am aware, that this could be seen as premature oprimization
    and as we all know this is the root of all evil :)

    --
    Karl Heinz Buchegger
     
    Karl Heinz Buchegger, Jan 20, 2004
    #5
  6. Re: Array class and pointer aliasing problems (well, just performanceissues in fact)

    Karl Heinz Buchegger wrote:
    <snip>
    > So you could add functions to your class IntTab:
    >
    > class intTab
    > {
    > int* GetFirstColumn( int Row ) { return &data[ Row*dim1 ]; }
    > int GetNextColumnValue( int*& Pos ) { return *(Pos++); }
    >
    > ...
    > };
    >
    > And your looping construct modifies to:
    >
    > for( i = 0, k = 0; i < n1; ++i ) {
    > int* PosA = a.GetFirstColumn( i );
    >
    > for( j = 0; j < nj; j++, k++ )
    > b(k,0) = a.GetNextColumnValue( PosA );
    > }
    >
    > Try it and see, if the compiler is able to optimize this in a better
    > way.
    >
    > Of course one could polish up the above by introducing a seperate
    > iterator class etc., but for a quick test to see if your compiler can
    > do a better job the above should be sufficient.
    >
    > Yes I am aware, that this could be seen as premature oprimization
    > and as we all know this is the root of all evil :)


    (why is it that I so often look at the assembly output,
    that's certainly evil... :) )

    Using an iterator is cool in the example that I showed :
    seen from the compiler, it "is" a pointer, and from my point
    of view it is also quite interesting since I can hide some
    sanity checking in the iterator to prevent the user from
    going out of bounds. Things get worse when I need to access
    the data randomly (and this is in fact the general case, and
    here it is much more intersting to have some bound
    checking). An iterator won't do the job in this case... I
    also wonder if preserving logical constness is easy with
    iterators (I'll have a look in the STL, the answer probably
    sits there).

    I feel that there is no easy solution to my
    performance/clarity/safety tradeoff problem (I mean, other
    than "write the code in fortran" (uh!)). I will take it this
    way : code every thing as clearly as possible with all the
    safe bound checking, run the code and eventually perhaps
    concentrate on the few pieces of code where I really spend a
    lot of time (creating some "friend" functions where I can
    hack with pointers and other evil things...)

    Anyway this is interesting: I should have a look at the STL
    more often. Thank you ...

    BenoƮt Mathieu
     
    Benoit Mathieu, Jan 20, 2004
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Bryan Parkoff

    Strict Pointer Aliasing Question

    Bryan Parkoff, Jan 14, 2004, in forum: C++
    Replies:
    2
    Views:
    1,253
    tom_usenet
    Jan 15, 2004
  2. David Mathog
    Replies:
    3
    Views:
    707
    Chris Torek
    Jul 5, 2007
  3. Hansen
    Replies:
    3
    Views:
    1,110
    rep_movsd
    Apr 24, 2010
  4. , India

    pointer to an array vs pointer to pointer

    , India, Sep 20, 2011, in forum: C Programming
    Replies:
    5
    Views:
    460
    James Kuyper
    Sep 23, 2011
  5. Xavier Roche
    Replies:
    3
    Views:
    107
    James Kuyper
    Mar 25, 2014
Loading...

Share This Page