Array class and pointer aliasing problems

M

Mathieu Benoit

(I'm having trouble to post this, sorry if this mail comes
several times)

I'm trying to optimize the use of an integer array class,
and I found that one major problem is pointer aliasing.

I consider the following example:

--------------------------------------------------------
class IntTab
{
public:
IntTab(int n, int m) :
data(new int[n*m]), dim0(n), dim1(m) {
}
~IntTab() {
delete data;
}
int ni() const {
return dim0;
}
int nj() const {
return dim1;
}
int & operator()(int i, int j) {
return data[i*dim1+j];
}
const int & operator()(int i, int j) const {
return data[i*dim1+j];
}
void myjob(IntTab & a) const;
private:
int * const data;
const int dim0;
const int dim1;
};

void myjob(const IntTab & a, IntTab & b)
{
int i, j, k;
int ni = b.ni();
int nj = b.nj();
for (i=0, k=0; i<ni; i++)
for (j=0; j<nj; j++, k++)
b(k,0) = a(i,j);
}

void IntTab::myjob(IntTab & a) const
{
int a_j = a.dim1;
int ni = dim0;
int nj = dim1;
int i, j, k;
int * a_data = a.data;
int * my_data = data;
for (i=0, k=0; i<ni; i++)
for (j=0; j<nj; j++, k++)
a_data[k*a_j] = my_data[i*nj+j];
}
---------------------------------------------------------

"myjob" is poorly optimized by the compiler (g++ 3.2)
because it assumes that in the process of writing b, I could
silently update the pointers (a.data and b.data) and the
index counters (a.dim1 and b.dim1) which are used in the
loop. Indeed, making these variables local as in
IntTab::myjob is much better but of course I prefer to write
my algorithms with the operator() which allows to easily put
some asserts for debugging.

Strangely enough, changing my array for an array of double
leads to the same problem, even with "-fstrict-aliasing".

Moreover, the "restrict" keyword might help here but it is
not supported by g++. Is there another way to tell the
compiler that caching the pointers is safe while preserving
readability ?

Is there something special with g++3.2 ?

Thanks,
Benoit MATHIEU
 
J

Jumbo

Mathieu Benoit said:
(I'm having trouble to post this, sorry if this mail comes
several times)

I'm trying to optimize the use of an integer array class,
and I found that one major problem is pointer aliasing.

I consider the following example:

--------------------------------------------------------
class IntTab
{
public:
IntTab(int n, int m) :
data(new int[n*m]), dim0(n), dim1(m) {
}
~IntTab() {
delete data;
}
int ni() const {
return dim0;
}
int nj() const {
return dim1;
}
int & operator()(int i, int j) {
return data[i*dim1+j];
}
const int & operator()(int i, int j) const {
return data[i*dim1+j];
}
void myjob(IntTab & a) const;
private:
int * const data;
const int dim0;
const int dim1;
};

void myjob(const IntTab & a, IntTab & b)
{
int i, j, k;
int ni = b.ni();
int nj = b.nj();
for (i=0, k=0; i<ni; i++)
for (j=0; j<nj; j++, k++)
b(k,0) = a(i,j);
}

void IntTab::myjob(IntTab & a) const
{
int a_j = a.dim1;
int ni = dim0;
int nj = dim1;
int i, j, k;
int * a_data = a.data;
int * my_data = data;
for (i=0, k=0; i<ni; i++)
for (j=0; j<nj; j++, k++)
a_data[k*a_j] = my_data[i*nj+j];
}
---------------------------------------------------------

"myjob" is poorly optimized by the compiler (g++ 3.2)
because it assumes that in the process of writing b, I could
silently update the pointers (a.data and b.data) and the
index counters (a.dim1 and b.dim1) which are used in the
loop. Indeed, making these variables local as in
IntTab::myjob is much better but of course I prefer to write
my algorithms with the operator() which allows to easily put
some asserts for debugging.

Strangely enough, changing my array for an array of double
leads to the same problem, even with "-fstrict-aliasing".

Moreover, the "restrict" keyword might help here but it is
not supported by g++. Is there another way to tell the
compiler that caching the pointers is safe while preserving
readability ?

Is there something special with g++3.2 ?

Thanks,
Benoit MATHIEU
Why don't you pass the dimensions into the function and call it like this:
myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())
 
B

Benoit Mathieu

I'm trying to optimize an integer array class,
and I found that one major problem is pointer aliasing.
--------------------------------------------------------
class IntTab
{
public:
IntTab(int n, int m) :
data(new int[n*m]), dim0(n), dim1(m) {
}
~IntTab() {
delete data;
}
int ni() const {
return dim0;
}
int nj() const {
return dim1;
}
int & operator()(int i, int j) {
return data[i*dim1+j];
}
const int & operator()(int i, int j) const {
return data[i*dim1+j];
}
void myjob(IntTab & a) const;
private:
int * const data;
const int dim0;
const int dim1;
};

void myjob(const IntTab & a, IntTab & b)
{
int i, j, k;
int ni = b.ni();
int nj = b.nj();
for (i=0, k=0; i<ni; i++)
for (j=0; j<nj; j++, k++)
b(k,0) = a(i,j);
"myjob" is poorly optimized by the compiler (g++ 3.2)
because it assumes that in the process of writing b, I could
silently update the pointers (a.data and b.data) and the
index counters (a.dim1 and b.dim1) which are used in the
loop.
Why don't you pass the dimensions into the function and call it like this:
myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())

Would this really help ? I already made a local copy of ni
and nj... I can see in the assembly output that ni and nj
and handled efficiently, only data and dim1 are reloaded
every time...

Maybe this issue is strongly related to the implementation
details in g++ (which might be particularly conservative
about aliasing) and I should ask in another group. But if
this is a g++ issue and the answer is g++ specific, then I
will probably not use it...

I would be more interested if somebody knows about a C++
construct which clearly tells the compiler that the actual
arrays will never alias these indexes and pointers... Does
the standard say something about that ?
 
J

Jumbo

Benoit Mathieu said:
I'm trying to optimize an integer array class,
and I found that one major problem is pointer aliasing.
--------------------------------------------------------
class IntTab
{
public:
IntTab(int n, int m) :
data(new int[n*m]), dim0(n), dim1(m) {
}
~IntTab() {
delete data;
}
int ni() const {
return dim0;
}
int nj() const {
return dim1;
}
int & operator()(int i, int j) {
return data[i*dim1+j];
}
const int & operator()(int i, int j) const {
return data[i*dim1+j];
}
void myjob(IntTab & a) const;
private:
int * const data;
const int dim0;
const int dim1;
};

void myjob(const IntTab & a, IntTab & b)
{
int i, j, k;
int ni = b.ni();
int nj = b.nj();
for (i=0, k=0; i<ni; i++)
for (j=0; j<nj; j++, k++)
b(k,0) = a(i,j);
"myjob" is poorly optimized by the compiler (g++ 3.2)
because it assumes that in the process of writing b, I could
silently update the pointers (a.data and b.data) and the
index counters (a.dim1 and b.dim1) which are used in the
loop.
Why don't you pass the dimensions into the function and call it like this:
myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())

Would this really help ? I already made a local copy of ni
and nj... I can see in the assembly output that ni and nj
and handled efficiently, only data and dim1 are reloaded
every time...

Maybe this issue is strongly related to the implementation
details in g++ (which might be particularly conservative
about aliasing) and I should ask in another group. But if
this is a g++ issue and the answer is g++ specific, then I
will probably not use it...

I would be more interested if somebody knows about a C++
construct which clearly tells the compiler that the actual
arrays will never alias these indexes and pointers... Does
the standard say something about that ?
Sorry I don't know.
What about doing it an an asm block, or something like that?
 
K

Karl Heinz Buchegger

Benoit said:
Maybe this issue is strongly related to the implementation
details in g++ (which might be particularly conservative
about aliasing) and I should ask in another group. But if
this is a g++ issue and the answer is g++ specific, then I
will probably not use it...

I would be more interested if somebody knows about a C++
construct which clearly tells the compiler that the actual
arrays will never alias these indexes and pointers... Does
the standard say something about that ?

What do you want to achieve?
As I see it, you want the compiler to replace the index
calculation in

for (j=0; j<nj; j++, k++)
b(k,0) = a(i,j);

with some sort of pointer magic. The same thing it can
do when you write

for( i = 0; i < size; ++i )
m = ...

where the compiler can replace the whole thing with

basePtr = m;
for( i = 0; i < size; ++i, basePtr++ )
*basePtr = ...

If your compiler is unable to do this transformation in
your current case, then clearly it is a quality of implementation
issue (meaning: g++ specific).

But what can you do?
One thing you can do in any case is to look how others solve this
(or a similar) problem. What immediatly comes to my mind is the
STL. How do they do it? The STL has introduced a concept called
iterators. There are functions for getting the first iterator
and incrementing such an iterator. Well, in the above, transformed
example, basePtr acts as an iterator! It is assigned a starting value
and after that the pointer (aehh: iterator) is incremented and evaluated.
Hmm.

So you could add functions to your class IntTab:

class intTab
{
int* GetFirstColumn( int Row ) { return &data[ Row*dim1 ]; }
int GetNextColumnValue( int*& Pos ) { return *(Pos++); }

...
};

And your looping construct modifies to:

for( i = 0, k = 0; i < n1; ++i ) {
int* PosA = a.GetFirstColumn( i );

for( j = 0; j < nj; j++, k++ )
b(k,0) = a.GetNextColumnValue( PosA );
}

Try it and see, if the compiler is able to optimize this in a better
way.

Of course one could polish up the above by introducing a seperate
iterator class etc., but for a quick test to see if your compiler can
do a better job the above should be sufficient.

Yes I am aware, that this could be seen as premature oprimization
and as we all know this is the root of all evil :)
 
B

Benoit Mathieu

Karl Heinz Buchegger wrote:
So you could add functions to your class IntTab:

class intTab
{
int* GetFirstColumn( int Row ) { return &data[ Row*dim1 ]; }
int GetNextColumnValue( int*& Pos ) { return *(Pos++); }

...
};

And your looping construct modifies to:

for( i = 0, k = 0; i < n1; ++i ) {
int* PosA = a.GetFirstColumn( i );

for( j = 0; j < nj; j++, k++ )
b(k,0) = a.GetNextColumnValue( PosA );
}

Try it and see, if the compiler is able to optimize this in a better
way.

Of course one could polish up the above by introducing a seperate
iterator class etc., but for a quick test to see if your compiler can
do a better job the above should be sufficient.

Yes I am aware, that this could be seen as premature oprimization
and as we all know this is the root of all evil :)

(why is it that I so often look at the assembly output,
that's certainly evil... :) )

Using an iterator is cool in the example that I showed :
seen from the compiler, it "is" a pointer, and from my point
of view it is also quite interesting since I can hide some
sanity checking in the iterator to prevent the user from
going out of bounds. Things get worse when I need to access
the data randomly (and this is in fact the general case, and
here it is much more intersting to have some bound
checking). An iterator won't do the job in this case... I
also wonder if preserving logical constness is easy with
iterators (I'll have a look in the STL, the answer probably
sits there).

I feel that there is no easy solution to my
performance/clarity/safety tradeoff problem (I mean, other
than "write the code in fortran" (uh!)). I will take it this
way : code every thing as clearly as possible with all the
safe bound checking, run the code and eventually perhaps
concentrate on the few pieces of code where I really spend a
lot of time (creating some "friend" functions where I can
hack with pointers and other evil things...)

Anyway this is interesting: I should have a look at the STL
more often. Thank you ...

Benoît Mathieu
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top