# How can a Vector class be optimized?

Discussion in 'C++' started by mast2as@yahoo.com, Nov 5, 2006.

1. ### Guest

Hi everyone

I am working on some code that uses colors. Until recently this code
represented colors as three floats (RGB format), but it recently changed
so that colors are now defined as spectra. The size of the vector went from
3 (RGB) to 151 (400 nm to 700 nm with a sample every 2 nm). The variables
use a simple Vector class defined as follows:

template<typename T, int Depth>
class Vector
{ ...
};

Since the move from the RGB version of the code to the Spectral version
the application has slowed down significantly. I ran a test comparing
the Vector class against plain arrays of 151 floats, performing the same
operations 1 million times on each.

int maxIter = static_cast<int>( 1e+6 );

#include <time.h>

clock_t c1, c0 = clock();

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
float v[ 151 ];
float v2[ 151 ];
memset( v, 0, sizeof( float ) * 151 );
memset( v2, 0, sizeof( float ) * 151 );

// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * real;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * anotherReal;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] += v[ j ];
}
}
c1 = clock();

cerr << "\nfloat[ 151 ]" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
Vector<float, 151> v( 12.0 );
Vector<float, 151> v2( -12.0 );
v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
v += Vector<float, 151>( 10.0 ) * real * anotherReal;
}

c1 = clock();

cerr << "\nSuperVector class" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;

Here are the results
// RGB version, Vector<float, 3>
end CPU time : 390000
elapsed CPU time : 0.39

// Spectral Version Vector<float, 151>
end CPU time : 10510000
elapsed CPU time : 10.12

// Using arrays of 151 floats
end CPU time : 13230000
elapsed CPU time : 2.72

This shows that using the Vector class really slows down the
application, especially as the size of the Vector increases, and that it
is not as efficient as doing the operations on arrays of floats
directly. So my question is: is there a way of optimising it?

I do realise that doing:
Vector<float, 151> result = Vector<float, 151>( 0.1 ) * 0.1 * 100.0;

is not the same as doing:
float result[ 151 ], temp [ 151 ];
for ( int i = 0; i < 151; ++i ) {
temp[ i ] = 0.1f;
result[ i ] = temp[ i ] * 0.1 * 100.0;
}

But isn't there a way I can make the Vector class as efficient as the
second option (doing the math operations on arrays of floats
directly)? Or, if speed is a priority, is writing C-style code the
only way to get it back when the vector size becomes an issue?

Thanks for your help -

template<typename T, int Size>
class SuperVector
{

public:
T w[ Size ];
public:
SuperVector()
{ memset( w, 0, sizeof( T ) * Size ); }
SuperVector( const T &real )
{
for ( int i = 0; i < Size; ++i ) {
(*this).w[ i ] = real;
}
}

inline SuperVector<T, Size> operator * ( const SuperVector<T, Size>
&v )
{
SuperVector<T, Size> sv;
for ( int i = 0; i < Size; ++i ) {
sv.w[ i ] = (*this).w[ i ] * v.w[ i ];
}
return sv;
}

inline SuperVector<T, Size> operator * ( const T &real )
{
SuperVector<T, Size> sv;
for ( int i = 0; i < Size; ++i ) {
sv.w[ i ] = (*this).w[ i ] * real; // write into the result; *this stays unchanged
}
return sv;
}

inline SuperVector<T, Size> operator + ( const SuperVector<T, Size>
&v )
{
SuperVector<T, Size> sv;
for ( int i = 0; i < Size; ++i ) {
sv.w[ i ] = (*this).w[ i ] + v.w[ i ];
}
return sv;
}

inline SuperVector<T, Size>& operator += ( const SuperVector<T, Size>
&v )
{
for ( int i = 0; i < Size; ++i ) {
(*this).w[ i ] += v.w[ i ];
}
return *this;
}
};

, Nov 5, 2006

2. ### Kai-Uwe Bux Guest

wrote:

> Hi everyone
>
> I am working on some code that uses colors. Until recently this code
> represented colors as three floats (RGB format), but it recently changed
> so that colors are now defined as spectra. The size of the vector went from
> 3 (RGB) to 151 (400 nm to 700 nm with a sample every 2 nm). The variables
> use a simple Vector class defined as follows:
>
> template<typename T, int Depth>
> class Vector
> { ...
> };
>
> Since the move from the RGB version of the code to the Spectral version
> the application has slowed down significantly. I ran a test comparing
> the Vector class against plain arrays of 151 floats, performing the same
> operations 1 million times on each.

[code snipped]

Read up on expression templates. Or, better, use a linear algebra library.
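
As one hedged illustration of the library route: the standard library's std::valarray offers element-wise arithmetic, and implementations are allowed (but not required) to use expression templates behind it. The sketch below swaps in std::valarray as a stand-in; it is not a full linear algebra library, it is not benchmarked anywhere in this thread, and the function name mix_and_add is made up for the example.

```cpp
#include <cstddef>
#include <valarray>

// Hypothetical helper: blends two spectra and then adds a scaled constant,
// mirroring the two vector statements in the original benchmark.
std::valarray<float> mix_and_add(std::valarray<float> v,
                                 const std::valarray<float>& v2,
                                 float real, float anotherReal)
{
    v = v2 * 0.5f + v * 0.5f;          // element-wise blend
    v += 10.0f * real * anotherReal;   // scalar added to every element
    return v;
}
```

Whether this avoids temporaries depends on the standard library in use, so it would need to be timed on the target compiler.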

Best

Kai-Uwe Bux

Kai-Uwe Bux, Nov 6, 2006

3. ### Salt_Peter Guest

wrote:
> Hi everyone
>
> I am working on some code that uses colors. Until recently this code
> represented colors as three floats (RGB format), but it recently changed
> so that colors are now defined as spectra. The size of the vector went from
> 3 (RGB) to 151 (400 nm to 700 nm with a sample every 2 nm). The variables
> use a simple Vector class defined as follows:
>
> template<typename T, int Depth>
> class Vector
> { ...
> };
>
> Since the move from the RGB version of the code to the Spectral version
> the application has slowed down significantly. I ran a test comparing
> the Vector class against plain arrays of 151 floats, performing the same
> operations 1 million times on each.
>
> int maxIter = static_cast<int>( 1e+6 );
>
> #include <time.h>
>
> clock_t c1, c0 = clock();
>
> c0 = clock();
> for ( int i = 0; i < maxIter; ++i ) {
> float real = 1.245;
> float anotherReal = 20.43492342;
> float v[ 151 ];
> float v2[ 151 ];
> memset( v, 0, sizeof( float ) * 151 );
> memset( v2, 0, sizeof( float ) * 151 );
>
> // mixing
> for ( int j = 0; j < 151; ++j ) {
> v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
> }
>
> // summing up & *
> for ( int j = 0; j < 151; ++j ) {
> v[ j ] = v[ j ] * real;
> }
>
> // summing up & *
> for ( int j = 0; j < 151; ++j ) {
> v[ j ] = v[ j ] * anotherReal;
> }
>
> // summing up & *
> for ( int j = 0; j < 151; ++j ) {
> v[ j ] += v[ j ];
> }
> }
> c1 = clock();
>
> cerr << "\nfloat[ 151 ]" << endl;
> cerr << "end CPU time : " << (long)c1 << endl;
> cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
> << endl;
>
> c0 = clock();
> for ( int i = 0; i < maxIter; ++i ) {
> float real = 1.245;
> float anotherReal = 20.43492342;
> Vector<float, 151> v( 12.0 );
> Vector<float, 151> v2( -12.0 );

std::vector<float> v(151, 12.0);
std::vector<float> v2(151, -12.0);

using exactly the same indexed (operator[]) calculations as for the array
above; see the clock results below.

> v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
> v += Vector<float, 151>( 10.0 ) * real * anotherReal;
> }
>
> c1 = clock();
>
> cerr << "\nSuperVector class" << endl;
> cerr << "end CPU time : " << (long)c1 << endl;
> cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
> << endl;
>
> Here are the results
> // RGB version, Vector<float, 3>
> end CPU time : 390000
> elapsed CPU time : 0.39
>
> // Spectral Version Vector<float, 151>
> end CPU time : 10510000
> elapsed CPU time : 10.12
>
> // Using arrays of 151 floats
> end CPU time : 13230000
> elapsed CPU time : 2.72

_____________________________
Results:

float[ 151 ]
end CPU time : 2620000
elapsed CPU time : 2.62
std::vector class
end CPU time : 4680000
elapsed CPU time : 2.06

>
> This shows that using the Vector class really slows down the
> application, especially as the size of the Vector increases, and that it
> is not as efficient as doing the operations on arrays of floats
> directly. So my question is: is there a way of optimising it?

Yes, use resize() to manually specify the container's size.

void resize(n, t = T())
- Inserts or erases elements at the end such that the size becomes n
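
A small sketch of what resize() does, for readers who have not used it. Note that constructing with std::vector<float> v(151, 12.0) as shown earlier already sets the size, so resize() mainly matters for a vector that starts out empty. The helper name make_spectrum is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a spectrum of n samples, all set to fill.
std::vector<float> make_spectrum(std::size_t n, float fill)
{
    std::vector<float> v;   // starts empty
    v.resize(n, fill);      // grows to n elements, each a copy of fill
    return v;
}
```

Calling resize() with a smaller count erases elements from the end and leaves the remaining ones untouched.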


Salt_Peter, Nov 6, 2006
4. ### Daniel T. Guest

"" <> wrote:

> v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
> v += Vector<float, 151>( 10.0 ) * real * anotherReal;

Optimize your vector class by removing the op* and op+. Too many
temporaries are being created.
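
A minimal sketch of that suggestion, assuming a fixed-size vector that offers only in-place operations. The class name InPlaceVector and the member add_scaled are invented for this illustration and do not appear in the original code.

```cpp
#include <cstddef>

// Hypothetical fixed-size vector with in-place operations only, so no
// expression ever builds a temporary vector.
template <typename T, std::size_t Size>
struct InPlaceVector {
    T w[Size];

    explicit InPlaceVector(T value = T()) {
        for (std::size_t i = 0; i < Size; ++i) w[i] = value;
    }

    // this *= s, element by element
    InPlaceVector& operator*=(T s) {
        for (std::size_t i = 0; i < Size; ++i) w[i] *= s;
        return *this;
    }

    // this += s, element by element
    InPlaceVector& operator+=(T s) {
        for (std::size_t i = 0; i < Size; ++i) w[i] += s;
        return *this;
    }

    // this += other * s (a fused "scale and add")
    InPlaceVector& add_scaled(const InPlaceVector& other, T s) {
        for (std::size_t i = 0; i < Size; ++i) w[i] += other.w[i] * s;
        return *this;
    }
};
```

With this interface, v = v2 * 0.5 + v * 0.5 would be written as v *= 0.5f; v.add_scaled( v2, 0.5f ); and the second statement as v += 10.0f * real * anotherReal;. The price is less natural notation at the call site.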

Here is an interesting exercise:

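#include <algorithm> // transform
#include <ctime>     // clock, clock_t, CLOCKS_PER_SEC
#include <iostream>  // cerr, endl
#include <vector>
using namespace std;

int maxIter = static_cast<int>( 1e+6 );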

clock_t c1, c0 = clock();

struct binary_op
{
float operator()( float lhs, float rhs ) const
{
return lhs * ( 1.0 - 0.5 ) + rhs * 0.5;
}
};

struct unary_op
{
unary_op( float r, float r2 ): real( r ), anotherReal( r2 ) { }
float operator()( float v ) const {
return v + 10.0 * real * anotherReal;
}
const float real, anotherReal;
};

int main() {
float real = 1.245;
float anotherReal = 20.43492342;
vector<float> v( 151, 12.0 );
vector<float> v2( 151, -12.0 );
c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * real;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * anotherReal;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] += v[ j ];
}
}
c1 = clock();

cerr << "\nManual iteration" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;

for ( int i = 0; i < 151; ++i ) {
v[ i ] = 12.0;
v2[ i ] = -12.0;
}

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
transform( v2.begin(), v2.end(), v.begin(), v.begin(),
binary_op() );
transform( v.begin(), v.end(), v.begin(),
unary_op( real, anotherReal ) );
}
c1 = clock();
cerr << "\nAlgorithm Use" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;
}

My output:

Manual iteration
end CPU time : 174
elapsed CPU time : 1.74

Algorithm Use
end CPU time : 265
elapsed CPU time : 0.91

--
To send me email, put "sheltie" in the subject.

Daniel T., Nov 6, 2006
5. ### ma740988 Guest

Daniel T. wrote:
> "" <> wrote:
>
> > v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
> > v += Vector<float, 151>( 10.0 ) * real * anotherReal;

>
> Optimize your vector class by removing the op* and op+. Too many
> temporaries are being created.
>
> Here is an interesting exercise:
>
> [code snipped]
>
> My output:
>
> Manual iteration
> end CPU time : 174
> elapsed CPU time : 1.74
>
> Algorithm Use
> end CPU time : 265
> elapsed CPU time : 0.91
>

Perhaps I'm missing something here. Nonetheless, the output for Manual
Iteration (dumping v.front() and v.back()) is all zeros, while that of
the algorithm isn't. Digging deeper, I realized that the return value of
unary_op and the summation in the manual iteration aren't the same. So I
modified the source such that we have:

int main() {
float real = 1.245;
float anotherReal = 20.43492342;
std::vector<float> v( 151, 12.0 );
std::vector<float> v2( 151, -12.0 );
c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] + 10 * real * anotherReal;
}

}
c1 = clock();

std::cerr << "\nManual iteration" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) /
CLOCKS_PER_SEC
<< std::endl;

//std::cout << " ### " << v.front() << std::endl;
//std::cout << " ### " << v.back() << std::endl;

for ( int i = 0; i < 151; ++i ) {
v[ i ] = 12.0;
v2[ i ] = -12.0;
}

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
std::transform( v2.begin(), v2.end(), v.begin(), v.begin(),
binary_op() );
std::transform( v.begin(), v.end(), v.begin(),
unary_op( real, anotherReal ) );
}
c1 = clock();
std::cerr << "\nAlgorithm Use" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) /
CLOCKS_PER_SEC
<< std::endl;

//std::cout << " ## " << v.front() << std::endl;
//std::cout << " ## " << v.back() << std::endl;

}

P4 3.2Ghz MSVC.NET -O3 optimization

Manual iteration
end CPU time : 7570
elapsed CPU time : 7.57

Algorithm Use
end CPU time : 28453
elapsed CPU time : 20.883

Press any key to continue . . .

ma740988, Nov 7, 2006
6. ### Daniel T. Guest

"ma740988" <> wrote:
> Daniel T. wrote:
>
>> Manual iteration
>> end CPU time : 174
>> elapsed CPU time : 1.74
>>
>> Algorithm Use
>> end CPU time : 265
>> elapsed CPU time : 0.91

>
> Perhaps, I'm missing something here, nonetheless, the output for Manual
> Iteration ( dump v.front() and v.back() ) results in all zeros, while
> that of the algorithm doesn't. Digging deeper I realize that the
> return from unary_op and the summation in the manual iteration aren't
> the same. So I modified the source...
>
> P4 3.2Ghz MSVC.NET -O3 optimization
>
> Manual iteration
> end CPU time : 7570
> elapsed CPU time : 7.57
>
> Algorithm Use
> end CPU time : 28453
> elapsed CPU time : 20.883
>
> Press any key to continue . . .

Odd. I used your main and it, of course, sped up the manual iteration
considerably:

PowerPC G5 1.6GHz g++-4.0

Manual iteration
end CPU time : 96
elapsed CPU time : 0.96

Algorithm Use
end CPU time : 187
elapsed CPU time : 0.91

cpp_sandbox has exited with status 0.

Of course the fact that my computer is faster than yours isn't relevant,
it's the percentage difference in the numbers that surprises. I'm
showing a 5% increase in speed for the algorithm use, whereas you show a
176% *decrease*. Are you sure you compiled with full optimizations for
speed?

--
To send me email, put "sheltie" in the subject.

Daniel T., Nov 7, 2006
7. ### ma740988 Guest

Daniel T. wrote:
> "ma740988" <> wrote:
> > Daniel T. wrote:

>
> Of course the fact that my computer is faster than yours isn't relevant,
> it's the percentage difference in the numbers that surprises. I'm
> showing a 5% increase in speed for the algorithm use, whereas you show a
> 176% *decrease*. Are you sure you compiled with full optimizations for
> speed?

I'll try full optimization and then observe the difference. My initial
test was done with O3 (MSVC.NET 05) optimization. What compiler are
you using?

ma740988, Nov 8, 2006
8. ### Daniel T. Guest

"ma740988" <> wrote:
> Daniel T. wrote:
>> "ma740988" <> wrote:
>>> Daniel T. wrote:

>>
>> Of course the fact that my computer is faster than yours isn't
>> relevant, it's the percentage difference in the numbers that
>> surprises. I'm showing a 5% increase in speed for the algorithm
>> use, whereas you show a 176% *decrease*. Are you sure you compiled
>> with full optimizations for speed?

>
> I'll try full optimization and then observe the difference. My initial
> test was done with O3 (MSVC.NET 05) optimization. What compiler are
> you using?

PowerPC G5 1.6GHz g++ - 4.0

--
To send me email, put "sheltie" in the subject.

Daniel T., Nov 8, 2006
9. ### Kai-Uwe Bux Guest

wrote:

> [original post snipped]

Here is an illustration of how expression templates reduce the number of
temporaries:

#include <cstdlib> // std::size_t
#include <iostream>

/*
We define many things that are not meant to show up
in client code:
*/
namespace DO_NOT_USE {

// the basic container:
template < typename ValueType, std::size_t Size >
class VectorData {

ValueType the_data [ Size ];

public:

VectorData ( ValueType val = ValueType() )
{
for ( std::size_t i = 0; i < Size; ++ i ) {
the_data[ i ] = val;
}
}

ValueType operator[] ( std::size_t i ) const {
return ( the_data[ i ] );
}

ValueType & operator[] ( std::size_t i ) {
return ( the_data[ i ] );
}

};

template < typename ValueType, std::size_t Size, typename Expr >
struct VectorExpr : public Expr {

VectorExpr ( void ) : Expr() {}

VectorExpr ( Expr const & a ) : Expr(a) {}

};

template < typename ValueType, std::size_t Size, typename Expr >
std::ostream &
operator<< ( std::ostream & o_str,
VectorExpr< ValueType, Size, Expr > const & a ) {
if ( Size > 0 ) {
std::size_t i = 0;
while( i < Size-1 ) {
o_str << a[ i ] << " ";
++i;
}
o_str << a[ i ];
}
return ( o_str );
}

template < typename ValueType, std::size_t Size, typename ExprA, typename
ExprB >
struct VectorPlusVector {

ExprA const & a;
ExprB const & b;

VectorPlusVector ( ExprA const & a_, ExprB const & b_ )
: a ( a_ )
, b ( b_ )
{}

ValueType operator[] ( std::size_t i ) const {
return ( a[ i ] + b[ i ] );
}

};

template < typename ValueType, std::size_t Size, typename ExprA, typename
ExprB >
VectorExpr< ValueType, Size, VectorPlusVector< ValueType, Size, ExprA,
ExprB > >
operator+ ( VectorExpr< ValueType, Size, ExprA > const & a,
VectorExpr< ValueType, Size, ExprB > const & b ) {
return ( VectorPlusVector< ValueType, Size, ExprA, ExprB >( a, b ) );
}

template < typename ValueType, std::size_t Size, typename ExprA >
struct VectorTimesScalar {

ExprA const & a;
ValueType b;

VectorTimesScalar ( ExprA const & a_, ValueType b_ )
: a ( a_ )
, b ( b_ )
{}

ValueType operator[] ( std::size_t i ) const {
return ( a[ i ] * b );
}

};

template < typename ValueType, std::size_t Size, typename ExprA >
VectorExpr< ValueType, Size, VectorTimesScalar< ValueType, Size, ExprA > >
operator* ( VectorExpr< ValueType, Size, ExprA > const & a,
ValueType b ) {
return ( VectorTimesScalar< ValueType, Size, ExprA >( a, b ) );
}

template < typename ValueType, std::size_t Size >
class la_vect
: public VectorExpr< ValueType, Size, VectorData< ValueType, Size > >
{
public:

la_vect ( ValueType val = ValueType() )
: VectorExpr< ValueType, Size, VectorData< ValueType, Size > >( val )
{}

template < typename Expr >
la_vect & operator= ( VectorExpr< ValueType, Size, Expr > const & rhs )
{
for ( std::size_t i = 0; i < Size; ++i ) {
(*this)[ i ] = rhs[ i ];
}
return ( *this );
}

template < typename Expr >
la_vect & operator+= ( VectorExpr< ValueType, Size, Expr > const & rhs )
{
for ( std::size_t i = 0; i < Size; ++i ) {
(*this)[ i ] += rhs[ i ];
}
return ( *this );
}

};

}

using DO_NOT_USE::la_vect;

int maxIter = static_cast<int>( 100000 );

#include <ctime>
#include <cstring> // std::memset

int main ( void ) {
std::clock_t c1, c0 = std::clock();

c0 = std::clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
float v[ 151 ];
float v2[ 151 ];
std::memset( v, 0, sizeof( float ) * 151 );
std::memset( v2, 0, sizeof( float ) * 151 );

// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * real;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * anotherReal;
}

// summing up & *
for ( int j = 0; j < 151; ++j ) {
v[ j ] += v[ j ];
}
}
c1 = std::clock();

std::cerr << "\nfloat[ 151 ]" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< std::endl;

c0 = std::clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
la_vect<float, 151> v( 12.0 );
la_vect<float, 151> v2( -12.0 );
v = v2 * float( 1.0 - 0.5 ) + v * float(0.5);
v += la_vect<float, 151>( 10.0 ) * real * anotherReal;
}

c1 = std::clock();

std::cerr << "\nla_vect class" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< std::endl;

}

design> a.out

float[ 151 ]
end CPU time : 540000
elapsed CPU time : 0.54

la_vect class
end CPU time : 1160000
elapsed CPU time : 0.62

Also note that the computation using raw arrays and loops does not do the
same thing as the computation using SuperVector. This might explain the
remaining difference: theoretically, an optimizing compiler could eliminate
almost all temporaries.
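
To make the comparison like-for-like, the raw-array side could be written so that it computes what the two vector statements compute. The sketch below assumes 151 samples and folds both statements into one pass; the names kSamples and mix_and_add_raw are hypothetical.

```cpp
#include <cstddef>

const std::size_t kSamples = 151;  // assumed: spectrum size used in this thread

// Hypothetical raw-array counterpart of
//   v = v2 * 0.5 + v * 0.5;
//   v += Vector( 10.0 ) * real * anotherReal;
// done in a single pass with no temporary arrays.
void mix_and_add_raw(float* v, const float* v2,
                     float real, float anotherReal)
{
    const float offset = 10.0f * real * anotherReal;
    for (std::size_t j = 0; j < kSamples; ++j) {
        v[j] = v2[j] * 0.5f + v[j] * 0.5f + offset;
    }
}
```

For a fair timing, both arrays would also be initialised to 12 and -12 as in the vector version, rather than cleared with memset.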

Best

Kai-Uwe Bux

Kai-Uwe Bux, Nov 8, 2006
10. ### ma740988 Guest

> > Daniel T. wrote:
>
> Of course the fact that my computer is faster than yours isn't relevant,
> it's the percentage difference in the numbers that surprises. I'm
> showing a 5% increase in speed for the algorithm use, whereas you show a
> 176% *decrease*. Are you sure you compiled with full optimizations for
> speed?
>

Timing is one of those things that often puzzles me when using the .NET
compiler ( .NET 05 ). Algorithms almost always seem so much slower
than conventional loops. Full optimization produces a similar result.

Manual iteration
end CPU time : 7653
elapsed CPU time : 7.653

Algorithm Use
end CPU time : 29034
elapsed CPU time : 21.366

I'm confused.

ma740988, Nov 10, 2006
11. ### peter koch Guest

ma740988 skrev:
> > > Daniel T. wrote:

> >
> > Of course the fact that my computer is faster than yours isn't relevant,
> > it's the percentage difference in the numbers that surprises. I'm
> > showing a 5% increase in speed for the algorithm use, whereas you show a
> > 176% *decrease*. Are you sure you compiled with full optimizations for
> > speed?
> >

> Timing is one of those things that often puzzles me when using the .NET
> compiler ( .NET 05 ). Algorithms almost always seem so much slower
> than conventional loops. Full optimization produces a similar result.

Could you give us the command-line arguments here? Also tell us if you
have removed the extra checking in Visual Studio 2005.
/Peter
>
> Manual iteration
> end CPU time : 7653
> elapsed CPU time : 7.653
>
> Algorithm Use
> end CPU time : 29034
> elapsed CPU time : 21.366
>
> I'm confused.

peter koch, Nov 10, 2006
12. ### Daniel T. Guest

ma740988 wrote:
>
> Timing is one of those things that often puzzles me when using the .NET
> compiler ( .NET 05 ). Algorithms almost always seem so much slower
> than conventional loops. Full optimization produces a similar result.
>
> Manual iteration
> end CPU time : 7653
> elapsed CPU time : 7.653
>
> Algorithm Use
> end CPU time : 29034
> elapsed CPU time : 21.366
>
> I'm confused.

Even more confusing: These numbers are *slower* than when you didn't
compile with full optimization. Maybe you are optimizing for smaller
size rather than faster speed?

Daniel T., Nov 10, 2006
13. ### ma740988 Guest

peter koch wrote:
> Could you give us the command-line arguments here? Also tell us if you
> have removed the extra checking in Visual Studio 2005.
> /Peter

/Ox /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm
/EHsc /MDd /Fo"Debug\\" /Fd"Debug\vc80.pdb" /W3 /nologo /c /Wp64 /TP
/errorReport:prompt
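
The flags above define _DEBUG and select the debug runtime (/MDd), which suggests the timings came from a Debug configuration. A benchmark program can check for this itself before printing numbers. This is only a sketch: the function name is hypothetical, _DEBUG is the macro visible in the command line above, and NDEBUG is the standard macro that disables assert and is conventionally defined for release builds.

```cpp
// Hypothetical guard for benchmark programs: reports whether this
// translation unit looks like it was compiled as a debug build.
bool looks_like_debug_build()
{
#if defined(_DEBUG)      // present in the command line above ( /D "_DEBUG" )
    return true;
#elif !defined(NDEBUG)   // NDEBUG is conventionally defined for release builds
    return true;
#else
    return false;
#endif
}
```

A timing program could print a warning when this returns true, since debug runtimes usually add checking that distorts measurements.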

ma740988, Nov 10, 2006