How can a Vector class be optimized?

mast2as

Hi everyone,

I am working on some code that uses colors. Until recently this code
represented colors as three floats (RGB format), but it recently changed
so that colors are now defined as a spectrum. The size of the vector went
from 3 (RGB) to 151 (400 nm to 700 nm, one sample every 2 nm: 300/2 + 1 =
151 samples). The variables use a simple Vector class defined as follows:

template<typename T, int Depth>
class Vector
{ ...
};

Since the move from the RGB version of the code to the spectral version,
the application has slowed down significantly. I ran a test comparing the
Vector class against plain arrays of 151 floats, performing the same
operations 1 million times on each.

#include <iostream> // cerr, endl
#include <string.h> // memset
#include <time.h> // clock
using namespace std;

int maxIter = static_cast<int>( 1e+6 );

clock_t c1, c0 = clock();

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
float v[ 151 ];
float v2[ 151 ];
memset( v, 0, sizeof( float ) * 151 );
memset( v2, 0, sizeof( float ) * 151 );

// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}

// scale by real
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * real;
}

// scale by anotherReal
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * anotherReal;
}

// sum up
for ( int j = 0; j < 151; ++j ) {
v[ j ] += v[ j ];
}
}
c1 = clock();

cerr << "\nfloat[ 151 ]" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
Vector<float, 151> v( 12.0 );
Vector<float, 151> v2( -12.0 );
v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
v += Vector<float, 151>( 10.0 ) * real * anotherReal;
}

c1 = clock();

cerr << "\nSuperVector class" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;

Here are the results:
// RGB version, Vector<float, 3>
end CPU time : 390000
elapsed CPU time : 0.39

// Spectral Version Vector<float, 151>
end CPU time : 10510000
elapsed CPU time : 10.12

// Using arrays of 151 floats
end CPU time : 13230000
elapsed CPU time : 2.72

Basically it of course shows that using the Vector class really slows
down the application, especially as the size of the Vector increases,
and that it is not as efficient as doing the operations on arrays of
floats directly. So basically my question is: is there a way of
optimising it?

I do realise that doing:
Vector<float, 151> result = Vector<float, 151>( 0.1 ) * 0.1 * 100.0;

is not the same as doing:
float result[ 151 ], temp[ 151 ];
for ( int i = 0; i < 151; ++i ) {
temp[ i ] = 0.1f;
result[ i ] = temp[ i ] * 0.1 * 100.0;
}

But isn't there a way I can make the Vector class as efficient as the
second option (doing the math operations on arrays of floats directly)?
Or, if speed is a priority, is writing C-style code the only way to get
it back once the vector size becomes an issue?

Thanks for your help.

template<typename T, int Size>
class SuperVector
{
public:
T w[ Size ];

public:
SuperVector()
{ memset( w, 0, sizeof( T ) * Size ); }

SuperVector( const T &real )
{
for ( int i = 0; i < Size; ++i ) {
w[ i ] = real;
}
}

inline SuperVector<T, Size> operator * ( const SuperVector<T, Size>
&v ) const
{
SuperVector<T, Size> sv;
for ( int i = 0; i < Size; ++i ) {
// note: sv.w, not sv[ i ]: SuperVector defines no operator[]
sv.w[ i ] = w[ i ] * v.w[ i ];
}
return sv;
}

inline SuperVector<T, Size> operator * ( const T &real ) const
{
SuperVector<T, Size> sv;
for ( int i = 0; i < Size; ++i ) {
// note: fill sv rather than modifying *this and returning zeros
sv.w[ i ] = w[ i ] * real;
}
return sv;
}

inline SuperVector<T, Size> operator + ( const SuperVector<T, Size>
&v ) const
{
SuperVector<T, Size> sv;
for ( int i = 0; i < Size; ++i ) {
sv.w[ i ] = w[ i ] + v.w[ i ];
}
return sv;
}

inline SuperVector<T, Size>& operator += ( const SuperVector<T, Size>
&v )
{
for ( int i = 0; i < Size; ++i ) {
w[ i ] += v.w[ i ];
}
return *this;
}
};
 
Kai-Uwe Bux

Hi everyone

I am working on some code that uses colors. Until recently this code
represented colors as three floats (RGB format), but it recently changed
so that colors are now defined as a spectrum. The size of the vector went
from 3 (RGB) to 151 (400 nm to 700 nm, one sample every 2 nm). The
variables use a simple Vector class defined as follows:

template<typename T, int Depth>
class Vector
{ ...
};

Since the move from the RGB version of the code to the spectral version,
the application has slowed down significantly. I ran a test comparing the
Vector class against plain arrays of 151 floats, performing the same
operations 1 million times on each.

[code snipped]

Read up on expression templates. Or, better, use a linear algebra library.
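
A cheap middle ground, if you do not want a full library: std::valarray
already overloads the element-wise arithmetic operators. A minimal sketch
mirroring the test above (sizes and constants taken from the posted code;
whether this beats hand-written loops depends on the implementation):

#include <valarray>

int main() {
float real = 1.245f;
float anotherReal = 20.43492342f;
std::valarray<float> v( 12.0f, 151 ); // note: value first, then count
std::valarray<float> v2( -12.0f, 151 );
v = v2 * ( 1.0f - 0.5f ) + v * 0.5f;
v += std::valarray<float>( 10.0f, 151 ) * real * anotherReal;
}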


Best

Kai-Uwe Bux
 
Salt_Peter

I am working on some code that uses colors. Until recently this code
represented colors as three floats (RGB format), but it recently changed
so that colors are now defined as a spectrum.

[quoted text and code snipped]

Vector<float, 151> v( 12.0 );
Vector<float, 151> v2( -12.0 );

std::vector<float> v( 151, 12.0 );
std::vector<float> v2( 151, -12.0 );

Using the exact same calculations with random-access iteration as the
array version above; see the clock results below.

v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
v += Vector<float, 151>( 10.0 ) * real * anotherReal;
}

[quoted code and results snipped]

_____________________________
Results:

float[ 151 ]
end CPU time : 2620000
elapsed CPU time : 2.62

std::vector class
end CPU time : 4680000
elapsed CPU time : 2.06

Basically it of course shows that using the Vector class really slows
down the application, especially as the size of the Vector increases,
and that it is not as efficient as doing the operations on arrays of
floats directly. So basically my question is: is there a way of
optimising it?

Yes, use resize() to manually specify the container's size.

void resize( n, t = T() )
- inserts or erases elements at the end such that the size becomes n.
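
A minimal usage sketch of that member (the count mirrors the spectral
size above; the names are just for illustration):

std::vector<float> spectrum; // starts empty
spectrum.resize( 151, 0.0f ); // now holds 151 floats, all zero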

[rest of quoted post snipped]
 
Daniel T.

v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
v += Vector<float, 151>( 10.0 ) * real * anotherReal;

Optimize your vector class by removing the op* and op+. Too many
temporaries are being created.
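
For instance, a minimal sketch of the compound-assignment style this
implies; the operator below is not part of the posted class, but reuses
its public w member:

inline SuperVector<T, Size>& operator *= ( const T &real )
{
for ( int i = 0; i < Size; ++i ) {
w[ i ] *= real; // mutates in place, so no temporary SuperVector is built
}
return *this;
}

Call sites then become, e.g., v *= real; v *= anotherReal; instead of
v = v * real * anotherReal;.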

Here is an interesting exercise:

#include <algorithm> // transform
#include <ctime> // clock
#include <iostream> // cerr, endl
#include <vector>
using namespace std;

int maxIter = static_cast<int>( 1e+6 );

clock_t c1, c0 = clock();

struct binary_op
{
float operator()( float lhs, float rhs ) const
{
return lhs * ( 1.0 - 0.5 ) + rhs * 0.5;
}
};

struct unary_op
{
unary_op( float r, float r2 ): real( r ), anotherReal( r2 ) { }
float operator()( float v ) const {
return v + 10.0 * real * anotherReal;
}
const float real, anotherReal;
};

int main() {
float real = 1.245;
float anotherReal = 20.43492342;
vector<float> v( 151, 12.0 );
vector<float> v2( 151, -12.0 );
c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}

// scale by real
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * real;
}

// scale by anotherReal
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * anotherReal;
}

// sum up
for ( int j = 0; j < 151; ++j ) {
v[ j ] += v[ j ];
}
}
c1 = clock();

cerr << "\nManual iteration" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;

for ( int i = 0; i < 151; ++i ) {
v[ i ] = 12.0;
v2[ i ] = -12.0;
}

c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
transform( v2.begin(), v2.end(), v.begin(), v.begin(),
binary_op() );
transform( v.begin(), v.end(), v.begin(),
unary_op( real, anotherReal ) );
}
c1 = clock();
cerr << "\nAlgorithm Use" << endl;
cerr << "end CPU time : " << (long)c1 << endl;
cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< endl;
}

My output:

Manual iteration
end CPU time : 174
elapsed CPU time : 1.74

Algorithm Use
end CPU time : 265
elapsed CPU time : 0.91
 
ma740988

Daniel said:
v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
v += Vector<float, 151>( 10.0 ) * real * anotherReal;

Optimize your vector class by removing the op* and op+. Too many
temporaries are being created.

[code and output snipped]

Perhaps I'm missing something here; nonetheless, the output for Manual
Iteration (dump v.front() and v.back()) results in all zeros, while
that of the algorithm doesn't. Digging deeper, I realized that the
return from unary_op and the summation in the manual iteration aren't
the same. So I modified the source such that we have:

#include <algorithm> // std::transform
#include <ctime> // clock
#include <iostream> // std::cerr
#include <vector>
// maxIter, c0, c1, binary_op and unary_op as defined in Daniel's post above

int main() {
float real = 1.245;
float anotherReal = 20.43492342;
std::vector<float> v( 151, 12.0 );
std::vector<float> v2( 151, -12.0 );
c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] + 10 * real * anotherReal;
}

}
c1 = clock();

std::cerr << "\nManual iteration" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) /
CLOCKS_PER_SEC
<< std::endl;


//std::cout << " ### " << v.front() << std::endl;
//std::cout << " ### " << v.back() << std::endl;

for ( int i = 0; i < 151; ++i ) {
v[ i ] = 12.0;
v2[ i ] = -12.0;
}


c0 = clock();
for ( int i = 0; i < maxIter; ++i ) {
std::transform( v2.begin(), v2.end(), v.begin(), v.begin(),
binary_op() );
std::transform( v.begin(), v.end(), v.begin(),
unary_op( real, anotherReal ) );
}
c1 = clock();
std::cerr << "\nAlgorithm Use" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) /
CLOCKS_PER_SEC
<< std::endl;

//std::cout << " ## " << v.front() << std::endl;
//std::cout << " ## " << v.back() << std::endl;

}

P4 3.2 GHz, MSVC.NET, -O3 optimization

Manual iteration
end CPU time : 7570
elapsed CPU time : 7.57

Algorithm Use
end CPU time : 28453
elapsed CPU time : 20.883

Press any key to continue . . .
 
Daniel T.

ma740988 said:
Perhaps I'm missing something here; nonetheless, the output for Manual
Iteration (dump v.front() and v.back()) results in all zeros, while
that of the algorithm doesn't. Digging deeper, I realized that the
return from unary_op and the summation in the manual iteration aren't
the same. So I modified the source...

P4 3.2 GHz, MSVC.NET, -O3 optimization

Manual iteration
end CPU time : 7570
elapsed CPU time : 7.57

Algorithm Use
end CPU time : 28453
elapsed CPU time : 20.883

Press any key to continue . . .

Odd, I used your main and it, of course, sped up the manual iteration
considerably:

PowerPC G5 1.6 GHz, g++ 4.0

Manual iteration
end CPU time : 96
elapsed CPU time : 0.96

Algorithm Use
end CPU time : 187
elapsed CPU time : 0.91

cpp_sandbox has exited with status 0.

Of course the fact that my computer is faster than yours isn't relevant;
it's the percentage difference in the numbers that surprises me. I'm
showing a 5% increase in speed for the algorithm use, whereas you show a
176% *decrease*. Are you sure you compiled with full optimizations for
speed?
 
ma740988

Daniel said:
Of course the fact that my computer is faster than yours isn't relevant;
it's the percentage difference in the numbers that surprises me. I'm
showing a 5% increase in speed for the algorithm use, whereas you show a
176% *decrease*. Are you sure you compiled with full optimizations for
speed?

I'll try full optimization and then observe the difference. My initial
test was done with O3 optimization (MSVC.NET 05). What compiler are
you using?
 
Daniel T.

ma740988 said:
I'll try full optimization then observe the difference. My initial
test was done with 03 (MSCV.NET 05 ) optimzation. What compiler are
you using?

PowerPC G5 1.6 GHz, g++ 4.0
 
Kai-Uwe Bux

I am working on some code that uses colors. Until recently this code
represented colors as three floats (RGB format), but it recently changed
so that colors are now defined as a spectrum. The size of the vector
went from 3 (RGB) to 151.

[rest of quoted post snipped]


Here is an illustration of how expression templates reduce the number of
temporaries:

#include <cstdlib> // std::size_t
#include <iostream>

/*
We define many things that are not meant to show up
in client code:
*/
namespace DO_NOT_USE {

// the basic container:
template < typename ValueType, std::size_t Size >
class VectorData {

ValueType the_data [ Size ];

public:

VectorData ( ValueType val = ValueType() )
{
for ( std::size_t i = 0; i < Size; ++ i ) {
the_data[ i ] = val;
}
}

ValueType operator[] ( std::size_t i ) const {
return ( the_data[ i ] );
}

ValueType & operator[] ( std::size_t i ) {
return ( the_data[ i ] );
}

};

template < typename ValueType, std::size_t Size, typename Expr >
struct VectorExpr : public Expr {

VectorExpr ( void ) : Expr() {}

VectorExpr ( Expr const & a ) : Expr(a) {}

};


template < typename ValueType, std::size_t Size, typename Expr >
std::ostream &
operator<< ( std::ostream & o_str,
VectorExpr< ValueType, Size, Expr > const & a ) {
if ( Size > 0 ) {
std::size_t i = 0;
while( i < Size-1 ) {
o_str << a[ i ] << " ";
++i;
}
o_str << a[ i ];
}
return ( o_str );
}

template < typename ValueType, std::size_t Size, typename ExprA, typename
ExprB >
struct VectorPlusVector {

ExprA const & a;
ExprB const & b;

VectorPlusVector ( ExprA const & a_, ExprB const & b_ )
: a ( a_ )
, b ( b_ )
{}

ValueType operator[] ( std::size_t i ) const {
return ( a[ i ] + b[ i ] );
}

};

template < typename ValueType, std::size_t Size, typename ExprA, typename
ExprB >
VectorExpr< ValueType, Size, VectorPlusVector< ValueType, Size, ExprA,
ExprB > >
operator+ ( VectorExpr< ValueType, Size, ExprA > const & a,
VectorExpr< ValueType, Size, ExprB > const & b ) {
return ( VectorPlusVector< ValueType, Size, ExprA, ExprB >( a, b ) );
}


template < typename ValueType, std::size_t Size, typename ExprA >
struct VectorTimesScalar {

ExprA const & a;
ValueType b;

VectorTimesScalar ( ExprA const & a_, ValueType b_ )
: a ( a_ )
, b ( b_ )
{}

ValueType operator[] ( std::size_t i ) const {
return ( a[ i ] * b );
}

};

template < typename ValueType, std::size_t Size, typename ExprA >
VectorExpr< ValueType, Size, VectorTimesScalar< ValueType, Size, ExprA > >
operator* ( VectorExpr< ValueType, Size, ExprA > const & a,
ValueType b ) {
return ( VectorTimesScalar< ValueType, Size, ExprA >( a, b ) );
}



template < typename ValueType, std::size_t Size >
class la_vect
: public VectorExpr< ValueType, Size, VectorData< ValueType, Size > >
{
public:

la_vect ( ValueType val = ValueType() )
: VectorExpr< ValueType, Size, VectorData< ValueType, Size > >( val )
{}

template < typename Expr >
la_vect & operator= ( VectorExpr< ValueType, Size, Expr > const & rhs )
{
for ( std::size_t i = 0; i < Size; ++i ) {
(*this)[ i ] = rhs[ i ];
}
return ( *this );
}

template < typename Expr >
la_vect & operator+= ( VectorExpr< ValueType, Size, Expr > const & rhs )
{
for ( std::size_t i = 0; i < Size; ++i ) {
(*this)[ i ] += rhs[ i ];
}
return ( *this );
}

};

}

using DO_NOT_USE::la_vect;


int maxIter = static_cast<int>( 100000 );

#include <ctime> // std::clock
#include <cstring> // std::memset

int main ( void ) {
std::clock_t c1, c0 = std::clock();

c0 = std::clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
float v[ 151 ];
float v2[ 151 ];
std::memset( v, 0, sizeof( float ) * 151 );
std::memset( v2, 0, sizeof( float ) * 151 );

// mixing
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
}

// scale by real
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * real;
}

// scale by anotherReal
for ( int j = 0; j < 151; ++j ) {
v[ j ] = v[ j ] * anotherReal;
}

// sum up
for ( int j = 0; j < 151; ++j ) {
v[ j ] += v[ j ];
}
}
c1 = std::clock();

std::cerr << "\nfloat[ 151 ]" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< std::endl;

c0 = std::clock();
for ( int i = 0; i < maxIter; ++i ) {
float real = 1.245;
float anotherReal = 20.43492342;
la_vect<float, 151> v( 12.0 );
la_vect<float, 151> v2( -12.0 );
v = v2 * float( 1.0 - 0.5 ) + v * float(0.5);
v += la_vect<float, 151>( 10.0 ) * real * anotherReal;
}

c1 = std::clock();

std::cerr << "\nla_vect class" << std::endl;
std::cerr << "end CPU time : " << (long)c1 << std::endl;
std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
<< std::endl;

}

design> a.out

float[ 151 ]
end CPU time : 540000
elapsed CPU time : 0.54

la_vect class
end CPU time : 1160000
elapsed CPU time : 0.62


Also note that the computation using raw arrays and loops does not do the
same thing as the computation using SuperVector. This might explain the
remaining difference: theoretically, an optimizing compiler could eliminate
almost all temporaries.


Best

Kai-Uwe Bux
 
ma740988

Daniel said:
Of course the fact that my computer is faster than yours isn't relevant;
it's the percentage difference in the numbers that surprises me. I'm
showing a 5% increase in speed for the algorithm use, whereas you show a
176% *decrease*. Are you sure you compiled with full optimizations for
speed?

Timing is one of those things that often puzzles me when using the .NET
compiler ( .NET 05 ). Algorithms almost always seem so much slower
than conventional loops. Full optimization produces a similar result.

Manual iteration
end CPU time : 7653
elapsed CPU time : 7.653

Algorithm Use
end CPU time : 29034
elapsed CPU time : 21.366

I'm confused.
 
peter koch

ma740988 wrote:
Timing is one of those things that often puzzles me when using the .NET
compiler ( .NET 05 ). Algorithms almost always seem so much slower
than conventional loops. Full optimization produces a similar result.

Could you give us the command-line arguments here? Also tell us if you
have removed the extra checking in Visual Studio 2005.
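
For reference, in Visual Studio 2005 that checking is the checked-iterator
machinery controlled by the _SECURE_SCL macro. A minimal sketch of how it
is usually turned off for a release-mode timing run (define it before any
standard header):

// release builds only; debug builds ( /MDd, _DEBUG ) keep full
// iterator debugging regardless of this macro
#define _SECURE_SCL 0
#include <algorithm>
#include <vector>
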
/Peter
 
Daniel T.

ma740988 said:
Timing is one of those things that often puzzles me when using the .NET
compiler ( .NET 05 ). Algorithms almost always seem so much slower
than conventional loops. Full optimization produces a similar result.

Manual iteration
end CPU time : 7653
elapsed CPU time : 7.653

Algorithm Use
end CPU time : 29034
elapsed CPU time : 21.366

I'm confused.

Even more confusing: These numbers are *slower* than when you didn't
compile with full optimization. Maybe you are optimizing for smaller
size rather than faster speed?
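
For reference, a rough sketch of the relevant cl switches (the file name
is hypothetical):

cl /O2 /MD /EHsc /D "NDEBUG" bench.cpp // /O2 optimizes for speed, /MD is the release runtime
cl /O1 /MD /EHsc /D "NDEBUG" bench.cpp // /O1 optimizes for size

A build with /MDd and /D "_DEBUG" selects the debug runtime instead,
which keeps the extra checking on no matter which /O switch is used.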
 
ma740988

peter said:
Could you give us the command-line arguments here? Also tell us if you
have removed the extra checking in Visual Studio 2005.
/Peter

/Ox /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm
/EHsc /MDd /Fo"Debug\\" /Fd"Debug\vc80.pdb" /W3 /nologo /c /Wp64 /TP
/errorReport:prompt
 
