How can a Vector class be optimized?

Discussion in 'C++' started by mast2as@yahoo.com, Nov 5, 2006.

  1. Guest

    Hi everyone

    I am working on some code that uses colors. Until recently this code
    used colors represented as three floats (RGB format), but it recently
    changed so that colors are now defined as spectra. The size of the vector
    went from 3 (RGB) to 151 (400 nm to 700 nm with a sample every 2 nm).
    The variables use a simple Vector class defined as follows:

    template<typename T, int Depth>
    class Vector
    { ...
    };

    Since the move from the RGB version of the code to the spectral version,
    the application has slowed down significantly. I ran a test comparing
    the Vector class against plain arrays of 151 floats, performing the
    same operations 1 million times on each.

    int maxIter = static_cast<int>( 1e+6 );

    #include <time.h>

    clock_t c1, c0 = clock();

    c0 = clock();
    for ( int i = 0; i < maxIter; ++i ) {
    float real = 1.245;
    float anotherReal = 20.43492342;
    float v[ 151 ];
    float v2[ 151 ];
    memset( v, 0, sizeof( float ) * 151 );
    memset( v2, 0, sizeof( float ) * 151 );

    // mixing
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
    }

    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v[ j ] * real;
    }

    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v[ j ] * anotherReal;
    }

    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] += v[ j ];
    }
    }
    c1 = clock();

    cerr << "\nfloat[ 151 ]" << endl;
    cerr << "end CPU time : " << (long)c1 << endl;
    cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
    << endl;

    c0 = clock();
    for ( int i = 0; i < maxIter; ++i ) {
    float real = 1.245;
    float anotherReal = 20.43492342;
    Vector<float, 151> v( 12.0 );
    Vector<float, 151> v2( -12.0 );
    v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
    v += Vector<float, 151>( 10.0 ) * real * anotherReal;
    }

    c1 = clock();

    cerr << "\nSuperVector class" << endl;
    cerr << "end CPU time : " << (long)c1 << endl;
    cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
    << endl;

    Here are the results
    // RGB version, Vector<float, 3>
    end CPU time : 390000
    elapsed CPU time : 0.39

    // Spectral Version Vector<float, 151>
    end CPU time : 10510000
    elapsed CPU time : 10.12

    // Using arrays of 151 floats
    end CPU time : 13230000
    elapsed CPU time : 2.72

    This shows that using the Vector class really slows down the
    application, especially as the size of the Vector increases, and that
    it is not as efficient as doing the operations on arrays of floats
    directly. So my question is: is there a way of optimising it?

    I do realise that doing:
    Vector<float, 151> result = Vector<float, 151>( 0.1 ) * 0.1 * 100.0;

    is not the same as doing:
    float result[ 151 ], temp [ 151 ];
    for ( int i = 0; i < 151; ++i ) {
    temp[ i ] = 0.1f;
    result[ i ] = temp[ i ] * 0.1 * 100.0;
    }

    But isn't there a way I can make the Vector class as efficient as the
    second option (doing the math operations on arrays of floats
    directly)? Or, if speed is a priority, is writing C-style code the
    only way I can get the performance back when the vector size becomes
    an issue?

    Thanks for your help -

    template<typename T, int Size>
    class SuperVector
    {

    public:
    T w[ Size ];
    public:
    SuperVector()
    { memset( w, 0, sizeof( T ) * Size ); }
    SuperVector( const T &real )
    {
    for ( int i = 0; i < Size; ++i ) {
    (*this).w[ i ] = real;
    }
    }

    inline SuperVector<T, Size> operator * ( const SuperVector<T, Size>
    &v )
    {
    SuperVector<T, Size> sv;
    for ( int i = 0; i < Size; ++i ) {
    sv[ i ] = (*this).w[ i ] * v.w[ i ];
    }
    return sv;
    }

    inline SuperVector<T, Size> operator * ( const T &real )
    {
    SuperVector<T, Size> sv;
    for ( int i = 0; i < Size; ++i ) {
    (*this).w[ i ] *= real;
    }
    return sv;
    }


    inline SuperVector<T, Size> operator + ( const SuperVector<T, Size>
    &v )
    {
    SuperVector<T, Size> sv;
    for ( int i = 0; i < Size; ++i ) {
    sv.w[ i ] = (*this).w[ i ] + v.w[ i ];
    }
    return sv;
    }

    inline SuperVector<T, Size>& operator += ( const SuperVector<T, Size>
    &v )
    {
    for ( int i = 0; i < Size; ++i ) {
    (*this).w[ i ] += v.w[ i ];
    }
    return *this;
    }
    };
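
    Two details of the class as posted stand out: operator*( const
    SuperVector& ) writes to sv[ i ] although no operator[] is declared,
    and operator*( const T& ) scales *this in place and then returns sv,
    which the default constructor has zero-filled. Below is a minimal sketch
    of how the operators might look with those two points addressed,
    assuming the intent was element-wise arithmetic that leaves the operands
    untouched. The name FixedVector and the added operator*= are
    illustrative and not part of the posted code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical corrected variant of the posted class; the name is illustrative.
template <typename T, std::size_t Size>
class FixedVector {
public:
    T w[Size];

    FixedVector() {
        for (std::size_t i = 0; i < Size; ++i) w[i] = T();
    }
    explicit FixedVector(const T& real) {
        for (std::size_t i = 0; i < Size; ++i) w[i] = real;
    }

    // In-place operators: no temporary object is created.
    FixedVector& operator*=(const T& real) {
        for (std::size_t i = 0; i < Size; ++i) w[i] *= real;
        return *this;
    }
    FixedVector& operator+=(const FixedVector& v) {
        for (std::size_t i = 0; i < Size; ++i) w[i] += v.w[i];
        return *this;
    }

    // Binary operators are const, leave both operands unchanged,
    // and return one new object holding the result.
    FixedVector operator*(const T& real) const {
        FixedVector sv(*this);
        sv *= real;
        return sv;
    }
    FixedVector operator*(const FixedVector& v) const {
        FixedVector sv(*this);
        for (std::size_t i = 0; i < Size; ++i) sv.w[i] *= v.w[i];
        return sv;
    }
    FixedVector operator+(const FixedVector& v) const {
        FixedVector sv(*this);
        sv += v;
        return sv;
    }
};
```

    This only makes the results correct; each binary operator still
    builds a full temporary, so it does not by itself close the gap to the
    raw-array loops.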
     
    , Nov 5, 2006
    #1

  2. Kai-Uwe Bux Guest

    wrote:

    > Hi everyone
    >
    > I am working on some code that uses colors. Until recently this code
    > used colors represented as three floats (RGB format), but it recently
    > changed so that colors are now defined as spectra. The size of the vector
    > went from 3 (RGB) to 151 (400 nm to 700 nm with a sample every 2 nm).
    > The variables use a simple Vector class defined as follows:
    >
    > template<typename T, int Depth>
    > class Vector
    > { ...
    > };
    >
    > Since the move from the RGB version of the code to the spectral version,
    > the application has slowed down significantly. I ran a test comparing
    > the Vector class against plain arrays of 151 floats, performing the
    > same operations 1 million times on each.


    [code snipped]

    Read up on expression templates. Or, better, use a linear algebra library.
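
    As one example of the library route, the standard library already
    provides std::valarray, which offers element-wise arithmetic and which
    implementations are allowed to build on expression templates. Below is a
    minimal sketch, assuming a 151-sample spectrum of float. The names
    Spectrum, kSamples and mix_and_scale are hypothetical, and whether this
    beats a hand-written class on a given compiler would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

typedef std::valarray<float> Spectrum;   // hypothetical alias
const std::size_t kSamples = 151;        // assumed size from the question

// Mirrors the two statements inside the posted benchmark loop.
inline Spectrum mix_and_scale(Spectrum v, const Spectrum& v2,
                              float real, float anotherReal) {
    v = v2 * 0.5f + v * 0.5f;
    // valarray's fill constructor takes the value first, then the count.
    v += Spectrum(10.0f, kSamples) * real * anotherReal;
    return v;
}
```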


    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Nov 6, 2006
    #2

  3. Salt_Peter Guest

    wrote:
    > Hi everyone
    >
    > I am working on some code that uses colors. Until recently this code
    > used colors represented as three floats (RGB format), but it recently
    > changed so that colors are now defined as spectra. The size of the vector
    > went from 3 (RGB) to 151 (400 nm to 700 nm with a sample every 2 nm).
    > The variables use a simple Vector class defined as follows:
    >
    > template<typename T, int Depth>
    > class Vector
    > { ...
    > };
    >
    > Since the move from the RGB version of the code to the spectral version,
    > the application has slowed down significantly. I ran a test comparing
    > the Vector class against plain arrays of 151 floats, performing the
    > same operations 1 million times on each.
    >
    > int maxIter = static_cast<int>( 1e+6 );
    >
    > #include <time.h>
    >
    > clock_t c1, c0 = clock();
    >
    > c0 = clock();
    > for ( int i = 0; i < maxIter; ++i ) {
    > float real = 1.245;
    > float anotherReal = 20.43492342;
    > float v[ 151 ];
    > float v2[ 151 ];
    > memset( v, 0, sizeof( float ) * 151 );
    > memset( v2, 0, sizeof( float ) * 151 );
    >
    > // mixing
    > for ( int j = 0; j < 151; ++j ) {
    > v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
    > }
    >
    > // summing up & *
    > for ( int j = 0; j < 151; ++j ) {
    > v[ j ] = v[ j ] * real;
    > }
    >
    > // summing up & *
    > for ( int j = 0; j < 151; ++j ) {
    > v[ j ] = v[ j ] * anotherReal;
    > }
    >
    > // summing up & *
    > for ( int j = 0; j < 151; ++j ) {
    > v[ j ] += v[ j ];
    > }
    > }
    > c1 = clock();
    >
    > cerr << "\nfloat[ 151 ]" << endl;
    > cerr << "end CPU time : " << (long)c1 << endl;
    > cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
    > << endl;
    >
    > c0 = clock();
    > for ( int i = 0; i < maxIter; ++i ) {
    > float real = 1.245;
    > float anotherReal = 20.43492342;
    > Vector<float, 151> v( 12.0 );
    > Vector<float, 151> v2( -12.0 );


    std::vector<float> v(151, 12.0);
    std::vector<float> v2(151, -12.0);

    with exactly the same indexed (random-access) calculations as for the
    array above; see the timing results below.

    > v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
    > v += Vector<float, 151>( 10.0 ) * real * anotherReal;
    > }
    >
    > c1 = clock();
    >
    > cerr << "\nSuperVector class" << endl;
    > cerr << "end CPU time : " << (long)c1 << endl;
    > cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
    > << endl;
    >
    > Here are the results
    > // RGB version, Vector<float, 3>
    > end CPU time : 390000
    > elapsed CPU time : 0.39
    >
    > // Spectral Version Vector<float, 151>
    > end CPU time : 10510000
    > elapsed CPU time : 10.12
    >
    > // Using arrays of 151 floats
    > end CPU time : 13230000
    > elapsed CPU time : 2.72

    _____________________________
    Results:

    float[ 151 ]
    end CPU time : 2620000
    elapsed CPU time : 2.62
    std::vector class
    end CPU time : 4680000
    elapsed CPU time : 2.06

    >
    > This shows that using the Vector class really slows down the
    > application, especially as the size of the Vector increases, and that
    > it is not as efficient as doing the operations on arrays of floats
    > directly. So my question is: is there a way of optimising it?


    Yes, use resize() to manually specify the container's size.

    void resize(n, t = T())
    - Inserts or erases elements at the end such that the size becomes n
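
    For reference, here is a small sketch of what resize() does to a
    std::vector<float>, assuming the 151-sample size from the question. The
    helper name make_spectrum is hypothetical. Sizing the container once up
    front avoids repeated reallocation, but it does not remove temporaries
    created by overloaded operators.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a vector of n elements, each set to fill, via resize().
inline std::vector<float> make_spectrum(std::size_t n, float fill) {
    std::vector<float> v;   // starts empty
    v.resize(n, fill);      // grows to n elements, new ones copy 'fill'
    return v;
}
```

    Calling resize() with a smaller n erases elements from the end, and
    calling it with a larger n appends copies of the second argument while
    keeping the existing elements.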

     
    Salt_Peter, Nov 6, 2006
    #3
  4. Daniel T. Guest

    "" <> wrote:

    > v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
    > v += Vector<float, 151>( 10.0 ) * real * anotherReal;


    Optimize your vector class by removing the op* and op+. Too many
    temporaries are being created.
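
    A minimal sketch of that advice, assuming a fixed-size float wrapper
    that offers only in-place operators. The names Spectrum151, add_scaled
    and blend are hypothetical and not the poster's class. Each statement
    updates an existing object, so no intermediate vectors are constructed.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kSamples = 151;  // assumed spectrum size from the question

// Hypothetical stand-in for the vector class, with in-place operators only.
struct Spectrum151 {
    float w[kSamples];

    explicit Spectrum151(float x) {
        for (std::size_t i = 0; i < kSamples; ++i) w[i] = x;
    }
    Spectrum151& operator*=(float s) {
        for (std::size_t i = 0; i < kSamples; ++i) w[i] *= s;
        return *this;
    }
    Spectrum151& operator+=(float s) {
        for (std::size_t i = 0; i < kSamples; ++i) w[i] += s;
        return *this;
    }
    // Adds o * s element-wise without materialising a scaled copy of o.
    Spectrum151& add_scaled(const Spectrum151& o, float s) {
        for (std::size_t i = 0; i < kSamples; ++i) w[i] += o.w[i] * s;
        return *this;
    }
};

// In-place equivalent of:
//   v = v2 * 0.5 + v * 0.5;
//   v += Vector( 10.0 ) * real * anotherReal;
inline void blend(Spectrum151& v, const Spectrum151& v2,
                  float real, float anotherReal) {
    v *= 0.5f;
    v.add_scaled(v2, 0.5f);
    v += 10.0f * real * anotherReal;
}
```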

    Here is an interesting exercise:

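    #include <algorithm> // transform
    #include <ctime>     // clock, CLOCKS_PER_SEC
    #include <iostream>
    #include <vector>
    using namespace std;

    int maxIter = static_cast<int>( 1e+6 );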

    clock_t c1, c0 = clock();

    struct binary_op
    {
    float operator()( float lhs, float rhs ) const
    {
    return lhs * ( 1.0 - 0.5 ) + rhs * 0.5;
    }
    };

    struct unary_op
    {
    unary_op( float r, float r2 ): real( r ), anotherReal( r2 ) { }
    float operator()( float v ) const {
    return v + 10.0 * real * anotherReal;
    }
    const float real, anotherReal;
    };

    int main() {
    float real = 1.245;
    float anotherReal = 20.43492342;
    vector<float> v( 151, 12.0 );
    vector<float> v2( 151, -12.0 );
    c0 = clock();
    for ( int i = 0; i < maxIter; ++i ) {
    // mixing
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
    }

    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v[ j ] * real;
    }

    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v[ j ] * anotherReal;
    }

    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] += v[ j ];
    }
    }
    c1 = clock();

    cerr << "\nManual iteration" << endl;
    cerr << "end CPU time : " << (long)c1 << endl;
    cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
    << endl;

    for ( int i = 0; i < 151; ++i ) {
    v[ i ] = 12.0;
    v2[ i ] = -12.0;
    }

    c0 = clock();
    for ( int i = 0; i < maxIter; ++i ) {
    transform( v2.begin(), v2.end(), v.begin(), v.begin(),
    binary_op() );
    transform( v.begin(), v.end(), v.begin(),
    unary_op( real, anotherReal ) );
    }
    c1 = clock();
    cerr << "\nAlgorithm Use" << endl;
    cerr << "end CPU time : " << (long)c1 << endl;
    cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
    << endl;
    }

    My output:

    Manual iteration
    end CPU time : 174
    elapsed CPU time : 1.74

    Algorithm Use
    end CPU time : 265
    elapsed CPU time : 0.91

    --
    To send me email, put "sheltie" in the subject.
     
    Daniel T., Nov 6, 2006
    #4
  5. ma740988 Guest

    Daniel T. wrote:
    > "" <> wrote:
    >
    > > v = v2 * ( 1.0 - 0.5 ) + v * 0.5;
    > > v += Vector<float, 151>( 10.0 ) * real * anotherReal;

    >
    > Optimize your vector class by removing the op* and op+. Too many
    > temporaries are being created.
    >
    > Here is an interesting exercise:
    >
    > [code snipped]
    >
    > My output:
    >
    > Manual iteration
    > end CPU time : 174
    > elapsed CPU time : 1.74
    >
    > Algorithm Use
    > end CPU time : 265
    > elapsed CPU time : 0.91
    >

    Perhaps I'm missing something here, but the output for Manual
    Iteration ( dump v.front() and v.back() ) is all zeros, while
    that of the algorithm isn't. Digging deeper, I realized that the
    return value of unary_op and the summation in the manual iteration
    aren't the same. So I modified the source such that we have:

    int main() {
    float real = 1.245;
    float anotherReal = 20.43492342;
    std::vector<float> v( 151, 12.0 );
    std::vector<float> v2( 151, -12.0 );
    c0 = clock();
    for ( int i = 0; i < maxIter; ++i ) {
    // mixing
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
    }
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v[ j ] + 10 * real * anotherReal;
    }

    }
    c1 = clock();

    std::cerr << "\nManual iteration" << std::endl;
    std::cerr << "end CPU time : " << (long)c1 << std::endl;
    std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) /
    CLOCKS_PER_SEC
    << std::endl;


    //std::cout << " ### " << v.front() << std::endl;
    //std::cout << " ### " << v.back() << std::endl;

    for ( int i = 0; i < 151; ++i ) {
    v[ i ] = 12.0;
    v2[ i ] = -12.0;
    }


    c0 = clock();
    for ( int i = 0; i < maxIter; ++i ) {
    std::transform( v2.begin(), v2.end(), v.begin(), v.begin(),
    binary_op() );
    std::transform( v.begin(), v.end(), v.begin(),
    unary_op( real, anotherReal ) );
    }
    c1 = clock();
    std::cerr << "\nAlgorithm Use" << std::endl;
    std::cerr << "end CPU time : " << (long)c1 << std::endl;
    std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) /
    CLOCKS_PER_SEC
    << std::endl;

    //std::cout << " ## " << v.front() << std::endl;
    //std::cout << " ## " << v.back() << std::endl;

    }
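
    The mismatch described above can be checked on a single element. Below
    is a minimal sketch, assuming the starting values 12.0 and -12.0 from
    the test and one pass of each variant. The function names are
    hypothetical. The original manual loops multiply the mixed value by both
    factors and then double it, whereas unary_op adds 10 * real *
    anotherReal to it.

```cpp
#include <cassert>

// One pass of the four original manual loops, applied to a single element.
inline float manual_original(float v, float v2, float real, float anotherReal) {
    v = v2 * (1.0f - 0.5f) + v * 0.5f;  // mixing
    v = v * real;
    v = v * anotherReal;
    v += v;                             // doubles the value
    return v;
}

// One pass of binary_op followed by unary_op, applied to a single element.
inline float algorithm_version(float v, float v2, float real, float anotherReal) {
    v = v2 * (1.0f - 0.5f) + v * 0.5f;      // what binary_op computes
    return v + 10.0f * real * anotherReal;  // what unary_op computes
}
```

    With v = 12 and v2 = -12 the mixed value is 0, so the first function
    returns 0 while the second returns 10 * real * anotherReal.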

    P4 3.2GHz MSVC.NET -O3 optimization

    Manual iteration
    end CPU time : 7570
    elapsed CPU time : 7.57

    Algorithm Use
    end CPU time : 28453
    elapsed CPU time : 20.883

    Press any key to continue . . .
     
    ma740988, Nov 7, 2006
    #5
  6. Daniel T. Guest

    "ma740988" <> wrote:
    > Daniel T. wrote:
    >
    >> Manual iteration
    >> end CPU time : 174
    >> elapsed CPU time : 1.74
    >>
    >> Algorithm Use
    >> end CPU time : 265
    >> elapsed CPU time : 0.91

    >
    > Perhaps I'm missing something here, but the output for Manual
    > Iteration ( dump v.front() and v.back() ) is all zeros, while
    > that of the algorithm isn't. Digging deeper, I realized that the
    > return value of unary_op and the summation in the manual iteration
    > aren't the same. So I modified the source...
    >
    > P4 3.2GHz MSVC.NET -O3 optimization
    >
    > Manual iteration
    > end CPU time : 7570
    > elapsed CPU time : 7.57
    >
    > Algorithm Use
    > end CPU time : 28453
    > elapsed CPU time : 20.883
    >
    > Press any key to continue . . .


    Odd. I used your main and it, of course, sped up the manual iteration
    considerably:

    PowerPC G5 1.6GHz g++-4.0

    Manual iteration
    end CPU time : 96
    elapsed CPU time : 0.96

    Algorithm Use
    end CPU time : 187
    elapsed CPU time : 0.91

    cpp_sandbox has exited with status 0.

    Of course the fact that my computer is faster than yours isn't relevant,
    it's the percentage difference in the numbers that surprises. I'm
    showing a 5% increase in speed for the algorithm use, whereas you show a
    176% *decrease*. Are you sure you compiled with full optimizations for
    speed?
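
    For reference, both percentages follow from the elapsed times quoted
    above. Here is a small sketch of the arithmetic, with a hypothetical
    helper named percent_change:

```cpp
#include <cassert>

// Relative change of 'after' with respect to 'before', in percent.
// Negative means 'after' took less time than 'before'.
inline double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}
```

    percent_change( 0.96, 0.91 ) is about -5.2, so the algorithm version
    takes roughly 5% less time on the G5. percent_change( 7.57, 20.883 ) is
    about 175.9, so it takes roughly 176% more time on the P4.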

    --
    To send me email, put "sheltie" in the subject.
     
    Daniel T., Nov 7, 2006
    #6
  7. ma740988 Guest

    Daniel T. wrote:
    > "ma740988" <> wrote:
    > > Daniel T. wrote:

    >
    > Of course the fact that my computer is faster than yours isn't relevant,
    > it's the percentage difference in the numbers that surprises. I'm
    > showing a 5% increase in speed for the algorithm use, whereas you show a
    > 176% *decrease*. Are you sure you compiled with full optimizations for
    > speed?


    I'll try full optimization and then observe the difference. My initial
    test was done with O3 (MSVC.NET 05) optimization. What compiler are
    you using?
     
    ma740988, Nov 8, 2006
    #7
  8. Daniel T. Guest

    "ma740988" <> wrote:
    > Daniel T. wrote:
    >> "ma740988" <> wrote:
    >>> Daniel T. wrote:

    >>
    >> Of course the fact that my computer is faster than yours isn't
    >> relevant, it's the percentage difference in the numbers that
    >> surprises. I'm showing a 5% increase in speed for the algorithm
    >> use, whereas you show a 176% *decrease*. Are you sure you compiled
    >> with full optimizations for speed?

    >
    > I'll try full optimization and then observe the difference. My initial
    > test was done with O3 (MSVC.NET 05) optimization. What compiler are
    > you using?


    PowerPC G5 1.6GHz g++ - 4.0

    --
    To send me email, put "sheltie" in the subject.
     
    Daniel T., Nov 8, 2006
    #8
  9. Kai-Uwe Bux Guest

    wrote:

    > Hi everyone
    >
    > I am working on some code that uses colors. Until recently this code
    > used colors represented as three floats (RGB format), but it recently
    > changed so that colors are now defined as spectra. The size of the vector
    > went from 3 (RGB) to 151 (400 nm to 700 nm with a sample every 2 nm).
    > The variables use a simple Vector class defined as follows:
    >
    > template<typename T, int Depth>
    > class Vector
    > { ...
    > };
    >
    > Since the move from the RGB version of the code to the spectral version,
    > the application has slowed down significantly. I ran a test comparing
    > the Vector class against plain arrays of 151 floats, performing the
    > same operations 1 million times on each.

    [code snipped]



    Here is an illustration of how expression templates reduce the number of
    temporaries:

    #include <cstdlib> // std::size_t
    #include <iostream>

    /*
    We define many things that are not meant to show up
    in client code:
    */
    namespace DO_NOT_USE {

    // the basic container:
    template < typename ValueType, std::size_t Size >
    class VectorData {

    ValueType the_data [ Size ];

    public:

    VectorData ( ValueType val = ValueType() )
    {
    for ( std::size_t i = 0; i < Size; ++ i ) {
    the_data[ i ] = val;
    }
    }

    ValueType operator[] ( std::size_t i ) const {
    return ( the_data[ i ] );
    }

    ValueType & operator[] ( std::size_t i ) {
    return ( the_data[ i ] );
    }

    };

    template < typename ValueType, std::size_t Size, typename Expr >
    struct VectorExpr : public Expr {

    VectorExpr ( void ) : Expr() {}

    VectorExpr ( Expr const & a ) : Expr(a) {}

    };


    template < typename ValueType, std::size_t Size, typename Expr >
    std::ostream &
    operator<< ( std::ostream & o_str,
    VectorExpr< ValueType, Size, Expr > const & a ) {
    if ( Size > 0 ) {
    std::size_t i = 0;
    while( i < Size-1 ) {
    o_str << a[ i ] << " ";
    ++i;
    }
    o_str << a[ i ];
    }
    return ( o_str );
    }

    template < typename ValueType, std::size_t Size, typename ExprA, typename
    ExprB >
    struct VectorPlusVector {

    ExprA const & a;
    ExprB const & b;

    VectorPlusVector ( ExprA const & a_, ExprB const & b_ )
    : a ( a_ )
    , b ( b_ )
    {}

    ValueType operator[] ( std::size_t i ) const {
    return ( a[ i ] + b[ i ] );
    }

    };

    template < typename ValueType, std::size_t Size, typename ExprA, typename
    ExprB >
    VectorExpr< ValueType, Size, VectorPlusVector< ValueType, Size, ExprA,
    ExprB > >
    operator+ ( VectorExpr< ValueType, Size, ExprA > const & a,
    VectorExpr< ValueType, Size, ExprB > const & b ) {
    return ( VectorPlusVector< ValueType, Size, ExprA, ExprB >( a, b ) );
    }


    template < typename ValueType, std::size_t Size, typename ExprA >
    struct VectorTimesScalar {

    ExprA const & a;
    ValueType b;

    VectorTimesScalar ( ExprA const & a_, ValueType b_ )
    : a ( a_ )
    , b ( b_ )
    {}

    ValueType operator[] ( std::size_t i ) const {
    return ( a[ i ] * b );
    }

    };

    template < typename ValueType, std::size_t Size, typename ExprA >
    VectorExpr< ValueType, Size, VectorTimesScalar< ValueType, Size, ExprA > >
    operator* ( VectorExpr< ValueType, Size, ExprA > const & a,
    ValueType b ) {
    return ( VectorTimesScalar< ValueType, Size, ExprA >( a, b ) );
    }



    template < typename ValueType, std::size_t Size >
    class la_vect
    : public VectorExpr< ValueType, Size, VectorData< ValueType, Size > >
    {
    public:

    la_vect ( ValueType val = ValueType() )
    : VectorExpr< ValueType, Size, VectorData< ValueType, Size > >( val )
    {}

    template < typename Expr >
    la_vect & operator= ( VectorExpr< ValueType, Size, Expr > const & rhs )
    {
    for ( std::size_t i = 0; i < Size; ++i ) {
    (*this)[ i ] = rhs[ i ];
    }
    return ( *this );
    }

    template < typename Expr >
    la_vect & operator+= ( VectorExpr< ValueType, Size, Expr > const & rhs )
    {
    for ( std::size_t i = 0; i < Size; ++i ) {
    (*this)[ i ] += rhs[ i ];
    }
    return ( *this );
    }

    };

    }

    using DO_NOT_USE::la_vect;


    int maxIter = static_cast<int>( 100000 );

    #include <ctime>
    #include <cstdlib>
    #include <cstring> // std::memset

    int main ( void ) {
    std::clock_t c1, c0 = std::clock();

    c0 = std::clock();
    for ( int i = 0; i < maxIter; ++i ) {
    float real = 1.245;
    float anotherReal = 20.43492342;
    float v[ 151 ];
    float v2[ 151 ];
    std::memset( v, 0, sizeof( float ) * 151 );
    std::memset( v2, 0, sizeof( float ) * 151 );

    // mixing
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v2[ j ] * ( 1.0 - 0.5 ) + v[ j ] * 0.5;
    }

    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v[ j ] * real;
    }

    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] = v[ j ] * anotherReal;
    }

    // summing up & *
    for ( int j = 0; j < 151; ++j ) {
    v[ j ] += v[ j ];
    }
    }
    c1 = std::clock();

    std::cerr << "\nfloat[ 151 ]" << std::endl;
    std::cerr << "end CPU time : " << (long)c1 << std::endl;
    std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
    << std::endl;

    c0 = std::clock();
    for ( int i = 0; i < maxIter; ++i ) {
    float real = 1.245;
    float anotherReal = 20.43492342;
    la_vect<float, 151> v( 12.0 );
    la_vect<float, 151> v2( -12.0 );
    v = v2 * float( 1.0 - 0.5 ) + v * float(0.5);
    v += la_vect<float, 151>( 10.0 ) * real * anotherReal;
    }

    c1 = std::clock();

    std::cerr << "\nla_vect class" << std::endl;
    std::cerr << "end CPU time : " << (long)c1 << std::endl;
    std::cerr << "elapsed CPU time : " << (float)( c1 - c0 ) / CLOCKS_PER_SEC
    << std::endl;

    }

    design> a.out

    float[ 151 ]
    end CPU time : 540000
    elapsed CPU time : 0.54

    la_vect class
    end CPU time : 1160000
    elapsed CPU time : 0.62


    Also note that the computation using raw arrays and loops does not do the same
    thing as the computation using SuperVector. This might explain the
    remaining difference: theoretically, an optimizing compiler could eliminate
    almost all temporaries.
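
    To make the point about temporaries concrete, here is a sketch of what
    the two vector statements amount to once every temporary is gone: one
    loop per statement, with each element computed directly into its
    destination. This is the effect the expression-template assignment
    operators aim for. The names fused_update and kSamples are hypothetical,
    and plain arrays stand in for the vector class.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kSamples = 151;  // assumed spectrum size from the question

inline void fused_update(float (&v)[kSamples], const float (&v2)[kSamples],
                         float real, float anotherReal) {
    // v = v2 * 0.5 + v * 0.5;
    // evaluated element by element, with no temporary vector
    for (std::size_t i = 0; i < kSamples; ++i) {
        v[i] = v2[i] * 0.5f + v[i] * 0.5f;
    }
    // v += Vector( 10.0 ) * real * anotherReal;
    // likewise a single loop, with the constant vector never materialised
    for (std::size_t i = 0; i < kSamples; ++i) {
        v[i] += 10.0f * real * anotherReal;
    }
}
```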


    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Nov 8, 2006
    #9
  10. ma740988 Guest

    > > Daniel T. wrote:
    >
    > Of course the fact that my computer is faster than yours isn't relevant,
    > it's the percentage difference in the numbers that surprises. I'm
    > showing a 5% increase in speed for the algorithm use, whereas you show a
    > 176% *decrease*. Are you sure you compiled with full optimizations for
    > speed?
    >

    Timing is one of those things that often puzzles me when using the .NET
    compiler ( .NET 05 ). Algorithms almost always seem so much slower
    than conventional loops. Full optimization produces a similar result.

    Manual iteration
    end CPU time : 7653
    elapsed CPU time : 7.653

    Algorithm Use
    end CPU time : 29034
    elapsed CPU time : 21.366

    I'm confused.
     
    ma740988, Nov 10, 2006
    #10
  11. peter koch Guest

    ma740988 skrev:
    > > > Daniel T. wrote:

    > >
    > > Of course the fact that my computer is faster than yours isn't relevant,
    > > it's the percentage difference in the numbers that surprises. I'm
    > > showing a 5% increase in speed for the algorithm use, whereas you show a
    > > 176% *decrease*. Are you sure you compiled with full optimizations for
    > > speed?
    > >

    > Timing is one of those things that often puzzles me when using the .NET
    > compiler ( .NET 05 ). Algorithms almost always seem so much slower
    > than conventional loops. Full optimization produces a similar result.


    Could you give us the command-line arguments here? Also tell us if you
    have removed the extra checking in Visual Studio 2005.
    /Peter
    >
    > Manual iteration
    > end CPU time : 7653
    > elapsed CPU time : 7.653
    >
    > Algorithm Use
    > end CPU time : 29034
    > elapsed CPU time : 21.366
    >
    > I'm confused.
     
    peter koch, Nov 10, 2006
    #11
  12. Daniel T. Guest

    ma740988 wrote:
    >
    > Timing is one of those things that often puzzles me when using the .NET
    > compiler ( .NET 05 ). Algorithms almost always seem so much slower
    > than conventional loops. Full optimization produces a similar result.
    >
    > Manual iteration
    > end CPU time : 7653
    > elapsed CPU time : 7.653
    >
    > Algorithm Use
    > end CPU time : 29034
    > elapsed CPU time : 21.366
    >
    > I'm confused.


    Even more confusing: These numbers are *slower* than when you didn't
    compile with full optimization. Maybe you are optimizing for smaller
    size rather than faster speed?
     
    Daniel T., Nov 10, 2006
    #12
  13. ma740988 Guest

    peter koch wrote:
    > Could you give us the command-line arguments here? Also tell us if you
    > have removed the extra checking in Visual Studio 2005.
    > /Peter


    /Ox /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm
    /EHsc /MDd /Fo"Debug\\" /Fd"Debug\vc80.pdb" /W3 /nologo /c /Wp64 /TP
    /errorReport:prompt
     
    ma740988, Nov 10, 2006
    #13