STL - Vector - Normalization ?

R

Rakesh Kumar

Hi,
Is there a function that can help me to normalize the given vector
(belonging to C++ STL) of 'double's in the range 0.0 - 1.0

i.e . given a set of 'double's, each element 'ele' would be replaced by
elenew = (ele - min) / (max - min) , where max and min are the
maximum and minimum of the given vector.

Or, aliter - is there a way , I can specify a function (here, in
this case, I could specify my normalize function) to be applied to each
and every element of a vector and get a new vector consisting of the
value returned by the function.
 
V

Victor Bazarov

Rakesh Kumar said:
Hi,
Is there a function that can help me to normalize the given vector
(belonging to C++ STL) of 'double's in the range 0.0 - 1.0

i.e . given a set of 'double's, each element 'ele' would be replaced by
elenew = (ele - min) / (max - min) , where max and min are the
maximum and minimum of the given vector.

Or, aliter - is there a way , I can specify a function (here, in
this case, I could specify my normalize function) to be applied to each
and every element of a vector and get a new vector consisting of the
value returned by the function.

Aw, come on... What happened to the programmer's pride? Couldn't
you write one if you just needed it?
 
D

David Harmon

On Wed, 21 Apr 2004 17:25:51 -0700 in comp.lang.c++, Rakesh Kumar
i.e . given a set of 'double's, each element 'ele' would be replaced by
elenew = (ele - min) / (max - min) , where max and min are the
maximum and minimum of the given vector.

See std::min_element<>, std::max_element<>, and std::transform<>.
Stroustrup section 18.9, 18.6.2.
 
D

Dietmar Kuehl

Victor Bazarov said:
Aw, come on... What happened to the programmer's pride? Couldn't
you write one if you just needed it?

Actually, I think it would make sense to have a quite extensive algorithms
library which would cover, amoung other things, also many of the typical
numerical algorithms.
 
P

puppet_sock

Actually, I think it would make sense to have a quite extensive algorithms
library which would cover, amoung other things, also many of the typical
numerical algorithms.

Indeed it would. But I'm not sure it would make sense for it to
be part of the standard language. The language is already pretty
large. I think it makes more sense to have math packages available.
As well, it makes sense for such things to be tweaked for special
purposes, tweaked for specific hardware, etc.

Hey, guess what? That's the situation now. So I can be lazy.
Socks
 
S

Siemel Naran

David Harmon said:
See std::min_element<>, std::max_element<>, and std::transform<>.
Stroustrup section 18.9, 18.6.2.

This would be an 2*O(N) algorithm to find the min element and then the max
element. Is there a standard algorithm to find the max and min in one pass?
 
S

Siemel Naran

Pete Becker said:
Siemel Naran wrote:

2*O(N) == O(N).

Sure, mathematically speaking. But what I'm trying to say is the original
algorithm performs 2 passes through the array, whereas we could get by with
just 1.
 
V

Victor Bazarov

Siemel Naran said:
Sure, mathematically speaking. But what I'm trying to say is the original
algorithm performs 2 passes through the array, whereas we could get by with
just 1.

But the whole point of complexity being the same is that while doing
your single path, you will have to do more on every iteration, which
basically negates the difference between one and two passes.

Victor
 
S

Siemel Naran

Victor Bazarov said:
But the whole point of complexity being the same is that while doing
your single path, you will have to do more on every iteration, which
basically negates the difference between one and two passes.

Sure, there's no time saved in the instructions of the body of the loop.
But there is the time saved in the loop itself. In the 2 pass algorithm for
(int i=0; i<N; i++) we do i++ and i<N a total of 2*N times. In mathematical
libraries, people are very concerned about performance.
 
M

Michiel Salters

Siemel Naran said:
"Victor Bazarov" <[email protected]> wrote in message news:gg0ic.5519
[ min_element followed by max_element ]
Sure, there's no time saved in the instructions of the body of the loop.

Actually, there is. After the first element, an element can be either the
minimum or the maximum, but not both. The first element usually is
treated as a special case anyway, so the normal case can actually use
an else:
iter = begin;
min = max = *iter; ++iter;
while( *iter != end )
if( *iter < min ) min = *iter;
else // This reduces the complexity per element.
if (*iter > max ) max = *iter;

Of course, with a random order an long sequences, the average
saving is almost 0, and if the optimizer recognizes the pattern
the net saving of the 'else' will be 0 anyway.

Regards,
Michiel Salters
 
J

Jeff

If this is a very large vector, and this call is in the core loop of
your code, then maybe it's reasonable to worry about the 2-pass vs.
1-pass thing. Or more accurately, a 3-pass vs. 2-pass since the
normalizing the vectors will require another pass.

Here's a one pass solution to finding the max and min:

#include <algorithm>
#include <limits.h>
using namespace std;
struct MaxMin {
int max;
int min;
MaxMin() : max(MIN_INT), min(MAX_INT) {}
inline void operator()(int n) {
if (n < min) min = n;
if (n > max) max = n;
}
};

.....
MaxMin mm;
Vector<int> myvector;
// fill myvector
foreach(myvector.begin(), myvector.end(), mm);
// Now the max and min are stored in mm.max and mm.min.

But in any case, please measure the running time of this solution vs.
the running time of the one-pass solution. I suspect there will be so
little difference that the max_element() and min_element() approach
will be preferable because it requires fewer lines of code.
 
S

Siemel Naran

Jeff said:
"Siemel Naran" <[email protected]> wrote in message
If this is a very large vector, and this call is in the core loop of
your code, then maybe it's reasonable to worry about the 2-pass vs.
1-pass thing. Or more accurately, a 3-pass vs. 2-pass since the
normalizing the vectors will require another pass.

Here's a one pass solution to finding the max and min:

#include <algorithm>
#include <limits.h>
using namespace std;
struct MaxMin {
int max;
int min;
MaxMin() : max(MIN_INT), min(MAX_INT) {}
inline void operator()(int n) {
if (n < min) min = n;
if (n > max) max = n;
}
};

....
MaxMin mm;
Vector<int> myvector;
// fill myvector
foreach(myvector.begin(), myvector.end(), mm);
// Now the max and min are stored in mm.max and mm.min.

But in any case, please measure the running time of this solution vs.
the running time of the one-pass solution. I suspect there will be so
little difference that the max_element() and min_element() approach
will be preferable because it requires fewer lines of code.


So I actually timed it. Sure, if the cost of operator<(const T&, const T&)
is expensive, then the time of the for loop (namely ++i and i<N), is small
in comparison to the body of the for loop which calls operator<(const T&,
const T&). But if T is int or some other small type, as is often the case
for numerical programs, then we expect the difference to be significant.

I wrote a program that takes a command line argument. If it is "one" it
calls the one-pass method, and for "two" it calls the two-pass method. The
program creates an array of 10 million integers, fills them with random
numbers, then calls either the one-pass or two-pass method to compute the
min and max. To test the functionality of the algorithm only, and not how
well the compiler optimizes STL algorithms, I encoded the logic of the
algorithms min_element etc in the function and did not use functors.

The one pass method took 190 to 195 clock cycles (I ran it about 5 times).
The two pass method to 310 to 370 clock cycles, with a larger standard
deviation for some reason. This is a significant difference.

My platform is Windows ME, 128 MB RAM, using Borland C++ Builder version 6,
AMD Athlon processor.


#if defined(__BORLANDC__)
#include <vcl.h>
#endif

#include <string>
#include <cstdlib> // rand
#include <ctime> // clock
#include <utility> // pair
#include <iostream>
#include <cassert>


typedef std::pair<int,int> PairInt;


PairInt action1(register const int * begin, register const int *const end)
{
assert(begin != end);
register int min = *begin;
register int max = min;
for (++begin; begin!=end; ++begin)
{
const int val = *begin;
if (val<min) min=val;
else if (val>max) max=val;
}
return PairInt(min, max);
}


PairInt action2(const int *const const_begin, register const int *const end)
{
assert(const_begin != end);
register int min = *const_begin;
register int max = min;
register const int * begin = NULL;
begin = const_begin;
for (++begin; begin!=end; ++begin)
{
register const int val = *begin;
if (val<min) min=val;
}
begin = const_begin;
for (++begin; begin!=end; ++begin)
{
register const int val = *begin;
if (val>max) max=val;
}
return PairInt(min, max);
}


int main(int argc, char * * argv)
{
if (argc == 2)
{
const int N = 10000000;
int * array = new int[N];
int *const begin = array;
int *const end = array+N;

for (int * iter = begin; iter != end; ++iter) *iter = std::rand();
const std::string which = argv[1];

typedef PairInt (* PtrAction)(const int *, const int *);
PtrAction ptr = NULL;
if (which == "one") ptr = &action1;
else if (which == "two") ptr = &action2;

if (ptr)
{
using std::clock_t;
using std::clock;
const clock_t clock_start = clock();
const PairInt pair = (*ptr)(begin, end);
const clock_t clock_end = clock();

std::cout
<< "min = " << pair.first << ", "
<< "max = " << pair.second << ", "
<< "time = " << clock_end - clock_start
<< '\n'
;
}

delete[] array;
}
}
 
S

Siemel Naran

I wrote a program that takes a command line argument. If it is "one" it
calls the one-pass method, and for "two" it calls the two-pass method. The
program creates an array of 10 million integers, fills them with random
numbers, then calls either the one-pass or two-pass method to compute the
min and max. To test the functionality of the algorithm only, and not how
well the compiler optimizes STL algorithms, I encoded the logic of the
algorithms min_element etc in the function and did not use functors.

The one pass method took 190 to 195 clock cycles (I ran it about 5 times).
The two pass method to 310 to 370 clock cycles, with a larger standard
deviation for some reason. This is a significant difference.

There is something else I ran into last week when I did the test, but didn't
think much of it. My initial test used an array of 100 million integers.
But this took a very long time, and I noticed my hard disk light was on a
lot. Which means my system ran out of memory (it has only 128MB RAM) and so
was swapping memory in and out.

This is significant because it means that if operator++ or operator* is
expensive, then a one-pass method is surely preferred. This could happen if
you're short on memory and the computer has to swap RAM with disk space, or
if operator++ or operator* is intrinsically expensive as with a database
select iterator or probably even std::deque, or on some platforms where
memory access is slower (but inline arithmetic operations are really fast)
which I think is the case for SPARC machines.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,575
Members
45,053
Latest member
billing-software

Latest Threads

Top