# STL - Vector - Normalization ?

Discussion in 'C++' started by Rakesh Kumar, Apr 22, 2004.

1. ### Rakesh KumarGuest

Hi,
Is there a function that can help me to normalize the given vector
(belonging to C++ STL) of 'double's in the range 0.0 - 1.0

i.e . given a set of 'double's, each element 'ele' would be replaced by
elenew = (ele - min) / (max - min) , where max and min are the
maximum and minimum of the given vector.

Or, aliter - is there a way , I can specify a function (here, in
this case, I could specify my normalize function) to be applied to each
and every element of a vector and get a new vector consisting of the
value returned by the function.

--
Rakesh Kumar
** Remove nospamplz from my email address for my real email **

Rakesh Kumar, Apr 22, 2004

2. ### Victor BazarovGuest

"Rakesh Kumar" <> wrote...
> Hi,
> Is there a function that can help me to normalize the given vector
> (belonging to C++ STL) of 'double's in the range 0.0 - 1.0
>
> i.e . given a set of 'double's, each element 'ele' would be replaced by
> elenew = (ele - min) / (max - min) , where max and min are the
> maximum and minimum of the given vector.
>
> Or, aliter - is there a way , I can specify a function (here, in
> this case, I could specify my normalize function) to be applied to each
> and every element of a vector and get a new vector consisting of the
> value returned by the function.

Aw, come on... What happened to the programmer's pride? Couldn't
you write one if you just needed it?

Victor Bazarov, Apr 22, 2004

3. ### David HarmonGuest

On Wed, 21 Apr 2004 17:25:51 -0700 in comp.lang.c++, Rakesh Kumar
<> wrote,

>i.e . given a set of 'double's, each element 'ele' would be replaced by
> elenew = (ele - min) / (max - min) , where max and min are the
>maximum and minimum of the given vector.

See std::min_element<>, std::max_element<>, and std::transform<>.
Stroustrup section 18.9, 18.6.2.

David Harmon, Apr 22, 2004
4. ### Dietmar KuehlGuest

"Victor Bazarov" <> wrote:
> Aw, come on... What happened to the programmer's pride? Couldn't
> you write one if you just needed it?

Actually, I think it would make sense to have a quite extensive algorithms
library which would cover, amoung other things, also many of the typical
numerical algorithms.
--
<mailto:> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting

Dietmar Kuehl, Apr 22, 2004
5. ### Guest

(Dietmar Kuehl) wrote in message news:<>...
> "Victor Bazarov" <> wrote:
> > Aw, come on... What happened to the programmer's pride? Couldn't
> > you write one if you just needed it?

>
> Actually, I think it would make sense to have a quite extensive algorithms
> library which would cover, amoung other things, also many of the typical
> numerical algorithms.

Indeed it would. But I'm not sure it would make sense for it to
be part of the standard language. The language is already pretty
large. I think it makes more sense to have math packages available.
As well, it makes sense for such things to be tweaked for special
purposes, tweaked for specific hardware, etc.

Hey, guess what? That's the situation now. So I can be lazy.
Socks

, Apr 22, 2004
6. ### Siemel NaranGuest

"David Harmon" <> wrote in message
> On Wed, 21 Apr 2004 17:25:51 -0700 in comp.lang.c++, Rakesh Kumar

> >i.e . given a set of 'double's, each element 'ele' would be replaced by
> > elenew = (ele - min) / (max - min) , where max and min are the
> >maximum and minimum of the given vector.

>
> See std::min_element<>, std::max_element<>, and std::transform<>.
> Stroustrup section 18.9, 18.6.2.

This would be an 2*O(N) algorithm to find the min element and then the max
element. Is there a standard algorithm to find the max and min in one pass?

Siemel Naran, Apr 22, 2004
7. ### Pete BeckerGuest

Siemel Naran wrote:
>
> This would be an 2*O(N) algorithm to find the min element and then the max
> element.

2*O(N) == O(N).

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Pete Becker, Apr 22, 2004
8. ### Rakesh KumarGuest

Thanks David for the pointers.

David Harmon wrote:
> See std::min_element<>, std::max_element<>, and std::transform<>.
> Stroustrup section 18.9, 18.6.2.
>

--
Rakesh Kumar
** Remove nospamplz from my email address for my real email **

Rakesh Kumar, Apr 23, 2004
9. ### Siemel NaranGuest

"Pete Becker" <> wrote in message
news:...
> Siemel Naran wrote:

> > This would be an 2*O(N) algorithm to find the min element and then the

max
> > element.

>
> 2*O(N) == O(N).

Sure, mathematically speaking. But what I'm trying to say is the original
algorithm performs 2 passes through the array, whereas we could get by with
just 1.

Siemel Naran, Apr 23, 2004
10. ### Victor BazarovGuest

"Siemel Naran" <> wrote...
> "Pete Becker" <> wrote in message
> news:...
> > Siemel Naran wrote:

>
> > > This would be an 2*O(N) algorithm to find the min element and then the

> max
> > > element.

> >
> > 2*O(N) == O(N).

>
> Sure, mathematically speaking. But what I'm trying to say is the original
> algorithm performs 2 passes through the array, whereas we could get by

with
> just 1.

But the whole point of complexity being the same is that while doing
your single path, you will have to do more on every iteration, which
basically negates the difference between one and two passes.

Victor

Victor Bazarov, Apr 23, 2004
11. ### Siemel NaranGuest

"Victor Bazarov" <> wrote in message news:gg0ic.5519
> "Siemel Naran" <> wrote...

> > Sure, mathematically speaking. But what I'm trying to say is the

original
> > algorithm performs 2 passes through the array, whereas we could get by

> with
> > just 1.

>
> But the whole point of complexity being the same is that while doing
> your single path, you will have to do more on every iteration, which
> basically negates the difference between one and two passes.

Sure, there's no time saved in the instructions of the body of the loop.
But there is the time saved in the loop itself. In the 2 pass algorithm for
(int i=0; i<N; i++) we do i++ and i<N a total of 2*N times. In mathematical
libraries, people are very concerned about performance.

Siemel Naran, Apr 23, 2004
12. ### Michiel SaltersGuest

"Siemel Naran" <> wrote in message news:<sp2ic.12749\$>...
> "Victor Bazarov" <> wrote in message news:gg0ic.5519

[ min_element followed by max_element ]
> >
> > But the whole point of complexity being the same is that while doing
> > your single path, you will have to do more on every iteration, which
> > basically negates the difference between one and two passes.

>
> Sure, there's no time saved in the instructions of the body of the loop.

Actually, there is. After the first element, an element can be either the
minimum or the maximum, but not both. The first element usually is
treated as a special case anyway, so the normal case can actually use
an else:
iter = begin;
min = max = *iter; ++iter;
while( *iter != end )
if( *iter < min ) min = *iter;
else // This reduces the complexity per element.
if (*iter > max ) max = *iter;

Of course, with a random order an long sequences, the average
saving is almost 0, and if the optimizer recognizes the pattern
the net saving of the 'else' will be 0 anyway.

Regards,
Michiel Salters

Michiel Salters, Apr 23, 2004
13. ### JeffGuest

If this is a very large vector, and this call is in the core loop of
your code, then maybe it's reasonable to worry about the 2-pass vs.
1-pass thing. Or more accurately, a 3-pass vs. 2-pass since the
normalizing the vectors will require another pass.

Here's a one pass solution to finding the max and min:

#include <algorithm>
#include <limits.h>
using namespace std;
struct MaxMin {
int max;
int min;
MaxMin() : max(MIN_INT), min(MAX_INT) {}
inline void operator()(int n) {
if (n < min) min = n;
if (n > max) max = n;
}
};

.....
MaxMin mm;
Vector<int> myvector;
// fill myvector
foreach(myvector.begin(), myvector.end(), mm);
// Now the max and min are stored in mm.max and mm.min.

But in any case, please measure the running time of this solution vs.
the running time of the one-pass solution. I suspect there will be so
little difference that the max_element() and min_element() approach
will be preferable because it requires fewer lines of code.

"Siemel Naran" <> wrote in message news:<sp2ic.12749\$>...
> "Victor Bazarov" <> wrote in message news:gg0ic.5519
> > "Siemel Naran" <> wrote...

>
> > > Sure, mathematically speaking. But what I'm trying to say is the

> original
> > > algorithm performs 2 passes through the array, whereas we could get by

> with
> > > just 1.

> >
> > But the whole point of complexity being the same is that while doing
> > your single path, you will have to do more on every iteration, which
> > basically negates the difference between one and two passes.

>
> Sure, there's no time saved in the instructions of the body of the loop.
> But there is the time saved in the loop itself. In the 2 pass algorithm for
> (int i=0; i<N; i++) we do i++ and i<N a total of 2*N times. In mathematical
> libraries, people are very concerned about performance.

Jeff, Apr 23, 2004
14. ### Siemel NaranGuest

"Jeff" <> wrote in message
news:...
> "Siemel Naran" <> wrote in message

news:<sp2ic.12749\$>...
> > "Victor Bazarov" <> wrote in message

news:gg0ic.5519

> > > But the whole point of complexity being the same is that while doing
> > > your single path, you will have to do more on every iteration, which
> > > basically negates the difference between one and two passes.

> >
> > Sure, there's no time saved in the instructions of the body of the loop.
> > But there is the time saved in the loop itself. In the 2 pass algorithm

for
> > (int i=0; i<N; i++) we do i++ and i<N a total of 2*N times. In

mathematical
> > libraries, people are very concerned about performance.

> If this is a very large vector, and this call is in the core loop of
> your code, then maybe it's reasonable to worry about the 2-pass vs.
> 1-pass thing. Or more accurately, a 3-pass vs. 2-pass since the
> normalizing the vectors will require another pass.
>
> Here's a one pass solution to finding the max and min:
>
> #include <algorithm>
> #include <limits.h>
> using namespace std;
> struct MaxMin {
> int max;
> int min;
> MaxMin() : max(MIN_INT), min(MAX_INT) {}
> inline void operator()(int n) {
> if (n < min) min = n;
> if (n > max) max = n;
> }
> };
>
> ....
> MaxMin mm;
> Vector<int> myvector;
> // fill myvector
> foreach(myvector.begin(), myvector.end(), mm);
> // Now the max and min are stored in mm.max and mm.min.
>
> But in any case, please measure the running time of this solution vs.
> the running time of the one-pass solution. I suspect there will be so
> little difference that the max_element() and min_element() approach
> will be preferable because it requires fewer lines of code.

So I actually timed it. Sure, if the cost of operator<(const T&, const T&)
is expensive, then the time of the for loop (namely ++i and i<N), is small
in comparison to the body of the for loop which calls operator<(const T&,
const T&). But if T is int or some other small type, as is often the case
for numerical programs, then we expect the difference to be significant.

I wrote a program that takes a command line argument. If it is "one" it
calls the one-pass method, and for "two" it calls the two-pass method. The
program creates an array of 10 million integers, fills them with random
numbers, then calls either the one-pass or two-pass method to compute the
min and max. To test the functionality of the algorithm only, and not how
well the compiler optimizes STL algorithms, I encoded the logic of the
algorithms min_element etc in the function and did not use functors.

The one pass method took 190 to 195 clock cycles (I ran it about 5 times).
The two pass method to 310 to 370 clock cycles, with a larger standard
deviation for some reason. This is a significant difference.

My platform is Windows ME, 128 MB RAM, using Borland C++ Builder version 6,
AMD Athlon processor.

#if defined(__BORLANDC__)
#include <vcl.h>
#endif

#include <string>
#include <cstdlib> // rand
#include <ctime> // clock
#include <utility> // pair
#include <iostream>
#include <cassert>

typedef std:air<int,int> PairInt;

PairInt action1(register const int * begin, register const int *const end)
{
assert(begin != end);
register int min = *begin;
register int max = min;
for (++begin; begin!=end; ++begin)
{
const int val = *begin;
if (val<min) min=val;
else if (val>max) max=val;
}
return PairInt(min, max);
}

PairInt action2(const int *const const_begin, register const int *const end)
{
assert(const_begin != end);
register int min = *const_begin;
register int max = min;
register const int * begin = NULL;
begin = const_begin;
for (++begin; begin!=end; ++begin)
{
register const int val = *begin;
if (val<min) min=val;
}
begin = const_begin;
for (++begin; begin!=end; ++begin)
{
register const int val = *begin;
if (val>max) max=val;
}
return PairInt(min, max);
}

int main(int argc, char * * argv)
{
if (argc == 2)
{
const int N = 10000000;
int * array = new int[N];
int *const begin = array;
int *const end = array+N;

for (int * iter = begin; iter != end; ++iter) *iter = std::rand();
const std::string which = argv[1];

typedef PairInt (* PtrAction)(const int *, const int *);
PtrAction ptr = NULL;
if (which == "one") ptr = &action1;
else if (which == "two") ptr = &action2;

if (ptr)
{
using std::clock_t;
using std::clock;
const clock_t clock_start = clock();
const PairInt pair = (*ptr)(begin, end);
const clock_t clock_end = clock();

std::cout
<< "min = " << pair.first << ", "
<< "max = " << pair.second << ", "
<< "time = " << clock_end - clock_start
<< '\n'
;
}

delete[] array;
}
}

Siemel Naran, Apr 24, 2004
15. ### Siemel NaranGuest

"Siemel Naran" <> wrote in message
news:V1lic.15816

> I wrote a program that takes a command line argument. If it is "one" it
> calls the one-pass method, and for "two" it calls the two-pass method.

The
> program creates an array of 10 million integers, fills them with random
> numbers, then calls either the one-pass or two-pass method to compute the
> min and max. To test the functionality of the algorithm only, and not how
> well the compiler optimizes STL algorithms, I encoded the logic of the
> algorithms min_element etc in the function and did not use functors.
>
> The one pass method took 190 to 195 clock cycles (I ran it about 5 times).
> The two pass method to 310 to 370 clock cycles, with a larger standard
> deviation for some reason. This is a significant difference.

There is something else I ran into last week when I did the test, but didn't
think much of it. My initial test used an array of 100 million integers.
But this took a very long time, and I noticed my hard disk light was on a
lot. Which means my system ran out of memory (it has only 128MB RAM) and so
was swapping memory in and out.

This is significant because it means that if operator++ or operator* is
expensive, then a one-pass method is surely preferred. This could happen if
you're short on memory and the computer has to swap RAM with disk space, or
if operator++ or operator* is intrinsically expensive as with a database
select iterator or probably even std::deque, or on some platforms where
memory access is slower (but inline arithmetic operations are really fast)
which I think is the case for SPARC machines.

Siemel Naran, Apr 28, 2004