STL - Vector - Normalization ?

Discussion in 'C++' started by Rakesh Kumar, Apr 22, 2004.

  1. Rakesh Kumar

    Rakesh Kumar Guest

    Hi,
    Is there a function that can help me to normalize the given vector
    (belonging to C++ STL) of 'double's in the range 0.0 - 1.0

    i.e . given a set of 'double's, each element 'ele' would be replaced by
    elenew = (ele - min) / (max - min) , where max and min are the
    maximum and minimum of the given vector.

    Or, aliter - is there a way , I can specify a function (here, in
    this case, I could specify my normalize function) to be applied to each
    and every element of a vector and get a new vector consisting of the
    value returned by the function.

    --
    Rakesh Kumar
    ** Remove nospamplz from my email address for my real email **
     
    Rakesh Kumar, Apr 22, 2004
    #1
    1. Advertising

  2. "Rakesh Kumar" <> wrote...
    > Hi,
    > Is there a function that can help me to normalize the given vector
    > (belonging to C++ STL) of 'double's in the range 0.0 - 1.0
    >
    > i.e . given a set of 'double's, each element 'ele' would be replaced by
    > elenew = (ele - min) / (max - min) , where max and min are the
    > maximum and minimum of the given vector.
    >
    > Or, aliter - is there a way , I can specify a function (here, in
    > this case, I could specify my normalize function) to be applied to each
    > and every element of a vector and get a new vector consisting of the
    > value returned by the function.


    Aw, come on... What happened to the programmer's pride? Couldn't
    you write one if you just needed it?
     
    Victor Bazarov, Apr 22, 2004
    #2
    1. Advertising

  3. Rakesh Kumar

    David Harmon Guest

    On Wed, 21 Apr 2004 17:25:51 -0700 in comp.lang.c++, Rakesh Kumar
    <> wrote,

    >i.e . given a set of 'double's, each element 'ele' would be replaced by
    > elenew = (ele - min) / (max - min) , where max and min are the
    >maximum and minimum of the given vector.


    See std::min_element<>, std::max_element<>, and std::transform<>.
    Stroustrup section 18.9, 18.6.2.
     
    David Harmon, Apr 22, 2004
    #3
  4. "Victor Bazarov" <> wrote:
    > Aw, come on... What happened to the programmer's pride? Couldn't
    > you write one if you just needed it?


    Actually, I think it would make sense to have a quite extensive algorithms
    library which would cover, amoung other things, also many of the typical
    numerical algorithms.
    --
    <mailto:> <http://www.dietmar-kuehl.de/>
    <http://www.contendix.com> - Software Development & Consulting
     
    Dietmar Kuehl, Apr 22, 2004
    #4
  5. Rakesh Kumar

    Guest

    (Dietmar Kuehl) wrote in message news:<>...
    > "Victor Bazarov" <> wrote:
    > > Aw, come on... What happened to the programmer's pride? Couldn't
    > > you write one if you just needed it?

    >
    > Actually, I think it would make sense to have a quite extensive algorithms
    > library which would cover, amoung other things, also many of the typical
    > numerical algorithms.


    Indeed it would. But I'm not sure it would make sense for it to
    be part of the standard language. The language is already pretty
    large. I think it makes more sense to have math packages available.
    As well, it makes sense for such things to be tweaked for special
    purposes, tweaked for specific hardware, etc.

    Hey, guess what? That's the situation now. So I can be lazy.
    Socks
     
    , Apr 22, 2004
    #5
  6. Rakesh Kumar

    Siemel Naran Guest

    "David Harmon" <> wrote in message
    > On Wed, 21 Apr 2004 17:25:51 -0700 in comp.lang.c++, Rakesh Kumar


    > >i.e . given a set of 'double's, each element 'ele' would be replaced by
    > > elenew = (ele - min) / (max - min) , where max and min are the
    > >maximum and minimum of the given vector.

    >
    > See std::min_element<>, std::max_element<>, and std::transform<>.
    > Stroustrup section 18.9, 18.6.2.


    This would be an 2*O(N) algorithm to find the min element and then the max
    element. Is there a standard algorithm to find the max and min in one pass?
     
    Siemel Naran, Apr 22, 2004
    #6
  7. Rakesh Kumar

    Pete Becker Guest

    Siemel Naran wrote:
    >
    > This would be an 2*O(N) algorithm to find the min element and then the max
    > element.


    2*O(N) == O(N).

    --

    Pete Becker
    Dinkumware, Ltd. (http://www.dinkumware.com)
     
    Pete Becker, Apr 22, 2004
    #7
  8. Rakesh Kumar

    Rakesh Kumar Guest

    Thanks David for the pointers.

    David Harmon wrote:
    > See std::min_element<>, std::max_element<>, and std::transform<>.
    > Stroustrup section 18.9, 18.6.2.
    >


    --
    Rakesh Kumar
    ** Remove nospamplz from my email address for my real email **
     
    Rakesh Kumar, Apr 23, 2004
    #8
  9. Rakesh Kumar

    Siemel Naran Guest

    "Pete Becker" <> wrote in message
    news:...
    > Siemel Naran wrote:


    > > This would be an 2*O(N) algorithm to find the min element and then the

    max
    > > element.

    >
    > 2*O(N) == O(N).


    Sure, mathematically speaking. But what I'm trying to say is the original
    algorithm performs 2 passes through the array, whereas we could get by with
    just 1.
     
    Siemel Naran, Apr 23, 2004
    #9
  10. "Siemel Naran" <> wrote...
    > "Pete Becker" <> wrote in message
    > news:...
    > > Siemel Naran wrote:

    >
    > > > This would be an 2*O(N) algorithm to find the min element and then the

    > max
    > > > element.

    > >
    > > 2*O(N) == O(N).

    >
    > Sure, mathematically speaking. But what I'm trying to say is the original
    > algorithm performs 2 passes through the array, whereas we could get by

    with
    > just 1.


    But the whole point of complexity being the same is that while doing
    your single path, you will have to do more on every iteration, which
    basically negates the difference between one and two passes.

    Victor
     
    Victor Bazarov, Apr 23, 2004
    #10
  11. Rakesh Kumar

    Siemel Naran Guest

    "Victor Bazarov" <> wrote in message news:gg0ic.5519
    > "Siemel Naran" <> wrote...


    > > Sure, mathematically speaking. But what I'm trying to say is the

    original
    > > algorithm performs 2 passes through the array, whereas we could get by

    > with
    > > just 1.

    >
    > But the whole point of complexity being the same is that while doing
    > your single path, you will have to do more on every iteration, which
    > basically negates the difference between one and two passes.


    Sure, there's no time saved in the instructions of the body of the loop.
    But there is the time saved in the loop itself. In the 2 pass algorithm for
    (int i=0; i<N; i++) we do i++ and i<N a total of 2*N times. In mathematical
    libraries, people are very concerned about performance.
     
    Siemel Naran, Apr 23, 2004
    #11
  12. "Siemel Naran" <> wrote in message news:<sp2ic.12749$>...
    > "Victor Bazarov" <> wrote in message news:gg0ic.5519

    [ min_element followed by max_element ]
    > >
    > > But the whole point of complexity being the same is that while doing
    > > your single path, you will have to do more on every iteration, which
    > > basically negates the difference between one and two passes.

    >
    > Sure, there's no time saved in the instructions of the body of the loop.


    Actually, there is. After the first element, an element can be either the
    minimum or the maximum, but not both. The first element usually is
    treated as a special case anyway, so the normal case can actually use
    an else:
    iter = begin;
    min = max = *iter; ++iter;
    while( *iter != end )
    if( *iter < min ) min = *iter;
    else // This reduces the complexity per element.
    if (*iter > max ) max = *iter;

    Of course, with a random order an long sequences, the average
    saving is almost 0, and if the optimizer recognizes the pattern
    the net saving of the 'else' will be 0 anyway.

    Regards,
    Michiel Salters
     
    Michiel Salters, Apr 23, 2004
    #12
  13. Rakesh Kumar

    Jeff Guest

    If this is a very large vector, and this call is in the core loop of
    your code, then maybe it's reasonable to worry about the 2-pass vs.
    1-pass thing. Or more accurately, a 3-pass vs. 2-pass since the
    normalizing the vectors will require another pass.

    Here's a one pass solution to finding the max and min:

    #include <algorithm>
    #include <limits.h>
    using namespace std;
    struct MaxMin {
    int max;
    int min;
    MaxMin() : max(MIN_INT), min(MAX_INT) {}
    inline void operator()(int n) {
    if (n < min) min = n;
    if (n > max) max = n;
    }
    };

    .....
    MaxMin mm;
    Vector<int> myvector;
    // fill myvector
    foreach(myvector.begin(), myvector.end(), mm);
    // Now the max and min are stored in mm.max and mm.min.

    But in any case, please measure the running time of this solution vs.
    the running time of the one-pass solution. I suspect there will be so
    little difference that the max_element() and min_element() approach
    will be preferable because it requires fewer lines of code.

    "Siemel Naran" <> wrote in message news:<sp2ic.12749$>...
    > "Victor Bazarov" <> wrote in message news:gg0ic.5519
    > > "Siemel Naran" <> wrote...

    >
    > > > Sure, mathematically speaking. But what I'm trying to say is the

    > original
    > > > algorithm performs 2 passes through the array, whereas we could get by

    > with
    > > > just 1.

    > >
    > > But the whole point of complexity being the same is that while doing
    > > your single path, you will have to do more on every iteration, which
    > > basically negates the difference between one and two passes.

    >
    > Sure, there's no time saved in the instructions of the body of the loop.
    > But there is the time saved in the loop itself. In the 2 pass algorithm for
    > (int i=0; i<N; i++) we do i++ and i<N a total of 2*N times. In mathematical
    > libraries, people are very concerned about performance.
     
    Jeff, Apr 23, 2004
    #13
  14. Rakesh Kumar

    Siemel Naran Guest

    "Jeff" <> wrote in message
    news:...
    > "Siemel Naran" <> wrote in message

    news:<sp2ic.12749$>...
    > > "Victor Bazarov" <> wrote in message

    news:gg0ic.5519

    > > > But the whole point of complexity being the same is that while doing
    > > > your single path, you will have to do more on every iteration, which
    > > > basically negates the difference between one and two passes.

    > >
    > > Sure, there's no time saved in the instructions of the body of the loop.
    > > But there is the time saved in the loop itself. In the 2 pass algorithm

    for
    > > (int i=0; i<N; i++) we do i++ and i<N a total of 2*N times. In

    mathematical
    > > libraries, people are very concerned about performance.


    > If this is a very large vector, and this call is in the core loop of
    > your code, then maybe it's reasonable to worry about the 2-pass vs.
    > 1-pass thing. Or more accurately, a 3-pass vs. 2-pass since the
    > normalizing the vectors will require another pass.
    >
    > Here's a one pass solution to finding the max and min:
    >
    > #include <algorithm>
    > #include <limits.h>
    > using namespace std;
    > struct MaxMin {
    > int max;
    > int min;
    > MaxMin() : max(MIN_INT), min(MAX_INT) {}
    > inline void operator()(int n) {
    > if (n < min) min = n;
    > if (n > max) max = n;
    > }
    > };
    >
    > ....
    > MaxMin mm;
    > Vector<int> myvector;
    > // fill myvector
    > foreach(myvector.begin(), myvector.end(), mm);
    > // Now the max and min are stored in mm.max and mm.min.
    >
    > But in any case, please measure the running time of this solution vs.
    > the running time of the one-pass solution. I suspect there will be so
    > little difference that the max_element() and min_element() approach
    > will be preferable because it requires fewer lines of code.



    So I actually timed it. Sure, if the cost of operator<(const T&, const T&)
    is expensive, then the time of the for loop (namely ++i and i<N), is small
    in comparison to the body of the for loop which calls operator<(const T&,
    const T&). But if T is int or some other small type, as is often the case
    for numerical programs, then we expect the difference to be significant.

    I wrote a program that takes a command line argument. If it is "one" it
    calls the one-pass method, and for "two" it calls the two-pass method. The
    program creates an array of 10 million integers, fills them with random
    numbers, then calls either the one-pass or two-pass method to compute the
    min and max. To test the functionality of the algorithm only, and not how
    well the compiler optimizes STL algorithms, I encoded the logic of the
    algorithms min_element etc in the function and did not use functors.

    The one pass method took 190 to 195 clock cycles (I ran it about 5 times).
    The two pass method to 310 to 370 clock cycles, with a larger standard
    deviation for some reason. This is a significant difference.

    My platform is Windows ME, 128 MB RAM, using Borland C++ Builder version 6,
    AMD Athlon processor.


    #if defined(__BORLANDC__)
    #include <vcl.h>
    #endif

    #include <string>
    #include <cstdlib> // rand
    #include <ctime> // clock
    #include <utility> // pair
    #include <iostream>
    #include <cassert>


    typedef std::pair<int,int> PairInt;


    PairInt action1(register const int * begin, register const int *const end)
    {
    assert(begin != end);
    register int min = *begin;
    register int max = min;
    for (++begin; begin!=end; ++begin)
    {
    const int val = *begin;
    if (val<min) min=val;
    else if (val>max) max=val;
    }
    return PairInt(min, max);
    }


    PairInt action2(const int *const const_begin, register const int *const end)
    {
    assert(const_begin != end);
    register int min = *const_begin;
    register int max = min;
    register const int * begin = NULL;
    begin = const_begin;
    for (++begin; begin!=end; ++begin)
    {
    register const int val = *begin;
    if (val<min) min=val;
    }
    begin = const_begin;
    for (++begin; begin!=end; ++begin)
    {
    register const int val = *begin;
    if (val>max) max=val;
    }
    return PairInt(min, max);
    }


    int main(int argc, char * * argv)
    {
    if (argc == 2)
    {
    const int N = 10000000;
    int * array = new int[N];
    int *const begin = array;
    int *const end = array+N;

    for (int * iter = begin; iter != end; ++iter) *iter = std::rand();
    const std::string which = argv[1];

    typedef PairInt (* PtrAction)(const int *, const int *);
    PtrAction ptr = NULL;
    if (which == "one") ptr = &action1;
    else if (which == "two") ptr = &action2;

    if (ptr)
    {
    using std::clock_t;
    using std::clock;
    const clock_t clock_start = clock();
    const PairInt pair = (*ptr)(begin, end);
    const clock_t clock_end = clock();

    std::cout
    << "min = " << pair.first << ", "
    << "max = " << pair.second << ", "
    << "time = " << clock_end - clock_start
    << '\n'
    ;
    }

    delete[] array;
    }
    }
     
    Siemel Naran, Apr 24, 2004
    #14
  15. Rakesh Kumar

    Siemel Naran Guest

    "Siemel Naran" <> wrote in message
    news:V1lic.15816

    > I wrote a program that takes a command line argument. If it is "one" it
    > calls the one-pass method, and for "two" it calls the two-pass method.

    The
    > program creates an array of 10 million integers, fills them with random
    > numbers, then calls either the one-pass or two-pass method to compute the
    > min and max. To test the functionality of the algorithm only, and not how
    > well the compiler optimizes STL algorithms, I encoded the logic of the
    > algorithms min_element etc in the function and did not use functors.
    >
    > The one pass method took 190 to 195 clock cycles (I ran it about 5 times).
    > The two pass method to 310 to 370 clock cycles, with a larger standard
    > deviation for some reason. This is a significant difference.


    There is something else I ran into last week when I did the test, but didn't
    think much of it. My initial test used an array of 100 million integers.
    But this took a very long time, and I noticed my hard disk light was on a
    lot. Which means my system ran out of memory (it has only 128MB RAM) and so
    was swapping memory in and out.

    This is significant because it means that if operator++ or operator* is
    expensive, then a one-pass method is surely preferred. This could happen if
    you're short on memory and the computer has to swap RAM with disk space, or
    if operator++ or operator* is intrinsically expensive as with a database
    select iterator or probably even std::deque, or on some platforms where
    memory access is slower (but inline arithmetic operations are really fast)
    which I think is the case for SPARC machines.
     
    Siemel Naran, Apr 28, 2004
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    4
    Views:
    1,095
  2. Chris

    URL normalization

    Chris, May 3, 2004, in forum: Java
    Replies:
    2
    Views:
    3,233
    Real Gagnon
    May 4, 2004
  3. Aaron Gray

    HTML normalization

    Aaron Gray, Feb 2, 2006, in forum: HTML
    Replies:
    15
    Views:
    1,471
    Andy Dingley
    Feb 5, 2006
  4. Replies:
    8
    Views:
    1,997
    Csaba
    Feb 18, 2006
  5. Luca Risolia

    STL map to STL vector

    Luca Risolia, Jan 13, 2014, in forum: C++
    Replies:
    32
    Views:
    431
    Seungbeom Kim
    Jan 18, 2014
Loading...

Share This Page