Vector question

Discussion in 'C++' started by Kitty, Aug 29, 2004.

  1. Kitty

    Kitty Guest

    Hi, everyone.

    Given a vector<int>, what is the fastest way to find out whether there is a
    repeated element in it? The result is just "true" or "false". Thanks.

    Kitty
     
    Kitty, Aug 29, 2004
    #1
    1. Advertising

  2. Kitty

    Jon Bell Guest

    In article <413160ad$-cable.com>, Kitty <No spam> wrote:
    >
    >Given a vector<int>, what is the fastest way to find out whether there is a
    >repeated element in it? The result is just "true" or "false". Thanks.


    I'd construct an empty set<int>, then walk through the vector, processing
    each element in turn. If the element is in the set already, stop and
    return 'true'; otherwise insert the element into the set and continue.
    If you reach the end of the vector without finding any duplicates, return
    'false'.

    --
    Jon Bell <> Presbyterian College
    Dept. of Physics and Computer Science Clinton, South Carolina USA
     
    Jon Bell, Aug 29, 2004
    #2
    1. Advertising

  3. Kitty

    Siemel Naran Guest

    "Kitty" <No spam> wrote in message news:413160ad$-cable.com...

    > Given a vector<int>, what is the fastest way to find out whether there is

    a
    > repeated element in it? The result is just "true" or "false". Thanks.


    Sounds like an interview question. There are many answers, each with
    tradeoffs. Jon's method uses additional memory, but is fast time wise (time
    is N*O(lg(N), space is O(N)), and easy to write too -- that's important too
    because software that is easier to write and maintain pays off long term,
    though this rarely comes up in interviews (though it should). You can also
    not use additional memory, but sacrifice time (time would be O(N^2), space
    is O(1)). And each method has its own varations, such as set or hashtable,
    type of hash function if a hash, set or sorted vector, vector or list, etc,
    etc.
     
    Siemel Naran, Aug 29, 2004
    #3
  4. Kitty

    Siemel Naran Guest

    "Casper" <> wrote in message news:GFeYc.73396

    > In a similar problem, I use a different approach due to the fact that
    > jon's solution is not an option as it consumes too much memmory.
    >
    > I use a modified qsort and in the comparation part if two sucessive
    > elements are the same, I return true. This is slow with small lists but
    > fast with huge datasets (30.000+ as in my application takes about 100ms)
    > but require no additional memmory!


    A good solution, I think the fastest possible without using extra space.
    But it changes the original vector. Can you think of a way to do it without
    changing the original?
     
    Siemel Naran, Aug 29, 2004
    #4
  5. Kitty

    Kai-Uwe Bux Guest

    "Kitty" <No spam> wrote:

    > Hi, everyone.
    >
    > Given a vector<int>, what is the fastest way to find out whether there is
    > a repeated element in it? The result is just "true" or "false". Thanks.
    >
    > Kitty


    Here are three obvious ways of using standard algorithms. These solutions
    have the advantage that they are easy to understand. The second one could
    avoid copying if it was allowed to modify the sequence in the first place.

    #include <vector>
    #include <set>
    #include <iostream>

    template < typename ConstIter >
    bool has_dublicates_a ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::set< value_type > set_type;
    typedef typename set_type::size_type size_type;

    size_type i = 0;
    set_type s;
    for ( ConstIter iter = from; iter != to; ++iter ) {
    ++ i;
    s.insert( *iter );
    // this test could also be done after the loop:
    if ( s.size() != i ) {
    return( true );
    }
    }
    return( false );
    }

    template < typename ConstIter >
    bool has_dublicates_b ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::vector< value_type > vector_type;

    vector_type v ( from, to );
    std::sort( v.begin(), v.end() );
    return( std::adjacent_find( v.begin(), v.end() ) != v.end() );
    }

    int main( int argn, char ** args ){
    std::vector< int > a;
    std::vector< int > b;
    for ( int i = 0; i < 10; ++i ) {
    a.push_back( i );
    b.push_back( i ); b.push_back( i );
    }

    std::cout << has_dublicates_a( a.begin(), a.end() )
    << " "
    << has_dublicates_a( b.begin(), b.end() ) << "\n";
    std::cout << has_dublicates_b( a.begin(), a.end() )
    << " "
    << has_dublicates_b( b.begin(), b.end() ) << "\n";
    }

    As for speed, I would suggest a method based on the "partition by exchange"
    idea underlying quicksort, possibly modified like introsort to avoid O(N^2)
    worst case runtime. But that idea has been mentioned in another posting
    already.


    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Aug 29, 2004
    #5
  6. Kitty

    Alex Vinokur Guest

    Alex Vinokur, Aug 29, 2004
    #6
  7. Kitty

    Kai-Uwe Bux Guest

    Siemel Naran wrote:

    > "Casper" <> wrote in message news:GFeYc.73396
    >
    >> In a similar problem, I use a different approach due to the fact that
    >> jon's solution is not an option as it consumes too much memmory.
    >>
    >> I use a modified qsort and in the comparation part if two sucessive
    >> elements are the same, I return true. This is slow with small lists but
    >> fast with huge datasets (30.000+ as in my application takes about 100ms)
    >> but require no additional memmory!

    >
    > A good solution, I think the fastest possible without using extra space.


    I agree that this is probably the fastest solution using comparisons. Given
    that the question was for vector<int>, there could be a linear time
    solution. Here is a (messy) draft of something like that. The idea is to do
    partition by exchange based on the even/odd distinction (note that
    dublicates have the same parity). So in one pass, we put all the even
    elements to the left of all the odd elements. During the same pass, shift
    down by one bit. Now, we return true if there is a dublicate in the left
    segment or in the right segment. Farther down in the recursion, we can
    sometimes return true just based on the length of the segment: if we have
    shifted so many times that all values are in [0..15], then any segment of
    length 17 is bound to contain dublicates. The run time for the worst case
    should be O( N * bit_length<int> ).


    #include <vector>
    #include <iostream>
    #include <kubux/sequenceio>

    typedef std::vector< unsigned int > Uvector;

    void print_range ( u_int* from, u_int* to ) {
    std::cerr << "[ ";
    while ( from <= to ) {
    std::cerr << *from << " ";
    ++from;
    }
    std::cerr << "]";
    }

    bool has_dublicates_helper ( u_int* from, u_int* to, u_int max ) {
    // WARNING, range is closed: [from,to]
    print_range( from, to );
    if ( to <= from ) {
    return( false );
    }
    if ( max < to - from ) {
    return( true );
    }
    u_int* low = from;
    u_int* high = to;
    while ( true ) {
    while ( ( low < high ) && ( *low % 2 == 0 ) ) {
    *low >>= 1;
    ++low;
    }
    while ( ( low < high ) && ( *high % 2 != 0 ) ) {
    *high >>= 1;
    --high;
    }
    // either ( low == high ) or ( *high is even )
    if ( low < high ) {
    // *low is odd and *high is even
    std::swap( *low, *high );
    } else {
    break;
    }
    }
    std::cerr << std::endl;
    print_range( from, to );
    std::cerr << std::endl;
    // low == high;
    if ( *low % 2 == 0 ) {
    *low >>= 1;
    return( has_dublicates_helper( from, low, max >> 1 )
    ||
    has_dublicates_helper( high+1, to, max >>1 ) );
    } else {

    *low >>= 1;
    return( has_dublicates_helper( from, low-1, max >> 1 )
    ||
    has_dublicates_helper( high, to, max >>1 ) );
    }
    }

    bool has_dublicates ( Uvector u_vect ) {
    // this makes a copy
    return( has_dublicates_helper( &u_vect[0], &u_vect[ u_vect.size()-1 ],
    std::numeric_limits< unsigned int >::max() ) );
    }

    int main( int argn, char ** args ){
    Uvector a;
    Uvector b;
    for ( int i = 0; i < 10; ++i ) {
    a.push_back( i );
    b.push_back( i ); b.push_back( i );
    }

    std::cout << has_dublicates( a )
    << " "
    << has_dublicates( b )
    << "\n";
    }


    > But it changes the original vector. Can you think of a way to do it
    > without changing the original?


    Same here, I do not see how to avoid that.


    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Aug 29, 2004
    #7
  8. Kitty

    Casper Guest

    In a similar problem, I use a different approach due to the fact that
    jon's solution is not an option as it consumes too much memmory.

    I use a modified qsort and in the comparation part if two sucessive
    elements are the same, I return true. This is slow with small lists but
    fast with huge datasets (30.000+ as in my application takes about 100ms)
    but require no additional memmory!

    /Casper

    Kitty wrote:

    > Hi, everyone.
    >
    > Given a vector<int>, what is the fastest way to find out whether there is a
    > repeated element in it? The result is just "true" or "false". Thanks.
    >
    > Kitty
    >
    >
     
    Casper, Aug 29, 2004
    #8
  9. Kitty

    Conrad Weyns Guest

    "Siemel Naran" <> wrote in message
    news:GSeYc.268353$...
    > "Casper" <> wrote in message news:GFeYc.73396
    >
    > > In a similar problem, I use a different approach due to the fact that
    > > jon's solution is not an option as it consumes too much memmory.
    > >
    > > I use a modified qsort and in the comparation part if two sucessive
    > > elements are the same, I return true. This is slow with small lists but
    > > fast with huge datasets (30.000+ as in my application takes about 100ms)
    > > but require no additional memmory!

    >
    > A good solution, I think the fastest possible without using extra space.
    > But it changes the original vector. Can you think of a way to do it

    without
    > changing the original?


    If time is not an issue, perhaps this will do?

    bool has_duplicates(vector<int> const& i_Ints)
    {
    for (
    std::vector<int>::const_iterator cit = i_Ints.begin();
    cit != i_Ints.end();
    )
    {
    int i = *cit++;
    std::vector<int>::const_iterator jit = cit;
    while (jit != i_Ints.end())
    if (*jit++ == i)
    return true;
    }
    return false;
    }
    cheers,
    Conrad Weyns

    >
    >
     
    Conrad Weyns, Aug 29, 2004
    #9
  10. Kitty

    Kitty Guest

    The most important thing is to keep elements in the same positions after
    processing. That is sorting-like algorithm is not allowed.


    "Kitty" <No spam> ¦b¶l¥ó news:413160ad$-cable.com ¤¤¼¶¼g...
    > Hi, everyone.
    >
    > Given a vector<int>, what is the fastest way to find out whether there is

    a
    > repeated element in it? The result is just "true" or "false". Thanks.
    >
    > Kitty
    >
    >
     
    Kitty, Aug 29, 2004
    #10
  11. Kitty

    Daniel T. Guest

    In article <413160ad$-cable.com>, "Kitty" <No spam> wrote:

    > Given a vector<int>, what is the fastest way to find out whether there is a
    > repeated element in it? The result is just "true" or "false". Thanks.


    People have given great answers to the question asked, but I want to
    take another tack... Another way would be to not allow repeated elements
    in the vector in the first place, this can be done by wrapping the
    vector in a class that checks during insertion to make sure the element
    isn't already there...
     
    Daniel T., Aug 29, 2004
    #11
  12. Kitty

    Casper Guest

    This also sounds like a good idea, a kind of sorted vector using binary
    search. But then again, this comes awfully close to the functionality of
    a hash map which would be the ultimate solution speed wize though: if a
    collision happens, you've got yourself a duplet instance. Again maybe if
    you give a bit more info as to the size, content and use of the vector?

    /Casper

    Daniel T. wrote:
    > In article <413160ad$-cable.com>, "Kitty" <No spam> wrote:
    >
    >
    >>Given a vector<int>, what is the fastest way to find out whether there is a
    >>repeated element in it? The result is just "true" or "false". Thanks.

    >
    >
    > People have given great answers to the question asked, but I want to
    > take another tack... Another way would be to not allow repeated elements
    > in the vector in the first place, this can be done by wrapping the
    > vector in a class that checks during insertion to make sure the element
    > isn't already there...
     
    Casper, Aug 30, 2004
    #12
  13. Kitty

    Siemel Naran Guest

    "Conrad Weyns" <> wrote in message news:<LMjYc.3724$g%
    > "Siemel Naran" <> wrote in message


    > If time is not an issue, perhaps this will do?
    >
    > bool has_duplicates(vector<int> const& i_Ints)
    > {
    > for (
    > std::vector<int>::const_iterator cit = i_Ints.begin();
    > cit != i_Ints.end();
    > )
    > {
    > int i = *cit++;
    > std::vector<int>::const_iterator jit = cit;
    > while (jit != i_Ints.end())
    > if (*jit++ == i)
    > return true;
    > }
    > return false;
    > }


    Very good. Remember though to handle the special case of a vector of one element.
     
    Siemel Naran, Aug 31, 2004
    #13
  14. Kitty

    Siemel Naran Guest

    Kai-Uwe Bux <> wrote in message
    > Siemel Naran wrote:


    > I agree that this is probably the fastest solution using comparisons. Given
    > that the question was for vector<int>, there could be a linear time
    > solution. Here is a (messy) draft of something like that. The idea is to do
    > partition by exchange based on the even/odd distinction (note that
    > dublicates have the same parity). So in one pass, we put all the even
    > elements to the left of all the odd elements. During the same pass, shift
    > down by one bit. Now, we return true if there is a dublicate in the left
    > segment or in the right segment. Farther down in the recursion, we can
    > sometimes return true just based on the length of the segment: if we have
    > shifted so many times that all values are in [0..15], then any segment of
    > length 17 is bound to contain dublicates. The run time for the worst case
    > should be O( N * bit_length<int> ).


    This sounds reasonable, though I've not looked at the details
    carefully. But with N typically equal to 32 or 64, would it be too
    slow?

    > #include <vector>
    > #include <iostream>
    > #include <kubux/sequenceio>
    >
    > typedef std::vector< unsigned int > Uvector;
    >
    > void print_range ( u_int* from, u_int* to ) {
    > std::cerr << "[ ";
    > while ( from <= to ) {
    > std::cerr << *from << " ";
    > ++from;
    > }
    > std::cerr << "]";
    > }
    >
    > bool has_dublicates_helper ( u_int* from, u_int* to, u_int max ) {
    > // WARNING, range is closed: [from,to]
    > print_range( from, to );
    > if ( to <= from ) {
    > return( false );
    > }
    > if ( max < to - from ) {
    > return( true );
    > }
    > u_int* low = from;
    > u_int* high = to;
    > while ( true ) {
    > while ( ( low < high ) && ( *low % 2 == 0 ) ) {
    > *low >>= 1;
    > ++low;
    > }
    > while ( ( low < high ) && ( *high % 2 != 0 ) ) {
    > *high >>= 1;
    > --high;
    > }
    > // either ( low == high ) or ( *high is even )
    > if ( low < high ) {
    > // *low is odd and *high is even
    > std::swap( *low, *high );
    > } else {
    > break;
    > }
    > }
    > std::cerr << std::endl;
    > print_range( from, to );
    > std::cerr << std::endl;
    > // low == high;
    > if ( *low % 2 == 0 ) {
    > *low >>= 1;
    > return( has_dublicates_helper( from, low, max >> 1 )
    > ||
    > has_dublicates_helper( high+1, to, max >>1 ) );
    > } else {
    >
    > *low >>= 1;
    > return( has_dublicates_helper( from, low-1, max >> 1 )
    > ||
    > has_dublicates_helper( high, to, max >>1 ) );
    > }
    > }
    >
    > bool has_dublicates ( Uvector u_vect ) {
    > // this makes a copy
    > return( has_dublicates_helper( &u_vect[0], &u_vect[ u_vect.size()-1 ],
    > std::numeric_limits< unsigned int >::max() ) );
    > }
    >
    > int main( int argn, char ** args ){
    > Uvector a;
    > Uvector b;
    > for ( int i = 0; i < 10; ++i ) {
    > a.push_back( i );
    > b.push_back( i ); b.push_back( i );
    > }
    >
    > std::cout << has_dublicates( a )
    > << " "
    > << has_dublicates( b )
    > << "\n";
    > }
    >
    >
    > > But it changes the original vector. Can you think of a way to do it
    > > without changing the original?

    >
    > Same here, I do not see how to avoid that.


    You can for for the O(N^2) algorithm, as in the other post.
     
    Siemel Naran, Aug 31, 2004
    #14
  15. Kitty

    Kai-Uwe Bux Guest

    Siemel Naran wrote:

    > Kai-Uwe Bux <> wrote in message
    >> Siemel Naran wrote:

    >
    >> I agree that this is probably the fastest solution using comparisons.
    >> Given that the question was for vector<int>, there could be a linear time
    >> solution. Here is a (messy) draft of something like that. The idea is to
    >> do partition by exchange based on the even/odd distinction (note that
    >> dublicates have the same parity). So in one pass, we put all the even
    >> elements to the left of all the odd elements. During the same pass, shift
    >> down by one bit. Now, we return true if there is a dublicate in the left
    >> segment or in the right segment. Farther down in the recursion, we can
    >> sometimes return true just based on the length of the segment: if we have
    >> shifted so many times that all values are in [0..15], then any segment of
    >> length 17 is bound to contain dublicates. The run time for the worst case
    >> should be O( N * bit_length<int> ).

    >
    > This sounds reasonable, though I've not looked at the details
    > carefully. But with N typically equal to 32 or 64, would it be too
    > slow?



    Well, uhm, yes.

    I thought of that too: In order for ( N * bit_length<int> ) to be faster
    than O( N * log(N) ), one would have to have N > MAX_INT. That is usually a
    very unreasonable assumption.

    However, the difference to a suitably modified qsort is that this gives you
    a good *worst case*, which is still quadratic for qsort. Nonetheless, I
    found empirically that

    template < typename ConstIter >
    bool has_dublicates ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::vector< value_type > vector_type;

    vector_type v ( from, to );
    std::sort( v.begin(), v.end() );
    return( std::adjacent_find( v.begin(), v.end() ) != v.end() );
    }

    is strictly faster than the even/odd partitioning scheme; and I would
    expect that in many implementations of std::sort(), quicksort has been
    replaces by introsort or something similar, which would guarantee an

    O( N * log(N) )

    worst case, too.

    Moreover, there is one consideration that should be taken most seriously:
    What will typically be the case, dublicates or not? If the algorithm will
    most often encounter dublicate-free sequences, then it makes sense to
    optimize for proving that no dublicates occur. If typically the sequence
    will have dublicates, the algorithm should try to find those as quickly as
    possible. I realized that even/odd partitioning scheme sucks in this regard
    because dublicates are not found until towards the end. This is where the
    real strength of std::set<> based solutions lies:

    template < typename ConstIter >
    bool has_dublicates_a ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::set< value_type > set_type;
    typedef typename set_type::size_type size_type;

    size_type i = 0;
    set_type s;
    for ( ConstIter iter = from; iter != to; ++iter ) {
    ++ i;
    s.insert( *iter );
    // this test could also be done after the loop:
    if ( s.size() != i ) {
    return( true );
    }
    }
    return( false );
    }

    This flags dublicates early on. However, it is about 10 times slower in
    proving that there are no dublicates. Maybe, one could speed up the process
    by using an array of 256 sets -- one for each possible value of the
    low-order byte.


    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Aug 31, 2004
    #15
  16. Kitty

    Conrad Weyns Guest

    "Siemel Naran" <> wrote in message
    news:...
    > "Conrad Weyns" <> wrote in message news:<LMjYc.3724$g%
    > > "Siemel Naran" <> wrote in message

    >
    > > If time is not an issue, perhaps this will do?
    > >
    > > bool has_duplicates(vector<int> const& i_Ints)
    > > {
    > > for (
    > > std::vector<int>::const_iterator cit = i_Ints.begin();
    > > cit != i_Ints.end();
    > > )
    > > {
    > > int i = *cit++;
    > > std::vector<int>::const_iterator jit = cit;
    > > while (jit != i_Ints.end())
    > > if (*jit++ == i)
    > > return true;
    > > }
    > > return false;
    > > }

    >
    > Very good. Remember though to handle the special case of a vector of one

    element.

    It's allready done: jit == i_Ints.end() when size() == 1.
    I'll admit though, I never thought of checking it, so it's mere luck... :)
    /Conrad
     
    Conrad Weyns, Aug 31, 2004
    #16
  17. Kitty

    Kai-Uwe Bux Guest

    "Kitty" <No spam> wrote:

    > Hi, everyone.
    >
    > Given a vector<int>, what is the fastest way to find out whether there is
    > a repeated element in it? The result is just "true" or "false". Thanks.
    >
    > Kitty


    Sorry for the long post. I have implemented about 10 methods and some test
    code to actually find out the fastest. After playing around with the test
    code, I am of the opinon that there are only two serious contestants as far
    as worst case runtime is concerned. Their average speed appears also to be
    fine.

    a) A non-generic method based upon partition exchange: put the even numbers
    to the left of the odd ones and during the same pass shift down by one bit.
    Then recurse on the left and right segment. This is *has_dublicates_a*, and
    *has_dublicate_b* is a small modification.

    b) A generic method based on Mussers modification of quick-sort. This is
    *has_dublicates_f*.

    Both methods have a memory overhead of N and a runtime of O(N*log(N)). The
    memory-overhead is due to copying the vector beforehand. Predictably,
    method (b) has some problems on sorted sequences. The pure quick-sort idea
    implemented in has_dublicates_e will just bail out on these.

    If someone sees undefined behavior, any kind of bad code, or others ways to
    improve this, I would love to hear about it.


    Best

    Kai-Uwe Bux



    #include <vector>
    #include <set>
    #include <algorithm>
    #include <iostream>

    typedef std::vector< unsigned long > Uvector;
    typedef unsigned long u_long;

    class has_dublicates_notification {};

    // this uses an even/odd partitioning scheme:
    // (works for unsigned integer types)
    void has_dublicates_helper_a ( u_long* from, u_long* to, u_long max ) {
    // WARNING, range is closed: [from,to]
    if ( to <= from ) {
    return;
    }
    if ( max < static_cast< unsigned long >( to - from ) ) {
    throw( has_dublicates_notification() );
    }
    u_long* low = from;
    u_long* high = to;
    while ( true ) {
    #if 0
    // before inner loop optimization
    while ( ( low < high ) && ( *low % 2 == 0 ) ) {
    *low >>= 1;
    ++low;
    }
    while ( ( low < high ) && ( *high % 2 != 0 ) ) {
    *high >>= 1;
    --high;
    }
    #else
    // optimized inner loop using sentinells.
    unsigned long dummy = *high;
    *high = 1; // odd sentinel
    while ( *low % 2 == 0 ) {
    *low >>= 1;
    ++low;
    }
    *high = dummy;
    dummy = *low;
    *low = 0; // even sentinell
    while ( *high % 2 != 0 ) {
    *high >>= 1;
    --high;
    }
    *low = dummy;
    #endif
    // either ( low == high ) or ( *high is even )
    if ( low < high ) {
    // *low is odd and *high is even
    std::swap( *low, *high );
    } else {
    break;
    }
    }
    // low == high;
    if ( *low % 2 == 0 ) {
    *low >>= 1;
    has_dublicates_helper_a( from, low, max >> 1 );
    has_dublicates_helper_a( high+1, to, max >>1 );
    } else {
    *low >>= 1;
    has_dublicates_helper_a( from, low-1, max >> 1 );
    has_dublicates_helper_a( high, to, max >> 1 );
    }
    }

    template < typename ConstIter >
    bool has_dublicates_a ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::vector< value_type > vector_type;

    vector_type u_vect ( from, to );
    try {
    has_dublicates_helper_a( &u_vect[0], &u_vect[ u_vect.size()-1 ],
    std::numeric_limits< unsigned long >::max() );
    return( false );
    }
    catch ( has_dublicates_notification ) {
    return( true );
    }
    }

    // this uses an even/odd partitioning scheme:
    // (works for unsigned integer types)
    // same as _a, but not using throw
    bool has_dublicates_helper_aa ( u_long* from, u_long* to, u_long max ) {
    // WARNING, range is closed: [from,to]
    if ( to <= from ) {
    return( false );
    }
    if ( max < static_cast< unsigned long >( to - from ) ) {
    return( true );
    }
    u_long* low = from;
    u_long* high = to;
    while ( true ) {
    #if 0
    // before inner loop optimization
    while ( ( low < high ) && ( *low % 2 == 0 ) ) {
    *low >>= 1;
    ++low;
    }
    while ( ( low < high ) && ( *high % 2 != 0 ) ) {
    *high >>= 1;
    --high;
    }
    #else
    // optimized inner loop using sentinells.
    unsigned long dummy = *high;
    *high = 1; // odd sentinel
    while ( *low % 2 == 0 ) {
    *low >>= 1;
    ++low;
    }
    *high = dummy;
    dummy = *low;
    *low = 0; // even sentinell
    while ( *high % 2 != 0 ) {
    *high >>= 1;
    --high;
    }
    *low = dummy;
    #endif
    // either ( low == high ) or ( *high is even )
    if ( low < high ) {
    // *low is odd and *high is even
    std::swap( *low, *high );
    } else {
    break;
    }
    }
    // low == high;
    if ( *low % 2 == 0 ) {
    *low >>= 1;
    return(
    has_dublicates_helper_aa( from, low, max >> 1 )
    ||
    has_dublicates_helper_aa( high+1, to, max >>1 )
    );
    } else {
    *low >>= 1;
    return(
    has_dublicates_helper_aa( from, low-1, max >> 1 )
    ||
    has_dublicates_helper_aa( high, to, max >> 1 )
    );
    }
    }

    template < typename ConstIter >
    bool has_dublicates_aa ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::vector< value_type > vector_type;

    vector_type u_vect ( from, to );
    return( has_dublicates_helper_aa( &u_vect[0], &u_vect[ u_vect.size()-1 ],
    std::numeric_limits< unsigned long >::max() ) );
    }

    // this also uses the even/odd partitioning scheme,
    // but a different stop criterion.
    // (works for unsigned integer types)
    void has_dublicates_helper_b ( u_long* from, u_long* to ) {
    // WARNING, range is closed: [from,to]
    if ( to <= from ) {
    return;
    }
    // so to > from
    if ( *from == *to ) {
    throw( has_dublicates_notification() );
    }
    u_long* low = from;
    u_long* high = to;
    while ( true ) {
    #if 0
    // before inner loop optimization
    while ( ( low < high ) && ( *low % 2 == 0 ) ) {
    *low >>= 1;
    ++low;
    }
    while ( ( low < high ) && ( *high % 2 != 0 ) ) {
    *high >>= 1;
    --high;
    }
    #else
    // optimized inner loop using sentinells.
    unsigned long dummy = *high;
    *high = 1; // odd sentinel
    while ( *low % 2 == 0 ) {
    *low >>= 1;
    ++low;
    }
    *high = dummy;
    dummy = *low;
    *low = 0; // even sentinell
    while ( *high % 2 != 0 ) {
    *high >>= 1;
    --high;
    }
    *low = dummy;
    #endif
    // either ( low == high ) or ( *high is even )
    if ( low < high ) {
    // *low is odd and *high is even
    std::swap( *low, *high );
    } else {
    break;
    }
    }
    // low == high;
    if ( *low % 2 == 0 ) {
    *low >>= 1;
    has_dublicates_helper_b( from, low );
    has_dublicates_helper_b( high+1, to );
    } else {
    *low >>= 1;
    has_dublicates_helper_b( from, low-1 );
    has_dublicates_helper_b( high, to );
    }
    }

    template < typename ConstIter >
    bool has_dublicates_b ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::vector< value_type > vector_type;

    vector_type u_vect ( from, to );
    try {
    has_dublicates_helper_b( &u_vect[0], &u_vect[ u_vect.size()-1 ] );
    return( false );
    }
    catch ( has_dublicates_notification ) {
    return( true );
    }
    }

    // this is the most straight forward:
    template < typename ConstIter >
    bool has_dublicates_c ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::vector< value_type > vector_type;

    vector_type v ( from, to );
    std::sort( v.begin(), v.end() );
    return( std::adjacent_find( v.begin(), v.end() ) != v.end() );
    }

    // this modifies heap sort:
    // (unfortunately this is slow, if someone can make this fast,
    // I would love to see that.)
    template < typename ConstIter >
    bool has_dublicates_d ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::vector< value_type > vector_type;
    typedef typename vector_type::size_type size_type;
    vector_type vect ( from, to );
    /*
    | This one is a little tough to grasp.
    | We are searching for a dublicate. As long asa we do not
    | find one, we organize the initial segment as a left-slanted heap,
    | i.e., we think of the natural numbers as labelling a binary tree
    | wherein K is the parent of 2K+1 and 2K+2. The heap invariant
    | is that a root is always strictly bigger than it children and that
    | every node in the left subtree is strictly bigger than every node
    | in the right subtree.
    */
    size_type high = 1;
    size_type last = vect.size();
    while ( high < last ) {
    size_type bad_node = high;
    while ( bad_node > 0 ) {
    // THROUGHOUT THIS LOOP:
    // The tree satisfies:
    // a) left-slanted heap condition everywhere but possibly 'bad_node'
    // b) the subtree headed by bad_node is a left-slanted heap.
    // c) if 'bad_node' is a leaf, it does not have a right sibling.
    // d)
    if ( bad_node % 2 == 0 ) {
    // 'bad_node' is a right child of its parent:
    size_type left = bad_node - 1;
    size_type right_most_child_in_left = left;
    while ( right_most_child_in_left < high/2 ) {
    right_most_child_in_left = 2*right_most_child_in_left + 2;
    }
    size_type parent = left / 2;

    if ( vect[right_most_child_in_left] < vect[bad_node] ) {
    std::swap( vect[right_most_child_in_left], vect[bad_node] );
    bad_node = right_most_child_in_left;
    // we are back to a leaf position:
    } else if ( vect[right_most_child_in_left] == vect[bad_node] ) {
    // found doublicate
    return( true );
    } else {
    bad_node = parent;
    }
    } else {
    // 'bad_node' is a left child.
    size_type parent = bad_node / 2;
    if ( vect[bad_node] == vect[parent] ) {
    return( true );
    } else if ( vect[parent] < vect[bad_node] ) {
    std::swap( vect[parent], vect[bad_node] );
    }
    bad_node = parent;
    }
    }
    ++ high;
    }
    return( false );
    }

    // this modifies quicksort:
    template < typename ConstIter >
    void has_dublicates_helper_e ( ConstIter from, ConstIter to ) {
    ConstIter low = from;
    ConstIter high = to;
    while ( low < high ) {
    while ( *low < *high ) {
    ++low;
    }
    if ( low < high ) {
    if ( *low == *high ) {
    throw( has_dublicates_notification() );
    }
    std::swap( *low, *high );
    }
    while ( *low < *high ) {
    --high;
    }
    if ( low < high ) {
    if ( *low == *high ) {
    throw( has_dublicates_notification() );
    }
    std::swap( *low, *high );
    }
    }
    if ( from < low ) {
    has_dublicates_helper_e( from, low-1 );
    }
    if ( high < to ) {
    has_dublicates_helper_e( high+1, to );
    }
    }

    template < typename ConstIter >
    bool has_dublicates_e ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::vector< value_type > vector_type;

    vector_type vect ( from, to );
    try {
    if ( vect.size() > 1 ) {
    has_dublicates_helper_e( vect.begin(), vect.end() - 1 );
    }
    return( false );
    }
    catch ( has_dublicates_notification ) {
    return( true );
    }
    }

    // this modifies quicksort:
    // (same as _e, but not using throw)
    template < typename ConstIter >
    bool has_dublicates_helper_ee ( ConstIter from, ConstIter to ) {
    ConstIter low = from;
    ConstIter high = to;
    while ( low < high ) {
    while ( *low < *high ) {
    ++low;
    }
    if ( low < high ) {
    if ( *low == *high ) {
    return( true );
    }
    std::swap( *low, *high );
    }
    while ( *low < *high ) {
    --high;
    }
    if ( low < high ) {
    if ( *low == *high ) {
    return( true );
    }
    std::swap( *low, *high );
    }
    }
    if ( ( from < low ) && has_dublicates_helper_ee( from, low-1 ) ) {
    return( true );
    }
    if ( ( high < to ) && has_dublicates_helper_ee( high+1, to ) ) {
    return( true );
    }
    return( false );
    }

    template < typename ConstIter >
    bool has_dublicates_ee ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::vector< value_type > vector_type;

    vector_type vect ( from, to );
    return( has_dublicates_helper_ee( vect.begin(), vect.end() - 1 ) );
    }

    // this modifies introsort:
    // (guaranteed O( N *log(N) )
    template < typename ConstIter >
    void has_dublicates_helper_f ( ConstIter from, ConstIter to,
    unsigned short depth ) {
    // WARNING: the range is [from,to].
    if ( depth == 0 ) {
    // Mussers self-inspection trick to ensure O( N * log(N) ).
    std::stable_sort( from, to + 1 );
    if ( std::adjacent_find( from, to+1 ) != to+1 ) {
    throw( has_dublicates_notification() );
    }
    return;
    }
    ConstIter low = from;
    ConstIter high = to;
    while ( low < high ) {
    while ( *low < *high ) {
    ++low;
    }
    if ( low < high ) {
    if ( *low == *high ) {
    throw( has_dublicates_notification() );
    }
    std::swap( *low, *high );
    }
    while ( *low < *high ) {
    --high;
    }
    if ( low < high ) {
    if ( *low == *high ) {
    throw( has_dublicates_notification() );
    }
    std::swap( *low, *high );
    }
    }
    if ( from < low ) {
    has_dublicates_helper_f( from, low-1, depth - 1 );
    }
    if ( high < to ) {
    has_dublicates_helper_f( high+1, to, depth - 1 );
    }
    }

    template < typename ConstIter >
    bool has_dublicates_f ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::vector< value_type > vector_type;

    vector_type vect ( from, to );
    unsigned short log = 0;
    typename vector_type::size_type v_size = vect.size();
    while ( v_size > 0 ) {
    ++ log;
    v_size >>=1;
    }
    try {
    has_dublicates_helper_f( vect.begin(), vect.end() - 1, 2*log );
    return( false );
    }
    catch ( has_dublicates_notification ) {
    return( true );
    }
    }

    // this modifies introsort:
    // (guaranteed O( N *log(N) )
    // (same as _f, but not using throw)
    template < typename ConstIter >
    bool has_dublicates_helper_ff ( ConstIter from, ConstIter to,
    unsigned short depth ) {
    // WARNING: the range is [from,to].
    if ( depth == 0 ) {
    // Mussers self-inspection trick to ensure O( N * log(N) ).
    std::stable_sort( from, to + 1 );
    return ( std::adjacent_find( from, to+1 ) != to+1 );
    }
    ConstIter low = from;
    ConstIter high = to;
    while ( low < high ) {
    while ( *low < *high ) {
    ++low;
    }
    if ( low < high ) {
    if ( *low == *high ) {
    return( true );
    }
    std::swap( *low, *high );
    }
    while ( *low < *high ) {
    --high;
    }
    if ( low < high ) {
    if ( *low == *high ) {
    return( true );
    }
    std::swap( *low, *high );
    }
    }
    if ( ( from < low ) && has_dublicates_helper_ff( from, low-1, depth - 1
    ) ) {
    return( true );
    }
    if ( ( high < to ) && has_dublicates_helper_ff( high+1, to, depth - 1 ) )
    {
    return( true );
    }
    return( false );
    }

    template < typename ConstIter >
    bool has_dublicates_ff ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::vector< value_type > vector_type;

    vector_type vect ( from, to );
    unsigned short log = 0;
    typename vector_type::size_type v_size = vect.size();
    while ( v_size > 0 ) {
    ++ log;
    v_size >>=1;
    }
    return( has_dublicates_helper_ff( vect.begin(), vect.end() - 1, 2*log )
    );
    }

    // this uses a set, but only checks at the very end:
    // (the worst of all O( N * log( N ) )
    template < typename ConstIter >
    bool has_dublicates_g ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::set< value_type > set_type;
    typedef typename set_type::size_type size_type;

    size_type i = 0;
    set_type s;
    for ( ConstIter iter = from; iter != to; ++iter ) {
    ++ i;
    s.insert( *iter );
    }
    if ( s.size() != i ) {
    return( true );
    }
    return( false );
    }

    // this uses a has table of sets:
    // (implemented for unsigned integer types only)
    template < typename ConstIter >
    bool has_dublicates_h ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::set< value_type > set_type;
    typedef typename set_type::size_type size_type;
    typedef set_type set_array [ 256 ];
    typedef size_type size_array [ 256 ];

    size_array sizes;
    set_array sets;
    for ( unsigned long i = 0; i < 256; ++i ) {
    sizes = 0;
    }
    for ( ConstIter iter = from; iter != to; ++iter ) {
    unsigned int hash = *iter % 256;
    ++sizes[hash];
    sets[hash].insert( *iter );
    if ( sizes[hash] != sets[hash].size() ) {
    return( true );
    }
    }
    return( false );
    }

    // this one is quadratic:
    // but does not need to copy the range
    template < typename ConstIter >
    bool has_dublicates_i ( ConstIter from, ConstIter to ) {
    for ( ConstIter high = from; high < to; ++high ) {
    for ( ConstIter low = from; low < high; ++low ) {
    if ( *low == *high ) {
    return( true );
    }
    }
    }
    return( false );
    }

    // this uses a set:
    template < typename ConstIter >
    bool has_dublicates_j ( ConstIter from, ConstIter to ) {
    typedef typename std::iterator_traits< ConstIter >::value_type value_type;
    typedef typename std::set< value_type > set_type;
    typedef typename set_type::size_type size_type;

    size_type i = 0;
    set_type s;
    for ( ConstIter iter = from; iter != to; ++iter ) {
    ++ i;
    s.insert( *iter );
    // this test could also be done after the loop:
    if ( s.size() != i ) {
    return( true );
    }
    }
    return( false );
    }

    #include <ctime>
    #include <cstdlib>

    class RandomNumberGenerator {
    public:

    RandomNumberGenerator ( unsigned long int seed )
    {
    std::srand( seed );
    }

    RandomNumberGenerator ( void )
    {}

    unsigned long lower ( void ) const {
    return ( 0 );
    }

    unsigned long upper ( void ) const {
    return ( RAND_MAX );
    }

    unsigned long operator() ( void ) {
    return( std::rand() );
    }

    unsigned long int operator() ( unsigned long int bound ) {
    unsigned long int_width = RAND_MAX / bound;
    unsigned long max_valid = int_width * bound;
    unsigned long seed;
    do {
    seed = std::rand();
    } while ( seed >= max_valid );
    return( seed / int_width );
    }

    }; // RandomNumberGeneratorBase

    typedef std::vector< unsigned long > u_vector;

    void print_measurements ( const u_vector & vect ) {
    {
    std::clock_t start = std::clock();
    std::cout << "_a: " << has_dublicates_a( vect.begin(), vect.end() ) <<
    " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    {
    std::clock_t start = std::clock();
    std::cout << "_aa: " << has_dublicates_aa( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    {
    std::clock_t start = std::clock();
    std::cout << "_b: " << has_dublicates_b( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    /*
    {
    std::clock_t start = std::clock();
    std::cout << "_c: " << has_dublicates_c( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    */
    /*
    {
    std::clock_t start = std::clock();
    std::cout << "_d: " << has_dublicates_d( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    */
    /*
    {
    std::clock_t start = std::clock();
    std::cout << "_e: " << has_dublicates_e( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    {
    std::clock_t start = std::clock();
    std::cout << "_ee: " << has_dublicates_ee( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    */
    {
    std::clock_t start = std::clock();
    std::cout << "_f: " << has_dublicates_f( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    {
    std::clock_t start = std::clock();
    std::cout << "_ff: " << has_dublicates_ff( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    /*
    {
    std::clock_t start = std::clock();
    std::cout << "_g: " << has_dublicates_g( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    */
    /*
    {
    std::clock_t start = std::clock();
    std::cout << "_h: " << has_dublicates_h( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    */
    /*
    {
    std::clock_t start = std::clock();
    std::cout << "_i: " << has_dublicates_i( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    */
    /*
    {
    std::clock_t start = std::clock();
    std::cout << "_i: " << has_dublicates_j( vect.begin(), vect.end() )
    << " ";
    std::clock_t stop = std::clock();
    std::cout << stop - start << "\n";
    }
    */
    }

    int main( int argn, char ** args ){
    u_vector vect;
    unsigned long size = std::atoi( args[1] );
    unsigned long max = std::atoi( args[2] ) + 1;
    unsigned long seed = std::atoi( args[3] );
    RandomNumberGenerator R ( seed );
    for ( unsigned long i = 0; i < size; ++i ) {
    vect.push_back( R(max) );
    }
    std::sort ( vect.begin(), vect.end() );
    std::cout << "random, sorte up:\n";
    print_measurements( vect );

    std::reverse( vect.begin(), vect.end() );
    std::cout << "random, sorted down:\n";
    print_measurements( vect );

    std::random_shuffle( vect.begin(), vect.end() );
    std::cout << "random:\n";
    print_measurements( vect );

    vect.clear();
    for ( unsigned long i = 0; i < size; ++i ) {
    vect.push_back( i );
    }
    std::cout << "unique, sorted up\n";
    print_measurements( vect );

    std::reverse( vect.begin(), vect.end() );
    std::cout << "unique, sorted down:\n";
    print_measurements( vect );

    std::random_shuffle( vect.begin(), vect.end() );
    std::cout << "unique, random:\n";
    print_measurements( vect );
    }

    /*
    usage:

    a.out <length> <random_number_range> <random_number_seed>

    */
     
    Kai-Uwe Bux, Sep 1, 2004
    #17
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. BCC
    Replies:
    12
    Views:
    844
    Clark Cox
    Feb 6, 2004
  2. pmatos
    Replies:
    6
    Views:
    24,031
  3. Replies:
    8
    Views:
    1,987
    Csaba
    Feb 18, 2006
  4. Javier
    Replies:
    2
    Views:
    603
    James Kanze
    Sep 4, 2007
  5. Rushikesh Joshi
    Replies:
    0
    Views:
    387
    Rushikesh Joshi
    Jul 10, 2004
Loading...

Share This Page