Finding most frequent element in std::vector?


Steve555

Hi,

Given a vector, I'm trying to find the most frequently occurring
element, and return it, along with a count of its frequency.

Although I've got a way of doing it below, this gets called millions
of times in an inner loop and I need to speed it up if possible.

I'm looping through each element of the vector, and using std::count
to count its frequency:

long FindMostFrequentInVector(vector<long> *inVec, long *maxElement)
{
        long c, maxFreq=0, e;
        for(long i=0; i<inVec->size(); i++)
        {
                e = (*inVec)[i];
                c = count(inVec->begin(), inVec->end(), e);
                if(c > maxFreq)
                {
                        *maxElement = e;
                        maxFreq = c;
                }
        }
        return maxFreq;
}

I can inline it, but for now I'd prefer to keep it in its own
function; it's the logic I'm more interested in.
For my application, if it makes a difference, there are known
constraints:
1) inVec->size() will _never_ be greater than 10,
2) the elements' values will _always_ be between 0 and 200.
3) it doesn't matter that there may be 2 or more elements with equal
maximum frequency; any one will do.
(in the above example, the last one found will be the one returned.)


On a separate note, when passing arrays or vectors into a function,
am I right in assuming it's faster to pass a pointer and dereference it
with (*inVec) each time?

Thanks

Steve
 

Murali

[snip - original post quoted in full]


Hi,

You can take advantage of the second constraint:

(i) Create a vector freqVec of size 201 and initialize all to 0.
(ii) Traverse over inVec, and increase the frequency of each value by
doing ++freqVec[(*inVec)[i]]
(iii) After the second step, freqVec[i] is the frequency of value i (i
from 0 to 200)
(iv) Find the max value in freqVec to get the most frequently occurring
number
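
Putting those four steps into code might look like the sketch below
(function name illustrative; it assumes the same using-namespace-std
context as the original, and values in 0..200):

long FindMostFrequentCounting(const vector<long> &inVec, long *maxElement)
{
    vector<long> freqVec(201, 0);               // (i) 201 counters, all zero
    for (long i = 0; i < inVec.size(); i++)     // (ii) tally each value
        ++freqVec[inVec[i]];
    long maxFreq = 0;                           // (iv) scan for the largest count
    for (long v = 0; v <= 200; v++)
    {
        if (freqVec[v] > maxFreq)               // (iii) freqVec[v] is the frequency of v
        {
            maxFreq = freqVec[v];
            *maxElement = v;
        }
    }
    return maxFreq;
}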

--Murali
 

P. Lepin

[snip code]
1) inVec->size() will _never_ be greater than 10,
2) the elements' values will _always_ be between 0 and 200.
3) it doesn't matter that there may be 2 or more elements with equal
maximum frequency; any one will do.
(in the above example, the last one found will be the one returned.)

(i) Create a vector freqVec of size 201 and initialize all to 0.
(ii) Traverse over inVec, and increase the frequency of each value by
doing ++freqVec[(*inVec)[i]]
(iii) After the second step, freqVec[i] is the frequency of value i (i
from 0 to 200)
(iv) Find the max value in freqVec to get the most frequently occurring
number


You don't need to do (ii) and (iv) separately.

For that matter, max over 200 elements would be prohibitively expensive
compared to any operations on a 10-element vector - and I guess the same
applies to re-initialization of 200 * sizeof(long) bytes millions of times
in an inner loop? Perhaps std::map would work better, despite more
expensive lookups?
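
For what it's worth, a minimal sketch of the std::map variant being
floated here (function name illustrative):

#include <map>
#include <vector>

long FindMostFrequentMap(const std::vector<long> &inVec, long *maxElement)
{
    std::map<long, long> freq;                  // value -> occurrence count
    long maxFreq = 0;
    for (std::vector<long>::const_iterator it = inVec.begin();
         it != inVec.end(); ++it)
    {
        long c = ++freq[*it];                   // operator[] default-constructs to 0
        if (c > maxFreq)
        {
            maxFreq = c;
            *maxElement = *it;
        }
    }
    return maxFreq;
}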

OP should try and measure IMHO.
 

Michael Doubez

Given a vector, I'm trying to find the most frequently occurring
element, and return it, along with a count of its frequency.

Although I've got a way of doing it below, this gets called millions
of times in an inner loop and I need to speed it up if possible.

I'm looping through each element of the vector, and using std::count
to count its frequency:

You do realize that your algorithm is quadratic. Within a loop, it is
a killer.
long    FindMostFrequentInVector(vector<long> *inVec, long *maxElement)
{
        long c, maxFreq=0, e;
        for(long i=0; i<inVec->size(); i++)
        {
                e = (*inVec)[i];
                c = count(inVec->begin(), inVec->end(), e);
                if(c > maxFreq)
                {
                        *maxElement = e;
                        maxFreq = c;
                }
        }
        return maxFreq;
}

I can inline it, but for now I'd prefer to keep it in its own
function; it's the logic I'm more interested in.
For my application, if it makes a difference,  there are known
constraints:
1) inVec->size() will _never_ be greater than 10,
2) the elements' values will _always_ be between 0 and 200.
3) it doesn't matter that there may be 2 or more elements with equal
maximum frequency; any one will do.
(in the above example, the last one found will be the one returned.)


There are a lot of possible algorithms, but this looks too much like
homework to give you a plain answer.
On a separate note, when passing arrays or vectors into a function,
am I right in assuming it's faster to pass a pointer and dereference it
with (*inVec) each time?


You mean: is it more efficient to pass by value than by reference?
I guess it is in this case.
 

Michael Doubez

On a separate note, when passing arrays or vectors into a function,
am I right in assuming it's faster to pass a pointer and dereference it
with (*inVec) each time?


You mean: is it more efficient to pass by value than by reference?
I guess it is in this case.


I meant: would it be more efficient to pass by reference.

Well, a benchmark will tell.
 

Bart van Ingen Schenau

Hi,

Given a vector, I'm trying to find the most frequently occurring
element, and return it, along with a count of its frequency.

Although I've got a way of doing it below, this gets called millions
of times in an inner loop and I need to speed it up if possible.

I'm looping through each element of the vector, and using std::count
to count its frequency:

long    FindMostFrequentInVector(vector<long> *inVec, long *maxElement)
{
        long c, maxFreq=0, e;
        for(long i=0; i<inVec->size(); i++)
        {
                e = (*inVec)[i];
                c = count(inVec->begin(), inVec->end(), e);


Calling count(begin, end) for each element in the vector is about the
most inefficient algorithm that you can have, for two reasons:
1. All elements between begin and the current item are either
different (in which case letting count iterate over them is a waste of
time), or the same (in which case you just repeat the work done for
the first occurrence of this value)
2. If an element has a frequency of N occurrences, you count that
frequency N times.
                if(c > maxFreq)
                {
                        *maxElement = e;
                        maxFreq = c;
                }
        }
        return maxFreq;

}

I can inline it, but for now I'd prefer to keep it in its own
function; it's the logic I'm more interested in.
For my application, if it makes a difference,  there are known
constraints:
1) inVec->size() will _never_ be greater than 10,
2) the elements' values will _always_ be between 0 and 200.
3) it doesn't matter that there may be 2 or more elements with equal
maximum frequency; any one will do.
(in the above example, the last one found will be the one returned.)

This certainly makes a difference.
Here is a rewrite of your function, using the second constraint in
particular to allocate an auxiliary buffer (to avoid re-counting
values that we have already processed)

#define MAX_VALUE 200
long FindMostFrequentInVector(const vector<long> &inVec, long *maxElement)
{
    long c, maxFreq = 0, e;
    bool value_processed[MAX_VALUE+1] = { 0 };

    for (vector<long>::const_iterator i = inVec.begin(); i != inVec.end(); i++)
    {
        e = *i;
        if ( (e > MAX_VALUE) || (!value_processed[e]) )
        {
            c = count(i, inVec.end(), e);
            if (e <= MAX_VALUE)
            {
                value_processed[e] = true;
            }
            if (c > maxFreq)
            {
                *maxElement = e;
                maxFreq = c;
            }
        }
        return maxFreq;
    }
}

This function is most efficient for values smaller than MAX_VALUE,
because then it counts only the first occurrence of each value. For
larger values, later occurrences are counted again, but the first
occurrence will always give the largest count (because only the
occurrences in the remainder of the vector are counted).
You can play with the value of MAX_VALUE to get a different speed/
space trade-off.
On a separate note, when passing arrays or vectors into a function,
am I right in assuming it's faster to pass a pointer and dereference it
with (*inVec) each time?


Passing a pointer will be faster than passing by value, but it will
not be faster than passing by reference (as I have done in my
rewrite).
The advantage of passing a reference is that you get the notational
convenience of a normal object (no need to dereference each time),
combined with the efficiency of passing a pointer.
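
A side-by-side of the three signatures makes the point (illustrative
declarations only):

#include <vector>

void byValue(std::vector<long> v);              // copies the whole vector
void byPointer(std::vector<long> *v);           // no copy, but (*v) / v-> syntax everywhere
void byReference(const std::vector<long> &v);   // no copy, and plain object syntax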

Bart v Ingen Schenau
 

Kai-Uwe Bux

Steve555 said:
Hi,

Given a vector, I'm trying to find the most frequently occurring
element, and return it, along with a count of its frequency.

Although I've got a way of doing it below, this gets called millions
of times in an inner loop and I need to speed it up if possible.
[quadratic time algorithm snipped]

a) You get asymptotically better performance if you do something like this:

1) sort the vector.
2) do one pass over the vector to count the frequencies;
while doing so, keep track of the maximum frequency.

I can inline it, but for now I'd prefer to keep it in its own
function; it's the logic I'm more interested in.
For my application, if it makes a difference, there are known
constraints:
1) inVec->size() will _never_ be greater than 10,
2) the elements' values will _always_ be between 0 and 200.
3) it doesn't matter that there may be 2 or more elements with equal
maximum frequency; any one will do.
(in the above example, the last one found will be the one returned.)

Another interesting thing to know would be whether the vectors, for which
you want to determine the maximum frequency, are related. You say that you
need to do this for really many vectors and that it happens in an inner
loop. If, from one iteration to the next, the vector changes only slightly,
it might be possible to take advantage of that. E.g., one could think of a
data structure that keeps track of frequencies and updates the maximum upon
change. The cost for such an update could be really small -- much much
smaller than recomputing the frequencies from scratch.
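
A hypothetical sketch of such a structure, assuming the stated 0..200
value range (all names illustrative). The key observation: an increment
can only raise the maximum, so only removals ever force a rescan:

#include <vector>

class FrequencyTracker
{
    long counts_[201];
    long maxFreq_, maxElem_;

    void rescan()                       // only needed after a removal
    {
        maxFreq_ = 0;
        for (long v = 0; v <= 200; ++v)
            if (counts_[v] > maxFreq_) { maxFreq_ = counts_[v]; maxElem_ = v; }
    }

public:
    explicit FrequencyTracker(const std::vector<long> &vec)
        : maxFreq_(0), maxElem_(-1)
    {
        for (long v = 0; v <= 200; ++v) counts_[v] = 0;
        for (std::vector<long>::size_type i = 0; i < vec.size(); ++i)
            add(vec[i]);
    }
    void add(long v)                    // an increment can only raise the max
    {
        if (++counts_[v] > maxFreq_) { maxFreq_ = counts_[v]; maxElem_ = v; }
    }
    void remove(long v)                 // a decrement may lower the max
    {
        --counts_[v];
        if (v == maxElem_) rescan();
    }
    long maxFreq() const { return maxFreq_; }
    long maxElement() const { return maxElem_; }
};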

On a separate note, when passing arrays or vectors into a function,
am I right in assuming it's faster to pass a pointer and dereference it
with (*inVec) each time?


Faster than _what_?


Best

Kai-Uwe Bux
 

Steve555

[snip - Bart's reply quoted in full]


Thanks for the many suggestions, using various ideas from each
(especially using freqArray) I've been able to reduce the time for 1
million runs from 1.91 seconds down to 0.69, so a huge improvement.
(I used the code directly in the end, rather than in a separate
function, but both times reflect this)


// assuming vec already exists
long *ptr, *freqArray = new long[201];
long maxElement, maxFreq;
for(long l=0; l<1000000; l++)
{
    // clear the freqArray (using a ptr was quicker than [])
    ptr = freqArray;
    for(long j=0; j<=200; j++) *ptr++ = 0;
    maxFreq = 0;
    for(long i=0; i<vec.size(); i++)
    {
        ++freqArray[vec[i]];
        if(freqArray[vec[i]] > maxFreq)
        {
            maxElement = vec[i];
            maxFreq = freqArray[vec[i]];
        }
    }
}

P. Lepin: In this case, map worked out slower, probably the lookup
time.

Michael: Sorry, I don't know what you mean by "quadratic" in this
context. And at 49 1/4, I'm a bit old for homework ;-) Before I took
it out of its own function, passing by reference made a good
difference.

Stuart: "Each time, reinitialize only the (at most) ten elements
referred to in
the array". Sounds promising, but how do I know which elements to
reset without building yet another array to keep note of them?

Bart: Like the idea of " c = count(i, inVec.end(), e); " for only
counting the remaining range, but for some reason, although maxElement
is correct, maxFreq is always 1?

Many thanks

Steve
 

Andrew Poelstra

Steve555 said:
Thanks for the many suggestions, using various ideas from each
(especially using freqArray) I've been able to reduce the time for 1
million runs from 1.91 seconds down to 0.69, so a huge improvement.
(I used the code directly in the end, rather than in a separate
function, but both times reflect this)


// assuming vec already exists
long *ptr, *freqArray = new long[201];
long maxElement, maxFreq;
for(long l=0;l<1000000;l++)

Ugh. Why would you use l as a variable?
{
//clear the freqArray (using a ptr was quicker than [])
ptr = freqArray;
for(long j=0; j<=200; j++) *ptr++ = 0;

Why not just use memset? It would be faster and clearer. Plus
you could drop the pointer. (In C you could just use calloc
initially to get zeroed memory.)
maxFreq = 0;
for(long i=0; i<vec.size(); i++)

You might want to cache the vector size. Probably the
compiler can do this for you, but maybe not.
{
    ++freqArray[vec[i]];
    if(freqArray[vec[i]] > maxFreq)
    {
        maxElement = vec[i];
        maxFreq = freqArray[vec[i]];
    }
}
}

P. Lepin: In this case, map worked out slower, probably the lookup
time.


Well, map also insists on sorting its elements. You could write
your own hash table using linked lists and % as a hash function
and get far better performance.
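
A hypothetical sketch of that idea, using v % NBUCKETS as the hash and
array-backed chains (all names and sizes illustrative; with at most 10
elements the chains stay very short):

#include <vector>

long FindMostFrequentHashed(const std::vector<long> &inVec, long *maxElement)
{
    const int NBUCKETS = 16;
    struct Node { long value; long count; int next; }; // next: pool index, -1 ends a chain
    Node pool[10];                  // at most 10 elements, so at most 10 distinct values
    int head[NBUCKETS];
    for (int b = 0; b < NBUCKETS; ++b) head[b] = -1;
    int used = 0;
    long maxFreq = 0;

    for (std::vector<long>::size_type i = 0; i < inVec.size(); ++i)
    {
        long v = inVec[i];
        int b = static_cast<int>(v % NBUCKETS);
        int n = head[b];
        while (n != -1 && pool[n].value != v)   // walk this bucket's chain
            n = pool[n].next;
        if (n == -1)                            // first occurrence: link in a new node
        {
            n = used++;
            pool[n].value = v;
            pool[n].count = 0;
            pool[n].next  = head[b];
            head[b] = n;
        }
        if (++pool[n].count > maxFreq)
        {
            maxFreq = pool[n].count;
            *maxElement = v;
        }
    }
    return maxFreq;
}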
 

Andrew Poelstra

Steve555 said:
[snip code]


As the size is fixed you *might* see a performance improvement if you make
freqArray an ordinary array on the stack (rather than newing it) but don't
tell James Kanze as he will likely start a pointless tiresome argument.


s/might/probably

Not only will he avoid any overhead caused by allocating memory
from the system (plus any "constructor" the compiler could tack
onto long[]), stack space is far more likely to be in processor
cache and therefore be quicker to access.

It also lets the compiler use CPU registers (though I can't see
why in this case), something that is often nearly impossible to
do with pointers.

Having said that, compilers are smart and CPUs are various. So
take this with a grain of salt.
 

Michael Doubez

[snip]

Thanks for the many suggestions, using various ideas from each
(especially using freqArray) I've been able to reduce the time for 1
million runs from 1.91 seconds down to 0.69, so a huge improvement.
(I used the code directly in the end, rather than in a separate
function, but both times reflect this)

[snip code]

P. Lepin:  In this case, map worked out slower, probably the lookup
time.


I guess it is the locality and the dynamic allocation.
Michael: Sorry don't know what you mean by "quadratic" in this
context.

Quadratic means that the complexity is proportional to the square of
the number of elements: your inner std::count walks the whole vector
once for every element, so 10 elements cost roughly 10 x 10 = 100
steps where a single pass would cost 10.

It is about as bad as it gets, short of exponential or NP-complete
complexity.
And at 49 1/4, I'm a bit old for homework ;-)

Never too late.

[snip]

Out of sheer curiosity, could you try the following algorithm:

long FindMostFrequentInVector(vector<long> *inVec, long *maxElement)
{
    vector<long>::iterator it  = inVec->begin();
    vector<long>::iterator end = inVec->end();

    // sort inVec - best would be for it to be already sorted
    std::sort(it, end);

    long maxCount = 0;

    // assert( !inVec->empty() );
    long count   = 1;
    long element = *it;
    while( ++it != end )
    {
        if( *it == element )
        { // count current element
            ++count;
        }
        else
        { // new element value
            if( count > maxCount )
            {
                maxCount    = count;
                *maxElement = element;
            }
            element = *it;
            count   = 1;
        }
    }
    // the last run still has to be compared
    if( count > maxCount )
    {
        maxCount    = count;
        *maxElement = element;
    }

    return maxCount;
}
 

Steve555

[snip]


Using the stack for freqArray instead of new reduced the time from
.69 to .61

Kai-Uwe: Thanks:

long maxElement, maxFreq, num, lastVal=-1, freq;
for(long l=0; l<1000000; l++)
{
    sort(vec.begin(), vec.end());
    maxFreq = 0;
    for(long i=0; i<vec.size(); i++)
    {
        if(vec[i] != lastVal)
        {
            freq = 1;
            lastVal = vec[i];
        }
        else
            freq++;
        if(freq > maxFreq)
        {
            maxElement = vec[i];
            maxFreq = freq;
            if(vec.size() - i <= maxFreq)
                break;
        }
    }
}
Reduced the time by about 20% down from .69 to .54

"Another interesting thing to know would be whether the vectors, for
which
you want to determine the maximum frequency, are related."
No, they're coming from tables and are not related.

Stuart:" http://en.wikipedia.org/wiki/Big_O_notation" Thanks. Got it!
As for zeroing the individual elements, memset() is working incredibly
well instead.

Andrew: "Ugh. Why would you use l as a variable? " lol, what do have
against ls!?!?!
More to the point using memset(freqArray, 0, 804); has got it down
from .69 to an incredible .22! Thanks!

Steve
 

Andrew Poelstra

Using the stack for freqArray instead of new reduced the time from
.69 to .61

Out of curiosity, how much performance do you actually need?
If you really need to run this function millions of times (and
profiling shows that this function is indeed the bottleneck)
you should probably create your own collection that keeps
track of maximally-occurring elements.

A map of vectors comes to mind.

While in this case it's fine, because in general you want
to avoid heap allocations if you can, we're into the realm
of micro-optimisations here, which are normally a Bad Thing.
Andrew: "Ugh. Why would you use l as a variable? " lol, what do have
against ls!?!?!
More to the point using memset(freqArray, 0, 804); has got it down
from .69 to an incredible .22! Thanks!

!! this is why profiling is your friend. Turns out the actual
/counting/ stuff was pretty much irrelevant to the overall
performance of the algorithm.

But don't assume sizeof(long) is 4. That's a non-portability
that will screw you over in ways you can't predict.

Instead, use:

  long count[201];
  memset(count, 0, sizeof count);

Which will work no matter what long is, /and/ no matter what
count is. :)
 

Steve555

[snip]

"Out of curiousity, how much performance do you actually need? "

This is enough!

"you should probably create your own collection that keeps
track of maximally-occurring elements.
A map of vectors comes to mind. "

The permutations of different vectors, while not quite 200^10, are
still too huge for maps. (The data represents lexical categories of
chains of 10 English words.)
 

Andrew Poelstra

I missed this one:

The number one and letter ell look nearly identical. Even
in context, it's confusing to use it as a variable name.
 

Steve555

[snip - earlier posts quoted in full]


Thanks Michael. Running your algorithm (explicitly rather than in a
separate function) gave a time of .51 (compared to original .69 for 1
million loops). I've found (on my system at least) that iterators
carry some overhead compared to simple array indexing. Hence, using an
array buffer to keep track of the counts instead, along with memset()
to quickly clear it, brought the time down to .22.

What's also quite interesting about all the different methods
suggested is the amount of variation from run to run. The ones where I
use iterators or stl algorithms are very consistent (if a little
slower), whereas the best solution (using a freqArray and memset() )
can vary by as much as 60 ms (~25%). That said, it's a bit of an
artificial situation, loading a program, setting timers, looping a
million times and exiting... maybe there's some system crap causing
the variability. Anyway, not a problem, just a curiosity.
(I'm compiling in Xcode on OS X 10.6.2 on a 2.8 GHz Intel Mac, timing
done in microseconds, rounded)

Steve
 

Joshua Maurice

[snip - earlier posts quoted in full]


First, you are in one of the rare situations where std::vector is
legitimately significantly slower than built-in arrays.
vector<int> x(10);
will initialize all of its 10 elements to 0. A built-in array
int x[10];
will not. This is usually not a concern, but in your case it might be.
Zero initializing all the bins of the bin sort might be quite costly.

Second, if you're using Microsoft's Visual Studio compiler, consider
turning off checked iterators. In release mode, VS includes some extra
run-time checking in its STL iterators.
http://msdn.microsoft.com/en-us/library/aa985965(VS.80).aspx
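
(If memory serves, for the VS2005/2008-era compilers that means
defining _SECURE_SCL to 0 before any standard header, and identically
in every translation unit:)

// disable checked iterators (VS2005/2008-era macro)
#define _SECURE_SCL 0
#include <vector>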

Otherwise, with a halfway decent optimizing compiler with its
optimizations turned on, there should be no noticeable difference
between built-in arrays and std::vectors.

I would suggest a basic bin sort, or some variation:
- Create a built-in array of bins for the bin sort. (Do not initialize
it. We're trying to save this cost.)
- For each element in the input vector, zero the corresponding bin.
- Loop over the input vector, keeping track of the "best thus far"
value and "number of times seen" for the "best thus far" value. For
each element, increment the corresponding bin, and then update if
necessary the "best thus far" counters.

This has 2 loops of 10 iterations each which is almost certainly
faster than the algorithm with a loop of 200 iterations. However,
std::memset may be significantly faster than a loop of 200 iterations,
so it might be faster to memset all 200 bins to 0 instead of looping
an additional time up front to zero out the 1-10 "will be used" bins.
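
As a concrete sketch of that two-pass bin idea (hypothetical name;
assumes values in 0..200):

#include <vector>

long FindMostFrequentTwoPass(const std::vector<long> &vec, long *maxElement)
{
    long bins[201];                            // deliberately NOT pre-initialized

    for (std::vector<long>::size_type i = 0; i < vec.size(); ++i)
        bins[vec[i]] = 0;                      // pass 1: zero only the bins we will touch

    long maxFreq = 0;
    for (std::vector<long>::size_type i = 0; i < vec.size(); ++i)
    {
        long v = vec[i];
        if (++bins[v] > maxFreq)               // pass 2: count, tracking the best so far
        {
            maxFreq = bins[v];
            *maxElement = v;
        }
    }
    return maxFreq;
}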

We traverse the input vector in increasing address order, which caches
tend to like. However, we are hitting the bins in a random order,
which isn't too cache friendly, though I would guess that it's still
faster than our alternatives. Possible alternatives: You might be able
to use a hash table for the bins, though I suspect the extra overhead
of "hashing" would outweigh any benefits, and there's always those
pesky problems of getting a good hash function and the potential
really slow worst case. You might try having a map of pairs of (value,
frequency), or more specifically a sorted array of pair<int, int>. I
suspect that the caching effects with the small input size would
easily favor the sorted built-in array over "Big O"-superior std::map.
A linear search on the sorted array might even outperform a binary
search on the sorted array due to its small size. Hell, it might even
be faster to not sort at all and just do a linear lookup in the bins
for each of the 10 elements of the input vector.

Note specifically that many of the alternatives presented fly in the
face of conventional Big O analysis. However, you specifically said
that your algorithm has these fixed /small/ values, and Big O only
applies when the size is "large enough". Note however that this is not
an excuse to code yourself into a corner if at all possible, so if and
when the range of values becomes "any float", or the input size
becomes "several million", you are not screwed.

As always, the basic "guideline" of optimization is to first write
code "sanely", then measure to find the slow spots, then optimize /
hack the slow spots, measuring all alternatives. I've listed a lot of
alternatives, and I really don't know which would perform the best.
I'd guess that it depends a lot on the hardware and distribution of
input values.
 

Joshua Maurice

[snip - earlier posts quoted in full]


Entertainingly enough, on my Windows desktop, my guesses were spot on
(from my quickie testing). I coded up quickie implementations of
each, and barring mistakes on my part, it seems that your best bet on
a common Windows desktop is a simple bin sort with two passes over
the input data, the first to zero out the "to be used" bins, and the
second pass to compute the most frequent element (or one of the best
if there's a tie). I'm sure there's plenty of variables I'm not
controlling for, but I think it's a somewhat fair test of the OP's use
case.

In case I made a mistake somewhere, here are my results and code,
results first, when run with standard optimization options, including
checked iterators off:

starting tests
get_most_occuring_element_linear_map_lookup 1 seconds, 296 milliseconds
get_most_occuring_element_binary_map_lookup 2 seconds, 922 milliseconds
get_most_occuring_element_bin_sort_two_pass 782 milliseconds
get_most_occuring_element_bin_sort_with_memset 2 seconds, 31 milliseconds

//start code:

#include <windows.h>

#include <algorithm>
#include <iostream>
#include <sstream>
#include <vector>

#include <ctime>
#include <cstdlib>
#include <cstring>
using namespace std;

typedef unsigned long long my_time_t;

my_time_t convert(FILETIME const & fileTime)
{ //TODO type punning
union { DWORD x[2]; my_time_t y; };
x[0] = fileTime.dwLowDateTime;
x[1] = fileTime.dwHighDateTime;
y *= (my_time_t)100;
return y;
}

my_time_t jGetCurrentTime()
{ SYSTEMTIME currentTime_SYSTEMTIME;
GetSystemTime(&currentTime_SYSTEMTIME);
FILETIME currentTime_FILETIME;
if (SystemTimeToFileTime(&currentTime_SYSTEMTIME,
&currentTime_FILETIME) == 0)
abort();
return convert(currentTime_FILETIME);
}

struct HumanUnderstandableTimeDuration
{ int hours; int minutes; int seconds; int milliseconds; int
nanoseconds; };

HumanUnderstandableTimeDuration
jToHumanUnderstandableTimeDuration(my_time_t x)
{ HumanUnderstandableTimeDuration result;

result.nanoseconds = x % ((my_time_t)1000 * (my_time_t)1000);

result.milliseconds = x / ((my_time_t)1000 * (my_time_t)1000);
result.nanoseconds = x % ((my_time_t)1000 * (my_time_t)1000);

result.seconds = result.milliseconds / (my_time_t)1000;
result.milliseconds = result.milliseconds % (my_time_t)1000;

result.minutes = result.seconds / (my_time_t)60;
result.seconds = result.seconds % (my_time_t)60;

result.hours = result.minutes / (my_time_t)60;
result.minutes = result.minutes % (my_time_t)60;

return result;
}

string stringify(HumanUnderstandableTimeDuration x)
{
bool needsHours = x.hours;
bool needsMinutes = x.minutes || ((x.hours) &&
(x.seconds || x.milliseconds || x.nanoseconds));
bool needsSeconds = x.seconds || ((x.hours || x.minutes)
&& (x.milliseconds || x.nanoseconds));
bool needsMilliSeconds = x.milliseconds || ((x.hours || x.minutes
|| x.seconds) && (x.nanoseconds));
bool needsNanoSeconds = x.nanoseconds;

stringstream ss;
bool needComma = false;
if (needsHours)
{ ss << x.hours << " hours";
needComma = true;
}
if (needsMinutes)
{ if (needComma)
ss << ", ";
ss << x.minutes << " minutes";
needComma = true;
}
if (needsSeconds)
{ if (needComma)
ss << ", ";
ss << x.seconds << " seconds";
needComma = true;
}
if (needsMilliSeconds)
{ if (needComma)
ss << ", ";
ss << x.milliseconds << " milliseconds";
needComma = true;
}
if (needsNanoSeconds)
{ if (needComma)
ss << ", ";
ss << x.nanoseconds << " nanoseconds";
}
if (!needsHours && !needsMinutes && !needsSeconds && !
needsMilliSeconds && !needsNanoSeconds)
ss << "~0";
return ss.str();
}

struct bin_t
{ int label, frequency;
bin_t() {}
bin_t(int label_) : label(label_) {}
bool operator< (bin_t y) const { return label < y.label; }
bool operator== (bin_t y) const { return label == y.label; }
};

struct get_most_occuring_element_linear_map_lookup
{ inline int operator() (vector<int> const& x)
{ bin_t bins[10];
bin_t* bins_start = bins;
bin_t* bins_end = bins + 1;  // start with one bin so x[0]'s count isn't lost (filled in below)

int best_thus_far = bins[0].label = x[0];
int best_thus_far_frequency = bins[0].frequency = 1;
for (int i=1; i<10; ++i)
{ int ele = x[i];
bin_t* insert_location = find(bins_start, bins_end,
bin_t(ele));
if (insert_location == bins_end)
{ insert_location->label = ele;
insert_location->frequency = 0;
++bins_end;
}
if (++ insert_location->frequency >
best_thus_far_frequency)
{ best_thus_far = ele;
best_thus_far_frequency = insert_location->frequency;
}
}
return best_thus_far;
}
};

struct get_most_occuring_element_binary_map_lookup
{ inline int operator() (vector<int> const& x)
{ bin_t bins[10];
bin_t* bins_start = bins;
bin_t* bins_end = bins + 1;  // start with one bin so x[0]'s count isn't lost (filled in below)

int best_thus_far = bins[0].label = x[0];
int best_thus_far_frequency = bins[0].frequency = 1;
for (int i=1; i<10; ++i)
{ int ele = x[i];
bin_t* insert_location = lower_bound(bins_start, bins_end,
bin_t(ele));

if (insert_location == bins_end)
{ insert_location->label = ele;
insert_location->frequency = 0;
++bins_end;
}else if (insert_location->label != ele)
{ //make room for insert
for (bin_t * b = bins_end;;)
{ if (b == insert_location)
break;
--b;
b[1] = b[0];
}
insert_location->label = ele;
insert_location->frequency = 0;
++bins_end;
}
if (++ insert_location->frequency >
best_thus_far_frequency)
{ best_thus_far = ele;
best_thus_far_frequency = insert_location->frequency;
}
}
return best_thus_far;
}
};

struct get_most_occuring_element_bin_sort_two_pass
{ inline int operator() (vector<int> const& x)
{ int bins[200];

for (int i=1; i<10; ++i)
bins[x[i]] = 0;

int best_thus_far = x[0];
int best_thus_far_frequency = bins[x[0]] = 1;

for (int i=1; i<10; ++i)
{ int ele = x[i];
if (++ bins[ele] > best_thus_far_frequency)
{ best_thus_far = ele;
best_thus_far_frequency = bins[ele];
}
}
return best_thus_far;
}
};

struct get_most_occuring_element_bin_sort_with_memset
{ inline int operator() (vector<int> const& x)
{ int bins[200];
memset(bins, 0, 200 * sizeof(int));

int best_thus_far = x[0];
int best_thus_far_frequency = bins[x[0]] = 1;

for (int i=1; i<10; ++i)
{ int ele = x[i];
if (++ bins[ele] > best_thus_far_frequency)
{ best_thus_far = ele;
best_thus_far_frequency = bins[ele];
}
}
return best_thus_far;
}
};

template <typename sorter_t>
void do_test(char const* test_name, int& result, vector<vector<int> >
const& test_data)
{
sorter_t sorter;

//to "warm up" the instruction cache
result += sorter(test_data[0]);

int const num_runs = 1000;

my_time_t startTime = jGetCurrentTime();
for (int runs = 0; runs < num_runs; ++runs)
for (int i=0; i<test_data.size(); ++i)
result += sorter(test_data[i]);
my_time_t endTime = jGetCurrentTime();
HumanUnderstandableTimeDuration diff =
jToHumanUnderstandableTimeDuration(endTime - startTime);
cout << test_name << " " << stringify(diff) << endl;
}

int main()
{
int result = 0;
vector<vector<int> > test_data;
for (int i=0; i<10000; ++i)
{ test_data.push_back(vector<int>());
for (int j=0; j<10; ++j)
test_data.back().push_back(rand() % 200);
}

/*
(not entirely correct) sanity checks that the algorithms are doing
the right thing.
test_data[0][0] = test_data[0][1];
test_data[1][8] = test_data[1][9];
for (int i=0; i<10; ++i)
    cout << get_most_occuring_element_linear_map_lookup()(test_data[i]) << " ";
cout << endl;
for (int i=0; i<10; ++i)
    cout << get_most_occuring_element_binary_map_lookup()(test_data[i]) << " ";
cout << endl;
for (int i=0; i<10; ++i)
    cout << get_most_occuring_element_bin_sort_two_pass()(test_data[i]) << " ";
cout << endl;
*/

cout << "starting tests" << endl;

do_test<get_most_occuring_element_linear_map_lookup>("get_most_occuring_element_linear_map_lookup",
result, test_data);

do_test<get_most_occuring_element_binary_map_lookup>("get_most_occuring_element_binary_map_lookup",
result, test_data);

do_test<get_most_occuring_element_bin_sort_two_pass>("get_most_occuring_element_bin_sort_two_pass",
result, test_data);

do_test<get_most_occuring_element_bin_sort_with_memset>("get_most_occuring_element_bin_sort_with_memset",
result, test_data);
return result;
}
 
