Finding the most common elements across thousands of arrays


Raymond Hettinger

[Scott David Daniels]
import heapq

def most_frequent(arr, N):
    '''Return the top N (freq, val) elements in arr'''
    if N <= 0:
        return []
    # frequency() is not shown in this excerpt; it must yield (freq, val) pairs
    counted = iter(frequency(arr))  # get an iterator for freq-val pairs
    heap = []
    # First, just fill up the list with the first N distinct pairs
    for _ in range(N):
        try:
            heap.append(next(counted))
        except StopIteration:
            break  # If we run out here, no need for a heap
    else:
        # More to go: switch to a min-heap and replace the least
        # element every time we find something better
        heapq.heapify(heap)
        for pair in counted:
            if pair > heap[0]:
                heapq.heapreplace(heap, pair)
    return sorted(heap, reverse=True)  # put most frequent first
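The quoted function depends on a frequency() helper that is not shown here, so below is a self-contained sketch of the same bounded-heap technique. The `frequency` helper in it is hypothetical, a stand-in built on `collections.Counter` (Py2.7/3.1+) that is assumed to yield `(count, value)` pairs; the fill step is also restructured slightly and `n <= 0` is guarded.

```python
import heapq
from collections import Counter


def frequency(arr):
    """Hypothetical helper: yield (count, value) pairs for each distinct value."""
    return ((count, value) for value, count in Counter(arr).items())


def most_frequent(arr, n):
    """Return the top n (count, value) pairs, most frequent first."""
    if n <= 0:
        return []
    counted = frequency(arr)
    heap = []
    # Fill the list with the first n distinct pairs.
    for pair in counted:
        heap.append(pair)
        if len(heap) == n:
            break
    else:
        # Fewer than n distinct values: no heap maintenance needed.
        return sorted(heap, reverse=True)
    # Min-heap: heap[0] is the weakest of the current top n.
    heapq.heapify(heap)
    for pair in counted:
        if pair > heap[0]:
            heapq.heapreplace(heap, pair)
    return sorted(heap, reverse=True)


data = [3, 1, 3, 2, 3, 1, 4]
print(most_frequent(data, 2))  # [(3, 3), (2, 1)]
```

Ties on count fall back to comparing the values themselves, so the values must be mutually orderable (e.g. all ints or all strings).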

In Py2.4 and later, see heapq.nlargest().
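A minimal sketch of the heapq.nlargest() route, assuming the counts are first tallied in a plain dict (which works on old versions too). nlargest() then picks the top n `(count, value)` pairs, doing the same job as the hand-written heap loop; the function name is just for illustration.

```python
import heapq


def most_frequent_nlargest(arr, n):
    """Return the top n (count, value) pairs, most frequent first."""
    # Tally occurrences with a plain dict.
    counts = {}
    for x in arr:
        counts[x] = counts.get(x, 0) + 1
    # Pairs compare by count first, then by value on ties.
    return heapq.nlargest(n, ((c, v) for v, c in counts.items()))


print(most_frequent_nlargest([3, 1, 3, 2, 3, 1, 4], 2))  # [(3, 3), (2, 1)]
```

Because ties are broken by the value, `most_frequent_nlargest("abracadabra", 2)` gives `[(5, 'a'), (2, 'r')]`: 'b' and 'r' both occur twice and 'r' sorts higher.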
In Py3.1 and later (also Py2.7), see collections.Counter(data).most_common(n).
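For the original question of thousands of arrays, a sketch with Counter, assuming "most common" means total occurrences over all the arrays: one Counter is updated with each array in turn, then most_common(n) is asked once at the end. The helper name is hypothetical. Note that most_common() returns `(value, count)` pairs, the reverse of the `(freq, val)` order used above.

```python
from collections import Counter


def most_common_across(arrays, n):
    """Return the n most common (value, count) pairs across many arrays."""
    totals = Counter()
    for arr in arrays:
        totals.update(arr)  # add this array's occurrences to the running totals
    return totals.most_common(n)


arrays = [[1, 2, 2], [2, 3], [3, 3, 2]]
print(most_common_across(arrays, 2))  # [(2, 4), (3, 3)]
```

If the goal is instead how many arrays contain each element, passing `set(arr)` to `update()` counts each element at most once per array.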


Raymond