Modal value of an array

datta.abhirup · Mar 29, 2007

Hi

How can I find out the modal value of an array, that is, the value
which occurs the maximum number of times in the sequence?

e.g. if my array has values like [2,3,2,2,2,4,2,2], 2 clearly occurs
the most times in the array, so this function should be able to
return 2 as a result.

So is there any built-in function in Python which can do that?

Thanks

Abhirup

Ben Finney · Mar 29, 2007

That's not the only case though. What do you expect to be returned for
an input of ["eggs", "beans", "beans", "eggs", "spam"] ?

Assuming you want *a* mode value, and any one will do (e.g. either
"eggs" or "beans" is okay), I'd write it this way as a first
guess:

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> counts = [(foo.count(val), val) for val in set(foo)]
>>> counts
[(2, 'eggs'), (1, 'beans'), (4, 'spam')]
>>> sorted(counts)[-1]
(4, 'spam')
>>> sorted(counts)[-1][1]
'spam'

>>> foo = ["eggs", "beans", "beans", "eggs", "spam"]
>>> counts = [(foo.count(val), val) for val in set(foo)]
>>> sorted(counts)[-1][1]
'eggs'
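A minimal Python 3 sketch of Ben's approach wrapped in a function; the name `a_mode` is hypothetical. It uses `max` on the (count, value) tuples instead of sorting them, since only the largest tuple is needed. Tuples compare element by element, so on a count tie the larger value wins, which is why the tie example gives 'eggs' rather than 'beans'. It assumes the values are hashable and mutually comparable: under Python 3, a count tie between, say, a string and an integer would raise TypeError.

```python
def a_mode(seq):
    """Return one most frequent value of seq (hypothetical helper name).

    Pair every distinct value with its count, then take the largest
    (count, value) tuple. On a count tie, the larger value wins.
    """
    counts = [(seq.count(val), val) for val in set(seq)]
    return max(counts)[1]


print(a_mode([2, 3, 2, 2, 2, 4, 2, 2]))                   # 2
print(a_mode(["eggs", "beans", "beans", "eggs", "spam"]))  # eggs
```

Like the original, it still calls `seq.count` once per distinct value, and each of those calls rescans the whole list.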

Paddy · Mar 29, 2007

With the same assumptions as Ben Finney, I came up with this:

>>> import operator
>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> count = {}
>>> for item in foo: count[item] = count.get(item, 0) + 1
...
>>> maxitem = max(count.items(), key=operator.itemgetter(1))
>>> maxitem
('spam', 4)

I was trying to minimise the iterations through the list.

- Paddy.
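Paddy's single-pass dictionary count is what `collections.Counter` packages up in later Python versions (2.7 and 3.1 onward, so it did not exist when this thread was written). A short Python 3 sketch: `most_common(1)` returns a one-element list holding the (value, count) pair with the highest count. The documentation states that elements with equal counts are ordered by first appearance, so on a tie the earliest value is returned.

```python
from collections import Counter

foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]

# One pass over foo builds the frequency table, like Paddy's count dict.
counts = Counter(foo)

# most_common(1) -> [(value, count)] for the highest count
maxitem = counts.most_common(1)[0]
print(maxitem)  # ('spam', 4)
```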

Steven D'Aprano · Mar 29, 2007

No, there is no built-in function for this. You need to create a
frequency table, then do a reverse lookup on the frequency table.
Assuming your data is small, this should be plenty fast.

def mode(data):
    # create a frequency table
    freq = {}
    for x in data:
        freq[x] = freq.get(x, 0) + 1
    # find the maximum frequency
    F = max(freq.values())
    # return the items (one or more) with that frequency
    modes = []
    for x, f in freq.items():
        if f == F:
            modes.append(x)
    return modes

>>> mode([2,3,2,2,2,4,2,2])
[2]
>>> mode([2,3,2,3,2,3,4,1])
[2, 3]
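Steven's answer was right for the Python of 2007. Later versions added this to the standard library: `statistics.mode` (3.4+) returns a single mode, and `statistics.multimode` (3.8+) returns a list of all modes in order of first appearance, much like Steven's `mode`. Since 3.8, `statistics.mode` returns the first mode encountered on a tie; earlier versions raised `StatisticsError`. A brief Python 3.8+ sketch:

```python
import statistics

data1 = [2, 3, 2, 2, 2, 4, 2, 2]
data2 = [2, 3, 2, 3, 2, 3, 4, 1]

print(statistics.mode(data1))       # 2
print(statistics.multimode(data1))  # [2]
print(statistics.multimode(data2))  # [2, 3]
print(statistics.mode(data2))       # 2 (first mode encountered)
```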

Alex Martelli · Mar 29, 2007

Ben Finney said:
>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> counts = [(foo.count(val), val) for val in set(foo)]
>>> sorted(counts)[-1][1]
'spam'

A bit more directly:

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> max(foo, key=foo.count)
'spam'

Alex
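`max` returns the first element whose key is largest, so with Alex's one-liner the tied value that appears earliest in the list wins. (By contrast, Ben's `sorted(counts)[-1]` breaks ties by value.) Each key call runs `foo.count`, which scans the whole list, so for a list of length N the one-liner does roughly N*N comparisons. A small Python 3 check of the tie behaviour, using two example lists chosen for illustration:

```python
# Two lists with the same counts (two values tied at 2), in different orders.
foo = ["eggs", "beans", "beans", "eggs", "spam"]
bar = ["beans", "eggs", "eggs", "beans", "spam"]

# max() returns the first maximal element it meets, so the tied value
# that occurs earliest in the list wins.
print(max(foo, key=foo.count))   # eggs
print(max(bar, key=bar.count))   # beans
```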

bearophileHUGS · Mar 29, 2007

Alex Martelli:

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> max(foo, key=foo.count)

It's a very nice solution, the shortest too. But I think it's better
to develop your own well-tested and efficient stats module (and there
is one already written that can be found around) and import it when you
need such functions, instead of using similar one-liners or rewriting code.
As you know, your solution becomes rather slow if the list is quite
long (each foo.count call rescans the whole list, so it is O(N^2)), and
it works with lists only.
This uses more memory, but it's probably much faster for longer
iterables:

from collections import defaultdict

def mode(seq):
    # returns the highest frequency, not the element itself
    freqs = defaultdict(int)
    for el in seq:
        freqs[el] += 1
    return max(freqs.itervalues())

Generally you may want to know what the mode element(s) are, too:

def mode2(seq):
    freqs = defaultdict(int)
    for el in seq:
        freqs[el] += 1
    maxfreq = max(freqs.itervalues())
    mode_els = [el for el, f in freqs.iteritems() if f == maxfreq]
    return maxfreq, mode_els

foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
print mode(foo)
print mode2(foo)

Bye,
bearophile
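bearophile's point that the dictionary approach works on any iterable can be seen with a generator, which has no `.count` method and can only be traversed once. This is a Python 3 port of `mode2` (`itervalues`/`iteritems` became `values`/`items`); the empty-input guard is an addition of this sketch, since `max` of an empty sequence raises ValueError.

```python
from collections import defaultdict


def mode2(seq):
    """Return (highest frequency, list of elements with that frequency)."""
    freqs = defaultdict(int)
    for el in seq:               # single pass, so generators work too
        freqs[el] += 1
    if not freqs:                # added guard: max() of nothing raises ValueError
        return 0, []
    maxfreq = max(freqs.values())
    mode_els = [el for el, f in freqs.items() if f == maxfreq]
    return maxfreq, mode_els


foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
print(mode2(foo))                        # (4, ['spam'])

words = "the quick brown fox jumps".split()
print(mode2(len(w) for w in words))      # (3, [5])
```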

Paddy · Mar 29, 2007

...

A bit more directly:

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> max(foo, key=foo.count)
'spam'

Alex

This doesn't call foo.count for duplicate entries by keeping a cache:

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> def cachecount(x, cache={}):
...     return cache.setdefault(x, foo.count(x))
...
- Paddy.

Alex Martelli · Mar 30, 2007

Paddy said:
This doesn't call foo.count for duplicate entries by keeping a cache

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> def cachecount(x, cache={}):
...     return cache.setdefault(x, foo.count(x))
...

If you're willing to do that much extra coding to save some work (while
still being O(N squared)), then the further small extra needed to be
O(N) starts looking good:

>>> import collections
>>> counts = collections.defaultdict(int)
>>> for item in foo: counts[item] += 1
...
>>> max(foo, key=counts.get)

Alex
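Since `counts.get(item)` returns exactly the number that `foo.count(item)` would, Alex's O(N) version walks `foo` in the same order with the same keys and therefore returns the same element as the O(N squared) one-liner, ties included. A quick randomized check in Python 3; the helper names `mode_quadratic` and `mode_linear` are hypothetical.

```python
import collections
import random


def mode_quadratic(foo):
    # Alex's one-liner: foo.count rescans the list for every element
    return max(foo, key=foo.count)


def mode_linear(foo):
    # Alex's O(N) version: count once, then look the counts up
    counts = collections.defaultdict(int)
    for item in foo:
        counts[item] += 1
    return max(foo, key=counts.get)


rng = random.Random(0)
for _ in range(200):
    # non-empty lists of small integers, so ties are common
    data = [rng.randrange(5) for _ in range(rng.randrange(1, 30))]
    assert mode_linear(data) == mode_quadratic(data)
print("both versions agree")
```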

Paddy · Mar 30, 2007

...

This doesn't call foo.count for duplicate entries by keeping a cache

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> def cachecount(x, cache={}):
...     return cache.setdefault(x, foo.count(x))
...
>>> max(foo, key=cachecount)
'spam'
>>> cachecount.func_defaults
({'eggs': 2, 'beans': 1, 'spam': 4},)

If you're willing to do that much extra coding to save some work (while
still being O(N squared)), then the further small extra needed to be
O(N) starts looking good:

>>> counts = collections.defaultdict(int)
>>> for item in foo: counts[item] += 1
...
>>> max(foo, key=counts.get)

Alex

Yeah, my first answer was like that, but I had to play around with your
original to try and 'fix' the idea in my head. It might be useful
someday.

- Paddy.
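Paddy's `cachecount.func_defaults` peek is Python 2 spelling; in Python 3 the attribute is `__defaults__`. A sketch of the same inspection under Python 3, where the mutable default dict is created once, when the `def` runs, and is shared by every call. The dict order in the output assumes Python 3.7+ insertion ordering.

```python
foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]


def cachecount(x, cache={}):
    # cache is created once at definition time and persists between calls
    return cache.setdefault(x, foo.count(x))


print(max(foo, key=cachecount))   # spam
print(cachecount.__defaults__)    # ({'spam': 4, 'eggs': 2, 'beans': 1},)
```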

Gabriel Genellina · Mar 30, 2007

On Mar 29, 8:49 am, (e-mail address removed) (Alex Martelli) wrote:

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> max(foo, key=foo.count)
'spam'

This doesn't call foo.count for duplicate entries by keeping a cache

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> def cachecount(x, cache={}):
...     return cache.setdefault(x, foo.count(x))

Unfortunately it does, because all arguments are evaluated *before* a
function call: foo.count(x) is computed every time, before setdefault
even looks in the cache, so you gain nothing.
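Gabriel's point can be checked by counting how often the list is scanned. In this Python 3 sketch, `tracked_count` is a hypothetical stand-in for `foo.count` that records each call. The lazy variant, which only counts on a cache miss, is one way (not taken from the thread) to get the saving Paddy intended; it still rescans the list once per distinct value, so Alex's single counting pass remains the better option.

```python
foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
count_calls = 0


def tracked_count(x):
    # Hypothetical stand-in for foo.count(x) that also records each call.
    global count_calls
    count_calls += 1
    return foo.count(x)


def cachecount_eager(x, cache={}):
    # Paddy's version: tracked_count(x) is evaluated *before* setdefault
    # runs, so the list is rescanned even when x is already cached.
    return cache.setdefault(x, tracked_count(x))


def cachecount_lazy(x, cache={}):
    # Only scan the list on a cache miss.
    if x not in cache:
        cache[x] = tracked_count(x)
    return cache[x]


count_calls = 0
eager_result = max(foo, key=cachecount_eager)
eager_calls = count_calls   # one scan per element of foo: 7

count_calls = 0
lazy_result = max(foo, key=cachecount_lazy)
lazy_calls = count_calls    # one scan per distinct value: 3

print(eager_result, eager_calls)   # spam 7
print(lazy_result, lazy_calls)     # spam 3
```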

Paddy · Mar 31, 2007

This doesn't call foo.count for duplicate entries by keeping a cache

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> def cachecount(x, cache={}):
...     return cache.setdefault(x, foo.count(x))

Unfortunately it does, because all arguments are evaluated *before* a
function call, so you gain nothing.

I had to experiment to find out what you meant, but I finally got it:
that call to foo.count in the setdefault is *always* made. Forgive
my senility.

- Paddy.
