Modal value of an array

Discussion in 'Python' started by datta.abhirup@gmail.com, Mar 29, 2007.

  1. Guest

    Hi

    How can I find out the modal value in an array. That is the value
    which occurs maximum time in the sequence ..

    e.g. if my array has values like [2,3,2,2,2,4,2,2] definitely the
    maximum time 2 occurs in the array. so this function should be able to
    return 2 as a result ..

    So is there any function in built in python which can do that ?


    Thanks

    Abhirup
    , Mar 29, 2007
    #1
    1. Advertising

  2. Ben Finney Guest

    "" <> writes:

    > Hi
    >
    > How can I find out the modal value in an array. That is the value
    > which occurs maximum time in the sequence ..
    >
    > e.g. if my array has values like [2,3,2,2,2,4,2,2] definitely the
    > maximum time 2 occurs in the array. so this function should be able to
    > return 2 as a result ..


    That's not the only case though. What do you expect to be returned for
    an input of ["eggs", "beans", "beans", "eggs", "spam"] ?

    Assuming you want *a* mode value, and any one will do (e.g. any of
    "spam", "eggs" or "beans" is okay), I'd write it this way as a first
    guess:

    >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    >>> counts = [(foo.count(val), val) for val in set(foo)]
    >>> counts

    [(2, 'eggs'), (1, 'beans'), (4, 'spam')]
    >>> sorted(counts)[-1]

    (4, 'spam')
    >>> sorted(counts)[-1][1]

    'spam'

    >>> foo = ["eggs", "beans", "beans", "eggs", "spam"]
    >>> counts = [(foo.count(val), val) for val in set(foo)]
    >>> sorted(counts)[-1][1]

    'eggs'

    --
    \ "Anger makes dull men witty, but it keeps them poor." -- |
    `\ Elizabeth Tudor |
    _o__) |
    Ben Finney
    Ben Finney, Mar 29, 2007
    #2
    1. Advertising

  3. Paddy Guest

    On Mar 29, 4:40 am, ""
    <> wrote:
    > Hi
    >
    > How can I find out the modal value in an array. That is the value
    > which occurs maximum time in the sequence ..
    >
    > e.g. if my array has values like [2,3,2,2,2,4,2,2] definitely the
    > maximum time 2 occurs in the array. so this function should be able to
    > return 2 as a result ..
    >
    > So is there any function in built in python which can do that ?
    >
    > Thanks
    >
    > Abhirup


    With the same assumptions as Ben Finney, I came up with this:

    >>> import operator
    >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    >>> count = {}
    >>> for item in foo: count[item] = count.get(item, 0) +1

    ....
    >>> maxitem = max(count.items(), key= operator.itemgetter(1))
    >>> maxitem

    ('spam', 4)
    >>>


    I was trying to minimise the iterations through the list.

    - Paddy.
    Paddy, Mar 29, 2007
    #3
  4. On Wed, 28 Mar 2007 20:40:22 -0700, wrote:

    > Hi
    >
    > How can I find out the modal value in an array. That is the value
    > which occurs maximum time in the sequence ..
    >
    > e.g. if my array has values like [2,3,2,2,2,4,2,2] definitely the
    > maximum time 2 occurs in the array. so this function should be able to
    > return 2 as a result ..
    >
    > So is there any function in built in python which can do that ?


    No. You need to create a frequency table, then do a reverse-lookup on the
    frequency table. Assuming your data is small, this should be plenty fast
    enough.

    def mode(data):
    # create a frequency table
    freq = {}
    for x in data:
    freq[x] = freq.get(x, 0) + 1
    # find the maximum frequency
    F = max(freq.values())
    # return the items (one or more) with that frequency
    modes = []
    for x, f in freq.items():
    if f == F:
    modes.append(x)
    return modes

    >>> mode([2,3,2,2,2,4,2,2])

    [2]
    >>> mode([2,3,2,3,2,3,4,1])

    [2, 3]


    --
    Steven.
    Steven D'Aprano, Mar 29, 2007
    #4
  5. Ben Finney <> wrote:
    ...
    > That's not the only case though. What do you expect to be returned for
    > an input of ["eggs", "beans", "beans", "eggs", "spam"] ?
    >
    > Assuming you want *a* mode value, and any one will do (e.g. any of
    > "spam", "eggs" or "beans" is okay), I'd write it this way as a first
    > guess:
    >
    > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    > >>> counts = [(foo.count(val), val) for val in set(foo)]
    > >>> counts

    > [(2, 'eggs'), (1, 'beans'), (4, 'spam')]
    > >>> sorted(counts)[-1]

    > (4, 'spam')
    > >>> sorted(counts)[-1][1]

    > 'spam'


    A bit more directly:

    >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    >>> max(foo, key=foo.count)

    'spam'


    Alex
    Alex Martelli, Mar 29, 2007
    #5
  6. Guest

    Alex Martelli:
    > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    > >>> max(foo, key=foo.count)


    It's a very nice solution, the shortest too. But I think it's better
    to develop your own well tested and efficient stats module (and there
    is one already done that can be found around) and import it when you
    need functions, instead of using similar onliners or re-writing code.
    As you know your solution becomes rather slow if the list is quite
    long, and it works with lists only.
    This uses more memory but it's probably much faster for longer
    interables:

    from collections import defaultdict

    def mode(seq):
    freqs = defaultdict(int)
    for el in seq:
    freqs[el] += 1
    return max(freqs.itervalues())


    Generally you may want to know what's the mode element(s) too:

    def mode2(seq):
    freqs = defaultdict(int)
    for el in seq:
    freqs[el] += 1
    maxfreq = max(freqs.itervalues())
    mode_els = [el for el,f in freqs.iteritems() if f == maxfreq]
    return maxfreq, mode_els

    foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    print mode(foo)
    print mode2(foo)

    Bye,
    bearophile
    , Mar 29, 2007
    #6
  7. Paddy Guest

    On Mar 29, 8:49 am, (Alex Martelli) wrote:
    > Ben Finney <> wrote:
    >
    > ...
    >
    > > That's not the only case though. What do you expect to be returned for
    > > an input of ["eggs", "beans", "beans", "eggs", "spam"] ?

    >
    > > Assuming you want *a* mode value, and any one will do (e.g. any of
    > > "spam", "eggs" or "beans" is okay), I'd write it this way as a first
    > > guess:

    >
    > > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    > > >>> counts = [(foo.count(val), val) for val in set(foo)]
    > > >>> counts

    > > [(2, 'eggs'), (1, 'beans'), (4, 'spam')]
    > > >>> sorted(counts)[-1]

    > > (4, 'spam')
    > > >>> sorted(counts)[-1][1]

    > > 'spam'

    >
    > A bit more directly:
    >
    > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    > >>> max(foo, key=foo.count)

    >
    > 'spam'
    >
    > Alex


    This doesn't call foo.count for duplicate entries by keeping a cache

    >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    >>> def cachecount(x, cache={}):

    .... return cache.setdefault(x, foo.count(x))
    ....
    >>> max(foo, key=cachecount)

    'spam'
    >>> cachecount.func_defaults

    ({'eggs': 2, 'beans': 1, 'spam': 4},)
    >>>


    - Paddy.
    Paddy, Mar 29, 2007
    #7
  8. Paddy <> wrote:
    ...
    > > A bit more directly:
    > >
    > > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    > > >>> max(foo, key=foo.count)

    > >
    > > 'spam'
    > >
    > > Alex

    >
    > This doesn't call foo.count for duplicate entries by keeping a cache
    >
    > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    > >>> def cachecount(x, cache={}):

    > ... return cache.setdefault(x, foo.count(x))
    > ...
    > >>> max(foo, key=cachecount)

    > 'spam'
    > >>> cachecount.func_defaults

    > ({'eggs': 2, 'beans': 1, 'spam': 4},)
    > >>>


    If you're willing to do that much extra coding to save some work (while
    still being O(N squared)), then the further small extra needed to be
    O(N) starts looking good:

    counts = collections.defaultdict(int)
    for item in foo: counts[item] += 1
    max(foo, key=counts.get)


    Alex
    Alex Martelli, Mar 30, 2007
    #8
  9. Paddy Guest

    On Mar 30, 2:58 am, (Alex Martelli) wrote:
    > Paddy <> wrote:
    >
    > ...
    >
    >
    >
    > > > A bit more directly:

    >
    > > > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    > > > >>> max(foo, key=foo.count)

    >
    > > > 'spam'

    >
    > > > Alex

    >
    > > This doesn't call foo.count for duplicate entries by keeping a cache

    >
    > > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    > > >>> def cachecount(x, cache={}):

    > > ... return cache.setdefault(x, foo.count(x))
    > > ...
    > > >>> max(foo, key=cachecount)

    > > 'spam'
    > > >>> cachecount.func_defaults

    > > ({'eggs': 2, 'beans': 1, 'spam': 4},)

    >
    > If you're willing to do that much extra coding to save some work (while
    > still being O(N squared)), then the further small extra needed to be
    > O(N) starts looking good:
    >
    > counts = collections.defaultdict(int)
    > for item in foo: counts[item] += 1
    > max(foo, key=counts.get)
    >
    > Alex


    Yeh, My first answer is like that but I had to play around with your
    original to try and 'fix' the idea in my head - it might be useful
    someday.
    :)

    - Paddy.
    Paddy, Mar 30, 2007
    #9
  10. En Thu, 29 Mar 2007 19:44:56 -0300, Paddy <>
    escribió:

    > On Mar 29, 8:49 am, (Alex Martelli) wrote:


    >> >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    >> >>> max(foo, key=foo.count)

    >>
    >> 'spam'

    >
    > This doesn't call foo.count for duplicate entries by keeping a cache
    >
    >>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    >>>> def cachecount(x, cache={}):

    > ... return cache.setdefault(x, foo.count(x))


    Unfortunately it does, because all arguments are evaluated *before* a
    function call, so you gain nothing.

    --
    Gabriel Genellina
    Gabriel Genellina, Mar 30, 2007
    #10
  11. Paddy Guest

    On Mar 30, 10:17 pm, "Gabriel Genellina" <>
    wrote:
    > En Thu, 29 Mar 2007 19:44:56 -0300, Paddy <>
    > escribió:
    >
    > > On Mar 29, 8:49 am, (Alex Martelli) wrote:
    > >> >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    > >> >>> max(foo, key=foo.count)

    >
    > >> 'spam'

    >
    > > This doesn't call foo.count for duplicate entries by keeping a cache

    >
    > >>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
    > >>>> def cachecount(x, cache={}):

    > > ... return cache.setdefault(x, foo.count(x))

    >
    > Unfortunately it does, because all arguments are evaluated *before* a
    > function call, so you gain nothing.
    >
    > --
    > Gabriel Genellina


    I had to experiment to find out what you meant but I finally got it.
    that call to foo.count in the setdefault is *always* called. Forgive
    my senility.

    - Paddy.
    Paddy, Mar 31, 2007
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Leila
    Replies:
    0
    Views:
    393
    Leila
    Apr 26, 2005
  2. Matt
    Replies:
    1
    Views:
    3,126
    Whitecrest
    Jun 1, 2004
  3. Don
    Replies:
    0
    Views:
    341
  4. Matt
    Replies:
    0
    Views:
    213
  5. gopal srinivasan
    Replies:
    0
    Views:
    219
    gopal srinivasan
    Nov 5, 2004
Loading...

Share This Page