Clustering the keys of a dict according to its values

Discussion in 'Python' started by bearophileHUGS@lycos.com, Nov 14, 2008.

  1. Guest

    Florian Brucker:
    > We may assume that all values in the
    > original dict/list can be used as dict keys.


    Are you sure? Is this for school?

    Bye,
    bearophile
     
    , Nov 14, 2008
    #1
    1. Advertising

  2. alex23 Guest

    Florian Brucker <> wrote:
    > Given a dictionary, I want to create a clustered version of it,
    > collecting keys that have the same value:
    >
    >  >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
    >  >>> cluster(d)
    > {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
    >
    > That is, generate a new dict which holds for each value of the old dict
    > a list of the keys of the old dict that have that very value.


    >>> from collections import defaultdict
    >>> def cluster(d):

    .... if not hasattr(d, 'iteritems'):
    .... d = dict((x,y) for x,y in enumerate(d))
    .... cluster = defaultdict(list)
    .... for key, value in d.iteritems():
    .... cluster[value].append(key)
    .... return cluster
    ....
    >>> d1 = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
    >>> d2 = ['a','a','b','c','b','d','d']
    >>> cluster(d1)

    defaultdict(<type 'list'>, {1: ['a', 'c', 'd'], 2: ['b', 'e'], 3:
    ['f']})
    >>> cluster(d2)

    defaultdict(<type 'list'>, {'a': [0, 1], 'c': [3], 'b': [2, 4], 'd':
    [5, 6]})

    defaultdicts act pretty much exactly like dicts, except for the
    default-value-for-everything behaviour, but if you'd rather have pure
    dicts, just coerce with a dict(cluster) at the end of the function.

    My version converts lists to dicts so the same core behaviour will
    work. At worst it'll step through a sequence twice, but if it's passed
    a dict it'll only do it once, unlike yours' which steps through each
    twice no matter what (in order to obtain the unique values you
    require).

    A side note: if you're using Python2.4+, you can replace your unique()
    function with the built-in set().
     
    alex23, Nov 14, 2008
    #2
    1. Advertising

  3. On Nov 14, 1:16 pm, Florian Brucker <> wrote:
    > Hi everybody!
    >
    > Given a dictionary, I want to create a clustered version of it,
    > collecting keys that have the same value:
    >
    > >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
    > >>> cluster(d)

    > {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
    >
    > That is, generate a new dict which holds for each value of the old dict
    > a list of the keys of the old dict that have that very value.
    >
    > Another requirement is that it should also work on lists, in that case
    > with indices instead of keys. We may assume that all values in the
    > original dict/list can be used as dict keys.
    >
    > Right now I'm doing it like this:
    >
    > def cluster(d):
    > try:
    > # is d a dict?
    > values = unique(d.values())
    > keys = d.keys()
    > except AttributeError:
    > # assume d is a list
    > values = unique(d)
    > keys = range(len(d))
    >
    > clusters = {}
    > for value in values:
    > clusters[value] = filter(lambda v: d[v] == value, keys)
    >
    > return clusters
    >
    > where unique() is from O'Reilly's Python Cookbook and returns a list
    > containing each item of the given list exactly once.
    >
    > Now I'm pretty new to Python and chances are rather high that this could
    > be done prettier and/or more efficient. Therefore any comment on the
    > above code is greatly appreciated.
    >
    > Regards,
    > Florian


    Your implementation is pretty inefficient as it iterates over the
    whole dictionary as many times as there are distinct values in it.
    Here is a simpler solution.

    from collections import defaultdict

    def cluster(d):
    clusters = defaultdict(list)
    for key, val in d.iteritems():
    clusters[val].append(key)
    return clusters


    Or without defaultdict:

    def cluster(d):
    clusters = {}
    for key, val in d.iteritems():
    clusters.setdefault(val, []).append(key)
    return clusters


    --
    Arnaud
     
    Arnaud Delobelle, Nov 14, 2008
    #3
  4. Hi everybody!

    Given a dictionary, I want to create a clustered version of it,
    collecting keys that have the same value:

    >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
    >>> cluster(d)

    {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

    That is, generate a new dict which holds for each value of the old dict
    a list of the keys of the old dict that have that very value.

    Another requirement is that it should also work on lists, in that case
    with indices instead of keys. We may assume that all values in the
    original dict/list can be used as dict keys.

    Right now I'm doing it like this:

    def cluster(d):
    try:
    # is d a dict?
    values = unique(d.values())
    keys = d.keys()
    except AttributeError:
    # assume d is a list
    values = unique(d)
    keys = range(len(d))

    clusters = {}
    for value in values:
    clusters[value] = filter(lambda v: d[v] == value, keys)

    return clusters

    where unique() is from O'Reilly's Python Cookbook and returns a list
    containing each item of the given list exactly once.

    Now I'm pretty new to Python and chances are rather high that this could
    be done prettier and/or more efficient. Therefore any comment on the
    above code is greatly appreciated.


    Regards,
    Florian
     
    Florian Brucker, Nov 14, 2008
    #4
  5. Guest

    Not much tested:

    from collections import defaultdict

    def cluster(pairs):
    """
    >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
    >>> cluster(d) == {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

    True
    >>> p = [1, 2, 1, 1, 2, 3]
    >>> cluster(p) == {1: [0, 2, 3], 2: [1, 4], 3: [5]}

    True
    """
    d = defaultdict(list)
    if isinstance(pairs, list):
    for k, v in enumerate(pairs):
    d[v].append(k)
    else:
    for k, v in pairs.iteritems():
    d[v].append(k)
    return d

    if __name__ == "__main__":
    import doctest
    doctest.testmod()
    print "Doctests finished.\n"

    Bye,
    bearophile
     
    , Nov 14, 2008
    #5
  6. Guest

    Alternative version:

    def cluster(data):
    d = defaultdict(list)
    pairs = enumerate(data) if isinstance(data, list) else
    data.iteritems()
    for k, v in pairs:
    d[v].append(k)
    return d

    Bye,
    bearophile
     
    , Nov 14, 2008
    #6
  7. alex23 Guest

    On Nov 14, 11:24 pm, wrote:
    > Alternative version:
    >
    > def cluster(data):
    >     d = defaultdict(list)
    >     pairs = enumerate(data) if isinstance(data, list) else
    > data.iteritems()
    >     for k, v in pairs:
    >         d[v].append(k)
    >     return d
    >
    > Bye,
    > bearophile


    Very nice, +1.
     
    alex23, Nov 14, 2008
    #7
  8. > Are you sure? Is this for school?

    Yes, I'm pretty sure (the values of the original dict are integers), and
    no, this isn't for school :) If the "We may assume ..." made you think
    so, I guess that's the way everybody formulates requirements after
    having read to many math papers :D

    If it is of interest to you: I'm trying to reproduce a certain sorted
    version of the keys of a dict (sorted according to their value) where
    the values are not unique, and thus the sorted version is not unique,
    either. All I have left of the original sorting is some statistics, so I
    can check for every sorted version if it's the same as the original. If
    I have the clustering I'm talking about getting all the valid sorted
    lists is a matter of concatenating permutations of the clusters.


    Regards,
    Florian
     
    Florian Brucker, Nov 14, 2008
    #8
  9. Florian Brucker a écrit :
    > Hi everybody!
    >
    > Given a dictionary, I want to create a clustered version of it,
    > collecting keys that have the same value:
    >
    > >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
    > >>> cluster(d)

    > {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
    >
    > That is, generate a new dict which holds for each value of the old dict
    > a list of the keys of the old dict that have that very value.
    >
    > Another requirement is that it should also work on lists, in that case
    > with indices instead of keys. We may assume that all values in the
    > original dict/list can be used as dict keys.


    from collections import defaultdict

    def cluster(source):
    iteritems = getattr(
    source, "iteritems",
    lambda : enumerate(source)
    )
    index = defaultdict(list)
    for k, v in iteritems():
    index[k].append(v)
    return index

    HTH
     
    Bruno Desthuilliers, Nov 14, 2008
    #9
  10. a écrit :
    > Alternative version:
    >
    > def cluster(data):
    > d = defaultdict(list)
    > pairs = enumerate(data) if isinstance(data, list) else
    > data.iteritems()



    What is data is another type of sequence or iterable ?-)



    > for k, v in pairs:
    > d[v].append(k)
    > return d
    >
    > Bye,
    > bearophile
     
    Bruno Desthuilliers, Nov 14, 2008
    #10
  11. Mark Wooding Guest

    Florian Brucker <> wrote:

    > That is, generate a new dict which holds for each value of the old
    > dict a list of the keys of the old dict that have that very value.
    > Another requirement is that it should also work on lists, in that case
    > with indices instead of keys. We may assume that all values in the
    > original dict/list can be used as dict keys.


    import itertools

    def invert(d):
    def hack(d):
    try:
    return d.iteritems()
    except AttributeError:
    return itertools.izip(itertools.count(), d)
    dd = {}
    for k, v in hack(d):
    dd.setdefault(v, []).append(k)
    return dd

    (It's the setdefault trick which is the important part.)

    -- [mdw]
     
    Mark Wooding, Nov 14, 2008
    #11
  12. Wow, thanks everybody! There's a lot to learn for me from these examples...


    Enjoy your weekend!
    Florian

    Florian Brucker wrote:
    > Hi everybody!
    >
    > Given a dictionary, I want to create a clustered version of it,
    > collecting keys that have the same value:
    >
    > >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
    > >>> cluster(d)

    > {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
    >
    > That is, generate a new dict which holds for each value of the old dict
    > a list of the keys of the old dict that have that very value.
    >
    > Another requirement is that it should also work on lists, in that case
    > with indices instead of keys. We may assume that all values in the
    > original dict/list can be used as dict keys.
    >
    > Right now I'm doing it like this:
    >
    > def cluster(d):
    > try:
    > # is d a dict?
    > values = unique(d.values())
    > keys = d.keys()
    > except AttributeError:
    > # assume d is a list
    > values = unique(d)
    > keys = range(len(d))
    >
    > clusters = {}
    > for value in values:
    > clusters[value] = filter(lambda v: d[v] == value, keys)
    >
    > return clusters
    >
    > where unique() is from O'Reilly's Python Cookbook and returns a list
    > containing each item of the given list exactly once.
    >
    > Now I'm pretty new to Python and chances are rather high that this could
    > be done prettier and/or more efficient. Therefore any comment on the
    > above code is greatly appreciated.
    >
    >
    > Regards,
    > Florian
     
    Florian Brucker, Nov 15, 2008
    #12
  13. Guest

    Bruno Desthuilliers:
    > What is data is another type of sequence or iterable ?-)<


    The original problem statement was:

    Florian Brucker:
    >Given a dictionary, I want to create a clustered version of it, collecting keys that have the same value: [...] Another requirement is that it should also work on lists, in that case with indices instead of keys. We may assume that all values in the original dict/list can be used as dict keys.<


    If the problem changes, then the code has to/can change. When you
    write code it's better to avoid over-generalization (this is also a
    rule in Agile programming practices).

    Bye,
    bearophile
     
    , Nov 16, 2008
    #13
  14. Florian Brucker wrote:

    > Florian Brucker wrote:
    >> Hi everybody!
    >>
    >> Given a dictionary, I want to create a clustered version of it,
    >> collecting keys that have the same value:
    >>
    >> >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
    >> >>> cluster(d)

    >> {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
    >>
    >> That is, generate a new dict which holds for each value of the old
    >> dict a list of the keys of the old dict that have that very value.
    >>
    >> Another requirement is that it should also work on lists, in that case
    >> with indices instead of keys. We may assume that all values in the
    >> original dict/list can be used as dict keys.
    >>


    [...]

    > Wow, thanks everybody! There's a lot to learn for me from these

    examples...
    >
    >



    d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}

    from itertools import groupby

    d2 = dict(
    (key, [G[1] for G in g]) for (key, g) in
    groupby(
    sorted( (val, key) for (key, val) in d.iteritems() ),
    lambda X: X[0]
    )
    )

    print d2


    {1: ['a', 'c', 'd'], 2: ['b', 'e'], 3: ['f']}


    ;-) *ducks*

    G.
     
    Gerard flanagan, Nov 16, 2008
    #14
  15. a écrit :
    > Bruno Desthuilliers:
    >> What is data is another type of sequence or iterable ?-)<

    >
    > The original problem statement was:


    I did read it, thanks.

    >
    > If the problem changes, then the code has to/can change. When you
    > write code it's better to avoid over-generalization


    It's not a problem of "over-generalization", it's a problem of not being
    _uselessly_ restrictive. Your solution will fail with any non-list
    sequence, for no good reason.

    > (this is also a
    > rule in Agile programming practices).


    The rational behind the YAGNI rule is to avoid wasting time on useless
    generalizations. This does not apply here since the more generic
    solution doesn't take more time to write or maintain.
     
    Bruno Desthuilliers, Nov 16, 2008
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Menghan Zheng
    Replies:
    1
    Views:
    271
    alex23
    Apr 20, 2010
  2. Cameron Simpson
    Replies:
    6
    Views:
    352
    alex23
    Apr 21, 2010
  3. Ethan Furman

    Python 3: dict & dict.keys()

    Ethan Furman, Jul 24, 2013, in forum: Python
    Replies:
    4
    Views:
    212
    Steven D'Aprano
    Jul 25, 2013
  4. Peter Otten

    Re: Python 3: dict & dict.keys()

    Peter Otten, Jul 24, 2013, in forum: Python
    Replies:
    1
    Views:
    93
    Neil Cerutti
    Jul 24, 2013
  5. Oscar Benjamin

    Re: Python 3: dict & dict.keys()

    Oscar Benjamin, Jul 24, 2013, in forum: Python
    Replies:
    0
    Views:
    105
    Oscar Benjamin
    Jul 24, 2013
Loading...

Share This Page