Re: sorting on keys in a list of dicts

Discussion in 'Python' started by Jp Calderone, Jan 6, 2005.

  1. Jp Calderone

    Jp Calderone Guest

    On Thu, 06 Jan 2005 15:31:22 +0100, J Berends <j.p.t.j.berends@[n0sp4m].nl> wrote:
    >Suppose I have a list of dictionaries and each dict has a common keyname
    > with a (sortable) value in it.
    >
    > How can I shuffle their position in the list in such way that they
    > become sorted.
    >


    In Python 2.4,

    import operator
    L.sort(key=operator.itemgetter(key))

    In Python 2.3,

    L2 = [(d[key], i, d) for (i, d) in enumerate(L)]
    L2.sort()
    L = [d for (v, i, d) in L2]

    Jp
     
    Jp Calderone, Jan 6, 2005
    #1
    1. Advertising

  2. Jp Calderone

    J Berends Guest

    Jp Calderone wrote:
    > On Thu, 06 Jan 2005 15:31:22 +0100, J Berends <j.p.t.j.berends@[n0sp4m].nl> wrote:
    >
    >>Suppose I have a list of dictionaries and each dict has a common keyname
    >>with a (sortable) value in it.
    >>
    >>How can I shuffle their position in the list in such way that they
    >>become sorted.
    >>

    >
    >
    > In Python 2.4,
    >
    > import operator
    > L.sort(key=operator.itemgetter(key))
    >
    > In Python 2.3,
    >
    > L2 = [(d[key], i, d) for (i, d) in enumerate(L)]
    > L2.sort()
    > L = [d for (v, i, d) in L2]
    >
    > Jp

    the 2.3 version seems to work quite nicely! thanks. Using the 2.3 for
    version compatibility. Thanks!
     
    J Berends, Jan 6, 2005
    #2
    1. Advertising

  3. Jp Calderone

    Jeff Shannon Guest

    Jp Calderone wrote:

    > L2 = [(d[key], i, d) for (i, d) in enumerate(L)]
    > L2.sort()
    > L = [d for (v, i, d) in L2]


    Out of curiosity, any reason that you're including the index? I'd
    have expected to just do

    L2 = [(d[key], d) for d in L]
    L2.sort()
    L = [d for (v, d) in L2]

    I suppose that your version has the virtue that, if the sortkey value
    is equal, items retain the order that they were in the original list,
    whereas my version will sort them into an essentially arbitrary order.
    Is there anything else that I'm missing here?

    Jeff Shannon
    Technician/Programmer
    Credit International
     
    Jeff Shannon, Jan 6, 2005
    #3
  4. Jp Calderone

    Nick Coghlan Guest

    Jeff Shannon wrote:
    > I suppose that your version has the virtue that, if the sortkey value is
    > equal, items retain the order that they were in the original list,
    > whereas my version will sort them into an essentially arbitrary order.
    > Is there anything else that I'm missing here?


    Stability in sorting is a property not to be sneezed at - it means switching to
    sorting by a second key gives the effect of "sort by key 1, then by key 2",
    whereas that doesn't hold with an unstable sort algorithm. If you've ever used
    an application with an unstable sorting process and that only allows sorting a
    table on one column at a time, you'll appreciate the frustration that can cause :)

    Also, it's required to match the behaviour of the Python 2.4 version (which gets
    to take advantage of the stability of the builtin sort).

    Cheers,
    Nick.

    --
    Nick Coghlan | | Brisbane, Australia
    ---------------------------------------------------------------
    http://boredomandlaziness.skystorm.net
     
    Nick Coghlan, Jan 7, 2005
    #4
  5. Jp Calderone

    It's me Guest

    What does it mean by "stability in sorting"?

    Can somebody please give a sample for using the code posted? I am a little
    lost here and I like to know more about the use of keys....

    Thanks,

    "Nick Coghlan" <> wrote in message
    news:...
    > Jeff Shannon wrote:
    > > I suppose that your version has the virtue that, if the sortkey value is
    > > equal, items retain the order that they were in the original list,
    > > whereas my version will sort them into an essentially arbitrary order.
    > > Is there anything else that I'm missing here?

    >
    > Stability in sorting is a property not to be sneezed at - it means

    switching to
    > sorting by a second key gives the effect of "sort by key 1, then by key

    2",
    > whereas that doesn't hold with an unstable sort algorithm. If you've ever

    used
    > an application with an unstable sorting process and that only allows

    sorting a
    > table on one column at a time, you'll appreciate the frustration that can

    cause :)
    >
    > Also, it's required to match the behaviour of the Python 2.4 version

    (which gets
    > to take advantage of the stability of the builtin sort).
    >
    > Cheers,
    > Nick.
    >
    > --
    > Nick Coghlan | | Brisbane, Australia
    > ---------------------------------------------------------------
    > http://boredomandlaziness.skystorm.net
     
    It's me, Jan 7, 2005
    #5
  6. Jp Calderone

    Craig Ringer Guest

    On Sat, 2005-01-08 at 00:26, It's me wrote:
    > What does it mean by "stability in sorting"?


    If I understand correctly, it means that when two sorts are performed in
    sequence, the keys that are equal to the second sort end up ordered the
    way they were left by the first sort.

    I'm far from certain of this, but at least I'm presenting an opportunity
    for someone to yell "no, you're wrong!" and in the process definitively
    answer the question.

    For example, given the list:

    ..>>> l = [(1,2), (8,2), (2,2), (3,2), (4,3), (5,3), (8,9)]

    if we sort by the first element of each tuple then the second (the
    default), we get:

    ..>>> l.sort()
    ..>>> l
    [(1, 2), (2, 2), (3, 2), (4, 3), (5, 3), (8, 2), (8, 9)]

    Now, if we sort based on the second element we get:

    ..>>> def seconditem(x):
    ..... return x[1]
    .....
    ..>>> l.sort(key=seconditem)
    ..>>> l
    [(1, 2), (2, 2), (3, 2), (8, 2), (4, 3), (5, 3), (8, 9)]

    You'll note that there are several correct answers to the request "sort
    the list 'l' by the second element of each item", including:

    [(1, 2), (2, 2), (3, 2), (8, 2), (4, 3), (5, 3), (8, 9)]
    [(2, 2), (1, 2), (8, 2), (3, 2), (4, 3), (5, 3), (8, 9)]
    [(1, 2), (2, 2), (3, 2), (8, 2), (5, 3), (4, 3), (8, 9)]

    and many others. Because we didn't specify that the first item in the
    value tuples should be used in the sort key, so long as the second key
    is equal for a group of items it doesn't matter what order items in that
    group appear in.

    Python (at least 2.4), however, returns those groups where the order
    isn't defined in the same order they were before the sort. Look at this,
    for example:

    ..>>> l.sort()
    ..>>> l.reverse()
    ..>>> l
    [(8, 9), (8, 2), (5, 3), (4, 3), (3, 2), (2, 2), (1, 2)]
    ..>>> l.sort(key=seconditem)
    ..>>> l
    [(8, 2), (3, 2), (2, 2), (1, 2), (5, 3), (4, 3), (8, 9)]

    See how the exact same sort command was used this time around, but
    because the list was reverse-sorted first, the elements are in reverse
    order by first item when the second item is equal?

    In the first case we used the same result as the stable sort could be
    obtained with:

    ..>>> def revitem(x):
    ..... return (x[1], x[0])
    >>> l.sort(key=revitem)
    >>> l

    [(1, 2), (2, 2), (3, 2), (8, 2), (4, 3), (5, 3), (8, 9)]

    (in other words, saying "use the value tuple as the sort key, but sort
    by the second element before the first")

    That doesn't extend to more complex cases very well though. Imagine you
    had 3-tuples not 2-tuples, and wanted to maintain the previous sort
    order of equal groupings when re-sorting by a different key... but you
    didn't know what key was last used for sorting. A stable sort algorithm
    means you don't need to care, because the order will be maintained for
    you not randomized.

    Well, that's several hundred more words than were probably required, but
    I hope I made sense.

    --
    Craig Ringer
     
    Craig Ringer, Jan 7, 2005
    #6
  7. Jp Calderone

    Nick Coghlan Guest

    It's me wrote:
    > What does it mean by "stability in sorting"?
    >
    > Can somebody please give a sample for using the code posted? I am a little
    > lost here and I like to know more about the use of keys....


    It's the jargon for what Jeff said - if you are sorting by some value calculated
    from each list entry, and different list entries may give the same value, then a
    stable sort guarantees that the order of entries giving the same value will be
    preserved from the original list.

    Consider:

    Py> from operator import itemgetter as item
    Py> seq = [(1, 1), (2, 1), (2, 3), (1, 5)]
    Py> seq.sort(key=item(1))
    Py> seq #1
    [(1, 1), (2, 1), (2, 3), (1, 5)]
    Py> seq.sort(reverse=True)
    Py> seq #2
    [(2, 3), (2, 1), (1, 5), (1, 1)]
    Py> seq.sort(key=item(1))
    Py> seq #3
    [(2, 1), (1, 1), (2, 3), (1, 5)]

    This snippet sorts the tuples according to the second item, then sorts them in
    reverse order of the whole tuple, then resorts them according to the second item.

    Notice that the order of the first two items is different between point #1 and
    point #3. This is because sorting by the second item makes no distinction
    between these two tuples, and they retain whatever order they had before the
    sort began.

    This is what it means to have a stable sort, and it makes expressing complex
    sorting quite easy by chaining different sort operations together.

    Python 2.3 has a stable sort, and Python 2.4 brought the guarantee that it shall
    remain that way. I'm not sure about Python 2.2 and earlier.

    Cheers,
    Nick.

    --
    Nick Coghlan | | Brisbane, Australia
    ---------------------------------------------------------------
    http://boredomandlaziness.skystorm.net
     
    Nick Coghlan, Jan 7, 2005
    #7
  8. Jp Calderone

    Tim Peters Guest

    [Nick Coghlan]
    ....
    > Python 2.3 has a stable sort, and Python 2.4 brought the guarantee that it shall
    > remain that way. I'm not sure about Python 2.2 and earlier.


    No list.sort() implementation before 2.3 was stable. It was
    confusing, though, because the samplesort/binary_insertion_sort hybrid
    Python used for the 4 years preceding 2.3 *was* stable for all "small
    enough" lists. More than one person got fooled by guessing that
    stability observed in small test cases meant it was always stable.
     
    Tim Peters, Jan 7, 2005
    #8
  9. Jp Calderone

    Nick Coghlan Guest

    Craig Ringer wrote:
    > Well, that's several hundred more words than were probably required, but
    > I hope I made sense.


    Remarkably similar to what I just posted. . . I guess numeric 2-tuples are just
    too good to pass up when discussing sorting :)

    Cheers,
    Nick.

    --
    Nick Coghlan | | Brisbane, Australia
    ---------------------------------------------------------------
    http://boredomandlaziness.skystorm.net
     
    Nick Coghlan, Jan 7, 2005
    #9
  10. Jeff Shannon wrote:
    > Jp Calderone wrote:
    >> L2 = [(d[key], i, d) for (i, d) in enumerate(L)]
    >> L2.sort()
    >> L = [d for (v, i, d) in L2]

    > Out of curiosity, any reason that you're including the index?


    Others have already remarked that this preserves sort stability
    (which is, in fact a lovely property). There is another property
    which hasn't been mentioned:
    As written, only the key and the index are compared.

    Try sorting:
    tricky = [dict(a=5j, b=1), dict(a=4j, b=1)]
    or:
    similar = [(5j, 1), (4j, 1)]

    without the index.

    --Scott David Daniels
     
    Scott David Daniels, Jan 7, 2005
    #10
  11. Jp Calderone

    Jeff Shannon Guest

    Nick Coghlan wrote:

    > Jeff Shannon wrote:
    >
    >> I suppose that your version has the virtue that, if the sortkey value
    >> is equal, items retain the order that they were in the original list,
    >> whereas my version will sort them into an essentially arbitrary order.
    >> Is there anything else that I'm missing here?

    >
    >
    > Stability in sorting is a property not to be sneezed at [...]


    Agreed. I'd started typing before I realized that it'd provide a
    stable sort, which pretty much answered my own question, but decided
    to send it anyhow in case I'd missed anything else... :)

    Jeff Shannon
    Technician/Programmer
    Credit International
     
    Jeff Shannon, Jan 7, 2005
    #11
  12. Jp Calderone

    Nick Coghlan Guest

    Jeff Shannon wrote:
    > Agreed. I'd started typing before I realized that it'd provide a stable
    > sort, which pretty much answered my own question, but decided to send it
    > anyhow in case I'd missed anything else... :)


    And it turns out we both missed the fact that it avoids comparing the
    dictionaries which could save a *lot* of number crunching (as well as making
    otherwise unsortable lists sortable).

    So it's a good thing you did decide to send it :)

    Cheers,
    Nick.


    --
    Nick Coghlan | | Brisbane, Australia
    ---------------------------------------------------------------
    http://boredomandlaziness.skystorm.net
     
    Nick Coghlan, Jan 8, 2005
    #12
  13. Jp Calderone

    Carl Banks Guest

    Jeff Shannon wrote:
    > Jp Calderone wrote:
    >
    > > L2 = [(d[key], i, d) for (i, d) in enumerate(L)]
    > > L2.sort()
    > > L = [d for (v, i, d) in L2]

    >
    > Out of curiosity, any reason that you're including the index? I'd
    > have expected to just do
    >
    > L2 = [(d[key], d) for d in L]
    > L2.sort()
    > L = [d for (v, d) in L2]



    Suppose L is a list of objects that can't be compared (for example,
    they are dicts that have complex number items) and the keys are not all
    distinct. If sort tries to compare two equal keys, it'll proceed to
    compare the objects themselves, which could throw an exception.

    Stick the index in there, and that possibility is gone.


    --
    CARL BANKS
     
    Carl Banks, Jan 8, 2005
    #13
  14. Jp Calderone

    Peter Hansen Guest

    Nick Coghlan wrote:
    > Stability in sorting is a property not to be sneezed at - it means
    > switching to sorting by a second key gives the effect of "sort by key 1,
    > then by key 2", whereas that doesn't hold with an unstable sort


    Assuming "key 1" refers to the first key, and "key 2" to the second
    key, then I believe you meant "gives the effect of 'sort by key 2
    then by key 1'".

    In other words, if you have a stable sort algorithm and do one sort
    operation using key 2, then a second sort operation using key 1,
    it's the same as doing a single sort with the key combination "key 1, key 2",
    not the other way around.

    -Peter
     
    Peter Hansen, Jan 8, 2005
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jiba

    unicode keys in dicts

    Jiba, Jan 8, 2004, in forum: Python
    Replies:
    2
    Views:
    313
    Peter Hansen
    Jan 8, 2004
  2. J Berends

    sorting on keys in a list of dicts

    J Berends, Jan 6, 2005, in forum: Python
    Replies:
    1
    Views:
    303
    Paul Rubin
    Jan 6, 2005
  3. bruce
    Replies:
    0
    Views:
    256
    bruce
    Jan 10, 2012
  4. rlelis
    Replies:
    12
    Views:
    190
    Neil Cerutti
    May 10, 2013
  5. Replies:
    8
    Views:
    158
Loading...

Share This Page