Feature request: sorting a list slice

Discussion in 'Python' started by George Sakkis, May 18, 2006.

  1. It would be useful if list.sort() accepted two more optional
    parameters, start and stop, so that you can sort a slice in place. In
    other words,

    x = range(1000000)
    x.sort(start=3, stop=-1)

    would be equivalent to

    x[3:-1] = sorted(x[3:-1])

    but more efficient and without repeating the slice twice. If it's not
    too late for feature additions, I'd love to see this in 2.5.


    George
    George Sakkis, May 18, 2006
    #1
    1. Advertising

  2. George Sakkis

    John Machin Guest

    Use case?
    John Machin, May 18, 2006
    #2
    1. Advertising

  3. George Sakkis wrote:

    > It would be useful if list.sort() accepted two more optional
    > parameters


    useful for what? what's the use case ?

    </F>
    Fredrik Lundh, May 18, 2006
    #3
  4. George Sakkis wrote:
    > It would be useful if list.sort() accepted two more optional
    > parameters, start and stop, so that you can sort a slice in place. In
    > other words,
    >
    > x = range(1000000)
    > x.sort(start=3, stop=-1)
    >
    > would be equivalent to
    >
    > x[3:-1] = sorted(x[3:-1])
    >
    > but more efficient and without repeating the slice twice. If it's not
    > too late for feature additions, I'd love to see this in 2.5.


    This is a false optimization. The slicing steps are O(n) and the sort
    step is O(n log n) unless the data has some internal structure that
    Timsort can use to get closer to O(n).

    If you implemented this and timed in it real apps, I would be surprised
    to find that the slice extraction and assignments were the performance
    bottleneck. IMO, a worthwhile performance gain is unlikely.


    Raymond
    Raymond Hettinger, May 18, 2006
    #4
  5. Fredrik Lundh wrote:

    > George Sakkis wrote:
    >
    >> It would be useful if list.sort() accepted two more optional
    >> parameters

    >
    >
    > useful for what? what's the use case ?
    >
    > </F>
    >


    Although there is a use case (see below), I don't think it's strictly
    necessary in this case. Arguments to add it:
    - Don't Repeat Yourself.
    - It doesn't introduce new syntax, builtin or standard library
    function; only two optional keywords in an existing method.
    - These keywords are already used in several list and string methods
    (index,find,..) and their meaning is instantly obvious.
    - Increased (even if small) efficiency. Yes, I know that in practice,
    and especially in apps that involve I/O, network, DB connections and so
    on, it's unlikely to be the bottleneck but this isn't a general
    argument for being sloppy. The fact that having a "for i in
    xrange(100000):pass" at the top of every file costs me only a few msec
    is not a reason to keep it there.

    IMO the only reason not to add it would be if it makes sort's
    implementation much more complicate that it already is. I don't know
    Timsort's details, but I doubt that this is the case; please correct me
    if I am wrong.

    And here's (a cut-down version of) the use case: a recursive
    generalization of groupby. On each recursive call, the argument list is
    sorted and regrouped. If sort() accepted a (start,end) range I could
    pass this over instead of creating a slice of the input each time. It's
    not hard to see that there can be an exponential number of calls wrt to
    the input's length, so cutting down the average cost of each call may
    be worthwhile for medium-largish sized inputs (of course it's still
    exponential asymptotically, so the constant factor does not make much
    difference for really large inputs).


    import itertools as it

    def groupBy(iterable, *keys):
    return _groupBy(list(iterable), keys, len(keys))

    def _groupBy(lst, keys, depth):
    if depth == 0:
    return lst
    key = keys[-depth]
    # sort group before subgrouping
    lst.sort(key=key)
    # group by key and then recursively do the same for each subgroup
    return [(k, _groupBy(list(subgroup),keys,depth-1))
    for k,subgroup in it.groupby(lst,key)]


    #==== example ========================================

    class Struct(object):
    def __init__(self, args):
    self.args = args
    def __getitem__(self,index):
    return self.args[index]

    data = map(Struct, [
    ('a', 2, True),
    ('b', 1, False),
    ('a', 1, True),
    ('a', 1, False),
    ('b', 2, True),
    ('b', 2, True),
    ('a', 2, True),
    ])

    from pprint import pprint
    from operator import itemgetter
    pprint(groupBy(data, itemgetter(2), itemgetter(0), itemgetter(1)))


    #======= output =================================

    [(False,
    [('a', [(1, [<__main__.Struct object at 0xb7dc128c>])]),
    ('b', [(1, [<__main__.Struct object at 0xb7dc124c>])])]),
    (True,
    [('a',
    [(1, [<__main__.Struct object at 0xb7dc126c>]),
    (2,
    [<__main__.Struct object at 0xb7dc122c>,
    <__main__.Struct object at 0xb7dc12ec>])]),
    ('b',
    [(2,
    [<__main__.Struct object at 0xb7dc12ac>,
    <__main__.Struct object at 0xb7dc12cc>])])])]


    George
    George Sakkis, May 18, 2006
    #5
  6. Am Donnerstag 18 Mai 2006 22:13 schrieb Raymond Hettinger:
    > This is a false optimization. The slicing steps are O(n) and the sort
    > step is O(n log n) unless the data has some internal structure that
    > Timsort can use to get closer to O(n).
    >
    > If you implemented this and timed in it real apps, I would be surprised
    > to find that the slice extraction and assignments were the performance
    > bottleneck. IMO, a worthwhile performance gain is unlikely.


    I personally wouldn't find this to be a false optimization, at least not
    memory-wise. If you sort really large lists, and only want to sort part of
    the list, the memory gains of not having to do a slice should be enormous,
    and there should be some time-gains too. And, additionally, implementing this
    (if I understand timsort correctly, just had a look at the sources) isn't
    hard.

    Rather, I'd pose the usage question: why are you only sorting a part of a
    list? lists are basically meant for homogenous data. And sorting only a part
    of that should be a pretty meaningless operation, mostly...

    Anyway, I've just written up a patch to extend sort() with a start/stop
    parameter (which behaves just like the default slice notation). Generally,
    this will make sort() setup slightly slower in the "standard" case (because
    there's some more pointer arithmetic involved, but this should be negligible,
    and is O(1)), but for the actual use case, the following numbers can be seen:

    modelnine@phoenix ~/mercurial/python-modelnine $ ./python test.py
    New average time: 13.7254650593 ms per loop
    Old average time: 14.839854002 ms per loop
    [10198 refs]
    modelnine@phoenix ~/mercurial/python-modelnine $

    This is just a very, very simple test (testing the case of a list with one
    million entries, where 99000 are sorted):

    >>>

    from random import randrange
    from time import time

    x = [randrange(256) for x in xrange(1000000)]
    timesnew, timesold = [], []

    for reps in range(1000):
    y = x[:]
    timesnew.append(time())
    y.sort(start=1000,stop=10000)
    timesnew[-1] = time() - timesnew[-1]

    for reps in range(1000):
    y = x[:]
    timesold.append(time())
    y[1000:10000] = sorted(y[1000:10000])
    timesold[-1] = time() - timesold[-1]

    print "New average time:", sum(timesnew), "ms per loop"
    print "Old average time:", sum(timesold), "ms per loop"
    >>>


    I'll post the patch to the SF bugtracker tomorrow, it's too late today for me
    to review it, but generally, I wouldn't find it to be a bad addition, if
    there's actually a general use case to only sort parts of a list.

    --- Heiko.
    Heiko Wundram, May 18, 2006
    #6
  7. Fredrik Lundh wrote:
    > George Sakkis wrote:
    >
    > > It would be useful if list.sort() accepted two more optional
    > > parameters


    +1

    > useful for what? what's the use case ?


    Actually, I was in need of such a construct only few weeks ago. The
    task was to organize playlists
    of an mp3 player. I wanted to sort future entries but not past ones
    (according to an index in the
    list that seperates past from future). I can imagine many more
    situations in which sorting part of
    a list is usefull.

    While I agree that the improvement of the additional keywords is minor,
    I think that the additional
    load they impose on the sort function is also minor. In my oppinion,
    this is a straight-forward
    extension of what we already find in other places in the library. I
    would like to see it in a future version.

    - harold -
    Harold Fellermann, May 19, 2006
    #7
  8. Am Donnerstag 18 Mai 2006 19:27 schrieb George Sakkis:
    > It would be useful if list.sort() accepted two more optional
    > parameters, start and stop, so that you can sort a slice in place.


    I've just submitted:

    http://sourceforge.net/tracker/index.php?func=detail&aid=1491804&group_id=5470&atid=305470

    to the bugtracker, which extends the (start, stop) keyword arguments to
    list.reverse() (which I've needed more than once). The patch updates the test
    suite, documentation, list object, and sorted() builtin to accept (or
    specify) the new arguments.

    Any comment/feedback would be appreciated.

    --- Heiko.
    Heiko Wundram, May 19, 2006
    #8
  9. Heiko Wundram wrote:

    > Am Donnerstag 18 Mai 2006 19:27 schrieb George Sakkis:
    > > It would be useful if list.sort() accepted two more optional
    > > parameters, start and stop, so that you can sort a slice in place.

    >
    > I've just submitted:
    >
    > http://sourceforge.net/tracker/index.php?func=detail&aid=1491804&group_id=5470&atid=305470
    >
    > to the bugtracker, which extends the (start, stop) keyword arguments to
    > list.reverse() (which I've needed more than once). The patch updates the test
    > suite, documentation, list object, and sorted() builtin to accept (or
    > specify) the new arguments.
    >
    > Any comment/feedback would be appreciated.
    >
    > --- Heiko.


    This is great, thanks Heiko ! Any idea on the chances of being
    considered for inclusion in 2.5 ?

    George
    George Sakkis, May 19, 2006
    #9
  10. Am Freitag 19 Mai 2006 23:24 schrieb George Sakkis:
    > This is great, thanks Heiko ! Any idea on the chances of being
    > considered for inclusion in 2.5 ?


    Don't ask me, I'm not one of the core developers... ;-) But, anyway, the
    people on python-dev are doing their best to review patches. Just: I rather
    write them, than review them... ;-)

    --- Heiko.
    Heiko Wundram, May 19, 2006
    #10
  11. John Machin wrote:

    > Use case?


    quicksort! :)

    --
    Edward Elliott
    UC Berkeley School of Law (Boalt Hall)
    complangpython at eddeye dot net
    Edward Elliott, May 21, 2006
    #11
  12. George Sakkis

    Steve Holden Guest

    The effbot wrote:

    > George Sakkis wrote:
    >
    >>> It would be useful if list.sort() accepted two more optional
    >>> parameters

    >
    > useful for what? what's the use case ?
    >

    John Machin then wrote, without quoting any context at all:
    > Use case?
    >

    He means "under what circumstances do you see someone actually using the
    proposed feature rather than just nodding and saying "that looks
    useful". IOW, who would use the new feature, and why?

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Love me, love my blog http://holdenweb.blogspot.com
    Recent Ramblings http://del.icio.us/steve.holden
    Steve Holden, May 21, 2006
    #12
  13. George Sakkis

    Robert Kern Guest

    Steve Holden wrote:
    > The effbot wrote:
    >
    >>George Sakkis wrote:
    >>
    >>>>It would be useful if list.sort() accepted two more optional
    >>>>parameters

    >>
    >>useful for what? what's the use case ?

    >
    > John Machin then wrote, without quoting any context at all:
    >
    >>Use case?

    >
    > He means "under what circumstances do you see someone actually using the
    > proposed feature rather than just nodding and saying "that looks
    > useful". IOW, who would use the new feature, and why?


    I believe that John was asking George for a use case rather than asking Fredrik
    what a use case was.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
    Robert Kern, May 21, 2006
    #13
  14. George Sakkis

    Steve Holden Guest

    Robert Kern wrote:
    > Steve Holden wrote:
    >
    >>The effbot wrote:
    >>
    >>
    >>>George Sakkis wrote:
    >>>
    >>>
    >>>>>It would be useful if list.sort() accepted two more optional
    >>>>>parameters
    >>>
    >>>useful for what? what's the use case ?

    >>
    >>John Machin then wrote, without quoting any context at all:
    >>
    >>
    >>>Use case?

    >>
    >>He means "under what circumstances do you see someone actually using the
    >>proposed feature rather than just nodding and saying "that looks
    >>useful". IOW, who would use the new feature, and why?

    >
    >
    > I believe that John was asking George for a use case rather than asking Fredrik
    > what a use case was.
    >

    In which case, as I pointed out, John would have done better to quote a
    little more context.

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Love me, love my blog http://holdenweb.blogspot.com
    Recent Ramblings http://del.icio.us/steve.holden
    Steve Holden, May 21, 2006
    #14
  15. George Sakkis

    Robert Kern Guest

    Steve Holden wrote:
    > Robert Kern wrote:


    >>I believe that John was asking George for a use case rather than asking Fredrik
    >>what a use case was.

    >
    > In which case, as I pointed out, John would have done better to quote a
    > little more context.


    No argument here.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
    Robert Kern, May 21, 2006
    #15
  16. George Sakkis

    John Machin Guest

    Context? The whole message asked for a new feature. Please tell me what
    context I should have supplied.
    John Machin, May 21, 2006
    #16
  17. George Sakkis

    Steve Holden Guest

    Steve Holden, May 21, 2006
    #17
  18. Getting a patch ready for checkin (corrected, documented, reviewed, and
    tested) is only part of the battle. The real challenge of language
    design is figuring out whether something should be done.

    Adding optional parameters to a method makes its invocation slower and
    makes the API harder to learn and remember.

    There has not yet been a thorough discussion of use cases. In
    particular, people should scan their existing, deployed codebase for
    examples of where a particular code fragment would have been improved.
    We should then compare the fragment with and without the feature -- how
    much more concise is it, how much is performance improved/harmed, is
    the code more or less readable, etc.

    If the perf gain is small and the use cases are infrequent, the
    addition is likely unwarranted. There is an entire class of feature
    requests that are more appropriate as recipes than for inclusion in the
    language. If python had accepted most requests like this, it would
    have become somewhat bloated and unattractive.
    Raymond Hettinger, May 21, 2006
    #18
  19. Am Sonntag 21 Mai 2006 18:55 schrieb Raymond Hettinger:
    > If the perf gain is small and the use cases are infrequent, the
    > addition is likely unwarranted. There is an entire class of feature
    > requests that are more appropriate as recipes than for inclusion in the
    > language.


    The thing is: having an explicit start/stop argument to reverse() and sort()
    doesn't slow down method call much (it's just one if() whose body is skipped
    when the parameters aren't passed, I'd say that the time that's lost here is
    pretty insignificant, in the order of 10^-6 seconds, on _any_ modern
    machine), and on the other hand warrants huge memory gains (if not
    performance gains by saving a memcpy) when you do need to sort or reverse
    slices of large lists.

    I've had use cases of the latter (reversing large parts of even larger lists
    in memory) in several data archiving and evaluation programs I wrote, but I
    can also understand the use case that was made by George for having these
    arguments for sort(), so that groupBy() can be extended easily to work on
    subgroups without requiring slicing.

    Anyway: having these "extensions" as a recipe won't work here: the main idea
    is saving the slicing by having sort() and reverse() do the "slicing"
    internally (rather, they just add a constant to the lower and upper bound,
    and the user specifies these constants, the internal functions they call
    already work on slices, and the current listobject.c gives them the whole
    list as the slice by default).

    The user can't patch this as an extension on the fly, because it requires
    changes to the underlying listobject.c source. That's why George is asking
    for inclusion of this patch.

    I just wrote the patch because I had the time to do so, and I won't battle for
    it's inclusion, but again, I see the use cases clearly, at the very least for
    slice support in list.reverse() and array.reverse() (which the patch also
    implements).

    --- Heiko.
    Heiko Wundram, May 21, 2006
    #19
  20. George Sakkis

    Robert Kern Guest

    John Machin wrote:
    > Context? The whole message asked for a new feature. Please tell me what
    > context I should have supplied.


    When you reply to a message, please quote part of that message. That's what was
    meant by context.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
    Robert Kern, May 21, 2006
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    5
    Views:
    238
    Steven Bethard
    Jun 12, 2007
  2. hwg
    Replies:
    2
    Views:
    214
    Paul McGuire
    Aug 29, 2007
  3. Replies:
    19
    Views:
    224
    Nobody
    Feb 26, 2013
  4. Andrew Robinson
    Replies:
    0
    Views:
    71
    Andrew Robinson
    Feb 25, 2013
  5. Andrew Robinson
    Replies:
    0
    Views:
    62
    Andrew Robinson
    Feb 25, 2013
Loading...

Share This Page