RE: outline-style sorting algorithm

Discussion in 'Python' started by Delaney, Timothy C (Timothy), Apr 29, 2004.

  1. Thorsten Kampe wrote:

    >>>>> foo

    >> ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20',
    >> '1.20.1', '1.30']
    >>>>> foo.sort()
    >>>>> foo

    >> ['1.0', '1.0.1', '1.1.1', '1.10', '1.11', '1.2', '1.20', '1.20.1',
    >> '1.30', '1.9']
    >>
    >> Obviously 1.20.1 should be after 1.9 if we look at this as
    >> dot-delimited integers, not as decimal numbers.

    >
    > You need some general approach to avoid the DSU thing:
    >
    > def funcsort(seq, func):
    > """ sort seq by func(item) """
    > seq = seq[:]
    > seq.sort(lambda x, y: cmp(func(x), func(y)))
    > return seq
    >
    > funcsort(foo, lambda x: map(int, x.split('.')))


    I've seen you give this advice several times, today, and IMO it's
    completely wrong.

    DSU is *exactly* what you want to do. If you want to wrap it inside a
    general function, that's fine, but DSU is in almost all cases preferred
    to passing a comparison function - it's much faster.

    def sorted (seq, cmp=None, key=None):
    """ sort seq by func(item) """
    if key is None:
    seq = seq[:]
    else:
    seq = [(key(e), i, e) for i, e in enumerate(seq)]

    seq.sort(cmp)

    if key is None:
    return seq
    else:
    return [i[-1] for i in seq]

    foo = ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11',
    '1.20', '1.20.1', '1.30']

    foo = sorted(foo, key=lambda x: map(int, x.split('.')))
    print foo

    Note that Python 2.4 will have DSU as a built-in idiom to `list.sort`
    i.e. `list.sort(key=func)` but will be somewhat more efficient than the
    above. Likewise there will be a new builtin `sorted` which has exactly
    the same semantics as the above - it is stable, and it does not ever
    compare the actual value if a key function is supplied - only the key
    value (the above also compares the position, but that's an
    implementation detail and has no visible effect).

    Tim Delaney
     
    Delaney, Timothy C (Timothy), Apr 29, 2004
    #1
    1. Advertising

  2. * Delaney, Timothy C (Timothy) (2004-04-29 02:52 +0100)
    > Thorsten Kampe wrote:
    >>>>>> foo
    >>> ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20',
    >>> '1.20.1', '1.30']
    >>>>>> foo.sort()
    >>>>>> foo
    >>> ['1.0', '1.0.1', '1.1.1', '1.10', '1.11', '1.2', '1.20', '1.20.1',
    >>> '1.30', '1.9']
    >>>
    >>> Obviously 1.20.1 should be after 1.9 if we look at this as
    >>> dot-delimited integers, not as decimal numbers.

    >>
    >> You need some general approach to avoid the DSU thing:
    >>
    >> def funcsort(seq, func):
    >> """ sort seq by func(item) """
    >> seq = seq[:]
    >> seq.sort(lambda x, y: cmp(func(x), func(y)))
    >> return seq
    >>
    >> funcsort(foo, lambda x: map(int, x.split('.')))

    >
    > I've seen you give this advice several times, today, and IMO it's
    > completely wrong.
    >
    > DSU is *exactly* what you want to do. If you want to wrap it inside a
    > general function, that's fine, but DSU is in almost all cases preferred
    > to passing a comparison function - it's much faster.


    My advice was to use a more general approach like 'funcsort' to avoid
    reiventing the wheel for every problem. DSU is okay for funcsort.

    > def sorted (seq, cmp=None, key=None):


    What is 'cmp=None' for?

    > """ sort seq by func(item) """
    > if key is None:
    > seq = seq[:]
    > else:
    > seq = [(key(e), i, e) for i, e in enumerate(seq)]
    >
    > seq.sort(cmp)


    Are you passing a second comparison function? And if, isn't that the
    same as I did and to which you said "preferred to passing a comparison
    function"? Are you shadowing the builtin cmp function?

    > if key is None:
    > return seq
    > else:
    > return [i[-1] for i in seq]


    What is 'key=None' for? Is that for speeding up if someone wants to
    pass the indentity function (f(x)=x; lambda x: x)?

    > Note that Python 2.4 will have DSU as a built-in idiom to `list.sort`
    > i.e. `list.sort(key=func)` but will be somewhat more efficient than the
    > above. Likewise there will be a new builtin `sorted` which has exactly
    > the same semantics as the above - it is stable, and it does not ever
    > compare the actual value if a key function is supplied - only the key
    > value (the above also compares the position, but that's an
    > implementation detail and has no visible effect).


    If it has no visible effects than it would be useless. In my opinion
    it has the effect that two items that have the same func(x) stay in
    the same position as before. Is that desirable? Was that your goal?

    >>> sorted([4, 2], None, lambda x: x % 2)

    [4, 2], but [2, 4] if the index was omitted


    Thorsten
     
    Thorsten Kampe, Apr 29, 2004
    #2
    1. Advertising

  3. * Delaney, Timothy C (Timothy) (2004-04-29 02:52 +0100)
    > Thorsten Kampe wrote:
    >>>>>> foo
    >>> ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20',
    >>> '1.20.1', '1.30']
    >>>>>> foo.sort()
    >>>>>> foo
    >>> ['1.0', '1.0.1', '1.1.1', '1.10', '1.11', '1.2', '1.20', '1.20.1',
    >>> '1.30', '1.9']
    >>>
    >>> Obviously 1.20.1 should be after 1.9 if we look at this as
    >>> dot-delimited integers, not as decimal numbers.

    >>
    >> You need some general approach to avoid the DSU thing:
    >>
    >> def funcsort(seq, func):
    >> """ sort seq by func(item) """
    >> seq = seq[:]
    >> seq.sort(lambda x, y: cmp(func(x), func(y)))
    >> return seq
    >>
    >> funcsort(foo, lambda x: map(int, x.split('.')))

    >
    > I've seen you give this advice several times, today, and IMO it's
    > completely wrong.
    >
    > DSU is *exactly* what you want to do. If you want to wrap it inside a
    > general function, that's fine, but DSU is in almost all cases preferred
    > to passing a comparison function - it's much faster.


    If made some tests with lists of one million items:

    The first test was with a randomized sequence of numbers from 1 -
    1000000. Duplicate numbers were allowed. The comparing function was
    the identity (lambda x: x).

    You DSU approach took about 43 s and mine took about 86 s.
    Than I tested range(1000000) in reverse order:
    Your sorted program took about 24 s and mine about 5!

    So you cannot simply state that DSU is faster than passing a function
    if the structure of the input you want to sort is unknown...


    Thorsten
     
    Thorsten Kampe, Apr 29, 2004
    #3
  4. "Delaney, Timothy C (Timothy)" <> wrote:

    > foo = sorted(foo, key=lambda x: map(int, x.split('.')))
    > print foo


    Generality is also about making as little assumptions about the data
    as possible and still provide minimal functionality. For example
    dropping the "integer" assumption:

    def _cmp(x,y):
    L,R = x.split('.'),y.split('.')
    for a,b in zip(L,R):
    r = cmp(len(a),len(b)) or cmp(a,b)
    if r:
    return r
    return cmp(len(L),len(R))

    def test():
    L = ['1.0', '1.0.1', '1.1.1', '1.2', '1.10',
    '1.11', '1.20', '1.20.1', '1.30', '1.9']
    L.sort(_cmp)
    print L

    if __name__=='__main__':
    test()

    Anton
     
    Anton Vredegoor, Apr 29, 2004
    #4
  5. Delaney, Timothy C (Timothy)

    Terry Reedy Guest

    "Thorsten Kampe" <> wrote in message
    news:...
    > If made some tests with lists of one million items:
    >
    > The first test was with a randomized sequence of numbers from 1 -
    > 1000000. Duplicate numbers were allowed. The comparing function was
    > the identity (lambda x: x).


    I do not understand what you did, and therefore cannot judge results. A
    compare function takes two args (the objects to be compared) and
    returns -1, 0, 1. (I believe 0, 1 for <= or >t may be sufficient
    currently). How did you use the one-param identity?

    Terry J. Reedy
     
    Terry Reedy, Apr 29, 2004
    #5
  6. * Terry Reedy (2004-04-29 20:03 +0100)
    > "Thorsten Kampe" <> wrote in message
    > news:...
    >> If made some tests with lists of one million items:
    >>
    >> The first test was with a randomized sequence of numbers from 1 -
    >> 1000000. Duplicate numbers were allowed. The comparing function was
    >> the identity (lambda x: x).

    >
    > I do not understand what you did, and therefore cannot judge results. A
    > compare function takes two args (the objects to be compared) and
    > returns -1, 0, 1. (I believe 0, 1 for <= or >t may be sufficient
    > currently).


    That's the "traditional" approach (and it doesn't make much sense -
    see the "Searching and Sorting FAQ" in the Python Cookbook from Tim
    Peters: "Why does Python use the three out-come cmp for sorting? Why
    doesn't it use a simple less-than comparison instead?". It better to
    abstract the function from the comparison.

    > How did you use the one-param identity?


    I assembled my "pass function" approach and TCD's DSU approach into a
    single program[1]

    Then I generated two lists[2]:
    >>> a = range(1000000); a.reverse()

    and
    >>> b = randseq(1, 1000000, 1000000, 'repeat')

    (meaning: create a list with one million integers 1 <= n <= 1000000;
    allow duplicate numbers)

    Then I timed [3]
    timer(funcsort, a, lambda x: x, 'dsu')) -> 43 sec
    timer(funcsort, a, lambda x: x, 'pass')) -> 86 sec
    timer(funcsort, b, lambda x: x, 'dsu')) -> 24 sec
    timer(funcsort, b, lambda x: x, 'pass')) -> 5 sec

    [1]
    def funcsort(seq, func, modus = 'dsu'):
    """ sort seq by func(item) """
    if func is None:
    seq = seq[:]
    seq.sort()
    return seq
    elif modus == 'dsu':
    seq = [(func(item), index, item) for index, item in enumerate(seq)]
    seq.sort()
    return [item[2] for item in seq]
    elif modus == 'pass':
    seq = seq[:]
    seq.sort(lambda x, y: cmp(func(x), func(y)))
    return seq
    [2]
    def randseq(range_start, range_end = 0, count = 0, modus = 'norepeat'):
    from random import randint

    if modus == 'repeat':
    return [randint(range_start, range_end) for index in range(count)]

    elif isinstance(range_start, int):
    return randseq(range(range_start, range_end + 1))[:count]

    else: # 'range_start' is a seq, so shuffle
    range_start = range_start[:]
    randomseq = []

    for i in range(len(range_start)):
    itemindex = randint(0, len(range_start) - 1)
    randomseq.append(range_start[itemindex])
    del range_start[itemindex]
    return randomseq
    [3]
    def timer(iteration, *func_and_args):
    """
    print the time elapsed (in seconds) evaluating func iteration
    times
    (default is '1') """
    import gc, \
    time
    from colour import ltyellow, \
    reset_color

    if isinstance(iteration, int):
    func, args = func_and_args[0], func_and_args[1:]
    else:
    # if first argument is not a number, set func to iteration and iteration
    # to '1'
    iteration, func, args = 1, iteration, func_and_args

    iteration = range(iteration)
    gc.collect() # force garbage collection
    start_time_cpu = time.clock()
    start_time_total = time.time()

    for i in iteration:
    func(*args)
    print ltyellow, 'cpu: %(cpu).3f,' % {'cpu': time.clock() - start_time_cpu}, \
    'total: %(total).3f' % {'total': time.time() - start_time_total}, reset_color
     
    Thorsten Kampe, Apr 29, 2004
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. msnews.microsoft.com

    Document Outline in VS.NET! WOW!

    msnews.microsoft.com, Jul 25, 2004, in forum: ASP .Net
    Replies:
    3
    Views:
    1,717
    msnews.microsoft.com
    Jul 26, 2004
  2. King of Red Lions

    Text outline

    King of Red Lions, Apr 8, 2004, in forum: HTML
    Replies:
    9
    Views:
    20,552
    mikewandling
    Oct 25, 2008
  3. Replies:
    5
    Views:
    333
    Thorsten Kampe
    Apr 29, 2004
  4. Terry Reedy

    Re: outline-style sorting algorithm

    Terry Reedy, Apr 19, 2004, in forum: Python
    Replies:
    5
    Views:
    322
    Fredrik Lundh
    Apr 20, 2004
  5. Felix Collins

    HELP:sorting list of outline numbers

    Felix Collins, Aug 3, 2005, in forum: Python
    Replies:
    5
    Views:
    300
    Christopher Subich
    Aug 3, 2005
Loading...

Share This Page