A Sort Optimization Technique: decorate-sort-dedecorate

Discussion in 'Java' started by xahlee@gmail.com, Aug 27, 2006.

  1. Guest

    Last year, i've posted a tutorial and commentary about Python and
    Perl's sort function. (http://xahlee.org/perl-python/sort_list.html)

    In that article, i discussed a technique known among juvenile Perlers
    as the Schwartzian Transform, which also manifests in Python as its
    “key†optional parameter.

    Here, i give a more detailed account on why and how of this construct.

    ----

    Language and Sort Optimization: decorate-sort-dedecorate

    There are many algorithms for sorting. Each language can chose which to
    use. See wikipedia for a detailed list and explanation:
    Sorting_algorithm↗ .

    However, does not matter which algorithm is used, ultimately it will
    need the order-deciding function on the items to be sorted. Suppose
    your items are (a,b,c,d,...), and your order-deciding function is F.
    Various algorithms will try to minimize the number of times F is
    called, but nevertheless, F will be applied to a particular element in
    the list multiple times. For example, F(a,b) may be called to see which
    of “a†or “b†comes first. Then, later the algorithm might need
    to call F(m,a), or F(a,z). The point here is that, F will be called
    many times on arbitrary two items in your list, even if one of the
    element has been compared to others before.

    Now suppose, you are sorting some polygons in 3D space, or personal
    records by the person's address's distance from a location, or sorting
    matrixes by their eigen-values in some math application, or ordering
    files by number of occurrences of some text in the file.

    In general, when you define your decision function F(x,y), you will
    need to extract some property from the elements to be sorted. For
    example, when sorting points in space by a criterion of distance, one
    will need to compute the distance for the point. When sorting personal
    records from database by the person's location, the decision function
    will need to retrieve the person's address from the database, then find
    the coordinate of that address, that compute the distance from there to
    a given coordinate. In sorting matrixes in math by eigen-values, the
    order-decision will first compute the eigen-value of the matrix. A
    common theme from all of the above is that they all need to do some
    non-trivial computation on each element.

    As we can see, the order-decision function F may need to do some
    expensive computation on each element first, and this is almost always
    the case when sorting elements other than simple numbers. Also, we know
    that a sorting algorithm will need to call F(x,y) many times, even if
    one of x or y has been compared to others before. So, this may result
    high inefficiency. For example, you need to order people by their
    location to your home. So when F(Mary,Jane) is called, Mary's address
    is first retrieved from a database across a network, the coordinate of
    her address is looked up in another database. Then the distance to your
    home are computed using spherical geometry. The exact same thing is
    done for Jane. But later on, it may call F(Mary,Chrissy),
    F(Mary,Jonesy), F(Mary,Pansy) and so on, and the entire record
    retrieval for Mary is repeated many times.

    One solution, is to do the expensive extraction one time for each
    element, then associate that with the corresponding elements. Suppose
    this expensive extraction function is called gist(). So, you create a
    new list ([Mary,gist(Mary)], [Jane,gist(Jane)], [John,gist(John)],
    [Jenny,gist(Jenny)], ...) and sort this list instead, when done, remove
    associated gist. This technique is sometimes called
    decorate-sort-dedecorate.

    In Perl programing, this decorate-sort-dedecorate technique is sillily
    known as Schwartzian Transform as we have demonstrated previously. In
    Python, they tried to incorporate this technique into the language, by
    adding the “key†optional parameter, which is our gist() function.

    ----------
    This post is archived at:
    http://xahlee.org/perl-python/sort_list.html

    I would be interested in comments about how Common Lisp, Scheme, and
    Haskell deal with the decorate-sort-dedecorate technique. In
    particular, does it manifest in the language itself? If not, how does
    one usually do it in the code? (note: please reply to approprate groups
    if it is of no general interest. Thanks) (am also interested on how
    Perl6 or Python3000 does this, if there are major changes to their sort
    function)

    Thanks.

    Xah

    ∑ http://xahlee.org/
     
    , Aug 27, 2006
    #1
    1. Advertising

  2. Tom Cole Guest

    Well you cross-posted this enough, including a Java group, and didn't
    even ask about us... What a pity.

    In Java, classes can implement the Comparable interface. This interface
    contains only one method, a compareTo(Object o) method, and it is
    defined to return a value < 0 if the Object is considered less than the
    one being passed as an argument, it returns a value > 0 if considered
    greater than, and 0 if they are considered equal.

    The object implementing this interface can use any of the variables
    available to it (AKA address, zip code, longitude, latitude, first
    name, whatever) to return this -1, 0 or 1. This is slightly different
    than what you mention as we don't have to "decorate" the object. These
    are all variables that already exist in the Object, and if fact make it
    what it is. So, of course, there is no need to un-decorate at the end.

    There are several built-in objects and methods available to sort
    Objects that are Comparable, even full Arrays of them.
     
    Tom Cole, Aug 28, 2006
    #2
    1. Advertising

  3. Henry Law Guest

    Tom Cole wrote:
    > Well you cross-posted this enough, including a Java group, and didn't
    > even ask about us... What a pity.


    Tom, this guy's a persistently pestiferous troll in comp.lang.perl.misc;
    I suggest you don't waste your breath.

    I'm posting this just to the Java group he cross-splattered in.

    --

    Henry Law <>< Manchester, England
     
    Henry Law, Aug 28, 2006
    #3
  4. wrote:

    > I would be interested in comments about how Common Lisp, Scheme, and
    > Haskell deal with the decorate-sort-dedecorate technique.


    %w(FORTRAN LISP COBOL).sort_by{|s| s.reverse}
    ==>["COBOL", "FORTRAN", "LISP"]

    --
    Common Lisp did kill Lisp. Period. ... It is to Lisp what
    C++ is to C. A monstrosity that totally ignores the basics
    of language design, simplicity, and orthogonality to begin
    with. --- Bernard Lang
     
    William James, Aug 28, 2006
    #4
  5. In <>, Tom Cole wrote:

    > In Java, classes can implement the Comparable interface. This interface
    > contains only one method, a compareTo(Object o) method, and it is
    > defined to return a value < 0 if the Object is considered less than the
    > one being passed as an argument, it returns a value > 0 if considered
    > greater than, and 0 if they are considered equal.
    >
    > The object implementing this interface can use any of the variables
    > available to it (AKA address, zip code, longitude, latitude, first
    > name, whatever) to return this -1, 0 or 1. This is slightly different
    > than what you mention as we don't have to "decorate" the object. These
    > are all variables that already exist in the Object, and if fact make it
    > what it is. So, of course, there is no need to un-decorate at the end.


    Python has such a mechanism too, the special `__cmp__()` method
    has basically the same signature. The problem the decorate, sort,
    un-decorate pattern solves is that this object specific compare operations
    only use *one* criteria.

    Let's say you have a `Person` object with name, surname, date of birth and
    so on. When you have a list of such objects and want to sort them by name
    or by date of birth you can't use the `compareTo()` method for both.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Aug 28, 2006
    #5
  6. Jim Gibson Guest

    In article <>, Marc 'BlackJack'
    Rintsch <> wrote:

    > In <>, Tom Cole wrote:
    >
    > > In Java, classes can implement the Comparable interface. This interface
    > > contains only one method, a compareTo(Object o) method, and it is
    > > defined to return a value < 0 if the Object is considered less than the
    > > one being passed as an argument, it returns a value > 0 if considered
    > > greater than, and 0 if they are considered equal.
    > >
    > > The object implementing this interface can use any of the variables
    > > available to it (AKA address, zip code, longitude, latitude, first
    > > name, whatever) to return this -1, 0 or 1. This is slightly different
    > > than what you mention as we don't have to "decorate" the object. These
    > > are all variables that already exist in the Object, and if fact make it
    > > what it is. So, of course, there is no need to un-decorate at the end.

    >
    > Python has such a mechanism too, the special `__cmp__()` method
    > has basically the same signature. The problem the decorate, sort,
    > un-decorate pattern solves is that this object specific compare operations
    > only use *one* criteria.


    I can't believe I am getting drawn into a thread started by xahlee, but
    here goes anyway:

    The problem addressed by what is know in Perl as the 'Schwartzian
    Transform' is that the compare operation can be an expensive one,
    regardless of the whether the comparison uses multiple keys. Since in
    comparison sorts, the compare operation will be executed N(logN) times,
    it is more efficient to pre-compute a set of keys, one for each object
    to be sorted. That need be done only N times. The sort can then use
    these pre-computed keys to sort the objects. See, for example:

    http://en.wikipedia.org/wiki/Schwartzian_transform

    --
    Jim Gibson

    Posted Via Usenet.com Premium Usenet Newsgroup Services
    ----------------------------------------------------------
    ** SPEED ** RETENTION ** COMPLETION ** ANONYMITY **
    ----------------------------------------------------------
    http://www.usenet.com
     
    Jim Gibson, Aug 28, 2006
    #6
  7. Dr.Ruud Guest

    Jim Gibson schreef:

    > The problem addressed by what is know in Perl as the 'Schwartzian
    > Transform' is that the compare operation can be an expensive one,
    > regardless of the whether the comparison uses multiple keys. Since in
    > comparison sorts, the compare operation will be executed N(logN)
    > times, it is more efficient to pre-compute a set of keys, one for
    > each object to be sorted. That need be done only N times. The sort
    > can then use these pre-computed keys to sort the objects.


    Basically it first builds, than sorts an index.

    The pre-computed (multi-)keys can often be optimized, see Uri's
    Sort::Maker http://search.cpan.org/search?query=Sort::Maker
    for facilities.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Aug 28, 2006
    #7
  8. Jim Gibson schrieb:
    >
    > The problem addressed by what is know in Perl as the 'Schwartzian
    > Transform' is that the compare operation can be an expensive one,
    > regardless of the whether the comparison uses multiple keys. Since in
    > comparison sorts, the compare operation will be executed N(logN) times,
    > it is more efficient to pre-compute a set of keys, one for each object
    > to be sorted. That need be done only N times.


    Wikipedia says it's going from 2NlogN to N. If a sort is massively
    dominated by the comparison, that could give a speedup of up to 100%
    (approximately - dropping the logN factor is almost irrelevant, what
    counts is losing that factor of 2).

    Regards,
    Jo
     
    Joachim Durchholz, Aug 29, 2006
    #8
  9. Guest

    Joachim Durchholz <> wrote:
    > Jim Gibson schrieb:
    > >
    > > The problem addressed by what is know in Perl as the 'Schwartzian
    > > Transform' is that the compare operation can be an expensive one,
    > > regardless of the whether the comparison uses multiple keys. Since in
    > > comparison sorts, the compare operation will be executed N(logN) times,
    > > it is more efficient to pre-compute a set of keys, one for each object
    > > to be sorted. That need be done only N times.

    >
    > Wikipedia says it's going from 2NlogN to N. If a sort is massively
    > dominated by the comparison, that could give a speedup of up to 100%
    > (approximately - dropping the logN factor is almost irrelevant, what
    > counts is losing that factor of 2).


    It seems to me that ln 1,000,000 is 13.8, and that 13.8 is quite a bit
    greater than 2.

    Cheers,

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Aug 29, 2006
    #9
  10. neoedmund Guest

    yeah, java also have 2 interface, Comparator and Comparable, which
    equal to python's compareTo() and __cmp__()

    Marc 'BlackJack' Rintsch wrote:
    > In <>, Tom Cole wrote:
    >
    > > In Java, classes can implement the Comparable interface. This interface
    > > contains only one method, a compareTo(Object o) method, and it is
    > > defined to return a value < 0 if the Object is considered less than the
    > > one being passed as an argument, it returns a value > 0 if considered
    > > greater than, and 0 if they are considered equal.
    > >
    > > The object implementing this interface can use any of the variables
    > > available to it (AKA address, zip code, longitude, latitude, first
    > > name, whatever) to return this -1, 0 or 1. This is slightly different
    > > than what you mention as we don't have to "decorate" the object. These
    > > are all variables that already exist in the Object, and if fact make it
    > > what it is. So, of course, there is no need to un-decorate at the end.

    >
    > Python has such a mechanism too, the special `__cmp__()` method
    > has basically the same signature. The problem the decorate, sort,
    > un-decorate pattern solves is that this object specific compare operations
    > only use *one* criteria.
    >
    > Let's say you have a `Person` object with name, surname, date of birth and
    > so on. When you have a list of such objects and want to sort them by name
    > or by date of birth you can't use the `compareTo()` method for both.
    >
    > Ciao,
    > Marc 'BlackJack' Rintsch
     
    neoedmund, Aug 31, 2006
    #10
  11. Xah Lee Guest

    i just want to make it known that i think most if not all of the
    replies in this thread are of not much technical value. They are either
    wrong and or misleading, and the perl module mentioned about sorting or
    the Java language aspect on sorting, as they are discussed or
    represented, are rather stupid.

    I may or may not write a detailed account later. If you have specific
    questions, or want to know specific reasons of my claims, please don't
    hesitate to email. (privately if you deem it proper)

    Xah

    ∑ http://xahlee.org/

    wrote:
    > Last year, i've posted a tutorial and commentary about Python and
    > Perl's sort function. (http://xahlee.org/perl-python/sort_list.html)
    >
    > In that article, i discussed a technique known among juvenile Perlers
    > as the Schwartzian Transform, which also manifests in Python as its
    > “key†optional parameter.
    >
    > Here, i give a more detailed account on why and how of this construct.
    >...
    > This post is archived at:
    > http://xahlee.org/perl-python/sort_list.html
    >
    > I would be interested in comments about how Common Lisp, Scheme, and
    > Haskell deal with the decorate-sort-dedecorate technique. In
    > particular, does it manifest in the language itself? If not, how does
    > one usually do it in the code? (note: please reply to approprate groups
    > if it is of no general interest. Thanks) (am also interested on how
    > Perl6 or Python3000 does this, if there are major changes to their sort
    > function)
    >
    > Thanks.
    >
    > Xah
    >
    > ∑ http://xahlee.org/
     
    Xah Lee, Sep 8, 2006
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Thomas Philips
    Replies:
    2
    Views:
    342
    Paul McGuire
    Jun 23, 2004
  2. Tom Anderson
    Replies:
    1
    Views:
    454
    Benji York
    Aug 3, 2005
  3. Replies:
    18
    Views:
    523
    Xah Lee
    Sep 8, 2006
  4. Good Guy
    Replies:
    4
    Views:
    317
    Good Guy
    Oct 19, 2010
  5. Replies:
    10
    Views:
    216
    Xah Lee
    Sep 8, 2006
Loading...

Share This Page