Merge Sort question

Discussion in 'Java' started by Seth G., Aug 11, 2003.

  1. Seth G.

    Seth G. Guest

    The following question is about merge sort, and not particularly Java
    (I didn't know where else to put this query).

    Merge sort uses the divide-and-conquer approach to sort an array,
    roughly in the following manner:

    MERGE_SORT(Array, begin, end)
    {
        if ( begin < end )
        {
            middle = ( begin + end ) / 2

            // merge sort first half of array
            MERGE_SORT(Array, begin, middle);

            // merge sort second half of array
            MERGE_SORT(Array, middle + 1, end);

            // merge the two sorted halves
            MERGE(Array, begin, middle, end);
        }
    }
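
    In actual Java, what I have in mind is roughly this (just a sketch,
    with MERGE written out as a helper using a scratch buffer):

    public class MergeSortSketch
    {
        static void mergeSort( int[] array, int begin, int end )   // end is inclusive
        {
            if ( begin < end )
            {
                int middle = ( begin + end ) / 2;

                mergeSort( array, begin, middle );     // sort first half
                mergeSort( array, middle + 1, end );   // sort second half

                merge( array, begin, middle, end );    // merge the two sorted halves
            }
        }

        static void merge( int[] array, int begin, int middle, int end )
        {
            int[] temp = new int[ end - begin + 1 ];
            int left = begin, right = middle + 1, out = 0;

            // take the smaller head element from either half until one half runs out
            while ( left <= middle && right <= end )
            {
                temp[ out++ ] = ( array[ left ] <= array[ right ] ) ? array[ left++ ] : array[ right++ ];
            }
            while ( left <= middle )
            {
                temp[ out++ ] = array[ left++ ];
            }
            while ( right <= end )
            {
                temp[ out++ ] = array[ right++ ];
            }
            System.arraycopy( temp, 0, array, begin, temp.length );
        }
    }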

    What effect would dividing the array into more than 2 parts (say
    3, 4, ..., n) have on the algorithm? Would it be more efficient than
    merge sort's O(n log n) runtime? My inclination is to say no; it may
    in fact be less efficient due to the increased number of recursive
    calls to merge sort pushing more data onto the stack.

    I remember speaking to someone about this very topic before, but I
    can't recall the outcome of that conversation or why I convinced
    myself it was no more efficient than merge sort "as is".

    Thanks in advance!
    Seth G., Aug 11, 2003
    #1

  2. Chris Smith

    Chris Smith Guest

    Seth G. wrote:
    > What effect would dividing the array into more than 2 parts (say
    > 3, 4, ..., n) have on the algorithm? Would it be more efficient than
    > merge sort's O(n log n) runtime?


    No. Logarithms of different bases are constant multiples of each other,
    so dividing into three pieces would produce an algorithm that could be
    called O(n log[base 3] n), but that is equivalent to any other
    logarithmic base, as far as big-O notation is concerned. You just end
    up with a constant multiple difference of (log 3 / log 2), and constant
    multiples don't matter to big-O notation.
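
    To spell it out: splitting k ways gives about log[base k] n levels of
    merging with O(n) work per level, and log[base k] n =
    log[base 2] n / log[base 2] k. Going from 2 to 3 pieces changes the
    level count only by the constant factor log[base 2] 3 (about 1.58),
    while each level has more candidates to compare per output item, so
    the O(n log n) bound is untouched either way.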

    --
    www.designacourse.com
    The Easiest Way to Train Anyone... Anywhere.

    Chris Smith - Lead Software Developer/Technical Trainer
    MindIQ Corporation
    Chris Smith, Aug 11, 2003
    #2

  3. Roedy Green

    Roedy Green Guest

    On 11 Aug 2003 08:52:45 -0700, (Seth G.) wrote
    or quoted :

    >What effect would dividing the array into more than 2 parts (say
    >3, 4, ..., n) have on the algorithm? Would it be more efficient than
    >merge sort's O(n log n) runtime? My inclination is to say no; it may
    >in fact be less efficient due to the increased number of recursive
    >calls to merge sort pushing more data onto the stack.


    It all depends on what algorithm it is using to sort the two halves,
    and whether that is more or less efficient than a merge over the
    relevant range of sizes. Normally a merge is used when:

    1. You already have two sorted streams.

    2. You don't have enough RAM to sort the whole pile at once, so you
    sort it in chunks. The advantage of a merge is that you can merge two
    chunks without needing room in RAM for both the input and output
    chunks at once.

    Generally I would just use Sun's sort unless there is something
    dreadfully wrong with it. Focus on writing a fast tight comparator.
    If you want to experiment, benchmark various algorithms. See
    http://mindprod.com/jgloss/sort.html

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, Aug 11, 2003
    #3
  4. Eric Sosman

    Eric Sosman Guest

    "Seth G." wrote:
    >
    > The following question is about merge sort, and not particularly Java
    > (I didn't know where else to put this query).
    >
    > If merge sort uses the divide and conquer approach to sort an array in
    > the following manner(roughly): [pseudocode snipped]
    >
    > What effect would dividing the array into more than 2 parts (say
    > 3, 4, ..., n) have on the algorithm? Would it be more efficient than
    > merge sort's O(n log n) runtime? My inclination is to say no; it may
    > in fact be less efficient due to the increased number of recursive
    > calls to merge sort pushing more data onto the stack.


    The big-Oh behavior will be unaffected, although all of
    the constant factor, the threshold, and the lower-order terms
    might change.

    Note that merging three sorted sub-files of twelve items
    each is harder than merging two sorted sub-files of eighteen
    items. Merging six six-item files is harder still -- and,
    of course, if you found yourself merging thirty-six one-item
    files, it might occur to you that the sort hadn't made a lot
    of progress up to that point ...
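
    As a rough count (assuming a plain scan-based merge that spends one
    comparison per output item with two inputs, and up to two with three):
    merging two 18-item runs costs at most 35 comparisons, while merging
    three 12-item runs can cost close to 70 to produce the same 36 sorted
    items.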

    Carrying the idea to not quite such an extreme leads to
    the "replacement selection" algorithm, described in detail
    by Knuth ("The Art of Computer Programming, Volume III: Sorting
    and Searching," Section 5.4.1) in connection with developing
    the initial runs for external sorting, typically on magnetic
    tape. The medium may seem archaic, but the algorithm can
    certainly be applied in other contexts.

    --
    Eric Sosman, Aug 11, 2003
    #4
  5. Roedy Green

    Roedy Green Guest

    On Mon, 11 Aug 2003 15:23:33 -0400, Eric Sosman <>
    wrote or quoted :

    > in connection with developing
    >the initial runs for external sorting, typically on magnetic
    >tape. The medium may seem archaic, but the algorithm can
    >certainly be applied in other contexts.


    The nice thing about merge is that it requires only sequential access.
    It can work in a bare minimum of RAM.

    You will see it used in OptTech Sort, for example, after RAM-fuls of
    data have been sorted and dumped to disk. After that first phase, you
    do an N-way merge to create longer and longer chunks on disk until you
    are done.

    see http://mindprod.com/jgloss/sort.html

    In the old days, 3-way merges were accomplished with a bank of 6 tape
    drives, three in and three out. You would see-saw the data back and
    forth, gradually getting longer blocks of sorted data.

    It was hypnotic to watch the tapes twitching away.

    Today, people seem to panic as soon as Sun's sort runs out of RAM.

    You'd think someone would have written a merge sort to extend Sun's
    sort to giant data sets. It works best if you can put each of your
    temp files on a different physical disk.

    The complication is that you have to start serialising the objects.
    With a pure in-RAM sort, all you need to manipulate is a single array
    of references.


    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, Aug 11, 2003
    #5
  6. Eric Sosman

    Eric Sosman Guest

    (Straying from topicality, but ...)

    Roedy Green wrote:
    >
    > On Mon, 11 Aug 2003 15:23:33 -0400, Eric Sosman <>
    > wrote or quoted :
    >
    > > in connection with developing
    > >the initial runs for external sorting, typically on magnetic
    > >tape. The medium may seem archaic, but the algorithm can
    > >certainly be applied in other contexts.

    >
    > The nice thing about merge is that it requires only sequential access.
    > It can work in a bare minimum of RAM.
    >
    > You will see it used in OptTech Sort, for example, after RAM-fuls of
    > data have been sorted and dumped to disk. After that first phase, you
    > do an N-way merge to create longer and longer chunks on disk until you
    > are done.


    The nice thing about replacement selection is that you
    typically get run lengths of about twice the memory capacity,
    hence half as many chunks to deal with while merging.

    > see http://mindprod.com/jgloss/sort.html


    Hmmm ... I'd take issue with your characterization of
    HeapSort as mostly good for already-ordered data and of
    ShellSort as good only for small data sets. HeapSort is
    perfectly happy with jumbled data, and ShellSort with well-
    chosen increments can be very fast indeed. See

    http://www.cs.princeton.edu/~rs/shell/index.html

    > In the old days, 3-way merges were accomplished with a bank of 6 tape
    > drives, three in and three out. You would see-saw the data back and
    > forth, gradually getting longer blocks of sorted data.


    That wasn't a very good way to do things. See Knuth for
    descriptions and analyses of tape merge patterns that are far
    more efficient, in the sense of shuffling the data around
    fewer times. It seems likely that the tapes you saw were in
    fact following one or another of these more sophisticated
    patterns.

    > It was hypnotic to watch the tapes twitching away.


    Been there, done that, done thaaat, doo-one tha-a-a...

    --
    Eric Sosman, Aug 11, 2003
    #6
  7. Roedy Green

    Roedy Green Guest

    On Mon, 11 Aug 2003 17:50:45 -0400, Eric Sosman <>
    wrote or quoted :

    >replacement selection


    Is this what is called a "tournament" sort? How does it work?

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, Aug 12, 2003
    #7
  8. Eric Sosman

    Eric Sosman Guest

    Roedy Green wrote:
    >
    > On Mon, 11 Aug 2003 17:50:45 -0400, Eric Sosman <>
    > wrote or quoted :
    >
    > >replacement selection

    >
    > Is this what is called a "tournament" sort? How does it work?


    *Very* briefly, since the topicality seems shaky: Read as
    many items as you possibly can. Then select the least item and
    write it to the merge file. Fill the vacated spot by reading
    another input item, which gives two cases: if the new item is
    greater than or equal to the last item output, just keep on
    repeating the select-write-replace loop. If the new item is
    less than the last item output, it can't be added to the run
    currently being written, so it must be deferred to the next
    run. Set it aside for now, thus reducing the number of items
    still available for selection. If any items still are available
    for selection, continue the select-write-replace loop.

    Eventually, memory becomes completely full of deferred items,
    none of which can belong to the current output run. Finish off
    that run (writing trailer records, closing files, whatever) and
    begin a new one, using all those deferred items as the initial
    contents of the selection tree.

    This process is something like an M-way merge of the input
    data with itself, where M is the number of items that fit in
    memory. For further details, including an analysis of why the
    expected run length is approximately 2*M, see Knuth "The Art
    of Computer Programming, Volume III: Sorting and Searching,"
    Section 5.4.1.
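
    Here is a bare-bones, in-memory sketch of the idea in Java, just for
    illustration (it assumes a JDK with java.util.PriorityQueue, uses
    plain lists in place of the input and output files, and the class and
    method names are invented for the example):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    public class ReplacementSelection
    {
        /** Splits the input into sorted runs; on random input each run is
         *  typically about twice as long as the working memory allows. */
        public static List<List<Integer>> makeRuns( Iterator<Integer> input, int maxInMemory )
        {
            PriorityQueue<Integer> current  = new PriorityQueue<Integer>(); // items for this run
            PriorityQueue<Integer> deferred = new PriorityQueue<Integer>(); // items for the next run

            // Read as many items as memory allows.
            while ( current.size() < maxInMemory && input.hasNext() )
            {
                current.add( input.next() );
            }

            List<List<Integer>> runs = new ArrayList<List<Integer>>();
            List<Integer> run = new ArrayList<Integer>();

            while ( !current.isEmpty() )
            {
                // Select the least item and write it to the current run.
                int least = current.poll();
                run.add( least );

                // Fill the vacated spot from the input, if any remains.
                if ( input.hasNext() )
                {
                    int next = input.next();
                    if ( next >= least )
                    {
                        current.add( next );   // still fits in the current run
                    }
                    else
                    {
                        deferred.add( next );  // too small; defer to the next run
                    }
                }

                // Memory now holds only deferred items: finish this run, start the next.
                if ( current.isEmpty() )
                {
                    runs.add( run );
                    run = new ArrayList<Integer>();
                    PriorityQueue<Integer> tmp = current;
                    current = deferred;
                    deferred = tmp;
                }
            }
            if ( !run.isEmpty() )
            {
                runs.add( run );
            }
            return runs;
        }
    }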

    --
    Eric Sosman, Aug 12, 2003
    #8
  9. Roedy Green

    Roedy Green Guest

    On Tue, 12 Aug 2003 15:11:06 -0400, Eric Sosman <>
    wrote or quoted :

    > If the new item is
    >less than the last item output, it can't be added to the run
    >currently being written, so it must be deferred to the next
    >run.


    You need some sort of structure that lets you easily peel off the
    lowest element, and that lets you add a new element so that when its
    time comes it will peel off as lowest.


    This need not be a sorted list. Many years ago I implemented
    J. W. J. Williams' HeapSort. I have a Java version at
    http://mindprod.com/jgloss/heapsort.html and the structure it uses
    might fill the bill. It was too long ago to remember exactly; I
    vaguely recall it was like an authority hierarchy at a business.
    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, Aug 14, 2003
    #9
  10. Roedy Green

    Roedy Green Guest

    On Thu, 14 Aug 2003 12:43:19 -0400, Eric Sosman <>
    wrote or quoted :

    > Heapsort and binary search are the two algorithms that
    >make me nostalgic for FORTRAN's "arithmetic IF" statement
    >and its three-way branching capability.


    compareTo is reminiscent of the old Fortran 3-way arithmetic IF.

    You write code something like this:

    int diff = a.compareTo(b);
    if ( diff > 0 )
    {
        return x;
    }
    else if ( diff < 0 )
    {
        return y;
    }
    else
    {
        return z;
    }

    If that is natively compiled on a Pentium, the comparison is done only
    once, and then there are conditional jumps based on the condition
    code.

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, Aug 14, 2003
    #10
  11. "Eric Sosman" <> wrote in message
    news:...
    > "Seth G." wrote:
    > >

    > Note that merging three sorted sub-files of twelve items
    > each is harder than merging two sorted sub-files of eighteen
    > items. Merging six six-item files is harder still -- and,
    > of course, if you found yourself merging thirty-six one-item
    > files, it might occur to you that the sort hadn't made a lot
    > of progress up to that point ...


    I have code (available at http://www.jordanzimmerman.com/code) that
    implements a "K-Way merge" that's extremely efficient at merging > 2 sorted
    containers. It uses a tree structure to reduce the number of comparisons
    required to merge multiple containers.
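
    The general idea (this is not Jordan's actual code, just a rough
    sketch along the same lines, assuming a JDK with
    java.util.PriorityQueue) is to keep one cursor per container in a
    heap keyed on its current element, so each output item costs
    O(log k) comparisons instead of k - 1:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    public class KWayMerge
    {
        /** One cursor per sorted input list, remembering its current element. */
        private static class Cursor
        {
            final Iterator<Integer> it;
            Integer current;

            Cursor( Iterator<Integer> it )
            {
                this.it = it;
                this.current = it.next();
            }
        }

        public static List<Integer> merge( List<List<Integer>> sortedLists )
        {
            PriorityQueue<Cursor> heap = new PriorityQueue<Cursor>(
                Math.max( 1, sortedLists.size() ),
                new Comparator<Cursor>()
                {
                    public int compare( Cursor a, Cursor b )
                    {
                        return a.current.compareTo( b.current );
                    }
                } );

            for ( List<Integer> list : sortedLists )
            {
                if ( !list.isEmpty() )
                {
                    heap.add( new Cursor( list.iterator() ) );
                }
            }

            List<Integer> result = new ArrayList<Integer>();
            while ( !heap.isEmpty() )
            {
                // The smallest current element across all cursors comes off the heap...
                Cursor least = heap.poll();
                result.add( least.current );

                // ...and the cursor goes back in if its list has more elements.
                if ( least.it.hasNext() )
                {
                    least.current = least.it.next();
                    heap.add( least );
                }
            }
            return result;
        }
    }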

    --
    Jordan Zimmerman
    http://www.jordanzimmerman.com
    Jordan Zimmerman, Aug 15, 2003
    #11
