Merge Sort question

Seth G.

The following question is about merge sort, and not Java in particular
(I didn't know where else to put this query).

If merge sort uses the divide and conquer approach to sort an array in
the following manner(roughly):

MERGE_SORT(Array, begin, end)
{
    if ( begin < end )
    {
        middle = ( begin + end ) / 2

        // merge sort first half of array
        MERGE_SORT(Array, begin, middle);

        // merge sort second half of array
        MERGE_SORT(Array, middle + 1, end);

        // merge the two sorted halves
        MERGE(Array, begin, middle, end);
    }
}
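In runnable Java, that divide-and-conquer scheme looks roughly like this (a sketch; the class and method names are mine, and note the recursion must split at `middle`, not `end/2`):

```java
import java.util.Arrays;

public class MergeSortDemo {
    // Sorts a[begin..end] inclusive, mirroring the pseudocode above.
    static void mergeSort(int[] a, int begin, int end) {
        if (begin < end) {
            int middle = (begin + end) / 2;
            mergeSort(a, begin, middle);     // first half
            mergeSort(a, middle + 1, end);   // second half
            merge(a, begin, middle, end);    // combine the sorted halves
        }
    }

    // Merges the two sorted runs a[begin..middle] and a[middle+1..end].
    static void merge(int[] a, int begin, int middle, int end) {
        int[] tmp = new int[end - begin + 1];
        int i = begin, j = middle + 1, k = 0;
        while (i <= middle && j <= end) {
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        }
        while (i <= middle) tmp[k++] = a[i++];
        while (j <= end)    tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, begin, tmp.length);
    }

    public static void main(String[] args) {
        int[] a = { 5, 2, 9, 1, 5, 6 };
        mergeSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // [1, 2, 5, 5, 6, 9]
    }
}
```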

What effect would dividing the array into more than 2 parts (say
3, 4, ..., n) have on the algorithm? Would it be more efficient than
merge sort's runtime of O(n log n)? My inclination is to say no; it may
in fact be less efficient due to the increased number of recursive
calls to merge sort, pushing larger chunks of data onto the stack.

I remember speaking to someone about this very topic before, but I
can't recall the outcome of that conversation or why I convinced
myself it was no more efficient than merge sort "as is".

Thanks in advance!
 

Chris Smith

Seth said:
What effect would dividing the array into more than 2 parts (say
3, 4, ..., n) have on the algorithm? Would it be more efficient than
merge sort's runtime of O(n log n)?

No. Logarithms of different bases are constant multiples of each other,
so dividing into three pieces would produce an algorithm that could be
called O(n log[base 3] n), but that is equivalent to any other
logarithmic base, as far as big-O notation is concerned. You just end
up with a constant multiple difference of (log 3 / log 2), and constant
multiples don't matter to big-O notation.
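A quick numeric check of that change-of-base argument (the demo class name is mine): the ratio of log base 2 to log base 3 of n is the same constant, log 3 / log 2 ≈ 1.585, no matter how large n gets.

```java
public class LogBaseDemo {
    public static void main(String[] args) {
        // log_b(n) = ln(n) / ln(b), so log_2(n) / log_3(n)
        // collapses to the constant ln(3)/ln(2), independent of n.
        for (int n : new int[] { 10, 1000, 1_000_000 }) {
            double log2 = Math.log(n) / Math.log(2);
            double log3 = Math.log(n) / Math.log(3);
            System.out.printf("n=%-8d log2(n)/log3(n) = %.4f%n", n, log2 / log3);
        }
    }
}
```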

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

Roedy Green

What effect would dividing the array into more than 2 parts (say
3, 4, ..., n) have on the algorithm? Would it be more efficient than
merge sort's runtime of O(n log n)? My inclination is to say no; it may
in fact be less efficient due to the increased number of recursive
calls to merge sort, pushing larger chunks of data onto the stack.

It all depends on what algorithm is used to sort the two halves, and
whether it is more or less efficient than merge over the range of sizes
involved. Normally a merge is used when:

1. you already have two streams sorted.

2. You don't have enough RAM to sort the whole pile at once, so you
sort chunks. The advantage of merge is that you can merge two chunks
without needing room in RAM for both the input and output chunks at once.

Generally I would just use Sun's sort unless there is something
dreadfully wrong with it. Focus on writing a fast tight comparator.
If you want to experiment, benchmark various algorithms. See
http://mindprod.com/jgloss/sort.html
 

Eric Sosman

Seth G. said:
The following question is about merge sort, and not Java in particular
(I didn't know where else to put this query).

If merge sort uses the divide and conquer approach to sort an array in
the following manner(roughly): [pseudocode snipped]

What effect would dividing the array into more than 2 parts (say
3, 4, ..., n) have on the algorithm? Would it be more efficient than
merge sort's runtime of O(n log n)? My inclination is to say no; it may
in fact be less efficient due to the increased number of recursive
calls to merge sort, pushing larger chunks of data onto the stack.

The big-Oh behavior will be unaffected, although all of
the constant factor, the threshold, and the lower-order terms
might change.

Note that merging three sorted sub-files of twelve items
each is harder than merging two sorted sub-files of eighteen
items. Merging six six-item files is harder still -- and,
of course, if you found yourself merging thirty-six one-item
files, it might occur to you that the sort hadn't made a lot
of progress up to that point ...

Carrying the idea to not quite such an extreme leads to
the "replacement selection" algorithm, described in detail
by Knuth ("The Art of Computer Programming, Volume III: Sorting
and Searching," Section 5.4.1) in connection with developing
the initial runs for external sorting, typically on magnetic
tape. The medium may seem archaic, but the algorithm can
certainly be applied in other contexts.
 

Roedy Green

in connection with developing
the initial runs for external sorting, typically on magnetic
tape. The medium may seem archaic, but the algorithm can
certainly be applied in other contexts.

The nice thing about merge is it requires only sequential access. It
can work in a bare minimum of RAM.

You will see it used in OptTech sort, for example, after RAM-fuls of
data have been sorted and dumped to disk. After that first phase, you
do an N-way merge to create longer and longer chunks on disk until you
are done.

see http://mindprod.com/jgloss/sort.html

In the old days, 3-way merges were accomplished with a bank of 6 tape
drives, three in and three out. You would seesaw the data back and
forth, gradually getting longer blocks of sorted data.

It was hypnotic to watch the tapes twitching away.

Today, people seem to panic as soon as Sun's sort runs out of RAM.

You'd think someone would have written a merge sort to extend Sun's
sort to giant data sets. It works best if you can put each of your
temp files on a different physical disk.

The complication is that you have to start serialising the objects.
With a pure in-RAM sort, all you need to manipulate is a single array
of references.
 

Eric Sosman

(Straying from topicality, but ...)

Roedy said:
The nice thing about merge is it requires only sequential access. It
can work in a bare minimum of RAM.

You will see it used in OptTech sort, for example, after RAM-fuls of
data have been sorted and dumped to disk. After that first phase, you
do an N-way merge to create longer and longer chunks on disk until you
are done.

The nice thing about replacement selection is that you
typically get run lengths of about twice the memory capacity,
hence half as many chunks to deal with while merging.

Hmmm ... I'd take issue with your characterization of
HeapSort as mostly good for already-ordered data and of
ShellSort as good only for small data sets. HeapSort is
perfectly happy with jumbled data, and ShellSort with well-
chosen increments can be very fast indeed. See

http://www.cs.princeton.edu/~rs/shell/index.html

In the old days, 3-way merges were accomplished with a bank of 6 tape
drives, three in and three out. You would seesaw the data back and
forth, gradually getting longer blocks of sorted data.

That wasn't a very good way to do things. See Knuth for
descriptions and analyses of tape merge patterns that are far
more efficient, in the sense of shuffling the data around
fewer times. It seems likely that the tapes you saw were in
fact following one or another of these more sophisticated
patterns.

It was hypnotic to watch the tapes twitching away.

Been there, done that, done thaaat, doo-one tha-a-a...
 

Eric Sosman

Roedy said:
Is this what is called a "tournament" sort? How does it work?

*Very* briefly, since the topicality seems shaky: Read as
many items as you possibly can. Then select the least item and
write it to the merge file. Fill the vacated spot by reading
another input item, which gives two cases: if the new item is
greater than or equal to the last item output, just keep on
repeating the select-write-replace loop. If the new item is
less than the last item output, it can't be added to the run
currently being written, so it must be deferred to the next
run. Set it aside for now, thus reducing the number of items
still available for selection. If any items still are available
for selection, continue the select-write-replace loop.

Eventually, memory becomes completely full of deferred items,
none of which can belong to the current output run. Finish off
that run (writing trailer records, closing files, whatever) and
begin a new one, using all those deferred items as the initial
contents of the selection tree.

This process is something like an M-way merge of the input
data with itself, where M is the number of items that fit in
memory. For further details, including an analysis of why the
expected run length is approximately 2*M, see Knuth "The Art
of Computer Programming, Volume III: Sorting and Searching,"
Section 5.4.1.
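A rough in-memory sketch of that select-write-replace loop in Java, using two priority queues in place of Knuth's tournament tree (all names are mine; a toy to show the run-forming logic, not an external sort):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ReplacementSelection {
    // Splits the input into ascending runs using a heap of the given
    // capacity. Items smaller than the last output are deferred to
    // seed the next run, exactly as Eric describes.
    static List<List<Integer>> runs(Iterator<Integer> in, int capacity) {
        PriorityQueue<Integer> current = new PriorityQueue<>();
        PriorityQueue<Integer> deferred = new PriorityQueue<>();
        while (current.size() < capacity && in.hasNext()) {
            current.add(in.next());            // fill memory
        }
        List<List<Integer>> result = new ArrayList<>();
        List<Integer> run = new ArrayList<>();
        while (!current.isEmpty()) {
            int out = current.poll();          // select least, write it
            run.add(out);
            if (in.hasNext()) {                // replace the vacated spot
                int next = in.next();
                if (next >= out) current.add(next);  // fits current run
                else deferred.add(next);             // wait for next run
            }
            if (current.isEmpty()) {           // current run finished
                result.add(run);
                run = new ArrayList<>();
                PriorityQueue<Integer> t = current;
                current = deferred;            // deferred items seed next run
                deferred = t;
            }
        }
        if (!run.isEmpty()) result.add(run);
        return result;
    }
}
```

On random input the runs average about twice the heap capacity, which is the 2*M result Knuth analyzes.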
 

Roedy Green

If the new item is
less than the last item output, it can't be added to the run
currently being written, so it must be deferred to the next
run.

You need some sort of structure that lets you easily peel off the
lowest element, and that lets you add a new element so that when its
time comes it will peel off as lowest.


This may not necessarily be a sorted list. Many years ago I
implemented HeapSort (due to J. W. J. Williams). I have a Java version
at http://mindprod.com/jgloss/heapsort.html The structure it uses might
fill the bill. It has been too long to remember; I vaguely recall it
was like an authority hierarchy at a business.
 

Roedy Green

Heapsort and binary search are the two algorithms that
make me nostalgic for FORTRAN's "arithmetic IF" statement
and its three-way branching capability.

compareTo is reminiscent of the old FORTRAN three-way arithmetic IF.

You write code something like this:

int diff = a.compareTo(b);
if ( diff > 0 )
{
    return x;
}
else if ( diff < 0 )
{
    return y;
}
else
{
    return z;
}

If that is natively compiled on a Pentium, the comparison is done only
once, and then there are conditional jumps based on the condition
code.
 

Jordan Zimmerman

Eric Sosman said:
Note that merging three sorted sub-files of twelve items
each is harder than merging two sorted sub-files of eighteen
items. Merging six six-item files is harder still -- and,
of course, if you found yourself merging thirty-six one-item
files, it might occur to you that the sort hadn't made a lot
of progress up to that point ...

I have code (available at http://www.jordanzimmerman.com/code) that
implements a "K-Way merge" that's extremely efficient at merging > 2 sorted
containers. It uses a tree structure to reduce the number of comparisons
required to merge multiple containers.
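Jordan's actual code isn't shown here, but the general technique — keep the current head of each container in a priority structure so each output element costs O(log k) comparisons instead of a linear scan of all k heads — can be sketched like this (all names are mine; a binary heap stands in for his tree structure):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
    // Merges k sorted lists with a heap of their current heads:
    // O(n log k) comparisons for n total elements, versus O(n * k)
    // for repeatedly scanning every list's head.
    static List<Integer> merge(List<List<Integer>> sortedLists) {
        // Heap entries are { value, index of the source iterator }.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        List<Iterator<Integer>> its = new ArrayList<>();
        for (List<Integer> l : sortedLists) {
            Iterator<Integer> it = l.iterator();
            its.add(it);
            if (it.hasNext()) heap.add(new int[] { it.next(), its.size() - 1 });
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] least = heap.poll();         // smallest remaining head
            out.add(least[0]);
            Iterator<Integer> it = its.get(least[1]);
            if (it.hasNext())                  // refill from the same list
                heap.add(new int[] { it.next(), least[1] });
        }
        return out;
    }
}
```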
 
