# Best way to merge/sort two sorted lists?...

#### Aaron Watters

....is to forget they are sorted???

While trying to optimize some NUCULAR libraries I discovered
that the best way to merge 2 sorted lists together
into a new sorted list is to just append
them and re-sort. The following test case demonstrates this.
It can be criticized in many ways: it only tests lists of the same
size,
it only uses "hashed" data, etcetera...
Still, my testing shows "resort everything" is consistently
two times faster than an explicit python merge even for fairly large
data.

I'm beginning to think
a "sorted list merger" might make a nice tiny extension module
(at least for my purposes).

See timing demonstration code below. Let me know if there
is a better way or if the test is fatally flawed, please.

--- Aaron Watters
====
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=greedy+bastard

==========snip: test code below

```python
"test different ways to merge two sorted lists"

def ObviousMerge(L1, L2):
    "obvious way"
    len1 = len(L1)
    len2 = len(L2)
    resultlen = len1 + len2
    result = [None] * resultlen
    count = 0
    index1 = 0
    index2 = 0
    while count < resultlen:
        if index1 < len1:
            elt1 = L1[index1]
            if index2 < len2:
                elt2 = L2[index2]
                if elt1 < elt2:
                    result[count] = elt1
                    index1 += 1
                else:
                    result[count] = elt2
                    index2 += 1
            else:
                result[count] = elt1
                index1 += 1
        else:
            result[count] = L2[index2]
            index2 += 1
        count += 1
    return result
```

```python
def AggregateTailMerge(L1, L2):
    "obvious way, aggregating the tail"
    len1 = len(L1)
    len2 = len(L2)
    resultlen = len1 + len2
    result = [None] * resultlen
    count = 0
    index1 = 0
    index2 = 0
    while index1 < len1 and index2 < len2:
        elt1 = L1[index1]
        elt2 = L2[index2]
        if elt1 < elt2:
            result[count] = elt1
            index1 += 1
        else:
            result[count] = elt2
            index2 += 1
        count += 1
    if index1 < len1:
        result[count:] = L1[index1:]
    if index2 < len2:
        result[count:] = L2[index2:]
    return result
```

(One could also aggregate the head as well as the tail using bisect: skipped.)
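The skipped bisect variant might look roughly like this (a hedged sketch of my own, not Aaron's code: the name `AggregateHeadTailMerge` and the bulk head-copy strategy are assumptions). It uses `bisect_left` to peel off, in one slice, the head of whichever list starts lower, then merges only the overlapping region and appends the leftover tail in bulk:

```python
from bisect import bisect_left

def AggregateHeadTailMerge(L1, L2):
    "sketch: bulk-copy the non-overlapping head and tail, merge the rest"
    if not L1:
        return list(L2)
    if not L2:
        return list(L1)
    # Make L1 the list with the smaller first element.
    if L2[0] < L1[0]:
        L1, L2 = L2, L1
    # Everything in L1 below L2[0] can be copied wholesale.
    head = bisect_left(L1, L2[0])
    result = L1[:head]
    index1, index2 = head, 0
    len1, len2 = len(L1), len(L2)
    while index1 < len1 and index2 < len2:
        if L1[index1] < L2[index2]:
            result.append(L1[index1])
            index1 += 1
        else:
            result.append(L2[index2])
            index2 += 1
    # Whatever remains in either list is already sorted: append in bulk.
    result.extend(L1[index1:])
    result.extend(L2[index2:])
    return result
```

When the key ranges barely overlap (common for partitioned index data), almost all elements move in the bulk slice/extend steps rather than one at a time.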

```python
def ResortEverythingMerge(L1, L2):
    "aggregate everything using list append and sort"
    result = L1 + L2
    result.sort()
    return result

mergeFunctions = [ResortEverythingMerge, ObviousMerge, AggregateTailMerge]

# make some test data
def makeAListOfHashes(length, data):
    import md5
    return [md5.md5(str(i) + data).hexdigest() for i in xrange(length)]

print "constructing global test data"

SIZES = [10, 100, 1000, 10000, 100000, 1000000]

LISTS = [(L, makeAListOfHashes(L, "A"), makeAListOfHashes(L, "B"))
         for L in SIZES]

for (length, L1, L2) in LISTS:
    L1.sort()
    L2.sort()

print "done with making global test data"
print

def timings(mergerFunction):
    from time import time
    fname = mergerFunction.__name__
    print
    print "now timing", mergerFunction
    print
    for (length, L1, L2) in LISTS:
        now = time()
        result = mergerFunction(L1, L2)  # was f(L1, L2), which only worked via the global f
        elapsed = time() - now
        print " for", length, "elapsed %3.5f" % elapsed, "for", fname

if __name__ == "__main__":
    print
    print "running timings"
    for f in mergeFunctions:
        timings(f)
```

================ snip run output below:

```
constructing global test data
done with making global test data

running timings

now timing <function ResortEverythingMerge at 0x20000000004f4de8>

    for 10 elapsed 0.00003 for ResortEverythingMerge
    for 100 elapsed 0.00006 for ResortEverythingMerge
    for 1000 elapsed 0.00057 for ResortEverythingMerge
    for 10000 elapsed 0.00626 for ResortEverythingMerge
    for 100000 elapsed 0.12484 for ResortEverythingMerge
    for 1000000 elapsed 1.60117 for ResortEverythingMerge

now timing <function ObviousMerge at 0x20000000004f47d0>

    for 10 elapsed 0.00008 for ObviousMerge
    for 100 elapsed 0.00027 for ObviousMerge
    for 1000 elapsed 0.00259 for ObviousMerge
    for 10000 elapsed 0.02587 for ObviousMerge
    for 100000 elapsed 0.27862 for ObviousMerge
    for 1000000 elapsed 2.89965 for ObviousMerge

now timing <function AggregateTailMerge at 0x20000000004f4cf8>

    for 10 elapsed 0.00008 for AggregateTailMerge
    for 100 elapsed 0.00025 for AggregateTailMerge
    for 1000 elapsed 0.00246 for AggregateTailMerge
    for 10000 elapsed 0.02366 for AggregateTailMerge
    for 100000 elapsed 0.26594 for AggregateTailMerge
    for 1000000 elapsed 2.77103 for AggregateTailMerge
```

#### Robin Becker

Aaron said:
...is to forget they are sorted???
code snipped

Aaron, I just flung your Python merge code into Pyrex, and the results show that
the iteration overhead can be brought down without much effort. The real win
would presumably come from using pointers into the result list rather than the
generic Python type code I used; I haven't done much of that with Pyrex yet,
but I believe others do it all the time when the list lengths are predetermined
like this. The Pyrex code doesn't look so different from Python that it would
put people off (I guess). I'm guessing a lot of the Pyrex time is spent
incrementing and decrementing refcounts etc etc, and that could clearly be made
more efficient. Also, since None is used as a placeholder, we could eliminate
that overhead by giving the result list null or undefined initial contents,
saving the time spent decrementing None's refcount, since each result list
element is only accessed once.

```pyrex
# start obvmerge.pyx
def PyxObviousMerge(L1, L2):
    "obvious way"
    cdef int len1, len2, resultlen, index1, index2
    len1 = len(L1)
    len2 = len(L2)
    resultlen = len1 + len2
    result = [None] * resultlen
    count = 0
    index1 = 0
    index2 = 0
    while count < resultlen:
        if index1 < len1:
            elt1 = L1[index1]
            if index2 < len2:
                elt2 = L2[index2]
                if elt1 < elt2:
                    result[count] = elt1
                    index1 = index1 + 1
                else:
                    result[count] = elt2
                    index2 = index2 + 1
            else:
                result[count] = elt1
                index1 = index1 + 1
        else:
            result[count] = L2[index2]
            index2 = index2 + 1
        count = count + 1
    return result

def PyxAggregateTailMerge(L1, L2):
    "obvious way, aggregating the tail"
    cdef int len1, len2, resultlen, index1, index2
    len1 = len(L1)
    len2 = len(L2)
    resultlen = len1 + len2
    result = [None] * resultlen
    count = 0
    index1 = 0
    index2 = 0
    while index1 < len1 and index2 < len2:
        elt1 = L1[index1]
        elt2 = L2[index2]
        if elt1 < elt2:
            result[count] = elt1
            index1 = index1 + 1
        else:
            result[count] = elt2
            index2 = index2 + 1
        count = count + 1
    if index1 < len1:
        result[count:] = L1[index1:]
    if index2 < len2:
        result[count:] = L2[index2:]
    return result
# end obvmerge.pyx
```

==========test results

```
C:\code\users\robin>tmerge.py
constructing global test data
done with making global test data

running timings

now timing <function ResortEverythingMerge at 0x00C150F0>

    for 10 elapsed 0.00000 for ResortEverythingMerge
    for 100 elapsed 0.00000 for ResortEverythingMerge
    for 1000 elapsed 0.00000 for ResortEverythingMerge
    for 10000 elapsed 0.00000 for ResortEverythingMerge
    for 100000 elapsed 0.06200 for ResortEverythingMerge
    for 1000000 elapsed 0.78100 for ResortEverythingMerge

now timing <function ObviousMerge at 0x00C15070>

    for 10 elapsed 0.00000 for ObviousMerge
    for 100 elapsed 0.00000 for ObviousMerge
    for 1000 elapsed 0.00000 for ObviousMerge
    for 10000 elapsed 0.00000 for ObviousMerge
    for 100000 elapsed 0.12500 for ObviousMerge
    for 1000000 elapsed 1.32800 for ObviousMerge

now timing <built-in function PyxObviousMerge>

    for 10 elapsed 0.00000 for PyxObviousMerge
    for 100 elapsed 0.00000 for PyxObviousMerge
    for 1000 elapsed 0.00000 for PyxObviousMerge
    for 10000 elapsed 0.01600 for PyxObviousMerge
    for 100000 elapsed 0.09300 for PyxObviousMerge
    for 1000000 elapsed 1.09400 for PyxObviousMerge

now timing <function AggregateTailMerge at 0x00C150B0>

    for 10 elapsed 0.00000 for AggregateTailMerge
    for 100 elapsed 0.00000 for AggregateTailMerge
    for 1000 elapsed 0.00000 for AggregateTailMerge
    for 10000 elapsed 0.00000 for AggregateTailMerge
    for 100000 elapsed 0.12500 for AggregateTailMerge
    for 1000000 elapsed 1.20300 for AggregateTailMerge

now timing <built-in function PyxAggregateTailMerge>

    for 10 elapsed 0.00000 for PyxAggregateTailMerge
    for 100 elapsed 0.00000 for PyxAggregateTailMerge
    for 1000 elapsed 0.00000 for PyxAggregateTailMerge
    for 10000 elapsed 0.00000 for PyxAggregateTailMerge
    for 100000 elapsed 0.09400 for PyxAggregateTailMerge
    for 1000000 elapsed 1.03100 for PyxAggregateTailMerge

C:\code\users\robin>
```

#### Neil Cerutti

...is to forget they are sorted???

While trying to optimize some NUCULAR libraries I discovered
that the best way to merge 2 sorted lists together
into a new sorted list is to just append
them and re-sort. The following test case demonstrates this.
It can be criticized in many ways: it only tests lists of the same
size,
it only uses "hashed" data, etcetera...
Still, my testing shows "resort everything" is consistently
two times faster than an explicit python merge even for fairly large
data.

I'm beginning to think
a "sorted list merger" might make a nice tiny extension module
(at least for my purposes).

See timing demonstration code below. Let me know if there
is a better way or if the test is fatally flawed, please.

--- Aaron Watters
====
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=greedy+bastard

==========snip: test code below
"test different ways to merge two sorted lists"

def ObviousMerge(L1, L2):
def AggregateTailMerge(L1, L2):
<SNIP>

Your merge functions are pretty complex; here's a simpler,
obvious solution:

```python
def merge_sorted(a, b):
    """Merge two sorted lists into a new sorted list.

    >>> merge_sorted(range(10), range(5))
    [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
    >>> merge_sorted(range(5), range(10))
    [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
    >>> merge_sorted(range(3), [])
    [0, 1, 2]
    >>> merge_sorted([], range(3))
    [0, 1, 2]
    >>> merge_sorted([], [])
    []
    """
    rval = []
    aix, bix = 0, 0
    astop, bstop = len(a), len(b)
    while aix < astop and bix < bstop:
        if a[aix] < b[bix]:
            rval.append(a[aix])
            aix += 1
        else:
            rval.append(b[bix])
            bix += 1
    while aix < astop:
        rval.append(a[aix])
        aix += 1
    while bix < bstop:
        rval.append(b[bix])
        bix += 1
    return rval
```

It should beat ResortEverything consistently once the lists
become larger than a certain size. Do you get better results at
all with the above function?

#### Neil Cerutti

That's fairly awesome.

#### Neil Cerutti

It should beat ResortEverything consistently once the lists
become larger than a certain size. Do you get better results at
all with the above function?

With psyco, my merge_sorted became faster than relying on
timsort at roughly 80 elements total. Without psyco it was always
slowest.

Both were much faster than the ASPN imerge recipe--if you built a
list with the result. imerge is wicked fast if you never extract
any values, though. ;-)

Or perhaps I'm just misprofiling. I used:

```python
a = range(20000)  # e.g.
b = range(17000)

t = timeit.Timer('merge_sorted(a, b)', 'from __main__ import '
                 'merge_sorted, a, b')
t2 = timeit.Timer('merge_sorted2(a, b)', 'from __main__ import '
                  'merge_sorted2, a, b')
t3 = timeit.Timer('list(imerge(a, b))', 'from __main__ import imerge, a, b')
print t.timeit(100)
print t2.timeit(100)
print t3.timeit(100)
```

#### Aaron Watters

That's fairly awesome.

The second one is! That's why it works so fast.
Tim Peters optimized this case!
Gotta hand it to Tim Peters. The first one should
win some sort of obfuscated code contest, imho.
It also seems to be 5 times slower than any of the others.

btw: Neil's merge_sorted is a bit slower than any of
mine due to the use of list.append instead of
preallocating a list of the right size, I think.

-- Aaron
===
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=inherently+hosed

#### Neil Cerutti

The second one is! That's why it works so fast.
Tim Peters optimized this case!
Gotta hand it to Tim Peters. The first one should
win some sort of obfuscated code contest, imho.
It also seems to be 5 times slower than any of the others.

btw: Neil's merge_sorted is a bit slower than any of mine due
to the use of list.append instead of preallocating a list of
the right size, I think.

I was hoping that, as it was slightly simpler that way, it would
benefit from the optimization, too.

#### Terry Reedy

| While trying to optimize some NUCULAR libraries I discovered
| that the best way to merge 2 sorted lists together
| into a new sorted list is to just append
| them and re-sort.

The current version of list.sort (timsort) was designed to take advantage
of pre-existing order. It should discover the 2 sorted sublists and merge
them together. It will not really re-sort the way it would with random
data.
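Terry's point can be checked by counting comparisons (my own illustrative sketch: the names `CountedInt` and `count_sort_comparisons` are hypothetical). Sorting two concatenated pre-sorted runs should cost far fewer comparisons than sorting the same amount of shuffled data, because timsort detects the runs and merges them:

```python
import random

class CountedInt(int):
    """int subclass that counts every < comparison the sort makes."""
    comparisons = 0
    def __lt__(self, other):
        CountedInt.comparisons += 1
        return int.__lt__(self, other)

def count_sort_comparisons(data):
    CountedInt.comparisons = 0
    sorted(CountedInt(x) for x in data)
    return CountedInt.comparisons

n = 10000
a = sorted(random.sample(range(10 ** 6), n))
b = sorted(random.sample(range(10 ** 6), n))
two_runs = count_sort_comparisons(a + b)                      # two sorted runs
shuffled = count_sort_comparisons(random.sample(range(10 ** 6), 2 * n))
# two_runs stays near linear in 2n; shuffled grows like n log n
```

(The subclass is needed because CPython's sort has a fast path for exact built-in types that would bypass a counting `__lt__`.)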

A psyco/Python version can be faster only because it avoids the comparisons
needed to discover the existing sorted sublists.

tjr

#### Aaron Watters

The current version of list.sort (timsort) was designed to take advantage
of pre-existing order. It should discover the 2 sorted sublists and merge
them together. It will not really re-sort the way it would with random
data.

A psyco/Python version can be faster only because it avoids the comparisons
needed to discover the existing sorted sublists.

I'm not sure about psyco, but I bet a tuned C implementation
can get more than another factor of 2-3 better, and it wouldn't
be that long either (except I'd have to stretch some unused
brain muscles). This is especially true since the functionality
I need is slightly more complex than this (I need to sort keys
and values together, preferably without making a list of pairs
to do it, and I need "remainder" lists where the key values do
not overlap). This is what nucular index builds spend all their
time doing, so it would have a big impact, I think.
Cool stuff. Fun too.
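The keyed variant described above could be sketched roughly like this (hypothetical code: `merge_keyed` is my own name, and it omits the "remainder list" handling Aaron mentions). It merges two pairs of parallel, key-sorted lists without ever building (key, value) tuples:

```python
def merge_keyed(keys1, vals1, keys2, vals2):
    """Merge two (sorted keys, parallel values) pairs into one,
    keeping keys and values in separate preallocated lists."""
    n1, n2 = len(keys1), len(keys2)
    keys = [None] * (n1 + n2)
    vals = [None] * (n1 + n2)
    i = j = k = 0
    while i < n1 and j < n2:
        if keys2[j] < keys1[i]:
            keys[k] = keys2[j]
            vals[k] = vals2[j]
            j += 1
        else:  # ties take the first pair, keeping the merge stable
            keys[k] = keys1[i]
            vals[k] = vals1[i]
            i += 1
        k += 1
    # Exactly one side has leftovers; splice it in wholesale.
    keys[k:] = keys1[i:] or keys2[j:]
    vals[k:] = vals1[i:] or vals2[j:]
    return keys, vals
```

A C version of the same loop is what a tiny extension module would provide; the remainder-list requirement would add bisect-style boundary handling on top.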
-- Aaron Watters

#### Aaron Watters

See recipes:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/491285
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/305269

I previously noted elsewhere in this thread
that I found the first recipe slow and obscure.
Raymond pointed out privately
that it is designed to minimize memory
requirements when merging possibly many files from
a filesystem. It does that very well, and I apologize
for the offensive tone of my comment.

-- Aaron Watters
===
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=cryptic+code+performance+hacks

#### Paul Rubin

Aaron Watters said:
The second one is! That's why it works so fast.
Tim Peters optimized this case!
Gotta hand it to Tim Peters. The first one should
win some sort of obfuscated code contest, imho.
It also seems to be 5 times slower than any of the others.

The heapq method is the right way to do it when you want to
merge n lists instead of two lists. Each selection takes O(log n)
operations.
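For the n-list case the standard library covers this directly: `heapq.merge` (added in Python 2.6, after this thread) lazily yields the merged stream, doing one O(log n) heap operation per selected element, where n is the number of input lists:

```python
import heapq

# Three sorted input lists; heapq.merge keeps one heap entry per list,
# so each element is selected with O(log n) work for n lists.
lists = [[1, 4, 7], [2, 5, 8], [0, 3, 6, 9]]
merged = list(heapq.merge(*lists))
# merged == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because the result is a generator, it also keeps memory flat when merging many large on-disk runs, as in the ASPN recipes above.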