generic xmerge ?

A

Alex Martelli

Hi,

I was reading this recipe and am wondering if there is a generic
version of it floating around ? My list is a tuple (date, v1, v2, v3)
and I would like it to sort on date. The documentation doesn't mention
how the items are compared and the example only use integers.

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/141934

I'm not sure what "my list is a tuple" mean (list and tuple being
different types) nor what this has to do with the recipe. Anyway...
sequences are compared lexicographically -- first items first, then
second items if the first items are equal, and so on. So, if you have a
list X whose items tuples and want X sorted on the tuples' first items,
X.sort() will suffice -- if the tuples never have equal first-items, or
if you're OK with second-items getting compared when the first-items are
equal. If you want to sort on first-items ONLY, leaving the tuples in
the same order in the list when their first-items are equal:

import operator
X.sort(key=operator.itemgetter(0))


Alex
 
B

bonono

oops, sorry. I meant

l1=[(date,v1,v2,v3), ...]
l2=[ another set of tuples ]


Thanks. so I have to concat the multiple lists first(all of them are
sorted already) ?
 
A

Alex Martelli

oops, sorry. I meant

l1=[(date,v1,v2,v3), ...]
l2=[ another set of tuples ]

Thanks. so I have to concat the multiple lists first(all of them are
sorted already) ?

You can do it either way -- simplest, and pretty fast, is to concatenate
them all and sort the result (the sort method is very good at taking
advantage of any sorting that may already be present in some parts of
the list it's sorting), but you can also try a merging approach. E.g.:


import heapq

def merge_by_heap(*lists):
sources = [[s.next(), i, s.next]
for i, s in enumerate(map(iter,lists))]
heapq.heapify(sources)
while sources:
best_source = sources[0]
yield best_source[0]
try: best_source[0] = best_source[-1]()
except StopIteration: heapq.heappop(sources)
else: heapq.heapreplace(sources, best_source)

Now, L=list(merge_by_heap(l1, l2, l3)) should work, just as well as,
say, L = sorted(l1+l2+l3). I suspect the second approach may be faster,
as well as simpler, but it's best to _measure_ (use the timeit.py module
from the standard library) if this code is highly speed-critical for
your overall application.


Alex
 
B

bonono

million thanks. So the default compare funcion of heapq also do it like
sort ?

The size of the list is not very large but has the potential of being
run many times(web apps). So I believe second one should be faster(from
the app perspective) as it goes into the optimized code quickly without
all the overheads in the merge case.
 
A

Alex Martelli

million thanks. So the default compare funcion of heapq also do it like
sort ?

By defaults, all comparisons in Python occur by the same mechanisms: by
preference, specific comparison operators such as < , <= , and so on
(corresponding to special methods __lt__, __le__, and so on) -- missing
that, the three-way comparison done by built-in function cmp
(corresponding to speial method __cmp__). For built-in sequences, in
particular (both tuples and lists), the comparisons are lexicographical.

Some (but not all) occasions that imply comparisons let you specify
something else than the default, for example by such mechanisms as the
cmp= optional argument to .sort (which tends to have unpleasant
performance impacts) and the key= optional argument (which tends to have
good performance impact). heapq does not offer this feature in today's
Python (i.e., 2.4), but I believe it's planned to have it in the future
2.5 release.

But in your case the default comparisons appear to be OK, so you should
have no problem either way.

The size of the list is not very large but has the potential of being
run many times(web apps). So I believe second one should be faster(from
the app perspective) as it goes into the optimized code quickly without
all the overheads in the merge case.

Yes, the simpler solution may well perform better. Note that:

L = list(l1)
L.extend(l2)
L.extend(l3)
L.sort()

may perform better than L = sorted(l1+l2+l3) -- if speed matters a lot,
be sure to try (and measure!) both versions.


Alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,772
Messages
2,569,593
Members
45,111
Latest member
KetoBurn
Top