Best way to insert sorted in a list

S

SherjilOzair

There are basically two ways to go about this.
One is, to append the new value, and then sort the list.
Another is to traverse the list, and insert the new value at the
appropriate position.

The second one's complexity is O(N), while the first one's is O(N *
log N).

Still, the second one works much better, because C code is being used
instead of pythons.

Still, being a programmer, using the first way (a.insert(x);
a.sort()), does not feel right.

What has the community to say about this ? What is the best (fastest)
way to insert sorted in a list ?
 
S

Shashank Singh

There are basically two ways to go about this.
One is, to append the new value, and then sort the list.
Another is to traverse the list, and insert the new value at the
appropriate position.

The second one's complexity is O(N), while the first one's is O(N *
log N).

Correct me if I am wrong here but isn't the second one is O(log N)?
Binary search?
That is when you have an already sorted list from somewhere and you
are inserting just one new value.
In case you are building the whole list yourself it's the same (N * log N)
 
I

Ian Kelly

Correct me if I am wrong here but isn't the second one is O(log N)?
Binary search?
That is when you have an already sorted list from somewhere and you
are inserting just one new value.

Finding the position to insert is O(log n), but the actual insert is
O(n). This is because Python lists are implemented with arrays and
everything after the inserted item has to be moved in memory to make
space for the insert.
 
E

Ethan Furman

SherjilOzair said:
What has the community to say about this ? What is the best (fastest)
way to insert sorted in a list ?

Check out the bisect module.

~Ethan~
 
C

Chris Torek

There are basically two ways to go about this.
One is, to append the new value, and then sort the list.
Another is to traverse the list, and insert the new value at the
appropriate position.

The second one's complexity is O(N), while the first one's is O(N *
log N).

This is not quite right; see below.
Still, the second one works much better, because C code is being used
instead of pythons.

Still, being a programmer, using the first way (a.insert(x);
a.sort()), does not feel right.

What has the community to say about this ? What is the best (fastest)
way to insert sorted in a list ?

In this case, the "best" way is most likely "don't do that at all".

First, we should note that a python list() data structure is actually
an array. Thus, you can locate the correct insertion point pretty
fast, by using a binary or (better but not as generally applicable)
interpolative search to find the proper insertion point.

Having found that point, though, there is still the expense of
the insertion, which requires making some room in the array-that-
makes-the-list (I will use the name "a" as you did above):

position = locate_place_for_insert(a, the_item)
# The above is O(log n) for binary search,
# O(log log n) for interpolative search, where
# n is len(a).

a.insert(position, the_item)
# This is still O(log n), alas.

Appending to the list is much faster, and if you are going to
dump a set of new items in, you can do that with:

# wrong way:
# for item in large_list:
# a.append(item)
# right way, but fundamentally still the same cost (constant
# factor is much smaller due to built-in append())
a.append(large_list)

If len(large_list) is m, this is O(m). Inserting each item in
the "right place" would be O(m log (n + m)). But we still
have to sort:

a.sort()

This is O(log (n + m)), hence likely better than repeatedly inserting
in the correct place.

Depending on your data and other needs, though, it might be best
to use a red-black tree, an AVL tree, or a skip list. You might
also investigate radix sort, radix trees, and ternary search trees
(again depending on your data).
 
E

Ethan Furman

Chris said:
Appending to the list is much faster, and if you are going to
dump a set of new items in, you can do that with:

# wrong way:
# for item in large_list:
# a.append(item)
# right way, but fundamentally still the same cost (constant
# factor is much smaller due to built-in append())
a.append(large_list)
^- should be a.extend(large_list)

~Ethan~
 
I

Ian Kelly

If len(large_list) is m, this is O(m).  Inserting each item in
the "right place" would be O(m log (n + m)).  But we still
have to sort:

   a.sort()

This is O(log (n + m)), hence likely better than repeatedly inserting
in the correct place.

Surely you mean O((n + m) log (n + m)).
 
C

Chris Torek

Appending to the list is much faster, and if you are going to
dump a set of new items in, you can do that with: [...]

a.append(large_list)
^- should be a.extend(large_list)[/QUOTE]

Er, right. Posted in haste (had to get out the door). I also
wrote:

Surely you mean O((n + m) log (n + m)).

Er, "maybe"? (It depends on the relative values of m and n, and
the underlying sort algorithm to some extent. Some algorithms are
better at inserting a relatively small number of items into a
mostly-sorted large list. As I recall, Shell sort does well with
this.) But generally, yes. See "posted in haste" above. :)

There are a lot of other options, such as sorting just the list of
"items to be inserted", which lets you do a single merge pass:

# UNTESTED
def merge_sorted(it1, it2, must_copy = True):
"""
Merge two sorted lists/iterators it1 and it2.
Roughly equivalent to sorted(list(it2) + list(it2)),
except for attempts to be space-efficient.

You can provide must_copy = False if the two iterators
are already lists and can be destroyed for the purpose
of creating the result.
"""

# If it1 and it1 are deque objects, we don't need to
# reverse them, as popping from the front is efficient.
# If they are plain lists, popping from the end is
# required. If they are iterators or tuples we need
# to make a list version anyway. So:
if must_copy:
it1 = list(it1)
it2 = list(it2)

# Reverse sorted lists (it1 and it2 are definitely
# lists now) so that we can pop off the end.
it1.reverse()
it2.reverse()

# Now accumulate final sorted list. Basically, this is:
# take first (now last) item from each list, and put whichever
# one is smaller into the result. When either list runs
# out, tack on the entire remaining list (whichever one is
# non-empty -- if both are empty, the two extend ops are
# no-ops, so we can just add both lists).
#
# Note that we have to re-reverse them to get
# them back into forward order before extending.
result = []
while it1 and it2:
# Note: I don't know if it might be faster
# to .pop() each item and .append() the one we
# did not want to pop after all. This is just
# an example, after all.
last1 = it1[-1]
last2 = it2[-1]
if last2 < last1:
result.append(last2)
it2.pop()
else:
result.append(last1)
it1.pop()
it1.reverse()
it2.reverse()
result.extend(it1)
result.extend(it2)
return result

So, now if "a" is the original (sorted) list and "b" is the not-yet-
sorted list of things to add:

a = merge_sorted(a, sorted(b), must_copy = False)

will work, provided you are not required to do the merge "in place".
Use the usual slicing trick if that is necessary:

a[:] = merge_sorted(a, sorted(b), must_copy = False)

If list b is already sorted, leave out the sorted() step. If list
b is not sorted and is particularly long, use b.sort() to sort in
place, rather than making a sorted copy.
 
S

Steven D'Aprano

What has the community to say about this ? What is the best (fastest)
way to insert sorted in a list ?

if you're doing repeated insertions into an already sorted list, there's
no question that the bisect module is the right way to do it.

(Unless you have a very old version of Python, version 2.3 or older.)
.... L = list(range(10000)) + list(range(10100, 30000))
.... from bisect import insort
.... def sort_insert(a, x):
.... a.append(x)
.... a.sort()
.... """0.3741610050201416

(For those unfamiliar with timeit, small numbers are faster.)
 
D

Dennis Lee Bieber

There are basically two ways to go about this.
One is, to append the new value, and then sort the list.
Another is to traverse the list, and insert the new value at the
appropriate position.

The second one's complexity is O(N), while the first one's is O(N *
log N).

Still, the second one works much better, because C code is being used
instead of pythons.

Still, being a programmer, using the first way (a.insert(x);
a.sort()), does not feel right.

What has the community to say about this ? What is the best (fastest)
way to insert sorted in a list ?

How often do the inserts occur? How often do you need to access the
list?

If inserts tend to come in batches, I'd probably append all the
inserts and use sort. If inserts come very sporadically, bisect (or
similar binary search), split, join ( alist = alist[:splitpoint] + data
+ alist[splitpoint:] style, or similar)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,577
Members
45,052
Latest member
LucyCarper

Latest Threads

Top