unique-ifying a list

kj · Aug 7, 2009

Suppose that x is some list. To produce a version of the list with
duplicate elements removed one could, I suppose, do this:

x = list(set(x))

but I expect that this will not preserve the original order of
elements.

I suppose that I could write something like

def uniquify(items):
    seen = set()
    ret = []
    for i in items:
        if not i in seen:
            ret.append(i)
            seen.add(i)
    return ret

But this seems to me like such a commonly needed operation that I
find it hard to believe one would need to resort to such self-rolled
solutions. Isn't there some more standard (and hopefully more
efficient, as in "C-coded"/built-in) approach?

TIA!

kynn

Jonathan Gardner · Aug 7, 2009

kj said:
But this seems to me like such a commonly needed operation that I
find it hard to believe one would need to resort to such self-rolled
solutions. Isn't there some more standard (and hopefully more
efficient, as in "C-coded"/built-in) approach?

Honestly, uniquifying is pretty rare at the application level. Unless
you're writing some kind of database, I don't see why you'd do it.
(My recommendation is not to write databases.)

If you want unique elements, use a set. If you want to order them,
sort a list of the items in the set.

If you want to preserve the order, then using a dict may be even
better.

orig_order = dict(reversed([reversed(i) for i in enumerate(items)]))
unique_ordered = sorted(orig_order.keys(), key=lambda k: orig_order[k])

Hints to understanding:
* enumerate generates (index, item) pairs.
* We reverse each pair so that we get an item -> index mapping.
* We reverse the list of pairs so that the first ones appear last.
Later pairs override earlier ones in dict(), so each item ends up
mapped to the index of its first occurrence.
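
The hints above can be spelled out as a small function. This is only a sketch of the same idea with an explicit loop; the name unique_by_first_index is made up for illustration, and the elements are assumed to be hashable.

```python
def unique_by_first_index(items):
    # Hypothetical helper name, not from the thread.
    # Walk the (index, item) pairs in reverse so that earlier positions
    # overwrite later ones: each item ends up mapped to its first index.
    first_index = {}
    for index, item in reversed(list(enumerate(items))):
        first_index[item] = index
    # Sorting the keys by their recorded index restores the original order.
    return sorted(first_index, key=first_index.get)
```

Note that the final sort makes this O(n log n), whereas the loop with a seen set does the same job in a single pass.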

Gabriel Genellina · Aug 8, 2009

kj said:
I suppose that I could write something like

def uniquify(items):
    seen = set()
    ret = []
    for i in items:
        if not i in seen:
            ret.append(i)
            seen.add(i)
    return ret

Assuming the elements are hashable, yes, that's the fastest way (minus
some microoptimizations like using local names for ret.append and
seen.add, or the 'not in' operator).
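
For reference, a sketch of what those micro-optimizations might look like. The name uniquify_fast and the local names are assumptions for illustration; whether the difference is measurable depends on the interpreter and the data.

```python
def uniquify_fast(items):
    # Hypothetical variant of uniquify with the tweaks mentioned above.
    seen = set()
    ret = []
    # Bind the bound methods to local names once, so the loop body
    # avoids repeated attribute lookups.
    seen_add = seen.add
    ret_append = ret.append
    for item in items:
        # 'item not in seen' is a single membership test, instead of
        # 'not (item in seen)'.
        if item not in seen:
            seen_add(item)
            ret_append(item)
    return ret
```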

See bearophile's recipe [1], another one [2] by Tim Peters (quite old, but
the comment section is worth reading), and this thread [3].

[1] http://code.activestate.com/recipes/438599/
[2] http://code.activestate.com/recipes/52560/
[3] http://groups.google.com/group/comp.lang.python/t/40c6c455f4fd5154/

ryles · Aug 8, 2009

kj said:
Suppose that x is some list. To produce a version of the list with
duplicate elements removed one could, I suppose, do this:

x = list(set(x))

but I expect that this will not preserve the original order of
elements.

OrderedSet is most likely on the way, but for now Python 3.1 and 2.7
have OrderedDict. For 3.0 and 2.6 the recipe is here:

http://code.activestate.com/recipes/576669

With OrderedDict you can do:

OrderedDict.fromkeys(x).keys() # Returns a view in 3.x, a list in 2.x.
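
A minimal runnable sketch of the OrderedDict approach, assuming hashable elements. The wrapper name uniquify_ordered is made up for illustration; wrapping the result in list() gives the same type on both 2.x and 3.x.

```python
from collections import OrderedDict

def uniquify_ordered(items):
    # Hypothetical wrapper name. fromkeys() keeps one key per distinct
    # item, in order of first insertion; iterating the OrderedDict
    # yields those keys in that order.
    return list(OrderedDict.fromkeys(items))
```

On Python 3.7 and later, plain dicts also preserve insertion order, so list(dict.fromkeys(items)) should give the same result there.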

Paul Rubin · Aug 8, 2009

Dennis Lee Bieber said:
Why bother with seen ?

The version with seen runs in linear time because of the O(1) set
lookup. Your version runs in quadratic time.
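
To make the comparison concrete, here is a sketch of the two shapes being discussed. The version without seen is not quoted in this thread, so uniquify_no_seen is only a guess at what it looked like.

```python
def uniquify_no_seen(items):
    # Assumed shape of the version without 'seen': membership is tested
    # against the result list, which is a linear scan on every
    # iteration, so the whole loop is quadratic in the worst case.
    ret = []
    for item in items:
        if item not in ret:
            ret.append(item)
    return ret

def uniquify(items):
    # Version with 'seen': set membership is O(1) on average, so the
    # loop as a whole runs in linear time.
    seen = set()
    ret = []
    for item in items:
        if item not in seen:
            ret.append(item)
            seen.add(item)
    return ret
```

The list-only version does have one use: it also works when the elements are unhashable (lists, for example), where a set cannot be used.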

Simon Forman · Aug 9, 2009

kj said:
Isn't there some more standard (and hopefully more
efficient, as in "C-coded"/built-in) approach?

Unique items in a list, in the same order as in the list:

x = sorted(set(x), key=x.index)

;]
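
A quick check of that one-liner, as a sketch; the wrapper name uniquify_by_index is made up. Note that x.index is a linear scan from the start of the list, so the sort key is computed in O(n) per distinct element. That is fine for short lists but not a replacement for the seen-set loop on large ones.

```python
def uniquify_by_index(x):
    # Hypothetical wrapper around the one-liner above.
    # x.index(item) returns the position of the first occurrence of
    # item, so sorting the distinct items by it restores first-seen order.
    return sorted(set(x), key=x.index)
```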
