Is there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries?


deathweaselx86

Howdy guys, I am new.

I've been converting lists to sets, then back to lists again to get
unique lists.
e.g.

Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
foo = ['1','2','3']
bar = ['2','5']
foo.extend(bar)
foo = list(set(foo))
foo
['1', '3', '2', '5']

I used to use list comps to do this instead.
foo = ['1','2','3']
bar = ['2','5']
foo.extend([a for a in bar if a not in foo])
foo
['1', '2', '3', '5']

A very long time ago, we all used dictionaries, but I'm not interested
in that ever again. ;-)
Is there any performance hit to using one of these methods over the
other for rather large lists?
 

Ian Kelly

Howdy guys, I am new.

I've been converting lists to sets, then back to lists again to get
unique lists.
e.g.

Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
foo = ['1','2','3']
bar = ['2','5']
foo.extend(bar)
foo = list(set(foo))
foo
['1', '3', '2', '5']

I used to use list comps to do this instead.
foo = ['1','2','3']
bar = ['2','5']
foo.extend([a for a in bar if a not in foo])
foo
['1', '2', '3', '5']

A very long time ago, we all used dictionaries, but I'm not interested
in that ever again. ;-)
Is there any performance hit to using one of these methods over the
other for rather large lists?

Yes. In the second approach, "if a not in foo" is O(n) if foo is a
list, and since you're doing it m times, that's O(n * m).

The first approach is merely O(n + m).
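
The asymptotic difference is easy to see empirically. Here is a rough timing sketch (my own illustration, not from the thread) comparing the two approaches on modestly sized lists:

```python
# Rough timing sketch comparing the O(n + m) set approach
# with the O(n * m) list-comp approach. The sizes and repeat
# counts are illustrative; absolute timings vary by machine.
import timeit

setup = """
foo = [str(i) for i in range(2000)]
bar = [str(i) for i in range(1000, 3000)]
"""

t_set = timeit.timeit("list(set(foo + bar))", setup=setup, number=5)
t_comp = timeit.timeit("foo + [a for a in bar if a not in foo]",
                       setup=setup, number=5)

print("set approach:       %.4fs" % t_set)
print("list-comp approach: %.4fs" % t_comp)
```

Even at a couple of thousand items the set approach wins by a wide margin, and the gap grows roughly quadratically with list size.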
 

Steven D'Aprano

Howdy guys, I am new.

I've been converting lists to sets, then back to lists again to get
unique lists.
e.g.

Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48) [GCC 4.2.4 (Ubuntu
4.2.4-1ubuntu3)] on linux2 Type "help", "copyright", "credits" or
"license" for more information.
foo = ['1','2','3']
bar = ['2','5']
foo.extend(bar)
foo = list(set(foo))
foo
['1', '3', '2', '5']

I used to use list comps to do this instead.
foo = ['1','2','3']
bar = ['2','5']
foo.extend([a for a in bar if a not in foo])
foo
['1', '2', '3', '5']

A very long time ago, we all used dictionaries, but I'm not interested
in that ever again. ;-)
Is there any performance hit to using one of these methods over the
other for rather large lists?

Absolutely!

Membership testing in lists is O(N), that is, it gets slower as the
number of items in the list increases. So if list foo above is huge, "a
not in foo" may take time proportional to how huge it is.

For sets, membership testing is (roughly) O(1), that is, it takes
(roughly) constant time no matter how big the set is. There are special
cases where this does not hold, but you have to work hard to find one, so
in general you can assume that any lookup in a dict or set will take the
same amount of time.

However, the cost of that extra power is that sets are limited to
hashable objects (e.g. ints, strings, but not lists, etc.), they can't
contain duplicates, and they are unordered. If you care about the order
of the list, it is better to do something like this:

# pseudo-code
target = ['1', '3', '2', '7', '5', '9']
source = ['6', '2', '1', '0']
# add unique items of source to target, keeping the order
tmp = sorted(target)
for item in source:
    flag = binary search for item in tmp
    if not flag:
        insert item in tmp keeping the list sorted
        append item to the end of target
del tmp

The bisect module will be useful here.
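
For concreteness, here is one runnable version of that pseudo-code using bisect (the function name and structure are mine, not Steven's):

```python
# Runnable sketch of the pseudo-code above: append the unique
# items of source to target, preserving target's original order,
# while a sorted shadow copy keeps membership tests at O(log n).
import bisect

def merge_unique(target, source):
    """Append items of source not already in target, keeping order."""
    tmp = sorted(target)                       # sorted copy for fast lookups
    for item in source:
        i = bisect.bisect_left(tmp, item)
        if i == len(tmp) or tmp[i] != item:    # item not present
            tmp.insert(i, item)                # keep tmp sorted
            target.append(item)                # keep original order

target = ['1', '3', '2', '7', '5', '9']
source = ['6', '2', '1', '0']
merge_unique(target, source)
print(target)   # ['1', '3', '2', '7', '5', '9', '6', '0']
```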

For small lists, it really doesn't matter what you do. This probably only
matters beyond a few tens of thousands of items.
 

Ethan Furman

Steven said:
I've been converting lists to sets, then back to lists again to get
unique lists.

I used to use list comps to do this instead.
foo = ['1','2','3']
bar = ['2','5']
foo.extend([a for a in bar if a not in foo])
foo
['1', '2', '3', '5']

Is there any performance hit to using one of these methods over the
other for rather large lists?

Absolutely!

For small lists, it really doesn't matter what you do. This probably only
matters beyond a few tens of thousands of items.

Depends on the complexity of the object. It only took a couple thousand
dbf records to notice a *huge* slowdown using 'in' tests on regular lists.

~Ethan~
 

rusi

Howdy guys, I am new.

I've been converting lists to sets, then back to lists again to get
unique lists.

Maybe you should consider whether it's best to work with sets only and
not use lists at all.
This needs to be said because:
1. Most intros to Python show lists (and tuples, which are almost the
same) and hardly mention sets
2. Until recent versions, Python did not have sets

The basic question to ask yourself is: is order/repetition
significant?
If not, you want a set.
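
If order and repetition don't matter, the set type covers the common cases directly; a small illustration (the data here is made up):

```python
# Working with sets directly instead of deduplicating lists.
# Uniqueness is maintained automatically by the set operations.
seen = {'1', '2', '3'}
new = {'2', '5'}

seen |= new                  # in-place union: adds only the unseen items
print(sorted(seen))          # ['1', '2', '3', '5']

print(sorted(seen & new))    # intersection: ['2', '5']
print(sorted(seen - new))    # difference:   ['1', '3']
```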
 

Raymond Hettinger

Howdy guys, I am new.

I've been converting lists to sets, then back to lists again to get
unique lists.
e.g

Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
foo = ['1','2','3']
bar = ['2','5']
foo.extend(bar)
foo = list(set(foo))

That works great and is very clear.

If you want to also preserve order, use an OrderedDict:

from collections import OrderedDict
list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
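
Wrapped up as a small helper (the name is mine), the OrderedDict trick generalizes to any iterable:

```python
# Deduplicate while preserving first-seen order. OrderedDict.fromkeys
# records each key once, in the order keys are first encountered.
from collections import OrderedDict

def unique_ordered(iterable):
    """Return the unique items of iterable, preserving first-seen order."""
    return list(OrderedDict.fromkeys(iterable))

print(unique_ordered('abracadabra'))              # ['a', 'b', 'r', 'c', 'd']
print(unique_ordered(['1', '2', '3', '2', '5']))  # ['1', '2', '3', '5']
```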


Raymond
 
