Eliminating duplicates entries from a list efficiently

P

Paul

Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
u = []
for x in list:
if x not in u: u.append(x)
return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

Thanks,

-- Paul
 
?

=?ISO-8859-1?Q?Gerhard_H=E4ring?=

Paul said:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
u = []
for x in list:
if x not in u: u.append(x)
return u [...]

That's slow for large lists. Better use a dictionary u. Also it's unwise
to reuse builtin names like "list" as variable names.

Many discussions of this can be found in the mailing list/newsgroup
archives via http://groups.google.com I also remember some neat
solutions for this problem in the Python cookbook.

-- Gerhard
 
R

Roy Smith

Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
u = []
for x in list:
if x not in u: u.append(x)
return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']

Thanks,

-- Paul

Something like:

d = {}
for item in list:
d[item] = True
list = d.keys()

This should do it and will run in, if I'm not mistaken, O(n). The only
problem is that it scrambles the order of the items in the list, which
may or may not be a problem, depending on your application.
 
R

Roger Binns

Paul said:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine,

The less naive approach to go to the Python cookbook which contains
lots of recipes for doing all sorts of things, commented on and
improved by the community.

The starting page is:

http://aspn.activestate.com/ASPN/Python/Cookbook/

For example here is a recipe by one Tim Peters that shows how to
do it, including all the subtle cases you didn't even realise
existed.

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560

I would highly recommend getting the Python Cookbook dead tree
version.

Roger
 
?

=?iso-8859-1?Q?Fran=E7ois?= Pinard

[Roy Smith]
[Paul]
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:
Something like:
d = {}
for item in list:
d[item] = True
list = d.keys()
This should do it and will run in, if I'm not mistaken, O(n). The
only problem is that it scrambles the order of the items in the list,
which may or may not be a problem, depending on your application.

If order is not important, here is a short idiom:

from sets import Set as set
d = list(set(d))

This may be slower than the dictionary solution in current Python. Sets
were experimental, but as they are stabilising, I think there are plans
now to make them faster in subsequent Python releases. So, it might be
a good bet, learning to depend on them in the long run.
 
D

Duncan Booth

Something like:

d = {}
for item in list:
d[item] = True
list = d.keys()

Or just:

aList = dict.fromkeys(aList).keys()

which does the same thing (except I renamed the variable to avoid confusion
with the list type).
 
P

phil hunt

Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
u = []
for x in list:
if x not in u: u.append(x)
return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']


How about:

def uniq(list):
list.sort() # put duplicate entries together
lastValue = "different from 0th value" + str(list[0])
u = []
for item in list:
if item != lastValue:
u.append(item)
lastValue = item
return u
 
L

Larry Bates

Take a look at recipe here:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/204297

another way would be to create a dictionary (which would
eliminate the duplicates) and then convert back to a list.

uniq_list=dict(zip(list,list)).keys()

FYI,
Larry Bates
Syscon, Inc.


phil hunt said:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

def uniq(list):
u = []
for x in list:
if x not in u: u.append(x)
return u

print uniq(['a','b','c','a'])
['a', 'b', 'c']


How about:

def uniq(list):
list.sort() # put duplicate entries together
lastValue = "different from 0th value" + str(list[0])
u = []
for item in list:
if item != lastValue:
u.append(item)
lastValue = item
return u
 
S

Scott David Daniels

Larry said:
Take a look at recipe here:...

Or even (you get to the first few earlier):

def uniq(lst):
matched = {}
for item in lst:
if item not in matched:
yield item
matched[item] = item
 
P

Paul Rubin

Paul said:
Can anyone suggest an efficient way to eliminate duplicate entries
in a list? The naive approach below works fine, but is very slow
with lists containing tens of thousands of entries:

I generally do it by sorting the list and then making a pass through
it moving stuff down:

def uniq(a):
"Remove duplicate elements from a. a must be sorted."
p=0
for i in xrange(1,len(a)):
if a != a[p]:
p=p+1
a[p]=a
del a[p+1:]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,906
Latest member
SkinfixSkintag

Latest Threads

Top