How to remove 50000 elements from a 100000-element list?


Ju Hui

I want to remove about 50000 elements from a list which has 100000
elements. Sample code like below:

>>> a = range(10)
>>> b = range(4)
>>> for x in b:
...     a.remove(x)
...
>>> a
[4, 5, 6, 7, 8, 9]

When a and b are small, it finishes quickly, but when a and b have many
elements, such as:

>>> a = range(100000)
>>> b = range(50000)
>>> for x in b:
...     a.remove(x)
...

it runs very slowly. Should I change to another data structure and
choose a better algorithm?
Any suggestion is welcome.
Thanks a lot!
 

Diez B. Roggisch

Ju said:
> I want to remove about 50000 elements from a list which has 100000
> elements. Sample code like below:
>
> >>> for x in b:
> ...     a.remove(x)
> ...
> >>> a
> [4, 5, 6, 7, 8, 9]
>
> When a and b are small, it finishes quickly, but when a and b have
> many elements, it runs very slowly. Should I change to another data
> structure and choose a better algorithm?


How about a list comprehension?


new_list = [e for e in old_list if predicate(e)]
# Possibly you need this, too:
old_list[:] = new_list
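A quick sketch of why the `old_list[:] = new_list` step can matter (the sample data here is made up): plain assignment rebinds the name, while slice assignment mutates the existing list object, so any other references to it see the filtered result too.

```python
old_list = list(range(10))
to_remove = {0, 1, 2, 3}              # hypothetical removal set

# Stand-in for predicate(e): keep elements not scheduled for removal
new_list = [e for e in old_list if e not in to_remove]

alias = old_list                      # a second reference to the same list
old_list[:] = new_list                # mutate in place instead of rebinding
assert alias == [4, 5, 6, 7, 8, 9]   # the alias sees the change as well
```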


Diez
 

Tim Chase

but when a and b have many elements, such as:

>>> for x in b:
...     a.remove(x)
...

it runs very slowly.

Well, your problem is rather ambiguous. In your examples,
you're removing contiguous ranges. Thus, you should be able
to do something like
>>> a = range(100000)
>>> del a[:50000]

which may have a considerable speedup. However, if B
contains disjoint runs of entries, the simple solution
will require A*B checks, making it dependent on the sizes of
A and B (as you've discovered).

Assuming the "del" method is considerably faster, you might
be able to do some sort of optimization of the disjoint set,
using the above-mentioned method. Break B into contiguous
pieces, and then pass those as slices to delete.
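That idea might look something like the sketch below (the helper name is made up, and it assumes, as in the examples here, that a is range-like, so each value in sorted b is also its own index):

```python
def delete_runs(a, b):
    # Group the sorted values in b into contiguous runs, then delete each
    # run from a with a single slice. Runs are deleted in reverse order so
    # earlier deletions don't shift the indices of runs not yet deleted.
    runs = []
    start = prev = b[0]
    for x in b[1:]:
        if x == prev + 1:
            prev = x
        else:
            runs.append((start, prev + 1))
            start = prev = x
    runs.append((start, prev + 1))
    for lo, hi in reversed(runs):
        del a[lo:hi]
    return a

a = list(range(10))
delete_runs(a, [0, 1, 2, 3, 7])
# a is now [4, 5, 6, 8, 9]
```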

Or if B is a pattern of some sort, you could do
>>> a = range(100000)
>>> del a[::2] #delete even numbers
>>> a = range(100000)
>>> del a[1::2] #delete odd numbers
>>> a = range(100000)
>>> del a[2::5] # delete every 5th element, beginning with the 3rd

Another attempt might be to try
>>> a = [x for x in a if x not in b]

However, this is still doing A*B checks, and will likely
degrade as their sizes increase.

Just a few ideas.

-tkc
 

Jack Orenstein

> ... a.remove(x)
> ...
> it runs very slowly. Shall I change to another data structure and
> choose a better algorithm?
> any suggestion is welcome.

If removal is an O(n) operation, then removing 1/2 the list will take
O(n**2), which you don't want. You'd be better off with the contents of
"a" being in a hash table (O(1) removal in practice) or a balanced
tree (O(log n) removal).
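For instance, Python's built-in set is such a hash table, so individual removals are O(1) on average (the values here are made up):

```python
s = set(range(10))
for x in range(4):
    s.discard(x)          # average O(1) per removal, vs O(n) for list.remove
assert s == {4, 5, 6, 7, 8, 9}
```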

Another possibility: If the a and b lists are ordered in the same way,
then you could walk through the lists in order using a merge procedure,
generating a new list as you go.
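That merge procedure might be sketched as follows (a hypothetical helper; it assumes both inputs are sorted ascending and runs in O(len(a) + len(b))):

```python
def subtract_sorted(a, b):
    # Walk both sorted sequences in step, yielding the elements of a
    # that do not appear in b.
    it = iter(b)
    cur = next(it, None)
    for x in a:
        # Advance through b until its cursor catches up with x
        while cur is not None and cur < x:
            cur = next(it, None)
        if cur is None or x != cur:
            yield x

list(subtract_sorted([1, 2, 3, 5, 8], [2, 5, 9]))
# → [1, 3, 8]
```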

After ruling out slow data structures and algorithms, you'll almost
certainly be better off using something built in to Python rather than
coding your own data structure in Python.

Jack Orenstein
 

Larry Bates

Ju said:
> I want to remove about 50000 elements from a list which has 100000
> elements. Sample code like below:
>
> >>> for x in b:
> ...     a.remove(x)
> ...
> >>> a
> [4, 5, 6, 7, 8, 9]
>
> When a and b are small, it finishes quickly, but when a and b have
> many elements, it runs very slowly. Should I change to another data
> structure and choose a better algorithm?
> any suggestion is welcome.
> thanks a lot!
A list isn't going to respond very well to random removals of large
numbers of elements like this. Remember that each remove() must do a
linear search to find the value, and then shift every later element
down one slot. Depending on the data in question, you might be able to
leverage things. Are the two lists sorted? If so, you can iterate over
both of them and build a third list with the results.

This is not particularly 'elegant' but it is fast for sorted lists:

import time
a=range(100000)
b=range(50000)

start_time=time.time()
for x in b:
    a.remove(x)
stop_time=time.time()
print "brute force elapsed time=%.2f seconds" % (stop_time-start_time)



a=range(100000)   # rebuild a, since the loop above already removed b's values
start_time=time.time()
n=[]
i1=0
i2=0
while 1:
    try: v1=a[i1]
    except IndexError:
        break

    try: v2=b[i2]
    except IndexError:
        n.extend(a[i1:])
        break

    if v1 < v2:
        n.append(v1)
        i1+=1
    elif v1 > v2:
        i2+=1
    else:
        i1+=1
        i2+=1

stop_time=time.time()
print "new elapsed time=%.2f seconds" % (stop_time-start_time)

start_time=time.time()
a=set(range(100000))   # rebuild again for a fair comparison
b=set(b)
a.symmetric_difference(b)   # note: returns a new set rather than modifying a
stop_time=time.time()
print "sets elapsed time=%.2f seconds" % (stop_time-start_time)

brute force elapsed time=4.13 seconds
new elapsed time=0.05 seconds
sets elapsed time=0.03 seconds

There may be other ways using bisect or some other module. If the data
is random, unsorted, or has duplicates, I would look at an in-memory
database like SQLite instead.

If the values are unique you can do something like:

a=set(range(100000))
b=set(range(50000))

a.symmetric_difference(b)

which is the fastest way.

It all depends on the "real" data, which we can't see from your
test using range().

-Larry Bates
 

Diez B. Roggisch

> How about a list comprehension?
> new_list = [e for e in old_list if predicate(e)]
> # Possibly you need this, too:
> old_list[:] = new_list


I forgot to define the predicate. And you should use a set of objects to
remove, as then you'd have O(1) behavior for the in operator.

So:

to_remove = set(b)
new_list = [e for e in a if e not in to_remove]
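Put together with some made-up sample data, the whole pattern looks like this:

```python
a = list(range(10))
b = [0, 1, 2, 3]

to_remove = set(b)   # set membership is O(1) on average
new_list = [e for e in a if e not in to_remove]
assert new_list == [4, 5, 6, 7, 8, 9]
```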

Diez
 

Ju Hui

cool!
thank you all!
I chose:

a=set(range(100000))
b=set(range(50000))
a.symmetric_difference(b)

Certainly, the real data is not range(); I am satisfied with the
performance of set.difference.

thank you all again!
 

Peter Otten

Ju said:
> cool!
> thank you all!
> I chose:
> a=set(range(100000))
> b=set(range(50000))
> a.symmetric_difference(b)
> Certainly, the real data is not range(); I am satisfied with the
> performance of set.difference.
>
> thank you all again!

Be warned that this may /add/ items to a:

>>> set("ab").symmetric_difference("bd")
set(['a', 'd'])

If your subject is correct you want difference(), not
symmetric_difference().
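The distinction, with made-up data:

```python
a = {0, 1, 2, 3}
b = {2, 3, 4, 5}
assert a.difference(b) == {0, 1}                  # only removes elements
assert a.symmetric_difference(b) == {0, 1, 4, 5}  # also adds b's extras
```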

Peter
 

Andrew Gwozdziewycz

It's easy in this case:

a = range(50000, 100000)

But, I'm just trying to add some humor to this thread :)

> I want to remove about 50000 elements from a list which has 100000
> elements. Sample code like below:
>
> >>> for x in b:
> ...     a.remove(x)
> ...
> >>> a
> [4, 5, 6, 7, 8, 9]
>
> When a and b are small, it finishes quickly, but when a and b have
> many elements, it runs very slowly. Should I change to another data
> structure and choose a better algorithm?
> any suggestion is welcome.
> thanks a lot!

---
Andrew Gwozdziewycz
http://www.23excuses.com
http://ihadagreatview.org
http://and.rovir.us
 

Larry Bates

Peter said:
> Be warned that this may /add/ items to a:
>
> >>> set("ab").symmetric_difference("bd")
> set(['a', 'd'])
>
> If your subject is correct you want difference(), not
> symmetric_difference().

Whoops. My bad. Peter is correct.

-Larry
 

Ju Hui

to Andrew Gwozdziewycz:
Real humor...
to Peter Otten:
thanks for the reminder; in my project, a will be a superset of b,
so symmetric_difference equals difference.
thank you all again!
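Indeed, when a is a superset of b the two methods agree; a quick check with made-up data:

```python
a = set(range(10))
b = set(range(4))   # b is a subset of a
assert a.symmetric_difference(b) == a.difference(b) == {4, 5, 6, 7, 8, 9}
```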
 
