fast list search?

R

ramon aragues

Hi,

I´ve got a list with more than 500,000 ints. Before inserting new ints,
I have to check that it doesn´t exist already in the list.

Currently, I am doing the standard:

if new_int not in long_list:
long_list.append(new_int)


but it is extremely slow... is there a faster way of doing this in python?

Thanks a lot,

Ramon Aragues
 
T

Thomas Guettler

Am Wed, 09 Jun 2004 11:49:19 +0200 schrieb ramon aragues:
Hi,

I´ve got a list with more than 500,000 ints. Before inserting new ints,
I have to check that it doesn´t exist already in the list.

Currently, I am doing the standard:

if new_int not in long_list:
long_list.append(new_int)


but it is extremely slow... is there a faster way of doing this in python?

Hi,

Use a dictionary instead of the list:

if not long_dict.has_key(new_int):
long_dict[new_int]=1

Thomas
 
T

Terry Reedy

Thomas Guettler said:
Am Wed, 09 Jun 2004 11:49:19 +0200 schrieb ramon aragues:
Hi,

I´ve got a list with more than 500,000 ints. Before inserting new ints,
I have to check that it doesn´t exist already in the list.

Currently, I am doing the standard:

if new_int not in long_list:
long_list.append(new_int)


but it is extremely slow... is there a faster way of doing this in
python?

Hi,

Use a dictionary instead of the list:

if not long_dict.has_key(new_int):
long_dict[new_int]=1

Or use a set, which I believe will be faster (more C-coded) in 2.4. That
eliminates the dummy value.

TJR
 
H

has

Thomas Guettler said:
Use a dictionary instead of the list:

if not long_dict.has_key(new_int):
long_dict[new_int]=1

Or use an ordered list, which can be searched using an O(log n) binary
search algorithm (e.g. see the bisect module) instead of O(n) linear
search. Not sure which would be more efficient; if performance matters
you'd need to implement both and run comparative timing tests,
otherwise just use the dict solution as it's the simplest.

If your original list is unsorted and order is important, you can
maintain a second ordered list or dict alongside it purely for
performing your itemExists(i) tests. Remember to always add and remove
items to both to keep them in sync. And you can wrap the lot up as a
single custom BigUniqueList class to keep it neat and tidy and mediate
all access via accessor/mutator methods to ensure integrity is
maintained.
 
J

Javier Zaragozá Arenas

I think the better way is sort before the list

And then, a binary search
point_of_insertion = bisect.bisect(thelist, item)
is_present = thelist[point_of_insertion -1:point_of_insertion] == [item]
>> if not is_present:
..... thelist.insert(point_of_insertion, item)

and done!

Fco Javier Zaragozá Arenas
 
D

Dan Gunter

How about using a dictionary, with the keys being the ints and the
values being the index in the 'list', i.e. len(dictionary). This is
quite fast:
.... if not d.has_key(i): d = len(d)

-Dan
I think the better way is sort before the list



And then, a binary search

point_of_insertion = bisect.bisect(thelist, item)
is_present = thelist[point_of_insertion -1:point_of_insertion] == [item]
if not is_present:
.... thelist.insert(point_of_insertion, item)

and done!

Fco Javier Zaragozá Arenas



Hi,

I´ve got a list with more than 500,000 ints. Before inserting new ints,
I have to check that it doesn´t exist already in the list.

Currently, I am doing the standard:

if new_int not in long_list:
long_list.append(new_int)


but it is extremely slow... is there a faster way of doing this in python?

Thanks a lot,

Ramon Aragues
 
P

Phil Frost

Perhaps the standard 'sets' module would be of use? It's implemented
with dictionaries, and seems to do the thing that is needed.

How about using a dictionary, with the keys being the ints and the
values being the index in the 'list', i.e. len(dictionary). This is
quite fast:
... if not d.has_key(i): d = len(d)

-Dan
I think the better way is sort before the list

thelist.sort()


And then, a binary search

point_of_insertion = bisect.bisect(thelist, item)
is_present = thelist[point_of_insertion -1:point_of_insertion] == [item]
if not is_present:
.... thelist.insert(point_of_insertion, item)

and done!

Fco Javier Zaragoz? Arenas



Hi,

I?ve got a list with more than 500,000 ints. Before inserting new ints,
I have to check that it doesn?t exist already in the list.

Currently, I am doing the standard:

if new_int not in long_list:
long_list.append(new_int)


but it is extremely slow... is there a faster way of doing this in python?

Thanks a lot,

Ramon Aragues
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top