fast list search?

Discussion in 'Python' started by ramon aragues, Jun 9, 2004.

  1. Hi,

    I´ve got a list with more than 500,000 ints. Before inserting new ints,
    I have to check that it doesn´t exist already in the list.

    Currently, I am doing the standard:

    if new_int not in long_list:
    long_list.append(new_int)


    but it is extremely slow... is there a faster way of doing this in python?

    Thanks a lot,

    Ramon Aragues
     
    ramon aragues, Jun 9, 2004
    #1
    1. Advertising

  2. Am Wed, 09 Jun 2004 11:49:19 +0200 schrieb ramon aragues:

    > Hi,
    >
    > I´ve got a list with more than 500,000 ints. Before inserting new ints,
    > I have to check that it doesn´t exist already in the list.
    >
    > Currently, I am doing the standard:
    >
    > if new_int not in long_list:
    > long_list.append(new_int)
    >
    >
    > but it is extremely slow... is there a faster way of doing this in python?


    Hi,

    Use a dictionary instead of the list:

    if not long_dict.has_key(new_int):
    long_dict[new_int]=1

    Thomas
     
    Thomas Guettler, Jun 9, 2004
    #2
    1. Advertising

  3. ramon aragues

    Terry Reedy Guest

    "Thomas Guettler" <> wrote in message
    news:p...
    > Am Wed, 09 Jun 2004 11:49:19 +0200 schrieb ramon aragues:
    >
    > > Hi,
    > >
    > > I´ve got a list with more than 500,000 ints. Before inserting new ints,
    > > I have to check that it doesn´t exist already in the list.
    > >
    > > Currently, I am doing the standard:
    > >
    > > if new_int not in long_list:
    > > long_list.append(new_int)
    > >
    > >
    > > but it is extremely slow... is there a faster way of doing this in

    python?
    >
    > Hi,
    >
    > Use a dictionary instead of the list:
    >
    > if not long_dict.has_key(new_int):
    > long_dict[new_int]=1


    Or use a set, which I believe will be faster (more C-coded) in 2.4. That
    eliminates the dummy value.

    TJR
     
    Terry Reedy, Jun 9, 2004
    #3
  4. ramon aragues

    Peter Otten Guest

    ramon aragues wrote:

    > if new_int not in long_list:
    > long_list.append(new_int)


    > but it is extremely slow... is there a faster way of doing this in python?



    >>> import sets
    >>> int_set = sets.Set()
    >>> for i in [1,2,2,3,4,4,4]:

    .... int_set.add(i)
    ....
    >>> int_set

    Set([1, 2, 3, 4])
    >>>


    This requires Python 2.3.

    Peter
     
    Peter Otten, Jun 9, 2004
    #4
  5. ramon aragues

    has Guest

    "Thomas Guettler" <> wrote in message news:<>...

    > Use a dictionary instead of the list:
    >
    > if not long_dict.has_key(new_int):
    > long_dict[new_int]=1


    Or use an ordered list, which can be searched using an O(log n) binary
    search algorithm (e.g. see the bisect module) instead of O(n) linear
    search. Not sure which would be more efficient; if performance matters
    you'd need to implement both and run comparative timing tests,
    otherwise just use the dict solution as it's the simplest.

    If your original list is unsorted and order is important, you can
    maintain a second ordered list or dict alongside it purely for
    performing your itemExists(i) tests. Remember to always add and remove
    items to both to keep them in sync. And you can wrap the lot up as a
    single custom BigUniqueList class to keep it neat and tidy and mediate
    all access via accessor/mutator methods to ensure integrity is
    maintained.
     
    has, Jun 9, 2004
    #5
  6. I think the better way is sort before the list

    >> thelist.sort()


    And then, a binary search

    >> point_of_insertion = bisect.bisect(thelist, item)
    >> is_present = thelist[point_of_insertion -1:point_of_insertion] == [item]
    >> if not is_present:

    ..... thelist.insert(point_of_insertion, item)

    and done!

    Fco Javier Zaragozá Arenas



    "ramon aragues" <> escribió en el mensaje
    news:...
    > Hi,
    >
    > I´ve got a list with more than 500,000 ints. Before inserting new ints,
    > I have to check that it doesn´t exist already in the list.
    >
    > Currently, I am doing the standard:
    >
    > if new_int not in long_list:
    > long_list.append(new_int)
    >
    >
    > but it is extremely slow... is there a faster way of doing this in python?
    >
    > Thanks a lot,
    >
    > Ramon Aragues
    >
    >
     
    Javier Zaragozá Arenas, Jun 12, 2004
    #6
  7. ramon aragues

    Dan Gunter Guest

    How about using a dictionary, with the keys being the ints and the
    values being the index in the 'list', i.e. len(dictionary). This is
    quite fast:

    >>> d = {}
    >>> for i in xrange(0,500000):

    .... if not d.has_key(i): d = len(d)

    -Dan

    Javier Zaragozá Arenas wrote:
    > I think the better way is sort before the list
    >
    >
    >>>thelist.sort()

    >
    >
    > And then, a binary search
    >
    >
    >>>point_of_insertion = bisect.bisect(thelist, item)
    >>>is_present = thelist[point_of_insertion -1:point_of_insertion] == [item]

    >
    > >> if not is_present:

    > .... thelist.insert(point_of_insertion, item)
    >
    > and done!
    >
    > Fco Javier Zaragozá Arenas
    >
    >
    >
    > "ramon aragues" <> escribió en el mensaje
    > news:...
    >
    >>Hi,
    >>
    >>I´ve got a list with more than 500,000 ints. Before inserting new ints,
    >>I have to check that it doesn´t exist already in the list.
    >>
    >>Currently, I am doing the standard:
    >>
    >>if new_int not in long_list:
    >>long_list.append(new_int)
    >>
    >>
    >>but it is extremely slow... is there a faster way of doing this in python?
    >>
    >>Thanks a lot,
    >>
    >>Ramon Aragues
    >>
    >>

    >
    >
    >
     
    Dan Gunter, Jun 12, 2004
    #7
  8. ramon aragues

    Phil Frost Guest

    Perhaps the standard 'sets' module would be of use? It's implemented
    with dictionaries, and seems to do the thing that is needed.

    On Fri, Jun 11, 2004 at 08:06:17PM -0700, Dan Gunter wrote:
    > How about using a dictionary, with the keys being the ints and the
    > values being the index in the 'list', i.e. len(dictionary). This is
    > quite fast:
    >
    > >>> d = {}
    > >>> for i in xrange(0,500000):

    > ... if not d.has_key(i): d = len(d)
    >
    > -Dan
    >
    > Javier Zaragoz? Arenas wrote:
    > >I think the better way is sort before the list
    > >
    > >
    > >>>thelist.sort()

    > >
    > >
    > >And then, a binary search
    > >
    > >
    > >>>point_of_insertion = bisect.bisect(thelist, item)
    > >>>is_present = thelist[point_of_insertion -1:point_of_insertion] == [item]

    > >
    > > >> if not is_present:

    > >.... thelist.insert(point_of_insertion, item)
    > >
    > >and done!
    > >
    > >Fco Javier Zaragoz? Arenas
    > >
    > >
    > >
    > >"ramon aragues" <> escribi? en el mensaje
    > >news:...
    > >
    > >>Hi,
    > >>
    > >>I?ve got a list with more than 500,000 ints. Before inserting new ints,
    > >>I have to check that it doesn?t exist already in the list.
    > >>
    > >>Currently, I am doing the standard:
    > >>
    > >>if new_int not in long_list:
    > >>long_list.append(new_int)
    > >>
    > >>
    > >>but it is extremely slow... is there a faster way of doing this in python?
    > >>
    > >>Thanks a lot,
    > >>
    > >>Ramon Aragues
     
    Phil Frost, Jun 12, 2004
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jonas Forssell
    Replies:
    5
    Views:
    460
  2. Replies:
    0
    Views:
    680
  3. Michele Simionato

    Python is darn fast (was: How fast is Python)

    Michele Simionato, Aug 23, 2003, in forum: Python
    Replies:
    13
    Views:
    579
  4. Juha Nieminen
    Replies:
    22
    Views:
    1,048
    Kai-Uwe Bux
    Oct 12, 2007
  5. Abby Lee
    Replies:
    5
    Views:
    443
    Abby Lee
    Aug 2, 2004
Loading...

Share This Page