Is there any advantage or disadvantage to using sets over list compsto ensure a list of unique entri

Discussion in 'Python' started by deathweaselx86, Jun 20, 2011.

  1. Howdy guys, I am new.

    I've been converting lists to sets, then back to lists again to get
    unique lists.
    e.g

    Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
    [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> foo = ['1','2','3']
    >>> bar = ['2','5']
    >>> foo.extend(bar)
    >>> foo = list(set(foo))
    >>> foo

    ['1', '3', '2', '5']

    I used to use list comps to do this instead.
    >>> foo = ['1','2','3']
    >>> bar = ['2','5']
    >>> foo.extend([a for a in bar if a not in foo])
    >>> foo

    ['1', '2', '3', '5']

    A very long time ago, we all used dictionaries, but I'm not interested
    in that ever again. ;-)
    Is there any performance hit to using one of these methods over the
    other for rather large lists?
     
    deathweaselx86, Jun 20, 2011
    #1
    1. Advertising

  2. deathweaselx86

    Ian Kelly Guest

    Re: Is there any advantage or disadvantage to using sets over listcomps to ensure a list of unique entries?

    On Mon, Jun 20, 2011 at 1:43 PM, deathweaselx86 <> wrote:
    > Howdy guys, I am new.
    >
    > I've been converting lists to sets, then back to lists again to get
    > unique lists.
    > e.g
    >
    > Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
    > [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
    > Type "help", "copyright", "credits" or "license" for more information.
    >>>> foo = ['1','2','3']
    >>>> bar = ['2','5']
    >>>> foo.extend(bar)
    >>>> foo = list(set(foo))
    >>>> foo

    > ['1', '3', '2', '5']
    >
    > I used to use list comps to do this instead.
    >>>> foo = ['1','2','3']
    >>>> bar = ['2','5']
    >>>> foo.extend([a for a in bar if a not in foo])
    >>>> foo

    > ['1', '2', '3', '5']
    >
    > A very long time ago, we all used dictionaries, but I'm not interested
    > in that ever again. ;-)
    > Is there any performance hit to using one of these methods over the
    > other for rather large lists?


    Yes. In the second approach, "if a not in foo" is O(n) if foo is a
    list, and since you're doing it m times, that's O(n * m).

    The first approach is merely O(n + m).
     
    Ian Kelly, Jun 20, 2011
    #2
    1. Advertising

  3. Re: Is there any advantage or disadvantage to using sets over listcomps to ensure a list of unique entries?

    On Mon, 20 Jun 2011 12:43:52 -0700, deathweaselx86 wrote:

    > Howdy guys, I am new.
    >
    > I've been converting lists to sets, then back to lists again to get
    > unique lists.
    > e.g
    >
    > Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48) [GCC 4.2.4 (Ubuntu
    > 4.2.4-1ubuntu3)] on linux2 Type "help", "copyright", "credits" or
    > "license" for more information.
    >>>> foo = ['1','2','3']
    >>>> bar = ['2','5']
    >>>> foo.extend(bar)
    >>>> foo = list(set(foo))
    >>>> foo

    > ['1', '3', '2', '5']
    >
    > I used to use list comps to do this instead.
    >>>> foo = ['1','2','3']
    >>>> bar = ['2','5']
    >>>> foo.extend([a for a in bar if a not in foo]) foo

    > ['1', '2', '3', '5']
    >
    > A very long time ago, we all used dictionaries, but I'm not interested
    > in that ever again. ;-)
    > Is there any performance hit to using one of these methods over the
    > other for rather large lists?


    Absolutely!

    Membership testing in lists is O(N), that is, it gets slower as the
    number of items in the list increases. So if list foo above is huge, "a
    not in foo" may take time proportional to how huge it is.

    For sets, membership testing is (roughly) O(1), that it, it takes
    (roughly) constant time no matter how big the set is. There are special
    cases where this does not hold, but you have to work hard to find one, so
    in general you can assume that any lookup in a dict or set will take the
    same amount of time.

    However, the cost of that extra power is that sets are limited to
    hashable objects (e.g. ints, strings, but not lists, etc.), they can't
    contain duplicates, and they are unordered. If you care about the order
    of the list, it is better to do something like this:

    # pseudo-code
    target = ['1', '3', '2', '7', '5', '9']
    source = ['6', '2', '1', '0']
    # add unique items of source to target, keeping the order
    tmp = sorted(target)
    for item in source:
    flag = binary search for item in tmp
    if not flag:
    insert item in tmp keeping the list sorted
    append item to the end of target
    del tmp

    The bisect module will be useful here.

    For small lists, it really doesn't matter what you do. This probably only
    matters beyond a few tens of thousands of items.



    --
    Steven
     
    Steven D'Aprano, Jun 21, 2011
    #3
  4. deathweaselx86

    Ethan Furman Guest

    Re: Is there any advantage or disadvantage to using sets over listcomps to ensure a list of unique entries?

    Steven D'Aprano wrote:
    > On Mon, 20 Jun 2011 12:43:52 -0700, deathweaselx86 wrote:
    >> I've been converting lists to sets, then back to lists again to get
    >> unique lists.
    >>
    >> I used to use list comps to do this instead.
    >>>>> foo = ['1','2','3']
    >>>>> bar = ['2','5']
    >>>>> foo.extend([a for a in bar if a not in foo]) foo

    >> ['1', '2', '3', '5']
    >>
    >> Is there any performance hit to using one of these methods over the
    >> other for rather large lists?

    >
    > Absolutely!
    >
    > For small lists, it really doesn't matter what you do. This probably only
    > matters beyond a few tens of thousands of items.


    Depends on the complexity of the object. It only took a couple thousand
    dbf records to notice a *huge* slowdown using 'in' tests on regular lists.

    ~Ethan~
     
    Ethan Furman, Jun 21, 2011
    #4
  5. deathweaselx86

    rusi Guest

    Re: Is there any advantage or disadvantage to using sets over listcomps to ensure a list of unique entries?

    On Jun 21, 12:43 am, deathweaselx86 <> wrote:
    > Howdy guys, I am new.
    >
    > I've been converting lists to sets, then back to lists again to get
    > unique lists.


    Maybe you should consider whether its best to work with sets only and
    not use lists at all.
    This needs to be said because:
    1. Most intros to python show lists (and tuples which is almost the
    same) and hardly mention sets
    2. Until recent versions python did not have sets

    The basic question to ask yourself is: Is order/repetition
    significant?
    If not you want a set.
     
    rusi, Jun 21, 2011
    #5
  6. Re: Is there any advantage or disadvantage to using sets over listcomps to ensure a list of unique entries?

    On Jun 20, 9:43 pm, deathweaselx86 <> wrote:
    > Howdy guys, I am new.
    >
    > I've been converting lists to sets, then back to lists again to get
    > unique lists.
    > e.g
    >
    > Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
    > [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
    > Type "help", "copyright", "credits" or "license" for more information.>>>foo = ['1','2','3']
    > >>> bar = ['2','5']
    > >>> foo.extend(bar)
    > >>> foo = list(set(foo))


    That works great and is very clear.

    If you want to also preserve order, use an OrderedDict:

    >>> list(OrderedDict.fromkeys('abracadabra'))

    ['a', 'b', 'r', 'c', 'd']


    Raymond
     
    Raymond Hettinger, Jun 25, 2011
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Joe
    Replies:
    3
    Views:
    4,515
    Sudsy
    Dec 5, 2003
  2. Thirsty Traveler

    Is there any advantage to using a webservice over a post

    Thirsty Traveler, Jun 19, 2006, in forum: ASP .Net Web Services
    Replies:
    1
    Views:
    154
    Josh Twist
    Jun 20, 2006
  3. Costan
    Replies:
    9
    Views:
    111
    SHINDO Motoaki
    Jul 7, 2008
  4. Token Type
    Replies:
    9
    Views:
    391
    Chris Angelico
    Sep 9, 2012
  5. JL
    Replies:
    8
    Views:
    82
    Terry Reedy
    Dec 11, 2013
Loading...

Share This Page