unique-ifying a list

Discussion in 'Python' started by kj, Aug 7, 2009.

  1. kj

    kj Guest

    Suppose that x is some list. To produce a version of the list with
    duplicate elements removed one could, I suppose, do this:

    x = list(set(x))

    but I expect that this will not preserve the original order of
    elements.

    I suppose that I could write something like

    def uniquify(items):
    seen = set()
    ret = []
    for i in items:
    if not i in seen:
    ret.append(i)
    seen.add(i)
    return ret

    But this seems to me like such a commonly needed operation that I
    find it hard to believe one would need to resort to such self-rolled
    solutions. Isn't there some more standard (and hopefully more
    efficient, as in "C-coded"/built-in) approach?

    TIA!

    kynn
     
    kj, Aug 7, 2009
    #1
    1. Advertisements

  2. Honestly, doing unique operations is pretty rare in the application
    level. Unless you're writing some kind of database, I don't see why
    you'd do it. (My recommendation is not to write databases.)

    If you want unique elements, use a set. If you want to order them,
    sort a list of the items in the set.

    If you want to preserve the order, then using a dict may be even
    better.

    orig_order = dict(reversed([reversed(i) for i in enumerate
    (items)])
    unique_ordered = sorted(orig_order.keys(), key=lambda k: orig_order
    [k])

    Hints to understanding:
    * enumerate generates (index, item) pairs.
    * We reverse each pair so that we get an item -> index mapping.
    * We reverse it so that the first ones appear last. Later pairs
    override earlier ones in dict().
     
    Jonathan Gardner, Aug 7, 2009
    #2
    1. Advertisements

  3. Assuming the elements are hashable, yes, that's the fastest way (minus
    some microoptimizations like using local names for ret.append and
    seen.add, or the 'not in' operator).

    See bearophile's recipe [1], another one [2] by Tim Peters (quite old but
    worths reading the comment section), and this thread [3]

    [1] http://code.activestate.com/recipes/438599/
    [2] http://code.activestate.com/recipes/52560/
    [3] http://groups.google.com/group/comp.lang.python/t/40c6c455f4fd5154/
     
    Gabriel Genellina, Aug 8, 2009
    #3
  4. kj

    ryles Guest

    OrderedSet is most likely on the way, but for now Python 3.1 and 2.7
    have OrderedDict. For 3.0 and 2.6 the recipe is here:

    http://code.activestate.com/recipes/576669

    With OrderedDict you can do:

    OrderedDict.fromkeys(x).keys() # Returns an iterator in 3.x, a list
    in 2.x.
     
    ryles, Aug 8, 2009
    #4
  5. kj

    Paul Rubin Guest

    The version with seen runs in linear time because of the O(1) set
    lookup. Your version runs in quadratic time.
     
    Paul Rubin, Aug 8, 2009
    #5
  6. kj

    Simon Forman Guest


    Unique items in a list, in the same order as in the list:

    x = sorted(set(x), key=x.index)

    ;]
     
    Simon Forman, Aug 9, 2009
    #6
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.