Re: insert unique data in a list

Discussion in 'Python' started by knifenomad, Dec 14, 2009.

  1. knifenomad

    knifenomad Guest

    On 12ì›”14ì¼, 오전2ì‹œ57분, mattia <> wrote:
    > Il Sun, 13 Dec 2009 16:37:20 +0000, mattia ha scritto:
    >
    > > How can I insert non-duplicate data in a list? I mean, is there a
    > > particular option in the creation of a list that permit me not to use
    > > something like:
    > > def append_unique(l, val):
    > >     if val not in l:
    > >         l.append(val)

    >
    > > Thanks,
    > > Mattia

    >
    > Ok, so you all suggest to use a set. Now the second question, more
    > interesting. Why can't I insert a list into a set? I mean, I have a
    > function that returns a list. I call this function several times and
    > maybe the list returned is the same as another one already returned. I
    > usually put all this lists into another list. How can I assure that my
    > list contains only unique lists? Using set does'n work (i.e. the python
    > interpreter tells me: TypeError: unhashable type: 'list')...


    this makes the set type hashable.

    class Set(set):
    __hash__ = lambda self: id(self)

    s = Set()
    s.add(1)
    s.add(2)
    s.add(1)

    print s
    set([1, 2])

    d = {}
    d = 'voila'

    print d
    {Set([1,2]):'voila'}

    print d
    'voila'
    knifenomad, Dec 14, 2009
    #1
    1. Advertising

  2. knifenomad

    knifenomad Guest

    On 12ì›”14ì¼, 오전10ì‹œ19분, knifenomad <> wrote:
    > On 12ì›”14ì¼, 오전2ì‹œ57분, mattia <> wrote:
    >
    >
    >
    >
    >
    > > Il Sun, 13 Dec 2009 16:37:20 +0000, mattia ha scritto:

    >
    > > > How can I insert non-duplicate data in a list? I mean, is there a
    > > > particular option in the creation of a list that permit me not to use
    > > > something like:
    > > > def append_unique(l, val):
    > > >     if val not in l:
    > > >         l.append(val)

    >
    > > > Thanks,
    > > > Mattia

    >
    > > Ok, so you all suggest to use a set. Now the second question, more
    > > interesting. Why can't I insert a list into a set? I mean, I have a
    > > function that returns a list. I call this function several times and
    > > maybe the list returned is the same as another one already returned. I
    > > usually put all this lists into another list. How can I assure that my
    > > list contains only unique lists? Using set does'n work (i.e. the python
    > > interpreter tells me: TypeError: unhashable type: 'list')...

    >
    > this makes the set type hashable.
    >
    > class Set(set):
    >     __hash__ = lambda self: id(self)
    >
    > s = Set()
    > s.add(1)
    > s.add(2)
    > s.add(1)
    >
    > print s
    > set([1, 2])
    >
    > d = {}
    > d = 'voila'
    >
    > print d
    > {Set([1,2]):'voila'}
    >
    > print d
    > 'voila'- ì›ë³¸ í…스트 숨기기 -
    >
    > - ì›ë³¸ í…스트 보기 -


    although it's not what you've asked about. it's intereting to make set
    hashable using __hash__.
    knifenomad, Dec 14, 2009
    #2
    1. Advertising

  3. knifenomad

    Chris Rebert Guest

    On Sun, Dec 13, 2009 at 5:19 PM, knifenomad <> wrote:
    > On 12ì›”14ì¼, 오전2ì‹œ57분, mattia <> wrote:
    >> Il Sun, 13 Dec 2009 16:37:20 +0000, mattia ha scritto:
    >> > How can I insert non-duplicate data in a list? I mean, is there a
    >> > particular option in the creation of a list that permit me not to use
    >> > something like:
    >> > def append_unique(l, val):
    >> >     if val not in l:
    >> >         l.append(val)

    >>
    >> Ok, so you all suggest to use a set. Now the second question, more
    >> interesting. Why can't I insert a list into a set? I mean, I have a
    >> function that returns a list. I call this function several times and
    >> maybe the list returned is the same as another one already returned. I
    >> usually put all this lists into another list. How can I assure that my
    >> list contains only unique lists? Using set does'n work (i.e. the python
    >> interpreter tells me: TypeError: unhashable type: 'list')...

    >
    > this makes the set type hashable.
    >
    > class Set(set):
    >    __hash__ = lambda self: id(self)


    Or you could just use frozenset and get the correct semantics:
    http://docs.python.org/library/stdtypes.html#frozenset

    Cheers,
    Chris
    --
    http://blog.rebertia.com
    Chris Rebert, Dec 14, 2009
    #3
  4. On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote:


    > this makes the set type hashable.
    >
    > class Set(set):
    > __hash__ = lambda self: id(self)



    That's a *seriously* broken hash function.

    >>> key = "voila"
    >>> d = { Set(key): 1 }
    >>> d

    {Set(['i', 'a', 'l', 'o', 'v']): 1}
    >>> d[ Set(key) ]

    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    KeyError: Set(['i', 'a', 'l', 'o', 'v'])


    --
    Steven
    Steven D'Aprano, Dec 14, 2009
    #4
  5. knifenomad

    knifenomad Guest

    On 12ì›”14ì¼, 오후12ì‹œ42분, Steven D'Aprano
    <> wrote:
    > On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote:
    > > this makes the set type hashable.

    >
    > > class Set(set):
    > >     __hash__ = lambda self: id(self)

    >
    > That's a *seriously* broken hash function.
    >
    > >>> key = "voila"
    > >>> d = { Set(key): 1 }
    > >>> d

    >
    > {Set(['i', 'a', 'l', 'o', 'v']): 1}>>> d[ Set(key) ]
    >
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    > KeyError: Set(['i', 'a', 'l', 'o', 'v'])
    >
    > --
    > Steven


    of course it is broken as long as it uses it's instance id.
    i added this to notify that unhashable can become hashable
    implementing __hash__ inside the class. which probably set to None by
    default.
    knifenomad, Dec 14, 2009
    #5
  6. knifenomad

    Lie Ryan Guest

    On 12/15/2009 4:13 AM, mattia wrote:
    >> >
    >> > of course it is broken as long as it uses it's instance id. i added this
    >> > to notify that unhashable can become hashable implementing __hash__
    >> > inside the class. which probably set to None by default.

    > Ok, nice example, but I believe that using id() as the hash function can
    > lead to unexpected collisions.


    For dict and set to work correctly, the hash function must conform to
    the contract that:
    - if A == B then hash(A) == hash(B)

    If the id() of two objects differ but their content equal (i.e. they are
    two equivalent, but distinct object), they should have the same hash. If
    id() is used for the hash of an arbitrary object, the contract will be
    broken unless you define A == B in terms of id(A) == id(B).
    Lie Ryan, Dec 14, 2009
    #6
  7. On Mon, 14 Dec 2009 17:13:24 +0000, mattia wrote:

    > Il Sun, 13 Dec 2009 21:17:28 -0800, knifenomad ha scritto:
    >
    >> On 12ì›”14ì¼, 오후12ì‹œ42분, Steven D'Aprano
    >> <> wrote:
    >>> On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote:
    >>> > this makes the set type hashable.
    >>>
    >>> > class Set(set):
    >>> >     __hash__ = lambda self: id(self)
    >>>
    >>> That's a *seriously* broken hash function.
    >>>
    >>> >>> key = "voila"
    >>> >>> d = { Set(key): 1 }
    >>> >>> d
    >>>
    >>> {Set(['i', 'a', 'l', 'o', 'v']): 1}>>> d[ Set(key) ]
    >>>
    >>> Traceback (most recent call last):
    >>>   File "<stdin>", line 1, in <module>
    >>> KeyError: Set(['i', 'a', 'l', 'o', 'v'])
    >>>
    >>> --
    >>> Steven

    >>
    >> of course it is broken as long as it uses it's instance id. i added
    >> this to notify that unhashable can become hashable implementing
    >> __hash__ inside the class. which probably set to None by default.

    >
    > Ok, nice example, but I believe that using id() as the hash function can
    > lead to unexpected collisions.


    No, you have that backwards. Using id() as the hash function means that
    equal keys will hash unequal -- rather than unexpected collisions, it
    leads to unexpected failure-to-collide-when-it-should-collide.

    And it isn't a "nice example", it is a terrible example.

    Firstly, the example fails to behave correctly. It simply doesn't work as
    advertised.

    Secondly, and much worse, it encourages people to do something dangerous
    without thinking about the consequences. If it is so easy to hash mutable
    objects, why don't built-in lists and sets don't have a __hash__ method?
    Do you think that the Python developers merely forgot?

    No, there is good reason why mutable items shouldn't be used as keys in
    dicts or in sets, and this example simply papers over the reasons why and
    gives the impression that using mutable objects as keys is easy and safe
    when it is neither.

    Using mutable objects as keys or set elements leads to surprising,
    unexpected behaviour and hard-to-find bugs. Consider the following set
    with lists as elements:

    L = [1, 2]
    s = Set() # set that allows mutable elements
    s.add(L)
    s.add([1, 2, 3])

    So far so good. But what should happen now?

    L.append(3)

    The set now has two equal elements, which breaks the set invariant that
    it has no duplicates.

    Putting the problem of duplicates aside, there is another problem:

    L = [1, 2]
    s = Set([L])
    L.append(3)

    There are two possibilities: either the hash function of L changes when
    the object changes, or it doesn't. Suppose it changes. Then the hash of L
    after the append will be different from the hash of L before the append,
    and so membership testing (L in s) will fail.

    Okay, suppose we somehow arrange matters so that the hash of the object
    doesn't change as the object mutates. This will solve the problem above:
    we can mutate L as often as we like, and L in s will continue to work
    correctly. But now the hash of an object doesn't just depend on it's
    value, but on its history. That means that two objects which are
    identical can hash differently, and we've already seen this is a problem.

    There is one final approach which could work: we give the object a
    constant hash function, so that all objects of that type hash
    identically. This means that the performance of searches and lookups in
    sets and dicts will fall to that of lists. There is no point in paying
    all the extra overhead of a dict to get behaviour as slow, or slower,
    than a list.

    In other words, no matter what you do, using mutable objects as keys or
    set elements leads to serious problems that need to be dealt with. It
    simply isn't true that all you need to do to make mutable objects usable
    in dicts or sets is to add a hash function. That part is trivial.



    --
    Steven
    Steven D'Aprano, Dec 14, 2009
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. ToshiBoy
    Replies:
    6
    Views:
    847
    ToshiBoy
    Aug 12, 2008
  2. Fire Crow

    Re: insert unique data in a list

    Fire Crow, Dec 13, 2009, in forum: Python
    Replies:
    4
    Views:
    295
    Lie Ryan
    Dec 14, 2009
  3. deathweaselx86
    Replies:
    5
    Views:
    1,108
    Raymond Hettinger
    Jun 25, 2011
  4. J. Muenchbourg
    Replies:
    3
    Views:
    240
    Aaron Bertrand - MVP
    Sep 30, 2003
  5. Token Type
    Replies:
    9
    Views:
    355
    Chris Angelico
    Sep 9, 2012
Loading...

Share This Page