Is there a unique method in python to unique a list?

Discussion in 'Python' started by Token Type, Sep 9, 2012.

  1. Token Type

    Token Type Guest

    Is there a unique method in python to unique a list? thanks
    Token Type, Sep 9, 2012
    #1
  2. On Sun, Sep 9, 2012 at 3:43 PM, Token Type <> wrote:
    > Is there a unique method in python to unique a list? thanks


    I don't believe there's a method for that, but if you don't care about
    order, try turning your list into a set and then back into a list.

    ChrisA
    Chris Angelico, Sep 9, 2012
    #2
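A minimal sketch of the set round-trip suggested above, next to an order-preserving alternative. The `dict.fromkeys` variant is not from the thread; it relies on dicts keeping insertion order, which is only guaranteed from Python 3.7 onward. The sample list is made up for illustration.

```python
# Illustrative data, not from the thread.
words = ["b", "a", "b", "c", "a"]

# Round-trip through a set: duplicates are dropped, but the resulting
# order is arbitrary and should not be relied on.
unique_any_order = list(set(words))

# dict keys are unique and (in Python 3.7+) keep insertion order,
# so this keeps the first occurrence of each item in its original position.
unique_in_order = list(dict.fromkeys(words))

print(sorted(unique_any_order))  # ['a', 'b', 'c']
print(unique_in_order)           # ['b', 'a', 'c']
```

Both approaches require the elements to be hashable.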
  3. On Sun, Sep 9, 2012 at 4:29 PM, John H. Li <> wrote:
    > However, if I don't put list(set(lemma_list)) to a variable name, it works
    > much faster.


    Try backdenting that statement. You're currently doing it at every
    iteration of the loop - that's why it's so much slower.

    But you'll probably find it better to work with the set directly,
    instead of uniquifying a list as a separate operation.

    ChrisA
    Chris Angelico, Sep 9, 2012
    #3
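The slow code being discussed is not shown in the thread, so the following is only a guess at its shape: a sketch of what "backdenting" a loop-invariant statement means, and of working with the set directly. All function names and the sample data are hypothetical.

```python
def unique_rebuilt_each_pass(groups):
    """Rebuilds the unique list inside the loop (wasteful)."""
    items = []
    unique = []
    for group in groups:
        items.extend(group)
        unique = list(set(items))  # indented: runs once per iteration
    return unique


def unique_built_once(groups):
    """Same result, but the conversion runs a single time."""
    items = []
    for group in groups:
        items.extend(group)
    return list(set(items))  # dedented: runs once, after the loop


def unique_as_set(groups):
    """Skips the intermediate list and accumulates into a set directly."""
    seen = set()
    for group in groups:
        seen.update(group)
    return seen


groups = [["a", "b"], ["b", "c"], ["a"]]
print(sorted(unique_rebuilt_each_pass(groups)))  # ['a', 'b', 'c']
print(sorted(unique_built_once(groups)))         # ['a', 'b', 'c']
print(sorted(unique_as_set(groups)))             # ['a', 'b', 'c']
```

The first version does work proportional to the size of the accumulated list on every pass, which is why it slows down so sharply on large inputs.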
  4. Token Type

    Token Type Guest


    > Try backdenting that statement. You're currently doing it at every
    > iteration of the loop - that's why it's so much slower.


    Thanks. It works now.

    >>> def average_polysemy(pos):
            synset_list = list(wn.all_synsets(pos))
            sense_number = 0
            lemma_list = []
            for synset in synset_list:
                lemma_list.extend(synset.lemma_names)
            for lemma in list(set(lemma_list)):
                sense_number_new = len(wn.synsets(lemma, pos))
                sense_number = sense_number + sense_number_new
            return sense_number/len(set(lemma_list))

    >>> average_polysemy('n')
    1


    > But you'll probably find it better to work with the set directly,
    > instead of uniquifying a list as a separate operation.


    Yes, the second method below still runs faster if I don't assign list(set(lemma_list)) to a separate variable. Why does this happen?

    >>> def average_polysemy(pos):
            synset_list = list(wn.all_synsets(pos))
            sense_number = 0
            lemma_list = []
            for synset in synset_list:
                lemma_list.extend(synset.lemma_names)
            for lemma in list(set(lemma_list)):
                sense_number_new = len(wn.synsets(lemma, pos))
                sense_number = sense_number + sense_number_new
            return sense_number/len(set(lemma_list))

    >>> average_polysemy('n')
    1
    Token Type, Sep 9, 2012
    #4
  6. On 09.09.12 08:47, Donald Stufft wrote:
    > If you don't need to retain order you can just use a set,


    Only if elements are hashable.
    Serhiy Storchaka, Sep 9, 2012
    #6
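To illustrate the hashability caveat: a small sketch showing the error a set raises for unhashable elements, and one fallback that only needs equality comparison. The fallback is an assumption about what a reader might do next, not something proposed in the thread, and the data is invented.

```python
# Illustrative data: lists are unhashable, so they cannot be set elements.
rows = [[1, 2], [3], [1, 2]]

try:
    set(rows)
    error = None
except TypeError as exc:
    error = str(exc)  # the message names the offending type

# Fallback that only needs ==, not hashing. Each membership test scans
# the result list, so this is quadratic in the worst case.
unique_rows = []
for row in rows:
    if row not in unique_rows:
        unique_rows.append(row)

print(error)
print(unique_rows)  # [[1, 2], [3]]
```

This version also preserves the original order, at the cost of speed on long lists.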
  7. Paul Rubin

    Paul Rubin Guest

    Token Type <> writes:
    > >>> def average_polysemy(pos):
    >         synset_list = list(wn.all_synsets(pos))
    >         sense_number = 0
    >         lemma_list = []
    >         for synset in synset_list:
    >             lemma_list.extend(synset.lemma_names)
    >         for lemma in list(set(lemma_list)):
    >             sense_number_new = len(wn.synsets(lemma, pos))
    >             sense_number = sense_number + sense_number_new
    >         return sense_number/len(set(lemma_list))


    I think you mean (untested):

    synsets = wn.all_synsets(pos)
    sense_number = 0
    lemma_set = set()
    for synset in synsets:
        lemma_set.add(synset.lemma_names)
    for lemma in lemma_set:
        sense_number += len(wn.synsets(lemma,pos))
    return sense_number / len(lemma_set)
    Paul Rubin, Sep 9, 2012
    #7
  8. Paul Rubin

    Paul Rubin Guest

    Paul Rubin <> writes:
    > I think you mean (untested):
    >
    > synsets = wn.all_synsets(pos)
    > sense_number = 0
    > lemma_set = set()
    > for synset in synsets:
    >     lemma_set.add(synset.lemma_names)
    > for lemma in lemma_set:
    >     sense_number += len(wn.synsets(lemma,pos))
    > return sense_number / len(lemma_set)


    Or even:

    lemma_set = set(synset for synset in wn.all_synsets(pos))
    sense_number = sum(len(wn.synsets(lemma, pos)) for lemma in lemma_set)
    return sense_number / len(lemma_set)
    Paul Rubin, Sep 9, 2012
    #8
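Note that the condensed version above, as written, builds a set of synsets rather than a set of lemma names. A hedged sketch of what was probably intended follows, using a nested set comprehension to flatten the names. `FakeSynset` and `FakeWordNet` are hypothetical stand-ins so the snippet runs without NLTK; they are not the real NLTK API. `lemma_names` is accessed as an attribute, as in the thread; in newer NLTK releases it is a method, `synset.lemma_names()`.

```python
from collections import namedtuple

# Hypothetical stand-in for a WordNet synset (not the NLTK class).
FakeSynset = namedtuple("FakeSynset", "lemma_names")


class FakeWordNet:
    """Hypothetical stand-in exposing the two calls used in the thread."""

    def __init__(self, synsets):
        self._synsets = synsets

    def all_synsets(self, pos):
        # pos is ignored in this toy version.
        return iter(self._synsets)

    def synsets(self, lemma, pos):
        return [s for s in self._synsets if lemma in s.lemma_names]


def average_polysemy(wn, pos):
    # Flatten every synset's lemma names into one set of unique lemmas.
    lemma_set = {
        lemma
        for synset in wn.all_synsets(pos)
        for lemma in synset.lemma_names
    }
    sense_number = sum(len(wn.synsets(lemma, pos)) for lemma in lemma_set)
    return sense_number / len(lemma_set)


fake_wn = FakeWordNet([
    FakeSynset(["dog", "hound"]),
    FakeSynset(["dog"]),
    FakeSynset(["cat"]),
])
# 'dog' has 2 senses, 'hound' and 'cat' have 1 each: 4 senses over 3 lemmas.
result = average_polysemy(fake_wn, "n")
print(result)
```

Under Python 3 the `/` operator returns a float here. The `1` printed earlier in the thread is consistent with Python 2, where dividing two integers truncates the result.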
  9. Token Type

    Token Type Guest

    Thanks. I tried to use set() as you suggested, but without success. Please see:
    >>> synsets = list(wn.all_synsets('n'))
    >>> synsets[:5]

    [Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('abstraction.n.06'), Synset('thing.n.12'), Synset('object.n.01')]
    >>> lemma_set = set()
    >>> for synset in synsets:
            lemma_set.add(synset.lemma_names)

    Traceback (most recent call last):
      File "<pyshell#43>", line 2, in <module>
        lemma_set.add(synset.lemma_names)
    TypeError: unhashable type: 'list'
    >>> for synset in synsets:
            lemma_set.add(set(synset.lemma_names))

    Traceback (most recent call last):
      File "<pyshell#45>", line 2, in <module>
        lemma_set.add(set(synset.lemma_names))
    TypeError: unhashable type: 'set'
    Token Type, Sep 9, 2012
    #9
  10. On Sun, Sep 9, 2012 at 11:44 PM, Token Type <> wrote:
    > lemma_set.add(synset.lemma_names)


    That tries to add the whole list as a single object, which doesn't
    work because lists can't go into sets. There are two solutions,
    depending on what you want to do.

    1) If you want each addition to remain discrete, make a tuple instead:
    lemma_set.add(tuple(synset.lemma_names))

    2) If you want to add the elements of that list individually into the
    set, use update:
    lemma_set.update(synset.lemma_names)

    I'm thinking you probably want option 2 here.

    ChrisA
    Chris Angelico, Sep 9, 2012
    #10
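The two options can be compared side by side in a short sketch. The lemma names below are made-up sample data standing in for `synset.lemma_names`.

```python
# Sample data standing in for two synsets' lemma name lists.
names_a = ["dog", "domestic_dog"]
names_b = ["dog", "frump"]

# Option 1: each list becomes one hashable tuple, kept as a single element.
as_units = set()
as_units.add(tuple(names_a))
as_units.add(tuple(names_b))

# Option 2: update() adds every item of the iterable individually,
# so the shared name 'dog' is stored only once.
flattened = set()
flattened.update(names_a)
flattened.update(names_b)

print(len(as_units))      # 2
print(sorted(flattened))  # ['dog', 'domestic_dog', 'frump']
```

For counting distinct lemmas, option 2 gives the set that the averaging code needs.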