RE: Find duplicates in a list and count them ...

Discussion in 'Python' started by Paul.Scipione@aps.com, Mar 26, 2009.

  1. Guest

    Hi D'Arcy J.M. Cain,

    Thank you. I tried this and my list of 76,979 integers got reduced to a dictionary of 76,963 items, each item listing the integer value from the list, a comma, and a 1. I think what this is doing is finding all integers from my list that are unique (only one instance of it in the list), instead of creating a dictionary with integers that are not unique, with a count of how many times they occur. My dictionary should contain only 11 items listing 11 integer values and the number of times they appear in my original list.

    Thanks,

    Paul J. Scipione
    GIS Database Administrator
    work: 602-371-7091
    cell: 480-980-4721

    -----Original Message-----
    From: D'Arcy J.M. Cain [mailto:]
    Sent: Thursday, March 26, 2009 12:50 PM
    To: Scipione, Paul (ZP5296)
    Cc:
    Subject: Re: Find duplicates in a list and count them ...

    On Thu, 26 Mar 2009 12:22:27 -0700
    wrote:
    > I'm a newbie to Python. I have a list which contains integers (about 80,000). I want to find a quick way to get the numbers that occur in the list more than once, and how many times that number is duplicated in the list. I've done this right now by looping through the list, getting a number, querying the list to find out how many times the number exists, then writing it to a new list. On this many records it takes a couple of minutes. What I am looking for is something in python that can grab this info without looping through a list.


    icount = {}
    for i in list_of_ints:
        icount[i] = icount.get(i, 0) + 1

    Now you have a dictionary of every integer in the list and the count of times it appears.
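
For readers following along, here is a minimal sketch of the counting loop above wrapped in a function, assuming Python 3. The name `count_occurrences` and the sample data are hypothetical and not from the original post. The point it illustrates is that the list is walked once, with one dictionary lookup per element, rather than re-querying the whole list for every number.

```python
def count_occurrences(values):
    """Return a dict mapping each value to the number of times it appears."""
    counts = {}
    for value in values:
        # dict.get returns 0 the first time a value is seen
        counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical sample data, only for illustration
sample = [7, 3, 7, 9, 3, 7]
counts = count_occurrences(sample)
print(counts)
```

Note that the result holds every distinct value, including those that appear only once; picking out the duplicates is a separate filtering step.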

    --
    D'Arcy J.M. Cain <> | Democracy is three wolves
    http://www.druid.net/darcy/ | and a sheep voting on
    +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.

    , Mar 26, 2009

  2. John Machin Guest

    On Mar 27, 8:14 am, wrote:
    > Hi D'Arcy J.M. Cain,
    >
    > Thank you.  I tried this and my list of 76,979 integers got reduced to a dictionary of 76,963 items, each item listing the integer value from the list, a comma, and a 1.


    I doubt this very much. Please show:
    (a) your implementation of D'Arcy's suggestion
    (b) the code you used that led you to the conclusion that all counts
    were 1. See example below.


    >  I think what this is doing is finding all integers from my list that are unique (only one instance of it in the list), instead of creating a dictionary with integers that are not unique, with a count of how many times they occur.  My dictionary should contain only 11 items listing 11 integer values and the number of times they appear in my original list.



    The only way of getting your desired result is to get a dict of counts
    and then to keep only the ones where the count is greater than one.
    D'Arcy appears to have presumed that it was not necessary to show the
    second stage :)

    [assuming Python 2.6]
    >>> list_of_ints = [999, 2, 3, 999, 2, 2, 8, 42, 999, 42, 5]
    >>> len(list_of_ints)
    11
    >>> icount = {}
    >>> for i in list_of_ints:
    ...     icount[i] = icount.get(i, 0) + 1
    ...
    >>> icount
    {2: 3, 3: 1, 5: 1, 999: 3, 8: 1, 42: 2}
    >>> len(icount)
    6
    >>> all(count == 1 for count in icount.itervalues())
    False
    >>> dups = dict((k, v) for k, v in icount.iteritems() if v > 1)
    >>> dups
    {2: 3, 42: 2, 999: 3}
    >>>
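
The session above targets Python 2.6. As a hedged sketch for later versions (an assumption, not part of the original reply), the same two stages can be written with `collections.Counter`, which was added in Python 2.7 and 3.1 and so is not available on 2.6. The example reuses the `list_of_ints` sample from the session and assumes Python 3.7 or later, where dicts keep insertion order.

```python
from collections import Counter

list_of_ints = [999, 2, 3, 999, 2, 2, 8, 42, 999, 42, 5]

# Stage 1: count every value in a single pass
icount = Counter(list_of_ints)

# Stage 2: keep only the values that occur more than once
dups = {k: v for k, v in icount.items() if v > 1}

print(dups)  # {999: 3, 2: 3, 42: 2}
```

The keys come out in first-seen order here, unlike the arbitrary ordering shown in the Python 2.6 output, but the counts are the same.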


    HTH,
    John
    John Machin, Mar 26, 2009
