Re: count

Discussion in 'Python' started by Vilya Harvey, Jul 8, 2009.

  1. Vilya Harvey

    Vilya Harvey Guest

    2009/7/8 Dhananjay:
    > I wanted to sort column 2 in ascending order, so I read the whole file into
    > an array "data" and did the following:
    >
    > data.sort(key = lambda fields:(fields[2]))
    >
    > I have sorted column 2; however, I want to count the numbers in column 2,
    > i.e. I want to know, for example, how many repeats of, say, '3' (first row,
    > 2nd column in above data) there are in column 2.


    One thing: indexes in Python start from 0, so the second column has an
    index of 1 not 2. In other words, it should be data.sort(key = lambda
    fields: fields[1]) instead.

    With that out of the way, the following will print out a count of each
    unique item in the second column:

    from itertools import groupby
    for x, g in groupby([fields[1] for fields in data]):
        print x, len(tuple(g))

    Hope that helps,
    Vil.
    Vilya Harvey, Jul 8, 2009
    #1
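For readers on Python 3, a minimal sketch of the counting approach from post #1 might look like the following. The rows in `data` are made up for illustration, since the original file was not posted. Note that `groupby` only merges adjacent equal keys, so the data has to be sorted on the same column first; `collections.Counter` is shown as an alternative that does not need sorted input.

```python
from collections import Counter
from itertools import groupby

# Hypothetical rows standing in for the poster's "data" array.
data = [
    ["a", "3", "x"],
    ["b", "1", "y"],
    ["c", "3", "z"],
    ["d", "2", "w"],
    ["e", "3", "v"],
]

# Sort on the second column (index 1) so equal values become adjacent.
data.sort(key=lambda fields: fields[1])

# Count each run of equal values without building a tuple per group.
grouped_counts = [
    (value, sum(1 for _ in group))
    for value, group in groupby(fields[1] for fields in data)
]

# Counter tallies values in any order, so no sort is required for it.
counter_counts = Counter(fields[1] for fields in data)

print(grouped_counts)
print(counter_counts["3"])
```

If only the totals matter and the order of the rows does not, the `Counter` version is probably the simpler choice.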

  2. Bearophile

    Bearophile Guest

    Vilya Harvey:
    > from itertools import groupby
    > for x, g in groupby([fields[1] for fields in data]):
    >     print x, len(tuple(g))


    Avoid that len(tuple(g)), use something like the following, it's lazy
    and saves some memory.


    def leniter(iterator):
        """leniter(iterator): return the length of a given
        iterator, consuming it, without creating a list.
        Never use it with infinite iterators.

        >>> leniter()
        Traceback (most recent call last):
          ...
        TypeError: leniter() takes exactly 1 argument (0 given)
        >>> leniter([])
        0
        >>> leniter([1])
        1
        >>> leniter(iter([1]))
        1
        >>> leniter(x for x in xrange(100) if x%2)
        50
        >>> from itertools import groupby
        >>> [(leniter(g), h) for h,g in groupby("aaaabccaadeeee")]
        [(4, 'a'), (1, 'b'), (2, 'c'), (2, 'a'), (1, 'd'), (4, 'e')]

        >>> def foo0():
        ...     if False: yield 1
        >>> leniter(foo0())
        0

        >>> def foo1(): yield 1
        >>> leniter(foo1())
        1
        """
        # This code is faster than: sum(1 for _ in iterator)
        if hasattr(iterator, "__len__"):
            return len(iterator)
        nelements = 0
        for _ in iterator:
            nelements += 1
        return nelements

    Bye,
    bearophile
    Bearophile, Jul 8, 2009
    #2

  3. Paul Rubin

    Paul Rubin Guest

    Bearophile writes:
    > >     print x, len(tuple(g))
    >
    > Avoid that len(tuple(g)), use something like the following


    print x, sum(1 for _ in g)
    Paul Rubin, Jul 8, 2009
    #3
  4. Aahz

    Aahz Guest

    Bearophile wrote:
    >Vilya Harvey:
    >>
    >> from itertools import groupby
    >> for x, g in groupby([fields[1] for fields in data]):
    >>     print x, len(tuple(g))
    >
    >Avoid that len(tuple(g)), use something like the following, it's lazy
    >and saves some memory.


    The question is whether it saves time, have you tested it?
    --
    Aahz () <*> http://www.pythoncraft.com/

    "as long as we like the same operating system, things are cool." --piranha
    Aahz, Jul 8, 2009
    #4
  5. Paul Rubin

    Paul Rubin Guest

    Aahz writes:
    > >Avoid that len(tuple(g)), use something like the following, it's lazy
    > >and saves some memory.
    >
    > The question is whether it saves time, have you tested it?


    len(tuple(xrange(100000000))) ... hmm.
    Paul Rubin, Jul 8, 2009
    #5
  6. Aahz

    Aahz Guest

    Paul Rubin wrote:
    > Aahz writes:
    >>Paul Rubin deleted an attribution:
    >>>
    >>>Avoid that len(tuple(g)), use something like the following, it's lazy
    >>>and saves some memory.
    >>
    >> The question is whether it saves time, have you tested it?
    >
    >len(tuple(xrange(100000000))) ... hmm.


    When dealing with small N, O() can get easily swamped by the constant
    factors. How often do you deal with more than a hundred fields?
    --
    Aahz () <*> http://www.pythoncraft.com/

    "as long as we like the same operating system, things are cool." --piranha
    Aahz, Jul 8, 2009
    #6
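One way to check the small-N question from post #6 on your own machine is a `timeit` comparison. This is only a sketch for Python 3: the helper names `count_with_tuple`, `count_with_sum`, and `seconds_per_call` are invented here, and the sizes tried are arbitrary. No particular result is assumed; the numbers depend on the interpreter and hardware.

```python
import timeit


def count_with_tuple(items):
    """Count by materializing every element in a tuple first."""
    return len(tuple(items))


def count_with_sum(items):
    """Count lazily, one element at a time."""
    return sum(1 for _ in items)


def seconds_per_call(counter, n, repeat=3, number=1000):
    """Best-of-`repeat` average time to count a fresh n-item iterator."""
    # iter(range(n)) hides __len__, so both counters must walk the items.
    timer = timeit.Timer(lambda: counter(iter(range(n))))
    return min(timer.repeat(repeat=repeat, number=number)) / number


for n in (5, 50, 500):
    tuple_time = seconds_per_call(count_with_tuple, n)
    sum_time = seconds_per_call(count_with_sum, n)
    print("n=%d tuple=%.3e s sum=%.3e s" % (n, tuple_time, sum_time))
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes when timing very short calls.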
  7. Paul Rubin

    Paul Rubin Guest

    Aahz writes:
    > When dealing with small N, O() can get easily swamped by the constant
    > factors. How often do you deal with more than a hundred fields?


    The number of fields in the OP's post was not stated. Expecting it to
    be less than 100 seems like an ill-advised presumption. If N is
    unknown, speed-tuning the case where N is small at the expense of
    consuming monstrous amounts of memory when N is large sounds
    somewhere between a premature optimization and a nasty bug.
    Paul Rubin, Jul 8, 2009
    #7
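The memory side of the argument in post #7 can be made visible with the standard `tracemalloc` module. The sketch below assumes Python 3.4 or later; the helper names and the size `n = 200000` are chosen here for illustration and are deliberately much smaller than the sizes discussed in the thread.

```python
import tracemalloc


def count_with_tuple(n):
    """Count by building a tuple of all n items first."""
    return len(tuple(range(n)))


def count_lazily(n):
    """Count one item at a time without storing them."""
    return sum(1 for _ in range(n))


def run_with_peak(func, n):
    """Return (result, peak traced bytes) for a single call to func(n)."""
    tracemalloc.start()
    try:
        result = func(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


n = 200000
tuple_count, tuple_peak = run_with_peak(count_with_tuple, n)
lazy_count, lazy_peak = run_with_peak(count_lazily, n)

print("tuple: count=%d peak=%d bytes" % (tuple_count, tuple_peak))
print("lazy:  count=%d peak=%d bytes" % (lazy_count, lazy_peak))
```

The tuple version has to hold all n elements at once, so its peak grows with n, while the lazy version only keeps the current element and the running total.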
  8. J. Clifford Dyer

    On Wed, 2009-07-08 at 14:45 -0700, Paul Rubin wrote:
    > Aahz writes:
    > > >Avoid that len(tuple(g)), use something like the following, it's lazy
    > > >and saves some memory.
    > >
    > > The question is whether it saves time, have you tested it?
    >
    > len(tuple(xrange(100000000))) ... hmm.


    timer.py
    --------
    from datetime import datetime

    def tupler(n):
        return len(tuple(xrange(n)))

    def summer(n):
        return sum(1 for x in xrange(n))

    def test_func(f, n):
        print f.__name__,
        start = datetime.now()
        print f(n)
        end = datetime.now()
        print "Start: %s" % start
        print "End: %s" % end
        print "Duration: %s" % (end - start,)

    if __name__ == '__main__':
        test_func(summer, 10000000)
        test_func(tupler, 10000000)
        test_func(summer, 100000000)
        test_func(tupler, 100000000)

    $ python timer.py
    summer 10000000
    Start: 2009-07-08 22:02:13.216689
    End: 2009-07-08 22:02:15.855931
    Duration: 0:00:02.639242
    tupler 10000000
    Start: 2009-07-08 22:02:15.856122
    End: 2009-07-08 22:02:16.743153
    Duration: 0:00:00.887031
    summer 100000000
    Start: 2009-07-08 22:02:16.743863
    End: 2009-07-08 22:02:49.372756
    Duration: 0:00:32.628893
    Killed
    $

    Note that "Killed" did not come from anything I did. The tupler just
    bombed out when the tuple got too big for it to handle. Tupler was
    faster for as large an input as it could handle, as well as for small
    inputs (test not shown).
    J. Clifford Dyer, Jul 9, 2009
    #8
  9. Bearophile

    Bearophile Guest

    Paul Rubin:
    > print x, sum(1 for _ in g)


    Don't use that, use my function :) If g has a __len__ you are wasting
    time. And sum(1 ...) is (on my PC) slower.


    J. Clifford Dyer:
    > if __name__ == '__main__':
    >     test_func(summer, 10000000)
    >     test_func(tupler, 10000000)
    >     test_func(summer, 100000000)
    >     test_func(tupler, 100000000)


    Have you forgotten my function?

    Bye,
    bearophile
    Bearophile, Jul 9, 2009
    #9
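On the `__len__` point in post #9: the group objects that `groupby` yields are plain iterators with no `__len__`, so for the use case in this thread the shortcut branch of `leniter` is not taken and the counting loop always runs. A short Python 3 sketch (the function body is a condensed copy of the one posted in #2, without the docstring):

```python
from itertools import groupby


def leniter(iterator):
    """Condensed Python 3 copy of the leniter function from post #2."""
    if hasattr(iterator, "__len__"):
        return len(iterator)
    nelements = 0
    for _ in iterator:
        nelements += 1
    return nelements


has_len = []
counts = []
for key, group in groupby("aaaabccaadeeee"):
    # Inspect each group before advancing groupby, which invalidates it.
    has_len.append(hasattr(group, "__len__"))
    counts.append((key, leniter(group)))

print(has_len)
print(counts)
print(leniter([1, 2, 3]))  # a list has __len__, so no loop is needed
```

The shortcut still pays off when the same helper is handed a list, tuple, or other sized container.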
  10. J. Cliff Dyer

    Bearophile wins! (This only times the loop itself. It doesn't check
    for __len__.)

    summer:5
    0:00:00.000051
    bearophile:5
    0:00:00.000009
    summer:50
    0:00:00.000030
    bearophile:50
    0:00:00.000013
    summer:500
    0:00:00.000077
    bearophile:500
    0:00:00.000053
    summer:5000
    0:00:00.000575
    bearophile:5000
    0:00:00.000473
    summer:50000
    0:00:00.005583
    bearophile:50000
    0:00:00.004625
    summer:500000
    0:00:00.055834
    bearophile:500000
    0:00:00.046137
    summer:5000000
    0:00:00.426734
    bearophile:5000000
    0:00:00.349573
    summer:50000000
    0:00:04.180920
    bearophile:50000000
    0:00:03.652311
    summer:500000000
    0:00:42.647885
    bearophile:500000000
    0:00:35.190550

    On Thu, 2009-07-09 at 04:04 -0700, Bearophile wrote:
    > Paul Rubin:
    > > print x, sum(1 for _ in g)
    >
    > Don't use that, use my function :) If g has a __len__ you are wasting
    > time. And sum(1 ...) is (on my PC) slower.
    >
    >
    > J. Clifford Dyer:
    > > if __name__ == '__main__':
    > >     test_func(summer, 10000000)
    > >     test_func(tupler, 10000000)
    > >     test_func(summer, 100000000)
    > >     test_func(tupler, 100000000)
    >
    > Have you forgotten my function?
    >
    > Bye,
    > bearophile
    J. Cliff Dyer, Jul 9, 2009
    #10