Summing a 2D list

Discussion in 'Python' started by Mark, Jun 12, 2008.

  1. Mark

    Mark Guest

    Hi all,

    I have a scenario where I have a list like this:

    User Score
    1 0
    1 1
    1 5
    2 3
    2 1
    3 2
    4 3
    4 3
    4 2

    And I need to add up the score for each user to get something like
    this:

    User Score
    1 6
    2 4
    3 2
    4 8

    Is this possible? If so, how can I do it? I've tried looping through
    the arrays and not had much luck so far.

    Any help much appreciated,

    Mark
     
    Mark, Jun 12, 2008
    #1
    1. Advertising

  2. Mark wrote:

    > Hi all,
    >
    > I have a scenario where I have a list like this:
    >
    > User Score
    > 1 0
    > 1 1
    > 1 5
    > 2 3
    > 2 1
    > 3 2
    > 4 3
    > 4 3
    > 4 2
    >
    > And I need to add up the score for each user to get something like
    > this:
    >
    > User Score
    > 1 6
    > 2 4
    > 3 2
    > 4 8
    >
    > Is this possible? If so, how can I do it? I've tried looping through
    > the arrays and not had much luck so far.
    >
    > Any help much appreciated,


    Show us your efforts in code so far. Especially what the actual data looks
    like. Then we can suggest a solution.

    Diez
     
    Diez B. Roggisch, Jun 12, 2008
    #2
    1. Advertising

  3. Mark

    Chris Guest

    On Jun 12, 3:48 pm, Mark <> wrote:
    > Hi all,
    >
    > I have a scenario where I have a list like this:
    >
    > User            Score
    > 1                 0
    > 1                 1
    > 1                 5
    > 2                 3
    > 2                 1
    > 3                 2
    > 4                 3
    > 4                 3
    > 4                 2
    >
    > And I need to add up the score for each user to get something like
    > this:
    >
    > User            Score
    > 1                 6
    > 2                 4
    > 3                 2
    > 4                 8
    >
    > Is this possible? If so, how can I do it? I've tried looping through
    > the arrays and not had much luck so far.
    >
    > Any help much appreciated,
    >
    > Mark


    user_score = {}
    for record in list:
    user, score = record.split()
    if user in user_score: user_score[user] += score
    else: user_score[user] = score

    print '\n'.join(['%s\t%s' % (user, score) for user,score in
    sorted(user_score.items())])

    You don't mention what data structure you are keeping your records in
    but hopefully this helps you in the right direction.
     
    Chris, Jun 12, 2008
    #3
  4. Mark

    Mark Guest

    On Jun 12, 3:02 pm, "Diez B. Roggisch" <> wrote:
    > Mark wrote:
    > > Hi all,

    >
    > > I have a scenario where I have a list like this:

    >
    > > User            Score
    > > 1                 0
    > > 1                 1
    > > 1                 5
    > > 2                 3
    > > 2                 1
    > > 3                 2
    > > 4                 3
    > > 4                 3
    > > 4                 2

    >
    > > And I need to add up the score for each user to get something like
    > > this:

    >
    > > User            Score
    > > 1                 6
    > > 2                 4
    > > 3                 2
    > > 4                 8

    >
    > > Is this possible? If so, how can I do it? I've tried looping through
    > > the arrays and not had much luck so far.

    >
    > > Any help much appreciated,

    >
    > Show us your efforts in code so far. Especially what the actual data looks
    > like. Then we can suggest a solution.
    >
    > Diez


    Hi Diez, thanks for the quick reply.

    To be honest I'm relatively new to Python, so I don't know too much
    about how all the loop constructs work and how they differ to other
    languages. I'm building an app in Django and this data is coming out
    of a database and it looks like what I put up there!

    This was my (failed) attempt:

    predictions = Prediction.objects.all()
    scores = []
    for prediction in predictions:
    i = [prediction.predictor.id, 0]
    if prediction.predictionscore:
    i[1] += int(prediction.predictionscore)
    scores.append(i)

    I did have another loop in there (I'm fairly sure I need one) but that
    didn't work either. I don't imagine that snippet is very helpful,
    sorry!

    Any tips would be gratefully recieved!

    Thanks,

    Mark
     
    Mark, Jun 12, 2008
    #4
  5. Mark

    Aidan Guest

    Mark wrote:
    > Hi all,
    >
    > I have a scenario where I have a list like this:
    >
    > User Score
    > 1 0
    > 1 1
    > 1 5
    > 2 3
    > 2 1
    > 3 2
    > 4 3
    > 4 3
    > 4 2
    >
    > And I need to add up the score for each user to get something like
    > this:
    >
    > User Score
    > 1 6
    > 2 4
    > 3 2
    > 4 8
    >
    > Is this possible? If so, how can I do it? I've tried looping through
    > the arrays and not had much luck so far.
    >
    > Any help much appreciated,
    >
    > Mark



    does this work for you?


    users = [1,1,1,2,2,3,4,4,4]
    score = [0,1,5,3,1,2,3,3,2]

    d = dict()

    for u,s in zip(users,score):
    if d.has_key(u):
    d += s
    else:
    d = s

    for key in d.keys():
    print 'user: %d\nscore: %d\n' % (key,d[key])
     
    Aidan, Jun 12, 2008
    #5
  6. Mark

    John Salerno Guest

    "Mark" <> wrote in message
    news:...
    On Jun 12, 3:02 pm, "Diez B. Roggisch" <> wrote:
    > Mark wrote:


    ---
    This was my (failed) attempt:

    predictions = Prediction.objects.all()
    scores = []
    for prediction in predictions:
    i = [prediction.predictor.id, 0]
    if prediction.predictionscore:
    i[1] += int(prediction.predictionscore)
    scores.append(i)
    ---

    Your question sounds like a fun little project, but can you post what the
    actual list of users/scores looks like? Is it a list of tuples like this:

    [(1, 0), (1, 1) ... ]

    Or something else?
     
    John Salerno, Jun 12, 2008
    #6
  7. > To be honest I'm relatively new to Python, so I don't know too much
    > about how all the loop constructs work and how they differ to other
    > languages. I'm building an app in Django and this data is coming out
    > of a database and it looks like what I put up there!
    >
    > This was my (failed) attempt:
    >
    > predictions = Prediction.objects.all()
    > scores = []
    > for prediction in predictions:
    > i = [prediction.predictor.id, 0]
    > if prediction.predictionscore:
    > i[1] += int(prediction.predictionscore)
    > scores.append(i)
    >
    > I did have another loop in there (I'm fairly sure I need one) but that
    > didn't work either. I don't imagine that snippet is very helpful,
    > sorry!


    It is helpful because it tells us what your actual data looks like.

    What you need is to get a list of (predictor, score)-pairs. These you should
    be able to get like this:

    l = [(p.predictor.id, p.predictionscore) for p in predictions]

    Now you need to sort this list - because in the next step, we will aggregate
    the values for each predictor.

    result = []
    current_predictor = None
    total_sum = 0
    for predictor, score in l:
    if predictor != current_predictor:
    # only if we really have a current_predictor,
    # the non-existent first one doesn't count
    if current_predictor is not None:
    result.append((predictor, total_sum))
    total_sum = 0
    current_predictor = predictor
    total_sum += score

    That should be roughly it.

    Diez
     
    Diez B. Roggisch, Jun 12, 2008
    #7
  8. Mark

    Mark Guest

    John, it's a QuerySet coming from a database in Django. I don't know
    enough about the structure of this object to go into detail I'm
    afraid.

    Aidan, I got an error trying your suggestion: 'zip argument #2 must
    support iteration', I don't know what this means!

    Thanks to all who have answered! Sorry I'm not being very specific!
     
    Mark, Jun 12, 2008
    #8
  9. Mark

    Aidan Guest

    Mark wrote:
    > John, it's a QuerySet coming from a database in Django. I don't know
    > enough about the structure of this object to go into detail I'm
    > afraid.
    >
    > Aidan, I got an error trying your suggestion: 'zip argument #2 must
    > support iteration', I don't know what this means!


    well, if we can create 2 iterable sequences one which contains the user
    the other the scores, it should work

    the error means that the second argument to the zip function was not an
    iterable, such as a list tuple or string

    can you show me the lines you're using to retrieve the data sets from
    the database? then i might be able to tell you how to build the 2 lists
    you need.

    > Thanks to all who have answered! Sorry I'm not being very specific!
     
    Aidan, Jun 12, 2008
    #9
  10. Mark

    Aidan Guest

    Aidan wrote:
    > Mark wrote:
    >> John, it's a QuerySet coming from a database in Django. I don't know
    >> enough about the structure of this object to go into detail I'm
    >> afraid.
    >>
    >> Aidan, I got an error trying your suggestion: 'zip argument #2 must
    >> support iteration', I don't know what this means!

    >
    > well, if we can create 2 iterable sequences one which contains the user
    > the other the scores, it should work
    >
    > the error means that the second argument to the zip function was not an
    > iterable, such as a list tuple or string
    >
    > can you show me the lines you're using to retrieve the data sets from
    > the database? then i might be able to tell you how to build the 2 lists
    > you need.
    >


    wait you already did...

    predictions = Prediction.objects.all()
    pairs = [(p.predictor.id,p.predictionscore) for p in predictions]

    those 2 lines will will build a list of user/score pairs. you can then
    replace the call to zip with pairs

    any luck?
     
    Aidan, Jun 12, 2008
    #10
  11. Mark wrote:
    > John, it's a QuerySet coming from a database in Django. I don't know
    > enough about the structure of this object to go into detail I'm
    > afraid. [...]


    Then let the database do the summing up. That's what it's there for :)

    select user, sum(score) from score_table
    group by user

    or something very similar, depending on the actual database schema. I
    don't know how to do this with Django's ORM, but is the way to do it in
    plain SQL.

    -- Gerhard
     
    Gerhard Häring, Jun 12, 2008
    #11
  12. Counting things fast - was Re: Summing a 2D list

    Aidan wrote:
    > does this work for you?
    >
    > users = [1,1,1,2,2,3,4,4,4]
    > score = [0,1,5,3,1,2,3,3,2]
    >
    > d = dict()
    >
    > for u,s in zip(users,score):
    > if d.has_key(u):
    > d += s
    > else:
    > d = s
    >
    > for key in d.keys():
    > print 'user: %d\nscore: %d\n' % (key,d[key])


    I've recently had the very same problem and needed to optimize for the
    best solution. I've tried quite a few, including:

    1) using a dictionary with a default value

    d = collections.defaultdict(lambda: 0)
    d[key] += value

    2) Trying out if avoiding object allocation is worth the effort. Using
    Cython:

    cdef class Counter:
    cdef int _counter
    def __init__(self):
    self._counter = 0

    def inc(self):
    self._counter += 1

    def __int__(self):
    return self._counter

    def __iadd__(self, operand):
    self._counter += 1
    return self

    And no, this was *not* faster than the final solution. This counter
    class, which is basically a mutable int, is exactly as fast as just
    using this one (final solution) - tada!

    counter = {}
    try:
    counter[key] += 1
    except KeyError:
    counter[key] = 1

    Using psyco makes this a bit faster still. psyco can't optimize
    defaultdict or my custom Counter class, though.

    -- Gerhard
     
    Gerhard Häring, Jun 12, 2008
    #12
  13. Mark

    Mark Guest

    On Jun 12, 3:45 pm, Aidan <> wrote:
    > Aidan wrote:
    > > Mark wrote:
    > >> John, it's a QuerySet coming from a database in Django. I don't know
    > >> enough about the structure of this object to go into detail I'm
    > >> afraid.

    >
    > >> Aidan, I got an error trying your suggestion: 'zip argument #2 must
    > >> support iteration', I don't know what this means!

    >
    > > well, if we can create 2 iterable sequences one which contains the user
    > > the other the scores, it should work

    >
    > > the error means that the second argument to the zip function was not an
    > > iterable, such as a list tuple or string

    >
    > > can you show me the lines you're using to retrieve the data sets from
    > > the database? then i might be able to tell you how to build the 2 lists
    > > you need.

    >
    > wait you already did...
    >
    > predictions = Prediction.objects.all()
    > pairs = [(p.predictor.id,p.predictionscore) for p in predictions]
    >
    > those 2 lines will will build a list of user/score pairs.  you can then
    > replace the call to zip with pairs
    >
    > any luck?


    Thanks Aidan, this works great!

    Thanks also to everyone else, I'm sure your suggestions would have
    worked too if I'd been competent enough to do them properly!
     
    Mark, Jun 12, 2008
    #13
  14. Mark

    Paddy Guest

    Re: Counting things fast - was Re: Summing a 2D list

    On Jun 12, 4:14 pm, Gerhard Häring <> wrote:
    > Aidan wrote:
    > > does this work for you?

    >
    > > users = [1,1,1,2,2,3,4,4,4]
    > > score = [0,1,5,3,1,2,3,3,2]

    >
    > > d = dict()

    >
    > > for u,s in zip(users,score):
    > > if d.has_key(u):
    > > d += s
    > > else:
    > > d = s

    >
    > > for key in d.keys():
    > > print 'user: %d\nscore: %d\n' % (key,d[key])

    >
    > I've recently had the very same problem and needed to optimize for the
    > best solution. I've tried quite a few, including:
    >
    > 1) using a dictionary with a default value
    >
    > d = collections.defaultdict(lambda: 0)
    > d[key] += value
    >

    <<SNIP>>
    > -- Gerhard


    This might be faster, by avoiding the lambda:

    d = collections.defaultdict(int)
    d[key] += value

    - Paddy.
     
    Paddy, Jun 12, 2008
    #14
  15. Hi Mark,

    Mark <> writes:
    > I have a scenario where I have a list like this:
    >
    > User Score
    > 1 0
    > 1 1
    > 1 5
    > 2 3
    > 2 1
    > 3 2
    > 4 3
    > 4 3
    > 4 2
    >
    > And I need to add up the score for each user to get something like
    > this:
    >
    > User Score
    > 1 6
    > 2 4
    > 3 2
    > 4 8
    >
    > Is this possible? If so, how can I do it? I've tried looping through
    > the arrays and not had much luck so far.


    Although your problem has already been solved, I'd like to present a
    different approach which can be quite a bit faster. The most common
    approach seems to be using a dictionary:

    summed_up={}
    for user,vote in pairs:
    if summed_up.has_key(user):
    summed_up[user]+=vote
    else:
    summed_up[user]=vote

    But if the list of users is compact and the maximum value is known
    before, the using a list and coding the user into the list position is
    much more elegant:

    summed_up=list( (0,) * max_user )
    for user,vote in pairs:
    summed_up[user] += vote

    I've run a quick and dirty test on these approaches and found that the
    latter takes only half the time than the first. More precisely, with
    about 2 million pairs, i got:

    * dict approach: 2s
    (4s with "try: ... except KeyError:" instead of the "if")
    * list approach: 0.9s

    BTW this was inspired by the book "Programming Pearls" I read some
    years ago where a similar approach saved some magnitudes of time
    (using a bit field instead of a list to store reserved/free phone
    numbers IIRC).

    Yours,
    Karsten
     
    Karsten Heymann, Jun 13, 2008
    #15
  16. On Fri, Jun 13, 2008 at 2:12 PM, Karsten Heymann
    <> wrote:
    > Although your problem has already been solved, I'd like to present a
    > different approach which can be quite a bit faster. The most common
    > approach seems to be using a dictionary:
    >
    > summed_up={}
    > for user,vote in pairs:
    > if summed_up.has_key(user):
    > summed_up[user]+=vote
    > else:
    > summed_up[user]=vote


    You'll save even more by using:

    if user in summed_up:

    instead of has_key.

    --
    mvh Björn
     
    BJörn Lindqvist, Jun 13, 2008
    #16
  17. On Thu, Jun 12, 2008 at 3:48 PM, Mark <> wrote:
    > Hi all,
    >
    > I have a scenario where I have a list like this:
    >
    > User Score
    > 1 0
    > 1 1
    > 1 5
    > 2 3
    > 2 1
    > 3 2
    > 4 3
    > 4 3
    > 4 2
    >
    > And I need to add up the score for each user to get something like
    > this:
    >
    > User Score
    > 1 6
    > 2 4
    > 3 2
    > 4 8
    >
    > Is this possible? If so, how can I do it? I've tried looping through
    > the arrays and not had much luck so far.


    Here is another solution:

    from itertools import groupby
    from operator import itemgetter

    users = [1, 1, 1, 2, 2, 3, 4, 4, 4]
    scores = [0, 1, 5, 3, 1, 2, 3, 3, 2]

    for u, s in groupby(zip(users, scores), itemgetter(0)):
    print u, sum(y for x, y in s)

    You probably should reconsider how you can store your data in a more
    efficient format.

    --
    mvh Björn
     
    BJörn Lindqvist, Jun 13, 2008
    #17
  18. BJörn Lindqvist wrote:
    > [...]
    > Here is another solution:
    >
    > from itertools import groupby
    > from operator import itemgetter
    >
    > users = [1, 1, 1, 2, 2, 3, 4, 4, 4]
    > scores = [0, 1, 5, 3, 1, 2, 3, 3, 2]
    >
    > for u, s in groupby(zip(users, scores), itemgetter(0)):
    > print u, sum(y for x, y in s)


    Except that this won't work unless users and scores are sorted by user
    first. groupby() only coalesces identical values, and doesn't do what a
    "GROUP BY" clause in SQL is doing.

    Adding a sorted() call around zip() should be enough to make groupby()
    actually useful.

    But then, this code definitely starts to look like somebody desperately
    wanted to find a use for Python's new gimmicks.

    Here's more of the same sort ;-)

    >>> import sqlite3
    >>> sqlite3.connect(":memory:").execute("create table tmp(user,

    score)").executemany("insert into tmp(user, score) values (?, ?)",
    zip(users, scores)).execute("select user, sum(score) from tmp group by
    user").fetchall()
    [(1, 6), (2, 4), (3, 2), (4, 8)]

    -- Gerhard
     
    Gerhard Häring, Jun 13, 2008
    #18
  19. Hi Björn,

    "BJörn Lindqvist" <> writes:
    > On Fri, Jun 13, 2008 at 2:12 PM, Karsten Heymann
    > <> wrote:
    >> summed_up={}
    >> for user,vote in pairs:
    >> if summed_up.has_key(user):
    >> summed_up[user]+=vote
    >> else:
    >> summed_up[user]=vote

    >
    > You'll save even more by using:
    >
    > if user in summed_up:
    >
    > instead of has_key.


    You're right, then it goes down to 1.5s (compared to 0.9 for the pure
    list version). Pythons dictionaries are really great :)

    Yours
    Karsten
     
    Karsten Heymann, Jun 13, 2008
    #19
  20. Mark

    Paddy Guest

    On Jun 13, 1:12 pm, Karsten Heymann <>
    wrote:
    > Hi Mark,
    >
    >
    >
    > Mark <> writes:
    > > I have a scenario where I have a list like this:

    >
    > > User Score
    > > 1 0
    > > 1 1
    > > 1 5
    > > 2 3
    > > 2 1
    > > 3 2
    > > 4 3
    > > 4 3
    > > 4 2

    >
    > > And I need to add up the score for each user to get something like
    > > this:

    >
    > > User Score
    > > 1 6
    > > 2 4
    > > 3 2
    > > 4 8

    >
    > > Is this possible? If so, how can I do it? I've tried looping through
    > > the arrays and not had much luck so far.

    >
    > Although your problem has already been solved, I'd like to present a
    > different approach which can be quite a bit faster. The most common
    > approach seems to be using a dictionary:
    >
    > summed_up={}
    > for user,vote in pairs:
    > if summed_up.has_key(user):
    > summed_up[user]+=vote
    > else:
    > summed_up[user]=vote
    >
    > But if the list of users is compact and the maximum value is known
    > before, the using a list and coding the user into the list position is
    > much more elegant:
    >
    > summed_up=list( (0,) * max_user )
    > for user,vote in pairs:
    > summed_up[user] += vote
    >
    > I've run a quick and dirty test on these approaches and found that the
    > latter takes only half the time than the first. More precisely, with
    > about 2 million pairs, i got:
    >
    > * dict approach: 2s
    > (4s with "try: ... except KeyError:" instead of the "if")
    > * list approach: 0.9s
    >
    > BTW this was inspired by the book "Programming Pearls" I read some
    > years ago where a similar approach saved some magnitudes of time
    > (using a bit field instead of a list to store reserved/free phone
    > numbers IIRC).
    >
    > Yours,
    > Karsten


    How does your solution fare against the defaultdict solution of:

    d = collections.defaultdict(int)
    for u,s in zip(users,score): d += s


    - Paddy
     
    Paddy, Jun 13, 2008
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Summing a list

    , Sep 22, 2005, in forum: XML
    Replies:
    2
    Views:
    543
  2. Alf P. Steinbach
    Replies:
    2
    Views:
    551
    Chris Theis
    Feb 5, 2004
  3. Yaroslav Bulatov

    Microbenchmark: Summing over array of doubles

    Yaroslav Bulatov, Aug 1, 2004, in forum: Python
    Replies:
    9
    Views:
    417
    Christopher T King
    Aug 3, 2004
  4. Replies:
    0
    Views:
    532
  5. Jackson
    Replies:
    1
    Views:
    221
    Peter Otten
    May 17, 2011
Loading...

Share This Page