M
Maric Michaud
Le Friday 13 June 2008 14:12:40 Karsten Heymann, vous avez écrit :
So, writing C in python, which has dictionnary as builtin type, should be
considered "more elegant" ?
You are comparing apples with lemons, there is no such a difference between
list index access and dictionnary key access in Python.
If you know in advance the number and names of users, what prevent you to
initialize completelly the target dictionnary ?
The following code compare the same algorithm, once with list and the second
time with dict :
#!/usr/bin/env python
def do(f, u, v) :
from time import time
n=time()
f(u, v)
return time() -n
def c_dict(users, votes) :
d = dict(((e, 0) for e in users))
for u, v in votes : d += v
return d.values()
def c_list(users, votes) :
d = [ 0 for i in users ]
for u, v in votes : d += v
return d
u = range(3000)
import random
v = list((u[r%3000], random.randint(0,10000)) for r in range(5*10**6))
print "with list", do(c_list, u, v)
print "with dict", do(c_dict, u, v)
The result is pretty close now :
maric@redflag1 17:04:36:~$ ./test.py
with list 1.40726399422
with dict 1.63094091415
So why use list where the obvious and natural data structure is a
dictionnary ?
Hi Mark,
Mark said:I have a scenario where I have a list like this:
User Score
1 0
1 1
1 5
2 3
2 1
3 2
4 3
4 3
4 2
And I need to add up the score for each user to get something like
this:
User Score
1 6
2 4
3 2
4 8
Is this possible? If so, how can I do it? I've tried looping through
the arrays and not had much luck so far.
Although your problem has already been solved, I'd like to present a
different approach which can be quite a bit faster. The most common
approach seems to be using a dictionary:
summed_up={}
for user,vote in pairs:
if summed_up.has_key(user):
summed_up[user]+=vote
else:
summed_up[user]=vote
But if the list of users is compact and the maximum value is known
before, the using a list and coding the user into the list position is
much more elegant:
So, writing C in python, which has dictionnary as builtin type, should be
considered "more elegant" ?
summed_up=list( (0,) * max_user )
for user,vote in pairs:
summed_up[user] += vote
I've run a quick and dirty test on these approaches and found that the
latter takes only half the time than the first. More precisely, with
about 2 million pairs, i got:
* dict approach: 2s
(4s with "try: ... except KeyError:" instead of the "if")
* list approach: 0.9s
You are comparing apples with lemons, there is no such a difference between
list index access and dictionnary key access in Python.
BTW this was inspired by the book "Programming Pearls" I read some
years ago where a similar approach saved some magnitudes of time
(using a bit field instead of a list to store reserved/free phone
numbers IIRC).
If you know in advance the number and names of users, what prevent you to
initialize completelly the target dictionnary ?
The following code compare the same algorithm, once with list and the second
time with dict :
#!/usr/bin/env python
def do(f, u, v) :
from time import time
n=time()
f(u, v)
return time() -n
def c_dict(users, votes) :
d = dict(((e, 0) for e in users))
for u, v in votes : d += v
return d.values()
def c_list(users, votes) :
d = [ 0 for i in users ]
for u, v in votes : d += v
return d
u = range(3000)
import random
v = list((u[r%3000], random.randint(0,10000)) for r in range(5*10**6))
print "with list", do(c_list, u, v)
print "with dict", do(c_dict, u, v)
The result is pretty close now :
maric@redflag1 17:04:36:~$ ./test.py
with list 1.40726399422
with dict 1.63094091415
So why use list where the obvious and natural data structure is a
dictionnary ?