related lists mean value

  • Thread starter dimitri pater - serpia
  • Start date
D

dimitri pater - serpia

Hi,

I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

what I need is a list representing the mean value of 'a', 'b' and 'c'
while maintaining the number of items (len):
w = [1.5, 1.5, 8, 4, 4, 4]

I have looked at iter(tools) and next(), but that did not help me. I'm
a bit stuck here, so your help is appreciated!

thanks!
Dimitri
 
J

John Posner

Hi,

I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

what I need is a list representing the mean value of 'a', 'b' and 'c'
while maintaining the number of items (len):
w = [1.5, 1.5, 8, 4, 4, 4]

I have looked at iter(tools) and next(), but that did not help me. I'm
a bit stuck here, so your help is appreciated!

Nobody expects object-orientation (or the Spanish Inquisition):

#-------------------------
from collections import defaultdict

class Tally:
def __init__(self, id=None):
self.id = id
self.total = 0
self.count = 0

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
obj = tally_dict[y]
obj.id = y
obj.total += x
obj.count += 1

# process data
result_list = []
for key in sorted(tally_dict):
obj = tally_dict[key]
mean = 1.0 * obj.total / obj.count
result_list.extend([mean] * obj.count)
print result_list
#-------------------------

-John
 
J

John Posner

On 3/8/2010 9:39 PM, John Posner wrote:

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
obj = tally_dict[y]
obj.id = y <--- statement redundant, remove it
obj.total += x
obj.count += 1


-John
 
J

John Posner

On 3/8/2010 9:39 PM, John Posner wrote:

<snip>
obj.id = y <--- statement redundant, remove it


Sorry for the thrashing! It's more correct to say that the Tally class
doesn't require an "id" attribute at all. So the code becomes:

#---------
from collections import defaultdict

class Tally:
def __init__(self):
self.total = 0
self.count = 0

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
obj = tally_dict[y]
obj.total += x
obj.count += 1

# process data
result_list = []
for key in sorted(tally_dict):
obj = tally_dict[key]
mean = 1.0 * obj.total / obj.count
result_list.extend([mean] * obj.count)
print result_list
#---------

-John
 
M

Michael Rudolf

Am 08.03.2010 23:34, schrieb dimitri pater - serpia:
Hi,

I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

what I need is a list representing the mean value of 'a', 'b' and 'c'
while maintaining the number of items (len):
w = [1.5, 1.5, 8, 4, 4, 4]

This kinda looks like you used the wrong data structure.
Maybe you should have used a dict, like:
{'a': [1, 2], 'c': [5, 0, 7], 'b': [8]} ?
I have looked at iter(tools) and next(), but that did not help me. I'm
a bit stuck here, so your help is appreciated!

As said, I'd have used a dict in the first place, so lets transform this
straight forward into one:

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

# initialize dict
d={}
for idx in set(y):
d[idx]=[]

#collect values
for i, idx in enumerate(y):
d[idx].append(x)

print("d is now a dict of lists: %s" % d)

#calculate average
for key, values in d.items():
d[key]=sum(values)/len(values)

print("d is now a dict of averages: %s" % d)

# build the final list
w = [ d[key] for key in y ]

print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))


Output is:
d is now a dict of lists: {'a': [1, 2], 'c': [5, 0, 7], 'b': [8]}
d is now a dict of averages: {'a': 1.5, 'c': 4.0, 'b': 8.0}
w is now the list of averages, corresponding with y:

x: [1, 2, 8, 5, 0, 7]
y: ['a', 'a', 'b', 'c', 'c', 'c']
w: [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]

Could have used a defaultdict to avoid dict initialisation, though.
Or write a custom class:

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

class A:
def __init__(self):
self.store={}
def add(self, key, number):
if key in self.store:
self.store[key].append(number)
else:
self.store[key] = [number]
a=A()

# collect data
for idx, val in zip(y,x):
a.add(idx, val)

# build the final list:
w = [ sum(a.store[key])/len(a.store[key]) for key in y ]

print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))

Produces same output, of course.

Note that those solutions are both not very efficient, but who cares ;)

No Problem,

Michael
 
M

Michael Rudolf

OK, I golfed it :D
Go ahead and kill me ;)

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(a,b,v={}):
try: v[a].append(b)
except: v[a]=
def g(a): return sum(v[a])/len(v[a])
return g
w = [g(i) for g,i in [(f(i,v),i) for i,v in zip(y,x)]]

print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))

Output:
w is now the list of averages, corresponding with y:

x: [1, 2, 8, 5, 0, 7]
y: ['a', 'a', 'b', 'c', 'c', 'c']
w: [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]

Regards,
Michael
 
P

Peter Otten

Michael said:
OK, I golfed it :D
Go ahead and kill me ;)

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(a,b,v={}):
try: v[a].append(b)
except: v[a]=
def g(a): return sum(v[a])/len(v[a])
return g
w = [g(i) for g,i in [(f(i,v),i) for i,v in zip(y,x)]]

print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))

Output:
w is now the list of averages, corresponding with y:

x: [1, 2, 8, 5, 0, 7]
y: ['a', 'a', 'b', 'c', 'c', 'c']
w: [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
[sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]

[1.5, 1.5, 8.0, 4.0, 4.0, 4.0]

Peter
 
M

Michael Rudolf

Am 09.03.2010 13:02, schrieb Peter Otten:
[sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
[1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
Peter

.... pwned.
Should be the fastest and shortest way to do it.

I tried to do something like this, but my brain hurt while trying to
visualize list comprehension evaluation orders ;)

Regards,
Michael
 
S

Steve Howell

I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]
what I need is a list representing the mean value of 'a', 'b' and 'c'
while maintaining the number of items (len):
w = [1.5, 1.5, 8, 4, 4, 4]
I have looked at iter(tools) and next(), but that did not help me. I'm
a bit stuck here, so your help is appreciated!

Nobody expects object-orientation (or the Spanish Inquisition):

Heh. Yep, I avoided OO for this. Seems like a functional problem.
My solution is functional on the outside, imperative on the inside.
You could add recursion here, but I don't think it would be as
straightforward.

def num_dups_at_head(lst):
assert len(lst) > 0
val = lst[0]
i = 1
while i < len(lst) and lst == val:
i += 1
return i

def smooth(x, y):
result = []
while x:
cnt = num_dups_at_head(y)
avg = sum(x[:cnt]) * 1.0 / cnt
result += [avg] * cnt
x = x[cnt:]
y = y[cnt:]
return result

#-------------------------
from collections import defaultdict

class Tally:
     def __init__(self, id=None):
         self.id = id
         self.total = 0
         self.count = 0

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
     obj = tally_dict[y]
     obj.id = y
     obj.total += x
     obj.count += 1

# process data
result_list = []
for key in sorted(tally_dict):
     obj = tally_dict[key]
     mean = 1.0 * obj.total / obj.count
     result_list.extend([mean] * obj.count)
print result_list
#-------------------------
 
P

Peter Otten

Michael said:
Am 09.03.2010 13:02, schrieb Peter Otten:
[sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
[1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
Peter

... pwned.
Should be the fastest and shortest way to do it.

It may be short, but it is not particularly efficient. A dict-based approach
is probably the fastest. If y is guaranteed to be sorted itertools.groupby()
may also be worth a try.

$ cat tmp_average_compare.py
from __future__ import division
from collections import defaultdict
try:
from itertools import izip as zip
except ImportError:
pass

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(x=x, y=y):
p = defaultdict(int)
q = defaultdict(int)
for a, b in zip(x, y):
p += a
q += 1
return [p/q for b in y]

def g(x=x, y=y):
return [sum(a for a,b in zip(x,y)if b==c)/y.count(c)for c in y]

if __name__ == "__main__":
print(f())
print(g())
assert f() == g()
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'f()'
100000 loops, best of 3: 11.4 usec per loop
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'g()'
10000 loops, best of 3: 22.8 usec per loop

Peter
 
S

Steve Howell

Hi,

I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

what I need is a list representing the mean value of 'a', 'b' and 'c'
while maintaining the number of items (len):
w = [1.5, 1.5, 8, 4, 4, 4]

What results are you expecting if you have multiple runs of 'a' in a
longer list?
 
S

Steve Howell

def num_dups_at_head(lst):
    assert len(lst) > 0
    val = lst[0]
    i = 1
    while i < len(lst) and lst == val:
        i += 1
    return i

def smooth(x, y):
    result = []
    while x:
        cnt = num_dups_at_head(y)
        avg = sum(x[:cnt]) * 1.0 / cnt
        result += [avg] * cnt
        x = x[cnt:] # expensive?
        y = y[cnt:] # expensive?
    return result


BTW I recognize that my solution would be inefficient for long lists,
unless the underlying list implementation had copy-on-write. I'm
wondering what the easiest fix would be. I tried a quick shot at
islice(), but the lack of len() thwarted me.
 
N

nn

Peter said:
Michael said:
Am 09.03.2010 13:02, schrieb Peter Otten:
[sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
[1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
Peter

... pwned.
Should be the fastest and shortest way to do it.

It may be short, but it is not particularly efficient. A dict-based approach
is probably the fastest. If y is guaranteed to be sorted itertools.groupby()
may also be worth a try.

$ cat tmp_average_compare.py
from __future__ import division
from collections import defaultdict
try:
from itertools import izip as zip
except ImportError:
pass

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(x=x, y=y):
p = defaultdict(int)
q = defaultdict(int)
for a, b in zip(x, y):
p += a
q += 1
return [p/q for b in y]

def g(x=x, y=y):
return [sum(a for a,b in zip(x,y)if b==c)/y.count(c)for c in y]

if __name__ == "__main__":
print(f())
print(g())
assert f() == g()
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'f()'
100000 loops, best of 3: 11.4 usec per loop
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'g()'
10000 loops, best of 3: 22.8 usec per loop

Peter



I converged to the same solution but had an extra reduction step in
case there were a lot of repeats in the input. I think it is a good
compromise between efficiency, readability and succinctness.

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]
from collections import defaultdict
totdct = defaultdict(int)
cntdct = defaultdict(int)
for name, num in zip(y,x):
totdct[name] += num
cntdct[name] += 1
avgdct = {name : totdct[name]/cnts for name, cnts in cntdct.items()}
w = [avgdct[name] for name in y]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,682
Members
48,796
Latest member
Greg L.

Latest Threads

Top