related lists mean value

dimitri pater - serpia · Mar 8, 2010

Hi,

I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

what I need is a list representing the mean value of 'a', 'b' and 'c'
while maintaining the number of items (len):
w = [1.5, 1.5, 8, 4, 4, 4]

I have looked at iter(tools) and next(), but that did not help me. I'm
a bit stuck here, so your help is appreciated!

thanks!
Dimitri

John Posner · Mar 8, 2010

Hi,

I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

what I need is a list representing the mean value of 'a', 'b' and 'c'
while maintaining the number of items (len):
w = [1.5, 1.5, 8, 4, 4, 4]

I have looked at iter(tools) and next(), but that did not help me. I'm
a bit stuck here, so your help is appreciated!

Nobody expects object-orientation (or the Spanish Inquisition):

#-------------------------
from collections import defaultdict

class Tally:
def __init__(self, id=None):
self.id = id
self.total = 0
self.count = 0

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
obj = tally_dict[y]
obj.id = y
obj.total += x
obj.count += 1

# process data
result_list = []
for key in sorted(tally_dict):
obj = tally_dict[key]
mean = 1.0 * obj.total / obj.count
result_list.extend([mean] * obj.count)
print result_list
#-------------------------

-John

John Posner · Mar 8, 2010

On 3/8/2010 9:39 PM, John Posner wrote:

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
obj = tally_dict[y]
obj.id = y <--- statement redundant, remove it
obj.total += x
obj.count += 1

-John

John Posner · Mar 8, 2010

On 3/8/2010 9:39 PM, John Posner wrote:

<snip>

obj.id = y <--- statement redundant, remove it

Click to expand...

Sorry for the thrashing! It's more correct to say that the Tally class
doesn't require an "id" attribute at all. So the code becomes:

#---------
from collections import defaultdict

class Tally:
def __init__(self):
self.total = 0
self.count = 0

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
obj = tally_dict[y]
obj.total += x
obj.count += 1

# process data
result_list = []
for key in sorted(tally_dict):
obj = tally_dict[key]
mean = 1.0 * obj.total / obj.count
result_list.extend([mean] * obj.count)
print result_list
#---------

-John

Michael Rudolf · Mar 9, 2010

Am 08.03.2010 23:34, schrieb dimitri pater - serpia:

Hi,

I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

what I need is a list representing the mean value of 'a', 'b' and 'c'
while maintaining the number of items (len):
w = [1.5, 1.5, 8, 4, 4, 4]

This kinda looks like you used the wrong data structure.
Maybe you should have used a dict, like:
{'a': [1, 2], 'c': [5, 0, 7], 'b': [8]} ?

I have looked at iter(tools) and next(), but that did not help me. I'm
a bit stuck here, so your help is appreciated!

As said, I'd have used a dict in the first place, so lets transform this
straight forward into one:

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

# initialize dict
d={}
for idx in set(y):
d[idx]=[]

#collect values
for i, idx in enumerate(y):
d[idx].append(x)

print("d is now a dict of lists: %s" % d)

#calculate average
for key, values in d.items():
d[key]=sum(values)/len(values)

print("d is now a dict of averages: %s" % d)

# build the final list
w = [ d[key] for key in y ]

print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))

Output is:
d is now a dict of lists: {'a': [1, 2], 'c': [5, 0, 7], 'b': [8]}
d is now a dict of averages: {'a': 1.5, 'c': 4.0, 'b': 8.0}
w is now the list of averages, corresponding with y:

x: [1, 2, 8, 5, 0, 7]
y: ['a', 'a', 'b', 'c', 'c', 'c']
w: [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]

Could have used a defaultdict to avoid dict initialisation, though.
Or write a custom class:

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

class A:
def __init__(self):
self.store={}
def add(self, key, number):
if key in self.store:
self.store[key].append(number)
else:
self.store[key] = [number]
a=A()

# collect data
for idx, val in zip(y,x):
a.add(idx, val)

# build the final list:
w = [ sum(a.store[key])/len(a.store[key]) for key in y ]

print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))

Produces same output, of course.

Note that those solutions are both not very efficient, but who cares

thanks!

Click to expand...

No Problem,

Michael

Michael Rudolf · Mar 9, 2010

OK, I golfed it

Go ahead and kill me

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(a,b,v={}):
try: v[a].append(b)
except: v[a]=
def g(a): return sum(v[a])/len(v[a])
return g
w = [g(i) for g,i in [(f(i,v),i) for i,v in zip(y,x)]]

print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))

Output:
w is now the list of averages, corresponding with y:

x: [1, 2, 8, 5, 0, 7]
y: ['a', 'a', 'b', 'c', 'c', 'c']
w: [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]

Regards,
Michael

Peter Otten · Mar 9, 2010

Michael said:
OK, I golfed it
Go ahead and kill me

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(a,b,v={}):
try: v[a].append(b)
except: v[a]=
def g(a): return sum(v[a])/len(v[a])
return g
w = [g(i) for g,i in [(f(i,v),i) for i,v in zip(y,x)]]

print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))

Output:
w is now the list of averages, corresponding with y:

x: [1, 2, 8, 5, 0, 7]
y: ['a', 'a', 'b', 'c', 'c', 'c']
w: [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]

[sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]

Click to expand...

Click to expand...

[1.5, 1.5, 8.0, 4.0, 4.0, 4.0]

Peter

Michael Rudolf · Mar 9, 2010

Am 09.03.2010 13:02, schrieb Peter Otten:

[sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]

Click to expand...

Click to expand...

[1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
Peter

.... pwned.
Should be the fastest and shortest way to do it.

I tried to do something like this, but my brain hurt while trying to
visualize list comprehension evaluation orders

Regards,
Michael

Steve Howell · Mar 9, 2010

Hi,

Click to expand...

I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

Click to expand...

what I need is a list representing the mean value of 'a', 'b' and 'c'
while maintaining the number of items (len):
w = [1.5, 1.5, 8, 4, 4, 4]

Click to expand...

I have looked at iter(tools) and next(), but that did not help me. I'm
a bit stuck here, so your help is appreciated!

Click to expand...

Nobody expects object-orientation (or the Spanish Inquisition):

Heh. Yep, I avoided OO for this. Seems like a functional problem.
My solution is functional on the outside, imperative on the inside.
You could add recursion here, but I don't think it would be as
straightforward.

def num_dups_at_head(lst):
assert len(lst) > 0
val = lst[0]
i = 1
while i < len(lst) and lst == val:
i += 1
return i

def smooth(x, y):
result = []
while x:
cnt = num_dups_at_head(y)
avg = sum(x[:cnt]) * 1.0 / cnt
result += [avg] * cnt
x = x[cnt:]
y = y[cnt:]
return result

#-------------------------
from collections import defaultdict

class Tally:
def __init__(self, id=None):
self.id = id
self.total = 0
self.count = 0

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
obj = tally_dict[y]
obj.id = y
obj.total += x
obj.count += 1

# process data
result_list = []
for key in sorted(tally_dict):
obj = tally_dict[key]
mean = 1.0 * obj.total / obj.count
result_list.extend([mean] * obj.count)
print result_list
#-------------------------

Click to expand...

Peter Otten · Mar 9, 2010

Michael said:
Am 09.03.2010 13:02, schrieb Peter Otten:

[sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]

Click to expand...

[1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
Peter

Click to expand...

... pwned.
Should be the fastest and shortest way to do it.

It may be short, but it is not particularly efficient. A dict-based approach
is probably the fastest. If y is guaranteed to be sorted itertools.groupby()
may also be worth a try.

$ cat tmp_average_compare.py
from __future__ import division
from collections import defaultdict
try:
from itertools import izip as zip
except ImportError:
pass

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(x=x, y=y):
p = defaultdict(int)
q = defaultdict(int)
for a, b in zip(x, y):
p += a
q += 1
return [p/q for b in y]

def g(x=x, y=y):
return [sum(a for a,b in zip(x,y)if b==c)/y.count(c)for c in y]

if __name__ == "__main__":
print(f())
print(g())
assert f() == g()
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'f()'
100000 loops, best of 3: 11.4 usec per loop
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'g()'
10000 loops, best of 3: 22.8 usec per loop

Peter

Steve Howell · Mar 9, 2010

Hi,

I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

what I need is a list representing the mean value of 'a', 'b' and 'c'
while maintaining the number of items (len):
w = [1.5, 1.5, 8, 4, 4, 4]

What results are you expecting if you have multiple runs of 'a' in a
longer list?

Steve Howell · Mar 9, 2010

def num_dups_at_head(lst):
assert len(lst) > 0
val = lst[0]
i = 1
while i < len(lst) and lst == val:
i += 1
return i

def smooth(x, y):
result = []
while x:
cnt = num_dups_at_head(y)
avg = sum(x[:cnt]) * 1.0 / cnt
result += [avg] * cnt
x = x[cnt:] # expensive?
y = y[cnt:] # expensive?
return result

BTW I recognize that my solution would be inefficient for long lists,
unless the underlying list implementation had copy-on-write. I'm
wondering what the easiest fix would be. I tried a quick shot at
islice(), but the lack of len() thwarted me.

nn · Mar 9, 2010

Peter said:
Michael said:

Am 09.03.2010 13:02, schrieb Peter Otten:

[sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
[1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
Peter

Click to expand...

... pwned.
Should be the fastest and shortest way to do it.

Click to expand...

It may be short, but it is not particularly efficient. A dict-based approach
is probably the fastest. If y is guaranteed to be sorted itertools.groupby()
may also be worth a try.

$ cat tmp_average_compare.py
from __future__ import division
from collections import defaultdict
try:
from itertools import izip as zip
except ImportError:
pass

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(x=x, y=y):
p = defaultdict(int)
q = defaultdict(int)
for a, b in zip(x, y):
p += a
q += 1
return [p/q for b in y]

def g(x=x, y=y):
return [sum(a for a,b in zip(x,y)if b==c)/y.count(c)for c in y]

if __name__ == "__main__":
print(f())
print(g())
assert f() == g()
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'f()'
100000 loops, best of 3: 11.4 usec per loop
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'g()'
10000 loops, best of 3: 22.8 usec per loop

Peter

I converged to the same solution but had an extra reduction step in
case there were a lot of repeats in the input. I think it is a good
compromise between efficiency, readability and succinctness.

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]
from collections import defaultdict
totdct = defaultdict(int)
cntdct = defaultdict(int)
for name, num in zip(y,x):
totdct[name] += num
cntdct[name] += 1
avgdct = {name : totdct[name]/cnts for name, cnts in cntdct.items()}
w = [avgdct[name] for name in y]

Hello I am learning how to code and I tried making a calculator with HTML and js with some CSS I am stuck at thing, Like the screen value is	0	Mar 13, 2025
Struggling for inspiration with lists	1	Dec 17, 2013
AES-128 Clipboard Protector: Auto-Encrypt Ctrl+C, Smart-Decrypt Ctrl+V (C++ Windows Hook)	7	Mar 24, 2026
How to multiply two matrices of size in using inline assembly in C++	3	Mar 3, 2024
Lowest Value in List	5	Oct 2, 2013
How to use Densenet121 in monai	0	Feb 16, 2024
Help me sort out this script	1	Oct 17, 2023
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023

related lists mean value

dimitri pater - serpia

John Posner

John Posner

John Posner

Michael Rudolf

Michael Rudolf

Peter Otten

Michael Rudolf

Steve Howell

Peter Otten

Steve Howell

Steve Howell

nn

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads