vector addition

GZ

Hi,

I am looking for a fast internal vector representation so that
(a1,b1,c1)+(a2,b2,c2)=(a1+a2,b1+b2,c1+c2).

So I have a list

l = ['aa', 'bb', 'ca', 'de', ...]

I want to count the items that start with 'a', with 'b', and with 'c'.

What I can do is:

count_a = sum(int(x[1]=='a') for x in l)
count_b = sum(int(x[1]=='b') for x in l)
count_c = sum(int(x[1]=='c') for x in l)

But this loops through the list three times, which can be slow.

I'd like to have something like this:
count_a, count_b, count_c = sum((int(x[1]=='a'), int(x[1]=='b'), int(x[1]=='c')) for x in l)

I hesitate to use numpy arrays, because that would create and destroy
a ton of arrays, and is likely to be slow.
 
Chris Rebert

I don't really get how that relates to vectors or why you'd use that
representation, and it looks like you've forgotten that Python uses
0-based indexing, but anyway, here's my crack at something more
efficient:

from collections import defaultdict

cared_about = set('abc')
letter2count = defaultdict(int)
for item in l:
    initial = item[0]
    if initial in cared_about:
        letter2count[initial] += 1

count_a = letter2count['a']
count_b = letter2count['b']
count_c = letter2count['c']

Cheers,
Chris
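
For comparison, the same single-pass count can probably be written more compactly with collections.Counter. This is only a sketch: the list contents are hypothetical sample data (the original list is elided in the post), and the name `items` is used here instead of `l`.

```python
from collections import Counter

# Hypothetical sample data; the real list is not shown in full in the thread.
items = ['aa', 'bb', 'ca', 'de']

cared_about = set('abc')

# One pass over the list: tally the first character of each non-empty item,
# keeping only the initials we care about.
counts = Counter(item[0] for item in items if item and item[0] in cared_about)

# A Counter returns 0 for keys it has never seen, so this is safe
# even if no item starts with a given letter.
count_a, count_b, count_c = counts['a'], counts['b'], counts['c']
print(count_a, count_b, count_c)
```

The `if item` guard skips empty strings, which would otherwise raise an IndexError on `item[0]`.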
 
MRAB

If you want to do vector addition then numpy is the way to go. However,
first you could try:

from collections import defaultdict
counts = defaultdict(int)
for x in l:
    counts[x[0]] += 1

(Note that in Python indexes are zero-based.)
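
If the element-wise tuple addition from the original question is what is really wanted, it can be sketched in plain Python without numpy. The helper name `add_vectors` and the sample list are assumptions made for illustration; nothing here comes from the posters' code.

```python
from functools import reduce

def add_vectors(u, v):
    """Element-wise sum of two equal-length tuples."""
    return tuple(a + b for a, b in zip(u, v))

# Hypothetical sample data; the real list is not shown in full in the thread.
items = ['aa', 'bb', 'ca', 'de', 'ab']

# Lazily build one (is_a, is_b, is_c) tuple per item. Booleans add like
# the integers 0 and 1, so no int() conversion is needed.
flags = ((x[0] == 'a', x[0] == 'b', x[0] == 'c') for x in items)

# Fold the tuples together in a single pass, starting from the zero vector.
count_a, count_b, count_c = reduce(add_vectors, flags, (0, 0, 0))
print(count_a, count_b, count_c)
```

This still builds a small tuple per item, so for plain counting the dictionary approach above is likely simpler and faster; the fold is mainly useful when the per-item vectors are something other than 0/1 flags.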
 
