all pairs of items in a list without indexing?

S

Steven Bethard

So I need to do something like:

for i in range(len(l)):
for j in range(i+1, len(l)):
# do something with (l, l[j])

where I get all pairs of items in a list (where I'm thinking of pairs
as sets, not tuples, so order doesn't matter). There isn't really
anything wrong with the solution here, but since Python's for-each
construction is so nice, I usually try to avoid range(len(..)) type
calls and list indexing when I can... Is there any way to do this
without indexing, e.g.:

for item1 in ...:
for item2 in ...:
# do something with (item1, item2)

I could do something like:

for i, item1 in enumerate(l):
for j in range(i+1, len(l)):
(item1, l[j])

but that only gets me halfway there... I also thought of something like:

for i, item1 in enumerate(l):
for item2 in l[i+1:]:
(item1, item2)

but that seems like a lot of wasteful list slicing...

Thanks in advance,

Steve
 
J

Jim Sizelove

Steven said:
So I need to do something like:

for i in range(len(l)):
for j in range(i+1, len(l)):
# do something with (l, l[j])

where I get all pairs of items in a list (where I'm thinking of pairs
as sets, not tuples, so order doesn't matter). There isn't really
anything wrong with the solution here, but since Python's for-each
construction is so nice, I usually try to avoid range(len(..)) type
calls and list indexing when I can... Is there any way to do this
without indexing


Are you trying to pair each item in a list with every other
item exactly once? Maybe this code does what you want:

while len(L)>0:
item1 = L.pop()
for item2 in L:
print (item1, item2)

pop() removes one item from the list. The inner for loop matches that
item with each of the remaining items. At the end of the outer loop the
list L is empty; you may want to run this loop over a copy of the list
if you want to do other things with the list later.

HTH,

Jim
 
J

jepler

Steven said:
So I need to do something like:

for i in range(len(l)):
for j in range(i+1, len(l)):
# do something with (l, l[j])

where I get all pairs of items in a list (where I'm thinking of pairs
as sets, not tuples, so order doesn't matter). There isn't really
anything wrong with the solution here, but since Python's for-each
construction is so nice, I usually try to avoid range(len(..)) type
calls and list indexing when I can... Is there any way to do this
without indexing


Are you trying to pair each item in a list with every other
item exactly once? Maybe this code does what you want:

while len(L)>0:
item1 = L.pop()
for item2 in L:
print (item1, item2)

This looks good, as long as the fact that the "item1"s will come
out of the list backwards is OK.

I'd write 'while L:' instead of your more complicated test, though.

def all_pairs(L):
while L:
i = L.pop()
for j in L: yield i, j
[(4, 0), (4, 1), (4, 2), (4, 3), (3, 0), (3, 1), (3, 2), (2, 0), (2, 1), (1, 0)]

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQFBWd8LJd01MZaTXX0RAtSuAJ9b1FjtfXQF/uAruhDukWlcA9ftCwCeJfm+
9r/rzibh0ssDsVjfVA7d51s=
=e1EA
-----END PGP SIGNATURE-----
 
S

Steven Bethard

def all_pairs(L):
while L:
i = L.pop()
for j in L: yield i, j

Interesting. I hadn't thought of this one -- it's not bad other than
requiring the list copy (since I need to maintain the original list).

Thanks,

Steve
 
M

Marc Christiansen

Steven Bethard said:
So I need to do something like:

for i in range(len(l)):
for j in range(i+1, len(l)):
# do something with (l, l[j])
[...]
I could do something like:

for i, item1 in enumerate(l):
for j in range(i+1, len(l)):
(item1, l[j])

but that only gets me halfway there... I also thought of something like:

for i, item1 in enumerate(l):
for item2 in l[i+1:]:
(item1, item2)

but that seems like a lot of wasteful list slicing...


You could try something like:

for i, item1 in enumerate(l):
for item2 in itertools.islice(l, i+1, None):
(item1, item2)

I'm not sure which version I prefer.

Marc
 
P

Patrick Maupin

Steven Bethard said:
So I need to do something like:

for i in range(len(l)):
for j in range(i+1, len(l)):
# do something with (l, l[j]) ...
Is there any way to do this without indexing


As others have pointed out, you can do this without indexing, but it
may involve overhead of other sorts (list copies, popping, etc.).

I'd just like to point out that, for the problem you are describing
(where you want every possible pair, but order doesn't matter), you
_can_ make your loop perhaps a bit more readable (depending on the
context it is used in), and perhaps a tiny bit faster:

for i in range(1,len(mylist)):
for j in range(i):
# do something with (l, l[j])

Regards,
Pat
 
A

Alex Martelli

Steven Bethard said:
for i, item1 in enumerate(l):
for item2 in l[i+1:]:
(item1, item2)

but that seems like a lot of wasteful list slicing...

Sorry, I don't get it: what's wasteful about it? Do you mean in terms
of performance?

With slicing:

kallisti:~/cb alex$ python2.4 timeit.py -s'l=range(333)' '[(x,y) for i,x
in enumerate(l) for y in l[i+1:]]'
10 loops, best of 3: 1.43e+05 usec per loop
kallisti:~/cb alex$ python2.4 timeit.py -s'l=range(333)' '[(x,y) for i,x
in enumerate(l) for y in l[i+1:]]'
10 loops, best of 3: 1.41e+05 usec per loop

With xrange(len(...:

kallisti:~/cb alex$ python2.4 timeit.py -s'l=range(333)' '[(l,l[j])
for i in xrange(len(l)) for j in xrange(i+1,len(l))]'
10 loops, best of 3: 1.61e+05 usec per loop
kallisti:~/cb alex$ python2.4 timeit.py -s'l=range(333)' '[(l,l[j])
for i in xrange(len(l)) for j in xrange(i+1,len(l))]'
10 loops, best of 3: 1.62e+05 usec per loop

You could use itertools.islice(l,i+1,len(l)) instead of l[i+1:], but in
my tests, in this particular case, itertools appears to be slower than
plain old slicing, albeit not by much -- about 1.45 to 1.46 vs plain
slicing's 1.42 or so, and xrange's 1.61 or so.

But maybe you can clarify the 'wasteful' observation better!


Alex
 
A

Alex Martelli

Steven Bethard said:
Interesting. I hadn't thought of this one -- it's not bad other than
requiring the list copy (since I need to maintain the original list).

If you start with an L=list(L), you can also optionally L.reverse() to
play with the ordering (if significant, issue still not clarified).

Ordering apart, performance is still not quite as good as with the
slicing you consider wasteful, in my measurements. Remember with the
slicing we got about 1.42e+05 microseconds -- with this approach we see:

kallisti:~/cb alex$ python2.4 timeit.py -s'l=range(333)' -s'
def alp(L):
L = list(L)
while L:
it1 = L.pop()
for it2 in L: yield it1, it2
' 'list(alp(l))'
10 loops, best of 3: 1.51e+05 usec per loop


Alex
 
S

Steven Bethard

Alex Martelli said:
With slicing:

kallisti:~/cb alex$ python2.4 timeit.py -s'l=range(333)' '[(x,y) for i,x
in enumerate(l) for y in l[i+1:]]'
10 loops, best of 3: 1.43e+05 usec per loop
kallisti:~/cb alex$ python2.4 timeit.py -s'l=range(333)' '[(x,y) for i,x
in enumerate(l) for y in l[i+1:]]'
10 loops, best of 3: 1.41e+05 usec per loop

With xrange(len(...:

kallisti:~/cb alex$ python2.4 timeit.py -s'l=range(333)' '[(l,l[j])
for i in xrange(len(l)) for j in xrange(i+1,len(l))]'
10 loops, best of 3: 1.61e+05 usec per loop
kallisti:~/cb alex$ python2.4 timeit.py -s'l=range(333)' '[(l,l[j])
for i in xrange(len(l)) for j in xrange(i+1,len(l))]'
10 loops, best of 3: 1.62e+05 usec per loop


Wow, interesting. Slicling's still faster in a slightly more fair comparison:
python timeit.py -s "l=range(333)" "[(x,i) for i,x in enumerate(l) for y in l
[:i+1]]"
10 loops, best of 3: 24.5 msec per loop
python timeit.py -s "l=range(333)" "[(x,i) for i,x in enumerate(l) for y in l
[:i+1]]"
10 loops, best of 3: 24.4 msec per loop
python timeit.py -s "l=range(333)" "[(x,l[j]) for i,x in enumerate(l) for j
in xrange(i+1,len(l))]"
10 loops, best of 3: 28.9 msec per loop
python timeit.py -s "l=range(333)" "[(x,l[j]) for i,x in enumerate(l) for j
in xrange(i+1,len(l))]"
10 loops, best of 3: 28.8 msec per loop

So slicing a list and iterating over the slice is faster than iterating over
the indices and indexing the items... Is this generally true? Is it
something I can depend on across implementations?

STeve
 
S

Steven Bethard

Alex Martelli said:
If you start with an L=list(L), you can also optionally L.reverse() to
play with the ordering (if significant, issue still not clarified).

The order of the elements isn't crucial, but for debugging purposes it would
be slightly better if they maintained order.

Steve
 
S

Steven Bethard

Alex Martelli said:
Sorry, I don't get it: what's wasteful about it?

Sorry, perhaps I should have said it has a bad smell. I wasn't the only one
who thought this smelled a little bad; quoting Jeff:

"Something about taking a slice of seq for the inner loop doesn't seem right
to me."

Maybe we need to retrain our noses. ;)

Steve
 
A

Alex Martelli

Steven Bethard said:
So slicing a list and iterating over the slice is faster than iterating over
the indices and indexing the items... Is this generally true? Is it
something I can depend on across implementations?

Unfortunately, nothing can forbid a new implementation, or even a new
release of an existing implementation, acquiring some special new
optimization that makes a technique that used to be slower suddenly
faster. And if it _was_ possible to forbid optimizations, you wouldn't
really want to, anyway -- a potential optimization of some _other_
technique wouldn't damage the solution you've chosen, anyway, except in
terms of "relative standing" vs the now-optimized alternative.

Since using the slice is simpler, and at least for now it's also faster
(so it won't become _too_ slow anyway), that's what I would suggest. As
a precaution you might want to use a 'virtual slice' through a function
such as:

def vslice_from(alist, start):
''' returns some iterable x such that "for i in x:" is exactly
equivalent to "for i in alist[start:]:". '''
return alist[start:]

so that you can experiment with alternatives without having to modify
most of your code, just through alterations to vslice_from. E.g.

import itertools
def vslice_from(alist, start, islice=itertools.islice):
return islice(alist, start, len(alist))

or

def vslice_from(alist, start):
for i in xrange(start, len(alist)): yield alist

or even customized pyrex or C implementations if your quest for the
ultimate performance needs them...


Alex
 
A

Alex Martelli

Steven Bethard said:
Sorry, perhaps I should have said it has a bad smell. I wasn't the only one
who thought this smelled a little bad; quoting Jeff:

"Something about taking a slice of seq for the inner loop doesn't seem right
to me."

Maybe we need to retrain our noses. ;)

Hmmm, I see (or should I say, I sniff). Perhaps a Numeric.array, where
slices share slots rather than making new slots, would smell better and
might indeed perform faster here. But the comparison is with index
loops on xrange(i,len(l)), which is hardly Chanel N. 5 either...;-).


Alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,046
Latest member
Gavizuho

Latest Threads

Top