# pairs from a list

Discussion in 'Python' started by Alan Isaac, Jan 22, 2008.

1. ### Alan IsaacGuest

I want to generate sequential pairs from a list.
Here is a way::

from itertools import izip, islice
for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
print x12

(Of course the print statement is just illustrative.)
What is the fastest way? (Ignore the import time.)

Thanks,
Alan Isaac

Alan Isaac, Jan 22, 2008

2. ### Paul RubinGuest

Alan Isaac <> writes:
> (Of course the print statement is just illustrative.)
> What is the fastest way? (Ignore the import time.)

You have to try a bunch of different ways and time them. One
idea (untested):

def pairs(seq):
while True:
yield (seq.next(), seq.next())

Paul Rubin, Jan 22, 2008

3. ### George SakkisGuest

On Jan 21, 10:20 pm, Alan Isaac <> wrote:
> I want to generate sequential pairs from a list.
> Here is a way::
>
> from itertools import izip, islice
> for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
> print x12
>
> (Of course the print statement is just illustrative.)
> What is the fastest way? (Ignore the import time.)

Look up the timeit module and test yourself the various alternatives;
that's the most reliable way to tell for sure.

George

George Sakkis, Jan 22, 2008

On Jan 22, 3:20 am, Alan Isaac <> wrote:
> I want to generate sequential pairs from a list.

<<snip>>
> What is the fastest way? (Ignore the import time.)

1) How fast is the method you have?
2) How much faster does it need to be for your application?
3) Are their any other bottlenecks in your application?
4) Is this the routine whose smallest % speed-up would give the
largest overall speed up of your application?

5. ### George SakkisGuest

On Jan 22, 12:15 am, Paddy <> wrote:
> On Jan 22, 3:20 am, Alan Isaac <> wrote:> I want to generate sequential pairs from a list.
> <<snip>>
> > What is the fastest way? (Ignore the import time.)

>
> 1) How fast is the method you have?
> 2) How much faster does it need to be for your application?
> 3) Are their any other bottlenecks in your application?
> 4) Is this the routine whose smallest % speed-up would give the
> largest overall speed up of your application?

I believe the "what is the fastest way" question for such small well-
defined tasks is worth asking on its own, regardless of whether it
makes a difference in the application (or even if there is no
application to begin with). Just because cpu cycles are cheap these
days is not a good reason to be sloppy. Moreover, often the fastest
pure Python version happens to be among the most elegant and concise,
unlike other languages where optimization usually implies obfuscation.

George

George Sakkis, Jan 22, 2008
6. ### Steven D'ApranoGuest

On Mon, 21 Jan 2008 21:34:28 -0800, George Sakkis wrote:

> I believe the "what is the fastest way" question for such small well-
> defined tasks is worth asking on its own, regardless of whether it makes
> a difference in the application (or even if there is no application to
> begin with). Just because cpu cycles are cheap these days is not a good
> reason to be sloppy. Moreover, often the fastest pure Python version
> happens to be among the most elegant and concise, unlike other languages
> where optimization usually implies obfuscation.

I wonder why it is that people automatically assume that "optimization"
means optimize the time taken, and not the developer effort to write it
in the first place, the effort required to maintain it over time, or the
memory used at runtime, let alone some combination of all four factors.

Memory is cheap, but applications are hungry.

CPUs are fast, and for most applications the difference between 3ms and
30ms is undetectable by the user. Why do we care so little about saving
memory and so much about ever-decreasing time savings?

--
Steven

Steven D'Aprano, Jan 22, 2008
7. ### Arnaud DelobelleGuest

On Jan 22, 3:20 am, Alan Isaac <> wrote:
> I want to generate sequential pairs from a list.
> Here is a way::
>
>     from itertools import izip, islice
>     for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
>         print x12
>
> (Of course the print statement is just illustrative.)
> What is the fastest way? (Ignore the import time.)
>
> Thanks,
> Alan Isaac

Don't know the fastest, but here's a very concise way:

from itertools import izip

def ipairs(seq):
it = iter(seq)
return izip(it, it)

>>> list(pairs(xrange(10)))

[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
>>> list(pairs('hello'))

[('h', 'e'), ('l', 'l')]

--
Arnaud

Arnaud Delobelle, Jan 22, 2008
8. ### Alan IsaacGuest

I suppose my question should have been,
is there an obviously faster way?
Anyway, of the four ways below, the
first is substantially fastest. Is
there an obvious reason why?

Thanks,
Alan Isaac

PS My understanding is that the behavior
of the last is implementation dependent
and not guaranteed.

def pairs1(x):
for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
yield x12

def pairs2(x):
xiter = iter(x)
while True:
yield xiter.next(), xiter.next()

def pairs3(x):
for i in range( len(x)//2 ):
yield x[2*i], x[2*i+1],

def pairs4(x):
xiter = iter(x)
for x12 in izip(xiter,xiter):
yield x12

Alan Isaac, Jan 22, 2008
9. ### Arnaud DelobelleGuest

On Jan 22, 1:19 pm, Alan Isaac <> wrote:
[...]
> PS My understanding is that the behavior
> of the last is implementation dependent
> and not guaranteed.

[...]
> def pairs4(x):
>     xiter = iter(x)
>     for x12 in izip(xiter,xiter):
>         yield x12

According to the docs [1], izip is defined to be equivalent to:

def izip(*iterables):
iterables = map(iter, iterables)
while iterables:
result = [it.next() for it in iterables]
yield tuple(result)

This guarantees that it.next() will be performed from left to right,
so there is no risk that e.g. pairs4([1, 2, 3, 4]) returns [(2, 1),
(4, 3)].

Is there anything else that I am overlooking?

[1] http://docs.python.org/lib/itertools-functions.html

--
Arnaud

Arnaud Delobelle, Jan 22, 2008
10. ### Guest

Alan Isaac>What is the fastest way? (Ignore the import time.)<

Maybe someday someone will realize such stuff belongs to the python
STD lib...

If you need a lazy generator without padding, that splits starting
from the start, then this is the faster to me if n is close to 2:

def xpartition(seq, n=2):
return izip( *(iter(seq),)*n )

If you need the faster greedy version without padding then there are
two answers, one for Psyco and one for Python without...
If you need padding or to start from the end then there are more

Bye,
bearophile

, Jan 22, 2008
11. ### Arnaud DelobelleGuest

On Jan 22, 1:19 pm, Alan Isaac <> wrote:
> I suppose my question should have been,
> is there an obviously faster way?
> Anyway, of the four ways below, the
> first is substantially fastest.  Is
> there an obvious reason why?

I get different ones (pairs1 and pairs2 rewritten slightly to avoid
unnecessary indirection).

====== pairs.py ===========
from itertools import *

def pairs1(x):
return izip(islice(x,0,None,2),islice(x,1,None,2))

def pairs2(x):
xiter = iter(x)
while True:
yield xiter.next(), xiter.next()

def pairs3(x):
for i in range( len(x)//2 ):
yield x[2*i], x[2*i+1],

def pairs4(x):
xiter = iter(x)
return izip(xiter,xiter)

def compare():
import timeit
for i in '1234':
t = timeit.Timer('list(pairs.pairs%s(l))' % i,
'import pairs; l=range(1000)')
print 'pairs%s: %s' % (i, t.timeit(10000))

if __name__ == '__main__':
compare()
=====================

marigoldython arno\$ python pairs.py
pairs1: 0.789824962616
pairs2: 4.08462786674
pairs3: 2.90438890457
pairs4: 0.536775827408

pairs4 wins.

--
Arnaud

Arnaud Delobelle, Jan 22, 2008
12. ### Alan IsaacGuest

Arnaud Delobelle wrote:
> According to the docs [1], izip is defined to be equivalent to:
>
> def izip(*iterables):
> iterables = map(iter, iterables)
> while iterables:
> result = [it.next() for it in iterables]
> yield tuple(result)
>
> This guarantees that it.next() will be performed from left to right,
> so there is no risk that e.g. pairs4([1, 2, 3, 4]) returns [(2, 1),
> (4, 3)].
>
> Is there anything else that I am overlooking?
>
> [1] http://docs.python.org/lib/itertools-functions.html

<URL:http://bugs.python.org/issue1121416>

fwiw,
Alan Isaac

Alan Isaac, Jan 22, 2008
13. ### Arnaud DelobelleGuest

On Jan 22, 4:10 pm, Alan Isaac <> wrote:

> <URL:http://bugs.python.org/issue1121416>
>
> fwiw,
> Alan Isaac

Thanks. So I guess I shouldn't take the code snippet I quoted as a
specification of izip but rather as an illustration.

--
Arnaud

Arnaud Delobelle, Jan 22, 2008
14. ### Alan IsaacGuest

Arnaud Delobelle wrote:
> pairs4 wins.

Oops. I see a smaller difference,
but yes, pairs4 wins.

Alan Isaac

import time
from itertools import islice, izip

x = range(500001)

def pairs1(x):
return izip(islice(x,0,None,2),islice(x,1,None,2))

def pairs2(x):
xiter = iter(x)
while True:
yield xiter.next(), xiter.next()

def pairs3(x):
for i in range( len(x)//2 ):
yield x[2*i], x[2*i+1],

def pairs4(x):
xiter = iter(x)
return izip(xiter,xiter)

t = time.clock()
for x1, x2 in pairs1(x):
pass
t1 = time.clock() - t

t = time.clock()
for x1, x2 in pairs2(x):
pass
t2 = time.clock() - t

t = time.clock()
for x1, x2 in pairs3(x):
pass
t3 = time.clock() - t

t = time.clock()
for x1, x2 in pairs4(x):
pass
t4 = time.clock() - t

print t1, t2, t3, t4

Output:
0.317524154606 1.13436847421 1.07100930426 0.262926712753

Alan Isaac, Jan 22, 2008

On Jan 22, 5:34 am, George Sakkis <> wrote:
> On Jan 22, 12:15 am, Paddy <> wrote:
>
> > On Jan 22, 3:20 am, Alan Isaac <> wrote:> I want to generate sequential pairs from a list.
> > <<snip>>
> > > What is the fastest way? (Ignore the import time.)

>
> > 1) How fast is the method you have?
> > 2) How much faster does it need to be for your application?
> > 3) Are their any other bottlenecks in your application?
> > 4) Is this the routine whose smallest % speed-up would give the
> > largest overall speed up of your application?

>
> I believe the "what is the fastest way" question for such small well-
> defined tasks is worth asking on its own, regardless of whether it
> makes a difference in the application (or even if there is no
> application to begin with).

Hi George,
You need to 'get it right' first. Micro optimizations for speed
without thought of the wider context is a bad habit to form and a time
waster.
If the routine is all that needs to be delivered and it does not
perform at an acceptable speed then find out what is acceptable and
optimise towards that goal. My questions were set to get posters to
think more about the need for speed optimizations and where they
should be applied, (if at all).

A bit of forethought might justify leaving the routine alone, or

16. ### Arnaud DelobelleGuest

On Jan 22, 6:34 pm, Paddy <> wrote:
[...]
> Hi George,
> You need to 'get it right' first. Micro optimizations for speed
> without thought of the wider context is a bad habit to form and a time
> waster.
> If the routine is all that needs to be delivered and it does not
> perform at an acceptable speed then find out what is acceptable and
> optimise towards that goal. My questions were set to get posters to
> think more about the need for speed optimizations and where they
> should be applied, (if at all).
>
> A bit of forethought might justify leaving the routine alone, or

But it's fun!

Some-of-us-can't-help-it'ly yours
--
Arnaud

Arnaud Delobelle, Jan 22, 2008
17. ### Peter OttenGuest

Arnaud Delobelle wrote:

> On Jan 22, 4:10Â pm, Alan Isaac <> wrote:
>
>> <URL:http://bugs.python.org/issue1121416>
>>
>> fwiw,
>> Alan Isaac

>
> Thanks. So I guess I shouldn't take the code snippet I quoted as a
> specification of izip but rather as an illustration.

You can be bolder here as the izip() docs explicitly state

"""
Note, the left-to-right evaluation order of the iterables is
guaranteed. This makes possible an idiom for clustering a data series into
n-length groups using "izip(*[iter(s)]*n)".
"""

and the bug report with Raymond Hettinger saying

"""
Left the evaluation order as an unspecified, implementation
specific detail.
"""

Peter

Peter Otten, Jan 22, 2008
18. ### Raymond HettingerGuest

[Peter Otten]
> You can be bolder here as the izip() docs explicitly state
>
> """
> Note, the left-to-right evaluation order of the iterables is
> guaranteed. This makes possible an idiom for clustering a data series into
> n-length groups using "izip(*[iter(s)]*n)".
> """

. . .
> is about zip(), not izip().

FWIW, I just added a similar guarantee for zip().

Raymond

Raymond Hettinger, Jan 22, 2008
19. ### George SakkisGuest

On Jan 22, 1:34 pm, Paddy <> wrote:
> On Jan 22, 5:34 am, George Sakkis <> wrote:
>
>
>
> > On Jan 22, 12:15 am, Paddy <> wrote:

>
> > > On Jan 22, 3:20 am, Alan Isaac <> wrote:> I want to generate sequential pairs from a list.
> > > <<snip>>
> > > > What is the fastest way? (Ignore the import time.)

>
> > > 1) How fast is the method you have?
> > > 2) How much faster does it need to be for your application?
> > > 3) Are their any other bottlenecks in your application?
> > > 4) Is this the routine whose smallest % speed-up would give the
> > > largest overall speed up of your application?

>
> > I believe the "what is the fastest way" question for such small well-
> > defined tasks is worth asking on its own, regardless of whether it
> > makes a difference in the application (or even if there is no
> > application to begin with).

>
> Hi George,
> You need to 'get it right' first.

For such trivial problems, getting it right alone isn't a particularly
high expectation.

> Micro optimizations for speed
> without thought of the wider context is a bad habit to form and a time
> waster.

The OP didn't mention anything about the context; for all we know,
this might be a homework problem or the body of a tight inner loop.
There is this tendency on c.l.py to assume that every optimization
question is about a tiny subproblem of a 100 KLOC application. Without
further context, we just don't know.

> If the routine is all that needs to be delivered and it does not
> perform at an acceptable speed then find out what is acceptable
> and optimise towards that goal. My questions were set to get
> posters to think more about the need for speed optimizations and
> where they should be applied, (if at all).

I don't agree with this logic in general. Just because one can solve a
problem by throwing a quick and dirty hack with quadratic complexity
that happens to do well enough on current typical input, it doesn't
mean he shouldn't spend ten or thirty minutes more to write a proper
linear time solution, all else being equal or at least comparable
spending a week to save a few milliseconds on average is usually a
waste for most applications, but being a lazy keyboard banger writing
the first thing that pops into mind is not that good either.

George

George Sakkis, Jan 23, 2008
20. ### Steven D'ApranoGuest

On Tue, 22 Jan 2008 18:32:22 -0800, George Sakkis wrote:

> The OP didn't mention anything about the context; for all we know, this
> might be a homework problem or the body of a tight inner loop. There is
> this tendency on c.l.py to assume that every optimization question is
> about a tiny subproblem of a 100 KLOC application. Without further
> context, we just don't know.

Funny. As far as I can tell, the usual assumption on c.l.py is that every
tiny two-line piece of code is the absolute most critically important
heart of an application which gets called billions of times on petabytes
of data daily.

Given the human psychology displayed involved, in the absence of
definitive evidence one way or another it is a far safer bet to assume
that people are unnecessarily asking for "the fastest" out of a misguided
and often ignorant belief that they need it, rather than the opposite.
People who actually need a faster solution usually know enough to preface
their comments with an explanation of why their existing solution is too
slow rather than just a context-free demand for "the fastest" solution.

Fast code is like fast cars. There *are* people who really genuinely need
to have the fastest car available, but that number is dwarfed by the vast
legions of tossers trying to make up for their lack of self-esteem by
buying a car with a spoiler. Yeah, you're going to be traveling SO FAST
on the way to the mall that the car is at risk of getting airborne, sure,
we believe you.

(The above sarcasm naturally doesn't apply to those who actually do need
to travel at 200mph in a school zone, like police, taxi drivers and stock
brokers.)

--
Steven

Steven D'Aprano, Jan 23, 2008