pairs from a list

A

Alan Isaac

I want to generate sequential pairs from a list.
Here is a way::

from itertools import izip, islice
for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
print x12

(Of course the print statement is just illustrative.)
What is the fastest way? (Ignore the import time.)

Thanks,
Alan Isaac
 
P

Paul Rubin

Alan Isaac said:
(Of course the print statement is just illustrative.)
What is the fastest way? (Ignore the import time.)

You have to try a bunch of different ways and time them. One
idea (untested):

def pairs(seq):
while True:
yield (seq.next(), seq.next())
 
G

George Sakkis

I want to generate sequential pairs from a list.
Here is a way::

from itertools import izip, islice
for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
print x12

(Of course the print statement is just illustrative.)
What is the fastest way? (Ignore the import time.)

Look up the timeit module and test yourself the various alternatives;
that's the most reliable way to tell for sure.

George
 
P

Paddy

I want to generate sequential pairs from a list.
What is the fastest way? (Ignore the import time.)
1) How fast is the method you have?
2) How much faster does it need to be for your application?
3) Are their any other bottlenecks in your application?
4) Is this the routine whose smallest % speed-up would give the
largest overall speed up of your application?

- Paddy.
 
G

George Sakkis

1) How fast is the method you have?
2) How much faster does it need to be for your application?
3) Are their any other bottlenecks in your application?
4) Is this the routine whose smallest % speed-up would give the
largest overall speed up of your application?

I believe the "what is the fastest way" question for such small well-
defined tasks is worth asking on its own, regardless of whether it
makes a difference in the application (or even if there is no
application to begin with). Just because cpu cycles are cheap these
days is not a good reason to be sloppy. Moreover, often the fastest
pure Python version happens to be among the most elegant and concise,
unlike other languages where optimization usually implies obfuscation.

George
 
S

Steven D'Aprano

I believe the "what is the fastest way" question for such small well-
defined tasks is worth asking on its own, regardless of whether it makes
a difference in the application (or even if there is no application to
begin with). Just because cpu cycles are cheap these days is not a good
reason to be sloppy. Moreover, often the fastest pure Python version
happens to be among the most elegant and concise, unlike other languages
where optimization usually implies obfuscation.

I wonder why it is that people automatically assume that "optimization"
means optimize the time taken, and not the developer effort to write it
in the first place, the effort required to maintain it over time, or the
memory used at runtime, let alone some combination of all four factors.

Memory is cheap, but applications are hungry.

CPUs are fast, and for most applications the difference between 3ms and
30ms is undetectable by the user. Why do we care so little about saving
memory and so much about ever-decreasing time savings?
 
A

Arnaud Delobelle

I want to generate sequential pairs from a list.
Here is a way::

    from itertools import izip, islice
    for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
        print x12

(Of course the print statement is just illustrative.)
What is the fastest way? (Ignore the import time.)

Thanks,
Alan Isaac

Don't know the fastest, but here's a very concise way:

from itertools import izip

def ipairs(seq):
it = iter(seq)
return izip(it, it)
list(pairs(xrange(10))) [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
list(pairs('hello'))
[('h', 'e'), ('l', 'l')]
 
A

Alan Isaac

I suppose my question should have been,
is there an obviously faster way?
Anyway, of the four ways below, the
first is substantially fastest. Is
there an obvious reason why?

Thanks,
Alan Isaac

PS My understanding is that the behavior
of the last is implementation dependent
and not guaranteed.

def pairs1(x):
for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
yield x12

def pairs2(x):
xiter = iter(x)
while True:
yield xiter.next(), xiter.next()

def pairs3(x):
for i in range( len(x)//2 ):
yield x[2*i], x[2*i+1],

def pairs4(x):
xiter = iter(x)
for x12 in izip(xiter,xiter):
yield x12
 
A

Arnaud Delobelle

PS My understanding is that the behavior
of the last is implementation dependent
and not guaranteed. [...]
def pairs4(x):
    xiter = iter(x)
    for x12 in izip(xiter,xiter):
        yield x12

According to the docs [1], izip is defined to be equivalent to:

def izip(*iterables):
iterables = map(iter, iterables)
while iterables:
result = [it.next() for it in iterables]
yield tuple(result)

This guarantees that it.next() will be performed from left to right,
so there is no risk that e.g. pairs4([1, 2, 3, 4]) returns [(2, 1),
(4, 3)].

Is there anything else that I am overlooking?

[1] http://docs.python.org/lib/itertools-functions.html
 
B

bearophileHUGS

Alan Isaac>What is the fastest way? (Ignore the import time.)<

Maybe someday someone will realize such stuff belongs to the python
STD lib...

If you need a lazy generator without padding, that splits starting
from the start, then this is the faster to me if n is close to 2:

def xpartition(seq, n=2):
return izip( *(iter(seq),)*n )

If you need the faster greedy version without padding then there are
two answers, one for Psyco and one for Python without... :)
If you need padding or to start from the end then there are more
answers...

Bye,
bearophile
 
A

Arnaud Delobelle

I suppose my question should have been,
is there an obviously faster way?
Anyway, of the four ways below, the
first is substantially fastest.  Is
there an obvious reason why?

Can you post your results?

I get different ones (pairs1 and pairs2 rewritten slightly to avoid
unnecessary indirection).

====== pairs.py ===========
from itertools import *

def pairs1(x):
return izip(islice(x,0,None,2),islice(x,1,None,2))

def pairs2(x):
xiter = iter(x)
while True:
yield xiter.next(), xiter.next()

def pairs3(x):
for i in range( len(x)//2 ):
yield x[2*i], x[2*i+1],

def pairs4(x):
xiter = iter(x)
return izip(xiter,xiter)

def compare():
import timeit
for i in '1234':
t = timeit.Timer('list(pairs.pairs%s(l))' % i,
'import pairs; l=range(1000)')
print 'pairs%s: %s' % (i, t.timeit(10000))

if __name__ == '__main__':
compare()
=====================

marigold:python arno$ python pairs.py
pairs1: 0.789824962616
pairs2: 4.08462786674
pairs3: 2.90438890457
pairs4: 0.536775827408

pairs4 wins.
 
A

Alan Isaac

Arnaud said:
According to the docs [1], izip is defined to be equivalent to:

def izip(*iterables):
iterables = map(iter, iterables)
while iterables:
result = [it.next() for it in iterables]
yield tuple(result)

This guarantees that it.next() will be performed from left to right,
so there is no risk that e.g. pairs4([1, 2, 3, 4]) returns [(2, 1),
(4, 3)].

Is there anything else that I am overlooking?

[1] http://docs.python.org/lib/itertools-functions.html


<URL:http://bugs.python.org/issue1121416>

fwiw,
Alan Isaac
 
A

Alan Isaac

Arnaud said:
pairs4 wins.


Oops. I see a smaller difference,
but yes, pairs4 wins.

Alan Isaac

import time
from itertools import islice, izip

x = range(500001)

def pairs1(x):
return izip(islice(x,0,None,2),islice(x,1,None,2))

def pairs2(x):
xiter = iter(x)
while True:
yield xiter.next(), xiter.next()

def pairs3(x):
for i in range( len(x)//2 ):
yield x[2*i], x[2*i+1],

def pairs4(x):
xiter = iter(x)
return izip(xiter,xiter)

t = time.clock()
for x1, x2 in pairs1(x):
pass
t1 = time.clock() - t

t = time.clock()
for x1, x2 in pairs2(x):
pass
t2 = time.clock() - t

t = time.clock()
for x1, x2 in pairs3(x):
pass
t3 = time.clock() - t

t = time.clock()
for x1, x2 in pairs4(x):
pass
t4 = time.clock() - t

print t1, t2, t3, t4

Output:
0.317524154606 1.13436847421 1.07100930426 0.262926712753
 
P

Paddy

I believe the "what is the fastest way" question for such small well-
defined tasks is worth asking on its own, regardless of whether it
makes a difference in the application (or even if there is no
application to begin with).

Hi George,
You need to 'get it right' first. Micro optimizations for speed
without thought of the wider context is a bad habit to form and a time
waster.
If the routine is all that needs to be delivered and it does not
perform at an acceptable speed then find out what is acceptable and
optimise towards that goal. My questions were set to get posters to
think more about the need for speed optimizations and where they
should be applied, (if at all).

A bit of forethought might justify leaving the routine alone, or
optimising for readability instead.

- Paddy.
 
A

Arnaud Delobelle

Hi George,
You need to 'get it right' first. Micro optimizations for speed
without thought of the wider context is a bad habit to form and a time
waster.
If the routine is all that needs to be delivered and it does not
perform at an acceptable speed then find out what is acceptable and
optimise towards that goal. My questions were set to get posters to
think more about the need for speed optimizations and where they
should be applied, (if at all).

A bit of forethought might justify leaving the routine alone, or
optimising for readability instead.

But it's fun!

Some-of-us-can't-help-it'ly yours
 
P

Peter Otten

Arnaud said:
Thanks. So I guess I shouldn't take the code snippet I quoted as a
specification of izip but rather as an illustration.

You can be bolder here as the izip() docs explicitly state

"""
Note, the left-to-right evaluation order of the iterables is
guaranteed. This makes possible an idiom for clustering a data series into
n-length groups using "izip(*[iter(s)]*n)".
"""

and the bug report with Raymond Hettinger saying

"""
Left the evaluation order as an unspecified, implementation
specific detail.
"""

is about zip(), not izip().

Peter
 
R

Raymond Hettinger

[Peter Otten]
You can be bolder here as the izip() docs explicitly state

"""
Note, the left-to-right evaluation order of the iterables is
guaranteed. This makes possible an idiom for clustering a data series into
n-length groups using "izip(*[iter(s)]*n)".
""" . . .
is about zip(), not izip().

FWIW, I just added a similar guarantee for zip().


Raymond
 
G

George Sakkis

Hi George,
You need to 'get it right' first.

For such trivial problems, getting it right alone isn't a particularly
high expectation.
Micro optimizations for speed
without thought of the wider context is a bad habit to form and a time
waster.

The OP didn't mention anything about the context; for all we know,
this might be a homework problem or the body of a tight inner loop.
There is this tendency on c.l.py to assume that every optimization
question is about a tiny subproblem of a 100 KLOC application. Without
further context, we just don't know.
If the routine is all that needs to be delivered and it does not
perform at an acceptable speed then find out what is acceptable
and optimise towards that goal. My questions were set to get
posters to think more about the need for speed optimizations and
where they should be applied, (if at all).

I don't agree with this logic in general. Just because one can solve a
problem by throwing a quick and dirty hack with quadratic complexity
that happens to do well enough on current typical input, it doesn't
mean he shouldn't spend ten or thirty minutes more to write a proper
linear time solution, all else being equal or at least comparable
(elegance, conciseness, readability, etc.). Of course it's a tradeoff;
spending a week to save a few milliseconds on average is usually a
waste for most applications, but being a lazy keyboard banger writing
the first thing that pops into mind is not that good either.

George
 
S

Steven D'Aprano

The OP didn't mention anything about the context; for all we know, this
might be a homework problem or the body of a tight inner loop. There is
this tendency on c.l.py to assume that every optimization question is
about a tiny subproblem of a 100 KLOC application. Without further
context, we just don't know.

Funny. As far as I can tell, the usual assumption on c.l.py is that every
tiny two-line piece of code is the absolute most critically important
heart of an application which gets called billions of times on petabytes
of data daily.

Given the human psychology displayed involved, in the absence of
definitive evidence one way or another it is a far safer bet to assume
that people are unnecessarily asking for "the fastest" out of a misguided
and often ignorant belief that they need it, rather than the opposite.
People who actually need a faster solution usually know enough to preface
their comments with an explanation of why their existing solution is too
slow rather than just a context-free demand for "the fastest" solution.

Fast code is like fast cars. There *are* people who really genuinely need
to have the fastest car available, but that number is dwarfed by the vast
legions of tossers trying to make up for their lack of self-esteem by
buying a car with a spoiler. Yeah, you're going to be traveling SO FAST
on the way to the mall that the car is at risk of getting airborne, sure,
we believe you.

(The above sarcasm naturally doesn't apply to those who actually do need
to travel at 200mph in a school zone, like police, taxi drivers and stock
brokers.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,432
Messages
2,571,681
Members
48,796
Latest member
Greg L.

Latest Threads

Top