pairs from a list

Discussion in 'Python' started by Alan Isaac, Jan 22, 2008.

  1. Alan Isaac

    Alan Isaac Guest

    I want to generate sequential pairs from a list.
    Here is a way::

    from itertools import izip, islice
    for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
    print x12

    (Of course the print statement is just illustrative.)
    What is the fastest way? (Ignore the import time.)

    Thanks,
    Alan Isaac
    Alan Isaac, Jan 22, 2008
    #1
    1. Advertising

  2. Alan Isaac

    Paul Rubin Guest

    Alan Isaac <> writes:
    > (Of course the print statement is just illustrative.)
    > What is the fastest way? (Ignore the import time.)


    You have to try a bunch of different ways and time them. One
    idea (untested):

    def pairs(seq):
    while True:
    yield (seq.next(), seq.next())
    Paul Rubin, Jan 22, 2008
    #2
    1. Advertising

  3. On Jan 21, 10:20 pm, Alan Isaac <> wrote:
    > I want to generate sequential pairs from a list.
    > Here is a way::
    >
    > from itertools import izip, islice
    > for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
    > print x12
    >
    > (Of course the print statement is just illustrative.)
    > What is the fastest way? (Ignore the import time.)


    Look up the timeit module and test yourself the various alternatives;
    that's the most reliable way to tell for sure.

    George
    George Sakkis, Jan 22, 2008
    #3
  4. Alan Isaac

    Paddy Guest

    On Jan 22, 3:20 am, Alan Isaac <> wrote:
    > I want to generate sequential pairs from a list.

    <<snip>>
    > What is the fastest way? (Ignore the import time.)

    1) How fast is the method you have?
    2) How much faster does it need to be for your application?
    3) Are their any other bottlenecks in your application?
    4) Is this the routine whose smallest % speed-up would give the
    largest overall speed up of your application?

    - Paddy.
    Paddy, Jan 22, 2008
    #4
  5. On Jan 22, 12:15 am, Paddy <> wrote:
    > On Jan 22, 3:20 am, Alan Isaac <> wrote:> I want to generate sequential pairs from a list.
    > <<snip>>
    > > What is the fastest way? (Ignore the import time.)

    >
    > 1) How fast is the method you have?
    > 2) How much faster does it need to be for your application?
    > 3) Are their any other bottlenecks in your application?
    > 4) Is this the routine whose smallest % speed-up would give the
    > largest overall speed up of your application?


    I believe the "what is the fastest way" question for such small well-
    defined tasks is worth asking on its own, regardless of whether it
    makes a difference in the application (or even if there is no
    application to begin with). Just because cpu cycles are cheap these
    days is not a good reason to be sloppy. Moreover, often the fastest
    pure Python version happens to be among the most elegant and concise,
    unlike other languages where optimization usually implies obfuscation.

    George
    George Sakkis, Jan 22, 2008
    #5
  6. On Mon, 21 Jan 2008 21:34:28 -0800, George Sakkis wrote:

    > I believe the "what is the fastest way" question for such small well-
    > defined tasks is worth asking on its own, regardless of whether it makes
    > a difference in the application (or even if there is no application to
    > begin with). Just because cpu cycles are cheap these days is not a good
    > reason to be sloppy. Moreover, often the fastest pure Python version
    > happens to be among the most elegant and concise, unlike other languages
    > where optimization usually implies obfuscation.


    I wonder why it is that people automatically assume that "optimization"
    means optimize the time taken, and not the developer effort to write it
    in the first place, the effort required to maintain it over time, or the
    memory used at runtime, let alone some combination of all four factors.

    Memory is cheap, but applications are hungry.

    CPUs are fast, and for most applications the difference between 3ms and
    30ms is undetectable by the user. Why do we care so little about saving
    memory and so much about ever-decreasing time savings?



    --
    Steven
    Steven D'Aprano, Jan 22, 2008
    #6
  7. On Jan 22, 3:20 am, Alan Isaac <> wrote:
    > I want to generate sequential pairs from a list.
    > Here is a way::
    >
    >     from itertools import izip, islice
    >     for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
    >         print x12
    >
    > (Of course the print statement is just illustrative.)
    > What is the fastest way? (Ignore the import time.)
    >
    > Thanks,
    > Alan Isaac


    Don't know the fastest, but here's a very concise way:

    from itertools import izip

    def ipairs(seq):
    it = iter(seq)
    return izip(it, it)

    >>> list(pairs(xrange(10)))

    [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
    >>> list(pairs('hello'))

    [('h', 'e'), ('l', 'l')]

    --
    Arnaud
    Arnaud Delobelle, Jan 22, 2008
    #7
  8. Alan Isaac

    Alan Isaac Guest

    I suppose my question should have been,
    is there an obviously faster way?
    Anyway, of the four ways below, the
    first is substantially fastest. Is
    there an obvious reason why?

    Thanks,
    Alan Isaac

    PS My understanding is that the behavior
    of the last is implementation dependent
    and not guaranteed.

    def pairs1(x):
    for x12 in izip(islice(x,0,None,2),islice(x,1,None,2)):
    yield x12

    def pairs2(x):
    xiter = iter(x)
    while True:
    yield xiter.next(), xiter.next()

    def pairs3(x):
    for i in range( len(x)//2 ):
    yield x[2*i], x[2*i+1],

    def pairs4(x):
    xiter = iter(x)
    for x12 in izip(xiter,xiter):
    yield x12
    Alan Isaac, Jan 22, 2008
    #8
  9. On Jan 22, 1:19 pm, Alan Isaac <> wrote:
    [...]
    > PS My understanding is that the behavior
    > of the last is implementation dependent
    > and not guaranteed.

    [...]
    > def pairs4(x):
    >     xiter = iter(x)
    >     for x12 in izip(xiter,xiter):
    >         yield x12


    According to the docs [1], izip is defined to be equivalent to:

    def izip(*iterables):
    iterables = map(iter, iterables)
    while iterables:
    result = [it.next() for it in iterables]
    yield tuple(result)

    This guarantees that it.next() will be performed from left to right,
    so there is no risk that e.g. pairs4([1, 2, 3, 4]) returns [(2, 1),
    (4, 3)].

    Is there anything else that I am overlooking?

    [1] http://docs.python.org/lib/itertools-functions.html

    --
    Arnaud
    Arnaud Delobelle, Jan 22, 2008
    #9
  10. Alan Isaac

    Guest

    Alan Isaac>What is the fastest way? (Ignore the import time.)<

    Maybe someday someone will realize such stuff belongs to the python
    STD lib...

    If you need a lazy generator without padding, that splits starting
    from the start, then this is the faster to me if n is close to 2:

    def xpartition(seq, n=2):
    return izip( *(iter(seq),)*n )

    If you need the faster greedy version without padding then there are
    two answers, one for Psyco and one for Python without... :)
    If you need padding or to start from the end then there are more
    answers...

    Bye,
    bearophile
    , Jan 22, 2008
    #10
  11. On Jan 22, 1:19 pm, Alan Isaac <> wrote:
    > I suppose my question should have been,
    > is there an obviously faster way?
    > Anyway, of the four ways below, the
    > first is substantially fastest.  Is
    > there an obvious reason why?


    Can you post your results?

    I get different ones (pairs1 and pairs2 rewritten slightly to avoid
    unnecessary indirection).

    ====== pairs.py ===========
    from itertools import *

    def pairs1(x):
    return izip(islice(x,0,None,2),islice(x,1,None,2))

    def pairs2(x):
    xiter = iter(x)
    while True:
    yield xiter.next(), xiter.next()

    def pairs3(x):
    for i in range( len(x)//2 ):
    yield x[2*i], x[2*i+1],

    def pairs4(x):
    xiter = iter(x)
    return izip(xiter,xiter)

    def compare():
    import timeit
    for i in '1234':
    t = timeit.Timer('list(pairs.pairs%s(l))' % i,
    'import pairs; l=range(1000)')
    print 'pairs%s: %s' % (i, t.timeit(10000))

    if __name__ == '__main__':
    compare()
    =====================

    marigold:python arno$ python pairs.py
    pairs1: 0.789824962616
    pairs2: 4.08462786674
    pairs3: 2.90438890457
    pairs4: 0.536775827408

    pairs4 wins.

    --
    Arnaud
    Arnaud Delobelle, Jan 22, 2008
    #11
  12. Alan Isaac

    Alan Isaac Guest

    Arnaud Delobelle wrote:
    > According to the docs [1], izip is defined to be equivalent to:
    >
    > def izip(*iterables):
    > iterables = map(iter, iterables)
    > while iterables:
    > result = [it.next() for it in iterables]
    > yield tuple(result)
    >
    > This guarantees that it.next() will be performed from left to right,
    > so there is no risk that e.g. pairs4([1, 2, 3, 4]) returns [(2, 1),
    > (4, 3)].
    >
    > Is there anything else that I am overlooking?
    >
    > [1] http://docs.python.org/lib/itertools-functions.html



    <URL:http://bugs.python.org/issue1121416>

    fwiw,
    Alan Isaac
    Alan Isaac, Jan 22, 2008
    #12
  13. On Jan 22, 4:10 pm, Alan Isaac <> wrote:

    > <URL:http://bugs.python.org/issue1121416>
    >
    > fwiw,
    > Alan Isaac


    Thanks. So I guess I shouldn't take the code snippet I quoted as a
    specification of izip but rather as an illustration.

    --
    Arnaud
    Arnaud Delobelle, Jan 22, 2008
    #13
  14. Alan Isaac

    Alan Isaac Guest

    Arnaud Delobelle wrote:
    > pairs4 wins.



    Oops. I see a smaller difference,
    but yes, pairs4 wins.

    Alan Isaac

    import time
    from itertools import islice, izip

    x = range(500001)

    def pairs1(x):
    return izip(islice(x,0,None,2),islice(x,1,None,2))

    def pairs2(x):
    xiter = iter(x)
    while True:
    yield xiter.next(), xiter.next()

    def pairs3(x):
    for i in range( len(x)//2 ):
    yield x[2*i], x[2*i+1],

    def pairs4(x):
    xiter = iter(x)
    return izip(xiter,xiter)

    t = time.clock()
    for x1, x2 in pairs1(x):
    pass
    t1 = time.clock() - t

    t = time.clock()
    for x1, x2 in pairs2(x):
    pass
    t2 = time.clock() - t

    t = time.clock()
    for x1, x2 in pairs3(x):
    pass
    t3 = time.clock() - t

    t = time.clock()
    for x1, x2 in pairs4(x):
    pass
    t4 = time.clock() - t

    print t1, t2, t3, t4

    Output:
    0.317524154606 1.13436847421 1.07100930426 0.262926712753
    Alan Isaac, Jan 22, 2008
    #14
  15. Alan Isaac

    Paddy Guest

    On Jan 22, 5:34 am, George Sakkis <> wrote:
    > On Jan 22, 12:15 am, Paddy <> wrote:
    >
    > > On Jan 22, 3:20 am, Alan Isaac <> wrote:> I want to generate sequential pairs from a list.
    > > <<snip>>
    > > > What is the fastest way? (Ignore the import time.)

    >
    > > 1) How fast is the method you have?
    > > 2) How much faster does it need to be for your application?
    > > 3) Are their any other bottlenecks in your application?
    > > 4) Is this the routine whose smallest % speed-up would give the
    > > largest overall speed up of your application?

    >
    > I believe the "what is the fastest way" question for such small well-
    > defined tasks is worth asking on its own, regardless of whether it
    > makes a difference in the application (or even if there is no
    > application to begin with).


    Hi George,
    You need to 'get it right' first. Micro optimizations for speed
    without thought of the wider context is a bad habit to form and a time
    waster.
    If the routine is all that needs to be delivered and it does not
    perform at an acceptable speed then find out what is acceptable and
    optimise towards that goal. My questions were set to get posters to
    think more about the need for speed optimizations and where they
    should be applied, (if at all).

    A bit of forethought might justify leaving the routine alone, or
    optimising for readability instead.

    - Paddy.
    Paddy, Jan 22, 2008
    #15
  16. On Jan 22, 6:34 pm, Paddy <> wrote:
    [...]
    > Hi George,
    > You need to 'get it right' first. Micro optimizations for speed
    > without thought of the wider context is a bad habit to form and a time
    > waster.
    > If the routine is all that needs to be delivered and it does not
    > perform at an acceptable speed then find out what is acceptable and
    > optimise towards that goal. My questions were set to get posters to
    > think more about the need for speed optimizations and where they
    > should be applied, (if at all).
    >
    > A bit of forethought might justify leaving the routine alone, or
    > optimising for readability instead.


    But it's fun!

    Some-of-us-can't-help-it'ly yours
    --
    Arnaud
    Arnaud Delobelle, Jan 22, 2008
    #16
  17. Alan Isaac

    Peter Otten Guest

    Arnaud Delobelle wrote:

    > On Jan 22, 4:10 pm, Alan Isaac <> wrote:
    >
    >> <URL:http://bugs.python.org/issue1121416>
    >>
    >> fwiw,
    >> Alan Isaac

    >
    > Thanks. So I guess I shouldn't take the code snippet I quoted as a
    > specification of izip but rather as an illustration.


    You can be bolder here as the izip() docs explicitly state

    """
    Note, the left-to-right evaluation order of the iterables is
    guaranteed. This makes possible an idiom for clustering a data series into
    n-length groups using "izip(*[iter(s)]*n)".
    """

    and the bug report with Raymond Hettinger saying

    """
    Left the evaluation order as an unspecified, implementation
    specific detail.
    """

    is about zip(), not izip().

    Peter
    Peter Otten, Jan 22, 2008
    #17
  18. [Peter Otten]
    > You can be bolder here as the izip() docs explicitly state
    >
    > """
    > Note, the left-to-right evaluation order of the iterables is
    > guaranteed. This makes possible an idiom for clustering a data series into
    > n-length groups using "izip(*[iter(s)]*n)".
    > """

    . . .
    > is about zip(), not izip().


    FWIW, I just added a similar guarantee for zip().


    Raymond
    Raymond Hettinger, Jan 22, 2008
    #18
  19. On Jan 22, 1:34 pm, Paddy <> wrote:
    > On Jan 22, 5:34 am, George Sakkis <> wrote:
    >
    >
    >
    > > On Jan 22, 12:15 am, Paddy <> wrote:

    >
    > > > On Jan 22, 3:20 am, Alan Isaac <> wrote:> I want to generate sequential pairs from a list.
    > > > <<snip>>
    > > > > What is the fastest way? (Ignore the import time.)

    >
    > > > 1) How fast is the method you have?
    > > > 2) How much faster does it need to be for your application?
    > > > 3) Are their any other bottlenecks in your application?
    > > > 4) Is this the routine whose smallest % speed-up would give the
    > > > largest overall speed up of your application?

    >
    > > I believe the "what is the fastest way" question for such small well-
    > > defined tasks is worth asking on its own, regardless of whether it
    > > makes a difference in the application (or even if there is no
    > > application to begin with).

    >
    > Hi George,
    > You need to 'get it right' first.


    For such trivial problems, getting it right alone isn't a particularly
    high expectation.

    > Micro optimizations for speed
    > without thought of the wider context is a bad habit to form and a time
    > waster.


    The OP didn't mention anything about the context; for all we know,
    this might be a homework problem or the body of a tight inner loop.
    There is this tendency on c.l.py to assume that every optimization
    question is about a tiny subproblem of a 100 KLOC application. Without
    further context, we just don't know.

    > If the routine is all that needs to be delivered and it does not
    > perform at an acceptable speed then find out what is acceptable
    > and optimise towards that goal. My questions were set to get
    > posters to think more about the need for speed optimizations and
    > where they should be applied, (if at all).


    I don't agree with this logic in general. Just because one can solve a
    problem by throwing a quick and dirty hack with quadratic complexity
    that happens to do well enough on current typical input, it doesn't
    mean he shouldn't spend ten or thirty minutes more to write a proper
    linear time solution, all else being equal or at least comparable
    (elegance, conciseness, readability, etc.). Of course it's a tradeoff;
    spending a week to save a few milliseconds on average is usually a
    waste for most applications, but being a lazy keyboard banger writing
    the first thing that pops into mind is not that good either.

    George
    George Sakkis, Jan 23, 2008
    #19
  20. On Tue, 22 Jan 2008 18:32:22 -0800, George Sakkis wrote:

    > The OP didn't mention anything about the context; for all we know, this
    > might be a homework problem or the body of a tight inner loop. There is
    > this tendency on c.l.py to assume that every optimization question is
    > about a tiny subproblem of a 100 KLOC application. Without further
    > context, we just don't know.


    Funny. As far as I can tell, the usual assumption on c.l.py is that every
    tiny two-line piece of code is the absolute most critically important
    heart of an application which gets called billions of times on petabytes
    of data daily.

    Given the human psychology displayed involved, in the absence of
    definitive evidence one way or another it is a far safer bet to assume
    that people are unnecessarily asking for "the fastest" out of a misguided
    and often ignorant belief that they need it, rather than the opposite.
    People who actually need a faster solution usually know enough to preface
    their comments with an explanation of why their existing solution is too
    slow rather than just a context-free demand for "the fastest" solution.

    Fast code is like fast cars. There *are* people who really genuinely need
    to have the fastest car available, but that number is dwarfed by the vast
    legions of tossers trying to make up for their lack of self-esteem by
    buying a car with a spoiler. Yeah, you're going to be traveling SO FAST
    on the way to the mall that the car is at risk of getting airborne, sure,
    we believe you.

    (The above sarcasm naturally doesn't apply to those who actually do need
    to travel at 200mph in a school zone, like police, taxi drivers and stock
    brokers.)



    --
    Steven
    Steven D'Aprano, Jan 23, 2008
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Woodster
    Replies:
    6
    Views:
    352
    Woodster
    Oct 31, 2003
  2. Steven Bethard
    Replies:
    12
    Views:
    492
    Alex Martelli
    Sep 29, 2004
  3. Michael Sparks
    Replies:
    2
    Views:
    346
    Alex Martelli
    Sep 29, 2004
  4. Steven Bethard
    Replies:
    5
    Views:
    374
    Martin Maney
    Oct 14, 2004
  5. Replies:
    7
    Views:
    378
    John Machin
    Feb 22, 2005
Loading...

Share This Page