Into itertools

B

bearophileHUGS

Some idioms are so common that I think they deserve to be written in C
into the itertools module.

1) leniter(iterator)

It returns the length of a given iterator, consuming it, without
creating a list. I have discussed this twice in the past.
Like itertools.izip_longest don't use it with infinite iterators.

The following code is faster than
sum(1 for _ in iterator)

A Python implementation:

def leniter(iterator):
if hasattr(iterator, "__len__"):
return len(iterator)
nelements = 0
for _ in iterator:
nelements += 1
return nelements

I use it to count the grup items that groupby() gives:

[(h, leniter(g)) for h,g in groupby(iterable)]

And I use it various other situations, like when I want to test the
speed of a lazy generator:

timeit(leniter, xpairwise(xrange(20)))

(Where "timeit" is a function of mine similar to a Mathematica
function that given a function and some data, applies it and returns
the result and the time taken to perform it).

----------------

2) xpairwise(iterable)

This is at the the bottom of the itertools docs:

def pairwise(iterable):
a, b = tee(iterable)
for elem in b:
break
return izip(a, b)


The following of mine is faster:

def xpairwise(iterable):
return izip(iterable, islice(iterable, 1, None))

But a C implementation is better (also avoiding copy&paste from the
bottom of the docs page).

This function can be generalized a lot (I also have written a much
more flexible version), but in practice I have seen this basic version
covers most of the common situations.

----------------

3) xpairs(seq)

This is a Python implementation:

def xpairs(seq):
"""xpairs(seq): yields all the distinct pairs of the given seq,
inside a pair
the order doesn't matter. No pairs have the same element twice.
This means it yields just the upper triangular half of the pair
matrix.
>>> list( xpairs([]) ) []
>>> list(xpairs([1,2])) [(1, 2)]
>>> list( xpairs("abc") ) [('a', 'b'), ('a', 'c'), ('b', 'c')]
>>> list(xpairs(range(5)))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3),
(2, 4), (3, 4)]
"""
len_seq = len(seq)
for i, e1 in enumerate(seq):
for j in xrange(i+1, len_seq):
yield e1, seq[j]


A more general implementation can also be written when seq isn't
randomly-addressable, but I think that may become not efficient.

IHMO xpairs and xpairwise may even deserve an interpreter support, to
speed up them when possible. In a modern compiled language (like
Chapel, http://chapel.cs.washington.edu/ ) such idioms (in theory)
produce no slowdown compared to normal loops. I have written generic
xpairs() and xpairwise() for the D language too, but the compiler
isn't able to fully optimize away such constructs.

----------------

4) xsubsets(seq)

This is the recipe at the bottom of the docs page (a bit modified, now
it returns lists because in most situations I don't want the results
to be sets):

def subsets(iterable):
# Recipe credited to Eric Raymond
pairs = [(2 ** i, x) for i, x in enumerate(iterable)]
for n in xrange(2 ** len(pairs)):
yield [x for m, x in pairs if m & n]


After starting to use Python 2.6 I now use the following version (this
replaces a longer version I used in the past):

def xsubsets(seq):
seq = list(seq)
yield []
for nitems in xrange(1, len(seq)+1):
for subset in combinations(seq, nitems):
yield subset


Thanks to the very fast combinations() this xsubsets code is faster
than that recipe by Eric Raymond, but it has the disadvantage that it
doesn't yield subsets in the natural "binary" order. So a C
implementation can be done to respect that natural order.

Such "binary" order has a disadvantage, because the lists/tuples it
yields have a different length, so the C code may have troubles in
using the the nice trick it currently use, re-using the same memory
and avoiding a new memory allocation every loop:

/* Copy the previous result tuple or re-use it if available */
if (Py_REFCNT(result) > 1) {
PyObject *old_result = result;
result = PyTuple_New(r);
if (result == NULL)
goto empty;
co->result = result;
for (i=0; i<r ; i++) {
elem = PyTuple_GET_ITEM(old_result, i);
Py_INCREF(elem);
PyTuple_SET_ITEM(result, i, elem);
}
Py_DECREF(old_result);
}
/* Now, we've got the only copy so we can update it in-place
* CPython's empty tuple is a singleton and cached in
* PyTuple's freelist.
*/
assert(r == 0 || Py_REFCNT(result) == 1);


Maybe there are ways to avoid that in the C implementation of xsubsets
too, I don't know Python implementation enough to tell...

Bye,
bearophile
 
A

AggieDan04

Some idioms are so common that I think they deserve to be written in C
into the itertools module.

1) leniter(iterator) ....
2) xpairwise(iterable) ....
3) xpairs(seq) ....
4) xsubsets(seq)
....

Good suggestions. Another useful function I'd like to see in
itertools is the Cartesian product. This can be implemented as:

def cartesian_product(*args):
"""Iterates over the Cartesian product of args[0], args[1], ..."""
if not args:
return
elif len(args) == 1:
for item in args[0]:
yield (item,)
else:
for item in args[0]:
for item2 in cartesian_product(*args[1:]):
yield (item,) + item2
 
A

Arnaud Delobelle

Some idioms are so common that I think they deserve to be written in C
into the itertools module.

1) leniter(iterator) [...]
A Python implementation:

def leniter(iterator):
    if hasattr(iterator, "__len__"):
        return len(iterator)
    nelements = 0
    for _ in iterator:
        nelements += 1
    return nelements

Some people would write it as:

def leniter(iterable):
if hasattr(iterable, '__len__'):
return len(iteratble)
return sum(1 for _ in iterable)

[...]
2) xpairwise(iterable)

This is at the the bottom of the itertools docs:

def pairwise(iterable):
    a, b = tee(iterable)
    for elem in b:
        break
    return izip(a, b)

The following of mine is faster:

def xpairwise(iterable):
    return izip(iterable, islice(iterable, 1, None))

This doesn't work if the iterable is an iterator.

E.g
[(0, 2), (3, 4)]

[...]
3) xpairs(seq)

This is a Python implementation:

def xpairs(seq):
    len_seq = len(seq)
    for i, e1 in enumerate(seq):
        for j in xrange(i+1, len_seq):
            yield e1, seq[j]

Why not:

def xpairs(seq):
for i, el in enumerate(seq):
for j in xrange(i):
yield seq[j], el
 
M

Mark Dickinson

3) xpairs(seq)
    >>> list(xpairs(range(5)))
    [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3),
(2, 4), (3, 4)]

Doesn't itertools.combinations already do this for you?
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2,
4), (3, 4)]

Mark
 
B

bearophileHUGS

Arnaud Delobelle:
Some people would write it as:

def leniter(iterable):
    if hasattr(iterable, '__len__'):
        return len(iteratble)
    return sum(1 for _ in iterable)

That's slower than my version.


This doesn't work if the iterable is an iterator.

I see. I think I have never used it with an iterator so far :)
I'll fix it.

Why not:

def xpairs(seq):
    for i, el in enumerate(seq):
        for j in xrange(i):
            yield seq[j], el

Maybe because I was not smart enough to invent that :)

-------------------

Mark Dickinson:
3) xpairs(seq)[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3),
(2, 4), (3, 4)]

Doesn't itertools.combinations already do this for you?
I have written most of those functions when I have used Python 2.4,
before itertools.combinations. But you are right, xpairs is now almost
useless, just one line long :) Combinatorics is so fun :)

Thank you very much to both.

Bye,
bearophile
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,051
Latest member
CarleyMcCr

Latest Threads

Top