Into itertools

bearophileHUGS · Apr 26, 2009

Some idioms are so common that I think they deserve to be written in C
into the itertools module.

1) leniter(iterator)

It returns the length of a given iterator, consuming it, without
creating a list. I have discussed this twice in the past.
Like itertools.izip_longest don't use it with infinite iterators.

The following code is faster than
sum(1 for _ in iterator)

A Python implementation:

def leniter(iterator):
if hasattr(iterator, "__len__"):
return len(iterator)
nelements = 0
for _ in iterator:
nelements += 1
return nelements

I use it to count the grup items that groupby() gives:

[(h, leniter(g)) for h,g in groupby(iterable)]

And I use it various other situations, like when I want to test the
speed of a lazy generator:

timeit(leniter, xpairwise(xrange(20)))

(Where "timeit" is a function of mine similar to a Mathematica
function that given a function and some data, applies it and returns
the result and the time taken to perform it).

----------------

2) xpairwise(iterable)

This is at the the bottom of the itertools docs:

def pairwise(iterable):
a, b = tee(iterable)
for elem in b:
break
return izip(a, b)

The following of mine is faster:

def xpairwise(iterable):
return izip(iterable, islice(iterable, 1, None))

But a C implementation is better (also avoiding copy&paste from the
bottom of the docs page).

This function can be generalized a lot (I also have written a much
more flexible version), but in practice I have seen this basic version
covers most of the common situations.

----------------

3) xpairs(seq)

This is a Python implementation:

def xpairs(seq):
"""xpairs(seq): yields all the distinct pairs of the given seq,
inside a pair
the order doesn't matter. No pairs have the same element twice.
This means it yields just the upper triangular half of the pair
matrix.

>>> list( xpairs([]) ) []
>>> list(xpairs([1,2])) [(1, 2)]
>>> list( xpairs("abc") ) [('a', 'b'), ('a', 'c'), ('b', 'c')]
>>> list(xpairs(range(5)))

Click to expand...

Click to expand...

[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3),
(2, 4), (3, 4)]
"""
len_seq = len(seq)
for i, e1 in enumerate(seq):
for j in xrange(i+1, len_seq):
yield e1, seq[j]

A more general implementation can also be written when seq isn't
randomly-addressable, but I think that may become not efficient.

IHMO xpairs and xpairwise may even deserve an interpreter support, to
speed up them when possible. In a modern compiled language (like
Chapel, http://chapel.cs.washington.edu/ ) such idioms (in theory)
produce no slowdown compared to normal loops. I have written generic
xpairs() and xpairwise() for the D language too, but the compiler
isn't able to fully optimize away such constructs.

----------------

4) xsubsets(seq)

This is the recipe at the bottom of the docs page (a bit modified, now
it returns lists because in most situations I don't want the results
to be sets):

def subsets(iterable):
# Recipe credited to Eric Raymond
pairs = [(2 ** i, x) for i, x in enumerate(iterable)]
for n in xrange(2 ** len(pairs)):
yield [x for m, x in pairs if m & n]

After starting to use Python 2.6 I now use the following version (this
replaces a longer version I used in the past):

def xsubsets(seq):
seq = list(seq)
yield []
for nitems in xrange(1, len(seq)+1):
for subset in combinations(seq, nitems):
yield subset

Thanks to the very fast combinations() this xsubsets code is faster
than that recipe by Eric Raymond, but it has the disadvantage that it
doesn't yield subsets in the natural "binary" order. So a C
implementation can be done to respect that natural order.

Such "binary" order has a disadvantage, because the lists/tuples it
yields have a different length, so the C code may have troubles in
using the the nice trick it currently use, re-using the same memory
and avoiding a new memory allocation every loop:

/* Copy the previous result tuple or re-use it if available */
if (Py_REFCNT(result) > 1) {
PyObject *old_result = result;
result = PyTuple_New(r);
if (result == NULL)
goto empty;
co->result = result;
for (i=0; i<r ; i++) {
elem = PyTuple_GET_ITEM(old_result, i);
Py_INCREF(elem);
PyTuple_SET_ITEM(result, i, elem);
}
Py_DECREF(old_result);
}
/* Now, we've got the only copy so we can update it in-place
* CPython's empty tuple is a singleton and cached in
* PyTuple's freelist.
*/
assert(r == 0 || Py_REFCNT(result) == 1);

Maybe there are ways to avoid that in the C implementation of xsubsets
too, I don't know Python implementation enough to tell...

Bye,
bearophile

AggieDan04 · Apr 26, 2009

Some idioms are so common that I think they deserve to be written in C
into the itertools module.

1) leniter(iterator) ....
2) xpairwise(iterable) ....
3) xpairs(seq) ....
4) xsubsets(seq)

....

Good suggestions. Another useful function I'd like to see in
itertools is the Cartesian product. This can be implemented as:

def cartesian_product(*args):
"""Iterates over the Cartesian product of args[0], args[1], ..."""
if not args:
return
elif len(args) == 1:
for item in args[0]:
yield (item,)
else:
for item in args[0]:
for item2 in cartesian_product(*args[1:]):
yield (item,) + item2

Arnaud Delobelle · Apr 26, 2009

AggieDan04 said:
Good suggestions. Another useful function I'd like to see in
itertools is the Cartesian product. This can be implemented as:

[snip implementation]

You mean itertools.product?

http://docs.python.org/library/itertools.html#itertools.product

Arnaud Delobelle · Apr 27, 2009

Some idioms are so common that I think they deserve to be written in C
into the itertools module.

1) leniter(iterator) [...]
A Python implementation:

def leniter(iterator):
if hasattr(iterator, "__len__"):
return len(iterator)
nelements = 0
for _ in iterator:
nelements += 1
return nelements

Some people would write it as:

def leniter(iterable):
if hasattr(iterable, '__len__'):
return len(iteratble)
return sum(1 for _ in iterable)

[...]

2) xpairwise(iterable)

This is at the the bottom of the itertools docs:

def pairwise(iterable):
a, b = tee(iterable)
for elem in b:
break
return izip(a, b)

The following of mine is faster:

def xpairwise(iterable):
return izip(iterable, islice(iterable, 1, None))

This doesn't work if the iterable is an iterator.

E.g
[(0, 2), (3, 4)]

[...]

3) xpairs(seq)

This is a Python implementation:

def xpairs(seq):
len_seq = len(seq)
for i, e1 in enumerate(seq):
for j in xrange(i+1, len_seq):
yield e1, seq[j]

Why not:

def xpairs(seq):
for i, el in enumerate(seq):
for j in xrange(i):
yield seq[j], el

Mark Dickinson · Apr 27, 2009

3) xpairs(seq)

>>> list(xpairs(range(5)))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3),
(2, 4), (3, 4)]

Doesn't itertools.combinations already do this for you?
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2,
4), (3, 4)]

Mark

bearophileHUGS · Apr 27, 2009

Arnaud Delobelle:

Some people would write it as:

def leniter(iterable):
if hasattr(iterable, '__len__'):
return len(iteratble)
return sum(1 for _ in iterable)

That's slower than my version.

This doesn't work if the iterable is an iterator.

I see. I think I have never used it with an iterator so far

I'll fix it.

Why not:

def xpairs(seq):
for i, el in enumerate(seq):
for j in xrange(i):
yield seq[j], el

Maybe because I was not smart enough to invent that

-------------------

Mark Dickinson:

3) xpairs(seq)[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3),
(2, 4), (3, 4)]

Doesn't itertools.combinations already do this for you?
I have written most of those functions when I have used Python 2.4,
before itertools.combinations. But you are right, xpairs is now almost
useless, just one line long

Combinatorics is so fun

Thank you very much to both.

Bye,
bearophile

calculating binomial coefficients using itertools	1	Jul 5, 2013
itertools doc example "consume"	0	Mar 8, 2013
itertools, functools, file enhancement ideas	8	Apr 7, 2007
Question about permutations (itertools)	3	May 31, 2010
Cartesian Product of two lists (itertools)	3	Jan 25, 2009
PyWart: Itertools module needs attention	3	Sep 12, 2011
xslice idea \| a generator slice	7	Jul 11, 2013
Light slices + COW	5	May 3, 2008

Into itertools

bearophileHUGS

AggieDan04

Arnaud Delobelle

Arnaud Delobelle

Mark Dickinson

bearophileHUGS

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads