itertools candidate: warehouse()

Robert Brewer

def warehouse(stock, factory=None):
    """warehouse(stock, factory=None) -> iavailable, iremainder.

    Iterate over stock, yielding each value. Once the 'stock' sequence is
    exhausted, the factory function (or any callable, such as a class) is
    called to produce a new valid object upon each subsequent call to
    next().

    If factory is None, the class of the last item taken from the sequence
    is used as a constructor. If the factory function is not a bound method,
    it does not receive any arguments, so if your class has mandatory
    arguments to __init__, wrap the class in a function which can supply
    those.

    A common use for warehouse is to reuse a set of existing objects, often
    because object creation and/or destruction is expensive. The warehouse
    function returns the second iterable ('iremainder') to allow consumers
    to "clean up" any items from the initial sequence which did not get
    re-used.

    For example, given a homogeneous iterable named 'i':

        available, remainder = warehouse(i)
        for thing in some_other_sequence:
            thing.use(available.next())
        for item in remainder:
            item.close()
    """

    if not hasattr(stock, 'next'):
        stock = iter(stock)

    def pull():
        """An inner generator from itertools.warehouse()."""
        for item in stock:
            yield item

        if factory is None:
            try:
                local_factory = item.__class__
            except NameError:
                raise ValueError("Empty sequence and no factory supplied.")
        else:
            local_factory = factory

        while True:
            yield local_factory()

    return pull(), stock
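For readers on Python 3, where iterators have `__next__` rather than `next` (so the `hasattr(stock, 'next')` test is always false, which happens to be harmless), a minimal port might look like the sketch below. It is not the original code: it calls `iter()` unconditionally and uses a sentinel object instead of catching `NameError`, but like the original it falls back to the class of the *last* item pulled from the stock.

```python
def warehouse(stock, factory=None):
    """Return (available, remainder) iterators over stock.

    'available' yields the stocked items, then factory() forever;
    'remainder' yields whatever stock items 'available' never used.
    """
    stock = iter(stock)  # calling iter() on an iterator returns it unchanged
    missing = object()   # sentinel meaning "nothing was pulled from stock"

    def pull():
        item = missing
        for item in stock:
            yield item
        if factory is not None:
            local_factory = factory
        elif item is missing:
            raise ValueError("Empty sequence and no factory supplied.")
        else:
            local_factory = type(item)  # class of the *last* item pulled
        while True:
            yield local_factory()

    return pull(), stock
```

Used as in the docstring: hand out `next(available)` for each job, then close whatever is left in `remainder`. Because both iterators share the same underlying `stock`, anything the consumer takes from `remainder` is no longer handed out by `available`.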


What do you all think? I've been using a class-based variant of this in
production code (business app) for six months now, and have found it
extremely helpful. I saw a pointer to itertools today and figured a
function-based version might be nice to include in that module someday.


Robert Brewer
MIS
Amor Ministries
(e-mail address removed)
 
Michael Hoffman

Robert said:
What do you all think? I've been using a class-based variant of this in
production code (business app) for six months now, and have found it
extremely helpful. I saw a pointer to itertools today and figured a
function-based version might be nice to include in that module someday.

Could you give a real-world example of its use?
 
Carlos Ribeiro

Could you give a real-world example of its use?

I think Robert is referring to a conversation I took part in earlier
today. My problem was to evaluate some alternatives for a recursive
generator. The problem with a recursive generator is that each level
has to loop over its nested generator simply to yield the results back
to the original caller. For example, if you are at depth three in the
recursive generator, each yield in the innermost generator turns into
two more yields as the value goes up the recursion stack, before it
finally reaches the original caller.

The warehouse may be useful in that situation; the solution is not
exactly the same, but it allows one to express similar ideas and may be
a good alternative when designing generator-based code for complex
structures.
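To make the cost described above concrete, here is a small Python 3 sketch (`walk` and the `stats` counter are illustrative names, not from the thread) of a recursive generator over nested lists. A value found at nesting depth d is relayed by d enclosing generators, so it costs d + 1 yields in total.

```python
def walk(node, stats):
    """Recursively yield the leaves of a nested list.

    stats["yields"] counts every yield performed at any level.
    """
    for child in node:
        if isinstance(child, list):
            # Re-yield each value produced by the nested generator:
            # one extra yield per enclosing level.
            for value in walk(child, stats):
                stats["yields"] += 1
                yield value
        else:
            stats["yields"] += 1
            yield child


stats = {"yields": 0}
leaves = list(walk([1, [2, [3]]], stats))
# leaves == [1, 2, 3]; stats["yields"] == 6, i.e. 1 + 2 + 3
```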

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: (e-mail address removed)
mail: (e-mail address removed)
 
Peter Otten

Robert said:
def warehouse(stock, factory=None):

    [snip documentation]
    if not hasattr(stock, 'next'):
        stock = iter(stock)

Note that unconditional use of iter() is usually harmless, because calling
iter() on an iterator returns that same iterator (iter(it) is it gives True).
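The point about `iter()` can be checked directly in Python 3: applied to something that is already an iterator, `iter()` hands back the very same object, while on a container such as a list it creates a fresh iterator each time.

```python
items = [1, 2, 3]
it = iter(items)

# An iterator's __iter__ returns itself, so re-wrapping is a no-op.
same = iter(it) is it                      # True
# A container produces a new iterator object on every call.
fresh = iter(items) is not iter(items)     # True

next(it)                # consumes 1; the re-wrapped iterator shares this state
rest = list(iter(it))   # [2, 3]
```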

Most of the building blocks for the warehouse() are already there, but you
didn't use them, oddly enough.
So here comes a variant written in terms of current (2.4) itertools:
...     a, b = tee(iterable)
...     try:
...         return a, b.next()
...     except StopIteration:
...         raise ValueError("cannot peek into an empty iterable")
...
...     print a, repr(b)
...
0 'a'
1 'b'
2 'c'
3 ''
4 ''
5 ''
6 ''
7 ''
8 ''
9 ''
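The session above is missing its `>>>` lines, so its exact code is not shown. The Python 3 sketch below is only a guess at its shape, built from the surviving `peek()` fragment (which uses `tee()`) and the `chain(iterable, starmap(factory, repeat(())))` idiom mentioned next. It assumes the default factory is the class of the *first* item, which is consistent with the `''` padding in the output above, and unlike Robert's version it returns no remainder iterator.

```python
from itertools import chain, repeat, starmap, tee


def peek(iterable):
    """Return (iterator, first_item) without losing the first item."""
    a, b = tee(iterable)
    try:
        return a, next(b)
    except StopIteration:
        raise ValueError("cannot peek into an empty iterable") from None


def warehouse(iterable, factory=None):
    """Yield every item of iterable, then factory() results forever."""
    if factory is None:
        # Assumption: default to the class of the first item.
        iterable, first = peek(iterable)
        factory = type(first)
    # starmap(f, repeat(())) calls f(*()), i.e. f(), once per step.
    return chain(iterable, starmap(factory, repeat(())))


for i, value in zip(range(10), warehouse("abc")):
    print(i, repr(value))  # 0 'a', 1 'b', 2 'c', then 3 '' through 9 ''
```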

Ok, I see I'm suffering from a functional overdose :)
I would have suggested chain(iterable, starmap(factory, repeat(()))) for
inclusion in the collection of recipes in the documentation, but upon
checking the development docs I see that Raymond Hettinger has already been
there, done that with the starmap() part.

So now you can do

chain(iterable, repeatfunc(factory))

after putting a copy of these recipes into your site-packages. Why aren't
they already there, btw?
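For reference, the `repeatfunc()` recipe reads as follows in the recipes section of the itertools documentation (reproduced from the docs; the wording there may have shifted slightly between versions). With it, padding a short stock with freshly built objects is a one-liner; the `stock` data below is purely illustrative.

```python
from itertools import chain, repeat, starmap


def repeatfunc(func, times=None, *args):
    """Repeat calls to func with specified arguments.

    Example:  repeatfunc(random.random)
    """
    if times is None:
        return starmap(func, repeat(args))
    return starmap(func, repeat(args, times))


# Pad a short stock with new empty lists once it runs out.
stock = [[1], [2]]
padded = chain(stock, repeatfunc(list))
first_four = [next(padded) for _ in range(4)]   # [[1], [2], [], []]
```

Each padding item is a separate call to `list()`, so the two empty lists above are distinct objects.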

The only extra your warehouse() has to offer is the (last) item's class as
the default factory for the padding items. I don't think that is needed
often enough to warrant the inclusion in the itertools.

Peter
 
Alex Martelli

Peter Otten said:
...
No, you cannot pass a callable to repeat() and have it called. In the above
line repeat(()) yields the same empty tuple ad infinitum. The trick is that
starmap() calls random.random() with that empty tuple as the argument list,

Stylistically, I prefer
iter(random.random, None)
using the 2-args form of the built-in iter, to
itertools.starmap(random.random, itertools.repeat(()))
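A small Python 3 sketch of the two spellings: the two-argument `iter(callable, sentinel)` calls `callable()` until it returns a value equal to `sentinel`. Since `random.random()` returns floats in [0.0, 1.0), it never produces `None` (or `1.0`), so both forms below are infinite and need `islice()` to stop. The `partial(next, data)` stream is an illustrative deterministic example, not from the thread.

```python
import random
from functools import partial
from itertools import islice, repeat, starmap

# Two-argument iter(): call random.random() until it returns None (never).
via_iter = list(islice(iter(random.random, None), 5))

# starmap(): call random.random(*()) for each empty tuple from repeat(()).
via_starmap = list(islice(starmap(random.random, repeat(())), 5))

# The sentinel form stops as soon as the callable returns the sentinel.
data = iter([3, 1, 4, 0, 5])
until_zero = list(iter(partial(next, data), 0))   # [3, 1, 4]
```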

However, itertools IS a speed demon...:

kallisti:~/cb alex$ python -m timeit -s 'import random, itertools as it' \
> 'list(it.islice(iter(random.random, None), 666))'
1000 loops, best of 3: 884 usec per loop

kallisti:~/cb alex$ python -m timeit -s 'import random, itertools as it' \
> 'list(it.islice(it.starmap(random.random, it.repeat(())), 666))'
1000 loops, best of 3: 407 usec per loop


Alex
 
Jack Diederich

Stylistically, I prefer
iter(random.random, None)
using the 2-args form of the built-in iter, to
itertools.starmap(random.random, itertools.repeat(()))

kallisti:~/cb alex$ python -m timeit -s 'import random, itertools as it' \
> 'list(it.islice(it.starmap(random.random, it.repeat(())), 666))'

Unrelated: I also often import itertools as 'it'. It bothers me because
I do it frequently (grep says 25 of 97 modules). Moving all of itertools into
builtins seems like overkill, but could we hang them off the 'iter' builtin?
Making the above:
kallisti:~/cb alex$ python -m timeit -s 'import random' \
> 'list(iter.islice(iter.starmap(random.random, iter.repeat(())), 666))'

My life would definitely get easier, no more assuming 'it' is imported and
then adding the import when I find out differently.

A quick dir() of iter shows no conflicts:
['__call__', '__class__', '__cmp__', '__delattr__', '__doc__', '__getattribute__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__str__']


-Jack
 
Carlos Ribeiro

Jack Diederich said:
Unrelated: I also often import itertools as 'it'. It bothers me because
I do it frequently (grep says 25 of 97 modules). Moving all of itertools into
builtins seems like overkill, but could we hang them off the 'iter' builtin?

Just wondering, but maybe this idea could be generalized to reduce the
(ever growing) cluttering of the builtins namespace. I'm not sure
about the side effects though. The basic idea is that one would still
import the itertools module (under the name iter, in this example):

import iter

...and then get both iter() and all the iter.<something> functions.

It's actually possible to do some similar things right now; for
example, the str builtin allows one to access all string methods. Is
there any 'showstopper' for this generalization? I don't know if a
module can have a __call__ method, but it seems to be something
interesting to explore...

Just curious, and sincerely, haven't checked it anywhere...
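On the closing question, as it stands in Python 3: defining a function named `__call__` at module level does not make a module callable, because special methods are looked up on the type. A `types.ModuleType` subclass can supply one, though. The sketch below builds a module that behaves like `iter()` when called and carries the public itertools functions as attributes; the name `iterx` is hypothetical (chosen to avoid shadowing the builtin), and nothing like it exists in the standard library.

```python
import itertools
import sys
import types


class CallableModule(types.ModuleType):
    """A module type whose instances can be called like iter()."""

    def __call__(self, *args):
        return iter(*args)


# Hypothetical module: callable like iter(), with itertools attached.
iterx = CallableModule("iterx", "iter() plus the itertools functions")
for name in dir(itertools):
    if not name.startswith("_"):
        setattr(iterx, name, getattr(itertools, name))

# Registering it in sys.modules lets other code write "import iterx".
sys.modules["iterx"] = iterx

first_three = list(iterx.islice(iterx([5, 6, 7, 8]), 3))   # [5, 6, 7]
```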


 
Raymond Hettinger

[Alex Martelli]
Stylistically, I prefer
iter(random.random, None)
using the 2-args form of the built-in iter, to
itertools.starmap(random.random, itertools.repeat(()))

FWIW, I prefer loading the itertools recipes so I can write:

repeatfunc(random.random)

IMO, that is plainer than both the iter() and starmap() versions.

However, itertools IS a speed demon...:

kallisti:~/cb alex$ python -m timeit -s 'import random, itertools as it' \
> 'list(it.islice(iter(random.random, None), 666))'
1000 loops, best of 3: 884 usec per loop

kallisti:~/cb alex$ python -m timeit -s 'import random, itertools as it' \
> 'list(it.islice(it.starmap(random.random, it.repeat(())), 666))'
1000 loops, best of 3: 407 usec per loop

Also time:

iter(random.random, 1.0)

IIRC, floats compare to each other faster than a float to None.


Raymond Hettinger
 
Alex Martelli

Raymond Hettinger said:
[Alex Martelli]
Stylistically, I prefer
iter(random.random, None)
using the 2-args form of the built-in iter, to
itertools.starmap(random.random, itertools.repeat(()))

FWIW, I prefer loading the itertools recipes so I can write:

repeatfunc(random.random)

IMO, that is plainer than both the iter() and starmap() versions.

I agree, but I cannot distribute Python code containing that, since the
itertools recipes aren't part of the stdlib.

Also time:

iter(random.random, 1.0)

IIRC, floats compare to each other faster than a float to None.

Excellent observation! They do indeed:

kallisti:~/cb/little_neat_things alex$ python -m timeit -s 'import random, itertools as it' \
> 'list(it.islice(iter(random.random, 1.0), 666))'
1000 loops, best of 3: 564 usec per loop


Alex
 
Peter Otten

Alex said:
Excellent observation! They do indeed:

kallisti:~/cb/little_neat_things alex$ python -m timeit -s 'import random, itertools as it' \
> 'list(it.islice(iter(random.random, 1.0), 666))'
1000 loops, best of 3: 564 usec per loop

My first idea was to entirely remove the sentinel, but the benchmarks were a
bit disappointing (not to mention any stylistic damage). So just for the
record:

from random import random
from itertools import *

N = 666

def star():
    return list(islice(starmap(random, repeat(())), N))

class Rand(object):
    next = random
    def __iter__(self): return self

def iter_inf(rand=Rand()):
    return list(islice(iter(rand), N))

def iter_sent(sentinel):
    return list(islice(iter(random, sentinel), N))

$ ./python -m timeit -s"import bench" "bench.star()"
10000 loops, best of 3: 125 usec per loop
$ ./python -m timeit -s"import bench" "bench.iter_sent(None)"
1000 loops, best of 3: 238 usec per loop
$ ./python -m timeit -s"import bench" "bench.iter_sent(1.0)"
10000 loops, best of 3: 175 usec per loop
$ ./python -m timeit -s"import bench" "bench.iter_inf()"
10000 loops, best of 3: 170 usec per loop
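A Python 3 port of the `bench` module above, for anyone wanting to repeat the comparison on a current interpreter. Assumptions: `Rand` uses `__next__` (the Python 3 name) wrapped in `staticmethod`, so that `next()` calls `random()` with no arguments, and timing goes through `timeit.repeat()` instead of the command line. No numbers are given here, since they depend on the interpreter and machine.

```python
import timeit
from itertools import islice, repeat, starmap
from random import random

N = 666


def star():
    return list(islice(starmap(random, repeat(())), N))


class Rand:
    __next__ = staticmethod(random)   # next(r) simply calls random()

    def __iter__(self):
        return self


def iter_inf(rand=Rand()):
    return list(islice(iter(rand), N))


def iter_sent(sentinel):
    return list(islice(iter(random, sentinel), N))


def main():
    cases = [
        ("star()", star),
        ("iter_sent(None)", lambda: iter_sent(None)),
        ("iter_sent(1.0)", lambda: iter_sent(1.0)),
        ("iter_inf()", iter_inf),
    ]
    for label, func in cases:
        # Best of 3 runs of 1000 calls each, reported per call.
        best = min(timeit.repeat(func, number=1000, repeat=3))
        print(f"{label}: {best * 1000:.1f} usec per loop")


if __name__ == "__main__":
    main()
```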

Peter
 
Raymond Hettinger

[Raymond Hettinger]
[Alex Martelli]
I agree, but I cannot distribute Python code containing that, since the
itertools recipes aren't part of the stdlib.

Hogwash (see below).


--- in a separate note ---
[Robert Brewer]
But in general, I don't write scripts for my own limited use;
I'm writing frameworks, which shouldn't depend upon little
recipes scattered hither and yon. :/

Double hogwash ;-)

Just paste the relevant recipe in your code and be done. No need for code
scattered hither and yon. Just apply a pre-made solution ready for re-use.

It is not a sin to write little helper functions to make the rest of your code
more readable. Recipes are perfect for this use because they have been tested
and refined for generic re-use.

Since when did a design pattern or code technique have to be in the standard
library to be useful?


Raymond
 
