itertools to iter transition (WAS: Pre-PEP: Dictionary accumulator methods)

Steven Bethard

Jack said:
>
> itertools to iter transition, huh? I slipped that one in, I mentioned
> it to Raymond at PyCon and he didn't flinch. It would be nice not to
> have to sprinkle 'import itertools as it' in code. iter could also
> become a type wrapper instead of a function, so an iter instance could
> be a wrapper that figures out whether to call .next or __getitem__
> depending on its argument.
>     for item in iter(mylist).imap:
>         print item
> or
>     for item in iter.imap(mylist):
>         print item

Very cool idea. I think the transition from
itertools.XXX(iterable, *args, **kwargs)
to
iter.XXX(iterable, *args, **kwargs)
ought to be pretty easy. The transition from here to
iter(iterable).XXX(*args, **kwargs)
seems like it might be more complicated though -- iter would have to
return a proxy object instead of the object returned by __iter__[1]. I
guess it already does that for objects that support only the __getitem__
protocol though, so maybe it's not so bad...

STeVe

[1] And you'd probably also want to special-case this so that if iter()
was called on an object that's already an instance of iter, that the
object itself was returned, not a proxy.
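
The proxy idea and the special case in footnote [1] can be sketched as follows. This is written for present-day Python 3 rather than the Python 2 of this thread, and `Iter` is a hypothetical name, not a real builtin; the `map` and `islice` methods are illustrative picks, not a proposed API.

```python
import itertools


class Iter:
    """Hypothetical proxy type standing in for a type-like iter()."""

    def __new__(cls, iterable):
        # Footnote [1]: an object that is already an Iter is returned as-is.
        if isinstance(iterable, cls):
            return iterable
        self = super().__new__(cls)
        # The real iter() already copes with both __iter__ and __getitem__.
        self._it = iter(iterable)
        return self

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def map(self, func):
        return Iter(map(func, self._it))

    def islice(self, *args):
        return Iter(itertools.islice(self._it, *args))


class OldStyle:
    """Supports only the old __getitem__ protocol."""

    def __getitem__(self, index):
        if index > 2:
            raise IndexError
        return index * 10


wrapped = Iter([1, 2, 3])
same = Iter(wrapped)                      # no second proxy layer
doubled = list(Iter([1, 2, 3]).map(lambda v: v * 2))
legacy = list(Iter(OldStyle()).islice(2))
```

The cost Steven points at is visible here: every iteration step goes through one extra Python-level `__next__` call.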
 
Jack Diederich

Steven Bethard said:
> Jack said:
> > itertools to iter transition, huh? I slipped that one in, I mentioned
> > it to Raymond at PyCon and he didn't flinch. It would be nice not to
> > have to sprinkle 'import itertools as it' in code. iter could also
> > become a type wrapper instead of a function, so an iter instance could
> > be a wrapper that figures out whether to call .next or __getitem__
> > depending on its argument.
> >     for item in iter(mylist).imap:
> >         print item
> > or
> >     for item in iter.imap(mylist):
> >         print item
>
> Very cool idea. I think the transition from
> itertools.XXX(iterable, *args, **kwargs)
> to
> iter.XXX(iterable, *args, **kwargs)
> ought to be pretty easy. The transition from here to
> iter(iterable).XXX(*args, **kwargs)
> seems like it might be more complicated though -- iter would have to
> return a proxy object instead of the object returned by __iter__[1]. I
> guess it already does that for objects that support only the __getitem__
> protocol though, so maybe it's not so bad...

I only included making iter a type to make it more symmetric with str
being a type. iter is currently a function, as a practical matter I wouldn't
mind if it doubled as a namespace but that might make others flinch.

> [1] And you'd probably also want to special-case this so that if iter()
> was called on an object that's already an instance of iter, that the
> object itself was returned, not a proxy.
 
David Eppstein

Jack Diederich said:
> I only included making iter a type to make it more symmetric with str
> being a type. iter is currently a function, as a practical matter I wouldn't
> mind if it doubled as a namespace but that might make others flinch.

iter having the attributes currently residing as methods in itertools
sounds just fine to me.

I really don't like iter as a type instead of a function, though. It
sounds like a cool idea at first glance, but then you think about it and
realize that (unlike what happens with any class name) iter(x) is almost
never going to return an object of that type.
 
Raymond Hettinger

[Jack Diederich]
[Steven Bethard]
> Very cool idea. I think the transition from
> itertools.XXX(iterable, *args, **kwargs)
> to
> iter.XXX(iterable, *args, **kwargs)
> ought to be pretty easy.

Just to make sure you guys can live with your proposed syntax, try using it
for a month or so and report back on whether the experience was pleasant. Try
dropping the following into your setup.py:

def wrapiter():
    import __builtin__, itertools
    orig = __builtin__.iter
    def iter(*args):
        return orig(*args)
    for name in ('__doc__', '__name__'):
        setattr(iter, name, getattr(orig, name))
    vars(iter).update(vars(itertools))
    __builtin__.iter = iter
wrapiter()

If the experience works out, then all you're left with is the trivial matter of
convincing Guido that function attributes are a sure cure for the burden of
typing import statements.
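
For anyone wanting to try the same experiment on a current interpreter, here is a rough Python 3 adaptation. It is a sketch under a few assumptions: `make_iter_namespace` and `iter_ns` are made-up names, the wrapper is returned instead of being installed over the builtin, and only the public itertools names are copied.

```python
import builtins
import itertools


def make_iter_namespace():
    """Build an iter() look-alike carrying the itertools functions as attributes."""
    orig = builtins.iter

    def iter_ns(*args):
        return orig(*args)

    for name in ('__doc__', '__name__'):
        setattr(iter_ns, name, getattr(orig, name))
    # Copy only the public names; skipping the module's underscore entries
    # is a tidiness choice made for this sketch.
    vars(iter_ns).update(
        {name: value for name, value in vars(itertools).items()
         if not name.startswith('_')}
    )
    return iter_ns


iter_ns = make_iter_namespace()
first_two = list(iter_ns.islice(iter_ns([1, 2, 3, 4]), 2))
```

Assigning the result to `builtins.iter` would reproduce the global effect of the original snippet; leaving that step out keeps the experiment local to one module.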


Raymond Hettinger
 
Ville Vainio

Raymond> If the experience works out, then all you're left with is
Raymond> the trivial matter of convincing Guido that function
Raymond> attributes are a sure cure for the burden of typing
Raymond> import statements.

For one thing, it would make it harder to find the functions from the
docs. It's easy to find the doc for 'itertools', but iter object
methods would require browsing that infamous Chapter 2 of the
documentation...

Apart from that, I don't really see the advantage in moving away from
itertools.
 
Steven Bethard

Ville said:
> Raymond> If the experience works out, then all you're left with is
> Raymond> the trivial matter of convincing Guido that function
> Raymond> attributes are a sure cure for the burden of typing
> Raymond> import statements.
>
> For one thing, it would make it harder to find the functions from the
> docs. It's easy to find the doc for 'itertools', but iter object
> methods would require browsing that infamous Chapter 2 of the
> documentation...

Well, it would only make them as hard to find as, say, dict.fromkeys,
which is probably the best parallel here. Of course iter would have to
be documented as a builtin type. I don't find the argument "builtin
type methods are hard to find" convincing -- the solution here is to fix
the documentation, not refuse to add builtin types.

> Apart from that, I don't really see the advantage in moving away from
> itertools.

True it's not a huge win. But I'd argue that for the same reasons that
dict.fromkeys is a dict classmethod, the itertools methods could be iter
classmethods (or staticmethods). The basic idea being that it's nice to
place the methods associated with a type in that type's definition. The
parallel's a little weaker here because calling iter doesn't always
produce objects of type iter:

py> class C(object):
...     def __iter__(self):
...         yield 1
...
py> iter(C())
<generator object at 0x011805A8>

But note that iter does produce 'iterator' objects for the old
__getitem__ protocol:

py> class C(object):
...     def __getitem__(self, index):
...         if index > 5:
...             raise IndexError
...         return index
...
py> iter(C())
<iterator object at 0x01162EF0>
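
The two sessions above can be re-run on a current interpreter with a small script. This is a Python 3 sketch; the class names `ViaIter` and `ViaGetitem` are made up here, and the type names it reports are those used by CPython.

```python
from collections.abc import Iterator


class ViaIter:
    def __iter__(self):
        yield 1


class ViaGetitem:
    def __getitem__(self, index):
        if index > 5:
            raise IndexError
        return index


# iter() hands back whatever __iter__ returned: here, a generator.
gen_type = type(iter(ViaIter())).__name__
# For the __getitem__ protocol, iter() builds its own wrapper object.
seq_type = type(iter(ViaGetitem())).__name__
items = list(iter(ViaGetitem()))
# Both results satisfy the iterator protocol, whatever their concrete type.
both_iterators = all(
    isinstance(iter(obj), Iterator) for obj in (ViaIter(), ViaGetitem())
)
```

So the concrete types differ, but both results are iterators in the protocol sense.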

I guess the real questions are[1]:
* How much does iter feel like a type?
* How closely are the itertools functions associated with iter?

STeVe

[1] There's also the question of how much you believe in OO tenets like
"functions closely associated with a type should be members of that type"...
 
Michael Spencer

Steven said:
> Ville said:
> > Raymond> If the experience works out, then all you're left with is
> > Raymond> the trivial matter of convincing Guido that function
> > Raymond> attributes are a sure cure for the burden of typing
> > Raymond> import statements.
> >
> > For one thing, it would make it harder to find the functions from the
> > docs. It's easy to find the doc for 'itertools', but iter object
> > methods would require browsing that infamous Chapter 2 of the
> > documentation...
>
> Well, it would only make them as hard to find as, say, dict.fromkeys,
> which is probably the best parallel here. Of course iter would have to
> be documented as a builtin type. I don't find the argument "builtin
> type methods are hard to find" convincing -- the solution here is to fix
> the documentation, not refuse to add builtin types.
>
> > Apart from that, I don't really see the advantage in moving away from
> > itertools.
>
> True it's not a huge win. But I'd argue that for the same reasons that
> dict.fromkeys is a dict classmethod, the itertools methods could be iter
> classmethods (or staticmethods). The basic idea being that it's nice to
> place the methods associated with a type in that type's definition. The
> parallel's a little weaker here because calling iter doesn't always
> produce objects of type iter:
>
> py> class C(object):
> ...     def __iter__(self):
> ...         yield 1
> ...
> py> iter(C())
> <generator object at 0x011805A8>
>
> But note that iter does produce 'iterator' objects for the old
> __getitem__ protocol:
>
> py> class C(object):
> ...     def __getitem__(self, index):
> ...         if index > 5:
> ...             raise IndexError
> ...         return index
> ...
> py> iter(C())
> <iterator object at 0x01162EF0>
>
> I guess the real questions are[1]:
> * How much does iter feel like a type?
> * How closely are the itertools functions associated with iter?
>
> STeVe
>
> [1] There's also the question of how much you believe in OO tenets like
> "functions closely associated with a type should be members of that
> type"...

While we're on the topic, what do you think of having unary, non-summary
builtins automatically map themselves when called with an iterable that would
otherwise be an illegal argument:

e.g.,
int(iterable) -> (int(i) for i in iterable)
ord(iterable) -> (ord(i) for i in iterable)


This would be unambiguous, I think, in the cases of bool, int, callable, chr,
float, hex, id, long, oct, ord, vars...

It would shorten the common cases of:
    for char in somestring:
        ordchar = ord(char)
        # do something with ordchar, but not char
to
    for ordchar in ord(somestring):
        ...

It would not work for summarizing functions or those that can accept an iterable
today e.g., len, repr

Michael
 
George Sakkis

Steven Bethard said:
[snip]

> I guess the real questions are[1]:
> * How much does iter feel like a type?
> * How closely are the itertools functions associated with iter?
>
> STeVe
>
> [1] There's also the question of how much you believe in OO tenets like
> "functions closely associated with a type should be members of that type"...

I would answer positively for both: iter does feel like a type conceptually and (most, if not all)
itertools functions would be suitable methods for such a type. Here I am referring to 'type' more as an
interface (or protocol; I'm not sure of the difference) rather than a concrete class, so whether the
result of iter is an iterator or a generator object is of little importance as long as it works as
expected (that is, whether it makes calls to next() or __getitem__() becomes a hidden implementation
detail).

If iter was a type, it would also be neat to replace some itertools callables with special methods,
as it has been mentioned in another thread (http://tinyurl.com/6mmmf), so that:
iter(x)[a:b:c] := itertools.islice(iter(x),a,b,c)
iter(x) + iter(y) := itertools.chain(iter(x), iter(y))
iter(x) * 3 := itertools.chain(* itertools.tee(iter(x), 3))
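
The three equivalences above could be prototyped along these lines. This is a Python 3 sketch, not a proposed implementation; `IterOps` is a hypothetical wrapper name, and only slice indexing is handled.

```python
import itertools


class IterOps:
    """Hypothetical iterator wrapper mapping operators onto itertools."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __getitem__(self, index):
        # iter(x)[a:b:c] := itertools.islice(iter(x), a, b, c)
        if not isinstance(index, slice):
            raise TypeError("this sketch supports slices only")
        return IterOps(
            itertools.islice(self._it, index.start, index.stop, index.step)
        )

    def __add__(self, other):
        # iter(x) + iter(y) := itertools.chain(iter(x), iter(y))
        return IterOps(itertools.chain(self._it, iter(other)))

    def __mul__(self, count):
        # iter(x) * n := itertools.chain(*itertools.tee(iter(x), n))
        return IterOps(itertools.chain(*itertools.tee(self._it, count)))


sliced = list(IterOps(range(10))[2:8:2])
joined = list(IterOps([1, 2]) + IterOps([3]))
repeated = list(IterOps('ab') * 3)
```

Unlike list slicing, `islice` rejects negative indices, and each operation consumes the wrapped iterator, so the list analogy only goes so far.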

George
 
Jack Diederich

> Raymond> If the experience works out, then all you're left with is
> Raymond> the trivial matter of convincing Guido that function
> Raymond> attributes are a sure cure for the burden of typing
> Raymond> import statements.
>
> For one thing, it would make it harder to find the functions from the
> docs. It's easy to find the doc for 'itertools', but iter object
> methods would require browsing that infamous Chapter 2 of the
> documentation...
>
> Apart from that, I don't really see the advantage in moving away from
> itertools.

I only use itertools when I have to currently, which isn't necessarily
bad (premature optimization etc) but I do use lists when I just need an
iterator - simply because 'list()' is easier to type than
'^<space><home>^n^nimport itertools as it<CR>^x^x' (emacsen to mark HERE,
jump to the top, import itertools, and jump back). If itertools methods
were handier I'd use them when I just want to iterate. As an anecdote
I use generator comprehensions[1] more often than list comprehensions.

I'll give the builtin manipulations a try but since I have to deal with
many machines I can't promise to flex it much.

-jack

[1] aside, I didn't care too much about upgrading machines 2.2 => 2.3, but
when 2.4 came along with set as a builtin and generator comprehensions it
was compelling.
 
Bengt Richter

> While we're on the topic, what do you think of having unary, non-summary
> builtins automatically map themselves when called with an iterable that would
> otherwise be an illegal argument:

That last "otherwise" is pretty important for strings as in int('1234') ;-)

> e.g.,
> int(iterable) -> (int(i) for i in iterable)
> ord(iterable) -> (ord(i) for i in iterable)
>
> This would be unambiguous, I think, in the cases of bool, int, callable, chr,
> float, hex, id, long, oct, ord, vars...
> It would shorten the common cases of:
>     for char in somestring:
>         ordchar = ord(char)
>         # do something with ordchar, but not char

But wouldn't you really currently write the "->" form from above? I.e.,

    for ordchar in (ord(c) for c in somestring):
        ...

to compare with

> to
>     for ordchar in ord(somestring):
>         ...

So it's not _that_ much shorter ;-)

> It would not work for summarizing functions or those that can accept an iterable
> today e.g., len, repr

I like concise expression, so I'm willing to try it. I guess it would be enough
to override __builtins__ to get a taste, e.g., (not thought through):

...     oldint = __builtins__.int
...     def __new__(cls, arg):
...         try: return cls.oldint(arg)
...         except (TypeError, ValueError):
...             oi = cls.oldint
...             return (oi(item) for item in arg)
... ...
1 23 456 ...
1 3 5 7 ...
1 2 3
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 7, in <generator expression>
ValueError: invalid literal for int(): x

Hm, ... ;-)
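
The same fallback idea can be tried without touching the builtins. The following is a Python 3 sketch and not a reconstruction of the session above; `mapped_int` is a made-up helper name.

```python
def mapped_int(arg):
    """Try int(arg) first; if that is rejected, map int lazily over arg."""
    try:
        return int(arg)
    except (TypeError, ValueError):
        # The conversion of each item is deferred until iteration.
        return (int(item) for item in arg)


single = mapped_int('1234')                    # a legal argument stays scalar
many = list(mapped_int(['1', '23', '456']))    # a list falls back to mapping

# A bad element is only noticed when the generator reaches it.
lazy = mapped_int(['1', 'x'])
first = next(lazy)
try:
    next(lazy)
    failed = False
except ValueError:
    failed = True
```

This makes the awkward part concrete: the error for a bad element surfaces far from the call, and a string such as '12 34' is silently treated as an iterable of characters because int() rejected it as a whole.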

Regards,
Bengt Richter
 
Terry Reedy

Steven Bethard said:
> True it's not a huge win. But I'd argue that for the same reasons that
> dict.fromkeys is a dict classmethod, the itertools methods could be iter
> classmethods (or staticmethods).

As near as I could tell from the doc, .fromkeys is the only dict method
that is a classmethod (better, typemethod) rather than an instance method.
And all list methods are instance methods. And I believe the same is true
of all number operations (and the corresponding special methods). So
.fromkeys seems to be an anomaly.

I believe the reason for its existence is that the signature for dict()
itself was already pretty well 'used up' and Guido preferred to add an
alternate constructor as a method rather than further complicate the
signature of dict() by adding a fromkeys flag to signal an alternate
interpretation of the first and possibly the second parameter.

Terry J. Reedy
 
George Sakkis

Terry Reedy said:
> As near as I could tell from the doc, .fromkeys is the only dict method
> that is a classmethod (better, typemethod) rather than an instance method.
> And all list methods are instance methods. And I believe the same is true
> of all number operations (and the corresponding special methods). So
> .fromkeys seems to be an anomaly.
>
> I believe the reason for its existence is that the signature for dict()
> itself was already pretty well 'used up' and Guido preferred to add an
> alternate constructor as a method rather than further complicate the
> signature of dict() by adding a fromkeys flag to signal an alternate
> interpretation of the first and possibly the second parameter.
>
> Terry J. Reedy

Apart from the anomaly you mention, it's hard to justify dict.fromkeys' existence in today's python.
For one thing, how often does one need to initialize a dict with a bunch of keys mapped to the same
value ? I imagine this would be handy, for example, in ad-hoc implementations of sets before
python2.3, where the set elements were stored internally as keys in a dict and the values were not
used. Even when initializing with the same value is necessary, this can be accomplished in 2.4 with
essentially the same performance in one line (dict((i,value) for i in iterable)). The few more
keystrokes are just not worth an extra rarely used method. It looks like an easy target for
removal in python 3K.
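
For comparison, the two spellings side by side (a small sketch that also runs on Python 3; the key list and the value 0 are arbitrary example data):

```python
keys = ['a', 'b', 'c']

# The dedicated alternate constructor.
via_fromkeys = dict.fromkeys(keys, 0)

# The one-line generator-expression equivalent mentioned above.
via_genexp = dict((key, 0) for key in keys)

# With no value given, fromkeys maps every key to None, which is the
# shape the old dict-based set emulations relied on.
as_set_stand_in = dict.fromkeys(keys)
```

Both forms bind every key to the same value object, so neither one protects against sharing a single mutable default.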

George
 
Steven Bethard

Terry said:
> As near as I could tell from the doc, .fromkeys is the only dict method
> that is a classmethod (better, typemethod) rather than an instance method.
> And all list methods are instance methods. And I believe the same is true
> of all number operations (and the corresponding special methods). So
> .fromkeys seems to be an anomaly.
>
> I believe the reason for its existence is that the signature for dict()
> itself was already pretty well 'used up' and Guido preferred to add an
> alternate constructor as a method rather than further complicate the
> signature of dict() by adding a fromkeys flag to signal an alternate
> interpretation of the first and possibly the second parameter.

True enough, and I also agree with George Sakkis's sentiment that
fromkeys() isn't really necessary now that set() is a builtin.

But if classmethods are intended to provide alternate constructors then
one could argue that the functions in itertools are appropriate as they
all produce iterators and are thus something like alternate iter
constructors. Of course you don't want every function that produces an
iterator as a member of the iter type, just like you don't want every
function that produces a dict as a member of the dict type. But I could
see that it might be reasonable to put some of the more commonly used
"alternate constructors" there...

STeVe
 
Terry Reedy

Steven Bethard said:
> True enough, and I also agree with George Sakkis's sentiment that
> fromkeys() isn't really necessary now that set() is a builtin.

So perhaps it will disappear in the future.

> But if classmethods are intended to provide alternate constructors

But I do not remember that being given as a reason for classmethod(). But
I am not sure what was.

Terry J. Reedy
 
Steven Bethard

Terry said:
> But I do not remember that being given as a reason for classmethod(). But
> I am not sure what was.

Well I haven't searched thoroughly, but I know one place that it's
referenced is in descrintro[1]:

"Factoid: __new__ is a static method, not a class method. I initially
thought it would have to be a class method, and that's why I added the
classmethod primitive. Unfortunately, with class methods, upcalls don't
work right in this case, so I had to make it a static method with an
explicit class as its first argument. Ironically, there are now no known
uses for class methods in the Python distribution (other than in the
test suite). However, class methods are still useful in other places,
for example, to program inheritable alternate constructors."

Not sure if this is the only reason though, and even if it is, it might
not be entirely applicable because while the itertools functions may be
supplying alternate constructors, it's not clear why anyone would
subclass iter[2], so the constructors aren't likely to be inherited.
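
To make "inheritable alternate constructors" concrete, here is a short Python 3 sketch. `Point`, `LabeledPoint` and `from_pair` are invented for illustration; only `dict.fromkeys` is a real API.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_pair(cls, pair):
        # cls is whichever class the method was called on, so subclasses
        # get instances of themselves without overriding anything.
        return cls(*pair)


class LabeledPoint(Point):
    label = 'unnamed'


class MyDict(dict):
    pass


point = LabeledPoint.from_pair((1, 2))
# dict.fromkeys behaves the same way for dict subclasses.
mapping = MyDict.fromkeys('ab')
```

A plain function or staticmethod would have returned the base type here, which is the difference the descrintro passage is getting at.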

STeVe

[1] http://www.python.org/2.2.3/descrintro.html#__new__
[2] That is, in the simple case, where iter is still basically a factory
function, not a type wrapper.
 
Ville Vainio

Steven> to be documented as a builtin type. I don't find the
Steven> argument "builtin type methods are hard to find"
Steven> convincing -- the solution here is to fix the
Steven> documentation, not refuse to add builtin types.

Yep - that's why we should perhaps fix the documentation first :).

Steven> I guess the real questions are[1]:
Steven> * How much does iter feel like a type?

Guess this depends on the person. I've never thought of it as a
type. It's too fundamental a concept to coerce into a type, even
though protocol == type in a sense.

Steven> [1] There's also the question of how much you believe in
Steven> OO tenets like "functions closely associated with a type
Steven> should be members of that type"...

The issue that really bothers me here is bloating the builtin
space. We already have an uncomfortable amount of builtin
functions. Of course the additions that have been suggested would not
pollute the builtin namespace, but they would still be there, taking
space. I'd rather see a more modular and 'slimmer' Python, what with
the advent of Python for S60 and other embedded uses.

Perhaps what you need is 'from usefulstuff import *', with usefulstuff
having os, sys, 'itertools as it', &c.
 
Steven Bethard

Ville said:
> The issue that really bothers me here is bloating the builtin
> space. We already have an uncomfortable amount of builtin
> functions. Of course the additions that have been suggested would not
> pollute the builtin namespace, but they would still be there, taking
> space. I'd rather see a more modular and 'slimmer' Python, what with
> the advent of Python for S60 and other embedded uses.

Certainly a valid point. How would you feel about adding just a select
few itertools functions, perhaps just islice, chain and tee? These
functions provide the operations that exist for lists but don't, by
default, exist for iterators: slicing, concatenation and copying.
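
The parallel between the list operations and these three functions looks like this in practice (a small sketch that runs on Python 3; the sample data is arbitrary):

```python
import itertools

data = [0, 1, 2, 3, 4, 5]

# Slicing: data[1:5:2] on a list, islice on an iterator.
list_slice = data[1:5:2]
iter_slice = list(itertools.islice(iter(data), 1, 5, 2))

# Concatenation: + on lists, chain on iterators.
list_concat = [1, 2] + [3]
iter_concat = list(itertools.chain(iter([1, 2]), iter([3])))

# Copying: list(data) on a list, tee on an iterator.
first, second = itertools.tee(iter(data))
copies = (list(first), list(second))
```

One difference from the list versions is that the source iterator should not be used again after `tee`, since the copies draw from it.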

STeVe
 
Ville Vainio

Steven> Certainly a valid point. How would you feel about adding
Steven> just a select few itertools functions, perhaps just
Steven> islice, chain and tee?

A minimal set would not be that offensive, yes. But then we would have
two places to look for itertools functionality, which may not be
desirable.

One thing that might be worth keeping in mind is that some of
itertools functionality is going to become obsolete come py3k
(izip->zip), and some is already (imap). At least such operations
should not be dumped into the builtin iter.
 
Steven Bethard

Ville said:
> A minimal set would not be that offensive, yes. But then we would have
> two places to look for itertools functionality, which may not be
> desirable.

True, though this is currently necessary with str objects if you want to
use, say string.maketrans, so it's not without some precedent. If it's
necessary to leave anything in itertools, my suggestion would be that
the documentation for the iter "type" have a clear "see also" link to
the itertools module.

> One thing that might be worth keeping in mind is that some of
> itertools functionality is going to become obsolete come py3k
> (izip->zip), and some is already (imap). At least such operations
> should not be dumped into the builtin iter.

Yeah, maps and filters are basically obsolete as of generator
expressions. The list of itertools functions that don't seem obsolete
(and won't be made obsolete by Python 3.0):

chain
count
cycle
dropwhile
groupby
islice
repeat
takewhile
tee

As I suggested, I think that chain, islice and tee are tightly coupled
with iterator objects, providing concatenation, slicing and copying
operations. This leaves:

count
cycle
dropwhile
groupby
repeat
takewhile

None of these really have analogs in sequence objects, so I consider
them less tightly tied to iter. I'd probably say that these are more
along the lines of alternate constructors, ala dict.fromkeys. While
they're certainly useful at times, I'd be happy enough to leave them in
itertools if that was the general feeling. Of course I guess I'd be
happy enough to leave everything in itertools if that was the general
feeling (or the BDFL pronouncement). ;)

STeVe
 
David M. Cooke

Steven Bethard said:
> Terry said:
> > But I do not remember that being given as a reason for
> > classmethod(). But I am not sure what was.
>
> Well I haven't searched thoroughly, but I know one place that it's
> referenced is in descrintro[1]:
>
> "Factoid: __new__ is a static method, not a class method. I initially
> thought it would have to be a class method, and that's why I added the
> classmethod primitive. Unfortunately, with class methods, upcalls
> don't work right in this case, so I had to make it a static method
> with an explicit class as its first argument. Ironically, there are
> now no known uses for class methods in the Python distribution (other
> than in the test suite).

Not true anymore, of course (it was in 2.2.3). In 2.3.5, UserDict,
tarfile and some of the Mac-specific modules use classmethod, and the
datetime extension module uses the C version (the METH_CLASS flag).

And staticmethod (and METH_STATIC) aren't used at all in 2.3 or 2.4 :)
[if you ignore __new__]
 
