Python 3000, zip, *args and iterators


Steven Bethard

So, as I understand it, in Python 3000, zip will basically be replaced
with izip, meaning that instead of returning a list, it will return an
iterator. This is great for situations like:

zip(*[iter1, iter2, iter3])

where I want to receive tuples of (item1, item2, item3) from the
iterables. But it doesn't work well for a situation like:

zip(*tuple_iter)

where tuple_iter is an iterator to tuples of the form
(item1, item2, item3) and I want to receive three iterators, one to the
item1s, one to the item2s and one to the item3s. I don't think this
is too unreasonable of a desire as the current zip, in a situation like:

zip(*tuple_list)

where tuple_list is a list of tuples of the form (item1, item2, item3),
returns a list of three tuples, one of the item1s, one of the item2s and
one of the item3s.

Of course, the reason this doesn't work currently is that the fn(*itr)
notation converts 'itr' into a tuple, exhausting the iterator:
>>> def gen(x):
...     for i in xrange(x):
...         yield (i, i+1, i+2)
...     print "exhausted"
...
>>> zip(*gen(4))
exhausted
[(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)]
>>> from itertools import izip
>>> tuple(izip(*gen(4)))
exhausted
((0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5))

What I would prefer is something like:
(<iterator object at ...>, <iterator object at ...>, <iterator object at ...>)

Of course, I can write a separate function that will do what I want
here[1] -- my question is if Python's builtin zip will support this in
Python 3000. It's certainly not a trivial change -- it requires some
pretty substantial backwards-incompatible changes in how *args is
parsed for a function call -- namely that fn(*itr) only extracts as many
of the items in the iterable as necessary, e.g.
>>> import itertools as it
>>> def fn(x, y, *args):
...     print x, y, args
...     print list(it.islice(args, 4))
...
>>> fn(*it.count())
0 1 count(2)
[2, 3, 4, 5]
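
For comparison, the closest approximation under today's semantics is to
hand the live iterator through explicitly (lazy_apply below is just a
hypothetical helper I'm sketching, not an existing function):

import itertools as it

def lazy_apply(fn, n_positional, iterable):
    # Approximate the proposed fn(*itr) semantics under current
    # Python: pull exactly n_positional items off the front and pass
    # the still-live iterator through as the final argument.
    itr = iter(iterable)
    head = list(it.islice(itr, n_positional))
    return fn(*(head + [itr]))

def fn(x, y, args):
    print x, y
    print list(it.islice(args, 4))

lazy_apply(fn, 2, it.count())   # prints "0 1" then "[2, 3, 4, 5]"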

So I guess my real question is, should I expect Python 3000 to play
nicely with *args and iterators? Are there reasons (besides backwards
incompatibility) that parsing *args this way would be bad?


Steve


[1] In fact, with the help of the folks from this list, I did:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302325
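
The recipe boils down to something like the following (a compressed
sketch of the idea rather than the recipe verbatim; it needs Python 2.4
for itertools.tee):

import itertools
import operator

def starzip(iterable):
    # Peek at the first tuple to learn the number of columns, then
    # put it back with chain() so nothing is lost.  (Assumes a
    # non-empty iterable of same-length tuples.)
    itr = iter(iterable)
    first = itr.next()
    itr = itertools.chain([first], itr)
    # One independent copy of the stream per column; each copy lazily
    # yields just its own position.  Note that tee() buffers items
    # until every copy has consumed them, so memory use grows with
    # how far apart the column iterators drift.
    tees = itertools.tee(itr, len(first))
    return tuple(itertools.imap(operator.itemgetter(i), t)
                 for i, t in enumerate(tees))

so that starzip(iter([(1, 'a'), (2, 'b')])) hands back one iterator
over 1, 2 and another over 'a', 'b'.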
 

Terry Reedy

Steven Bethard said:
So, as I understand it, in Python 3000, zip will basically be replaced
with izip, meaning that instead of returning a list, it will return an
iterator.

I think it worth repeating that Python 3 is as yet something of a
pipedream, as indicated by the joke name Python 3000 (that also being in
part a satire on Windows 2000, and the like). So, while Guido has said he
would like to make Python iterator-oriented in the way that it used to be
list-oriented, nothing is set in stone, certainly not the details.

Guido has also said that he would like there to be funding to pay him to
spend a year on its development. He wants to take that long so there will
be adequate discussion, thought, and testing so he can 'get it right' at
least in the sense of having everything work well together.

Terry J. Reedy
 

Steven Bethard

Terry said:
I think it worth repeating that Python 3 is as yet something of a
pipedream, as indicated by the joke name Python 3000 (that also being in
part a satire on Windows 2000, and the like).

True, true. And worth repeating.

So, while Guido has said he
would like to make Python iterator-oriented in the way that it used to be
list-oriented, nothing is set in stone, certainly not the details.

Right, though my understanding of PEP 3000[1] is that even if "Python
3000" itself never exists, the PEP is there as a road-map of where Python
as a language would like to go. I guess the point of my question is to
find out if this kind of nice interaction of *args and iterators is
something that's in the road-map. If it is, then maybe there are parts
of it that could be implemented in a way that's backwards compatible,
even if the full system wouldn't be available for some time. (Perhaps
something along the lines of "from __future__ import iter_args".)

Steve

[1] http://www.python.org/peps/pep-3000.html
 

Terry Reedy

Right, though my understanding of PEP 3000[1] is that even if "Python
3000" itself never exists, the PEP is there as a road-map of where Python as
a language would like to go.

A major backwards compatibility break will not happen without a major
number change to Py3. And I expect it to happen -- the 'as yet' was
intentional. In fact, here is my New Year's prediction (with subjective
certainty > .5):

a. The PyPy project will succeed.
b. Python3 (actually, the reference implementation thereof) will be written
in Python3 (perhaps with 'draft' in Py2).
c. We will see it within 5 years.

We will see if I am any better than the tabloid 'psychics'.

I guess the point of my question is to find out if this kind of nice
interaction of *args and iterators is something that's in the road-map.
If it is, then maybe there are parts of it that could be implemented in a
way that's backwards compatible, even if the full system wouldn't be
available for some time. (Perhaps something along the lines of "from
__future__ import iter_args".)

You can certainly share your concerns with the PEP author. I believe that
there is also a PyWiki page that you can directly add to.

Terry J. Reedy
 

Steven Bethard

Terry said:
You can certainly share your concerns with the PEP author. I believe that
there is also a PyWiki page that you can directly add to.

Yeah, I found the wiki page too[1]. Does anyone know if it's okay to
add things to this page? I had avoided doing so since it gives as its
description "This page lists features that GvR has mentioned as goals
for Python 3.0" which sounds like it's not intended for commentary by
the general Python community.

Maybe I should start a Python3.0Wishlist page?

Steve

[1] http://www.python.org/moin/Python3_2e0

P.S. I thought about posting to python-dev where GvR might hear directly
about this kind of thing, but it seems a little premature since most
predictions put Python 3.0 at least 3-5 years from now.
 

Raymond Hettinger

[Steven Bethard]
What I would prefer is something like:

(<iterator object at ...>, <iterator object at ...>, <iterator object
at ...>)
[...]
So I guess my real question is, should I expect Python 3000 to play
nicely with *args and iterators? Are there reasons (besides backwards
incompatibility) that parsing *args this way would be bad?
[...]
In fact, with the help of the folks from this list, I did:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302325

* The answer to the first question is Yes. The point of Python 3000 is
building on what was learned and writing a simpler, cleaner language
without the encumbrance of backwards compatibility.

* However, IMHO, the proposed behavior doesn't qualify as "playing
nicely".

* Your excellent recipe provides a good basis for discussion and it
highlights some of the issues around the proposed behavior:

1. The current implementation's behavior is easy to learn, easy to
explain, and does what most folks expect (not folks who are pushing the
iterator and *arg protocols to the outer limits). In contrast, the
proposed recipe is somewhat complex and its implications are not
immediately obvious. The itertools.tee() component is of extra concern
because it invisibly introduces memory intensive characteristics into
an otherwise lightweight, low-overhead function.
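
To make that concern concrete (a quick illustration, not from the
recipe):

from itertools import tee

a, b = tee(xrange(1000000))
total = sum(a)   # draining `a` before touching `b` forces tee() to
                 # buffer all million items internally until `b`
                 # catches up -- the lightweight look is deceptive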

2. It is instructive to look at Guido's reactions to other *args
proposals. His receptivity to a,b,*c=it wanes whenever someone then
requests support for a,*b,c=it. Likewise, he considers zip(*args) as a
transpose function to be an abuse of the *arg protocol. IOW,
supporting "odd" usages does not bode well for a proposal.

3. The recipe discussion and newsgroup posting present only toy
examples -- real use cases have not yet emerged. If some do emerge, I
suspect that each problem will have a better solution (using existing
tools) than the one being proposed. If so, then adopting the proposal
will have the negative effect of leading folks away from the correct
solution.


Raymond Hettinger


"Not everything that can be done, should be done."
 

Alex Martelli

Raymond Hettinger said:
"Not everything that can be done, should be done."

Or, to quote Scripture...:

"'Everything is permissible for me' -- but not everything is beneficial"
(1 Cor 6:12)...


Alex
 

Steve Holden

Raymond Hettinger wrote:
[...]
"Not everything that can be done, should be done."

... and not everything that should be done, can be done.

regards
Steve
 

Steven Bethard

Raymond said:
[Steven Bethard]
What I would prefer is something like:


(<iterator object at ...>, <iterator object at ...>, <iterator object
at ...>)

2. It is instructive to look at Guido's reactions to other *args
proposals. His receptivity to a,b,*c=it wanes whenever someone then
requests support for a,*b,c=it.

Yeah, I've seen his responses to those kind of suggestions. I don't
think what I'm suggesting (at least in terms of *args) is quite as
extreme though -- I'm still only talking about *args in function
definitions. I'm just suggesting that in a function with a *args in the
def, the args variable be an iterator instead of a tuple. (This doesn't
entirely solve my zip problem of course, but it's the only *args change
I was suggesting.)

Likewise, he considers zip(*args) as a
transpose function to be an abuse of the *arg protocol.

Ahh, I didn't know that. Is there another (preferred) way to do this?

3. The recipe discussion and newsgroup posting present only toy
examples -- real use cases have not yet emerged.

Ok, I'll try to give you one of my use cases. It's a little
complicated, so sorry if my explanation goes on for a bit here.

Basically, I'm parsing one file format to another. The files can be
quite large, so it's important to use iterators wherever possible. My
conversion function is a generator that generates a (label,
feature_dict) pair for each line in the input file.

Now, two possible things can happen at this point (depending on
parameters from the user):

CASE 1: I output the (label, feature_dict) pairs as is, with code
something like:

for label, feature_dict in generator:
    write_instance(label, feature_dict)

This is, of course, the simple case.

CASE 2: I need to apply a windowing function to the iterables so that
each line includes not only its feature_dict's values, but also the
values of some of the surrounding feature_dicts. Note that I only want
to window the feature_dicts, not the labels. This gives me code
something like:

labels, feature_dicts = starzip(generator)
for label, feature_window in izip(labels, window(feature_dicts)):
    write_instance(label, combine_dicts(feature_window))

Note that I can't write the code like:

for label, feature_dict in generator:
    feature_dict = combine_dicts(window(feature_dict))  # WRONG!
    write_instance(label, feature_dict)

because window produces an iterable from an *iterable* of feature_dicts,
not from a single feature_dict. So basically what I've done here is to
"transpose" (to use your word) the iterators, apply my function, and
then transpose the iterators back.
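
(window here is something like the following sketch; the details don't
matter, only that it consumes an *iterable* of feature_dicts rather
than a single one:)

from collections import deque
from itertools import islice

def window(iterable, size=3):
    # Yield overlapping tuples of `size` consecutive items, one per
    # step, without ever materializing the whole stream.  Just a
    # sketch -- a real feature window would likely also pad at the
    # edges so output length matches input length.
    itr = iter(iterable)
    win = deque(islice(itr, size))
    if len(win) == size:
        yield tuple(win)
    for item in itr:
        win.popleft()
        win.append(item)
        yield tuple(win)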


Hopefully this gives a little better justification for starzip? If you
have a cleaner way to do this kind of thing, I'd welcome any suggestions
of course.


If zip(*) is discouraged as a transpose function, maybe I should be
lobbying for adding a transpose function instead? (For now, of course,
it would go into itertools, but when iterators become the standard in
Python 3.0, maybe it could be moved into the builtins...)


Thanks for your comments!

Steve
 

Raymond Hettinger

[Steven Bethard] I'm just suggesting that in a function with a
*args in the def, the args variable be an iterator instead of
a tuple.

So people would lose the useful abilities to check len(args) or extract
an argument with args[1]?

Besides, if a function really wants an iterator, then its signature
should accept one directly -- no need for the star operator.
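
That is, under current semantics the earlier fn example is simply (a
sketch):

import itertools as it

def fn(x, y, args):            # args is an ordinary parameter here
    print x, y
    print list(it.islice(args, 4))

fn(0, 1, it.count(2))          # no star operator, nothing exhausted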


Ahh, I didn't know that. Is there another (preferred) way to do
this?

I prefer the abusive approach ;-) however, the Right Way (tm) is
probably nested list comps or just plain for-loops. And, if you have
Numeric, there is an obvious preferred approach.


So basically what I've done here is to
"transpose" (to use your word) the iterators, apply my function, and
then transpose the iterators back.

If you follow the data movements, you'll find that iterators provide no
advantage here. To execute transpose(map(f, transpose(iterator))), the
whole iterator necessarily has to be read into memory so that the first
function application will have all of its arguments present -- using
the star operator only obscures that fact.

Realizing that the input has to be in memory anyway, you might as
well take advantage of the code simplification offered by indexing:

>>> def transposed_map(f, iterable):
...     data = list(iterable)
...     rows = range(len(data))
...     for col in xrange(len(data[0])):
...         args = [data[row][col] for row in rows]
...         yield f(*args)
...
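
For example, the column-wise maxima of two rows:

>>> list(transposed_map(max, [(0, 1, 2), (3, 4, 5)]))
[3, 4, 5]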



Raymond Hettinger
 

Steven Bethard

Raymond said:
[Steven Bethard] I'm just suggesting that in a function with a
*args in the def, the args variable be an iterator instead of
a tuple.


So people would lose the useful abilities to check len(args) or extract
an argument with args[1]?

No more than you lose these abilities with any other iterators:

def f(x, y, *args):
    args = list(args)  # or tuple(args)
    if len(args) == 3:
        print args[0], args[1], args[2]

True, if you do want to check argument counts, this is an extra step of
work. I personally find that most of my functions with *args parameters
look like:

def f(x, y, *args):
    do_something1(x)
    do_something2(y)
    for arg in args:
        do_something3(arg)

where having *args be an iterable would not be a problem.

If you follow the data movements, you'll find that iterators provide no
advantage here. To execute transpose(map(f, transpose(iterator))), the
whole iterator necessarily has to be read into memory so that the first
function application will have all of its arguments present -- using
the star operator only obscures that fact.

I'm not sure I follow you here. Looking at my code:

labels, feature_dicts = starzip(generator)
for label, feature_window in izip(labels, window(feature_dicts)):
    write_instance(label, combine_dicts(feature_window))

A few points:

(1) starzip uses itertools.tee, so it is not going to read the entire
contents of the generator into memory at once, as long as the two
parallel iterators do not get far out of sync

(2) window does not exhaust the iterator passed to it; instead, it uses
the items of that iterator to generate a new iterator in sync with the
original, so izip(labels, window(feature_dicts)) will keep the labels
and feature_dicts iterators in sync.

(3) the for loop just iterates over the izip iterator, so it should be
consuming (label, feature_window) pairs in sync.

I assume you disagree with one of these points or you wouldn't say that
"iterators provide no advantage here". Could you explain what doesn't
work here?

Steve
 
