defining the behavior of zip(it, it) (WAS: Converting a flat list...)

Steven Bethard

[Duncan Booth]
> >>> aList = ['a', 1, 'b', 2, 'c', 3]
> >>> it = iter(aList)
> >>> zip(it, it)
> [('a', 1), ('b', 2), ('c', 3)]

[Alan Isaac]

[Bengt Richter]
> That says
> """
> ii. The other problem is easier to explain by example.
> Let it=iter([1,2,3,4]).
> What is the result of zip(*[it]*2)?
> The current answer is: [(1,2),(3,4)],
> but it is impossible to determine this from the docs,
> which would allow [(1,3),(2,4)] instead (or indeed
> other possibilities).
> """
> IMO left->right is useful enough to warrant making it defined
> behaviour

And in fact, it is defined behavior for itertools.izip() [1].

I don't see why it's such a big deal to make it defined behavior for
zip() too.

STeVe

[1]http://docs.python.org/lib/itertools-functions.html#l2h-1392
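
For anyone who wants to try this at the prompt, here is a minimal sketch of the idiom in question. It assumes Python 3 syntax (zip() returns an iterator there, hence the list() call), which is not what this thread was written against, and it shows what CPython does in practice rather than what the zip() documentation of the time promised. As far as I know, later editions of the library reference do state that the left-to-right evaluation order of the iterables is guaranteed.

```python
a_list = ['a', 1, 'b', 2, 'c', 3]

# One iterator object passed twice: each output tuple pulls two
# consecutive items from it, leftmost argument first.
it = iter(a_list)
pairs = list(zip(it, it))
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]

# The zip(*[it]*2) spelling from the quoted bug report is the same call.
it = iter([1, 2, 3, 4])
pairs_from_report = list(zip(*[it] * 2))
print(pairs_from_report)  # [(1, 2), (3, 4)]
```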
 
rhettinger

> > """
> > ii. The other problem is easier to explain by example.
> > Let it=iter([1,2,3,4]).
> > What is the result of zip(*[it]*2)?
> > The current answer is: [(1,2),(3,4)],
> > but it is impossible to determine this from the docs,
> > which would allow [(1,3),(2,4)] instead (or indeed
> > other possibilities).
> > """
> > IMO left->right is useful enough to warrant making it defined
> > behaviour
>
> And in fact, it is defined behavior for itertools.izip() [1].
>
> I don't see why it's such a big deal to make it defined behavior for
> zip() too.

IIRC, this was discussed and rejected in an SF bug report. It should not
be a defined behavior for several reasons:

* It is not communicative to anyone reading the code that zip(it, it)
is creating a sequence of the form (it0, it1), (it2, it3), . . . IOW,
it conflicts with Python's style of plain-speaking.
* It is too clever by far -- much more of a trick than a technique.
* It is bug-prone -- zip(x,x) behaves differently when x is a sequence
and when x is an iterator (because of restartability). Don't leave
landmines for your code maintainers.
* The design spirit of zip() and related functions is from the world of
functional programming where a key virtue is avoidance of side-effects.
Writing zip(it, it) is exploiting a side-effect of the implementation
and its interaction with iterator inputs. The real spirit of zip() is
having multiple sequences translated to grouped sequences out -- the
order of application (and order of retrieving inputs) should be
irrelevant.
* Currently, a full understanding of zip() can be had by remembering
that it maps zip(a, b) to (a0, b0), (a1, b1), . . . . That is simple
to learn and easily remembered. In contrast, it introduces unnecessary
complexity to tighten the definition to also include the order of
application to cover the special case of zip being used for windowing.
IOW, making this a defined behavior results in making the language
harder to learn and remember.

Overall, I think anyone using zip(it,it) is living in a state of sin,
drawn to the temptations of one-liners and premature optimization. They
are forsaking obvious code in favor of screwy special cases. The
behavior has been left undefined for a reason.


Raymond Hettinger
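
The "bug-prone" point in the list above can be seen in a few lines. This is only a sketch, written in Python 3 syntax (an assumption, since the thread predates it), of how the same call shape gives different results depending on whether x is a sequence or an iterator:

```python
x = [1, 2, 3, 4]

# x is a sequence: zip() iterates over it twice, independently,
# so every element is paired with itself.
from_sequence = list(zip(x, x))

# x is an iterator: both arguments share a single position,
# so consecutive elements are paired instead.
shared = iter(x)
from_iterator = list(zip(shared, shared))

print(from_sequence)  # [(1, 1), (2, 2), (3, 3), (4, 4)]
print(from_iterator)  # [(1, 2), (3, 4)]
```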
 
bonono

> IIRC, this was discussed and rejected in an SF bug report. It should not
> be a defined behavior for several reasons:
>
> * It is not communicative to anyone reading the code that zip(it, it)
> is creating a sequence of the form (it0, it1), (it2, it3), . . . IOW,
> it conflicts with Python's style of plain-speaking.
> * It is too clever by far -- much more of a trick than a technique.
> * It is bug-prone -- zip(x,x) behaves differently when x is a sequence
> and when x is an iterator (because of restartability). Don't leave
> landmines for your code maintainers.
> * The design spirit of zip() and related functions is from the world of
> functional programming where a key virtue is avoidance of side-effects.
> Writing zip(it, it) is exploiting a side-effect of the implementation
> and its interaction with iterator inputs. The real spirit of zip() is
> having multiple sequences translated to grouped sequences out -- the
> order of application (and order of retrieving inputs) should be
> irrelevant.
> * Currently, a full understanding of zip() can be had by remembering
> that it maps zip(a, b) to (a0, b0), (a1, b1), . . . . That is simple
> to learn and easily remembered. In contrast, it introduces unnecessary
> complexity to tighten the definition to also include the order of
> application to cover the special case of zip being used for windowing.
> IOW, making this a defined behavior results in making the language
> harder to learn and remember.
>
> Overall, I think anyone using zip(it,it) is living in a state of sin,
> drawn to the temptations of one-liners and premature optimization. They
> are forsaking obvious code in favor of screwy special cases. The
> behavior has been left undefined for a reason.

While I agree that zip() should be left as it is (by definition it is
only supposed to step through N iterables in lock step), I don't think
zip(it,it) is used purely for the sake of a one-liner. It conveys the
intent more clearly to me than a for loop does (which, IMO, is the reason
for many one-liners). It tells me exactly what I want:

make a "window/mask" of N slots, put it on top of a list, and fold.
The result is strips of the list, each with N elements.

zip(it,it,it,it) is a mask of 4 slots
zip(it,it,it,it,it) is a mask of 5 slots

Though in this case it is still not 100% correct: if my list has an odd
number of elements, zip(it,it) would drop the last one, but that was
not defined in the original question.
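
A rough sketch of the "mask of N slots" idea, including the leftover-element case. It is written in Python 3 syntax, and the helper names windows() and windows_padded() are made up for illustration; itertools.zip_longest is the standard library tool used here for padding, which is one possible way to keep the tail, not something proposed in this thread.

```python
from itertools import zip_longest


def windows(seq, n):
    """Group seq into n-tuples; a trailing partial group is dropped."""
    it = iter(seq)
    # The same iterator object fills all n slots of each tuple.
    return list(zip(*[it] * n))


def windows_padded(seq, n, fill=None):
    """Group seq into n-tuples; a trailing partial group is padded with fill."""
    it = iter(seq)
    return list(zip_longest(*[it] * n, fillvalue=fill))


data = [1, 2, 3, 4, 5]
print(windows(data, 2))         # [(1, 2), (3, 4)]
print(windows_padded(data, 2))  # [(1, 2), (3, 4), (5, None)]
```

Whether dropping or padding the tail is right depends on the data, which is why the two helpers are kept separate.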
 
rurpy

> > > """
> > > ii. The other problem is easier to explain by example.
> > > Let it=iter([1,2,3,4]).
> > > What is the result of zip(*[it]*2)?
> > > The current answer is: [(1,2),(3,4)],
> > > but it is impossible to determine this from the docs,
> > > which would allow [(1,3),(2,4)] instead (or indeed
> > > other possibilities).
> > > """
> > > IMO left->right is useful enough to warrant making it defined
> > > behaviour
> >
> > And in fact, it is defined behavior for itertools.izip() [1].
> >
> > I don't see why it's such a big deal to make it defined behavior for
> > zip() too.
>
> IIRC, this was discussed and rejected in an SF bug report. It should not
> be a defined behavior for several reasons:
>
> * It is not communicative to anyone reading the code that zip(it, it)
> is creating a sequence of the form (it0, it1), (it2, it3), . . . IOW,
> it conflicts with Python's style of plain-speaking.

Seems pretty plain to me, and I am far from a Python expert.

> * It is too clever by far -- much more of a trick than a technique.

I see tons of tricks posted every day. There is a time and place
for them.

> * It is bug-prone -- zip(x,x) behaves differently when x is a sequence
> and when x is an iterator (because of restartability). Don't leave
> landmines for your code maintainers.

Err thanks for the advice, but they are *my* code maintainers and
I am the best judge of what constitutes a landmine.

> * The design spirit of zip() and related functions is from the world of
> functional programming where a key virtue is avoidance of side-effects.
> Writing zip(it, it) is exploiting a side-effect of the implementation
> and its interaction with iterator inputs. The real spirit of zip() is
> having multiple sequences translated to grouped sequences out -- the
> order of application (and order of retrieving inputs) should be
> irrelevant.

I'm not sure what to say when people start talking about "spirit"
when making technical decisions.

> * Currently, a full understanding of zip() can be had by remembering
> that it maps zip(a, b) to (a0, b0), (a1, b1), . . . . That is simple
> to learn and easily remembered. In contrast, it introduces unnecessary
> complexity to tighten the definition to also include the order of
> application to cover the special case of zip being used for windowing.
> IOW, making this a defined behavior results in making the language
> harder to learn and remember.

I don't see how defining zip()'s behavior more precisely interferes
at all with this understanding.

> Overall, I think anyone using zip(it,it) is living in a state of sin,
> drawn to the temptations of one-liners and premature optimization. They
> are forsaking obvious code in favor of screwy special cases. The
> behavior has been left undefined for a reason.

I really don't understand this point of view. It is exactly what
I was just complaining about in a different thread. Every one
of your reasons (except the last, which is very weak anyway) is
based on how you think someone may use zip if you define
fully what it does.

Why do you feel compelled to tell me how I should or
shouldn't write my code?!

It is this attitude of "we know what is best for you, child"
that really pisses me off about Python sometimes. Please
make language decisions based on technical considerations
and not how you want me to use the language. Believe it
or not, I can make those decisions on my own.
 
bonono

> Err thanks for the advice, but they are *my* code maintainers and
> I am the best judge of what constitutes a landmine.

:)

> I'm not sure what to say when people start talking about "spirit"
> when making technical decisions.

I am with Raymond on this one though. It is not really about spirit but
the definition of zip(). How it would interact with "it", whatever it
is, really has nothing to do with the functionality of zip(), and there
is no technical reason why it would do it one way or another. zip()
only guarantees zip(a,b) to be [(a0,b0), (a1,b1) ...], no more, no less.

> I don't see how defining zip()'s behavior more precisely interferes
> at all with this understanding.

See above.

> I really don't understand this point of view. It is exactly what
> I was just complaining about in a different thread. Every one
> of your reasons (except the last, which is very weak anyway) is
> based on how you think someone may use zip if you define
> fully what it does.
>
> Why do you feel compelled to tell me how I should or
> shouldn't write my code?!
>
> It is this attitude of "we know what is best for you, child"
> that really pisses me off about Python sometimes. Please
> make language decisions based on technical considerations
> and not how you want me to use the language. Believe it
> or not, I can make those decisions on my own.

That is so central to Python though ;-)

But I think it is implemented in a very interesting way, depending on
some magical thing in the head of the creator.

Unlike a language like Haskell, where you must adhere or else your
program won't run (Haskell compiler writers consider it a BUG in the
language if you find a trick to corner it), Python allows you to do a
lot of tricky things yet has lots of interesting idioms telling you
not to.

But that to me is one thing I like about Python: I can ignore the idiom
and do my work.
 
rurpy

> > Err thanks for the advice, but they are *my* code maintainers and
> > I am the best judge of what constitutes a landmine.
>
> :)
>
> > I'm not sure what to say when people start talking about "spirit"
> > when making technical decisions.
>
> I am with Raymond on this one though. It is not really about spirit but
> the definition of zip(). How it would interact with "it", whatever it
> is, really has nothing to do with the functionality of zip(), and there
> is no technical reason why it would do it one way or another. zip()
> only guarantees zip(a,b) to be [(a0,b0), (a1,b1) ...], no more, no less.

I don't have a problem with that. That is a reason based on the
intrinsic behavior that zip() should have, without regard to how
someone might or might not use it. In contrast, Raymond's
arguments were based on how zip() might be used if it were
more precisely defined. I did not take any position on how zip's
behavior should be defined; my only beef was with the arguments
used in support of not defining it.

I am not sure if I looked at the same SF bug that Raymond referred
to, but the submitter also was not asking for a change of behavior,
only for either documenting how zip() behaves or documenting it as
undefined. Seemed like a reasonable request to me.

> See above.

Sorry, I still don't see that defining the order in which the argument
elements are processed makes zip() harder to understand in any
real material sense.

> That is so central to Python though ;-)
>
> But I think it is implemented in a very interesting way, depending on
> some magical thing in the head of the creator.
>
> Unlike a language like Haskell, where you must adhere or else your
> program won't run (Haskell compiler writers consider it a BUG in the
> language if you find a trick to corner it), Python allows you to do a
> lot of tricky things yet has lots of interesting idioms telling you
> not to.

Haskell is high on the queue to be learned (along with OCaml) but
I would not consider either a general-purpose programming language
in the sense of C, Java, or (almost) Python. (But maybe I am wrong.)
So I would not be surprised to find them to be built around, and
enforce, a very specific programming style.

> But that to me is one thing I like about Python: I can ignore the idiom
> and do my work.

Well, I do too mostly. On rereading my post, it seems I overreacted
a bit. But the attitude I complained about I think is real, and has
led to more serious flaws like the missing if-then-else expression,
something I use in virtually every piece of code I write, and which
increases readability. (Well, OK, that is not the end of the world
either, but its lack is irritating as hell, and yes, I know that it is
now back in favor.)
 
bonono

> Well, I do too mostly. On rereading my post, it seems I overreacted
> a bit. But the attitude I complained about I think is real, and has
> led to more serious flaws like the missing if-then-else expression,
> something I use in virtually every piece of code I write, and which
> increases readability. (Well, OK, that is not the end of the world
> either, but its lack is irritating as hell, and yes, I know that it is
> now back in favor.)

I have the same feeling too, which I coin the "personality" of the
group and the language in general :)
 
rurpy

> I have the same feeling too, which I coin the "personality" of the
> group and the language in general :)

Yes, I've often thought I'd like to study sociology, using the internet
as a laboratory.

I know that I'm tilting at windmills, but I am optimistic enough to hope
that maybe some small change will result.

The problem is that Python is really a very good language, which
makes its flaws stand out all the more.
 
Steven Bethard

> > > """
> > > ii. The other problem is easier to explain by example.
> > > Let it=iter([1,2,3,4]).
> > > What is the result of zip(*[it]*2)?
> > > The current answer is: [(1,2),(3,4)],
> > > but it is impossible to determine this from the docs,
> > > which would allow [(1,3),(2,4)] instead (or indeed
> > > other possibilities).
> > > """
> > > IMO left->right is useful enough to warrant making it defined
> > > behaviour
> >
> > And in fact, it is defined behavior for itertools.izip() [1].
> >
> > I don't see why it's such a big deal to make it defined behavior for
> > zip() too.
>
> IIRC, this was discussed and rejected in an SF bug report. It should not
> be a defined behavior for several reasons:

[snip arguments about how confusing zip(it, it) is]

> Overall, I think anyone using zip(it,it) is living in a state of sin,
> drawn to the temptations of one-liners and premature optimization. They
> are forsaking obvious code in favor of screwy special cases. The
> behavior has been left undefined for a reason.

Then why document itertools.izip() as it is? The documentation there is
explicit enough to know that izip(it, it) will work as intended. Should
we make the documentation there less explicit to discourage people from
using the izip(it, it) idiom?

STeVe
 
bonono

Steven said:
> > > > """
> > > > ii. The other problem is easier to explain by example.
> > > > Let it=iter([1,2,3,4]).
> > > > What is the result of zip(*[it]*2)?
> > > > The current answer is: [(1,2),(3,4)],
> > > > but it is impossible to determine this from the docs,
> > > > which would allow [(1,3),(2,4)] instead (or indeed
> > > > other possibilities).
> > > > """
> > > > IMO left->right is useful enough to warrant making it defined
> > > > behaviour
> > >
> > > And in fact, it is defined behavior for itertools.izip() [1].
> > >
> > > I don't see why it's such a big deal to make it defined behavior for
> > > zip() too.
> >
> > IIRC, this was discussed and rejected in an SF bug report. It should not
> > be a defined behavior for several reasons:
>
> [snip arguments about how confusing zip(it, it) is]
>
> > Overall, I think anyone using zip(it,it) is living in a state of sin,
> > drawn to the temptations of one-liners and premature optimization. They
> > are forsaking obvious code in favor of screwy special cases. The
> > behavior has been left undefined for a reason.
>
> Then why document itertools.izip() as it is? The documentation there is
> explicit enough to know that izip(it, it) will work as intended. Should
> we make the documentation there less explicit to discourage people from
> using the izip(it, it) idiom?

That to me is also a slip, but it does demonstrate that it is easy for
people to be drawn into this "sin", including those people responsible
for the formal documentation, or those behind the implementation.

But technically speaking, you are still referring to the implementation
detail of izip(), not the functionality of izip().

I do now agree with another poster that the documentation of both zip
and izip should state clearly that the order of picking from which
iterable is undefined or can be changed from implementation to
implementation, to avoid this kind of temptation.
 
Fredrik Lundh

> led to more serious flaws like the missing if-then-else expression,
> something I use in virtually every piece of code I write, and which
> increases readability.

you obviously need to learn more Python idioms. Python works better
if you use it to write Python code; not when you mechanically translate
stuff written in other languages to Python.

> (Well, OK, that is not the end of the world either, but its lack is irritating
> as hell, and yes, I know that it is now back in favor.)

the thing that's in favour is "then-if-else", not "if-then-else".

</F>
 
Antoon Pardon

On 2005-11-23, Fredrik Lundh wrote:
> you obviously need to learn more Python idioms. Python works better
> if you use it to write Python code; not when you mechanically translate
> stuff written in other languages to Python.

What does this mean?

It could mean that Python works better with those concepts that are
already implemented in Python. That seems obvious, but isn't
an argument for or against implementing a particular language
feature.

It could also mean that some language feature will never work well
in Python even when implemented. Are you arguing that a conditional
expression is such a feature?

> the thing that's in favour is "then-if-else", not "if-then-else".

Well, I don't know about the previous poster, but I'm mostly interested
in a conditional expression. Whether it is "then-if-else" or "if-then-else"
seems less important to me.
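
For reference, the "then-if-else" ordering mentioned above is, as far as I know, the conditional expression accepted in PEP 308 and available since Python 2.5, where the value comes first and the condition sits in the middle. A small sketch; the function names are made up for illustration:

```python
def describe(n):
    # Conditional expression: <value if true> if <condition> else <value if false>
    return "even" if n % 2 == 0 else "odd"


def describe_with_statement(n):
    # The same logic spelled as an if/else statement.
    if n % 2 == 0:
        return "even"
    else:
        return "odd"


print(describe(3), describe(4))  # odd even
```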
 
Dave Hansen

(e-mail address removed) wrote: [...]
> > IIRC, this was discussed and rejected in an SF bug report. It should not
> > be a defined behavior for several reasons:
>
> [snip arguments about how confusing zip(it, it) is]
>
> > Overall, I think anyone using zip(it,it) is living in a state of sin,
> > drawn to the temptations of one-liners and premature optimization. They
> > are forsaking obvious code in favor of screwy special cases. The
> > behavior has been left undefined for a reason.
>
> Then why document itertools.izip() as it is? The documentation there is
> explicit enough to know that izip(it, it) will work as intended. Should
> we make the documentation there less explicit to discourage people from
> using the izip(it, it) idiom?

ISTM that one would use itertools.izip in order to get some
functionality not available from zip. Perhaps this is one of those
bits of functionality.

But I admit, I'm not all that familiar with itertools...

In any case, the solution seems obvious: if you want the guarantee,
use the tool that provides it.

Regards,
-=Dave
 
rhettinger

[Steven Bethard]
[Dave Hansen]
> In any case, the solution seems obvious: if you want the guarantee,
> use the tool that provides it.

True enough :)

FWIW, the itertools documentation style was intended more as a learning
device than as a specification. I combined regular documentation,
approximately equivalent generator code, examples, and recipes.
Hopefully, reading the module docs creates an understanding of what the
tools do, how to use them, how to combine them, and how to roll your
own to extend the toolset. Another goal was providing code fragments
to support scripts needing to run on Py2.2 (itertools were introduced
in Py2.3).


Raymond
 
Fredrik Lundh

Steven said:
> Then why document itertools.izip() as it is? The documentation there is
> explicit enough to know that izip(it, it) will work as intended. Should
> we make the documentation there less explicit to discourage people from
> using the izip(it, it) idiom?

depends on whether you interpret "equivalent" as "having similar effects"
or "corresponding or virtually identical especially in effect or function" or
if you prefer some other dictionary definition...

because there are of course plenty of subtle differences between a Python
generator and a C type implementation. let's see...
<function izip2 at 0x00A26670>

alright, close enough.
Traceback (most recent call last):
File said:
'izip'

hmm.
...     def next(self):
...         raise ValueError("oops!")
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: izip argument #1 must support iteration

oops.
...     def __iter__(self):
...         return self
...     def next(self):
...         raise ValueError("oops!")
...
<generator object at 0x00A2AB48>

that's better. now let's run it:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "i.py", line 6, in izip2
result = [i.next() for i in iterables]
File "<stdin>", line 5, in next
ValueError: oops!

different stack depths. hmm.

so how equivalent must something be to be equivalent?

</F>
 
bonono

yet another :)

Fredrik said:
> Steven said:
> > Then why document itertools.izip() as it is? The documentation there is
> > explicit enough to know that izip(it, it) will work as intended. Should
> > we make the documentation there less explicit to discourage people from
> > using the izip(it, it) idiom?
>
> depends on whether you interpret "equivalent" as "having similar effects"
> or "corresponding or virtually identical especially in effect or function" or
> if you prefer some other dictionary definition...
>
> because there are of course plenty of subtle differences between a Python
> generator and a C type implementation. let's see...
> <function izip2 at 0x00A26670>
>
> alright, close enough.
> Traceback (most recent call last):
> File said:
> 'izip'
>
> hmm.
> ...     def next(self):
> ...         raise ValueError("oops!")
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: izip argument #1 must support iteration
>
> oops.
> ...     def __iter__(self):
> ...         return self
> ...     def next(self):
> ...         raise ValueError("oops!")
> ...
> <generator object at 0x00A2AB48>
>
> that's better. now let's run it:
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "i.py", line 6, in izip2
> result = [i.next() for i in iterables]
> File "<stdin>", line 5, in next
> ValueError: oops!
>
> different stack depths. hmm.
>
> so how equivalent must something be to be equivalent?
>
> </F>
 
Steven Bethard

> Steven said:
> > > > > """
> > > > > ii. The other problem is easier to explain by example.
> > > > > Let it=iter([1,2,3,4]).
> > > > > What is the result of zip(*[it]*2)?
> > > > > The current answer is: [(1,2),(3,4)],
> > > > > but it is impossible to determine this from the docs,
> > > > > which would allow [(1,3),(2,4)] instead (or indeed
> > > > > other possibilities).
> > > > > """
> > > > > IMO left->right is useful enough to warrant making it defined
> > > > > behaviour
> > > >
> > > > And in fact, it is defined behavior for itertools.izip() [1].
> > > >
> > > > I don't see why it's such a big deal to make it defined behavior for
> > > > zip() too.
> > >
> > > IIRC, this was discussed and rejected in an SF bug report. It should not
> > > be a defined behavior for several reasons:
> >
> > [snip arguments about how confusing zip(it, it) is]
> >
> > > Overall, I think anyone using zip(it,it) is living in a state of sin,
> > > drawn to the temptations of one-liners and premature optimization. They
> > > are forsaking obvious code in favor of screwy special cases. The
> > > behavior has been left undefined for a reason.
> >
> > Then why document itertools.izip() as it is? The documentation there is
> > explicit enough to know that izip(it, it) will work as intended. Should
> > we make the documentation there less explicit to discourage people from
> > using the izip(it, it) idiom?

[snip]

> But technically speaking, you are still referring to the implementation
> detail of izip(), not the functionality of izip().
>
> I do now agree with another poster that the documentation of both zip
> and izip should state clearly that the order of picking from which
> iterable is undefined or can be changed from implementation to
> implementation, to avoid this kind of temptation.

Actually, it's part of the specification. Read the itertools
documentation[1]:

"""
izip(*iterables)
    Make an iterator that aggregates elements from each of the
    iterables. Like zip() except that it returns an iterator instead of a
    list. Used for lock-step iteration over several iterables at a time.
    Equivalent to:

        def izip(*iterables):
            iterables = map(iter, iterables)
            while iterables:
                result = [i.next() for i in iterables]
                yield tuple(result)
"""

So technically, since itertools.izip() is "equivalent to" the Python
code above, it is part of the specification, not the implementation.

But I certainly understand Raymond's point -- the code in the itertools
documentation there serves a number of purposes other than just
documenting the behavior.

[1]http://docs.python.org/lib/itertools-functions.html#l2h-1392

STeVe
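
To make the argument concrete, here is a rough Python 3 port of the quoted "equivalent" code. It is a sketch only, with two assumptions of mine that are not in the docs: the name izip_sketch is made up, and the explicit try/except is there because modern generators (PEP 479) turn a StopIteration that escapes the generator body into a RuntimeError. Because the inner loop asks each argument for one item in left-to-right order, passing the same iterator twice necessarily pairs consecutive elements.

```python
def izip_sketch(*iterables):
    """Generator approximation of izip(); not the real implementation."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        # Pull one item from each argument, leftmost first.
        for it in iterators:
            try:
                result.append(next(it))
            except StopIteration:
                # The first exhausted argument ends the whole iteration.
                return
        yield tuple(result)


it = iter([1, 2, 3, 4])
print(list(izip_sketch(it, it)))  # [(1, 2), (3, 4)]
```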
 
Fredrik Lundh

> [Dave Hansen]
> > In any case, the solution seems obvious: if you want the guarantee,
> > use the tool that provides it.
>
> True enough :)
>
> FWIW, the itertools documentation style was intended more as a learning
> device than as a specification. I combined regular documentation,
> approximately equivalent generator code, examples, and recipes.
> Hopefully, reading the module docs creates an understanding of what the
> tools do, how to use them, how to combine them, and how to roll your
> own to extend the toolset.

maybe it's time to change "equivalent to" to "similar to", to avoid
messing things up for people who read the mostly informal library
reference as if it were an ISO specification.

</F>
 
rhettinger

> > FWIW, the itertools documentation style was intended more as a learning
> > device than as a specification.

[Fredrik Lundh]
> maybe it's time to change "equivalent to" to "similar to", to avoid
> messing things up for people who reads the mostly informal library
> reference as if it were an ISO specification.

Will do.

This is doubly a good idea because there are small differences in
argument processing. For example, count() makes an immediate check for
a numerical argument but the generator version won't recognize the
fault until the count(x).next() is called.
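
A small sketch of that difference, in Python 3 syntax (an assumption, since the thread is about 2.x) and with a made-up generator named count_sketch standing in for the documented equivalent. The C type rejects a non-numeric argument when it is constructed; the generator accepts anything, and in this sketch the fault only shows up once iteration reaches the increment.

```python
from itertools import count


def count_sketch(n=0):
    """Generator approximation of itertools.count(); hypothetical name."""
    while True:
        yield n
        n += 1


# The C type validates its argument at construction time.
try:
    count("a")
    builtin_rejected_at_call = False
except TypeError:
    builtin_rejected_at_call = True

# The generator does no checking up front.
gen = count_sketch("a")
first = next(gen)  # yields "a" unchanged
try:
    next(gen)      # "a" + 1 fails only now
    sketch_failed_later = False
except TypeError:
    sketch_failed_later = True

print(builtin_rejected_at_call, first, sketch_failed_later)  # True a True
```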
 
