Real-world use cases for map's None fill-in feature?

  • Thread starter Raymond Hettinger

rurpy

Raymond Hettinger said:
[[email protected]]
How well correlated is the use of map()-with-fill with the
(need for) the use of zip/izip-with-fill?
[raymond]
Close to 100%. A non-iterator version of izip_longest() is exactly
equivalent to map(None, it1, it2, ...).
[[email protected]]
If I use map()
I can trivially determine the argument lengths and deal with
unequal lengths before calling map(). With iterators that is more
difficult. So I can imagine many cases where izip might
be applicable but map not, and a lack of map use cases
not representative of izip use cases.

You don't seem to understand what map() does. There is no need to
deal with unequal argument lengths before map(); it does the work for
you. It handles iterator inputs the same way. Meditate on this:

def izip_longest(*args):
    return iter(map(None, *args))

Modulo arbitrary fill values and lazily evaluated inputs, the semantics
are exactly what is being requested. Ergo, lack of use cases for
map(None,it1,it2) means that izip_longest(it1,it2) isn't needed.
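For readers on a modern Python, the padding semantics being discussed here survive as itertools.izip_longest (added in 2.6, renamed zip_longest in 3.x). A minimal sketch of the behaviour Raymond describes:

```python
from itertools import zip_longest  # izip_longest in Python 2.6+

# zip_longest pads the shorter input with a fill value (None by
# default), matching the old map(None, it1, it2) pairing behaviour
pairs = list(zip_longest([1, 2, 3], 'ab'))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, None)]
```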

"lazily evaluated inputs" is exactly what I was pointing
out and what make your izip_longest() above not the
same as map(None,...), and hence, your conclusion
invalid. Specifically....

def izip_longest(*args):
    return iter(map(None, *args))

f1 = file("test.dat")
f2 = file("test.dat")
it = izip2(f1, f2)
while 1:
    h1, h2 = it.next()
    print h1.strip(), h2

izip2() in the above code is a "real" izip_longest
based on a version posted in this thread.
3347 3347
-3487 -3487
2011 2011
239 239
....

Replace izip2 in the above code with your izip_longest
[wait, wait, wait,... after a few minutes type ^c, nothing
happens, close window].

I don't think your izip_longest is at all equivalent to
the proposed izip, and thus there may well be use
cases for izip that aren't represented by imap(None,...)
use cases, which is what I said. That is, I might have
a use case for izip which I would never even consider
map() for.
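The hang rurpy describes is an eager-versus-lazy issue: Python 2's map(None, ...) exhausts both inputs before returning anything, so a large or unbounded input stalls the whole loop. A sketch in modern Python, using itertools.count as a stand-in for an effectively unbounded input (the names here are illustrative):

```python
from itertools import count, islice, zip_longest

# A lazy padding zip can be consumed incrementally even when one
# input is unbounded; an eager map(None, ...) over the same inputs
# would try to exhaust count() first and never return.
pairs = zip_longest(range(3), count())   # no work happens yet
first_five = list(islice(pairs, 5))      # pull only five pairs
print(first_five)  # [(0, 0), (1, 1), (2, 2), (None, 3), (None, 4)]
```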
 

Aahz

Request for more information
----------------------------
My request for readers of comp.lang.python is to search your own code
to see if map's None fill-in feature was ever used in real-world code
(not toy examples). I'm curious about the context, how it was used,
and what alternatives were rejected (i.e. did the fill-in feature
improve the code). Likewise, I'm curious as to whether anyone has seen
a zip-style fill-in feature employed to good effect in some other
programming language.

I've counted 63 cases of ``map(None, ...`` in my company's code base.
You're probably right that most of them could/should use zip() instead;
I see at least a few cases of

map(None, field_names, values)

but it's not clear what the expectation is for the size of the two lists.
(None of the uses were created by me -- I abhor map(). ;-)
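The field_names/values pattern Aahz mentions is the classic padding use case. A hedged sketch of what such code presumably relies on (the data below is invented), written with the modern zip_longest replacement for map(None, ...):

```python
from itertools import zip_longest

# Hypothetical data: fewer values than field names
field_names = ['id', 'name', 'email']
values = [17, 'alice']

# Python 2's map(None, field_names, values) paired these up,
# padding the missing trailing values with None
record = dict(zip_longest(field_names, values))
print(record)  # {'id': 17, 'name': 'alice', 'email': None}
```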
 

Raymond Hettinger

[Aahz]
I've counted 63 cases of ``map(None, ...`` in my company's code base.
You're probably right that most of them could/should use zip() instead;
I see at least a few cases of

map(None, field_names, values)

but it's not clear what the expectation is for the size of the two lists.
(None of the uses were created by me -- I abhor map(). ;-)

Thanks for the additional datapoint. I'm most interested in the code
surrounding the few cases with multiple inputs and whether the code is
designed around equal or unequal length inputs. The existence of the
latter is good news for the proposal. Its absence would be a
contra-indication. If you get a chance, please look at those few
multi-input cases.

Thanks,


Raymond
 

Paul Rubin

Raymond Hettinger said:
...
Thanks for the additional datapoint. I'm most interested in the code
surrounding the few cases with multiple inputs and whether the code is
designed around equal or unequal length inputs. The existence of the
latter is good news for the proposal. Its absence would be a
contra-indication. If you get a chance, please look at those few
multi-input cases.

ISTR there's also a plan to eliminate map in Python 3.0 in favor of
list comprehensions. That would get rid of the possibility of using
map(None...) instead of izip_longest. This needs to be thought through.
 

Raymond Hettinger

[Paul Rubin]
ISTR there's also a plan to eliminate map in Python 3.0 in favor of
list comprehensions. That would get rid of the possibility of using
map(None...) instead of izip_longest. This needs to be thought through.

Not to fear. If map() eventually loses its built-in status, it will
almost certainly reappear in the functional module. Also, if Py3.0
changes the balance of needs and tools, I will certainly adapt the
itertools module as needed.
 

Andrae Muys

I am still left with a difficult-to-express feeling of
dissatisfaction at this process.

Please try to see it from the point of view of
someone who is not an expert at Python:

Here is izip().
My conception is that it takes two sequence generators
and matches up the items from each. (I am talking about
overall conceptual models here, not details.)
Here is my problem.
I have two files that produce lines and I want to
compare each line.
Seems like a perfect fit.

So I read that izip() only goes to the shortest iterable,
and I think, "why only the shortest? why not the longest?
what's so special about the shortest?"
At this point explanations involving a lack of use cases
are not very convincing. I have a use. All the
alternative solutions are more code, less clear, less
obvious, less right. But most importantly, there
seems to be a symmetry between the two cases
(shortest vs longest) that makes the lack of
support for matching-to-longest somehow a
defect.

Now if there is something fundamental about
matching items in parallel lists that makes it a
sensible thing to do only for equal lists (or to the
shortest list) that's fine. You seem to imply that's
the case by referencing Haskell, ML, etc. If so,
that needs to be pointed out in izip's docs.
(Though nothing I have read in this thread has
been convincing.)

Because a simple call to chain() is an obvious (it's the very first
itertool in the docs), efficient, and straightforward solution to the
problem of padding a shorter iterable.

izip(chain(shorter, pad), longer)

It is not so straightforward to arrange the truncation of an iterable;
moreover, truncation is the far more common case, as it is not uncommon
to use infinite iterators in iterator-based code.

izip(count(), iter(file))

which doesn't terminate without truncation. That a common use case
fails to terminate is generally considered 'something fundamental'.

The conversion between them is of course a matter of using takewhile
and an appropriate fence.

Padding in the presence of truncation:

def fence(): pass
takewhile(lambda x: x[0] != fence or x[1] != fence,
          izip(chain(iter1, repeat(fence)),
               chain(iter2, repeat(fence))))

Truncation in the presence of padding:

def fence(): pass
takewhile(lambda x: x[0] != fence and x[1] != fence,
          izip(chain(iter1, repeat(fence)),
               chain(iter2, repeat(fence))))

(Note the conditions: padding must continue while either side still
yields real data; truncation must stop as soon as either side is
exhausted.)

Of course you can use any value not in the domain of iter1 or iter2 as
a fence, but a fresh function object is guaranteed to satisfy that
requirement and hence keeps the code generic. In the padding example,
if you actually care what value is used for the pad, you can either
replace fence or wrap the result in an imap.
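Andrae's fence construction translates directly to modern Python. Below is a runnable sketch; pad_longest is an illustrative name, and object() serves as the unique sentinel in place of the fence function:

```python
from itertools import chain, repeat, takewhile

def pad_longest(it1, it2, pad=None):
    """Pad-to-longest built from a truncating zip, via the fence trick."""
    fence = object()  # unique sentinel that cannot occur in either input
    fenced = zip(chain(it1, repeat(fence)),
                 chain(it2, repeat(fence)))
    # keep pairs while at least one side still carries real data
    live = takewhile(lambda p: p[0] is not fence or p[1] is not fence,
                     fenced)
    for a, b in live:
        yield (pad if a is fence else a, pad if b is fence else b)

print(list(pad_longest([1, 2, 3], 'a')))
# [(1, 'a'), (2, None), (3, None)]
```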

Andrae Muys
 

rurpy

Andrae Muys said:
Because a simple call to chain() is an obvious (it's the very first
itertool in the docs), efficient, and straightforward solution to the
problem of padding a shorter iterable.

izip(chain(shorter, pad), longer)

And how do you tell, a priori, which iterable will turn out to
be the shortest?
It is not so straightforward to arrange the truncation of an iterable;
moreover, truncation is the far more common case, as it is not uncommon
to use infinite iterators in iterator-based code.

It may be more common (which is arguable), but that
does not mean that "iterate-to-longest" is uncommon,
or not common enough to be worth bothering about.
izip(count(), iter(file))

which doesn't terminate without truncation. That a common use case
fails to terminate is generally considered 'something fundamental'.

Nobody is suggesting changing the current behavior
of izip() in this case.
The conversion between them is of course a matter of using takewhile
and an appropriate fence.

"of course"?
Padding in the presence of truncation:

def fence(): pass
takewhile(lambda x: x[0] != fence or x[1] != fence,
          izip(chain(iter1, repeat(fence)),
               chain(iter2, repeat(fence))))

Truncation in the presence of padding:

def fence(): pass
takewhile(lambda x: x[0] != fence and x[1] != fence,
          izip(chain(iter1, repeat(fence)),
               chain(iter2, repeat(fence))))

Of course you can use any value not in the domain of iter1 or iter2 as
a fence, but a fresh function object is guaranteed to satisfy that
requirement and hence keeps the code generic. In the padding example,
if you actually care what value is used for the pad, you can either
replace fence or wrap the result in an imap.

Thank you for the posting Andrae, it has increased my
knowledge.
But my original point was there are cases (often involving
file iterators) where the problem's complexity seems to be
on the same order as problems involving iterate-to-shortest
solutions, but, while the latter have simple, one function
call solutions, solutions for the former are far more complex
(as your post illustrates). This seems at best unbalanced.
When encountered by someone with less than your level of
expertise, it leads to the feeling, "jeez, why does this simple
problem take hours to figure out and a half dozen function
calls?!?" And please note, I am complaining about a general
problem with Python. The izip() issue was just (at the time)
the most recent trigger of that reaction. (Most recent is
<string>.translate() but that is for a new thread.)
 

Andrae Muys

Thank you for the posting Andrae, it has increased my
knowledge.

No problem, happy to help.
But my original point was there are cases (often involving
file iterators) where the problem's complexity seems to be
on the same order as problems involving iterate-to-shortest
solutions, but, while the latter have simple, one function
call solutions, solutions for the former are far more complex
(as your post illustrates). This seems at best unbalanced.
When encountered by someone with less than your level of
expertise, it leads to the feeling, "jeez, why does this simple
problem take hours to figure out and a half dozen function
calls?!?"

I agree; having had to think about how to implement padding with a
truncating API to implement your use-case, padding is a useful feature
to have available. I didn't mean to imply otherwise. You asked why
truncating is a common choice in the design of izip-like functions
(Python, ML, Haskell, Scheme); my post was an attempt to answer that.
The summary of my post is:

1. Either can be implemented in terms of the other.
2. Using a truncating zip instead of a padding zip leads to an
incorrect result.
3. Using a padding zip instead of a truncating zip leads to
non-termination.
4. A terminating bug is preferred to a non-terminating bug.

Hence zip is generally truncating.
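Points 2 through 4 can be demonstrated concretely. The sketch below (modern Python, with count() standing in for the infinite input) shows why the truncating choice fails safe:

```python
from itertools import count, islice, zip_longest

# Points 2 and 4: a truncating zip against an infinite iterator
# terminates, at worst producing a too-short (testable) result
assert list(zip(['a', 'b'], count())) == [('a', 0), ('b', 1)]

# Point 3: a padding zip over the same inputs never ends on its own;
# it must be bounded explicitly or the program hangs
padded = zip_longest(['a', 'b'], count())
assert list(islice(padded, 4)) == [('a', 0), ('b', 1), (None, 2), (None, 3)]
```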

Andrae Muys
 

rurpy

Andrae said:
I agree, having had to think about how to implement padding with
truncating api to implement your use-case, padding is a useful feature
to have available. I didn't mean to imply otherwise. You asked why
truncating is a common choice in the design of izip-like functions
(Python, ML, Haskell, Scheme); my post was an attempt to answer that.
The summary of my post is:

1. Either can be implemented in terms of the other.
2. Using a truncating zip instead of a padding zip leads to an
incorrect result.
3. Using a padding zip instead of a truncating zip leads to
non-termination.

(I assume "erroneously" should be inserted in front
of "Using")
OK.
4. A terminating bug is preferred to a non-terminating bug.

This is not self-evident to me. Is this somehow
related to the design philosophy of functional
languages? I was never aware of such a preference
in conventional procedural languages (though I
could easily be wrong).

It also seems directly counter to Python's "no errors
should pass silently" dogma -- non-termination
seems more noticeable than silent erroneous results.
Hence zip is generally truncating.

I see your point in a theoretical sense, but it still
seems to me to be a pretty weak reason for making
a practical decision about what should be in Python,
particularly when the justification is being transferred
from a functional programming domain to an
object/procedural one. Is any language feature that
might result in non-termination if erroneously used
to be banned? That doesn't seem very reasonable.
(I realize you were explaining the rationale behind
the FP choice, not necessarily taking a position
on what should be in Python so the above comment
is directed to the discussion at large.)
 

Andrae Muys

This is not self-evident to me. Is this somehow
related to the design philosophy of functional
languages? I was never aware of such a preference
in conventional procedural languages (though I
could easily be wrong).

This is not a paradigm-based preference. It derives from the basic
fact that you can test for an incorrect result, but you can't test for
non-termination. Therefore, if you have to choose between making it
easy to inadvertently introduce either non-termination or a trivial
logic error, you are better off choosing the trivial logic error. That
preference is independent of which
structured/object/functional/logic/constraint/concurrent/reactive
school of programming you adhere to.
It also seems directly counter to Python's "no errors
should pass silently" dogma -- non-termination
seems more noticeable than silent erroneous results.

Two problems with that. One, we are dealing with a bug, not an error.
Two, even if we classified the bug as an error, noticing non-termination
requires solving the halting problem, whereas noticing an erroneous
result simply requires a unit test.

The reason why you see this in FP and not in OOP is that
infinite/unbounded/very large values are trivial to define in FP and
(generally) not in OOP. Consequently the potential for inadvertent
non-termination simply doesn't arise in OOP.
I see your point in a theoretical sense, but it still
seems to me to be a pretty weak reason for making
a practical decision about what should be in Python,
particularly when the justification is being transferred
from a functional programming domain to an
object/procedural one. Is any language feature that
might result in non-termination if erroneously used
to be banned? That doesn't seem very reasonable.

Of course not --- only by making Python non-Turing-complete could such
a thing be achieved. But you have overstated the position. This isn't a
matter of outlawing any function that could be used to write a
non-terminating procedure. It's about a simple API design decision.
A choice between two options. It doesn't require the designer to
decide that one is *right* and one *wrong*. Just that one is at least
slightly better than the other; or even that one was chosen simply
because either was better than neither. API design decisions are not
personal vendettas against your use-case :).

Andrae Muys
 

Steven D'Aprano

This is not self-evident to me. Is this somehow
related to the design philosophy of functional
languages? I was never aware of such a preference
in conventional procedural languages (though I
could easily be wrong).

It also seems directly counter to Python's "no errors
should pass silently" dogma -- non-termination
seems more noticeable than silent erroneous results.

You are assuming that the function in question returns
a result *quickly*, and that any delay obvious to the
user is clearly a problem ("damn program has hung...").

Consider a generic function which may take "a long
time" to return a result. How long do you wait before
you conclude it has hung? A minute? An hour? A day? A
month? A year? If you have some knowledge of the
expected running time you can make a good estimate
("well, there are only a thousand records in the
database, so even if it takes an entire minute to check
each record, if it hasn't returned after 17 hours, it
is probably hung"). But for arbitrary problems, you
might not know enough about the function and data to
make that estimate. Some calculations do have to run
for days or weeks or months to get a correct result,
and some are impossible to predict in advance.

So, in general, it is impossible to tell the difference
between a non-terminating bug and a correct calculation
that would have finished if you had just waited a
little longer. (How much is a little longer?) Hence, in
general, a terminating wrong answer is easier to test
for than a non-terminating bug.
 
