can list comprehensions replace map?

David Isaac

Newbie question:

I have been generally open to the proposal that list comprehensions
should replace 'map', but I ran into a need for something like
map(None,x,y)
when len(x)>len(y). I cannot, it seems, use 'zip' because I'll lose
info from x. How do I do this as a list comprehension? (Or,
more generally, what is the best way to do this without 'map'?)
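
For concreteness, here is the behaviour I mean (rough interactive sketch):

>>> x = [1, 2, 3, 4]
>>> y = ['a', 'b']
>>> map(None, x, y)
[(1, 'a'), (2, 'b'), (3, None), (4, None)]
>>> zip(x, y)          # the tail of x is lost
[(1, 'a'), (2, 'b')]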

Thanks,
Alan Isaac
 
Michael Hoffman

David said:
Newbie question:

I have been generally open to the proposal that list comprehensions
should replace 'map', but I ran into a need for something like
map(None,x,y)
when len(x)>len(y). I cannot, it seems, use 'zip' because I'll lose
info from x. How do I do this as a list comprehension? (Or,
more generally, what is the best way to do this without 'map'?)

It ain't broke so I'd stick with what you're doing. Even if map() is
removed as a builtin, it will surely stick around in a module.
 
Michael Hoffman

Michael said:
It ain't broke so I'd stick with what you're doing. Even if map() is
removed as a builtin, it will surely stick around in a module.

Addendum: I know this doesn't answer your question, so if you were
asking out of purely academic interest, then someone else will probably
post another answer.
 
Larry Bates

This isn't really a question about list
comprehensions, as you are using a "feature"
of map by passing None as the function to be
executed over each list element.

This works when len(x) > len(y):

zip(x,y+(len(x)-len(y))*[None])

This works when len(y) >= len(x):

zip(x+(len(y)-len(x))*[None],y)

I would probably wrap this into a function:

def foo(x,y):
    if len(x) > len(y):
        return zip(x,y+(len(x)-len(y))*[None])

    return zip(x+(len(y)-len(x))*[None],y)
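
For example (untested sketch, plain lists assumed):

>>> foo([1, 2, 3], ['a', 'b'])
[(1, 'a'), (2, 'b'), (3, None)]
>>> foo(['a', 'b'], [1, 2, 3])
[('a', 1), ('b', 2), (None, 3)]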

Larry Bates
 
Andrew Dalke

David said:
I have been generally open to the proposal that list comprehensions
should replace 'map', but I ran into a need for something like
map(None,x,y)
when len(x)>len(y). I cannot, it seems, use 'zip' because I'll lose
info from x. How do I do this as a list comprehension? (Or,
more generally, what is the best way to do this without 'map'?)

If you know that len(x)>=len(y) and you want the same behavior as
map(), you can use itertools to synthesize a longer iterator:

x = [1,2,3,4,5,6]
y = "Hi!"
from itertools import repeat, chain
zip(x, chain(y, repeat(None)))
[(1, 'H'), (2, 'i'), (3, '!'), (4, None), (5, None), (6, None)]

This doesn't work if you want the result to be max(len(x), len(y))
in length - the result has length len(x).
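
If you did want the longer of the two lengths and both arguments support
len(), one possible sketch (the name zip_pad is just illustrative, lightly
tested at best):

from itertools import chain, repeat, islice, izip

def zip_pad(x, y):
    # pad both inputs with None, then cut off at the longer length
    n = max(len(x), len(y))
    return list(islice(izip(chain(x, repeat(None)),
                            chain(y, repeat(None))), n))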

As others suggested, if you want to use map, go ahead. It won't
disappear for a long time and even if it does it's easy to
retrofit if needed.

Andrew
 
Paolino

David said:
Newbie question:

I have been generally open to the proposal that list comprehensions
should replace 'map', but I ran into a need for something like
map(None,x,y)
when len(x)>len(y). I cannot, it seems, use 'zip' because I'll lose
info from x. How do I do this as a list comprehension? (Or,
more generally, what is the best way to do this without 'map'?)

Probably zip should change behaviour and cover that case, or at least there
should be another zip-like function such as 'tzip' in __builtins__. Dunno, I
always thought zip should not cut to the shortest list.
 
Raymond Hettinger

[David Isaac]
I have been generally open to the proposal that list comprehensions
should replace 'map', but I ran into a need for something like
map(None,x,y)
when len(x)>len(y). I cannot, it seems, use 'zip' because I'll lose
info from x. How do I do this as a list comprehension? (Or,
more generally, what is the best way to do this without 'map'?)
[Paolino]
Probably zip should change behaviour and cover that case, or at least there
should be another zip-like function such as 'tzip' in __builtins__. Dunno, I
always thought zip should not cut to the shortest list.

Heck no! For the core use case of lockstep iteration, it is almost
always a mistake to continue iterating beyond the length of the
shortest input sequence. Even for map(), the use cases are thin. How
many functions do something meaningful when one or more of their inputs
changes type and becomes a stream of Nones? Consider, for example,
map(pow, seqa, seqb) -- what good can come of one sequence or the other
suddenly switching to a None mode?

As Andrew pointed out, if you really need that behavior, it can be
provided explicitly. See the padNone() recipe in the itertools
documentation for an easy one-liner.
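
From memory, that recipe is roughly:

from itertools import chain, repeat

def padnone(seq):
    "Returns the sequence elements and then returns None indefinitely."
    return chain(seq, repeat(None))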

IMO, reliance on map's None fill-in feature should be taken as a code
smell indicating a design flaw (not always, but usually). There is a
reason that feature is missing from map() implementations in some other
languages.

In contrast, the existing behavior of zip() is quite useful. It allows
some of the input sequences to be infinite:

zip(itertools.count(1), open('myfile.txt'))
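
For instance (just a sketch, with 'myfile.txt' standing in for any text
file), this numbers the lines without caring how long the file is:

import itertools

for lineno, line in zip(itertools.count(1), open('myfile.txt')):
    print lineno, line,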



Raymond
 
Steven Bethard

David said:
I ran into a need for something like map(None,x,y)
when len(x)>len(y). I cannot, it seems, use 'zip' because I'll lose
info from x.

I almost never run into this situation, so I'd be interested to know why
you need this. Here's one possible solution:

py> import itertools as it
py> def zipfill(*lists):
....     max_len = max(len(lst) for lst in lists)
....     return zip(*[it.chain(lst, it.repeat(None, max_len - len(lst)))
....                  for lst in lists])
....
py> zipfill(range(4), range(5), range(3))
[(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, None), (None, 4, None)]

If you prefer, you can replace the call to zip with it.izip and get an
iterator back instead of a list.

STeVe
 
Paolino

Raymond said:
[David Isaac]
I have been generally open to the proposal that list comprehensions
should replace 'map', but I ran into a need for something like
map(None,x,y)
when len(x)>len(y). I cannot, it seems, use 'zip' because I'll lose
info from x. How do I do this as a list comprehension? (Or,
more generally, what is the best way to do this without 'map'?)

[Paolino]

Probably zip should change behaviour and cover that case, or at least there
should be another zip-like function such as 'tzip' in __builtins__. Dunno, I
always thought zip should not cut to the shortest list.


Heck no! For the core use case of lockstep iteration, it is almost
always a mistake to continue iterating beyond the length of the
shortest input sequence. Even for map(), the use cases are thin. How
many functions do something meaningful when one or more of their inputs
changes type and becomes a stream of Nones? Consider, for example,
map(pow, seqa, seqb) -- what good can come of one sequence or the other
suddenly switching to a None mode?

As Andrew pointed out, if you really need that behavior, it can be
provided explicitly. See the padNone() recipe in the itertools
documentation for an easy one-liner.

IMO, reliance on map's None fill-in feature should be taken as a code
smell indicating a design flaw (not always, but usually). There is a
reason that feature is missing from map() implementations in some other
languages.

In contrast, the existing behavior of zip() is quite useful. It allows
some of the input sequences to be infinite:

zip(itertools.count(1), open('myfile.txt'))
Right point.
Well, in my limited experience, use cases in which the lists have different
lengths are rare, but in those cases I don't see the reason for not being able
to zip to the longest one. What is really strange is that I have to use
map(None, ...) for that, instead of another zip-like function which would at
least be intuitive for the average user. Also, map(None, ...) looks like a
super-hack, and it's not elegant or readable or logical (IMO).

I think zip came along to substitute for the intolerant tuple.__new__
implementation. A dummy like me would expect map(tuple, [1,2,3], [2,3,4]) to
work, so pretending that map(None, ...) does it is like saying that None
and tuple are near concepts, which is obviously an absurdity.

Thanks anyway for the explanations.

Paolino
 
Andrew Dalke

Steven said:
Here's one possible solution:

py> import itertools as it
py> def zipfill(*lists):
... max_len = max(len(lst) for lst in lists)

A limitation to this is the need to iterate over the
lists twice, which might not be possible if one of them
is a file iterator.

Here's a clever, though not (in my opinion) elegant solution

import itertools

def zipfill(*seqs):
    count = [len(seqs)]
    def _forever(seq):
        for item in seq: yield item
        count[0] -= 1
        while 1: yield None
    seqs = [_forever(seq) for seq in seqs]
    while 1:
        x = [seq.next() for seq in seqs]
        if count == [0]:
            break
        yield x

for x in zipfill("This", "is", "only", "a", "test."):
print x

This generates

['T', 'i', 'o', 'a', 't']
['h', 's', 'n', None, 'e']
['i', None, 'l', None, 's']
['s', None, 'y', None, 't']
[None, None, None, None, '.']

This seems a bit more elegant, though the "replace" dictionary is
still a bit of a hack

from itertools import repeat, chain, izip

sentinel = object()
end_of_stream = repeat(sentinel)

def zipfill(*seqs):
    replace = {sentinel: None}.get
    seqs = [chain(seq, end_of_stream) for seq in seqs]
    for term in izip(*seqs):
        for element in term:
            if element is not sentinel:
                break
        else:
            # All sentinels
            break

        yield [replace(element, element) for element in term]


(I originally had a "term == tuple([sentinel]*len(seqs))" check
but didn't like all the == tests incurred.)

Andrew
 
Andrew Dalke

Me:
Here's a clever, though not (in my opinion) elegant solution ...
This seems a bit more elegant, though the "replace" dictionary is
still a bit of a hack

Here's the direct approach without using itertools. Each list is
iterated over only once. No test against a sequence element is ever
made (either as == or 'is') and the end of the sequence exception
is raised only once per input iterator.

The use of a list for the flag is a bit of a hack. If the list has
one element then it's true; no elements, then it's false. By doing it this
way I don't need one extra array and one extra indexing/enumeration.

def zipfill(*seqs):
    count = len(seqs)
    seq_info = [(iter(seq), [1]) for seq in seqs]
    while 1:
        fields = []
        for seq, has_data in seq_info:
            if has_data:
                try:
                    fields.append(seq.next())
                except StopIteration:
                    fields.append(None)
                    del has_data[:]
                    count -= 1
            else:
                fields.append(None)
        if count:
            yield fields
        else:
            break


Hmm, it should probably yield tuple(fields)

Andrew
 
Raymond Hettinger

[Paolino]
Well, in my limited experience, use cases in which the lists have different
lengths are rare, but in those cases I don't see the reason for not being able
to zip to the longest one. What is really strange is that I have to use
map(None, ...) for that, instead of another zip-like function which would at
least be intuitive for the average user. Also, map(None, ...) looks like a
super-hack, and it's not elegant or readable or logical (IMO).

I think zip came along to substitute for the intolerant tuple.__new__
implementation. A dummy like me would expect map(tuple, [1,2,3], [2,3,4]) to
work, so pretending that map(None, ...) does it is like saying that None
and tuple are near concepts, which is obviously an absurdity.

Yes, map(None, ...) lacks grace and it would be nice if it had never
been done. The more recently implemented zip() does away with these
issues. The original was kept for backwards compatibility. That's
evolution.

My sense for the rest is that your difficulties arise from fighting the
language rather than using it as designed. Most language features are
the result of much deliberation. When design X was chosen over
alternative Y, it is a pretty good cue that X is a more harmonious way
to do things.

Some other languages chose to implement both X and Y. On the plus
side, your intuition likely matches one of the two. On the minus side,
someone else's intuition may not match your own. Also, it leads to
language bloat. More importantly, such a language provides few cues as
to how to select components that work together harmoniously.
Unfortunately, that makes it effortless to mire yourself in deep goo.

My advice is to use the language instead of fighting it. Guido has
marked the trail; don't ignore the signs unless you really know where
you're going.



Raymond


"... and soon you'll feel right as rain." -- from The Matrix
 
Christopher Subich

Andrew said:
Steven said:
Here's one possible solution:

py> import itertools as it
py> def zipfill(*lists):
... max_len = max(len(lst) for lst in lists)


A limitation to this is the need to iterate over the
lists twice, which might not be possible if one of them
is a file iterator.

Here's a clever, though not (in my opinion) elegant solution

import itertools

def zipfill(*seqs):
    count = [len(seqs)]
    def _forever(seq):
        for item in seq: yield item
        count[0] -= 1
        while 1: yield None
    seqs = [_forever(seq) for seq in seqs]
    while 1:
        x = [seq.next() for seq in seqs]
        if count == [0]:
            break
        yield x

I like this solution best (note, it doesn't actually use itertools). My
naive solution:
def lzip(*args):
    ilist = [iter(a) for a in args]
    while 1:
        res = []
        count = 0
        for i in ilist:
            try:
                g = i.next()
                count += 1
            except StopIteration: # End of iter
                g = None
            res.append(g)
        if count > 0: # At least one iter wasn't finished
            yield tuple(res)
        else: # All finished
            raise StopIteration
 
Andrew Dalke

Christopher said:
My naive solution: ...
for i in ilist:
    try:
        g = i.next()
        count += 1
    except StopIteration: # End of iter
        g = None
...

What I didn't like about this was the extra overhead of all
the StopIteration exceptions. Eg,

zipfill("a", range(1000))

will raise 1000 exceptions (999 for "a" and 1 for the end of the range).

But without doing timing tests I'm not sure which approach is
fastest, and it may depend on the data set.
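
If anyone did want to check, a rough harness might look something like this
(sketch only, with zipfill and lzip as defined earlier in the thread):

import timeit

setup = "from __main__ import zipfill, lzip"
for stmt in ('list(zipfill("a", range(1000)))',
             'list(lzip("a", range(1000)))'):
    print stmt, timeit.Timer(stmt, setup).timeit(100)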

Since this is code that's best not widely used, I don't think it's something
anyone should look into either. :)

Andrew
 
Peter Otten

Andrew said:
Steven said:
Here's one possible solution:

py> import itertools as it
py> def zipfill(*lists):
... max_len = max(len(lst) for lst in lists)

A limitation to this is the need to iterate over the
lists twice, which might not be possible if one of them
is a file iterator.

Here's a clever, though not (in my opinion) elegant solution

import itertools

def zipfill(*seqs):
    count = [len(seqs)]
    def _forever(seq):
        for item in seq: yield item
        count[0] -= 1
        while 1: yield None
    seqs = [_forever(seq) for seq in seqs]
    while 1:
        x = [seq.next() for seq in seqs]
        if count == [0]:
            break
        yield x

This seems a bit more elegant, though the "replace" dictionary is
still a bit of a hack

from itertools import repeat, chain, izip

sentinel = object()
end_of_stream = repeat(sentinel)

def zipfill(*seqs):
    replace = {sentinel: None}.get
    seqs = [chain(seq, end_of_stream) for seq in seqs]
    for term in izip(*seqs):
        for element in term:
            if element is not sentinel:
                break
        else:
            # All sentinels
            break

        yield [replace(element, element) for element in term]

Combining your "clever" and your "elegant" approach to something fast
(though I'm not entirely confident it's correct):

def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return
        while 1:
            yield None
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Whether we ran out of active sequences is only tested once per sequence.
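
A couple of quick checks (assuming the itertools imports above) look right,
at least:

>>> list(fillzip("abc", "de"))
[('a', 'd'), ('b', 'e'), ('c', None)]
>>> list(fillzip("a", "abc"))
[('a', 'a'), (None, 'b'), (None, 'c')]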

Fiddling with itertools is always fun, but feels a bit like reinventing the
wheel in this case. The only excuse being that you might need a lazy
map(None, ...) someday...

Peter
 
Andrew Dalke

Peter said:
Combining your "clever" and your "elegant" approach to something fast
(though I'm not entirely confident it's correct):

def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return
        while 1:
            yield None
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Ohh, that's pretty neat passing in 'done' via a mutable default argument.

It took me a bit to even realize why it does work. :)

Could make it one line shorter with

from itertools import chain, izip, repeat
def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return []
        return repeat(None)
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Go too far on that path and the code starts looking like:

from itertools import chain, izip, repeat
forever, table = repeat(None), {0: []}.get
def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        return table(done[0], forever)
    return izip(*[chain(seq, done_iter()) for seq in seqs])

Now add the performance tweak....

def done_iter(done=[len(seqs)], forever=forever, table=table)

Okay, I'm over it. :)

Andrew
 
Scott David Daniels

Peter said:
def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return
        while 1:
            yield None
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Can I play too? How about:
import itertools

def fillzip(*seqs):
    def Nones(countactive=[len(seqs)]):
        countactive[0] -= 1
        while countactive[0]:
            yield None
    seqs = [itertools.chain(seq, Nones()) for seq in seqs]
    return itertools.izip(*seqs)
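
A quick check against the earlier example gives the same rows (as tuples), if
I haven't slipped up:

>>> for x in fillzip("This", "is", "only", "a", "test."):
...     print x
...
('T', 'i', 'o', 'a', 't')
('h', 's', 'n', None, 'e')
('i', None, 'l', None, 's')
('s', None, 'y', None, 't')
(None, None, None, None, '.')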

--Scott David Daniels
 
Peter Otten

Andrew said:
Peter said:
Combining your "clever" and your "elegant" approach to something fast
(though I'm not entirely confident it's correct):

def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return
        while 1:
            yield None
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Ohh, that's pretty neat passing in 'done' via a mutable default argument.

It took me a bit to even realize why it does work. :)

Though I would never have come up with it, were it not for the juxtaposition
of your two variants (I initially disliked the first and tried to improve
on the second), it is an unobvious merger :)
It's a bit fragile, too, as
Could make it one line shorter with
from itertools import chain, izip, repeat
def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return []
        return repeat(None)
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

that won't work because done_iter() is now no longer a generator.
In effect you just say

seqs = [chain(seq, repeat(None)) for seq in seqs[:-1]] + [chain(seqs[-1], [])]

I tried

class Done(Exception):
    pass

pad = repeat(None)

def fillzip(*seqs):
    def check(active=[len(seqs)]):
        active[0] -= 1
        if not active[0]:
            raise Done
        # just to turn check() into a generator
        if 0: yield None
    seqs = [chain(seq, check(), pad) for seq in seqs]
    try:
        for item in izip(*seqs):
            yield item
    except Done:
        pass

to be able to use the faster repeat() instead of the while loop, and then
stared at it for a while -- in vain -- to eliminate the for item... loop.
If there were a lazy ichain(iter_of_iters) you could tweak check() to decide
whether a repeat(None) should follow it, but I'd rather not ask Raymond for
that particular addition to the itertools.
Now add the performance tweak....

def done_iter(done=[len(seqs)], forever=forever, table=table)

Okay, I'm over it. :)

Me too. I think. For now...

Peter
 
Andrew Dalke

Me:
Could make it one line shorter with
from itertools import chain, izip, repeat
def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return []
        return repeat(None)
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Peter Otten:
that won't work because done_iter() is now no longer a generator.
In effect you just say

seqs = [chain(seq, repeat(None)) for seq in seqs[:-1]] + [chain(seqs[-1], [])]

It does work - I tested it. The trick is that izip takes iter()
of the terms passed into it. iter([]) -> an empty iterator and
iter(repeat(None)) -> the repeat(None) itself.

'Course then the name should be changed.

Andrew
 
