expression form of one-to-many dict?


Steven Bethard

So I end up writing code like this a fair bit:

map = {}
for key, value in sequence:
    map.setdefault(key, []).append(value)

This code basically constructs a one-to-many mapping -- each value that
a key occurs with is stored in the list for that key.

This code's fine, and seems pretty simple, but thanks to generator
expressions, I'm getting kinda spoiled. ;) I like being able to do
something like the following for one-to-one mappings:

dict(sequence)

or a more likely scenario for me:

dict((get_key(item), get_value(item)) for item in sequence)

The point here is that there's a simple sequence or GE that I can throw
to the dict constructor that spits out a dict with my one-to-one mapping.

Is there a similar expression form that would spit out my one-to-many
mapping?
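For readers with a newer Python: collections.defaultdict (added in 2.5, after this thread) gives exactly this grouping behavior; a minimal sketch in Python 3 syntax:

```python
from collections import defaultdict

# Each missing key starts life as an empty list, so grouping is one line.
sequence = [('a', 1), ('b', 2), ('a', 3)]
groups = defaultdict(list)
for key, value in sequence:
    groups[key].append(value)

print(dict(groups))  # {'a': [1, 3], 'b': [2]}
```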

Steve
 

Tim Peters

[Steven Bethard]
So I end up writing code like this a fair bit:

map = {}
for key, value in sequence:
    map.setdefault(key, []).append(value)

This code basically constructs a one-to-many mapping -- each
value that a key occurs with is stored in the list for that key.

This code's fine, and seems pretty simple, but thanks to generator
expressions, I'm getting kinda spoiled. ;) I like being able to do
something like the following for one-to-one mappings:

dict(sequence)

or a more likely scenario for me:

dict((get_key(item), get_value(item)) for item in sequence)

The point here is that there's a simple sequence or GE that I can
throw to the dict constructor that spits out a dict with my one-to-one
mapping.

It's a simple sequence because it's a simple task. It's even simpler
than perhaps it "should be", since it arbitrarily decides that, if
more than one instance of some key is seen, it will only remember the
value associated with the last instance of that key.

Is there a similar expression form that would spit out my one-to-many
mapping?

There's a straightforward one-liner in 2.4 (but not notably concise),
if your keys are totally ordered:

from itertools import groupby
d = dict((k, map(get_value, g))
         for k, g in groupby(sorted(sequence, key=get_key),
                             key=get_key))

The real purpose of that is to increase your appreciation for your
original spelling <0.2 wink>.
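To see Tim's groupby version run, here is a Python 3 sketch with invented get_key/get_value helpers (a list comprehension replaces the Python 2 map call, which returned a list):

```python
from itertools import groupby

# Hypothetical extractors for a sequence of (key, value) pairs.
def get_key(item):
    return item[0]

def get_value(item):
    return item[1]

# groupby only groups consecutive equal keys, hence the sort first.
sequence = [('a', 1), ('b', 2), ('a', 3)]
d = dict((k, [get_value(item) for item in g])
         for k, g in groupby(sorted(sequence, key=get_key), key=get_key))
print(d)  # {'a': [1, 3], 'b': [2]}
```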
 

Steven Bethard

Tim said:
It's a simple sequence because it's a simple task. It's even simpler
than perhaps it "should be", since it arbitrarily decides that, if
more than one instance of some key is seen, it will only remember the
value associated with the last instance of that key.

Good point. That's sort of an arbitrary behavior decision which, while
reasonable in most situations, is not desirable in mine. In thinking
about this, it struck me that one solution is to change that behavior:
>>> from UserDict import DictMixin
>>> class OneToMany(DictMixin):
...     def __init__(*args, **kwds):
...         self, args = args[0], args[1:]
...         self._dict = {}
...         self.update(*args, **kwds)
...     def __getitem__(self, key):
...         if not key in self._dict:
...             self._dict[key] = []
...         return self._dict[key]
...     def __setitem__(self, key, value):
...         self[key].append(value)
...     def keys(self):
...         return self._dict.keys()
...
>>> d = OneToMany((pow(i, 13, 4), i) for i in range(10))
>>> d[0]
[0, 2, 4, 6, 8]
>>> d[1]
[1, 5, 9]
>>> d[3]
[3, 7]

I'll have to think about whether it would be worth keeping a class like
this around... It might make working with this kind of data
interactively simpler...

Thanks for the comments!

Steve
 

Larry Bates

Steven,

Suggestion: It is a bad idea to name any variable
"map". When you do, you destroy your ability to call
Python's map function. Same goes for "list", "str",
or any other built-in function.

If you haven't been bitten by this yet, you will be; I was.
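A quick sketch of the sort of breakage Larry describes (Python 3 syntax):

```python
# Shadow the built-in: 'map' now names a dict, not the function.
map = {}
try:
    map(str, [1, 2, 3])
except TypeError as err:
    print("shadowed:", err)  # 'dict' object is not callable

del map  # drop the shadow; the built-in is reachable again
print(list(map(str, [1, 2, 3])))  # ['1', '2', '3']
```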

Larry Bates
 

Steven Bethard

Larry said:
Suggestion: It is a bad idea to name any variable
"map". When you do, you destroy your ability to call
Python's map function. Same goes for "list", "str",
or any other built-in function.

If you haven't been bitten by this yet, you will be; I was.

A good reminder for all the newbies out there.

Sorry, I renamed my variables to simplify the example -- my names
usually look like '<key>_<value>_map' where <key> and <value> describe
the items in the dict. Since the generic example didn't really have a
description for the keys or values, I stripped it down to map. Fine for
the example, but I should have realized it would draw this comment
(mainly because I probably would have posted a similar one myself if I
had seen the example). ;)

Fortunately, after 2+ years with Python, the risk of me being "bitten"
again by this is pretty small. ;)

Actually, it's even smaller now, because I've pretty much removed map
from all my code in favor of list comprehensions, which I find much
easier to read.

Steve
 

Fernando Perez

Steven said:
Actually, it's even smaller now, because I've pretty much removed map
from all my code in favor of list comprehensions, which I find much
easier to read.

While I agree that listcomps are more readable in most cases (and certainly for
all cases with any amount of complexity in the listcomp), I still find that
map is hard to beat for the simple case of a callable foo:

outlist = map(foo,inlist)

is still better in my book, and far more readable, than

outlist = [foo(x) for x in inlist]

The map form, in this case, parses instantly in my brain, while the listcomp
certainly takes a few cycles. And note that I'm not talking about the typing
conciseness, but about the effort for my brain. But maybe I'm just wired
funny :)

Since I tend to have a lot of code like this, I really cringe when I hear of a
desire to do away with map altogether. I know I could rewrite it myself in
all my code, but the beauty of these builtins is that they are always there...

Cheers,

f
 

Steven Bethard

Fernando said:
outlist = map(foo,inlist)

is still better in my book, and far more readable, than

outlist = [foo(x) for x in inlist]

The map form, in this case, parses instantly in my brain, while the listcomp
certainly takes a few cycles. And note that I'm not talking about the typing
conciseness, but about the effort for my brain. But maybe I'm just wired
funny :)

Well, different at least. I find the map one harder to parse mentally.
And not for lack of experience with functional programming. But to
each their own. =)

Steve
 

Fredrik Lundh

Steven said:
Well, different at least. I find the map one harder to parse mentally.

if you have trouble parsing a function call, I'm glad I don't have to maintain
your Python programs...

</F>
 

Steven Bethard

Fredrik said:
if you have trouble parsing a function call, I'm glad I don't have to maintain
your Python programs...

I don't know what I said to upset you so, but I do apologize.

If you could tell me what it was about my statement (that I find a list
comprehension to be a clearer description of program flow than a map
application[1]) that so insulted you, I would be glad to avoid such
comments in the future, if it would avoid such vicious replies from you.

Steve

[1] In my mind, the program flow is spelled out explicitly in the list
comprehension and only implicitly in the map application. Thus the list
comprehension is clearer to me.
 

Terry Reedy

In respect to map(func, seq) versus [func(i) for i in seq], for
pre-existing func, 'OP' wrote:
[1] In my mind, the program flow is spelled out explicitly in the list
comprehension and only implicitly in the map application. Thus the list
comprehension is clearer to me.

For pre-existing func, I slightly prefer the map form because the
sequential program flow is *not* spelled out. So I can more easily see the
sequence items mapped in parallel. On the other hand, I probably prefer
[i-expression for i in seq] to map(lambda i: i-expression, seq) because I
find the (unnecessary) mental construction of a function object to be a
compensating burden.

I can easily imagine that others will find themselves more comfortably on
one side or the other of the mental fence I find myself sitting on ;-).

Terry J. Reedy
 

Nick Craig-Wood

Steven Bethard said:
map = {}
for key, value in sequence:
map.setdefault(key, []).append(value)

I was thinking about exactly that the other day, when converting some
perl to python.

[ digression: In perl, you do

push @{$map->{$key}}, $value

If $map->{$key} doesn't exist it is autovivified into an array (since
it's in array context). Now that's exactly the sort of magic that gives
perl a bad name ;-) ]

However, one thing I noticed is that a list is created and destroyed
as the second parameter to setdefault whether or not it is used in
map, which strikes me as a bit wasteful. You obviously can't use the
same list there either. If it was an object with a more expensive
constructor then you'd notice it more.

It would be nice if setdefault didn't evaluate the second argument
unless it needed it. However I guess it would have to be a language
feature to do that.
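Nick's point about the wasted argument can be made visible with a counting constructor (the Expensive class is invented for the demonstration; Python 3 syntax):

```python
class Expensive:
    """Stand-in for a costly default value; counts constructions."""
    count = 0
    def __init__(self):
        Expensive.count += 1

d = {'a': 'already here'}
# Arguments are evaluated before setdefault runs, so an Expensive
# instance is built even though the key already exists.
d.setdefault('a', Expensive())
print(Expensive.count)  # 1: constructed, then immediately discarded
```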

Here are some timings

$ /usr/lib/python2.3/timeit.py -s 'sequence=zip(range(1000),range(1000))' 'map = {}
for key, value in sequence:
    map.setdefault(key, []).append(value)'
1000 loops, best of 3: 1.42e+03 usec per loop

$ /usr/lib/python2.3/timeit.py -s 'sequence=zip(range(1000),[ i%11 for i in range(1000)])' 'map = {}
for key, value in sequence:
    map.setdefault(key, []).append(value)'
1000 loops, best of 3: 1.57e+03 usec per loop

$ /usr/lib/python2.3/timeit.py -s 'sequence=zip(range(1000),range(1000))' 'map = {}
for key, value in sequence:
    if map.has_key(key):
        map[key].append(value)
    else:
        map[key] = [ value ]'
1000 loops, best of 3: 1.1e+03 usec per loop

$ /usr/lib/python2.3/timeit.py -s 'sequence=zip(range(1000),[ i%11 for i in range(1000)])' 'map = {}
for key, value in sequence:
    if map.has_key(key):
        map[key].append(value)
    else:
        map[key] = [ value ]'
1000 loops, best of 3: 1.11e+03 usec per loop


Not that timing is everything of course ;-)
 

Mike Meyer

Nick Craig-Wood said:
It would be nice if setdefault didn't evaluate the second argument
unless it needed it. However I guess it would have to be a language
feature to do that.

Personally, I'd love a language feature that let you create a function
that didn't evaluate arguments until they were actually used - lazy
evaluation. That lets you write the C ?: operator as a function, for
a start.

Hmmm. No, iterators can't be used to fake it. Oh well.

<mike
 

Fredrik Lundh

Mike said:
Personally, I'd love a language feature that let you create a function
that didn't evaluate arguments until they were actually used - lazy
evaluation. That lets you write the C ?: operator as a function, for
a start.

Hmmm. No, iterators can't be used to fake it. Oh well.

if you can get some collaboration from the function itself, you can fake
anything:

def foo_default():
    while 1:
        print "MAKE A FOO"
        yield "foo"
foo_default = foo_default()

class mydict(dict):
    def setdefault(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            if hasattr(default, "next"):
                default = default.next()
            self[key] = default
            return default

d = mydict()
d["spam"] = "bacon"

print d.setdefault("spam", foo_default)
print d.setdefault("egg", foo_default)

</F>
 

Steven Bethard

Mike said:
Personally, I'd love a language feature that let you create a function
that didn't evaluate arguments until they were actually used - lazy
evaluation. That lets you write the C ?: operator as a function, for
a start.

You can fake it with generator expressions:

>>> class C(object):
...     def __init__(self, x):
...         print x
...
>>> def unpack(x):
...     try:
...         x, = x
...     except TypeError:
...         pass
...     return x
...
>>> def f(x):
...     print "f"
...     print unpack(x)
...
>>> f(C("created"))
created
f
<__main__.C object at 0x01140790>
>>> f(C("created") for _ in [1])
f
created
<__main__.C object at 0x0114F810>

Of course, the syntax here isn't great -- create a generator expression
that yields only one item, and call unpack on all function arguments to
convert them from generators to items as necessary... But it does
achieve the stated goal -- with the GE, the C object isn't created until
unpack is called from inside f.

Faking ?: this way could look like:

>>> def cond(test, true_exp, false_exp):
...     if test:
...         return unpack(true_exp)
...     else:
...         return unpack(false_exp)
...
>>> cond(True,
...      (C("true exp") for _ in [1]),
...      (C("false exp") for _ in [1]))
true exp
<__main__.C object at 0x0114F730>
>>> cond(False,
...      (C("true exp") for _ in [1]),
...      (C("false exp") for _ in [1]))
false exp
<__main__.C object at 0x0114F8B0>

I don't know how often this is really necessary, but as it is
implementable in current Python, if you really liked it, you could argue
for some syntactic sugar for the construct...

Steve
 

Bengt Richter

Mike said:
Personally, I'd love a language feature that let you create a function
that didn't evaluate arguments until they were actually used - lazy
evaluation. That lets you write the C ?: operator as a function, for
a start.

Hmmm. No, iterators can't be used to fake it. Oh well.

Fredrik said:
if you can get some collaboration from the function itself, you can fake
anything:

def foo_default():
    while 1:
        print "MAKE A FOO"
        yield "foo"
foo_default = foo_default()

class mydict(dict):
    def setdefault(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            if hasattr(default, "next"):
                default = default.next()
            self[key] = default
            return default

d = mydict()
d["spam"] = "bacon"

print d.setdefault("spam", foo_default)
print d.setdefault("egg", foo_default)

Just the same, it would be interesting to be able to create special-form style functions easily
by designating arbitrary arguments for deferred evaluation, to which eval could safely be applied
(or maybe a restricted unquote_arg function for better safety).
E.g., double back-tick is a syntax error now, so you could write

def ternary(c, ``t, ``f):
    if c: return eval(t)
    else: return eval(f)

(Of course, it's also interesting to speculate about what kind of first class object
would be returned by
def foo(``x): return x
and whether ``x should be a generally valid expression, not just specialized for arg list setup ;-)

(note that ``x is not quite the same as automated lambda:x -- nor lambda x=x:x -- wrapping,
since the latter would evaluate x for the lambda and the former accesses the last value set
in the closure cell:
>>> def closures():
...     v = 'a'
...     x = lambda:v
...     v = 'b'
...     y = lambda:v
...     return x,y
...
>>> [f() for f in closures()]
['b', 'b']

Anyway, with a suitable `` quoting operator your setdefault could be

def setdefault(self, key, ``default=None):
    ...
    default = eval(default)
    ... etc

BTW, note that

def foo(``default=expr): ...

and

def foo(``default=``expr): ...

could mean different things, depending on preferred semantics and ``expr as
a generally valid expression.

Regards,
Bengt Richter
 

Mike Meyer

Bengt said:
(or maybe a restricted unquote_arg function for better safety).
E.g., double back-tick is a syntax error now, so you could write

def ternary(c, ``t, ``f):
    if c: return eval(t)
    else: return eval(f)

Actually, I think it would be more pythonic if the indication of
non-evaluation happened at the function invocation instead of the
function definition. Having it at the function definition makes it
implicit at the function invocation point. Having functions that
expect objects of certain types (closures in this case, right?) is
pretty much standard operating procedure.

<mike
 

Doug Holton

Mike said:
Personally, I'd love a language feature that let you create a function
that didn't evaluate arguments until they were actually used - lazy
evaluation. That lets you write the C ?: operator as a function, for
a start.

Hmmm. No, iterators can't be used to fake it. Oh well.

That is a brilliant idea. I would suggest requesting such a feature for
Python 3.0: http://www.python.org/cgi-bin/moinmoin/Python3.0

However, Python 3.0 is likely years away. If you want to know how to
use this feature today, consult Fredrik Lundh.
 

Paul Rubin

Mike Meyer said:
Personally, I'd love a language feature that let you create a function
that didn't evaluate arguments until they were actually used - lazy
evaluation. That lets you write the C ?: operator as a function, for
a start.

Hmmm. No, iterators can't be used to fake it. Oh well.

You can fake it with lambda, but the syntax is contorted.

a = condition ? expr1 : expr2;

becomes

a = ((lambda: expr2, lambda: expr1)[bool(condition)])()
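A runnable check (Python 3) that only the selected branch runs; the loud helper is invented to record evaluations:

```python
evaluated = []

def loud(label, value):
    # Record which arm actually gets evaluated.
    evaluated.append(label)
    return value

condition = True
a = ((lambda: loud("expr2", 2), lambda: loud("expr1", 1))[bool(condition)])()
print(a, evaluated)  # 1 ['expr1'] -- the expr2 arm is never touched
```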
 

Alex Martelli

Mike Meyer said:
Actually, I think it would be more pythonic if the indication of
non-evaluation happened at the function invocation instead of the
function definition. Having it at the function definition makes it

As in, say, calling
x = ternary(c, lambda:t, lambda:f)
? The 'lambda:' is a (not nice-looking, but...) "indication of
non-evaluation"... or am I misunderstanding what you're saying?

Of course, the implementation of ternary could then use 'apply' rather
than 'eval' (or, simply call t() or f() as appropriate, identically;-).
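Alex's callable-based version, sketched as a small function (the name ternary and the test values are mine; Python 3 syntax):

```python
def ternary(c, t, f):
    # t and f are zero-argument callables; only the chosen one is called.
    if c:
        return t()
    return f()

calls = []
x = ternary(1 < 2,
            lambda: calls.append("t") or "yes",
            lambda: calls.append("f") or "no")
print(x, calls)  # yes ['t'] -- the false branch was never evaluated
```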


Alex
 

Fernando Perez

Doug said:
That is a brilliant idea. I would suggest requesting such a feature for
Python 3.0: http://www.python.org/cgi-bin/moinmoin/Python3.0

Just as a reference, Mathematica does have such a feature, in the form of the
HoldAll, HoldFirst, etc. function attributes. It can come in quite handy. I
actually used it to write a little routine to auto-generate python modules for
mathematica variables of certain types, without having to specify a list of
strings for their names. The evaluation control allowed me to give it a very
clean interface, while the equivalent python2mathematica routine requires a
list of variable names (strings) as input, and plays sys._getframe tricks with
it. Not very pleasant.

So yes, I think it would be a really nifty feature to have, though I haven't
really thought about how well it meshes (or not) with the rest of python's
design.

Cheers,

f
 
