Pre-PEP: Dictionary accumulator methods


Raymond Hettinger

d.count(key, qty)
[Bengt Richter]
How about an efficient duck-typing value-incrementer to replace both?

There is some Zen of Python that argues against this interesting idea. Also, I'm
concerned that by folding appendlist() into valadd() we would lose an important
cue that a list is being built-up.

Another issue is that duck-typed multiple-dispatch is only readable when the
type of the input argument is obvious from the surrounding code. Given
d.valadd(x), it is hard to grok if x was created by some code far away. Since a
primary goal is readability and clarity, having two separate, concrete methods
is likely better than having a single more-abstracted multi-purpose method. The
performance gains are just icing on the cake.


I'm thinking the idea that the counting is happening with the value corresponding
to the key should be emphasised more. Hence valadd or such?

How about countkey() or tabulate()?



Raymond Hettinger
 

Michele Simionato

Raymond Hettinger:
Any takers for tally()?

Dunno, to me "tally" reads "counts the numbers of votes for a candidate
in an election".
We should avoid abbreviations like inc() or incr() that different people tend to
abbreviate differently (for example, that is why the new partial() function has
its "keywords" argument spelled-out). The only other issue I see with that name
is that historically incrementing is more associated with +=1 than with +=n.
Also, there are reasonable use cases for a negative n and it would be misleading
to call it incrementing when decrementing is what is intended.

I agree with Paul Rubin's argument on that issue; let's use increment() and
not worry about negative increments.
I'm curious. When you do use setdefault, what is the typical second
argument?

Well, I have used setdefault *very few times* in years of heavy Python usage.
Its disappearance would not bother me that much. Grepping my source code, I find
that practically my main use case for setdefault is in a memoize recipe where
the result of a function call is stored in a dictionary (if not already there)
and returned. Then I have a second case with a list as second argument.
Are you happy with the readability of the argument order? To me, the key and
default value are not at all related. Do you prefer having the default value
pre-instantiated on every call when the effort is likely to be wasted? Do you
like the current design of returning an object and then making a further (second
dot) method lookup and call for append or extend? When you first saw setdefault
explained, was it immediately obvious or did it take more learning effort than
other dictionary methods? To me, it is the least explainable dictionary method.
Even when given a good definition of setdefault(), it is not immediately obvious
that it is meant to be further combined with append() or some such. When showing
code to newbies or non-pythonistas, do they find the meaning of the current
idiom self-evident? That last question is not compelling, but it does contrast
with other Python code which tends to be grokkable by non-pythonistas and


While get_or_set would be a bit of an improvement, it is still obtuse.
Even though a set operation only occurs conditionally, the get always occurs.
The proposed name doesn't make it clear that the method always returns
an object.

Honestly, I don't care about the performance arguments. However, I care a lot
about readability and clarity. setdefault is terrible in this respect, since
most of the time it does *not* set a default, it just gets a value. So I am
always confused and I have to read the documentation to remind myself what it is
doing. The only right name would be "get_and_possibly_set" but it is a bit long
to type.
Even if a wording is found that better describes both the get and set
operation, it is still a distractor from the intent of the combined statement,
the intent of building up a list. That is an intrinsic wording limitation that
cannot be solved by a better name for setdefault. If any change is made at all,
we ought to go the distance and provide a better designed tool rather than just
a name change.

Well, I never figured out that the intent of setdefault was to build up a
list ;)

Anyway, if I think about how many times I have used setdefault in my code
(practically twice) and how much time I have spent trying to decipher it (any
time I reread the code using it), I think I would have been better served by NOT
having the setdefault method available ;)

About appendlist(): it still seems a bit special purpose to me. I mean,
dictionaries already have lots of methods and I would think twice before adding
new ones, especially methods that may turn out to be not that useful in the long
run, or easily replaceable by user code.


Michele Simionato
 

Paul Rubin

Reinhold Birkenfeld said:
Well, as a non-native speaker, I had to look up this one in my
dictionary. That said, it may be bad luck on my side, but it may be that
this word is relatively uncommon and there are many others who would be
happier with increment.

It is sort of an uncommon word. As a US English speaker I'd say it
sounds a bit old-fashioned, except when used idiomatically ("let's
tally up the posts about accumulator messages") or in nonstandard
dialect ("Hey mister tally man, tally me banana" is a song about
working on plantations in Jamaica). It may be more common in UK
English. There's an expression "tally-ho!" which had something to do
with British fox hunts, but they don't have those any more.

I'd say I prefer most of the suggested alternatives (count, add,
incr/increment) to "tally".
 

Roose

Py2.5 is already going to include any() and all() as builtins. The signature
does not include a function, identity or otherwise. Instead, the caller can
write a listcomp or genexp that evaluates to True or False:

any(x >= 42 for x in data)

If you wanted an identity function, that simplifies to just:

any(data)

Oh great, I just saw that. I was referring to this, which didn't get much
discussion:

http://mail.python.org/pipermail/python-dev/2005-February/051556.html

but it looks like it went much further, to builtins! I'm surprised.

But I wish it could be included in Python 2.4.x. I really hope it won't
have any bugs in it. :) At my job we are probably going to upgrade to 2.4,
and that takes a long time, so it'll probably be a year or 18 months after
that happens (which itself might be months from now) that we would consider
upgrading again. Oh well...
 

Raymond Hettinger

[Michele Simionato]
Dunno, to me "tally" reads "counts the numbers of votes for a candidate
in an election".

That isn't a pleasant image ;-)


The
only right name would be "get_and_possibly_set" but it is a bit long to
type.


Well, I never figured out that the intent of setdefault was to build up
a list ;)

Right! What does have that intent is the full statement:
d.setdefault(k, []).append(v).

My thought is that setdefault() is rarely used by itself. Instead, it is
typically part of a longer sentence whose intent and meaning is to accumulate or
build-up. That meaning is not well expressed by the current idiom.
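
To make the idiom concrete, here is a minimal sketch, written for current Python 3 rather than the Py2.4 of this thread, that puts the full statement next to the explicit spelling it abbreviates. The sample data and the names pairs, grouped, and explicit are made up for illustration.

```python
# Hypothetical sample data: (key, value) pairs to group by key.
pairs = [("a", 1), ("b", 2), ("a", 3)]

# The current idiom: setdefault() returns the list stored under k
# (storing a new empty list first if k is missing), then append() mutates it.
grouped = {}
for k, v in pairs:
    grouped.setdefault(k, []).append(v)

# The same accumulation spelled out step by step.
explicit = {}
for k, v in pairs:
    if k not in explicit:
        explicit[k] = []
    explicit[k].append(v)

print(grouped)   # {'a': [1, 3], 'b': [2]}
```

Note that in the setdefault() form the [] argument is evaluated on every pass through the loop, even when the key already exists and the fresh list is discarded.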



Raymond Hettinger
 

Raymond Hettinger

Py2.5 is already going to include any() and all() as builtins. The signature
does not include a function, identity or otherwise. Instead, the caller can
write a listcomp or genexp that evaluates to True or False:

any(x >= 42 for x in data)
[Roose]
Oh great, I just saw that. . . .
But I wish it could be included in Python 2.4.x.

If it is any consolation, the any() can already be expressed somewhat cleanly
and efficiently in Py2.4 with genexps:

True in (x >= 42 for x in data)

The translation for all() is a little less elegant:

False not in (x >= 42 for x in data)
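
As a quick check of those two translations, the sketch below compares them with the any() and all() builtins. It is written for a Python that already has the builtins, and the sample list is made up.

```python
data = [10, 42, 7, 99]   # made-up sample values

# Py2.4-style spellings from the post: membership tests on a genexp.
any_24 = True in (x >= 42 for x in data)
all_24 = False not in (x >= 42 for x in data)

# The builtins give the same answers for this data.
assert any_24 == any(x >= 42 for x in data)
assert all_24 == all(x >= 42 for x in data)

print(any_24, all_24)   # True False
```

The membership spellings compare each item to True or False with ==, so they only match the builtins when the genexp yields real booleans (as a comparison like x >= 42 does), not arbitrary truthy or falsy values.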


Raymond Hettinger
 

Roose

Py2.5 is already going to include any() and all() as builtins. The signature
does not include a function, identity or otherwise. Instead, the caller can
write a listcomp or genexp that evaluates to True or False:

Actually I was just looking at Python 2.5 docs since you mentioned this.

http://www.python.org/dev/doc/devel/whatsnew/node3.html

It says min() and max() will gain a key function parameter, and sort()
gained one in Python 2.4 (news to me).

And they do indeed default to the identity in all 3 cases, so this seems
very inconsistent. If one of them has it, and sort gained the argument even
in Python 2.4 with generator expressions, then they all should have it.
any(x >= 42 for x in data)

Not to belabor the point, but in the example on that page, max(L, key=len)
could be written max(len(x) for x in L).

Now I know why Guido said he didn't want a PEP for this... such a trivial
thing can produce a lot of opinions. : )

Roose
 

Raymond Hettinger

[Roose]
Actually I was just looking at Python 2.5 docs since you mentioned this.

http://www.python.org/dev/doc/devel/whatsnew/node3.html

It says min() and max() will gain a key function parameter, and sort()
gained one in Python 2.4 (news to me).

It also appears in itertools.groupby() and, for Py2.5, in heapq.nsmallest() and
heapq.nlargest().

And they do indeed default to the identity in all 3 cases, so this seems
very inconsistent. If one of them has it, and sort gained the argument even
in Python 2.4 with generator expressions, then they all should have it.


Not to belabor the point, but in the example on that page, max(L, key=len)
could be written max(len(x) for x in L).

Think about it. A key= function is quite a different thing. It provides a
*temporary* comparison key while retaining the original value. IOW, your
re-write is incorrect:
>>> L = ['the', 'quick', 'brownish', 'toad']
>>> max(L, key=len)
'brownish'
>>> max(len(x) for x in L)
8


Remain calm. Keep the faith. Guido's design works fine.

No important use cases were left unserved by any() and all().



Raymond Hettinger
 

El Pitonero

I would like to get everyone's thoughts on two new dictionary methods:

def count(self, key, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

... def valadd(self, key, incr=1):
...     try: self[key] = self[key] + type(self[key])(incr)
...     except KeyError: self[key] = incr

What about:

import copy
class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return copy.copy(self.default)

x = safedict(0)
x[3] += 1
y = safedict([])
y[5] += range(3)
print x, y
print x[123], y[234]
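
For readers trying this on a current interpreter, here is a Python 3 transcription of the safedict idea, offered as a sketch only. The super().__init__() call and the membership checks at the end are additions for illustration, not part of the post.

```python
import copy

class safedict(dict):
    """Python 3 transcription of the safedict sketch above."""

    def __init__(self, default=None):
        super().__init__()
        self.default = default

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Hand back a copy so a mutable default is never shared.
            return copy.copy(self.default)

x = safedict(0)
x[3] += 1            # __getitem__ yields 0, then __setitem__ stores 1
y = safedict([])
y[5] += range(3)     # list += iterable extends the copied default

print(x, y)               # {3: 1} {5: [0, 1, 2]}
print(x[123], 123 in x)   # 0 False: a plain read does not store the key
```

One consequence of this design is that only augmented assignment stores anything: a call such as y[234].append(1) mutates a throwaway copy and leaves the dictionary unchanged.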
 

Kent Johnson

Bengt said:
I would like to get everyone's thoughts on two new dictionary methods:

def count(self, key, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

How about an efficient duck-typing value-incrementer to replace both? E.g. functionally like:
... def valadd(self, key, incr=1):
...     try: self[key] = self[key] + type(self[key])(incr)
...     except KeyError: self[key] = incr

A big problem with this is that there are reasonable use cases for both
d.count(key, <some integer>)
and
d.appendlist(key, <some integer>)

Word counting is an obvious use for the first. Consolidating a list of key, value pairs where the
values are ints requires the second.

Combining count() and appendlist() into one function eliminates the second possibility.
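
A small sketch of that argument follows. AccumDict is a hypothetical dict subclass used only to host the three methods from this thread; the sample words and pairs are made up.

```python
class AccumDict(dict):
    """Hypothetical subclass carrying the two proposed methods
    plus the duck-typed valadd() for comparison."""

    def count(self, key, qty=1):
        try:
            self[key] += qty
        except KeyError:
            self[key] = qty

    def appendlist(self, key, *values):
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)

    def valadd(self, key, incr=1):
        try:
            self[key] = self[key] + type(self[key])(incr)
        except KeyError:
            self[key] = incr

pairs = [("x", 1), ("y", 5), ("x", 2)]   # made-up (key, int) pairs

# First use: word counting with an integer quantity.
counts = AccumDict()
for word in "a b a".split():
    counts.count(word)

# Second use: consolidating integer values into per-key lists.
groups = AccumDict()
for key, n in pairs:
    groups.appendlist(key, n)

# With valadd(), an integer argument can only ever be summed.
merged = AccumDict()
for key, n in pairs:
    merged.valadd(key, n)

print(counts)   # {'a': 2, 'b': 1}
print(groups)   # {'x': [1, 2], 'y': [5]}
print(merged)   # {'x': 3, 'y': 5}
```

Because valadd() picks its behaviour from the type of the argument, passing an int always lands on the counting branch, so the list-building result in groups cannot be reached through it.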

Kent
 

Carl Banks

Raymond said:
I would like to get everyone's thoughts on two new dictionary methods:

def count(self, key, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)


Emphatic +1

I use both of these idioms all the time. (Kind of surprised to see
people confused about the need for the latter; I do it regularly.)
This is just the kind of thing experience shows cropping up enough that
it makes sense to put it in the language.

About the names: Seeing that these have specific uses, and do something
that is hard to explain in one word, I would suggest that short names
like count might betray the complexity of the operations. Therefore,
I'd suggest:

increment_value() (or add_to_value())
append_to_value()

Although they don't explicitly communicate that a value would be
created if it didn't exist, they do at least make it clear that it
happens to the value, which kind of implies that it would be created.

If we do have to use short names:

I don't like increment (or inc or incr) at all because it has the air
of a mutator method. Maybe it's just my previous experience with Java
and C++, but to me, a.incr() looks like it's incrementing a, and
a.incr(b) looks like it might be adding b to a. I don't like count
because it's too vague; it's pretty obvious what it does as an
iterator, but not as a method of dict. I could live with tally,
though. As for a short name for the other one, maybe fileas or
fileunder?
 

Kent Johnson

Brian said:
Raymond Hettinger said unto the world upon 2005-03-18 20:24:
I would like to get everyone's thoughts on two new dictionary methods:

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

For appendlist, I would have expected

def appendlist(self, key, sequence):
    try:
        self[key].extend(sequence)
    except KeyError:
        self[key] = list(sequence)

The original proposal reads better at the point of call when values is a single item. In my
experience this will be the typical usage:
d.appendlist(key, 'some value')

as opposed to your proposal which has to be written
d.appendlist(key, ['some value'])

The original allows values to be a sequence using
d.appendlist(key, *value_list)
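
The two signatures can be compared side by side with the sketch below. Both are written as free functions with hypothetical names (appendlist_varargs, appendlist_sequence) so they can coexist, and the last call shows a pitfall that is an illustration added here, not a point made in the thread.

```python
def appendlist_varargs(d, key, *values):
    """The original proposal, as a free function for illustration."""
    try:
        d[key].extend(values)
    except KeyError:
        d[key] = list(values)

def appendlist_sequence(d, key, sequence):
    """The alternative that takes a single sequence argument."""
    try:
        d[key].extend(sequence)
    except KeyError:
        d[key] = list(sequence)

a, b = {}, {}

# Single item: the varargs form takes it bare, the other needs a wrapper list.
appendlist_varargs(a, "k", "some value")
appendlist_sequence(b, "k", ["some value"])

# Existing list: the varargs form unpacks it with *, the other takes it as is.
more = ["x", "y"]
appendlist_varargs(a, "k", *more)
appendlist_sequence(b, "k", more)

print(a == b)   # True

# Forgetting the wrapper in the sequence form splits a string into characters.
c = {}
appendlist_sequence(c, "k", "abc")
print(c)        # {'k': ['a', 'b', 'c']}
```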

Kent
 

Pierre Barbier de Reuille

Ivan Van Laningham a écrit :
Hi All--
Maybe I'm not getting it, but I'd think a better name for count would be
add. As in

d.add(key)
d.add(key,-1)
d.add(key,399)
etc.
[...]

There is no existing add() method for dictionaries. Given the name
change, I'd like to see it.

Metta,
Ivan

I don't think "add" is a good name ... even if it doesn't exist in
dictionaries, it exists in sets and, IMHO, this would add confusion ...

Pierre
 

Dan Sommers

The proposed names could possibly be improved (perhaps tally() is more
active and clear than count()).

Curious that in this lengthy discussion, a method name of "accumulate"
never came up. I'm not sure how to separate the two cases (accumulating
scalars vs. accumulating a list), though.

Regards,
Dan
 

Jeff Epler

[Jeff Epler]
I do not follow. Can you provide a pure python equivalent?

Here's what I had in mind:

$ python /tmp/unionset.py
Set(['set', 'self', 'since', 's', 'sys', 'source', 'S', 'Set', 'sets', 'starting'])

#------------------------------------------------------------------------
try:
    set
except:
    from sets import Set as set

def unionset(self, key, *values):
    try:
        self[key].update(values)
    except KeyError:
        self[key] = set(values)

if __name__ == '__main__':
    import sys, re
    index = {}

    # We need a source of words. This file will do.
    corpus = open(sys.argv[0]).read()
    words = re.findall('\w+', corpus)

    # Create an index of the words according to the first letter.
    # repeated words are listed once since the values are sets
    for word in words:
        unionset(index, word[0].lower(), word)

    # Display the words starting with 'S'
    print index['s']
#------------------------------------------------------------------------

Jeff

 

Ivan Van Laningham

Hi All--

Raymond said:
[Michele Simionato]
+1 for inc instead of count.

Any takers for tally()?

Sure. Given the reasons for avoiding add(), tally()'s a much better
choice than count().

What about d.tally(key,0) then? Deleting the key as was suggested by
Michael Spencer seems non-intuitive to me.
I raise you by a ruble and a pound ;-)

<hardly-anything-is-worth-less-than-vietnamese-dong>-ly y'rs,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/workshops/1998-11/proceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
 

Peter Hansen

Michele said:
+1 for inc instead of count.

-1 for inc, increment, or anything that carries a
connotation of *increasing* the value, so long as
the proposal allows for negative numbers to be
involved. "Incrementing by -1" is a pretty silly
picture.

+1 for add and, given the above, I'm unsure there's
a viable alternative (unless this is restricted to
positive values, or perhaps even to "+1" specifically).
appendlist seems a bit too specific (I do not use dictionaries of lists
that often).

As Raymond does, I use this much more than the other.
The problem with setdefault is the name, not the functionality.
get_or_set would be a better name: we could use it as an alias for
setdefault and then remove setdefault in Python 3000.

Agreed...

-Peter
 

Reinhold Birkenfeld

Peter said:
-1 for inc, increment, or anything that carries a
connotation of *increasing* the value, so long as
the proposal allows for negative numbers to be
involved. "Incrementing by -1" is a pretty silly
picture.

+1 for add and, given the above, I'm unsure there's
a viable alternative (unless this is restricted to
positive values, or perhaps even to "+1" specifically).

What about `addto()'? add() just has the connotation of adding something
to the dict and not to an item in it.

Reinhold
 

Peter Hansen

Reinhold said:
What about `addto()'? add() just has the connotation of adding something
to the dict and not to an item in it.

Hmm... better than add anyway. I take back my ill-considered
+1 above, and apply instead a +0 to "count". I don't actually
like any of the alternatives at this point... needs more thought
(for my part, anyway).

To be honest, the only time I've ever seen this particular
idiom is in tutorial code or examples of how you produce
a histogram of word usage in a text document. Never in real
code (not that it doesn't happen, just that I've never
stumbled across it). The "appending to a list" idiom, on
the other hand, I've seen and used quite often.

I'm just going to stay out of the "add/inc/count/addto"
debate and consider the other half of the thread now. :)

-Peter
 
