missing? dictionary methods

Antoon Pardon

Well at least I find them missing.

For the moment I frequently come across the following cases.

1) Two files, each with key-value pairs for the same dictionary.
However it is an error if the second file contains a key that
was not in the first file.

In treating the second file I miss a 'set' method.
dct.set(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key wasn't
already in the dictionary.


2) One file with key-value pairs. However it is an error
if a key is duplicated in the file.

In treating such files I miss a 'make' method.
dct.make(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key was
already in the dictionary.


What do other people think about this?
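
A minimal sketch of the two use cases above, written as plain Python 3 functions rather than dict methods. The `key = value` line format and the names `parse_line`, `load_new` and `load_overrides` are assumptions made for illustration; the post does not specify a file format.

```python
def parse_line(line):
    """Split a 'key = value' line (assumed format) into a (key, value) pair."""
    key, _, value = line.partition("=")
    return key.strip(), value.strip()


def load_new(dct, lines):
    """Case 2: add pairs, raising KeyError if a key is duplicated."""
    for line in lines:
        key, value = parse_line(line)
        if key in dct:
            raise KeyError("%r already in dict" % (key,))
        dct[key] = value


def load_overrides(dct, lines):
    """Case 1: update pairs, raising KeyError if a key is not yet present."""
    for line in lines:
        key, value = parse_line(line)
        if key not in dct:
            raise KeyError(key)
        dct[key] = value


settings = {}
load_new(settings, ["host = localhost", "port = 8080"])   # first file
load_overrides(settings, ["port = 9090"])                 # second file
print(settings)  # {'host': 'localhost', 'port': '9090'}
```

Lists of strings stand in for the two files here so that the example runs on its own.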
 
George Sakkis

+1

I'm sure I've needed and implemented this functionality in the past, but it was simple enough that
I never even thought of extracting it into functions/methods. In contrast to the recent pre-PEP about dict
accumulating methods, set() and make() (or whatever they might be called) are meaningful for all
dicts, so they're good candidates for being added to the base dict class.

As for naming, I would suggest reset() instead of set(), to emphasize that the key must be there.
make() is ok; other candidates could be add() or put().

George
 
Robert Kern

def safeset(dct, key, value):
    if key not in dct:
        raise KeyError(key)
    else:
        dct[key] = value

def make(dct, key, value):
    if key in dct:
        raise KeyError('%r already in dict' % key)
    else:
        dct[key] = value

I don't see a good reason to make these built in to dict type.

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
Antoon Pardon

On 2005-03-21, Robert Kern said:
I don't see a good reason to make these built in to dict type.

I would say for the same reason that we have get. There is no
need for a built-in get either; it is easily implemented
like this:

def get(dct, key, default):
    try:
        return dct[key]
    except KeyError:
        return default


I would even go so far as to say that there is more reason to have a
built-in safeset and make than there is to have a built-in get.

The reason is that a Python implementation of safeset and make
means two accesses to the dictionary: once for the test and
once for the assignment. This double access could be eliminated
with a built-in. The get, on the other hand, does only one dictionary
access, so having it implemented in Python is a lesser burden.
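
The double access can be made visible with a key that counts how often it is hashed. This is only a sketch: `CountingKey` is a hypothetical helper, not part of any library, and the counts describe CPython's dict, where a key that is not a plain string is re-hashed on every lookup.

```python
class CountingKey:
    """Hashable key that counts how often it is hashed (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.hash_calls = 0

    def __hash__(self):
        self.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.name == other.name


def safeset(dct, key, value):
    if key not in dct:
        raise KeyError(key)
    dct[key] = value


key = CountingKey("a")
dct = {key: 1}

key.hash_calls = 0
dct[key] = 2              # plain assignment: one lookup
plain_calls = key.hash_calls

key.hash_calls = 0
safeset(dct, key, 3)      # membership test, then assignment: two lookups
safeset_calls = key.hash_calls

print(plain_calls, safeset_calls)  # 1 2
```

For string keys CPython caches the hash, so the extra cost there is the second probe of the table rather than a second hash computation.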
 
Robert Kern

Antoon said:
I would even go so far as to say that there is more reason to have a
built-in safeset and make than there is to have a built-in get.

The reason is that a Python implementation of safeset and make
means two accesses to the dictionary: once for the test and
once for the assignment. This double access could be eliminated
with a built-in. The get, on the other hand, does only one dictionary
access, so having it implemented in Python is a lesser burden.

That's not true; they're on more or less the same level
computation-wise. try:...except... doesn't relieve the burden; it's
expensive.
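
The relative cost of try/except and the built-in get can be measured rather than argued. The sketch below uses timeit; the function names `get_try` and `get_builtin` are made up for this example, and the timings depend on the interpreter version and machine, so no numbers are claimed here.

```python
import timeit

dct = {"present": 1}


def get_try(key, default=None):
    # Pure-Python get: one lookup, plus exception handling on a miss.
    try:
        return dct[key]
    except KeyError:
        return default


def get_builtin(key, default=None):
    return dct.get(key, default)


cases = [
    ("try/except, key present", get_try, "present"),
    ("try/except, key missing", get_try, "missing"),
    ("dict.get,   key present", get_builtin, "present"),
    ("dict.get,   key missing", get_builtin, "missing"),
]
for label, func, key in cases:
    seconds = timeit.timeit(lambda: func(key), number=200_000)
    print("%s: %.3f s" % (label, seconds))
```

Comparing the "present" and "missing" rows for try/except shows how much of the cost comes from actually raising and catching the exception.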

For me, the issue boils down to how often such constructs are used. I
don't think that I've ever run into use cases for safeset() and make().
dct.get(key, default) comes up *a lot*, and in places where speed can
matter. Searching through the standard library can give you an idea how
often.

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
Antoon Pardon

On 2005-03-21, Robert Kern said:
That's not true; they're on more or less the same level
computation-wise. try:...except... doesn't relieve the burden; it's
expensive.

I have always heard that try: ... except is relatively inexpensive
in Python, particularly if no exception is raised.

Robert Kern said:
For me, the issue boils down to how often such constructs are used. I
don't think that I've ever run into use cases for safeset() and make().
dct.get(key, default) comes up *a lot*, and in places where speed can
matter. Searching through the standard library can give you an idea how
often.

It is always hard to compare the popularity/usefulness of two things when
one is already implemented and the other is not. IME it is not that
uncommon to know in some part of the code that the keys you use should
already be in the dictionary or, conversely, that the key should
not already be in the dictionary.
 
Ron

There is a has_key(k) method that helps with these.

Adding these wouldn't be that hard and it can apply to all
dictionaries with any data.

class newdict(dict):
    def new_key(self, key, value):
        if self.has_key(key):
            raise KeyError, 'key already exists'
        else:
            self[key] = value

    def set_key(self, key, value):
        if self.has_key(key):
            self[key] = value
        else:
            raise KeyError, 'key does not exist'

d = newdict()
for x in list('abc'):
    d[x] = x
print d
d.new_key('z', 'z')
d.set_key('a', 'b')
print d

Which is faster? (has_key()) or (key in keys())?
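
For readers on Python 3, where has_key() no longer exists, the same subclass can be sketched with the `in` operator. The class name `NewDict` is an assumption for this example. On the speed question: in Python 2, `key in d.keys()` first builds a list of all keys and then scans it, whereas `d.has_key(key)` and `key in d` are both single hash lookups.

```python
class NewDict(dict):
    """Python 3 sketch of the subclass above, using `in` instead of has_key()."""

    def new_key(self, key, value):
        if key in self:
            raise KeyError("key already exists: %r" % (key,))
        self[key] = value

    def set_key(self, key, value):
        if key not in self:
            raise KeyError("key does not exist: %r" % (key,))
        self[key] = value


d = NewDict()
for x in "abc":
    d[x] = x
d.new_key("z", "z")
d.set_key("a", "b")
print(d)  # {'a': 'b', 'b': 'b', 'c': 'c', 'z': 'z'}
```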
 
Terry Reedy

To me, one of the major problems with OOP is that there are an unbounded
number of functions that we can think of to operate on a data structure and
thus a continual pressure to turn functions into methods and thus
indefinitely expand a data structure class. And whatever is the least used
current method, there will always be candidates which are arguably at least
or almost as useful. And the addition of one method will be seen as reason
to add another, and another, and another. I was almost opposed to .get for
this reason. I think dict has about enough 'basic' methods.

So, without support from many people, your two examples strike me as fairly
specialized usages best written, as easily done, as Python functions.

Terry J. Reedy
 
Chris Rebert (cybercobra)

If (1) gets accepted, I propose the name .change(key, val). It's
simple, logical, and makes sense.
 
Greg Ewing

George said:
As for naming, I would suggest reset() instead of set(), to emphasize that the key must be there.
make() is ok; other candidates could be add() or put().

How about 'new' and 'old'?
 
Antoon Pardon

On 2005-03-21, Terry Reedy said:
To me, one of the major problems with OOP is that there are an unbounded
number of functions that we can think of to operate on a data structure and
thus a continual pressure to turn functions into methods and thus
indefinitely expand a data structure class. And whatever is the least used
current method, there will always be candidates which are arguably at least
or almost as useful. And the addition of one method will be seen as reason
to add another, and another, and another. I was almost opposed to .get for
this reason. I think dict has about enough 'basic' methods.

So, without support from many people, your two examples strike me as fairly
specialized usages best written, as easily done, as Python functions.

I don't know if they are so specialized. I would rather say the
map[key] = value semantics is specialized. If we work with a list,
the key already has to exist. If you have a list with 4 elements
and you try to assign to the 6th element, you get an IndexError.
If you want to assign to the 6th element, you have to construct
it first. That, and symmetry with var = dct[key], makes me think
that dct[key] = value shouldn't just construct an entry when it
isn't present.
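
The asymmetry described above can be shown in a few lines of Python 3. The variable names are arbitrary and chosen only for this illustration.

```python
items = ["a", "b", "c", "d"]
try:
    items[5] = "f"            # the 6th element does not exist yet
except IndexError as exc:
    print("list assignment:", exc)

mapping = {}
mapping["new"] = "f"          # silently creates the entry
print("dict assignment:", mapping)

try:
    mapping["absent"]         # reading a missing key does raise
except KeyError as exc:
    print("dict lookup: KeyError", exc)
```

Assignment to a missing list index fails, reading a missing dict key fails, but assignment to a missing dict key succeeds.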

I also was under the impression that a particular part of
my program almost doubled in execution time once I replaced
the naive dictionary assignment with these self implemented
methods. A rather heavy burden IMO for something that would
require almost no extra burden when implemented as a built-in.

But you are right that there doesn't seem to be much support
for this. So I won't press the matter.
 
Bengt Richter

On 22 Mar 2005 07:40:50 GMT, Antoon Pardon said:
I also was under the impression that a particular part of
my program almost doubled in execution time once I replaced
the naive dictionary assignment with these self implemented
methods. A rather heavy burden IMO for something that would
require almost no extra burden when implemented as a built-in.

I think I see a conflict of concerns between language design
and optimization. I call it "arm's-length assembler programming"
when I see language features being proposed to achieve assembler-level
code improvements.

For example, what if subclassing could be optimized to have virtually
zero cost, with some kind of sticky-mro hint etc. to the compiler/optimizer?
How many language features would be dismissed with "just do a sticky subclass"?

Antoon Pardon said:
But you are right that there doesn't seem to be much support
for this. So I won't press the matter.

I think I would rather see efficient general composition mechanisms
such as subclassing, decoration, and metaclassing etc. for program elements,
if possible, than incremental aggregation of efficient elements into the built-in core.

Also, because optimization risks using more computation to optimize than the expression
being optimized, I suspect that some kind of evaluate-expression-once (at def-time or first
execution time) and optimize-particular-expression hints could pay off more in general
than particular useful methods. Maybe PyPy will be an easier place to experiment with
these kinds of things.

Regards,
Bengt Richter
 
Antoon Pardon

On 2005-03-22, Bengt Richter said:
I think I see a conflict of concerns between language design
and optimization. I call it "arms-length assembler programming"
when I see language features being proposed to achieve assembler-level
code improvements.

For example, what if subclassing could be optimized to have virtually
zero cost, with some kind of sticky-mro hint etc to the compiler/optimizer?
How many language features would be dismissed with "just do a sticky subclass?"

I'm sorry, you have lost me here. What do you mean by "sticky-mro"?

My feeling about this is the following. A[key] = value,
A.reset(key, value) and A.make(key, value) would do almost
identical things, so identical that it would probably be easy
to unite them into something like A.assign(key, value, flag)
where flag would indicate which of the three options is wanted.
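
A rough Python 3 sketch of what such a combined operation might look like as a free function. The `Mode` enum and its member names are assumptions for illustration; the post only says "flag". Written in pure Python it still needs a membership test followed by an assignment, which is exactly the duplicated work a built-in could avoid.

```python
from enum import Enum


class Mode(Enum):
    ANY = "any"            # like dct[key] = value
    EXISTING = "existing"  # like the proposed reset(): key must be present
    NEW = "new"            # like the proposed make(): key must be absent


def assign(dct, key, value, mode=Mode.ANY):
    present = key in dct                      # first lookup
    if mode is Mode.EXISTING and not present:
        raise KeyError(key)
    if mode is Mode.NEW and present:
        raise KeyError("%r already in dict" % (key,))
    dct[key] = value                          # second lookup


d = {}
assign(d, "a", 1, Mode.NEW)
assign(d, "a", 2, Mode.EXISTING)
assign(d, "b", 3)
print(d)  # {'a': 2, 'b': 3}
```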

Also, a lot of this code is identical to searching for a key.
Now, because the implementation doesn't provide some of the
possibilities, I have to duplicate some of the work.

One could argue that hashes are fast enough so that this
doesn't matter, but dictionaries are the template for
all mappings in Python. What if you are using a tree
and you have to go through it twice, or what if you
are working with slower media, as with one of
the dbm modules, where you have to go through your
structure on disk twice?

You can see it as assembler-level code improvements, but
you can also see it as an incomplete interface to your
structure. IMO it would be like only providing '<',
so that people who wanted '==' would have to implement
it as 'not (b < a or a < b)'. In this
case too, this would increase the cost compared with
a directly implemented '=='.
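
The comparison analogy can be sketched the same way. `Item` is a hypothetical wrapper that implements only '<' and counts how often it is called; it exists purely to make the extra work visible.

```python
class Item:
    """Wrapper that only implements '<' and counts how often it is used."""

    lt_calls = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Item.lt_calls += 1
        return self.value < other.value


def equal_via_lt(a, b):
    # Equality derived from '<' alone: neither value is smaller than the other.
    return not (b < a or a < b)


a, b = Item(3), Item(3)
print(equal_via_lt(a, b), Item.lt_calls)  # True 2
```

Confirming that two values are equal takes two '<' calls here, where a direct '==' would need one comparison.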

I think I would rather see efficient general composition mechanisms
such as subclassing, decoration, and metaclassing etc. for program elements,
if possible, than incremental aggregation of efficient elements into the built-in core.

Also, because optimization risks using more computation to optimize than the expression
being optimized,

I think this would hardly be the case here. The dictionary code already
has to find out whether the key is in the hash or not. Instead of
just continuing down the branch it decided on, as is now the case, the code
would test whether that branch is appropriate for the requested action
and raise an exception if not.
 
