Channelling Guido - dict subclasses

Steven D'Aprano

Over on the Python-Dev mailing list, there is an ENORMOUS multi-thread
discussion involving at least two PEPs, about bytes/str compatibility.
But I don't want to talk about that. (Oh gods, I *really* don't want to
talk about that...)

In the midst of that discussion, Guido van Rossum made a comment about
subclassing dicts:

From: Guido van Rossum <[email protected]>
Date: Tue, 14 Jan 2014 12:06:32 -0800
Subject: Re: [Python-Dev] PEP 460 reboot

Personally I wouldn't add any words suggesting or referring
to the option of creation another class for this purpose. You
wouldn't recommend subclassing dict for constraining the
types of keys or values, would you?
[end quote]

https://mail.python.org/pipermail/python-dev/2014-January/131537.html

This surprises me, and rather than bother Python-Dev (where it will
likely be lost in the noise, and certainly will be off-topic), I'm hoping
there may be someone here who is willing to attempt to channel GvR. I
would have thought that subclassing dict for the purpose of constraining
the type of keys or values would be precisely an excellent use of
subclassing.


class TextOnlyDict(dict):
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError
        super().__setitem__(key, value)
    # need to override more methods too


But reading Guido, I think he's saying that wouldn't be a good idea. I
don't get it -- it's not a violation of the Liskov Substitution
Principle, because it's more restrictive, not less. What am I missing?
 
Ned Batchelder

One problem with it is that there are lots of ways of setting values in
the dict, and they don't use your __setitem__:
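As an illustration, here is a minimal sketch of that bypassing as it behaves in CPython. It reuses the TextOnlyDict class from the original post; the error message and the variable names are only for this example.

```python
class TextOnlyDict(dict):
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be str")
        super().__setitem__(key, value)


d = TextOnlyDict()

# Plain item assignment goes through the override and is rejected.
try:
    d[1] = "one"
except TypeError:
    rejected = True
else:
    rejected = False

# These routes are implemented in C and never look up the override,
# so non-string keys slip in.
d.update({2: "two"})
d.setdefault(3, "three")
e = TextOnlyDict({4: "four"})

print(rejected, sorted(d), list(e))
```

Each of update(), setdefault() and the constructor would need its own override to close the gap.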

This is what you're getting at with your "need to override more methods
too", but it turns out to be a pain to override enough methods.

I don't know if that is what Guido was getting at; I suspect he was
talking at a more refined "principles of object design" level rather
than a "dicts don't happen to work that way" level.

Also, I've never done it, but I understand that deriving from
collections.MutableMapping (collections.abc.MutableMapping in current
Python 3) avoids this problem.
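A minimal sketch of the MutableMapping route, assuming Python 3. The class name TextOnlyMapping and the _data attribute are made up for this example. The mixin methods (update, setdefault and so on) are written in terms of the five methods defined here, so a single check in __setitem__ covers them all.

```python
from collections.abc import MutableMapping


class TextOnlyMapping(MutableMapping):
    """Hypothetical mapping that only accepts str keys."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        # MutableMapping.update() stores items via self[key] = value.
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be str")
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = TextOnlyMapping(a=1)
m.update({"b": 2})

# Every one of these routes now hits the check in __setitem__.
failures = []
for attempt in (
    lambda: m.update({3: "three"}),
    lambda: m.setdefault(4, "four"),
    lambda: TextOnlyMapping({5: "five"}),
):
    try:
        attempt()
    except TypeError:
        failures.append(True)
    else:
        failures.append(False)
```

The trade-off is that the result is not a dict instance, so isinstance(m, dict) is False.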
 
Terry Reedy

This surprises me,

I was slightly surprised too. I understand not wanting to add a subclass
to stdlib, but I believe this was about adding words to the doc. Perhaps
he did not want to over-emphasize one particular possible subclass by
putting the words in the doc.
 
F

I can't speak for Guido, but I think it is messy and unnatural and will lead to user frustration.
As a user, I would expect a dict to take any hashable as a key and any object as a value. In your case I would probably just provide a __getitem__ method in a normal class.

That said, I have overridden dict before, but my child class only added to dict. I didn't change its underlying behaviour, so you can use my class(es) as a vanilla dict everywhere, which enforcing types would have destroyed.



 
Peter Otten

Steven said:
class TextOnlyDict(dict):
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError

Personally I feel dirty whenever I write Python code that defeats
duck-typing -- so I would not /recommend/ any isinstance() check.
I realize that this is not an argument...

PS: I tried to read GvR's remark in context, but failed. It's about time
to revolt and temporarily install the FLUFL as our leader, long enough to
revoke Guido's top-posting license, but not long enough to reintroduce the
<> operator...
 
Mark Lawrence

Steven said:
Over on the Python-Dev mailing list, there is an ENORMOUS multi-thread
discussion involving at least two PEPs, about bytes/str compatibility.
But I don't want to talk about that. (Oh gods, I *really* don't want to
talk about that...)

+ trillions

Steven said:
I would have thought that subclassing dict for the purpose of constraining
the type of keys or values would be precisely an excellent use of
subclassing.

Exactly what I was thinking.

Couple of replies I noted from Ned Batchelder and Terry Reedy. Smacked
bottom for Peter Otten, how dare he? :)
 
Tim Chase

Steven said:
class TextOnlyDict(dict):
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError
        super().__setitem__(key, value)
    # need to override more methods too


But reading Guido, I think he's saying that wouldn't be a good
idea. I don't get it -- it's not a violation of the Liskov
Substitution Principle, because it's more restrictive, not less.
What am I missing?

Just as an observation, this seems almost exactly what anydbm (renamed
dbm in Python 3) does, behaving like a dict (whether it inherits from
dict, or just duck-types like a dict), but with the limitation that
keys/values need to be strings.

-tkc
 
John Ladasky

Peter said:
Personally I feel dirty whenever I write Python code that defeats
duck-typing -- so I would not /recommend/ any isinstance() check.

While I am inclined to agree, I have yet to see a solution to the problem of flattening nested lists/tuples which avoids isinstance(). If anyone has written one, I would like to see it, and consider its merits.
 
Peter Otten

John said:
While I am inclined to agree, I have yet to see a solution to the problem
of flattening nested lists/tuples which avoids isinstance(). If anyone
has written one, I would like to see it, and consider its merits.

Well, you should always be able to find some property that discriminates
what you want to treat as sequences from what you want to treat as atoms.

(flatten() adapted from a nine-year-old post by Nick Craig-Wood)

>>> def flatten(items, check):
...     if check(items):
...         for item in items:
...             yield from flatten(item, check)
...     else:
...         yield items
...
>>> items = [1, 2, (3, 4), [5, [6, (7,)]]]
>>> print(list(flatten(items, check=lambda o: hasattr(o, "sort"))))
[1, 2, (3, 4), 5, 6, (7,)]
>>> print(list(flatten(items, check=lambda o: hasattr(o, "count"))))
[1, 2, 3, 4, 5, 6, 7]

The approach can of course break

>>> items = ["foo", 1, 2, (3, 4), [5, [6, (7,)]]]
>>> print(list(flatten(items, check=lambda o: hasattr(o, "count"))))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 4, in flatten
  File "<stdin>", line 2, in flatten
RuntimeError: maximum recursion depth exceeded

and I'm the first to admit that the fix below looks really odd:

>>> print(list(flatten(items, check=lambda o: hasattr(o, "count") and not
... hasattr(o, "split"))))
['foo', 1, 2, 3, 4, 5, 6, 7]

In fact all of the following examples look more natural...

>>> print(list(flatten(items, check=lambda o: isinstance(o, list))))
['foo', 1, 2, (3, 4), 5, 6, (7,)]
>>> print(list(flatten(items, check=lambda o: isinstance(o, (list,
... tuple)))))
['foo', 1, 2, 3, 4, 5, 6, 7]
>>> print(list(flatten(items, check=lambda o: isinstance(o, (list, tuple))
... or (isinstance(o, str) and len(o) > 1))))
['f', 'o', 'o', 1, 2, 3, 4, 5, 6, 7]

... than the duck-typed variants because it doesn't matter for the problem
of flattening whether an object can be sorted or not. But in a real-world
application the "atoms" are more likely to have something in common that is
required for the problem at hand, and the check for it with

def check(obj):
    return not (obj is an atom) # pseudo-code

may look more plausible.
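To make the pseudo-code concrete, here is one possible sketch under a loud assumption: in this hypothetical application every atom is a number, so the property the atoms share is the .real attribute that Python's numeric types provide. The name is_container and the sample data are invented for the example.

```python
def flatten(items, check):
    # Same generator as above: recurse into anything check() accepts.
    if check(items):
        for item in items:
            yield from flatten(item, check)
    else:
        yield items


def is_container(obj):
    # Assumption for this sketch: atoms are numbers, and numbers expose
    # a .real attribute. Anything without it is treated as a container.
    return not hasattr(obj, "real")


items = [1, 2.5, (3, 4), [5, [6, (7,)]]]
flat = list(flatten(items, is_container))
print(flat)
```

As with the hasattr(o, "count") variant, this breaks as soon as the data contains something outside the assumption, for example a string.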
 
Daniel da Silva

On Tue, Jan 14, 2014 at 8:27 PM, Steven D'Aprano wrote:
But reading Guido, I think he's saying that wouldn't be a good idea. I
don't get it -- it's not a violation of the Liskov Substitution
Principle, because it's more restrictive, not less. What am I missing?

Just to be pedantic, this *is* a violation of the Liskov Substitution
Principle. According to Wikipedia, the principle states:

if S is a subtype <http://en.wikipedia.org/wiki/Subtype> of T, then
objects of type <http://en.wikipedia.org/wiki/Datatype> T may be replaced
with objects of type S (i.e., objects of type S may be *substituted* for
objects of type T) without altering any of the desirable properties of that
program (correctness, task performed, etc.)
[0] <http://en.wikipedia.org/wiki/Liskov_substitution_principle>


Since S (TextOnlyDict) is more restrictive, it cannot be substituted for T
(dict) because the program may be using non-string keys.
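A small sketch of that failure mode. TextOnlyDict is the class from the original post; tally is a hypothetical function, invented here, that is written against plain dict and happens to use integer keys.

```python
class TextOnlyDict(dict):
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be str")
        super().__setitem__(key, value)


def tally(values, table):
    # Written with a plain dict in mind: any hashable may be a key.
    for value in values:
        table[value] = table.get(value, 0) + 1
    return table


plain = tally([1, 2, 2], {})

# Substituting the subclass changes the outcome of the same call.
try:
    tally([1, 2, 2], TextOnlyDict())
except TypeError:
    substitutable = False
else:
    substitutable = True

print(plain, substitutable)
```

A program that only ever stores string keys would not notice the difference, which is the point about context made in the reply below.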


Daniel
 
Gregory Ewing

Daniel said:
Just to be pedantic, this /is/ a violation of the Liskov Substitution
Principle. According to Wikipedia, the principle states:

if S is a subtype <http://en.wikipedia.org/wiki/Subtype> of T, then
objects of type <http://en.wikipedia.org/wiki/Datatype> T may be
replaced with objects of type S (i.e., objects of type S may
be /substituted/ for objects of type T) without altering any of the
desirable properties of that program

Something everyone seems to miss when they quote the LSP
is that what the "desirable properties of the program" are
*depends on the program*.

Whenever you create a subclass, there is always *some*
difference between the behaviour of the subclass and
the base class, otherwise there would be no point in
having the subclass. Whether that difference has any
bad consequences for the program depends on what the
program does with the objects.

So you can't just look at S and T in isolation and
decide whether they satisfy the LSP or not. You need
to consider them in context.

In Python, there's a special problem with subclassing
dicts in particular: some of the core interpreter code
assumes a plain dict and bypasses the lookup of
__getitem__ and __setitem__, going straight to the
C-level implementations. If you tried to use a dict
subclass in that context that overrode those methods,
your overridden versions wouldn't get called.
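One place where this can be observed from pure Python, to the best of my knowledge, is instance attribute storage. The sketch below assumes CPython and uses hypothetical names (LoggingDict, Record, calls): a dict subclass is installed as an instance's __dict__, and ordinary attribute access does not reach the overridden methods.

```python
calls = []  # records every call that reaches the overrides


class LoggingDict(dict):
    def __getitem__(self, key):
        calls.append(("get", key))
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        calls.append(("set", key))
        super().__setitem__(key, value)


class Record:
    pass


r = Record()
r.__dict__ = LoggingDict()

# Attribute access is handled at the C level on the underlying dict.
r.name = "spam"
value = r.name
calls_after_attribute_access = list(calls)

# Explicit subscription does go through the override.
same_value = r.__dict__["name"]
```

On CPython the log should still be empty after the attribute access and contain one "get" entry after the explicit lookup; other implementations may behave differently.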

But if you never use your dict subclass in that way,
there is no problem. Or if you don't override those
particular methods, there's no problem either.

If you're giving advice to someone who isn't aware
of all the fine details, "don't subclass dict" is
probably the safest thing to say. But there are
legitimate use cases for it if you know what you're
doing.

The other issue is that people are often tempted to
subclass dict in order to implement what isn't really
a dict at all, but just a custom mapping type. The
downside to that is that you end up inheriting a
bunch of dict-specific methods that don't really
make sense for your type. In that case it's usually
better to start with a fresh class that *uses* a
dict as part of its implementation, and only
exposes the methods that are really needed.
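A minimal sketch of that composition approach. Registry and its method names are invented for illustration; the point is only that the dict is an implementation detail and the public surface is chosen deliberately.

```python
class Registry:
    """Hypothetical string-keyed store that wraps a dict instead of
    subclassing it."""

    def __init__(self):
        self._items = {}  # private storage, not part of the interface

    def register(self, name, obj):
        # The only way in, so the key check cannot be bypassed.
        if not isinstance(name, str):
            raise TypeError("name must be a str")
        self._items[name] = obj

    def __getitem__(self, name):
        return self._items[name]

    def __contains__(self, name):
        return name in self._items

    def __len__(self):
        return len(self._items)


registry = Registry()
registry.register("answer", 42)

try:
    registry.register(7, "seven")
except TypeError:
    rejected = True
else:
    rejected = False
```

Dict-specific methods such as update(), popitem() or fromkeys() simply do not exist on Registry, so there is nothing to override or forget.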
 
Devin Jeanpierre

John said:
While I am inclined to agree, I have yet to see a solution to the problem of flattening nested lists/tuples which avoids isinstance(). If anyone has written one, I would like to see it, and consider its merits.

As long as you're the one that created the nested list structure, you
can choose to create a different structure instead, one which doesn't
require typechecking values inside your structure.

For example, os.walk has a similar kind of problem; it uses separate
lists for the subdirectories and the rest of the files, rather than
requiring you to check each child to see if it is a directory. It can
do it this way because it doesn't need to preserve the interleaved
order of directories and files, but there are other solutions for you if
you do want to preserve that order. (Although they won't be as clean
as they would be in a language with ADTs.)
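A sketch of the os.walk-style idea applied to flattening, assuming you control how the structure is built. Node, leaves and children are hypothetical names. Atoms and sub-structures live in separate lists, so no type check is needed, at the cost of losing their interleaved order.

```python
class Node:
    """Hypothetical tree node that keeps atoms and sub-nodes apart."""

    def __init__(self, leaves=(), children=()):
        self.leaves = list(leaves)      # plain values, never inspected
        self.children = list(children)  # always Node instances


def flatten(node):
    # No isinstance() or hasattr(): the structure says what is what.
    yield from node.leaves
    for child in node.children:
        yield from flatten(child)


tree = Node(
    leaves=["foo", 1, 2],
    children=[
        Node(leaves=[3, 4]),
        Node(leaves=[5], children=[Node(leaves=[6, 7])]),
    ],
)
flat = list(flatten(tree))
print(flat)
```

Strings need no special handling here, because leaves are never iterated.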

-- Devin
 
