general problem when subclassing a built-in class

Discussion in 'Python' started by kj, Dec 23, 2010.

  1. kj

    kj Guest

    Suppose that you want to implement a subclass of built-in class, to
    meet some specific design requirements.

    Where in the Python documentation can one find the information
    required to determine the minimal[1] set of methods that one would
    need to override to achieve this goal?

    In my experience, educated guesswork doesn't get one very far with
    this question.

    Here's a *toy example*, just to illustrate this last point.

    Suppose that one wants to implement a subclass of dict, call it
    TSDict, to meet these two design requirements:

    1. for each one of its keys, an instance of TSDict should keep a
    timestamp (as given by time.time, and accessible via the new method
    get_timestamp(key)) of the last time that the key had a value
    assigned to it;

    2. other than the added capability described in (1), an instance
    of TSDict should behave *exactly* like a built-in dictionary.

    In particular, we should be able to observe behavior like this:

    >>> d = TSDict((('uno', 1), ('dos', 2)), tres=3, cuatro=4)
    >>> d['cinco'] = 5
    >>> d

    {'cuatro': 4, 'dos': 2, 'tres': 3, 'cinco': 5, 'uno': 1}
    >>> d.get_timestamp('uno')

    1293046586.644436


    OK, here's one strategy, right out of OOP 101:

    #---------------------------------------------------------------------------
    from time import time

    class TSDict(dict):
    def __setitem__(self, key, value):
    # save the value and timestamp for key as a tuple;
    # see footnote [2]
    dict.__setitem__(self, key, (value, time()))

    def __getitem__(self, key):
    # extract the value from the value-timestamp pair and return it
    return dict.__getitem__(self, key)[0]

    def get_timestamp(self, key):
    # extract the timestamp from the value-timestamp pair and return it
    return dict.__getitem__(self, key)[1]
    #---------------------------------------------------------------------------

    This implementation *should* work (again, at least according to
    OOP 101), but, in fact, it doesn't come *even close*:

    >>> d = TSDict((('uno', 1), ('dos', 2)), tres=3, cuatro=4)
    >>> d['cinco'] = 5
    >>> d

    {'cuatro': 4, 'dos': 2, 'tres': 3, 'cinco': (5, 1293059516.942985), 'uno': 1}
    >>> d.get_timestamp('uno')

    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/tmp/tsdict.py", line 23, in get_timestamp
    return dict.__getitem__(self, key)[1]
    TypeError: 'int' object is not subscriptable


    From the above you can see that TSDict fails at *both* of the design
    requirements listed above: it fails to add a timestamp to all keys in
    the dictionary (e.g. 'uno', ..., 'cuatro' didn't get a timestamp), and
    get_timestamp bombs; and it also fails to behave in every other
    respect exactly like a built-in dict (e.g., repr(d) reveals the
    timestamps and how they are kept).


    So back to the general problem: to implement a subclass of a built-in
    class to meet a given set of design specifications. Where is the
    documentation needed to do this without guesswork?

    Given results like the one illustrated above, I can think of only two
    approaches (other than scrapping the whole idea of subclassing a
    built-in class in the first place):

    1) Study the Python C source code for the built-in class in the hopes
    of somehow figuring out what API methods need to be overridden;

    2) Through blind trial-and-error, keep trying different implementation
    strategies and/or keep overriding additional built-in class methods
    until the behavior of the resulting subclass approximates
    sufficiently the design specs.

    IMO, both of these approaches suck. Approach (1) would take *me*
    forever, since I don't know the first thing about Python's internals,
    and even if I did, going behind the documented API like that would
    make whatever I implement very likely to break with future releases of
    Python. Approach (2) could also take a very long time (probably much
    longer than the implementation would take if no guesswork was
    involved), but worse than that, one would have little assurance that
    one's experimentation has truly uncovered all the necessary details;
    IME, programming-by-guesswork leads to numerous and often nasty bugs.

    Is there any other way?

    TIA!

    ~kj


    [1] The "minimal" bit in the question statement is just another way of
    specifying a "maximal" reuse of the built-in's class code.

    [2] For this example, I've accessed the parent's methods directly
    through dict rather than through super(TSDict, self), just to keep
    the code as uncluttered as possible, but the results are the same
    if one uses super.
     
    kj, Dec 23, 2010
    #1
    1. Advertising

  2. On 2010-12-22 20:22:36 -0500, kj said:

    > Suppose that you want to implement a subclass of built-in class, to
    > meet some specific design requirements.
    >
    > Where in the Python documentation can one find the information
    > required to determine the minimal[1] set of methods that one would
    > need to override to achieve this goal?


    (Rest elided.)

    The short answer is that you can't everything you want.

    If some behaviour or implementation detail isn't documented, you can't
    make any assumptions about how a class works, even if you really want
    to. Whatever you figure out by "educated guesswork" (or, even better,
    reading the source) is, absent documentation, not guaranteed, and
    behaviour that's not guaranteed can and will change whenever the code's
    maintaners feel like it. This is true for for builtins, for things in
    the standard library, and for things in competently-maintained third
    party libraries.

    Requiring that every class document its internals extensively enough to
    permit subclasses to modify behaviour in totally arbitrary ways has to
    be balanced against allowing each class's interface to usefully
    abstract its internal workings. Especially in a duck-typed language
    like Python, there's very little need for subclassing simply to
    intercept one or two method calls, and the resulting composition-based
    system is both more flexible and, being based on documented and stable
    interfaces, easier to maintain and support.

    Subclassing is a powerful tool, not suited to every design problem. In
    your case (tracking timestamps in a dict-like container), you're
    probably better off implementing your own dict-like container, with
    support for as much or as little of the protocol you want. You can do
    that without subclassing dict: the protocol for dict is not difficult
    to implement, especially if your backing storage is a real dict.

    What forces do you imagine require you to use a subclass for this?

    -o
     
    Owen Jacobson, Dec 23, 2010
    #2
    1. Advertising

  3. kj

    bill Guest

    In <ieu88s$hmu$> kj <> writes:

    >Where in the Python documentation can one find the information
    >required to determine the minimal[1] set of methods that one would
    >need to override to achieve this goal?
    >

    I don't know the answer to that question, but imho it's
    the wrong question to ask. Instead you should be asking what
    design will let me implement the subclass making the fewest possible
    assumptions about the superclass. (hint: the design you're
    thinking about leads to a world of pain.)
    >
    >2. other than the added capability described in (1), an instance
    >of TSDict should behave *exactly* like a built-in dictionary.


    (minor point: what about repr? it shouldn't look like a dict, imho)

    >
    >#---------------------------------------------------------------------------
    >from time import time


    >class TSDict(dict):
    > def __setitem__(self, key, value):
    > # save the value and timestamp for key as a tuple;
    > # see footnote [2]
    > dict.__setitem__(self, key, (value, time()))


    > def __getitem__(self, key):
    > # extract the value from the value-timestamp pair and return it
    > return dict.__getitem__(self, key)[0]


    > def get_timestamp(self, key):
    > # extract the timestamp from the value-timestamp pair and return it
    > return dict.__getitem__(self, key)[1]
    >#---------------------------------------------------------------------------



    man, you're asking for trouble! even if you knew all you want to know,
    unless you had some guarantee that it wouldn't change in a later release,
    you'd still have to override pretty much all the methods. the problem
    here is not lack of information, but a horrible design (sorry to be blunt).

    Just keep the timestamps in a parallel dict and get on with life:

    from time import time
    from collections import MutableMapping

    class TSDict2(dict, MutableMapping):
    def __init__(self, *args, **kwargs):
    dict.__init__(self, *args, **kwargs)
    self.__ts = dict.fromkeys(self.keys(), time())

    def __setitem__(self, key, value):
    self.__ts[key] = time()
    dict.__setitem__(self, key, value)

    setdefault = MutableMapping.setdefault
    update = MutableMapping.update

    def get_timestamp(self, key):
    if self.has_key(key):
    return self.__ts[key]
    raise KeyError(key)


    that just took the time needed to type it and worked the first time:

    >>> d = TSDict2((('uno', 1), ('dos', 2)), tres=3, cuatro=4); d['cinco'] = 5; d.setdefault('six', 6);

    6
    >>> d.update((('tres', 'amigos'), ('seven', 7),), eight=8); d

    {'seven': 7, 'six': 6, 'cuatro': 4, 'cinco': 5, 'eight': 8, 'dos': 2, 'tres': 'amigos', 'uno': 1}
    >>> for p in sorted([(k, d.get_timestamp(k)) for k in d.keys()], key=lambda p: p[1]): print "%s\t%f" % p

    ....
    cuatro 1293131496.917215
    dos 1293131496.917215
    uno 1293131496.917215
    cinco 1293131496.917233
    six 1293131496.917279
    tres 1293131501.676962
    seven 1293131501.676974
    eight 1293131501.676981


    If you insist on bashing your skull against your original problem,
    take a look at collections.OrderedDict or collections.Counter to
    see how they use ABCs to tame dict.

    (still, even if you used all the remaining available
    MutableMapping methods in your class, i don't know how you'd
    get the dict constructor to return the right value, ie no timestamps,
    when you pass it an
    instance of your subclass as argument. i don't think
    there's a TSDict.__method__ you can write for that... anyway
    my TSDict2 doesn't have this problem either.)

    take home message: respect the privacy of your superclasses and they'll
    be nice to you.

    --bill
     
    bill, Dec 23, 2010
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Gabriel Cooper

    subclassing built-in types

    Gabriel Cooper, May 24, 2004, in forum: Python
    Replies:
    0
    Views:
    292
    Gabriel Cooper
    May 24, 2004
  2. Emiliano Molina

    Subclassing built-in types

    Emiliano Molina, Aug 25, 2004, in forum: Python
    Replies:
    3
    Views:
    309
    Emiliano Molina
    Aug 26, 2004
  3. Jane Austine
    Replies:
    12
    Views:
    447
    Nick Coghlan
    Feb 24, 2005
  4. MonkeeSage

    Subclassing built-in classes

    MonkeeSage, Oct 5, 2006, in forum: Python
    Replies:
    12
    Views:
    550
    Theerasak Photha
    Oct 11, 2006
  5. Replies:
    0
    Views:
    138
Loading...

Share This Page