detecting containers during object introspection

Discussion in 'Python' started by David C. Fox, Jul 21, 2003.

  1. David C. Fox

    David C. Fox Guest

    Is there a reasonable way to tell whether a variable is a container
    (either a mapping or a sequence) and whether it is a mapping or a
    sequence? [For background on why I'm asking this question, see the end
    of the message]

    isMappingType and isSequenceType in the operator module return true for
    any class instance. The documentation for those functions says
    (correctly) that there is no general way to tell whether a class
    supports the complete mapping or sequence interfaces, but still, it
    would be nice to be able to tell whether

    for elem in x:

    would fail or succeed, without relying on something like isinstance(x,
    list), which would fail for a tuple or a UserList, or any other custom
    class. Is calling iter(x) and catching the possible TypeError the
    closest I can get? And what would be the closest equivalent for mapping
    types.

    Wouldn't it be helpful to have an abstract base class Sequence which
    didn't add any actual attributes or methods, but which someone writing a
    sequence class could include as a base class, as a sort of unenforced
    promise that the sequence operators were supported?

    Thanks in advance for any suggestions,
    David

    ---


    Why am I asking this question?

    I'm trying to write a regression test for a class which pickles
    dictionaries of attributes to store and retrieve instances. The
    dictionary includes a version number, and the class has a system for
    updating out-of-date instances to the current version.

    As part of the test, I need to be able to compare the structure of the
    two dictionaries to see if the developer has modified their structure
    but forgotten to increment the current version. Values in the
    dictionary may be unknown objects, in which case I just want to compare
    their types. However, the values may also be sequences or mappings
    themselves, in which case I want to recursively compare their
    elements/values.
     
    David C. Fox, Jul 21, 2003
    #1
    1. Advertising

  2. Quoth David C. Fox:
    > Is there a reasonable way to tell whether a variable is a container
    > (either a mapping or a sequence) and whether it is a mapping or a
    > sequence? [...]


    Depends what you mean by "reasonable".

    In the usual scenario, there's a specific facility which you want
    to make use of, and you can either (a) just try to use it, and
    catch the relevant exception (called Easier to Ask Forgiveness
    than Permission, or EAFP), or (b) do some kind of check beforehand
    for whether the facility exists (called Look Before You Leap, or
    LBYL).

    Your example of trying iter(foo) and catching the possible
    TypeError fits into such approaches.

    [...]
    > Wouldn't it be helpful to have an abstract base class Sequence which
    > didn't add any actual attributes or methods, but which someone writing a
    > sequence class could include as a base class, as a sort of unenforced
    > promise that the sequence operators were supported?


    Consider a DNS name -> address mapping whose __getitem__ actually
    queries the DNS. (Ignore the fact that this could be better done
    with a function -- assume we have a function which expects a
    dict-like object and we want to use it on the DNS.) Such an
    object could not reasonably implement __len__. Should it falsely
    imply that it does (by inheriting from an abstract class Mapping)
    or falsely imply that it's not a mapping (by not inheriting from
    Mapping)?

    Such cases -- those in which for conceptual or pragmatic reasons
    it is infeasible or undesirable to implement all of an interface
    -- are not uncommon.

    Also, there's the idea of YAGNI (You Ain't Gonna Need It):
    implement just what you discover you actually need, not what you
    think you might need in the future. In particular, when writing a
    dict-like object, write just as much of the interface as the
    contexts in which you want to use that object require, not more.
    Implementing the entire interface just for completeness is
    (according to YAGNI) a waste of valuable developer time, until you
    discover you really do need the complete interface. Adherents of
    this idea won't like the notion of having to make a promise
    (enforced or not) that they've implemented the entire interface.

    > Why am I asking this question?
    >
    > I'm trying to write a regression test for a class which pickles
    > dictionaries of attributes to store and retrieve instances. The
    > dictionary includes a version number, and the class has a system for
    > updating out-of-date instances to the current version.
    >
    > As part of the test, I need to be able to compare the structure of the
    > two dictionaries to see if the developer has modified their structure
    > but forgotten to increment the current version. Values in the
    > dictionary may be unknown objects, in which case I just want to compare
    > their types. However, the values may also be sequences or mappings
    > themselves, in which case I want to recursively compare their
    > elements/values.


    Interesting. So you have an example of a standard dictionary for
    a given version number, and you want to compare the structure of
    the actual dictionary to the standard example dictionary. Is that
    right?

    Are your developers actually using dict-like and list-like objects
    instead of real dicts and lists? If not, isn't the type-check for
    unknown objects enough? (If they do start using work-alikes
    instead of the real thing, the test will fail, but YAGNI, right?
    You can fix it if it ever happens, at worst by adding support for
    the specific kinds of work-alike the developers want to use. Only
    if they want to use a wide variety of work-alikes do you really
    need tests for abstract sequenceness and mappingness.)

    --
    Steven Taschuk
    "Our analysis begins with two outrageous benchmarks."
    -- "Implementation strategies for continuations", Clinger et al.
     
    Steven Taschuk, Jul 22, 2003
    #2
    1. Advertising

  3. David C. Fox

    David C. Fox Guest

    Steven Taschuk wrote:

    > Quoth David C. Fox:
    >
    >>Is there a reasonable way to tell whether a variable is a container
    >>(either a mapping or a sequence) and whether it is a mapping or a
    >>sequence? [...]

    >
    >
    > Depends what you mean by "reasonable".
    >
    > In the usual scenario, there's a specific facility which you want
    > to make use of, and you can either (a) just try to use it, and
    > catch the relevant exception (called Easier to Ask Forgiveness
    > than Permission, or EAFP), or (b) do some kind of check beforehand
    > for whether the facility exists (called Look Before You Leap, or
    > LBYL).
    >
    > Your example of trying iter(foo) and catching the possible
    > TypeError fits into such approaches.
    >
    > [...]
    >
    >>Wouldn't it be helpful to have an abstract base class Sequence which
    >>didn't add any actual attributes or methods, but which someone writing a
    >>sequence class could include as a base class, as a sort of unenforced
    >>promise that the sequence operators were supported?

    >
    >
    > Consider a DNS name -> address mapping whose __getitem__ actually
    > queries the DNS. (Ignore the fact that this could be better done
    > with a function -- assume we have a function which expects a
    > dict-like object and we want to use it on the DNS.) Such an
    > object could not reasonably implement __len__. Should it falsely
    > imply that it does (by inheriting from an abstract class Mapping)
    > or falsely imply that it's not a mapping (by not inheriting from
    > Mapping)?
    >
    > Such cases -- those in which for conceptual or pragmatic reasons
    > it is infeasible or undesirable to implement all of an interface
    > -- are not uncommon.


    Thanks, makes sense - I'll just try...catch the actual operations I use.

    >>Why am I asking this question?
    >>
    >>I'm trying to write a regression test for a class which pickles
    >>dictionaries of attributes to store and retrieve instances. The
    >>dictionary includes a version number, and the class has a system for
    >>updating out-of-date instances to the current version.
    >>
    >>As part of the test, I need to be able to compare the structure of the
    >>two dictionaries to see if the developer has modified their structure
    >>but forgotten to increment the current version. Values in the
    >>dictionary may be unknown objects, in which case I just want to compare
    >>their types. However, the values may also be sequences or mappings
    >>themselves, in which case I want to recursively compare their
    >>elements/values.

    >
    >
    > Interesting. So you have an example of a standard dictionary for
    > a given version number, and you want to compare the structure of
    > the actual dictionary to the standard example dictionary. Is that
    > right?


    Yes. The particular scenario I'm most worried about is if a developer
    makes a change to the structure of the dictionary, but fails to
    increment the current version number. In that case, users with existing
    stored dictionaries who update to a new version of the program will end
    up with incorrectly constructed objects, and are likely to have all
    sorts of strange and hard-to-debug problems.

    > Are your developers actually using dict-like and list-like objects
    > instead of real dicts and lists? If not, isn't the type-check for
    > unknown objects enough? (If they do start using work-alikes
    > instead of the real thing, the test will fail, but YAGNI, right?


    Yes to the first*, but yes and good points to the second and third.

    [*well, not exactly - I recently replaced a dictionary with a trie data
    structure which effectively maps from sequences of strings to values,
    but it doesn't currently use keys() or __getitem__(self, key).]

    However, the real problem occurs if the developer makes a change like
    this (or even just adds a new attribute which is a non-standard
    container), and does increment the version number. Because the version
    number was incremented, the regression test would *expect* some of the
    dictionary elements to change type or structure, and would simply update
    the standard example dictionary. Then, any *subsequent* changes to the
    structure of the values in that container would go undetected (unless
    the developer had also updated the recursive comparison operation to
    take into account that the unknown object was a container).

    But this scenario requires several mistakes, so YAGNI is probably still
    good advice. Besides, since there isn't any way to detect containers
    that don't use the standard methods, I guess the automated regression
    testing approach will never be foolproof, and we'll have to fall back on:

    (1) giving stern instructions (and good documentation) to our
    developers, and

    (2) making sure a developer who is aware of these issues looks over
    their shoulders and their code. Yet another responsibility - just what
    I need. Not to mention the sore neck I'm going to get looking over my
    own shoulder ;-)

    David
     
    David C. Fox, Jul 22, 2003
    #3
  4. Quoth David C. Fox:
    [...]
    > However, the real problem occurs if the developer makes a change like
    > this (or even just adds a new attribute which is a non-standard
    > container), and does increment the version number. Because the version
    > number was incremented, the regression test would *expect* some of the
    > dictionary elements to change type or structure, and would simply update
    > the standard example dictionary. Then, any *subsequent* changes to the
    > structure of the values in that container would go undetected (unless
    > the developer had also updated the recursive comparison operation to
    > take into account that the unknown object was a container).


    Yes, I see. Thorny.

    What if your recursive comparison operation choked on values of
    unknown types rather than falling back on the weak "have same
    type" criterion? That way, the developer who changes the
    structure by introducing values of (e.g.) a new dict work-alike
    type would see tests fail, which would remind her to add to that
    comparison operation the necessary knowledge of the new type.

    Fail safe, in other words, where "safe" means "no false positives
    from the regression test".

    --
    Steven Taschuk w_w
    ,-= U
    1 1
     
    Steven Taschuk, Jul 22, 2003
    #4
  5. David C. Fox

    David C. Fox Guest

    Steven Taschuk wrote:
    > Quoth David C. Fox:
    > [...]
    >
    >>However, the real problem occurs if the developer makes a change like
    >>this (or even just adds a new attribute which is a non-standard
    >>container), and does increment the version number. Because the version
    >>number was incremented, the regression test would *expect* some of the
    >>dictionary elements to change type or structure, and would simply update
    >>the standard example dictionary. Then, any *subsequent* changes to the
    >>structure of the values in that container would go undetected (unless
    >>the developer had also updated the recursive comparison operation to
    >>take into account that the unknown object was a container).

    >
    >
    > Yes, I see. Thorny.
    >
    > What if your recursive comparison operation choked on values of
    > unknown types rather than falling back on the weak "have same
    > type" criterion? That way, the developer who changes the
    > structure by introducing values of (e.g.) a new dict work-alike
    > type would see tests fail, which would remind her to add to that
    > comparison operation the necessary knowledge of the new type.
    >
    > Fail safe, in other words, where "safe" means "no false positives
    > from the regression test".
    >


    That's one possibility. My original reaction to it was that requiring
    developers to write comparison operations for every class whose
    instances are found in the stored dictionary seemed like overkill. Then
    I started think about what classes actually were found there. Many of
    them contain lists or dictionaries, and changes to the internal
    structure of those classes would cause the same problems described
    above. Therefore, just noting that the old and new versions had an
    attribute with the same class wouldn't be sufficient.

    Then, I recalled that any object instance *has* a dictionary of
    attributes: __dict__. So, instead of writing custom comparison
    operators for each class, I should be able to define a generic
    comparison operator cmp_struct(x, y) which returns true iff

    1) type(x) == type(y), and

    2) one of the following is true:
    a) type(x) == type({}), and
    x.keys() and y.keys() match, and
    for each key, cmp_struct(x[key], y[key]) is true
    b) type(x) == type([]), and
    for each pair of items in x and y, cmp_struct returns true
    c) type(x) == types.InstanceType, and
    cmp_struct(x.__dict__, y.__dict__) is true
    d) type(x) is not any of the above

    Thanks for your advice. It is really nice to have someone to bounce
    ideas off of.

    David
     
    David C. Fox, Jul 22, 2003
    #5
  6. Quoth David C. Fox:
    [...]
    > That's one possibility. My original reaction to it was that requiring
    > developers to write comparison operations for every class whose
    > instances are found in the stored dictionary seemed like overkill. [...]


    It might well be; this really depends on how frequently new types
    are introduced to the stored dict.

    Note also that it would often be possible to reuse comparison
    functions written for other types. Simple scalar values, for
    example, can all use the comparison function
    def cmpstruct_scalar(a, b):
    return type(a) == type(b)
    For another example, as you noticed, many (but not all -- see
    below) instances of classes can be compared by comparison of their
    __dict__s.

    The point is that, under the choke-on-unknown-types strategy, in
    many cases all the developer has to do when introducing a new type
    to the stored dict is to add a line
    new_type: some_existing_comparison_function,
    to a registry of type -> comparison function. Doesn't seem
    onerous to me.

    [...]
    > 1) type(x) == type(y), and
    >
    > 2) one of the following is true:
    > a) type(x) == type({}), and
    > x.keys() and y.keys() match, and
    > for each key, cmp_struct(x[key], y[key]) is true
    > b) type(x) == type([]), and
    > for each pair of items in x and y, cmp_struct returns true
    > c) type(x) == types.InstanceType, and
    > cmp_struct(x.__dict__, y.__dict__) is true
    > d) type(x) is not any of the above


    Good idea! A few things:

    This doesn't handle new-style classes -- (c) identifies instances
    of classic classes only. hasattr(x, '__dict__') might be a better
    test.

    Also (c) doesn't compare the structure of class attributes, since
    those are in the class dict, not the instance dict. (I suppose
    for pickling this might not matter, since the class is pickled
    just as its name, so changes will automagically be available to
    the unpickled instance.)

    An extension type written in C might maintain lists or whatnot,
    but not expose them as attributes. This is true of the list and
    dict types themselves, for example -- which is why you need
    special handling for them above.

    Instances of subclasses of list and dict won't be treated as lists
    and dicts by (a) and (b) as written. This is probably desirable
    -- such subclasses might have additional structure which needs to
    be compared -- but the contents of such instances won't be
    compared by the later steps, again because those contents are not
    exposed as attributes.

    --
    Steven Taschuk
    "What I find most baffling about that song is that it was not a hit."
    -- Tony Dylan Davis (CKUA)
     
    Steven Taschuk, Jul 23, 2003
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. bugbear
    Replies:
    0
    Views:
    351
    bugbear
    Sep 28, 2005
  2. Replies:
    4
    Views:
    401
  3. traveller
    Replies:
    0
    Views:
    1,248
    traveller
    Jan 8, 2008
  4. Replies:
    7
    Views:
    578
    Pete Becker
    Jan 25, 2008
  5. Sebastian Mach
    Replies:
    5
    Views:
    342
Loading...

Share This Page