detecting containers during object introspection

D

David C. Fox

Is there a reasonable way to tell whether a variable is a container
(either a mapping or a sequence) and whether it is a mapping or a
sequence? [For background on why I'm asking this question, see the end
of the message]

isMappingType and isSequenceType in the operator module return true for
any class instance. The documentation for those functions says
(correctly) that there is no general way to tell whether a class
supports the complete mapping or sequence interfaces, but still, it
would be nice to be able to tell whether

for elem in x:

would fail or succeed, without relying on something like isinstance(x,
list), which would fail for a tuple or a UserList, or any other custom
class. Is calling iter(x) and catching the possible TypeError the
closest I can get? And what would be the closest equivalent for mapping
types.

Wouldn't it be helpful to have an abstract base class Sequence which
didn't add any actual attributes or methods, but which someone writing a
sequence class could include as a base class, as a sort of unenforced
promise that the sequence operators were supported?

Thanks in advance for any suggestions,
David

---


Why am I asking this question?

I'm trying to write a regression test for a class which pickles
dictionaries of attributes to store and retrieve instances. The
dictionary includes a version number, and the class has a system for
updating out-of-date instances to the current version.

As part of the test, I need to be able to compare the structure of the
two dictionaries to see if the developer has modified their structure
but forgotten to increment the current version. Values in the
dictionary may be unknown objects, in which case I just want to compare
their types. However, the values may also be sequences or mappings
themselves, in which case I want to recursively compare their
elements/values.
 
S

Steven Taschuk

Quoth David C. Fox:
Is there a reasonable way to tell whether a variable is a container
(either a mapping or a sequence) and whether it is a mapping or a
sequence? [...]

Depends what you mean by "reasonable".

In the usual scenario, there's a specific facility which you want
to make use of, and you can either (a) just try to use it, and
catch the relevant exception (called Easier to Ask Forgiveness
than Permission, or EAFP), or (b) do some kind of check beforehand
for whether the facility exists (called Look Before You Leap, or
LBYL).

Your example of trying iter(foo) and catching the possible
TypeError fits into such approaches.

[...]
Wouldn't it be helpful to have an abstract base class Sequence which
didn't add any actual attributes or methods, but which someone writing a
sequence class could include as a base class, as a sort of unenforced
promise that the sequence operators were supported?

Consider a DNS name -> address mapping whose __getitem__ actually
queries the DNS. (Ignore the fact that this could be better done
with a function -- assume we have a function which expects a
dict-like object and we want to use it on the DNS.) Such an
object could not reasonably implement __len__. Should it falsely
imply that it does (by inheriting from an abstract class Mapping)
or falsely imply that it's not a mapping (by not inheriting from
Mapping)?

Such cases -- those in which for conceptual or pragmatic reasons
it is infeasible or undesirable to implement all of an interface
-- are not uncommon.

Also, there's the idea of YAGNI (You Ain't Gonna Need It):
implement just what you discover you actually need, not what you
think you might need in the future. In particular, when writing a
dict-like object, write just as much of the interface as the
contexts in which you want to use that object require, not more.
Implementing the entire interface just for completeness is
(according to YAGNI) a waste of valuable developer time, until you
discover you really do need the complete interface. Adherents of
this idea won't like the notion of having to make a promise
(enforced or not) that they've implemented the entire interface.
Why am I asking this question?

I'm trying to write a regression test for a class which pickles
dictionaries of attributes to store and retrieve instances. The
dictionary includes a version number, and the class has a system for
updating out-of-date instances to the current version.

As part of the test, I need to be able to compare the structure of the
two dictionaries to see if the developer has modified their structure
but forgotten to increment the current version. Values in the
dictionary may be unknown objects, in which case I just want to compare
their types. However, the values may also be sequences or mappings
themselves, in which case I want to recursively compare their
elements/values.

Interesting. So you have an example of a standard dictionary for
a given version number, and you want to compare the structure of
the actual dictionary to the standard example dictionary. Is that
right?

Are your developers actually using dict-like and list-like objects
instead of real dicts and lists? If not, isn't the type-check for
unknown objects enough? (If they do start using work-alikes
instead of the real thing, the test will fail, but YAGNI, right?
You can fix it if it ever happens, at worst by adding support for
the specific kinds of work-alike the developers want to use. Only
if they want to use a wide variety of work-alikes do you really
need tests for abstract sequenceness and mappingness.)
 
D

David C. Fox

Steven said:
Quoth David C. Fox:
Is there a reasonable way to tell whether a variable is a container
(either a mapping or a sequence) and whether it is a mapping or a
sequence? [...]


Depends what you mean by "reasonable".

In the usual scenario, there's a specific facility which you want
to make use of, and you can either (a) just try to use it, and
catch the relevant exception (called Easier to Ask Forgiveness
than Permission, or EAFP), or (b) do some kind of check beforehand
for whether the facility exists (called Look Before You Leap, or
LBYL).

Your example of trying iter(foo) and catching the possible
TypeError fits into such approaches.

[...]
Wouldn't it be helpful to have an abstract base class Sequence which
didn't add any actual attributes or methods, but which someone writing a
sequence class could include as a base class, as a sort of unenforced
promise that the sequence operators were supported?


Consider a DNS name -> address mapping whose __getitem__ actually
queries the DNS. (Ignore the fact that this could be better done
with a function -- assume we have a function which expects a
dict-like object and we want to use it on the DNS.) Such an
object could not reasonably implement __len__. Should it falsely
imply that it does (by inheriting from an abstract class Mapping)
or falsely imply that it's not a mapping (by not inheriting from
Mapping)?

Such cases -- those in which for conceptual or pragmatic reasons
it is infeasible or undesirable to implement all of an interface
-- are not uncommon.

Thanks, makes sense - I'll just try...catch the actual operations I use.
Interesting. So you have an example of a standard dictionary for
a given version number, and you want to compare the structure of
the actual dictionary to the standard example dictionary. Is that
right?

Yes. The particular scenario I'm most worried about is if a developer
makes a change to the structure of the dictionary, but fails to
increment the current version number. In that case, users with existing
stored dictionaries who update to a new version of the program will end
up with incorrectly constructed objects, and are likely to have all
sorts of strange and hard-to-debug problems.
Are your developers actually using dict-like and list-like objects
instead of real dicts and lists? If not, isn't the type-check for
unknown objects enough? (If they do start using work-alikes
instead of the real thing, the test will fail, but YAGNI, right?

Yes to the first*, but yes and good points to the second and third.

[*well, not exactly - I recently replaced a dictionary with a trie data
structure which effectively maps from sequences of strings to values,
but it doesn't currently use keys() or __getitem__(self, key).]

However, the real problem occurs if the developer makes a change like
this (or even just adds a new attribute which is a non-standard
container), and does increment the version number. Because the version
number was incremented, the regression test would *expect* some of the
dictionary elements to change type or structure, and would simply update
the standard example dictionary. Then, any *subsequent* changes to the
structure of the values in that container would go undetected (unless
the developer had also updated the recursive comparison operation to
take into account that the unknown object was a container).

But this scenario requires several mistakes, so YAGNI is probably still
good advice. Besides, since there isn't any way to detect containers
that don't use the standard methods, I guess the automated regression
testing approach will never be foolproof, and we'll have to fall back on:

(1) giving stern instructions (and good documentation) to our
developers, and

(2) making sure a developer who is aware of these issues looks over
their shoulders and their code. Yet another responsibility - just what
I need. Not to mention the sore neck I'm going to get looking over my
own shoulder ;-)

David
 
S

Steven Taschuk

Quoth David C. Fox:
[...]
However, the real problem occurs if the developer makes a change like
this (or even just adds a new attribute which is a non-standard
container), and does increment the version number. Because the version
number was incremented, the regression test would *expect* some of the
dictionary elements to change type or structure, and would simply update
the standard example dictionary. Then, any *subsequent* changes to the
structure of the values in that container would go undetected (unless
the developer had also updated the recursive comparison operation to
take into account that the unknown object was a container).

Yes, I see. Thorny.

What if your recursive comparison operation choked on values of
unknown types rather than falling back on the weak "have same
type" criterion? That way, the developer who changes the
structure by introducing values of (e.g.) a new dict work-alike
type would see tests fail, which would remind her to add to that
comparison operation the necessary knowledge of the new type.

Fail safe, in other words, where "safe" means "no false positives
from the regression test".
 
D

David C. Fox

Steven said:
Quoth David C. Fox:
[...]
However, the real problem occurs if the developer makes a change like
this (or even just adds a new attribute which is a non-standard
container), and does increment the version number. Because the version
number was incremented, the regression test would *expect* some of the
dictionary elements to change type or structure, and would simply update
the standard example dictionary. Then, any *subsequent* changes to the
structure of the values in that container would go undetected (unless
the developer had also updated the recursive comparison operation to
take into account that the unknown object was a container).


Yes, I see. Thorny.

What if your recursive comparison operation choked on values of
unknown types rather than falling back on the weak "have same
type" criterion? That way, the developer who changes the
structure by introducing values of (e.g.) a new dict work-alike
type would see tests fail, which would remind her to add to that
comparison operation the necessary knowledge of the new type.

Fail safe, in other words, where "safe" means "no false positives
from the regression test".

That's one possibility. My original reaction to it was that requiring
developers to write comparison operations for every class whose
instances are found in the stored dictionary seemed like overkill. Then
I started think about what classes actually were found there. Many of
them contain lists or dictionaries, and changes to the internal
structure of those classes would cause the same problems described
above. Therefore, just noting that the old and new versions had an
attribute with the same class wouldn't be sufficient.

Then, I recalled that any object instance *has* a dictionary of
attributes: __dict__. So, instead of writing custom comparison
operators for each class, I should be able to define a generic
comparison operator cmp_struct(x, y) which returns true iff

1) type(x) == type(y), and

2) one of the following is true:
a) type(x) == type({}), and
x.keys() and y.keys() match, and
for each key, cmp_struct(x[key], y[key]) is true
b) type(x) == type([]), and
for each pair of items in x and y, cmp_struct returns true
c) type(x) == types.InstanceType, and
cmp_struct(x.__dict__, y.__dict__) is true
d) type(x) is not any of the above

Thanks for your advice. It is really nice to have someone to bounce
ideas off of.

David
 
S

Steven Taschuk

Quoth David C. Fox:
[...]
That's one possibility. My original reaction to it was that requiring
developers to write comparison operations for every class whose
instances are found in the stored dictionary seemed like overkill. [...]

It might well be; this really depends on how frequently new types
are introduced to the stored dict.

Note also that it would often be possible to reuse comparison
functions written for other types. Simple scalar values, for
example, can all use the comparison function
def cmpstruct_scalar(a, b):
return type(a) == type(b)
For another example, as you noticed, many (but not all -- see
below) instances of classes can be compared by comparison of their
__dict__s.

The point is that, under the choke-on-unknown-types strategy, in
many cases all the developer has to do when introducing a new type
to the stored dict is to add a line
new_type: some_existing_comparison_function,
to a registry of type -> comparison function. Doesn't seem
onerous to me.

[...]
1) type(x) == type(y), and

2) one of the following is true:
a) type(x) == type({}), and
x.keys() and y.keys() match, and
for each key, cmp_struct(x[key], y[key]) is true
b) type(x) == type([]), and
for each pair of items in x and y, cmp_struct returns true
c) type(x) == types.InstanceType, and
cmp_struct(x.__dict__, y.__dict__) is true
d) type(x) is not any of the above

Good idea! A few things:

This doesn't handle new-style classes -- (c) identifies instances
of classic classes only. hasattr(x, '__dict__') might be a better
test.

Also (c) doesn't compare the structure of class attributes, since
those are in the class dict, not the instance dict. (I suppose
for pickling this might not matter, since the class is pickled
just as its name, so changes will automagically be available to
the unpickled instance.)

An extension type written in C might maintain lists or whatnot,
but not expose them as attributes. This is true of the list and
dict types themselves, for example -- which is why you need
special handling for them above.

Instances of subclasses of list and dict won't be treated as lists
and dicts by (a) and (b) as written. This is probably desirable
-- such subclasses might have additional structure which needs to
be compared -- but the contents of such instances won't be
compared by the later steps, again because those contents are not
exposed as attributes.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,733
Messages
2,569,440
Members
44,830
Latest member
ZADIva7383

Latest Threads

Top