inheritance, multiple inheritance and the weaklist and instance dictionaries

Rouslan Korneychuk

I'm working on a program that automatically generates C++ code for a
Python extension and I noticed a few limitations when using the weaklist
and instance dictionaries (tp_weaklistoffset and tp_dictoffset). This is
pertaining to the C API.

I noticed that when using multiple inheritance, I need a common base
class or I get an "instance lay-out conflict" error (my program already
deals with the issue of having a varying layout), but the error also
happens when the derived classes have these extra dictionaries and the
common base doesn't. This doesn't seem like it should be a problem if
the offsets for these variables are explicitly specified in the derived
types. I want this program to be as flexible as possible, so could
someone tell me what exactly are the rules when it comes to these
dictionaries and inheritance. Also, I don't like the idea of having up
to four different classes (one for every combination of those two
variables) that do nothing except tell CPython that I know what I'm doing.
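
For reference, the same error can be reproduced from pure Python. The sketch below is only an analogue of the C-level situation, assuming CPython; the class names and the try_combine helper are made up for illustration. Two classes that each add their own data slot cannot be combined, while two ordinary classes, which both carry an instance dict and a weaklist, can.

```python
# Pure-Python analogue of the layout check; all names here are hypothetical.

class SlotA:
    __slots__ = ("a",)   # adds one data field right after the object header


class SlotB:
    __slots__ = ("b",)   # adds a different field in the same position


class PlainA:            # ordinary class: gets __dict__ and __weakref__
    pass


class PlainB:            # ordinary class: gets __dict__ and __weakref__
    pass


def try_combine(*bases):
    """Return None if a class deriving from all bases can be created,
    otherwise return the TypeError message CPython produced."""
    try:
        type("Combined", bases, {"__slots__": ()})
    except TypeError as exc:
        return str(exc)
    return None


conflict = try_combine(SlotA, SlotB)       # incompatible data fields
no_conflict = try_combine(PlainA, PlainB)  # only dict/weaklist differ from object

print("SlotA + SlotB:", conflict)
print("PlainA + PlainB:", no_conflict)
```

This only shows what the interpreter accepts for classes created in Python; types defined through the C API set their sizes and offsets by hand, so the outcome there depends on the values the extension supplies.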

Also is it possible to have a class that doesn't have these dictionaries
derive from a class that does?

I don't mind hackish solutions as long as they work reliably with
multiple Python versions.
 
Carl Banks

I'm working on a program that automatically generates C++ code for a
Python extension and I noticed a few limitations when using the weaklist
and instance dictionaries (tp_weaklistoffset and tp_dictoffset). This is
pertaining to the C API.

I noticed that when using multiple inheritance, I need a common base
class or I get an "instance lay-out conflict" error (my program already
deals with the issue of having a varying layout), but the error also
happens when the derived classes have these extra dictionaries and the
common base doesn't. This doesn't seem like it should be a problem if
the offsets for these variables are explicitly specified in the derived
types.

No, it is a problem. It violates the Liskov substitution principle.

Let me see if I understand your situation. You have one base type
with tp_dictoffset or tp_weaklistoffset set but no extra fields in the
object structure, and another base type without tp_dictoffset or
tp_weaklistoffset, but with extra fields, and you're trying to
multiply-inherit from both? This is the only case I can think of where
the layout conflict would be caused by a type setting tp_dictoffset.

Even though this is a clear layout conflict, you think that if you set
tp_dictoffset and tp_weaklistoffset appropriately in the derived type,
it's ok if the dict and weakreflist appear in different locations in
the subtype, right?

Not in general. A bunch of stuff can go wrong. If any methods in the
base type reference the object dict directly (rather than indirectly
via tp_dictoffset), then the derived type will be broken when one of
the base-type methods is called. (This alone is enough to rule out
allowing it in general.) Even if you are careful to always use
tp_dictoffset, a user might write a subtype in C that directly
accesses it, not even stopping to consider that it might be used in
MI. It's not even certain that a derived type won't use the base
type's tp_dictoffset.

The algorithm to detect layout conflicts would require a terrible
increase in complexity: there are some layouts that would "work" if
you could ignore tp_dictoffset, and some that won't, and now you have
a big mess trying to distinguish them.

Bottom line is, tp_dictoffset and tp_weaklistoffset should be treated
as if they defined regular slots that affect layout, and it should be
assumed that (like for all other slots) the offset does not change for
subtypes. There are a few important situations where tp_dictoffset is a
tiny bit more flexible than a regular slot, but they're rare.
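
A small pure-Python sketch of the regular-slot rule, assuming CPython and using made-up class names: a subtype only appends to the layout of its base, so a slot defined in the base is reached through the same descriptor, and therefore the same offset, in every subtype.

```python
# Hypothetical classes illustrating that a subtype extends its base's layout.

class Base:
    __slots__ = ("x",)        # one field stored directly in the instance


class Derived(Base):
    __slots__ = ("y",)        # a second field appended after the first


# Each level adds storage on top of what the base already reserved.
sizes = (object.__basicsize__, Base.__basicsize__, Derived.__basicsize__)
print("basic sizes (object, Base, Derived):", sizes)

# The descriptor for "x" is inherited, not recreated: instances of Derived
# read the field from exactly the location Base chose for it.
same_descriptor = Derived.x is Base.x
print("Derived reuses Base's descriptor for x:", same_descriptor)

d = Derived()
d.x, d.y = 1, 2
print(d.x, d.y)
```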

I want this program to be as flexible as possible, so could
someone tell me what exactly are the rules when it comes to these
dictionaries and inheritance. Also, I don't like the idea of having up
to four different classes (one for every combination of those two
variables) that do nothing except tell CPython that I know what I'm doing.

I don't think there's any reasonable way to subvert Python's behavior
on layout.

If your base types are abstract, you might consider not setting
tp_dictoffset and tp_weaklistoffset in the base (even if it has methods
that reference the dict)--just be sure to set it in the first class
that's concrete.

Also is it possible to have a class that doesn't have these dictionaries
derive from a class that does?

Nope. That would *really* violate the Liskov substitution principle.


Carl Banks
 
Rouslan Korneychuk

No, it is a problem. It violates the Liskov substitution principle.

Let me see if I understand your situation. You have one base type
with tp_dictoffset or tp_weaklistoffset set but no extra fields in the
object structure, and another base type without tp_dictoffset or
tp_weaklistoffset, but with extra fields, and you're trying to
multiply-inherit from both? This is the only case I can think of where
the layout conflict would be caused by a type setting tp_dictoffset.

No, actually I have code that is roughly equivalent to the following
pseudocode:

class _internal_class(object):
     __slots__ = ()

class BaseA(_internal_class):
     __slots__ = (some_data,...,__weaklist__,__dict__)

class BaseB(BaseA):
     __slots__ = (some_data,...,__weaklist__,__dict__)

class BaseC(_internal_class):
     __slots__ = (some_other_data,...,__weaklist__,__dict__)

class Derived(BaseB,BaseC):
     __slots__ = (combined_data,...,__weaklist__,__dict__)


Before adding the weaklist and instance dicts, this all worked fine.
_internal_class was necessary to prevent the "lay-out conflict" error.
But when I added them to every class except _internal_class, I got the
error. I tried adding the dicts to _internal_class and it worked again.
(The classes are set up like this because the code is from my unit-test
code.)

Even though this is a clear layout conflict, you think that if you set
tp_dictoffset and tp_weaklistoffset appropriately in the derived type,
it's ok if the dict and weakreflist appear in different locations in
the subtype, right?

Not in general. A bunch of stuff can go wrong. If any methods in the
base type reference the object dict directly (rather than indirectly
via tp_dictoffset), then the derived type will be broken when one of
the base-type methods is called. (This alone is enough to rule out
allowing it in general.) Even if you are careful to always use
tp_dictoffset, a user might write a subtype in C that directly
accesses it, not even stopping to consider that it might be used in
MI. It's not even certain that a derived type won't use the base
type's tp_dictoffset.

The algorithm to detect layout conflicts would require a terrible
increase in complexity: there are some layouts that would "work" if
you could ignore tp_dictoffset, and some that won't, and now you have
a big mess trying to distinguish them.

Bottom line is, tp_dictoffset and tp_weaklistoffset should be treated
as if they defined regular slots that affect layout, and it should be
assumed that (like for all other slots) the offset does not change for
subtypes. There are a few important situations where tp_dictoffset is a
tiny bit more flexible than a regular slot, but they're rare.



I don't think there's any reasonable way to subvert Python's behavior
on layout.

If your base types are abstract, you might consider not setting
tp_dictoffset and tp_weaklistoffset in the base (even if it has methods
that reference the dict)--just be sure to set it in the first class
that's concrete.



Nope. That would *really* violate the Liskov substitution principle.

This is why the code is automatically generated and is not meant to be
extended by hand. It already does more complicated things. Even without
inheritance, the layout of the classes can vary. Each Python class is a
wrapper for a C++ class. The C++ class is kept in the PyObject by copy,
by default. But it can also be stored by a pointer that is deleted when
the PyObject is gone. It can be stored as a reference to a member of
another exposed C++ class, along with a PyObject pointer to that class.
And it can also be stored as an unmanaged reference (this is intended
for static objects). On top of that, anything that isn't used, is
omitted, so that each class can be as small and efficient as possible.
So I have already thrown internal consistency out the window.

This all works by the way. You can check it out at
https://github.com/Rouslan/PyExpose (although I haven't uploaded the
code for supporting the weaklist and instance dictionaries yet). There
may be a few edge cases where it fails when having very different
requirements between inherited types, but I plan to write more
test-cases and work everything out.
 
Carl Banks

No, actually I have code that is roughly equivalent to the following
pseudocode:

class _internal_class(object):
     __slots__ = ()

class BaseA(_internal_class):
     __slots__ = (some_data,...,__weaklist__,__dict__)

class BaseB(BaseA):
     __slots__ = (some_data,...,__weaklist__,__dict__)

class BaseC(_internal_class):
     __slots__ = (some_other_data,...,__weaklist__,__dict__)

class Derived(BaseB,BaseC):
     __slots__ = (combined_data,...,__weaklist__,__dict__)

Ah, ok. So BaseA sticks weaklist and dict into a certain layout
location. BaseB gets the same location because it has the same
layout. BaseC sticks weaklist and dict into a different location.
Derived then can't reconcile it because dict and weakref are in
different locations.

Again, for the reasons I mentioned the layouts conflict even if you
set tp_weaklistoffset and tp_dictoffset.

Before adding the weaklist and instance dicts, this all worked fine.
_internal_class was necessary to prevent the "lay-out conflict" error.

That actually seems like an error to me. It should signal a conflict
if bases have different layout sizes from their own base, and one
isn't a base of the other, which wouldn't be the case here.

"some_data" is a proper subset of "some_other_data", right? (If it isn't
you have worse problems than dict and weakreflist.) A more natural
way to handle this situation would be to place common slots in the
internal_class.

But we'll just accept it for now.

But when I added them to every class except _internal_class, I got the
error. I tried adding the dicts to _internal_class and it worked again.
(The classes are set up like this because the code is from my unit-test
code.)

Don't set tp_dictoffset and tp_weaklistoffset in any of the bases. Set
them only in Derived. If you need to instantiate a Base, make a
trivial derived class for it.
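
In Python terms, a rough sketch of that arrangement might look like the following. The names are hypothetical, and it simplifies the case above by assuming the bases carry no conflicting data fields of their own; it only shows where the dict and weaklist get introduced.

```python
import weakref

# Hypothetical bases: neither one reserves room for a dict or a weaklist.
class BaseA:
    __slots__ = ()


class BaseC:
    __slots__ = ()


# The dict and weaklist appear only in the most-derived class.
class Derived(BaseA, BaseC):
    __slots__ = ("__dict__", "__weakref__")


# Trivial concrete subclass for when a standalone BaseA with a dict and
# weak-reference support has to be instantiated.
class ConcreteA(BaseA):
    __slots__ = ("__dict__", "__weakref__")


d = Derived()
d.anything = 1             # works because Derived has an instance dict
ref = weakref.ref(d)       # works because Derived has a weaklist

a = ConcreteA()
a.note = "standalone"

print(isinstance(d, BaseA), isinstance(d, BaseC), isinstance(a, BaseA))
print(hasattr(BaseA(), "__dict__"))   # the bare base has no dict
```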



Carl Banks
 
Carl Banks

Each Python class is a wrapper for a C++ class.

Also, if you want my opinion (you probably don't after you've already
gone to so much trouble, but here it is anyway):

It's not worth it to mimic the C++ type hierarchy in Python. Just
wrap each C++ class, regardless of its ancestry, in a Python class
with only object as base.


Carl Banks
 
Rouslan Korneychuk

Ah, ok. So BaseA sticks weaklist and dict into a certain layout
location. BaseB gets the same location because it has the same
layout. BaseC sticks weaklist and dict into a different location.
Derived then can't reconcile it because dict and weakref are in
different locations.

That doesn't explain why changing _internal_class to:
     class _internal_class(object):
         __slots__ = (__weaklist__,__dict__)
makes it work. Also, why does Derived need to reconcile anything if I'm
telling it where the dictionaries are in the new class? Doesn't CPython
calculate where, for example, weaklist is with what is essentially:
obj_instance + obj_instance->ob_type->tp_weaklistoffset
(at least when tp_weaklistoffset is non-negative)? If so, then it
wouldn't matter what the base classes look like.
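
The offsets in question are visible from Python as the read-only type attributes __dictoffset__ and __weakrefoffset__. A minimal sketch, assuming CPython; the class names are hypothetical, and the actual values differ between versions (some releases report a negative value for storage the interpreter manages itself), so only zero versus nonzero is meaningful here:

```python
# Hypothetical classes; only the offset attributes themselves are real.

class NoExtras:
    __slots__ = ()            # no instance dict, no weaklist


class WithExtras:
    pass                      # ordinary class: has both


def describe(cls):
    """Collect the layout-related attributes CPython exposes on a type."""
    return {
        "basicsize": cls.__basicsize__,
        "dictoffset": cls.__dictoffset__,
        "weakrefoffset": cls.__weakrefoffset__,
    }


info = {cls.__name__: describe(cls) for cls in (object, NoExtras, WithExtras)}

for name, fields in info.items():
    print(name, fields)
```

A zero offset means the type has no such field; any nonzero value means instances of that type have one.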

"some_data" a proper subset of "some_other_data", right? (If it isn't
you have worse problems than dict and weakreflist.)

Not at all. The point of this is to allow C++ multiple inheritance to
be mapped to Python multiple inheritance. I solve this issue as follows:
All attributes of base classes of multiply-inheriting classes are turned
into properties. Any time the wrapped C++ object needs to be accessed,
the PyObject self variable's type is compared to every derived type that
also inherits from another type. If it matches a derived type, a
reinterpret_cast is performed on the C++ object to the derived C++ type,
and then an implicit cast is performed to the intended type, which
allows the proper pointer fix-up to happen. Here is the actual function
that does that for BaseA:

BaseA &get_base_BaseA(PyObject *x,bool safe = true) {
    // Constant test: does casting Derived* to BaseA* move the pointer?
    // If it does not, this whole branch can be removed as dead code.
    if(reinterpret_cast<long>(static_cast<BaseA*>(reinterpret_cast<Derived*>(1))) != 1 &&
       PyObject_IsInstance(x,reinterpret_cast<PyObject*>(get_obj_DerivedType())))
        return cast_base_Derived(x);

    // In safe mode, reject objects that are not instances of BaseA.
    if(UNLIKELY(safe &&
                !PyObject_IsInstance(x,reinterpret_cast<PyObject*>(get_obj_BaseAType())))) {
        PyErr_SetString(PyExc_TypeError,"object is not an instance of BaseA");
        throw py_error_set();
    }

    assert(PyObject_IsInstance(x,reinterpret_cast<PyObject*>(get_obj_BaseAType())));
    return cast_base_BaseA(x);
}

(cast_base_X checks how the C++ object is stored and retrieves a
reference to it. The
"reinterpret_cast<long>(static_cast<BaseA*>(reinterpret_cast<Derived*>(1)))
!= 1" part is an optimization trick. It checks to see if a pointer
fix-up is even necessary. If not, that 'if' statement will be subject to
dead code removal.)

Of course, if this was hand-written code, such a method would be very
brittle and would be a nightmare to maintain, but it's generated by a
Python program and only needs to be correct.

Don't set tp_dictoffset and tp_weaklistoffset in any of the bases. Set
them only in Derived. If you need to instantiate a Base, make a
trivial derived class for it.

That would mean calling isinstance(derived_instance,BaseType) would
return false. In any case, I'm writing a general-purpose tool and not
trying to solve a specific problem. If I can't do what I'm trying to do
here, I can just make the program not allow multiply-inheriting classes
to derive (directly or indirectly) from classes that have a different
combination of weaklist and dict (having both will be the default
anyway, so this is a special case).
 
Rouslan Korneychuk

Also, if you want my opinion (you probably don't after you've already
gone to so much trouble, but here it is anyway):

No, your opinion is quite welcome.

It's not worth it to mimic the C++ type hierarchy in Python. Just
wrap each C++ class, regardless of its ancestry, in a Python class
with only object as base.

I kind-of already did. The issue only comes up when multiply-inheriting
from types that have a different combination of the weaklist and
instance dictionaries. I don't have to support this particular feature.

As for the worth, this is something I'm working on in my spare time as a
hobby and I enjoy a good challenge.
 
Carl Banks

That doesn't explain why changing _internal_class to:
     class _internal_class(object):
         __slots__ = (__weaklist__,__dict__)
makes it work.

Yes it does. When weaklist and dict are slots in the base class, then
all the derived classes inherit those slots. So tp_dictoffset doesn't
end up with one value in BaseB and a different value in BaseC. For
the layouts to be compatible the dicts have to be in the same
location.

Also, why does Derived need to reconcile anything if I'm
telling it where the dictionaries are in the new class? Doesn't CPython
calculate where, for example, weaklist is with what is essentially:
obj_instance + obj_instance->ob_type->tp_weaklistoffset
(at least when tp_weaklistoffset is non-negative)? If so, then it
wouldn't matter what the base classes look like.

I explained why in my last post; there's a bunch of reasons.
Generally you can't assume someone's going to go through the type
structure to find the object's dict, nor can you expect inherited
methods to always use the derived class's type structure (some methods
might use their own type's tp_dictoffset or tp_weaklistoffset, which
would be wrong if called on a subclass that changes those
values). Even if you are careful to avoid such usage, the Python
interpreter can't be sure. So it has to check for layout conflicts,
and these checks would become very complex if it allowed dict and
weakreflist to appear in different locations in the layout (it'd have
to check a lot more).

Not at all. The point of this is to allow C++ multiple inheritance to
be mapped to Python multiple inheritance.

I would say you do. Python's type system specifies that a derived
type's layout is a superset of its base types' layout. You seem to
have found a way to derive a type without a common layout, perhaps by
exploiting a bug, and you claim to be able to keep data access
straight. But Python types are not intended to work that way, and you
are asking for trouble if you try to do it.

I guess there's also no point in arguing that tp_dictoffset and
tp_weaklistoffset need to have the same value for base and derived types,
since you're rejecting the premise that layouts need to be
compatible. Therefore, I'll only point out that the layout checking
code is based on this premise, so that's why you're running afoul of
it.


[snip complicated variable access code]

That would mean calling isinstance(derived_instance,BaseType) would
return false.

You claimed in another post you weren't trying to mimic the C++ type
hierarchy in Python, but this line suggests you are.

My suggestion was that you shouldn't; just don't create a type
hierarchy at all in Python. Even more so if you're using Python 3,
where isinstance() is customizable.
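
One way to get that effect, sketched here with made-up names, is to register the wrapper as a virtual subclass of an abstract base class from the standard abc module. The wrapper keeps object as its only real base, so no layout question arises, yet isinstance() still answers True.

```python
from abc import ABC


class BaseType(ABC):
    """Hypothetical marker standing in for a wrapped C++ base class."""


class DerivedWrapper:
    """Hypothetical wrapper whose only real base is object."""


# Declare the relationship without inheriting any layout from BaseType.
BaseType.register(DerivedWrapper)

obj = DerivedWrapper()
print(isinstance(obj, BaseType))          # answered by the ABC machinery
print(issubclass(DerivedWrapper, BaseType))
print(DerivedWrapper.__mro__)             # BaseType does not appear here
```

The catch is that nothing is actually inherited this way: methods of BaseType are not available on DerivedWrapper unless the wrapper provides them itself.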


Carl Banks
 
Rouslan Korneychuk

I explained why in my last post; there's a bunch of reasons.
Generally you can't assume someone's going to go through the type
structure to find the object's dict, nor can you expect inherited
methods to always use the derived class's type structure (some methods
might use their own type's tp_dictoffset or tp_weaklistoffset, which
would be wrong if called on a subclass that changes those
values).

Who do you mean by someone? The code is generated by a program. No human
is required to touch it. If it needs to be updated, the program is
simply run again with the updated specification file. Thus I can make
those assumptions because I have total control over the code. The only
thing I don't have control over is the Python code that imports the
extension, but in Python, the user doesn't get to choose how they access
the weaklist and instance dictionary.

Even if you are careful to avoid such usage, the Python
interpreter can't be sure. So it has to check for layout conflicts,
and these checks would become very complex if it allowed dict and
weakreflist to appear in different locations in the layout (it'd have
to check a lot more).

What is so complex about this? It already uses "obj_instance +
obj_instance->ob_type->tp_weaklistoffset". That's all the checking it
needs. It only becomes a problem when trying to derive from two or more
classes that already have these defined. In such a case the Python
interpreter can't deduce what the values of tp_weaklistoffset and
tp_dictoffset in the derived type should be, but it doesn't have to
because my program tells it what they need to be.

I would say you do. Python's type system specifies that a derived
type's layout is a superset of its base types' layout. You seem to
have found a way to derive a type without a common layout, perhaps by
exploiting a bug, and you claim to be able to keep data access
straight. But Python types are not intended to work that way, and you
are asking for trouble if you try to do it.

I'm not really circumventing this system (except for the varying
location of the dictionaries. See the explanation below for that).
Python allows variable-sized objects. Tuples and strings are variable
sized. This allows them to store the data directly in the object instead
of having a pointer to another location in memory. And the objects I
generate are basically this:

struct MyObject {
    PyObject_HEAD
    storage_mode mode;
    char opaque_data[x];
};

I use the real type instead of char[] when possible because it will have
the proper alignment but I still treat it like a private hunk of memory
that only my generated code will touch. What I store in opaque_data is up
to me. I can store a copy of the wrapped type, or I can store a pointer
to it. "mode" specifies what is in opaque_data. A derived type would
look like this:

struct MyDerivedObject {
    PyObject_HEAD
    storage_mode mode;
    char opaque_data[y];
};

Where y >= x. It's still the same layout. All that's left is some way
for the original object to know what C++ type is stored in opaque_data.
I could have used another variable like 'mode', but since there is a
one-to-one correspondence between PyObject->ob_type and the type that is
being wrapped, I can determine the type from ob_type instead.

There is no bug being exploited. The actual implementation is a little
different than this, but the principle is the same. I said before that
the layout varies, but that's only if you consider the contents of
opaque_data, but that is neither Python's nor the user's concern.

I guess there's also no point in arguing that tp_dictoffset and
tp_weaklistoffset need to have the same value for base and derived types,
since you're rejecting the premise that layouts need to be
compatible. Therefore, I'll only point out that the layout checking
code is based on this premise, so that's why you're running afoul of
it.

That's not what the Python documentation says. Under
http://docs.python.org/c-api/typeobj.html#tp_weaklistoffset it says
"This field is inherited by subtypes, but see the rules listed below. A
subtype may override this offset; this means that the subtype uses a
different weak reference list head than the base type. Since the list
head is always found via tp_weaklistoffset, this should not be a
problem." And under
http://docs.python.org/c-api/typeobj.html#tp_dictoffset it says "This
field is inherited by subtypes, but see the rules listed below. A
subtype may override this offset; this means that the subtype instances
store the dictionary at a difference offset than the base type. Since
the dictionary is always found via tp_dictoffset, this should not be a
problem."

You claimed in another post you weren't trying to mimic the C++ type
hierarchy in Python, but this line suggests you are.

When did I make that claim? Perhaps you misunderstood me. I said "I
kind-of already did. The issue only comes up when multiply-inheriting
from types that have a different combination of the weaklist and
instance dictionaries. I don't have to support this particular feature."

I was saying I kind-of already did mimic the C++ hierarchy. And when I
said "this particular feature", I was talking about the thing I
described in the immediately preceding sentence, not the C++ type hierarchy.
 
