generic way to access C++ libs?

  • Thread starter Gabriel Zachmann

Jacek Generowicz

Gabriel Zachmann said:
Thanks a lot!

That's a very neat tool (like everything from Boost ;-) ),
and pretty close to what I was envisioning,
except that one still has to sort of manually transform header files into
BOOST_PYTHON_MODULE declarations ...

Then check out Pyste ... which should come with Boost itself.
 

Thomas Heller

To somebody with a good grasp of the current state of C++ technology,
maybe. Somebody who might just like to use existing dynlib/&c which
happen to be oriented to C++ rather than C might quite reasonably not
find the distinction obvious, IMHO.

Indeed, I suspect ctypes could be extended to do some of the requested
task, if one focused on a single, specific C++ compiler.

I guess that MSVC uses the same binary layout for C++ objects as for COM
objects, so it should be possible. I don't know if the name mangling rules
are documented somewhere.

But I have no idea how inline definitions of member functions (I'm not
sure that's the correct term - I mean code defined in the header files)
should be converted to Python code.

Thomas
 

Diez B. Roggisch

Thomas Heller said:
I guess that MSVC uses the same binary layout for C++ objects as for COM
objects, so it should be possible. I don't know if the name mangling
rules are documented somewhere.

COM has no binary layout of objects - the whole purpose of COM is to _not_
access objects by memory location, but instead by interfaces and thus
functions. Even what you perceive as attributes/properties are set/get
functions. Conveniently, COM objects can be wrapped on the fly by the
Python win32 extensions.

Thomas Heller said:
But I have no idea how inline definitions of member functions (I'm not
sure that's the correct term - I mean code defined in the header files)
should be converted to Python code.

While inlining is an optimization technique that allows for copying method
code directly into the caller's code, such methods will still be exposed
explicitly as functions, as otherwise even linking between C++ libs won't
work.
 

Thomas Heller

Diez B. Roggisch said:
COM has no binary layout of objects - the whole purpose of COM is to _not_
access objects by memory location, but instead by interfaces and thus
functions. Even what you perceive as attributes/properties are set/get
functions.

Sorry, I simply used the wrong term - I meant interfaces, of course.

Diez B. Roggisch said:
Conveniently, COM objects can be wrapped on the fly by the
Python win32 extensions.

Yes, but only Dispatch interfaces. ctypes is able to wrap native
interfaces by constructing the vtable at runtime, by using a manually
written or generated interface definition.

Diez B. Roggisch said:
While inlining is an optimization technique that allows for copying method
code directly into the caller's code, such methods will still be exposed
explicitly as functions, as otherwise even linking between C++ libs won't
work.

I mean, for example, code like this (from MS' GdiPlusTypes.h):

class CharacterRange
{
public:
    CharacterRange(
        INT first,
        INT length
        ) :
        First (first),
        Length (length)
    {}

    CharacterRange() : First(0), Length(0)
    {}

    CharacterRange & operator = (const CharacterRange &rhs)
    {
        First = rhs.First;
        Length = rhs.Length;
        return *this;
    }

    INT First;
    INT Length;
};

Thomas
 

Diez B. Roggisch

Conveniently, COM objects can be wrapped on the fly by the
Python win32 extensions.

Thomas Heller said:
Yes, but only Dispatch interfaces. ctypes is able to wrap native
interfaces by constructing the vtable at runtime, by using a manually
written or generated interface definition.

COM is especially designed to allow for this - language interoperability. So
it's not too surprising that it's easily wrapped. But one can draw _no_
conclusions from that about C++ - as COM is a binary standard defined in
terms of C. The fact that VStudio integrates it nicely with C++ has nothing
to do with that.
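
To make the "binary standard defined in terms of C" point concrete, here is a minimal sketch of an interface expressed as nothing more than a pointer to a table of function pointers. The names ICounter, ICounterVtbl, CounterImpl and make_counter are hypothetical and not taken from any real COM header; real COM interfaces also carry the three IUnknown methods and a fixed calling convention, both left out here.

```cpp
#include <cassert>

// Hypothetical interface described purely in C terms: an interface pointer
// points at a struct whose first member is a pointer to a table of
// function pointers.
struct ICounter;

struct ICounterVtbl {
    int (*Increment)(ICounter* self);
    int (*Get)(ICounter* self);
};

struct ICounter {
    const ICounterVtbl* vtbl;
};

// Concrete object: the interface part comes first, private state follows.
struct CounterImpl {
    ICounter iface;
    int value;
};

inline int counter_increment(ICounter* self) {
    // Valid because iface is the first member of a standard-layout struct.
    return ++reinterpret_cast<CounterImpl*>(self)->value;
}

inline int counter_get(ICounter* self) {
    return reinterpret_cast<CounterImpl*>(self)->value;
}

inline const ICounterVtbl* counter_vtbl() {
    static const ICounterVtbl table = {counter_increment, counter_get};
    return &table;
}

// Builds an object whose interface pointer can be handed to any client
// that only knows the ICounter layout above.
inline CounterImpl make_counter() {
    CounterImpl c;
    c.iface.vtbl = counter_vtbl();
    c.value = 0;
    return c;
}
```

A client calls through the table, e.g. `p->vtbl->Increment(p)`, and never needs to know how CounterImpl is laid out beyond its first member. That is the sense in which such an interface can be built or consumed at runtime from any language that can describe C structs and function pointers.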
I mean, for example, code like this (from MS' GdiPlusTypes.h):

class CharacterRange

I know, I meant that code too.
 

Damjan

that's not quite true.
I did not talk about binary executable formats, but the memory layout of
C++ objects. The C++ standard doesn't define where e.g. the vtable of an
object's virtual methods resides - or even if virtual methods have to be
implemented by a vtable in the first place.

There's a C++ ABI standard on Linux now (maybe it covers other Unixish
platforms too). GCC-3.x supports it and also the Intel compiler supports
it. That's how you can now use (link) libraries compiled with some gcc-3.x
compiler to a program compiled with Intel's compilers or vice versa.

Though, some of the early gcc-3.0 compilers may not be 100% C++ ABI
compliant. There were some bugs there.
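
As a rough illustration of why the layout question matters, the sketch below compares a plain struct with one that has virtual functions. Plain, Polymorphic and virtual_overhead are made-up names. On compilers following the ABI mentioned above the difference is usually one hidden vtable pointer plus padding, but that is an assumption about the implementation, not something the C++ standard promises.

```cpp
#include <cassert>
#include <cstddef>

struct Plain {
    int value;
};

struct Polymorphic {
    virtual ~Polymorphic() {}
    virtual int get() const { return value; }
    int value;
};

// Extra bytes the implementation adds to support virtual dispatch.
// The amount is chosen by the compiler's ABI, not by the language.
inline std::size_t virtual_overhead() {
    return sizeof(Polymorphic) - sizeof(Plain);
}
```

A generic accessor that pokes at object memory from outside would have to know this number, and where the hidden data sits, for every compiler it wants to support.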
 

Diez B. Roggisch

There's a C++ ABI standard on Linux now (maybe it covers other Unixish
platforms too). GCC-3.x supports it and also the Intel compiler supports
it. That's how you can now use (link) libraries compiled with some gcc-3.x
compiler to a program compiled with Intel's compilers or vice versa.

Though, some of the early gcc-3.0 compilers may not be 100% C++ ABI
compliant. There were some bugs there.

Interesting to hear that they work on these shortcomings of C++. From what
Google told me, it looks as if this stuff is defined for Itanium processors
only - do you know if it's also used for x86 systems?
 

Jacek Generowicz

Member functions which are _defined_ (as opposed to merely declared)
within the class, are implicitly marked "inline". And the "inline"
keyword is a hint to the compiler that you suggest that the function
be inlined; it's a hint which the compiler is allowed to ignore.

Diez B. Roggisch said:
While inlining is an optimization technique that allows for copying method
code directly into the caller's code, such methods will still be exposed
explicitly as functions, as otherwise even linking between C++ libs won't
work.

I don't think so. Given that the _definition_ of the function is in
the header, and the header is needed to compile any code which might
want to link with the library, the compiler has all it needs, without
a need for a corresponding function to exist in the library.
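
A small sketch of that situation, assuming a header-only class in the style of CharacterRange. Range and overlap are hypothetical names, and int stands in for the Windows INT typedef. Everything a client needs is in the header text itself, so no library has to export anything for this class.

```cpp
#include <cassert>

// Stand-in for a header-only class in the style of CharacterRange.
class Range {
public:
    Range() : First(0), Length(0) {}
    Range(int first, int length) : First(first), Length(length) {}

    // Defined inside the class body, hence implicitly inline.
    int End() const { return First + Length; }

    int First;
    int Length;
};

// A free function defined in a header needs the keyword spelled out;
// without it, two translation units including the header would each
// provide an ordinary definition and clash at link time.
inline int overlap(const Range& a, const Range& b) {
    int lo = a.First > b.First ? a.First : b.First;
    int hi = a.End() < b.End() ? a.End() : b.End();
    return hi > lo ? hi - lo : 0;
}
```

For a tool working only from the compiled library this is the awkward case: there may simply be no symbol for Range::End to call, because the code lives in the header.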
 

Jacek Generowicz

Diez B. Roggisch said:
Interesting to hear that they work on these shortcomings of C++. From what
Google told me, it looks as if this stuff is defined for Itanium processors
only - do you know if it's also used for x86 systems?

Yes, I use it in GCC on x86, G4 and G5. It should work wherever GCC
works.
 

Neil Hodgson

Gabriel Zachmann:
really? that would mean that c++ libs themselves are
not binary compatible among each other?

I was thinking in terms of Microsoft C++ options like the vtordisp
setting used to handle virtual inheritance (/vd0 or /vd1), pointer to member
representation (/vm*), and the old favourite struct alignment (/Zp*).
Borland adds options for allowing small enumerations to take only one byte
(-b-), zero size empty base classes (-Ve), zero size empty members (-Vx),
and a layout compatible with old compiler versions (-Vl).

Much of the time C++ code only has to be compatible with the other
modules it is delivered with. So long as all the modules use the same
compiler flags, this will be achieved.
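
Two of the layout choices listed above can be observed from inside a program. The sketch below is only an illustration with made-up names (Colour, Empty, Holder, LayoutReport, report_layout); it uses the C++11 fixed underlying type rather than any compiler switch, so the enum size here is pinned by the source, while the other two sizes are whatever the compiler and its flags decide.

```cpp
#include <cassert>
#include <cstddef>

// C++11 lets the source fix the underlying type, so the size no longer
// depends on a compiler switch.
enum class Colour : unsigned char { Red, Green, Blue };

struct Empty {};

// Whether the empty base takes up room inside Holder is a layout
// decision made by the compiler.
struct Holder : Empty {
    int value;
};

// Sizes the compiler picked for each type under the flags in effect.
struct LayoutReport {
    std::size_t enum_size;
    std::size_t empty_size;
    std::size_t holder_size;
};

inline LayoutReport report_layout() {
    LayoutReport r;
    r.enum_size = sizeof(Colour);
    r.empty_size = sizeof(Empty);
    r.holder_size = sizeof(Holder);
    return r;
}
```

Two modules that disagree on any of these numbers cannot safely pass such objects to each other, which is why matching flags across modules matters.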

Neil
 

Diez B. Roggisch

I don't think so. Given that the _definition_ of the function is in
the header, and the header is needed to compile any code which might
want to link with the library, the compiler has all it needs, without
a need for a corresponding function to exist in the library.

It has been a while, but I think what happened when including such a header
in several source files that produced .o files was that a duplicate symbol
problem arose at link time. Why would there be a symbol at all, when
the code got "pasted" into the caller's code?
 

Jacek Generowicz

Diez B. Roggisch said:
It has been a while, but I think what happened when including such a header
in several source files that produced .o files was that a duplicate symbol
problem arose at link time.

I find this hard to believe, unless we are talking about a buggy
compiler or linker. Remember that in the case of templates (until the
EDG people implemented the mythical "export" keyword, relatively
recently) the _only_ way to use templates in more than one translation
unit was to include the full _definitions_ of everything in the
template, in the header.

That's why C++ has the One Definition Rule (ODR). Here's what
Stroustrup has to say about the ODR in _The C++ Programming Language_
3rd edition (p203):

A given class, enumeration, and template, etc., must be defined
exactly once in a program. From a practical point of view, this
means that there must be exactly one definition of, say, a class
residing in a single file somewhere. Unfortunately, the language
rule cannot be that simple. [...] Consequently the rule in the
standard that says that there must be a unique definition of a
class, template, etc., is phrased in a somewhat more complicated and
subtle manner. This rule is commonly referred to as "the
one-definition rule," the ODR. That is, two definitions of a class,
template, or inline function are accepted as examples of the same
unique definition if and only if

[1] they appear in different translation units
[2] they are token-for-token identical
[3] the meanings of those tokens are the same in both translation
units.

[...]

The intent of the ODR is to allow inclusion of a class definition in
different translation units from a common source file.

Why would there be a symbol at all, when the code got "pasted" into
the caller's code?

Remember that "inline" (and the implicit inline you generate by
_defining_ the method in the class itself) is merely a hint to the
compiler, which the compiler is perfectly entitled to ignore. One
situation in which the compiler is _guaranteed_ to ignore it is if you
also declare the method to be virtual, for example. In general, it may
choose to inline some of the calls while not others.

I don't think that declaring a function to be inline makes any
guarantees about whether you will find a symbol for that function in
the object code. But the ODR guarantees that sorting this mess out is
the compiler's problem, not yours.
 

Diez B. Roggisch

Remember that "inline" (and the implicit inline you generate by
_defining_ the method in the class itself) is merely a hint to the
compiler, which the compiler is perfectly entitled to ignore. One
situation in which the compiler is _guaranteed_ to ignore it is if you
also declare the method to be virtual, for example. In general, it may
choose to inline some of the calls while not others.

I don't think that declaring a function to be inline makes any
guarantees about whether you will find a symbol for that function in
the object code. But the ODR guarantees that sorting this mess out is
the compiler's problem, not yours.

You are right - I just created a test project with two object files that
stem from .cc files in which the same class with an inlined function "bar"
is included. nm shows me that the function "bar" is defined weak - as far
as I understand it, this means that in the presence of another symbol of
that name, it will be ignored. So both files contain a definition for bar,
but the linker is able to sort it out.
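
For anyone wanting to repeat the experiment, a sketch along these lines should do. The class name Foo and the function body are assumptions; only the name "bar" comes from the post. Taking the address of the member function is one way to make the compiler emit an out-of-line copy even when the direct calls get inlined, although an optimizing compiler may still find ways around it.

```cpp
#include <cassert>

// Imagine this class living in a header included by two .cc files.
class Foo {
public:
    int bar() const { return 42; }  // implicitly inline
};

typedef int (Foo::*BarPtr)() const;

// Taking the address usually makes the compiler emit an out-of-line
// copy of Foo::bar in this translation unit.
BarPtr address_of_bar() {
    return &Foo::bar;
}

// Calls bar through the pointer rather than directly.
int call_bar(const Foo& f) {
    BarPtr p = address_of_bar();
    return (f.*p)();
}
```

Compiling this with `g++ -c` and running `nm -C` on the resulting object file should, on a GCC/ELF system, list Foo::bar() with the weak marker W, which matches the behaviour described in the post. Other toolchains mark such symbols differently.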
 

Jorgen Grahn

Neil Hodgson:

I was thinking in terms of Microsoft C++ options like the vtordisp
setting used to handle virtual inheritance (/vd0 or /vd1), pointer to member
representation (/vm*), and the old favourite struct alignment (/Zp*).
Borland adds options for allowing small enumerations to take only one byte
(-b-), zero size empty base classes (-Ve), zero size empty members (-Vx),
and a layout compatible with old compiler versions (-Vl).

Much of the time C++ code only has to be compatible with the other
modules it is delivered with.

But most of the time it has to be compatible with the standard library.

Even if these flags exist, I think it's very rare to encounter more than one
layout, for one specific compiler/compiler version/architecture.

/Jorgen
 

Neil Hodgson

Neil Hodgson:

# Much of the time C++ code only has to be compatible with the other
# modules it is delivered with.


Jorgen Grahn:
But most of the time it has to be compatible with the standard library.

Even if these flags exist, I think it's very rare to encounter more than one
layout, for one specific compiler/compiler version/architecture.

The most commonly changed layout option is structure packing. Many C/C++
projects decide upon a particular packing option often just to be compatible
with files from the initial version or other applications and libraries.

The C++ standard library data types (vector, basic_string, etc.) are
mostly defined in header files which are compiled with the client code and
so use the same options. C system libraries like standard I/O and platform
libraries like Win32 use explicit compiler instructions to ensure they can
deal with client option choice. The Win32 headers use #pragma pack(n) to
ensure that client code will compile assuming particular packing of system
structures no matter how the client code is packing its own structures.
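
The idea can be sketched as follows. The struct names and the packing value of 1 are hypothetical and chosen only to make the effect visible; this is not copied from the Win32 headers. The header saves the client's packing, pins its own for the structures it declares, and restores the client's setting afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout under whatever packing the client code happens to use.
struct ClientRecord {
    char tag;
    std::uint32_t size;
};

// A header that must describe a fixed layout saves the current setting,
// pins its own, and restores the client's afterwards.
#pragma pack(push, 1)
struct WireRecord {
    char tag;
    std::uint32_t size;
};
#pragma pack(pop)

// Declared after the pop, so the client's packing applies again.
struct AfterRecord {
    char tag;
    std::uint32_t size;
};
```

With the pragma in place, WireRecord has the same layout no matter which packing option the including project was built with, while ClientRecord and AfterRecord follow the project's own choice.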

A C++ specific example is the use of the __declspec(novtable) compiler
directive (indirectly through the AFX_NOVTABLE macro) on many of the
interfaces in Microsoft's widely used ATL to remove code that references the
vtable and hence (normally) remove the vtable. A generic C++ library
accessor that opens a library based on ATL may have to know that some
classes that would be expected to have vtables will not actually have them.

Neil
 

Jorgen Grahn

Neil Hodgson:

# Much of the time C++ code only has to be compatible with the other
# modules it is delivered with.


Jorgen Grahn:
But most of the time it has to be compatible with the standard library.

Even if these flags exist, I think it's very rare to encounter more than one
layout, for one specific compiler/compiler version/architecture.

The most commonly changed layout option is structure packing. Many C/C++
projects decide upon a particular packing option often just to be compatible
with files from the initial version or other applications and libraries.

[...] C system libraries like standard I/O and platform
libraries like Win32 use explicit compiler instructions to ensure they can
deal with client option choice. The Win32 headers use #pragma pack(n) to
ensure that client code will compile assuming particular packing of system
structures no matter how the client code is packing its own structures.
....

I see. Then this seems to me to be a Windows culture issue which I've missed
because I'm a Unix person. The GNU and Solaris libcs (and all other Unix
shared libraries that I'm aware of) do not do this. Same thing with
AmigaDOS, as I recall it. Access these from code compiled with non-standard
packing and you're toast.

(Some Unix include files (things in /usr/include/netinet/, for example) have
structs with elaborate padding so that they can match e.g. an IP header
perfectly. This places a restriction on compiler writers, of course.)

That also explains to me why so many embedded projects I've been in have
messed with structure packing - the architects were Windows people.

One reason this is rare in the non-Windows world is (I suppose) that there's
often a huge run-time penalty for accessing misaligned words on non-x86
processors. The default layout is almost always the best possible, unless
you are low on memory.
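
A minimal sketch of the trade-off, with made-up names (Sample, load_u32, store_u32). The default layout pads the struct so the 32-bit member is naturally aligned; when data really does sit at an odd address, copying it out with memcpy is a portable way to avoid forming a misaligned pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Default layout: the compiler pads after `tag` so that `value` sits on
// a boundary suitable for its type.
struct Sample {
    char tag;
    std::uint32_t value;
};

// Portable read of a 32-bit value from an arbitrary (possibly odd) address.
// memcpy avoids dereferencing a misaligned uint32_t pointer.
inline std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Matching write to an arbitrary address.
inline void store_u32(unsigned char* p, std::uint32_t v) {
    std::memcpy(p, &v, sizeof v);
}
```

Packed structures push this cost onto every access to the affected members, which is the run-time penalty mentioned above.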

/Jörgen
 
