C++ and shared objects

Phoenix87

Hello everybody

I want to build some shared libraries from C++ code defining classes
because I want to load them dynamically in programs that need those
classes. I've read about dlopen and co., which require the
implementation of factory procedures in order to dynamically allocate
class instances from the loaded shared object. Is there an alternative
way to dynamically load classes in C++ programs? I mean something
which allows instantiating new class instances with the new keyword,
etc. Basically, my question is much the same as asking how the
STL or the CERN ROOT libraries work.

Thanks in advance!

Gab.
 
tonydee

Using "new" (or creating an object on the stack) requires knowledge of
the object's size. For non-polymorphic objects, this is perfectly
possible without factories and polymorphism. But, just as in library
code that's not compiled into a shared object, if you want the library
to be able to vary the object size without needing the application to
be recompiled then you need to do something extra. Techniques such as
the pImpl idiom and run-time polymorphism / factories defer the memory
allocation to the shared object's code. You could plausibly have the
shared object report the size requirements back to the application, so
it can allocate memory using new, alloca etc., but that would be an
even more roundabout way of doing it.
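
As a rough illustration of the pImpl approach mentioned above, here is a minimal sketch. The names `Counter` and `Counter::Impl` are made up for this example, and in a real library the second half would live in the shared object's own source file rather than next to the header part.

```cpp
#include <memory>

// ---- what the public header would contain (hypothetical class) ----
class Counter {
public:
    Counter();
    ~Counter();                 // defined where Impl is a complete type
    void increment();
    int value() const;
private:
    struct Impl;                // size and layout are hidden from clients
    std::unique_ptr<Impl> impl_;
};

// ---- what would be compiled into the shared object ----
struct Counter::Impl {
    int count = 0;              // members can be added without changing sizeof(Counter)
};

Counter::Counter() : impl_(new Impl) {}
Counter::~Counter() = default;
void Counter::increment() { ++impl_->count; }
int Counter::value() const { return impl_->count; }
```

Client code can then write `Counter c;` or `new Counter` directly, because the visible object is only a pointer wide; the allocation whose size may change between library versions happens inside the library's own code.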

The STL is quite different in that it's a template library...
everything is in the headers and they're instantiated as needed per
translation unit, with redundant instantiations removed at link time.
There is no shared library object involved. Same goes for most of
BOOST. I'm not familiar with CERN ROOT....

Cheers,
Tony
 
Robert Fendt

STL is a template framework. It therefore does not consist of
any runtime libraries apart from the C standard stuff
underneath. I do not know how ROOT works, but at first glance it
looks like just any other library to me. Which means that there
is no 'dynamic' stuff involved, at least not for the basics.

In order for a direct 'new' to work, the class definition has to
be available (header file) as well as the object files
containing all non-inline functions. So you can just build a
shared library of your stuff and use it for regular compiling
and linking. In this case, the system's runtime
facilities will load the dynamic library for you at
program startup. This is straightforward, and I do not
interpret your question in that direction. You seem to be thinking
about dynamic loading rather than dynamic linking.
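
For the ordinary dynamic-linking case described above, a sketch might look as follows. The class `Greeter` and the file names in the comments are hypothetical; the parts are shown in one block for brevity but would normally be a header, a source file built into the shared library, and the client program.

```cpp
#include <string>
#include <utility>

// ---- greeter.h: shipped with the library, included by the client ----
class Greeter {
public:
    explicit Greeter(std::string name);
    std::string greet() const;      // non-inline: defined in the library
private:
    std::string name_;
};

// ---- greeter.cpp: compiled into the shared library ----
Greeter::Greeter(std::string name) : name_(std::move(name)) {}
std::string Greeter::greet() const { return "Hello, " + name_; }

// ---- client code: linked against the library at build time ----
std::string greet_with_new() {
    Greeter* g = new Greeter("world");   // plain 'new', no factory needed
    std::string text = g->greet();
    delete g;
    return text;
}
```

With GCC-style toolchains the library part would typically be built with something like `g++ -shared -fPIC greeter.cpp -o libgreeter.so` and the client linked with `-lgreeter`; the exact flags depend on the platform.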

Dynamic loading means that you load something your compiler did
not know about. If you look at the specifications of dlopen()
and such (the Windows equivalents have similar restrictions),
you will find that the framework just allows for retrieval of
unmangled (i.e. plain C) symbols. This works for constants and
for function pointers (although the ability to retrieve a
function pointer directly through dlsym() is technically a
non-standard compiler extension, albeit a popular one).

Long story short: you really can only get a pointer to a factory
function by using dlsym() (POSIX) or GetProcAddress() (Windows).
There is no way around it, just live with it. Loading C++
classes directly would involve all sorts of pains and hassles,
not to mention a change to the C++ specifications, and since at
least POSIX does not deal with C++ anyway, dlsym() will not be
changed in the near future. I do not know about Microsoft's
plans for VC and GetProcAddress, but I would be surprised.
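
A minimal sketch of the factory approach on a POSIX system. The interface `Shape`, the factory name `create_shape`, and the helper `load_shape` are all invented for illustration; the plugin part would be compiled separately into a shared object.

```cpp
#include <dlfcn.h>
#include <memory>

// ---- shape.h: shared between host and plugin (hypothetical interface) ----
struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};
// Pointer to a factory function with C language linkage.
extern "C" typedef Shape* (*ShapeFactory)();

// ---- plugin side: would be compiled into e.g. libsquare.so ----
struct UnitSquare : Shape {
    double area() const override { return 1.0; }
};
// extern "C" keeps the symbol name unmangled so dlsym can find it.
extern "C" Shape* create_shape() { return new UnitSquare; }

// ---- host side: load the factory at run time ----
// Returns a null pointer if the library or the symbol is missing.
std::unique_ptr<Shape> load_shape(const char* path) {
    void* handle = dlopen(path, RTLD_NOW);
    if (!handle) return nullptr;
    // Converting dlsym's void* to a function pointer is the extension
    // noted above; it is not guaranteed by the C++ standard itself.
    ShapeFactory make =
        reinterpret_cast<ShapeFactory>(dlsym(handle, "create_shape"));
    if (!make) {
        dlclose(handle);
        return nullptr;
    }
    // The handle is deliberately left open: the object's code lives
    // in the library for as long as the object exists.
    return std::unique_ptr<Shape>(make());
}
```

The host only ever sees the abstract `Shape` interface, so the concrete class and its size stay private to the plugin. On older glibc versions the host may need to be linked with `-ldl`.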

Regards,
Robert
 
James Kanze

And thus spake Phoenix87
Sun, 7 Feb 2010 16:14:50 -0800 (PST):
STL is a template framework. It therefore does not consist of
any runtime libraries apart from the C standard stuff
underneath.

It depends on what you mean by "STL". Traditionally, it did
refer to a library created by Stepanov, which in fact was almost
exclusively templates. Most people today, however, use the term
STL to refer to the standard library. In which case, it's
provided somehow by your compiler, and you don't really have to
know the details. (And it's not all templates.)

Of course, the real problem is that he doesn't really say what
he's trying to do, so it's difficult to find a solution. In
general, if you're providing a library for other programs,
you'll provide a set of headers, a dynamic object *and* a static
library, since except in exceptional cases, the client will
probably prefer linking statically. (If nothing else, unless
you're freeware, the licensing issues for DLL's will be a
nightmare.)
In order for a direct 'new' to work, the class definition has
to be available (header file) as well as the object files
containing all non-inline functions.

And in order for a factory method to work, an external
declaration of the factory method must be present. And to use
whatever the factory method returns, you also need headers which
declare the interface. In the end, it's purely a question of
what and how you want to provide something.
So you just can build a shared library of your stuff and use
it for regular compiling and linking against. In this case,
the system's runtime facilities will do the loading of the
dynamic library for you at program startup.

With some notable exceptions, if that's the case, you should
probably be linking statically.
This is straight-forward, and I do not interpret your question
in that direction. You seem to think about dynamic loading
rather than dynamic linking.
Dynamic loading means that you load something your compiler
did not know about. If you look at the specifications of
dlopen() and such (the Windows equivalents have similar
restrictions), you will find that the framework just allows
for retrieval of unmangled (i.e. plain C) symbols. This works
for constants and for function pointers (although the ability
to retrieve a function pointer directly through dlsym() is
technically a non-standard compiler extension, albeit a
popular one).

The interface (both under Unix and under Windows) allows for
loading just about anything. (Windows is somewhat more
restrictive here, I think.) All you have to do is give the name
according to the local conventions, which means mangled in the
case of Windows or Unix. On the other hand, you also have the
problem that the function returns a void* (Unix) or a void (*)()
(Windows), which means you'll also need some casting. And no
doubt, headers, to define what you'll be casting to. In
practice, because of the mangling, the easiest solution is to
provide an ``extern "C"'' factory function. (Typically, the
mangling used by C is far simpler than that used by C++.) But
the client code still needs the header files to know the target
type of the cast, and the type actually returned.
Long story short: you really can only get a pointer to a
factory function by using dlsym() (POSIX) or GetProcAddress()
(Windows). There is no way around it, just live with it.

That's false. With both, you can get a pointer to anything.
And if the library contains classes which have been exported
(and under Unix, by default, everything has been exported), you
can new them. But you need a header file with the concrete type
to do so.
Loading C++ classes directly would involve all sorts of pains
and hassles, not to mention a change to the C++
specifications, and since at least POSIX does not deal with
C++ anyway, dlsym() will not be changed in the near future. I
do not know about Microsofts's plans for VC and
GetProcAddress, but I would be surprised.

GetProcAddress is almost exactly like dlsym, except that it
returns the address of a function, rather than the address of
data. But since both Windows and Unix require the two to have
the same representation, all it takes is some very ugly funny
casting to get from one to the other. (As luck would have it,
every time I've used dlsym, I've needed the address of a
function, but the first time I used GetProcAddress, it was to
obtain the address of data.)

In general, I think what you're trying to say is correct, but
the issues are far more complicated than one would guess from
your posting.
 
Robert Fendt

The interface (both under Unix and under Windows) allows for
loading just about anything. (Windows is somewhat more
restrictive here, I think.) All you have to do is give the name
according to the local conventions, which means mangled in the
case of Windows or Unix. On the other hand, you also have the
problem that the function returns a void* (Unix) or a void (*)()
(Windows), which means you'll also need some casting. And no
doubt, headers, to define what you'll be casting to. In
practice, because of the mangling, the easiest solution is to
provide an ``extern "C"'' factory function. (Typically, the
mangling used by C is far simpler than that used by C++.) But
the client code still needs the header files to know the target
type of the cast, and the type actually returned.

In fact, dlsym() is defined by the POSIX standard in a way that
it is designed exclusively for "extern C" definitions (i.e.,
unmangled symbols). Since the name mangling of C++ is
implementation-defined and one usually cannot directly get at
the mangled name, you cannot directly (and portably!) load C++
symbols, period.

What I was referring to as the 'non-standard extension' is the
fact that dlsym returns void*, and to cast void* to a function
pointer is 'illegal' (in the sense that it is *not* covered by
either the C++ or the C standard). It is a popular extension, and one
that the POSIX standard requires. But an extension nonetheless.
That's false. With both, you can get a pointer to anything.
And if the library contains classes which have been exported
(and under Unix, by default, everything has been exported), you
can new them. But you need a header file with the concrete type
to do so.

GetProcAddress yields in fact a function pointer. To cast this
to an object pointer is just as illegal as the other way round
(like dlsym() requires). And you are missing my point, actually.
Yes, you can return a pointer to "anything". But that's not good
enough for C++'s "new" to work, since you can only return a
pointer to an instance of some kind (i.e., a function or some
kind of object). Classes are not first-class objects in C++, so
you cannot return the "address of a class" or something like
that. The only symbols that a class generates are its member
functions (in mangled form), and those you cannot just load at
runtime.

If you have the class declaration in scope, you can call "new"
on it, but the program will not even link since the new call causes
symbols for any non-inline function in the class to be generated
(and thus the linker complains). If the class is purely inline,
then apparently you do not need to bother with runtime loading
at all, so it's moot. The only workable way for everything else
is a factory. If you think otherwise, I dare you to show me an
example of 'direct class loading' in such a way that new 'just
works' on it. And please without tricks like redefining
'new'. ;-)
In general, I think what you're trying to say is correct, but
the issues are far more complicated than one would guess from
your posting.

Indeed I have tried to simplify things a bit, since I did not
exactly know if dynamic loading was really what the OP wanted
(and as it turned out, it was not).

Regards,
Robert
 
Jorgen Grahn

Hallo everybody

I want to build some shared libraries from C++ code defining classes

Didn't you just post this in comp.lang.c++.moderated?

/Jorgen
 
James Kanze

And thus spake James Kanze
Mon, 8 Feb 2010 14:39:06 -0800 (PST):
In fact, dlsym() is defined by the POSIX standard in a way
that it is designed exclusively for "extern C" definitions
(i.e., unmangled symbols). Since the name mangling of C++ is
implementation-defined and one usually cannot directly get at
the mangled name, you cannot directly (and portably!) load C++
symbols, period.

In fact, dlsym() is defined by the POSIX standard in a way that
it is designed exclusively for data, and not for functions at
all. But since POSIX also requires data pointers and function
pointers to have the same size and representation, it doesn't
matter.

Beyond that, POSIX is completely neutral with regards to what
the returned value points to; the only thing that's important is
that you convert the pointer to whatever type the pointed to
object or function really has in the shared library.

As for name mangling... That's a different issue. In general,
you can't link C++ programs compiled with different compilers,
statically or dynamically. But formally, there's nothing wrong
with passing the mangled name to dlsym. It's just a lot easier
to use ``extern "C"''. (Note that the same considerations apply
to GetProcAddress under Windows.)
What I was referring to as the 'non-standard extension' is the
fact that dlsym returns void*, and to cast void* to a function
pointer is 'illegal' (in the sense that it is *not* covered by
either C++ nor C standards). It is a popular extension, and
one that the POSIX standard requires. But an extension
nonetheless.

The POSIX standard does *not* require it (although as you say,
most Unix compilers do have it). The example in the POSIX
standard doesn't use it; it uses something like:

void (*pf)();
*(void**)(&pf) = dlsym(handle, "function");

Note that in C++, since you're passing an unmangled name, you'd
have to write the first statement as:

extern "C" {
void (*pf)();
}
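
Put together, the POSIX-style assignment might look like this in C++. This is only a sketch: the helper names `lookup_c_function` and `length_via_dlsym` are made up, and it looks up the C library's `strlen` in the running program purely as a symbol that is likely to be present. As discussed in this thread, writing through the `void**` is formally outside what the C++ standard guarantees.

```cpp
#include <dlfcn.h>
#include <cstddef>

// Function pointer type with C language linkage, matching strlen.
extern "C" typedef std::size_t (*StrlenFn)(const char*);

// Look up a C function by its unmangled name in an already opened handle.
// Returns a null pointer if the symbol is not found.
StrlenFn lookup_c_function(void* handle, const char* name) {
    StrlenFn pf = nullptr;
    // Same form as the POSIX example: store dlsym's void* result
    // through a void** that aliases the function pointer.
    *(void**)(&pf) = dlsym(handle, name);
    return pf;
}

// Convenience wrapper: search the main program and its dependencies.
// Returns 0 if the handle or the symbol cannot be obtained.
std::size_t length_via_dlsym(const char* text) {
    void* self = dlopen(nullptr, RTLD_LAZY);   // handle for the running program
    if (!self) return 0;
    StrlenFn pf = lookup_c_function(self, "strlen");
    std::size_t n = pf ? pf(text) : 0;
    dlclose(self);
    return n;
}
```

A `std::memcpy` from the `void*` into the function pointer would be an alternative that avoids the aliasing cast while relying on the same assumption of identical size and representation.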
GetProcAddress yields in fact a function pointer. To cast this
to an object pointer is just as illegal as the other way round
(like dlsym() requires).

Yes. But the same trick as POSIX requires above can be used in
reverse; both function and data pointers do in fact have the
same size and representation under Windows.
And you are missing my point, actually. Yes, you can return a
pointer to "anything". But that's not good enough for C++'s
"new" to work, since you can only return a pointer to an
instance of some kind (i.e., a function or some kind of
object). Classes are not first-class objects in C++, so you
cannot return the "address of a class" or something like that.
The only symbols that a class generates are its member
functions (in mangled form), and those you cannot just load at
runtime.

But the new operator doesn't require an object for a class. But
yes, I think I missed your point. IIUC, what you're saying is
that you can't use dlsym directly to obtain a newed object.
Which is correct: of necessity, it returns the address of
something that has static lifetime (in the sense of the
standard).
 
Robert Fendt

The POSIX standard does *not* require it (although as you say,
most Unix compilers do have it). The example in the POSIX
standard doesn't use it; it uses something like:

void (*pf)();
*(void**)(&pf) = dlsym(handle, "function");

You mean defining a pointer to the function in question
alongside it and retrieve the address of that pointer rather
than the address of the function itself. Yes, that used to be
the only correct workaround, to my knowledge. However, one small
final nitpick: the current version of the POSIX standard (i.e.,
v7) does actually require the possibility of the conversion:

http://www.opengroup.org/onlinepubs/9699919799/functions/dlsym.html

'The ISO C standard does not require that pointers to functions
can be cast back and forth to pointers to data. However,
POSIX-conforming implementations are required to support this,
as noted in Pointer Types . The result of converting a pointer
to a function into a pointer to another data type (except void
*) is still undefined, however.'
Note that in C++, since you're passing an unmangled name, you'd
have to write the first statement as:

extern "C" {
void (*pf)();
}

In most cases, this is IIRC not even really necessary, since the
calling convention for functions is actually most often
identical (apart from name mangling, which is irrelevant for the
receiving pointer). But you are right, to be on the safe side it
should be declared like that.
Yes. But the same trick as POSIX requires above can be used in
reverse; both function and data pointers do in fact have the
same size and representation under Windows.

As they have on most systems in use today. On Windows, this is
not really a problem, since the manufacturer of both the
operating system and the compiler is one and the same, so there
is not even potential for conflict. The fact that neither C nor
C++ specify that a function pointer and an object pointer have
to have identical size, does not mean that they are not allowed
to (and I do not know an example where they are actually
different).
But the new operator doesn't require an object for a class. But
yes, I think I missed your point. IIUC, what you're saying is
that you can't use dlsym directly to obtain a newed object.
Which is correct: of necessity, it returns the address of
something that has static lifetime (in the sense of the
standard).

What I meant was that this is impossible:

DO_SOMETHING_MAGICAL_WITH_DLSYM("ClassThatWasNotKnownBefore");
KnownBaseClass* instance = new ClassThatWasNotKnownBefore;

this simply cannot work since C++ (like C) is statically typed.
One cannot 'load' a type, since it is not a first-class entity.
E.g. in Python this is different, due to 'everything is an
object' (even the class itself). I am serious: if this is
possible, please show me how. I'm a curious person. :)

What is indeed possible is to use a factory and two-phase
construction, and so far that is the only way I know of:

ClassFactory.load_from_file("filename.so", "MyClass");
BaseClass* instance = ClassFactory.instantiate("MyClass");
instance->set_parameters(param_map);

Regards,
Robert
 
James Kanze

And thus spake James Kanze
Tue, 9 Feb 2010 13:12:33 -0800 (PST):
You mean defining a pointer to the function in question
alongside it and retrieve the address of that pointer rather
than the address of the function itself.

No. I mean using reinterpret_cast to say that a reference to a
pointer to function is in fact a reference to a void*.
Formally, undefined behavior, but more or less guaranteed if
void* and pointers to functions have the same size and
representation. In C++, the above (taken directly from the
POSIX standard, so in C) would be:

extern "C" void (*pf)();
reinterpret_cast<void*&>(pf) = dlsym(handle, "function");

Yes, that used to be the only correct workaround, to my
knowledge. However, one small final nitpick: the current
version of the POSIX standard (i.e., v7) does actually require
the possibility of the conversion:

'The ISO C standard does not require that pointers to
functions can be cast back and forth to pointers to data.
However, POSIX-conforming implementations are required to
support this, as noted in Pointer Types . The result of
converting a pointer to a function into a pointer to another
data type (except void *) is still undefined, however.'

That's in the rationale part of the standard, and so not
normative. The examples don't require this. What is
(indirectly) required, is that void* and pointers to functions
have the same size and representation. (Although as far as I
know, this is never stated explicitly, and is only implied by
the non-normative example.)
In most cases, this is IIRC not even really necessary, since the
calling convention for functions is actually most often
identical (apart from name mangling, which is irrelevant for the
receiving pointer). But you are right, to be on the safe side it
should be declared like that.

Independently of the calling conventions for functions (and I've
actually used C/C++ compilers where it was different), the
C++ standard says that the language linkage is part of the type,
so any mismatches are errors which require a diagnostic. On the
other hand, you're using dlsym, which isn't part of the C++
standard, so you're in implementation specific code anyway.
As they have on most systems in use today.

I don't know about *most*. It's true that most general purpose
systems do have this restriction, but the Intel architecture
supports 48 bit pointers, and compilers can be (and have been)
built where data pointers are 32 bits, but function pointers 48
(and vice versa). And Intel isn't particularly exotic.
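
Code that relies on the dlsym-style conversion can at least check the size assumption at compile time. A small sketch (the constant names are made up); note that equal size is necessary but does not by itself guarantee equal representation.

```cpp
#include <cstddef>

// Sizes of an object pointer and a function pointer on this platform.
const std::size_t kDataPointerSize = sizeof(void*);
const std::size_t kFunctionPointerSize = sizeof(void (*)());

// True on the flat-memory-model platforms common today; it would be
// false for compilation models like the mixed 32/48 bit ones described above.
const bool kPointersSameSize = (kDataPointerSize == kFunctionPointerSize);

// Refuse to compile where the assumption behind the void* round trip fails.
static_assert(sizeof(void*) == sizeof(void (*)()),
              "function and object pointers differ in size on this platform");
```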
On Windows, this is not really a problem, since the
manufacturer of both the operating system and the compiler is
one and the same, so there is not even potential for conflict.

That's more or less true, and one of the reasons why you
generally prefer to use the "native" compiler, rather than some
other compiler. But Windows and Visual C++ do come from
different divisions, and in large companies, it's been known to
occur that different divisions didn't talk to one another. (But
in practice, I think you're safe on this one.)
The fact that neither C nor C++ specify that a function
pointer and an object pointer have to have identical size,
does not mean that they are not allowed to (and I do not know
an example where they are actually different).

Funny, not so long ago, they were different on the most
widespread systems around (MS-DOS).
 
James Kanze

Robert Fendt wrote:
In certain memory models on 16-bit Intel machines

The same holds for 32-bit Intel.
the pointers to data may cover more than 64K while the
pointers to code are 64K only.

I've actually worked on 32-bit Intel machines where pointers to
data were 48 bit, but pointers to functions 32. Or vice versa,
depending on the compilation model.
Or (rarely) the other way around.

Not so rare. For various reasons, early Unix on 8086 (e.g. the
one commercialized by Microsoft) supported 32 bit pointers for
code, but not for data.
Or may (à la Harvard architecture) be the same size, but in a
different address space.
I don't recall seeing C++ on any such machine, but I'm sure it
could be done.

You've never heard of Turbo C++. Borland's first C++ offering.
Or Zortech C++. Early versions of Microsoft C had similar
characteristics (but I don't think Microsoft had a C++ compiler
before everything was 32 bits).
 
