How are POD structs associated with member functions?

JohnQ

The implementation of classes with virtual functions is conceptually easy to
understand: they use vtables. That raises the question of POD structs: how
are they associated with their member functions in common implementations?
And where is the 'this' pointer tucked away for POD structs with member
functions?

John
 
Victor Bazarov

JohnQ said:
The implementation of classes with virtual functions is conceptually
easy to understand: they use vtables. That raises the question of
POD structs: how are they associated with their member functions in
common implementations? And where is the 'this' pointer tucked away
for POD structs with member functions?

Yes, the 'this' pointer is usually passed into the function as the
hidden "zeroth" argument.

V
 
JohnQ

Victor Bazarov said:
Yes, the 'this' pointer is usually passed into the function as the
hidden "zeroth" argument.

The question is how the functions are statically associated with the struct.
So there's this POD-struct in RAM, for example, and a member function gets
called. The POD-struct is the size of the data members so there's nothing in
it pointing to the functions apparently. How can the dot or -> operators
access the member functions? Some kind of special compiler-only overloading
of the member access operators that can distinguish between data member and
member function access?

John
 
Gavin Deane

The question is how the functions are statically associated with the struct.
So there's this POD-struct in RAM, for example, and a member function gets
called. The POD-struct is the size of the data members so there's nothing in
it pointing to the functions apparently.

Why would there need to be?

Obviously *how* it actually works is an implementation detail. All
that matters is that the required behaviour is exhibited. But I would
think something along the lines of this:

You write

struct foo
{
void func() { i = 42; }

int i;
};

int main()
{
foo a_foo;
a_foo.func();

foo another_foo;
another_foo.func();

foo a_third_foo;
a_third_foo.i = 42;
}

The compiler treats it along the lines of

struct foo
{
int i;
};
void foo_func(foo* this_) { this_->i = 42; } // this_ stands in for 'this', which is a keyword

int main()
{
foo a_foo;
foo_func(&a_foo);

foo another_foo;
foo_func(&another_foo);

foo a_third_foo;
a_third_foo.i = 42;
}

I'm not suggesting the compiler actually rewrites your code to look
like that - just illustrating what's probably going on.

How can the dot or -> operators
access the member functions? Some kind of special compiler-only overloading
of the member access operators that can distinguish between data member and
member function access?

Given my struct foo above, the compiler knows which members are
functions (func) and which are data members (i). When I do

a_third_foo.i = 42;

it knows which foo object I want to work with (a_third_foo), it knows
where that object is in memory (wherever it put it) and it knows
where, within that memory, the int i is that I want to set to 42.

And when I do

another_foo.func();

it knows I want to call a function (func), it knows where the code for
that function is (wherever it put it) and it knows which object to
pass the address of as the hidden "this" parameter (another_foo).

Gavin Deane
 
John Harrison

JohnQ said:
The question is how the functions are statically associated with the
struct.

In the compiled code they aren't.

So there's this POD-struct in RAM, for example, and a member
function gets called. The POD-struct is the size of the data members so
there's nothing in it pointing to the functions apparently. How can the
dot or -> operators access the member functions?

The compiler works it out.

Some kind of special
compiler-only overloading of the member access operators that can
distinguish between data member and member function access?

If it helps to think of it like that then yes.

Consider two programs

// first program
struct X
{
int get();
int d;
};

int X::get() { return d; }

// second program
struct X
{
int d;
};

int get(X* x) { return x->d; }

Once compiled these two programs are essentially the same. In some
region of memory there will be a function 'called' get (of course the
name has actually disappeared by this point) which takes one argument.
Inside that function the argument will be assumed to be pointing at an X
object. Whether that function was originally a member function or a free
standing function is irrelevant at this point.

john
 
JohnQ

Gavin Deane said:
Why would there need to be?

Obviously *how* it actually works is an implementation detail. All
that matters is that the required behaviour is exhibited. But I would
think something along the lines of this:

You write

struct foo
{
void func() { i = 42; }

int i;
};

int main()
{
foo a_foo;
a_foo.func();

foo another_foo;
another_foo.func();

foo a_third_foo;
a_third_foo.i = 42;
}

The compiler treats it along the lines of

struct foo
{
int i;
};
void foo_func(foo* this_) { this_->i = 42; } // this_ stands in for 'this', which is a keyword

int main()
{
foo a_foo;
foo_func(&a_foo);

foo another_foo;
foo_func(&another_foo);

foo a_third_foo;
a_third_foo.i = 42;
}

I'm not suggesting the compiler actually rewrites your code to look
like that - just illustrating what's probably going on.


Given my struct foo above, the compiler knows which members are
functions (func) and which are data members (i). When I do

a_third_foo.i = 42;

it knows which foo object I want to work with (a_third_foo), it knows
where that object is in memory (wherever it put it) and it knows
where, within that memory, the int i is that I want to set to 42.

And when I do

another_foo.func();

it knows I want to call a function (func), it knows where the code for
that function is (wherever it put it) and it knows which object to
pass the address of as the hidden "this" parameter (another_foo).

OK. That's a good explanation. Except, I can't see it working for objects
allocated on the free store.

John
 
John Harrison

OK. That's a good explanation. Except, I can't see it working for
objects allocated on the free store.

John

Why not? Gavin's explanation is, of course, how every compiler does
this. Whether an object is allocated on the free store or not is
irrelevant. Every object has an address, and that is all that is
required for this to work.

Clearly you're not understanding something here, but I have no idea what.
It's pretty simple; maybe you're just thinking too hard.

john
 
Gavin Deane

OK. That's a good explanation. Except, I can't see it working for objects
allocated on the free store.

There's no difference:

// struct foo is as above
int main()
{
foo local_foo;
local_foo.func(); // Call 1

foo* dynamic_foo_ptr = new foo;
dynamic_foo_ptr->func(); // Call 2
delete dynamic_foo_ptr;
}

For call 1, the compiler knows I want to call a function (func), it
knows where the code for that function is (wherever it put it) and it
knows which object to pass the address of as the hidden "this"
parameter (local_foo).

For call 2, the compiler knows I want to call a function (func - as
before), it knows where the code for that function is (wherever it put
it - exactly the same place as before) and this time it knows it has a
pointer to the particular foo object for which I want to invoke the
function func (dynamic_foo_ptr) and so it passes dynamic_foo_ptr as
the hidden "this" parameter.

Gavin Deane
 
JohnQ

Gavin Deane said:
There's no difference:

// struct foo is as above
int main()
{
foo local_foo;
local_foo.func(); // Call 1

foo* dynamic_foo_ptr = new foo;
dynamic_foo_ptr->func(); // Call 2
delete dynamic_foo_ptr;
}

For call 1, the compiler knows I want to call a function (func), it
knows where the code for that function is (wherever it put it) and it
knows which object to pass the address of as the hidden "this"
parameter (local_foo).

(I wrote this section after the stuff at the end of the post)

The secret ingredient is: how or by what mechanism the compiler knows how to
pass the foo to the foo function. OK, it doesn't. You say it does the
binding at compile time (yes?). There is no "passing of a foo to a foo func
at runtime" as it is "hard coded" in the compiled code (?). So with a free
store obj, it does similar and it is enough to know that the pointer to the
foo is "hard coded" (like where you said foo_func(&a_foo);) by the compiler
and the compiler doesn't care what actual foo is being pointed to. That's it
(!) isn't it?

For call 2, the compiler knows I want to call a function (func - as
before), it knows where the code for that function is (wherever it put
it - exactly the same place as before) and this time it knows it has a
pointer to the particular foo object for which I want to invoke the
function func (dynamic_foo_ptr) and so it passes dynamic_foo_ptr as
the hidden "this" parameter.

I was thinking of something less simplistic than the above example. Your
examples show instances where it is easy to comprehend what the compiler is
doing. Somewhere, I thought it would be difficult for the compiler to know
where class objects of a given class were instantiated (or something like
that) so I thought there must be some kind of external mapping of class
objects and their functions. Without trying to explain any more what I was
thinking, I'm wondering if the examples you show above are how it works ALL
the time (for POD-structs with member functions).

John
 
Gavin Deane

(I wrote this section after the stuff at the end of the post)

The secret ingredient is: how or by what mechanism the compiler knows how to
pass the foo to the foo function.

The compiler knows it has to pass *a* pointer-to-foo to a (non-static)
member function of foo because there is a pointer-to-foo in the
parameter list. If you write

#include <string>

struct foo
{
void func();
void func2(int i, double d, std::string s);
};
void foo::func() { /* ... */ }
void foo::func2(int i, double d, std::string s) { /* ... */ }

however much you might be tempted to think that you've written a
function, func, that takes zero parameters and a function, func2, that
takes three parameters (an int, a double and a std::string), you
haven't. You've written a function, func, that takes one parameter (a
pointer-to-foo) and a function, func2, that takes four parameters (a
pointer-to-foo, an int, a double and a std::string).

There are only two ways you can call func:

my_pointer_to_foo->func();
my_foo_object.func();

and, again, no matter how much you think you are calling a function
that takes zero arguments, you aren't. What you are doing with those
two statements is actually:

foo::func(my_pointer_to_foo);
foo::func(&my_foo_object);

And there are only two ways you can call func2:

my_pointer_to_foo->func2(my_int, my_double, my_string);
my_foo_object.func2(my_int, my_double, my_string);

which, similarly, is not calling a function that takes three
arguments, no matter how much it looks like it. Really, those two
statements are:

foo::func2(my_pointer_to_foo, my_int, my_double, my_string);
foo::func2(&my_foo_object, my_int, my_double, my_string);

OK, it doesn't. You say it does the
binding at compile time (yes?).

It is known at compile time which pointer-to-foo to pass to the member
function, yes, in exactly the same way as it is known which int,
double and std::string to pass in the case of func2, and in exactly
the same way it is known at compile time which values to use as the
parameters to any function call, whether a free function or a member
function. It is known because the code you write states which pointer-
to-foo to pass to the member function. If it wasn't apparent from your
code, your code wouldn't compile.

There is no "passing of a foo to a foo func
at runtime" as it is "hard coded" in the compiled code (?).

A pointer-to-foo is passed as a parameter to each (non-static) member
function of foo. It has to be known at compile time *which* pointer-
to-foo is passed.

So with a free
store obj, it does similar and it is enough to know that the pointer to the
foo is "hard coded" (like where you said foo_func(&a_foo);) by the compiler

All that is required is that a pointer-to-foo is passed as a
parameter. Whether that pointer happens to point to a dynamically
allocated object is irrelevant.

and the compiler doesn't care what actual foo is being pointed to. That's it
(!) isn't it?

If I understand you correctly, then yes. The type of the parameter is
pointer-to-foo. If the function is invoked like

my_pointer_to_foo->func();

then the compiler has the necessary pointer-to-foo right there and
that's what it passes to the function. Everything the compiler needs
to make the function call is in that statement (and you will note that
nothing about that statement tells you, me or the compiler anything
about whether the foo object pointed to is dynamically allocated -
conclusion: whether the foo object is dynamically allocated is
irrelevant).

If the function is invoked like

my_foo_object.func();

the compiler's job involves one tiny extra step. The type of the
parameter to pass to func() is pointer-to-foo, but unlike before we
haven't given the compiler a pointer-to-foo, we've given it a foo.
However, it is simplicity itself for the compiler to obtain a pointer-
to-foo from a foo by taking the address of the foo. And that's what it
does.

I was thinking of something less simplistic than the above example. Your
examples show instances where it is easy to comprehend what the compiler is
doing. Somewhere, I thought it would be difficult for the compiler to know
where class objects of a given class were instantiated (or something like
that) so I thought there must be some kind of external mapping of class
objects and their functions.

Perhaps I confused you by first showing an example where a member
function was invoked on an object directly. Maybe I gave you the
impression that the object is what's needed to make a call to a member
function work. Hopefully with the above I've made it clear that what
the compiler needs whenever a member function is called is a pointer.

Without trying to explain any more what I was
thinking, I'm wondering if the examples you show above are how it works ALL
the time (for POD-structs with member functions).

How I've shown it above is how it works all the time, for all non-
static, non-virtual member functions of all classes (not just POD-
structs).

Are you happy that the following two examples are conceptually
identical and that example 2 shows how, in practice, example 1 is
implemented?

// Example 1
struct bar
{
void reset_i();
void set_i(int new_value);
int i;
};

void bar::reset_i() { i = 0; }

void bar::set_i(int new_value)
{
reset_i();
i = new_value;
}

int main()
{
bar* pb1 = new bar;
pb1->set_i(100);

bar b1;
b1.set_i(42);

bar b2;
bar* pb2 = &b2;
pb2->set_i(5);
}

// Example 2
struct bar
{
int i;
};

void bar_reset_i(bar* this_) { this_->i = 0; }

void bar_set_i(bar* this_, int new_value)
{
bar_reset_i(this_);
this_->i = new_value;
}

int main()
{
bar* pb1 = new bar;
bar_set_i(pb1, 100);

bar b1;
bar_set_i(&b1, 42);

bar b2;
bar* pb2 = &b2;
bar_set_i(pb2, 5);
}

Gavin Deane
 
JohnQ

(Read my response from the bottom up to avoid answering passages
unnecessarily. If I summed it up appropriately at the bottom of this post,
then there's no need to try and grok what precedes that.)

Gavin Deane said:
The compiler knows it has to pass *a* pointer-to-foo to a (non-static)
member function of foo because there is a pointer-to-foo in the
parameter list. If you write

struct foo
{
void func();
void func2(int i, double d, std::string s);
};
void foo::func() { /* ... */ }
void foo::func2(int i, double d, std::string s) { /* ... */ }

however much you might be tempted to think that you've written a
function, func, that takes zero parameters and a function, func2, that
takes three parameters (an int, a double and a std::string), you
haven't. You've written a function, func, that takes one parameter (a
pointer-to-foo) and a function, func2, that takes four parameters (a
pointer-to-foo, an int, a double and a std::string).

That was never the question. The question is how those things are associated
once again after they've been separated.

It is known at compile time which pointer-to-foo to pass to the member
function, yes, in exactly the same way as it is known which int,
double and std::string to pass in the case of func 2, and in exactly
the same way it is known at compile time which values to use as the
parameters to any function call, whether a free function or a member
function. It is known because the code you write states which pointer-
to-foo to pass to the member function. If it wasn't apparent from your
code, your code wouldn't compile.

I'm still able to use the dot and -> on a struct that apparently has no member
functions, since its size is only the size of its data members. So
there must be some magic going on to enable that to work.

If I understand you correctly, then yes. The type of the parameter is
pointer-to-foo. If the function is invoked like

my_pointer_to_foo->func();

Theoretically, though, that shouldn't work, because the compiler turned the
struct with member functions into a struct and separate functions taking the
hidden this argument. Is there some operator overloading going on?

something_like_a_class& operator->(foo&); // returns something internal that works with ->

then the compiler has the necessary pointer-to-foo right there and
that's what it passes to the function. Everything the compiler needs
to make the function call is in that statement (and you will note that
nothing about that statement tells you, me or the compiler anything
about whether the foo object pointed to is dynamically allocated -
conclusion: whether the foo object is dynamically allocated is
irrelevant).

The key thing, though, has not yet been revealed.

How I've shown it above is how it works all the time, for all non-
static, non-virtual member functions of all classes (not just POD-
structs).

I am emphasizing POD-structs because they can't have a vptr. The disconnect
that I have is how I can call a member function from an object instance with
the dot or -> operator when the struct is just a POD, and what trickery the
compiler does to make sizeof(my_struct) the size of a POD instead of the size
of something with a vptr. The second part of that is probably easy: the
compiler generates a struct and separate functions from my definition of the
POD-struct with member functions. So if those member functions are separated
from the struct, how is it that dot or -> can work to call a member function
that isn't part of the struct anymore?

Are you happy that the following two examples are conceptually
identical and that example 2 shows how, in practice, example 1 is
implemented?

Sure. That was never really a question. How that is done is what I want to
know. LOL, I'm talking myself in circles! I think I know the answer but I
keep forgetting it almost as soon as I learn it. At the following line in
my code:

foo.foo_func();

the compiler generates in the machine code, the equivalent of:

foo_func(&foo);

Therefore, there is no mechanism that associates foo structs with foo member
functions. It's all done by code generation. And sizeof the struct covers
only its data members (plus any padding) because the struct and the member
functions do indeed get separated by the compiler in the machine code. And
classes are handled the same way, except for classes with virtual functions,
which require the introduction of the vptr into the data struct.


John
 
Erik Wikström

(Read my response from the bottom up to avoid answering passages
unnecessarily. If I summed it up appropriately at the bottom of this post,
then there's no need to try and grok what precedes that.)



That was never the question. The question is how those things are associated
once again after they've been separated.


I'm still able to use the dot and -> on a struct that apparently has no member
functions, since its size is only the size of its data members. So
there must be some magic going on to enable that to work.


Theoretically, though, that shouldn't work, because the compiler turned the
struct with member functions into a struct and separate functions taking the
hidden this argument. Is there some operator overloading going on?

something_like_a_class& operator->(foo&); // returns something internal that works with ->


The key thing, though, has not yet been revealed.


I am emphasizing POD-structs because they can't have a vptr. The disconnect
that I have is how I can call a member function from an object instance with
the dot or -> operator when the struct is just a POD, and what trickery the
compiler does to make sizeof(my_struct) the size of a POD instead of the size
of something with a vptr. The second part of that is probably easy: the
compiler generates a struct and separate functions from my definition of the
POD-struct with member functions. So if those member functions are separated
from the struct, how is it that dot or -> can work to call a member function
that isn't part of the struct anymore?


Sure. That was never really a question. How that is done is what I want to
know. LOL, I'm talking myself in circles! I think I know the answer but I
keep forgetting it almost as soon as I learn it. At the following line in
my code:

foo.foo_func();

the compiler generates in the machine code, the equivalent of:

foo_func(&foo);

Therefore, there is no mechanism that associates foo structs with foo member
functions. It's all done by code generation. And sizeof the struct covers
only its data members (plus any padding) because the struct and the member
functions do indeed get separated by the compiler in the machine code. And
classes are handled the same way, except for classes with virtual functions,
which require the introduction of the vptr into the data struct.

I'm sorry, you kind of lost me here. Up till this last part it seems
like you don't get it and you ask how it's done. And now it seems you do
get it so I'm not sure if there's still something that you don't
understand. Please clarify.
 
JohnQ

Erik Wikström said:
I'm sorry, you kind of lost me here. Up till this last part it seems like
you don't get it and you ask how it's done. And now it seems you do get it
so I'm not sure if there's still something that you don't understand.
Please clarify.

That's why I put the parenthetical text at the top of the post: I left my
thought process in the post just because. The key was in thinking like a
compiler instead of a coder. Developers tend to think in terms of mechanisms
(at least I do) instead of code generation, so the "how" wasn't apparent,
but it is now. In this case there is no C++ left once the compiler has
processed the code: it turns it all into something much like C, where there
is no "physical" mechanism associating the data struct with the member
functions (no overloaded operators or the like).

John
 
Gavin Deane

I'm sorry, you kind of lost me here. Up till this last part it seems
like you don't get it and you ask how it's done. And now it seems you do
get it so I'm not sure if there's still something that you don't
understand. Please clarify.

I think John did sum it up appropriately at the bottom of his post,
so his opening comment about not needing to try and grok the
rest of his post applies.

Gavin Deane
 
