pImpl idiom: Heap vs Stack


Hoyin

I am reading some notes about the Pointer to Implementation (pImpl) idiom,
which tries to decouple the interface from the implementation via a pointer, e.g.

class Book
{
public:
    void print();        // defined in Book.cpp, where BookImpl is complete
private:
    class BookImpl;
    BookImpl* m_p;
};

// in a separate file (Book.cpp)
class Book::BookImpl
{
public:
    void print();
private:
    std::string m_Contents;
    std::string m_Title;
};

void Book::print() { m_p->print(); }

which avoids recompiling the client code if we want to add or remove a
property of Book (we change class BookImpl's data members rather than
class Book). BUT, in most cases, the BookImpl object will be allocated
on the heap rather than the stack. Will there be a significant cost
introduced by this change (we will be accessing the data on the heap
instead of the stack)? I was told accessing data on the heap is less
efficient than on the stack (even though we have pointers/addresses to
that memory), but why? How can we measure the cost introduced? Thanks

Hoyin
 

puppi

I am reading some notes about the Pointer to Implementation (pImpl) idiom,
which tries to decouple the interface from the implementation via a pointer, e.g.

class Book
{
public:
    void print();        // defined in Book.cpp, where BookImpl is complete
private:
    class BookImpl;
    BookImpl* m_p;
};

// in a separate file (Book.cpp)
class Book::BookImpl
{
public:
    void print();
private:
    std::string m_Contents;
    std::string m_Title;
};

void Book::print() { m_p->print(); }

which avoids recompiling the client code if we want to add or remove a
property of Book (we change class BookImpl's data members rather than
class Book). BUT, in most cases, the BookImpl object will be allocated
on the heap rather than the stack. Will there be a significant cost
introduced by this change (we will be accessing the data on the heap
instead of the stack)? I was told accessing data on the heap is less
efficient than on the stack (even though we have pointers/addresses to
that memory), but why? How can we measure the cost introduced? Thanks

Hoyin

I will restrict my considerations to the x86 family of processors
(essentially, 32- and 64-bit Intel and Intel-compatible processors,
except for the discontinued Itanium). I suspect there is some
generality to what I'm saying, but I can't safely say it will all hold
exactly for other processor architectures (such as ARM, common in
smartphones).

That being said, and assuming that you know what an offset is: every
processor has registers, which are, grosso modo, a region of memory
inside the CPU with very fast access times (much faster than RAM).
When you have a local variable, its address is known at compile time
as an offset from a register known as the base pointer. Hence the
address of a local variable can be computed very quickly: it suffices
to add a compile-time constant to a register. If a local variable is
composite, such as an object, then since member addresses are defined
at compile time as offsets from the object's address (the exact
offsets being implementation-dependent), the compiler just adds the
member's offset to the object's offset (from the base pointer) to
compute the member's address (relative to the base pointer), allowing
very fast access to members.

If an object is stored dynamically instead, first the pointer's
address must be obtained (which is a compile-time constant for static
allocation, or an offset from the base pointer for automatic
allocation, i.e. if it is local), then its contents must be fetched
from memory, and then those contents, the address of the object, are
used to compute the member's address. The difference in efficiency
between a local object and a dynamically allocated object is that of a
memory access (in RAM), plus a possible register-level addition (which
has a negligible cost compared to the memory access). That usually
doesn't mean much as an absolute time cost, except for members used
very frequently, in places such as loops.

However, if you access a dynamically allocated object frequently in a
portion of code, a decent compiler (with optimization enabled) will
fetch the object's address only once and place it in a register. From
that point on, the access times for the members will be the same as if
the object were allocated on the stack. This is why you shouldn't
worry too much about the extra time cost for your implementation: it
will only really cost you if you don't use a decent compiler.
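The two access patterns described above can be sketched in a few lines. The struct and function names here are made up for illustration, and the assembly in the comments is typical (not guaranteed) x86 code generation:

```cpp
#include <string>

struct Impl {
    std::string title;
    int pages = 0;
};

// Members of a local object: the compiler knows each member's address
// as a fixed, compile-time offset from the stack/base pointer.
int pages_direct() {
    Impl book;           // lives on the stack
    book.pages = 42;
    return book.pages;   // e.g. mov eax, [rbp-8]  (one fixed offset)
}

// Members behind a pointer: the pointer's value must first be loaded
// from memory, then the member's offset is added to it.
int pages_indirect(const Impl* p) {
    return p->pages;     // e.g. mov rax, [p]; mov eax, [rax+32]
}
```

With optimization enabled, a compiler that keeps `p` in a register makes the second form just as cheap after the first load, which is the hoisting puppi describes.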
 

Juha Nieminen

Hoyin said:
which avoids recompiling the client code if we want to add or remove a
property of Book (we change class BookImpl's data members rather than
class Book). BUT, in most cases, the BookImpl object will be allocated
on the heap rather than the stack. Will there be a significant cost
introduced by this change (we will be accessing the data on the heap
instead of the stack)?

It depends on how much you instantiate the class in question (and how).

If you create, for example, a vector containing a million instances of
your class, the difference (in creation time) could be quite
significant (one million and one allocations for the pImpl version vs.
one single allocation for the non-pImpl version, and memory allocation
can be a relatively heavy operation). Memory consumption will also
increase with the pImpl version.
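One way to see that allocation difference is to count calls to the global operator new. This is an illustrative sketch (all names made up), not a benchmark:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Global allocation counter (for illustration only).
static std::size_t g_allocs = 0;

void* operator new(std::size_t n) {
    ++g_allocs;
    if (void* p = std::malloc(n)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct Plain { int x = 0; };   // data stored inline in the object

struct Pimpl {                 // one extra heap block per instance
    struct Impl { int x = 0; };
    Impl* p;
    Pimpl() : p(new Impl) {}
    ~Pimpl() { delete p; }
    Pimpl(const Pimpl& o) : p(new Impl(*o.p)) {}
    Pimpl& operator=(const Pimpl& o) { *p = *o.p; return *this; }
};

std::size_t allocs_for_plain(std::size_t n) {
    g_allocs = 0;
    std::vector<Plain> v(n);   // one allocation: the vector's buffer
    return g_allocs;
}

std::size_t allocs_for_pimpl(std::size_t n) {
    g_allocs = 0;
    std::vector<Pimpl> v(n);   // buffer + one allocation per element
    return g_allocs;
}
```

For a million elements that is 1 allocation vs. 1,000,001, which is where the creation-time difference comes from.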
 

Goran

I am reading some notes about the Pointer to Implementation (pImpl) idiom,
which tries to decouple the interface from the implementation via a pointer, e.g.

class Book
{
public:
    void print();        // defined in Book.cpp, where BookImpl is complete
private:
    class BookImpl;
    BookImpl* m_p;
};

// in a separate file (Book.cpp)
class Book::BookImpl
{
public:
    void print();
private:
    std::string m_Contents;
    std::string m_Title;
};

void Book::print() { m_p->print(); }

which avoids recompiling the client code if we want to add or remove a
property of Book (we change class BookImpl's data members rather than
class Book). BUT, in most cases, the BookImpl object will be allocated
on the heap rather than the stack. Will there be a significant cost
introduced by this change (we will be accessing the data on the heap
instead of the stack)? I was told accessing data on the heap is less
efficient than on the stack (even though we have pointers/addresses to
that memory), but why? How can we measure the cost introduced?

Is there a cost: yes, absolutely.

Is it significant: bad question. The answer depends very much on:
1. how much speed you need
2. what your call patterns are
3. what your implementation (the new/delete pair, mostly) and hardware
give you.

How to measure: make two versions (pImpl/"direct") of your code and
measure in an __optimized__ (by the compiler) build.
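A minimal timing harness for that kind of comparison, using <chrono> (the function name is made up):

```cpp
#include <chrono>

// Run f() `iters` times and return the total elapsed nanoseconds.
// Use this only on an optimized build, and make sure f has an
// observable side effect so the compiler cannot delete it entirely.
template <typename F>
long long time_ns(F f, int iters) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0)
        .count();
}
```

Call it once with the pImpl version of an operation and once with the direct version, on the same data, and compare the two numbers.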

In C++, an abstract base class is a viable alternative to pimpl. Check
out http://stackoverflow.com/questions/825018/pimpl-idiom-vs-pure-virtual-class-interface

Goran.
 

Hoyin

I will restrict my considerations to the x86 family of processors
(essentially, 32- and 64-bit Intel and Intel-compatible processors,
except for the discontinued Itanium). I suspect there is some
generality to what I'm saying, but I can't safely say it will all hold
exactly for other processor architectures (such as ARM, common in
smartphones).

That being said, and assuming that you know what an offset is: every
processor has registers, which are, grosso modo, a region of memory
inside the CPU with very fast access times (much faster than RAM).
When you have a local variable, its address is known at compile time
as an offset from a register known as the base pointer. Hence the
address of a local variable can be computed very quickly: it suffices
to add a compile-time constant to a register. If a local variable is
composite, such as an object, then since member addresses are defined
at compile time as offsets from the object's address (the exact
offsets being implementation-dependent), the compiler just adds the
member's offset to the object's offset (from the base pointer) to
compute the member's address (relative to the base pointer), allowing
very fast access to members.

If an object is stored dynamically instead, first the pointer's
address must be obtained (which is a compile-time constant for static
allocation, or an offset from the base pointer for automatic
allocation, i.e. if it is local), then its contents must be fetched
from memory, and then those contents, the address of the object, are
used to compute the member's address. The difference in efficiency
between a local object and a dynamically allocated object is that of a
memory access (in RAM), plus a possible register-level addition (which
has a negligible cost compared to the memory access). That usually
doesn't mean much as an absolute time cost, except for members used
very frequently, in places such as loops.

However, if you access a dynamically allocated object frequently in a
portion of code, a decent compiler (with optimization enabled) will
fetch the object's address only once and place it in a register. From
that point on, the access times for the members will be the same as if
the object were allocated on the stack. This is why you shouldn't
worry too much about the extra time cost for your implementation: it
will only really cost you if you don't use a decent compiler.

Thanks puppi, so basically the "extra" cost comes from having to
compute the address of the heap object on the fly, rather than at
compile time (as an offset from the base pointer)? And what are the
registers (the memory inside the CPU with fast access times) storing:
the addresses of local objects computed at compile time? So there
would be no efficiency difference in accessing two objects on the
heap/stack if we could (assume we could) know the offsets (from the
base pointer) of both objects at compile time? Thanks

Hoyin
 

robertwessel2

I will restrict my considerations to the x86 family of processors
(essentially, 32- and 64-bit Intel and Intel-compatible processors,
except for the discontinued Itanium).


IPF may not be doing all that well, but it's most certainly not
discontinued (in fact a major new CPU is on the way).
 

Noah Roberts

Will there be significant cost introduced
by this change (we will be accessing the data from Heap instead of
Stack)? I was told accessing data from Heap is less efficient than
from Stack (even thought we have pointers/address to those memory) but
why? How we can measures the cost introduced?

With a profiler. If it indeed turns out that there's too much space or
time being used up by the pImpl, you can always push it back into the
open, close the firewall, and the cost is gone. You'll still have a
better design.

Unless you have real data associated with the profiling of your code, I
wouldn't even bother considering the opinions of those worried about
micro-optimizations. 99.99999% of the time they can be safely ignored.
 

James Kanze

It depends on how much you instantiate the class in question (and how).
If you create, for example, a vector containing a million instances of
your class, the difference (in creation time) could be quite
significant (one million and one allocations for the pImpl version vs.
one single allocation for the non-pImpl version, and memory allocation
can be a relatively heavy operation). Memory consumption will also
increase with the pImpl version.

And locality could decrease, with negative impact even after the
allocation. If the class is small (e.g. something like complex
or Point), and you expect to have vectors with millions of them,
the compilation firewall idiom is not indicated. For a large
number of larger classes, however, it could be. On the other
hand, such larger classes are likely to be entity classes, which
can easily be constructed using a factory method, and then only
accessed through a pointer to the abstract base class. Which
accomplishes more or less the same thing as the compilation
firewall idiom.

So there's no general rule. You use whatever is most
appropriate for your application.
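The factory-plus-abstract-base approach mentioned above can be sketched as follows (all names are illustrative; in practice the interface and factory declaration would live in a header and everything else in one source file):

```cpp
#include <memory>
#include <string>

// Abstract interface visible to clients. The concrete class lives in
// a single translation unit, so clients never recompile when its data
// members change -- the same effect as the compilation firewall.
class Book {
public:
    virtual ~Book() = default;
    virtual std::string title() const = 0;
};

// Factory: the only other thing clients need to see.
std::unique_ptr<Book> make_book(std::string title);

// --- implementation file ---
namespace {
class BookImpl : public Book {
public:
    explicit BookImpl(std::string t) : m_title(std::move(t)) {}
    std::string title() const override { return m_title; }
private:
    std::string m_title;
};
}  // namespace

std::unique_ptr<Book> make_book(std::string title) {
    return std::make_unique<BookImpl>(std::move(title));
}
```

Clients only ever hold a pointer to the abstract base, which is the "accessed through a pointer" usage described above.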
 

Öö Tiib

Is there a cost: yes, absolutely.

Is it significant: bad question. The answer depends very much on:
1. how much speed you need
2. what your call patterns are
3. what your implementation (the new/delete pair, mostly) and hardware
give you.

How to measure: make two versions (pImpl/"direct") of your code and
measure in an __optimized__ (by the compiler) build.

Also, things should be measured for a real application, not some
hypothetical "create a billion of them" test. A pImpl's state is cheap
to swap and to move (just a single pointer), so for certain algorithms
it may even result in a performance gain. Usually, however, the
difference does not affect performance, since the bottlenecks tend to
be in slow I/O, in poor concurrency and synchronization, or in
wasteful algorithms.
In C++, an abstract base class is a viable alternative to pimpl. Check
out http://stackoverflow.com/questions/825018/pimpl-idiom-vs-pure-virtual-class-interface

I have to disagree. People use these two things for different goals in
C++. It is very good that C++ supports such a wide variety of idioms,
so a good designer can pick exactly the right tool for each situation.
These two idioms do not overlap enough to be alternatives, I think.

A pImpl is a fully implemented class with a well-hidden and firewalled
inner structure. That makes it useful as a (non-abstract) base class
or as a data member ... in OOP terms, as a generalized parent or as a
component or element. In short ... like a value with functionality.
The viable alternative is just a usual class without such a firewall;
an abstract interface does not cut it.

An abstract interface is a set of function descriptions that form an
interface. It is useful for various loose-coupling OOP relations
between different entities involving realization, association and
aggregation. In short ... a user can get such an interface from some
factory in a library and keep it as an association with the library,
or has to implement such an interface to inject it into the library.
 

Goran

Also, things should be measured for a real application, not some
hypothetical "create a billion of them" test.

Well, yes, that's what I meant by "make two versions ... of __your__
code" (emphasis added).
A pImpl's state is cheap to swap and to move (just a single pointer),
so for certain algorithms it may even result in a performance gain.

Well, yes, but a pointer (to the abstract base) is even cheaper to
swap and move. Pimpl doesn't give anything more there.
I have to disagree. People use these two things for different goals in
C++. It is very good that C++ supports such a wide variety of idioms,
so a good designer can pick exactly the right tool for each situation.
These two idioms do not overlap enough to be alternatives, I think.

A pImpl is a fully implemented class with a well-hidden and firewalled
inner structure. That makes it useful as a (non-abstract) base class
or as a data member ... in OOP terms, as a generalized parent or as a
component or element. In short ... like a value with functionality.
The viable alternative is just a usual class without such a firewall;
an abstract interface does not cut it.

An abstract interface is a set of function descriptions that form an
interface. It is useful for various loose-coupling OOP relations
between different entities involving realization, association and
aggregation. In short ... a user can get such an interface from some
factory in a library and keep it as an association with the library,
or has to implement such an interface to inject it into the library.

Well... the question was performance-related. If locality of reference
matters on the given hardware (which, with a virtual base class and
virtual calls, can easily happen), all the better. In light of
performance, conceptual purity matters less, on one hand; and on the
other, the end result, functionality-wise, is pretty similar between
the two. Hence "viable alternative".

Goran.
 

Öö Tiib

Well, yes, that's what I meant by "make two versions ... of __your__
code" (emphasis added).
OK.


Well, yes, but a pointer (to the abstract base) is even cheaper to
swap and move. Pimpl doesn't give anything more there.

Hmm, I thought you were comparing pImpl with a direct object above.
Versus an abstract interface, let's at least agree on a "draw". ;-) A
pImpl is also a pointer, unless it has something in addition to the
carried implementation.
Well... the question was performance-related. If locality of reference
matters on the given hardware (which, with a virtual base class and
virtual calls, can easily happen), all the better. In light of
performance, conceptual purity matters less, on one hand; and on the
other, the end result, functionality-wise, is pretty similar between
the two. Hence "viable alternative".

Yes, but that is where I disagreed: the difference between these two
is not primarily in performance but in behavior, contract,
responsibilities and guarantees. It is not only conceptual purity
(like the arrow type on some UML diagram). What you can or have to do
to one or the other, as its user, is different. The pointer-ness of a
pImpl is entirely removed from the user's concern. It is ready-made.
Use it as a data member, and compiler-generated assignment and copy
construction work.

An abstract interface, on the other hand, is usually up to its user to
manage as a pointer, and there is usually a pile of managerial
concerns: where to acquire it, how to release it, and checking for
nullness. Then users often wrap it into smart pointers, and both
damage performance and uglify their code.
 

Goran

The pointer-ness of a pImpl is entirely removed from the user's
concern. It is ready-made. Use it as a data member, and
compiler-generated assignment and copy construction work.

How so? IME(xperience), the pImpl holder is most often also the owner
of the object, and consequently those do not work.
An abstract interface, on the other hand, is usually up to its user to
manage as a pointer, and there is usually a pile of managerial
concerns: where to acquire it, how to release it, and checking for
nullness.

Well, the question of ownership is not easy, that's for sure. But it's
possible to clearly delineate the places in code where NULL isn't
allowed (drop the pointer, require a reference).

I honestly fail to see how pimpl changes anything there.

Goran.
 

Öö Tiib

How so? IME(xperience), the pImpl holder is most often also the owner
of the object, and consequently those do not work.

Both classes, the envelope and the implementation, are usually written
by the same person, the "author". The resulting class usually behaves
like a value class (copyable, assignable, swappable, movable). So the
"user" of the class owns the instances and can use them as components
of his own objects.
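A sketch of such a value-like pImpl envelope (all names illustrative). Here the copy operations are written by the "author" to deep-copy the hidden implementation, so the "user" gets plain value semantics:

```cpp
#include <memory>
#include <string>

// pImpl envelope with value semantics: the user never sees the
// pointer; copy, assign, swap and move all just work.
class Book {
public:
    explicit Book(std::string title);
    Book(const Book& o);                 // deep copy of the impl
    Book& operator=(Book o) { swap(*this, o); return *this; }
    Book(Book&&) noexcept = default;
    ~Book();
    friend void swap(Book& a, Book& b) noexcept { a.m_p.swap(b.m_p); }
    const std::string& title() const;
private:
    struct Impl;
    std::unique_ptr<Impl> m_p;           // hidden, firewalled state
};

// --- would normally live in Book.cpp ---
struct Book::Impl { std::string title; };
Book::Book(std::string t) : m_p(new Impl{std::move(t)}) {}
Book::Book(const Book& o) : m_p(new Impl(*o.m_p)) {}
Book::~Book() = default;
const std::string& Book::title() const { return m_p->title; }
```

With raw-pointer pImpls the compiler-generated copy would be a shallow pointer copy, which is the objection Goran raises below; the author-written deep copy here is what makes the class behave as a value.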
Well, the question of ownership is not easy, that's for sure. But it's
possible to clearly delineate the places in code where NULL isn't
allowed (drop the pointer, require a reference).

I feel it is still a pointer or reference to something not exactly
owned.
I honestly fail to see how pimpl changes anything there.

I don't use them for the same thing; I think I would run into
difficulties trying to use them as alternatives for the same
relations.

What goes on inside a pImpl is its own business. I am not suggesting
extreme tricks there (the firewall is good in itself), but often
pImpls can outperform other class patterns. For example, one can
internally use copy-on-write instead of copying, use fixed immutable
implementations for some common states, or use its own memory
allocators. Externally, however, it looks simple and ordinary, and
every instance feels like a solid object with separate state. It may
have hidden dependencies on other modules or layers, but it does not
work too well as a proxy to such distant entities.

A pure base class is good for other relations, where the object that
the interface pointer points at is not owned by its user and is often
in a different layer or module. The object behind the interface is
usually not copyable by the interface's user, and its lifetime is
often outside the user's control. The interface is acquired from some
factory or service and released back to it. Such an entity can be
"part of" or "subordinate of" several aggregates simultaneously. It
does not work too well as a component or a value.
 

Juha Nieminen

James Kanze said:
And locality could decrease, with negative impact even after the
allocation. If the class is small (e.g. something like complex
or Point), and you expect to have vectors with millions of them,
the compilation firewall idiom is not indicated. For a large
number of larger classes, however, it could be. On the other
hand, such larger classes are likely to be entity classes, which
can easily be constructed using a factory method, and then only
accessed through a pointer to the abstract base class. Which
accomplishes more or less the same thing as the compilation
firewall idiom.

One situation where using the pImpl method can increase efficiency
is when the amount of data (managed by one object) is relatively
large, instances of the class get copied around a lot but seldom
modified (sorting an array of instances is a concrete example of
this), and you use copy-on-write to manage the data. Depending on how
much the data is modified, it could also decrease memory usage
(because several instances could share identical data).
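A rough single-threaded sketch of such a copy-on-write pImpl (names are illustrative; a production version would have to consider thread safety of `use_count`):

```cpp
#include <memory>
#include <string>
#include <utility>

// Copy-on-write pImpl: copies share one Impl until somebody writes,
// so copying (e.g. while sorting) moves only a shared_ptr.
class Document {
public:
    explicit Document(std::string text)
        : m_p(std::make_shared<Impl>(Impl{std::move(text)})) {}

    const std::string& text() const { return m_p->text; }

    void set_text(std::string t) {
        detach();                  // clone only if the data is shared
        m_p->text = std::move(t);
    }

    // True if this instance still shares its data with another copy.
    bool shares_data_with(const Document& o) const { return m_p == o.m_p; }

private:
    struct Impl { std::string text; };

    void detach() {
        if (m_p.use_count() > 1)                   // someone else holds it:
            m_p = std::make_shared<Impl>(*m_p);    // make a private copy
    }

    std::shared_ptr<Impl> m_p;
};
```

Copies are a single pointer copy until the first write, which is why copying around a large, rarely modified object gets cheaper, and why identical instances can share memory.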
 

James Kanze

One situation where using the pImpl method can increase efficiency
is when the amount of data (managed by one object) is relatively
large, instances of the class get copied around a lot but seldom
modified (sorting an array of instances is a concrete example of
this), and you use copy-on-write to manage the data. Depending on how
much the data is modified, it could also decrease memory usage
(because several instances could share identical data).

Agreed, but I'm not sure I'd consider that the same idiom.
That's CoW, or lazy copy.
 
