yonil
Over the years of using C++ I've begun noticing that free-store
management functions (malloc/free) become performance bottlenecks in
complex object-oriented libraries, usually because these functions
acquire a mutex lock on the heap. Since the software I'm writing is
targeted at a number of embedded platforms as well as the PC, it's
difficult to use anything but the standard implementation shipped with
the compiler.
I've noticed that even STL classes such as std::list are bottlenecked
by their allocators. In Visual Studio 2005 in particular,
std::allocator uses malloc even for small allocations, which apparently
carries an overhead of over 30 bytes per block (in release mode), so a
list<int> takes about 40 bytes per node. Ouch. Aside from the mutex
locking mentioned above, wasting memory this way also hurts performance
because the cache is used far less efficiently. RAM access is usually
the next-worst bottleneck, since the gap between CPU power and memory
bandwidth only seems to widen over time.
Since I'm actively trying to apply useful design techniques such as
RAII, my code is increasingly rife with allocations of
resource-management objects. shared_ptr also doubles the number of
free-store allocations, since each one separately allocates a reference
counter.
I've found some remedy for these problems in
boost::fast_pool_allocator, particularly when instantiated with
null_mutex. This can virtually eliminate the aforementioned problems
with STL containers and shared_ptr, but in a multithreaded program you
sometimes still can't get rid of the lock. Looking ahead, it seems
smart to design high-performance applications around multithreaded
architectures, given the developments in multi-core hardware. This
makes it ever more important to keep your program as lock-free as you
can.
Ironically, it seems to only get harder to write high-performance
applications these days. I miss the times when you could focus on your
inner-loop calculations and lookup tables actually made things faster -
running time was much more closely related to the academic notion of
complexity than it is nowadays.
Maybe I'm just a little too concerned over these issues, I don't know.