I've been trying to optimize a piece of code that executes in 6
milliseconds. From what I have read, dynamically managing memory with
new/malloc and delete/free is less efficient than using the stack to
store variables. Part of my 6 ms code calls delete on class pointers
more than 20,000 times. I changed the code to store the class
instances on the stack instead of the heap, so the class's destructor
is still called, but delete is not. Unfortunately, the execution time
is still 6 ms.
Does anyone know what is happening? Is automatic storage really more
efficient than dynamic storage?
Any help would be greatly appreciated.
Gonna be a bit snarky here first. People want things faster
so often without a good reason for it that it's worth
badgering you with the "have you checked the power cord
is plugged in" questions.
Ok, ask a prior question here. Is this 6 ms code actually
worth worrying about? As somebody else asked, have you
profiled your code to see that this 6 ms is actually
where your code is spending a significant amount of time?
Or should you be worrying about the other parts of your
app that may be using the time?
Presuming all of those come up with "yes, I need to
optimize," then go forward.
20,000 calls to delete in 6 ms? Are you sure there is any
room for "optimizing"? That's about 0.3 microseconds per
delete, not including any actual work done by the code.
Ok, you need to optimize. How fast would be fast enough?
If you don't have any spec for that, what are you doing?
"I want it faster" isn't a spec. It's a plea for that
pony you never had as a kid.
Maybe what you need is a petting zoo?
Ok, you are still here wanting to optimize.
First, consider whether a tweak of the algorithm isn't
the best way to go. Maybe you can reduce the number
of objects you really allocate/deallocate? Maybe
you can cut the depth of some tree or something?
Before you make your algorithm really complicated
to make it faster, revisit those "is this optimization
really necessary" questions. If you decide a more
complicated algorithm is the right path, make sure
you document it clearly so you will be able to
understand what is going on in six months or a year
when you come back to this code.
And make sure you time it before you switch.
There are ways to manage memory other than grabbing
and giving back objects. What is the maximum number
of objects you need at one time? Is it 20,000? If it
is some much smaller number, consider what the level
of effort of the ctor and dtor of each object is.
If the actual construction work is not very large,
maybe you can do something like so:
- Allocate a bunch of objects. Maybe in a vector.
- Construct them as required. Look up "placement
new" for a way to construct into storage you have
already allocated.
- Destruct them as needed.
- Do not deallocate the memory until you are done.
- When you stop needing a particular object, keep
track that it has become available. When you need
a new one, select it from the available ones.
What scheme you use to do this will depend on
the order you acquire and give back objects.
- When you are finished, use normal dtor procedures
to give back all the associated resources.
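The scheme above is basically an "object pool." A minimal
sketch, assuming a hypothetical Widget class, with a vector
of free indices as the availability bookkeeping (which reuse
order makes sense will depend on your access pattern):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real class.
struct Widget { int value = 0; };

// Minimal fixed-size pool: all storage is allocated once up front,
// and "freed" objects are just recorded on a free list for reuse.
class WidgetPool {
public:
    explicit WidgetPool(std::size_t n) : objects_(n) {
        free_.reserve(n);
        for (std::size_t i = n; i-- > 0; ) free_.push_back(i);
    }

    // Hand out an available object, or null if the pool is exhausted.
    Widget* acquire() {
        if (free_.empty()) return nullptr;
        std::size_t i = free_.back();
        free_.pop_back();
        objects_[i] = Widget{};  // reset in place instead of reallocating
        return &objects_[i];
    }

    // Return an object to the pool; no delete involved.
    void release(Widget* w) {
        free_.push_back(static_cast<std::size_t>(w - objects_.data()));
    }

private:
    std::vector<Widget> objects_;    // lives for the pool's lifetime
    std::vector<std::size_t> free_;  // indices of available objects
};
```

Acquire and release here are just index pushes and pops, so no
allocator is touched after construction. If your real class is
expensive to reset, the Widget{} assignment is the part to rethink.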
But! Be sure to time this scheme with representative
sample data before you adopt it. It is quite possible
it will be slower than the usual thing, especially
if the construction effort for each object is large.
If you really need 20,000 objects all at once, maybe
what you want is to allocate all the objects at once.
Again, maybe in a vector. Then you can access them
with just an integer. Possibly (though again, time it
before you switch) allocating a block of 20,000 objects
will be faster than allocating 20,000 individual
objects one at a time. Again, placement new may come in
handy if the objects can't just be default-constructed.
Socks