Mirek said:
Thing is, the apparent advantage of std::list is the O(1) ability to
remove/insert an element anywhere. But the trouble is that you must
know which one to remove/where to insert.
That's hardly how most projects I've worked on use std::list, or linked
lists in general. Often when a linked-list-like data structure is
chosen, it is NOT relevant at all what order the objects are in within
the container.
The primary concern is that you can create and delete objects
*cheaply*, and still have a way of keeping track of all the objects you
have created. Deleting an object from the *middle* of a container with
array semantics isn't very efficient.
The std::list has its own problems. For example, say we want to keep
track of objects which are created with the factory design pattern.
Example code:
texture* s = device->createTexture("bla.jpg");
s->release();
OK, let's assume device is the factory object, which also manages the
texture objects. The s->release() method implements reference counting,
and when the reference count reaches zero, delete is invoked.
If we keep the texture objects in a std::list, finding an object for
deletion means iterating through the factory's texture object container
(or generic object container) until the value "s" is found.
Inefficient; this invites the use of std::map or a similar container.
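To make the contrast concrete, here is a minimal sketch; the helper
names release_from_list and release_from_set are made up for
illustration, only the shape of the problem comes from the example
above:

```cpp
#include <list>
#include <set>

struct texture { /* payload omitted in this sketch */ };

// Linear search: O(n) to find the element before the O(1) unlink.
void release_from_list(std::list<texture*>& textures, texture* s)
{
    textures.remove(s);   // walks the whole list comparing values
}

// Keyed lookup: O(log n) to find and erase in one call.
void release_from_set(std::set<texture*>& textures, texture* s)
{
    textures.erase(s);    // tree descent, no full traversal
}
```

Both calls end up with the object gone from the container; the
difference is only in how much work it takes to locate it first.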
On the other hand, if the texture inherits from a custom linked-list
node type, which *includes* (as an implementation detail, for example)
pointers to the previous and next objects in the sequence, removal by
value of s is trivial (and very fast). There should be a mechanism to
validate that s is actually part of the sequence it is being removed
from. One way is to have a pointer to the "current owner container" in
the node object.
class object
{
    container* m_container;
    object*    m_prev;
    object*    m_next;
    int        m_refcount;
    ...
};

class texture : public object
{ ... };
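A minimal sketch of what that removal looks like; only m_prev, m_next
and m_container follow the class above, the container layout (head/tail
pointers) and method names are my own invention:

```cpp
#include <cassert>

class container; // forward declaration for the owner pointer

class object
{
public:
    container* m_container = nullptr;
    object*    m_prev = nullptr;
    object*    m_next = nullptr;
};

class container
{
public:
    object* m_head = nullptr;
    object* m_tail = nullptr;

    void insert(object* o)
    {
        o->m_container = this;
        o->m_prev = m_tail;
        o->m_next = nullptr;
        if (m_tail) m_tail->m_next = o; else m_head = o;
        m_tail = o;
    }

    // Removal by value: no search, just relink the neighbours.
    void remove(object* o)
    {
        assert(o->m_container == this); // validate ownership
        if (o->m_prev) o->m_prev->m_next = o->m_next; else m_head = o->m_next;
        if (o->m_next) o->m_next->m_prev = o->m_prev; else m_tail = o->m_prev;
        o->m_container = nullptr;
        o->m_prev = o->m_next = nullptr;
    }
};
```

The remove() never touches any node other than the one being removed
and its two neighbours, which is where the O(1) claim comes from.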
But such a design is closer to what you might do in C than in C++.
It is a reasonable trade-off, to avoid writing custom code for this
kind of use case, to just use std::set and be done with it.
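For completeness, a hedged sketch of the std::set variant of the
factory; only device, createTexture and release come from the example
above, everything else (addref, destroy, liveCount, the refcount
starting at 1) is assumed:

```cpp
#include <cstddef>
#include <set>
#include <string>

class texture
{
public:
    explicit texture(class device* owner) : m_owner(owner) {}
    void addref()  { ++m_refcount; }
    void release(); // defined after device below
private:
    class device* m_owner;
    int m_refcount = 1;   // assumption: created with one reference held
};

class device
{
public:
    texture* createTexture(const std::string& file)
    {
        texture* t = new texture(this);
        m_textures.insert(t);   // O(log n) bookkeeping
        (void)file;             // actual loading omitted in this sketch
        return t;
    }
    void destroy(texture* t)
    {
        m_textures.erase(t);    // O(log n), no linear scan
        delete t;
    }
    std::size_t liveCount() const { return m_textures.size(); }
private:
    std::set<texture*> m_textures;
};

void texture::release()
{
    if (--m_refcount == 0)
        m_owner->destroy(this);
}
```

No intrusive pointers, no custom node code; the tree does the lookup
for us when the last reference goes away.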
std::vector, or any kind of container with array semantics, doesn't cut
it. We are NOT interested in inserting new objects at a specific
location in the container; we just want to insert new stuff and remove
*random* objects without significant overhead.
Maybe there are usage patterns which naturally lead to keeping some
iterator(s) into the middle of a list, but I have never met one.
Above you see one not-so-uncommon case where this might be a
requirement. Believe it or not, std::set is a MUCH better solution to
this particular problem, specifically to AVOID having dangling
iterators as object handles; that would be plain stupid, wouldn't it?
So in most cases, you will want to find that position. And thanks to
the node-based nature of list, iterating is way slower than with
std::vector, so before you get to your position, your O(1)
insert/remove advantage is lost.
Speaking of iteration: of course, even with std::set for the use case
above, walking the tree is still required, but only O(log n) of it.
That's precisely why the tree is preferred over a flat array or a
linked list.
The best is if pointer-to-object is fast to resolve into
pointer-to-node-in-sequence. The best case is when pointer-to-object IS
pointer-to-node-in-sequence (e.g. custom code). But profiling has
revealed that it is not worth the effort for the kind of uses *I* have
put this kind of code to.
Sensible approach. Likewise. I chose std::set for the task I used for
illustration purposes because it scales well. For a small dataset, it
doesn't matter what you use; it's fast anyway. This scales pretty
nicely to medium complexity too. With a really large number of objects,
1,000,000+, it would start to take a noticeable hit compared to "direct
lookup", but the dataset is not likely to reach that size for a while;
over the past decade we have been hovering between 10 and a few
thousand objects, tops, depending on the application.
The tops figure rules out brute-force linear search, is still
manageable with a tree without any significant runtime overhead, and
doesn't yet warrant custom code.
Yes, you are doing the right thing here. (Well, you can have
vector-like associative containers, and they are much faster than
binary trees or other node-based maps, but that is another story.)
While I agree with the sentiment that an array is better caching-wise,
it should only be a concern if there are no other design criteria for
choosing a specific kind of container. For random or random-like
access, memory bandwidth isn't as relevant as latency, from a caching
point of view. The latency can be amortized by design: don't use the
data immediately after reading it and you won't stall. Example:
if ( object->foo )
    bar( ... );
We have to read "foo", which is behind a pointer, before we know
whether we will branch or not. If the predicate says "true" here, we
have to make the call, but we cannot make the call until the value is
read; if it misses the cache, we suffer high latency. On the other
hand, it is possible to work around memory latency with additional
gates in the CPU: we could decode ahead of the IP and keep multiple
look-ahead contexts, then "branch" to the context that was actually
taken once the predicate is resolved.
How far ahead we can execute depends on how much die area we are
willing to sacrifice to implement this logic. There is likely to be a
much more productive use for those gates, as this problem can be solved
on the software side in the performance-critical sections of the code,
which are few and far between in most of the code we will ever have to
write. How did it go, 97.8735433214 (?) % of code isn't performance
critical? Whatever.
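The software-side fix alluded to above can be sketched as simple load
hoisting: issue the read early, do independent work while the load is
in flight, and only then branch on it. The function and names here are
hypothetical, just restructuring the if ( object->foo ) example:

```cpp
struct object_t { int foo; };

int bar(int x) { return x * 2; }

int process(object_t* object, int other_work)
{
    int foo = object->foo;     // issue the load early...
    int acc = other_work + 1;  // ...do independent work while it is in flight
    acc *= 3;
    if (foo)                   // by now the load has (hopefully) completed
        acc += bar(acc);
    return acc;
}
```

The compiler or the out-of-order hardware may do some of this for you,
but arranging the code so the independent work exists at all is the
part only the programmer can do.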