STL list or map?

M

Mike Copeland

I have the following structure defined, and I'm trying to decide the
best container to use for accessing specific elements: "best" meaning
the fastest way. I will be searching on the "sBibNo" field, and it will
likely contain values in a high range (20000-50000). The elements
stored will not be contiguous. I can't allocate an array of elements
because doing so would waste a lot of memory space...so I can't use a
direct index to find the elements.
I'm thinking I could use either an STL list or map for this
application, but I fear the list processing would be prohibitively slow
and the map might be quite wasteful in other ways. The actual number of
elements used in this application probably won't exceed 3000.
Please advise. TIA

struct SHADOWS
{
char sRCode, aRCode;
int sBibNo, aBibNo;
}
 
K

Kai-Uwe Bux

Mike said:
I have the following structure defined, and I'm trying to decide the
best container to use for accessing specific elements: "best" meaning
the fastest way. I will be searching on the "sBibNo" field, and it will
likely contain values in a high range (20000-50000). The elements
stored will not be contiguous. I can't allocate an array of elements
because doing so would waste a lot of memory space...so I can't use a
direct index to find the elements.
I'm thinking I could use either an STL list or map for this
application, but I fear the list processing would be prohibitively slow
and the map might be quite wasteful in other ways.

Memory: A list node contains the data and typically two pointers. Very
likely, the underlying allocator is also putting in a hidder pointer. That
is three pointers in addition to your data. A map node typically contains
three pointers and the hidden pointer in addition to the data. Taking into
account that the allocator is likely to round sizes to multiples of 16 or 32
bytes, you are somewhat unlikely to see a difference.

Cost of iteration: incremening / decrementing an iterator is constant time
for a list and ammortized constant time for a map.
The actual number of
elements used in this application probably won't exceed 3000.
Please advise. TIA
[...]

Elsethread, the suggestion was made to hide this decision as an
implementation detail of some class. That is a good idea.

If you choose a list representation, then a lot will depend on which pattern
of searches predominates in your application. E.g., if there is a
comparatively small set of keys that you will search for very often, then a
"self organizing file" might be a good idea: you always search from front to
tail and when you dfind a key, you move the entry to the head of the list.
This way, frequently searched entries gather near the front and will be
found quickly.

If your searches are more or less evenly distributed among the keys, a
better data structure would probably improve performance of searching. The
tradeoff is usually a slightly greater complexity for inserting an entry.
Whether changing the data structures pays off, therefore, also depends on
whether your application will search way more often than it will modify the
underlying data.

As the influencing factors are hard to predict, you should write your
program so that it will be easy to change the data structure. Then,
implement one that you can easily prove correct. Create unit tests. When you
change the data structure, those tests will catch bugs. Once the other data
structure is up and running, you can compare performance.


Best,

Kai-Uwe Bux
 
R

Rune Allnor

   I have the following structure defined, and I'm trying to decide the
best container to use for accessing specific elements: "best" meaning
the fastest way.  I will be searching on the "sBibNo" field, and it will
likely contain values in a high range (20000-50000).  The elements
stored will not be contiguous.  I can't allocate an array of elements
because doing so would waste a lot of memory space...so I can't use a
direct index to find the elements.
   I'm thinking I could use either an STL list or map for this
application, but I fear the list processing would be prohibitively slow
and the map might be quite wasteful in other ways.  The actual number of
elements used in this application probably won't exceed 3000.

That's the trade-off you are faced with: Memory vs speed.

If you want to search inside the container, forget the list.
Lists can only be searched sequentially. Maps probably give
the fastest search results, at the expense of some internal
data structures that require memory.

Rune
 
J

Joshua Maurice

   I have the following structure defined, and I'm trying to decide the
best container to use for accessing specific elements: "best" meaning
the fastest way.  I will be searching on the "sBibNo" field, and it will
likely contain values in a high range (20000-50000).  The elements
stored will not be contiguous.  I can't allocate an array of elements
because doing so would waste a lot of memory space...so I can't use a
direct index to find the elements.
   I'm thinking I could use either an STL list or map for this
application, but I fear the list processing would be prohibitively slow
and the map might be quite wasteful in other ways.  The actual number of
elements used in this application probably won't exceed 3000.
   Please advise.  TIA

struct SHADOWS
{
        char sRCode, aRCode;
        int  sBibNo, aBibNo;
}

How often do you insert, and how often do you do finds? Specifically
the relative frequencies.

std::list is /usually/ the wrong answer to any performance problem. If
you're doing lookups and you're concerned about memory, it is
definitely the wrong answer. std::map would be better.

std::set, equivalently std::map, is the good default choice.

If the number of finds greatly outweigh the number of inserts, a
std::vector kept in sorted order using binary searches is probably the
best approach for performance.

Alternatively, maybe a B-tree with the right branching factor.
http://en.wikipedia.org/wiki/B-tree
I don't know offhand if this is a good idea. I would implement and
measure.

Specifically, if your problem is that important, I would implement and
measure all reasonable alternatives.
 
M

Michael Doubez

   I have the following structure defined, and I'm trying to decide the
best container to use for accessing specific elements: "best" meaning
the fastest way.  I will be searching on the "sBibNo" field, and it will
likely contain values in a high range (20000-50000).  The elements
stored will not be contiguous.  I can't allocate an array of elements
because doing so would waste a lot of memory space...so I can't use a
direct index to find the elements.
   I'm thinking I could use either an STL list or map for this
application, but I fear the list processing would be prohibitively slow
and the map might be quite wasteful in other ways.  The actual number of
elements used in this application probably won't exceed 3000.
   Please advise.  TIA

struct SHADOWS
{
        char sRCode, aRCode;
        int  sBibNo, aBibNo;
};

In addition to what others said, you can also consider unorder_map
(hash_map extension on some compilers) but it depends on the
distribution of sBibNo or if you can find a good hash function.
 
J

Juha Nieminen

Mike Copeland said:
I have the following structure defined, and I'm trying to decide the
best container to use for accessing specific elements: "best" meaning
the fastest way. I will be searching on the "sBibNo" field, and it will
likely contain values in a high range (20000-50000). The elements
stored will not be contiguous. I can't allocate an array of elements
because doing so would waste a lot of memory space...so I can't use a
direct index to find the elements.

Is the container initialized once at the start of the program and then
never modified again (fast reading being the only operation done to it
for the rest of the program)?

If the answer is yes, then the most efficient way is to put the elements
in a vector, sort it, and then use binary searching when you need to read
an element. The binary searching can be done with a std::lower_bound() call.

If the container needs to be updated (elements added or removed) during
the execution of the program, and this updating must be as fast as possible,
then this solution won't work. (In that case you need to either use std::set
or std::unordered_set.)
 
M

Mike Copeland

I have the following structure defined, and I'm trying to decide the
Is the container initialized once at the start of the program and then
never modified again (fast reading being the only operation done to it
for the rest of the program)?

Yes. Furthermore, the (many) searches will attempt to find virtually
all of the elements in the container - which is why I'm most interested
in speed of searching.
If the answer is yes, then the most efficient way is to put the elements
in a vector, sort it, and then use binary searching when you need to read
an element. The binary searching can be done with a std::lower_bound() call.

That seems to be most worth exploring. Thanks for the thorough
answer.
If the container needs to be updated (elements added or removed) during
the execution of the program, and this updating must be as fast as possible,
then this solution won't work. (In that case you need to either use std::set
or std::unordered_set.)

This has no bearing on my application. Data is stored once (at the
program's start) and there will be many accesses of all elements (as I
say above).
Thanks to all who responded!
 
J

Jeff Flinn

Mike said:
Yes. Furthermore, the (many) searches will attempt to find virtually
all of the elements in the container - which is why I'm most interested
in speed of searching.


That seems to be most worth exploring. Thanks for the thorough
answer.

This has no bearing on my application. Data is stored once (at the
program's start) and there will be many accesses of all elements (as I
say above).
Thanks to all who responded!

Sounds like a sorted vector and using upper/lower_bound would be the
best bet.

Jeff
 
M

MikeP

Kai-Uwe Bux said:
Elsethread, the suggestion was made to hide this decision as an
implementation detail of some class. That is a good idea.

I don't think that it is. This is the perfect opportunity for the OP to
learn about applying the correct container/algo by taking the required
amount of time to do so. Unnecessary abstraction just delays the decision
and gives the application unnatural semantics and stifles the
programmer's level of capability. Use the right tool for the job. Don't
just use a hammer for expediency. The way to proceed is in recognizing
the application's required semantics first. Second, pick the correct
container/algo. If need be (and usually it need not be), perform
optimization (time and/or space and/or size) steps (but do avoid this
unless
it is REALLY necessary, for it can unnecessarily result in "bad" code).
If you choose a list representation, then a lot will depend on which
pattern of searches predominates in your application. E.g., if there
is a comparatively small set of keys that you will search for very
often, then a "self organizing file" might be a good idea: you always
search from front to tail and when you dfind a key, you move the
entry to the head of the list. This way, frequently searched entries
gather near the front and will be found quickly.

That is a good example of the thought process required to arrive at a
solution.
If your searches are more or less evenly distributed among the keys, a
better data structure would probably improve performance of
searching. The tradeoff is usually a slightly greater complexity for
inserting an entry. Whether changing the data structures pays off,
therefore, also depends on whether your application will search way
more often than it will modify the underlying data.

Example of optimization concerns.
As the influencing factors are hard to predict, you should write your
program so that it will be easy to change the data structure. Then,
implement one that you can easily prove correct. Create unit tests.
When you change the data structure, those tests will catch bugs. Once
the other data structure is up and running, you can compare
performance.

Again, I disagree (see above).
 
M

MikeP

Rune said:
That's the trade-off you are faced with: Memory vs speed.

If you want to search inside the container, forget the list.
Lists can only be searched sequentially.

The other responder noted that a "priority list" (for lack of a better
designation in my mind right now) may be in order, which may or may not
be. A list that moves recently accessed elements to the front may be just
what the application requires. Do not reject a list out of only the above
characteristic of a simple list. Let the application semantics govern.
 
I

Ian Collins

I don't think that it is. This is the perfect opportunity for the OP to
learn about applying the correct container/algo by taking the required
amount of time to do so. Unnecessary abstraction just delays the decision
and gives the application unnatural semantics and stifles the
programmer's level of capability. Use the right tool for the job.

Requirements aren't always invariant. If we lived in a perfect world,
your observations would be correct. But we don't!
Again, I disagree (see above).

Again, we don't all live in a world of unchanging requirements.
 
R

Rune Allnor

The other responder noted that a "priority list" (for lack of a better
designation in my mind right now) may be in order, which may or may not
be. A list that moves recently accessed elements to the front may be just
what the application requires. Do not reject a list out of only the above
characteristic of a simple list. Let the application semantics govern.

Is that a list? I have seen descriptions of such functionality
in Cormen & al. The description seemed like some sort of tree
structure, that was updated after each access, to me.

Rune
 
A

Alain Ketterlin

[...]
Is that a list? I have seen descriptions of such functionality
in Cormen & al. The description seemed like some sort of tree
structure, that was updated after each access, to me.

It's called a priority queue, and usually uses a tree to guarantee
O(log(N)) insertion and extraction of the highest priority element. It
is included in th STL (priority_queue).

-- Alain.
 
M

MikeP

Ian said:
Requirements aren't always invariant. If we lived in a perfect world,
your observations would be correct. But we don't!

You seem to be trying to make the exception the rule. The rule is to only
code the exceptional case only in the exceptional case and not aberrate
the common case with the exceptional. If you extrapolate further from
your suggestion, you will see it is not practical nor is it good
engineering.

In the OP's case shown above, doing the abstraction of the container/algo
could be R&D but the final product should be devoid of test harnesses.
Again, we don't all live in a world of unchanging requirements.

Non-sequitor at best, strawman at worst. Your "case" is moot. (You are
welcome to try again though but I am not into have overly-general or
philosophical discussion about it, for it is a well-defined area, at
least to me it is).
 
M

MikeP

Alain said:
[...]
Is that a list? I have seen descriptions of such functionality
in Cormen & al. The description seemed like some sort of tree
structure, that was updated after each access, to me.

It's called a priority queue, and usually uses a tree to guarantee
O(log(N)) insertion and extraction of the highest priority element. It
is included in th STL (priority_queue).

No. A list is a list. A queue is a queue. (Notwithstanding, a queue can
be made from a list). I didn't mean to confuse by using "priority list"
(and parentheically tried to preclude potential confusion and also
attempted to avoid by putting it in quotes) but apparently I did. I was
trying to conjure up a name for a list that relinks last-accessed
elements to the front of the list for I don't know if there is a formal
name for such a beast while there indeed is a name for the different
animal called "priority queue".
 
R

Rune Allnor

Alain said:
[...]
The other responder noted that a "priority list" (for lack of a
better designation in my mind right now) may be in order, which may
or may not be. A list that moves recently accessed elements to the
front may be just what the application requires. Do not reject a
list out of only the above characteristic of a simple list. Let the
application semantics govern.
Is that a list? I have seen descriptions of such functionality
in Cormen & al. The description seemed like some sort of tree
structure, that was updated after each access, to me.
It's called a priority queue, and usually uses a tree to guarantee
O(log(N)) insertion and extraction of the highest priority element. It
is included in th STL (priority_queue).

No. A list is a list. A queue is a queue. (Notwithstanding, a queue can
be made from a list). I didn't mean to confuse by using "priority list"
(and parentheically tried to preclude potential confusion and also
attempted to avoid by putting it in quotes) but apparently I did. I was
trying to conjure up a name for a list that relinks last-accessed
elements to the front of the list for I don't know if there is a formal
name for such a beast while there indeed is a name for the different
animal called "priority queue".- Hide quoted text -

- Show quoted text -

OK, then why not try and untangle the confusion:

1) How is the list you describe functionally and semantically
different from a priority queue?

2) What would the advantages be, if any, with this list compared
to a priority queue?

Rune
 
M

MikeP

Rune said:
Alain said:
Rune Allnor <[email protected]> writes:
[...]
The other responder noted that a "priority list" (for lack of a
better designation in my mind right now) may be in order, which
may or may not be. A list that moves recently accessed elements
to the front may be just what the application requires. Do not
reject a list out of only the above characteristic of a simple
list. Let the application semantics govern.
Is that a list? I have seen descriptions of such functionality
in Cormen & al. The description seemed like some sort of tree
structure, that was updated after each access, to me.
It's called a priority queue, and usually uses a tree to guarantee
O(log(N)) insertion and extraction of the highest priority element.
It is included in th STL (priority_queue).

No. A list is a list. A queue is a queue. (Notwithstanding, a queue
can be made from a list). I didn't mean to confuse by using
"priority list" (and parentheically tried to preclude potential
confusion and also attempted to avoid by putting it in quotes) but
apparently I did. I was trying to conjure up a name for a list that
relinks last-accessed elements to the front of the list for I don't
know if there is a formal name for such a beast while there indeed
is a name for the different animal called "priority queue".- Hide
quoted text -

- Show quoted text -

OK, then why not try and untangle the confusion:

1) How is the list you describe functionally and semantically
different from a priority queue?

2) What would the advantages be, if any, with this list compared
to a priority queue?

Think about what a queue is and what a list is and how their behavior is
different and think of some examples where each is appropriately applied
from an application point of view. Save the Big-O stuff and the like for
exotic situations or at least AFTER thinking about the problem from the
use case perspective. The web and search is your friend. I recommend
Cormen's book as the best on the subject.
 
M

Michael Doubez

Rune said:
Alain Ketterlin wrote:
[...]
The other responder noted that a "priority list" (for lack of a
better designation in my mind right now) may be in order, which
may or may not be. A list that moves recently accessed elements
to the front may be just what the application requires. Do not
reject a list out of only the above characteristic of a simple
list. Let the application semantics govern.
Is that a list? I have seen descriptions of such functionality
in Cormen & al. The description seemed like some sort of tree
structure, that was updated after each access, to me.
It's called a priority queue, and usually uses a tree to guarantee
O(log(N)) insertion and extraction of the highest priority element.
It is included in th STL (priority_queue).
No. A list is a list. A queue is a queue. (Notwithstanding, a queue
can be made from a list). I didn't mean to confuse by using
"priority list" (and parentheically tried to preclude potential
confusion and also attempted to avoid by putting it in quotes) but
apparently I did. I was trying to conjure up a name for a list that
relinks last-accessed elements to the front of the list for I don't
know if there is a formal name for such a beast while there indeed
is a name for the different animal called "priority queue".- Hide
quoted text -
- Show quoted text -
OK, then why not try and untangle the confusion:
1) How is the list you describe functionally and semantically
   different from a priority queue?
2) What would the advantages be, if any, with this list compared
   to a priority queue?

Think about what a queue is and what a list is and how their behavior is
different and think of some examples where each is appropriately applied
from an application point of view.

I don't understand your point. A list is a data structure while a
queue is a way to organize/prioritize data (the FIFO abstraction).
Thus, in the C++ standard, a std::list<> is a container while a
std::queue<> is a container adapter.

I would be interest on how you can compare them ?
Save the Big-O stuff and the like for
exotic situations or at least AFTER thinking about the problem from the
use case perspective. The web and search is your friend. I recommend
Cormen's book as the best on the subject.

From the reviews I have seen, it is tagged as a bit too academic.
 
M

MikeP

Michael said:
Rune said:
Alain Ketterlin wrote:
Rune Allnor <[email protected]> writes:
[...]
The other responder noted that a "priority list" (for lack of a
better designation in my mind right now) may be in order, which
may or may not be. A list that moves recently accessed elements
to the front may be just what the application requires. Do not
reject a list out of only the above characteristic of a simple
list. Let the application semantics govern.
Is that a list? I have seen descriptions of such functionality
in Cormen & al. The description seemed like some sort of tree
structure, that was updated after each access, to me.
It's called a priority queue, and usually uses a tree to guarantee
O(log(N)) insertion and extraction of the highest priority
element. It is included in th STL (priority_queue).
No. A list is a list. A queue is a queue. (Notwithstanding, a queue
can be made from a list). I didn't mean to confuse by using
"priority list" (and parentheically tried to preclude potential
confusion and also attempted to avoid by putting it in quotes) but
apparently I did. I was trying to conjure up a name for a list that
relinks last-accessed elements to the front of the list for I don't
know if there is a formal name for such a beast while there indeed
is a name for the different animal called "priority queue".- Hide
quoted text -
- Show quoted text -
OK, then why not try and untangle the confusion:
1) How is the list you describe functionally and semantically
different from a priority queue?
2) What would the advantages be, if any, with this list compared
to a priority queue?

Think about what a queue is and what a list is and how their
behavior is different and think of some examples where each is
appropriately applied from an application point of view.

I don't understand your point.

My point is that discussion about fundamentals of ADTs is noise in a NG
about C++. As such, I gave suggestion for further investigation to the
poster.
A list is a data structure while a
queue is a way to organize/prioritize data (the FIFO abstraction).

The are both ADTs. In the simplest case, I think of a priority queue as a
queue that allows adding to the front of the queue in addition to just
the back of the queue. That one additional capability can distinguish a
priority queue from a regular queue. Here's a good "homework" question:
Should a queue allow iteration over it's elements just like a list does?
(Good homework or test question in general. I wasn't assigning you
homework!).

But see, now you are sucking me into the discussion I didn't want to get
into! Web/search is your friend.
Thus, in the C++ standard, a std::list<> is a container while a
std::queue<> is a container adapter.


You seem to be confusing the abstraction with a particular
implementation. Just because you CAN build a queue from a list does not
mean you HAVE to build it that way, nor is it necessarily the best way.
I would be interest on how you can compare them ?

I compare them according to their behavioral capabilities and by their
applicability to a given scenario.
From the reviews I have seen, it is tagged as a bit too academic.

Well that is exactly what seems to be the need: fundamental principles of
the different ADTs, divorced from any particular implementation in any
programming language. If you don't like Cormen, there are plenty of other
books on the subject. My red-black tree is based on the pseudo-code in
the Cormen book, but an early version. I will be rewriting based on their
latest edition which is more stable in regards to not invalidating
iterators. So, I obviously like the book and that is why I recommended
it.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,065
Latest member
OrderGreenAcreCBD

Latest Threads

Top