map or set for handling struct with a key?

D

Digital Puer

Hi, I was wondering whether to use a map or a set for
my purposes.

I am reading rows from a database and am populating a
struct (or class) for each row. It has a key and ancillary
data, e.g.:

struct Foo {
int key;
float data1;
int data2;
string data3;
};


Now I want to keep all the rows in either a map or a set.
If I use a set, I can keep the struct the way it is; the only
thing I need is to provide a comparison function for the set
so that the ordering is based on the key.

On the other hand, if I use a map, I am inclined to break
the key out and use the key as the first part in the map,
e.g.:

struct Foo {
float data1;
int data2;
string data3;
};
map<int, Foo> mydata;


Which approach is preferable for the following cases:

1. My only operations on the data structure are to iterate
over it and to use find().

2. I will iterate over as well as update the data.
 
V

Victor Bazarov

Digital said:
Hi, I was wondering whether to use a map or a set for
my purposes.

I am reading rows from a database and am populating a
struct (or class) for each row. It has a key and ancillary
data, e.g.:

struct Foo {
int key;
float data1;
int data2;
string data3;
};


Now I want to keep all the rows in either a map or a set.
If I use a set, I can keep the struct the way it is; the only
thing I need is to provide a comparison function for the set
so that the ordering is based on the key.

On the other hand, if I use a map, I am inclined to break
the key out and use the key as the first part in the map,
e.g.:

struct Foo {
float data1;
int data2;
string data3;
};
map<int, Foo> mydata;


Which approach is preferable for the following cases:

1. My only operations on the data structure are to iterate
over it and to use find().

2. I will iterate over as well as update the data.

If you don't update the keys, ever, then you're better off with
a sorted vector/deque.

V
 
P

PicO

To use set to sort objects , you need to make comparison function or
compare operator like ( < or > )

so i suppose you to make bool operator in the struct ...

struct Foo {
int key;
float data1;
int data2;
string data3;

bool operator < ( const foo & X )
const
{
return key < X.key ? true : false ;
}

};

that operator will get set to sort this objects according to the
key ...
 
D

Digital Puer

Yes, I already understood that part. I mentioned that in
my first post. My question pertains to choosing between a map
or a set.
 
D

Digital Puer

If you don't update the keys, ever, then you're better off with
a sorted vector/deque.

V


Please explain why a sorted vector or deque is better than
map or set. A find() will be O(lg n) with any of them.
 
R

redfloyd

Hi, I was wondering whether to use a map or a set for
my purposes.

I am reading rows from a database and am populating a
struct (or class) for each row. It has a key and ancillary
data, e.g.:

struct Foo {
int key;
float data1;
int data2;
string data3;

};

Now I want to keep all the rows in either a map or a set.
If I use a set, I can keep the struct the way it is; the only
thing I need is to provide a comparison function for the set
so that the ordering is based on the key.

On the other hand, if I use a map, I am inclined to break
the key out and use the key as the first part in the map,
e.g.:

struct Foo {
float data1;
int data2;
string data3;};

map<int, Foo> mydata;

Which approach is preferable for the following cases:

1. My only operations on the data structure are to iterate
over it and to use find().

2. I will iterate over as well as update the data.

Disregarding whether vector/deque may be more suitable (sorry,
Victor), your second criterion, namely the updating of the data, would
point ot a map.
 
P

PicO

Please explain why a sorted vector or deque is better than
map or set. A find() will be O(lg n) with any of them.

if you are going update data after insertion , you have to use map not
set as you can't update data in the set as set built with no
assignment properties while map can update the "value" ( data1 & data
2 & data3 ) ...

i suppose map as you can search easier ... but i don't suppose vector
as erase and insertion will take long time ( n for each erase ) ...
 
M

Mark P

Digital said:
Please explain why a sorted vector or deque is better than
map or set. A find() will be O(lg n) with any of them.

In general, it will be faster to read all of the data into a vector and
sort it once rather than reading the data into a set or map and sorting
"online". A vector will also use less space in general.
 
P

Pete Becker

if you are going update data after insertion , you have to use map not
set as you can't update data in the set as set built with no
assignment properties while map can update the "value" ( data1 & data
2 & data3 ) ...

You can update an element in a set by removing it and re-inserting it.
 
V

Victor Bazarov

PicO said:
ya ya you can but it's not efficient ... you have ( n log n ) cost for
every time you do this ....

Ya ya... Nope. Record the iterator you expect to delete, increment
and store the incremented one, remove using the iterator, insert again
with the incremented one as the hint. Should be constant time.

V
 
T

tony_in_da_uk

[ should I have a set of...]
struct Foo {
int key;
float data1;
int data2;
string data3;
};
[...or a map of int keys to...]
struct Foo {
float data1;
int data2;
string data3;};
Which approach is preferable for the following cases:

1. My only operations on the data structure are to iterate
over it and to use find().

2. I will iterate over as well as update the data.

In general, you should use a map when part of the struct data forms
the key, and set when the entire data is the key. Whatever operations
your program starts off doing, programs evolve and choosing a natural
match for your data is a better approach than choosing something
that's perhaps slightly better for the usage start off with, then
having to rewrite everything.

The fact that you want to use find suggests an associative container
is a simply more natural fit than say a sorted vector and binary
search algorithm, and unless profiling shows you need fast population
or iteration, then there's no reason to consider it.

Tony
 
P

PicO

Ya ya... Nope. Record the iterator you expect to delete, increment
and store the incremented one, remove using the iterator, insert again
with the incremented one as the hint. Should be constant time.

V

of course it'll not be a constant time .. for every set insertion
it'll cost ( log n ) ..
 
J

Juha Nieminen

PicO said:
ya ya you can but it's not efficient ... you have ( n log n ) cost for
every time you do this ....

Removing a value from a set is O(log n). Inserting a value in a set is
O(log n). Where do you get O(n log n)?
 
V

Victor Bazarov

PicO said:
of course it'll not be a constant time .. for every set insertion
it'll cost ( log n ) ..

You just like to disagree, don't you?

If you don't change the data affecting the sorting order (but only the
other, "irrelevant" data), and then reinsert it in the same place, two
comparisons should be made and that's all. If you don't believe me,
try it yourself.

V
 
J

joe

Please explain why a sorted vector or deque is better than
map or set. A find() will be O(lg n) with any of them.- Hide quoted text -

What people often forget in O() notation is that there is a
coefficient k which goes in front of it and reflects the cost of the
overhead for specific implementations of an algorithm. This value can
be important. A vector has the following benefits over a map/set:

1) Much less overhead per item. A map/set is usually a red/black tree
and each node has left, right, and parent pointers plus data for the
item.

2) Fewer allocations. Each item in a map/set is kept as an
individually allocated node whereas a vector is usually allocated in
chunks which can hold several nodes.

3) Quicker movement from one item to the next. After a comparison to
get to the next node, a pointer has to be read from memory, with a
vector you often just add two registers together.

4) Locality of reference. Since vectors are allocated in large chunks
of contiguous memory, you are much likelier to get a cache hit while
navigating through the vector than you would in navigating a set/map.

Maps/sets have the following benefits over a vector:

1) Existing iterators don't get invalidated by inserts and only the
iterators directly involved get invalidated by deletes.

2) Vectors may possibly require a lot of data shifting during inserts/
deletes.

3) More natural interface. You have to use the std algorithms to
manipulate the vectors whereas the map/set containers have them built-
in.


These are the kinds of factors that often make a sorted vector better
than a map or set if the data is mostly static. How much the data
changes is where the trade off occurs. If you have insertions and
deletions occurring frequently, then the cost of constantly keeping
the vector sorted will make a map or set more attractive. If
insertions/deletions can be batched and/or kept infrequent then a
vector is generally faster.

joe
 
P

Pete Becker

ya ya you can but it's not efficient ... you have ( n log n ) cost for
every time you do this ....

Surely that's O(log n). Even so, "you can't update data in the set" is
wrong. And without knowing how often this is going to happen, you
simply can't rule it out categorically.
 
P

Pete Becker

2) Fewer allocations. Each item in a map/set is kept as an
individually allocated node whereas a vector is usually allocated in
chunks which can hold several nodes.

Just a slight correction: a vector is allocated in a single chunk,
which holds all of the stored objects. A deque can be allocated in
multiple chunks.
 
J

James Kanze

On 2007-08-08 21:49:04 -0400, PicO <[email protected]> said:
Surely that's O(log n). Even so, "you can't update data in the set" is
wrong. And without knowing how often this is going to happen, you
simply can't rule it out categorically.

And as Victor pointed out, you can save the iterator behind the
one used to erase, so the insert can be constant time.

In fact, the constant factor involved in insert can be fairly
high, since there will typically be an allocation. On the other
hand, I've done exactly this in one very time critical
application, with no real problems. As you say, it depends on
how often you are doing this, and what else is going on.
 
P

PicO

Removing a value from a set is O(log n). Inserting a value in a set is
O(log n). Where do you get O(n log n)?

i was mean ( log n ) for every n it'll be (n log n ) and i correct it
later by log n if u read my reply .. but of course it'll not constant
like he says ...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,681
Members
48,796
Latest member
Greg L.

Latest Threads

Top