Median of values in a std::map

Y

Yannick Tremblay

Hi,


I have a std::map<id, timestamp> collection.

The id is the normal reference so the key-value ordering is correct.
However, once in a blue moon, I want to prune this collection. So
that it doesn't happens to often, I prune the 50% oldest entries and
go back to normal usage of the collection. In order to do so, I need
to find the median timestamp then delete all entries older than the
median timestamp. The second part is easy, just iterate through the
map and delete items. My problem is finding the median. Since it
shouldn't be executed often, I used the dumb way: create a vector,
iterate through the map and insert the timestamp associated to each
element in the vector, sort the vector, item[size()/2] is the median.

i.e.

typedef std::map<id, timestamp> collection_t;
collection_t collection;
//...
std::vector<timestamp_t> allTimes;
allTimes.reserve(collection.size());
for(collection_t::const_iterator iter = collection.begin();
iter != collection.end() ; ++iter)
{
allTimes.push_back(iter->second);
}
std::sort(allTimes.begin(), allTimes.end());
const timestamp_t timeToPrune = allTimes.at(allTimes.size()/2);


It works, it does what I want, it's easy to understand, performance
isn't needed. It does the job. I should leave it there. But,
but... it just seems that there must be a more elegant way to do it
:-(

Any suggestions?

Thanks

Yan
 
K

Kai-Uwe Bux

Yannick said:
Hi,


I have a std::map<id, timestamp> collection.

The id is the normal reference so the key-value ordering is correct.
However, once in a blue moon, I want to prune this collection. So
that it doesn't happens to often, I prune the 50% oldest entries and
go back to normal usage of the collection. In order to do so, I need
to find the median timestamp then delete all entries older than the
median timestamp. The second part is easy, just iterate through the
map and delete items. My problem is finding the median. Since it
shouldn't be executed often, I used the dumb way: create a vector,
iterate through the map and insert the timestamp associated to each
element in the vector, sort the vector, item[size()/2] is the median.

i.e.

typedef std::map<id, timestamp> collection_t;
collection_t collection;
//...
std::vector<timestamp_t> allTimes;
allTimes.reserve(collection.size());
for(collection_t::const_iterator iter = collection.begin();
iter != collection.end() ; ++iter)
{
allTimes.push_back(iter->second);
}
std::sort(allTimes.begin(), allTimes.end());
const timestamp_t timeToPrune = allTimes.at(allTimes.size()/2);

It works, it does what I want, it's easy to understand, performance
isn't needed. It does the job. I should leave it there. But,
but... it just seems that there must be a more elegant way to do it
:-(

Any suggestions?

To convey intent more clearly, you could use nth_element() instead of sort.
That would be all, I'd change. [It could also be faster since nth_element
is linear on average.]


Best

Kai-Uwe Bux
 
V

Vikram

Yannick said:
typedef std::map<id, timestamp> collection_t;
collection_t collection;
//...
std::vector<timestamp_t> allTimes;
allTimes.reserve(collection.size());
for(collection_t::const_iterator iter = collection.begin();
iter != collection.end() ; ++iter)
{
allTimes.push_back(iter->second);
}
std::sort(allTimes.begin(), allTimes.end());
const timestamp_t timeToPrune = allTimes.at(allTimes.size()/2);


It works, it does what I want, it's easy to understand, performance
isn't needed. It does the job. I should leave it there. But,
but... it just seems that there must be a more elegant way to do it
:-(

Nothing C++ specific..but finding the median in a vector in linear time
(in average case) is possible using the "partitioning" technique. Its a
step in the QuickSort method whose documentation should be on the net.
For worst-case linear time, you can google for "median of medians"
technique. The idea is to deal with groups (of 5 elements each
typically), find their median and use it partition.
But frankly I dont think it will make your code more elegant!
 
O

Old Wolf

Yannick said:
I have a std::map<id, timestamp> collection.

The id is the normal reference so the key-value ordering is correct.
However, once in a blue moon, I want to prune this collection. So
that it doesn't happens to often, I prune the 50% oldest entries and
go back to normal usage of the collection. In order to do so, I need
to find the median timestamp then delete all entries older than the
median timestamp. The second part is easy, just iterate through the
map and delete items. My problem is finding the median. Since it
shouldn't be executed often, I used the dumb way: create a vector,
iterate through the map and insert the timestamp associated to each
element in the vector, sort the vector, item[size()/2] is the median.

You could avoid the space and time overhead by using the mean
instead of the median. The mean will be quite close to the median,
unless there is something strange about your distribution.

Of course, the code to do this would probably be longer :)
 
P

peter koch

Old said:
Yannick said:
I have a std::map<id, timestamp> collection.

The id is the normal reference so the key-value ordering is correct.
However, once in a blue moon, I want to prune this collection. So
that it doesn't happens to often, I prune the 50% oldest entries and
go back to normal usage of the collection. In order to do so, I need
to find the median timestamp then delete all entries older than the
median timestamp. The second part is easy, just iterate through the
map and delete items. My problem is finding the median. Since it
shouldn't be executed often, I used the dumb way: create a vector,
iterate through the map and insert the timestamp associated to each
element in the vector, sort the vector, item[size()/2] is the median.

You could avoid the space and time overhead by using the mean
instead of the median. The mean will be quite close to the median,
unless there is something strange about your distribution.

Of course, the code to do this would probably be longer :)

Also there might be some problem of defining the mean of a timestamp.
Come to think about, I do not believe that the timestamps I've created
in my professional career would allow this without some tweaking.

/Peter
 
Y

Yannick Tremblay

[ cut cut ]

Thanks to all thoses that replied. Confirms my "sensible head"
feeling that the code works and is clear and I shouldn't worry about
it.

Yan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,044
Latest member
RonaldNen

Latest Threads

Top