STL Map: Sorting options?

S

Steve Edwards

Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.


I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)

Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.


Or am I just using inappropriate container types to start with?

As I'm dealing with very large text files, speed is a concern too.

Thanks

Steve
 
M

Michiel.Salters

Steve said:
Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.


I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)

"Results" is a misleading word here. std::map<Key, Value> is used to
keep
one (composite) value per key. It's a container, and containers do not
have
results. Functions have results=return values.
Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.

That's one solution, if you often need that representation too, and the
keys
and values don't change. (basic caching pattern).
Or am I just using inappropriate container types to start with?

It's a good container to use when determining the int values, but it
may not
be the best container to keep those int values in.
As I'm dealing with very large text files, speed is a concern too.

Speed of what? Building the collection, or using it? And if the latter,
how?
(And if you talk files, you often are I/O limited anyway)

HTH,
Michiel Salters
 
D

Daniel T.

Steve Edwards said:
Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.


I'm new to STL, but I've just read through the headers and can't see a
way to sort the results by value. (Maybe there's a way to modify less<>?)

Or... Should I create a second multimap (a copy) and have the int as the
key, and the string as the value, and then trivaially keep that sorted.


Or am I just using inappropriate container types to start with?

As I'm dealing with very large text files, speed is a concern too.

I'm think rather than a multimap, one could use "map<int, set<string> >"
That way, the strings will stay alphabetized.
 
S

Steve Edwards

"Results" is a misleading word here. std::map<Key, Value> is used to
keep
one (composite) value per key. It's a container, and containers do not
have

My mistake... I meant contents.
Speed of what? Building the collection, or using it? And if the latter,
how?
(And if you talk files, you often are I/O limited anyway)

I've already loaded my large text file in to memory as a simple array
of strings (one per word/token). I'm doing various lexical analyses on
these data. For some of these functions I need to quickly look up the
word as the key (hence my choice of <map>) to get the associated data;
other functions then require that the value mapped to the key can be
retrieved rapidly in order (hence my original question).

Since reading your reply I've built a copy as a multimap with the
key/value swapped, and using both containers together seems to be
working quite well in retrieving by either key or value rapidly.
(Though I don't have an alternative strategy to compare speed with, so
who knows.)

Building the structures is naturally slower, now, but it is an
acceptable tradeoff.

------------------------

In my original map

typedef map<string, int, less<string> >concordance;

every time I find another occurrence of the string key, is there a
quicker way to increment it's value count, than:

myConcordance[theWord] = myConcordance [theWord]+1;

It seems I'm doing 2 lookups of [theWord], can I change the value
in-place instead?

Thanks for your help.
 
D

Daniel T.

Steve Edwards said:
In my original map

typedef map<string, int, less<string> >concordance;

every time I find another occurrence of the string key, is there a
quicker way to increment it's value count, than:

myConcordance[theWord] = myConcordance [theWord]+1;

It seems I'm doing 2 lookups of [theWord], can I change the value
in-place instead?

++myConcordance[theWord];
 
S

Steve Edwards

"Daniel T. said:
I'm think rather than a multimap, one could use "map<int, set<string> >"
That way, the strings will stay alphabetized.

Thanks, I'll try that, see if it's quicker for my lookup needs.
 
D

Daniel T.

Steve Edwards said:
Thanks, I'll try that, see if it's quicker for my lookup needs.

If nothing else, it will make it faster to determine how many words have
the same occurrence count.
 
S

Steve Edwards

myConcordance[theWord] = myConcordance [theWord]+1;
[/QUOTE]
++myConcordance[theWord];

Over 100 million insertions it dropped from 20s to 12.5s. Thanks.

If nothing else, it will make it faster to determine how many words
have
the same occurrence count.

That's a benefit, and it's cleaner to iterate through, too.
 
N

Neil Cerutti

Hi,

I'm using a map with the key of type string, and value of type int.

typedef map<string, int, less<string> >concordance;

I'm finding words within some text and keeping a count of their
frequency.


I'm new to STL, but I've just read through the headers and can't see
a way to sort the results by value. (Maybe there's a way to modify
less<>?) > Or... Should I create a second multimap (a copy) and
have the int as the key, and the string as the value, and then
trivaially keep that sorted.


Or am I just using inappropriate container types to start with?

As I'm dealing with very large text files, speed is a concern too.

Here's another idea, which turned out to be more complicated than I at
first thought. The original plan was to store actual map iterators in
the vector, but then calling equal range became problematic, because I
had no iterator to pass in as the value to search for.

#include <iostream>
#include <iterator>
#include <vector>
#include <map>
#include <string>
#include <algorithm>

using std::map;
using std::string;
using std::vector;
using std::cout;
using std::pair;

bool less_first(pair<int, string> const& lhs, pair<int, string> const& rhs)
{
return lhs.first < rhs.first;
}

vector<pair<int, string> > mirror_map(map<string, int> const& m)
{
vector<pair<int, string> > mirror;
for (map<string, int>::const_iterator i = m.begin(); i != m.end(); ++i)
{
mirror.push_back(pair<int, string>(i->second, i->first));
}
std::sort(mirror.begin(), mirror.end(), less_first);
return mirror;
}

typedef vector<pair<int, string> >::iterator viter;
pair<viter, viter> word_range(vector<pair<int, string> >& v, int n)
{
return std::equal_range(v.begin(),
v.end(),
pair<int, string>(n, ""),
less_first);
}

int main()
{
map<string, int> words;
words["love"] = 4;
words["like"] = 3;
words["the"] = 10;
words["hate"] = 4;
words["toodles"] = 1;

vector<pair<int, string> > mirror = mirror_map(words);

// The whole mirror
for (viter i = mirror.begin() ; i != mirror.end() ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
std::cout << "----\n";

// The part of the mirror with value 4.
std::pair<viter, viter> range = word_range(mirror, 4);
for (viter i = range.first ; i != range.second ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
return 0;
}


It is more lightweight than a multimap<int, string>, but not by as
much as I had originally hoped. I assumed that the map would not
change after it had been built up, i.e., you don't need the mirror
while building the map.
 
S

Steve Edwards

Neil Cerutti said:
Here's another idea, which turned out to be more complicated than I at
first thought. The original plan was to store actual map iterators in
the vector, but then calling equal range became problematic, because I
had no iterator to pass in as the value to search for.

#include <iostream>
#include <iterator>
#include <vector>
#include <map>
#include <string>
#include <algorithm>

using std::map;
using std::string;
using std::vector;
using std::cout;
using std::pair;

bool less_first(pair<int, string> const& lhs, pair<int, string> const& rhs)
{
return lhs.first < rhs.first;
}

vector<pair<int, string> > mirror_map(map<string, int> const& m)
{
vector<pair<int, string> > mirror;
for (map<string, int>::const_iterator i = m.begin(); i != m.end(); ++i)
{
mirror.push_back(pair<int, string>(i->second, i->first));
}
std::sort(mirror.begin(), mirror.end(), less_first);
return mirror;
}

typedef vector<pair<int, string> >::iterator viter;
pair<viter, viter> word_range(vector<pair<int, string> >& v, int n)
{
return std::equal_range(v.begin(),
v.end(),
pair<int, string>(n, ""),
less_first);
}

int main()
{
map<string, int> words;
words["love"] = 4;
words["like"] = 3;
words["the"] = 10;
words["hate"] = 4;
words["toodles"] = 1;

vector<pair<int, string> > mirror = mirror_map(words);

// The whole mirror
for (viter i = mirror.begin() ; i != mirror.end() ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
std::cout << "----\n";

// The part of the mirror with value 4.
std::pair<viter, viter> range = word_range(mirror, 4);
for (viter i = range.first ; i != range.second ; ++i)
{
cout << i->first << ": " << i->second << "\n";
}
return 0;
}


It is more lightweight than a multimap<int, string>, but not by as
much as I had originally hoped. I assumed that the map would not
change after it had been built up, i.e., you don't need the mirror
while building the map.


Whoa... thanks for this... I've been staring at it for 20 minutes trying
to get my head around it (I'm new to the stl.) Sorry I can't say
anything more constructive at the moment, until I can understand whether
doing it this way is going to be beneficial. Not having any STL text
book, I'd only found <map> code snippets on the web that seemed to suit
my needs. Having read this and googled for 'pair', I'm amazed I got as
far as I did without understanding them. (Up until now I'd been using
Objective-C which has NSDictionary and similar structures, that hide the
guts of this kind of stuff behind a lot of really intuitive convenience
functions... but I prefer C++ so here I am!)
 
N

Neil Cerutti

Whoa... thanks for this... I've been staring at it for 20 minutes
trying to get my head around it (I'm new to the stl.)

It basically copies the map of string->int into a vector, sorted by
int, of int->string.

I recall now that I used something like this in a small project a few
years ago, but I used a vector of *pointers* to map iterators. In
practice, the above solution is a lot more robust.
Sorry I can't
say anything more constructive at the moment, until I can understand
whether doing it this way is going to be beneficial. Not having any
STL text book, I'd only found <map> code snippets on the web that
seemed to suit my needs. Having read this and googled for 'pair',
I'm amazed I got as far as I did without understanding them. (Up
until now I'd been using Objective-C which has NSDictionary and
similar structures, that hide the guts of this kind of stuff behind
a lot of really intuitive convenience functions... but I prefer C++
so here I am!)

The way you're doing it now is clearer and easier. The vector mirror
could be considered a lower-level optimisation, which it probably
turns out you do not need.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,564
Members
45,039
Latest member
CasimiraVa

Latest Threads

Top