std::map performance

J

jmoy

I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. The result is not changed very much if I replace std::string as
the key with a simple string class of my own (I guess I cannot use
char* because of the value semantics of the standard containers). I am
using GNU libstdc++3 and gcc 3.3 with the -03 (all optimizations)
flag. Is this a problem with my program, my C++ implementation, or
just the necessary overhead of using a more general class?

The structure of the program is as follows:

std::map<std::string,long> words;
main()
{
char buf[maxbuf];
while (/*Not end of input*/)
{
//Get next word into buf
insert(buf);
}
//Print out the words
}

void insert(char *s)
{
long &l=words;
if (l<numeric_limits<long>::max())
++l;
}
 
J

John Harrison

jmoy said:
I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. The result is not changed very much if I replace std::string as
the key with a simple string class of my own (I guess I cannot use
char* because of the value semantics of the standard containers). I am
using GNU libstdc++3 and gcc 3.3 with the -03 (all optimizations)
flag. Is this a problem with my program, my C++ implementation, or
just the necessary overhead of using a more general class?

Your program looks fine to me, I can't comment on your implementation.

I would expect a hand coded binary tree to be faster than std::map provided
the data is unordered. Try your hand coded binary tree on a large file that
is already in alphabetical order and I think you'll see a big performance
hit (assuming your binary tree code is alphabetically ordered). In other
words you are paying a price for using std::map over a binary tree but
std::map guarantees better worst case performance.

A hash table would have been better, it's coming to standard C++ soon!

john
 
M

M. Hrabowski

Hi,

[snip]
void insert(char *s)
{
long &l=words;
if (l<numeric_limits<long>::max())
++l;
}


in my experience using the []-operator with maps isn't a very good thing if
you want performance.
I would suggest to try both a version with find() and conditional insert()
afterwards and a version with the std::pair<iterator,bool> insert() method.

If you try it, please post results.

Cheers
Max
 
T

tom_usenet

I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. The result is not changed very much if I replace std::string as
the key with a simple string class of my own (I guess I cannot use
char* because of the value semantics of the standard containers). I am
using GNU libstdc++3 and gcc 3.3 with the -03 (all optimizations)
flag. Is this a problem with my program, my C++ implementation, or
just the necessary overhead of using a more general class?

The structure of the program is as follows:

std::map<std::string,long> words;

How about unsigned long?
main()
{
char buf[maxbuf];
while (/*Not end of input*/)
{
//Get next word into buf
insert(buf);
}
//Print out the words
}

void insert(char *s)
{
long &l=words;
if (l<numeric_limits<long>::max())
++l;
}


I assume the only difference between the hand-coded and std::map
versions is the insert method? How does your hand coded version store
the strings? As allocated char*s?

The basic problem with the map version is that a new std::string
object has to be allocated even when the map already contains the word
in question. The solution is to read into a std::string, not a char
buffer. e.g.

int main()
{
std::string word;
word.reserve(100);
std::map<std::string, unsigned long> words;
while (std::cin >> word)
++words[word];

for(std::map<std::string, unsigned long>::iterator i=words.begin(),
end = words.end(); i != end; ++i)
{
std::cout << i->first << ": " << i->second << '\n';
}
}

Because "word" should maintain capacity, a new memory allocation won't
(or shouldn't) be needed every iteration of the loop.

Tom
 
J

Jerry Coffin

I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. The result is not changed very much if I replace std::string as
the key with a simple string class of my own (I guess I cannot use
char* because of the value semantics of the standard containers). I am
using GNU libstdc++3 and gcc 3.3 with the -03 (all optimizations)
flag. Is this a problem with my program, my C++ implementation, or
just the necessary overhead of using a more general class?

About the only way for any of us to give an answer would be to do the
same thing you probably should: break out the profiler, and see what
part of your code is really taking the time. Without that, almost any
guess anybody makes is just that: a guess -- and unless the person in
question has profiled a fair amount of similar code on the same (or
extremely similar) implementation, it's not even likely to be an
informed guess.
 
S

Siemel Naran

I would expect a hand coded binary tree to be faster than std::map provided
the data is unordered. Try your hand coded binary tree on a large file that
is already in alphabetical order and I think you'll see a big performance
hit (assuming your binary tree code is alphabetically ordered). In other
words you are paying a price for using std::map over a binary tree but
std::map guarantees better worst case performance.

Can you elaborate? Why is std::map slower? Has it something to do with
balancing?
 
S

Siemel Naran

in my experience using the []-operator with maps isn't a very good thing if
you want performance.

Why is this? Or do you have some code or empirical data?
I would suggest to try both a version with find() and conditional insert()
afterwards and a version with the std::pair<iterator,bool> insert()
method.

Using find followed by insert seems less efficient because in find you scan
nodes to realize the element is not there, and then in insert you scan these
nodes again to place the element there. But the STLPort implementation of
operator[] seems more efficient. It finds the closest element, and then
inserts the new element there using the closest element as a hint.

_Tp& operator[](const key_type& __k) {
iterator __i = lower_bound(__k);
// __i->first is greater than or equivalent to __k.
if (__i == end() || key_comp()(__k, (*__i).first))
__i = insert(__i, value_type(__k, _Tp()));
return (*__i).second;
}
 
S

Siemel Naran

Jacques Labuschagne said:
jmoy wrote:

Odds are that this is your bottleneck. How are you reading the file?

But both methods (the std::map and hand coded map) use the same file
scanning algorithm, so while the file scanning can be optimized, it is not
the issue here.
 
J

John Harrison

Siemel Naran said:
Can you elaborate? Why is std::map slower? Has it something to do with
balancing?

Balancing is what I was thinking of. Also the possibility (it wasn't clear
from the OP's post) that his binary tree code was tailored in some way to
his application.

john
 
J

Jesper Madsen

In the handcoded version, do you use std::string there too, or const char*
or char*.. You could be paying for constructing strings...
 
D

DaKoadMunky

_Tp& operator[](const key_type& __k) {
iterator __i = lower_bound(__k);
// __i->first is greater than or equivalent to __k.
if (__i == end() || key_comp()(__k, (*__i).first))
__i = insert(__i, value_type(__k, _Tp()));
return (*__i).second;
}

This is the version on my platform...

mapped_type& operator[](const key_type& _Keyval)
{
iterator _Where = insert(value_type(_Keyval, mapped_type())).first;
return ((*_Where).second);
}

Having not seen it before I assumed that it would do a find followed by an
insert if necessary, just like the version you posted.

As can be seen in my version insert is called unconditionally which results in
numerous constructor/destructor calls for key_type, mapped_type and value_type.

If my client code replaces calls to operator[] with code that uses the public
find and insert functions the cost of updating an existing element is reduced
by a slew of constructor/destructor calls.

Can anyone offer insight as to why the authors might have chosen this
particular approach?
 
P

P.J. Plauger

_Tp& operator[](const key_type& __k) {
iterator __i = lower_bound(__k);
// __i->first is greater than or equivalent to __k.
if (__i == end() || key_comp()(__k, (*__i).first))
__i = insert(__i, value_type(__k, _Tp()));
return (*__i).second;
}

This is the version on my platform...

mapped_type& operator[](const key_type& _Keyval)
{
iterator _Where = insert(value_type(_Keyval, mapped_type())).first;
return ((*_Where).second);
}

Having not seen it before I assumed that it would do a find followed by an
insert if necessary, just like the version you posted.

As can be seen in my version insert is called unconditionally which results in
numerous constructor/destructor calls for key_type, mapped_type and value_type.

If my client code replaces calls to operator[] with code that uses the public
find and insert functions the cost of updating an existing element is reduced
by a slew of constructor/destructor calls.

Can anyone offer insight as to why the authors might have chosen this
particular approach?

23.3.1.2 map element access

T& operator[](const key_type& x);

Returns: (*((insert(make_pair( x, T()))).first)).second.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
S

Siemel Naran

P.J. Plauger said:
_Tp& operator[](const key_type& __k) {
iterator __i = lower_bound(__k);
// __i->first is greater than or equivalent to __k.
if (__i == end() || key_comp()(__k, (*__i).first))
__i = insert(__i, value_type(__k, _Tp()));
return (*__i).second;
}
This is the version on my platform...

mapped_type& operator[](const key_type& _Keyval)
{
iterator _Where = insert(value_type(_Keyval, mapped_type())).first;
return ((*_Where).second);
}

Having not seen it before I assumed that it would do a find followed by an
insert if necessary, just like the version you posted.

As can be seen in my version insert is called unconditionally which results in
numerous constructor/destructor calls for key_type, mapped_type and value_type.

If my client code replaces calls to operator[] with code that uses the public
find and insert functions the cost of updating an existing element is reduced
by a slew of constructor/destructor calls.

Can anyone offer insight as to why the authors might have chosen this
particular approach?

23.3.1.2 map element access

T& operator[](const key_type& x);

Returns: (*((insert(make_pair( x, T()))).first)).second.

Function operator[] has only to behave as if it were written this way.

Anyway, the OP said he's using gcc 3.3, which I think uses the STLPort,
which is the version at the top of this email.
 
S

Siemel Naran

John Harrison said:
Balancing is what I was thinking of. Also the possibility (it wasn't clear
from the OP's post) that his binary tree code was tailored in some way to
his application.

But balancing is supposed to make it faster. In an unbalanced tree, lookup
is worst case O(N), but in a balanced it is always O(N). In any case, I
don't think the act of balancing the tree should make the algorithm twice as
slow, which is what the OP reported. The answer probably has to do with
extra calls to the copy constructor, as indicated by tom_usenet and Jesper.
 
J

John Harrison

Siemel Naran said:
But balancing is supposed to make it faster. In an unbalanced tree, lookup
is worst case O(N), but in a balanced it is always O(N).

O(log N)

Balancing introduces a constant overhead, with balancing insertion and
deleteion might take twice as long as without (say). In the case of
randomised or unordered data a binary tree will be balanced without any
extra effort, therefore a binary tree could be faster than a balanced tree
by a constant factor given the right sort of data.
In any case, I
don't think the act of balancing the tree should make the algorithm twice as
slow, which is what the OP reported. The answer probably has to do with
extra calls to the copy constructor, as indicated by tom_usenet and Jesper.

Yes I think you're right.

john
 
R

Roshan

use map::find() instead of map::eek:perator[] that will most likely speed
up things as operator[] serves 2 purposes...insert and lookup.


SGI stl docs make a note that operator[] for map is really only for
convenience
and stricly speaking unnecessary!

http://www.sgi.com/tech/stl/Map.html

do let us know if you see any perf diff.

--roshan
 
J

jmoy

I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. ....
The structure of the program is as follows: ....
std::map<std::string,long> words;
void insert(char *s)
{
long &l=words;
if (l<numeric_limits<long>::max())
++l;
}


Thanks everybody for your interest. To eliminate possible overhead
from std::string, I used char * (code below). But the gain is only
about 5%. Profiling shows that both in the earlier and latter versions
most of the time is taken by the function _Rb_tree::lower_bound which
is called by map::lower_bound (in the original case this is in turn
called by map::eek:perator[]).

By the way my handcoded binary tree is extremely naive and I agree
that it will have a much worse worst case performance. But an average
case performance hit of 100% seems too high a cost for insurance.

Just to check whether the slowdown may be due to function call
overhead, I made my insert() an inline function. This actually led to
an increase in running time! Why might this be happening? Can it be
due to cache effects?

Going the other way round, I changed insert() in the handcoded version
to do nothing but call another function do_insert() to do the actual
processing. This however does not lead to any appreciable slowdown.

The code for the std::map version using char* is:

struct Cstring_less
{
bool operator()(const char *s, const char *t) const{
return strcmp(s,t)<0;
}
};
typedef std::map<const char *,long,Cstring_less> Map;
Map words;

const long maximum=std::numeric_limits<long>::max();

void insert(const char *s)
{
Map::iterator p=words.lower_bound(s);
if (p==words.end()||strcmp(p->first,s)!=0){//Not found
char *dup=new char[strlen(s)+1];
strcpy(dup,s);
std::pair<const char * const,long> tuple(dup,1L);
p=words.insert(p,tuple);
}else{//Found
if (p->second<maximum)
p->second++;
}
}
 
P

P.J. Plauger

Can anyone offer insight as to why the authors might have chosen this
particular approach?

23.3.1.2 map element access

T& operator[](const key_type& x);

Returns: (*((insert(make_pair( x, T()))).first)).second.

Function operator[] has only to behave as if it were written this way.

Anyway, the OP said he's using gcc 3.3, which I think uses the STLPort,
which is the version at the top of this email.

Agreed. What part of the above can you eliminate if a particularly
thorough validation suite is checking for full conformance?

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
S

Siemel Naran

P.J. Plauger said:
23.3.1.2 map element access

T& operator[](const key_type& x);

Returns: (*((insert(make_pair( x, T()))).first)).second.

Function operator[] has only to behave as if it were written this way.
Agreed. What part of the above can you eliminate if a particularly
thorough validation suite is checking for full conformance?

All of it? If T::T() has side effects then the STL way eliminates this.
But just as the return value optimization can ignore side effects in the
copy constructor, so should operator[]. Other than that, I can't think of
anything else. Do you know? (I may not check newsgroups till Tuesday or
Wednesday.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,776
Messages
2,569,603
Members
45,196
Latest member
ScottChare

Latest Threads

Top