Faster std::map?

hg · Aug 21, 2007

Hello,

I'm reading a sequence of 123000 records from a database which needs
to be indexed into a std::map for further processing.
Currently I've found that this process can take up to 30 seconds to
perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
like to cut this time down.

I've verified the actual data reading part to take less then 10
seconds which means that the map is taking about 2/3 of the time.

while (R.Read())
{
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
R.GetInt(1)] = R.GetInt(2);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
R.GetInt(1)] = R.GetInt(4);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
R.GetInt(1)] = R.GetInt(3);
}

From the code above one can see that I have 4 maps which are indexed

by std::string and then by time_t.

It doesn't seem to be possible to resize the map in order to allocate
memory for more items at the same time. I think that could certainly
help on speed.

Any suggestions are welcome.

Thanks.

-- Henrik

Ian Collins · Aug 21, 2007

Hello,

I'm reading a sequence of 123000 records from a database which needs
to be indexed into a std::map for further processing.
Currently I've found that this process can take up to 30 seconds to
perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
like to cut this time down.

I've verified the actual data reading part to take less then 10
seconds which means that the map is taking about 2/3 of the time.

while (R.Read())
{
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
R.GetInt(1)] = R.GetInt(2);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
R.GetInt(1)] = R.GetInt(4);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
R.GetInt(1)] = R.GetInt(3);
}

From the code above one can see that I have 4 maps which are indexed

Click to expand...

by std::string and then by time_t.

It doesn't seem to be possible to resize the map in order to allocate
memory for more items at the same time. I think that could certainly
help on speed.

Profile and see where the bottlenecks are.

Does your R.GetString(0) generate something that could be cached?

Mathias Waack · Aug 21, 2007

I'm reading a sequence of 123000 records from a database which needs
to be indexed into a std::map for further processing.
Currently I've found that this process can take up to 30 seconds to
perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
like to cut this time down.

I've verified the actual data reading part to take less then 10
seconds which means that the map is taking about 2/3 of the time.

while (R.Read())
{
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
R.GetInt(1)] = R.GetInt(2);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
R.GetInt(1)] = R.GetInt(4);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
R.GetInt(1)] = R.GetInt(3);
}

From the code above one can see that I have 4 maps which are indexed

Click to expand...

by std::string and then by time_t.

Insertion into a map could be expensive, so it may help you to use only one
map. You're using the same key for each map, so instead of using 3 maps you
could store the values into a struct and use only one map:

struct T { int a,b,c; };
T t; t.a = R.GetInt(2); t.b = R.GetInt(3)....
MonthlyStat.m_UsageMap[R.GetString(0)]].m_Map[(time_t)R.GetInt(1)] = t;

It doesn't seem to be possible to resize the map in order to allocate
memory for more items at the same time. I think that could certainly
help on speed.

Not for a map, but you could use a sorted vector instead (assuming the
database is able to deliver sorted data) and use the search algorithms from
the STL for the data lookup.

Mathias

hg · Aug 21, 2007

Profile and see where the bottlenecks are.

Does your R.GetString(0) generate something that could be cached?

Actually I just tried to replace the string with an int as the first
index. However it didn't really give much. Not anything noticable at
least.
The index could eventually be cached but it requires some work do to
that by changing the sql queries related to this to return data in a
specific way known to be sorted. It's not likely going to help because
sorting the data elsewhere in the system could introduce new
bottlenecks elsewhere.

Thanks for the suggestion though.

-- Henrik

hg · Aug 21, 2007

Insertion into a map could be expensive, so it may help you to use only one

map. You're using the same key for each map, so instead of using 3 maps you
could store the values into a struct and use only one map:

struct T { int a,b,c; };
T t; t.a = R.GetInt(2); t.b = R.GetInt(3)....
MonthlyStat.m_UsageMap[R.GetString(0)]].m_Map[(time_t)R.GetInt(1)] = t;

Right thanks. This is actually how the code used to be before but was
refactored in order to make it more simple to understand. Some of the
previous datastructures turned to be very clumsy when the 3 maps were
combined into one.
I think with some more work it'll be collected again though.

Not for a map, but you could use a sorted vector instead (assuming the
database is able to deliver sorted data) and use the search algorithms from
the STL for the data lookup.

I think this is a route that I don't want to go with. I'm sure the
standard containers can do a better job than me so I'll try to stay as
plain and strict to the existing containers then invent something new.

Thanks.

-- Henrik

Ian Collins · Aug 21, 2007

Actually I just tried to replace the string with an int as the first
index. However it didn't really give much. Not anything noticable at
least.

Before you try anything else, profile!

The suggestion of letting the database engine do the sort for you was a
good one.

Ian Collins · Aug 21, 2007

Please don't snip attributions.

I think this is a route that I don't want to go with. I'm sure the
standard containers can do a better job than me so I'll try to stay as
plain and strict to the existing containers then invent something new.

std::vector is a standard container! Let the database engine do the
sort for you.

=?ISO-8859-1?Q?Erik_Wikstr=F6m?= · Aug 21, 2007

Hello,

I'm reading a sequence of 123000 records from a database which needs
to be indexed into a std::map for further processing.
Currently I've found that this process can take up to 30 seconds to
perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
like to cut this time down.

I've verified the actual data reading part to take less then 10
seconds which means that the map is taking about 2/3 of the time.

while (R.Read())
{
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
R.GetInt(1)] = R.GetInt(2);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
R.GetInt(1)] = R.GetInt(4);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
R.GetInt(1)] = R.GetInt(3);
}

You can probably get a small speed-up by caching the result of
'MonthlyStat.m_UsageMap[R.GetString(0)]', so something like

while (R.Read())
{
std::map<time_t, int> m& = MonthlyStat.m_UsageMap[R.GetString(0)];
m[(time_t) R.GetInt(1)] = R.GetInt(2);
m[(time_t) R.GetInt(1)] = R.GetInt(3);
m[(time_t) R.GetInt(1)] = R.GetInt(4);
}

=?ISO-8859-1?Q?Erik_Wikstr=F6m?= · Aug 21, 2007

Hello,

I'm reading a sequence of 123000 records from a database which needs
to be indexed into a std::map for further processing.
Currently I've found that this process can take up to 30 seconds to
perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
like to cut this time down.

I've verified the actual data reading part to take less then 10
seconds which means that the map is taking about 2/3 of the time.

while (R.Read())
{
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
R.GetInt(1)] = R.GetInt(2);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
R.GetInt(1)] = R.GetInt(4);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
R.GetInt(1)] = R.GetInt(3);
}

Click to expand...

You can probably get a small speed-up by caching the result of
'MonthlyStat.m_UsageMap[R.GetString(0)]', so something like

while (R.Read())
{
std::map<time_t, int> m& = MonthlyStat.m_UsageMap[R.GetString(0)];
m[(time_t) R.GetInt(1)] = R.GetInt(2);
m[(time_t) R.GetInt(1)] = R.GetInt(3);
m[(time_t) R.GetInt(1)] = R.GetInt(4);
}

That was a bit off, but I think you get the picture.

Frank Birbacher · Aug 22, 2007

Hi!

I've verified the actual data reading part to take less then 10
seconds which means that the map is taking about 2/3 of the time.

while (R.Read())
{
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
R.GetInt(1)] = R.GetInt(2);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
R.GetInt(1)] = R.GetInt(4);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
R.GetInt(1)] = R.GetInt(3);
}

On my system I discovered that a map of maps really performed bad (I
don't know why though). You can try to use a single map instead with a
combined key (and combined values as Mathias suggested):

typedef pair<string, time_t> Key;
struct T { int a,b,c; }
typedef map<Key, T> MonthlyStatsMap;

MonthlyStatsMap usageMap;

while(R.Read())
{
const T value = {
R.GetInt(2),
R.GetInt(3),
R.GetInt(4)
};
usageMap.insert(make_pair(
Key(
R.GetString(0),
R.GetInt(1)
),
value
));
}

And I hope the R.GetString(0) does not represent the name of a month!!
Representing the names as strings is not likely to perform better than
an enum.

Another thing is: a map indexed by string can be replaced by a trie (see
http://de.wikipedia.org/wiki/Trie ). But the standard does not provide
such a container. There is one hidden in Boost.Spirit (see
http://www.boost.org/libs/spirit/doc/symbols.html ).

HTH,
Frank

John · Aug 22, 2007

Hello,

I'm reading a sequence of 123000 records from a database which needs
to be indexed into a std::map for further processing.
Currently I've found that this process can take up to 30 seconds to
perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
like to cut this time down.

I've verified the actual data reading part to take less then 10
seconds which means that the map is taking about 2/3 of the time.

while (R.Read())
{
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
R.GetInt(1)] = R.GetInt(2);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
R.GetInt(1)] = R.GetInt(4);
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
R.GetInt(1)] = R.GetInt(3);
}

From the code above one can see that I have 4 maps which are indexed

Click to expand...

by std::string and then by time_t.

It doesn't seem to be possible to resize the map in order to allocate
memory for more items at the same time. I think that could certainly
help on speed.

Any suggestions are welcome.

Thanks.

-- Henrik

you could try boost::fast_allocator as the allocator for the map.
I had to do alot of inserts into a map and found that this helped.

also map has an insert with a hint iterator that you can use
if you know the data from the database is sorted.

Markus Schoder · Aug 22, 2007

Matthias said:
Matthias said:

Insertion into a map could be expensive, so it may help you to use only
one map. You're using the same key for each map, so instead of using 3
maps you could store the values into a struct and use only one map:

struct T { int a,b,c; };
T t; t.a = R.GetInt(2); t.b = R.GetInt(3)....
MonthlyStat.m_UsageMap[R.GetString(0)]].m_Map[(time_t)R.GetInt(1)] = t;

Click to expand...

Right thanks. This is actually how the code used to be before but was
refactored in order to make it more simple to understand. Some of the
previous datastructures turned to be very clumsy when the 3 maps were
combined into one.
I think with some more work it'll be collected again though.

Matthias' approach looks vastly superior to me. How the 3 maps instead
can be considered simpler to understand by anyone is a mystery to me.

With Matthias' approach a small test program inserting 123000 records
(including reading from a text file) run less than 0.5s (Athlon 64
2.4GHz).

Is there in memory tree structure faster than std::map?	11	Aug 12, 2012
thread safety and std::map	2	Mar 4, 2007
std::transform for std::map	6	Mar 16, 2009
problem with iterator (map iterator)	3	Jan 10, 2014
help for the core dump in ::std::map	3	Nov 18, 2009
bogus entries in map (std library)	4	Dec 12, 2007
std::map with nonunique keys	12	Sep 4, 2009
Translater + module + tkinter	1	Feb 16, 2023

Faster std::map?

hg

Ian Collins

Mathias Waack

hg

hg

Ian Collins

Ian Collins

=?ISO-8859-1?Q?Erik_Wikstr=F6m?=

=?ISO-8859-1?Q?Erik_Wikstr=F6m?=

Frank Birbacher

John

Markus Schoder

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads