Faster std::map?

hg

Hello,

I'm reading a sequence of 123000 records from a database which needs
to be indexed into a std::map for further processing.
Currently this process can take up to 30 seconds on a modern system
with a 3400 MHz CPU and 1 GB of RAM. I would like to cut this time
down.

I've verified that the actual data reading takes less than 10
seconds, which means the map insertion is taking about 2/3 of the
time.

while (R.Read())
{
    MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t) R.GetInt(1)] = R.GetInt(2);
    MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t) R.GetInt(1)] = R.GetInt(4);
    MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t) R.GetInt(1)] = R.GetInt(3);
}
From the code above one can see that I have 4 maps: one indexed by
std::string, and three inner ones indexed by time_t.

It doesn't seem to be possible to resize the map in order to allocate
memory for many items at once. I think that could certainly help
speed.

Any suggestions are welcome.

Thanks.

-- Henrik
 
Ian Collins

> I'm reading a sequence of 123000 records from a database which needs
> to be indexed into a std::map for further processing. [...]
>
> It doesn't seem to be possible to resize the map in order to allocate
> memory for more items at the same time. I think that could certainly
> help on speed.

Profile and see where the bottlenecks are.

Does your R.GetString(0) generate something that could be cached?
 
Mathias Waack

> I'm reading a sequence of 123000 records from a database which needs
> to be indexed into a std::map for further processing. [...]
>
> From the code above one can see that I have 4 maps which are indexed
> by std::string and then by time_t.

Insertion into a map can be expensive, so it may help you to use only
one map. You're using the same key for each of the three inner maps, so
you could store the three values in a struct and use a single map:

struct T { int a, b, c; };
T t; t.a = R.GetInt(2); t.b = R.GetInt(3); ...
MonthlyStat.m_UsageMap[R.GetString(0)].m_Map[(time_t) R.GetInt(1)] = t;
> It doesn't seem to be possible to resize the map in order to allocate
> memory for more items at the same time. I think that could certainly
> help on speed.

Not for a map, but you could use a sorted vector instead (assuming the
database is able to deliver sorted data) and use the search algorithms from
the STL for the data lookup.

Mathias
 
hg

> Profile and see where the bottlenecks are.
> Does your R.GetString(0) generate something that could be cached?

Actually I just tried replacing the string with an int as the first
index, but it didn't really gain much; nothing noticeable at least.
The index could eventually be cached, but it requires some work to do
that: the related SQL queries would have to be changed to return data
in a way known to be sorted. It's not likely to help, because sorting
the data elsewhere in the system could introduce new bottlenecks
there.

Thanks for the suggestion though.

-- Henrik
 
hg

> Insertion into a map could be expensive, so it may help you to use only one
> map. You're using the same key for each map, so instead of using 3 maps you
> could store the values into a struct and use only one map:
>
> struct T { int a, b, c; };
> T t; t.a = R.GetInt(2); t.b = R.GetInt(3); ...
> MonthlyStat.m_UsageMap[R.GetString(0)].m_Map[(time_t) R.GetInt(1)] = t;

Right, thanks. This is actually how the code used to be, but it was
refactored to make it simpler to understand. Some of the previous data
structures turned out to be very clumsy when the 3 maps were combined
into one.
I think with some more work they can be combined again, though.

> Not for a map, but you could use a sorted vector instead (assuming the
> database is able to deliver sorted data) and use the search algorithms from
> the STL for the data lookup.

I think this is a route I don't want to go down. I'm sure the standard
containers can do a better job than I can, so I'll try to stick to the
existing containers rather than invent something new.

Thanks.

-- Henrik
 
Ian Collins

> Actually I just tried to replace the string with an int as the first
> index. However it didn't really give much. Not anything noticeable at
> least.

Before you try anything else, profile!

The suggestion of letting the database engine do the sort for you was a
good one.
 
Ian Collins

Please don't snip attributions.

> I think this is a route that I don't want to go with. I'm sure the
> standard containers can do a better job than me so I'll try to stay as
> plain and strict to the existing containers then invent something new.

std::vector is a standard container! Let the database engine do the
sort for you.
 
Erik Wikström

> I'm reading a sequence of 123000 records from a database which needs
> to be indexed into a std::map for further processing. [...]
>
> while (R.Read())
> {
>     MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t) R.GetInt(1)] = R.GetInt(2);
>     MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t) R.GetInt(1)] = R.GetInt(4);
>     MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t) R.GetInt(1)] = R.GetInt(3);
> }

You can probably get a small speed-up by caching the result of
'MonthlyStat.m_UsageMap[R.GetString(0)]', so something like

while (R.Read())
{
    std::map<time_t, int>& m = MonthlyStat.m_UsageMap[R.GetString(0)];
    m[(time_t) R.GetInt(1)] = R.GetInt(2);
    m[(time_t) R.GetInt(1)] = R.GetInt(3);
    m[(time_t) R.GetInt(1)] = R.GetInt(4);
}
 
Erik Wikström

> You can probably get a small speed-up by caching the result of
> 'MonthlyStat.m_UsageMap[R.GetString(0)]' [...]

That was a bit off, but I think you get the picture. The cached
reference should be to the entry holding the three maps, and each value
goes into its own map:

while (R.Read())
{
    // UsageEntry: whatever type m_UsageMap actually stores
    UsageEntry& u = MonthlyStat.m_UsageMap[R.GetString(0)];
    time_t t = (time_t) R.GetInt(1);
    u.m_Map1[t] = R.GetInt(2);
    u.m_Map2[t] = R.GetInt(4);
    u.m_Map3[t] = R.GetInt(3);
}
 
Frank Birbacher

Hi!

> I've verified the actual data reading part to take less than 10
> seconds which means that the map is taking about 2/3 of the time.
> [...]

On my system I discovered that a map of maps performed really badly (I
don't know why, though). You can try using a single map instead, with a
combined key (and combined values, as Mathias suggested):

typedef pair<string, time_t> Key;
struct T { int a, b, c; };
typedef map<Key, T> MonthlyStatsMap;

MonthlyStatsMap usageMap;

while (R.Read())
{
    const T value = {
        R.GetInt(2),
        R.GetInt(3),
        R.GetInt(4)
    };
    usageMap.insert(make_pair(
        Key(R.GetString(0), R.GetInt(1)),
        value));
}

And I hope R.GetString(0) does not represent the name of a month!
Representing the names as strings is not likely to perform better than
an enum.

Another thing is: a map indexed by string can be replaced by a trie (see
http://de.wikipedia.org/wiki/Trie ). But the standard does not provide
such a container. There is one hidden in Boost.Spirit (see
http://www.boost.org/libs/spirit/doc/symbols.html ).

HTH,
Frank
 
John

> I'm reading a sequence of 123000 records from a database which needs
> to be indexed into a std::map for further processing. [...]
>
> It doesn't seem to be possible to resize the map in order to allocate
> memory for more items at the same time. I think that could certainly
> help on speed.

You could try boost::fast_pool_allocator (from Boost.Pool) as the
allocator for the map. I had to do a lot of inserts into a map and
found that this helped.

Also, std::map has an insert overload that takes a hint iterator, which
you can use if you know the data from the database is sorted.
 
Markus Schoder

Mathias said:
>> Insertion into a map could be expensive, so it may help you to use only
>> one map. You're using the same key for each map, so instead of using 3
>> maps you could store the values into a struct and use only one map:
>>
>> struct T { int a, b, c; };
>> T t; t.a = R.GetInt(2); t.b = R.GetInt(3); ...
>> MonthlyStat.m_UsageMap[R.GetString(0)].m_Map[(time_t) R.GetInt(1)] = t;
>
> Right thanks. This is actually how the code used to be before but was
> refactored in order to make it more simple to understand. Some of the
> previous datastructures turned to be very clumsy when the 3 maps were
> combined into one.
> I think with some more work it'll be collected again though.

Mathias' approach looks vastly superior to me. How the 3 separate maps
can be considered simpler to understand by anyone is a mystery to me.

With Mathias' approach, a small test program inserting 123000 records
(including reading them from a text file) ran in less than 0.5 s
(Athlon 64, 2.4 GHz).
 
