Faster std::map?

Discussion in 'C++' started by hg@x-formation.com, Aug 21, 2007.

  1. Guest

    Hello,

    I'm reading a sequence of 123000 records from a database which needs
    to be indexed into a std::map for further processing.
    Currently I've found that this process can take up to 30 seconds to
    perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
    like to cut this time down.

    I've verified the actual data reading part to take less then 10
    seconds which means that the map is taking about 2/3 of the time.

    while (R.Read())
    {
    MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
    R.GetInt(1)] = R.GetInt(2);
    MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
    R.GetInt(1)] = R.GetInt(4);
    MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
    R.GetInt(1)] = R.GetInt(3);
    }

    >From the code above one can see that I have 4 maps which are indexed

    by std::string and then by time_t.

    It doesn't seem to be possible to resize the map in order to allocate
    memory for more items at the same time. I think that could certainly
    help on speed.

    Any suggestions are welcome.

    Thanks.

    -- Henrik
     
    , Aug 21, 2007
    #1
    1. Advertising

  2. Ian Collins Guest

    wrote:
    > Hello,
    >
    > I'm reading a sequence of 123000 records from a database which needs
    > to be indexed into a std::map for further processing.
    > Currently I've found that this process can take up to 30 seconds to
    > perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
    > like to cut this time down.
    >
    > I've verified the actual data reading part to take less then 10
    > seconds which means that the map is taking about 2/3 of the time.
    >
    > while (R.Read())
    > {
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
    > R.GetInt(1)] = R.GetInt(2);
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
    > R.GetInt(1)] = R.GetInt(4);
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
    > R.GetInt(1)] = R.GetInt(3);
    > }
    >
    >>From the code above one can see that I have 4 maps which are indexed

    > by std::string and then by time_t.
    >
    > It doesn't seem to be possible to resize the map in order to allocate
    > memory for more items at the same time. I think that could certainly
    > help on speed.
    >

    Profile and see where the bottlenecks are.

    Does your R.GetString(0) generate something that could be cached?

    --
    Ian Collins.
     
    Ian Collins, Aug 21, 2007
    #2
    1. Advertising

  3. wrote:
    > I'm reading a sequence of 123000 records from a database which needs
    > to be indexed into a std::map for further processing.
    > Currently I've found that this process can take up to 30 seconds to
    > perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
    > like to cut this time down.
    >
    > I've verified the actual data reading part to take less then 10
    > seconds which means that the map is taking about 2/3 of the time.
    >
    > while (R.Read())
    > {
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
    > R.GetInt(1)] = R.GetInt(2);
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
    > R.GetInt(1)] = R.GetInt(4);
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
    > R.GetInt(1)] = R.GetInt(3);
    > }
    >
    >>From the code above one can see that I have 4 maps which are indexed

    > by std::string and then by time_t.


    Insertion into a map could be expensive, so it may help you to use only one
    map. You're using the same key for each map, so instead of using 3 maps you
    could store the values into a struct and use only one map:

    struct T { int a,b,c; };
    T t; t.a = R.GetInt(2); t.b = R.GetInt(3)....
    MonthlyStat.m_UsageMap[R.GetString(0)]].m_Map[(time_t)R.GetInt(1)] = t;

    > It doesn't seem to be possible to resize the map in order to allocate
    > memory for more items at the same time. I think that could certainly
    > help on speed.


    Not for a map, but you could use a sorted vector instead (assuming the
    database is able to deliver sorted data) and use the search algorithms from
    the STL for the data lookup.

    Mathias
     
    Mathias Waack, Aug 21, 2007
    #3
  4. Guest

    > Profile and see where the bottlenecks are.
    >
    > Does your R.GetString(0) generate something that could be cached?
    >


    Actually I just tried to replace the string with an int as the first
    index. However it didn't really give much. Not anything noticable at
    least.
    The index could eventually be cached but it requires some work do to
    that by changing the sql queries related to this to return data in a
    specific way known to be sorted. It's not likely going to help because
    sorting the data elsewhere in the system could introduce new
    bottlenecks elsewhere.

    Thanks for the suggestion though.

    -- Henrik
     
    , Aug 21, 2007
    #4
  5. Guest

    > Insertion into a map could be expensive, so it may help you to use only one
    > map. You're using the same key for each map, so instead of using 3 maps you
    > could store the values into a struct and use only one map:
    >
    > struct T { int a,b,c; };
    > T t; t.a = R.GetInt(2); t.b = R.GetInt(3)....
    > MonthlyStat.m_UsageMap[R.GetString(0)]].m_Map[(time_t)R.GetInt(1)] = t;
    >


    Right thanks. This is actually how the code used to be before but was
    refactored in order to make it more simple to understand. Some of the
    previous datastructures turned to be very clumsy when the 3 maps were
    combined into one.
    I think with some more work it'll be collected again though.

    > Not for a map, but you could use a sorted vector instead (assuming the
    > database is able to deliver sorted data) and use the search algorithms from
    > the STL for the data lookup.
    >


    I think this is a route that I don't want to go with. I'm sure the
    standard containers can do a better job than me so I'll try to stay as
    plain and strict to the existing containers then invent something new.

    Thanks.

    -- Henrik
     
    , Aug 21, 2007
    #5
  6. Ian Collins Guest

    wrote:
    >> Profile and see where the bottlenecks are.
    >>
    >> Does your R.GetString(0) generate something that could be cached?
    >>

    >
    > Actually I just tried to replace the string with an int as the first
    > index. However it didn't really give much. Not anything noticable at
    > least.


    Before you try anything else, profile!

    The suggestion of letting the database engine do the sort for you was a
    good one.

    --
    Ian Collins.
     
    Ian Collins, Aug 21, 2007
    #6
  7. Ian Collins Guest

    Please don't snip attributions.

    wrote:
    >
    >> Not for a map, but you could use a sorted vector instead (assuming the
    >> database is able to deliver sorted data) and use the search algorithms from
    >> the STL for the data lookup.
    >>

    >
    > I think this is a route that I don't want to go with. I'm sure the
    > standard containers can do a better job than me so I'll try to stay as
    > plain and strict to the existing containers then invent something new.
    >

    std::vector is a standard container! Let the database engine do the
    sort for you.

    --
    Ian Collins.
     
    Ian Collins, Aug 21, 2007
    #7
  8. On 2007-08-21 11:07, wrote:
    > Hello,
    >
    > I'm reading a sequence of 123000 records from a database which needs
    > to be indexed into a std::map for further processing.
    > Currently I've found that this process can take up to 30 seconds to
    > perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
    > like to cut this time down.
    >
    > I've verified the actual data reading part to take less then 10
    > seconds which means that the map is taking about 2/3 of the time.
    >
    > while (R.Read())
    > {
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
    > R.GetInt(1)] = R.GetInt(2);
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
    > R.GetInt(1)] = R.GetInt(4);
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
    > R.GetInt(1)] = R.GetInt(3);
    > }


    You can probably get a small speed-up by caching the result of
    'MonthlyStat.m_UsageMap[R.GetString(0)]', so something like

    while (R.Read())
    {
    std::map<time_t, int> m& = MonthlyStat.m_UsageMap[R.GetString(0)];
    m[(time_t) R.GetInt(1)] = R.GetInt(2);
    m[(time_t) R.GetInt(1)] = R.GetInt(3);
    m[(time_t) R.GetInt(1)] = R.GetInt(4);
    }

    --
    Erik Wikström
     
    =?ISO-8859-1?Q?Erik_Wikstr=F6m?=, Aug 21, 2007
    #8
  9. On 2007-08-21 12:21, Erik Wikström wrote:
    > On 2007-08-21 11:07, wrote:
    >> Hello,
    >>
    >> I'm reading a sequence of 123000 records from a database which needs
    >> to be indexed into a std::map for further processing.
    >> Currently I've found that this process can take up to 30 seconds to
    >> perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
    >> like to cut this time down.
    >>
    >> I've verified the actual data reading part to take less then 10
    >> seconds which means that the map is taking about 2/3 of the time.
    >>
    >> while (R.Read())
    >> {
    >> MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
    >> R.GetInt(1)] = R.GetInt(2);
    >> MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
    >> R.GetInt(1)] = R.GetInt(4);
    >> MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
    >> R.GetInt(1)] = R.GetInt(3);
    >> }

    >
    > You can probably get a small speed-up by caching the result of
    > 'MonthlyStat.m_UsageMap[R.GetString(0)]', so something like
    >
    > while (R.Read())
    > {
    > std::map<time_t, int> m& = MonthlyStat.m_UsageMap[R.GetString(0)];
    > m[(time_t) R.GetInt(1)] = R.GetInt(2);
    > m[(time_t) R.GetInt(1)] = R.GetInt(3);
    > m[(time_t) R.GetInt(1)] = R.GetInt(4);
    > }


    That was a bit off, but I think you get the picture.

    --
    Erik Wikström
     
    =?ISO-8859-1?Q?Erik_Wikstr=F6m?=, Aug 21, 2007
    #9
  10. Hi!

    schrieb:
    > I've verified the actual data reading part to take less then 10
    > seconds which means that the map is taking about 2/3 of the time.
    >
    > while (R.Read())
    > {
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
    > R.GetInt(1)] = R.GetInt(2);
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
    > R.GetInt(1)] = R.GetInt(4);
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
    > R.GetInt(1)] = R.GetInt(3);
    > }


    On my system I discovered that a map of maps really performed bad (I
    don't know why though). You can try to use a single map instead with a
    combined key (and combined values as Mathias suggested):

    typedef pair<string, time_t> Key;
    struct T { int a,b,c; }
    typedef map<Key, T> MonthlyStatsMap;

    MonthlyStatsMap usageMap;

    while(R.Read())
    {
    const T value = {
    R.GetInt(2),
    R.GetInt(3),
    R.GetInt(4)
    };
    usageMap.insert(make_pair(
    Key(
    R.GetString(0),
    R.GetInt(1)
    ),
    value
    ));
    }

    And I hope the R.GetString(0) does not represent the name of a month!!
    Representing the names as strings is not likely to perform better than
    an enum.

    Another thing is: a map indexed by string can be replaced by a trie (see
    http://de.wikipedia.org/wiki/Trie ). But the standard does not provide
    such a container. There is one hidden in Boost.Spirit (see
    http://www.boost.org/libs/spirit/doc/symbols.html ).

    HTH,
    Frank
     
    Frank Birbacher, Aug 22, 2007
    #10
  11. John Guest

    On Aug 21, 5:07 am, wrote:
    > Hello,
    >
    > I'm reading a sequence of 123000 records from a database which needs
    > to be indexed into a std::map for further processing.
    > Currently I've found that this process can take up to 30 seconds to
    > perform on a modern system with 3400 MHz cpu and 1 gb ram. I would
    > like to cut this time down.
    >
    > I've verified the actual data reading part to take less then 10
    > seconds which means that the map is taking about 2/3 of the time.
    >
    > while (R.Read())
    > {
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map1[(time_t)
    > R.GetInt(1)] = R.GetInt(2);
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map2[(time_t)
    > R.GetInt(1)] = R.GetInt(4);
    > MonthlyStat.m_UsageMap[R.GetString(0)].m_Map3[(time_t)
    > R.GetInt(1)] = R.GetInt(3);
    > }
    >
    > >From the code above one can see that I have 4 maps which are indexed

    >
    > by std::string and then by time_t.
    >
    > It doesn't seem to be possible to resize the map in order to allocate
    > memory for more items at the same time. I think that could certainly
    > help on speed.
    >
    > Any suggestions are welcome.
    >
    > Thanks.
    >
    > -- Henrik


    you could try boost::fast_allocator as the allocator for the map.
    I had to do alot of inserts into a map and found that this helped.

    also map has an insert with a hint iterator that you can use
    if you know the data from the database is sorted.
     
    John, Aug 22, 2007
    #11
  12. On Tue, 21 Aug 2007 02:48:47 -0700, hg wrote:
    > Matthias Waack wrote:
    >> Insertion into a map could be expensive, so it may help you to use only
    >> one map. You're using the same key for each map, so instead of using 3
    >> maps you could store the values into a struct and use only one map:
    >>
    >> struct T { int a,b,c; };
    >> T t; t.a = R.GetInt(2); t.b = R.GetInt(3)....
    >> MonthlyStat.m_UsageMap[R.GetString(0)]].m_Map[(time_t)R.GetInt(1)] = t;
    >>
    >>

    > Right thanks. This is actually how the code used to be before but was
    > refactored in order to make it more simple to understand. Some of the
    > previous datastructures turned to be very clumsy when the 3 maps were
    > combined into one.
    > I think with some more work it'll be collected again though.


    Matthias' approach looks vastly superior to me. How the 3 maps instead
    can be considered simpler to understand by anyone is a mystery to me.

    With Matthias' approach a small test program inserting 123000 records
    (including reading from a text file) run less than 0.5s (Athlon 64
    2.4GHz).

    --
    Markus Schoder
     
    Markus Schoder, Aug 22, 2007
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Peter Jansson
    Replies:
    5
    Views:
    6,391
    Ivan Vecerina
    Mar 17, 2005
  2. Replies:
    1
    Views:
    441
    red floyd
    Dec 21, 2008
  3. Thomas J. Gritzan
    Replies:
    6
    Views:
    1,037
    James Kanze
    Dec 22, 2008
  4. James Kanze
    Replies:
    0
    Views:
    2,046
    James Kanze
    Dec 21, 2008
  5. Melzzzzz
    Replies:
    11
    Views:
    890
    Jorgen Grahn
    Aug 14, 2012
Loading...

Share This Page