vector vs map iterator

Discussion in 'C++' started by xyz, Jul 2, 2008.

  1. xyz

    I have to run a simulation over a trace file of around 7 GB (it
    contains 116 million entries). Presently I am using vector
    iterators to check the conditions in my program, and it is taking
    two days to finish the whole simulation.

    My question: are map iterators faster than vector iterators? Would
    switching improve the performance? Thanks to all.
     
    xyz, Jul 2, 2008
    #1

  2. Guest

    On Jul 2, 9:31 am, xyz wrote:
    > I have to run a simulation over a trace file of around 7 GB
    > (116 million entries)... [snip]

    Since a vector is guaranteed to be contiguous memory, its iterator
    *could* be implemented internally as a pointer - which is fast. A
    map is list-like in that it has nodes, so its iterator has to
    dereference a pointer to move between elements - which is slower.
    So, in very general terms, I would expect map iterators to be
    slower.

    But...

    Use a good profiler to determine which parts of your program are
    time consuming. Without a profiler, you're just guessing.

    HTH
     
    Guest, Jul 2, 2008
    #2

  3. Joe Greer

    xyz wrote:

    > I have to run a simulation over a trace file of around 7 GB
    > (116 million entries)... [snip]

    I am not sure what operations you are performing on your iterators,
    but vector iterators are about as fast as you get. Maps are
    generally a tree structure of some sort, and moving from one node
    to the next generally requires loading a pointer from memory,
    whereas a vector iterator simply adds or subtracts a fixed amount
    from the address it already holds. That tends to be faster. The
    same applies to a list. Vectors also have better locality of
    reference: when an item is read, the whole page comes into memory,
    and you get the objects surrounding the target for free. Maps,
    sets, lists, etc. win when their semantics match what you want to
    do. That is, if you find yourself searching for objects a lot, then
    maps and sets are more natural. Though even then, sorting the
    vector and using the binary search algorithms may perform better
    (a sketch of that approach is below). The other thing to consider
    is that maps, sets, and lists have more overhead per object. With
    116M objects, that can add up.
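
    For what it's worth, the sorted-vector approach might look
    something like this (a sketch only; std::string as the key type is
    an assumption, not something the OP has told us):

    #include <algorithm>
    #include <string>
    #include <vector>

    int main()
    {
        std::vector<std::string> entries;  // hypothetical key type
        entries.push_back("c");
        entries.push_back("a");
        entries.push_back("b");

        // Sort once up front: O(n log n).
        std::sort(entries.begin(), entries.end());

        // Each lookup is then O(log n) instead of a linear scan.
        std::string packet = "b";
        std::vector<std::string>::const_iterator it =
            std::lower_bound(entries.begin(), entries.end(), packet);
        bool found = (it != entries.end() && *it == packet);

        return found ? 0 : 1;
    }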

    HTH,
    joe
     
    Joe Greer, Jul 2, 2008
    #3
  4. Joe Greer

    Joe Greer wrote:
    > [my earlier reply about vector vs. map iterators snipped]

    I also agree with the comment to use a profiler to be sure you are
    optimizing the proper thing.

    joe
     
    Joe Greer, Jul 2, 2008
    #4
  5. xyz

    On Jul 2, 3:49 pm, Joe Greer wrote:
    > [vector vs. map iterator discussion snipped]

    For example, I am doing the following operation in my program:
    whenever I receive a packet, I check whether it is contained in my
    vector. So if I have around 2 million entries in my vector and I
    receive around 100 packets, then the computation is 100 packets *
    2 million iterations.

    As my list keeps increasing, I have more and more iterations. This
    is the problem I am facing with my simulation.
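
    Roughly, the check looks like this (a simplified sketch with
    made-up names and an int element type, not the actual code):

    #include <algorithm>
    #include <vector>

    // Each incoming packet triggers a full scan of the vector, so the
    // total work is O(packets * entries).
    bool seen_before(const std::vector<int>& entries, int packet)
    {
        return std::find(entries.begin(), entries.end(), packet)
            != entries.end();
    }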
     
    xyz, Jul 2, 2008
    #5
  6. xyz

    On Jul 2, 3:50 pm, Joe Greer wrote:
    > [earlier discussion snipped]
    >
    > I also agree with the comment to use a profiler to be sure you
    > are optimizing the proper thing.


    I have already used a profiler to find the time-consuming places in
    my program. I checked a smaller portion of my 7 GB file, and it
    showed that the time-consuming part is what I mentioned above.
     
    xyz, Jul 2, 2008
    #6
  7. Juha Nieminen

    xyz wrote:
    > I have to run a simulation over a trace file of around 7 GB
    > (116 million entries)... [snip]


    It's not a question of iterators. It's a question of what it is
    that you are doing with your vector/map and how you are using those
    iterators.

    (And no, when measured in clock cycles, no operation on a map
    iterator is faster than the corresponding operation on a vector
    iterator. On the contrary. However, I have the feeling you are not
    really talking about the iterators per se.)
     
    Juha Nieminen, Jul 2, 2008
    #7
  8. Jim Langston

    xyz wrote:
    > I have to run a simulation over a trace file of around 7 GB
    > (116 million entries)... [snip]


    Performance for what? Insertions? Deletions? Insertions in the
    middle? At the end? At the beginning? Deletions in the middle? At
    the end? At the beginning? Lookups at the beginning? The end? The
    middle?

    Different containers (vector, set, map, etc.) are designed for
    different tasks, and each has its strengths and its weaknesses.

    Maybe this link will help:
    http://www.linuxsoftware.co.nz/cppcontainers.html
    Maybe it won't. Check the bottom anyway to determine which
    container to choose.

    Also, the wiki has a little bit about the speed of some of the
    containers:
    http://en.wikipedia.org/wiki/Standard_Template_Library

    Really, without knowing what you are trying to optimize, it is hard
    to say.

    std::vector<MyClass> MyVector;
    /* ... */
    MyVector[SomeCounter]
    should be a relatively fast operation, very similar to pointer
    math.

    std::map<Mykey, MyClass> MyMap;
    /* ... */
    MyMap.find( SomeKey );
    is a binary search lookup.
    MyMap[SomeKey]
    is also a binary lookup, with the addition of possibly inserting
    the key and data.

    Without knowing how you are using the vector it is hard to say. One
    thing I would hope, however, is that you are preallocating enough
    memory for your vector so that it doesn't have to keep reallocating
    when it runs out of capacity (see the sketch below).
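
    For instance (a sketch; the element type and the count are made
    up):

    #include <vector>

    int main()
    {
        std::vector<int> entries;
        // Reserve capacity up front so that push_back doesn't trigger
        // repeated reallocate-and-copy cycles as the vector grows.
        entries.reserve(2000000);
        for (int i = 0; i < 2000000; ++i)
            entries.push_back(i);
        return 0;
    }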

    I have no idea why your operation is taking 2 days; maybe it should
    be, maybe it shouldn't. But without knowing more about what you are
    actually doing, anything we come up with is a shot in the dark.
     
    Jim Langston, Jul 2, 2008
    #8
  9. Michael DOUBEZ

    xyz wrote:
    > so if I have around 2 million entries in my vector and I receive
    > around 100 packets, then the computation is 100 packets * 2
    > million iterations... [snip]


    I don't understand exactly what you are doing. If your algorithm
    really has to be O(n), then there is not much you can do, unless
    you can reorganize your data into some kind of decision tree or
    another heuristic.

    But if what you perform is, say, a search, then sorting the vector
    or using a map will reduce it to O(log n). If you can find a good
    hash, it can be O(1).

    --
    Michael
     
    Michael DOUBEZ, Jul 2, 2008
    #9
  10. xyz

    On Jul 2, 4:10 pm, "Jim Langston" wrote:
    > [container-choice discussion snipped]

    As I said: if the element being checked matches one of the elements
    in my vector, then I collect the statistics. If it doesn't match
    any of the vector's elements, then I move this element to a new
    vector.

    I hope you all understand.
     
    xyz, Jul 2, 2008
    #10
  11. Jim Langston

    > "xyz" <> wrote in message
    > news:...
    > On Jul 2, 3:49 pm, Joe Greer <> wrote:
    > > xyz <> wrote in news:fb7c6ce6-5af3-4e24-8a0b-
    > > :
    > >
    > > > I have to run the simulation of a trace file around (7gb contains
    > > > 116million entries)...
    > > > presently i am using vector iterators to check the conditions in my
    > > > program.....
    > > > it is taking 2 days to finish whole simulation.....

    > >
    > > > my question are the map iterators are faster than vector
    > > > iterators.....
    > > > does it improve the performance....
    > > > thanks to all

    > >
    > > I am not sure what operations you are performing on your iterators, but
    > > vector iterators are about as fast as you get. Maps are generally a tree
    > > structure of some sort and moving from one node to the next will
    > > generally
    > > require loading a value from memory whereas with a vector, it will
    > > simply
    > > add or subtract a fixed amount from the address already in the iterator.
    > > This tends to be faster. Same thing applies to a list. Vectors also have
    > > better locality of reference so when an item is read, the whole page
    > > comes
    > > into memory and you get the objects surrounding the target for free.
    > > maps,
    > > sets, lists etc are better if their semantics match what you want to do
    > > better. That is, if you find yourself searching for objects a lot, then
    > > maps and sets are more natural. Though even then, sorting the vector and
    > > using the binary search algorithms may perform better. The other thing
    > > to
    > > consider is that maps, sets, and lists have more overhead per object.
    > > With
    > > 116M objects, that can add up.


    > for example i am doing the operation as below in my program
    > whenever i receive a packet or something ...i will check does it
    > contain in my vector list....
    > so if I contain around 2 million entreis in my vector and i received
    > around 100 packets...
    > then the computation would be 100 packets * 2 million iterations...
    >
    > as my list goes on incresing I will have more iterations..
    > this is the problem i am facing with my simulation..


    Now we're getting somewhere. So the speed issue is on lookups. Yes,
    a map should be much faster for lookups: it is a binary tree
    lookup, which is O(log n), whereas searching for data through a
    vector is O(n).

    It sounds, however, like you can get away with a set if your data
    is itself the key you are looking up. There is not much difference
    in lookup time, as far as I know, between a map and a set, but a
    set should use a little less memory, since it doesn't have to store
    both a key and the data, only the data, which is the key (see the
    sketch below).

    Be aware, however, that a set or a map is going to take up more
    memory than a vector. This may, or may not, be a problem for you. A
    set and a map store their data in separate chunks: each entry has
    its own memory. A vector stores its data in one huge block, so a
    vector's overhead is essentially the size of a pointer. A set's
    overhead is one or more pointers for each entry, plus the node
    overhead.

    Normally this wouldn't concern me that much, but you say you are
    using 2 million entries, and depending on the size of each entry
    this could be substantial. Or not.

    You might also want to look at a hash map or hash set (if your
    implementation provides one) if your data is string (text) data. I
    understand this speeds up lookups even a bit more.
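
    For example, the std::set version might look like this (a sketch;
    std::string as the element type is an assumption):

    #include <set>
    #include <string>

    int main()
    {
        std::set<std::string> seen;
        seen.insert("flow-a");
        seen.insert("flow-b");

        // O(log n) lookup instead of a linear scan.
        bool found = seen.find("flow-a") != seen.end();

        return found ? 0 : 1;
    }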
     
    Jim Langston, Jul 2, 2008
    #11
  12. Jim Langston

    xyz wrote:
    > as I said: if the element being checked matches one of the
    > elements in my vector, then I collect the statistics... [snip]

    Yes, I read your other post. Yes, a map or set would definitely be
    faster for lookups. Read my other post.
     
    Jim Langston, Jul 2, 2008
    #12
  13. Jerry Coffin

    xyz wrote:

    [ ... ]

    > so if I have around 2 million entries in my vector and I receive
    > around 100 packets, then the computation is 100 packets * 2
    > million iterations... [snip]


    [ and elsethread: ]

    > if the element being checked matches one of the elements in my
    > vector, then I collect the statistics... [snip]


    Okay, from the sound of things, the issue is really with searches,
    not with the iterators per se. From your description, you're
    currently using a linear search, which is pretty slow, as you've
    pointed out. A binary search should be much faster (logarithmic
    instead of linear complexity).

    An std::set or std::map will use a binary search, but that does NOT
    necessarily mean it's the best structure to use. I'm not _entirely_
    sure I follow what you're saying, but it _sounds_ like the searches
    are being done in a fixed set of items -- i.e. you're _not_ (for
    example) adding or removing items from that set during a single
    run.

    If that's correct, then you're probably better off continuing to
    use a vector, but sorting it and using binary_search or upper_bound
    (or possibly lower_bound) to search it.

    Given the number of items you're dealing with, you might also want
    to try using an unordered_set instead -- this uses a hash table, so
    its lookup time is normally nearly constant regardless of the
    number of items being searched. A sketch follows.
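
    For instance (a sketch assuming a compiler with TR1 support; the
    header location varies, e.g. <tr1/unordered_set> on g++):

    #include <tr1/unordered_set>
    #include <string>

    int main()
    {
        std::tr1::unordered_set<std::string> seen;
        seen.insert("flow-a");

        // Average O(1) lookup via hashing, independent of size.
        bool found = seen.find("flow-a") != seen.end();

        return found ? 0 : 1;
    }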

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Jul 2, 2008
    #13
  14. James Kanze

    On Jul 2, 4:26 pm, "Jim Langston" wrote:
    > [snip]
    > A vector stores its data in one huge block, so a vector's
    > overhead is essentially the size of a pointer. A set's overhead
    > is one or more pointers for each entry, plus the node overhead.


    The size of several pointers (two or three) for each entry,
    actually. If the table can be filled once, up front, then a sorted
    vector is definitely the way to go (using std::lower_bound for
    lookup). Or a hash table.

    > Normally this wouldn't concern me that much, but you say you are
    > using 2 million entries, and depending on the size of each entry
    > this could be substantial. Or not.


    Note that if the implementation of std::string he's using doesn't
    use the small string optimization, or his strings are too long for
    it to apply, a custom string class which doesn't use dynamic memory
    could also make a significant difference. This largely depends on
    the contents of the strings, however.

    > You might also want to look at a hash map or hash set (if your
    > implementation provides one) if your data is string (text) data.
    > I understand this speeds up lookups even a bit more.


    With a good hashing function, lookup in a hash table is O(1). With
    something like 2 million entries, the difference between O(log n)
    and O(1) can be significant. (In my own tests, hash tables beat
    std::map or a sorted vector by a ratio of two to one for 1200
    entries, and almost five to one for a million entries.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jul 2, 2008
    #14
  15. Duane Hebert

    xyz wrote:
    > if the element being checked matches one of the elements in my
    > vector, then I collect the statistics; if it doesn't match, I
    > move it to a new vector... [snip]


    If you add elements to a vector, it is forced to keep them
    contiguous, so it will sometimes need to reallocate and copy the
    whole thing. This can cause unexpected slowdowns unless you're able
    to reserve enough memory up front.

    If your slowdown is due to searches and to growing the vectors,
    then you could try a map (or set) and benchmark it.
     
    Duane Hebert, Jul 3, 2008
    #15
  16. Duane Hebert

    James Kanze wrote:

    > With a good hashing function, lookup in a hash table is O(1).
    > [snip]



    True, but he's saying that his code is bottlenecked in the search,
    and then, if the object isn't found, he's pushing it into a vector.
    It's not clear if it's the same vector.

    If it is, and he needs to keep it sorted, inserting in the middle
    of the vector may offset any gain from the hash table. I guess it
    would depend on the percentage of misses or whatever. Even just
    appending at the end may involve reallocation.

    While vectors are often the best choice, there are cases where we
    have to look at the implementation: maintaining contiguous values
    can cause overhead that you can't estimate from complexity measures
    alone. The OP should benchmark different strategies. My guess would
    be that your idea or a map will probably be best.
     
    Duane Hebert, Jul 3, 2008
    #16
  17. Michael DOUBEZ

    Duane Hebert wrote:
    > If it is, and he needs to keep it sorted, inserting in the middle
    > of the vector may offset any gain from the hash table. I guess it
    > would depend on the percentage of misses or whatever. Even just
    > appending at the end may involve reallocation.


    If he is using a hash system (perhaps a std::tr1::unordered_map<>),
    then the data structure and the lookup mechanism are adapted to the
    problem: the hash can map the key to an index into the vector, or
    the data can be put directly into the hash container.

    And in the case of a hash, the vector doesn't need to be sorted.

    > The OP should benchmark different strategies. My guess would be
    > that your idea or a map will probably be best. [snip]


    A map only guarantees an O(log n) search and consumes at least two
    pointers per node for the tree (with 2 million entries, that is a
    minimum of 16 MB of pointer overhead); it is a hard hit in the
    balance.

    A vector has a cost upon insertion. But this can be mitigated by
    keeping a sorted area and an unsorted area: upon a trigger (size,
    time, load, reserve depleted), all elements of the unsorted area
    are merged into the sorted one, as in the sketch below.
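
    Something like this (a sketch; the merge trigger and the int
    element type are made up):

    #include <algorithm>
    #include <vector>

    class TwoAreaSet
    {
        std::vector<int> sorted_;    // searched with binary_search
        std::vector<int> unsorted_;  // cheap push_back staging area

        void merge()
        {
            sorted_.insert(sorted_.end(), unsorted_.begin(),
                           unsorted_.end());
            std::sort(sorted_.begin(), sorted_.end());
            unsorted_.clear();
        }

    public:
        bool contains(int x) const
        {
            return std::binary_search(sorted_.begin(), sorted_.end(), x)
                || std::find(unsorted_.begin(), unsorted_.end(), x)
                       != unsorted_.end();
        }

        void insert(int x)
        {
            unsorted_.push_back(x);
            if (unsorted_.size() > 1024)  // arbitrary trigger
                merge();
        }
    };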

    Hashes are harder to tune, but very effective in search and
    insertion time.

    --
    Michael
     
    Michael DOUBEZ, Jul 3, 2008
    #17
  18. Duane Hebert

    Michael DOUBEZ wrote:
    > A vector has a cost upon insertion. But this can be mitigated by
    > keeping a sorted area and an unsorted area...
    >
    > Hashes are harder to tune, but very effective in search and
    > insertion time. [snip]


    This sounds like an instance where a hard-to-tune hash may be worth
    the trouble.

    If the vectors don't need to be kept sorted, that helps a bunch,
    but you still have to deal with reallocation if you can't reserve a
    size. Since he's talking about a large data set, this could be an
    issue.

    This is all pretty much speculation until it's benchmarked,
    especially considering that we don't know what the system
    limitations are. Maybe 16 MB of pointers isn't an issue compared to
    the cost of maintaining contiguous memory.
     
    Duane Hebert, Jul 4, 2008
    #18
  19. Michael DOUBEZ

    Duane Hebert wrote:
    > This sounds like an instance where a hard-to-tune hash may be
    > worth the trouble.




    > If the vectors don't need to be kept sorted, that helps a bunch,
    > but you still have to deal with reallocation if you can't reserve
    > a size. Since he's talking about a large data set, this could be
    > an issue.


    It depends on whether insertions occur a lot or not. If the number
    of distinct elements grows logarithmically, I would expect
    insertions to happen a lot at the beginning and less at the end; if
    it grows exponentially, well, that is hard to handle in any case.

    From tuning you can deduce this kind of parameter, and in
    particular the average number of elements that can be reserved up
    front.

    However, if he uses a hash, it is simpler to use it as the
    container itself and not bother with a duplicated structure.

    > This is all pretty much speculation until it's benchmarked,
    > especially considering that we don't know what the system
    > limitations are. Maybe 16 MB of pointers isn't an issue compared
    > to the cost of maintaining contiguous memory.


    Yes. That is why I simply stated the tradeoffs I saw without
    dismissing any of them. It is true that, given the OP's problem, I
    don't favor the map<> solution, because it consumes a lot for a
    very small gain (insertion and search in logarithmic time). Now, he
    may have other needs that require a map<> (such as having a sorted
    container).


    --
    Michael
     
    Michael DOUBEZ, Jul 4, 2008
    #19
  20. James Kanze

    On Jul 3, 1:30 pm, "Duane Hebert" wrote:
    > [my comment about O(1) hash table lookup snipped]


    > True, but he's saying that his code is bottlenecked in the
    > search, and then, if the object isn't found, he's pushing it into
    > a vector. It's not clear if it's the same vector.


    It's not at all clear what he does if he doesn't find the element.
    (At one point, he says something about a "new" vector, but I'm not
    sure what he's doing with it.)

    > If it is, and he needs to keep it sorted, inserting in the middle
    > of the vector may offset any gain from the hash table.


    I'm afraid I don't understand. The hash table contains the
    elements he's comparing against. Depending on what he's doing
    he may want to insert the element he didn't find into the hash
    table, or he may not. There's been no indication of having to
    keep any sort of order (or at least, I didn't see it).

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jul 4, 2008
    #20
