vector vs map iterator

X

xyz

I have to run a simulation over a trace file (around 7 GB, containing
116 million entries). Presently I am using vector iterators to check
the conditions in my program, and it is taking 2 days to finish the
whole simulation.

My question: are map iterators faster than vector iterators? Would
they improve the performance?

Thanks to all.
 
A

AnonMail2005

I have to run a simulation over a trace file (around 7 GB, containing
116 million entries). Presently I am using vector iterators to check
the conditions in my program, and it is taking 2 days to finish the
whole simulation.

My question: are map iterators faster than vector iterators? Would
they improve the performance?

Thanks to all.
Since a vector is guaranteed to be contiguous memory, its iterator
*could* be implemented internally as a pointer - which is fast. A
map is list-like in that it has nodes, so its iterator has to
dereference a pointer - which is slower. So, in very general
terms, I think map iterators would be slower.
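
To make that concrete, a minimal sketch of the two increments (the
names are illustrative, and it assumes both containers are non-empty):

#include <map>
#include <vector>

void step_iterators(std::vector<int>& v, std::map<int, int>& m)
{
    std::vector<int>::iterator vit = v.begin();
    ++vit;  // contiguous storage: essentially pointer arithmetic

    std::map<int, int>::iterator mit = m.begin();
    ++mit;  // node-based storage: follows pointers through the tree
}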

But...

Use a good profiler to determine which parts of your program are
time-consuming. Without a profiler, you're just guessing.

HTH
 
J

Joe Greer

I have to run a simulation over a trace file (around 7 GB, containing
116 million entries). Presently I am using vector iterators to check
the conditions in my program, and it is taking 2 days to finish the
whole simulation.

My question: are map iterators faster than vector iterators? Would
they improve the performance?

Thanks to all.

I am not sure what operations you are performing on your iterators, but
vector iterators are about as fast as you get. Maps are generally a tree
structure of some sort, and moving from one node to the next generally
requires loading a value from memory, whereas with a vector it simply
adds or subtracts a fixed amount from the address already in the
iterator. This tends to be faster. The same applies to a list. Vectors
also have better locality of reference: when an item is read, the whole
page comes into memory and you get the objects surrounding the target
for free. Maps, sets, lists, etc. are better if their semantics match
what you want to do. That is, if you find yourself searching for objects
a lot, then maps and sets are more natural. Though even then, sorting
the vector and using the binary search algorithms may perform better.
The other thing to consider is that maps, sets, and lists have more
overhead per object. With 116M objects, that can add up.
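
A minimal sketch of that sorted-vector approach (int elements assumed
for illustration):

#include <algorithm>
#include <vector>

// Sort once up front; afterwards every lookup is O(log n) instead of O(n).
void prepare(std::vector<int>& v)
{
    std::sort(v.begin(), v.end());
}

bool contains(const std::vector<int>& v, int key)
{
    return std::binary_search(v.begin(), v.end(), key);  // requires sorted v
}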

HTH,
joe
 
J

Joe Greer

I am not sure what operations you are performing on your iterators, but
vector iterators are about as fast as you get. Maps are generally a tree
structure of some sort, and moving from one node to the next generally
requires loading a value from memory, whereas with a vector it simply
adds or subtracts a fixed amount from the address already in the
iterator. This tends to be faster. The same applies to a list. Vectors
also have better locality of reference: when an item is read, the whole
page comes into memory and you get the objects surrounding the target
for free. Maps, sets, lists, etc. are better if their semantics match
what you want to do. That is, if you find yourself searching for objects
a lot, then maps and sets are more natural. Though even then, sorting
the vector and using the binary search algorithms may perform better.
The other thing to consider is that maps, sets, and lists have more
overhead per object. With 116M objects, that can add up.

HTH,
joe

I also agree with the comment to use a profiler to be sure you are
optimizing the proper thing.

joe
 
X

xyz

I am not sure what operations you are performing on your iterators, but
vector iterators are about as fast as you get. Maps are generally a tree
structure of some sort, and moving from one node to the next generally
requires loading a value from memory, whereas with a vector it simply
adds or subtracts a fixed amount from the address already in the
iterator. This tends to be faster. The same applies to a list. Vectors
also have better locality of reference: when an item is read, the whole
page comes into memory and you get the objects surrounding the target
for free. Maps, sets, lists, etc. are better if their semantics match
what you want to do. That is, if you find yourself searching for objects
a lot, then maps and sets are more natural. Though even then, sorting
the vector and using the binary search algorithms may perform better.
The other thing to consider is that maps, sets, and lists have more
overhead per object. With 116M objects, that can add up.

HTH,
joe

For example, I am doing the operation below in my program: whenever I
receive a packet, I check whether it is contained in my vector. So if
my vector contains around 2 million entries and I receive around 100
packets, the computation is 100 packets * 2 million iterations.

As my list keeps increasing, I will have more iterations. This is the
problem I am facing with my simulation.
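
Presumably the check is shaped something like this linear scan (a
guess at the code, since none was posted; long stands in for the real
entry type):

#include <algorithm>
#include <vector>

// O(n) per packet, so roughly n_packets * n_entries comparisons in total.
bool seen_before(const std::vector<long>& entries, long packet_id)
{
    return std::find(entries.begin(), entries.end(), packet_id)
        != entries.end();
}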
 
X

xyz

I also agree with the comment to use a profiler to be sure you are
optimizing the proper thing.

joe

I have already used a profiler to find the time-consuming places in
my program. I checked a smaller portion of my 7 GB file, which showed
that the time-consuming part is what I mentioned.
 
J

Juha Nieminen

xyz said:
I have to run a simulation over a trace file (around 7 GB, containing
116 million entries). Presently I am using vector iterators to check
the conditions in my program, and it is taking 2 days to finish the
whole simulation.

My question: are map iterators faster than vector iterators? Would
they improve the performance?

It's not a question of iterators. It's a question of what you are
doing with your vector/map and how you are using those iterators.

(And no, when measured in clock cycles, no operation on a map iterator
is faster than the corresponding operation on a vector iterator. On
the contrary. However, I have the feeling you are not really talking
about the iterators per se.)
 
J

Jim Langston

xyz said:
I have to run a simulation over a trace file (around 7 GB, containing
116 million entries). Presently I am using vector iterators to check
the conditions in my program, and it is taking 2 days to finish the
whole simulation.

My question: are map iterators faster than vector iterators? Would
they improve the performance?

Thanks to all.

Performance for what? Insertions? Deletions? Insertions in the middle?
At the end? At the beginning? Deletions in the middle? End? Beginning?
Lookups at the beginning? End? Middle?

Different containers (vector, set, map, etc.) are designed for
different tasks, and each has its strengths and weaknesses.

Maybe this link will help: http://www.linuxsoftware.co.nz/cppcontainers.html
Maybe it won't. Check the bottom anyway to determine which container to
choose.

Also, the wiki has a little bit about the speed of some of the
containers: http://en.wikipedia.org/wiki/Standard_Template_Library

Really, without knowing what you are trying to optimize, it is hard to
say.

std::vector<MyClass> MyVector;
/* ... */
MyVector[SomeCounter]
should be a relatively fast operation, very similar to pointer math.

std::map<Mykey, MyClass> MyMap;
/* ... */
MyMap.find( SomeKey );
is a binary search lookup.
MyMap[SomeKey]
is also a binary search lookup, with the addition of possibly inserting
the key and data.

Without knowing how you are using the vector it is hard to say. One
thing I would hope, however, is that you are preallocating enough
memory for your vector so that it doesn't have to keep reallocating
when it runs out of capacity.
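
For example (2 million is just the count mentioned elsewhere in the
thread, not a known figure):

std::vector<MyClass> MyVector;
MyVector.reserve( 2000000 );  /* one allocation up front, no regrowth copies */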

I have no idea why your operation is taking 2 days; maybe it should be,
maybe it shouldn't. But without knowing more about what you are
actually doing, anything we come up with is a shot in the dark.
 
M

Michael DOUBEZ

xyz said:
For example, I am doing the operation below in my program: whenever I
receive a packet, I check whether it is contained in my vector. So if
my vector contains around 2 million entries and I receive around 100
packets, the computation is 100 packets * 2 million iterations.

As my list keeps increasing, I will have more iterations. This is the
problem I am facing with my simulation.

I don't understand what you are doing. If your algorithm has to be
O(n), then there is not much to do unless you can reorganize your data
to have some kind of decision tree or another heuristic.

But if you perform, say, a search, sorting the vector or using a map
will reduce it to O(log(n)). If you can find a good hash, it can be O(1).
 
X

xyz

I have to run a simulation over a trace file (around 7 GB, containing
116 million entries). Presently I am using vector iterators to check
the conditions in my program, and it is taking 2 days to finish the
whole simulation.

My question: are map iterators faster than vector iterators? Would
they improve the performance?

Thanks to all.

Performance for what? Insertions? Deletions? Insertions in the middle?
At the end? At the beginning? Deletions in the middle? End? Beginning?
Lookups at the beginning? End? Middle?

Different containers (vector, set, map, etc.) are designed for
different tasks, and each has its strengths and weaknesses.

Maybe this link will help: http://www.linuxsoftware.co.nz/cppcontainers.html
Maybe it won't. Check the bottom anyway to determine which container to
choose.

Also, the wiki has a little bit about the speed of some of the
containers: http://en.wikipedia.org/wiki/Standard_Template_Library

Really, without knowing what you are trying to optimize, it is hard to
say.

std::vector<MyClass> MyVector;
/* ... */
MyVector[SomeCounter]
should be a relatively fast operation, very similar to pointer math.

std::map<Mykey, MyClass> MyMap;
/* ... */
MyMap.find( SomeKey );
is a binary search lookup.
MyMap[SomeKey]
is also a binary search lookup, with the addition of possibly inserting
the key and data.

Without knowing how you are using the vector it is hard to say. One
thing I would hope, however, is that you are preallocating enough
memory for your vector so that it doesn't have to keep reallocating
when it runs out of capacity.

I have no idea why your operation is taking 2 days; maybe it should be,
maybe it shouldn't. But without knowing more about what you are
actually doing, anything we come up with is a shot in the dark.


As I said, if the element I am checking matches one of the elements in
my vector, then I collect the statistics. If it doesn't match any of
the vector's elements, then I move this element to a new vector.

I hope you all understand.
 
J

Jim Langston

For example, I am doing the operation below in my program: whenever I
receive a packet, I check whether it is contained in my vector. So if
my vector contains around 2 million entries and I receive around 100
packets, the computation is 100 packets * 2 million iterations.

As my list keeps increasing, I will have more iterations. This is the
problem I am facing with my simulation.

Now we're getting somewhere. So the speed issue is on lookups. Yes, a
map should be much faster on a lookup. It is a binary tree lookup,
which is O(log n), whereas searching for data through a vector is O(n).

It sounds like, however, that you can get away with a set if your data
is your key, i.e. the thing you are looking up. There is not much
difference in lookup times, as far as I know, between a map and a set,
but a set should use a little less memory, since it doesn't have to
store both the key and the data, only the data, which is the key.

Be aware, however, that a set or a map is going to take up more memory.
This may, or may not, be a problem for you. A set and a map store their
data in chunks: each entry has its own memory. A vector stores its data
in one huge block of memory. A vector's overhead, therefore, is the
size of a pointer. A set's overhead is the size of the pointers for
each entry, plus the key entry and overhead.

Normally this wouldn't concern me that much, but you say you are using
2 million entries, and depending on the size of the entry this could be
substantial. Or not.

You might also want to look at a hash map or hash set (if it exists?)
if your data is string (text) data. I understand this improves lookup
speed even a bit more.
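
A sketch of the set-based lookup (Key is a stand-in, since the thread
doesn't say what the real entry type is):

#include <set>
#include <string>

typedef std::string Key;  // placeholder for the real entry type

bool process_packet(std::set<Key>& seen, const Key& k)
{
    if (seen.find(k) != seen.end()) {
        return true;   // hit: collect the statistics
    }
    seen.insert(k);    // miss: record it; an insert is also O(log n)
    return false;
}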
 
J

Jerry Coffin

[ ... ]
For example, I am doing the operation below in my program: whenever I
receive a packet, I check whether it is contained in my vector. So if
my vector contains around 2 million entries and I receive around 100
packets, the computation is 100 packets * 2 million iterations.

As my list keeps increasing, I will have more iterations. This is the
problem I am facing with my simulation.

[ and elsethread: ]
As I said, if the element I am checking matches one of the elements in
my vector, then I collect the statistics. If it doesn't match any of
the vector's elements, then I move this element to a new vector.

Okay, from the sound of things, the issue is really with searches, not
with the iterators per se. From your description, you're currently using
a linear search, which is pretty slow, as you've pointed out. A binary
search should be much faster (logarithmic instead of linear complexity).

An std::set or std::map will use a binary search, but that does NOT
necessarily mean it's the best structure to use. I'm not _entirely_ sure
I follow what you're saying, but it _sounds_ like the searches are being
done in a fixed set of items -- i.e. you're _not_ (for example) adding
or removing items from that set during a single run.

If that's correct, then you're probably better off continuing to use a
vector, but sorting it and using binary_search or upper_bound (or
possibly lower_bound) to search it.
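
A sketch of that lookup (int stands in for the real entry type;
lower_bound also tells you where a missing element would belong):

#include <algorithm>
#include <vector>

bool contains(const std::vector<int>& v, int key)  // v must already be sorted
{
    std::vector<int>::const_iterator it =
        std::lower_bound(v.begin(), v.end(), key);
    return it != v.end() && *it == key;
}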

Given the number of items you're dealing with, you might also want to
try using an unordered_set instead -- this uses a hash table, so its
speed is normally nearly constant regardless of the number of items
being searched.
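
A sketch with a hashed container (spelled std::tr1::unordered_set
here; C++11 and later provide it as std::unordered_set in
<unordered_set>, and the TR1 header location varies by compiler):

#include <tr1/unordered_set>  // GCC's TR1 location; adjust for your compiler
#include <string>

typedef std::string Key;  // again a stand-in for the real entry type

bool contains(const std::tr1::unordered_set<Key>& seen, const Key& k)
{
    return seen.find(k) != seen.end();  // expected O(1) per lookup
}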
 
J

James Kanze

Now we're getting somewhere. So the speed issue is on lookups. Yes, a
map should be much faster on a lookup. It is a binary tree lookup,
which is O(log n), whereas searching for data through a vector is O(n).

It sounds like, however, that you can get away with a set if your data
is your key, i.e. the thing you are looking up. There is not much
difference in lookup times, as far as I know, between a map and a set,
but a set should use a little less memory, since it doesn't have to
store both the key and the data, only the data, which is the key.

Be aware, however, that a set or a map is going to take up more memory.
This may, or may not, be a problem for you. A set and a map store their
data in chunks: each entry has its own memory. A vector stores its data
in one huge block of memory. A vector's overhead, therefore, is the
size of a pointer. A set's overhead is the size of the pointers for
each entry, plus the key entry and overhead.

The overhead is the size of several pointers (two or three) for each
entry. If the table can be filled once, up front, then a sorted vector
is definitely the way to go (using std::lower_bound for the lookup).
Or a hash table.

Normally this wouldn't concern me that much, but you say you are using
2 million entries, and depending on the size of the entry this could be
substantial. Or not.

Note that if the implementation of std::string he's using doesn't use
the small string optimization, or his strings are too long for it to
apply, a custom string class which doesn't use dynamic memory could
also make a significant difference. This largely depends on the
contents of the strings, however.

You might also want to look at a hash map or hash set (if it exists?)
if your data is string (text) data. I understand this improves lookup
speed even a bit more.

With a good hashing function, lookup in a hash table is O(1). With
something like 2 million entries, the difference between O(log n) and
O(1) can be significant. (In my own tests, hash tables beat std::map
or a sorted vector by a ratio of two to one for 1200 entries, and
almost five to one for a million entries.)
 
D

Duane Hebert

As I said, if the element I am checking matches one of the elements in
my vector, then I collect the statistics. If it doesn't match any of
the vector's elements, then I move this element to a new vector.

If you add elements to a vector, it's going to be forced to maintain
them contiguously, so it will sometimes need to copy the whole thing.
This can cause unexpected results unless you're able to reserve enough
memory.

If your slowdown is due to searches and growing the vectors, then you
could try a map (or set) and benchmark it.
 
D

Duane Hebert

With a good hashing function, lookup in a hash table is O(1). With
something like 2 million entries, the difference between O(log n) and
O(1) can be significant. (In my own tests, hash tables beat std::map
or a sorted vector by a ratio of two to one for 1200 entries, and
almost five to one for a million entries.)


True, but he's saying that his code is bottlenecked in the search, but
then if the object isn't found, he's pushing it into a vector. It's
not clear if it's the same vector.

If it is, and he needs to keep it sorted, inserting in the middle of
the vector may offset any gain from the hash table. I guess it would
depend on the percentage of misses or whatever. Just appending at the
end may involve reallocation.

While vectors are often the best choice, there are cases where we have
to look at the implementation, and maintaining contiguous values can
cause overhead that you can't estimate by complexity measures. The OP
should benchmark different strategies. My guess would be that your
idea or a map will probably be the best.
 
M

Michael DOUBEZ

Duane Hebert said:
True, but he's saying that his code is bottlenecked in the search, but
then if the object isn't found, he's pushing it into a vector. It's
not clear if it's the same vector.

If it is, and he needs to keep it sorted, inserting in the middle of
the vector may offset any gain from the hash table. I guess it would
depend on the percentage of misses or whatever. Just appending at the
end may involve reallocation.

If he is using a hash system (perhaps a std::tr1::unordered_map<>),
then the data structure and the dereferencing scheme are adapted: the
hash can map to the index in the vector, or the data can be put
directly in the hash container.

And in the case of a hash, the vector doesn't need to be sorted.

While vectors are often the best choice, there are cases where we have
to look at the implementation, and maintaining contiguous values can
cause overhead that you can't estimate by complexity measures. The OP
should benchmark different strategies. My guess would be that your
idea or a map will probably be the best.

A map only guarantees an O(log(n)) search and consumes at least 2
pointers per node for the tree (for 2 million entries, that means a
minimum of 16 MB of pointer overhead); it is a hard hit in the balance.

A vector has a cost upon insertion. But this can be mitigated by having
a sorted vector/area and an unsorted vector/area; upon a trigger (size,
time, load, reserve depleted), all elements of the unsorted area are
merged into the sorted one, as in the sketch below.
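
A sketch of that two-area scheme (all names and the trigger value are
invented for illustration):

#include <algorithm>
#include <cstddef>
#include <vector>

const std::size_t kTrigger = 10000;  // assumed merge threshold

struct TwoAreaSet
{
    std::vector<long> sorted;   // kept sorted, searched with binary_search
    std::vector<long> pending;  // recent insertions, searched linearly

    bool contains(long k) const
    {
        return std::binary_search(sorted.begin(), sorted.end(), k)
            || std::find(pending.begin(), pending.end(), k) != pending.end();
    }

    void insert(long k)
    {
        pending.push_back(k);
        if (pending.size() >= kTrigger) {  // the trigger
            sorted.insert(sorted.end(), pending.begin(), pending.end());
            std::sort(sorted.begin(), sorted.end());
            pending.clear();
        }
    }
};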

Hashes are harder to tune, but very effective in search and insertion time.
 
D

Duane Hebert

Michael DOUBEZ said:
A map only guarantees an O(log(n)) search and consumes at least 2
pointers per node for the tree (for 2 million entries, that means a
minimum of 16 MB of pointer overhead); it is a hard hit in the balance.

A vector has a cost upon insertion. But this can be mitigated by having
a sorted vector/area and an unsorted vector/area; upon a trigger (size,
time, load, reserve depleted), all elements of the unsorted area are
merged into the sorted one.

Hashes are harder to tune, but very effective in search and insertion time.

This sounds like an instance where a hard-to-tune hash may be worth
the trouble.

If the vectors don't need to be kept sorted, that helps a bunch, but
you still have to deal with the reallocation if you can't reserve a
size. Since he's talking about a large data set, this could be an issue.

This is all pretty much speculation until it's benchmarked, especially
considering that we don't know what the system limitations are. Maybe
16 MB of pointers isn't an issue in relation to maintaining contiguous
memory.
 
M

Michael DOUBEZ

Duane Hebert said:
This sounds like an instance where a hard-to-tune hash may be worth
the trouble.

If the vectors don't need to be kept sorted, that helps a bunch, but
you still have to deal with the reallocation if you can't reserve a
size. Since he's talking about a large data set, this could be an issue.

It depends on whether insertions occur a lot or not. If the number of
distinct elements grows logarithmically, I would expect insertions to
happen a lot at the beginning and less at the end; if it grows
exponentially, it is hard to handle in any case.

From tuning you can deduce this kind of parameter, and in particular
the average number of elements to reserve up front.

However, if he uses a hash, it is simpler to use it as the container
itself and not bother with a duplicated structure.

This is all pretty much speculation until it's benchmarked, especially
considering that we don't know what the system limitations are. Maybe
16 MB of pointers isn't an issue in relation to maintaining contiguous
memory.

Yes. That is why I simply stated the tradeoffs I saw without dismissing
any of them. It is true that, given the OP's problem, I don't favor the
map<> solution, because it consumes a lot for a very small gain
(insertion and search in logarithmic time). Now he may have other needs
that require a map<> (such as having a sorted container).
 
J

James Kanze

True, but he's saying that his code is bottlenecked in the search, but
then if the object isn't found, he's pushing it into a vector. It's
not clear if it's the same vector.

It's not clear at all what he does if he doesn't find the element. (At
one point, he says something about a "new" vector, but I'm not sure
what he's doing with it.)

If it is, and he needs to keep it sorted, inserting in the middle of
the vector may offset any gain from the hash table.

I'm afraid I don't understand. The hash table contains the elements
he's comparing against. Depending on what he's doing, he may want to
insert the element he didn't find into the hash table, or he may not.
There's been no indication of having to keep any sort of order (or at
least, I didn't see it).
 
