STL container question

B

Boltar

Hi

I need to store a number of integer values which I will then search on
later to see if they exist in my container. Can someone tell me which
container would be quickest for finding these values? I can't use a
plain C array (unless I make it 2^32 in size!) since I don't know the
max integer value.

Thanks for any help

B2003
 
L

Lars Tetzlaff

Boltar said:
Hi

I need to store a number of integer values which I will then search on
later to see if they exist in my container. Can someone tell me which
container would be quickest for finding these values? I can't use a
plain C array (unless I make it 2^32 in size!) since I don't know the
max integer value.

Thanks for any help

B2003

std::set

Lars
 
I

Ioannis Vranos

Victor said:
Store first, then sort, then search (using 'std::binary_search'), you
could just use 'std::vector'.


For that case, I think std::list is a better option, since the sorting
will be faster,
 
I

Ioannis Vranos

Victor said:
Do you have any proof of that?


Lists are implemented using pointers to point to the previous and to the
next elements, so list::sort(), is more efficient by changing pointer
values, while sorting a vector involves copying objects.
 
J

Juha Nieminen

Ioannis said:
Lists are implemented using pointers to point to the previous and to the
next elements, so list::sort(), is more efficient by changing pointer
values, while sorting a vector involves copying objects.

The original poster talked about storing integer values. I highly
doubt sorting a list of integers will be faster than sorting an array of
integers. In fact, I'm pretty sure of the contrary.
 
J

Juha Nieminen

Boltar said:
I need to store a number of integer values which I will then search on
later to see if they exist in my container. Can someone tell me which
container would be quickest for finding these values? I can't use a
plain C array (unless I make it 2^32 in size!) since I don't know the
max integer value.

If memory usage is not an issue, std::set is by far the easiest solution.

(OTOH if the amount of integers can be counted in the millions, then
it may be better to use a sorted vector or whatever.)
 
R

Rolf Magnus

Ioannis said:
Lists are implemented using pointers to point to the previous and to the
next elements, so list::sort(), is more efficient by changing pointer
values, while sorting a vector involves copying objects.

And you really think that doing two pointer exchanges is faster than one
integer exchange?
 
R

Rolf Magnus

Jeff said:
Sorted vector. See Effective STL, Item 23.

For the record, you wouldn't 2^32 integers, just 2^32 bits = 500 MiB.
It's actually not that much RAM, depending on your target system, and
would let you check for integers with O(1) complexity (rather than O(log
N)).

However, it can still be slower, since it's more or less the worst thing you
can do to the cache.
 
E

Erik Wikström

However, it can still be slower, since it's more or less the worst thing you
can do to the cache.

Still, you should only get one cache-miss when looking for a value, if
you use a set or vector you will probably get more.
 
L

Lars Tetzlaff

Jeff said:
Sorted vector. See Effective STL, Item 23.

For the record, you wouldn't 2^32 integers, just 2^32 bits = 500 MiB.
It's actually not that much RAM, depending on your target system, and
would let you check for integers with O(1) complexity (rather than O(log
N)).

How do you get O(1) from a sorted vector<int>? A binary search costs
O(log N).

Lars
 
L

Lars Tetzlaff

Victor said:
Jeff meant one would get O(1) from looking up in the full array of bits.
Every integer would mean a shift to form the index and a mask to get to
the bit value, one shift, one pointer addition, one dereference, one
bitwise AND per lookup, O(1).

V

I see, something like vector<bool>. Ok, that would be fast.

Lars
 
R

Rolf Magnus

Erik said:
Still, you should only get one cache-miss when looking for a value, if
you use a set or vector you will probably get more.

If you only do a single look-up, yes. But in that case, building up an array
of 500 Megabytes will take a quite significant amount of time. ;-)
 
J

James Kanze

Sorted vector. See Effective STL, Item 23.
For the record, you wouldn't 2^32 integers, just 2^32 bits =
500 MiB. It's actually not that much RAM, depending on your
target system, and would let you check for integers with O(1)
complexity (rather than O(log N)).

Some systems won't allow you to allocate that much as a local
variable, so you might as well use std::vector. If the
reallocations are an issue, one call up front to capacity should
solve that.
 
J

James Kanze

Still, you should only get one cache-miss when looking for a
value, if you use a set or vector you will probably get more.

One thing I don't understand here: both a C style array and
std::vector use a single block of contiguous memory. How could
cache performance be any different for them?

(An earlier suggestion to use std::list does suffer from this,
of course.)
 
J

James Kanze

In other words, no. <g> One could also point out that
quicksort can be used to sort a vector but can't be used on a
list, so obviously sorting a vector is faster. The problem
with both arguments is exactly what Victor implied: they're
handwaving. Proof requires much more detailed analysis, or if
you're making a decision for a particular platform,
measurement.

To other considerations: you can't use binary_search on
std::list---with any sort of size, that's going to make a
significant difference in favor of vector. And of course, an
std::list has horrible locality, which will translate into a lot
of cache misses (or even virtual page faults). So even if
sorting it were faster (which I somehow doubt)...
 
R

Rolf Magnus

James said:
One thing I don't understand here: both a C style array and
std::vector use a single block of contiguous memory. How could
cache performance be any different for them?

We're not talking about C style arrays vs. vector, but about different
algorithms. One is to storing the values in a sorted array/vector, then
doing a binary search for the value. The other is like a trivial kind of
hash table, where each possible integer value has a boolean entry, so you
can simply use the value as index. Saves the search, but needs a big amount
of memory that is much larger than the cache.
 
I

Ioannis Vranos

Juha said:
The original poster talked about storing integer values. I highly
doubt sorting a list of integers will be faster than sorting an array of
integers. In fact, I'm pretty sure of the contrary.


We must think generally. In general, sorting a list is faster than
sorting a vector, because the list sorting does not involve the
construction or destruction of any object.

Regarding ints, I think sorting a vector of ints and as list of ints,
both have about the same efficiency.


If the programmer decides to replace ints with other objects, he will
not have to change much in the code, if he uses a list.
 
I

Ioannis Vranos

Pete said:
>
In other words, no. <g> One could also point out that quicksort can be
used to sort a vector but can't be used on a list, so obviously sorting
a vector is faster.


list::sort is faster than any sort on vector, because it does not
involve the construction or destruction of any object.
 
I

Ioannis Vranos

James said:
To other considerations: you can't use binary_search on
std::list---with any sort of size, that's going to make a
significant difference in favor of vector.


Of course you can:


#include <algorithm>
#include <list>
#include <cstddef>
#include <iostream>


int main()
{
using namespace std;

typedef list<int> intlist;

intlist ilist;


for (size_t i= 0; i< 100; ++i)
ilist.push_back(100-i);


ilist.sort();


for(intlist::const_iterator p= ilist.begin(); p!= ilist.end(); ++p)
cout<< *p<< endl;


binary_search(ilist.begin(), ilist.end(), 50);
}
 
H

Hendrik Schober

Ioannis said:
[...]
We must think generally. In general, sorting a list is faster than
sorting a vector, because the list sorting does not involve the
construction or destruction of any object.

Regarding ints, I think sorting a vector of ints and as list of ints,
both have about the same efficiency.

Why don't you just measure before you doubt the statements
of those who already went and did this?

On my platform, this

#include <iostream>
#include <vector>
#include <list>
#include <algorithm>

const unsigned int kLimit = 1000000;

template< class Cont >
void fill(Cont& cont)
{
for( unsigned int u = 0; u < kLimit; ++u ) {
cont.push_back(u);
}
}

template< class Cont >
void test(Cont& cont);

int main()
{
std::vector<unsigned int> v; v.reserve(kLimit);
std::list<unsigned int> l;

std::cout << "filling a vector..." << std::endl;
fill(v);
std::cout << "filling a list..." << std::endl;
fill(l);
std::cout << "...done.\n";

std::cout << "sorting a vector..." << std::endl;
test(v);
std::cout << "sorting a list..." << std::endl;
test(l);

return 0;
}

template< typename T, class Al >
inline void sort(std::vector<T,Al>& v) {std::sort(v.begin(),v.end());}

template< typename T, class Al >
inline void sort(std::list<T,Al>& l) {l.sort();}

#include <windows.h> //for GetTickCount()

template< class Cont >
void test(Cont& cont) {
const DWORD start = GetTickCount();
sort(cont);
std::cout << "...took " << GetTickCount()-start << "msecs." << std::endl;
}

outputs
filling a vector...
filling a list...
...done.
sorting a vector...
...took 47msecs.
sorting a list...
...took 1562msecs.
and thus agrees with everyone who disagreed with you.
If the programmer decides to replace ints with other objects, he will
not have to change much in the code, if he uses a list.

Right. It wouldn't do any good anyway as this

#include <iostream>
#include <vector>
#include <list>
#include <algorithm>
#include <string>

const unsigned int kLimit = 1000000;

template< class Cont >
void fill(Cont& cont)
{
for( unsigned int u = 0; u < kLimit; ++u ) {
cont.push_back(Test());
}
}

template< class Cont >
void test(Cont& cont);

class Test {
public:
Test() : instance_(++id_), str_("test it!") {}
bool operator<(const Test& rhs) {return instance_>rhs.instance_;}
private:
unsigned int instance_;
static unsigned int id_;
std::string str_;
};

unsigned int Test::id_ = 0;

int main()
{
std::vector<Test> v; v.reserve(kLimit);
std::list<Test> l;

std::cout << "filling a vector..." << std::endl;
fill(v);
std::cout << "filling a list..." << std::endl;
fill(l);
std::cout << "...done.\n";

std::cout << "sorting a vector..." << std::endl;
test(v);
std::cout << "sorting a list..." << std::endl;
test(l);

return 0;
}

template< typename T, class Al >
inline void sort(std::vector<T,Al>& v) {std::sort(v.begin(),v.end());}

template< typename T, class Al >
inline void sort(std::list<T,Al>& l) {l.sort();}

#include <windows.h> //for GetTickCount()

template< class Cont >
void test(Cont& cont) {
const DWORD start = GetTickCount();
sort(cont);
std::cout << "...took " << GetTickCount()-start << "msecs." << std::endl;
}

outputs

filling a vector...
filling a list...
...done.
sorting a vector...
...took 829msecs.
sorting a list...
...took 2437msecs.

and thus again disagrees with you.

Eagerly awaiting your counter example,

Scho-you-can-take-your-foot-out-of-your-mouth-now-bi
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,801
Messages
2,569,658
Members
45,421
Latest member
DoreenCorn

Latest Threads

Top