help with vector<vector<double>>

T

T. Crane

T. Crane wrote:


:::
:::::
::::: So, if you did something like:
:::::
::::: std::vector<int> VecInt;
::::: VecInt.reserve( 100 ); // sets 'capacity', not 'size'
:::::
::::: ... and then somewhere you did:
:::::
::::: VecInt.push_back( 1 ); // this is the 101st push_back
:::::
::::: ... you kind-of shot yourself in the foot (it needs to realloc
::::: and copy). No big thing for small vectors, but costs time on
::::: very big vectors. (....and you could run out of memory during
::::: the realloc/copy).
:::
::: If you know that you actually need 101 push_backs, you could do a
::: reserve(101) :)
:::
::: If you don't know, and the numbers are reasonably random, you
::: will get a good performance *on average*. Note that after
::: "shooting yourself in the foot" at 101, you get the next 99
::: push_backs for free. If you reallocate at a-million-and-one, you
::: get the next million or so push_backs for free. That's cheap!
:::
::: Bo Persson
::
:: So I was curious to see which is faster -- pushing back onto a
:: vector or using direct index access of the elements. I declare a
:: std::vector<int> v, setting the number of elements to vSize. I set
:: the values using a for-loop. Then, I declare two
:: std::vector<vector<int> > objects, m1 & m2. The first of these I
:: initialize with mSize elements, and the second I use the
:: std::vector::reserve method to claim mSize space. The using two
:: for- loops I populate m1 & m2 using direct index accessing of the
:: elements to set them equal to v, and then I use push_back to fill
:: m2. I time these two for-loops as well as the whole function.
:: What I find (perhaps not suprisingly) is that I fill m1 much, much
:: faster, i.e. push_back() with reserve() are SLOW. Anyway, here's
:: the code. Nothing to special.
::
:: #include <vector>
:: #include <iostream>
:: #include <iomanip>
:: #include <time.h>
::
:: using namespace std;
::
:: int main(){
:: time_t t0_0,t1_0,t2_0;
:: time_t t0_f,t1_f,t2_f;
::
:: t0_0 = time(NULL);
:: int vSize = 10000000;
:: int mSize = 10;
::
:: vector<int> v(vSize);
::
:: for (int i=0;i<vSize;i++){v.at(i) = i;}
::
:: vector<vector<int> > m1(mSize);
:: vector<vector<int> > m2;
:: m2.reserve(mSize);
:: t1_0 = time (NULL);
:: for (int j=0;j<mSize;j++){m1.at(j) = v;}
::
:: t1_f = time(NULL);
:: t2_0 = time(NULL);
::
:: for (int i=0; i<mSize;i++){ m2.push_back(v);}
::
:: t2_f = time(NULL);
:: t0_f = time(NULL);
:: cout << "testTime0 = " << t0_f-t0_0 << endl;
:: cout << "testTime1 = " << t1_f-t1_0 << endl;
:: cout << "testTime2 = " << t2_f-t2_0 << endl;
::
:: return 0;
:: }

Strange!

I get about equal time for both versions. In release mode it in fact
runs so fast that I get 0-1 second for all results. Changing time() to
clock(), I get something like

testTime0 = 890
testTime1 = 422
testTime2 = 406

You don't run with iterator debugging enabled, or anything?

Bo Persson

No. At least I don't think so. I'm still pretty new to the Visual
Studio IDE. When I ran the code shown here (in Visual Studio -- not a
release .exe), I got

testTime0 = 43
testTime1 = 0
testTime2 = 38

The units are all seconds.
 
B

Bo Persson

BobR wrote:
:::
::: If you don't know, and the numbers are reasonably random, you
::: will get a good performance *on average*. Note that after
::: "shooting yourself in the foot" at 101, you get the next 99
::: push_backs for free. If you reallocate at a-million-and-one, you
::: get the next million or so push_backs for free. That's cheap!
::: Bo Persson
::
:: And if your current vector is using 2/3 of available 'real
:: estate', and you exceed '.capacity()', it ain't cheap no more! <G>

If we are talking gigabytes of estimated storage, optimization might
no longer be premature. :)

In this example said:
> will work pretty much like a deque anyway. One level of indirection,
and suballocations for each row.


Bo Persson
 
B

Bo Persson

T. Crane wrote:
::: T. Crane wrote:
:::
::::: So I was curious to see which is faster -- pushing back onto a
::::: vector or using direct index access of the elements. I declare
::::: a std::vector<int> v, setting the number of elements to vSize.
::::: I set the values using a for-loop. Then, I declare two
::::: std::vector<vector<int> > objects, m1 & m2. The first of these
::::: I initialize with mSize elements, and the second I use the
::::: std::vector::reserve method to claim mSize space. The using two
::::: for- loops I populate m1 & m2 using direct index accessing of
::::: the elements to set them equal to v, and then I use push_back
::::: to fill m2. I time these two for-loops as well as the whole
::::: function.
::::: What I find (perhaps not suprisingly) is that I fill m1 much,
::::: much faster, i.e. push_back() with reserve() are SLOW. Anyway,
::::: here's the code. Nothing to special.
:::::
::::: #include <vector>
::::: #include <iostream>
::::: #include <iomanip>
::::: #include <time.h>
:::::
::::: using namespace std;
:::::
::::: int main(){
::::: time_t t0_0,t1_0,t2_0;
::::: time_t t0_f,t1_f,t2_f;
:::::
::::: t0_0 = time(NULL);
::::: int vSize = 10000000;
::::: int mSize = 10;
:::::
::::: vector<int> v(vSize);
:::::
::::: for (int i=0;i<vSize;i++){v.at(i) = i;}
:::::
::::: vector<vector<int> > m1(mSize);
::::: vector<vector<int> > m2;
::::: m2.reserve(mSize);
::::: t1_0 = time (NULL);
::::: for (int j=0;j<mSize;j++){m1.at(j) = v;}
:::::
::::: t1_f = time(NULL);
::::: t2_0 = time(NULL);
:::::
::::: for (int i=0; i<mSize;i++){ m2.push_back(v);}
:::::
::::: t2_f = time(NULL);
::::: t0_f = time(NULL);
::::: cout << "testTime0 = " << t0_f-t0_0 << endl;
::::: cout << "testTime1 = " << t1_f-t1_0 << endl;
::::: cout << "testTime2 = " << t2_f-t2_0 << endl;
:::::
::::: return 0;
::::: }
:::
::: Strange!
:::
::: I get about equal time for both versions. In release mode it in
::: fact runs so fast that I get 0-1 second for all results. Changing
::: time() to clock(), I get something like
:::
::: testTime0 = 890
::: testTime1 = 422
::: testTime2 = 406
:::
::: You don't run with iterator debugging enabled, or anything?
:::
::: Bo Persson
::
:: No. At least I don't think so. I'm still pretty new to the Visual
:: Studio IDE. When I ran the code shown here (in Visual Studio --
:: not a release .exe), I got
::
:: testTime0 = 43
:: testTime1 = 0
:: testTime2 = 38
::
:: The units are all seconds.

Ok, I have another guess: I have 2 GB of RAM and you have less?

If I change mSize to 25, I get the same result as you have -- that the
second test always runs slower!

Try running them in the opposite order, and you will probably get the
opposite result. Benchmarks are just so much fun!



Bo Persson
 
B

BobR

T. Crane said:
So I was curious to see which is faster -- pushing back onto a vector
or using direct index access of the elements. I declare a
std::vector<int> v, setting the number of elements to vSize. I set
the values using a for-loop. Then, I declare two
std::vector<vector<int> > objects, m1 & m2. The first of these I
initialize with mSize elements, and the second I use the
std::vector::reserve method to claim mSize space. The using two for-
loops I populate m1 & m2 using direct index accessing of the elements
to set them equal to v, and then I use push_back to fill m2. I time
these two for-loops as well as the whole function. What I find
(perhaps not suprisingly) is that I fill m1 much, much faster, i.e.
push_back() with reserve() are SLOW. Anyway, here's the code.
Nothing to special.

#include <vector>
#include <iostream>
#include <iomanip>
#include <time.h>

using namespace std;

int main(){
time_t t0_0,t1_0,t2_0;
time_t t0_f,t1_f,t2_f;

t0_0 = time(NULL);
int vSize = 10000000;
int mSize = 10;

vector<int> v(vSize);

for (int i=0;i<vSize;i++){v.at(i) = i;}

vector<vector<int> > m1(mSize);
vector<vector<int> > m2;
m2.reserve(mSize);
t1_0 = time (NULL);
for (int j=0;j<mSize;j++){m1.at(j) = v;}

t1_f = time(NULL);
t2_0 = time(NULL);

for (int i=0; i<mSize;i++){ m2.push_back(v);}

t2_f = time(NULL);
t0_f = time(NULL);
cout << "testTime0 = " << t0_f-t0_0 << endl;
cout << "testTime1 = " << t1_f-t1_0 << endl;
cout << "testTime2 = " << t2_f-t2_0 << endl;

return 0;
}

[ my opinion ]
And your results may be false. Put the vector operations in functions, and
'call' them at least 1000 times, capturing the elapsed time each pass.
Examine the mean and deviations of the times.

If you are running on an OS, the OS can intefere with the times you got.
(memory management, other programs running (background), etc.)

I've run some time tests on vectors that iterated 100 times[1], and the
result fluctuated between 50 to 150 ticks ( std::clock_t() ) on multiple
runs.
V1 might take 50 ticks and V2 150 ticks on one run, and the times reverse on
the next run.

[1] - construct, fill, destruct cycle.
 
J

Juha Nieminen

Sorry to interrupt, but I have been using this method to initialize my
2D-Array for awhile

std::vector< std::vector <double> > array(nRow,
std::vector<double>(nCol, 0.0));

and fill it up index-wise, especially if the data is supposed to be
all over the place and not "column-wise" or "row-wise" insertion.

It works for me so far, however, being a newbie in programming, can
someone tell me what is the drawbacks of using this type of
initialization as opposed to methods suggested beforehand?

There's nothing wrong with that type of initialization. However,
it assumes that you know the dimensions in advance. If you don't
know them (which is not an unusual situation) then you obviously
can't allocate the space beforehand, but you have to rely on
dynamically increasing the size of the vectors as more and more
data is inserted into them.
If that is the case then in some situations it's more efficient
to use a std::deque instead of a std::vector, especially if copying
the elements is a slow operation.
 
J

Juha Nieminen

aku said:
If the nColumns is same for all rows you are better off allocating one
array which contains nColumns * nRows objects.

double* array = new double[nColumns * nRows];

Why do it that way and risk a memory leak, instead of doing it like:

std::vector<double> array(nColumns * nRows);

?

The std::vector is doing effectively the same thing, but it's safer.
 
A

aku ankka

aku said:
If the nColumns is same for all rows you are better off allocating one
array which contains nColumns * nRows objects.
double* array = new double[nColumns * nRows];

Why do it that way and risk a memory leak, instead of doing it like:

std::vector<double> array(nColumns * nRows);

?

The std::vector is doing effectively the same thing, but it's safer.

That's pretty irrelevant for my interests, the point is that vector of
vector's isn't what *I* would do for 2d array (most of the time).
 
A

Alf P. Steinbach

* aku ankka:
aku said:
If the nColumns is same for all rows you are better off allocating one
array which contains nColumns * nRows objects.
double* array = new double[nColumns * nRows];
Why do it that way and risk a memory leak, instead of doing it like:

std::vector<double> array(nColumns * nRows);

?

The std::vector is doing effectively the same thing, but it's safer.

That's pretty irrelevant for my interests, the point is that vector of
vector's isn't what *I* would do for 2d array (most of the time).

Try reading that posting again.
 
J

Juha Nieminen

aku said:
That's pretty irrelevant for my interests, the point is that vector of
vector's isn't what *I* would do for 2d array (most of the time).

Code safety is pretty irrelevant for you? I sure hope you are not
programming professionally.
 
J

James Kanze

I'm struggling with how to initialize a vector<vector<double>>
object. I'm pulling data out of a file and storing it in the
vector<vector<double>> object. Because any given file will have a
large amount of data, that I read off using an ifstream object, I
don't want to use the push_back method because this grows the
vector<vector<double>> dynamically, and that will kill my execution
time. So, I want to reserve space first, using, of course the reserve
method. However, I'm not sure what the best way of doing this is.
Here's what I am thinking of doing (I don't even know if this will
work) but if there's a better way to do it, I'm all ears:
#include <vector>
int nColumns = 10;
int nRows = 15;
vector<vector<double>> myData;
myData.reserve(nRows);

for (int i;i<nRows;i++){
myData.reserve(nColumns);
}


Is there some reason you don't write just:

std::vector< std::vector< double > >
myData( nRows,
std::vector< double >( nColumns ) ) ;

This will actually initialize all of the data with 0.0, which
you might not need, but it avoids the need of push_back
entirely.
 
A

aku ankka

Code safety is pretty irrelevant for you? I sure hope you are not
programming professionally.

No, discussing it with _you_ is irrelevant for me, because that
doesn't touch the point I was making: using vector of vectors for
presenting 2d array is lame.

I'm sure a smart guy like you can see the difference.
 
A

aku ankka

That's pretty irrelevant for my interests, the point is that vector of
Try reading that posting again.

It adds irrelevant detail, which might be interesting if array vs.
vector were the topic. If I wanted to add more information into the
original posting I would have ranted ten pages on the virtues of
different containers, but I wasn't posting about that were I?

Nothing wrong people suggesting deques and what not as "optimization"
for such trivial task? Pffff.
 
B

BobR

Bo Persson said:
::
:: And if your current vector is using 2/3 of available 'real
:: estate', and you exceed '.capacity()', it ain't cheap no more! <G>

If we are talking gigabytes of estimated storage, optimization might
no longer be premature. :)

Never thought I'd see the day that someone would say that in this NG, even
in jest. <G>
 
J

Juha Nieminen

aku said:
No, discussing it with _you_ is irrelevant for me, because that
doesn't touch the point I was making: using vector of vectors for
presenting 2d array is lame.

I'm sure a smart guy like you can see the difference.

The problem is that by using 'new' instead of a std::vector you
are teaching bad habits. The exact same idea of using a 1-dimensional
vector to represent a 2-dimensional vector could have been done with
a std::vector, and it would have been a much better example.
In the worst case the original poster might get the idea that using
'new' instead of std::vector is a better solution.
 
A

aku ankka

The problem is that by using 'new' instead of a std::vector you
are teaching bad habits. The exact same idea of using a 1-dimensional
vector to represent a 2-dimensional vector could have been done with
a std::vector, and it would have been a much better example.
In the worst case the original poster might get the idea that using
'new' instead of std::vector is a better solution.

That is still a topic I am not interested in discussing with you. He
can use whatever he sees fit - the optimization has been done; now it
is issue of style, safety, yada yada yadayaa, on and on.
 
M

mistabean

There's nothing wrong with that type of initialization. However,
it assumes that you know the dimensions in advance. If you don't
know them (which is not an unusual situation) then you obviously
can't allocate the space beforehand, but you have to rely on
dynamically increasing the size of the vectors as more and more
data is inserted into them.
If that is the case then in some situations it's more efficient
to use a std::deque instead of a std::vector, especially if copying
the elements is a slow operation.

ahaaa.. so nothing wrong with that one then? In my programs that I am
currently doing, nRow and nCol are always calculated during runtime,
depending on what type of datas that I have.

And I found it much easier to debug, as opposed to push_back()'s. Of
course, it is also a matter of me always forgetting to clear(), when I
am doing the filling-up over and over again, and the std::vector is
either a return parameter, or declared outside the loop.

And have to read up more about std::deque, since I have one one-
dimensional std::vector that is initialized at start-up with an
initial size, but will be growing during runtime. Thanks for the
advice.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,608
Members
45,242
Latest member
KendrickKo

Latest Threads

Top