help with vector<vector<double>>

T

T. Crane

Hi,

I'm struggling with how to initialize a vector<vector<double>>
object. I'm pulling data out of a file and storing it in the
vector<vector<double>> object. Because any given file will have a
large amount of data, that I read off using an ifstream object, I
don't want to use the push_back method because this grows the
vector<vector<double>> dynamically, and that will kill my execution
time. So, I want to reserve space first, using, of course the reserve
method. However, I'm not sure what the best way of doing this is.
Here's what I am thinking of doing (I don't even know if this will
work) but if there's a better way to do it, I'm all ears:

#include <vector>

int nColumns = 10;
int nRows = 15;

vector<vector<double>> myData;

myData.reserve(nRows);

for (int i;i<nRows;i++){
myData.reserve(nColumns);
}


thanks,
trevis
 
A

Alf P. Steinbach

* T. Crane:
Hi,

I'm struggling with how to initialize a vector<vector<double>>
object. I'm pulling data out of a file and storing it in the
vector<vector<double>> object. Because any given file will have a
large amount of data, that I read off using an ifstream object, I
don't want to use the push_back method because this grows the
vector<vector<double>> dynamically, and that will kill my execution
time.

Unwarranted assumption.

First, make it correct.

Then, make it correct.
 
P

Philip Potter

T. Crane said:
#include <vector>

int nColumns = 10;
int nRows = 15;

vector<vector<double>> myData;

myData.reserve(nRows);

for (int i;i<nRows;i++){
myData.reserve(nColumns);
}


This code is broken. vector::reserve() doesn't actually create or
initialize objects in the vector; it only allocates space for objects
which have yet to be inserted using push_back() or whatever. As a
result, even after the myData.reserve() call, myData contains no items
and myData refers to an item which doesn't exist.

As Alf says, you may not need to reserve at all. First, make it correct.
Then, only if necessary, make it faster.

Phil
 
T

T. Crane

T. Crane said:
#include <vector>
int nColumns = 10;
int nRows = 15;
vector<vector<double>> myData;
myData.reserve(nRows);

for (int i;i<nRows;i++){
myData.reserve(nColumns);
}


This code is broken. vector::reserve() doesn't actually create or
initialize objects in the vector; it only allocates space for objects
which have yet to be inserted using push_back() or whatever. As a
result, even after the myData.reserve() call, myData contains no items
and myData refers to an item which doesn't exist.

As Alf says, you may not need to reserve at all. First, make it correct.
Then, only if necessary, make it faster.

Phil


OK -- that helps some. A related question then: if I have a vector
object and I don't reserve any space beforehand, as I understand it if
I push_back values into the vector, this will cause at some point the
vector object to exceed its initial capacity. At this point the
vector will identify a contiguous block of memory that is large enough
to accomodate the original vector plus the additional data. It will
then make a copy of itself and the new data into the new block of
memory. Is this basically how it works?

If it is, I call this growing the vector dynamically. This is in
contrast to reserving one chunk of memory that's big enough to hold
everything you want at the beginning and then populating it using
push_back. In this manner, there is never a reason to go out and find
a new chunk of contiguous memory.

Does this clarify what I'm getting at? And yes, I understand that I
could get it right first and then get it faster, but if I can do both
at once, why not?

thanks for your help,
trevis
 
B

Bo Persson

T. Crane wrote:
:: On Aug 9, 1:17 pm, Philip Potter <[email protected]>
:: wrote:
::: T. Crane wrote:
:::: #include <vector>
:::
:::: int nColumns = 10;
:::: int nRows = 15;
:::
:::: vector<vector<double>> myData;
:::
:::: myData.reserve(nRows);
:::
:::: for (int i;i<nRows;i++){
:::: myData.reserve(nColumns);
:::: }
:::
::: This code is broken. vector::reserve() doesn't actually create or
::: initialize objects in the vector; it only allocates space for
::: objects which have yet to be inserted using push_back() or
::: whatever. As a result, even after the myData.reserve() call,
::: myData contains no items and myData refers to an item which
::: doesn't exist.
:::
::: As Alf says, you may not need to reserve at all. First, make it
::: correct. Then, only if necessary, make it faster.
:::
::: Phil
::
:: OK -- that helps some. A related question then: if I have a
:: vector object and I don't reserve any space beforehand, as I
:: understand it if I push_back values into the vector, this will
:: cause at some point the vector object to exceed its initial
:: capacity. At this point the vector will identify a contiguous
:: block of memory that is large enough to accomodate the original
:: vector plus the additional data. It will then make a copy of
:: itself and the new data into the new block of memory. Is this
:: basically how it works?

About right, yes. It also uses additional features, like for each time
the data is copied it allocates increasingly larger and larger blocks
of memory, so that it will not have to make additional copies for a
long time.

If your real sizes are 10 and 15, you definitely don't have to worry.
If you have 100s of millions of elements, you might have to. But you
will have to try it out first!


Bo Persson
 
T

T. Crane

T. Crane wrote:

::: T. Crane wrote:

:::: #include <vector>
:::
:::: int nColumns = 10;
:::: int nRows = 15;
:::
:::: vector<vector<double>> myData;
:::
:::: myData.reserve(nRows);
:::
:::: for (int i;i<nRows;i++){
:::: myData.reserve(nColumns);
:::: }
:::
::: This code is broken. vector::reserve() doesn't actually create or
::: initialize objects in the vector; it only allocates space for
::: objects which have yet to be inserted using push_back() or
::: whatever. As a result, even after the myData.reserve() call,
::: myData contains no items and myData refers to an item which
::: doesn't exist.
:::
::: As Alf says, you may not need to reserve at all. First, make it
::: correct. Then, only if necessary, make it faster.
:::
::: Phil
::
:: OK -- that helps some. A related question then: if I have a
:: vector object and I don't reserve any space beforehand, as I
:: understand it if I push_back values into the vector, this will
:: cause at some point the vector object to exceed its initial
:: capacity. At this point the vector will identify a contiguous
:: block of memory that is large enough to accomodate the original
:: vector plus the additional data. It will then make a copy of
:: itself and the new data into the new block of memory. Is this
:: basically how it works?

About right, yes. It also uses additional features, like for each time
the data is copied it allocates increasingly larger and larger blocks
of memory, so that it will not have to make additional copies for a
long time.

If your real sizes are 10 and 15, you definitely don't have to worry.
If you have 100s of millions of elements, you might have to. But you
will have to try it out first!

Bo Persson


My real sizes are:

100 data points cubed x 2 (two channels) x ~ 8 (different parameter
sets) ~= 16 million data points. Each data point is a 4 element
vector.

So, I'll try this first (as an example):

#include <vector>

int nColumns = 2;
int nRows = 3;

vector<float> v1(3),v2(3);
vector<vector<float>> m(2);

v1.push_back(0.23);
v1.push_back(0.36);
v1.push_back(1.54);

v2.push_back(0.76);
v2.push_back(0.86);
v2.push_back(2.92);

m.push_back(v1);
m.push_back(v2);

First off, is this the "recommended" way of populating a 2D vector?
Second, if I find that I really, really want to use vector::reserve(),
how would I do it for a 2D vector?

#include <vector>

vector<float> v;

v.reserve(50);

vector<vector<float>> m;

m.reserve(...?) How do I tell it to reserve enough memory to hold n
vector elements like v?

thanks again for your help,
trevis
 
B

BobR

T. Crane said:
Hi,

I'm struggling with how to initialize a vector<vector<double>>
object. I'm pulling data out of a file and storing it in the
vector<vector<double>> object. Because any given file will have a
large amount of data, that I read off using an ifstream object, I
don't want to use the push_back method because this grows the
vector<vector<double>> dynamically, and that will kill my execution
time. So, I want to reserve space first, using, of course the reserve
method. However, I'm not sure what the best way of doing this is.
Here's what I am thinking of doing (I don't even know if this will
work) but if there's a better way to do it, I'm all ears:

#include <vector>

int nColumns = 10;
int nRows = 15;

vector<vector<double>> myData;

myData.reserve(nRows);

for (int i;i<nRows;i++){

No, no. Use:

for( size_t i(0); i < myData.size(); ++i){

That way it will do nothing (Philip Potter explained why).
myData.reserve(nColumns);
}

thanks, trevis


size_t const nColumns( 10 );
size_t const nRows( 15 );

std::vector<std::vector<double> > vec2D( nRows,
std::vector<double>( nColumns, 123.4567 ) );

If you don't want to initialize the doubles:

std::vector<std::vector<double> > vec2D( nRows, nColumns );

for( size_t x(0); x < vec2D.size(); ++x){
for( size_t y(0); y < vec2D.at(x).size(); ++y){
// get DoubleFromFile
vec2D.at( x ).at( y ) = DoubleFromFile;
// - possibly -
// if( MyFile ){
// MyFile >> vec2D.at( x ).at( y );
// }
} // for(y)
} // for(x)

std::vector also has a resize().
vec2D.at( 0 ).resize( 20 );
 
T

T. Crane

I'm struggling with how to initialize a vector<vector<double>>
object. I'm pulling data out of a file and storing it in the
vector<vector<double>> object. Because any given file will have a
large amount of data, that I read off using an ifstream object, I
don't want to use the push_back method because this grows the
vector<vector<double>> dynamically, and that will kill my execution
time. So, I want to reserve space first, using, of course the reserve
method. However, I'm not sure what the best way of doing this is.
Here's what I am thinking of doing (I don't even know if this will
work) but if there's a better way to do it, I'm all ears:
#include <vector>
int nColumns = 10;
int nRows = 15;
vector<vector<double>> myData;

for (int i;i<nRows;i++){

No, no. Use:

for( size_t i(0); i < myData.size(); ++i){

That way it will do nothing (Philip Potter explained why).
myData.reserve(nColumns);
}

thanks, trevis

size_t const nColumns( 10 );
size_t const nRows( 15 );

std::vector<std::vector<double> > vec2D( nRows,
std::vector<double>( nColumns, 123.4567 ) );

If you don't want to initialize the doubles:

std::vector<std::vector<double> > vec2D( nRows, nColumns );

for( size_t x(0); x < vec2D.size(); ++x){
for( size_t y(0); y < vec2D.at(x).size(); ++y){
// get DoubleFromFile
vec2D.at( x ).at( y ) = DoubleFromFile;
// - possibly -
// if( MyFile ){
// MyFile >> vec2D.at( x ).at( y );
// }
} // for(y)
} // for(x)

std::vector also has a resize().
vec2D.at( 0 ).resize( 20 );

--
Bob R
POVrookie- Hide quoted text -

- Show quoted text -


Thanks for your help. I've got to play with it now.
 
J

Juha Nieminen

If you are worried about the copying overhead caused when the vector
increases its own size because of repeated push_backs, you might want
to consider using a std::deque instead. std::deque supports random
access iterators too, but its advantage is that it doesn't need to move
any existing elements when it allocates more space for itself (the
disadvantage is that its memory is not allocated contiguously, which is
usually not a problem, and that indexing might be slightly slower than
with a vector, which also isn't often a problem either). And of course
you get a constant-time push_front too, if you need it.
 
M

mistabean

Sorry to interrupt, but I have been using this method to initialize my
2D-Array for awhile

std::vector< std::vector <double> > array(nRow,
std::vector<double>(nCol, 0.0));

and fill it up index-wise, especially if the data is supposed to be
all over the place and not "column-wise" or "row-wise" insertion.

It works for me so far, however, being a newbie in programming, can
someone tell me what is the drawbacks of using this type of
initialization as opposed to methods suggested beforehand?
 
B

BobR

Sorry to interrupt, but I have been using this method to initialize my
2D-Array for awhile

std::vector< std::vector <double> > array(nRow,
std::vector<double>(nCol, 0.0));

and fill it up index-wise, especially if the data is supposed to be
all over the place and not "column-wise" or "row-wise" insertion.

It works for me so far, however, being a newbie in programming, can
someone tell me what is the drawbacks of using this type of
initialization as opposed to methods suggested beforehand?

If you know the data size, it's best to reserve the memory. If you don't
know data size, push_back, and let it grow dynamically.

Check-out the output from this snippet (shown at end):

void PrintVec( std::vector<int> const &vec, std::eek:stream &sout){
sout<<" size="<<vec.size()
<<" cap="<<vec.capacity()<<std::endl;
return;
} // PrintVec(vector<int> const&,ostream&)

void VecIntSize( std::eek:stream &cout ){
cout<<"\n--- VecInt(10) size test ---"<<std::endl;
std::vector<int> VecInt( 10 );
PrintVec( VecInt, cout);
VecInt.push_back( 1 );
PrintVec( VecInt, cout);
for(size_t i(0); i < 11; ++i){ VecInt.push_back( i );}
PrintVec( VecInt, cout);
VecInt.resize( 50 );
PrintVec( VecInt, cout);
VecInt.push_back( 1 );
PrintVec( VecInt, cout);
VecInt.resize( 40 );
PrintVec( VecInt, cout);

cout<<" std::vector<int>().swap( VecInt );"<<std::endl;
std::vector<int>().swap( VecInt );
PrintVec( VecInt, cout);

cout<<"--- VecInt(10) size test ---END"<<std::endl;
return;
}

/* --- VecInt(10) size test ---
size=10 cap=10
size=11 cap=20 // note how capacity doubled
size=22 cap=40 // .... and again
size=50 cap=50 // resize
size=51 cap=100 // push_back
size=40 cap=100 // size reduced, capacity stayed same
std::vector<int>( ).swap( VecInt ); // funky, but 'resets' vector
size=0 cap=0
--- VecInt(10) size test ---END
*/

So, if you did something like:

std::vector<int> VecInt;
VecInt.reserve( 100 ); // sets 'capacity', not 'size'

.... and then somewhere you did:

VecInt.push_back( 1 ); // this is the 101st push_back

.... you kind-of shot yourself in the foot (it needs to realloc and copy).
No big thing for small vectors, but costs time on very big vectors.
(....and you could run out of memory during the realloc/copy).
 
F

Frank Birbacher

BobR said:
VecInt.push_back( 1 ); // this is the 101st push_back

... you kind-of shot yourself in the foot (it needs to realloc and copy).
No big thing for small vectors, but costs time on very big vectors.
(....and you could run out of memory during the realloc/copy).

Actually using push_back n times on an empty vector costs O(n). Thus
from complexity theory it costs nothing extra.

In practise I'd prefer std::deque.

Frank
 
A

aku ankka

If the nColumns is same for all rows you are better off allocating one
array which contains nColumns * nRows objects.

double* array = new double[nColumns * nRows];

Indexing a single object:

// object at (x,y) is ..
double* object = array + nColumns * y + x;

It can be argued if vector of vector's for 2d matrix is better or vice-
versa, but how does that sound to you?
 
B

BobR

Frank Birbacher said:
Actually using push_back n times on an empty vector costs O(n). Thus
from complexity theory it costs nothing extra.

But, I was talking about when '.size()' passes '.capacity()'.
Ref: // this is the 101st push_back
.... when capacity was 100.
In practise I'd prefer std::deque.
Frank

Depends on what you're doing.

[ note in old STL docs ]
"
[5] Vector is usually preferable to deque and list. Deque is useful in the
case of frequent insertions at both the beginning and end of the sequence,
and list and slist are useful in the case of frequent insertions in the
middle of the sequence. In almost all other situations, vector is more
efficient.
"
 
B

BobR

aku ankka said:
If the nColumns is same for all rows you are better off allocating one
array which contains nColumns * nRows objects.

double* array = new double[nColumns * nRows];

Indexing a single object:

// object at (x,y) is ..
double* object = array + nColumns * y + x;

And later:

delete [] array;

<G>
 
B

Bo Persson

BobR wrote:
::
:: So, if you did something like:
::
:: std::vector<int> VecInt;
:: VecInt.reserve( 100 ); // sets 'capacity', not 'size'
::
:: ... and then somewhere you did:
::
:: VecInt.push_back( 1 ); // this is the 101st push_back
::
:: ... you kind-of shot yourself in the foot (it needs to realloc and
:: copy). No big thing for small vectors, but costs time on very big
:: vectors. (....and you could run out of memory during the
:: realloc/copy).

If you know that you actually need 101 push_backs, you could do a
reserve(101) :)

If you don't know, and the numbers are reasonably random, you will get
a good performance *on average*. Note that after "shooting yourself in
the foot" at 101, you get the next 99 push_backs for free. If you
reallocate at a-million-and-one, you get the next million or so
push_backs for free. That's cheap!


Bo Persson
 
T

T. Crane

BobR wrote:

::
:: So, if you did something like:
::
:: std::vector<int> VecInt;
:: VecInt.reserve( 100 ); // sets 'capacity', not 'size'
::
:: ... and then somewhere you did:
::
:: VecInt.push_back( 1 ); // this is the 101st push_back
::
:: ... you kind-of shot yourself in the foot (it needs to realloc and
:: copy). No big thing for small vectors, but costs time on very big
:: vectors. (....and you could run out of memory during the
:: realloc/copy).

If you know that you actually need 101 push_backs, you could do a
reserve(101) :)

If you don't know, and the numbers are reasonably random, you will get
a good performance *on average*. Note that after "shooting yourself in
the foot" at 101, you get the next 99 push_backs for free. If you
reallocate at a-million-and-one, you get the next million or so
push_backs for free. That's cheap!

Bo Persson

So I was curious to see which is faster -- pushing back onto a vector
or using direct index access of the elements. I declare a
std::vector<int> v, setting the number of elements to vSize. I set
the values using a for-loop. Then, I declare two
std::vector<vector<int> > objects, m1 & m2. The first of these I
initialize with mSize elements, and the second I use the
std::vector::reserve method to claim mSize space. The using two for-
loops I populate m1 & m2 using direct index accessing of the elements
to set them equal to v, and then I use push_back to fill m2. I time
these two for-loops as well as the whole function. What I find
(perhaps not suprisingly) is that I fill m1 much, much faster, i.e.
push_back() with reserve() are SLOW. Anyway, here's the code.
Nothing to special.

#include <vector>
#include <iostream>
#include <iomanip>
#include <time.h>

using namespace std;

int main(){
time_t t0_0,t1_0,t2_0;
time_t t0_f,t1_f,t2_f;

t0_0 = time(NULL);
int vSize = 10000000;
int mSize = 10;

vector<int> v(vSize);

for (int i=0;i<vSize;i++){v.at(i) = i;}

vector<vector<int> > m1(mSize);
vector<vector<int> > m2;
m2.reserve(mSize);
t1_0 = time (NULL);
for (int j=0;j<mSize;j++){m1.at(j) = v;}

t1_f = time(NULL);
t2_0 = time(NULL);

for (int i=0; i<mSize;i++){ m2.push_back(v);}

t2_f = time(NULL);
t0_f = time(NULL);
cout << "testTime0 = " << t0_f-t0_0 << endl;
cout << "testTime1 = " << t1_f-t1_0 << endl;
cout << "testTime2 = " << t2_f-t2_0 << endl;

return 0;
}
 
F

Frank Birbacher

Hi!
Depends on what you're doing.

As always. I'll try to explain myself.
[ note in old STL docs ]
"
[5] Vector is usually preferable to deque and list.
[snip]

Sure. I've got my advise from the "C++ Coding Standards" (Sutter and
Alexandrescu). They argu that all operations on std::deque are at least
as fast as the corresponding operation on std::vector in terms of
complexity theory. Paired with the advise not to optimize prematurely
they suggest to always start using std::deque. If you later discover
performance issues you can still switch to vector.

So I prefer std::deque. Your milage may vary.

Frank
 
B

BobR

Bo Persson said:
If you don't know, and the numbers are reasonably random, you will get
a good performance *on average*. Note that after "shooting yourself in
the foot" at 101, you get the next 99 push_backs for free. If you
reallocate at a-million-and-one, you get the next million or so
push_backs for free. That's cheap!
Bo Persson

And if your current vector is using 2/3 of available 'real estate', and you
exceed '.capacity()', it ain't cheap no more! <G>

But then, you should have taken that old 80286 machine down to the salvage
yard years ago. :-}
[ never put computers in the normal trash! ].
 
B

Bo Persson

T. Crane wrote:
::: BobR wrote:
:::
:::::
::::: So, if you did something like:
:::::
::::: std::vector<int> VecInt;
::::: VecInt.reserve( 100 ); // sets 'capacity', not 'size'
:::::
::::: ... and then somewhere you did:
:::::
::::: VecInt.push_back( 1 ); // this is the 101st push_back
:::::
::::: ... you kind-of shot yourself in the foot (it needs to realloc
::::: and copy). No big thing for small vectors, but costs time on
::::: very big vectors. (....and you could run out of memory during
::::: the realloc/copy).
:::
::: If you know that you actually need 101 push_backs, you could do a
::: reserve(101) :)
:::
::: If you don't know, and the numbers are reasonably random, you
::: will get a good performance *on average*. Note that after
::: "shooting yourself in the foot" at 101, you get the next 99
::: push_backs for free. If you reallocate at a-million-and-one, you
::: get the next million or so push_backs for free. That's cheap!
:::
::: Bo Persson
::
:: So I was curious to see which is faster -- pushing back onto a
:: vector or using direct index access of the elements. I declare a
:: std::vector<int> v, setting the number of elements to vSize. I set
:: the values using a for-loop. Then, I declare two
:: std::vector<vector<int> > objects, m1 & m2. The first of these I
:: initialize with mSize elements, and the second I use the
:: std::vector::reserve method to claim mSize space. The using two
:: for- loops I populate m1 & m2 using direct index accessing of the
:: elements to set them equal to v, and then I use push_back to fill
:: m2. I time these two for-loops as well as the whole function.
:: What I find (perhaps not suprisingly) is that I fill m1 much, much
:: faster, i.e. push_back() with reserve() are SLOW. Anyway, here's
:: the code. Nothing to special.
::
:: #include <vector>
:: #include <iostream>
:: #include <iomanip>
:: #include <time.h>
::
:: using namespace std;
::
:: int main(){
:: time_t t0_0,t1_0,t2_0;
:: time_t t0_f,t1_f,t2_f;
::
:: t0_0 = time(NULL);
:: int vSize = 10000000;
:: int mSize = 10;
::
:: vector<int> v(vSize);
::
:: for (int i=0;i<vSize;i++){v.at(i) = i;}
::
:: vector<vector<int> > m1(mSize);
:: vector<vector<int> > m2;
:: m2.reserve(mSize);
:: t1_0 = time (NULL);
:: for (int j=0;j<mSize;j++){m1.at(j) = v;}
::
:: t1_f = time(NULL);
:: t2_0 = time(NULL);
::
:: for (int i=0; i<mSize;i++){ m2.push_back(v);}
::
:: t2_f = time(NULL);
:: t0_f = time(NULL);
:: cout << "testTime0 = " << t0_f-t0_0 << endl;
:: cout << "testTime1 = " << t1_f-t1_0 << endl;
:: cout << "testTime2 = " << t2_f-t2_0 << endl;
::
:: return 0;
:: }

Strange!

I get about equal time for both versions. In release mode it in fact
runs so fast that I get 0-1 second for all results. Changing time() to
clock(), I get something like

testTime0 = 890
testTime1 = 422
testTime2 = 406

You don't run with iterator debugging enabled, or anything?


Bo Persson
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,075
Latest member
MakersCBDBloodSupport

Latest Threads

Top