The fastest way to fill 2D dynamic array with zeros

R

romayankin

Hello All,

I'm writing cross-platform code so i'm bound to standards. Here is
the code I have:
~~~~~~~~~~~~~~~~~~
double **mx = new double*[col - 1];
for(int i = 0; i < col - 1; i++) {
mx = new double[row];
sizeOfMx += sizeof(double) * row;
}
//I need to fill "mx" with zeros as fast as it is possible
~~~~~~~~~~~~~~~~~~~

So as I mentioned in comment, i have problem with finding the extremely
fast way to fill the 2D dynamic array with zeros. Does anyone know
something I that might help me?

Thanks,
-R.
 
B

Bob Hairgrove

Hello All,

I'm writing cross-platform code so i'm bound to standards. Here is
the code I have:
~~~~~~~~~~~~~~~~~~
double **mx = new double*[col - 1];
for(int i = 0; i < col - 1; i++) {
mx = new double[row];
sizeOfMx += sizeof(double) * row;
}
//I need to fill "mx" with zeros as fast as it is possible
~~~~~~~~~~~~~~~~~~~

So as I mentioned in comment, i have problem with finding the extremely
fast way to fill the 2D dynamic array with zeros. Does anyone know
something I that might help me?

Thanks,
-R.


It would be much faster to calculate rows*columns and allocate your 2D
array in one call to new[] as if it were a 1D array, rather than doing
one new[] per column. Then you could initialize the elements with a
for loop as if it were a 1D array. I don't know if using a loop is the
fastest way, but all those new[] calls are probably going to be much
more expensive.

If you have millions or billions of doubles, you might want to split
the memory up into buckets and allocate a few threads to initialize
them in parallel. But this is not going to be platform-independent, so
I would just use a loop.

Probably the best idea is to use a std::vector<double> and treat it
like an array ... that way you won't have to call delete[] on your
arrays.

Implementing 2D arrays is also treated by Scott Meyers, among others,
in his book "More Effective C++". Google for "c++" and "2D array" and
you'll get lots of hits.
 
G

Gianni Mariani

Hello All,

I'm writing cross-platform code so i'm bound to standards. Here is
the code I have:

Look at "matrix" "C++" on google.
~~~~~~~~~~~~~~~~~~
double **mx = new double*[col - 1];
for(int i = 0; i < col - 1; i++) {
mx = new double[row];
sizeOfMx += sizeof(double) * row;
}
//I need to fill "mx" with zeros as fast as it is possible
~~~~~~~~~~~~~~~~~~~

So as I mentioned in comment, i have problem with finding the extremely
fast way to fill the 2D dynamic array with zeros. Does anyone know
something I that might help me?


How big are these arrays going to be ?
 
V

Victor Bazarov

Gianni said:
Hello All,

I'm writing cross-platform code so i'm bound to standards. Here is
the code I have:

Look at "matrix" "C++" on google.
~~~~~~~~~~~~~~~~~~
double **mx = new double*[col - 1];

I didn't see anybody mention this, but why are you allocating
one less than what seems to be the "number of columns"?
for(int i = 0; i < col - 1; i++) {
mx = new double[row];
sizeOfMx += sizeof(double) * row;
}
//I need to fill "mx" with zeros as fast as it is possible
~~~~~~~~~~~~~~~~~~~

So as I mentioned in comment, i have problem with finding the
extremely fast way to fill the 2D dynamic array with zeros. Does
anyone know something I that might help me?


Yes, if you do

mx = new double[row]();

(notice the parentheses after the brackets), then the elements
of that array shall be value-initialised, and for 'double' it
means zero-initialised. Beware, not all compilers are capable
of handling that syntax (surprisingly).
How big are these arrays going to be ?

Why should it matter, Gianni?

V
 
A

Axter

Hello All,

I'm writing cross-platform code so i'm bound to standards. Here is
the code I have:
~~~~~~~~~~~~~~~~~~
double **mx = new double*[col - 1];
for(int i = 0; i < col - 1; i++) {
mx = new double[row];
sizeOfMx += sizeof(double) * row;
}
//I need to fill "mx" with zeros as fast as it is possible
~~~~~~~~~~~~~~~~~~~

So as I mentioned in comment, i have problem with finding the extremely
fast way to fill the 2D dynamic array with zeros. Does anyone know
something I that might help me?


Here's a wrapper function that will initialize a 2D array to zero, much
more efficient.
template < typename T >
T **Allocate2DArray( int nRows, int nCols)
{
T **ppi = new T*[nRows];
//parentheses after the brackets makes it initialize to zero
T *curPtr = new T [nRows * nCols]();
for( int i = 0; i < nRows; i++) {
*(ppi + i) = curPtr;
curPtr += nCols;
}
return ppi;
}

The above wrapper function only makes two calls to new, which means the
following wrapper function only needs to make two calls to delete:

template < typename T >
void Free2DArray(T** Array)
{
delete [] *Array;
delete [] Array;
}

Example usage:
double **d = Allocate2DArray<double>(x, y);
d[0][0] = 10.0;
d[1][1] = 20.0;
d[3][2] = 2345.09;
Free2DArray(d);
 
R

romayankin

First answering your questions:

2Bob Hairgrove
I didn't want to create 1D array or use std::vector, because working
with 1D array involves to much address arithmetic's, and std::vector
is by default slower then any array, besides I do not need any
functionality that is provided by std::vector
How big are these arrays going to be ?
Not quite big the max value is around 1000 elements, in most cases
~100.
I didn't see anybody mention this, but why are you allocating one less than what seems to be the "number of columns"?
This is an atavism. That's how it works for me :)


OK, thanks everyone for the replies, but that was my fault that I
didn't emphasize that my performance problem does not involve
allocation. It is completely in setting all elements of a particular 2D
array to zero. I'm creating this array only once and then i might need
to zero it tens thousands of times.

Up until now the best option I could come up with is:
~~~~~~~~~~~~~~~~~~
for(int i = 0; i < col - 1; i++) {
memset(mx, 0, sizeof(double)*row);
}
~~~~~~~~~~~~~~~~~~
 
G

Gianni Mariani

Victor said:
Gianni said:
Hello All,

I'm writing cross-platform code so i'm bound to standards. Here is
the code I have:

Look at "matrix" "C++" on google.

~~~~~~~~~~~~~~~~~~
double **mx = new double*[col - 1];


I didn't see anybody mention this, but why are you allocating
one less than what seems to be the "number of columns"?

Good catch. I was thinking "matrix" class so I wasn't looking there.
for(int i = 0; i < col - 1; i++) {
mx = new double[row];
sizeOfMx += sizeof(double) * row;
}
//I need to fill "mx" with zeros as fast as it is possible
~~~~~~~~~~~~~~~~~~~

So as I mentioned in comment, i have problem with finding the
extremely fast way to fill the 2D dynamic array with zeros. Does
anyone know something I that might help me?



Yes, if you do

mx = new double[row]();

(notice the parentheses after the brackets), then the elements
of that array shall be value-initialised, and for 'double' it
means zero-initialised. Beware, not all compilers are capable
of handling that syntax (surprisingly).

How big are these arrays going to be ?


Why should it matter, Gianni?


Big (100's of cache size) sized arrays would benefit by using cache
prefetch and depending on the architecture (multiple instruction issue
and speculative execution) there are a number of things you can do to
re-order the statements in ways the compiler probably would not be able
to optimize for. These kinds of optimizations depend heavily on
architecture.

For example, in SGI's IRIX 4.x days, the memcpy routine was not
optimized for the R400x CPU's and some simple loop unrolling and the use
of "unaligned" ld/st increased performance by significant multiples (the
R4K had a single way associative cache). For the R8K and R10K, with
multiple issue, speculative execution and cache prefetch instructions,
the main loop was very different to get the most out of the CPU. In the
R10K, loop unrolling was next to useless, in the R4K loop unrolling was
a big boost.

However, if we're talking about what fits into a secondary cache, all
this is moot since big hits in performance in modern CPU's are due to
cache misses (mostly secondary cache). This also means that the speed
for "clearing out" a block of memory may be limited by the memory
bandwidth. So, the aim is to come up with an algorithm that is able to
saturate the memory interfaces.

However, if we're talking about a 4x4 matrix, that would fit in a small
number of cache lines and by the time the memory allocator handed over
the memory, it would more than likely already be in primary cache so
there would be next to no benefit to apply any of the more advanced
techniques to deal with cache misses.

All this is of course highly dependant on *your* system.
 
G

Gianni Mariani

First answering your questions:

2Bob Hairgrove
I didn't want to create 1D array or use std::vector, because working
with 1D array involves to much address arithmetic's, and std::vector
is by default slower then any array, besides I do not need any
functionality that is provided by std::vector

*BZZT* Wrong.

std::vector is not slower than an array unless you have tested it. In
many cases, I have seen negligible different between std::vector and an
array when the optimizer gets done with it.
Not quite big the max value is around 1000 elements, in most cases
~100.



This is an atavism. That's how it works for me :)


OK, thanks everyone for the replies, but that was my fault that I
didn't emphasize that my performance problem does not involve
allocation. It is completely in setting all elements of a particular 2D
array to zero. I'm creating this array only once and then i might need
to zero it tens thousands of times.

BTW - the address arithmetic you feel is causing you performance issues
is far more likely to be less of an issue than the memory fetch for the
address you're looking for.
Up until now the best option I could come up with is:
~~~~~~~~~~~~~~~~~~
for(int i = 0; i < col - 1; i++) {
memset(mx, 0, sizeof(double)*row);
}
~~~~~~~~~~~~~~~~~~
 
A

Axter

First answering your questions:

2Bob Hairgrove
I didn't want to create 1D array or use std::vector, because working
with 1D array involves to much address arithmetic's, and std::vector
is by default slower then any array, besides I do not need any
functionality that is provided by std::vector
How big are these arrays going to be ?
Not quite big the max value is around 1000 elements, in most cases
~100.
I didn't see anybody mention this, but why are you allocating one less than what seems to be the "number of columns"?
This is an atavism. That's how it works for me :)


OK, thanks everyone for the replies, but that was my fault that I
didn't emphasize that my performance problem does not involve
allocation. It is completely in setting all elements of a particular 2D
array to zero. I'm creating this array only once and then i might need
to zero it tens thousands of times.

Up until now the best option I could come up with is:
~~~~~~~~~~~~~~~~~~
for(int i = 0; i < col - 1; i++) {
memset(mx, 0, sizeof(double)*row);
}


If you use the method I posted, you wouldn't have to interatore through
each column of the array to set it.
With the method I posted, you can do the following:
memset(mx[0], 0, sizeof(double)*row*col);

You can just make one call to memset.
The method I posted allocates a single block for the 2D array, and then
devides it in sections to set the main array of pointers.
Since it's one continuous block, you can just use one call to memset.
 
G

Gavin Deane

First answering your questions:

2Bob Hairgrove
I didn't want to create 1D array or use std::vector, because working
with 1D array involves to much address arithmetic's, and std::vector
is by default slower then any array, besides I do not need any
functionality that is provided by std::vector

std::vector provides memory management, exception safety and a clean
and simple way of passing the data between functions and/or containing
the data within an object. You must be writing a very trivial C++
program if you don't need any of that functionality.

Gavin Deane
 
A

Amadeus W. M.

Hello All,

I'm writing cross-platform code so i'm bound to standards. Here is
the code I have:
~~~~~~~~~~~~~~~~~~
double **mx = new double*[col - 1];
for(int i = 0; i < col - 1; i++) {
mx = new double[row];
sizeOfMx += sizeof(double) * row;
}
//I need to fill "mx" with zeros as fast as it is possible
~~~~~~~~~~~~~~~~~~~

So as I mentioned in comment, i have problem with finding the extremely
fast way to fill the 2D dynamic array with zeros. Does anyone know
something I that might help me?

Thanks,
-R.


If you have to zero out repeatedly the double ** array, then I think the
fastest way is to call memset on a contiguous array (along the lines
Axter suggested). It's not just how many times you allocate/deallocate
memory. Once it's allocated, you only call memset once each time you need
to zero out the array, if the array is contiguous. And memset might be
optimized for the different architectures.

It's not good to have MANY large, contiguous arrays, because they may not
fit in memory. For instance, an image is invariably stored contiguously,
but how many images can you keep in memory simultaneously? Certainly not
a whole movie. Otherwise it's ok. Contiguous arrays of size O(1000) don't
pose any threat.

Now if you're concerned about page hits/misses, caches and virtual memory,
then you may have to unroll the loops, or resort to other tricks that
people in sci.math.num-analysis will most certainly know better.
 
R

romayankin

2Axter regarding your code. The thing is that my 2D array is not
rectangular. And therefore your approach will cause some unnecessary
memory overheads. But thanks for the idea i might implement it as an
"speed optimized" option.

2Gianni
BTW - the address arithmetic you feel is causing you performance issues
is far more likely to be less of an issue than the memory fetch for the
address you're looking for.
I didn't mean it would affect the speed, my saying that it involves too
much address arithmetic I meant that the code gets less crystal and
besides of that, its easier to make a mistake when you always have to
multiply iterator by the length of the row. And you right its more
likely that retrieving address using square brackets is usually less
efficient then using incrementing pointers.

So ... as far as i'm having non-rectangular 2D array the best way to
fill it with zeros is:
~~~~~~~~~~~~~~~~~~~~~~~~~~
for(int i = 0; i < col - 1; ++i)
{
memset(mx, 0, sizeof(double)*rowLen);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~
or am I missing something again?
 
J

John Carson

2Axter regarding your code. The thing is that my 2D array is not
rectangular. And therefore your approach will cause some unnecessary
memory overheads. But thanks for the idea i might implement it as an
"speed optimized" option.

The code in your original post was for a rectangular array. However, if you
have an array of row lengths, then Axter's approach can easily be modified
to allocate a block equal in size to the sum of those row lengths. Pointers
can easily be initialized to point at the appropriate places.
 
D

Daniel T.

First answering your questions:

2Bob Hairgrove
I didn't want to create 1D array or use std::vector, because working
with 1D array involves to much address arithmetic's, and std::vector
is by default slower then any array, besides I do not need any
functionality that is provided by std::vector

You are quite wrong. I once had a rather good C programmer insist the
same thing so we wrote a test program. Once the compiler was through
optimizing the code, the vector was actually 5% *faster* than anything
he could come up with.

Worst case, use a one dimensional array encapsulated in a class that
does the address arithmetic for you.
 
R

romayankin

Okey thanks, I guess then my question has been answered. Thanks for
everyone who has participated in this discussion
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top