Multidimensional array performance

Discussion in 'C++' started by Gregory.A.Book@gmail.com, Sep 30, 2006.

  1. Guest

    I'm working with displaying and manipulating very large image sets. The
    program handles anything from 2D images to 4D RGB volumes in a
    time-series. I've been using dynamically allocated arrays to accomplish
    this, but having recently added the ability to load and display 4D
    data, I've run into some significant performance issues.

    For a dataset of size 256x176x176x1, it takes about 30 seconds to
    allocate the array. Loading the data into the array from 176 files only
    takes a few seconds. Accessing the array is fine; I'm able to loop
    through the entire volume in a couple of seconds. When it's time to
    deallocate the memory, it takes a couple of minutes. This is on a P4
    2.4GHz with 1GB RAM. I'm using the code below to allocate and
    deallocate the memory.

    I've read about alternate methods for multidimensional arrays,
    including templates. I've also read about memory pools, but haven't
    seen anything about how to allocate a multidimensional array inside of
    the pool. I also don't think a pool is ideal because I'll only be
    (de)allocating memory when the user loads or unloads a dataset.
    I've also thought about just declaring a single flat array and
    accessing elements by computed offsets (i.e. array[(plane*rows + row)*cols + col]).

    What method would be best to quickly allocate and deallocate large
    chunks of memory?

    Thanks,
    Greg


    -------------- DECLARATION -----------------
    unsigned short int ****imgDataINT16U; /* unsigned 16-bit */


    --------------- ALLOCATION -------------------
    int i, j, k;
    /* setup the image buffer */
    imgData->imgDataINT16U = new unsigned short int***[YSize];
    for (i = 0; i < YSize; i++)
        imgData->imgDataINT16U[i] = new unsigned short int**[XSize];
    for (i = 0; i < YSize; i++) {
        for (j = 0; j < XSize; j++)
            imgData->imgDataINT16U[i][j] = new unsigned short int*[ZSize];
    }
    for (i = 0; i < YSize; i++) {
        for (j = 0; j < XSize; j++)
            for (k = 0; k < ZSize; k++)
                imgData->imgDataINT16U[i][j][k] = new unsigned short int[TSize];
    }


    --------------- DEALLOCATION -------------
    for (i = 0; i < imgInfo->GetVolumeYSize(); i++) {
        for (j = 0; j < imgInfo->GetVolumeXSize(); j++)
            for (k = 0; k < imgInfo->GetVolumeZSize(); k++)
                delete[] imgDataINT16U[i][j][k];
        progDlg.Update(i, "Unloading Data");
    }
    for (i = 0; i < imgInfo->GetVolumeYSize(); i++) {
        for (j = 0; j < imgInfo->GetVolumeXSize(); j++)
            delete[] imgDataINT16U[i][j];
    }
    for (i = 0; i < imgInfo->GetVolumeYSize(); i++)
        delete[] imgDataINT16U[i];
    delete[] imgDataINT16U;

    ------------ GENERAL ACCESS ------------
    voxelValue = imgData->imgDataINT16U[col][row][slice][0];
     
    , Sep 30, 2006
    #1

  2. <> wrote in message
    news:...
    : I'm working with displaying and manipulating very large image sets. The
    : program handles anything from 2D images to 4D RGB volumes in a
    : time-series. I've been using dynamically allocated arrays to accomplish
    : this, but having recently added the ability to load and display 4D
    : data, I've run into some significant performance issues.
    :
    : For a dataset of size 256x176x176x1, it takes about 30 seconds to
    : allocate the array. Loading the data into the array from 176 files only
    : takes a few seconds.
    ....
    : [snip: declaration and allocation code quoted from post #1]
    Think of it: if TSize=1, then every single unsigned short 'voxel'
    will be allocated in its own memory block! This represents a
    space overhead of a pointer plus several bookkeeping bytes (depending
    on your library implementation, maybe 16!) for each 2 bytes of data!
    This of course also means expensive initial allocations.
    But during data manipulation as well, performance will be degraded
    by the many scattered, cache-unfriendly memory accesses.


    : What method would be best to quickly allocate and deallocate large
    : chunks of memory?

    A first step would be to modify your code to use something like:

    class Buf4D_Int16U {
    public:
        Buf4D_Int16U(int xSize, int ySize, int zSize, int tSize)
        : p_( new unsigned short[xSize*ySize*zSize*tSize] )
        , xSize_(xSize)
        , ySize_(ySize)
        , zSize_(zSize)
        , tSize_(tSize)
        { }

        ~Buf4D_Int16U() { delete[] p_; }

        // Element access (non-const and const)
        unsigned short& operator()(int x, int y, int z, int t)
        { return p_[((x*ySize_+y)*zSize_+z)*tSize_+t]; }
        unsigned short operator()(int x, int y, int z, int t) const
        { return p_[((x*ySize_+y)*zSize_+z)*tSize_+t]; }

    private:
        unsigned short* p_;
        int xSize_, ySize_, zSize_, tSize_;
        Buf4D_Int16U(Buf4D_Int16U const&);    // disabled copy-ctor
        void operator=(Buf4D_Int16U const&);  // disabled copy-assign
    };

    : ------------ GENERAL ACCESS ------------
    : voxelValue = imgData->imgDataINT16U[col][row][slice][0];

    Becomes:
    voxelValue = imgData(col,row,slice,0);

    Pass a reference to the imgData object instead of the
    unsigned short**** pointers when passing the buffer to functions.

    Note that you probably want to use a std::vector<unsigned short>
    instead of the naked pointer p_ (but this may have a performance
    cost, especially in non-optimized builds...).


    Many further improvements are possible, but this first step
    should be a no-brainer...


    Cheers,
    Ivan
    --
    http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
    Brainbench MVP for C++ <> http://www.brainbench.com
     
    Ivan Vecerina, Sep 30, 2006
    #2

  3. Guest

    That looks like an excellent solution. I never thought about putting
    the indexing math into a function.
    How does the performance of using [n][m][...] notation compare to a
    function call when accessing elements?

    > Many further improvements are possible, but this first step
    > should be a no-brainer...


    What other ways are there of improving large multidimensional arrays?

    -Greg



    Ivan Vecerina wrote:
    > [full quote of post #2 snipped]
     
    , Oct 1, 2006
    #3
  4. <> wrote in message
    news:...
    : That looks like an excellent solution. I never thought about putting
    : the indexing math into a function.
    : How does the performance of using [n][m][...] notation compare to a
    : function call when accessing elements?
    You should check using a profiler on your specific platform.
    Some reasoning though:
    - In Debug mode/unoptimized builds, you will suffer the overhead
    of a function call. This may or may not matter.
    It may actually be useful to check, within the accessor function,
    at the expense of an additional runtime overhead, that all indices
    are within range (e.g. ASSERT(x<xSize) if x and xSize are unsigned).
    In release mode, this should not matter.
    - On a modern HW architecture, the contiguous (and cache-friendly)
    memory accesses to [xSize,ySize,zSize], and the multiplications
    to compute an index should be less expensive than the multiple
    dereferencing of pointers. But take this as a wild guess.
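    [Editor's note: the debug-only range check mentioned above can be
    sketched as follows. The class name, sizes, and member layout are
    invented for illustration; the point is that assert() compiles away
    when NDEBUG is defined, so release builds pay nothing, while using
    unsigned indices lets a single comparison also reject negative values.]

```cpp
#include <cassert>
#include <cstddef>

// Illustrative 3D accessor with debug-only bounds checking.
class CheckedBuf3D {
public:
    CheckedBuf3D(unsigned xSize, unsigned ySize, unsigned zSize)
    : p_(new unsigned short[std::size_t(xSize) * ySize * zSize]())  // value-initialized
    , xSize_(xSize), ySize_(ySize), zSize_(zSize)
    { }
    ~CheckedBuf3D() { delete[] p_; }

    unsigned short& operator()(unsigned x, unsigned y, unsigned z)
    {
        // Because the indices are unsigned, x < xSize_ also catches
        // "negative" inputs; the whole check vanishes under NDEBUG.
        assert(x < xSize_ && y < ySize_ && z < zSize_);
        return p_[(std::size_t(x) * ySize_ + y) * zSize_ + z];
    }

private:
    unsigned short* p_;
    unsigned xSize_, ySize_, zSize_;
    CheckedBuf3D(CheckedBuf3D const&);     // non-copyable (pre-C++11 idiom)
    void operator=(CheckedBuf3D const&);
};
```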

    : > Many further improvements are possible, but this first step
    : > should be a no-brainer...
    :
    : What other ways are there of improving large multidimensional arrays?

    For very large data sets, it can be useful to tile the data in
    memory (e.g. store small sub-cubes of the data in one contiguous
    memory block). This can, for example, improve the performance
    of some multi-planar reconstructions (memory locality, can
    be more cache-friendly, especially if some of the data does
    not fit in main memory).
    But this will complicate the implementation of most algorithms.
    Since your program ran even with the space overhead of your
    per-voxel allocations, fitting the data in memory should not
    be an issue for you.

    For many algorithms, you will save time by using relative offsets to
    reach the neighbors of a pixel rather than recomputing each index.
    E.g.: tStep=1, zStep=tSize, yStep=zStep*zSize, xStep=yStep*ySize;
    When you want to access the neighbors of a voxel at pv,
    you can directly address pv[+zStep], pv[-zStep], etc.
    Even better, by adding a 'margin' around your data set,
    you can linearly scan and transform the data from
    begin() to end() without any full-index computation.
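    [Editor's note: the step-offset idea above can be sketched like this.
    The helper names and the toy 3D sizes are invented for the example;
    it shows neighbors along the innermost (z) axis reached by pointer
    offsets instead of full index recomputation.]

```cpp
#include <cstddef>
#include <vector>

// Build a flat (x, y, z) volume where each voxel's value equals its
// linear index, so neighbor reads are easy to verify.
std::vector<unsigned short> make_volume(std::size_t xSize,
                                        std::size_t ySize,
                                        std::size_t zSize)
{
    std::vector<unsigned short> v(xSize * ySize * zSize);
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = static_cast<unsigned short>(i);
    return v;
}

// Sum of the two z-axis neighbors of the voxel at (x, y, z),
// reached via a precomputed step offset rather than re-multiplying.
unsigned z_neighbor_sum(const std::vector<unsigned short>& v,
                        std::size_t ySize, std::size_t zSize,
                        std::size_t x, std::size_t y, std::size_t z)
{
    const std::ptrdiff_t zStep = 1;  // innermost axis is contiguous
    const unsigned short* pv = &v[(x * ySize + y) * zSize + z];
    return pv[-zStep] + pv[+zStep];  // neighbors by pointer offset
}
```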

    In a different vein, many algorithms today can be
    reimplemented to leverage the graphics processor
    of your display card:
    http://www.google.com/search?q=GPU processing

    But remember the saying about premature optimization...


    hth
    --
    http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
    Brainbench MVP for C++ <> http://www.brainbench.com
     
    Ivan Vecerina, Oct 1, 2006
    #4
  5. Guest

    Thanks! All your information will be helpful.
    -Greg


    Ivan Vecerina wrote:
    > [full quote of post #4 snipped]
     
    , Oct 1, 2006
    #5
