Tips on optimizing these functions

Discussion in 'C Programming' started by Andrea Taverna, Sep 27, 2008.

  1. Hello everyone,

    I wrote a bunch of recursive functions to operate on multi-dimensional
    matrices. The matrices are allocated dynamically in a non-contiguous way,
    i.e. as an array of pointers pointing to arrays of data,or other pointers
    if the matrix has more than 2 dimensions.

    The parameters passed to these functions are:
    - current_dimension: counts (from 0 to dimensions-1) the matrix
    dimension on which the function is working, it's the variable passed on
    the stack by the recursion
    - dimensions: number of matrix's dimensions
    - elem_size: size of the matrix's elements
    - dimensions_sizes: a vector containing the 'size' of each dimension
    For example, to work on a 10x20 matrix of integers, following the
    ordering of above, we would pass:
    (0,2,sizeof(int),(unsigned int [2]){10,20})
    for a 10x20x15 one, we would pass
    (0,3,sizeof(int),(unsigned int [3]){10,20,15})

    The functions work fast for allocation and freeing, 'cause calls to
    malloc and free take up most of the execution time. They're somewhat slow
    at copying or initialising matrices. For initialization I mean assign a
    scalar value to the elements of the matrix.

    I've done some benchmarks with copying and initialisation. Compared to a
    specific-nested-loop solution, the functions take up to twice the time.
    However, turning on some optimization flags, specifically '-O3' with gcc,
    the gap between the recursive and the specific solution reduces to 20%.

    So, have you got any advice about optimizing this code?
    Other suggestions are welcomeas well.

    TIA

    Andrea

    Here follows the copying function. The initialising function is almost
    identical

    NB: to better understand the code you should imagine to work with a bi-
    dimensional matrix (implemented as a pointer to pointer in the code). The
    recursive step casts either the matrix to a vector, if the function
    reached the elements' dimension, ending recursion, or the rows of the
    matrix to a bi-dimensional matrix (again, pointer to pointer), continuing
    recursion.

    //////////////////////////////////

    typedef unsigned char byte;

    // this one copy one row of the matrix. The row is supposed to store the
    value of elements, not pointers
    void _copy_row(void* dest, void* src, unsigned short elem_size, unsigned
    int n)
    {
    unsigned short length;

    byte* d1,*d2;

    d1 = (byte*)dest;
    d2 = (byte*)src;

    // copy byte to byte
    while (n > 0)
    {
    for (length = 0; length < elem_size; length++)
    {
    (*d1) = (*d2);
    d1++;
    d2++;
    };
    n--;
    };
    }

    // this is the recursive function
    void _vec_copy(byte current_dimension, byte dimensions,unsigned short
    elem_size, unsigned int* dimensions_size, void** restrict dest, void**
    restrict src)
    {
    int i; // row index

    if (current_dimension < dimensions)
    {
    if (current_dimension == dimensions -1)
    {
    _copy_row((void*)dest, (void*)src, elem_size,dimensions_size
    [current_dimension]);
    }
    else
    {
    for (i = 0; i < dimensions_size[current_dimension]; i++)
    _vec_copy(current_dimension+1, dimensions,
    elem_size,dimensions_size, (void**)dest, (void**)src);
    };
    };
     
    Andrea Taverna, Sep 27, 2008
    #1
    1. Advertising

  2. Andrea Taverna

    Tim Prince Guest

    Andrea Taverna wrote:

    > I've done some benchmarks with copying and initialisation. Compared to a
    > specific-nested-loop solution, the functions take up to twice the time.
    > However, turning on some optimization flags, specifically '-O3' with gcc,
    > the gap between the recursive and the specific solution reduces to 20%.
    >
    > So, have you got any advice about optimizing this code?
    > Other suggestions are welcomeas well.


    > typedef unsigned char byte;
    >
    > // this one copy one row of the matrix. The row is supposed to store the
    > value of elements, not pointers
    > void _copy_row(void* dest, void* src, unsigned short elem_size, unsigned
    > int n)
    > {
    > unsigned short length;
    >
    > byte* d1,*d2;
    >
    > d1 = (byte*)dest;
    > d2 = (byte*)src;
    >
    > // copy byte to byte
    > while (n > 0)
    > {
    > for (length = 0; length < elem_size; length++)
    > {
    > (*d1) = (*d2);
    > d1++;
    > d2++;
    > };
    > n--;
    > };
    > }
    >

    This is so dependent on the platform that we could justifiably argue you
    should choose one, and go to a forum associated with that platform.
    Do any of the compilers you use take advantage of restrict?
    If elem_size happens to match frequently the size of a stdint type, you
    will need to switch case the code so as to remove the inner loop for those
    cases.
    Some compilers automatically substitute a run-time library copy function
    which invokes all the usual memcpy() optimizations (align destination,
    move groups of bytes per instruction).
    If you wrote memcpy() in line, that would work well with certain
    compilers, not so well with others (possibly depending on command line
    options and which run time library you choose). If you are somehow
    prohibited from using restrict, writing in memcpy() makes the same assertion.
     
    Tim Prince, Sep 27, 2008
    #2
    1. Advertising

  3. On Sat, 27 Sep 2008 14:13:50 +0200 (CEST), Andrea Taverna
    <> wrote:

    snip discussion of matrix philosophy

    >typedef unsigned char byte;
    >
    >// this one copy one row of the matrix. The row is supposed to store the
    >value of elements, not pointers
    >void _copy_row(void* dest, void* src, unsigned short elem_size, unsigned
    >int n)
    > {
    > unsigned short length;
    >
    > byte* d1,*d2;
    >
    > d1 = (byte*)dest;
    > d2 = (byte*)src;
    >
    >// copy byte to byte
    > while (n > 0)
    > {
    > for (length = 0; length < elem_size; length++)
    > {
    > (*d1) = (*d2);
    > d1++;
    > d2++;
    > };
    > n--;
    > };
    > }


    Each element consists of elem_size contiguous bytes. Each row
    consists of n contiguous elements. Therefore, each row must consist
    of n*elem_size contiguous bytes.

    The entire body of your function can be replaced with
    memcpy(dest, src, (size_t)n*elem_size);

    In fact, the entire function can be deleted and any call to the
    function replaced with the above statement.

    Either substitution will have the additional benefit of not invoking
    undefined behavior if any of the elements are indeterminate.

    --
    Remove del for email
     
    Barry Schwarz, Sep 27, 2008
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Xiangliang Meng
    Replies:
    1
    Views:
    1,611
    Victor Bazarov
    Jun 21, 2004
  2. Thomas

    Optimizing tips for os.listdir

    Thomas, Sep 27, 2004, in forum: Python
    Replies:
    9
    Views:
    455
    G. S. Hayes
    Sep 29, 2004
  3. Esmail

    Optimizing math functions

    Esmail, May 23, 2009, in forum: Python
    Replies:
    9
    Views:
    619
    Beni Cherniavsky
    May 31, 2009
  4. Esmail

    Re: Optimizing math functions

    Esmail, May 24, 2009, in forum: Python
    Replies:
    2
    Views:
    263
    Esmail
    May 24, 2009
  5. Replies:
    1
    Views:
    3,166
    Arne Vajhøj
    Jul 31, 2012
Loading...

Share This Page