code optimization

Discussion in 'C Programming' started by syed, Sep 24, 2005.

  1. syed

    syed Guest

    I want clues to optimize a function that copies elements of a
    2-dimensional array to another. Is there a way to optimize this piece
    of code so that it runs faster?



    #define RIDX(i,j,n) ((i)*(n)+(j))

    void myrotate(int dim, pixel *src, pixel *dst)
    {

    for (i = 0; i < dim; i++)
    for (j = 0; j < dim; j++)
    dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];


    }
    syed, Sep 24, 2005
    #1
    1. Advertising

  2. In article <>,
    syed <> wrote:

    >I want clues to optimize a function that copies elements of a
    >2-dimensional array to another. Is there a way to optimize this piece
    >of code so that it runs faster?


    Probably not much. You are changing the order of the elements, so
    you can't use memcpy().

    >#define RIDX(i,j,n) ((i)*(n)+(j))


    Depending on what you are doing with the array, you might be able to
    avoid copying it altogether, and instead use a different version of
    RIDX to access the elements in the new order.

    -- Richard
    Richard Tobin, Sep 24, 2005
    #2
    1. Advertising

  3. One thing up front: C itself doesn't define anything like performance. I'll
    somewhat assume a typical desktop CPU.

    syed wrote:
    > I want clues to optimize a function that copies elements of a
    > 2-dimensional array to another. Is there a way to optimize this piece
    > of code so that it runs faster?
    >
    >
    >
    > #define RIDX(i,j,n) ((i)*(n)+(j))


    Why? Why not use an inline function?

    > void myrotate(int dim, pixel *src, pixel *dst)


    Okay, a few points here:
    - If 'dim' doesn't change, you could make it a constant giving the compiler
    possibility to optimise code better.
    - You're not going to modify 'src', so make that a pointer to const pixel.
    Note that this won't necessarily influence performance.
    - There's the 'restrict' property in C99 which (AFAIK) can be used to tell
    the compiler that the two ranges don't overlap. This might make things
    faster as the compiler can prefetch the read-values earlier.

    > {
    >
    > for (i = 0; i < dim; i++)


    Hmm, where is 'i'? Making it a global doesn't speed up things. Also:
    assert(src && dst && (dim>0));


    > for (i = 0; i < dim; i++)
    > for (j = 0; j < dim; j++)
    > dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];


    Hmm, not much that can be done here. What I'd try is to compute a pointer
    to the row in the inner loop in order not to redo this computation for
    every pixel. Maybe a "duff's device" would help, but I'd rather expect
    current compilers to unroll loops themselves.

    Lastly, the probably fastest way is when this is simply not done - instead
    you could compute the index differently when accessing the matrix.

    Uli
    Ulrich Eckhardt, Sep 24, 2005
    #3
  4. syed

    Syed Guest

    Hi

    Thanks for your reply, it was very helpful. I would like to iterate through
    the 2-dimensional array using a single loop (instead of two). Is it
    possible?

    Also, is there any room for loop-unrolling in the code snippet?

    Thanks again.


    Syed Ijaz

    -----
    #define RIDX(i,j,n) ((i)*(n)+(j))

    void myrotate(int dim, pixel *src, pixel *dst)
    {

    for (i = 0; i < dim; i++)
    for (j = 0; j < dim; j++)
    dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];


    }
    Syed, Sep 24, 2005
    #4
  5. syed

    Malcolm Guest

    "syed" <> wrote
    >
    >I want clues to optimize a function that copies elements of a
    > 2-dimensional array to another. Is there a way to optimize this piece
    > of code so that it runs faster?
    >
    >
    >
    > #define RIDX(i,j,n) ((i)*(n)+(j))
    >
    > void myrotate(int dim, pixel *src, pixel *dst)
    > {
    >
    > for (i = 0; i < dim; i++)
    > for (j = 0; j < dim; j++)
    > dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    >
    >
    > }
    >


    You are calculating i and j * dim on each iteration.
    If you maintain an offset index and increment or decrement by dim on each
    pass, you might save a few instructions.
    Malcolm, Sep 24, 2005
    #5
  6. In article <dh3skm$h0s$-infra.bt.com>,
    Malcolm <> wrote:

    >> #define RIDX(i,j,n) ((i)*(n)+(j))


    >> dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];


    >You are calculating i and j * dim on each iteration.


    A reasonable compiler will avoid this.

    In general modern compilers are likely to be better at this kind of
    micro-optimisation than programmers, because they know the
    characteristics of the target processor. Hand-optimising for
    one architecture may make things worse on another.

    -- Richard
    Richard Tobin, Sep 24, 2005
    #6
  7. "Richard Tobin" <> wrote in message
    news:dh3ure$22go$...
    > In article <dh3skm$h0s$-infra.bt.com>,
    > Malcolm <> wrote:
    >
    > >> #define RIDX(i,j,n) ((i)*(n)+(j))

    >
    > >> dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];

    >
    > >You are calculating i and j * dim on each iteration.

    >
    > A reasonable compiler will avoid this.
    >
    > In general modern compilers are likely to be better at this kind of
    > micro-optimisation than programmers, because they know the
    > characteristics of the target processor. Hand-optimising for
    > one architecture may make things worse on another.


    Right. I'd add that it makes sense to look at the optimized assembler
    produced by one's particular compiler, as it may already be doing quite a
    few loop optimizations on it. Hand doing the optimizations may actually wind
    up defeating the compiler ones and make things worse. Many compilers do
    very, very well at optimizing loops such as this.

    -Walter Bright
    www.digitalmars.com free C, C++, D programming language compilers
    Walter Bright, Sep 25, 2005
    #7
  8. syed

    Old Wolf Guest

    Ulrich Eckhardt wrote:
    > One thing up front: C itself doesn't define anything like performance.
    > I'll somewhat assume a typical desktop CPU.
    >
    > syed wrote:
    >> I want clues to optimize a function that copies elements of a
    >> 2-dimensional array to another. Is there a way to optimize this piece
    >> of code so that it runs faster?
    >>
    >> #define RIDX(i,j,n) ((i)*(n)+(j))

    >
    > Why? Why not use an inline function?


    Inline functions are a C99 feature, although many (but not all)
    C89 compilers support them. One compiler I use will recognize the
    keyword, but never actually inline any function (which is
    conforming behaviour, in fact).

    Macros may be 'evil', but can also be useful IMHO.
    Old Wolf, Sep 26, 2005
    #8
  9. syed schrieb:
    > I want clues to optimize a function that copies elements of a
    > 2-dimensional array to another. Is there a way to optimize this piece
    > of code so that it runs faster?


    This depends on many things like processor type (RISC, CISC, VLIW,
    etc.), number of registers, number of pipelines, cache sizes
    (instruction, data), instruction set (SIMD, DSP) and available resources
    (RAM) of the architecture involved in the given scenario.

    So in general we cannot give you hints about how to optimize your code
    when we don't know the details. Anyway, I'm afraid discussing such
    details would be even more OT in this group.

    However, the compiler for your specific platform should give you good
    optimization and profiling options to find a way to improve the
    performance of your code. Probably there's an optimized library for such
    problems provided by a third party vendor.

    Regards
    Sascha
    Sascha Springer, Sep 26, 2005
    #9
  10. syed wrote:
    >
    > I want clues to optimize a function that copies elements of a
    > 2-dimensional array to another. Is there a way to optimize this piece
    > of code so that it runs faster?
    >
    > #define RIDX(i,j,n) ((i)*(n)+(j))
    >
    > void myrotate(int dim, pixel *src, pixel *dst)
    > {
    >
    > for (i = 0; i < dim; i++)
    > for (j = 0; j < dim; j++)
    > dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    > }


    In the "src[RIDX(i,j,dim)]" expression, "i" and "dim" don't change for the
    duration of the inner for-loop. Since your RIDX macro will cause these two
    same values to be multiplied each time through the loop, you can optimize
    this away by calculating this only once:

    for (i = 0; i < dim; i++)
    {
    int temp = i*dim;
    for (j = 0; j < dim; j++)
    dst[RIDX(dim-1-j, i, dim)] = src[temp+j];
    }

    You can do this one better by not calcluating the addition each time,
    either:

    for (i = 0; i < dim; i++)
    {
    pixel *src2 = &src[RIDX(i,0,dim)];
    for (j = 0; j < dim; j++)
    dst[RIDX(dim-1-j, i, dim)] = src2[j];
    }

    A similar optimization can be done on the dst[] array by realizing that the
    subscript will be decrementing by dim each time through the loop. Calculate
    the initial pointer as "&dst[RIDX(dim-1,i,dim)]" and then decrement that
    pointer by "dim" each time through the loop.

    for (i = 0; i < dim; i++)
    {
    pixel *src2 = &src[RIDX(i,0,dim)];
    pixel *dst2 = &dst[RIDX(dim-1,i,dim)];
    for (j = 0; j < dim; j++)
    {
    *dst2 = *src2++;
    dst2 -= dim;
    }
    }

    You have now eliminated the need for almost all of the multiplcations,
    going from dim^2 to dim*2.

    --
    +-------------------------+--------------------+-----------------------------+
    | Kenneth J. Brody | www.hvcomputer.com | |
    | kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
    +-------------------------+--------------------+-----------------------------+
    Don't e-mail me at: <mailto:>
    Kenneth Brody, Sep 26, 2005
    #10
  11. In article <dh2arp$1f9g$>,
    Richard Tobin <> wrote:
    >
    > In article <>,
    > syed <> wrote:
    >
    > >#define RIDX(i,j,n) ((i)*(n)+(j))

    >
    > Depending on what you are doing with the array, you might be able to
    > avoid copying it altogether, and instead use a different version of
    > RIDX to access the elements in the new order.


    Specifically...

    #define ROTATED_RIDX(i,j,n) ((j)*(n)+((dim)-(i)-1))
    Anonymous 7843, Sep 26, 2005
    #11
  12. In article <wh_Ze.16262$mH.10320@fed1read07>, I wrote:
    > In article <dh2arp$1f9g$>,
    > Richard Tobin <> wrote:
    > >
    > > In article <>,
    > > syed <> wrote:
    > >
    > > >#define RIDX(i,j,n) ((i)*(n)+(j))

    > >
    > > Depending on what you are doing with the array, you might be able to
    > > avoid copying it altogether, and instead use a different version of
    > > RIDX to access the elements in the new order.

    >
    > Specifically...
    >
    > #define ROTATED_RIDX(i,j,n) ((j)*(n)+((dim)-(i)-1))


    Sigh, let me fix it before someone else does it for me:

    #define ROTATED_RIDX(i,j,n) ((j)*(n)+((n)-(i)-1))
    Anonymous 7843, Sep 26, 2005
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Pete Wright

    Re: Code optimization...

    Pete Wright, Jul 5, 2003, in forum: ASP .Net
    Replies:
    2
    Views:
    365
    David Waz...
    Jul 6, 2003
  2. Russell Wallace
    Replies:
    7
    Views:
    392
    Russell Wallace
    Jun 11, 2004
  3. ashu

    optimization of vhdl code

    ashu, May 8, 2006, in forum: VHDL
    Replies:
    5
    Views:
    3,996
  4. joshc
    Replies:
    14
    Views:
    762
    Keith Thompson
    Jan 14, 2005
  5. Ravikiran

    Zero Optimization and Sign Optimization???

    Ravikiran, Nov 17, 2008, in forum: C Programming
    Replies:
    22
    Views:
    831
    Thad Smith
    Nov 24, 2008
Loading...

Share This Page