speeding up C code

Discussion in 'C Programming' started by dvumani@hotmail.com, May 6, 2005.

  1. Guest

    I have C code which computes the row sums of a matrix, divide each
    element of each row with the row sum and then compute the column sum of
    the resulting matrix. Is there a way I can speed up the code in C:

    /* Here is the code */
    // Table is "wij"

    int i, j;
    for(i = 0; i < N; ++i)
    {
    for(j = 0; j < N; ++j)
    {
    sum_over_j_wij += wij[i,j];
    }
    for(j = 0; j < N; ++j)
    {
    sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];
    }
    }
    , May 6, 2005
    #1
    1. Advertising

  2. wrote:
    > I have C code which computes the row sums of a matrix, divide each
    > element of each row with the row sum and then compute the column sum of
    > the resulting matrix. Is there a way I can speed up the code in C:
    >
    > /* Here is the code */
    > // Table is "wij"
    >
    > int i, j;
    > for(i = 0; i < N; ++i)
    > {
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_j_wij += wij[i,j];


    By "win[i,j]" you probably mean "win[j]", don't you?

    > }
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];
    > }
    > }


    There are some loop unrolling techniques that might help. You can also
    avoid indexing all the time by using pointers and advancing those using
    the built-in ++ operator or += operator.

    Next time, please refrain from typing your code into the message and
    instead use the "copy-and-paste" capability offered by all modern Otes
    and applications.

    V
    Victor Bazarov, May 6, 2005
    #2
    1. Advertising

  3. Phlip Guest

    dvumani wrote:

    > I have C code which computes the row sums of a matrix, divide each
    > element of each row with the row sum and then compute the column sum of
    > the resulting matrix. Is there a way I can speed up the code in C:


    > sum_over_j_wij += wij[i,j];


    Is wij[i,j] doing what you think it does?

    Have you time-tested this code to see if it's slow?

    Have you time-tested your entire application, to see if the speed of this
    code is relevant?

    After all that research, switch to C++ and look up "expression
    metatemplates". You will probably find several examples with matrices.

    --
    Phlip
    http://www.c2.com/cgi/wiki?ZeekLand
    Phlip, May 6, 2005
    #3
  4. red floyd Guest

    wrote:
    > I have C code which computes the row sums of a matrix, divide each
    > element of each row with the row sum and then compute the column sum of
    > the resulting matrix. Is there a way I can speed up the code in C:
    >
    > /* Here is the code */
    > // Table is "wij"
    >
    > int i, j;
    > for(i = 0; i < N; ++i)
    > {
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_j_wij += wij[i,j];
    > }
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];
    > }
    > }
    >


    1. If you're talking C, why are you posting to c.l.c++?

    2. Your code won't compile; x[i,j] may be a syntax error

    2.a.Your code might compile, but it may only provide x[j] ( or x, I
    don't remember whether the comma operator return the right or left
    operand in C).
    red floyd, May 6, 2005
    #4
  5. wrote:

    > I have C code which computes the row sums of a matrix, divide each
    > element of each row with the row sum and then compute the column sum of
    > the resulting matrix. Is there a way I can speed up the code in C:

    You have some errors in your code.

    Besides the error, perhaps you could see something by
    expanding your loop into a series.


    > /* Here is the code */
    > // Table is "wij"
    >
    > int i, j;
    > for(i = 0; i < N; ++i)
    > {
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_j_wij += wij[i,j];

    Here is the problem: wij[i,j]
    The comma operator is not a subscript operator.
    In this expression, it evaluates 'i', then 'j'
    and returns the value of 'j'. So this is equivalent
    to: wij[j]

    The syntax for a multiple dimension array is:
    wij[j]


    > }
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];

    See note above about "wij[i,j]".
    Syntax error: "sum_overj_wij[i)];"

    > }
    > }
    >


    For N = 1:
    sum_over_j_wij[0] = wij[0][0];
    sum_over_i[0] = wij[0][0] / (wij[0][0]); /* substitution */

    For N = 2:
    sum_over_j_wij[0] = wij[0][0] + wij[0][1];
    sum_over_i[0] = wij[0][0] / (wij[0][0] + wij[0][1])
    + wij[0][1] / (wij[0][0] + wij[0][1]);

    sum_over_j_wij[1] = wij[1][0] + wij[1][1];
    sum_over_i[1] = wij[1][0] / (wij[1][0] + wij[1][1])
    + wij[1][1] / (wij[1][1] + wij[1][1]);

    Follow this through N = 5.
    See if there are any commonalities or if the terms
    can be re-arranged so that the operation can be
    distributed (such as one pass is addition, another
    division, etc.)

    See also loop unrolling and also repetitive subtraction
    rather than division.

    Think about trying to "pipeline" the operations.

    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c -faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.comeaucomputing.com/learn/faq/
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
    http://www.sgi.com/tech/stl -- Standard Template Library
    Thomas Matthews, May 6, 2005
    #5
  6. red floyd wrote:
    > wrote:
    >
    >> I have C code which computes the row sums of a matrix, divide each
    >> element of each row with the row sum and then compute the column sum of
    >> the resulting matrix. Is there a way I can speed up the code in C:
    >>
    >> /* Here is the code */
    >> // Table is "wij"
    >>
    >> int i, j;
    >> for(i = 0; i < N; ++i)
    >> {
    >> for(j = 0; j < N; ++j)
    >> {
    >> sum_over_j_wij += wij[i,j];
    >> }
    >> for(j = 0; j < N; ++j)
    >> {
    >> sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];
    >> }
    >> }
    >>

    >
    > 1. If you're talking C, why are you posting to c.l.c++?


    Techniques to speed up some code are common to both languages.

    > 2. Your code won't compile; x[i,j] may be a syntax error


    Why? The comma operator exists in both languages. It may not be doing
    what's intended, but that's not the point.

    > 2.a.Your code might compile, but it may only provide x[j] ( or x, I
    > don't remember whether the comma operator return the right or left
    > operand in C).


    Then you might consider studying before attempting to reply to both
    newsgroups. Just a thought...

    V
    Victor Bazarov, May 6, 2005
    #6
  7. wrote:

    > I have C code which computes the row sums of a matrix, divide each
    > element of each row with the row sum and then compute the column sum of
    > the resulting matrix. Is there a way I can speed up the code in C:


    Try this:

    > cat f.c

    #include <math.h>
    #include <stdlib.h>
    #include <stdbool.h>

    bool f( // false if any row sum is zero.
    const size_t m, // number of rows
    const size_t n, // number of columns
    const double w[m][n], // input matrix
    double_t* restrict sum // output (columm sums)
    ) {
    for (size_t i = 0; i < m; ++i) {
    sum = 0.0;
    }
    for (size_t i = 0; i < m; ++i) {
    double_t t = 0.0; // row sum accumulator
    for (size_t j = 0; j < n; ++j) {
    t += w[j];
    }
    if (0.0 == t)
    return false;
    for (size_t j = 0; j < n; ++j) {
    sum[j] += w[j]/t;
    }
    }
    return true;
    }

    > gcc -Wall -std=c99 -pedantic -O3 -c f.c


    Performance will degrade when rows of array w are too large
    to keep in level 1 cache along with the column sums.
    E. Robert Tisdale, May 6, 2005
    #7
  8. On Fri, 06 May 2005 11:55:47 -0400, Victor Bazarov wrote:

    > wrote:
    >> I have C code which computes the row sums of a matrix, divide each
    >> element of each row with the row sum and then compute the column sum of
    >> the resulting matrix. Is there a way I can speed up the code in C:
    >>
    >> /* Here is the code */
    >> // Table is "wij"
    >>
    >> int i, j;
    >> for(i = 0; i < N; ++i)
    >> {
    >> for(j = 0; j < N; ++j)
    >> {
    >> sum_over_j_wij += wij[i,j];


    sum_over_j_wij hasn't visibly been initialised anywhere. I assume these
    array elements should be initialised to 0 before the loop.

    > By "win[i,j]" you probably mean "win[j]", don't you?
    >
    >> }
    >> for(j = 0; j < N; ++j)
    >> {
    >> sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];


    Similarly sum_over_i[j] hasn't visibly been initialised.

    >> }
    >> }

    >
    > There are some loop unrolling techniques that might help.


    The biggest time consumer here is likely to be the division. Division is
    usually a lot slower than other arithmetic operations. Loop unrolling
    probably won't help much in comparison, and is something the compiler
    might do for you anyway.

    > You can also
    > avoid indexing all the time by using pointers and advancing those using
    > the built-in ++ operator or += operator.


    Indexing is usually not an expensive operation, and compilers tend to be
    good at optimising it. It isn't uncommon for an indexed version of code
    to turn out faster than one using pointer arithmetic.

    You could experimet with the following (untested) code.

    int i, j;

    /* Initialise sum_over_i elements if necessary here */

    for(j = 0; j < N; ++j)
    {
    sum_over_i[j] = 0.0;
    }

    for(i = 0; i < N; ++i)
    {
    /* Assuming elements are of type double */
    const double *const wij_row = wij;
    double sum = 0.0;
    double sum_reciprocal;

    for(j = 0; j < N; ++j)
    {
    sum += wij_row[j];
    }

    sum_over_j_wij = sum;
    sum_reciprocal = 1.0 / sum;

    for(j = 0; j < N; ++j)
    {
    sum_over_i[j] += wij_row[j] * sum_reciprocal;
    }
    }

    wij_row may or may not help, the compiler could well optimise in a similar
    or perhaps better way. Multiplying by the reciprocal may be marginally
    less accurate than dividing.

    Lawrence
    Lawrence Kirby, May 6, 2005
    #8
  9. Guest

    wrote:
    > I have C code which computes the row sums of a matrix, divide each
    > element of each row with the row sum and then compute the column sum

    of
    > the resulting matrix. Is there a way I can speed up the code in C:
    >
    > /* Here is the code */
    > // Table is "wij"
    >
    > int i, j;
    > for(i = 0; i < N; ++i)
    > {
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_j_wij += wij[i,j];
    > }
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];
    > }
    > }


    As others have pointed out, you are using Pascal-style subscripts
    rather than C or C++ style. And you have a strange syntax error in
    there too. I'm going to assume all these arrays are "doubles".

    Remember, division, module and square root are all roughly the same
    speed are and very slow (as a rule of thumb you should think of them as
    about 10 times slower than addition.) So reducing their quantity in
    your inner loops is of prime importance in this case. So, if you can
    live with slight accuracy issues, the key "speeding up" consideration
    is conversion of division to reciprocal multiplication:

    for (i=0; i < N; i++) {
    for (s=0.0,j=0; j < N; j++) s += wij[j];
    s = 1.0 / s; /* if s == 0, you are SOL. */
    for (j=0; j < N; j++) sum_over_i[j] += s*wij[j];
    }

    Some compilers are not able to hoist out the wij calculation, so it
    might be useful to precalculate this as double *wptr = wij; and
    replace the instances of wij with wptr.

    Certainly, using a good vectorizing compiler, such as Intel's compiler
    will likely make a *huge* difference on code like this. Microsoft
    claims that their latest compilers have vectorization capabilities, but
    I have not verified this myself. In any event, the benefits of using
    the SIMD instruction set basically goes straight to the bottom line,
    especially in cases like this.

    If you are on a processor which has a "multiply accumulate" (PowerPC,
    Itanium, PA-RISC) instead of SIMD, you can invert the loops:

    for (i=0; i < N; i++) {
    for (s=0.0,j=0; j < N; j++) s += wij[j];
    recip_sum_over_j = 1.0 / s; /* if s == 0, you are SOL. */
    }

    for (j=0; j < N; j++) {
    for (s=0.0,i=0; i < N; i++) s += recip_sum_over_j*wij[j];
    sum_over_i[j] = s;
    }

    So you can try each possibility, and check your compiler settings
    (w.r.t SIMD and "Multiply-Accumulate") to see which one works better
    for your platform.

    ---
    Paul Hsieh
    http://www.pobox.com/~qed/
    http://bstring.sf.net/
    , May 6, 2005
    #9
  10. In article <>,
    wrote:

    > I have C code which computes the row sums of a matrix, divide each
    > element of each row with the row sum and then compute the column sum of
    > the resulting matrix. Is there a way I can speed up the code in C:
    >
    > /* Here is the code */
    > // Table is "wij"
    >
    > int i, j;
    > for(i = 0; i < N; ++i)
    > {
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_j_wij += wij[i,j];
    > }
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];
    > }
    > }
    >


    Easy. Note that wij[i,j] is exactly the same as wij[j], so we change
    this to

    int i, j;
    for(i = 0; i < N; ++i)
    {
    for(j = 0; j < N; ++j)
    {
    sum_over_j_wij += wij[j];
    }
    for(j = 0; j < N; ++j)
    {
    sum_over_i[j] += wij[j]/sum_over_j_wij[i)];
    }
    }

    Now the two inner loops are independent and can be split; then i and j
    can be exchanged in the second loop, so we get:

    int i, j;
    for(i = 0; i < N; ++i)
    {
    for(j = 0; j < N; ++j)
    {
    sum_over_j_wij += wij[j];
    }
    }
    for(j = 0; j < N; ++j)
    {
    for(i = 0; i < N; ++i)
    {
    sum_over_i[j] += wij[j]/sum_over_j_wij[i)];
    }
    }

    In the first nested loop, we always add the same values to
    sum_over_j_wij, so we calculate that sum only once:

    int i, j;
    double s = 0.0;
    for (j = 0; j < N; ++j) s += wij[j];
    for(i = 0; i < N; ++i)
    {
    sum_over_j_wij += s;
    }
    for(j = 0; j < N; ++j)
    {
    for(i = 0; i < N; ++i)
    {
    sum_over_i[j] += wij[j]/sum_over_j_wij[i)];
    }
    }

    In the second nested loop, we add wij[j] multiplied by the sum over 1 /
    sum_over_j_wij, so we change this to:

    int i, j;
    double s = 0.0, t = 0.0;

    for (j = 0; j < N; ++j) s += wij[j];
    for (i = 0; i < N; ++i) sum_over_j_wij += s;
    for (i = 0; i < N; ++i) t += 1.0 / sum_over_j_wij;
    for (j = 0; j < N; ++j) sum_over_i[j] += wij[j] / t;

    That should make it a bit faster.
    Christian Bau, May 7, 2005
    #10
  11. Bharat Guest

    Hi,

    If you want to speed up your code beyond a certain level of optimizaton
    then better study your data. The code written below may not perform
    well if the matrix contains binary data. The study of data pattern may
    not be useful at all times. Just an idea.

    int i, j;
    for(i = 0; i++ < N; )
    {
    for(j = 0; j++ < N;)
    sum_over_j_wij += wij[i,j];

    for(j = 0; j++ < N;)
    sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];

    }

    Thanks
    Bharat


    wrote:
    > I have C code which computes the row sums of a matrix, divide each
    > element of each row with the row sum and then compute the column sum

    of
    > the resulting matrix. Is there a way I can speed up the code in C:
    >
    > /* Here is the code */
    > // Table is "wij"
    >
    > int i, j;
    > for(i = 0; i < N; ++i)
    > {
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_j_wij += wij[i,j];
    > }
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];
    > }
    > }
    Bharat, May 7, 2005
    #11
  12. > /* Here is the code */
    > // Table is "wij"
    >
    > int i, j;
    > for(i = 0; i < N; ++i)
    > {
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_j_wij += wij[i,j];
    > }
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_i[j] += wij[i,j]/sum_over_j_wij[i)];
    > }
    > }


    wij[i,j] is probably wrong, maybe you meant wij[j]?

    Having modern C or C++ compiler I don't think you can make any significant
    speedups here. Unrolling the loops will not help because optimizers do
    that - especially well if N is compile time constant. Using pointers over
    indices is also not going to help - the optimizer will create the same code
    from such simple loops no matter if you use array indexing or pointers.

    Though, if you use an older compiler, these techniques may help a lot.

    cheers,
    M.
    Marcin Kalicinski, May 7, 2005
    #12
  13. Lawrence Kirby wrote:
    [...]
    > for(i = 0; i < N; ++i)
    > {
    > /* Assuming elements are of type double */
    > const double *const wij_row = wij;
    > double sum = 0.0;
    > double sum_reciprocal;
    >
    > for(j = 0; j < N; ++j)
    > {
    > sum += wij_row[j];
    > }


    And that can be further improved by incrementing and dereferencing a
    pointer, rather than subscripting the array each time:

    const double *const wij_row = wij;
    const double *wij_col;
    [...]
    wij_col = wij_row;
    for(j = 0; j < N; ++j)
    {
    sum += *(wij_col++);
    }

    (I'm not sure if the consts you added help or not, and I hope I got
    it right on my "wij_col" declaration.)

    [...]
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_i[j] += wij_row[j] * sum_reciprocal;
    > }


    Ditto for this loop:

    wij_col = wij_row;
    for(j = 0; j < N; ++j)
    {
    sum_over_i[j] += *(wij_col++) * sum_reciprocal;
    }

    [...]

    --
    +-------------------------+--------------------+-----------------------------+
    | Kenneth J. Brody | www.hvcomputer.com | |
    | kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
    +-------------------------+--------------------+-----------------------------+
    Don't e-mail me at: <mailto:>
    Kenneth Brody, May 9, 2005
    #13
  14. In article <>,
    Kenneth Brody <> wrote:
    >And that can be further improved by incrementing and dereferencing a
    >pointer, rather than subscripting the array each time:


    That is not certain: it depends on the compiler involved.
    Some machines will do -better- with the array subscripting
    (e.g., hardware prediction logic might be better for subscripting).

    Some compilers will do better with array subscripting even though
    they convert it internally to pointers. The difference compared
    to writing the pointer logic yourself is that if you write the
    pointer logic yourself then the compiler needs to do a bunch of
    extra work to deal with the possibility of pointer aliasing;
    the equivilent logic for subscripting is less difficult -- e.g.,
    one does not need to figure out whether subscripts into two different
    automatic arrays might result in a pointer clash.
    --
    Any sufficiently advanced bug is indistinguishable from a feature.
    -- Rich Kulawiec
    Walter Roberson, May 9, 2005
    #14
  15. On Mon, 09 May 2005 10:09:55 -0400, Kenneth Brody wrote:

    > Lawrence Kirby wrote:
    > [...]
    >> for(i = 0; i < N; ++i)
    >> {
    >> /* Assuming elements are of type double */
    >> const double *const wij_row = wij;
    >> double sum = 0.0;
    >> double sum_reciprocal;
    >>
    >> for(j = 0; j < N; ++j)
    >> {
    >> sum += wij_row[j];
    >> }

    >
    > And that can be further improved by incrementing and dereferencing a
    > pointer, rather than subscripting the array each time:


    Typically, no. Indexed accesses are particularly easy for compilers to
    analyse and optimise.

    > const double *const wij_row = wij; const double *wij_col;
    > [...]
    > wij_col = wij_row;
    > for(j = 0; j < N; ++j)
    > {
    > sum += *(wij_col++);
    > }
    > }


    In my experience the indexed version will run as fast if not faster than
    this with modern compilers. If this form is more efficient at the machine
    code level the compiler will optimise the indexed for to it. If not it is
    more difficult to go the other way.

    > (I'm not sure if the consts you added help or not, and I hope I got it
    > right on my "wij_col" declaration.)


    They're there to show the reader what is constant through the loop
    iteration as much as anything. The compiler couls use this but it should
    also be able to work it out for itself.
    >
    > [...]
    >> for(j = 0; j < N; ++j)
    >> {
    >> sum_over_i[j] += wij_row[j] * sum_reciprocal;
    >> }
    >> }

    > Ditto for this loop:
    >
    > wij_col = wij_row;
    > for(j = 0; j < N; ++j)
    > {
    > sum_over_i[j] += *(wij_col++) * sum_reciprocal;
    > }


    Why not for sum_over_i as well? But again this isn't likely to help. It is
    rare to find code that is "best" for all compilers so YMMV. The change
    from division to multiplication is likely to be more significant, and
    isn't something a correct compiler can't do by itself because it changes
    the results, even if only marginally.

    Lawrence
    Lawrence Kirby, May 9, 2005
    #15
  16. In article <>,
    Kenneth Brody <> wrote:

    > Lawrence Kirby wrote:
    > [...]
    > > for(i = 0; i < N; ++i)
    > > {
    > > /* Assuming elements are of type double */
    > > const double *const wij_row = wij;
    > > double sum = 0.0;
    > > double sum_reciprocal;
    > >
    > > for(j = 0; j < N; ++j)
    > > {
    > > sum += wij_row[j];
    > > }

    >
    > And that can be further improved by incrementing and dereferencing a
    > pointer, rather than subscripting the array each time:


    With a good compiler (that is with a compiler that is not completely
    brain-damaged) that won't make the slightest difference, except that it
    obfuscates the code. Strength reduction is an optimisation that was used
    in the seventies.

    And processors that can access wij_row[j] with a single instruction that
    executes just as fast as dereferencing a pointer are not exactly rare.
    Christian Bau, May 9, 2005
    #16
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Spamtrap

    Need some hints on speeding up

    Spamtrap, Aug 11, 2004, in forum: Perl
    Replies:
    1
    Views:
    363
    Jim Gibson
    Aug 12, 2004
  2. speeding up C code

    , May 6, 2005, in forum: C++
    Replies:
    15
    Views:
    713
    Christian Bau
    May 9, 2005
  3. Andreas Neudecker

    Docs on speeding up Python code?

    Andreas Neudecker, Sep 2, 2003, in forum: Python
    Replies:
    4
    Views:
    373
    Skip Montanaro
    Sep 2, 2003
  4. razor

    speeding up piece of code

    razor, Oct 16, 2009, in forum: C Programming
    Replies:
    11
    Views:
    519
    jameskuyper
    Oct 16, 2009
  5. Replies:
    35
    Views:
    317
    Uri Guttman
    May 18, 2008
Loading...

Share This Page