GGC's Machine Code Production ?

Discussion in 'C Programming' started by Taygun Kekec, May 1, 2008.

  1. Taygun Kekec

    Taygun Kekec Guest

    Hello , I have been curious about the speed of 2 version of a matris
    filling program and got suprised by the results.
    Actually , I am having Data Structures lecture on university though
    our professor claimed "using IF structures are costly , so you better
    use immediate data addressing " so he gave us the pseudo codes that i
    implemented in language C.

    Here are the source codes :
    /* Optimized.c * /
    #include <stdio.h>
    #include <time.h>
    #include <stdlib.h>

    #define ELEMENTS 15000
    #define KOSEGEN 1
    #define UPPER 2
    #define LOWER 3


    int main()
    {
    int i,j;
    int **A;
    /* 15000 x 15000 lik matris icin gerekli bellek allokasyonu */
    A = (int **) malloc( ELEMENTS * sizeof(int*) );
    for(i = 0; i < ELEMENTS; i++){
    if ( (A = (int * )malloc( ELEMENTS * sizeof(int) )) == NULL ){
    printf(" yer yok \n");
    return -1;
    }
    }

    time_t start,end;

    time (&start); /* gerekli sistem saatini cekme */
    for(i=0;i<ELEMENTS-1;i++){
    A = KOSEGEN;
    for(j= i+1;j<ELEMENTS;j++) {
    A[j]= UPPER;
    A[j]= LOWER;
    }
    }
    A[ELEMENTS-1][ELEMENTS-1] = 1;
    time (&end); /* gerekli sistem saatini cekme */
    double dif = difftime (end,start); /* gecen sureyi dif degiskenine
    atma */
    printf ("%.4lf seconds for optimized algo.\n", dif );
    free(A);
    scanf("%d");
    }
    /* Optimized.c * /
    ----------------------
    /* Unoptimized.c * /
    #include <stdio.h>
    #include <time.h>
    #include <stdlib.h>

    #define ELEMENTS 15000
    #define KOSEGEN 1
    #define UPPER 2
    #define LOWER 3


    int main()
    {
    int i,j;
    int **A;
    /* 15000 x 15000 lik matris icin gerekli bellek allokasyonu */
    A = (int **) malloc( ELEMENTS * sizeof(int*) );
    for(i = 0; i < ELEMENTS; i++){
    if ( (A = (int * )malloc( ELEMENTS * sizeof(int) )) == NULL ){
    printf(" yer yok \n");
    return -1;
    }
    }

    time_t start,end;

    time (&start); /* gerekli sistem saatini cekme */
    for(i=0;i<ELEMENTS;i++){
    for(j=0;j<ELEMENTS;j++) {
    if ( i<j)
    A[j] = UPPER;
    else if ( i>j)
    A[j] = LOWER;
    else
    A[j] = KOSEGEN;
    }
    }
    time (&end); /* gerekli sistem saatini cekme */
    double dif = difftime (end,start); /* gecen sureyi dif
    degiskenine atma */
    printf ("%.4lf seconds for unoptimized algo.\n", dif );
    free(A);
    scanf("%d");
    }
    /* Unoptimized.c * /


    Whats the reason for this ? Processing unit prefetching or something
    about GCC ?
    Taygun Kekec, May 1, 2008
    #1
    1. Advertising

  2. Taygun Kekec

    Taygun Kekec Guest

    By the way , the algorithm aims to fill the matris like this :
    Upper side of matris to 2
    Lower side of matris to 3
    for 4x4 :
    [1] [2] [2] [2]
    [3] [1] [2] [2]
    [3] [3] [1] [2]
    [3] [3] [3] [1]
    Taygun Kekec, May 1, 2008
    #2
    1. Advertising

  3. Taygun Kekec

    Thad Smith Guest

    Taygun Kekec wrote:
    > Hello , I have been curious about the speed of 2 version of a matris
    > filling program and got suprised by the results.
    > Actually , I am having Data Structures lecture on university though
    > our professor claimed "using IF structures are costly , so you better
    > use immediate data addressing " so he gave us the pseudo codes that i
    > implemented in language C.
    >
    > Here are the source codes :
    > /* Optimized.c * /

    ....
    > for(i=0;i<ELEMENTS-1;i++){
    > A = KOSEGEN;
    > for(j= i+1;j<ELEMENTS;j++) {
    > A[j]= UPPER;
    > A[j]= LOWER;
    > }
    > }
    > A[ELEMENTS-1][ELEMENTS-1] = 1;

    ....
    > /* Unoptimized.c * /
    > for(i=0;i<ELEMENTS;i++){
    > for(j=0;j<ELEMENTS;j++) {
    > if ( i<j)
    > A[j] = UPPER;
    > else if ( i>j)
    > A[j] = LOWER;
    > else
    > A[j] = KOSEGEN;
    > }
    > }


    Just looking at the source, the first version executes the inner loop
    n(n-1)/2 times, while the last does this n*n times, about double. Also the
    second version has the additional overhead of the conditional if code,
    which the first doesn't. In general, fewer branches mean faster code.

    --
    Thad
    Thad Smith, May 1, 2008
    #3
  4. Taygun Kekec

    Taygun Kekec Guest

    You are totally right ,I also think the optimized one should finish
    filling quicker than Unoptimized.c does.
    But Unoptimized.c finishes quicker...
    Compile & Run and you will see.
    Taygun Kekec, May 1, 2008
    #4
  5. On May 1, 3:20 pm, Taygun Kekec <> wrote:
    > You are totally right ,I also think the optimized one should finish
    > filling quicker than Unoptimized.c does.
    > But Unoptimized.c  finishes quicker...
    > Compile & Run and you will see.


    On a typical 32 bit system, the memory that you are filling is about
    900 Megabyte.

    Now have a look in which order you are storing the values: In the
    "unoptimised" version, you are storing to array elements in sequential
    order. In the so-called "optimised" version, half of the stores are in
    sequential order, but then your code will have to read up to 15000
    different pointers, and store a single four byte value. Each value
    stored in a completely different location.

    Modern processors usually transfer data from and to memory in units of
    whole cache lines, often 64 byte or more per cache line. The code
    accessing the data in a linear way only writes whole cache lines as
    needed. The "optimised" code has to read and write a whole cache line,
    just to store a single integer.

    Now if you want to really, really optimise the code: Consider that on
    most implementations, memcpy is highly optimised in ways that you
    could never think of. How could you achieve the task with 99% of the
    work done in calls to memcpy?
    christian.bau, May 1, 2008
    #5
  6. Taygun Kekec

    Willem Guest

    christian.bau wrote:
    ) Modern processors usually transfer data from and to memory in units of
    ) whole cache lines, often 64 byte or more per cache line. The code
    ) accessing the data in a linear way only writes whole cache lines as
    ) needed. The "optimised" code has to read and write a whole cache line,
    ) just to store a single integer.

    I think on modern computers, if you read a line of memory, the next line
    will be 'prefetched' and will be available a lot quicker than when you
    read an other (random) piece of memory.

    In other words: the memory is optimized for sequential access.


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
    Willem, May 1, 2008
    #6
  7. Taygun Kekec <> wrote in news:3e8bc817-a55a-4c0d-
    :

    > Hello , I have been curious about the speed of 2 version of a matris
    > filling program and got suprised by the results.
    > Actually , I am having Data Structures lecture on university though
    > our professor claimed "using IF structures are costly , so you better
    > use immediate data addressing " so he gave us the pseudo codes that i
    > implemented in language C.


    As others have pointed out, a branch in code that is already in the
    instruction cache is much cheaper than constantly invalidating data in
    the cache.

    There are some fundamental style issues in the code. I am going to
    comment on those.

    > /* Optimized.c * /
    > #include <stdio.h>
    > #include <time.h>
    > #include <stdlib.h>
    >
    > #define ELEMENTS 15000
    > #define KOSEGEN 1
    > #define UPPER 2
    > #define LOWER 3


    Naming some of the symbols in Turkish and others in English is not a
    good idea if you want others to comprehend easily what you are doing.

    > int main()


    int main(void)

    > {
    > int i,j;
    > int **A;
    > /* 15000 x 15000 lik matris icin gerekli bellek allokasyonu */
    > A = (int **) malloc( ELEMENTS * sizeof(int*) );


    A = malloc( ELEMENTS * sizeof(*A) );

    Casting the return value of malloc can hide failure to include stdlib.h.

    Using sizeof(*A) means one less thing to worry about if the type of
    elements of the matrix changes.

    You do not check if malloc succeded.

    > for(i = 0; i < ELEMENTS; i++){
    > if ( (A = (int * )malloc( ELEMENTS * sizeof(int) )) == NULL ){
    > printf(" yer yok \n");
    > return -1;
    > }
    > }


    Ditto for malloc. Also, error messages should go to stderr. Finally,
    exit with EXIT_FAILURE status is preferable to -1.

    At the end of the program, you free A, but do not free the individual
    row pointers.

    I would use an OS provided utility to benchmark, such as time on *nix
    systems or timethis on Windows.

    How does the following version perform?

    #include <stdio.h>
    #include <stdlib.h>

    enum MATRIX_VALUE { DIAGONAL = 1, UPPER = 2, LOWER = 3 };
    enum DIMENSION { ROWS = 15000, COLS = 15000 };

    int main(void) {
    int r, c;
    int *A;

    A = malloc( ROWS * COLS * sizeof( *A ) );

    if ( !A ) {
    fputs( "Insufficient memory\n", stderr );
    exit( EXIT_FAILURE );
    }

    for ( r = 0; r < ROWS; ++r ) {
    int base = r * COLS;
    for ( c = 0; c < COLS; ++c ) {
    int index = base + c;
    if ( r < c ) {
    A[ index ] = UPPER;
    }
    else if ( r > c ) {
    A[ index ] = LOWER;
    }
    else {
    A[ index ] = DIAGONAL;
    }
    }
    }

    free( A );
    return EXIT_SUCCESS;
    }

    Note that the disadvantage of this version is that it requires about 900
    MB of contiguous memory.

    On my laptop, there is no appreciable difference between this and the
    unoptimized version above (after getting rid of the extraneous calls to
    time and including code to free individual pointers) measured with
    timethis.exe on Windows. On the other hand, your code would continue to
    work even when memory is fragmented.

    Sinan

    --
    A. Sinan Unur <>
    (remove .invalid and reverse each component for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://www.rehabitation.com/clpmisc/
    A. Sinan Unur, May 1, 2008
    #7
  8. Taygun Kekec

    Taygun Kekec Guest

    Thanks for you assistance
    That cache logic makes total sense.

    Also for the style suggestions.
    Taygun Kekec, May 1, 2008
    #8
  9. On Thu, 1 May 2008 06:21:21 -0700 (PDT), Taygun Kekec
    <> wrote:

    >Hello , I have been curious about the speed of 2 version of a matris
    >filling program and got suprised by the results.
    >Actually , I am having Data Structures lecture on university though
    >our professor claimed "using IF structures are costly , so you better
    >use immediate data addressing " so he gave us the pseudo codes that i
    >implemented in language C.
    >
    >Here are the source codes :
    >/* Optimized.c * /
    >#include <stdio.h>
    >#include <time.h>
    >#include <stdlib.h>
    >
    >#define ELEMENTS 15000
    >#define KOSEGEN 1
    >#define UPPER 2
    >#define LOWER 3
    >
    >
    >int main()
    >{
    > int i,j;
    > int **A;
    > /* 15000 x 15000 lik matris icin gerekli bellek allokasyonu */
    > A = (int **) malloc( ELEMENTS * sizeof(int*) );


    Don't cast the return from malloc. It cannot help but may cause the
    compiler to suppress a diagnostic you would really want to see.

    > for(i = 0; i < ELEMENTS; i++){
    > if ( (A = (int * )malloc( ELEMENTS * sizeof(int) )) == NULL ){
    > printf(" yer yok \n");
    > return -1;


    The only portable return values are 0, EXIT_SUCCESS, and EXIT_FAILURE.

    > }
    > }
    >
    > time_t start,end;
    >
    > time (&start); /* gerekli sistem saatini cekme */
    > for(i=0;i<ELEMENTS-1;i++){
    > A = KOSEGEN;
    > for(j= i+1;j<ELEMENTS;j++) {
    > A[j]= UPPER;
    > A[j]= LOWER;
    > }
    > }
    > A[ELEMENTS-1][ELEMENTS-1] = 1;


    The A statement in your for loop already handles this situation.

    > time (&end); /* gerekli sistem saatini cekme */
    > double dif = difftime (end,start); /* gecen sureyi dif degiskenine
    >atma */


    Unless you have a C99 compiler, declarations need to precede
    executable statements.

    > printf ("%.4lf seconds for optimized algo.\n", dif );
    > free(A);


    This causes several memory leaks. The memory allocated to each A
    is still allocated but now can never be released.

    > scanf("%d");


    This invokes undefined behavior. The %d requires an argument of type
    int*. What was your intent here? Did you mean to use getchar
    instead?

    >}
    >/* Optimized.c * /
    >----------------------
    >/* Unoptimized.c * /
    >#include <stdio.h>
    >#include <time.h>
    >#include <stdlib.h>
    >
    >#define ELEMENTS 15000
    >#define KOSEGEN 1
    >#define UPPER 2
    >#define LOWER 3
    >
    >
    >int main()
    >{
    > int i,j;
    > int **A;
    > /* 15000 x 15000 lik matris icin gerekli bellek allokasyonu */
    > A = (int **) malloc( ELEMENTS * sizeof(int*) );
    > for(i = 0; i < ELEMENTS; i++){
    > if ( (A = (int * )malloc( ELEMENTS * sizeof(int) )) == NULL ){
    > printf(" yer yok \n");
    > return -1;
    > }
    > }
    >
    > time_t start,end;
    >
    > time (&start); /* gerekli sistem saatini cekme */
    > for(i=0;i<ELEMENTS;i++){
    > for(j=0;j<ELEMENTS;j++) {
    > if ( i<j)
    > A[j] = UPPER;
    > else if ( i>j)
    > A[j] = LOWER;
    > else
    > A[j] = KOSEGEN;
    > }
    > }
    > time (&end); /* gerekli sistem saatini cekme */
    > double dif = difftime (end,start); /* gecen sureyi dif
    >degiskenine atma */
    > printf ("%.4lf seconds for unoptimized algo.\n", dif );
    > free(A);
    > scanf("%d");
    >}
    >/* Unoptimized.c * /
    >
    >
    >Whats the reason for this ? Processing unit prefetching or something
    >about GCC ?



    Remove del for email
    Barry Schwarz, May 2, 2008
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Seth Darr
    Replies:
    2
    Views:
    501
    Seth Darr
    Oct 6, 2004
  2. Tushar

    SMTP Settings on Production Machine

    Tushar, Nov 1, 2004, in forum: ASP .Net
    Replies:
    4
    Views:
    2,086
    Tushar
    Nov 2, 2004
  3. Tushar Karsan

    Setting SMTP on a Production Machine

    Tushar Karsan, Nov 1, 2004, in forum: ASP .Net
    Replies:
    0
    Views:
    324
    Tushar Karsan
    Nov 1, 2004
  4. =?Utf-8?B?UmV6YSBTb2xvdWtp?=

    SessionID Changes in production machine....help

    =?Utf-8?B?UmV6YSBTb2xvdWtp?=, May 13, 2005, in forum: ASP .Net
    Replies:
    5
    Views:
    467
    Juan T. Llibre
    May 13, 2005
  5. David C
    Replies:
    2
    Views:
    446
    David C
    Sep 4, 2007
Loading...

Share This Page