question about macro?

Discussion in 'C Programming' started by MBALOVER, Mar 9, 2010.

  1. MBALOVER

    MBALOVER Guest

    Hi all,

    Actually I want my code to be


    for (i=0;i<N;i++)
    {
    array2[i*30]=array1;
    array2[i*30+1]=array1;
    array2[i*30+2]=array1;
    array2[i*30+3]=array1;
    ..................
    ..................
    ..................
    array2[i*30+28]=array1;
    array2[i*30+29]=array1;
    }

    If there anyway to use macros to do it instead for listing out
    manually in the code like above?
    (the listing out such as the above example will be very bad in the
    case I need, say, array2[i*1000+0]=array1; to ...
    array2[i*1000+999]=array1;)

    And please note that for some reasons of my device, I do not want to
    use another loop such as

    for (i=0;i<N;i++)
    for (j=0;j<30;j++)
    array2[i*30+j]=array1;


    If I write a macro as follows:

    #define myMacro (array1, array2) \
    for ( j=0;j<30;j++) \
    array2[i*30+j]=array1; \



    Does it help? I doubt it because it will be the same as the case with
    two loops that I want to avoid.

    Thanks a lot.
    MBALOVER, Mar 9, 2010
    #1
    1. Advertising

  2. MBALOVER

    Ian Collins Guest

    MBALOVER wrote:
    > Hi all,
    >
    > Actually I want my code to be
    >
    >
    > for (i=0;i<N;i++)
    > {
    > array2[i*30]=array1;
    > array2[i*30+1]=array1;
    > array2[i*30+2]=array1;
    > array2[i*30+3]=array1;
    > ..................
    > ..................
    > ..................
    > array2[i*30+28]=array1;
    > array2[i*30+29]=array1;
    > }
    >
    > If there anyway to use macros to do it instead for listing out
    > manually in the code like above?


    Forget macros, they won't help here.

    How about memset?

    (untested)

    memset( array2+i*30, array1, 30*sizeof(array2[0]) );

    --
    Ian Collins
    Ian Collins, Mar 9, 2010
    #2
    1. Advertising

  3. MBALOVER

    John Bode Guest

    On Mar 9, 2:32 pm, MBALOVER <> wrote:
    > Hi all,
    >
    > Actually I want my code to be
    >
    > for (i=0;i<N;i++)
    > {
    >     array2[i*30]=array1;
    >     array2[i*30+1]=array1;
    >     array2[i*30+2]=array1;
    >     array2[i*30+3]=array1;
    > .................
    > .................
    > .................
    >     array2[i*30+28]=array1;
    >     array2[i*30+29]=array1;
    >
    > }
    >
    > If there anyway to use macros to do it instead for listing out
    > manually in the code like above?
    > (the listing out such as the above example will be very bad in the
    > case I need, say,  array2[i*1000+0]=array1;  to ...
    > array2[i*1000+999]=array1;)
    >
    > And please note that for some reasons of my device, I do not want to
    > use another loop such as
    >
    > for (i=0;i<N;i++)
    > for (j=0;j<30;j++)
    > array2[i*30+j]=array1;
    >
    > If I write a macro as follows:
    >
    > #define myMacro (array1, array2)          \
    > for ( j=0;j<30;j++)                                  \
    >      array2[i*30+j]=array1;                    \
    >
    > Does it help? I doubt it because it will be the same as the case with
    > two loops that I want to avoid.
    >
    > Thanks a lot.


    I'm curious as to why an inner loop would be that bad.

    Frankly, you're stuck; if you absolutely cannot use an inner loop,
    then you'll have to write out each initialization manually. You could
    write another program to automate the process for you, such as:

    printf("for (i = 0; i < N; i++)\n{\n");
    for (j = 0; j < K; j++) /** where K is the number of items */
    {
    printf(" array[i*30+%d] = array;\n", j);
    }
    printf("}\n");

    and then paste that output into your code.

    Another alternative would be to look into a variation of Duff's device
    (see http://en.wikipedia.org/wiki/Duff's_device), where you use an
    inner loop but partially unroll it.

    for (i = 0; i < N; i++)
    {
    size_t j = (K + 7)/8;
    switch(j % 8)
    {
    case 0 : do {
    case 7 : array[i*30+j+7] = array;
    case 6 : array[i*30+j+6] = array;
    case 5 : array[i*30+j+5] = array;
    case 4 : array[i*30+j+4] = array;
    case 3 : array[i*30+j+3] = array;
    case 2 : array[i*30+j+2] = array;
    case 1 : array[i*30+j+1] = array;
    } while (--j > 0);
    }
    }
    John Bode, Mar 9, 2010
    #3
  4. MBALOVER <> writes:
    > Actually I want my code to be
    >
    >
    > for (i=0;i<N;i++)
    > {
    > array2[i*30]=array1;
    > array2[i*30+1]=array1;
    > array2[i*30+2]=array1;
    > array2[i*30+3]=array1;
    > .................
    > .................
    > .................
    > array2[i*30+28]=array1;
    > array2[i*30+29]=array1;
    > }


    It certainly looks like a job for a nested for loop (but see below).

    > If there anyway to use macros to do it instead for listing out
    > manually in the code like above?
    > (the listing out such as the above example will be very bad in the
    > case I need, say, array2[i*1000+0]=array1; to ...
    > array2[i*1000+999]=array1;)


    There's not really any way for the preprocessor, given a number N, to
    generate N statements.

    > And please note that for some reasons of my device, I do not want to
    > use another loop such as
    >
    > for (i=0;i<N;i++)
    > for (j=0;j<30;j++)
    > array2[i*30+j]=array1;


    So you insist on having, say, 30 distinct assignment statements; an
    equivalent loop that iterates 30 times and performs exactly the same
    assignments isn't good enough.

    It might help if we know *how* your device imposes this requirement.

    If you just write the nested for loop, how exactly does that not work?

    > If I write a macro as follows:
    >
    > #define myMacro (array1, array2) \
    > for ( j=0;j<30;j++) \
    > array2[i*30+j]=array1; \
    >
    >
    >
    > Does it help? I doubt it because it will be the same as the case with
    > two loops that I want to avoid.


    Right, the generated code will almost certainly be the same whether
    the loop is written out or generated by a macro.

    What you're doing is loop unrolling. It's an optimization technique
    that can result in faster but larger code. (In some cases, the
    code can be slower just because it's larger, due to cache effects.)
    Doing loop unrolling at the source level would usually be considered
    premature optimization; a good optimizing compiler should be able
    to do this for you.

    But if you really need to do this, you might consider writing another
    tool that generates C code from an input specification. (m4 is a
    popular preprocessing tool; I don't know whether it's powerful enough
    for this, but it's worth looking into.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Mar 9, 2010
    #4
  5. Ian Collins <> writes:
    > MBALOVER wrote:
    >> Actually I want my code to be
    >>
    >>
    >> for (i=0;i<N;i++)
    >> {
    >> array2[i*30]=array1;
    >> array2[i*30+1]=array1;
    >> array2[i*30+2]=array1;
    >> array2[i*30+3]=array1;
    >> ..................
    >> ..................
    >> ..................
    >> array2[i*30+28]=array1;
    >> array2[i*30+29]=array1;
    >> }
    >>
    >> If there anyway to use macros to do it instead for listing out
    >> manually in the code like above?

    >
    > Forget macros, they won't help here.
    >
    > How about memset?
    >
    > (untested)
    >
    > memset( array2+i*30, array1, 30*sizeof(array2[0]) );


    No, memset sets each byte of the target to the same value. Unless
    array1 and array2 are byte arrays, it won't help here.

    Something similar to memset that works on elements bigger than bytes
    would do the trick. But the obvious way to implement such a function
    is to use a for loop, which the OP wants to avoid for some unknown
    reason.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Mar 9, 2010
    #5
  6. MBALOVER

    Ian Collins Guest

    Keith Thompson wrote:
    > Ian Collins <> writes:
    >> MBALOVER wrote:
    >>> Actually I want my code to be
    >>>
    >>>
    >>> for (i=0;i<N;i++)
    >>> {
    >>> array2[i*30]=array1;
    >>> array2[i*30+1]=array1;
    >>> array2[i*30+2]=array1;
    >>> array2[i*30+3]=array1;
    >>> ..................
    >>> ..................
    >>> ..................
    >>> array2[i*30+28]=array1;
    >>> array2[i*30+29]=array1;
    >>> }
    >>>
    >>> If there anyway to use macros to do it instead for listing out
    >>> manually in the code like above?

    >> Forget macros, they won't help here.
    >>
    >> How about memset?
    >>
    >> (untested)
    >>
    >> memset( array2+i*30, array1, 30*sizeof(array2[0]) );

    >
    > No, memset sets each byte of the target to the same value. Unless
    > array1 and array2 are byte arrays, it won't help here.


    Bugger, I forgot about that... I agree with your other posting.

    --
    Ian Collins
    Ian Collins, Mar 9, 2010
    #6
  7. MBALOVER

    MBALOVER Guest

    Thanks a lot for responses.

    Let me explain why I do not want to use a inner loop.

    for (i=0;i<N;i++)
    for (j=0;j<M;j++)
    array2[i*M+j]=array1;

    for each for iteration of a for loop, we need a comparison and one
    addition. In my system, each comparison takes 3 cycles and one
    addition takes 1 cycle.
    So each iteration takes 4 cycles.

    If I use a inner loop, for each i, I need 4*M additional cycles. In
    my case M , N both are large numbers and the system's time budget is
    limited.

    That is why I do not want to use this way.






    On Mar 9, 3:53 pm, Keith Thompson <> wrote:
    > MBALOVER <> writes:
    > > Actually I want my code to be

    >
    > > for (i=0;i<N;i++)
    > > {
    > >     array2[i*30]=array1;
    > >     array2[i*30+1]=array1;
    > >     array2[i*30+2]=array1;
    > >     array2[i*30+3]=array1;
    > > .................
    > > .................
    > > .................
    > >     array2[i*30+28]=array1;
    > >     array2[i*30+29]=array1;
    > > }

    >
    > It certainly looks like a job for a nested for loop (but see below).
    >
    > > If there anyway to use macros to do it instead for listing out
    > > manually in the code like above?
    > > (the listing out such as the above example will be very bad in the
    > > case I need, say,  array2[i*1000+0]=array1;  to ...
    > > array2[i*1000+999]=array1;)

    >
    > There's not really any way for the preprocessor, given a number N, to
    > generate N statements.
    >
    > > And please note that for some reasons of my device, I do not want to
    > > use another loop such as

    >
    > > for (i=0;i<N;i++)
    > > for (j=0;j<30;j++)
    > > array2[i*30+j]=array1;

    >
    > So you insist on having, say, 30 distinct assignment statements; an
    > equivalent loop that iterates 30 times and performs exactly the same
    > assignments isn't good enough.
    >
    > It might help if we know *how* your device imposes this requirement.
    >
    > If you just write the nested for loop, how exactly does that not work?
    >
    > > If I write a macro as follows:

    >
    > > #define myMacro (array1, array2)          \
    > > for ( j=0;j<30;j++)                                  \
    > >      array2[i*30+j]=array1;                    \

    >
    > > Does it help? I doubt it because it will be the same as the case with
    > > two loops that I want to avoid.

    >
    > Right, the generated code will almost certainly be the same whether
    > the loop is written out or generated by a macro.
    >
    > What you're doing is loop unrolling.  It's an optimization technique
    > that can result in faster but larger code.  (In some cases, the
    > code can be slower just because it's larger, due to cache effects.)
    > Doing loop unrolling at the source level would usually be considered
    > premature optimization; a good optimizing compiler should be able
    > to do this for you.
    >
    > But if you really need to do this, you might consider writing another
    > tool that generates C code from an input specification.  (m4 is a
    > popular preprocessing tool; I don't know whether it's powerful enough
    > for this, but it's worth looking into.)
    >
    > --
    > Keith Thompson (The_Other_Keith)  <http://www.ghoti.net/~kst>
    > Nokia
    > "We must do something.  This is something.  Therefore, we must do this."
    >     -- Antony Jay and Jonathan Lynn, "Yes Minister"
    MBALOVER, Mar 9, 2010
    #7
  8. MBALOVER

    Ian Collins Guest

    MBALOVER wrote:

    [please don't top post]

    > Thanks a lot for responses.
    >
    > Let me explain why I do not want to use a inner loop.
    >
    > for (i=0;i<N;i++)
    > for (j=0;j<M;j++)
    > array2[i*M+j]=array1;
    >
    > for each for iteration of a for loop, we need a comparison and one
    > addition. In my system, each comparison takes 3 cycles and one
    > addition takes 1 cycle.
    > So each iteration takes 4 cycles.
    >
    > If I use a inner loop, for each i, I need 4*M additional cycles. In
    > my case M , N both are large numbers and the system's time budget is
    > limited.
    >
    > That is why I do not want to use this way.


    Have you looked at the generated code with full optimisation on your
    compiler? All those I've used will perform some degree of loop
    unrolling, so you will only suffer an extra

    (4*M)/L

    where L is the number of iterations inlined by the compiler.

    --
    Ian Collins
    Ian Collins, Mar 9, 2010
    #8
  9. MBALOVER

    bartc Guest

    MBALOVER wrote:
    > Thanks a lot for responses.
    >
    > Let me explain why I do not want to use a inner loop.
    >
    > for (i=0;i<N;i++)
    > for (j=0;j<M;j++)
    > array2[i*M+j]=array1;
    >
    > for each for iteration of a for loop, we need a comparison and one
    > addition. In my system, each comparison takes 3 cycles and one
    > addition takes 1 cycle.
    > So each iteration takes 4 cycles.
    >
    > If I use a inner loop, for each i, I need 4*M additional cycles. In
    > my case M , N both are large numbers and the system's time budget is
    > limited.
    >
    > That is why I do not want to use this way.


    How about a partially unrolled inner loop?

    So in the the inner loop it does 10 assignments at a time, and the loop
    executes M/10 times (with any odd assignments added at the end). Then the
    overhead might only be 4*M/10 cycles, and you only ever have to write out
    in, full, 10 assignments (or however many you want to do in one go).

    --
    Bartc
    bartc, Mar 9, 2010
    #9
  10. MBALOVER

    Eric Sosman Guest

    On 3/9/2010 3:32 PM, MBALOVER wrote:
    > Hi all,
    >
    > Actually I want my code to be
    >
    >
    > for (i=0;i<N;i++)
    > {
    > array2[i*30]=array1;
    > array2[i*30+1]=array1;
    > array2[i*30+2]=array1;
    > array2[i*30+3]=array1;
    > .................
    > .................
    > .................
    > array2[i*30+28]=array1;
    > array2[i*30+29]=array1;
    > }
    >
    > If there anyway to use macros to do it instead for listing out
    > manually in the code like above?


    Use a second loop nested inside the first.

    > (the listing out such as the above example will be very bad in the
    > case I need, say, array2[i*1000+0]=array1; to ...
    > array2[i*1000+999]=array1;)
    >
    > And please note that for some reasons of my device, I do not want to
    > use another loop such as
    >
    > for (i=0;i<N;i++)
    > for (j=0;j<30;j++)
    > array2[i*30+j]=array1;


    Use a second loop nested inside-- oh, wait, sorry. It's
    the right solution, but if you don't want to use it ... What
    are these mysterious "reasons" of yours? Tell us about them,
    and maybe somebody will have a better idea than writing line
    after line after repetitive line of code.

    > If I write a macro as follows:
    >
    > #define myMacro (array1, array2) \
    > for ( j=0;j<30;j++) \
    > array2[i*30+j]=array1; \
    >
    > Does it help? I doubt it because it will be the same as the case with
    > two loops that I want to avoid.


    It does the right thing, which is to use a second loop-- oh,
    wait, we've been through that already.

    The preprocessor has no iterative constructs.

    --
    Eric Sosman
    lid
    Eric Sosman, Mar 9, 2010
    #10
  11. MBALOVER

    ImpalerCore Guest

    On Mar 9, 3:32 pm, MBALOVER <> wrote:
    > Hi all,
    >
    > Actually I want my code to be
    >
    > for (i=0;i<N;i++)
    > {
    >     array2[i*30]=array1;
    >     array2[i*30+1]=array1;
    >     array2[i*30+2]=array1;
    >     array2[i*30+3]=array1;
    > .................
    > .................
    > .................
    >     array2[i*30+28]=array1;
    >     array2[i*30+29]=array1;
    >
    > }
    >
    > If there anyway to use macros to do it instead for listing out
    > manually in the code like above?
    > (the listing out such as the above example will be very bad in the
    > case I need, say,  array2[i*1000+0]=array1;  to ...
    > array2[i*1000+999]=array1;)
    >
    > And please note that for some reasons of my device, I do not want to
    > use another loop such as
    >
    > for (i=0;i<N;i++)
    > for (j=0;j<30;j++)
    > array2[i*30+j]=array1;
    >
    > If I write a macro as follows:
    >
    > #define myMacro (array1, array2)          \
    > for ( j=0;j<30;j++)                                  \
    >      array2[i*30+j]=array1;                    \
    >
    > Does it help? I doubt it because it will be the same as the case with
    > two loops that I want to avoid.
    >
    > Thanks a lot.


    Just a question. Is the performance gain worth the maintenance and
    code size (if embedded) costs that you're adding? I haven't
    encountered a scenario where doing something like this is worth the
    imposed maintenance cost, so I'm curious what the scenario is.
    ImpalerCore, Mar 9, 2010
    #11
  12. MBALOVER

    Eric Sosman Guest

    On 3/9/2010 4:09 PM, MBALOVER wrote:
    > Thanks a lot for responses.
    >
    > Let me explain why I do not want to use a inner loop.
    >
    > for (i=0;i<N;i++)
    > for (j=0;j<M;j++)
    > array2[i*M+j]=array1;
    >
    > for each for iteration of a for loop, we need a comparison and one
    > addition.


    Maybe, but optimizing compilers are already pretty good
    at unrolling loops -- especially loops whose iteration count
    are compile-time constants.

    > addition. In my system, each comparison takes 3 cycles and one
    > addition takes 1 cycle.
    > So each iteration takes 4 cycles.


    It was reasonable to count cycles as recently as, oh, twenty
    years ago. Since then, CPU's have become far more intricate and
    cycle counts have become extremely difficult to compute. Unless
    you're using an unusually simple processor (single-issue, little
    or no pipelining, at most one level of memory cache, maybe only
    a one- or two-slot store buffer, ...), these cycle counts are
    only loosely related to elapsed time.

    Your example showed thirty assignments, and you indicated you
    might need as many as a thousand. How many cycles will you spend
    in wait states with the CPU twiddling its thumbs while RAM ever so
    slowly retrieves a fresh batch of instructions?

    > If I use a inner loop, for each i, I need 4*M additional cycles. In
    > my case M , N both are large numbers and the system's time budget is
    > limited.
    >
    > That is why I do not want to use this way.


    Write it both ways (John Bode's suggestion of a "helper"
    program will save some time), and MEASURE the difference. Cycle
    counting is nearly impossible unless you happen to know that most
    of the data is already in CPU registers; a couple of cache misses
    (even just in L1) can disrupt the whole calculation. Measure!

    --
    Eric Sosman
    lid
    Eric Sosman, Mar 9, 2010
    #12
  13. MBALOVER

    Ian Collins Guest

    Ian Collins wrote:
    > MBALOVER wrote:
    >
    > [please don't top post]
    >
    >> Thanks a lot for responses.
    >>
    >> Let me explain why I do not want to use a inner loop.
    >>
    >> for (i=0;i<N;i++)
    >> for (j=0;j<M;j++)
    >> array2[i*M+j]=array1;
    >>
    >> for each for iteration of a for loop, we need a comparison and one
    >> addition. In my system, each comparison takes 3 cycles and one
    >> addition takes 1 cycle.
    >> So each iteration takes 4 cycles.
    >>
    >> If I use a inner loop, for each i, I need 4*M additional cycles. In
    >> my case M , N both are large numbers and the system's time budget is
    >> limited.
    >>
    >> That is why I do not want to use this way.

    >
    > Have you looked at the generated code with full optimisation on your
    > compiler? All those I've used will perform some degree of loop
    > unrolling, so you will only suffer an extra
    >
    > (4*M)/L
    >
    > where L is the number of iterations inlined by the compiler.


    A wacky alternative if you have a C++ compiler for your target is to use
    template meta-programming to generate the code.

    --
    Ian Collins
    Ian Collins, Mar 9, 2010
    #13
  14. Ian Collins <> writes:

    > MBALOVER wrote:
    >> Hi all,
    >>
    >> Actually I want my code to be
    >>
    >>
    >> for (i=0;i<N;i++)
    >> {
    >> array2[i*30]=array1;
    >> array2[i*30+1]=array1;
    >> array2[i*30+2]=array1;
    >> array2[i*30+3]=array1;
    >> ..................
    >> ..................
    >> ..................
    >> array2[i*30+28]=array1;
    >> array2[i*30+29]=array1;
    >> }
    >>
    >> If there anyway to use macros to do it instead for listing out
    >> manually in the code like above?

    >
    > Forget macros, they won't help here.
    >
    > How about memset?
    >
    > (untested)
    >
    > memset( array2+i*30, array1, 30*sizeof(array2[0]) );


    As Keith has pointed out memset won't do. memcpy, on the other hand,
    can do it in 5 calls:

    array2[i+30] = array1;
    memcpy(array2+i*30 + 1, array2[i+30], sizeof array2);
    memcpy(array2+i*30 + 2, array2[i+30], 2 * sizeof array2);
    memcpy(array2+i*30 + 4, array2[i+30], 4 * sizeof array2);
    memcpy(array2+i*30 + 8, array2[i+30], 8 * sizeof array2);
    memcpy(array2+i*30 + 16, array2[i+30], 14 * sizeof array2);

    Note the final multiplier (if I've go it right). The initial
    assignment can be skipped if array2 and array1 have the same type (by
    copying from array1 rather than array2) but it is likely that it is
    faster to do the first few by assignment anyway before switching to
    memcpy. The only way to be sure is to measure. I'd also measure the
    plain loop because as so many people have pointed out, it is often
    hard to predict the costs of loops these days.

    The 1000 element copy takes 10 calls. Of course even these calls can
    be put in a loop that where the overhead is likely to be small
    compared to the work done. The result would be for more maintainable
    than huge pile of assignments.

    --
    Ben.
    Ben Bacarisse, Mar 9, 2010
    #14
  15. MBALOVER

    Seebs Guest

    On 2010-03-09, MBALOVER <> wrote:
    > Let me explain why I do not want to use a inner loop.


    Okay.

    > for each for iteration of a for loop, we need a comparison and one
    > addition. In my system, each comparison takes 3 cycles and one
    > addition takes 1 cycle.


    Stop right there.

    Write it the simple, clear, and correct way.

    Now test: IS IT ACTUALLY TOO SLOW?

    If the answer is "no", you're done.

    DO NOT TRY TO OPTIMIZE WITHOUT TESTING FIRST. EVER.

    You may actually find out that the for loop is faster. Why? Any of
    several reasons. Optimizers are pretty good about inner loops; they
    get a lot of practice. Furthermore, it is not uncommon for a processor
    to be such that an inner loop will be faster than a fully-unrolled
    loop, because the fully-unrolled loop breaks the cache. You think
    branches are slow, wait until you see the cost of waiting for a single
    byte of memory that wasn't in cache...

    Seriously, though, you should NOT be trying to optimize stuff like this
    at your level of experience. When you're regularly advising the experts
    in the group on performance, then you could possibly benefit from talking
    about cycle counts. At your level of experience, you're just wasting
    your time.

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    Seebs, Mar 9, 2010
    #15
  16. MBALOVER

    Ike Naar Guest

    In article <>,
    MBALOVER <> wrote:
    >Hi all,
    >
    >Actually I want my code to be
    >
    >
    >for (i=0;i<N;i++)
    >{
    > array2[i*30]=array1;
    > array2[i*30+1]=array1;
    > array2[i*30+2]=array1;
    > array2[i*30+3]=array1;
    >.................
    >.................
    >.................
    > array2[i*30+28]=array1;
    > array2[i*30+29]=array1;
    >}
    >
    >If there anyway to use macros to do it instead for listing out
    >manually in the code like above?
    >(the listing out such as the above example will be very bad in the
    >case I need, say, array2[i*1000+0]=array1; to ...
    >array2[i*1000+999]=array1;)
    >
    >And please note that for some reasons of my device, I do not want to
    >use another loop such as
    >
    >for (i=0;i<N;i++)
    >for (j=0;j<30;j++)
    >array2[i*30+j]=array1;


    You could to this:

    for (j = 0; j < 30 * N; j++)
    {
    array2[j] = array1[j/30];
    }
    Ike Naar, Mar 10, 2010
    #16
  17. MBALOVER

    Flash Gordon Guest

    Eric Sosman wrote:
    > On 3/9/2010 4:09 PM, MBALOVER wrote:
    >> Thanks a lot for responses.
    >>
    >> Let me explain why I do not want to use a inner loop.
    >>
    >> for (i=0;i<N;i++)
    >> for (j=0;j<M;j++)
    >> array2[i*M+j]=array1;
    >>
    >> for each for iteration of a for loop, we need a comparison and one
    >> addition.

    >
    > Maybe, but optimizing compilers are already pretty good
    > at unrolling loops -- especially loops whose iteration count
    > are compile-time constants.


    Yes, but possibly not enough. The OP needs to check this.

    Personally, if the OP needs to do the full unroll, I would be inclined
    to write a small C program to run on the ost system to generate the code
    to be compiled for the target.

    >> addition. In my system, each comparison takes 3 cycles and one
    >> addition takes 1 cycle.
    >> So each iteration takes 4 cycles.

    >
    > It was reasonable to count cycles as recently as, oh, twenty
    > years ago. Since then, CPU's have become far more intricate and
    > cycle counts have become extremely difficult to compute. Unless
    > you're using an unusually simple processor (single-issue, little
    > or no pipelining, at most one level of memory cache, maybe only
    > a one- or two-slot store buffer, ...), these cycle counts are
    > only loosely related to elapsed time.


    It's not *that* long ago that I was programming an embedded processor
    that had a trick three stage pipeline (all instructions when through all
    three stages even if they only did something in one of them), no
    out-of-order execution and no cache. So excluding wait states for memory
    all instructions took in three cycles. However, in practice, due to the
    pipelining, all instructions effectively took one cycle except in the
    presence of a branch. The processor, and it's derivatives, are still
    available today.

    > Your example showed thirty assignments, and you indicated you
    > might need as many as a thousand. How many cycles will you spend
    > in wait states with the CPU twiddling its thumbs while RAM ever so
    > slowly retrieves a fresh batch of instructions?


    On the embedded system I was processing the memory and clock speeds
    where carefully selected so there were no wait states.

    >> If I use a inner loop, for each i, I need 4*M additional cycles. In
    >> my case M , N both are large numbers and the system's time budget is
    >> limited.
    >>
    >> That is why I do not want to use this way.

    >
    > Write it both ways (John Bode's suggestion of a "helper"
    > program will save some time), and MEASURE the difference. Cycle
    > counting is nearly impossible unless you happen to know that most
    > of the data is already in CPU registers; a couple of cache misses
    > (even just in L1) can disrupt the whole calculation. Measure!


    I agree with measure, although *if* it is a simple enough processor (and
    they are still out there) you *can* count.
    --
    Flash Gordon
    Flash Gordon, Mar 11, 2010
    #17
  18. MBALOVER

    Squeamizh Guest

    On Mar 10, 5:11 pm, Flash Gordon <> wrote:
    > Eric Sosman wrote:
    > > On 3/9/2010 4:09 PM, MBALOVER wrote:
    > >> Thanks a lot for responses.

    >
    > >> Let me explain why I do not want to use a inner loop.

    >
    > >> for (i=0;i<N;i++)
    > >> for (j=0;j<M;j++)
    > >> array2[i*M+j]=array1;

    >
    > >> for each for iteration of a for loop, we need a comparison and one
    > >> addition.

    >
    > >     Maybe, but optimizing compilers are already pretty good
    > > at unrolling loops -- especially loops whose iteration count
    > > are compile-time constants.

    >
    > Yes, but possibly not enough. The OP needs to check this.
    >
    > Personally, if the OP needs to do the full unroll, I would be inclined
    > to write a small C program to run on the ost system to generate the code
    > to be compiled for the target.
    >
    > >> addition. In my system, each comparison takes 3 cycles and one
    > >> addition takes 1 cycle.
    > >> So each iteration takes 4 cycles.

    >
    > >     It was reasonable to count cycles as recently as, oh, twenty
    > > years ago.  Since then, CPU's have become far more intricate and
    > > cycle counts have become extremely difficult to compute.  Unless
    > > you're using an unusually simple processor (single-issue, little
    > > or no pipelining, at most one level of memory cache, maybe only
    > > a one- or two-slot store buffer, ...), these cycle counts are
    > > only loosely related to elapsed time.

    >
    > It's not *that* long ago that I was programming an embedded processor
    > that had a trick three stage pipeline (all instructions when through all
    > three stages even if they only did something in one of them), no
    > out-of-order execution and no cache. So excluding wait states for memory
    > all instructions took in three cycles. However, in practice, due to the
    > pipelining, all instructions effectively took one cycle except in the
    > presence of a branch. The processor, and it's derivatives, are still
    > available today.


    What processor? Just curious...
    Squeamizh, Mar 11, 2010
    #18
  19. MBALOVER

    Phil Carmody Guest

    Ben Bacarisse <> writes:
    > Ian Collins <> writes:
    >> MBALOVER wrote:
    >>> Hi all,
    >>>
    >>> Actually I want my code to be
    >>>
    >>> for (i=0;i<N;i++)
    >>> {
    >>> array2[i*30]=array1;
    >>> ..................
    >>> array2[i*30+29]=array1;
    >>> }
    >>>
    >>> If there anyway to use macros to do it instead for listing out
    >>> manually in the code like above?

    >>
    >> Forget macros, they won't help here.
    >>
    >> How about memset?


    On many installations, you'll find that's a macro! ;-)

    >> (untested)
    >>
    >> memset( array2+i*30, array1, 30*sizeof(array2[0]) );

    >
    > As Keith has pointed out memset won't do. memcpy, on the other hand,
    > can do it in 5 calls:
    >
    > array2[i+30] = array1;
    > memcpy(array2+i*30 + 1, array2[i+30], sizeof array2);
    > memcpy(array2+i*30 + 2, array2[i+30], 2 * sizeof array2);
    > memcpy(array2+i*30 + 4, array2[i+30], 4 * sizeof array2);
    > memcpy(array2+i*30 + 8, array2[i+30], 8 * sizeof array2);
    > memcpy(array2+i*30 + 16, array2[i+30], 14 * sizeof array2);
    >
    > Note the final multiplier (if I've go it right).


    14's right. The +30's should be *30's.

    I timed this against a naive loop for int, long long, and
    double types. Conclusions were inconclusive. For double,
    the naive loop was faster when compiled for plain i386, but
    slower when 'optimised' for the actual athlon-xp architecture.
    For int and long long, the memcpys were faster than the loop.
    The memcpys were in fact simply unrolled completely.

    Given that there was a case where the memcpys were measurably
    slower, it's not worth blindly going for the clever technique
    until you've both found that the naive loop is slowing you down
    and measured the two to make sure that the optimisation is in
    fact an improvement. Writing a clever version for speed without
    measuring whether it's faster or not is dumb. And frequently
    done, alas.

    Phil
    --
    I find the easiest thing to do is to k/f myself and just troll away
    -- David Melville on r.a.s.f1
    Phil Carmody, Mar 11, 2010
    #19
  20. Phil Carmody <> writes:

    > Ben Bacarisse <> writes:
    >> Ian Collins <> writes:
    >>> MBALOVER wrote:
    >>>> Hi all,
    >>>>
    >>>> Actually I want my code to be
    >>>>
    >>>> for (i=0;i<N;i++)
    >>>> {
    >>>> array2[i*30]=array1;
    >>>> ..................
    >>>> array2[i*30+29]=array1;
    >>>> }
    >>>>
    >>>> If there anyway to use macros to do it instead for listing out
    >>>> manually in the code like above?
    >>>
    >>> Forget macros, they won't help here.
    >>>
    >>> How about memset?

    >
    > On many installations, you'll find that's a macro! ;-)
    >
    >>> (untested)
    >>>
    >>> memset( array2+i*30, array1, 30*sizeof(array2[0]) );

    >>
    >> As Keith has pointed out memset won't do. memcpy, on the other hand,
    >> can do it in 5 calls:
    >>
    >> array2[i+30] = array1;
    >> memcpy(array2+i*30 + 1, array2[i+30], sizeof array2);
    >> memcpy(array2+i*30 + 2, array2[i+30], 2 * sizeof array2);
    >> memcpy(array2+i*30 + 4, array2[i+30], 4 * sizeof array2);
    >> memcpy(array2+i*30 + 8, array2[i+30], 8 * sizeof array2);
    >> memcpy(array2+i*30 + 16, array2[i+30], 14 * sizeof array2);
    >>
    >> Note the final multiplier (if I've go it right).

    >
    > 14's right. The +30's should be *30's.


    Of course. Thanks.

    > I timed this against a naive loop for int, long long, and
    > double types. Conclusions were inconclusive. For double,
    > the naive loop was faster when compiled for plain i386, but
    > slower when 'optimised' for the actual athlon-xp architecture.
    > For int and long long, the memcpys were faster than the loop.
    > The memcpys were in fact simply unrolled completely.


    Was this for the *30 example or the latter *1000 example?

    > Given that there was a case where the memcpys were measurably
    > slower, it's not worth blindly going for the clever technique
    > until you've both found that the naive loop is slowing you down
    > and measured the two to make sure that the optimisation is in
    > fact an improvement. Writing a clever version for speed without
    > measuring whether it's faster or not is dumb. And frequently
    > done, alas.


    The measuring is important but you can't find out that it is not
    always faster without writing it! Having written it, I'd be tempted
    to keep it around in some sort of conditional compilation so that it
    can be switched in when/if measurements show that it is better on some
    architecture/implementation.

    One does ave to weight the maintenance cost, so there must be at least
    the possibility of a benefit.

    --
    Ben.
    Ben Bacarisse, Mar 11, 2010
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Dead RAM
    Replies:
    20
    Views:
    1,075
    John Harrison
    Jul 14, 2004
  2. D Senthil Kumar

    macro name from macro?

    D Senthil Kumar, Sep 20, 2003, in forum: C Programming
    Replies:
    1
    Views:
    555
    Jack Klein
    Sep 21, 2003
  3. sounak

    to get macro name from macro value

    sounak, Nov 22, 2005, in forum: C Programming
    Replies:
    17
    Views:
    476
    Mark McIntyre
    Nov 22, 2005
  4. Patrick Kowalzick
    Replies:
    5
    Views:
    453
    Patrick Kowalzick
    Mar 14, 2006
  5. Mike Manilone

    macro inside macro

    Mike Manilone, Oct 3, 2011, in forum: C Programming
    Replies:
    8
    Views:
    437
    Mike Manilone
    Oct 6, 2011
Loading...

Share This Page