Optimisation questions

Discussion in 'C Programming' started by Raj Pashwar, Feb 4, 2012.

  1. Raj Pashwar

    Raj Pashwar Guest

    Hello :

    My current program runs too slow. It uses x/8 and x%8 many times, where x
    is an int in both cases.

    I have replaced x/8 with x>>3, as this is faster (pretty sure).
    However, I am not sure whether this is faster than x%8 :
    x-=(x>>3)<<3;

    Is this generally a good optimisation to use?

    Also I've got a memory region that I constantly need to access, so I
    would like it to be in cache. I know that I can use 'register' in
    variable declarations, to speed up it's access. But if I have a pointer
    to that mem. region and I declared it 'register' I will get faster access
    to the pointer, not to the mem. region it points to, isn't it? What is
    the way out of this paradox?

    Thank You.
    Raj Pashwar, Feb 4, 2012
    #1
    1. Advertising

  2. Raj Pashwar

    Ian Collins Guest

    On 02/ 5/12 11:30 AM, Raj Pashwar wrote:
    > Hello :
    >
    > My current program runs too slow. It uses x/8 and x%8 many times, where x
    > is an int in both cases.
    >
    > I have replaced x/8 with x>>3, as this is faster (pretty sure).
    > However, I am not sure whether this is faster than x%8 :
    > x-=(x>>3)<<3;
    >
    > Is this generally a good optimisation to use?


    No, any half decent compiler will be able to optimise x/8 without the
    programmer resorting to hackery.

    > Also I've got a memory region that I constantly need to access, so I
    > would like it to be in cache. I know that I can use 'register' in
    > variable declarations, to speed up it's access. But if I have a pointer
    > to that mem. region and I declared it 'register' I will get faster access
    > to the pointer, not to the mem. region it points to, isn't it? What is
    > the way out of this paradox?


    Not really, register is only a hit and it is pretty well redundant these
    days. Et the optimiser do its job.

    Memory regions will be cached when they are accessed.

    --
    Ian Collins
    Ian Collins, Feb 4, 2012
    #2
    1. Advertising

  3. Raj Pashwar

    BartC Guest

    "Raj Pashwar" <> wrote in message
    news:jgkbiu$432$...
    > Hello :
    >
    > My current program runs too slow. It uses x/8 and x%8 many times, where x
    > is an int in both cases.
    >
    > I have replaced x/8 with x>>3, as this is faster (pretty sure).
    > However, I am not sure whether this is faster than x%8 :
    > x-=(x>>3)<<3;


    > Is this generally a good optimisation to use?


    Try x&7 in place of x%8. The double-shift code above does something entirely
    different (as well as modifying x).

    But, unless your program does nothing except /8 and %8 operations, it's
    unlikely to make much difference. And it's likely these simple optimisations
    have already been done internally.

    What sort of speed-up are you looking for?

    --
    Bartc
    BartC, Feb 4, 2012
    #3
  4. Raj Pashwar

    Eric Sosman Guest

    On 2/4/2012 5:30 PM, Raj Pashwar wrote:
    > Hello :
    >
    > My current program runs too slow. It uses x/8 and x%8 many times, where x
    > is an int in both cases.
    >
    > I have replaced x/8 with x>>3, as this is faster (pretty sure).


    "Pretty sure?" What do your (ahem) measurements think?

    Also, note that x/8 and x>>3 are not equivalent if x is
    negative. The C language does not define x>>3 for negative x,
    but you are likely to get either a very large positive result
    with no obvious relation to "one eighth of x" or else a negative
    result that is a little smaller than you'd like.

    If you happen to know that x is non-negative, so x>>3 is
    well-defined, consider sharing your knowledge with the compiler
    by making x an unsigned int. If you do so and then write x/8,
    it is very likely that the compiler will substitute a shift.

    > However, I am not sure whether this is faster than x%8 :
    > x-=(x>>3)<<3;


    Neither am I. Again, what do your measurements think?

    > Is this generally a good optimisation to use?


    Probably not. x&7 is tempting and certainly simpler, but
    as with the shift it, too, has issues with negative x. Once again,
    making x an unsigned int may enable the compiler to make this or
    some other optimization.

    Let's also note that most machines' integer division produces
    both the quotient and remainder in one operation. If the compiler
    sees x/8 and x%8 close together, with no intervening change to x,
    it may well combine them into just one division and make use of both
    the outputs. A simplistic "divide takes THIS long, remainder takes
    THAT long, add the two" may not be appropriate.

    > Also I've got a memory region that I constantly need to access, so I
    > would like it to be in cache. I know that I can use 'register' in
    > variable declarations, to speed up it's access. But if I have a pointer
    > to that mem. region and I declared it 'register' I will get faster access
    > to the pointer, not to the mem. region it points to, isn't it? What is
    > the way out of this paradox?


    Stop guessing and measure!

    --
    Eric Sosman
    d
    Eric Sosman, Feb 4, 2012
    #4
  5. Raj Pashwar <> writes:
    > My current program runs too slow. It uses x/8 and x%8 many times, where x
    > is an int in both cases.
    >
    > I have replaced x/8 with x>>3, as this is faster (pretty sure).
    > However, I am not sure whether this is faster than x%8 :
    > x-=(x>>3)<<3;
    >
    > Is this generally a good optimisation to use?
    >
    > Also I've got a memory region that I constantly need to access, so I
    > would like it to be in cache. I know that I can use 'register' in
    > variable declarations, to speed up it's access. But if I have a pointer
    > to that mem. region and I declared it 'register' I will get faster access
    > to the pointer, not to the mem. region it points to, isn't it? What is
    > the way out of this paradox?


    Are you enabling all optimization options in your compiler? For
    example, if you're using gcc, are you using "-O3"? If not, that will
    probably give you a better performance boost than any
    micro-optimizations tweaks you might make in your source code.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Feb 4, 2012
    #5
  6. Ian Collins <> writes:
    [...]
    > Not really, register is only a hit and it is pretty well redundant these
    > days. Et the optimiser do its job.


    "hit" --> "hint"
    "Et" --> "Let"

    > Memory regions will be cached when they are accessed.


    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Feb 4, 2012
    #6
  7. Raj Pashwar

    Ian Collins Guest

    On 02/ 5/12 11:52 AM, Keith Thompson wrote:
    > Ian Collins<> writes:
    > [...]
    >> Not really, register is only a hit and it is pretty well redundant these
    >> days. Et the optimiser do its job.

    >
    > "hit" --> "hint"
    > "Et" --> "Let"


    :) tim or a ew kybord.

    --
    Ian Collins
    Ian Collins, Feb 4, 2012
    #7
  8. Raj Pashwar

    BartC Guest

    "BartC" <> wrote in message news:jgkcd7$hgd$...

    >> However, I am not sure whether this is faster than x%8 :
    >> x-=(x>>3)<<3;

    >
    >> Is this generally a good optimisation to use?

    >
    > Try x&7 in place of x%8. The double-shift code above does something
    > entirely different (as well as modifying x).


    Actually, x-((x>>3)<<3) does I think get the same answer (when the shifts
    are the right way round..), but still, you're using 3 ops instead of 1:
    x&=7;

    --
    Bartc
    BartC, Feb 4, 2012
    #8
  9. Raj Pashwar

    James Kuyper Guest

    On 02/04/2012 05:30 PM, Raj Pashwar wrote:
    > Hello :
    >
    > My current program runs too slow. It uses x/8 and x%8 many times, where x
    > is an int in both cases.
    >
    > I have replaced x/8 with x>>3, as this is faster (pretty sure).
    > However, I am not sure whether this is faster than x%8 :
    > x-=(x>>3)<<3;


    As a general rule, your compiler should generate exactly the same code
    for x/8 and x>>3, because they're required to have exactly the same
    behavior. If it doesn't, get a better one. What that code will be may be
    different on different machines, it might be called a division
    instruction on one machine, and a shift instruction on a different
    machine, but you can generally trust the compiler to know which one is
    faster. If it doesn't, get a better one.

    >
    > Is this generally a good optimisation to use?
    >
    > Also I've got a memory region that I constantly need to access, so I
    > would like it to be in cache. I know that I can use 'register' in
    > variable declarations, to speed up it's access.


    That's unlikely to be the case; if putting it in a register would speed
    up, you can be reasonably sure the compiler already puts it in a
    register, even if you don't specifically tell it to do so. If it didn't
    put it in a register, it probably had a good reason for not doing so; in
    that case, if it paid attention to your "register" declaration, it could
    actually slow your program down. For precisely this reason, most
    compilers ignore the keyword "register", and the way the standard is
    written, this is perfectly acceptable behavior.
    --
    James Kuyper
    James Kuyper, Feb 4, 2012
    #9
  10. On 04-Feb-12 17:44, James Kuyper wrote:
    > On 02/04/2012 05:30 PM, Raj Pashwar wrote:
    >> My current program runs too slow. It uses x/8 and x%8 many times, where x
    >> is an int in both cases.
    >>
    >> I have replaced x/8 with x>>3, as this is faster (pretty sure).
    >> However, I am not sure whether this is faster than x%8 :
    >> x-=(x>>3)<<3;

    >
    > As a general rule, your compiler should generate exactly the same code
    > for x/8 and x>>3, because they're required to have exactly the same
    > behavior.


    That only holds if x is non-negative. Since x is an int, the compiler
    cannot safely make that assumption, so it cannot do strength reduction.

    If it is possible to change x from int to unsigned int, that is another
    story.

    S

    --
    Stephen Sprunk "God does not play dice." --Albert Einstein
    CCIE #3723 "God is an inveterate gambler, and He throws the
    K5SSS dice at every possible opportunity." --Stephen Hawking
    Stephen Sprunk, Feb 5, 2012
    #10
  11. Raj Pashwar

    Geoff Guest

    On Sat, 4 Feb 2012 22:30:54 +0000 (UTC), Raj Pashwar
    <> wrote:

    >Hello :
    >
    >My current program runs too slow. It uses x/8 and x%8 many times, where x
    >is an int in both cases.


    How slow is too slow? How did you measure it?
    Where is your measurement data?

    Debug build or release build?
    Which compiler and what version are you using?
    What is your target platform?

    >
    >I have replaced x/8 with x>>3, as this is faster (pretty sure).
    >However, I am not sure whether this is faster than x%8 :
    >x-=(x>>3)<<3;


    To be sure, you must measure.

    What if x is negative? Are the results of x/8 vs. x>>3 the same across
    your problem space?

    >
    >Is this generally a good optimisation to use?


    Generally, no.

    Generally, the result can depend on the value range of x and the
    architecture of your target. Generally, writing code this way is
    called 'premature optimization'. Write code to say what you mean, let
    the compiler do it's job regarding the details of the translation.
    Obtain correct results first, then turn on optimizations and test the
    code again.

    >
    >Also I've got a memory region that I constantly need to access, so I
    >would like it to be in cache. I know that I can use 'register' in
    >variable declarations, to speed up it's access. But if I have a pointer
    >to that mem. region and I declared it 'register' I will get faster access
    >to the pointer, not to the mem. region it points to, isn't it? What is
    >the way out of this paradox?


    Declaring 'register' is a hint to the compiler, it's free to disregard
    it if it wants. A 'region' can never be in register, you are correct.
    Any decent, state-of-the-art compiler will know more than you about
    how to optimize for this. What compiler and what version are you
    using?

    The way out of this paradox is to select a state-of-the-art compiler
    for your target platform. What compiler and version are you using?
    Geoff, Feb 5, 2012
    #11
  12. Raj Pashwar

    James Kuyper Guest

    On 02/04/2012 07:00 PM, Stephen Sprunk wrote:
    > On 04-Feb-12 17:44, James Kuyper wrote:
    >> On 02/04/2012 05:30 PM, Raj Pashwar wrote:
    >>> My current program runs too slow. It uses x/8 and x%8 many times, where x
    >>> is an int in both cases.
    >>>
    >>> I have replaced x/8 with x>>3, as this is faster (pretty sure).
    >>> However, I am not sure whether this is faster than x%8 :
    >>> x-=(x>>3)<<3;

    >>
    >> As a general rule, your compiler should generate exactly the same code
    >> for x/8 and x>>3, because they're required to have exactly the same
    >> behavior.

    >
    > That only holds if x is non-negative.


    You're right - the result is implementation-defined if x is negative.
    For my purposes, that's just about as bad as undefined behavior, so for
    precisely that reason I never use >> on values that could be negative.
    As a result, I have a tendency to automatically assume that an
    expression like x>>3 implies that the author is certain (justifiably or
    not) that x is not negative. However, when questions at this level are
    being asked, that's a bad assumption, and I shouldn't have made it.
    --
    James Kuyper
    James Kuyper, Feb 5, 2012
    #12
  13. Raj Pashwar

    Kaz Kylheku Guest

    On 2012-02-05, James Kuyper <> wrote:
    > On 02/04/2012 07:00 PM, Stephen Sprunk wrote:
    >> On 04-Feb-12 17:44, James Kuyper wrote:
    >>> On 02/04/2012 05:30 PM, Raj Pashwar wrote:
    >>>> My current program runs too slow. It uses x/8 and x%8 many times, where x
    >>>> is an int in both cases.
    >>>>
    >>>> I have replaced x/8 with x>>3, as this is faster (pretty sure).
    >>>> However, I am not sure whether this is faster than x%8 :
    >>>> x-=(x>>3)<<3;
    >>>
    >>> As a general rule, your compiler should generate exactly the same code
    >>> for x/8 and x>>3, because they're required to have exactly the same
    >>> behavior.

    >>
    >> That only holds if x is non-negative.

    >
    > You're right - the result is implementation-defined if x is negative.


    Yes and the point is that this is what helps to optimize it better.

    > For my purposes, that's just about as bad as undefined behavior, so for


    It is nowhere near as "bad". It must not be a failure, and it must be
    documented by the implementation!

    > precisely that reason I never use >> on values that could be negative.


    In C90, that means you would also not use / on values that could be negative,
    because yu could get truncation toward infinity or zero, and
    implementation-defined is just about as bad as undefined ...

    Speaking of division, whenever I see x/y, I tend to think that the programmer
    must be assuming that y is not zero. Oh, and whenever I see x + y,
    it must be that the programmer assumed that the sum of x + y is not greater
    than INT_MAX ...

    Anyway, let's see how GCC (4.5.2, x86) weighs in on this:

    int div8_0(int x)
    {
    return x >> 3;
    }

    int div8_1(int x)
    {
    return x / 8;
    }

    ..globl div8_0
    .type div8_0, @function
    div8_0:
    ..LFB0:
    .cfi_startproc
    movl %edi, %eax
    sarl $3, %eax
    ret
    .cfi_endproc
    ..LFE0:
    .size div8_0, .-div8_0
    .p2align 4,,15
    ..globl div8_1
    .type div8_1, @function
    div8_1:
    ..LFB1:
    .cfi_startproc
    leal 7(%rdi), %eax
    testl %edi, %edi
    cmovns %edi, %eax
    sarl $3, %eax
    ret
    .cfi_endproc
    ..LFE1:


    Two more instructions in the div case!

    So the advice "use / 8; the compiler will generate the same code" is quite
    obsolete. It was that way on some compilers for the C90 language.

    (For instance, if division has truncation toward negative infinity semantics,
    and the arithmetic bit shifter shifts through the sign and copies the sign,
    then such a reduction can be made.)

    The reduction can also be made if you somehow make make it obvious to the
    compiler that the variable cannot be negative.

    (Doing that by making it unsigned, however, has its downsides: like
    introducing bugs due to the discontinuity in unsigned in the neighborhood
    of zero, and surprising value changes on conversion.)
    Kaz Kylheku, Feb 5, 2012
    #13
  14. Raj Pashwar

    Raj Pashwar Guest

    Thanks for all the answers, x&8 looks like a good one for efficiency.

    I also had another idea while sleeping - amazing how that helps :) At
    program startup, maybe in another thread, I can build an array of all
    x/8, x%8 for x's that might occur, then just look up in this array
    instead of doing the division every time. I will try this today also.

    Thanks

    On Sat, 04 Feb 2012 18:44:25 -0500, James Kuyper wrote:
    > On 02/04/2012 05:30 PM, Raj Pashwar wrote:
    >> Hello :
    >>
    >> My current program runs too slow. It uses x/8 and x%8 many times, where
    >> x is an int in both cases.
    >>
    >> I have replaced x/8 with x>>3, as this is faster (pretty sure).
    >> However, I am not sure whether this is faster than x%8 : x-=(x>>3)<<3;

    >
    > As a general rule, your compiler should generate exactly the same code
    > for x/8 and x>>3, because they're required to have exactly the same
    > behavior. If it doesn't, get a better one. What that code will be may be
    > different on different machines, it might be called a division
    > instruction on one machine, and a shift instruction on a different
    > machine, but you can generally trust the compiler to know which one is
    > faster. If it doesn't, get a better one.
    >
    >
    >> Is this generally a good optimisation to use?
    >>
    >> Also I've got a memory region that I constantly need to access, so I
    >> would like it to be in cache. I know that I can use 'register' in
    >> variable declarations, to speed up it's access.

    >
    > That's unlikely to be the case; if putting it in a register would speed
    > up, you can be reasonably sure the compiler already puts it in a
    > register, even if you don't specifically tell it to do so. If it didn't
    > put it in a register, it probably had a good reason for not doing so; in
    > that case, if it paid attention to your "register" declaration, it could
    > actually slow your program down. For precisely this reason, most
    > compilers ignore the keyword "register", and the way the standard is
    > written, this is perfectly acceptable behavior.
    Raj Pashwar, Feb 5, 2012
    #14
  15. Raj Pashwar

    Ian Collins Guest

    On 02/ 5/12 11:51 PM, Raj Pashwar wrote:
    > Thanks for all the answers, x&8 looks like a good one for efficiency.


    Efficiency for what? Did you actually read what we wrote about the
    futility of such silly optimisations? Did you measure the performance
    of your code?

    --
    Ian Collins
    Ian Collins, Feb 5, 2012
    #15
  16. Raj Pashwar

    BartC Guest

    "Raj Pashwar" <> wrote in message
    news:jgln0d$jj2$...
    > Thanks for all the answers, x&8 looks like a good one for efficiency.


    You might find x&7 more useful, it you want the last 3 bits.

    --
    Bartc
    BartC, Feb 5, 2012
    #16
  17. Raj Pashwar

    Zoltan Kocsi Guest

    On Sun, 5 Feb 2012 10:51:57 +0000 (UTC)
    Raj Pashwar <> wrote:

    > Thanks for all the answers, x&8 looks like a good one for efficiency.
    >
    > I also had another idea while sleeping - amazing how that helps :) At
    > program startup, maybe in another thread, I can build an array of all
    > x/8, x%8 for x's that might occur, then just look up in this array
    > instead of doing the division every time. I will try this today also.


    Chances are, your table will be slower, especially if it is decent size.
    A table look up means an address calculation and a memory fetch, which even
    with superscalar architectures and large caches is likely to cost you more
    than a single instruction register-register AND or shift.

    By the way, why would you need to build the table in a different thread? To
    waste memory and cycles?

    You'd be better off with listening to what the others replied to your
    question. Unless x must be signed, change it to unsigned and the compiler
    will merrily optimise your division away.

    Zoltan
    Zoltan Kocsi, Feb 5, 2012
    #17
  18. On Feb 4, 11:37 pm, "christian.bau" <>
    wrote:


    > Here's the most important thing about optimisation: Any attempt at
    > optimisation is completely pointless unless you measure first how much
    > time your program spends in which part of the program.


    I known this is the received wisdom, if not the dogma. But it isn't
    actually true. Yes it's a good idea to measure. But then pressing the
    "run" button and counting heartbeats until the answer appears is a
    measurement. And sometimes a good-enough one.

    You /can/ improve the performance of a program without measuring all
    the different bits. I know I've done it (tweaking a hash calculation
    resulted in a massive performance improvement). Sometimes educated
    guess-work /is/ good enough.

    So, yes, the advice should read "any attempt at optimisation should be
    guided by careful measurement of parts of the program". But your
    "completly pointless" comment is just hyperbole that does more harm
    than good.


    > What you are
    > doing now is based on pure guesswork. And I can promise you that if
    > your program is too slow now, replacing x / 8 with x >> 3 isn't going
    > to help you.
    >
    > Most of the time, programs are made faster by finding the bits where
    > you are doing completely stupid things that waste time, and stop doing
    > them. Next, you look for things where you do unnecessary work, and
    > stop doing that. These things are the most important optimisations,
    > and you'll never do them unless you _measure_.
    >
    > Once your program is in a state where it doesn't do anything stupid,
    > and is still too slow: Google for "cache friendly programming". In
    > 2012, that is the most important thing you need to know to make code
    > fast. Most important, if your code is not "cache friendly" then no
    > optimisation is going to make the slightest difference to speed.
    Nick Keighley, Feb 5, 2012
    #18
  19. On Feb 4, 10:30 pm, Raj Pashwar <> wrote:

    > My current program runs too slow.


    how do you know?


    > It uses x/8 and x%8 many times, where x
    > is an int in both cases.
    >
    > I have replaced x/8 with x>>3, as this is faster (pretty sure).


    and? Does it go any faster. I'd bet not.

    > However, I am not sure whether this is faster than x%8 :
    > x-=(x>>3)<<3;


    measure it

    > Is this generally a good optimisation to use?


    dunno. What does your measurement tell you?

    > Also I've got a memory region that I constantly need to access, so I
    > would like it to be in cache. I know that I can use 'register' in
    > variable declarations, to speed up it's


    that's "its"

    > access. But if I have a pointer
    > to that mem. region and I declared it 'register' I will get faster access
    > to the pointer, not to the mem. region it points to, isn't it? What is
    > the way out of this paradox?


    I see no paradox. Look up what "paradox" means. It doesn't mean "I
    wish the laws of physics were different"
    Nick Keighley, Feb 5, 2012
    #19
  20. Raj Pashwar

    BartC Guest

    "Raj Pashwar" <> wrote in message
    news:jgln0d$jj2$...

    > I also had another idea while sleeping - amazing how that helps :) At
    > program startup, maybe in another thread, I can build an array of all
    > x/8, x%8 for x's that might occur, then just look up in this array
    > instead of doing the division every time. I will try this today also.


    If the range of x is small enough (8 or 16 bits), that it would make the
    table viable, then you don't need a separate thread to set it up. But if
    it's so big (32 bits for example), then such a huge table will likely make
    your machine grind to a halt.

    And a table-lookup will almost certainly take longer than a shift or mask
    operation.

    But, x/8 and x%8 sound like you're doing something with bit pointers.
    Perhaps post more of your code here, so you can get more helpful responses,
    once we know what you're trying to do.

    --
    Bartc
    BartC, Feb 5, 2012
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Fredrik Ramsberg

    Optimisation of regexps in Perl?

    Fredrik Ramsberg, Oct 14, 2003, in forum: Perl
    Replies:
    2
    Views:
    465
    Fredrik Ramsberg
    Oct 15, 2003
  2. Roedy Green

    boolean loop optimisation

    Roedy Green, Sep 11, 2003, in forum: Java
    Replies:
    8
    Views:
    2,812
    Chris Uppal
    Sep 12, 2003
  3. sorry.no.email@post_NG.com

    Search Engine Optimisation

    sorry.no.email@post_NG.com, May 8, 2006, in forum: HTML
    Replies:
    0
    Views:
    345
    sorry.no.email@post_NG.com
    May 8, 2006
  4. Oliver Batchelor
    Replies:
    1
    Views:
    357
    Frank Schmitt
    Jul 22, 2003
  5. Agent Mulder

    Re: Code optimisation

    Agent Mulder, Aug 27, 2003, in forum: C++
    Replies:
    1
    Views:
    301
    Peter van Merkerk
    Aug 27, 2003
Loading...

Share This Page