Fast addition for n+1 or n+0

Discussion in 'C++' started by Alex Vinokur, Feb 18, 2005.

  1. Alex Vinokur

    Alex Vinokur Guest

    Consider the following statement:
    n+i, where i = 1 or 0.

    Is there more fast method for computing n+i than direct computing that sum?

    Alex Vinokur, Feb 18, 2005
    #1

  2. "Alex Vinokur" <> writes:
    > Consider the following statement:
    > n+i, where i = 1 or 0.
    >
    > Is there more fast method for computing n+i than direct computing that sum?


    The best way to compute n+0 is n.

    The best way to compute n+1 is n+1; if the CPU provides something
    faster than a general add instruction, the compiler will generate it
    for you.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Feb 18, 2005
    #2

  3. * Alex Vinokur:
    > Consider the following statement:
    > n+i, where i = 1 or 0.
    >
    > Is there more fast method for computing n+i than direct computing that sum?


    That depends on the types involved.

    For built-in numeric types, direct computation is probably fastest.

    Measure if you're in doubt (and it really matters).

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
     
    Alf P. Steinbach, Feb 18, 2005
    #3
  4. Alex Vinokur wrote:

    > Consider the following statement:
    > n+i, where i = 1 or 0.
    >
    > Is there more fast method for computing n+i than direct computing that
    > sum?
    >


    Assuming integers, hardware addition is implemented simply using full
    adders, or faster algorithms like carry lookahead.

    n+0 generates no carries, so it is fast; many compilers will constant-fold it to n.
    n+1 can generate up to m carries in m-bit arithmetic.
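
    To make the carry behaviour concrete, here is a minimal C++ sketch
    (assuming 32-bit unsigned values; purely illustrative) that simulates a
    ripple-carry adder and counts how far the carry propagates:

    #include <cstdio>

    // Bit-by-bit ripple-carry addition; *carry_steps counts how many bit
    // positions produced a carry-out.
    unsigned ripple_add(unsigned a, unsigned b, int *carry_steps)
    {
        unsigned sum = 0, carry = 0;
        *carry_steps = 0;
        for (int bit = 0; bit < 32; ++bit) {
            unsigned x = (a >> bit) & 1u;
            unsigned y = (b >> bit) & 1u;
            sum |= (x ^ y ^ carry) << bit;                // full-adder sum bit
            carry = (x & y) | (x & carry) | (y & carry);  // full-adder carry out
            if (carry) ++*carry_steps;
        }
        return sum;
    }

    int main()
    {
        int steps;
        unsigned r = ripple_add(0xFFFFu, 0u, &steps);  // n+0: no carries at all
        std::printf("0xFFFF + 0 = %#x, carries = %d\n", r, steps);
        r = ripple_add(0xFFFFu, 1u, &steps);           // n+1: carry ripples 16 bits
        std::printf("0xFFFF + 1 = %#x, carries = %d\n", r, steps);
        return 0;
    }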

    Full adder:
    http://isweb.redwoods.cc.ca.us/INSTRUCT/CalderwoodD/diglogic/full.htm

    Carry look ahead:
    http://www.seas.upenn.edu/~ee201/lab/CarryLookAhead/CarryLookAheadF01.html


    gtoomey
    www.gregorytoomey.com
     
    Gregory Toomey, Feb 18, 2005
    #4
  5. In article <>,
    Alex Vinokur <> wrote:

    >Consider the following statement:
    >n+i, where i = 1 or 0.
    >
    >Is there more fast method for computing n+i than direct computing that sum?


    Assuming n and i are ints, not on a modern general purpose computer.
    Addition typically takes one cycle, once the operands are in
    registers.

    Any attempt to use a conditional will almost certainly be much slower.
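
    For concreteness, the conditional rewrite being warned against would
    look something like this (a sketch, assuming int operands and i known
    to be 0 or 1):

    #include <cstdio>

    int main()
    {
        int n = 41, i = 1;      // example values; i is 0 or 1

        int sum;                // branchy version: the branch may be mispredicted
        if (i != 0)
            sum = n + 1;
        else
            sum = n;

        int sum2 = n + i;       // plain version: a single add instruction

        std::printf("%d %d\n", sum, sum2);
        return 0;
    }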

    For more details, try a newsgroup for the processor you're interested
    in, or maybe comp.arch.

    -- Richard
     
    Richard Tobin, Feb 18, 2005
    #5
  6. Alex Vinokur

    Alex Vinokur Guest

    "Richard Tobin" <> wrote in message news:cv525n$1i7f$...
    > In article <>,
    > Alex Vinokur <> wrote:
    >
    > >Consider the following statement:
    > >n+i, where i = 1 or 0.
    > >
    > >Is there more fast method for computing n+i than direct computing that sum?

    >
    > Assuming n and i are ints, not on a modern general purpose computer.
    > Addition typically takes one cycle, once the operands are in
    > registers.
    >
    > Any attempt to use a conditional will almost certainly be much slower.
    >
    > For more details, try a newsgroup for the processor you're interested
    > in, or maybe comp.arch.
    >
    > -- Richard


    I need that in a C/C++ program.

    --
    Alex Vinokur
    email: alex DOT vinokur AT gmail DOT com
    http://mathforum.org/library/view/10978.html
    http://sourceforge.net/users/alexvn
     
    Alex Vinokur, Feb 18, 2005
    #6
  7. Alex Vinokur

    Michael Mair Guest

    Alex Vinokur wrote:
    > "Richard Tobin" <> wrote in message news:cv525n$1i7f$...
    >
    >>In article <>,
    >>Alex Vinokur <> wrote:
    >>
    >>
    >>>Consider the following statement:
    >>>n+i, where i = 1 or 0.
    >>>
    >>>Is there more fast method for computing n+i than direct computing that sum?

    >>
    >>Assuming n and i are ints, not on a modern general purpose computer.
    >>Addition typically takes one cycle, once the operands are in
    >>registers.
    >>
    >>Any attempt to use a conditional will almost certainly be much slower.
    >>
    >>For more details, try a newsgroup for the processor you're interested
    >>in, or maybe comp.arch.

    >
    > I need that in C/C++ program.


    Well, there is no general truth that will lead you to a portable,
    always perfect solution.
    If you want to optimise your code for speed, use a profiler to
    determine which functions are called how often and take how much
    time. Then you know _where_ you lose your time.
    After that, try to find algorithms which reduce the number of calls
    to the small functions that take a good part of the overall time,
    and which reduce the time spent in "big" functions.
    If you afterwards really find that optimising code with
    'n+0' and 'n+1' would be the best possible micro-optimisation
    to gain some more cycles, then you should try to write as many
    'n+0's/'n's and 'n+1's as possible explicitly in your code
    instead of using 'n+i'. The compiler will optimise that if the
    code has the potential for optimisation.
    Afterwards, use the profiler to determine whether this actually
    makes a difference.

    Probably not much.
    If you think you can do better than the compiler, then follow
    Richard's suggestion about comp.arch.*


    Cheers
    Michael
    --
    E-Mail: Mine is a gmx dot de address.
     
    Michael Mair, Feb 18, 2005
    #7
  8. In article <>,
    Alex Vinokur <> wrote:
    :Consider the following statement:
    :n+i, where i = 1 or 0.

    :Is there more fast method for computing n+i than direct computing that sum?

    It depends on the costs you assign to the various operations -- a
    matter which is architecture dependent. Integer addition is usually one of
    the fastest things a computer does. Even if you were able to find a
    two-instruction sequence tailored to that particular case, it would
    very likely be slower, because internally the CPU has to perform an
    integer addition anyway in order to find the address of the
    second instruction.

    Have you perhaps omitted some important facts about the circumstances?
    For example, are you microprogramming, or is this a theory question
    at the micro-level where each comparison and change of a bit in
    the implementation of the 'addition' operation is to be counted?
    Is this an assignment in designing an IC which is faster for these
    particular cases than building a full-blown adder circuit would be?

    --
    Reviewers should be required to produce a certain number of
    negative reviews - like police given quotas for handing out
    speeding tickets. -- The Audio Anarchist
     
    Walter Roberson, Feb 18, 2005
    #8
  9. Alex Vinokur wrote:

    > Consider the following statement:
    >
    > n + i, where i = 1 or 0.
    >
    > Is there more fast method for computing n + i than direct computing that sum?


    No.
    But a good optimizing compiler should be able to
    replace n + 0 with n and replace n + 1 with ++n.
     
    E. Robert Tisdale, Feb 18, 2005
    #9
  10. Alex Vinokur

    Alex Vinokur Guest

    "Walter Roberson" <-cnrc.gc.ca> wrote in message news:cv563d$d73$...
    > In article <>,
    > Alex Vinokur <> wrote:
    > :Consider the following statement:
    > :n+i, where i = 1 or 0.
    >
    > :Is there more fast method for computing n+i than direct computing that sum?
    >
    > It depends on the costs you assign to the various operations -- a
    > matter which is architecture dependant. Integer addition is usually one of
    > the fastest things a computer does. Suppose you were able to find a
    > two instruction sequence that was faster for that particular case: then
    > it is very likely to be slower because internally the CPU has
    > to perform an integer addition in order to find the address of the
    > second instruction.
    >
    > Have you perhaps omitted some important facts about the circumstances?
    > For example, are you microprogramming, or is this a theory question
    > at the micro-level where each comparison and change of a bit in
    > the implimentation of the 'addition' operation is to be counted?
    > Is this an assignment in designing an IC which is faster for these
    > particular cases than building a full-blown adder circuit would be?
    >


    I would like to optimize (speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula.
    The algorithm can be seen at
    http://groups-beta.google.com/group/alt.sources/msg/42e76b12150613a1

    Function AddUnits() contains a line
    n1 += (n2 + carry_s); // carry_s == 0 or 1

    The question is whether it is possible to make that line work faster.

    --
    Alex Vinokur
    email: alex DOT vinokur AT gmail DOT com
    http://mathforum.org/library/view/10978.html
    http://sourceforge.net/users/alexvn
     
    Alex Vinokur, Feb 18, 2005
    #10
  11. Alex Vinokur wrote:

    > I would like to optimize (speed)
    > an algorithm for computing very large Fibonacci numbers
    > using the primary recursive formula.
    > The algorithm can be seen at


    > http://groups-beta.google.com/group/alt.sources/msg/42e76b12150613a1
    >
    > Function AddUnits() contains a line
    >
    > n1 += (n2 + carry_s); // carry_s == 0 or 1
    >
    > The question is if is it possible to make that line to work faster?


    No.
     
    E. Robert Tisdale, Feb 18, 2005
    #11
  12. In article <cv563d$d73$>,
    Walter Roberson <-cnrc.gc.ca> wrote:
    |In article <>,
    |Alex Vinokur <> wrote:

    |:Consider the following statement:
    |:n+i, where i = 1 or 0.
    |:Is there more fast method for computing n+i than direct computing that sum?

    |It depends on the costs you assign to the various operations -- a
    |matter which is architecture dependant.

    There is a possibility that would be slower in any real
    architecture that I've ever heard of, but which could be faster
    under very narrow circumstances.

    (n&1) ? (n+i) : (n|i)

    The narrow circumstances under which this could be faster are:
    - this is within a tight loop that fits within the processor's
    primary instruction cache
    - the processor has a "move conditional" operation that
    avoids taking an actual branch when the operations are
    simple enough and the result is being used arithmetically
    instead of to control a branch
    - at the microcode level, the processor "runs free"
    when working from instruction cache, processing each
    instruction as fast as possible instead of working
    on a bus-cycle system (which is needed in most cases
    when anything outside the primary cache is being referenced)
    - the cost of the bitwise AND operation plus the cost of the
    comparison to 0 plus the cost of the bitwise OR operation,
    are faster than the cost of a full addition
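
    As a sanity check, a small sketch (assuming unsigned operands and i
    restricted to 0 or 1) confirming that the expression above agrees with
    a plain addition:

    #include <cassert>

    int main()
    {
        for (unsigned n = 0; n < 100000u; ++n) {
            for (unsigned i = 0; i <= 1u; ++i) {
                // when the low bit of n is clear, OR-ing in i equals adding it
                unsigned trick = (n & 1u) ? (n + i) : (n | i);
                assert(trick == n + i);
            }
        }
        return 0;
    }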

    I have heard of one architecture (I don't recall which)
    that had a "move conditional" operation that took
    a test condition and two arithmetic operations as operands,
    and would start doing the two arithmetic operations in
    parallel at the same time it was doing the test; when the
    result of the test was available, it would abort the false
    branch if it was not already finished, with the result
    being whichever of the arithmetic expressions was selected
    by the condition.


    Please note how narrow these conditions are: you would
    have to know a LOT about your processor to make this kind
    of optimization: the expression I give above will be slower
    than a straight addition on nearly every architecture.

    Addition is usually hard-coded through a series of
    transistors, with the carry circuit taking most of the
    landscape. It's hard to beat transistor-level speeds
    by using multiple instructions.

    I have heard that some architectures internally
    optimize +0 and +1; there would be no way to beat that...
    but again you would need to know intimate details of
    the architecture.
    --
    I don't know if there's destiny,
    but there's a decision! -- Wim Wenders (WoD)
     
    Walter Roberson, Feb 18, 2005
    #12
  13. "Alex Vinokur" <> wrote in message
    news:...

    > Function AddUnits() contains a line
    > n1 += (n2 + carry_s); // carry_s == 0 or 1
    >
    > The question is if is it possible to make that line to work faster?


    What fraction of your program's total execution time does this statement
    consume?

    Until you know the answer to this question, you don't know whether it's even
    worth trying to change it, let alone the best way of doing so.
     
    Andrew Koenig, Feb 18, 2005
    #13
  14. On 2005-02-18 12:54:57 -0500, "Alex Vinokur" <> said:

    > "Walter Roberson" <-cnrc.gc.ca> wrote in message
    > news:cv563d$d73$...
    >> In article <>,
    >> Alex Vinokur <> wrote:
    >> :Consider the following statement:
    >> :n+i, where i = 1 or 0.
    >>
    >> :Is there more fast method for computing n+i than direct computing that sum?
    >>
    >> It depends on the costs you assign to the various operations -- a
    >> matter which is architecture dependant. Integer addition is usually one of
    >> the fastest things a computer does. Suppose you were able to find a
    >> two instruction sequence that was faster for that particular case: then
    >> it is very likely to be slower because internally the CPU has
    >> to perform an integer addition in order to find the address of the
    >> second instruction.
    >>
    >> Have you perhaps omitted some important facts about the circumstances?
    >> For example, are you microprogramming, or is this a theory question
    >> at the micro-level where each comparison and change of a bit in
    >> the implimentation of the 'addition' operation is to be counted?
    >> Is this an assignment in designing an IC which is faster for these
    >> particular cases than building a full-blown adder circuit would be?
    >>

    >
    > I would like to optimize (speed) an algorithm for computing very large
    > Fibonacci numbers using the primary recursive formula.
    > The algorithm can be seen at
    > http://groups-beta.google.com/group/alt.sources/msg/42e76b12150613a1
    >
    > Function AddUnits() contains a line
    > n1 += (n2 + carry_s); // carry_s == 0 or 1
    >
    > The question is if is it possible to make that line to work faster?


    No, the question is: "Is that line the bottleneck?" How do you know
    that line is the problem? Have you measured the performance of your
    code?

    --
    Clark S. Cox, III
     
    Clark S. Cox III, Feb 18, 2005
    #14
  15. "Alex Vinokur" <> writes:
    [...]
    > I would like to optimize (speed) an algorithm for computing very
    > large Fibonacci numbers using the primary recursive formula.
    > The algorithm can be seen at
    > http://groups-beta.google.com/group/alt.sources/msg/42e76b12150613a1
    >
    > Function AddUnits() contains a line
    > n1 += (n2 + carry_s); // carry_s == 0 or 1
    >
    > The question is if is it possible to make that line to work faster?


    Use the maximum optimization level your compiler provides (you're
    probably already doing this). Use a better compiler if you can find
    one. Use a faster computer. Kick other users off the system so you
    get 100% of the CPU.

    As others have mentioned, there's little point in trying to optimize
    this one line unless you've actually made measurements that indicate
    that it's a bottleneck. Even if you've done that, there's no reliable
    portable way in standard C to improve the performance of that line of
    code.

    It's conceivable that a compiler can generate better code if it
    happens to know that carry_s is either 0 or 1. It might be able to
    infer this by dataflow analysis, depending on how carry_s is set. If
    you have a C99 compiler, making carry_s a _Bool <OT>or bool if you're
    using C++</OT> might help (or it might hurt).

    The following:

    n1 += n2;
    if (carry_s) n1++;

    might give you better or worse performance, or exactly the same,
    depending on the CPU architecture, the compiler, and the phase of the
    moon. (The "if" is likely to cause a branch, which can screw up
    pipelining -- or the compiler may be able to use some special CPU
    instruction that does exactly what's needed.)

    If this line really is a serious bottleneck, you might consider
    writing it in several equivalent ways and choosing among them with a
    macro:

    #if METHOD == 1
    n1 += (n2 + carry_s);
    #elif METHOD == 2
    n1 += n2;
    if (carry_s) n1++;
    #elif METHOD == 3
    /* something else */
    #else
    #error METHOD is undefined or invalid.
    #endif

    For a given platform, try compiling and running your program with each
    defined METHOD, and *measure the results*. (You can also examine the
    assembly listing; this can tell you if two methods result in the same
    code, but won't necessarily tell you which is better unless you're an
    expert in the particular CPU.) Expect the tradeoffs to change with
    the next release of the compiler or a different version of the CPU.
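
    One crude way to drive such a comparison (a sketch only: it assumes the
    operands are plain unsigned values, whereas the real code works on
    vector digits, and the iteration count is arbitrary; compile with,
    say, -DMETHOD=1):

    #include <cstdio>
    #include <ctime>

    int main()
    {
        volatile unsigned n1 = 0, n2 = 3, carry_s = 1;  // volatile keeps the loop alive
        std::clock_t t0 = std::clock();
        for (long k = 0; k < 100000000L; ++k) {
    #if METHOD == 1
            n1 += (n2 + carry_s);
    #elif METHOD == 2
            n1 += n2;
            if (carry_s) n1 = n1 + 1;
    #else
    #error METHOD is undefined or invalid.
    #endif
        }
        double secs = (double)(std::clock() - t0) / CLOCKS_PER_SEC;
        std::printf("METHOD %d: %.2f s (n1 = %u)\n", METHOD, secs, (unsigned)n1);
        return 0;
    }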

    Or, if you don't care about portability, you can code it in assembly
    language (which we can't help you with here). Consider using your
    compiler's output as a guide.

    Again, all this assumes that that one line really is a serious
    bottleneck. The only way to know this is to profile your code. If
    it's not a bottleneck, just write it as straightforwardly as possible
    and spend your effort elsewhere.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Feb 18, 2005
    #15
  16. In article <2005021814290016807%clarkcox3@gmailcom>,
    Clark S. Cox III <> wrote:
    :On 2005-02-18 12:54:57 -0500, "Alex Vinokur" <> said:

    :> n1 += (n2 + carry_s); // carry_s == 0 or 1

    :> The question is if is it possible to make that line to work faster?

    :No, the question is: "Is that line the bottleneck?"

    I profiled his code here on a particular platform. The line he is
    asking about is the -fastest- part of that function. The startup code
    for the function itself is a hair slower; the line after the above line
    takes about 3 times as long as the +=, and the code for the return statement
    after that takes a bit longer still.
    --
    Entropy is the logarithm of probability -- Boltzmann
     
    Walter Roberson, Feb 18, 2005
    #16
  17. * Alex Vinokur:
    >
    > I would like to optimize (speed) an algorithm for computing very large Fibonacci
    > numbers using the primary recursive formula.
    > The algorithm can be seen at
    > http://groups-beta.google.com/group/alt.sources/msg/42e76b12150613a1
    >
    > Function AddUnits() contains a line
    > n1 += (n2 + carry_s); // carry_s == 0 or 1
    >
    > The question is if is it possible to make that line to work faster?


    Attacking an optimization problem at the level of fundamental additions is
    seldom a Good Idea.

    Thinking about what goes on is almost always a Better Idea.

    Almost any way of computing Fibonacci numbers is faster than the recursive
    formula. But you're not using the recursive formula directly, you're summing
    iteratively, storing results in a std::vector of std::vector. Most of the
    time will, I gather, be spent in internal new and delete operations, and in
    the operating system's virtual memory swapping to and from disk, so
    possibly you can optimize _a lot_ by first computing the approximate Fib
    number using double arithmetic (check out the Golden Ratio), then allocating
    just the memory you need for that single number, and then computing the
    number exactly.
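
    For illustration, a sketch of that size estimate (using Binet's
    approximation; fib_decimal_digits is a hypothetical helper, not part of
    the posted code, and it assumes the number is stored as decimal digits):

    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Approximate count of decimal digits of Fib(n):
    // log10(Fib(n)) ~= n*log10(phi) - log10(sqrt(5)), phi = (1+sqrt(5))/2
    std::size_t fib_decimal_digits(unsigned long n)
    {
        const double phi = (1.0 + std::sqrt(5.0)) / 2.0;
        double d = n * std::log10(phi) - 0.5 * std::log10(5.0);
        return d < 0.0 ? 1 : static_cast<std::size_t>(d) + 1;
    }

    int main()
    {
        unsigned long n = 100000;
        std::vector<char> digits;
        digits.reserve(fib_decimal_digits(n));   // one allocation up front
        std::printf("Fib(%lu) needs roughly %lu decimal digits\n",
                    n, (unsigned long)digits.capacity());
        return 0;
    }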

    Hth.,

    - Alf

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
     
    Alf P. Steinbach, Feb 18, 2005
    #17
  18. In article <cv5lml$2um$>,
    Walter Roberson <-cnrc.gc.ca> wrote:
    |In article <2005021814290016807%clarkcox3@gmailcom>,
    |Clark S. Cox III <> wrote:

    |:No, the question is: "Is that line the bottleneck?"

    |I profiled his code here on a particular platform. The line he is
    |asking about is the -fastest- part of that function.

    I recompiled with aggressive optimizations, interprocedural analysis,
    loop unrolling, and telling the compiler it was okay to mix code
    together in ways that make it difficult to tell exactly which line you
    are on.

    When I turned on all those optimizations, a sample run with hardware
    profiling counted 9724 against the line the OP pointed out,
    3195 against the next line, and 637 against the return.

    Thus, if you were naive about what the profiling output really means in
    the face of high optimization, then you could end up drawing the
    conclusion that it was the add that was slow.
    --
    Sub-millibarn resolution bio-hyperdimensional plasmatic space
    polyimaging is just around the corner. -- Corry Lee Smith
     
    Walter Roberson, Feb 18, 2005
    #18
  19. In article <>,
    Alf P. Steinbach <> wrote:
    :Almost any way of computing Fibonacci numbers is faster than the recursive
    :formula. But you're not using the recursive formula directly, you're summing
    :iteratively, storing results in a std::vector of std::vector. Most of the
    :time will, I gather, be spent in internal new and delete operations, and in
    :the operating system's virtual memory swapping to and from disk,

    That's a good thought, but my profiling experiments on his code show
    that the amount of time spent in those areas is at the noise level,
    with the arithmetic functions of the routine the OP indicated
    being the bottleneck.

    The line he indicated is not the bottleneck, but my experiments show
    that if you are using high optimization in combination with profiling,
    that the profiler can end up accounting the addition line as if it
    was about 3/4 of the execution time. It's an artifact of loop
    unrolling and similar.
    --
    Cannot open .signature: Permission denied
     
    Walter Roberson, Feb 18, 2005
    #19
  20. * Walter Roberson:
    > In article <>,
    > Alf P. Steinbach <> wrote:
    > :Almost any way of computing Fibonacci numbers is faster than the recursive
    > :formula. But you're not using the recursive formula directly, you're summing
    > :iteratively, storing results in a std::vector of std::vector. Most of the
    > :time will, I gather, be spent in internal new and delete operations, and in
    > :the operating system's virtual memory swapping to and from disk,
    >
    > That's a good thought, but my profiling experiments on his code show
    > that the amount of time spent in those areas is in the noise level,
    > with the arithmetic functions of the routine the OP indicate
    > being the bottleneck.


    Did you profile for _large_ Fib numbers, numbers much greater than can be
    represented by ordinary 'long', which is what the code seems to be all about?

    And does your profiler account for out-of-process time, e.g. swapping?

    Profiling is a tricky business, and without analyzing that code in detail
    (I just skimmed it) it seemed to me to have at least O(n log n) memory
    consumption for computation of Fib number n...


    > The line he indicated is not the bottleneck, but my experiments show
    > that if you are using high optimization in combination with profiling,
    > that the profiler can end up accounting the addition line as if it
    > was about 3/4 of the execution time. It's an artifact of loop
    > unrolling and similar.


    Yes... ;-)

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
     
    Alf P. Steinbach, Feb 18, 2005
    #20