# Fast addition for n+1 or n+0

Discussion in 'C++' started by Alex Vinokur, Feb 18, 2005.

1. ### Alex Vinokur

Alex Vinokur, Feb 18, 2005

2. ### Keith Thompson

"Alex Vinokur" <> writes:
> Consider the following statement:
> n+i, where i = 1 or 0.
>
> Is there more fast method for computing n+i than direct computing that sum?

The best way to compute n+0 is n.

The best way to compute n+1 is n+1; if the CPU provides something
faster than a general add instruction, the compiler will generate it
for you.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Keith Thompson, Feb 18, 2005

3. ### Alf P. Steinbach

* Alex Vinokur:
> Consider the following statement:
> n+i, where i = 1 or 0.
>
> Is there more fast method for computing n+i than direct computing that sum?

That depends on the types involved.

For built-in numeric types, direct computation is probably fastest.

Measure if you're in doubt (and it really matters).

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Alf P. Steinbach, Feb 18, 2005
4. ### Gregory Toomey

Alex Vinokur wrote:

> Consider the following statement:
> n+i, where i = 1 or 0.
>
> Is there more fast method for computing n+i than direct computing that
> sum?
>

Assuming integers, hardware addition is implemented simply using full
adders.

n+0 has no carries, so it is fast; many compilers will constant-fold it to n.
n+1 has potentially m carries in m-bit arithmetic.
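
To make the carry argument concrete, here is a minimal sketch that adds two 32-bit values one bit at a time, the way a chain of full adders would, and counts the carries. The names `RippleResult` and `ripple_add` are made up for this illustration. Note that typical CPU adders do not take longer when more carries occur, so this models the logic only, not the timing.

```cpp
#include <cassert>
#include <cstdint>

// Result of a bit-by-bit (ripple-carry) addition.
struct RippleResult {
    std::uint32_t sum;   // a + b, modulo 2^32
    int carries;         // number of bit positions that produced a carry-out
};

// Add a and b one bit at a time, as a chain of 32 full adders would.
// (Hypothetical helper, for illustration only.)
inline RippleResult ripple_add(std::uint32_t a, std::uint32_t b) {
    RippleResult r{0u, 0};
    unsigned carry = 0;
    for (int bit = 0; bit < 32; ++bit) {
        unsigned x = (a >> bit) & 1u;
        unsigned y = (b >> bit) & 1u;
        unsigned s = x ^ y ^ carry;                    // sum output of one full adder
        carry = (x & y) | (x & carry) | (y & carry);   // carry output of one full adder
        r.sum |= static_cast<std::uint32_t>(s) << bit;
        r.carries += static_cast<int>(carry);
    }
    return r;
}
```

With this sketch, `ripple_add(n, 0)` never carries, while `ripple_add(0xFFFFFFFF, 1)` carries out of all 32 positions.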

http://isweb.redwoods.cc.ca.us/INSTRUCT/CalderwoodD/diglogic/full.htm

gtoomey
www.gregorytoomey.com

Gregory Toomey, Feb 18, 2005
5. ### Richard Tobin

In article <>,
Alex Vinokur <> wrote:

>Consider the following statement:
>n+i, where i = 1 or 0.
>
>Is there more fast method for computing n+i than direct computing that sum?

Assuming n and i are ints, not on a modern general purpose computer.
Addition typically takes one cycle, once the operands are in
registers.

Any attempt to use a conditional will almost certainly be much slower.

For more details, try a newsgroup for the processor you're interested
in, or maybe comp.arch.

-- Richard

Richard Tobin, Feb 18, 2005
6. ### Alex Vinokur

"Richard Tobin" <> wrote in message news:cv525n\$1i7f\$...
> In article <>,
> Alex Vinokur <> wrote:
>
> >Consider the following statement:
> >n+i, where i = 1 or 0.
> >
> >Is there more fast method for computing n+i than direct computing that sum?

>
> Assuming n and i are ints, not on a modern general purpose computer.
> Addition typically takes one cycle, once the operands are in
> registers.
>
> Any attempt to use a conditional will almost certainly be much slower.
>
> For more details, try a newsgroup for the processor you're interested
> in, or maybe comp.arch.
>
> -- Richard

I need that in C/C++ program.

--
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Alex Vinokur, Feb 18, 2005
7. ### Michael Mair

Alex Vinokur wrote:
> "Richard Tobin" <> wrote in message news:cv525n\$1i7f\$...
>
>>In article <>,
>>Alex Vinokur <> wrote:
>>
>>
>>>Consider the following statement:
>>>n+i, where i = 1 or 0.
>>>
>>>Is there more fast method for computing n+i than direct computing that sum?

>>
>>Assuming n and i are ints, not on a modern general purpose computer.
>>Addition typically takes one cycle, once the operands are in
>>registers.
>>
>>Any attempt to use a conditional will almost certainly be much slower.
>>
>>For more details, try a newsgroup for the processor you're interested
>>in, or maybe comp.arch.

>
> I need that in C/C++ program.

Well, there is no general truth helping you along to a portable,
always perfect solution.
If you want to optimise your code for speed, use a profiler to
determine which functions are called how often and take how much
time. Then you know _where_ you lose your time.
After that, try to find algorithms which reduce the number
of calls to small functions which take a good part of the overall
time and reduce the time spent in "big" functions taking much time.
If you afterwards really find that optimising code with
'n+0' and 'n+1' would be the best possible micro-optimisation
to gain some more cycles, then you should try to write as many
'n+0's/'n's and 'n+1's as possible explicitly in your code
instead of using 'n+i'. The compiler will optimise that if the
code has the potential for optimisation.
Afterwards, use the profiler to determine whether this actually
makes a difference.
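
One way to write 'n+0' and 'n+1' explicitly without duplicating code by hand is to make the carry a compile-time constant. The following is only a sketch under that assumption; `add_with_carry` and `add_units` are hypothetical names, not taken from the OP's program.

```cpp
#include <cassert>

// Hypothetical helper: the carry is a template parameter, so each
// instantiation sees a literal 0 or 1 that the compiler can fold.
template <unsigned Carry>
inline unsigned long add_with_carry(unsigned long n1, unsigned long n2) {
    static_assert(Carry == 0 || Carry == 1, "carry must be 0 or 1");
    return n1 + n2 + Carry;   // with Carry == 0 this is just n1 + n2
}

// Run-time dispatch to the two explicit forms.
// Note that this reintroduces a conditional on the carry.
inline unsigned long add_units(unsigned long n1, unsigned long n2, bool carry) {
    return carry ? add_with_carry<1>(n1, n2) : add_with_carry<0>(n1, n2);
}
```

This only pays off where the carry is already known at the call site; when it has to be tested at run time, as in `add_units`, the conditional may well cost more than the plain `n1 + n2 + carry`, so measure both.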

Probably not much.
If you think you can do better than the compiler, then follow

Cheers
Michael
--
E-Mail: Mine is a gmx dot de address.

Michael Mair, Feb 18, 2005
8. ### Walter Roberson

In article <>,
Alex Vinokur <> wrote:
:Consider the following statement:
:n+i, where i = 1 or 0.

:Is there more fast method for computing n+i than direct computing that sum?

It depends on the costs you assign to the various operations -- a
matter which is architecture-dependent. Integer addition is usually one of
the fastest things a computer does. Suppose you were able to find a
two instruction sequence that was faster for that particular case: then
it is very likely to be slower because internally the CPU has
to perform an integer addition in order to find the address of the
second instruction.

Have you perhaps omitted some important facts about the circumstances?
For example, are you microprogramming, or is this a theory question
at the micro-level where each comparison and change of a bit in
the implementation of the 'addition' operation is to be counted?
Is this an assignment in designing an IC which is faster for these
particular cases than building a full-blown adder circuit would be?

--
Reviewers should be required to produce a certain number of
negative reviews - like police given quotas for handing out
speeding tickets. -- The Audio Anarchist

Walter Roberson, Feb 18, 2005
9. ### E. Robert Tisdale

Alex Vinokur wrote:

> Consider the following statement:
>
> n + i, where i = 1 or 0.
>
> Is there more fast method for computing n + i than direct computing that sum?

No.
But a good optimizing compiler should be able to
replace n + 0 with n and replace n + 1 with ++n.

E. Robert Tisdale, Feb 18, 2005
10. ### Alex Vinokur

"Walter Roberson" <-cnrc.gc.ca> wrote in message news:cv563d\$d73\$...
> In article <>,
> Alex Vinokur <> wrote:
> :Consider the following statement:
> :n+i, where i = 1 or 0.
>
> :Is there more fast method for computing n+i than direct computing that sum?
>
> It depends on the costs you assign to the various operations -- a
> matter which is architecture-dependent. Integer addition is usually one of
> the fastest things a computer does. Suppose you were able to find a
> two instruction sequence that was faster for that particular case: then
> it is very likely to be slower because internally the CPU has
> to perform an integer addition in order to find the address of the
> second instruction.
>
> Have you perhaps omitted some important facts about the circumstances?
> For example, are you microprogramming, or is this a theory question
> at the micro-level where each comparison and change of a bit in
> the implementation of the 'addition' operation is to be counted?
> Is this an assignment in designing an IC which is faster for these
> particular cases than building a full-blown adder circuit would be?
>

I would like to optimize (speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula.
The algorithm can be seen at

Function AddUnits() contains a line

n1 += (n2 + carry_s); // carry_s == 0 or 1

The question is if is it possible to make that line to work faster?
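
For readers without access to the linked source, the following is a rough sketch of the kind of loop such a line usually lives in. It is not the OP's AddUnits(); the representation (little-endian units in base 10^9) and the names `BigNum`, `kBase` and `add_in_place` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed representation: least significant unit first, each unit < 10^9.
using BigNum = std::vector<std::uint32_t>;
constexpr std::uint32_t kBase = 1000000000u;

// a += b, propagating a carry of 0 or 1 from one unit to the next.
inline void add_in_place(BigNum& a, const BigNum& b) {
    if (a.size() < b.size()) a.resize(b.size(), 0u);
    std::uint32_t carry = 0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        std::uint32_t n2 = (k < b.size()) ? b[k] : 0u;
        // The addition in question. The largest possible value,
        // 2 * (10^9 - 1) + 1, still fits in 32 bits.
        std::uint32_t n1 = a[k] + n2 + carry;
        carry = (n1 >= kBase) ? 1u : 0u;
        if (carry) n1 -= kBase;
        a[k] = n1;
    }
    if (carry) a.push_back(1u);   // the number grew by one unit
}
```

In a loop like this, the comparison against the base and the store back into the vector sit right next to the addition, which is worth keeping in mind when reading per-line profiler output.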

Alex Vinokur, Feb 18, 2005
11. ### E. Robert Tisdale

Alex Vinokur wrote:

> I would like to optimize (speed)
> an algorithm for computing very large Fibonacci numbers
> using the primary recursive formula.
> The algorithm can be seen at

>
> Function AddUnits() contains a line
>
> n1 += (n2 + carry_s); // carry_s == 0 or 1
>
> The question is if is it possible to make that line to work faster?

No.

E. Robert Tisdale, Feb 18, 2005
12. ### Walter Roberson

In article <cv563d\$d73\$>,
Walter Roberson <-cnrc.gc.ca> wrote:
|In article <>,
|Alex Vinokur <> wrote:

|:Consider the following statement:
|:n+i, where i = 1 or 0.
|:Is there more fast method for computing n+i than direct computing that sum?

|It depends on the costs you assign to the various operations -- a
|matter which is architecture dependant.

There is a possibility that would be slower in any real
architecture that I've ever heard of, but which could be faster
under very narrow circumstances.

(n&1) ? (n+i) : (n|i)

The narrow circumstances under which this could be faster are:
- this is within a tight loop that fits within the processor's
primary instruction cache
- the processor has a "move conditional" operation that
avoids taking an actual branch when the operations are
simple enough and the result is being used arithmetically
instead of to control a branch
- at the microcode level, the processor "runs free"
when working from instruction cache, processing each
instruction as fast as possible instead of working
on a bus-cycle system (which is needed in most cases
when anything outside the primary cache is being referenced)
- the cost of the bitwise AND operation plus the cost of the
comparison to 0 plus the cost of the bitwise OR operation,
are faster than the cost of a full addition
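
The expression above relies on the fact that when n is even its low bit is clear, so OR-ing in i (0 or 1) cannot carry and gives the same result as n + i. A small sketch that checks this against plain addition; `add_bit` and `matches_plain_add` are names invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// The expression from the post: add only when n is odd, otherwise OR.
inline std::uint32_t add_bit(std::uint32_t n, std::uint32_t i) {
    assert(i <= 1u);   // only defined for i == 0 or i == 1
    return (n & 1u) ? (n + i) : (n | i);
}

// Compare against plain addition for every 16-bit n and both values of i.
inline bool matches_plain_add() {
    for (std::uint32_t n = 0; n <= 0xFFFFu; ++n)
        for (std::uint32_t i = 0; i <= 1u; ++i)
            if (add_bit(n, i) != n + i) return false;
    return true;
}
```

This confirms only that the expression is correct, not that it is fast; on an odd n it still performs the full addition plus the test.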

I have heard of one architecture (I don't recall which)
that had a "move conditional" operation that took
a test condition and two arithmetic operations as operands,
and would start doing the two arithmetic operations in
parallel at the same time it was doing the test; when the
result of the test was available, it would abort the false
branch if it was not already finished, with the result
being whichever of the arithmetic expressions was selected
by the condition.

Please note how narrow these conditions are: you would
have to know a LOT about your processor to make this kind
of optimization: the expression I give above will be slower
than a straight addition on nearly every architecture.

Addition is usually hard-coded through a series of
transistors, with the carry circuit taking most of the
landscape. It's hard to beat transistor-level speeds
by using multiple instructions.

I have heard that some architectures internally
optimize +0 and +1; there would be no way to beat that...
but again you would need to know intimate details of
the architecture.
--
I don't know if there's destiny,
but there's a decision! -- Wim Wenders (WoD)

Walter Roberson, Feb 18, 2005
13. ### Andrew Koenig

"Alex Vinokur" <> wrote in message
news:...

> Function AddUnits() contains a line
> n1 += (n2 + carry_s); // carry_s == 0 or 1
>
> The question is if is it possible to make that line to work faster?

What fraction of your program's total execution time does this statement
consume?

Until you know the answer to this question, you don't know whether it's even
worth trying to change it, let alone the best way of doing so.

Andrew Koenig, Feb 18, 2005
14. ### Clark S. Cox III

On 2005-02-18 12:54:57 -0500, "Alex Vinokur" <> said:

> "Walter Roberson" <-cnrc.gc.ca> wrote in message
> news:cv563d\$d73\$...
>> In article <>,
>> Alex Vinokur <> wrote:
>> :Consider the following statement:
>> :n+i, where i = 1 or 0.
>>
>> :Is there more fast method for computing n+i than direct computing that sum?
>>
>> It depends on the costs you assign to the various operations -- a
>> matter which is architecture-dependent. Integer addition is usually one of
>> the fastest things a computer does. Suppose you were able to find a
>> two instruction sequence that was faster for that particular case: then
>> it is very likely to be slower because internally the CPU has
>> to perform an integer addition in order to find the address of the
>> second instruction.
>>
>> Have you perhaps omitted some important facts about the circumstances?
>> For example, are you microprogramming, or is this a theory question
>> at the micro-level where each comparison and change of a bit in
>> the implementation of the 'addition' operation is to be counted?
>> Is this an assignment in designing an IC which is faster for these
>> particular cases than building a full-blown adder circuit would be?
>>

>
> I would like to optimize (speed) an algorithm for computing very large
> Fibonacci numbers using the primary recursive formula.
> The algorithm can be seen at
>
> Function AddUnits() contains a line
> n1 += (n2 + carry_s); // carry_s == 0 or 1
>
> The question is if is it possible to make that line to work faster?

No, the question is: "Is that line the bottleneck?" How do you know
that line is the problem? Have you measured the performance of your
code?

--
Clark S. Cox, III

Clark S. Cox III, Feb 18, 2005
15. ### Keith Thompson

"Alex Vinokur" <> writes:
[...]
> I would like to optimize (speed) an algorithm for computing very
> large Fibonacci numbers using the primary recursive formula.
> The algorithm can be seen at
>
> Function AddUnits() contains a line
> n1 += (n2 + carry_s); // carry_s == 0 or 1
>
> The question is if is it possible to make that line to work faster?

Use the maximum optimization level your compiler provides (you're
probably already doing this). Use a better compiler if you can find
one. Use a faster computer. Kick other users off the system so you
get 100% of the CPU.

As others have mentioned, there's little point in trying to optimize
this one line unless you've actually made measurements that indicate
that it's a bottleneck. Even if you've done that, there's no reliable
portable way in standard C to improve the performance of that line of
code.

It's conceivable that a compiler can generate better code if it
happens to know that carry_s is either 0 or 1. It might be able to
infer this by dataflow analysis, depending on how carry_s is set. If
you have a C99 compiler, making carry_s a _Bool <OT>or bool if you're
using C++</OT> might help (or it might hurt).

The following:

n1 += n2;
if (carry_s) n1++;

might give you better or worse performance, or exactly the same,
depending on the CPU architecture, the compiler, and the phase of the
moon. (The "if" is likely to cause a branch, which can screw up
pipelining -- or the compiler may be able to use some special CPU
instruction that does exactly what's needed.)

If this line really is a serious bottleneck, you might consider
writing it in several equivalent ways and choosing among them with a
macro:

#if METHOD == 1
n1 += (n2 + carry_s);
#elif METHOD == 2
n1 += n2;
if (carry_s) n1++;
#elif METHOD == 3
/* something else */
#else
#error METHOD is undefined or invalid.
#endif

For a given platform, try compiling and running your program with each
defined METHOD, and *measure the results*. (You can also examine the
assembly listing; this can tell you if two methods result in the same
code, but won't necessarily tell you which is better unless you're an
expert in the particular CPU.) Expect the tradeoffs to change with
the next release of the compiler or a different version of the CPU.
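
As a rough sketch of how the METHOD switch might be wired into something compilable, the block below wraps the two variants in one function and adds a simple timer. The function names, the default of METHOD 1, and the use of `std::chrono` are assumptions for illustration; a synthetic loop like this is no substitute for timing the real program.

```cpp
#include <cassert>
#include <chrono>

#ifndef METHOD
#define METHOD 1   // assumed default; override with e.g. -DMETHOD=2
#endif

// One function body, selected at compile time by METHOD.
inline unsigned long add_units(unsigned long n1, unsigned long n2, unsigned carry_s) {
#if METHOD == 1
    n1 += (n2 + carry_s);
#elif METHOD == 2
    n1 += n2;
    if (carry_s) n1++;
#else
#error METHOD is undefined or invalid.
#endif
    return n1;
}

// Time `iterations` calls and return the elapsed seconds.
// The accumulated value is written to `sink` so the loop has a visible result.
inline double time_add_units(unsigned long iterations, unsigned long& sink) {
    auto start = std::chrono::steady_clock::now();
    unsigned long acc = 0;
    for (unsigned long k = 0; k < iterations; ++k)
        acc = add_units(acc, k, static_cast<unsigned>(k & 1u));
    sink = acc;
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Build once per METHOD value with the same optimization flags and compare the returned times over several runs; an optimizer may simplify a loop this small, so treat the numbers with suspicion.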

Or, if you don't care about portability, you can code it in assembly,
using the compiler's output as a guide.

Again, all this assumes that that one line really is a serious
bottleneck. The only way to know this is to profile your code. If
it's not a bottleneck, just write it as straightforwardly as possible

Keith Thompson, Feb 18, 2005
16. ### Walter Roberson

In article <2005021814290016807%clarkcox3@gmailcom>,
Clark S. Cox III <> wrote:
:On 2005-02-18 12:54:57 -0500, "Alex Vinokur" <> said:

:> n1 += (n2 + carry_s); // carry_s == 0 or 1

:> The question is if is it possible to make that line to work faster?

:No, the question is: "Is that line the bottleneck?"

I profiled his code here on a particular platform. The line he is
asking about is the -fastest- part of that function. The startup code
for the function itself is a hair slower; the line after the above line
takes about 3 times as long as the +=, and the code for the return statement
after that takes a bit longer still.
--
Entropy is the logarithm of probability -- Boltzmann

Walter Roberson, Feb 18, 2005
17. ### Alf P. Steinbach

* Alex Vinokur:
>
> I would like to optimize (speed) an algorithm for computing very large Fibonacci
> numbers using the primary recursive formula.
> The algorithm can be seen at
>
> Function AddUnits() contains a line
> n1 += (n2 + carry_s); // carry_s == 0 or 1
>
> The question is if is it possible to make that line to work faster?

Attacking an optimization problem at the level of fundamental additions is
seldom a Good Idea.

Thinking about what goes on is almost always a Better Idea.

Almost any way of computing Fibonacci numbers is faster than the recursive
formula. But you're not using the recursive formula directly, you're summing
iteratively, storing results in a std::vector of std::vector. Most of the
time will, I gather, be spent in internal new and delete operations, and in
the operating system's virtual memory swapping to and from disk, so
possibly you can optimize _a lot_ by first computing the approximate Fib
number using double arithmetic (check out the Golden Ratio), then allocate
just what you need of memory for that single number, and then compute the
number exactly.
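
A minimal sketch of that idea, assuming decimal digits stored one per element (a real implementation would use much larger units): estimate the digit count from F(n) ~ phi^n / sqrt(5), reserve two buffers once, then add exactly. The names `fib_digit_estimate` and `fib_decimal` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Estimated number of decimal digits of F(n), from F(n) ~ phi^n / sqrt(5).
inline std::size_t fib_digit_estimate(unsigned n) {
    if (n < 2) return 1;   // F(0) = 0 and F(1) = 1 are single digits
    const double phi = (1.0 + std::sqrt(5.0)) / 2.0;
    double log10_fn = n * std::log10(phi) - 0.5 * std::log10(5.0);
    return static_cast<std::size_t>(std::floor(log10_fn)) + 1;
}

// Exact F(n) as a decimal string, using two digit buffers reserved up front.
inline std::string fib_decimal(unsigned n) {
    const std::size_t cap = fib_digit_estimate(n) + 1;   // one digit of slack
    std::vector<unsigned char> a{0}, b{1};   // little-endian digits of F(0), F(1)
    a.reserve(cap);
    b.reserve(cap);
    for (unsigned k = 0; k < n; ++k) {
        // a = a + b, then swap, so that (a, b) becomes (F(k+1), F(k+2)).
        unsigned carry = 0;
        a.resize(b.size(), 0);
        for (std::size_t d = 0; d < b.size(); ++d) {
            unsigned s = a[d] + b[d] + carry;
            carry = (s >= 10u) ? 1u : 0u;
            a[d] = static_cast<unsigned char>(carry ? s - 10u : s);
        }
        if (carry) a.push_back(1);
        std::swap(a, b);
    }
    std::string out;
    out.reserve(a.size());
    for (auto it = a.rbegin(); it != a.rend(); ++it)
        out.push_back(static_cast<char>('0' + *it));
    return out;
}
```

If the floating-point estimate is ever off by a digit, the vectors simply grow as usual, so the estimate affects only allocation behaviour, not correctness.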

Hth.,

- Alf

Alf P. Steinbach, Feb 18, 2005
18. ### Walter Roberson

In article <cv5lml\$2um\$>,
Walter Roberson <-cnrc.gc.ca> wrote:
|In article <2005021814290016807%clarkcox3@gmailcom>,
|Clark S. Cox III <> wrote:

|:No, the question is: "Is that line the bottleneck?"

|I profiled his code here on a particular platform. The line he is

I recompiled with aggressive optimizations, interprocedural analysis,
loop unrolling, and telling the compiler it was okay to mix code
together in ways that make it difficult to tell exactly which line you
are on.

When I turned on all those optimizations, a sample run with hardware
profiling counted 9724 against the line the OP pointed out,
3195 against the next line, and 637 against the return.

Thus, if you were naive about what the profiling output really means in
the face of high optimization, then you could end up drawing the
conclusion that it was the add that was slow.
--
Sub-millibarn resolution bio-hyperdimensional plasmatic space
polyimaging is just around the corner. -- Corry Lee Smith

Walter Roberson, Feb 18, 2005
19. ### Walter Roberson

In article <>,
Alf P. Steinbach <> wrote:
:Almost any way of computing Fibonacci numbers is faster than the recursive
:formula. But you're not using the recursive formula directly, you're summing
:iteratively, storing results in a std::vector of std::vector. Most of the
:time will, I gather, be spent in internal new and delete operations, and in
:the operating system's virtual memory swapping to and from disk,

That's a good thought, but my profiling experiments on his code show
that the amount of time spent in those areas is in the noise level,
with the arithmetic functions of the routine the OP indicated
being the bottleneck.

The line he indicated is not the bottleneck, but my experiments show
that if you are using high optimization in combination with profiling,
that the profiler can end up accounting the addition line as if it
was about 3/4 of the execution time. It's an artifact of loop
unrolling and similar.
--
Cannot open .signature: Permission denied

Walter Roberson, Feb 18, 2005
20. ### Alf P. Steinbach

* Walter Roberson:
> In article <>,
> Alf P. Steinbach <> wrote:
> :Almost any way of computing Fibonacci numbers is faster than the recursive
> :formula. But you're not using the recursive formula directly, you're summing
> :iteratively, storing results in a std::vector of std::vector. Most of the
> :time will, I gather, be spent in internal new and delete operations, and in
> :the operating system's virtual memory swapping to and from disk,
>
> That's a good thought, but my profiling experiments on his code show
> that the amount of time spent in those areas is in the noise level,
> with the arithmetic functions of the routine the OP indicated
> being the bottleneck.

Did you profile for _large_ Fib numbers, numbers much greater than can be
represented by ordinary 'long', which is what the code seems to be all about?

And does your profiler account for out-of-process time such as e.g. swapping?

Profiling is a tricky business, and without analyzing that code in detail
(I just skimmed it) it seemed to me to have at least O(n log n) memory
consumption for computation of a Fib number n...

> The line he indicated is not the bottleneck, but my experiments show
> that if you are using high optimization in combination with profiling,
> that the profiler can end up accounting the addition line as if it
> was about 3/4 of the execution time. It's an artifact of loop
> unrolling and similar.

Yes... ;-)

Alf P. Steinbach, Feb 18, 2005