Fast addition for n+1 or n+0


Alex Vinokur

Consider the following expression:
n+i, where i = 1 or 0.

Is there a faster method for computing n+i than computing that sum directly?
 

Keith Thompson

Alex Vinokur said:
Consider the following expression:
n+i, where i = 1 or 0.

Is there a faster method for computing n+i than computing that sum directly?

The best way to compute n+0 is n.

The best way to compute n+1 is n+1; if the CPU provides something
faster than a general add instruction, the compiler will generate it
for you.
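
For illustration (a sketch added here, not part of Keith's post; the function names are invented), two functions you can compile with optimization and whose assembly you can inspect to see what the compiler actually emits:

// Build with optimization (e.g. -O2) and read the generated assembly.
unsigned add0(unsigned n) { return n + 0; }   // usually compiled to a plain move/return
unsigned add1(unsigned n) { return n + 1; }   // usually a single increment or add instruction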
 

Alf P. Steinbach

* Alex Vinokur:
Consider the following expression:
n+i, where i = 1 or 0.

Is there a faster method for computing n+i than computing that sum directly?

That depends on the types involved.

For built-in numeric types, direct computation is probably fastest.

Measure if you're in doubt (and it really matters).
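
As an illustration of measuring (a sketch added here, not part of Alf's post; it assumes C++, uses std::clock for portability, and keeps a volatile sink so the compiler is less likely to remove the loop):

#include <cstdio>
#include <ctime>

int main()
{
    const unsigned long iterations = 100000000UL;
    volatile unsigned sink = 0;

    std::clock_t t0 = std::clock();
    for (unsigned long k = 0; k < iterations; ++k)
        sink = sink + 1;                     // the operation being timed
    std::clock_t t1 = std::clock();

    std::printf("%lu additions: %f s of CPU time\n", iterations,
                double(t1 - t0) / CLOCKS_PER_SEC);
}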
 

Gregory Toomey

Alex said:
Consider the following expression:
n+i, where i = 1 or 0.

Is there a faster method for computing n+i than computing that sum
directly?

Assuming integers, hardware addition is implemented simply using full
adders, or faster algorithms like carry lookahead.

n+0 has no carries and is fast; many compilers will constant-fold it to n
n+1 has potentially m carries in m-bit arithmetic

Full adder:
http://isweb.redwoods.cc.ca.us/INSTRUCT/CalderwoodD/diglogic/full.htm

Carry look ahead:
http://www.seas.upenn.edu/~ee201/lab/CarryLookAhead/CarryLookAheadF01.html
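
To make the full-adder idea concrete, here is a bit-level sketch in C++ (illustration only; real hardware does this in parallel gates, and no compiler generates anything like this for n+1):

// Ripple-style addition using only bitwise operations.
unsigned ripple_add(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned carry = (a & b) << 1;   // positions where both bits are 1 generate a carry
        a = a ^ b;                       // per-bit sum, ignoring carries
        b = carry;                       // feed the carries back in
    }
    return a;                            // for b == 1 (n+1) the loop runs once per bit of the carry chain
}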


gtoomey
www.gregorytoomey.com
 

Richard Tobin

Alex Vinokur said:
Consider the following expression:
n+i, where i = 1 or 0.

Is there a faster method for computing n+i than computing that sum directly?

Assuming n and i are ints, not on a modern general purpose computer.
Addition typically takes one cycle, once the operands are in
registers.

Any attempt to use a conditional will almost certainly be much slower.

For more details, try a newsgroup for the processor you're interested
in, or maybe comp.arch.

-- Richard
 

Alex Vinokur

Richard Tobin said:
Assuming n and i are ints, not on a modern general purpose computer.
Addition typically takes one cycle, once the operands are in
registers.

Any attempt to use a conditional will almost certainly be much slower.

For more details, try a newsgroup for the processor you're interested
in, or maybe comp.arch.

-- Richard

I need that in a C/C++ program.
 

Michael Mair

Alex said:
I need that in a C/C++ program.

Well, there is no general truth helping you along to a portable,
always perfect solution.
If you want to optimise your code for speed, use a profiler to
determine which functions are called how often and take how much
time. Then you know _where_ you lose your time.
After that, try to find algorithms which reduce the number
of calls to the small functions that take a good part of the overall
time, and which reduce the time spent in "big" functions.
If you afterwards really find that optimising code with
'n+0' and 'n+1' would be the best possible micro-optimisation
to gain some more cycles, then you should try to write as many
'n+0's/'n's and 'n+1's as possible explicitly in your code
instead of using 'n+i'. The compiler will optimise that if the
code has the potential for optimisation.
Afterwards, use the profiler to determine whether this actually
makes a difference.

Probably not much.
If you think you can do better than the compiler, then follow
Richard's suggestion about comp.arch.*


Cheers
Michael
 

Walter Roberson

:n+i, where i = 1 or 0.

:Is there a faster method for computing n+i than computing that sum directly?

It depends on the costs you assign to the various operations -- a
matter which is architecture dependent. Integer addition is usually one of
the fastest things a computer does. Suppose you were able to find a
two-instruction sequence that looked faster for that particular case: it
is very likely to actually be slower, because internally the CPU has
to perform an integer addition just to find the address of the
second instruction.

Have you perhaps omitted some important facts about the circumstances?
For example, are you microprogramming, or is this a theory question
at the micro-level where each comparison and change of a bit in
the implementation of the 'addition' operation is to be counted?
Is this an assignment in designing an IC which is faster for these
particular cases than building a full-blown adder circuit would be?
 

E. Robert Tisdale

Alex said:
Consider the following expression:

n + i, where i = 1 or 0.

Is there a faster method for computing n + i than computing that sum directly?

No.
But a good optimizing compiler should be able to
replace n + 0 with n and replace n + 1 with ++n.
 

Alex Vinokur

Walter Roberson said:
:n+i, where i = 1 or 0.

:Is there a faster method for computing n+i than computing that sum directly?

It depends on the costs you assign to the various operations -- a
matter which is architecture dependent. Integer addition is usually one of
the fastest things a computer does. Suppose you were able to find a
two-instruction sequence that looked faster for that particular case: it
is very likely to actually be slower, because internally the CPU has
to perform an integer addition just to find the address of the
second instruction.

Have you perhaps omitted some important facts about the circumstances?
For example, are you microprogramming, or is this a theory question
at the micro-level where each comparison and change of a bit in
the implementation of the 'addition' operation is to be counted?
Is this an assignment in designing an IC which is faster for these
particular cases than building a full-blown adder circuit would be?

I would like to optimize (for speed) an algorithm for computing very large Fibonacci numbers using the primary recursive formula.
The algorithm can be seen at
http://groups-beta.google.com/group/alt.sources/msg/42e76b12150613a1

Function AddUnits() contains a line
n1 += (n2 + carry_s); // carry_s == 0 or 1

The question is: is it possible to make that line work faster?
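
For readers who don't follow the link, here is a hypothetical sketch of the kind of routine that line sits in; the type, the radix, and the signature below are assumptions made for illustration, not taken from the posted code.

typedef unsigned long unit_t;
const unit_t BASE = 1000000000UL;   // assumed radix of one "unit" of the big number

// Add one unit of each operand plus the incoming carry; return the outgoing carry.
unit_t AddUnits(unit_t& n1, unit_t n2, unit_t carry_s)
{
    n1 += (n2 + carry_s);           // the line in question; carry_s is 0 or 1
    if (n1 >= BASE) {
        n1 -= BASE;
        return 1;                   // carry into the next unit
    }
    return 0;
}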
 

Walter Roberson

|In article <[email protected]>,

|:Consider the following expression:
|:n+i, where i = 1 or 0.
|:Is there a faster method for computing n+i than computing that sum directly?

|It depends on the costs you assign to the various operations -- a
|matter which is architecture dependent.

There is a possibility that would be slower in any real
architecture that I've ever heard of, but which could be faster
under very narrow circumstances.

(n&1) ? (n+i) : (n|i)

The narrow circumstances under which this could be faster are:
- this is within a tight loop that fits within the processor's
primary instruction cache
- the processor has a "move conditional" operation that
avoids taking an actual branch when the operations are
simple enough and the result is being used arithmetically
instead of to control a branch
- at the microcode level, the processor "runs free"
when working from instruction cache, processing each
instruction as fast as possible instead of working
on a bus-cycle system (which is needed in most cases
when anything outside the primary cache is being referenced)
- the cost of the bitwise AND operation plus the cost of the
comparison to 0 plus the cost of the bitwise OR operation
is less than the cost of a full addition

I have heard of one architecture (I don't recall which)
that had a "move conditional" operation that took
a test condition and two arithmetic operations as operands,
and would start doing the two arithmetic operations in
parallel at the same time it was doing the test; when the
result of the test was available, it would abort the false
branch if it was not already finished, with the result
being whichever of the arithmetic expressions was selected
by the condition.


Please note how narrow these conditions are: you would
have to know a LOT about your processor to make this kind
of optimization: the expression I give above will be slower
than a straight addition on nearly every architecture.

Addition is usually hard-wired through a series of
transistors, with the carry circuit taking most of the
landscape. It's hard to beat transistor-level speeds
by using multiple instructions.

I have heard that some architectures internally
optimize +0 and +1; there would be no way to beat that...
but again you would need to know intimate details of
the architecture.
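
If anyone wants to compare the expression above with the plain addition, here is a minimal pair of functions to benchmark or disassemble (the names are invented):

unsigned plain_add(unsigned n, unsigned i) { return n + i; }
unsigned trick_add(unsigned n, unsigned i) { return (n & 1) ? (n + i) : (n | i); }
// For i equal to 0 or 1 the two agree: when the low bit of n is clear,
// OR-ing in i gives the same result as adding it; otherwise fall back
// to the add. On nearly every architecture plain_add is at least as fast.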
 

Andrew Koenig

Function AddUnits() contains a line
n1 += (n2 + carry_s); // carry_s == 0 or 1

The question is: is it possible to make that line work faster?

What fraction of your program's total execution time does this statement
consume?

Until you know the answer to this question, you don't know whether it's even
worth trying to change it, let alone the best way of doing so.
 

Clark S. Cox III

I would like to optimize (for speed) an algorithm for computing very large
Fibonacci numbers using the primary recursive formula.
The algorithm can be seen at
http://groups-beta.google.com/group/alt.sources/msg/42e76b12150613a1

Function AddUnits() contains a line
n1 += (n2 + carry_s); // carry_s == 0 or 1

The question is: is it possible to make that line work faster?

No, the question is: "Is that line the bottleneck?" How do you know
that line is the problem? Have you measured the performance of your
code?
 

Keith Thompson

Alex Vinokur said:
I would like to optimize (for speed) an algorithm for computing very
large Fibonacci numbers using the primary recursive formula.
The algorithm can be seen at
http://groups-beta.google.com/group/alt.sources/msg/42e76b12150613a1

Function AddUnits() contains a line
n1 += (n2 + carry_s); // carry_s == 0 or 1

The question is: is it possible to make that line work faster?

Use the maximum optimization level your compiler provides (you're
probably already doing this). Use a better compiler if you can find
one. Use a faster computer. Kick other users off the system so you
get 100% of the CPU.

As others have mentioned, there's little point in trying to optimize
this one line unless you've actually made measurements that indicate
that it's a bottleneck. Even if you've done that, there's no reliable
portable way in standard C to improve the performance of that line of
code.

It's conceivable that a compiler can generate better code if it
happens to know that carry_s is either 0 or 1. It might be able to
infer this by dataflow analysis, depending on how carry_s is set. If
you have a C99 compiler, making carry_s a _Bool <OT>or bool if you're
using C++</OT> might help (or it might hurt).

The following:

n1 += n2;
if (carry_s) n1++;

might give you better or worse performance, or exactly the same,
depending on the CPU architecture, the compiler, and the phase of the
moon. (The "if" is likely to cause a branch, which can screw up
pipelining -- or the compiler may be able to use some special CPU
instruction that does exactly what's needed.)

If this line really is a serious bottleneck, you might consider
writing it in several equivalent ways and choosing among them with a
macro:

#if METHOD == 1
n1 += (n2 + carry_s);
#elif METHOD == 2
n1 += n2;
if (carry_s) n1++;
#elif METHOD == 3
/* something else */
#else
#error METHOD is undefined or invalid.
#endif

For a given platform, try compiling and running your program with each
defined METHOD, and *measure the results*. (You can also examine the
assembly listing; this can tell you if two methods result in the same
code, but won't necessarily tell you which is better unless you're an
expert in the particular CPU.) Expect the tradeoffs to change with
the next release of the compiler or a different version of the CPU.

Or, if you don't care about portability, you can code it in assembly
language (which we can't help you with here). Consider using your
compiler's output as a guide.

Again, all this assumes that that one line really is a serious
bottleneck. The only way to know this is to profile your code. If
it's not a bottleneck, just write it as straightforwardly as possible
and spend your effort elsewhere.
 

Walter Roberson

:> n1 += (n2 + carry_s); // carry_s == 0 or 1

:> The question is: is it possible to make that line work faster?

:No, the question is: "Is that line the bottleneck?"

I profiled his code here on a particular platform. The line he is
asking about is the -fastest- part of that function. The startup code
for the function itself is a hair slower; the line after the above line
takes about 3 times as long as the +=, and the code for the return statement
after that takes a bit longer still.
 

Alf P. Steinbach

* Alex Vinokur:
I would like to optimize (for speed) an algorithm for computing very large Fibonacci
numbers using the primary recursive formula.
The algorithm can be seen at
http://groups-beta.google.com/group/alt.sources/msg/42e76b12150613a1

Function AddUnits() contains a line
n1 += (n2 + carry_s); // carry_s == 0 or 1

The question is: is it possible to make that line work faster?

Attacking an optimization problem at the level of fundamental additions is
seldom a Good Idea.

Thinking about what goes on is almost always a Better Idea.

Almost any way of computing Fibonacci numbers is faster than the recursive
formula. But you're not using the recursive formula directly, you're summing
iteratively, storing results in a std::vector of std::vector. Most of the
time will, I gather, be spent in internal new and delete operations, and in
the operating system's virtual memory swapping to and from disk, so
possibly you can optimize _a lot_ by first computing the approximate Fib
number using double arithmetic (check out the Golden Ratio), then allocating
just the memory you need for that single number, and then computing the
number exactly.
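
A sketch of that idea (added here for illustration): the number of decimal digits of Fib(n) follows from Binet's formula, so the storage can be reserved once up front. The digit-vector representation in the usage comment is an assumption, not taken from the posted code.

#include <cmath>
#include <cstddef>

// Approximate count of decimal digits in Fib(n), using
// log10(Fib(n)) ~= n*log10(phi) - log10(sqrt(5)) for large n.
std::size_t fib_decimal_digits(unsigned long n)
{
    const double phi = (1.0 + std::sqrt(5.0)) / 2.0;
    double log10_fib = n * std::log10(phi) - 0.5 * std::log10(5.0);
    if (log10_fib < 0.0)
        return 1;                                // Fib(0), Fib(1), Fib(2)
    return static_cast<std::size_t>(std::floor(log10_fib)) + 1;
}

// Usage sketch:
//   std::vector<unsigned char> digits;
//   digits.reserve(fib_decimal_digits(n));      // allocate once, then fill in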

Hth.,

- Alf
 

Walter Roberson

|In article <2005021814290016807%clarkcox3@gmailcom>,

|:No, the question is: "Is that line the bottleneck?"

|I profiled his code here on a particular platform. The line he is
|asking about is the -fastest- part of that function.

I recompiled with aggressive optimizations, interprocedural analysis,
loop unrolling, and telling the compiler it was okay to mix code
together in ways that make it difficult to tell exactly which line you
are on.

When I turned on all those optimizations, a sample run with hardware
profiling counted 9724 against the line the OP pointed out,
3195 against the next line, and 637 against the return.

Thus, if you were naive about what the profiling output really means in
the face of high optimization, then you could end up drawing the
conclusion that it was the add that was slow.
 

Walter Roberson

:Almost any way of computing Fibonacci numbers is faster than the recursive
:formula. But you're not using the recursive formula directly, you're summing
:iteratively, storing results in a std::vector of std::vector. Most of the
:time will, I gather, be spent in internal new and delete operations, and in
:the operating system's virtual memory swapping to and from disk,

That's a good thought, but my profiling experiments on his code show
that the amount of time spent in those areas is in the noise level,
with the arithmetic functions of the routine the OP indicated
being the bottleneck.

The line he indicated is not the bottleneck, but my experiments show
that if you are using high optimization in combination with profiling,
the profiler can end up accounting the addition line as if it
were about 3/4 of the execution time. It's an artifact of loop
unrolling and similar.
 

Alf P. Steinbach

* Walter Roberson:
:Almost any way of computing Fibonacci numbers is faster than the recursive
:formula. But you're not using the recursive formula directly, you're summing
:iteratively, storing results in a std::vector of std::vector. Most of the
:time will, I gather, be spent in internal new and delete operations, and in
:the operating system's virtual memory swapping to and from disk,

That's a good thought, but my profiling experiments on his code show
that the amount of time spent in those areas is in the noise level,
with the arithmetic functions of the routine the OP indicated
being the bottleneck.

Did you profile for _large_ Fib numbers, numbers much greater than can be
represented by ordinary 'long', which is what the code seems to be all about?

And does your profiler account for out-of-process time such as e.g. swapping?

Profiling is a tricky business, and without analyzing that code in detail
(I just skimmed it) it seemed to me to have at least O(n log n) memory
consumption for computing Fib number n...

The line he indicated is not the bottleneck, but my experiments show
that if you are using high optimization in combination with profiling,
the profiler can end up accounting the addition line as if it
were about 3/4 of the execution time. It's an artifact of loop
unrolling and similar.

Yes... ;-)
 
