# Which operation is more expensive? (a <= b vs. a != 0)

Discussion in 'C Programming' started by Ping Cheu, Feb 27, 2012.

1. ### Ping Cheu

From a website that teaches optimization for C, I saw the optimization
rule below.

int fact1_func(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

int fact2_func(int n)
{
    int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return fact;
}

The author says 'fact2_func' is faster than 'fact1_func'.

That means the 'i <= n' operation is more expensive than 'i != 0'.

So far, I thought all condition statements have the same cost.

So I want to know the costs of condition tests, from the fastest to the
slowest.

Can anyone tell me?

Ping Cheu, Feb 27, 2012

2. ### James Kuyper

On 02/27/2012 04:24 PM, Ping Cheu wrote:
> From a website that teaches optimization for C, I saw the optimization
> rule below.

[snip]

> Can anyone tell me?

Yes, someone can tell you. And the answers will be different on
different machines, and for different compilers on the same machine, and
for different optimization levels for the same compiler on the same
machine. The timings will be different depending upon the nature of the
surrounding code, which will affect how effective the compiler's
optimization will be. Finally, and most importantly, if there are two
different but logically equivalent ways of writing your code, the best
modern compilers will, at high optimization levels, rewrite the slower
one so that it is implemented the same way as the faster one. As a
result, if you perform a test, you may find that the two loops execute
equally fast.

In a language like C, which is meant to be widely portable and which
gives you very poor control over the generated machine code, it's a
waste of time to acquire that kind of knowledge. It makes more sense to
worry about such things when you're writing assembly code, where you
have more precise control over what the generated machine code will look
like, and a corresponding responsibility to acquire precisely that kind
of information for the particular machine you're writing the assembly
code for. You don't expect such code to be portable, so you don't have
to worry about the fact that the answers will be different on different
machines.

In C, it makes more sense to worry about higher-level issues - make sure
your algorithm doesn't waste time doing the same thing twice (or worse,
the same thing N times, for N very large). Make sure it only does what
needs to be done, and doesn't waste time and source code doing things
that don't need to be done. Make sure your algorithm is correct, and
most important of all, write your code so that it is clearly a correct
implementation of the algorithm. Let the compiler worry about making it
fast.

If and only if you find your program has an actual speed problem, run a
profiler to find out where it is wasting the most time - it's almost
never the same place that you thought it was, and it almost never
involves issues like the loop condition you're worrying about here. Keep
in mind that the changes you make to speed it up for the current
platform could make it run slower on some other platform - so you should
always be very reluctant to make such changes.

James Kuyper, Feb 27, 2012

3. ### Keith Thompson

Ping Cheu <> writes:
> From a website that teaches optimization for C,
>
> I saw the optimization rule below.
>

[snip]
>
> The author says 'fact2_func' is faster than 'fact1_func'.
>
> That means the 'i <= n' operation is more expensive than 'i != 0'.
>
> So far, I thought all condition statements have the same cost.
>
> So I want to know the costs of condition tests, from the fastest to
> the slowest.
>
> Can anyone tell me?

No.

The C language standard says absolutely nothing about the relative
performance of different operations.

It may be the case that "i <= n" will be more expensive than "i != 0"
on some particular system. On another system, they might take
the same time, or "i != 0" might even be more expensive.

If I had to guess, I'd say that "i != 0" is likely to be slightly
faster, or at least no slower, since it only has to load one variable
rather than two before doing the comparison.

On the other hand, a decent optimizing compiler might generate equally
good code for both.

But *I don't have to guess*. I can measure it -- which is what you
should do if the performance matters.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Feb 27, 2012

4. ### Malcolm McLean

On Feb 27, 9:24 pm, Ping Cheu <> wrote:
>
> Can anyone tell me?
>

If we say if(x != 0) then the processor can load x into a register,
and do a comparison for zero. If we say if(x < n) then the processor
needs to load x and n into registers, then do a subtraction, then look
for carry. So the first test is simpler. Whether it is actually faster
or not depends on whether the circuitry that would have done the more
complex calculation is otherwise employed, or is sitting idle. On
modern processors, x and n will usually both be in registers anyway,
and an inequality compare completes in exactly as many clock cycles as
a test against zero.

--
Basic Algorithms - a second book of C
http://www.malcolmmclean.site11.com/www

Malcolm McLean, Feb 27, 2012

5. ### Shao Miller

On 2/27/2012 16:24, Ping Cheu wrote:
> From a website that teaches optimization for C,
>

The probability of this happening seems low.

> I saw the optimization rule below.

[snip]

> Can anyone tell me?

Please don't take a claim out of context and suppose it applies
universally.

C doesn't define "the speed" of evaluation of anything that I'm aware
of, so my answer would be "no, 'fact2_func' is not defined to always be
faster than 'fact1_func'."

However, if you consider that 'i <= n' involves _two_ reads and 'i != 0'
involves _one_ read, then unless you can read both at the same time, or
both values are already close at hand, the single read could well be the
cheaper one.
Enjoy.

Shao Miller, Feb 27, 2012

6. ### Richard Damon

On 2/27/12 4:24 PM, Ping Cheu wrote:
> From a website that teaches optimization for C, I saw the optimization
> rule below.

[snip]

> Can anyone tell me?

The difference is that the second compares to a constant 0, which MAY be
faster than comparing to a variable n.

Many processors can quickly test the value of a register to see if it is
negative or positive (sign bit test) or zero or non-zero without running
an explicit comparison operation.

Note that the second version will behave differently in the case of n < 0,
likely taking a significantly longer time than the first, finally
returning a value that is probably 0 (after numerous instances of
undefined behavior from signed overflow), while the first will rapidly
return 1.

If you change the second to use i > 0, both will return rapidly, but you
have possibly lost some of the advantage, as this test is less often
available as an automatic side effect; it may still be slightly faster
because it compares against the constant 0 instead of the variable n.
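
Sketched out, that 'i > 0' variant (the name fact2b_func is mine, not from
the thread) returns 1 immediately for any n <= 0 instead of counting down
through the negative range:

```c
/* Counts down but stops once i reaches 0, so any n <= 0 falls
   through the loop immediately and returns 1. */
int fact2b_func(int n)
{
    int i, fact = 1;
    for (i = n; i > 0; i--)
        fact *= i;
    return fact;
}
```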

Richard Damon, Feb 27, 2012

7. ### Keith Thompson

Ping Cheu <> writes:
> From a website that teaches optimization for C, I saw the optimization
> rule below.

[snip]

> The author says 'fact2_func' is faster than 'fact1_func'.

[...]

One other thing: if int is 32 bits, both functions will overflow
for any argument value greater than 12. If int is 64 bits (or if
you modify the functions to return a 64-bit result), it overflows
for arguments greater than 20.

If you're not calling these functions many, many times (thousands or
millions), their performance isn't going to matter. And if you are,
the fix is to cache the results so you don't have to recompute them;
a table lookup would be a good approach.

Don't waste your time optimizing code whose performance doesn't matter.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Feb 27, 2012

8. ### Barry Schwarz

On Feb 27, 3:24 pm, Ping Cheu <> wrote:
> From a website that teaches optimization for C, I saw the optimization
> rule below.

[snip]

> Can anyone tell me?

On at least one system, there are machine instructions that will set
processor flags for an arithmetic operation, indicating whether the
result is positive, negative, or zero. On that machine, the i--
method will not require a comparison. The i++ method will always
require a comparison against n.

The same system has instructions that can test a value against zero
faster than a test against an arbitrary constant.

In the general case, the author is merely displaying his lack of
experience and imagination.

Barry Schwarz, Feb 27, 2012

9. ### Edward A. Falk

In article <>,
Keith Thompson <> wrote:
>
>On the other hand, a decent optimizing compiler might generate equally
>good code for both.
>
>But *I don't have to guess*. I can measure it -- which is what you
>should do if the performance matters.

Best answer. But bear in mind that the result will depend on the CPU
architecture and compiler used.

I can say that for gcc 4.4.3 on x86, with optimization level 4, the
"i <= n" variant generated one more instruction in the inner loop than the
"i != 0" variant.

Now with caching and other hardware optimizations, it might not make a
difference, but you'd have to run timing tests to be sure.

--
-Ed Falk,
http://thespamdiaries.blogspot.com/

Edward A. Falk, Feb 27, 2012

10. ### Kaz Kylheku

On 2012-02-27, Ping Cheu <> wrote:
> From a website that teaches optimization for C,
>
> I saw the optimization rule below.
>
> int fact1_func(int n)
> {
>     int i, fact = 1;
>     for (i = 1; i <= n; i++)
>         fact *= i;
>     return fact;
> }

Your function will invoke multiplication overflow (undefined behavior) for
values of n which are not that large.

> int fact2_func(int n)
> {
>     int i, fact = 1;
>     for (i = n; i != 0; i--)
>         fact *= i;
>     return fact;
> }
>
> The author says 'fact2_func' is faster than 'fact1_func'.

The more important point is that, relative to the constrained range of
int, you cannot have a significantly large domain value of n anyway.

This also means it is practical to do it with a table lookup:

int fact3_func(int n)
{
    /* This table is going to be quite small: */
    static const int fact_table[] = { 1, 1, 2, 6, 24, 120 /* ... */ };

    if (n >= 0 && n < (int)(sizeof fact_table / sizeof *fact_table))
        return fact_table[n];

    /* trouble: what to do? */
    return -1;
}

Before you worry about how the direction of the loop affects machine
cycles, consider the limiting circumstances and the algorithm.

Kaz Kylheku, Feb 28, 2012

11. ### Peter Nilsson

Barry Schwarz <> wrote:
> Ping Cheu <> wrote:
> > I saw bellow optimization rule.
> >         for (i = 1; i <= n; i++)

....
> >         for (i = n; i != 0; i--)

....
> In the general case, tha author is merely displaying is
> lack of experience and imagination.

I suppose BWK was demonstrating the same lack of experience and
imagination when he wrote Exercise 2.9 in K&R2, implying...

for (b = 0; x; x &= x - 1)
    b++;

....was always faster than...

for (b = 0; x != 0; x >>= 1)
    if (x & 01)
        b++;

These sorts of tricks are still relevant, just considerably
less significant.

--
Peter

Peter Nilsson, Feb 28, 2012

12. ### Joe Pfeiffer

Ping Cheu <> writes:

> From a website that teaches optimization for C, I saw the optimization
> rule below.

[snip]

> Can anyone tell me?

For compilers and architectures for which this is true, the reason is
that the condition codes are set as a side effect of the decrement, so
you don't need to actually perform the comparison in the decrement
version.

It isn't true for all architectures because (1) not all architectures
let you get away with skipping the comparison, and (2) many
implementations will manage to combine the increment, compare, and
branch into micro-ops that get executed out of order and don't end up
taking any extra time.

It isn't true for all compilers because (assuming it's true for the
architecture in the first place!) the compiler may figure out it can run
the loop backwards all by itself.

Joe Pfeiffer, Feb 28, 2012

13. ### BartC

"Ping Cheu" <> wrote in message
news:jigsb4\$rdc\$...
> From a website that teaches optimization for C, I saw the optimization
> rule below.

[snip]

On one test, your fact2_func took about 1/3 less time to execute than
fact1_func.

However fact1_func can be optimised a little: starting at i=2 for example.

> That means 'i <= n' operation is more expensive than 'i != 0'.

> So far, I thought all condition statements have same cost.

'i<=n' involves two variables. 'i!=0' involves one variable and a constant,
which being zero might not be needed at all; it's equivalent to just 'i',
which is one variable and no constants.

In general the cost depends on how many variable and constant terms there
are, with constants being a little more efficient than variables. But in
practice it depends on so many other factors (the types involved for
example), that it's difficult to say.

And you don't really know what the compiler might do; it's possible that any
calls to one of these functions with a constant argument will be optimised
out completely so the issue doesn't even come up.

--
Bartc

BartC, Feb 28, 2012

14. ### Stephen Sprunk

On 27-Feb-12 15:24, Ping Cheu wrote:
> From a website that teaches optimization for C,
>
> I saw the optimization rule below.
> ...
> The author says 'fact2_func' is faster than 'fact1_func'.
>
> That means the 'i <= n' operation is more expensive than 'i != 0'.

The Standard does not guarantee the performance of _any_ operation; at
best, such "optimizations" are only valid for a particular
implementation. They may (and often do) vary by platform, compiler,
optimization levels, and command-line options. Unless the web site in
question specifies the exact implementation he is using, such advice is
worthless, as the exact opposite may be true of the implementation _you_
are using.

> So far, I thought all condition statements have the same cost.

That is unlikely to be exactly true, but the difference is likely
immeasurable on most implementations. At worst, any difference should
be completely insignificant to the performance of your program as a whole.

> Can anyone tell me?

The only way you can find out is to test _your implementation_ yourself.
However, the answer isn't likely to matter.

Concentrate on writing clear and correct code, and let the compiler
worry about making it fast. If the result isn't fast enough, such
tricks are unlikely to help; spend your time finding a better
algorithm, not optimizing a poor algorithm.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Stephen Sprunk, Feb 28, 2012

15. ### Stephen Sprunk

On 27-Feb-12 15:57, Shao Miller wrote:
> However, if you consider that 'i <= n' involves _two_ reads and 'i != 0'
> involves _one_ read [...]

True, but in most cases a modern CPU will be able to hoist both loads
far above the comparison, if both aren't _already_ in registers, so that
effect _should_ be irrelevant.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Stephen Sprunk, Feb 28, 2012

16. ### Joe keane

In article <jigsb4\$rdc\$>,
Ping Cheu <> wrote:
>Can anyone tell me?

I prefer the count-down method.

ct = len;
zp = bas;

while (--ct >= 0)
{
    /* ... use *zp ... */
    zp++;
}

Think about it. You need a finite-state machine that outputs "0" ten
times then "1" one time. How much state do you need? About four bits,
certainly not two registers.

For complicated code, register allocation is always an issue (doubly so
for old x86).

If your compiler decides that loop control should be in registers,
and kicks out some other value that is really more important, it may be
an issue.

Joe keane, Feb 28, 2012

17. ### qarnos

On Feb 28, 1:16 pm, Peter Nilsson <> wrote:
>
> I suppose BWK was demonstrating the same lack of experience and
> imagination when he wrote Exercise 2.9 in K&R2, implying...
>
>   for (b = 0; x; x &= x - 1)
>     b++;
>
> ...was always faster than...
>
>   for (b = 0; x != 0; x >>= 1)
>     if (x & 01)
>       b++;
>

These are completely different algorithms for computing the same
result. In this case, the "fast" version will, for most inputs,
perform fewer fundamental operations than the "slow" version. In fact,
in the *worst case scenario* it will perform exactly the same number
of operations as the "slow" version. This is a good example of how to
optimise *an algorithm*.

In the OP, the number of operations is identical (for n >= 0, at
least) - the only difference is a semantic one over how the loop
condition is evaluated. It is an attempt to influence the machine code
generated by the compiler. It may well work, but to say it is
unconditionally "faster" is misleading at best.

It is grossly unfair to try to draw an equivalence between the two.

qarnos, Feb 29, 2012

18. ### Phil Carmody

(Joe keane) writes:
> In article <jigsb4\$rdc\$>,
> Ping Cheu <> wrote:
> >Can anyone tell me?

>
> I prefer the count-down method.
>
> ct = len;
> zp = bas;
>
> while (--ct >= 0)

So you don't like unsigned types for counters then?

Phil
--
> I'd argue that there is much evidence for the existence of a God.

Pics or it didn't happen.
-- Tom (/. uid 822)

Phil Carmody, Mar 2, 2012

19. ### Joe keane

In article <>,
Phil Carmody <> wrote:
>So you don't like unsigned types for counters then?

It's typical C to use 'int' unless something else is needed.

It's probably an error if the counter comes in negative [or would be
negative if it were signed, when it's unsigned], but maybe running zero
times is fine.

beg = 8;
end = 6;
for (j = beg; j < end; j++)
...

Joe keane, Mar 5, 2012

20. ### Michael Angelo Ravera

On Monday, February 27, 2012 1:47:13 PM UTC-8, James Kuyper wrote:
> On 02/27/2012 04:24 PM, Ping Cheu wrote:

[snip]

> If and only if you find your program has an actual speed problem, run a
> profiler to find out where it is wasting the most time [...]

... what he said, but you can write the factorial function better than
either of these.

1) You have no protection against overflow in either of the above examples.
2) The result isn't correct or useful for negative inputs.
3) If you wanted it to be really fast, you'd use a const table (with the
table in code space).
4) Write it like this and save a whole iteration (skipping a multiply by
what you know will be 1 at the end of the calculation):

int fact2_func(int n)
{
    int i, fact = 1;
    for (i = n; i > 1; i--)
        fact *= i;
    return fact;
}

Michael Angelo Ravera, Mar 7, 2012