Which operation is more expensive? (a <= b vs. a != 0)


Ping Cheu

From a website that teaches optimization for C,

I saw the optimization rule below.

int fact1_func(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

int fact2_func(int n)
{
    int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return fact;
}

The author says 'fact2_func' is faster than 'fact1_func'.

That means the 'i <= n' operation is more expensive than 'i != 0'.

So far, I thought all condition statements had the same cost.

So I want to know the cost of condition statements, from the fastest to the
slowest.

Can anyone tell me?
 

James Kuyper

From a website that teaches optimization for C,

I saw the optimization rule below.

int fact1_func(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

int fact2_func(int n)
{
    int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return fact;
}

The author says 'fact2_func' is faster than 'fact1_func'.

That means the 'i <= n' operation is more expensive than 'i != 0'.

So far, I thought all condition statements had the same cost.

So I want to know the cost of condition statements, from the fastest to the
slowest.

Can anyone tell me?

Yes, someone can tell you. And the answers will be different on
different machines, and for different compilers on the same machine, and
for different optimization levels for the same compiler on the same
machine. The timings will be different depending upon the nature of the
surrounding code, which will affect how effective the compiler's
optimization will be. Finally, and most importantly, if there are two
different but logically equivalent ways of writing your code, the best
modern compilers will, at high optimization levels, re-write the slower
one so that it is implemented the same way as the faster one. As a
result, if you perform a test, you may find that the two loops execute
equally fast.

In a language like C, which is meant to be widely portable and which
gives you very poor control over the generated machine code, it's a
waste of time to acquire that kind of knowledge. It makes more sense to
worry about such things when you're writing assembly code, where you
have more precise control over what the generated machine code will look
like, and a corresponding responsibility to acquire precisely that kind
of information for the particular machine you're writing the assembly
code for. You don't expect such code to be portable, so you don't have
to worry about the fact that the answers will be different on different
machines.

In C, it makes more sense to worry about higher-level issues - make sure
your algorithm doesn't waste time doing the same thing twice (or worse,
the same thing N times, for N very large). Make sure it only does what
needs to be done, and doesn't waste time and source code doing things
that don't need to be done. Make sure your algorithm is correct, and
most important of all, write your code so that it is clearly a correct
implementation of the algorithm. Let the compiler worry about making it
fast.

If and only if you find your program has an actual speed problem, run a
profiler to find out where it is wasting the most time - it's almost
never the same place that you thought it was, and it almost never
involves issues like the loop condition you're worrying about here. Keep
in mind that the changes you make to speed it up for the current
platform could make it run slower on some other platform - so you should
always be very reluctant to make such changes.
 

Keith Thompson

Ping Cheu said:
From a website that teaches optimization for C,

I saw the optimization rule below.
[snip]

The author says 'fact2_func' is faster than 'fact1_func'.

That means the 'i <= n' operation is more expensive than 'i != 0'.

So far, I thought all condition statements had the same cost.

So I want to know the cost of condition statements, from the fastest to the
slowest.

Can anyone tell me?

No.

The C language standard says absolutely nothing about the relative
performance of different operations.

It may be the case that "i <= n" will be more expensive than "i != 0"
on some particular system. On another system, they might take
the same time, or "i != 0" might even be more expensive.

If I had to guess, I'd say that "i != 0" is likely to be slightly
faster, or at least no slower, since it only has to load one variable
rather than two before doing the comparison.

On the other hand, a decent optimizing compiler might generate equally
good code for both.

But *I don't have to guess*. I can measure it -- which is what you
should do if the performance matters.
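
For example, a minimal measurement sketch (the iteration count and the
argument 12 are arbitrary choices, and the volatile sink is only there
to keep the compiler from discarding the calls):

#include <stdio.h>
#include <time.h>

int fact1_func(int n);   /* the two functions quoted above */
int fact2_func(int n);

int main(void)
{
    volatile unsigned sink = 0;   /* volatile so the calls aren't elided */
    clock_t t0;
    long i;

    t0 = clock();
    for (i = 0; i < 10000000L; i++)
        sink += fact1_func(12);
    printf("fact1: %.3f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (i = 0; i < 10000000L; i++)
        sink += fact2_func(12);
    printf("fact2: %.3f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    return 0;
}

A real benchmark needs more care (warm-up, repeated runs, inspecting the
generated code), but this is enough to see whether the difference is even
measurable on your system.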
 

Malcolm McLean

Can anyone tell me?
If we say if(x != 0), then the processor can load x into a register
and do a comparison against zero. If we say if(x < n), then the
processor needs to load x and n into registers, then do a subtraction,
then look at the carry flag. So the first test is simpler. Whether it
is actually faster or not depends on whether the circuitry that would
have done the more complex calculation is otherwise employed or sitting
idle. On modern processors, x and n will usually both be in registers
anyway, and an inequality compare completes in exactly as many clock
cycles as a test against zero.
 

Shao Miller

From a website that teaches optimization for C,

Please share the source. With no follow-up to your last thread, the
probability of this happening seems low.
I saw the optimization rule below.

int fact1_func(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

int fact2_func(int n)
{
    int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return fact;
}

The author says 'fact2_func' is faster than 'fact1_func'.

That means the 'i <= n' operation is more expensive than 'i != 0'.

So far, I thought all condition statements had the same cost.

So I want to know the cost of condition statements, from the fastest to the
slowest.

Can anyone tell me?

Please don't take a claim out of context and suppose it applies
universally. Please share the context.

C doesn't define "the speed" of evaluation of anything that I'm aware
of, so my answer would be "no, 'fact2_func' is not defined to always be
faster than 'fact1_func'."

However, if you consider that 'i <= n' involves _two_ reads and 'i != 0'
involves _one_ read, then unless you can read both at the same time or
deduce an optimization, 2 reads > 1 read.

Enjoy.
 

Richard Damon

From a website that teaches optimization for C,

I saw the optimization rule below.

int fact1_func(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

int fact2_func(int n)
{
    int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return fact;
}

The author says 'fact2_func' is faster than 'fact1_func'.

That means the 'i <= n' operation is more expensive than 'i != 0'.

So far, I thought all condition statements had the same cost.

So I want to know the cost of condition statements, from the fastest to the
slowest.

Can anyone tell me?

The difference is that the second compares to a constant 0, which MAY be
faster than comparing to a variable n.

Many processors can quickly test the value of a register to see if it is
negative or positive (sign bit test) or zero or non-zero without running
an explicit comparison operation.

Note that the second version will behave differently for n < 0, likely
taking significantly longer than the first and finally returning a value
that is probably 0 (after numerous instances of undefined behavior from
signed overflow), while the first will rapidly return 1.

If you change the second to use i > 0, both will return rapidly, but you
have possibly lost some of the advantage, as this test is less often
available as an automatic side effect of the decrement; it may still be
slightly faster because it compares against the constant 0 instead of
the variable n.
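
For illustration, the 'i > 0' version might look like this
('fact2b_func' is just a made-up name):

int fact2b_func(int n)
{
    int i, fact = 1;
    /* i > 0 stops immediately for n <= 0 instead of wrapping past INT_MIN */
    for (i = n; i > 0; i--)
        fact *= i;
    return fact;
}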
 

Keith Thompson

Ping Cheu said:
From a website that teaches optimization for C,

I saw the optimization rule below.

int fact1_func(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

int fact2_func(int n)
{
    int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return fact;
}

The author says 'fact2_func' is faster than 'fact1_func'.
[...]

One other thing: if int is 32 bits, both functions will overflow
for any argument value greater than 12. If int is 64 bits (or if
you modify the functions to return a 64-bit result), they overflow
for arguments greater than 20.

If you're not calling these functions many, many times (thousands or
millions), their performance isn't going to matter. And if you are,
the fix is to cache the results so you don't have to recompute the
same values; a table lookup would be a good approach.
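
For instance (a sketch; the name 'fact_lookup' and the -1 error return
are arbitrary choices), the whole table fits in thirteen entries, since
13! already overflows a 32-bit int:

int fact_lookup(int n)
{
    static const int table[] = {
        1, 1, 2, 6, 24, 120, 720, 5040, 40320,
        362880, 3628800, 39916800, 479001600   /* 12! is the last to fit */
    };
    if (n < 0 || n > 12)
        return -1;   /* out of range; real code should report an error */
    return table[n];
}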

Don't waste your time optimizing code whose performance doesn't matter.
 

Barry Schwarz

From a website that teaches optimization for C,

I saw the optimization rule below.

    int fact1_func(int n)
    {
        int i, fact = 1;
        for (i = 1; i <= n; i++)
            fact *= i;
        return fact;
    }

    int fact2_func(int n)
    {
        int i, fact = 1;
        for (i = n; i != 0; i--)
            fact *= i;
        return fact;
    }

The author says 'fact2_func' is faster than 'fact1_func'.

That means the 'i <= n' operation is more expensive than 'i != 0'.

So far, I thought all condition statements had the same cost.

So I want to know the cost of condition statements, from the fastest to the
slowest.

Can anyone tell me?

On at least one system, there are machine instructions that will set
processor flags for an arithmetic operation, indicating whether the
result is positive, negative, or zero. On that machine, the i--
method will not require a comparison. The i++ method will always
require a comparison against n.

The same system has instructions that can test a value against zero
faster than against an arbitrary constant.

In the general case, the author is merely displaying his lack of
experience and imagination.
 

Edward A. Falk

On the other hand, a decent optimizing compiler might generate equally
good code for both.

But *I don't have to guess*. I can measure it -- which is what you
should do if the performance matters.

Best answer. But bear in mind that the result will depend on the CPU
architecture and compiler used.

I can say that for gcc 4.4.3 on x86, with optimization level 4, the
"i <= n" variant generated one more instruction in the inner loop than the
"i != 0" variant.

Now with caching and other hardware optimizations, it might not make a
difference, but you'd have to run timing tests to be sure.
 

Kaz Kylheku

From a website that teaches optimization for C,

I saw the optimization rule below.

int fact1_func(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

Your function will invoke multiplication overflow (undefined behavior)
for values of n that are not very large.

int fact2_func(int n)
{
    int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return fact;
}

The author says 'fact2_func' is faster than 'fact1_func'.

Worrying about this is pointless, because the factorial function grows
so fast relative to the constrained range of int that you cannot have a
significantly large domain value for n.

This also means it is practical to do it with a table lookup:

int fact3_func(int n)
{
    /* This table is going to be quite small: */
    static const int fact_table[] = { 1, 1, 2, 6, 24, 120, /* ... */ };

    if (n >= 0 && n < (int)(sizeof fact_table / sizeof *fact_table))
        return fact_table[n];

    /* trouble: what to do? */
    return -1;
}

Before you worry about how the direction of the loop affects machine
cycles, consider the limiting circumstances and the algorithm.
 

Peter Nilsson

Barry Schwarz said:
...
In the general case, the author is merely displaying his
lack of experience and imagination.

I suppose BWK was demonstrating the same lack of experience and
imagination when he wrote Exercise 2.9 in K&R2, implying...

  for (b = 0; x; x &= x - 1)
    b++;

...was always faster than...

  for (b = 0; x != 0; x >>= 1)
    if (x & 01)
      b++;

These sorts of tricks are still relevant, just considerably
less significant.
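
For reference, a self-contained sketch of both (the function names are
made up, and an unsigned x is assumed):

#include <assert.h>

/* Kernighan's trick: x &= x - 1 clears the lowest set bit, so the
   loop body runs once per *set* bit. */
static int bits_fast(unsigned x)
{
    int b;
    for (b = 0; x; x &= x - 1)
        b++;
    return b;
}

/* The straightforward version: one iteration per bit *position*. */
static int bits_slow(unsigned x)
{
    int b;
    for (b = 0; x != 0; x >>= 1)
        if (x & 01)
            b++;
    return b;
}

int main(void)
{
    assert(bits_fast(0xF0u) == 4);
    assert(bits_slow(0xF0u) == 4);
    return 0;
}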
 

Joe Pfeiffer

Ping Cheu said:
From a website that teaches optimization for C,

I saw the optimization rule below.

int fact1_func(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

int fact2_func(int n)
{
    int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return fact;
}

The author says 'fact2_func' is faster than 'fact1_func'.

That means the 'i <= n' operation is more expensive than 'i != 0'.

So far, I thought all condition statements had the same cost.

So I want to know the cost of condition statements, from the fastest to the
slowest.

Can anyone tell me?

For compilers and architectures for which this is true, the reason is
that the condition codes are set as a side effect of the decrement, so
you don't need to actually perform the comparison in the decrement
version.

It isn't true for all architectures because (1) not all architectures
let you get away with skipping the comparison, and (2) many
implementations will manage to combine the increment, compare, and
branch into micro-ops that get executed out of order and don't end up
taking any extra time.

It isn't true for all compilers because (assuming it's true for the
architecture in the first place!) the compiler may figure out it can run
the loop backwards all by itself.
 

BartC

Ping Cheu said:
From a website that teaches optimization for C,

I saw the optimization rule below.

int fact1_func(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

On one test, your fact2_func took about 1/3 less time to execute than
fact1_func.

However, fact1_func can be optimised a little: starting at i = 2, for
example.
That means the 'i <= n' operation is more expensive than 'i != 0'.
So far, I thought all condition statements had the same cost.

'i <= n' involves two variables. 'i != 0' involves one variable and a
constant which, being zero, might not be needed at all; it's equivalent
to just 'i', which is one variable and no constants.

In general, the cost depends on how many variable and constant terms
there are, with constants being a little more efficient than variables.
But in practice it depends on so many other factors (the types involved,
for example) that it's difficult to say.

And you don't really know what the compiler might do; it's possible that
any call to one of these functions with a constant argument will be
optimised away completely, so the issue doesn't even come up.
 

Stephen Sprunk

From a website that teaches optimization for C,

I saw the optimization rule below.
...
The author says 'fact2_func' is faster than 'fact1_func'.

That means the 'i <= n' operation is more expensive than 'i != 0'.

The Standard does not guarantee the performance of _any_ operation; at
best, such "optimizations" are only valid for a particular
implementation. They may (and often do) vary by platform, compiler,
optimization levels, and command-line options. Unless the web site in
question specifies the exact implementation its author is using, such
advice is worthless, as the exact opposite may be true of the
implementation _you_ are using.
So far, I thought all condition statements had the same cost.

That is unlikely to be exactly true, but the difference is likely
immeasurable on most implementations. At worst, any difference should
be completely insignificant to the performance of your program as a whole.
Can anyone tell me?

The only way you can find out is to test _your implementation_ yourself.
However, the answer isn't likely to matter.

Concentrate on writing clear and correct code, and let the compiler
worry about making it fast. If _actual measurement_ shows your program
isn't fast enough, such tricks are unlikely to help; spend your time
finding a better algorithm, not optimizing a poor algorithm.

S
 

Stephen Sprunk

However, if you consider that 'i <= n' involves _two_ reads and 'i != 0'
involves _one_ read, then unless you can read both at the same time or
deduce an optimization, 2 reads > 1 read.

True, but in most cases a modern CPU will be able to hoist both loads
far above the comparison, if both aren't _already_ in registers, so that
effect _should_ be irrelevant.

S
 

Joe keane

Can anyone tell me?

I prefer the count-down method.

ct = len;
zp = bas;

while (--ct >= 0)
{
    /* ... use *zp ... */
    zp++;
}

Think about it. You need a finite-state machine that outputs "0" ten
times then "1" one time. How much state do you need? About four bits,
certainly not two registers.

For complicated code, register allocation is always an issue (doubly so
for old x86).

If your compiler decides that loop control should be in registers,
and kicks out some other value that is really more important, that may
be an issue.
 

qarnos

I suppose BWK was demonstrating the same lack of experience and
imagination when he wrote Exercise 2.9 in K&R2, implying...

  for (b = 0; x; x &= x - 1)
    b++;

...was always faster than...

  for (b = 0; x != 0; x >>= 1)
    if (x & 01)
      b++;

These are completely different algorithms for computing the same
result. 'x &= x - 1' clears the lowest set bit, so the "fast" version
loops once per set bit rather than once per bit position: for most
inputs it performs fewer fundamental operations than the "slow"
version, and in the *worst case scenario* it performs exactly the same
number of operations. This is a good example of how to optimise *an
algorithm*.

In the OP, the number of operations is identical (for n >= 0, at
least) - the only difference is a semantic one over how the loop
condition is evaluated. It is an attempt to influence the machine code
generated by the compiler. It may well work, but to say it is
unconditionally "faster" is misleading at best.

It is grossly unfair to try to draw an equivalence between the two.
 

Phil Carmody

I prefer the count-down method.

ct = len;
zp = bas;

while (--ct >= 0)

So you don't like unsigned types for counters then?

Phil
--
I'd argue that there is much evidence for the existence of a God.
Pics or it didn't happen.
-- Tom (/. uid 822)
 

Joe keane

So you don't like unsigned types for counters then?

It's typical C to use 'int' unless something else is needed.

It's probably an error if the counter comes in negative [or, for an
unsigned counter, if it would be negative when viewed as signed], but
maybe running zero times is fine:

beg = 8;
end = 6;
for (j = beg; j < end; j++)
    ...
 

Michael Angelo Ravera

Yes, someone can tell you. And the answers will be different on
different machines, and for different compilers on the same machine...
[snip]

... what he said, but you can write the factorial function better than
either of these.

1) You have no protection against overflow in either of the above examples.
2) The result isn't correct or useful for negative inputs.
3) If you wanted it to be really fast, you'd use a const table (with the
table in code space).
4) Write it like this and save a whole iteration, skipping the multiply
by what you know will be 1 at the end of the calculation:

int fact2_func(int n)
{
    int i, fact = 1;
    for (i = n; i > 1; i--)
        fact *= i;
    return fact;
}
 
