C performance

  • Thread starter Papadopoulos Giannis
  • Start date

Papadopoulos Giannis

a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators. Postfix
operators (i++) copy the existing value to a temporary object, increment
the internal value, and then return the temporary. Prefix operators
(++i) increment the value and return a reference to it. With objects
such as iterators, creating temporary copies is expensive compared to
built-in ints.”

A modern compiler wouldn’t make the optimization, so

i++;

and

++i;

would give the same instructions?


b) I find that realloc() calls sometimes take more time to complete than
malloc() calls. Is this the general case?


c) Why do some people declare all the variables at the start of each
function? And I mean ALL variables, including those that are nested in
deep fors and ifs... I don’t see any obvious performance gains - unless
they do it to remember what they are using...
 

Papadopoulos Giannis

Bruno said:
Papadopoulos Giannis wrote:
(snip)



Could it be possible that some people don't know that they can declare
variables at the start of each *block* ?-)

Bruno
Maybe.. But I find it often and I wonder...
 

Bruno Desthuilliers

Papadopoulos Giannis wrote:
(snip)
c) Why do some people declare all the variables at the start of each
function? And I mean ALL variables, including those that are nested in
deep fors and ifs...

Could it be possible that some people don't know that they can declare
variables at the start of each *block* ?-)

Bruno
 

Christian Bau

Papadopoulos Giannis said:
a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators. Postfix
operators (i++) copy the existing value to a temporary object, increment
the internal value, and then return the temporary. Prefix operators
(++i) increment the value and return a reference to it. With objects
such as iterators, creating temporary copies is expensive compared to
built-in ints.”

I bet you didn't read that in a book about C.
b) I find that realloc() calls sometimes take more time to complete than
malloc() calls. Is this the general case?
c) Why do some people declare all the variables at the start of each
function? And I mean ALL variables, including those that are nested in
deep fors and ifs... I don’t see any obvious performance gains - unless
they do it to remember what they are using...

You worry too much about performance, and you worry too much about the
wrong kind of performance. First try to write code that is bug-free and
readable. That is the most important thing.

If there is need to make your code faster: First measure. Get yourself a
profiler, learn how to use it, learn how to interpret the numbers. Then
before trying to figure out how to make an operation faster that you do
a million times, figure out how to do it only 100,000 times or 1000
times. That's how you make a program fast.
 

E. Robert Tisdale

Papadopoulos said:
a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators. Postfix
operators (i++) copy the existing value to a temporary object,
increment the internal value, and then return the temporary.
Prefix operators (++i) increment the value
and return a reference to it. With objects such as iterators,
creating temporary copies is expensive compared to built-in ints.”

This must be a reference to
overloaded increment and decrement operators in C++.

Favoring pre-decrement/increment over post decrement/increment
operators is a good habit for C programmers who must also
write C++ programs. Otherwise, it is a matter of style.
A modern compiler wouldn’t make the optimization, so

i++;

and

++i;

would give the same instructions?
Yes.

b) I find that realloc() calls sometimes take more time to complete
than malloc() calls. Is this the general case?

Typically, the difference is hard to measure except in contrived cases.
c) Why do some people declare all the variables
at the start of each function? And I mean *all* variables
including those that are nested in deep for's and ifs...
I don’t see any obvious performance gains -
unless they do it to remember what they are using...

1. Some C programs are translations of Fortran programs.
2. Some C programs are written by Fortran programmers.
3. Early versions of C (before C89) required this
according to Brian W. Kernighan and Dennis M. Ritchie
in "The C Programming Language",
Chapter 1: A Tutorial Introduction,
Section 2: Variables and Arithmetic, page 8:
"In C, /all/ variables must be declared before use, usually
at the beginning of a function before any executable statements."
 

Mark McIntyre

a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators.

Yes, this is an old chestnut. I can find nothing to support it
nowadays, tho ICBW.
i++;
and
++i;
would give the same instructions?

AFAIK yes. Try it and see.
b) I find that realloc() calls sometimes take more time to complete than
malloc() calls. Is this the general case?

The standard doesn't say.
c) Why do some people declare all the variables at the start of each
function?

Until C99, you pretty much had to do it like that. Plus many people
consider it a good idea to keep your declarations in one place for
easier reference. Spraying declarations around through the body of
your code makes it a lot harder to follow.
And I mean ALL variables, including those that are nested in
deep fors and ifs...

I agree, this is often a bad idea.
I don’t see any obvious performance gains

Again, C doesn't say.
unless they do it to remember what they are using...

Which may be a performance gain in itself, of course.
 

Nick Landsberg

Christian said:
I bet you didn't read that in a book about C.

This actually depends on the underlying chip architecture,
which is probably off-topic here.
You worry too much about performance, and you worry too much about the
wrong kind of performance. First try to write code that is bug-free and
readable. That is the most important thing.
AMEN!


If there is need to make your code faster: First measure. Get yourself a
^^^^
The key word is "need". What are the performance requirements? You
mean you didn't get any from the customer? Shame on you! This will
tell you how fast it MUST be in order to be acceptable.
profiler, learn how to use it, learn how to interpret the numbers. Then
before trying to figure out how to make an operation faster that you do
a million times, figure out how to do it only 100,000 times or 1000
times. That's how you make a program fast.

Agreed 1,000%! However, often after the developer has already
written code which uses foo() hundreds of thousands of times,
there is usually an emotional unwillingness to admit that a better
algorithm would do the trick and then they try to optimize foo(), even
if it's a standard library call. Start with requirements, as above,
design your algorithms, prototype and measure those which are going
to be invoked most often in order to find out if there will be
a problem. It's not just the efficiency (or lack thereof) in any
module, it's cpu-cost times frequency of use.

This holds for any language, not just C.
Sheesh, this IS [OT].
 

Nick Landsberg

E. Robert Tisdale said:
This must be a reference to
overloaded increment and decrement operators in C++.

Favoring pre-decrement/increment over post decrement/increment
operators is a good habit for C programmers who must also
write C++ programs. Otherwise, it is a matter of style.



Yes.


For this trivial example, yes.

For the case of j = i++; vs. j = ++i;
(which are very different in intent), the
emitted code SHOULD be different,
unless you have a broken compiler.

The efficiency of the constructs to implement
these is a function of the underlying chip
architecture and not a language issue.
 

CBFalconer

Mark said:
.... snip ...


The standard doesn't say.

However, if you think about typical implementations, _sometimes_
it is necessary to allocate a whole new block of memory and copy
old data over to it. It will normally take longer to copy than to
not copy.
Until C99, you pretty much had to do it like that. Plus many people
consider it a good idea to keep your declarations in one place for
easier reference. Spraying declarations around through the body of
your code makes it a lot harder to follow.

IMO if those declarations are getting awkwardly far away from the
place they are used, you are writing overly large functions in the
first place.
 

E. Robert Tisdale

CBFalconer said:
IMO if those declarations are getting awkwardly far away
from the place they are used,
you are writing overly large functions in the first place.

I agree.
And moving the declarations closer to the point of first use
is the first step in decomposing the function
into a set of smaller functions
that the compiler can inline automatically.
 

pete

Papadopoulos said:
a) pre vs post increment/decrement

I have read somewhere that:

“Prefer pre-increment and -decrement to postfix operators. Postfix
operators (i++) copy the existing value to a temporary object,
increment the internal value, and then return the temporary.

That's all wrong.
There is no order of operation vs evaluation implied in i++.
Any code which depends on such an order has a problem.

These loops are semantically equal:
while (i++ != 5) {/*code*/}
while (++i != 6) {/*code*/}
 

Paul Hsieh

Papadopoulos Giannis said:
A modern compiler wouldn’t make the optimization, so

i++;

and

++i;

would give the same instructions?

Not only will the compiler give you the same instructions -- even if
it didn't the underlying CPU would execute all trivially equivalent
re-expressions of said expression with identical performance. (inc
eax; add eax, 1; sub eax, -1 -- it's all the same.)
b) I find that realloc() calls sometimes take more time to complete than
malloc() calls. Is this the general case?

realloc() may have to perform a memcpy(). In general, actually you
should find that realloc() is *MUCH* slower than malloc().
c) Why do some people declare all the variables at the start of each
function? And I mean ALL variables, including those that are nested in
deep fors and ifs... I don’t see any obvious performance gains - unless
they do it to remember what they are using...

There is no performance difference, whatsoever. I also would point
out that I actually try to put any variable declaration that isn't
reused into the deepest possible scope where it can be declared. This
gives the compiler an opportunity to alias variables (even of
different types) while helping catch errors of using dead variables
out of scope.
 

Christian Bau

realloc() may have to perform a memcpy(). In general, actually you
should find that realloc() is *MUCH* slower than malloc().

Of course it has to do more things. If you tried to do the same things
as realloc by hand (malloc + memcpy + free + all kinds of checks), then
most likely that would end up slower.
 

Tim Prince

Paul Hsieh said:
Not only will the compiler give you the same instructions -- even if
it didn't the underlying CPU would execute all trivially equivalent
re-expressions of said expression with identical performance. (inc
eax; add eax, 1; sub eax, -1 -- its all the same.)
No, these instructions don't perform the same, on common platforms which
spell them this way. All the more reason for using a compiler which can
take portable C and choose the best instruction for the target architecture.
The OP assertion, that ++i could be more efficient (when used in subscript
context), was true of gcc on the Mac I once had.
 

Malcolm

Papadopoulos Giannis said:
a)
“Prefer pre-increment and -decrement to postfix operators.
You're mixing up C++ overloaded operators with the C types. In C++,
overloading the postincrement operator does indeed force the compiler to
make a temporary copy. In C the machine instructions are identical, but the
convention is to use postfix form where the order of evaluation doesn't
matter.
b) I find that realloc() calls sometimes take more time to complete than
malloc() calls. Is this the general case?
Yes, because realloc() has to copy the reallocated block. However this isn't
always the case, because some libraries set newly-allocated memory to a
fixed value, which takes as much time as copying.
c) Why do some people declare all the variables at the start of each
function?
This is because we already have two levels of scope - global scope and file
scope. Function scope adds another layer. Adding a fourth, or
multiply-nested block scopes, moves the number of levels beyond what a human
programmer can reasonably be expected to cope with. There is of course no
problem for the computer - it's a human-to-human thing.
 

Paul Hsieh

Tim Prince said:
No, these instructions don't perform the same, on common platforms which
spell them this way.

Yes they do. The only difference is inc doesn't create a dependency
on the carry flag. Otherwise they all take a third, quarter or half a
clock depending on which brand of x86 processor is executing them.
This is true of the Intel 80486, and every 486 or better class x86
architecture that has followed it, from whichever vendor.
[...] All the more reason for using a compiler which can
take portable C and choose the best instruction for the target architecture.
The OP assertion, that ++i could be more efficient (when used in subscript
context), was true of gcc on the Mac I once had.

Well ok, then the Mac port of the gcc compiler sucks ass -- but it
also means that the underlying PPC must be somewhat weak not to make
this irrelevant, which I am not sure I believe.
 

Chris Torek

Yes they do. The only difference is inc doesn't create a dependency
on the carry flag. Otherwise they all take a third, quarter or half a
clock depending on which brand of x86 processor is executing them.

I will believe you on the cycle count (my x86 documentation is not
handy anyway :) ), but they differ in another way that *could*
matter in terms of performance: "inc eax" is a single-byte opcode
(0x40), "add eax, 1" is a five-byte sequence (05, followed by the four
bytes representing 1), and "sub eax, -1" is also a five-byte sequence
(2D, followed by the four bytes representing -1). Thus, the
first variant saves code space. Depending on instruction cache
usage, this could affect the performance of some loop. (It seems
a bit unlikely.)
[...] All the more reason for using a compiler which can
take portable C and choose the best instruction for the target architecture.
The OP assertion, that ++i could be more efficient (when used in subscript
context), was true of gcc on the Mac I once had.
Well ok, then the Mac port of the gcc compiler sucks ass -- but it
also means that the underlying PPC must be somewhat weak not to make
this irrelevant, which I am not sure I believe.

I think you are making unwarranted assumptions here, such as which
version of gcc was involved, and whether the target CPU was the
PowerPC or the older Mac CPU family, the 680x0. It is, however,
true that the PowerPC has an unusual instruction set, with instructions
like "bdnzf" (decrement count and branch if comparison result not
equal and count not zero) and "rlwinm" (rotate and mask), and
apparently many versions of gcc do not use it very effectively.

About all that can be said with any certainty, when it comes to
actual run time of various C source code constructs, is that "it
depends". :)
 

CBFalconer

Chris said:
.... snip ...

I think you are making unwarranted assumptions here, such as which
version of gcc was involved, and whether the target CPU was the
PowerPC or the older Mac CPU family, the 680x0. It is, however,
true that the PowerPC has an unusual instruction set, with instructions
like "bdnzf" (decrement count and branch if comparison result not
equal and count not zero) and "rlwinm" (rotate and mask), and
apparently many versions of gcc do not use it very effectively.

About all that can be said with any certainty, when it comes to
actual run time of various C source code constructs, is that "it
depends". :)

I consider that a prerequisite for building an efficient code
generator is a good assembly language programmer for that
machine. The next requirement is a good register allocation
scheme. These days things are complicated by pipelining and the
need to know jump probabilities.
 

Paul Hsieh

Chris Torek said:
I will believe you on the cycle count (my x86 documentation is not
handy anyway :) ), but they differ in another way that *could*
matter in terms of performance: "inc eax" is a single byte opcode
(0x40), "add 1" is a six-byte sequence (01 05, followed by the four
bytes representing 1),

There is a short mode of encoding that allows a byte immediate rather
than a dword immediate (083h 0C0h) making for a total of 3 bytes for
the instruction. Same with sub. Since all x86s can consume 16 bytes'
worth per clock in the instruction fetch, this is fine. I would
expect all x86 compilers and assemblers to use this.
I think you are making unwarranted assumptions here, such as which
version of gcc was involved, and whether the target CPU was the
PowerPC or the older Mac CPU family, the 680x0.

The 680x0 is kind of obsolete on the Mac. Not quite as obsolete as
the 386 (where there is a difference between inc and add), but just
about. Modern PowerPCs are out-of-order and fairly wide executors, so
using special instructions isn't really going to help.

Microprocessor architectures have all been moving forward in a
particular way -- all that matters is the length of the long
dependency chain, not the vague slight differences in expressing each
node on that chain. It doesn't make sense to chase performance in
this way any more.
 

Christian Bau

Tim Prince said:
[...] All the more reason for using a compiler which can
take portable C and choose the best instruction for the target architecture.
The OP assertion, that ++i could be more efficient (when used in subscript
context), was true of gcc on the Mac I once had.

Well ok, then the Mac port of the gcc compiler sucks ass -- but it
also means that the underlying PPC must be somewhat weak not to make
this irrelevant, which I am not sure I believe.

Not for ++i vs. i++, but for *++p vs. *p++ it made a difference: The
PowerPC has a "load with update" instruction: It calculates an address,
loads the data at that address, and writes the address into a register,
all in one instruction. This makes it easy to implement x = *++p; in one
instruction: Calculate p + 1, load x from address p + 1, store p + 1
into p all in one instruction. x = *p++ needs two instructions instead.
 
