Hi all
I am currently exploring the world of pointers and have encountered some
inconsistent information regarding the best way to reference an
array: array[1] OR *(array +1).
One book says that you should not use the syntax array[1] for
performance reasons. Another book says that the syntax array[1] is
only used by FORTRAN programmers who do not understand C pointers. So
what is the truth, oh wizards? Does it make a difference?
Thanks Peter
Out of curiosity, which books are these?
FWIW, subscript notation *may* result in a few more instructions at
the assembly level than pointer notation. Here are some examples
compiled with gcc -g -Wa,-aldh:
First, indexing with a constant expression:
14:init.c **** y = x[5];
210 .LM5:
211 0025 8B45DC movl -36(%ebp), %eax
212 0028 8945C4 movl %eax, -60(%ebp)
15:init.c **** y = *(x + 5);
214 .LM6:
215 002b 8B45DC movl -36(%ebp), %eax
216 002e 8945C4 movl %eax, -60(%ebp)
No difference; identical instructions are generated.
Indexing with an auto variable:
17:init.c **** y = x[z];
218 .LM7:
219 0031 8B45C0 movl -64(%ebp), %eax
220 0034 89C0 movl %eax, %eax
221 0036 8D148500 leal 0(,%eax,4), %edx
221 000000
222 003d 8D45C8 leal -56(%ebp), %eax
223 0040 89C0 movl %eax, %eax
224 0042 8B0402 movl (%edx,%eax), %eax
225 0045 8945C4 movl %eax, -60(%ebp)
18:init.c **** y = *(x + z);
227 .LM8:
228 0048 8B45C0 movl -64(%ebp), %eax
229 004b 8B4485C8 movl -56(%ebp,%eax,4), %eax
230 004f 8945C4 movl %eax, -60(%ebp)
Hmp. Subscripting with a variable results in more instructions
compared to the manual dereference, at least under these
circumstances (gcc compiler, debugging turned on, no optimization).
This is one specific case, using one specific compiler with one
specific set of compiler settings. If I turn on optimization, those
differences may disappear. Or they may not. It may turn out
that using subscript notation *really is* less efficient (or at least
requires more instructions) than manually adding offsets and
dereferencing in most circumstances.
And unless you're failing to meet a *hard* space or performance
requirement AND array accesses are the absolute last place you can
squeeze out those last few bytes and/or cycles *AND* you *know* that
pointer notation will result in fewer/faster instructions, use
subscript notation (x[i]). It's easier to read (especially for
multidimensional array accesses) and it more clearly conveys intent.
When you're thinking about this kind of micro-optimization, you have
to ask yourself the following questions:
1. How many times am I executing this operation? Do I do it once
over the lifetime of the program (in which case any gains are down in
the noise), or do I repeat the operation hundreds or thousands (or
more) times (in which case the gains are measurable)?
2. Have I *measured* the performance difference between the two
versions? Under a variety of conditions?
3. Are these differences consistent across different compilers?
Would leaving the code alone and simply switching to a different/
better compiler buy me the performance I need?
4. What is the tradeoff in terms of readability/maintainability?
Would I rather debug code that reads
x = y[i][j++][++k];
or
x = *(*(*(y + i) + j++) + ++k);
Both hurt to look at, but IMO the first hurts less. YMMV.