[OT] pointer vs. index notation

Ark Khasin · Nov 12, 2007

So, p[i] is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).
Any idea why? Does notation serve as a hint to the compiler?

santosh · Nov 12, 2007

Ark Khasin said:
So, p[i] is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).
Any idea why? Does notation serve as a hint to the compiler?

It has been mentioned that the pointer notation leads to slightly more
efficient code generation, but I suppose compilers these days generate
the same code for both the forms.

<OT>
On this machine (Pentium Dual Core), gcc *does* generate different
assembler code for the two notations. The array notation seems to give
rise to a full base-indexed displacement instruction, while the pointer
notation uses a simple indirection.
</OT>
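
For anyone who wants to repeat the comparison, here is a minimal sketch of the kind of test pair that could be used. The function names are made up for illustration, and it is only an assumption that this resembles what was actually compiled above. Building the file with something like `gcc -S -O2` and reading the resulting .s file shows whether the two loops end up with the same instructions on a given compiler and target.

```c
#include <stddef.h>

/* Array (index) notation: each element is read as a[i]. */
long sum_indexed(const int *a, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer notation: a pointer walks the array and is dereferenced directly. */
long sum_pointer(const int *a, size_t n)
{
    long total = 0;
    const int *p;

    for (p = a; p != a + n; p++)
        total += *p;
    return total;
}
```

Both functions compute the same sum; any difference is only in the code the compiler chooses to emit.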

Chris Torek · Nov 12, 2007

So, p[i] is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).

Generally speaking, it should not. However:

Any idea why? Does notation serve as a hint to the compiler?

It depends on the compiler.

Most compilers today build trees (and maybe additional data structures
as well, but they at least start out with trees) to represent
expressions. The expression p[i] "should" (logically at least,
and using Lisp notation for the tree) turn into (* (+ p i)). The
expression i[p] would turn into (* (+ i p)).

Given a compiler that uses trees, it may then run the optimization
pass on those trees. It is possible that whoever wrote this pass
thought to look for (* (+ p i)) patterns and optimize them, but
forgot to include (* (+ i p)) patterns. Those thus escape at least
some optimizations and survive to the code-generation portion of
the compiler.

As long as the code-generation part of the compiler handles the
second pattern (instead of crashing with "internal compiler error:
cannot figure out how to produce code for expression" or whatever),
you will still get machine (or assembly or whatever output format)
code for both constructs.

If you look inside actual C compilers that do optimization with
expression trees, there is usually some horrible thousands-of-lines
switch statement or "if/else" chain or some such to match particular
trees (though some use tables, and some have hybrids of tables
*and* horrible 3000-line switch statements), and it is easy
for the programmer to forget to put in symmetric cases. Fortunately
it is also easy to add them afterward -- so you just need to point
out to the compiler-writers that p[i] works well and i[p] does not,
and ten coding minutes (and 3 hours to compile the compiler) later,
they both work equally well.
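
To make the "forgotten symmetric case" concrete, here is a toy sketch in C. It is not taken from any real compiler: the node layout, the `K_*` kinds, and both function names are invented for illustration, and the trees are assumed to be well formed. The first matcher only recognizes (* (+ p i)); the second also accepts (* (+ i p)).

```c
#include <stddef.h>

/* Hypothetical node kinds for a tiny expression tree. */
enum kind { K_PTR, K_INT, K_ADD, K_DEREF };

struct node {
    enum kind kind;
    const struct node *left;   /* operand of K_DEREF, left operand of K_ADD */
    const struct node *right;  /* right operand of K_ADD, otherwise NULL */
};

/* Incomplete matcher: recognizes only (* (+ pointer integer)). */
int is_indexed_load_asymmetric(const struct node *n)
{
    return n->kind == K_DEREF
        && n->left->kind == K_ADD
        && n->left->left->kind == K_PTR
        && n->left->right->kind == K_INT;
}

/* Matcher with the symmetric case added: also accepts (* (+ integer pointer)). */
int is_indexed_load(const struct node *n)
{
    const struct node *sum;

    if (n->kind != K_DEREF || n->left->kind != K_ADD)
        return 0;
    sum = n->left;
    return (sum->left->kind == K_PTR && sum->right->kind == K_INT)
        || (sum->left->kind == K_INT && sum->right->kind == K_PTR);
}
```

A tree built from i[p] fails the first test and so would miss whatever optimization hangs off it, while the second test treats both spellings alike.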

Johannes Bauer · Nov 12, 2007

santosh said:
Ark Khasin said:

So, p[i] is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).
Any idea why? Does notation serve as a hint to the compiler?

It has been mentioned that the pointer notation leads to slightly more
efficient code generation, but I suppose compilers these days generate
the same code for both the forms.

I do not know how *any* compiler could dereference an integer value (as
the i[p] notation suggests).

<OT>
On this machine (Pentium Dual Core), gcc *does* generate different
assembler code for the two notations.

On my x86_64 machine, with gcc 4.1.2 and -O3 optimization, it does not.
The generated code is identical.

Greetings,
Johannes

Chris Dollin · Nov 12, 2007

Johannes said:
santosh said:

Ark Khasin said:

So, p[i] is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).
Any idea why? Does notation serve as a hint to the compiler?

It has been mentioned that the pointer notation leads to slightly more
efficient code generation, but I suppose compilers these days generate
the same code for both the forms.

I do not know how *any* compiler could dereference an integer value (as
the i[p] notation suggests).

It may suggest it, but it doesn't mean it. `i[p]` is (as was written above)
`*(i+p)`, which commutes to `*(p+i)`, for which `p[i]` is (just) shorthand.
In all four of these expressions, it's the sum of `p` and `i` which is
dereferenced, not either of their individual values.
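
A small sketch that checks this directly, assuming `p` points into an array of int and `i` is a valid index into it (both function names are just for illustration):

```c
#include <stddef.h>

/* Returns 1 when all four spellings read the same value, 0 otherwise.
   The caller must make sure i is a valid index for the array p points into. */
int all_forms_agree(const int *p, size_t i)
{
    int a = p[i];
    int b = *(p + i);
    int c = *(i + p);
    int d = i[p];

    return a == b && b == c && c == d;
}

/* Same idea, but comparing the addresses rather than the values read. */
int same_address(const int *p, size_t i)
{
    return &p[i] == p + i && &i[p] == i + p && p + i == i + p;
}
```

In every case the address is formed from the sum first and only then dereferenced, so the integer `i` is never used as an address by itself.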

Johannes Bauer · Nov 12, 2007

Chris said:
I do not know how *any* compiler could dereference an integer value (as
the i[p] notation suggests).

It may suggest it, but it doesn't mean it. `i[p]` is (as was written above)
`*(i+p)`, which commutes to `*(p+i)`, for which `p[i]` is (just) shorthand.
In all four of these expressions, it's the sum of `p` and `i` which is
dereferenced, not either of their individual values.

I fully understand your line of argumentation. And indeed, they are
equivalent. I just tried out

char moo[] = "Test.";
printf("%c\n", 2[moo]);

And was quite surprised, to be honest, that it compiled cleanly and
yielded the expected result. Knowing C for 10 years I'm still sometimes
surprised what a sick yet beautiful language it is ;-)

Greetings,
Johannes

Willem · Nov 12, 2007

Johannes wrote:
) I fully understand your line of argumentation. And indeed, they are
) equivalent. I just tried out
)
) char moo[] = "Test.";
) printf("%c\n", 2[moo]);

Try:

printf("%c\n", 2["Test."]);

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

dj3vande · Nov 12, 2007

Chris Torek said:
If you look inside actual C compilers that do optimization with
expression trees, there is usually some horrible thousands-of-lines
switch statement or "if/else" chain or some such to match particular
trees (though some use tables, and some have hybrids of tables
*and* horrible 3000-line switch statements), and it is easy
for the programmer to forget to put in symmetric cases.

Is there a good reason why they don't do a normalization pass first, to
convert symmetric operations into a canonical form? It seems to me
that that would be an easy and cheap way to reduce the size of the
massive switch-or-table, though some care would have to be taken to
avoid normalizing asymmetric operations like subtraction.

dave

Chris Torek · Nov 12, 2007

Is there a good reason why they don't do a normalization pass first, to
convert symmetric operations into a canonical form?

I imagine some do, although the last time I was digging around
inside gcc, I think it did not, and I have not seen it in the
innards of one or two other compilers I have touched. (I should
note that I have not gone spelunking in gcc since the 2.95 days.)

It seems to me that that would be an easy and cheap way to reduce
the size of the massive switch-or-table, though some care would
have to be taken to avoid normalizing asymmetric operations like
subtraction.

Yes. It also means one more pass over the tree, which has some
time cost.
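
For what it is worth, a normalization pass of the kind being discussed could be sketched roughly as below. This is a toy, not how gcc or any other real compiler does it: the `struct expr` layout, the `OP_*` names, and the rule "smaller kind goes on the left" are all assumptions made for the example. Only addition is treated as commutative, so subtraction is left alone.

```c
#include <stddef.h>

/* Hypothetical operator/operand kinds; the enum order doubles as the
   canonical ordering (pointers sort before integers). */
enum op { OP_PTR, OP_INT, OP_ADD, OP_SUB, OP_DEREF };

struct expr {
    enum op op;
    struct expr *left;
    struct expr *right;   /* NULL for leaves and for OP_DEREF */
};

/* Only commutative operators may have their operands reordered. */
static int is_commutative(enum op op)
{
    return op == OP_ADD;
}

/* One extra walk over the tree: for each commutative node, put the
   operand with the smaller kind on the left, so (+ i p) becomes (+ p i). */
void normalize(struct expr *e)
{
    if (e == NULL)
        return;
    normalize(e->left);
    normalize(e->right);
    if (is_commutative(e->op) && e->left != NULL && e->right != NULL
        && e->left->op > e->right->op) {
        struct expr *tmp = e->left;
        e->left = e->right;
        e->right = tmp;
    }
}
```

After this pass a later matcher only has to look for the (+ p i) shape, at the price of visiting every node once more.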

Keith Thompson · Nov 12, 2007

Johannes Bauer said:
santosh said:

Ark Khasin said:

So, p[i] is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).
Any idea why? Does notation serve as a hint to the compiler?

It has been mentioned that the pointer notation leads to slightly more
efficient code generation, but I suppose compilers these days generate
the same code for both the forms.

I do not know how *any* compiler could dereference an integer value (as
the i[p] notation suggests).

[...]

See question 6.11 in the comp.lang.c FAQ, <http://www.c-faq.com/>.

CBFalconer · Nov 12, 2007

Johannes said:
santosh said:
.... snip ...

It has been mentioned that the pointer notation leads to slightly
more efficient code generation, but I suppose compilers these days
generate the same code for both the forms.

I do not know how *any* compiler could dereference an integer value
(as the i[p] notation suggests).

A reference of that type is converted into a pointer (the p) and an
integer (the i). As long as there is one of each, the two can be
added and the result dereferenced (assuming it is within range). Note
that the integer is multiplied by (sizeof *p).
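
A minimal sketch of that scaling, assuming an array of int; the helper name is made up for the example, and `p + i` must stay within the array (or one past its end):

```c
#include <stddef.h>

/* Distance in bytes between p and p + i. Because pointer arithmetic is
   done in units of the pointed-to type, this equals i * sizeof *p. */
size_t byte_offset(const int *p, size_t i)
{
    return (size_t)((const char *)(p + i) - (const char *)p);
}
```

The same scaling happens whichever way round the operands are written, so `i + p` and `p + i` name the same address.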
