[OT] pointer vs. index notation

A

Ark Khasin

So, p is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).
Any idea why? Does notation serve as a hint to the compiler?
 
S

santosh

Ark Khasin said:
So, p is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).
Any idea why? Does notation serve as a hint to the compiler?


It has been mentioned that the pointer notation leads to slightly more
efficient code generation, but I suppose compilers these days generate
the same code for both the forms.

<OT>
On this machine (Pentium Dual Core), gcc *does* different assembler code
for both the notations. The array notation seems to give rise to a full
base-indexed displacement instruction while the pointer notation use a
simple indirection.
</OT>
 
C

Chris Torek

So, p is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).


Generally speaking, it should not. However:
Any idea why? Does notation serve as a hint to the compiler?

It depends on the compiler.

Most compilers today build trees (and maybe additional data structures
as well, but they at least start out with trees) to represent
expressions. The expression p "should" (logically at least,
and using Lisp notation for the tree) turn into (* (+ p i)). The
expression i[p] would turn into (* (+ i p)).

Given a compiler that uses trees, it may then run the optimization
pass on those trees. It is possible that whoever wrote this pass
thought to look for (* (+ p i)) patterns and optimize them, but
forgot to include (* (+ i p)) patterns. Those thus escape at least
some optimizations and survive to the code-generation portion of
the compiler.

As long as the code-generation part of the compiler handles the
second pattern (instead of crashing with "internal compiler error:
cannot figure out how to produce code for expression" or whatever),
you will still get machine (or assembly or whatever output format)
code for both constructs.

If you look inside actual C compilers that do optimization with
expression trees, there is usually some horrible thousands-of-lines
switch statement or "if/else" chain or some such to match particular
trees (though some use tables, and some have hybrids of tables
*and* horrible 3000-line switch statements :) ), and it is easy
for the programmer to forget to put in symmetric cases. Fortunately
it is also easy to add them afterward -- so you just need to point
out to the compiler-writers that p works well and i[p] does not,
and ten coding minutes (and 3 hours to compile the compiler) later,
they both work equally well.
 
J

Johannes Bauer

santosh said:
Ark Khasin said:
So, p is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).
Any idea why? Does notation serve as a hint to the compiler?


It has been mentioned that the pointer notation leads to slightly more
efficient code generation, but I suppose compilers these days generate
the same code for both the forms.


I do not know how *any* compiler could dereference an integer value (as
the i[p] notation suggests).
<OT>
On this machine (Pentium Dual Core), gcc *does* different assembler code
for both the notations.

On my x86_64 machine, using gcc 4.1.2, -O3 optimization it does not.
Code is completely equal.

Greetings,
Johannes
 
C

Chris Dollin

Johannes said:
santosh said:
Ark Khasin said:
So, p is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).
Any idea why? Does notation serve as a hint to the compiler?


It has been mentioned that the pointer notation leads to slightly more
efficient code generation, but I suppose compilers these days generate
the same code for both the forms.


I do not know how *any* compiler could dereference an integer value (as
the i[p] notation suggests).


It may suggest it, but it doesn't mean it. `i[p]` is (as is wrote above)
`*(i+p)`, which commutes to `*(p+i)`, for which `p` is (just) shorthand.
In all four of these expressions, it's the sum of `p` and `i` which is
dereferences, not either of then individual values.
 
J

Johannes Bauer

Chris said:
I do not know how *any* compiler could dereference an integer value (as
the i[p] notation suggests).

It may suggest it, but it doesn't mean it. `i[p]` is (as is wrote above)
`*(i+p)`, which commutes to `*(p+i)`, for which `p` is (just) shorthand.
In all four of these expressions, it's the sum of `p` and `i` which is
dereferences, not either of then individual values.


I fully understand your line of argumentation. And indeed, they are
equivalent. I just tried out

char moo[] = "Test.";
printf("%c\n", 2[moo]);

And was quite surprised, to be honest, that it compiled cleanly and
yielded the expected result. Knowing C for 10 years I'm still sometimes
surprised what a sick yet beautiful language it is ;-)

Greetings,
Johannes
 
W

Willem

Johannes wrote:
) I fully understand your line of argumentation. And indeed, they are
) equivalent. I just tried out
)
) char moo[] = "Test.";
) printf("%c\n", 2[moo]);

Try:

printf("%c\n", 2["Test."]);

;)


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
D

dj3vande

Chris Torek said:
If you look inside actual C compilers that do optimization with
expression trees, there is usually some horrible thousands-of-lines
switch statement or "if/else" chain or some such to match particular
trees (though some use tables, and some have hybrids of tables
*and* horrible 3000-line switch statements :) ), and it is easy
for the programmer to forget to put in symmetric cases.

Is there a good reason why they don't do a normalization pass first, to
convert symmetric operations into a canonical form? It seems to me
that that would be an easy and cheap way to reduce the size of the
massive switch-or-table, though some care would have to be taken to
avoid normalizing asymmetric operations like subtraction.


dave
 
C

Chris Torek

Is there a good reason why they don't do a normalization pass first, to
convert symmetric operations into a canonical form?

I imagine some do, although the last time I was digging around
inside gcc, I think it did not, and I have not seen it in the
innards of one or two other compilers I have touched. (I should
note that I have not gone spelunking in gcc since the 2.95 days.)
It seems to me that that would be an easy and cheap way to reduce
the size of the massive switch-or-table, though some care would
have to be taken to avoid normalizing asymmetric operations like
subtraction.

Yes. It also means one more pass over the tree, which has some
time cost.
 
K

Keith Thompson

Johannes Bauer said:
santosh said:
Ark Khasin said:
So, p is *(p+i) is (for fun) *(i+p) is i[p].
Yet I observed more than once that the equivalent constructs yield
different code generated (and sometimes of different efficiency).
Any idea why? Does notation serve as a hint to the compiler?


It has been mentioned that the pointer notation leads to slightly more
efficient code generation, but I suppose compilers these days generate
the same code for both the forms.


I do not know how *any* compiler could dereference an integer value (as
the i[p] notation suggests).

[...]

See question 6.11 in the comp.lang.c FAQ, <http://www.c-faq.com/>.
 
C

CBFalconer

Johannes said:
santosh schrieb:
.... snip ...
It has been mentioned that the pointer notation leads to slightly
more efficient code generation, but I suppose compilers these days
generate the same code for both the forms.

I do not know how *any* compiler could dereference an integer value
(as the i[p] notation suggests).

A reference of that type is converted into a pointer (the p) and an
integer (the i). As long as there is one of each the results can
be added and then dereferenced (assuming within range). Note that
the integer is multiplied by (sizof *p).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top