Array index vs. pointer: which is faster?


Xiaohan

p is an array and N is a large number. Which of the following two
loops is faster?

Case1:

for(i=0; i<N; i++){
    p[i] = i;
}

Case2:

for(i=0; i<N; i++){
    *p = i;
    p++;
}

It seems that Case 1 needs multiplication/shifting to calculate the
address/index and the second case needs only an ADD operation. But I
tested it, the speed is almost the same. Can anybody tell me why?
Thanks.

For your reference, the assembly code is as follows:

For Case 1
for(i=0; i<N; i++){
0040101B xor ecx,ecx
0040101D lea ecx,[ecx]
p[i] = i;
00401020 mov dword ptr [eax+ecx*4],ecx
00401023 add ecx,1
00401026 cmp ecx,5F5E100h
0040102C jl main+20h (401020h)
//p++;
}

For Case 2, it is:
for(i=0; i<N; i++){
0040101B xor ecx,ecx
0040101D lea ecx,[ecx]
*p = i;
00401020 mov dword ptr [eax],ecx
00401022 add ecx,1
p++;
00401025 add eax,4
00401028 cmp ecx,5F5E100h
0040102E jl main+20h (401020h)
}
 

Richard Tobin

Xiaohan said:
p is an array and N is a large number. Which of the following two
loops is faster?

Case1:

for(i=0; i<N; i++){
p[i] = i;
}

Case2:

for(i=0; i<N; i++){
*p = i;
p++;
}



Depends entirely on your compiler. With a good compiler it shouldn't
make any difference: optimising indexing in a loop to pointer arithmetic
was common in the 1970s if not before.

Other things being equal I would prefer the array indexing version,
since in more complicated cases it might let the compiler see that
there is no aliasing.
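For instance (a minimal sketch, with made-up names):

/* In-place smoothing.  The indexed form makes the offsets relative to
 * a explicit, so the compiler can see exactly which elements each
 * iteration reads and writes.
 */
void smooth(double *a, int n)
{
    int i;
    for (i = 1; i < n - 1; i++)
        a[i] = 0.5 * (a[i - 1] + a[i + 1]);
}

/* The same loop with three walking pointers; now the optimiser has to
 * prove that dst, lo and hi keep a fixed relationship to each other
 * before it can rearrange anything.
 */
void smooth_ptr(double *a, int n)
{
    double *dst, *lo, *hi;
    int i;
    if (n < 3) return;
    dst = a + 1; lo = a; hi = a + 2;
    for (i = 1; i < n - 1; i++)
        *dst++ = 0.5 * (*lo++ + *hi++);
}
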
It seems that Case 1 needs multiplication/shifting to calculate the
address/index and the second case needs only an ADD operation.

That's only true if your compiler doesn't optimise one to the other.
But in any case, a shift operation on a register is extremely quick.
And for large amounts of data, the limiting factor will be memory
access so that it won't make any difference at all.

-- Richard
 

Andrey Tarasevich

Xiaohan said:
p is an array and N is a large number. Which of the following two
loops is faster?

Case1:

for(i=0; i<N; i++){
p[i] = i;
}

Case2:

for(i=0; i<N; i++){
*p = i;
p++;
}

It seems that Case 1 needs multiplication/shifting to calculate the
address/index and the second case needs only an ADD operation.


Neither case "needs" anything like that. The compiler is free to
interpret this code any way it likes, as long as the observable behavior
of the program satisfies the requirements imposed by the language
specification. That's the only thing it "needs".
But I
tested it, the speed is almost the same. Can anybody tell me why?

Well, take a look at the assembly code you quoted. Don't you see that it
is exactly the same in both cases?

A typical compiler doesn't necessarily think in terms of individual C
operations. It doesn't try to translate each C operation into its exact
analogue in machine language, as you seem to believe. The compiler
normally tries to see the bigger picture - what your code really does -
and then chooses the best way it can to implement the requested
functionality in machine language. Of course its capabilities are
limited and sometimes its behavior might seem (and actually be) illogical.

In your case the compiler was smart enough to realize that both versions
of the code do exactly the same thing (in terms of the observable
behavior of the program), so it translated them into identical machine
code - the one it considers to be the most optimal.
 

Doug Miller

Andrey Tarasevich wrote:

Well, take a look at the assembly code you quoted. Don't you see that it
is exactly the same in both cases?

Well, *I* certainly don't see that, and you probably won't either if you look
at it again... :)
 

lawrence.jones

Xiaohan said:
It seems that Case 1 needs multiplication/shifting to calculate the
address/index and the second case needs only an ADD operation. But I
tested it, the speed is almost the same. Can anybody tell me why?

Because on most current processors multiplication is fast and shifting
is extremely fast, and there are multiple ALUs to allow address
arithmetic to be done in parallel with other operations so it takes no
additional time. This has very little to do with C, though.

-Larry Jones

I always send Grandma a thank-you note right away. ...Ever since she
sent me that empty box with the sarcastic note saying she was just
checking to see if the Postal Service was still working. -- Calvin
 

lawrence.jones

Andrey Tarasevich said:
Well, take a look at the assembly code you quoted. Don't you see that it
is exactly the same in both cases?

You may want to take another look -- the code is most definitely
different for the two cases.

-Larry Jones

I always have to help Dad establish the proper context. -- Calvin
 

Andrey Tarasevich

Doug said:
Well, *I* certainly don't see that, and you probably won't either if you look
at it again... :)

Yeah, I missed the difference somehow... Still, I'd say that in this
case the chances of ending up with the same machine code are pretty high
with a modern compiler.
 

Xiaohan

Thank you all for the information!

I searched a little bit on the internet, and it seems that nowadays
compilers are really smart and can do many optimizations. So is it
still worth doing this by ourselves? Or can we say that we should only
try to optimize the algorithm itself instead of the code?

The other example is like this, which is very common in image
processing:
for(i=0; i<h; i++){
    for(j=0; j<w; j++){
        p[i*w+j] = ...;
    }
}

VS.

for(i=0; i<h; i++){
    p_temp = p + i*w;
    for(j=0; j<w; j++){
        p_temp[j] = ...;
    }
}

I tested those two cases, and the speed is almost the same...

Any ideas?

Xiaohan
 

Xiaohan

Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or:
<http://www.caliburn.nl/topposting.html>

It's bottom posting now.

Thank you all for the information!

I searched a little bit on the internet, and it seems that nowadays
compilers are really smart and can do many optimizations. So is it
still worth doing this by ourselves? Or can we say that we should only
try to optimize the algorithm itself instead of the code?


The other example is like this, which is very common in image
processing:
for(i=0; i<h; i++){
    for(j=0; j<w; j++){
        p[i*w+j] = ...;
    }
}

VS.

for(i=0; i<h; i++){
    p_temp = p + i*w;
    for(j=0; j<w; j++){
        p_temp[j] = ...;
    }
}


I tested those two cases, and the speed is almost the same...

Any ideas?


Xiaohan
 

CBFalconer

Xiaohan said:
p is an array and N is a large number. Which of the following two
loops is faster?

Case1:
for(i=0; i<N; i++){
p[i] = i;
}

Case2:
for(i=0; i<N; i++){
*p = i;
p++;
}


Why don't you try:
for (i = 0; i < N;)
*p++ = i++;

Then also try it with various optimization levels in your system.
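If you want actual numbers, a rough harness along these lines will do (a
sketch; clock()-based, with N matching the count in the posted assembly):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 100000000   /* 0x5F5E100, about 400 MB of ints */

int main(void)
{
    int *p = malloc(N * sizeof *p);
    int *q;
    double sum = 0.0;
    clock_t t0, t1, t2;
    int i;

    if (p == NULL)
        return EXIT_FAILURE;

    t0 = clock();
    for (i = 0; i < N; i++)       /* indexed version */
        p[i] = i;
    t1 = clock();

    q = p;
    for (i = 0; i < N; i++)       /* pointer version */
        *q++ = i;
    t2 = clock();

    for (i = 0; i < N; i += 1000) /* read something back so the stores
                                     can't be optimised away entirely */
        sum += p[i];

    printf("indexed %.3fs  pointer %.3fs  (checksum %g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    free(p);
    return 0;
}

Bear in mind the second loop runs with caches already warm from the first,
so swap the order (or run each loop twice) before reading much into small
differences.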
 

pete

Xiaohan said:
Or can we say that we should only
try to optimize the algorithm itself instead of the code?

http://en.wikipedia.org/wiki/Optimization_(computer_science)

Donald Knuth said, paraphrasing Hoare[1],

"We should forget about small efficiencies,
say about 97% of the time:
premature optimization is the root of all evil."

(Knuth, Donald. Structured Programming with go to Statements,
ACM Journal Computing Surveys, Vol 6, No.4, Dec. 1974. p.268.)
 

Gene

Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or:
<http://www.caliburn.nl/topposting.html>

It's bottom posting now.

Thank you all for the information!

I searched a little bit on the internet, and it seems that nowadays
compilers are really smart and can do many optimizations. So is it
still worth doing this by ourselves? Or can we say that we should only
try to optimize the algorithm itself instead of the code?

The other example is like this, which is very common in image
processing:
for(i=0; i<h; i++){
    for(j=0; j<w; j++){
        p[i*w+j] = ...;
    }

}

VS.
for(i=0; i<h; i++){
    p_temp = p + i*w;
    for(j=0; j<w; j++){
        p_temp[j] = ...;
    }

}

I tested those two cases, and the speed is almost the same...

Any ideas?

This is also a very common compiler optimization: the i*w is
loop-invariant in the inner loop, so it gets hoisted out (loop-invariant
code motion). A smart compiler will also strength-reduce the
multiplication into an addition by adding w to a temporary on each
iteration of the outer loop.
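Done by hand, the strength-reduced version looks roughly like this (a
sketch; assumes p is an unsigned char * to the first of h*w pixels):

unsigned char *row = p;
int i, j;

for (i = 0; i < h; i++) {
    for (j = 0; j < w; j++)
        row[j] = 0;   /* stands in for the original "..." */
    row += w;         /* one add per row instead of the i*w multiply */
}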
 

Malcolm McLean

It's bottom posting now.
Thank you all for the information!

I searched a little bit on the internet, and it seems that nowadays
compilers are really smart and can do many optimizations. So is it
still worth doing this by ourselves? Or can we say that we should only
try to optimize the algorithm itself instead of the code?
Normally compilers are clever enough to make peephole micro-optimisations,
so you should concentrate on the algorithm.
Possible problems are: using an algorithm with the wrong O() complexity,
like bubblesort instead of quicksort; calculating invariants twice, for
instance not storing a mean; and rearranging data to gift-wrap it for
ready-made functions, for instance writing mean(double *x, int N), which
requires a malloc of a double array and a data copy when your data is
embedded in a structure member.
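One way out of that last trap is to let the mean function take a byte
stride instead of insisting on a contiguous array of doubles; a minimal
sketch, with made-up names:

#include <stddef.h>

/* Mean of n doubles spaced 'stride' bytes apart.  Works directly on a
 * structure member, so no malloc and no copy of the data.
 */
double mean_strided(const void *base, size_t stride, size_t n)
{
    const char *q = base;
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++, q += stride)
        sum += *(const double *)q;
    return n ? sum / n : 0.0;
}

/* e.g. for struct sample { double value; int flags; } samples[1000]:
 *   mean_strided(&samples[0].value, sizeof samples[0], 1000);
 */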
 

Bartc

Xiaohan said:
It's bottom posting now.

Thank you all for the information!

I searched a little bit on the internet, and it seems that nowadays
compilers are really smart and can do many optimizations. So is it
still worth doing this by ourselves? Or can we say that we should only
try to optimize the algorithm itself instead of the code?

I'm not at the stage yet of trusting any compiler to optimise.
The other example is like this, which is very common in image
processing:
for(i=0; i<h; i++){
    for(j=0; j<w; j++){
        p[i*w+j] = ...;
    }
}

VS.

for(i=0; i<h; i++){
    p_temp = p + i*w;
    for(j=0; j<w; j++){
        p_temp[j] = ...;
    }
}


I tested those two cases, and the speed is almost the same...

You're suggesting it could be faster still? You would need to look at actual
output to see whether the compiler has missed anything.

In the second example, you don't really need j to go from 0 to w-1; you just
want to execute w times. Possibly you don't need j either, if you just use a
pointer to step along the row (depends what's in ....). If you know that w
is 2n or 4n, you can unroll the loop a little.

This is stuff the compiler ought to deal with.

If the pixel size is 16-bits, and w is even, perhaps you can process 2
pixels at once (depends on ....). If you are just setting pixels to the same
value or pushing them around, perhaps use memset and memcpy on each row.
These decisions I would say the compiler is unlikely to optimise.
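For the plain fill case that might look like this (a sketch; assumes 8-bit
pixels and contiguously stored rows):

#include <string.h>

void fill_image(unsigned char *p, int w, int h, unsigned char value)
{
    int i;
    for (i = 0; i < h; i++)
        memset(p + (size_t)i * w, value, (size_t)w);
    /* if the whole image is one contiguous block, a single
       memset(p, value, (size_t)w * h) does the same job */
}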

If making repeated passes over the same large data, then you might lose
cache benefits. Maybe do as much as possible on each row or region before
moving on to the next.

However if what is in ... is complex, then the loop overheads become
irrelevant, and you should look at simplifying ... .
 

Thad Smith

Xiaohan said:
p is an array and N is a large number. Which of the following two
loops is faster?

Case1:

for(i=0; i<N; i++){
p[i] = i;
}

Case2:

for(i=0; i<N; i++){
*p = i;
p++;
}

It seems that Case 1 needs multiplication/shifting to calculate the
address/index and the second case needs only an ADD operation. But I
tested it, the speed is almost the same. Can anybody tell me why?
Thanks.

For your reference, the assembly code is as follows:
...


If timing isn't critical, I use the simplest to understand. If timing is
critical, I benchmark alternatives, then if the fastest method has a
significant advantage over the clear one *in the intended application*, I
use the optimized one, otherwise the clearer code.

If the fastest one is ugly enough, I sometimes leave the simpler one in as
a functional reference for comprehension and porting.
 

Doug Miller

for(i=0; i<h; i++){
    for(j=0; j<w; j++){
        p[i*w+j] = ...;
    }
}

VS.

for(i=0; i<h; i++){
    p_temp = p + i*w;
    for(j=0; j<w; j++){
        p_temp[j] = ...;
    }
}

I tested those two cases, and the speed is almost the same...

Any ideas?

First idea that comes to my mind is that you didn't run the loops through
enough iterations to really show the difference. Try running each through a
hundred million iterations, then see if the speed is still "almost the same"
-- I'm betting it won't be.
 

Paul Hsieh

Xiaohan said:
Or can we say that we should only
try to optimize the algorithm itself instead of the code?

http://en.wikipedia.org/wiki/Optimization_(computer_science)

Donald Knuth said, paraphrasing Hoare[1],

     "We should forget about small efficiencies,
      say about 97% of the time:
      premature optimization is the root of all evil."

     (Knuth, Donald. Structured Programming with go to Statements,
      ACM Journal Computing Surveys, Vol 6, No.4, Dec. 1974. p.268.)

And do you pray to the esteemed Mr. Knuth as well? Do you even know
the answer to Xiaohan's question? (The answer is that there is no
difference, precisely because delivery of better compilers and CPUs
has created a long term situation where those two sources execute with
the same speed -- the point being that they are semantically
equivalent, which is precisely what the CPU and compiler vendors are
trying to express in its most basic fundamental way by the time it
reaches execution; now tell me which is the more useful thing to post
in response to the OP?) And if not, why have you not questioned Knuth
about his obsession over sorting? Surely heapsort has answered the
question correctly, why does he obsess over quicksort or shellsort?
 

Micah Cowan

Paul Hsieh said:
Xiaohan said:
Or can we say that we should only
try to optimize the algorithm itself instead of the code?

http://en.wikipedia.org/wiki/Optimization_(computer_science)

Donald Knuth said, paraphrasing Hoare[1],

     "We should forget about small efficiencies,
      say about 97% of the time:
      premature optimization is the root of all evil."

     (Knuth, Donald. Structured Programming with go to Statements,
      ACM Journal Computing Surveys, Vol 6, No.4, Dec. 1974. p.268.)

And do you pray to the esteemed Mr. Knuth as well? Do you even know
the answer to Xiaohan's question? (The answer is that there is no
difference, precisely because delivery of better compilers and CPUs
has created a long term situation where those two sources execute with
the same speed -- the point being that they are semantically
equivalent, which is precisely what the CPU and compiler vendors are
trying to express in its most basic fundamental way by the time it
reaches execution; now tell me which is the more useful thing to post
in response to the OP?)

There's no difference between optimizing algorithms versus
micro-optimizing the code? I'm sorry, that is broken.

It's hard to imagine a more appropriate answer to Xiaohan's question
than the one pete gave: optimization of trivial things is usually a
waste of time, whereas choosing a good algorithm from the start is
often a good idea.
And if not, why have you not questioned Knuth
about his obsession over sorting? Surely heapsort has answered the
question correctly, why does he obsess over quicksort or shellsort?

(A) What has that got to do--at all--with this thread?
(B) What have you read that gives you the mistaken impression that
Knuth obsesses over quicksort or shellsort?
(C) What makes you presume to think that heapsort is always the best
sorting method for all situations (or even better than quicksort) in
all situations?
 

Paul Hsieh

Paul Hsieh said:
Xiaohan wrote:
Or can we say that we should only
try to optimize the algorithm itself instead of the code?
http://en.wikipedia.org/wiki/Optimization_(computer_science)
Donald Knuth said, paraphrasing Hoare[1],
     "We should forget about small efficiencies,
      say about 97% of the time:
      premature optimization is the root of all evil."
     (Knuth, Donald. Structured Programming with go to Statements,
      ACM Journal Computing Surveys, Vol 6, No.4, Dec. 1974. p.268.)
And do you pray to the esteemed Mr. Knuth as well?  Do you even know
the answer to Xiaohan's question? (The answer is that there is no
difference, precisely because delivery of better compilers and CPUs
has created a long term situation where those two sources execute with
the same speed -- the point being that they are semantically
equivalent, which is precisely what the CPU and compiler vendors are
trying to express in its most basic fundamental way by the time it
reaches execution; now tell me which is the more useful thing to post
in response to the OP?)

There's no difference between optimizing algorithms versus
micro-optimizing the code? I'm sorry, that is broken.

False dichotomy much? No, the answer is that *this example* is not
worth optimizing, because the code snippets have identical semantics.
(And in this case it's the CPU which saves you, even though the
compiler was not quite perfect.) It's not a question of the value of
micro-optimizing, it's a question of realizing that this is *NOT* micro-
optimizing.
It's hard to imagine a more appropriate answer to Xiachan's question
than the one pete gave: optimization of trivial things is usually a
waste of time, whereas choosing a good algorithm from the start is
often a good idea.

Well, perhaps it's hard for some people.
(A) What has that got to do--at all--with this thread?

pete brought up Knuth as if what he said should be taken seriously.
So I am questioning Knuth.
(B) What have you read that gives you the mistaken impression that
Knuth obsesses over quicksort or shellsort?

In his searching and sorting book he presents the algorithms and
measures their performance complexity in painstaking detail. It's
odd for someone who claims optimization is the root of all evil. Once
he's proven that sorting is O(n*ln n), and shown that heapsort is
O(n*ln n), why does he discuss it further?
(C) What makes you presume to think that heapsort is always the best
sorting method for all situations (or even better than quicksort) in
all situations?

I don't claim that. Under *HIS* edict that optimization is bad, it
*IS* generally fastest. Only context-specific sorts like radix sort
pose any real alternative to heapsort from a sheer complexity-analysis
point of view.

But if we take his "We should forget about small efficiencies, say about
97% of the time: premature optimization is the root of all evil"
seriously, then this is how we know that hybrid sorts that mix quicksort
and heapsort are probably state of the art. One is faster some of the
time, and the other is faster the rest of the time, even though in the
O() scheme of things heapsort is always fastest.
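The hybrid being alluded to is essentially introsort: quicksort until a
depth limit runs out, then heapsort on whatever partition is left, which
keeps the worst case at O(n log n). A bare-bones sketch for ints
(illustrative, not anyone's production code):

#include <stddef.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Sift a[root] down within a[0..n-1], treated as a max-heap. */
static void sift_down(int *a, size_t root, size_t n)
{
    for (;;) {
        size_t child = 2 * root + 1;
        if (child >= n)
            break;
        if (child + 1 < n && a[child] < a[child + 1])
            child++;
        if (a[root] >= a[child])
            break;
        swap_int(&a[root], &a[child]);
        root = child;
    }
}

static void heapsort_int(int *a, size_t n)
{
    size_t i;
    for (i = n / 2; i-- > 0; )      /* build the heap */
        sift_down(a, i, n);
    for (i = n; i-- > 1; ) {        /* repeatedly move the max to the end */
        swap_int(&a[0], &a[i]);
        sift_down(a, 0, i);
    }
}

static void introsort_rec(int *a, size_t n, int depth)
{
    while (n > 16) {
        int pivot;
        size_t i = 0, j = n - 1;

        if (depth-- == 0) {         /* too many levels: give up on quicksort */
            heapsort_int(a, n);
            return;
        }
        pivot = a[n / 2];
        for (;;) {                  /* Hoare partition */
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i >= j)
                break;
            swap_int(&a[i], &a[j]);
            i++; j--;
        }
        introsort_rec(a, j + 1, depth);  /* left part */
        a += j + 1;                      /* keep looping on the right part */
        n -= j + 1;
    }
    /* what's left is small: insertion sort */
    {
        size_t i, k;
        for (i = 1; i < n; i++) {
            int v = a[i];
            for (k = i; k > 0 && a[k - 1] > v; k--)
                a[k] = a[k - 1];
            a[k] = v;
        }
    }
}

void introsort(int *a, size_t n)
{
    int depth = 0;
    size_t m;
    for (m = n; m > 1; m >>= 1)
        depth += 2;                 /* roughly 2*log2(n) */
    introsort_rec(a, n, depth);
}

The depth limit is what stops quicksort's pathological inputs from
mattering; on typical data the heapsort branch almost never runs.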

On a related note, considering micro-optimization is what allows one
to realize that heap sort has gotten faster over time because of
changing micro-processor technology:

http://www.pobox.com/~qed/sort.html

You won't find any new algorithms there. All pure micro-
optimizations, in straight C. It matters if you care about
performance.
 
