Array index vs. pointer: which is faster?


Xiaohan

p is an array and N is a large number. Which of the following two
loops is faster?

Case1:

for(i=0; i<N; i++){
    p[i] = i;
}

Case2:

for(i=0; i<N; i++){
    *p = i;
    p++;
}

It seems that Case 1 needs multiplication/shifting to calculate the
address/index and the second case needs only an ADD operation. But I
tested it, the speed is almost the same. Can anybody tell me why?
Thanks.

For your reference, the assembly code is as follows:

For Case 1
for(i=0; i<N; i++){
0040101B xor ecx,ecx
0040101D lea ecx,[ecx]
p[i] = i;
00401020 mov dword ptr [eax+ecx*4],ecx
00401023 add ecx,1
00401026 cmp ecx,5F5E100h
0040102C jl main+20h (401020h)
//p++;
}

For Case 2, it is:
for(i=0; i<N; i++){
0040101B xor ecx,ecx
0040101D lea ecx,[ecx]
*p = i;
00401020 mov dword ptr [eax],ecx
00401022 add ecx,1
p++;
00401025 add eax,4
00401028 cmp ecx,5F5E100h
0040102E jl main+20h (401020h)
}
 

Richard Tobin

Xiaohan said:
p is an array and N is a large number. Which of the following two
loops is faster?

Case1:

for(i=0; i<N; i++){
p[i] = i;
}

Case2:

for(i=0; i<N; i++){
*p = i;
p++;
}



Depends entirely on your compiler. With a good compiler it shouldn't
make any difference: optimising indexing in a loop to pointer arithmetic
was common in the 1970s if not before.

Other things being equal I would prefer the array indexing version,
since in more complicated cases it might let the compiler see that
there is no aliasing.
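For instance (a minimal sketch, with made-up names):

/* In-place smoothing.  The indexed form makes the offsets relative to
 * a explicit, so the compiler can see exactly which elements each
 * iteration reads and writes.
 */
void smooth(double *a, int n)
{
    int i;
    for (i = 1; i < n - 1; i++)
        a[i] = 0.5 * (a[i - 1] + a[i + 1]);
}

/* The same loop with three walking pointers; now the optimiser has to
 * prove that dst, lo and hi keep a fixed relationship to each other
 * before it can rearrange anything.
 */
void smooth_ptr(double *a, int n)
{
    double *dst, *lo, *hi;
    int i;
    if (n < 3) return;
    dst = a + 1; lo = a; hi = a + 2;
    for (i = 1; i < n - 1; i++)
        *dst++ = 0.5 * (*lo++ + *hi++);
}
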
It seems that Case 1 needs multiplication/shifting to calculate the
address/index and the second case needs only an ADD operation.

That's only true if your compiler doesn't optimise one to the other.
But in any case, a shift operation on a register is extremely quick.
And for large amounts of data, the limiting factor will be memory
access so that it won't make any difference at all.

-- Richard
 

Andrey Tarasevich

Xiaohan said:
p is an array and N is a large number. Which of the following two
loops is faster?

Case1:

for(i=0; i<N; i++){
p[i] = i;
}

Case2:

for(i=0; i<N; i++){
*p = i;
p++;
}

It seems that Case 1 needs multiplication/shifting to calculate the
address/index and the second case needs only an ADD operation.


Neither case "needs" anything like that. The compiler is free to
interpret this code any way it likes, as long as the observable behavior
of the program satisfies the requirements imposed by the language
specification. That's the only thing it "needs".
But I
tested it, the speed is almost the same. Can anybody tell me why?

Well, take a look at the assembly code you quoted. Don't you see that it
is exactly the same in both cases?

A typical compiler doesn't necessarily think in terms of individual C
operations. It doesn't try to translate each C operation into its exact
analogue in machine language, as you seem to believe. The compiler
normally tries to see the bigger picture - what your code really does -
and then chooses the best way it can to implement the requested
functionality in machine language. Of course its capabilities are
limited and sometimes its behavior might seem (and actually be) illogical.

In your case the compiler was smart enough to realize that both versions
of the code do exactly the same thing (in terms of the observable
behavior of the program), so it translated them into identical machine
code - the one it considers to be the most optimal.
 

Doug Miller

Andrey Tarasevich wrote:

Well, take a look at the assembly code you quoted. Don't you see that it
is exactly the same in both cases?

Well, *I* certainly don't see that, and you probably won't either if you look
at it again... :)
 

lawrence.jones

Xiaohan said:
It seems that Case 1 needs multiplication/shifting to calculate the
address/index and the second case needs only an ADD operation. But I
tested it, the speed is almost the same. Can anybody tell me why?

Because on most current processors multiplication is fast and shifting
is extremely fast, and there are multiple ALUs to allow address
arithmetic to be done in parallel with other operations so it takes no
additional time. This has very little to do with C, though.

-Larry Jones

I always send Grandma a thank-you note right away. ...Ever since she
sent me that empty box with the sarcastic note saying she was just
checking to see if the Postal Service was still working. -- Calvin
 

lawrence.jones

Andrey Tarasevich said:
Well, take a look at the assembly code you quoted. Don't you see that it
is exactly the same in both cases?

You may want to take another look -- the code is most definitely
different for the two cases.

-Larry Jones

I always have to help Dad establish the proper context. -- Calvin
 

Andrey Tarasevich

Doug said:
Well, *I* certainly don't see that, and you probably won't either if you look
at it again... :)

Yeah, I missed the difference somehow... Still, I'd say that in this
case the chances of ending up with the same machine code are pretty high
with a modern compiler.
 

Xiaohan

Thank you all for the information!

I searched a little bit on the internet, and it seems that nowadays
compilers are really smart and can do many optimizations. So is it
still worth doing this by ourselves? Or can we say that we should only
try to optimize the algorithm itself instead of the code?

The other example is like this, which is very common in image
processing:
for(i=0; i<h; i++){
    for(j=0; j<w; j++){
        p[i*w+j] = ...;
    }
}

VS.

for(i=0; i<h; i++){
    p_temp = p + i*w;
    for(j=0; j<w; j++){
        p_temp[j] = ...;
    }
}

I tested those two cases, and the speed is almost the same...

Any ideas?

Xiaohan
 

Xiaohan

Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or:
<http://www.caliburn.nl/topposting.html>

It's bottom posting now.

Thank you all for the information!

I searched a little bit on the internet, and it seems that nowadays
compilers are really smart and can do many optimizations. So is it
still worth doing this by ourselves? Or can we say that we should only
try to optimize the algorithm itself instead of the code?


The other example is like this, which is very common in image
processing:
for(i=0; i<h; i++){
    for(j=0; j<w; j++){
        p[i*w+j] = ...;
    }
}

VS.

for(i=0; i<h; i++){
    p_temp = p + i*w;
    for(j=0; j<w; j++){
        p_temp[j] = ...;
    }
}


I tested those two cases, and the speed is almost the same...

Any ideas?


Xiaohan
 

CBFalconer

Xiaohan said:
p is an array and N is a large number. Which of the following two
loops is faster?

Case1:
for(i=0; i<N; i++){
p[i] = i;
}

Case2:
for(i=0; i<N; i++){
*p = i;
p++;
}


Why don't you try:
for (i = 0; i < N;)
*p++ = i++;

Then also try it with various optimization levels in your system.
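If you want actual numbers, a rough harness along these lines will do (a
sketch; clock()-based, with N matching the count in the posted assembly):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 100000000   /* 0x5F5E100, about 400 MB of ints */

int main(void)
{
    int *p = malloc(N * sizeof *p);
    int *q;
    double sum = 0.0;
    clock_t t0, t1, t2;
    int i;

    if (p == NULL)
        return EXIT_FAILURE;

    t0 = clock();
    for (i = 0; i < N; i++)       /* indexed version */
        p[i] = i;
    t1 = clock();

    q = p;
    for (i = 0; i < N; i++)       /* pointer version */
        *q++ = i;
    t2 = clock();

    for (i = 0; i < N; i += 1000) /* read something back so the stores
                                     can't be optimised away entirely */
        sum += p[i];

    printf("indexed %.3fs  pointer %.3fs  (checksum %g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    free(p);
    return 0;
}

Bear in mind the second loop runs with caches already warm from the first,
so swap the order (or run each loop twice) before reading much into small
differences.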
 

pete

Xiaohan said:
Or can we say that we should only
try to optimize the algorithm itself instead of the code?

http://en.wikipedia.org/wiki/Optimization_(computer_science)

Donald Knuth said, paraphrasing Hoare[1],

"We should forget about small efficiencies,
say about 97% of the time:
premature optimization is the root of all evil."

(Knuth, Donald. Structured Programming with go to Statements,
ACM Journal Computing Surveys, Vol 6, No.4, Dec. 1974. p.268.)
 

Gene

Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or:
<http://www.caliburn.nl/topposting.html>

It's bottom posting now.

Thank you all for the information!

I searched a little bit on the internet, and it seems that nowadays
compilers are really smart and can do many optimizations. So is it
still worth doing this by ourselves? Or can we say that we should only
try to optimize the algorithm itself instead of the code?

The other example is like this, which is very common in image
processing:
for(i=0; i<h; i++){
    for(j=0; j<w; j++){
        p[i*w+j] = ...;
    }

}

VS.
for(i=0; i<h; i++){
    p_temp = p + i*w;
    for(j=0; j<w; j++){
        p_temp[j] = ...;
    }

}

I tested those two cases, and the speed is almost the same...

Any ideas?

This is also a very common compiler optimization: the i*w is
loop-invariant in the inner loop, so it gets hoisted out (loop-invariant
code motion). A smart compiler will also strength-reduce the
multiplication into an addition by adding w to a temporary on each
iteration of the outer loop.
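Done by hand, the strength-reduced version looks roughly like this (a
sketch; assumes p is an unsigned char * to the first of h*w pixels):

unsigned char *row = p;
int i, j;

for (i = 0; i < h; i++) {
    for (j = 0; j < w; j++)
        row[j] = 0;   /* stands in for the original "..." */
    row += w;         /* one add per row instead of the i*w multiply */
}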
 

Malcolm McLean

It's bottom posting now.
Thank you all for the information!

I searched a little bit on the internet, and it seems that nowadays
compilers are really smart and can do many optimizations. So is it
still worth doing this by ourselves? Or can we say that we should only
try to optimize the algorithm itself instead of the code?
Normally compilers are clever enough to make peephole micro-optimisations,
so you should concentrate on the algorithm.
Possible problems are: using an algorithm with the wrong O() complexity,
like bubblesort instead of quicksort; calculating invariants twice, for
instance not storing a mean; and rearranging data to gift-wrap it for
ready-made functions, for instance writing mean(double *x, int N), which
requires a malloc of a double array and a data copy when your data is
embedded in a structure member.
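One way out of that last trap is to let the mean function take a byte
stride instead of insisting on a contiguous array of doubles; a minimal
sketch, with made-up names:

#include <stddef.h>

/* Mean of n doubles spaced 'stride' bytes apart.  Works directly on a
 * structure member, so no malloc and no copy of the data.
 */
double mean_strided(const void *base, size_t stride, size_t n)
{
    const char *q = base;
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++, q += stride)
        sum += *(const double *)q;
    return n ? sum / n : 0.0;
}

/* e.g. for struct sample { double value; int flags; } samples[1000]:
 *   mean_strided(&samples[0].value, sizeof samples[0], 1000);
 */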
 

Bartc

Xiaohan said:
It's bottom posting now.

Thank you all for the information!

I searched a little bit on the internet, and it seems that nowadays
compilers are really smart and can do many optimizations. So is it
still worth doing this by ourselves? Or can we say that we should only
try to optimize the algorithm itself instead of the code?

I'm not at the stage yet of trusting any compiler to optimise.
The other example is like this, which is very common in image
processing:
for(i=0; i<h; i++){
    for(j=0; j<w; j++){
        p[i*w+j] = ...;
    }
}

VS.

for(i=0; i<h; i++){
    p_temp = p + i*w;
    for(j=0; j<w; j++){
        p_temp[j] = ...;
    }
}


I tested those two cases, and the speed is almost the same...

You're suggesting it could be faster still? You would need to look at actual
output to see whether the compiler has missed anything.

In the second example, you don't really need j to go from 0 to w-1; you just
want to execute w times. Possibly you don't need j either, if you just use a
pointer to step along the row (depends what's in ....). If you know that w
is 2n or 4n, you can unroll the loop a little.

This is stuff the compiler ought to deal with.

If the pixel size is 16-bits, and w is even, perhaps you can process 2
pixels at once (depends on ....). If you are just setting pixels to the same
value or pushing them around, perhaps use memset and memcpy on each row.
These decisions I would say the compiler is unlikely to optimise.
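For the plain fill case that might look like this (a sketch; assumes 8-bit
pixels and contiguously stored rows):

#include <string.h>

void fill_image(unsigned char *p, int w, int h, unsigned char value)
{
    int i;
    for (i = 0; i < h; i++)
        memset(p + (size_t)i * w, value, (size_t)w);
    /* if the whole image is one contiguous block, a single
       memset(p, value, (size_t)w * h) does the same job */
}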

If making repeated passes over the same large data, then you might lose
cache benefits. Maybe do as much as possible on each row or region before
moving on to the next.

However if what is in ... is complex, then the loop overheads become
irrelevant, and you should look at simplifying ... .
 

Thad Smith

Xiaohan said:
p is an array and N is a large number. Which of the following two
loops is faster?

Case1:

for(i=0; i<N; i++){
p[i] = i;
}

Case2:

for(i=0; i<N; i++){
*p = i;
p++;
}

It seems that Case 1 needs multiplication/shifting to calculate the
address/index and the second case needs only an ADD operation. But I
tested it, the speed is almost the same. Can anybody tell me why?
Thanks.

For your reference, the assembly code is as follows:
...


If timing isn't critical, I use the simplest to understand. If timing is
critical, I benchmark alternatives, then if the fastest method has a
significant advantage over the clear one *in the intended application*, I
use the optimized one, otherwise the clearer code.

If the fastest one is ugly enough, I sometimes leave the simpler one in as
a functional reference for comprehension and porting.
 

Doug Miller

for(i=0; i<h; i++){
    for(j=0; j<w; j++){
        p[i*w+j] = ...;
    }
}

VS.

for(i=0; i<h; i++){
    p_temp = p + i*w;
    for(j=0; j<w; j++){
        p_temp[j] = ...;
    }
}

I tested those two cases, and the speed is almost the same...

Any ideas?

First idea that comes to my mind is that you didn't run the loops through
enough iterations to really show the difference. Try running each through a
hundred million iterations, then see if the speed is still "almost the same"
-- I'm betting it won't be.
 

Paul Hsieh

Xiaohan said:
Or can we say that we should only
try to optimize the algorithm itself instead of the code?

http://en.wikipedia.org/wiki/Optimization_(computer_science)

Donald Knuth said, paraphrasing Hoare[1],

     "We should forget about small efficiencies,
      say about 97% of the time:
      premature optimization is the root of all evil."

     (Knuth, Donald. Structured Programming with go to Statements,
      ACM Journal Computing Surveys, Vol 6, No.4, Dec. 1974. p.268.)

And do you pray to the esteemed Mr. Knuth as well? Do you even know
the answer to Xiaohan's question? (The answer is that there is no
difference, precisely because delivery of better compilers and CPUs
has created a long term situation where those two sources execute with
the same speed -- the point being that they are semantically
equivalent, which is precisely what the CPU and compiler vendors are
trying to express in its most basic fundamental way by the time it
reaches execution; now tell me which is the more useful thing to post
in response to the OP?) And if not, why have you not questioned Knuth
about his obsession over sorting? Surely heapsort has answered the
question correctly, why does he obsess over quicksort or shellsort?
 

Micah Cowan

Paul Hsieh said:
Xiaohan said:
Or can we say that we should only
try to optimize the algorithm itself instead of the code?

http://en.wikipedia.org/wiki/Optimization_(computer_science)

Donald Knuth said, paraphrasing Hoare[1],

     "We should forget about small efficiencies,
      say about 97% of the time:
      premature optimization is the root of all evil."

     (Knuth, Donald. Structured Programming with go to Statements,
      ACM Journal Computing Surveys, Vol 6, No.4, Dec. 1974. p.268.)

And do you pray to the esteemed Mr. Knuth as well? Do you even know
the answer to Xiaohan's question? (The answer is that there is no
difference, precisely because delivery of better compilers and CPUs
has created a long term situation where those two sources execute with
the same speed -- the point being that they are semantically
equivalent, which is precisely what the CPU and compiler vendors are
trying to express in its most basic fundamental way by the time it
reaches execution; now tell me which is the more useful thing to post
in response to the OP?)

There's no difference between optimizing algorithms versus
micro-optimizing the code? I'm sorry, that is broken.

It's hard to imagine a more appropriate answer to Xiaohan's question
than the one pete gave: optimization of trivial things is usually a
waste of time, whereas choosing a good algorithm from the start is
often a good idea.
And if not, why have you not questioned Knuth
about his obsession over sorting? Surely heapsort has answered the
question correctly, why does he obsess over quicksort or shellsort?

(A) What has that got to do--at all--with this thread?
(B) What have you read that gives you the mistaken impression that
Knuth obsesses over quicksort or shellsort?
(C) What makes you presume to think that heapsort is always the best
sorting method for all situations (or even better than quicksort) in
all situations?
 

Paul Hsieh

Paul Hsieh said:
Xiaohan wrote:
Or can we say that we should only
try to optimize the algorithm itself instead of the code?
http://en.wikipedia.org/wiki/Optimization_(computer_science)
Donald Knuth said, paraphrasing Hoare[1],
     "We should forget about small efficiencies,
      say about 97% of the time:
      premature optimization is the root of all evil."
     (Knuth, Donald. Structured Programming with go to Statements,
      ACM Journal Computing Surveys, Vol 6, No.4, Dec. 1974. p.268.)
And do you pray to the esteemed Mr. Knuth as well?  Do you even know
the answer to Xiaohan's question? (The answer is that there is no
difference, precisely because delivery of better compilers and CPUs
has created a long term situation where those two sources execute with
the same speed -- the point being that they are semantically
equivalent, which is precisely what the CPU and compiler vendors are
trying to express in its most basic fundamental way by the time it
reaches execution; now tell me which is the more useful thing to post
in response to the OP?)

There's no difference between optimizing algorithms versus
micro-optimizing the code? I'm sorry, that is broken.

False dichotomy much? No, the answer is that *this example* is not
worth optimizing, because the code snippets have identical semantics.
(And in this case it's the CPU which saves you, even though the
compiler was not quite perfect.) It's not a question of the value of
micro-optimizing, it's a question of realizing that this is *NOT* micro-
optimizing.
It's hard to imagine a more appropriate answer to Xiachan's question
than the one pete gave: optimization of trivial things is usually a
waste of time, whereas choosing a good algorithm from the start is
often a good idea.

Well, perhaps it's hard for some people.
(A) What has that got to do--at all--with this thread?

pete brought up Knuth as if what he said should be taken seriously.
So I am questioning Knuth.
(B) What have you read that gives you the mistaken impression that
Knuth obsesses over quicksort or shellsort?

In his searching and sorting book he presents the algorithms and
measures their performance complexity in painstaking detail. It's
odd for someone who claims optimization is the root of all evil. Once
he's proven that sorting is O(n*ln n), and shown that heapsort is
O(n*ln n), why does he discuss it further?
(C) What makes you presume to think that heapsort is always the best
sorting method for all situations (or even better than quicksort) in
all situations?

I don't claim that. Under *HIS* edict that optimization is bad, it
*IS* generally fastest. Only context-specific sorts like radix sort
pose any real alternative to heapsort from a sheer complexity-analysis
point of view.

But if we take his "We should forget about small efficiencies, say about
97% of the time: premature optimization is the root of all evil"
seriously, then this is how we know that hybrid sorts that mix quicksort
and heapsort are probably state of the art. One is faster some of the
time, and the other is faster the rest of the time, even though in the
O() scheme of things heapsort is always fastest.
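The hybrid being alluded to is essentially introsort: quicksort until a
depth limit runs out, then heapsort on whatever partition is left, which
keeps the worst case at O(n log n). A bare-bones sketch for ints
(illustrative, not anyone's production code):

#include <stddef.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Sift a[root] down within a[0..n-1], treated as a max-heap. */
static void sift_down(int *a, size_t root, size_t n)
{
    for (;;) {
        size_t child = 2 * root + 1;
        if (child >= n)
            break;
        if (child + 1 < n && a[child] < a[child + 1])
            child++;
        if (a[root] >= a[child])
            break;
        swap_int(&a[root], &a[child]);
        root = child;
    }
}

static void heapsort_int(int *a, size_t n)
{
    size_t i;
    for (i = n / 2; i-- > 0; )      /* build the heap */
        sift_down(a, i, n);
    for (i = n; i-- > 1; ) {        /* repeatedly move the max to the end */
        swap_int(&a[0], &a[i]);
        sift_down(a, 0, i);
    }
}

static void introsort_rec(int *a, size_t n, int depth)
{
    while (n > 16) {
        int pivot;
        size_t i = 0, j = n - 1;

        if (depth-- == 0) {         /* too many levels: give up on quicksort */
            heapsort_int(a, n);
            return;
        }
        pivot = a[n / 2];
        for (;;) {                  /* Hoare partition */
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i >= j)
                break;
            swap_int(&a[i], &a[j]);
            i++; j--;
        }
        introsort_rec(a, j + 1, depth);  /* left part */
        a += j + 1;                      /* keep looping on the right part */
        n -= j + 1;
    }
    /* what's left is small: insertion sort */
    {
        size_t i, k;
        for (i = 1; i < n; i++) {
            int v = a[i];
            for (k = i; k > 0 && a[k - 1] > v; k--)
                a[k] = a[k - 1];
            a[k] = v;
        }
    }
}

void introsort(int *a, size_t n)
{
    int depth = 0;
    size_t m;
    for (m = n; m > 1; m >>= 1)
        depth += 2;                 /* roughly 2*log2(n) */
    introsort_rec(a, n, depth);
}

The depth limit is what stops quicksort's pathological inputs from
mattering; on typical data the heapsort branch almost never runs.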

On a related note, considering micro-optimization is what allows one
to realize that heap sort has gotten faster over time because of
changing micro-processor technology:

http://www.pobox.com/~qed/sort.html

You won't find any new algorithms there. All pure micro-
optimizations, in straight C. It matters if you care about
performance.
 
