Which if faster?

Ganesh · Oct 24, 2006

Hi,

This is a question that pertains to pointers in general (C or C++).
Which of the following is faster and why?

for (int i = 0; i < N; i++)
= ... a...

(or)

for (int i = 0; i < N; i++)
= ...*a++....

'a' is a dynamically allocated array of any valid scalar type and of
size N. I have read somewhere that the latter is faster. But could not
understand why? Or is that dependent on architecture or compiler?

Michael · Oct 24, 2006

Which of the following is faster and why?

for (int i = 0; i < N; i++)
= ... a...

(or)

for (int i = 0; i < N; i++)
= ...*a++....

'a' is a dynamically allocated array of any valid scalar type and of
size N. I have read somewhere that the latter is faster. But could not
understand why? Or is that dependent on architecture or compiler?

Two answers:
1) For any reasonably modern optimizing compiler, they should be the
same speed.
2) This kind of micro-optimization will almost never matter; better to
spend time on fixing slow algorithms, handling slow IO, etc. Whatever
a profiler tells you is the part of the code that is *actually* taking
time.

Michael

Salt_Peter · Oct 24, 2006

Ganesh said:
Hi,

This is a question that pertains to pointers in general (C or C++).
Which of the following is faster and why?

for (int i = 0; i < N; i++)
= ... a...

(or)

for (int i = 0; i < N; i++)
= ...*a++....

'a' is a dynamically allocated array of any valid scalar type and of
size N. I have read somewhere that the latter is faster. But could not
understand why? Or is that dependent on architecture or compiler?

Probably because the first implies a cast.

Ironically, you'ld probably get "faster" with:
for( size_t i = 0; i < N; ++i)
{
// your choice
}

Frederick Gotham · Oct 24, 2006

Ganesh posted:

Which of the following is faster and why?

for (int i = 0; i < N; i++)
= ... a...

(or)

for (int i = 0; i < N; i++)
= ...*a++....

On most systems, the following is the fastest:

int *p = arr;
int const *const pover = arr+len;

do *p++ = ...
while (pover != p);

On systems which have a CPU instruction which takes both a pointer and an
offset, the following is fastest:

size_t i = 0;

do arr[i++] = ...
while (len != i);

When writing portable code, I opt for the former.

Howard · Oct 24, 2006

Salt_Peter said:
Hi,

This is a question that pertains to pointers in general (C or C++).
Which of the following is faster and why?

for (int i = 0; i < N; i++)
= ... a...

(or)

for (int i = 0; i < N; i++)
= ...*a++....

'a' is a dynamically allocated array of any valid scalar type and of
size N. I have read somewhere that the latter is faster. But could not
understand why? Or is that dependent on architecture or compiler?

Click to expand...

Probably because the first implies a cast.

Ironically, you'ld probably get "faster" with:
for( size_t i = 0; i < N; ++i)
{
// your choice
}

Huh? The "int" type is the "native" integral type on any machine. Why
would there be a cast using "int i" to index a dynamically allocated array
(or any array for that matter)? The question wasn't about vectors, but
about raw pointers and arrays. I'd think that a "conversion" from size_t to
int would be needed if using your code and indexing the array with it. But
I see no cast implied or required anywhere in the OP's code.

The "usual" assumption is that the latter method (using pointers and
incrementing) is faster, on the assumption that indexing requires
multiplying the index by the size of an element, and adding that as an
offset to the array start. The assumption is false. That's one way to
implement indexing, but compilers are not required to implement it that way.

Indeed, a decent optimizing compiler is likely to take advantage of whatever
native machine code would do the job best, whether that's by using code more
like the pointer version, or by pre-loading some registers and calling a
built-in machine instruction which handles loops with one simple command.
Or any other solution they see fit to use.

Compiler makers undoubtedly profile their solutions. You should do the same
if you want to know whether you need to change your code, and whether a
proposed solution is a good one for your particular platform.

-Howard

Rolf Magnus · Oct 24, 2006

Ganesh said:
Hi,

This is a question that pertains to pointers in general (C or C++).
Which of the following is faster and why?

for (int i = 0; i < N; i++)
= ... a...

(or)

for (int i = 0; i < N; i++)
= ...*a++....

'a' is a dynamically allocated array of any valid scalar type and of
size N. I have read somewhere that the latter is faster. But could not
understand why?

I have seen that happen on some compiler. Depending on the optimization
capabilities of the compiler and on the type that a points to, it might
happen that i gets multiplied with that type's size on each iteration for
the first solution, while the latter does an addition instead. Since on
many architectures, an addition is quicker than a multiplication, it can
happen that the second one is faster.

Or is that dependent on architecture or compiler?

Click to expand...

It is.

Howard · Oct 24, 2006

Frederick Gotham said:
Ganesh posted:

Which of the following is faster and why?

for (int i = 0; i < N; i++)
= ... a...

(or)

for (int i = 0; i < N; i++)
= ...*a++....

Click to expand...

On most systems, the following is the fastest:

int *p = arr;
int const *const pover = arr+len;

do *p++ = ...
while (pover != p);

On systems which have a CPU instruction which takes both a pointer and an
offset, the following is fastest:

size_t i = 0;

do arr[i++] = ...
while (len != i);

Can you back those statements up with documentation?

There are many forms of optimizations a compiler might use to implement a
loop, regardless of what the original code looks like. The best way to know
what's fastest is to test it on your platform. And that's _still_ no
guarantee that you've actually hit upon the absolute fastest method for your
compiler and platform.

-Howard

Frederick Gotham · Oct 24, 2006

Howard posted:

Can you back those statements up with documentation?

There are many forms of optimizations a compiler might use to implement
a loop, regardless of what the original code looks like. The best way
to know what's fastest is to test it on your platform. And that's
_still_ no guarantee that you've actually hit upon the absolute fastest
method for your compiler and platform.

A few of us tested them over on comp.lang.c. Here's the relevant thread:

http://groups.google.ie/group/comp.lang.c/browse_frm/thread/33ac11a76eb728ad/
2e09898d4ca430be?lnk=st&q=&rnum=6&hl=en#2e09898d4ca430be

Victor Bazarov · Oct 24, 2006

Frederick said:
Ganesh posted:

Which of the following is faster and why?

for (int i = 0; i < N; i++)
= ... a...

(or)

for (int i = 0; i < N; i++)
= ...*a++....

Click to expand...

On most systems, the following is the fastest:

int *p = arr;
int const *const pover = arr+len;

do *p++ = ...
while (pover != p);

On systems which have a CPU instruction which takes both a pointer
and an offset, the following is fastest:

size_t i = 0;

do arr[i++] = ...
while (len != i);

When writing portable code, I opt for the former.

They are not semantially equivalent. You need an 'if' in front of
the 'do' to get them there.

V

Frederick Gotham · Oct 24, 2006

Victor Bazarov posted:

On most systems, the following is the fastest:

int *p = arr;
int const *const pover = arr+len;

do *p++ = ...
while (pover != p);

On systems which have a CPU instruction which takes both a pointer
and an offset, the following is fastest:

size_t i = 0;

do arr[i++] = ...
while (len != i);

When writing portable code, I opt for the former.

Click to expand...

They are not semantially equivalent. You need an 'if' in front of
the 'do' to get them there.

Sorry, I don't follow. . .

(Also, the code assumes that len is non-zero)

Victor Bazarov · Oct 24, 2006

Frederick said:
Victor Bazarov posted:

On most systems, the following is the fastest:

int *p = arr;
int const *const pover = arr+len;

do *p++ = ...
while (pover != p);

On systems which have a CPU instruction which takes both a pointer
and an offset, the following is fastest:

size_t i = 0;

do arr[i++] = ...
while (len != i);

When writing portable code, I opt for the former.

Click to expand...

They are not semantially equivalent. You need an 'if' in front of
the 'do' to get them there.

Click to expand...

Sorry, I don't follow. . .

(Also, the code assumes that len is non-zero)

Assumption is the mother of all f***-ups.

for (int i = 0; i < len; +i)
blah;

is not semantically equivalent to

int i = 0;
do
blah;
while (++i < len);

, that's all.

V

Howard · Oct 24, 2006

Frederick Gotham said:
Howard posted:

Why did you clip what I was responding to? Now nobody reading this will
know what the heck I was talking about. Here are your statements that I was
challenging:

On most systems, the following is the fastest:

int *p = arr;
int const *const pover = arr+len;

do *p++ = ...
while (pover != p);

On systems which have a CPU instruction which takes both a pointer and an
offset, the following is fastest:

size_t i = 0;

do arr[i++] = ...
while (len != i);

A few of us tested them over on comp.lang.c. Here's the relevant thread:

http://groups.google.ie/group/comp.lang.c/browse_frm/thread/33ac11a76eb728ad/
2e09898d4ca430be?lnk=st&q=&rnum=6&hl=en#2e09898d4ca430be

I see a couple tests, but nothing which makes your statements always true,
even for the specific code you gave (let alone for the more general case the
OP asked about). How do you know that my compiler won't be able to unroll a
loop? Or that it won't load up a couple registers and call a single machine
instruction that performs the loop? (I've used those many times when
writing assembly code to handle simple operations on arrays.)

I think that you can really only say that, on your machine, using your
version of your compiler, your specific tests showed some specific results.
It's not valid to extend that to: "on any machine which has such-and-such a
feature, the following code is always fastest". There's simply no way to
know that for sure.

And even if you somehow demonstrated that it's true on all existing
compilers on all existing machines, that would be no guarantee that someone
couldn't write a compiler to implement it faster on one or more of those
machines. Or make a machine which will do it faster using some other
implementation.

-Howard

Howard · Oct 24, 2006

Assumption is the mother of all f***-ups.

for (int i = 0; i < len; +i)
blah;

Typing mistakes are certainly a close second.

That should obviously be "++i", not "+i".

-Howard

Clark S. Cox III · Oct 24, 2006

Salt_Peter said:
Ganesh said:

Hi,

This is a question that pertains to pointers in general (C or C++).
Which of the following is faster and why?

for (int i = 0; i < N; i++)
= ... a...

(or)

for (int i = 0; i < N; i++)
= ...*a++....

'a' is a dynamically allocated array of any valid scalar type and of
size N. I have read somewhere that the latter is faster. But could not
understand why? Or is that dependent on architecture or compiler?

Click to expand...

Probably because the first implies a cast.

It implies no such thing.

Ironically, you'ld probably get "faster" with:
for( size_t i = 0; i < N; ++i)
{
// your choice
}

Click to expand...

I'd consider any compiler that produces significantly different code broken.

Victor Bazarov · Oct 24, 2006

Howard said:
Typing mistakes are certainly a close second.

That should obviously be "++i", not "+i".

That's all my keyboard's fault! And since I'm using an MS Natural
thing, it's all the fault of MS!

Clark S. Cox III · Oct 24, 2006

Frederick said:
Howard posted:

A few of us tested them over on comp.lang.c. Here's the relevant thread:

http://groups.google.ie/group/comp.lang.c/browse_frm/thread/33ac11a76eb728ad/
2e09898d4ca430be?lnk=st&q=&rnum=6&hl=en#2e09898d4ca430be

For every assertion that you made in that thread about which one was
generally faster, someone presented a counter-example.

Victor Bazarov · Oct 24, 2006

Clark said:
Salt_Peter said:

Ganesh said:

Hi,

This is a question that pertains to pointers in general (C or C++).
Which of the following is faster and why?

for (int i = 0; i < N; i++)
= ... a...

(or)

for (int i = 0; i < N; i++)
= ...*a++....

'a' is a dynamically allocated array of any valid scalar type and
of size N. I have read somewhere that the latter is faster. But
could not understand why? Or is that dependent on architecture or
compiler?

Click to expand...

Probably because the first implies a cast.

Click to expand...

It implies no such thing.

Ironically, you'ld probably get "faster" with:
for( size_t i = 0; i < N; ++i)
{
// your choice
}

Click to expand...

I'd consider any compiler that produces significantly different code
broken.

Just to throw some fire into the oil, the former case is much easier
parallelizable than then latter.

V

Ganesh · Oct 24, 2006

Ganesh said:
Hi,

This is a question that pertains to pointers in general (C or C++).
Which of the following is faster and why?

for (int i = 0; i < N; i++)
= ... a...

(or)

for (int i = 0; i < N; i++)
= ...*a++....

'a' is a dynamically allocated array of any valid scalar type and of
size N. I have read somewhere that the latter is faster. But could not
understand why? Or is that dependent on architecture or compiler?

In my architecture (Intel Pentium IV 3 GHz, 1MG Cache ) I got the
following times for 10 runs on a reasonably huge array size for huge
number of iterations.

Array Access (seconds): 5 5 5 4 5 5 5 5 5 5
Pointer Access: 4 4 3 4 4 4 4 4 4
3

On the average, pointer access is faster, but array access also equals
the pointer access sometimes. Not sure why this happens. The assembly
code is different for each case. For array access, it shows
..
addl (%esi, %eax,4), %ebx
incl %eax
....
,
while for pointer access it generates
....
addl %eax, %ebx,
addl $4, %eax
....

BobR · Oct 24, 2006

Victor Bazarov wrote in message ...

That's all my keyboard's fault! And since I'm using an MS Natural
thing, it's all the fault of MS!

Nice save, Victor!!
Does *that* keyboard have a BSOD feature? Crash daily?

Frederick Gotham · Oct 24, 2006

Howard posted:

Why did you clip what I was responding to? Now nobody reading this will
know what the heck I was talking about.

Of course they will, by the miracle of threads! You do have threads... don't
you? Or do you still have candles for lightbulbs?

And even if you somehow demonstrated that it's true on all existing
compilers on all existing machines, that would be no guarantee that
someone couldn't write a compiler to implement it faster on one or more
of those machines. Or make a machine which will do it faster using some
other implementation.

Indeed you're right. When writing portable code, I make an educated guess as
to what strategy will produce high efficiency across the board. Looking at
the tests done all today's computers, I opt for the *p++ solution.

Which is faster?	17	Jul 21, 2009
I am trying to detect Which image id="" was clicked ?	22	Jan 3, 2023
SOLVE THIS IF YOU CAN PYTHON MASTER	7	Jan 30, 2023
Is there in memory tree structure faster than std::map?	11	Aug 12, 2012
If(strcmp(str, "") == 0) - What does this line of code mean?	0	Aug 8, 2022
HCaptcha - How to stop page from refreshing on submit if captcha is not checked/validated	1	Aug 29, 2023
Help to decide which sw to use	1	Nov 18, 2023
min/max by value - 25% faster	10	Feb 22, 2012

Which if faster?

Ganesh

Michael

Salt_Peter

Frederick Gotham

Howard

Rolf Magnus

Howard

Frederick Gotham

Victor Bazarov

Frederick Gotham

Victor Bazarov

Howard

Howard

Clark S. Cox III

Victor Bazarov

Clark S. Cox III

Victor Bazarov

Ganesh

BobR

Frederick Gotham

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads