Vectors vs Arrays performance

Udit Sharma · Jan 2, 2010

Hi,

I am a newbie as far as C++ is concerned. I am just reading books like
Effective C++ and More Effective C++. So kindly bear with me if I have
made some very stupid mistake.

I have read in books that it is always better to use vectors rather
than arrays and that there is no performance hit and also that your
code would be more safe to use when using vectors.

I had to write a small program to compute the sum of digits in a
factorial of a number (<= 10000). I have written two versions: one
using vectors, and one using plain arrays.

But I notice a stark difference in execution times. The program using
arrays runs in about 0.309 seconds (to compute sum of digits in
10000!) while the one using vectors takes about 2.9 seconds to finish
the same. I compiled both on an Ubuntu 9.1 Linux machine running using
gcc 4.4.1 and at optimization level 3.

I have listed both the programs here. Kindly tell me if the problem is
with the way I have written the code or something else.

//Program using arrays
#include <iostream>
int main() {
unsigned int factorial = 0;
std::cout << "Enter number : ";
std::cin >> factorial;

const unsigned MAXDIGITS = 10000, BASE = 10000;
unsigned indDigits[MAXDIGITS]; //This is the array that am
talking about
indDigits[0] = 1;
unsigned int counter = 0, oanda = 0, size = 0;

for(unsigned facts = 1; facts <= factorial; facts++) {
for (unsigned index = counter; index <= size; index++){
oanda += facts * indDigits[index];
indDigits[index] = oanda%BASE;
if(index == counter && !((oanda > 0) && (oanda != BASE)))
counter++;
oanda /= BASE;
}
if(oanda)
indDigits[++size]=oanda;
}

unsigned sumDigits = 0;
for (unsigned i = 0; i <= size; i++){
for(unsigned j = 1; j <= BASE/10; j*=10)
sumDigits += (indDigits/j)%10;
}
std::cout << "Sum of Digits: " << sumDigits << std::endl;
}

//Same Program using vectors
#include <iostream>
#include <vector>
typedef std::vector<unsigned> Vec;
typedef std::vector<unsigned>::iterator Iter;

int main() {
unsigned int factorial = 0;
std::cout << "Enter number : ";
std::cin >> factorial;

const unsigned BASE = 10000;
Vec indDigits; //This is the vector that am talking about
indDigits.push_back(1);
unsigned int counter = 0, oanda = 0, size = 0;

for(unsigned facts = 1; facts <= factorial; facts++) {
oanda = 0;
for(Iter iter = indDigits.begin() + counter; iter !=
indDigits.end(); iter++){
oanda += facts * (*iter);
*iter = oanda%BASE;
if((iter == indDigits.begin() + counter) && !((oanda > 0)
&& (oanda != BASE)))
counter++;
oanda /= BASE;
}
if(oanda)
indDigits.push_back(oanda);
}

unsigned sumDigits = 0;
for (Iter iter = indDigits.begin(); iter != indDigits.end(); iter+
+){
for(unsigned j = 1; j <= BASE/10; j*=10)
sumDigits += ((*iter)/j)%10;
}
std::cout << "Sum of Digits: " << sumDigits << std::endl;
}

Thanks in advance.

Cheers
Udit

Udit Sharma · Jan 2, 2010

Hi,

I noticed that there are some formatting (additional tabs and spaces)
errors above.

You can also check the same code at

http://codepad.org/P6OoPyMO (for arrays version)

and

http://codepad.org/1z7K9wTZ (for vectors version)

I am sorry about the inconvenience.

Cheers
Udit

Eric BÃ¶se-Wolf · Jan 2, 2010

Hi,

running the script

http://codepad.org/AfkHj09t

I get the output

http://codepad.org/1HUP1jid

with

g++ (GCC) 4.4.1

Yours sincerely,

Eric

Leclerc · Jan 2, 2010

Hi,

(as you already might have concluded, from Eric's post)

are you sure you are measuring speed of optimized build; note '-O3' in
Eric's script.

The other thing you might do is to reserve in advance memory for vector.
Read documentation about std::vector::reserve, and std::vector:

ush_back

best regards,
Gordan

James Kanze · Jan 2, 2010

Accessing a vector's contents via operator[] will always be
slightly slower than accessing a local (stack based) array via
[] as you have to dereference a pointer with the vector
version.

That's bullshit. You have to dereference a pointer in both
cases. In the case of indexing, you have to calculate the
address (both for stack based and for std::vector), and because
these calculations are hidden in some functions with std::vector
(and probably do bounds checking as well), it's harder for the
optimizer to get its hands on them. On the other hand, if
you're using iterators, your code with std::vector is basically
what the optimizer will do with indexing into a built-in array.

So it all depends on the compiler. What will make a difference,
generally, is that when debug information is turned on and the
optimizer is turned off, most compilers will not inline. And
everything you do in std::vector involves a function call; if
that's not inlined, you could very well end up paying in
performance.

James Kanze · Jan 2, 2010

news:d215cd97-7eb5-4b38-bafb-bada3175bc0e@k17g2000yqh.googlegroups.com...

Click to expand...

Accessing a vector's contents via operator[] will always be
slightly slower than accessing a local (stack based) array
via [] as you have to dereference a pointer with the vector
version.

Click to expand...

That's bullshit. You have to dereference a pointer in both
cases. In the case of indexing, you have to calculate the
address (both for stack based and for std::vector), and
because these calculations are hidden in some functions with
std::vector (and probably do bounds checking as well), it's
harder for the optimizer to get its hands on them. On the
other hand, if you're using iterators, your code with
std::vector is basically what the optimizer will do with
indexing into a built-in array.

Click to expand...

It is not bullshit, in the case of the stack array it is
simply an offset from the current stack pointer, no
dereference required,

You can't access the value without having its address. And the
value is in memory, so accessing it involves dereferencing the
address.

take a look at assembler output if you do not believe me, you
should see an extra instruction for the dereference of vector
operator[] compared to the stack array []. If you are using a
pointer/reference to the array then they will be the same.

int main()
{
volatile int output;
int a[10];
std::vector<int> v(10);
_asm int 3;
output = a[0];
_asm int 3;
output = v[0];
_asm int 3;
}

What on earth is the volatile doing in there? And the _asm?

outputs (VC++ Release Build):

00020 cc int 3
; 19 : output = a[0];
00021 8b 44 24 20 mov eax, DWORD PTR _a$[esp+72]
00025 89 44 24 0c mov DWORD PTR _output$[esp+72], eax
00029 cc int 3
; 21 : output = v[0];
0002a 8b 44 24 14 mov eax, DWORD PTR _v$[esp+76]
0002e 8b 08 mov ecx, DWORD PTR [eax]
00030 89 4c 24 0c mov DWORD PTR _output$[esp+72], ecx
00034 cc int 3

Notice the extra instruction?

I noticed a lot of extra instructions: a volatile, and some
_asm, to begin with.

If you'd have read the rest of my posting, you'd have seen that
I explained what is actually relevant, not some hypothetical
case which doesn't correspond to anything.

Balog Pal · Jan 2, 2010

James Kanze said:
You can't access the value without having its address. And the
value is in memory, so accessing it involves dereferencing the
address.

Come on, the difference must be evident with any stack-using
implementation -- that is the most common. For direct array on the stack
tha base address is already in the stack or frame poitner, and need no
extra load. For the other, the base is at arbitrary locaton that must be
loaded in a register with an extra register.

The assy presented shows exactly that. Whether that single extra load, and
associated new area in cache actually measures in an application is a
different question, the difference may get masked entirely by a stall, but
it may as well show.

take a look at assembler output if you do not believe me, you
should see an extra instruction for the dereference of vector
operator[] compared to the stack array []. If you are using a
pointer/reference to the array then they will be the same.

Click to expand...

int main()
{
volatile int output;
int a[10];
std::vector<int> v(10);
_asm int 3;
output = a[0];
_asm int 3;
output = v[0];
_asm int 3;
}

Click to expand...

What on earth is the volatile doing in there? And the _asm?

Irreelvant to the point, but without such tricks tMSVC would just generate
an empty return for main and remove all the code.

I noticed a lot of extra instructions: a volatile, and some
_asm, to begin with.

it's

mov eax, DWORD PTR _a$[esp+72]

vs.

mov ecx, DWORD PTR _v$[esp+76]
mov eax, DWORD PTR [ecx]

If there is an offset calculation, it would add to the last line
respectively, one extra fetch is mandatory.

If you'd have read the rest of my posting, you'd have seen that
I explained what is actually relevant, not some hypothetical
case which doesn't correspond to anything.

Your above-quoted reafoning is simply flawed.
The need to dereference *a* pointer in both cases does not imply that
dereference can be made with the same aount of instructions/code. And it in
not a hypothesis, just the simple fact of life.

TANSTAAFL, extra indirections generally comes with some cost. That we
probably accept for the benefits on the other side, and should not deny.

Udit Sharma · Jan 3, 2010

Hi,

(as you already might have concluded, from Eric's post)

are you sure you are measuring speed of optimized build; note '-O3' in
Eric's script.

The other thing you might do is to reserve in advance memory for vector.
Read documentation about std::vector::reserve, and std::vector:ush_back

best regards,
Gordan

Hi Leclerc,

I am sorry. I noticed that including the -O3 flag achieved almost the
same results for both the arrays and vector code. I had chosen the
"High Optimization" flag in Code Blocks IDE but somehow that hasn't
been passed to the compiler underneath. Manual compilation like Eric
and you had suggested clearly brings the performance on both the
versions to the same level.

Thanks.

Cheers
Udit

Udit Sharma · Jan 3, 2010

You are comparing an array that is allocated once and never expanded

with a vector that is expanded and realocated a large number of times.
In other words, you forgot
inDigits.reserve(MAXDIGITS);

David,

Even without the reserve the Vectors version ran as fast as the arrays
version (well almost). I guess then doing this would surely improve
the performane.

But I have one more question. If we know the number of elements that
we are going to store then is it always better to reserve that size
before hand itself?

The reason why I am asking this is because I had read something to the
contrary in Stanley Lipmmann's C++ Primer.

Here is the exact quote (C++ Primer, Page 92, Key Concepts : Vectors
grow dynamically)

"A central property of vectors (and the other library containers) is
that they are required to be implemented so that it is efficient to
add elements to them at run time. Because vectors grow efficiently, it
is usually best to let the vector grow by adding elements to it
dynamically as the element values are known".

I have looked up some online references for more information about
reserve. Here is my understanding. Please correct me if I am wrong.

"When we add elements to the vector the compiler adds an instruction
to push the element into the contiguous block of memory allocated for
the vector. If this memory is not enough then the compiler adds
instructions to allocate a bigger blok of memory (I read that this
increases by a factor of 2) and then moves all the elements from the
previous location to the newer location and then the previously
allocated memory is freed. So in my program since the above mentioned
steps would be done quite a few times, it un-necessarily adds to the
run time and hence my program would be slower".

If what I have said above is true then why does Stanley Lippmann
suggest that it is always better to let vectors grow dynamically?

I hope this question is related to this thread. If it is not let me
know and I will repost it as a new topic.

Cheers
Udit

Miles Bader · Jan 3, 2010

Leigh Johnston said:
OK, to be absolutely clear when using operator[] std::vector has an
*extra* dereference in addition to the actual deference to get an
element's value compared to an array *on the stack* accessed *locally
within the same scope* (not via a pointer/reference to it). Clear now?

In many uses, this difference is irrelevant though, because the cost of the
first dereference can be amortized over many actual uses (hoisted out of
loops etc), and the "important" code ends up being exactly the same.

E.g., if you use this code (surrounding code omitted):

__asm ("a");
for (unsigned i = 0; i < 10; i++)
a = i;
__asm ("b");
for (unsigned i = 0; i < 10; i++)
v = i;
__asm ("c");

then with gcc 4.4.2 (-O3) you get:

#APP
# 7 "t.cc" 1
a
# 0 "" 2
#NO_APP
movl $0, (%rsp)
movl $1, 4(%rsp)
movl $2, 8(%rsp)
movl $3, 12(%rsp)
movl $4, 16(%rsp)
movl $5, 20(%rsp)
movl $6, 24(%rsp)
movl $7, 28(%rsp)
movl $8, 32(%rsp)
movl $9, 36(%rsp)
#APP
# 10 "t.cc" 1
b
# 0 "" 2
#NO_APP
movl $1, 4(%rax)
movl $2, 8(%rax)
movl $3, 12(%rax)
movl $0, (%rax)
movl $4, 16(%rax)
movl $5, 20(%rax)
movl $6, 24(%rax)
movl $7, 28(%rax)
movl $8, 32(%rax)
movl $9, 36(%rax)
#APP
# 13 "t.cc" 1
c
# 0 "" 2

Obviously compiler dependent, but any decent compiler should do
something similar. That's one reason why C++ vectors are nice -- they
offer some nice additional functionality, but aren't significantly
slower than primitive arrays.

-Miles

Jan · Jan 3, 2010

OK, to be absolutely clear when using operator[] std::vector has an *extra*
dereference in addition to the actual deference to get an element's value
compared to an array *on the stack* accessed *locally within the same scope*
(not via a pointer/reference to it). Clear now?

As you can see from the assembler output between the breakpoints the array
[] access involves one less instruction than the vector operator[]. A
std::vector object contains a pointer to the controlled sequence, not the
case for an array on the stack accessed locally from the same scope.

Can you please explain the above two quotes slightly more clearly to
me? I am not familiar with assembly language programming.

In the examples that have been given above both the vector and the
array have been assigned in local scope. So why is there an additional
instruction overhead to access the elements in the vector but not so
in the array?

Is it because a vector is an whole object by itself (different from
the elements contained within it) and that the contiguous block of
memory within it is just a sub part of it and that an additional
instruction is needed to get to this sub part?

Cheers
Udit

Bo Persson · Jan 3, 2010

Udit said:
David,

Even without the reserve the Vectors version ran as fast as the
arrays version (well almost). I guess then doing this would surely
improve the performane.

But I have one more question. If we know the number of elements that
we are going to store then is it always better to reserve that size
before hand itself?

The reason why I am asking this is because I had read something to
the contrary in Stanley Lipmmann's C++ Primer.

Here is the exact quote (C++ Primer, Page 92, Key Concepts : Vectors
grow dynamically)

"A central property of vectors (and the other library containers) is
that they are required to be implemented so that it is efficient to
add elements to them at run time. Because vectors grow efficiently,
it is usually best to let the vector grow by adding elements to it
dynamically as the element values are known".

I have looked up some online references for more information about
reserve. Here is my understanding. Please correct me if I am wrong.

"When we add elements to the vector the compiler adds an instruction
to push the element into the contiguous block of memory allocated
for the vector. If this memory is not enough then the compiler adds
instructions to allocate a bigger blok of memory (I read that this
increases by a factor of 2) and then moves all the elements from the
previous location to the newer location and then the previously
allocated memory is freed. So in my program since the above
mentioned steps would be done quite a few times, it un-necessarily
adds to the run time and hence my program would be slower".

If what I have said above is true then why does Stanley Lippmann
suggest that it is always better to let vectors grow dynamically?

When someone says "always", he really means "almost always".

But you are right, unless the pushing is the central part of your
program you will not notice the difference. So don't bother reserving
unless you can see a good reason for it.

Bo Persson

Carlo Milanesi · Jan 3, 2010

Udit said:
David,

Even without the reserve the Vectors version ran as fast as the arrays
version (well almost). I guess then doing this would surely improve
the performane.

If we know the number of elements that
we are going to store then is it always better to reserve that size
before hand itself?

I think so.

The reason why I am asking this is because I had read something to the
contrary in Stanley Lipmmann's C++ Primer.

Here is the exact quote (C++ Primer, Page 92, Key Concepts : Vectors
grow dynamically)

"A central property of vectors (and the other library containers) is
that they are required to be implemented so that it is efficient to
add elements to them at run time. Because vectors grow efficiently, it
is usually best to let the vector grow by adding elements to it
dynamically as the element values are known".

If what I have said above is true then why does Stanley Lippmann
suggest that it is always better to let vectors grow dynamically?

Lippmann wrote: "it is usually best" not "it is always better".
But I think he meant that it is usually wrong to set the "size" of the
vector bigger than needed. In that sentence, he said nothing about the
"capacity" of the vector. The function "reserve" changes the capacity,
not the size.
Nevertheless, setting the size of a vector beforehand may increase
performance in some cases. Yours is one of them.

> If this memory is not enough then the compiler adds
> instructions to allocate a bigger blok of memory

The compiler adds all the needed instructions at compile time. If the
vector capacity is not enough, the instructions to allocate a bigger
block of memory are actually executed.

> (I read that this increases by a factor of 2)

Not necessarily. For example, it may be multiplied by 7 / 4 and then
rounded upwardly to the nearest power of two.

James Kanze · Jan 3, 2010

OK, to be absolutely clear when using operator[] std::vector
has an *extra* dereference in addition to the actual deference
to get an element's value compared to an array *on the stack*
accessed *locally within the same scope* (not via a
pointer/reference to it). Clear now?

It's clear that you're claiming something that is in general
false.

Consider first the simple case:

int
main()
{
int a1[10];
for ( int i = 0 ; i < 10; ++ i ) {
a1[ i ] = i;
}
int* a2 = new int[ 10 ];
for ( int i = 0 ; i < 10; ++ i ) {
a2[ i ] = i;
}
// additional code which uses the arrays, to ensure
// that the compiler doesn't optimize them out.
}

Any decent compiler should generate exactly the same code for both
loops.

The second loop, of course, basically corresponds to what goes
on under the hood in std::vector, at least with regards to the
indexing. The main difference is that all of the operations in
std::vector are, in fact, function calls. Which, depending on
the compiler, may make recognizing the pattern more difficult.

[...]

As you can see from the assembler output between the
breakpoints the array [] access involves one less instruction
than the vector operator[].

In one particular case, on one particular machine, with one
particular compiler. It's certainly not a general case. And
it's not a realistic case because of the volatile and the inline
assembler (both of which affect compiler optimization
significantly).

A std::vector object contains a pointer to the controlled
sequence, not the case for an array on the stack accessed
locally from the same scope.

Both require the compiler to access through a pointer. Both
require some sort of pointer arithmetic. Beyond that, anything
you can say depends on the compiler and the machine hardware.
And the environment in which the array access occurs, which
affects how the compiler optimizes.

As mentioned in my original reply this extra instruction does
not count significantly to the timing differences observed so
other causes have to be considered (debug instead of release
build, optimizations turned off, extra bounds checking etc.)

Exactly. Other considerations which generally mean that there
won't be an extra instruction.

Aside: most sane implementations will inline std::vector<T>:perator[].

I think all do

. On the other hand, depending on optimization
levels and other compiler options, the compiler may or may not
respect the request to inline. (Without optimization, I don't
think g++ ever inlines. But I could be wrong about that.)

James Kanze · Jan 3, 2010

Come on, the difference must be evident with any stack-using
implementation -- that is the most common.

Having actually implemented compilers for stack based
implementations, it's quite evident to me that the difference
depends, and that the extra dereference should actually be quite
rare.

For direct array on the stack the base address is already in
the stack or frame poitner, and need no extra load. For the
other, the base is at arbitrary locaton that must be loaded in
a register with an extra register.

Most of the time, on most hardware, the base pointer for the
array will also already be in a register. Unless you've got a
very bad compiler. (It's a little more difficult for Intel
architecture, because of the paucity of registers. But
Microsoft's C 1.0 managed most of the times, so it's hardly
leading edge, even on Intel.)

The assy presented shows exactly that.

The assembly shows that when you surround a single access with
inline assembler and accesses to a volatile, the compiler
doesn't optimize very well. Something I was aware of before
even looking at the assembler.

Whether that single extra load, and associated new area in
cache actually measures in an application is a different
question, the difference may get masked entirely by a stall,
but it may as well show.

It won't show, because it won't be present in any critical
paths.

Why does everyone suppose that compilers are so stupid? This
sort of optimization has been present in most compilers for well
over 20 years now.

take a look at assembler output if you do not believe me, you
should see an extra instruction for the dereference of vector
operator[] compared to the stack array []. If you are using a
pointer/reference to the array then they will be the same.
int main()
{
volatile int output;
int a[10];
std::vector<int> v(10);
_asm int 3;
output = a[0];
_asm int 3;
output = v[0];
_asm int 3;
}

Click to expand...

What on earth is the volatile doing in there? And the _asm?

Click to expand...

Irreelvant to the point,

Very relevant to the point, because they create an environment
in which the compiler won't optimize.

but without such tricks tMSVC would just generate
an empty return for main and remove all the code.

Probably. But ensuring that no optimization is possible isn't
very realistic either, since in real code, optimization is
possible.

Note too that the old rule: "never trust a benchmark you didn't
falsify yourself" holds here as well. If I wanted to "prove"
that an extra instruction was required, I know how to write a
"realistic" benchmark that would do it. Except that 1) it would
really only prove the point for a given compiler on a given
architecture, and 2) "realistic" is in the eyes of the beholder:
while volatile and inline assembler certainly aren't realistic,
who knows where you really have to draw the line. (FWIW: my
case would use a virtual function returning a from a private
member array. In my experience, virtual functions are really
great optimization blockers for a lot of compilers---including
VC++. But is it realistic to have a virtual function which just
makes one access to a single element? That probably depends on
the application domain, but I've never seen it; whenever I'm
using an array, I'm accessing a large number of elements
locally, which means that the compiler will have hoisted the
pointer to the array data into a register. Regardless of where
that pointer comes from.)

I noticed a lot of extra instructions: a volatile, and some
_asm, to begin with.

Click to expand...

it's

Click to expand...

mov eax, DWORD PTR _a$[esp+72]

vs.

Click to expand...

mov ecx, DWORD PTR _v$[esp+76]
mov eax, DWORD PTR [ecx]

Click to expand...

Yes, but that code wouldn't necessarily be generated if there
wasn't a volatile nor any inline assembler.

If there is an offset calculation, it would add to the last line
respectively, one extra fetch is mandatory.

Click to expand...

One extra fetch is mandatory IF your code is twisted enough to
inhibit all compiler optimizations. In practice, the real code
I've seen isn't (and the architecture hasn't been Intel, which
changes things as well), and pointers to the array have always
been in a register.

[...]

TANSTAAFL, extra indirections generally comes with some cost.
That we probably accept for the benefits on the other side,
and should not deny.

Click to expand...

Except that the extra indirection isn't alway, or even usually,
present. If you pass the array to a function, and access it
there, for example, there is no difference between an array on
the stack and an array on the heap. (There may be a difference
between a C style array and std::vector, because many compilers
systematically pass class type objects in memory, but pointers
in a register.) If you use the array several times in the same
function, the compiler will also hoist the address calculations
up to before the first use---you don't get an extra access for
each use. And so on. The only way you can compare is to
implement both versions, with real code, and time them.

James Kanze · Jan 3, 2010

[...]

But I have one more question. If we know the number of
elements that we are going to store then is it always better
to reserve that size before hand itself?

It's generally not worth the effort, but if performance becomes
an issue, it's one of the simplest first steps to try.

The reason why I am asking this is because I had read
something to the contrary in Stanley Lipmmann's C++ Primer.

Here is the exact quote (C++ Primer, Page 92, Key Concepts :
Vectors grow dynamically)

"A central property of vectors (and the other library
containers) is that they are required to be implemented so
that it is efficient to add elements to them at run time.
Because vectors grow efficiently, it is usually best to let
the vector grow by adding elements to it dynamically as the
element values are known".

Certainly, It's best not to worry about it (since the
performance should be "reasonable") until the optmizer tells you
otherwise.

Note too that reserve does *not* grow the vector. It simply
ensures that the memory is pre-allocated so that growing the
vector won't require reallocation. (And most of the time I use
it, it is to guarantee lifetime of iterators, which may be
invalidated by reallocation, rather than for performance
reasons.)

I have looked up some online references for more information
about reserve. Here is my understanding. Please correct me if
I am wrong.

"When we add elements to the vector the compiler adds an
instruction to push the element into the contiguous block of
memory allocated for the vector. If this memory is not enough
then the compiler adds instructions to allocate a bigger blok
of memory (I read that this increases by a factor of 2) and
then moves all the elements from the previous location to the
newer location and then the previously allocated memory is
freed. So in my program since the above mentioned steps would
be done quite a few times, it un-necessarily adds to the run
time and hence my program would be slower".

Well, to start with, the compiler doesn't add any instructions;
they're all there to begin with

. But the code does execute
more or less instructions.

If what I have said above is true then why does Stanley
Lippmann suggest that it is always better to let vectors grow
dynamically?

Because the size increase is exponential, which means that over
a long period of time, push_back has amortized constant time,
and isn't expensive enough to worry about. (Until the profiler
tells you otherwise.)

Once the profiler indicates that you have a problem here, then
reserve is the simplest solution. For objects which are cheap
to copy, another alternative is resize, then use [] rather than
push_back. Logically, this would seem slower, but at least in
the one trial I did (i.e. for that one machine, with that
particular compiler and implementation of the library), resize
and [] were faster, despite actually initializing each cell
twice.

Udit Sharma · Jan 3, 2010

James and Leigh,

Thanks a lot for your clarifications. It helped me understand things
better.

James,

When I said compiler adds instructions, I actually meant that it
places it somewhere in the executable for it to be used when a bigger
block of memory needs to be created :-D. Am sorry that what I said
earlier was a bit misleading about compilers adding instructions at
run time. Pardon my English.

Cheers
Udit

Balog Pal · Jan 3, 2010

James Kanze said:
Having actually implemented compilers for stack based
implementations, it's quite evident to me that the difference
depends, and that the extra dereference should actually be quite
rare.

Well, this far you did not say anything that would explain the last part --
how you reach the "quite rare" figure.

The situation is like having points A anc C, case one: go stratight line
A->C, case two: tough point B first. B amy happen to be on the line, or
moved there by some tricks -- or the whole trip may be ignored in the full
travel context, but the difference still is one way, and its being zero
depends on a bunch of 'chance' factors.

And those factors are the kind IMO you can't tell.

Most of the time, on most hardware, the base pointer for the
array will also already be in a register. Unless you've got a
very bad compiler. (It's a little more difficult for Intel
architecture, because of the paucity of registers. But
Microsoft's C 1.0 managed most of the times, so it's hardly
leading edge, even on Intel.)

To actually have that, the compiler must:
- inline vector's constructor or last resize() operation that sets the data
address
- the arbitrary code that follows until the next index op shall not use all
the available data registers
- if the code has any of vector's operations those also must all be inlined,
or the compiler use special knowledge that they do not change the state

To me it looks like too many assumes tied to arbitrary/unknown code.
Your example in another post just keeps that code portion as nothing -- I
agree that for that case we can expect no extra load. But that case is too
tweaked to lay a 'proof'.

The assembly shows that when you surround a single access with
inline assembler and accesses to a volatile, the compiler
doesn't optimize very well. Something I was aware of before
even looking at the assembler.

In the presented case the compiler did a good job, and I'm sure it would
produce the same code for many different layouts using other tricks against
removal.

It won't show, because it won't be present in any critical
paths.

How can you know critical paths of unknown/arbitrary code?

Why does everyone suppose that compilers are so stupid? This
sort of optimization has been present in most compilers for well
over 20 years now.

Why you assume we think that? In many post here and around I'm the one
usually pointing out that healthy optimizer shall treat different-looking
expression the same.

The (good) optimizer can discover many things, and remove redundant stuff --
but is not magic. The vector case uses extra resource, and depends on much.
Also, the 'save' amount is about the lowest possible, a single pointer
fetch. A smart optimizer's first pick to drop if there is any doubt.

Probably. But ensuring that no optimization is possible isn't
very realistic either, since in real code, optimization is
possible.

Yeah -- subject to (at least) conditions mentioned above.

Note too that the old rule: "never trust a benchmark you didn't
falsify yourself" holds here as well. If I wanted to "prove"
that an extra instruction was required, I know how to write a
"realistic" benchmark that would do it.

Thought we are discussing an 'equivelence' issue, and be serioeus about it,
not a contest to cheat each other ot the public.

Except that 1) it would
really only prove the point for a given compiler on a given
architecture, and 2) "realistic" is in the eyes of the beholder:
while volatile and inline assembler certainly aren't realistic,
who knows where you really have to draw the line.

In general, possibly, for this case those were irrelevant and did not mess
with the relevant content.

(FWIW: my
case would use a virtual function returning a from a private
member array. In my experience, virtual functions are really
great optimization blockers for a lot of compilers---including
VC++.

Huh? Of course they are -- there is a call the compiler can't follow, and
the called function has license to change anything, including a private
member vector's state. For most practical cases even if you use const like
crazy.

And there are many similar 'blockers'. That is why I don't get your claim
for 'rare', my educated guess would calculate with at least one extra fetch
somewhere -- and call its omissin a case of good luck.

But is it realistic to have a virtual function which just
makes one access to a single element?

Click to expand...

Err, I think I lost you, as I got it originally the function may do aything,
and it is arbitrary. And having vector members and virtual functions is
hardly unrealistic. (Even with my position that in general nonvirtyual
functions outnumber virtuals by faaaar.

That probably depends on
the application domain, but I've never seen it; whenever I'm
using an array, I'm accessing a large number of elements
locally, which means that the compiler will have hoisted the
pointer to the array data into a register. Regardless of where
that pointer comes from.)

Click to expand...

IMO we all agreed that it *can* be hoisted. Also, that even if it isn't the
overall speed difference is likely amortized. And that using vector has too
many advantages to even think of old-style arrays without a very good and
profiled reason.

Yet, denying away the difference is IMO not fair -- the users are better
aware of the difference.

If there is an offset calculation, it would add to the last line
respectively, one extra fetch is mandatory.

Click to expand...

One extra fetch is mandatory IF your code is twisted enough to
inhibit all compiler optimizations. In practice, the real code
I've seen isn't (and the architecture hasn't been Intel, which
changes things as well), and pointers to the array have always
been in a register.

[...]

TANSTAAFL, extra indirections generally comes with some cost.
That we probably accept for the benefits on the other side,
and should not deny.

Click to expand...

Except that the extra indirection isn't alway, or even usually,
present. If you pass the array to a function, and access it
there, for example, there is no difference between an array on
the stack and an array on the heap.

Click to expand...

If you pass it the same way, that is... Then the difference is restricted to
the call site (and is exactly as discussed this far). If you pass the
vector by reference, then it remains. If you pass iterators, then you depend
on their implementation, some use plain pointers leading to identical code,
others may use fatties.

(There may be a difference
between a C style array and std::vector, because many compilers
systematically pass class type objects in memory, but pointers
in a register.) If you use the array several times in the same
function, the compiler will also hoist the address calculations
up to before the first use---you don't get an extra access for
each use. And so on.

Click to expand...

Yeah, sure. Stated *that* way there is nothing to argue.

Claiming the difference as 'bullshit' is different.

Nick Keighley · Jan 4, 2010

Accessing a vector's contents via operator[] will always be
slightly slower than accessing a local (stack based) array via
[] as you have to dereference a pointer with the vector
version.

Click to expand...

That's bullshit. You have to dereference a pointer in both
cases. In the case of indexing, you have to calculate the
address (both for stack based and for std::vector), and because
these calculations are hidden in some functions with std::vector
(and probably do bounds checking as well),

I thought operator[] didn't do bounds checking (or not usually) whilst
at() did.

James Kanze · Jan 4, 2010

We can come up with examples and counter-examples all day long.

Exactly. That's what I've been saying from the beginning. You
can't say absolutely that there is an extra access. There might
be. There might not be. It all depends on the actual use, the
compiler and the underlying architecture.

Consider this:

int main()
{
std::vector<int> v;
v.reserve(1000);
lib::vecarray<int, 1000> va;
{
timer t("vector: ");
for (int i = 0; i != 1000000; ++i)
{
for (int j = 0; j != 1000; ++j)
v.push_back(j);
v.clear();
}
}
{
timer t("vecarray: ");
for (int i = 0; i != 1000000; ++i)
{
for (int j = 0; j != 1000; ++j)
va.push_back(j);
va.clear();
}
}
}

Which outputs (several runs done, VC++ release build, secure
STL disabled):

vector: 2.9741 seconds
vecarray: 2.6782 seconds

Which indicates (for this example) that a stack based container
(http://i42.co.uk/stuff/vecarray.htm) is 11% quicker than std::vector.

For this example, with this compiler, on this machine. In
practice, I've had similar results with Sun CC on a Sparc, so
your measurements aren't unique. But in this case, you're
measuring the time necessary to grow the array, which is not the
same thing as the time necessary for an access.

If you'll remember, your original statement was: "Accessing a
vector's contents via operator[] will always be slightly slower
than accessing a local (stack based) array via [] as you have to
dereference a pointer with the vector version." That's the only
statement I'm disagreeing with, and in that statement, what I'm
really disagreeing with is the *always*. Because in most cases,
I think that the "dereference a pointer" simply won't be present
in the individual accesses. Or it will be present with the
"stack based array", because you'll have passed it to a
function. Your statement simply doesn't correspond to the
reality in actual programs.

How do i Do this function(dealing with arrays)	1	Dec 10, 2021
printing vectors	4	Nov 10, 2010
Crossword	2	May 11, 2020
Arrays vs. vectors - performance	12	Sep 17, 2003
Filter sober in c++ don't pass test	0	Dec 2, 2023
SENTINEL CONTROL LOOP WHEN DEALING WITH TWO ARRAYS	1	Oct 26, 2023
Fibonacci	0	May 13, 2023
vector of vectors	3	Jan 2, 2010

Vectors vs Arrays performance

Udit Sharma

Udit Sharma

Eric BÃ¶se-Wolf

Leclerc

James Kanze

James Kanze

Balog Pal

Udit Sharma

Udit Sharma

Miles Bader

Jan

Bo Persson

Carlo Milanesi

James Kanze

James Kanze

James Kanze

Udit Sharma

Balog Pal

Nick Keighley

James Kanze

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads