Use of std::vector slower than arrays?


mike3

Hi.

I was writing a bignum package in C++. I noticed that a version of an
addition routine I had written using std::vector was a lot slower than
one using simple "C-style" arrays, and this
raises the question: is using std::vector slower than using arrays???

This is the code for the simple C-style add function using C-style
arrays and pointers.

// Element-by-element array add.
Digit arradd(Digit *a, Digit *b, Digit *c, int len)
{
    // set up pointers
    Digit *ap(a), *bp(b), *cp(c);

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bp + *cp + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bp) : (tmp < *bp); // check overflow/carry
        *ap = tmp;                                  // set result
        ap++; bp++; cp++;                           // increment pointers
    }

    return(carry);
}

This is the code for the one using std::vector:

// Addition using std::vector.
Digit vecadd(std::vector<Digit> *a,
             const std::vector<Digit> *b,
             const std::vector<Digit> *c,
             int len)
{
    // set up iterators
    std::vector<Digit>::iterator ai(a->begin());
    std::vector<Digit>::const_iterator bi(b->begin());
    std::vector<Digit>::const_iterator ci(c->begin());

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bi + *ci + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bi) : (tmp < *bi); // check overflow/carry
        *ai = tmp;                                  // set result
        ai++; bi++; ci++;                           // increment iterators
    }

    return(carry);
}

(Note: These routines assume all inputs are of the same length.
That's because they're just for testing purposes. Also, "Digit"
is a machine-word length type, for my machine it'd be 32 bits
long, so each one represents 32 bits of the number. The ordering
is little-endian, so the least-significant digit comes first.)
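
(Why the carry test works: unsigned addition wraps modulo 2^32, so a
sum that overflowed comes out smaller than either operand. A short
illustration, assuming a 32-bit Digit:)

Digit b = 0xFFFFFFFFUL, c = 2UL;
Digit tmp = b + c;       // wraps around to 1
Digit carry = (tmp < b); // 1: overflow occurred
// With an incoming carry, tmp can land exactly on b (when c + carry
// wraps to 0), which is why the test becomes "<=" in that branch.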

The timings show the simple array-based approach takes 4
seconds for 100 million operations on 128-bit numbers on my
machine. The one with std::vector, though, takes a whopping
40-41 seconds -- a 10x performance hit! OUCH!!! Why is that?
Is there any way to get around that? I was told(*) it is better to use
std::vector than arrays, as it encapsulates the low-level
pointer stuff, making the software less prone to bugs. But in this
program, I need speed.

(*) See this post:
http://groups.google.com/group/comp.lang.c++/msg/a66278b77597d648
 

Michael

I was writing a bignum package in C++. I noticed that a version of an
addition routine I had written using std::vector was a lot slower than
one using simple "C-style" arrays, and this
raises the question: is using std::vector slower than using arrays???

Likely there is not a general answer other than "measure it." (With
the possible caveat "the standard says nothing about speed.")

In practical terms, I've noticed that debug builds under Visual C++
are dog-slow with STL-intensive tasks, presumably because it does all
sorts of extra checking to make sure you aren't writing past the end
of a vector or that kind of thing. The release builds are quite fast
though.

Michael
 

Mark P

mike3 said:
Hi.

I was writing a bignum package in C++. I noticed that a version of an
addition routine I had written using std::vector was a lot slower than
one using simple "C-style" arrays, and this
raises the question: is using std::vector slower than using arrays???

Are you using an optimized build on a reasonably recent / decent compiler?
This is the code for the simple C-style add function using C-style
arrays and pointers.

// Element-by-element array add.
Digit arradd(Digit *a, Digit *b, Digit *c, int len)
{
    // set up pointers
    Digit *ap(a), *bp(b), *cp(c);

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bp + *cp + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bp) : (tmp < *bp); // check overflow/carry
        *ap = tmp;                                  // set result
        ap++; bp++; cp++;                           // increment pointers
    }

    return(carry);
}

This is the code for the one using std::vector:

// Addition using std::vector.
Digit vecadd(std::vector<Digit> *a,
             const std::vector<Digit> *b,
             const std::vector<Digit> *c,
             int len)
{
    // set up iterators
    std::vector<Digit>::iterator ai(a->begin());
    std::vector<Digit>::const_iterator bi(b->begin());
    std::vector<Digit>::const_iterator ci(c->begin());

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bi + *ci + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bi) : (tmp < *bi); // check overflow/carry
        *ai = tmp;                                  // set result
        ai++; bi++; ci++;                           // increment iterators
    }

    return(carry);
}

(Note: These routines assume all inputs are of the same length.
That's because they're just for testing purposes. Also, "Digit"
is a machine-word length type, for my machine it'd be 32 bits
long, so each one represents 32 bits of the number. The ordering
is little-endian, so the least-significant digit comes first.)

The timings show the simple array-based approach takes 4
seconds for 100 million operations on 128-bit numbers on my
machine. The one with std::vector, though, takes a whopping
40-41 seconds -- a 10x performance hit! OUCH!!! Why is that?
Is there any way to get around that? I was told(*) it is better to use
std::vector than arrays, as it encapsulates the low-level
pointer stuff, making the software less prone to bugs. But in this
program, I need speed.

It's hard to say given what little you've shown us. We can't even see
how you're handling memory allocation. Try posting a complete program
(or two) that demonstrates the slowdown you've observed.
 

mike3

Likely there is not a general answer other than "measure it." (With
the possible caveat "the standard says nothing about speed.")

In practical terms, I've noticed that debug builds under Visual C++
are dog-slow with STL-intensive tasks, presumably because it does all
sorts of extra checking to make sure you aren't writing past the end
of a vector or that kind of thing. The release builds are quite fast
though.

Michael

Ooh. I was using the Microsoft C++ compiler, by the way. I measured,
like I said in the post, and it came out to 4 seconds for 100,000,000
operations on a simple C-like array of "Digit" 128 bits long, while an
std::vector of the same length and the same # of operations came out
to a whopping 40, an utterly atrocious amount of time! What can be
done? Should I write my own class to encapsulate the array that allows
for both "safe", slow access (with all the bounds-checking and stuff)
and "unsafe", fast access (ie. lets you obtain a pointer directly to
the contents of the array and use it)? Or is there still a way to use
std::vector for this application?
 

mike3

Mark P wrote:
It's hard to say given what little you've shown us. We can't even see
how you're handling memory allocation. Try posting a complete program
(or two) that demonstrates the slowdown you've observed.

Well, here's a complete testing program, then.

---------------------------------------------------

// Performance comparison of std::vector vs C-style array for
// bignum arithmetic.
#include <iostream>
#include <iomanip>
#include <vector>
#include <time.h>

using namespace std;

typedef unsigned long Digit; // should be equal to the machine's wordsize

// Note: All arrays are stored in little-endian format.

// Test routine: Adds two digit strings together, stored as C arrays.
Digit arradd(Digit *a, Digit *b, Digit *c, int len)
{
    // set up pointers
    Digit *ap(a), *bp(b), *cp(c);

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bp + *cp + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bp) : (tmp < *bp); // check overflow/carry
        *ap = tmp;                                  // set result
        ap++; bp++; cp++;                           // increment pointers
    }

    return(carry);
}

// Test routine: Adds two digit strings together, stored with vector.
Digit vecadd(vector<Digit> *a,
             const vector<Digit> *b,
             const vector<Digit> *c,
             int len)
{
    // set up iterators
    vector<Digit>::iterator ai(a->begin());
    vector<Digit>::const_iterator bi(b->begin());
    vector<Digit>::const_iterator ci(c->begin());

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bi + *ci + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bi) : (tmp < *bi); // check overflow/carry
        *ai = tmp;                                  // set result
        ai++; bi++; ci++;                           // increment iterators
    }

    return(carry);
}

// The main routine.
int main()
{
    // Create the digit arrays and vectors.
    Digit a[4], b[4];
    vector<Digit> av, bv;

    // Set up the vectors' memory storage.
    av.reserve(4);
    bv.reserve(4);

    // Set the test values.
    a[0] = 0x4B9F0101UL; av.push_back(0x4B9F0101UL); // LSD
    a[1] = 0x56092310UL; av.push_back(0x56092310UL);
    a[2] = 0x56238413UL; av.push_back(0x56238413UL);
    a[3] = 0x44042121UL; av.push_back(0x44042121UL); // MSD
    b[0] = 0x4B9F0101UL; bv.push_back(0x4B9F0101UL); // LSD
    b[1] = 0x56092310UL; bv.push_back(0x56092310UL);
    b[2] = 0x56238413UL; bv.push_back(0x56238413UL);
    b[3] = 0x44042121UL; bv.push_back(0x44042121UL); // MSD

    // Now run the tests.
    time_t startt, endt;
    clock_t startcpu, endcpu;

    // Test 1
    cout << "Test 1: 100 million 128-bit additions using C arrays..."
         << endl;

    startt = time(0);
    startcpu = clock();

    for(int i(0); i < 100000000; i++)
        arradd(a, a, b, 4); // a = (a + b) % (1<<128)

    endcpu = clock();
    endt = time(0);

    cout << "Test 1 complete." << endl;
    cout << "CPU time elapsed: "
         << (endcpu - startcpu) / CLOCKS_PER_SEC << " seconds" << endl;
    cout << "Wall time elapsed: " << difftime(endt, startt)
         << " seconds" << endl;
    cout << "Final sum: " << hex << uppercase << setfill('0');
    for(int i(3); i >= 0; i--)
        cout << setw(8) << a[i] << " ";
    cout << dec << nouppercase << endl;
    cout << endl;

    // Test 2
    cout << "Test 2: 100 million 128-bit additions using vector..."
         << endl;

    startt = time(0);
    startcpu = clock();

    for(int i(0); i < 100000000; i++)
        vecadd(&av, &av, &bv, 4); // av = (av + bv) % (1<<128)

    endcpu = clock();
    endt = time(0);

    cout << "Test 2 complete." << endl;
    cout << "CPU time elapsed: "
         << (endcpu - startcpu) / CLOCKS_PER_SEC << " seconds" << endl;
    cout << "Wall time elapsed: " << difftime(endt, startt)
         << " seconds" << endl;
    cout << "Final sum: " << hex << uppercase << setfill('0');
    for(int i(3); i >= 0; i--)
        cout << setw(8) << av.at(i) << " ";
    cout << dec << nouppercase << endl;
    cout << endl;

    // Done.
    return 0;
}

---------------------------------------------------
 

Kai-Uwe Bux

mike3 said:
Ooh. I was using the Microsoft C++ compiler, by the way.

What about the optimization settings?
I measured,
like I said in the post, and it came out to 4 seconds for 100,000,000
operations on a simple C-like array of "Digit" 128 bits long, while an
std::vector of the same length and the same # of operations came out
to a whopping 40, an utterly atrocious amount of time! What can be
done? Should I write my own class to encapsulate the array that allows
for both "safe", slow access (with all the bounds-checking and stuff)
and "unsafe", fast access (ie. lets you obtain a pointer directly to
the contents of the array and use it)? Or is there still a way to use
std::vector for this application?

I venture the conjecture that your std::vector does bounds checking. Chances
are that it will disappear if you define NDEBUG and turn on optimization.
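
For example (standard optimization/define flags; exact spelling varies
by compiler version, and "perf.cpp" is just a stand-in name for the
test file):

cl /EHsc /O2 /DNDEBUG perf.cpp        (Visual C++)
g++ -O3 -DNDEBUG -o perf perf.cpp     (g++)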


Best

Kai-Uwe Bux
 

Daniel T.

mike3 said:
The timings show the simple array-based approach takes 4
seconds for 100 million operations on 128-bit numbers on my
machine. The one with std::vector, though, takes a whopping
40-41 seconds -- a 10x performance hit! OUCH!!! Why is that?

Because you did not optimize the build and/or did not remove the debug
checks from std::vector. Different compilers do things differently in
this regard. Check your documentation.
 

mike3

Because you did not optimize the build and/or did not remove the debug
checks from std::vector. Different compilers do things differently in
this regard. Check your documentation.

Well, I tried it with maximum optimization on, and that
got the time down to a nice 6 seconds, however the array-
based routine went down to a smoking-fast 2, so there's still a
3x performance gain from array use. I tried turning off the
debug, but that didn't seem to change it. Would it be a good
idea then to write my own encapsulation of the digit array,
with both "safe", slow access modes and "unsafe", fast ones
(ie. you could request a pointer to the digit array contained
in the thing and use it like a C-style array), then use this for
the time-critical areas (the bignum package) and std::vector
for everything else? Or would that just be silly and a waste
of work?
 

Jim Langston

mike3 said:
Hi.

I was writing a bignum package in C++. I noticed that a version of an
addition routine I had written using std::vector was a lot slower than
one using simple "C-style" arrays, and this
raises the question: is using std::vector slower than using arrays???

This is the code for the simple C-style add function using C-style
arrays and pointers.

// Element-by-element array add.
Digit arradd(Digit *a, Digit *b, Digit *c, int len)
{
    // set up pointers
    Digit *ap(a), *bp(b), *cp(c);

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bp + *cp + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bp) : (tmp < *bp); // check overflow/carry
        *ap = tmp;                                  // set result
        ap++; bp++; cp++;                           // increment pointers
    }

    return(carry);
}

This is the code for the one using std::vector:

// Addition using std::vector.
Digit vecadd(std::vector<Digit> *a,
             const std::vector<Digit> *b,
             const std::vector<Digit> *c,
             int len)
{
    // set up iterators
    std::vector<Digit>::iterator ai(a->begin());
    std::vector<Digit>::const_iterator bi(b->begin());
    std::vector<Digit>::const_iterator ci(c->begin());

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bi + *ci + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bi) : (tmp < *bi); // check overflow/carry
        *ai = tmp;                                  // set result
        ai++; bi++; ci++;                           // increment iterators
    }

    return(carry);
}

(Note: These routines assume all inputs are of the same length.
That's because they're just for testing purposes. Also, "Digit"
is a machine-word length type, for my machine it'd be 32 bits
long, so each one represents 32 bits of the number. The ordering
is little-endian, so the least-significant digit comes first.)

The timings show the simple array-based approach takes 4
seconds for 100 million operations on 128-bit numbers on my
machine. The one with std::vector, though, takes a whopping
40-41 seconds -- a 10x performance hit! OUCH!!! Why is that?
Is there any way to get around that? I was told(*) it is better to use
std::vector than arrays, as it encapsulates the low-level
pointer stuff, making the software less prone to bugs. But in this
program, I need speed.

(*) See this post:
http://groups.google.com/group/comp.lang.c++/msg/a66278b77597d648

I did some tests on my system and had dismal results also, but in release
things are a bit better, but not much.

Using your algorithm the array based was 6 seconds, the vector based was 29
seconds.

Using vectors with the array based algorithm
arradd(&av[0], &av[0], &bv[0], 4);
is 16 seconds.

I tried all the optimizations I could find but couldn't get an improvement
over these values.
 

mike3

"mike3" <[email protected]> wrote in message

I did some tests on my system and had dismal results also, but in release
things are a bit better, but not much.

Using your algorithm the array based was 6 seconds, the vector based was 29
seconds.

And I doubt there is any faster algorithm than this to
add numbers, so it would seem the best thing to do here
would be to do away with std::vector. However I've heard
that isn't so good, but then again, sometimes there are
such things as "necessary evils", and it looks like this
might just be one of them. If you read the post that
I linked to, you can see the discussion about the
"abstraction gap" in my previous attempt at an
implementation of the bignum package and the fractal
program, which is why I've been so hesitant to touch a
direct pointer. I'm afraid that if anyone else sees
the new code of the full bignum package, I'll look
like a doofus for refusing to use std::vector in there.
Using vectors with the array based algorithm
arradd(&av[0], &av[0], &bv[0], 4);
is 16 seconds.

I tried all the optimizations I could find but couldn't get an improvement
over these values.

So then might it be a good idea to, perhaps, write my own
encapsulator for the digit arrays that provides both slow
"safe" element handling _and_ allows one to fetch a pointer
that one can use for fast, "unsafe" handling? And only
use the "unsafe" direct pointer method in the performance-
critical areas of the program, which only makes up a small
portion of the total codebase (four routines: add, sub, mul,
div.) It does not need to be as complicated and general as
std::vector; it would just need load/store, resize, and a way
to obtain a direct pointer. Would this be acceptable,
or would it be a no-no?
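
(A minimal sketch of what such an encapsulator might look like; the
class name and interface here are made up for illustration, with Digit
being the typedef from the test program:)

#include <cstddef>
#include <vector>

typedef unsigned long Digit;

// Digit-array wrapper: checked access for general code, raw-pointer
// access for the few performance-critical bignum routines.
class DigitArray
{
public:
    explicit DigitArray(std::size_t n) : digits_(n, 0) {}

    // "Safe", bounds-checked element handling (throws on a bad index).
    Digit get(std::size_t i) const { return digits_.at(i); }
    void set(std::size_t i, Digit d) { digits_.at(i) = d; }

    void resize(std::size_t n) { digits_.resize(n, 0); }
    std::size_t size() const { return digits_.size(); }

    // "Unsafe", fast access: a raw pointer into the storage, usable
    // like a C-style array. Invalidated by resize.
    Digit *data() { return &digits_[0]; }
    const Digit *data() const { return &digits_[0]; }

private:
    std::vector<Digit> digits_;
};

The four hot routines would then take the raw pointers, e.g.
arradd(x.data(), y.data(), z.data(), len), while everything else
stays on the checked interface.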
 

Ian Collins

mike3 said:
Well, here's a complete testing program, then.
<snip>

Just for fun (I had to increase the loops to get meaningful data):

Test 1: 1000 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 5 seconds
Wall time elapsed: 5seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01

Test 2: 1000 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 15 seconds
Wall time elapsed: 16seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01
 

Markus Moll

mike3 said:
Well, here's a complete testing program, then.

I ran your testing program (with 300 million iterations) compiled with GCC. It
gives varying results, but on average it looks like this:


$ ./perf
Test 1: 300 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 4 seconds
Wall time elapsed: 5seconds
Final sum: 4C03923341D2421 60447D54AEE9D13 6027024086C5310 54835F77C23A401

Test 2: 300 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 4 seconds
Wall time elapsed: 4seconds
Final sum: 4C03923341D2421 60447D54AEE9D13 6027024086C5310 54835F77C23A401

Note that you dereference three extra pointers in your loop. If the compiler
isn't smart enough to optimize those away, it's not a fair comparison.
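
(One way to remove that indirection -- a sketch, not code from the
original post: pass the vectors by reference so the iterators come
straight off them. Digit and the includes are as in the test program.)

// vecadd variant taking references: no extra pointer dereference is
// needed to reach the vectors themselves.
Digit vecadd_ref(std::vector<Digit> &a,
                 const std::vector<Digit> &b,
                 const std::vector<Digit> &c,
                 int len)
{
    std::vector<Digit>::iterator ai(a.begin());
    std::vector<Digit>::const_iterator bi(b.begin());
    std::vector<Digit>::const_iterator ci(c.begin());

    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bi + *ci + carry);
        carry = carry ? (tmp <= *bi) : (tmp < *bi);
        *ai = tmp;
        ++ai; ++bi; ++ci;
    }

    return carry;
}

// Call site: vecadd_ref(av, av, bv, 4);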

Markus
 

Michael DOUBEZ

Ian Collins wrote:
<snip>

Just for fun (I had to increase the loops to get meaningful data):

Test 1: 1000 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 5 seconds
Wall time elapsed: 5seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01

Test 2: 1000 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 15 seconds
Wall time elapsed: 16seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01

Just for fun with g++ 3.4.1:
--------------------------------------
$ g++ vectorTest.cpp -o vectorTest
$ ./vectorTest
Test 1: 10 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 5 seconds
Wall time elapsed: 5 seconds
Final sum: 5D495F5B A2C3F794 86D31DED DE4E1781

Test 2: 10 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 25 seconds
Wall time elapsed: 26 seconds
Final sum: 5D495F5B A2C3F794 86D31DED DE4E1781
------------------------------------------
$ g++ vectorTest.cpp -o vectorTest -O3
$ ./vectorTest
Test 1: 10 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 3 seconds
Wall time elapsed: 3 seconds
Final sum: 5D495F5B A2C3F794 86D31DED DE4E1781

Test 2: 10 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 3 seconds
Wall time elapsed: 3 seconds
Final sum: 5D495F5B A2C3F794 86D31DED DE4E1781

Michael
 

Ian Collins

Markus said:
I ran your testing program (with 300 mio. iterations) compiled with GCC. It
gives varying results, but on average it looks like this:


$ ./perf
Test 1: 300 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 4 seconds
Wall time elapsed: 5seconds
Final sum: 4C03923341D2421 60447D54AEE9D13 6027024086C5310 54835F77C23A401

Test 2: 300 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 4 seconds
Wall time elapsed: 4seconds
Final sum: 4C03923341D2421 60447D54AEE9D13 6027024086C5310 54835F77C23A401

Note that you dereference three extra pointers in your loop. If the compiler
isn't smart enough to optimize those away, it's not a fair comparison
Looks like you built it 64-bit and didn't adjust the values...
 

James Kanze

Because you did not optimize the build and/or did not remove
the debug checks from std::vector. Different compilers do
things differently in this regard. Check your documentation.
mike3 said:
Well, I tried it with maximum optimization on, and that
got the time down to a nice 6 seconds, however the array-
based routine went down to a smoking-fast 2, so there's still a
3x performance gain from array use. I tried turning off the
debug, but that didn't seem to change it.

I think you mentioned earlier that you use VC++. I seem to
recall hearing that you need to add some special options (/D
something) to turn off all of the debug checking; that just
setting the usual options for an optimized build isn't
sufficient. (No guarantee, but try /D_SECURE_SCL=0.)
Would it be a good idea then to write my own encapsulation of
the digit array, with both "safe", slow access modes and
"unsafe", fast ones (ie. you could request a pointer to the
digit array contained in the thing and use it like a C-style
array), then use this for the time-critical areas (the bignum
package) and std::vector for everything else? Or would that
just be silly and a waste of work?

It depends. If you want to be 100% sure of not losing any cycles,
regardless of the library implementation, then obviously, you'll
have to write your own. Of course, if you want to be 100% sure
that the code is really optimized, you'll have to write your own
compiler as well, or maybe write the critical parts in
assembler. (In assembler, for example, you'll be able to access
the carry bit directly, rather than needing some additional, ad
hoc test.)

Most of the time, of course, it should be sufficient to just
figure out how to get the best performance from your compiler.
Depending on the quality of your library implementation, you may
not be able to get top performance from it, and end up writing
your own, but I doubt that this will be a problem with the VC++
implementation. Finding out how to turn off all of the internal
checking might be, however.
 

mathieu

Ooh. I was using the Microsoft C++ compiler, by the way. I measured,
like I said in the post, and it came out to 4 seconds for 100,000,000
operations on a simple C-like array of "Digit" 128 bits long, while an
std::vector of the same length and the same # of operations came out
to a whopping 40, an utterly atrocious amount of time! What can be
done? Should I write my own class to encapsulate the array that allows
for both "safe", slow access (with all the bounds-checking and stuff)
and "unsafe", fast access (ie. lets you obtain a pointer directly to
the contents of the array and use it)? Or is there still a way to use
std::vector for this application?


Hi Mike,

You are not specifying which compiler version you are using. There
is one (the C++ .NET edition, I think) that is pretty cheap, but won't do
optimization. You need to get the VCFreeTools (you can still find them
on the net). Or try VS2005. std::vector is implemented on top of a
C-style array, so there should be no difference. Check your C++
compiler documentation.

CMake (http://cmake.org) uses these flags for a release build:
/MD /O2 /Ob2 /D NDEBUG


HTH
-Mathieu
 

James Kanze

"mike3" <[email protected]> wrote in message

[...]
I did some tests on my system and had dismal results also, but
in release things are a bit better, but not much.
Using your algorithm the array based was 6 seconds, the vector
based was 29 seconds.
Using vectors with the array based algorithm
arradd(&av[0], &av[0], &bv[0], 4);
is 16 seconds.
I tried all the optimizations I could find but couldn't get an
improvement over these values.

Which compilers, and which options?

I gave the complete version he posted a quick try on the systems
I have at hand (a PC with VC++ 2005 and g++ 3.3.3---the CygWin
port, but I don't think that affects CPU bound programs much---,
an Intel PC with Linux and g++ 4.1.0, and a Sun Sparc with g++
4.1.0 and Sun CC 5.8---Sun CC comes with two different library
implementations). The results are interesting, to say the
least, and do indicate just how much the answer is
implementation dependent. I multiplied the loop count by 10 to
get measurable times. My compiler options were:

VC++: cl /vmg /GR /EHs /D_SECURE_SCL=0 /O2
g++: g++ -O3
Sun CC: CC -O4 and CC -library=stlport4 -O4

The results are interesting, to say the least:

                   array   vector
Windows PC:
  VC++               22       22
  g++                30       38

Linux PC:
  g++                33       42

Sun Sparc:
  g++                73       77
  Sun standard       60      172
  Sun stlport        60      165

As we can see, g++ doesn't optimize as well as the other
compilers (not too surprising---portability has its price), for
example, and either the library implementations available with
Sun CC are very bad, or Sun CC has problems optimizing "more
advanced" code (or both---the library implementations *are*
very, very bad, however).

It's also interesting to note that the abstraction penalty with
g++ depends on the architecture: vector is 27% slower on a PC,
but only 5% slower on a Sun Sparc.

And VC++ deserves an accolade, both for the quality of its
library and the quality of its compiler. And a criticism: I had
to grunge around in the sources to find the _SECURE_SCL define;
while it didn't take me more than a couple of minutes, that's
because I'm used to this sort of thing---it really should be
documented somewhere. (Maybe it is, but I don't know how to
find it.)

Of course, if you want any real speed for this, you'll rewrite
the inner loop in assembler, where you have direct access to the
carry flag (and an add instruction which takes it into account,
so you don't have to test it anyway).
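
(On compilers that provide the x86 _addcarry_u32 intrinsic -- Visual
C++ via <intrin.h>, recent g++/clang via <immintrin.h> -- much the
same effect is available without inline assembler. A sketch under that
assumption; arradd_adc is a made-up name, and the digits are unsigned
int because that is the type the intrinsic operates on:)

#include <immintrin.h> // Visual C++: <intrin.h>

// Add-with-carry via _addcarry_u32, which compiles down to the ADC
// instruction: the hardware carry flag replaces the comparison trick.
unsigned char arradd_adc(unsigned int *a, const unsigned int *b,
                         const unsigned int *c, int len)
{
    unsigned char carry = 0;
    for(int i = 0; i < len; i++)
        carry = _addcarry_u32(carry, b[i], c[i], &a[i]);
    return carry; // final carry out, as in arradd
}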
 

Daniel T.

mike3 said:
Well, I tried it with maximum optimization on, and that
got the time down to a nice 6 seconds, however the array-
based routine went down to a smoking-fast 2, so there's still a
3x performance gain from array use. I tried turning off the
debug, but that didn't seem to change it.

Then you missed something. There is absolutely no reason, based on the
code you showed us, that one routine would be faster than the other.

If you are using Visual C++, then check this out:
(http://msdn2.microsoft.com/en-us/library/aa985965(VS.80).aspx). Maybe
it will help.
Would it be a good
idea then to write my own encapsulation of the digit array,
with both "safe", slow access modes and "unsafe", fast ones
(ie. you could request a pointer to the digit array contained
in the thing and use it like a C-style array), then use this for
the time-critical areas (the bignum package) and std::vector
for everything else? Or would that just be silly and a waste
of work?

The latter. :)
 

Keith Willis

<snip>

Just for fun (I had to increase the loops to get meaningful data):

Test 1: 1000 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 5 seconds
Wall time elapsed: 5seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01

Test 2: 1000 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 15 seconds
Wall time elapsed: 16seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01

I know it's all nonsense, but I wanted to play too :)

Test 1: 100 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 4 seconds
Wall time elapsed: 9 seconds
Final sum: 40B88F68 5468071F 3DECEFB8 0675E201

Test 2: 100 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 4 seconds
Wall time elapsed: 9 seconds
Final sum: 40B88F68 5468071F 3DECEFB8 0675E201

This is on a sparc running Solaris 10, with the test compiled with g++
v3.4.5 with -O3.
 

Guest


[...]
I did some tests on my system and had dismal results also, but
in release things are a bit better, but not much.
Using your algorithm the array based was 6 seconds, the vector
based was 29 seconds.
Using vectors with the array based algorithm
arradd(&av[0], &av[0], &bv[0], 4);
is 16 seconds.
I tried all the optimizations I could find but couldn't get an
improvement over these values.

Which compilers, and which options?

[snip benchmark]
And VC++ deserves an accolade, both for the quality of its
library and the quality of its compiler. And a criticism: I had
to grunge around in the sources to find the _SECURE_SCL define;
while it didn't take me more than a couple of minutes, that's
because I'm used to this sort of thing---it really should be
documented somewhere. (Maybe it is, but I don't know how to
find it.)

It is quite well documented if you know where to look (search for
checked iterators at MSDN); the problem is that they do not tell you
about it on the pages describing the containers and iterators.
Of course, if you want any real speed for this, you'll rewrite
the inner loop in assembler, where you have direct access to the
carry flag (and an add instruction which takes it into account,
so you don't have to test it anyway).

MS would recommend using intrinsics (SIMD instructions), but of course
by then you have left everything called portability behind. And I have
no idea whether they would be useful for this purpose.
 
