C++ is Slow?

nw

Hi all,

I'm constantly confronted with the following two techniques, which I
believe often produce less readable code, but which I am told are faster
and therefore better. Can anyone help me out with counter-examples and
arguments?

1. The STL is slow.

More specifically vector. The argument goes like this:

"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:

const int xdim=100;
const int ydim=100;

int *myarray = malloc(xdim*ydim*sizeof(int));

and accessed like this:

myarray[xdim*ypos+xpos] = avalue;

Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).

To me this syntax looks horrible, am I wrong? Is vector the wrong
container to use? (My usual solution would be a vector<vector<int> >).
Would using a valarray help?

2. iostream is slow.

I've encountered this at work recently. I'd not considered it before;
I like the syntax and don't do much IO generally... but I'm now
starting to process terabytes of data, so it will become an issue. Is
iostream slow? Specifically, I encountered the following example while
googling around. The stdio version runs in around 1 second; the
iostream version takes 8 seconds. Is this just down to a poor iostream
implementation? (gcc 4 on OS X.) Or are there reasons why iostream is
fundamentally slower for certain operations? Are there things I should
be keeping in mind to speed up IO?

// stdio version
#include <cstdio>
using namespace std;
const int NULA = 0;
int main (void) {
    for( int i = 0; i < 100000000; ++i )
        printf( "a" );
    return NULA;
}

//cout version
#include <iostream>
using namespace std;
const int NULA = 0;
int main (void) {
    std::ios_base::sync_with_stdio(false);

    for( int i = 0; i < 100000000; ++i )
        cout << "a" ;
    return NULA;
}
 
Victor Bazarov

nw said:
I'm constantly confronted with the following two techniques, which I
believe often produce less readable code, but I am told are faster
therefore better. Can anyone help me out with counter examples and
arguments?

1. The STL is slow.

More specifically vector. The argument goes like this:

"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:

const int xdim=100;
const int ydim=100;

int *myarray = malloc(xdim*ydim*sizeof(int));

and accessed like this:

myarray[xdim*ypos+xpos] = avalue;

Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).

If the tests you've performed don't show any significant difference,
then the argument is not reasonable. The only reasonable argument
is the results of the test and your standards for significance.
To me this syntax looks horrible, am I wrong? Is vector the wrong
container to use? (My usual solution would be a vector<vector<int> >).
Would using a valarray help?

You can wrap your N-dimensional dynamic memory in a class and
add an overloaded operator () with N arguments, which will make
the syntax more acceptable.
2. iostream is slow.

I've encountered this at work recently. I'd not considered it before;
I like the syntax and don't do much IO generally... but I'm now
starting to process terabytes of data, so it will become an issue. Is
iostream slow? Specifically, I encountered the following example while
googling around. The stdio version runs in around 1 second; the
iostream version takes 8 seconds. Is this just down to a poor iostream
implementation? (gcc 4 on OS X.)

Most likely.
Or are there reasons why iostream is
fundamentally slower for certain operations?

There is no reason, AFAICT.
Are there things I should
be keeping in mind to speed up io?

The fewer conversions the better.
// stdio version
#include <cstdio>
using namespace std;
const int NULA = 0;
int main (void) {
for( int i = 0; i < 100000000; ++i )
printf( "a" );
return NULA;
}

//cout version
#include <iostream>
using namespace std;
const int NULA = 0;
int main (void) {
std::ios_base::sync_with_stdio(false);

for( int i = 0; i < 100000000; ++i )
cout << "a" ;
return NULA;
}

V
 
Alf P. Steinbach

* nw:
Hi all,

I'm constantly confronted with the following two techniques, which I
believe often produce less readable code, but I am told are faster
therefore better. Can anyone help me out with counter examples and
arguments?

1. The STL is slow.

More specifically vector. The argument goes like this:

"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:

const int xdim=100;
const int ydim=100;

int *myarray = malloc(xdim*ydim*sizeof(int));

and accessed like this:

myarray[xdim*ypos+xpos] = avalue;

Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).

To me this syntax looks horrible, am I wrong?

The syntax is horrible, and the code is brittle.

Is vector the wrong
container to use?

No, not necessarily. Doing the above with vector gives you about the
same performance, but it is less unsafe (no manual memory management)
and more convenient. Using a library matrix class is better still.

(My usual solution would be a vector<vector<int> >).
Would using a valarray help?

I don't think valarray would help; on the contrary, I suspect. Also
consider that valarray is essentially a not-quite-complete,
not-quite-kosher part of the standard library, and one that will
likely never be completed.


2. iostream is slow.

I've encountered this at work recently. I'd not considered it before;
I like the syntax and don't do much IO generally... but I'm now
starting to process terabytes of data, so it will become an issue. Is
iostream slow?

Depends very much on the implementation. As a general rule, naive code
using C library i/o will be faster than equally naive code using
iostreams, or at least it was that way some years ago. However, you can
probably speed up things considerably for iostreams by using less naive
code, essentially buying speed by paying in complexity and code size.

Specifically, I encountered the following example while
googling around. The stdio version runs in around 1 second; the
iostream version takes 8 seconds. Is this just down to a poor iostream
implementation? (gcc 4 on OS X.) Or are there reasons why iostream is
fundamentally slower for certain operations? Are there things I should
be keeping in mind to speed up IO?

// stdio version
#include <cstdio>
using namespace std;
const int NULA = 0;
int main (void) {
for( int i = 0; i < 100000000; ++i )
printf( "a" );
return NULA;
}

//cout version
#include <iostream>
using namespace std;
const int NULA = 0;
int main (void) {
std::ios_base::sync_with_stdio(false);

for( int i = 0; i < 100000000; ++i )
cout << "a" ;
return NULA;
}

Guideline: reserve all uppercase names for macros.

The C++ macro denoting success return value for main is EXIT_SUCCESS.


Cheers, & hth.,

- Alf
 
Ioannis Vranos

nw said:
Hi all,

I'm constantly confronted with the following two techniques, which I
believe often produce less readable code, but I am told are faster
therefore better. Can anyone help me out with counter examples and
arguments?

1. The STL is slow.

More specifically vector. The argument goes like this:

"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:

const int xdim=100;
const int ydim=100;

int *myarray = malloc(xdim*ydim*sizeof(int));

and accessed like this:

myarray[xdim*ypos+xpos] = avalue;

Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).

To me this syntax looks horrible, am I wrong? Is vector the wrong
container to use? (My usual solution would be a vector<vector<int> >).
Would using a valarray help?


In normal cases you should use vector as you mentioned.

2. iostream is slow.

I've encountered this at work recently. I'd not considered it before;
I like the syntax and don't do much IO generally... but I'm now
starting to process terabytes of data, so it will become an issue. Is
iostream slow? Specifically, I encountered the following example while
googling around. The stdio version runs in around 1 second; the
iostream version takes 8 seconds. Is this just down to a poor iostream
implementation? (gcc 4 on OS X.) Or are there reasons why iostream is
fundamentally slower for certain operations? Are there things I should
be keeping in mind to speed up IO?

// stdio version
#include <cstdio>
using namespace std;
const int NULA = 0;
int main (void) {
for( int i = 0; i < 100000000; ++i )
printf( "a" );
return NULA;
}

//cout version
#include <iostream>
using namespace std;
const int NULA = 0;
int main (void) {
std::ios_base::sync_with_stdio(false);

for( int i = 0; i < 100000000; ++i )
cout << "a" ;
return NULA;
}

I suppose it is an implementation issue; the two programs should take
about the same time. If you remove
"std::ios_base::sync_with_stdio(false);", is the output slower?


Also check the compiler optimisation switches; they can improve
run-time performance.
 
Daniel T.

nw said:
I'm constantly confronted with the following two techniques, which I
believe often produce less readable code, but I am told are faster
therefore better. Can anyone help me out with counter examples and
arguments?

1. The STL is slow.

More specifically vector. The argument goes like this:

Vector is no slower than manual dynamic array allocation. As for the
other containers, I once challenged an office mate (a C programmer with
extensive knowledge of assembler) to write a doubly linked list class
that was faster than the std::list implementation that came with the
compiler. He claimed he could do it because he was able to optimize his
list to work specifically with the data we were storing. He insisted
that std::list couldn't possibly be as fast because it was "too
general". Despite his best efforts, std::list was a full 5% faster than
his code... using his own test suite!
"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:

const int xdim=100;
const int ydim=100;

int *myarray = malloc(xdim*ydim*sizeof(int));

and accessed like this:

myarray[xdim*ypos+xpos] = avalue;

Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).

To me this syntax looks horrible, am I wrong? Is vector the wrong
container to use? (My usual solution would be a vector<vector<int> >).

I wouldn't use a vector<vector<int> > unless I needed a ragged array.
Even then, I would likely bury it in a class so I can change the
container without having to edit the entire code base.

Look at the Matrix class in the FAQ. Something like this:

#include <vector>

template < typename T >
class Matrix {
public:
    Matrix(unsigned rows, unsigned cols) :
        cols_( cols ),
        data_( rows * cols )
    { }

    T& operator() (unsigned row, unsigned col)
    {
        // might want to consider error checking.
        return data_[cols_*row + col];
    }

    const T& operator() (unsigned row, unsigned col) const
    {
        // might want to consider error checking.
        return data_[cols_*row + col];
    }

    // other methods to taste
private:
    unsigned cols_;
    std::vector<T> data_;
};

It's very simple to use:

Matrix<int> myarray( xdim, ydim);

accessed like:

myarray( xpos, ypos ) = avalue;
 
2. iostream is slow.

I've encountered this is work recently. I'd not considered it before,
I like the syntax and don't do so much IO generally... I'm just now
starting to process terabytes of data, so it'll become an issue. Is
iostream slow?

This I don't know about. I don't deal with the iostream library much.
 
Ioannis Vranos

nw said:
More specifically vector. The argument goes like this:
"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:

const int xdim=100;
const int ydim=100;

int *myarray = malloc(xdim*ydim*sizeof(int));

and accessed like this:

myarray[xdim*ypos+xpos] = avalue;

Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).

To me this syntax looks horrible, am I wrong? Is vector the wrong
container to use? (My usual solution would be a vector<vector<int> >).
Would using a valarray help?


I think your code is incorrect. Your approach corrected:


#include <cstdlib>

int main()
{
    using namespace std;

    const int XDIM= 100;

    const int YDIM= 200;


    int (*my_array)[YDIM]= static_cast<int (*)[YDIM]>( malloc(XDIM* YDIM* sizeof(**my_array)) );

    if(my_array== 0)
        return EXIT_FAILURE;

    for(size_t i= 0; i< XDIM; ++i)
        for(size_t j= 0; j< YDIM; ++j)
            my_array[i][j]= i+j;


    // ...

    free(my_array);

    // ...
}


The equivalent C++ style:


#include <cstdlib>

int main()
{
    using namespace std;

    const int XDIM= 100;

    const int YDIM= 200;


    int (*my_array)[YDIM]= new int[XDIM][YDIM];

    for(size_t i= 0; i< XDIM; ++i)
        for(size_t j= 0; j< YDIM; ++j)
            my_array[i][j]= i+j;

    delete[] my_array;
}



The proper C++ approach:

#include <cstdlib>
#include <vector>

int main()
{
    using namespace std;

    const int XDIM= 100;

    const int YDIM= 200;

    vector<vector<int> > my_array(XDIM, vector<int>(YDIM));

    for(vector<vector<int> >::size_type i= 0; i< my_array.size(); ++i)
        for(vector<int>::size_type j= 0; j< my_array[i].size(); ++j)
            my_array[i][j]= i+j;


    // ...

    // No need to clean up your memory or any other resource
    // - RAII (Resource Acquisition Is Initialisation)

}
 
Daniel T.

Ioannis Vranos said:
The proper C++ approach:
vector<vector<int> > my_array(XDIM, vector<int>(YDIM));

That is not necessarily the proper approach. The above creates XDIM + 1
separate blocks of memory, which may or may not be a good idea.
 
terminator

nw said:
More specifically vector. The argument goes like this:
"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:
const int xdim=100;
const int ydim=100;
int *myarray = malloc(xdim*ydim*sizeof(int));
and accessed like this:
myarray[xdim*ypos+xpos] = avalue;
Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).
To me this syntax looks horrible, am I wrong? Is vector the wrong
container to use? (My usual solution would be a vector<vector<int> >).
Would using a valarray help?

I think your code is incorrect. Your approach corrected:

#include <cstdlib>

int main()
{
   using namespace std;

   const int XDIM= 100;

   const int YDIM= 200;

    int (*my_array)[YDIM]= static_cast<int (*)[YDIM]>( malloc(XDIM* YDIM* sizeof(**my_array)) );

    if(my_array== 0)
        return EXIT_FAILURE;

    for(size_t i= 0; i< XDIM; ++i)
        for(size_t j= 0; j< YDIM; ++j)
           my_array[i][j]= i+j;

        // ...

    free(my_array);

        // ...

}

The equivalent C++ style:

#include <cstdlib>

int main()
{
   using namespace std;

   const int XDIM= 100;

   const int YDIM= 200;

    int (*my_array)[YDIM]= new int[XDIM][YDIM];

    for(size_t i= 0; i< XDIM; ++i)
        for(size_t j= 0; j< YDIM; ++j)
           my_array[i][j]= i+j;

    delete[] my_array;
}

The proper C++ approach:

#include <cstdlib>
#include <vector>

int main()
{
   using namespace std;

   const int XDIM= 100;

   const int YDIM= 200;

   vector<vector<int> > my_array(XDIM, vector<int>(YDIM));

   for(vector<vector<int> >::size_type i= 0; i< my_array.size(); ++i)
        for(vector<int>::size_type j= 0; j< my_array[i].size(); ++j)
           my_array[i][j]= i+j;

          // ...

   // No need to clean up your memory or any other resource
   // - RAII (Resource Acquisition Is Initialisation)



}


I would rather:

class myclass{
    vector<int> data;
public:
    myclass(int Xdim, int Ydim):
        data(Xdim*Ydim) // instead of: int data[Ydim*Xdim];
    {}
    // ...
};

regards,
FM.
 
Jim Langston

nw said:
Hi all,

I'm constantly confronted with the following two techniques, which I
believe often produce less readable code, but I am told are faster
therefore better. Can anyone help me out with counter examples and
arguments?

1. The STL is slow.

More specifically vector. The argument goes like this:

"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:

const int xdim=100;
const int ydim=100;

int *myarray = malloc(xdim*ydim*sizeof(int));

and accessed like this:

myarray[xdim*ypos+xpos] = avalue;

Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).

To me this syntax looks horrible, am I wrong? Is vector the wrong
container to use? (My usual solution would be a vector<vector<int> >).
Would using a valarray help?

std::vector is not necessarily slower than a manual way of doing the same
thing. Consider your manual malloc, for example. You could do the same
thing with a std::vector:

std::vector<int> myarray( xdim * ydim );
myarray[xdim*ypos + xpos] = avalue;

A std::vector is simply a dynamic array and has all the limitations of a
dynamic array. A
std::vector<std::vector<int> >
isn't necessarily going to be faster, or slower, than a manual way of doing
it, such as an array of pointers which are allocated for each row of data.

If you find that a std::vector<std::vector<int> > is a bottleneck for you,
you could wrap a std::vector<int> in a class and get the speed benefits of a
contiguous allocated memory block without the headache of doing the math
each time. Perhaps create an at( int row, int col ) that returns a reference
to the correct element.

STL containers are generic enough to be used easily, but general enough
that sometimes you might want to optimize your code to be faster if you
need to.

I had a programmer friend tell me that the STL was horribly slow, so I had
him send me his code, and I found he was using .resize() needlessly over and
over. I restructured his code and the final version came out about as fast
as what he was doing without the STL.
2. iostream is slow.

I've encountered this at work recently. I'd not considered it before;
I like the syntax and don't do much IO generally... but I'm now
starting to process terabytes of data, so it will become an issue. Is
iostream slow? Specifically, I encountered the following example while
googling around. The stdio version runs in around 1 second; the
iostream version takes 8 seconds. Is this just down to a poor iostream
implementation? (gcc 4 on OS X.) Or are there reasons why iostream is
fundamentally slower for certain operations? Are there things I should
be keeping in mind to speed up IO?

// stdio version
#include <cstdio>
using namespace std;
const int NULA = 0;
int main (void) {
for( int i = 0; i < 100000000; ++i )
printf( "a" );
return NULA;
}

//cout version
#include <iostream>
using namespace std;
const int NULA = 0;
int main (void) {
std::ios_base::sync_with_stdio(false);

for( int i = 0; i < 100000000; ++i )
cout << "a" ;
return NULA;
}

As far as cout vs. printf, they are both output routines. I've never found
much need to optimize user output, as that is usually the bottleneck anyway:
showing information to the user and waiting for the user's response. In your
example you are stating 1 second for 100 million iterations versus 8 seconds
for 100 million iterations. Meaning each iteration takes 8/100,000,000 of a
second, i.e. 80 nanoseconds. For something that is so fast and not used that
often, do we care? You might want to look up premature optimization.
 
Ioannis Vranos

Code correction:


Ioannis said:
nw said:
More specifically vector. The argument goes like this:
"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:

const int xdim=100;
const int ydim=100;

int *myarray = malloc(xdim*ydim*sizeof(int));

and accessed like this:

myarray[xdim*ypos+xpos] = avalue;

Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).

To me this syntax looks horrible, am I wrong? Is vector the wrong
container to use? (My usual solution would be a vector<vector<int> >).
Would using a valarray help?


I think your code is incorrect. Your approach corrected:


#include <cstdlib>

int main()
{
using namespace std;

const int XDIM= 100;

const int YDIM= 200;


int (*my_array)[YDIM]= static_cast<int (*)[YDIM]>( malloc(XDIM* YDIM* sizeof(**my_array)) );

if(my_array== 0)
return EXIT_FAILURE;

for(size_t i= 0; i< XDIM; ++i)
for(size_t j= 0; j< YDIM; ++j)
my_array[i][j]= i+j;


// ...

free(my_array);

// ...
}


The equivalent C++ style:


#include <cstdlib>

int main()
{
using namespace std;

const int XDIM= 100;

const int YDIM= 200;


int (*my_array)[YDIM]= new int[XDIM][YDIM];

for(size_t i= 0; i< XDIM; ++i)
for(size_t j= 0; j< YDIM; ++j)
my_array[i][j]= i+j;

delete[] my_array;
}



The proper C++ approach:

#include <cstdlib>
#include <vector>

int main()
{
using namespace std;

const int XDIM= 100;

const int YDIM= 200;

vector<vector<int> > my_array(XDIM, vector<int>(YDIM));



for(vector<vector<int> >::size_type i= 0; i< my_array.size(); ++i)
for(vector<int>::size_type j= 0; j< my_array[i].size(); ++j)
my_array[i][j]= i+j;


// ...

// No need to clean up your memory or any other resource
// - RAII (Resource Acquisition Is Initialisation)

}
 
nw

That is, what, lets see, 1/100th is a milisecond,
1/1,000 is a nano second, 1/1,000,000 is a .. pico second? 8/100 of a pico
second? For something that is so fast and not used that often, do we care?
You might want to look up premature optimization.

That's just an example, in reality I'm looking at 19.2Gb (compressed
size) of text files, which I'll have to parse on a regular basis.
 
James Kanze

I'm constantly confronted with the following two techniques, which I
believe often produce less readable code, but I am told are faster
therefore better. Can anyone help me out with counter examples and
arguments?
1. The STL is slow.
More specifically vector. The argument goes like this:
"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:
const int xdim=100;
const int ydim=100;
int *myarray = malloc(xdim*ydim*sizeof(int));
and accessed like this:
myarray[xdim*ypos+xpos] = avalue;
Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).

It depends.

First, of course, this argument has nothing to do with
std::vector. Whether you use std::vector< int > or malloc in
this case probably won't change anything in time, and will
ensure that the memory is correctly freed in case of an
exception.

Second, whether it is faster to multiply, or to chase pointers,
depends very heavily on the machine architecture. When I did a
test like this on the original Intel 8086, chasing pointers won
hands down.

Third, whatever you do, you should wrap it in a class, so you
can change it later, if it turns out that the implementation
isn't optimal, and is creating a bottleneck.
To me this syntax looks horrible, am I wrong? Is vector the
wrong container to use? (My usual solution would be a
vector<vector<int> >). Would using a valarray help?

Wrap it in a class, and don't worry about it until the profiler
says you have to. At that point, try the different solutions on
the actual target hardware.
2. iostream is slow.
I've encountered this at work recently. I'd not considered it
before; I like the syntax and don't do much IO generally...
but I'm now starting to process terabytes of data, so it'll
become an issue. Is iostream slow?

Again, it depends on the implementation. At least one person,
in the past, created an implementation which was faster than
stdio. For the most part, however, current implementations are
"fast enough", and implementors haven't bothered improving them,
even when faster implementations exist. Regardless of what you
do, if you're processing terabytes, getting the terabytes
physically into and out of memory will dominate runtimes.
Specifically, I encountered the following example while googling
around. The stdio version runs in around 1 second; the iostream
version takes 8 seconds. Is this just down to a poor iostream
implementation? (gcc 4 on OS X.) Or are there reasons why
iostream is fundamentally slower for certain operations? Are
there things I should be keeping in mind to speed up IO?

The standards committee added support for code translation to
filebuf, which unless the implementation is very, very careful,
can slow things down significantly. In the past, Dietmar Kuehl
worked out some meta-programming technique which avoided the
cost as long as the translation was the identity function. I
don't think any implementation uses it, however.

Note that under g++ 2.95.2, which used the classical iostream,
iostream was actually faster than stdio. On the whole, though,
iostream implementations aren't as mature as those of stdio
(which, after all, has been around a long, long time). And I
don't find anywhere near that great of difference on a Sun
Sparc.
 
Erik Wikström

That's just an example, in reality I'm looking at 19.2Gb (compressed
size) of text files, which I'll have to parse on a regular basis.

Depending on what kind of processing/parsing you are going to do (and
what kind of hardware you are running) it is very likely that disk I/O
or decompression will be the bottleneck.
 
Puppet_Sock

Depending on what kind of processing/parsing you are going to do (and
what kind of hardware you are running) it is very likely that disk I/O
or decompression will be the bottleneck.

I'm going to (I hope) echo Erik's thoughts here.

Whenever you get to questions like "which is better" or "which
is faster" it is important to have a specification of what
really is better. "I like it that way" isn't a spec unless
the person saying it will give you money to do it his way.

And you need to measure it using several typical cases.
Presuming it is execution speed, you need a stopwatch.
And you need to compare various methods and see which
is faster and if it is significant. And you need to make
some tests to see what part of the app is using the time
and what parts are not making much of a difference.

Many stories would be appropriate here, but they're long-winded and
boring. Just insert the usual shaggy-dog story of a "bug" that could
not be found: execution speed was perceived as a problem, and a long
attack on optimizing the code followed, only to end when the app was
found to spend more than 90 percent of its time doing something other
than the part of the task being optimized. As Erik suggests, disk I/O
might be using most of the time, and there isn't all that much you can
do about it.

Socks
 
Juha Nieminen

Erik said:
Depending on what kind of processing/parsing you are going to do (and
what kind of hardware you are running) it is very likely that disk I/O
or decompression will be the bottleneck.

While that might sound plausible in theory, unfortunately my own
real-life experience says otherwise: when reading and parsing input and
printing formatted output (usually to a file), switching from C++
streams to C streams usually gives a considerable speedup with all the
compilers I have tried. We are talking about at least twice the
speed, if not more, which is not a small difference.

If you are reading&parsing hundreds of megabytes of input and
outputting also hundreds of megabytes, the difference could well be very
considerable (eg. 20 seconds instead of 1 minute).
 
Juha Nieminen

nw said:
Is iostream slow?

My own practical experience has shown that, for example, if you are
reading and parsing tons of formatted data (in ASCII format) and/or
outputting tons of formatted data (in ASCII), switching from C++ streams
to C streams can produce a very considerable speedup (it can be at least
twice as fast) with all compilers I have tried. I have been in several
projects where parsing of large ASCII input files was necessary,
and in each case, with different compilers, switching to C streams gave
a very large speedup.

I haven't tested what happens if you simply read/write a large block
of binary data with fread()/fwrite() or the iostream equivalents, but I
assume that in this case the difference should be minimal, if there is any.
Or are there reasons why iostream is
fundamentally slower for certain operations?

In theory, iostream could be even faster than C streams for
certain operations (eg. printf() vs. std::cout), because with the
latter type checking is performed at compile time while with the
former it's done at runtime (by parsing the format string). In
practice, however, most iostream implementations are considerably
slower than the C equivalents. One reason for this might be that most
iostream operations go through virtual functions, which generally
cannot be inlined. Another may be that compiler makers simply haven't
optimized iostream as well as the C stream functions have been.
 
James Kanze

While that might sound plausible in theory, unfortunately my
own real-life experience tells otherwise: when reading and parsing
input and printing formatted output (usually to a file),
switching from C++ streams to C streams usually gives a
considerable speedup with all the compilers I have tried.
We are talking about at least twice the speed, if not even
more, which is not a small difference.
If you are reading and parsing hundreds of megabytes of input and
also outputting hundreds of megabytes, the difference could
well be very considerable (e.g. 20 seconds instead of 1
minute).

A quick test on the implementations I happen to have handy,
using the proposed benchmark, showed a little less than twice,
but not much. However, I'm not sure that even that means much:
if the disk were mounted on a slow network, the difference would
doubtlessly be less. (My experience is that SMB is almost an
order of magnitude slower than NFS, so if you're accessing a
remote disk under Windows, you really can forget about anything
but the I/O times.) And floating point formatting and parsing
can be very expensive in themselves---if iostream manages to
somehow do it better (e.g. because it specializes for float,
rather than parsing a double, then converting), then that could
make up for inefficiencies elsewhere. Alternatively, most
iostream implementations are less mature, and so there is a
distinct possibility that the floating point conversions are
less optimized, and the difference greater.

In sum, while it wouldn't surprise me if printf were faster in a
given implementation, I'd measure exactly what I needed before
making any decisions. Also, if speed is a criterion, I'd
consider using lower level I/O. Back in my pre-C++ days, I once
sped a program up by over 60% just by using Unix-level system
I/O rather than stdio.h. And mmap or its equivalent under
Windows can make even more of a difference.

In the end, you'll have to experiment. If iostream is fast
enough, there's no point in trying anything else. If it's not,
using stdio.h is probably worth a try. And if even that's not fast
enough, you may have to go even lower. (In my experience, every
time iostream has been too slow, switching to stdio.h hasn't
been sufficient either, and we've had to go even lower. But
that doesn't mean that your experience will be identical.)
 

Jim Langston

nw said:
Hi all,

I'm constantly confronted with the following two techniques, which I
believe often produce less readable code, but I am told are faster
therefore better. Can anyone help me out with counter examples and
arguments?

1. The STL is slow.

More specifically vector. The argument goes like this:

"Multidimensional arrays should be allocated as large contiguous
blocks. This is so that when you are accessing the array and reach the
end of a row, the next row will already be in the cache. You also
don't need to spend time navigating pointers when accessing the array.
So a 2 dimensional array of size 100x100 should be created like this:

const int xdim=100;
const int ydim=100;

int *myarray = malloc(xdim*ydim*sizeof(int));

and accessed like this:

myarray[xdim*ypos+xpos] = avalue;

Is this argument reasonable? (Sounds reasonable to me, though the
small tests I've performed don't usually show any significant
difference).

To me this syntax looks horrible, am I wrong? Is vector the wrong
container to use? (My usual solution would be a vector<vector<int> >).
Would using a valarray help?

2. iostream is slow.

I've encountered this at work recently. I'd not considered it before;
I like the syntax and don't do that much IO generally... but I'm just
now starting to process terabytes of data, so it'll become an issue. Is
iostream slow? Specifically, I encountered the following example while
googling around. The stdio version runs in around 1 second, the
iostream version takes 8 seconds. Is this just down to a poor iostream
implementation? (gcc 4 on OS X.) Or are there reasons why iostream is
fundamentally slower for certain operations? Are there things I should
be keeping in mind to speed up IO?

// stdio version
#include <cstdio>
using namespace std;
const int NULA = 0;
int main(void) {
    for (int i = 0; i < 100000000; ++i)
        printf("a");
    return NULA;
}

// cout version
#include <iostream>
using namespace std;
const int NULA = 0;
int main(void) {
    std::ios_base::sync_with_stdio(false);

    for (int i = 0; i < 100000000; ++i)
        cout << "a";
    return NULA;
}

I don't know how you got yours to run in 1 second and 8 second. On my
platform it was taking too long and I reduced the iterations. Here is a
test program I did with results:

#include <ctime>
#include <cstdio>
#include <iostream>

const int iterations = 1000000; // 100000000

// stdio version
int mainC(void) {
    std::ios_base::sync_with_stdio(false);

    for (int i = 0; i < iterations; ++i)
        printf("a");
    return 0;
}

// cout version
int mainCPP() {
    std::ios_base::sync_with_stdio(false);

    for (int i = 0; i < iterations; ++i)
        std::cout << "a";
    return 0;
}

int main() {
    clock_t start;
    clock_t stop;

    start = clock();
    mainC();
    stop = clock();
    clock_t ctime = stop - start;

    start = clock();
    mainCPP();
    stop = clock();
    clock_t cpptime = stop - start;

    std::cout << "C: " << ctime << " C++: " << cpptime << "\n";
    std::cout << static_cast<double>(cpptime) / ctime << "\n";
}

after a bunch of aaaa's...

C: 20331 C++: 23418
1.15184

This shows iostream to be about 15% slower than the C code.

Microsoft Visual C++ .net 2003
Windows XP Service Pack 2

Unfortunately, optimizations are disabled on my compiler, so I don't
know how it would perform optimized. But it is not an 8x difference.

I would be curious about the output of different compilers. Note that
clock() on Microsoft platforms reports total elapsed time, not just CPU time.
 

Mirek Fidler

I don't know how you got yours to run in 1 second and 8 seconds. On my
platform it was taking too long and I reduced the iterations. [...]
after a bunch of aaaa's...

C: 20331 C++: 23418
1.15184

This shows iostream to be about 15% slower than the C code.

Actually, this only proves that this particular library has inefficient
C and C++ implementations of both...

Mirek
 
