What influences C++ I/O performance?

SzH

I would like to read in large integer matrices from text files, as
quickly as possible. I did a naive implementation (see below), but
for an input file with 100 rows and 250000 columns it runs more than
5x slower when compiled with visual studio express 2008 than the
version compiled with gcc 4.2.1 (mingw), on the same computer, same
OS.

I am not a regular C++ user. What are the things (relevant to the
standard C++ library) that influence the performance of I/O? I don't
believe that this is a "bug" in the VS compiler. It is more likely
that there are some relevant settings for iostreams (with a
significant impact on performance) that I am not aware of.

Any suggestions for making this program perform well portably would be
most welcome!

This is the relevant portion of the code:

std::ios::sync_with_stdio(false);
std::ifstream infile(path);

input_array arr(); // see note below

int row = 0, max_col = 0;

std::string line;
while (std::getline(infile, line)) {
    std::istringstream ln(line);
    int number;
    int col = 0;
    while (ln >> number) {
        arr.append(number);
        col++;
    }
    if (col != 0) {
        if (row == 0)
            max_col = col;
        if (col != max_col) {
            /*
             * Array is not rectangular.
             * Handle error (exit).
             */
        }
        row++;
    }
}

Notes: 'input_array' is just a simple integer array, which grows
exponentially to accommodate new elements. It is not the source of
the performance problem.

Szabolcs
 
Krice

I am not a regular C++ user. What are the things (relevant to the
standard C++ library) that influence the performance of I/O?

Compiler optimization settings could be worth trying.
I guess VS has slower debug mode executables as well.
 
SzH

Compiler optimization settings could be worth trying.
I guess VS has slower debug mode executables as well.

I forgot to mention that "release" mode was used with VS
(optimizations turned on). In "debug" mode, the program is twice as
slow, i.e. 10x as slow as with gcc. Whether optimizations are turned
on or not didn't really matter with gcc.

The VS timing was approx 80 seconds (in "release" mode), while the gcc
timing was about 15 seconds.
 
Jerry Coffin

[ ... ]
This is the relevant portion of the code:

I suspect you've done a bit of unintentional editing while separating
out the relevant portion of the code.
std::ios::sync_with_stdio(false);
std::ifstream infile(path);

input_array arr(); // see note below

This is NOT a definition of an input_array named arr. Rather, it's a
declaration of a function named arr that takes no parameters, and
returns an input_array. Until this is fixed, I don't think the rest of
the code can even compile.

As far as speed of I/O goes, I'm having difficulty reproducing the
problem you cite. I started by writing a small program to generate a
test file of the size you cited:

#include <iostream>
#include <fstream>

char filename[] = "c:\\c\\source\\junk.mat";

int main() {
    std::ofstream out(filename);

    for (int row=0; row<100; row++) {
        for (int col=0; col<25000; col++)
            out << row + col << "\t";
        out << "\n";
    }
    return 0;
}

Then I rewrote your code a bit so I could compile it:

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

std::vector<int> read_matrix(std::string const &path) {
    std::ifstream infile(path.c_str());

    std::vector<int> arr;

    int row = 0, max_col = 0;

    std::string line;
    while (std::getline(infile, line)) {
        std::istringstream ln(line);
        int number;
        int col = 0;
        while (ln >> number) {
            arr.push_back(number);
            col++;
        }
        if (col != 0) {
            if (row == 0)
                max_col = col;
            if (col != max_col) {
                /*
                 * Array is not rectangular.
                 * Handle error (exit).
                 */
            }
            row++;
        }
    }

    return arr;
}

Finally, I added a test main to call that and read the file:

char filename[] = "c:\\c\\source\\junk.mat";

#include <iostream>
#include <numeric>
#include <time.h>

int main() {
    clock_t start = clock();
    std::vector<int> r = read_matrix(filename);
    clock_t stop = clock();

    // to ensure against the file-read being optimized away,
    // doing something to use what we read.
    int sum = std::accumulate(r.begin(), r.end(), 0);
    std::cout << "sum = " << sum << "\n";
    std::cout << "time = " << double(stop - start) / CLOCKS_PER_SEC;

    return 0;
}

Run times:

VC++ 7.1: 1.89
Comeau 4.3.3: 2.672
g++ 3.4.4: 4.671

That leaves a few possibilities:

1) the newer version of VC++ has reduced I/O speed a lot.
2) the newer version of g++ has improved I/O speed a lot.
3) the software differences are being hidden by hardware differences.
4) you're not getting what you think from a "release" build in VS 2008.

Of these, the first seems possible but fairly unlikely (they both use
the Dinkumware library, and I don't think it's changed all that much
between these versions).

The second seems possible, but not to the degree necessary to explain
what you've observed. In particular, the executable I get from VC++ 7.1
reads the data quite a bit faster than 1/5th the claimed speed of my
hard drive, so speeding it up by 5:1 shouldn't be possible. Of course,
there could be differences due to caching (e.g. a second run might read
from the cache much faster than the hard drive can support), but I at
least attempted to factor this out in my testing, so while it could have
contributed something, I can't find any indication that it would account
for any large differences.

I'd guess the third is probably true to some degree -- but, again, I
don't see anything that would account for the major differences we're
seeing.

To me, that leaves the last possibility as being by far the most likely.
Of course, there may also be some entirely different possibility that
hasn't occurred to me.
 
SzH

Hi Jerry,

Thanks for the reply! Had it not been for Google Groups, I wouldn't
have noticed it until now, two weeks later.

[ ... ]
This is the relevant portion of the code:

I suspect you've done a bit of unintentional editing while separating
out the relevant portion of the code.
std::ios::sync_with_stdio(false);
std::ifstream infile(path);
input_array arr(); // see note below

This is NOT a definition of an input_array named arr. Rather, it's a
declaration of a function named arr that takes no parameters, and
returns an input_array. Until this is fixed, I don't think the rest of
the code can even compile.

Yes, you're right, it used to be input_array arr(1000 /* some number
here */);
I removed the number but not the parens.
As far as speed of I/O goes, I'm having difficulty reproducing the
problem you cite. I started by writing a small program to generate a
test file of the size you cited:

#include <iostream>
#include <fstream>

char filename[] = "c:\\c\\source\\junk.mat";

int main() {
    std::ofstream out(filename);

    for (int row=0; row<100; row++) {
        for (int col=0; col<25000; col++)
            out << row + col << "\t";
        out << "\n";
    }
    return 0;
}

The matrix I used was actually 10 times bigger. Here's a program that
generates something that resembles closely what I actually had. It
should generate a file > 150 MB.

#include <fstream>
#include <cstdlib>

using namespace std;

int myrand() {
    return int( 250000.0*rand()/(RAND_MAX + 1.0) );
}

int main() {
    ofstream out("junk.mat");

    for (int row=0; row < 100; row++) {
        for (int col=0; col < 250000; col++)
            out << myrand() << '\t';
        out << '\n';
    }
    return 0;
}




Then I rewrote your code a bit so I could compile it:

std::vector<int> read_matrix(std::string const &path) {
    std::ifstream infile(path.c_str());

    std::vector<int> arr;

    int row = 0, max_col = 0;

    std::string line;
    while (std::getline(infile, line)) {
        std::istringstream ln(line);
        int number;
        int col = 0;
        while (ln >> number) {
            arr.push_back(number);
            col++;
        }
        if (col != 0) {
            if (row == 0)
                max_col = col;
            if (col != max_col) {
                /*
                 * Array is not rectangular.
                 * Handle error (exit).
                 */
            }
            row++;
        }
    }

    return arr;
}

Finally, I added a test main to call that and read the file:

char filename[] = "c:\\c\\source\\junk.mat";

#include <iostream>
#include <numeric>
#include <time.h>

int main() {
    clock_t start = clock();
    std::vector<int> r = read_matrix(filename);
    clock_t stop = clock();

    // to ensure against the file-read being optimized away,
    // doing something to use what we read.
    int sum = std::accumulate(r.begin(), r.end(), 0);
    std::cout << "sum = " << sum << "\n";
    std::cout << "time = " << double(stop - start) / CLOCKS_PER_SEC;

    return 0;
}

Run times:

VC++ 7.1: 1.89
Comeau 4.3.3: 2.672
g++ 3.4.4: 4.671

That leaves a few possibilities:

1) the newer version of VC++ has reduced I/O speed a lot.

I only have VS 2008 Express, so I don't know, but it seems possible.
2) the newer version of g++ has improved I/O speed a lot.

Yes, it did. I tested it with gcc 3.4.5 (mingw), and the program
compiled with the newer version (gcc 4.2.1, dw2 version from mingw) is
more than 2x (twice) faster than the one compiled with old gcc. (!!)
3) the software differences are being hidden by hardware differences.

This is unlikely. As I'll show below, the code generated by VS is
CPU-bound (not disk-bound).
4) you're not getting what you think from a "release" build in VS 2008.

This is unlikely because the program compiled in "debug" mode ran
twice as slow as when compiled in "release" mode. But to keep things
reproducible, below I'll show the exact commands used for compiling.


Here's a simplified, almost minimal version of the program I used for
testing. This one can be compiled without additional prerequisites
(like input_array in the original example). Just copy and paste it
into a file.

/* perf.cpp */

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <cstdlib>

using namespace std;

int main(int argc, char *argv[]) {
    if (argc != 2) {
        exit(EXIT_FAILURE);
    }

    ios::sync_with_stdio(false);

    ifstream infile(argv[1]);

    string line;
    while (getline(infile, line)) {
        // uncomment this for the second test
        /*
        std::istringstream ln(line);
        int number;
        int col = 0;
        while (ln >> number)
            col++;
        */
        /* Just to make sure that the
         * compiler doesn't optimize anything away,
         * output the line length:
         */
        cout << line.size() << '\n';
    }

    return 0;
}


First, let's just read in every line as strings (i.e. keep the section
that reads the numbers commented).
The timings are (in seconds):

vs: 7.311
gcc: 2.015
dmc: 4.218 (digital mars compiler, v8.50, with STLport)

You'll find the full (unedited) transcript of the command line session
below, complete with compilation flags.


Now let's uncomment the section that reads the numbers, and measure
again. Now the results are:

vs: 51.165
gcc: 8.795
dmc: 18.560

For comparison (not shown in transcript):

gcc 3.4.5 : 23.637
VS without the /O2 options: 1:58.754


Jerry, the results are the same with your version of the program too
(the timings are a little higher, probably because of the need to "grow"
the std::vector).

Note that the timings are much higher in the second test. This
suggests that most of the time is spent parsing the numbers, not
reading the data from the disk. But the huge difference between
VC++ and gcc persists ... The only reasonable explanation to me is that
VC++'s streams implementation is very slow and gcc's is very fast.
But why is there then such a huge difference between my and Jerry's
results? Did VC++ really get this much slower since 7.1? Or is the
Express edition crippled in some way?

Or did I do something stupid here? I'd be happiest if someone could
point out some mistake I made, because I'd like to make this work fast
(on the VS forums no one did).


************ Transcript, just read the lines, don't parse numbers
**************

C:\work\temp\perf>cl /O2 /EHsc perf.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08
for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.

perf.cpp
Microsoft (R) Incremental Linker Version 9.00.21022.08
Copyright (C) Microsoft Corporation. All rights reserved.

/out:perf.exe
perf.obj

C:\work\temp\perf>g++ --version
g++ (GCC) 4.2.1-dw2 (mingw32-2)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There
is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.


C:\work\temp\perf>g++ -O2 perf.cpp -o perfgcc.exe

C:\work\temp\perf>dmc -o+all perf.cpp -o perfdmc.exe
link perf,perfdmc,,user32+kernel32/noi;


C:\work\temp\perf>timethis "perf junk.mat > NUL"

TimeThis : Command Line : perf junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:15 2008


TimeThis : Command Line : perf junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:15 2008
TimeThis : End Time : Sat Feb 16 20:37:22 2008
TimeThis : Elapsed Time : 00:00:07.311

C:\work\temp\perf>timethis "perfgcc junk.mat > NUL"

TimeThis : Command Line : perfgcc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:29 2008


TimeThis : Command Line : perfgcc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:29 2008
TimeThis : End Time : Sat Feb 16 20:37:31 2008
TimeThis : Elapsed Time : 00:00:02.015

C:\work\temp\perf>timethis "perfdmc junk.mat > NUL"

TimeThis : Command Line : perfdmc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:35 2008


TimeThis : Command Line : perfdmc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:37:35 2008
TimeThis : End Time : Sat Feb 16 20:37:39 2008
TimeThis : Elapsed Time : 00:00:04.218


************ Transcript, parse the numbers too **************

C:\work\temp\perf>cl /O2 /EHsc perf.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08
for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.

perf.cpp
Microsoft (R) Incremental Linker Version 9.00.21022.08
Copyright (C) Microsoft Corporation. All rights reserved.

/out:perf.exe
perf.obj

C:\work\temp\perf>g++ -O2 perf.cpp -o perfgcc.exe

C:\work\temp\perf>dmc -o+all perf.cpp -o perfdmc.exe
link perf,perfdmc,,user32+kernel32/noi;


C:\work\temp\perf>timethis "perf junk.mat > NUL"

TimeThis : Command Line : perf junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:42:00 2008


TimeThis : Command Line : perf junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:42:00 2008
TimeThis : End Time : Sat Feb 16 20:42:51 2008
TimeThis : Elapsed Time : 00:00:51.165

C:\work\temp\perf>timethis "perfgcc junk.mat > NUL"

TimeThis : Command Line : perfgcc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:43:04 2008


TimeThis : Command Line : perfgcc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:43:04 2008
TimeThis : End Time : Sat Feb 16 20:43:12 2008
TimeThis : Elapsed Time : 00:00:08.795

C:\work\temp\perf>timethis "perfdmc junk.mat > NUL"

TimeThis : Command Line : perfdmc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:43:17 2008


TimeThis : Command Line : perfdmc junk.mat > NUL
TimeThis : Start Time : Sat Feb 16 20:43:17 2008
TimeThis : End Time : Sat Feb 16 20:43:35 2008
TimeThis : Elapsed Time : 00:00:18.560


The test was done on an 1.7 GHz Pentium-M on WinXP.
Of these, the first seems possible but fairly unlikely (they both use
the Dinkumware library, and I don't think it's changed all that much
between these versions).

The second seems possible, but not to the degree necessary to explain
what you've observed. In particular, the executable I get from VC++ 7.1
reads the data quite a bit faster than 1/5th the claimed speed of my
hard drive, so speeding it up by 5:1 shouldn't be possible. Of course,
there could be differences due to caching (e.g. a second run might read
from the cache much faster than the hard drive can support), but I at
least attempted to factor this out in my testing, so while it could have
contributed something, I can't find any indication that it would account
for any large differences.

I'd guess the third is probably true to some degree -- but, again, I
don't see anything that would account for the major differences we're
seeing.

To me, that leaves the last possibility as being by far the most likely.
Of course, there may also be some entirely different possibility that
hasn't occurred to me.

--
Later,
Jerry.

The universe is a figment of its own imagination.

P.S. I'm CCing you in case you sent the message two weeks ago, but it
arrived only now.
 
Roland Pibinger

I would like to read in large integer matrices from text files, as
quickly as possible. ....
It is more likely
that there are some relevant settings for iostreams (with a
significant impact on performance) that I am not aware of.
Any suggestions for making this program perform well portably would be
most welcome!

Avoid iostreams. They are slow 'by design'. Just try to figure out how
often reallocation and copying occur in your code.
 
peter koch

Avoid iostreams. They are slow 'by design'. Just try to figure out how
often reallocation and copying occur in your code.


Why do you believe that iostreams are slow by design? And what would
you suggest instead? The printf family certainly is slow by design,
replacing a compiled language with an (obscure) interpreted one -
namely the language specified by the format specifier.

/Peter
 
Alf P. Steinbach

* peter koch:
Why do you believe that iostreams are slow by design?

I think the belief that the inefficiency may or must be related to the
design, primarily stems from the virtual absence of efficient
implementations. Bjarne indicated in 2001[1] that an efficient iostream
implementation was very high on his wish-list, but that he only expected
that to be possible for a simple subset of operations, "I believe that
aggressive optimization techniques will allow us to regain the
efficiency of the original in the many common cases where the full
generality of iostreams is not used". As far as I know, as of 2008
Bjarne has not yet had his wish granted: e.g., Andrei's fabled YASTLI
implementation seems to still only have status of Good Idea.

And what would
you suggest instead? The printf family certainly is slow by design,
replacing a compiled language with an (obscure) interpreted one -
namely the language specified by the format specifier.

There is unfortunately, AFAIK, no general, good replacement.

However, where efficiency matters, one may try using FILE*, which is
generally more efficient, before going down to system APIs.


Cheers, & hth.,

- Alf

Disclaimer: it's been some years since last I measured file i/o
performance, however, I have seen no evidence to the contrary.

Notes:
[1] <url: http://research.att.com/~bs/01chinese.html>
 
Pavel

peter said:
Why do you believe that iostreams are slow by design? And what would
you suggest instead? The printf family certainly is slow by design,
replacing a compiled language with an (obscure) interpreted one -
namely the language specified by the format specifier.

/Peter
Well, for one, iostream formatting is defined in terms of num_put facets
of applicable locales.. and, guess what, the results of formatting with
num_put are in turn defined in term of out-of-your-favor printf() ..
after the overhead required to compute the required facet.. and before
the overhead required to "adjust the representation by converting each
char.." , padding, numpunct-controlled group separateion etc. etc. See
22.2.2.2.2. of the Standard for more details.

Of course you could argue that, theoretically, "as-if" printf() could be
faster than the real printf() but, usually, it isn't. And it is
difficult to understand why it would be, given the only waste one could
eliminate in a clean-room re-implementation would be not interpreting
the argument type from the format specifier (which takes a single
redirection in a reasonable sprintf implementation -- a negligible gain
indeed in comparison with the tons of floating-point arithmetic
required for formatting, and the tons of indirection and additional work
required by locale-facet imposed group separation, even if the latter is
only checked and only to find that no real group separation should be done).

Oh yes, and, for example, my version of g++ Standard library in fact
implements num_put via that despicable and "certainly slow" sprintf (or
snprintf, depending on the "use C99" compilation flag).

Hope this will help,
Pavel
 
James Kanze

Avoid iostreams. They are slow 'by design'.

Compared to what? The best implementations generally beat
stdio. (Most widespread implementations are designed to be
simple, rather than fast, since experience has shown that
they're fast enough anyway.)
 
James Kanze

* peter koch:
I think the belief that the inefficiency may or must be
related to the design, primarily stems from the virtual
absence of efficient implementations.

Totally agreed. Dietmar Kuehl did design a very, very efficient
implementation at one point. He also made it freely available,
allowing even commercial implementations to incorporate it for
free. I know that he was very frustrated that none did.

Note that making filebuf efficient in the presence of a possibly
non-trivial codecvt is not trivial, and if I understand
correctly, Dietmar's solution involved some pretty clever
meta-programming to avoid unnecessary work when the codecvt was
trivial. It may be that the commercial implementers didn't
think it worth the extra complexity, and supposed that any extra
overhead would not be noticeable compared to actual I/O times.
Bjarne indicated in 2001[1] that an efficient iostream
implementation was very high on his wish-list, but that he
only expected that to be possible for a simple subset of
operations, "I believe that aggressive optimization techniques
will allow us to regain the efficiency of the original in the
many common cases where the full generality of iostreams is
not used". As far as I know, as of 2008 Bjarne has not yet
had his wish granted:

I wonder what the status of Dietmar's implementation is.

FWIW: I've used classical iostreams which were faster than the
systems printf. The classical iostreams, of course, were not
templates, didn't support wide character I/O, and didn't support
on the fly code conversions via the codecvt in filebuf. Knowing
"up front" that you're dealing with the actual "ascii"
characters (and not having to deal with things like "narrow")
can certainly result in a faster implementation.
e.g., Andrei's fabled YASTLI
implementation seems to still only have status of Good Idea.
There is unfortunately, AFAIK, no general, good replacement.
However, where efficiency matters, one may try using FILE*,
which is generally more efficient, before going down to system
APIs.

One might, but in practice, most of us have to deal with at most
two system API's, so even going down to that level isn't always
that much of a problem. (And mmap can often beat both FILE* and
filebuf by a significant amount.)
 
Roland Pibinger

Why do you believe that iostreams are slow by design? And what would
you suggest instead?

The original example could be made faster by avoiding reallocation and
copying.

std::string line;
line.reserve(250000 * SIZEOF_COL);
while (std::getline(infile, line)) {
    // don't copy the long line into istringstream,
    // i.e. avoid istringstream
    // std::istringstream ln(line);

If you implement your own (reusable)
String& getline(File* file, String& line, size_t lengthHint = 0)
function you get rid of any iostream dependency.
 
SzH

The original example could be made faster by avoiding reallocation and
copying.

std::string line;
line.reserve(250000 * SIZEOF_COL);

That's not where the bottleneck is. This is clearly shown by the
comparison of the two tests in my second post (see the section of the
code which is commented out between them).
while (std::getline(infile, line)) {
// don't copy the long line into istringstream,
// i.e. avoid istringstream
// std::istringstream ln(line);

How do I read the numbers without the istringstream? But
std::istringstream isn't the source of the "slowness" anyway. The
piece that takes a long time to execute is

while (ln >> number)
col++;

Just try commenting out the pieces and measure the timings to see what
I mean.

(BTW I made another, more complicated, istringstream-less
implementation, too, which reads directly from the file, and looks for
the line-endings manually. It performs the same way.)
If you implement your own (reusable)
String& getline(File* file, String& line, size_t lengthHint = 0)
function you get rid of any iostream dependency.

Yes, that's the very disappointing thing about C++. One can never
rely on the damn standard library for good performance! You just
switch compilers, and suddenly find that a stupid scripting language
performs much better than your C++ program.
 
SzH

Compared to what? The best implementations generally beat
stdio. (Most wide spread implementations are designed to be
simple, rather than fast, since experience has shown that
they're fast enough anyway.)


You know, it's just terribly disappointing and frustrating when one
finds out that even a stupid scripting language outperforms one's C++
implementation. For as long as I'm doing numerical stuff, I don't
even touch the standard library (std::vector and the likes). If I do,
and by accident it isn't slow, there's a good chance that trying a
different compiler will make it dog slow.

I have a good feeling for what is slow and what is fast when doing
numerical calculations, so I implement my own classes. But for I/O I
*have* to rely on the standard library and I have no idea about what
are the little things that I have to pay attention to avoid bad
performance. I'm very disappointed.

And, if possible, I don't want to overcomplicate things ... C++ is
supposed to make things easy and convenient over C, so I don't want to
use <stdio.h> when the C++ version is so much cleaner and easier to
write.

Yet, in the following example, even Python outperforms C++. I just
wanted to calculate how many distinct integers are there in each row
in the same matrix (same dataset).

The timings are:

gcc: 1:12.406
vs: 54.421
python: 17.312


gcc's fast iostreams are of no use here ... the slow std::set makes it
even slower than VS. Or maybe there is some little trick that one
should know about std::set to make it fast. But even if there is, it
is not documented (for my compiler), so an outsider like me cannot use
it!

And before anyone accuses me of reading from a gzipped file: just
the decompression of the file takes only 1.750 seconds. Gzipping the
datafile actually increases performance for large files, because it
makes the data processing CPU-bound instead of disk-bound.

----- count.cpp ---------

#include <iostream>
#include <sstream>
#include <string>
#include <set>

using namespace std;

int main() {
    string line;
    while (getline(cin, line)) {
        istringstream ln(line);
        set<int> specount;
        int species;
        while (ln >> species)
            specount.insert(species);
        cout << specount.size() << '\n';
    }
    return 0;
}


-------- pycount.py ----------

import gzip

for line in gzip.open('table014.gz'):
    print len(set(line.split()))

g++ -O3 count.cpp -o gcccount.exe
cl /O2 /EHsc count.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08
for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.

count.cpp
Microsoft (R) Incremental Linker Version 9.00.21022.08
Copyright (C) Microsoft Corporation. All rights reserved.

/out:count.exe
count.obj
timethis "zcat table014.gz > NUL"

TimeThis : Command Line : zcat table014.gz > NUL
TimeThis : Start Time : Sun Feb 17 14:07:38 2008


TimeThis : Command Line : zcat table014.gz > NUL
TimeThis : Start Time : Sun Feb 17 14:07:38 2008
TimeThis : End Time : Sun Feb 17 14:07:40 2008
TimeThis : Elapsed Time : 00:00:01.750
timethis "zcat table014.gz | count > NUL"

TimeThis : Command Line : zcat table014.gz | count > NUL
TimeThis : Start Time : Sun Feb 17 14:07:47 2008


TimeThis : Command Line : zcat table014.gz | count > NUL
TimeThis : Start Time : Sun Feb 17 14:07:47 2008
TimeThis : End Time : Sun Feb 17 14:08:41 2008
TimeThis : Elapsed Time : 00:00:54.421
timethis "zcat table014.gz | gcccount > NUL"

TimeThis : Command Line : zcat table014.gz | gcccount > NUL
TimeThis : Start Time : Sun Feb 17 14:08:46 2008


TimeThis : Command Line : zcat table014.gz | gcccount > NUL
TimeThis : Start Time : Sun Feb 17 14:08:46 2008
TimeThis : End Time : Sun Feb 17 14:09:59 2008
TimeThis : Elapsed Time : 00:01:12.406
timethis "pycount.py > NUL"

TimeThis : Command Line : pycount.py > NUL
TimeThis : Start Time : Sun Feb 17 14:10:26 2008


TimeThis : Command Line : pycount.py > NUL
TimeThis : Start Time : Sun Feb 17 14:10:26 2008
TimeThis : End Time : Sun Feb 17 14:10:43 2008
TimeThis : Elapsed Time : 00:00:17.312
 
Roland Pibinger

How do I read the numbers without the istringstream? But
std::istringstream isn't the source of the "slowness" anyway. The
piece that takes a long time to execute is

while (ln >> number)
col++;

This sentence seems to contain a contradiction. Anyway. AFAICS, you
have a string of ints separated by some whitespace. You could try
something like the following to extract the ints (no predictions about
performance are provided). Error handling is omitted like in the
original code.

#include <iostream>
#include <string>
#include <cstdlib>

int main() {
    using namespace std;

    string str = "1 314 1024 ";

    char* current = const_cast<char*>(str.c_str());

    while(true) {
        const char* start = current;
        long result = strtol(start, &current, 0);
        if(start == current) {
            break;
        }
        cout << result << " ";
    }
}
 
SzH

This sentence seems to contain a contradiction.

Yes. I should have said that *constructing* the istringstream is not
the problem.
Anyway. AFAICS, you
have a string of ints separated by some whitespace. You could try
something like the following to extract the ints (no predicitons about
performance are provided).

That would be hard to do with these unpredictable libraries ... but
fortunately it is a lot faster :) Thanks!

Timings are

vs: 10.531
gcc: 4.953

compared to the previous

vs: 51.165
gcc: 8.795
 
James Kanze

You know, it's just terribly disappointing and frustrating when one
finds out that even a stupid scripting language outperforms one's C++
implementation. For as long as I'm doing numerical stuff, I don't
even touch the standard library (std::vector and the likes).

Interesting. In the implementations I use, there is no
difference in performance between std::vector and a C style
array, at least when it comes to access times. About the only
time I use C style arrays today is when I need static
initialization (although I can also see their use for small,
fixed length arrays---constructing an std::vector of a fixed
size will definitely be more expensive than just defining a C
style array).
If I do, and by accident it isn't slow, there's a good chance
that trying a different compiler will make it dog slow.
I have a good feeling for what is slow and what is fast when
doing numerical calculations, so I implement my own classes.
But for I/O I *have* to rely on the standard library and I
have no idea about what are the little things that I have to
pay attention to avoid bad performance. I'm very
disappointed.
And, if possible, I don't want to overcomplicate things ...
C++ is supposed to make things easy and convenient over C, so
I don't want to use <stdio.h> when the C++ version is so much
cleaner and easier to write.
Yet, in the following example, even Python outperforms C++. I
just wanted to calculate how many distinct integers are there
in each row in the same matrix (same dataset).
The timings are:
gcc: 1:12.406
vs: 54.421
python: 17.312
gcc's fast iostreams are of no use here ... the slow std::set
makes it even slower than VS. Or maybe there is some little
trick that one should know about std::set to make it fast.

std::set is doing too much. You don't need the order, and most
scripting languages use a hash table here, which will be faster
if there are a large number of distinct integers. Also, I don't
know Python, but some scripting languages will use the text
representation, directly read from the file, as an index, rather
than converting it to int (and thus, counting 01 and 1 as two
distinct integers).
But even if there is, it is not documented (for my compiler),
so an outsider like me cannot use it!
And before anyone accuses me for reading from a gzipped file:
just the decompression of the file takes only 1.750 seconds.
Gzipping the datafile actually increases performance for large
files, because it makes the data processing CPU-bound instead
of disk-bound.
----- count.cpp ---------
#include <iostream>
#include <sstream>
#include <string>
#include <set>

using namespace std;

int main() {
    string line;
    while (getline(cin, line)) {
        istringstream ln(line);
        set<int> specount;
        int species;
        while (ln >> species)
            specount.insert(species);
        cout << specount.size() << '\n';
    }
    return 0;
}

Of course, you've chosen an example where garbage collection
(and thus garbage collected languages) is a big win :). It
should be easy to improve the performance here considerably by
using a custom allocator for the set. But you're right, you
shouldn't have to; the language should have garbage collection,
and handle this optimization on its own.
 
J

Jerry Coffin

[ ... ]
I only have VS 2008 Express, so I don't know, but it seems possible.

If you previously said you were using the "Express" edition, I didn't
catch it.

In any case, that's almost certainly the majority of the explanation
right there. Much of what they do in the Express edition of the compiler
is disable nearly _all_ optimization. Since iostreams are templates,
that means most of the code involved is being compiled with virtually no
optimization applied.

Fortunately, this is fairly easy to fix: the full version of the
compiler is included with (among other things) a number of versions of
the Windows SDK. If you care about performance at all, I'd advise
downloading and using that (at least for final builds).

IOW, I suspect (quite strongly) that the difference we're seeing IS
between different versions of the compiler; the difference between 7.1
and 8.0 or 9.0 is probably NOT significant, but the difference between
the full, optimizing version of the compiler and the "Express" version
probably is.
Yes, it did. I tested it with gcc 3.4.5 (mingw), and the program
compiled with the newer version (gcc 4.2.1, dw2 version from mingw) is
more than 2x (twice) faster than the one compiled with old gcc. (!!)

Sorry -- when I said "a lot", I meant a much larger margin than this --
in my testing, gcc produced code about half the speed of VC++, so
roughly doubling the speed only brings it somewhere close to parity, NOT
a lot faster. By a lot faster, I was thinking of an improvement on the
order of 10:1 or something like that.
 
J

Jerry Coffin

[ ... ]
Yes, that's the very disappointing thing about C++. One can never
rely on the damn standard library for good performance! You just
switch compilers, and suddenly find that a stupid scripting language
performs much better than your C++ program.

In this case, it's really a matter of the nature of templates -- since
their behavior can change substantially when instantiated over a
different type, it's not really possible to pre-compile a template to
object code.

That means if you use standard library (or almost any other) code that's
based on templates (which includes most of the C++ standard library),
all that code is being compiled as part of your program. When, as in
your case, you use a compiler from which nearly all optimization
capability has been removed, the result can (obviously) be pretty ugly.
 
E

Erik Wikström

[ ... ]
I only have VS 2008 Express, so I don't know, but it seems possible.

If you previously said you were using the "Express" edition, I didn't
catch it.

In any case, that's almost certainly the majority of the explanation
right there. Much of what they do in the Express edition of the compiler
is disable nearly _all_ optimization. Since iostreams are templates,
that means most of the code involved is being compiled with virtually no
optimization applied.

Fortunately, this is fairly easy to fix: the full version of the
compiler is included with (among other things) a number of versions of
the Windows SDK. If you care about performance at all, I'd advise
downloading and using that (at least for final builds).

IOW, I suspect (quite strongly) that the difference we're seeing IS
between different versions of the compiler; the difference between 7.1
and 8.0 or 9.0 is probably NOT significant, but the difference between
the full, optimizing version of the compiler and the "Express" version
probably is.

Actually the compiler in the Express editions is exactly the same as in
the other editions; the differences are in the libraries (MFC etc.) and
tools that are included. Perhaps there are some differences in the flags
that are used by default, but that is easily fixed.
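Checking the flags is straightforward from the command line. As an assumption about what a release-style build would use with the VC++ 2008 command-line compiler (these are standard cl.exe options):

```shell
# /O2  = optimize for maximum speed
# /EHsc = standard C++ exception handling
# /MD  = link against the release (non-debug) runtime library
cl /O2 /EHsc /MD count.cpp
```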
 
