SzH
I need to read very large text files and do some simple processing on
them. I'm trying to make this as fast as possible. Before spending a
lot of time with it and going through a series of futile attempts to
optimize this, I thought I'd post here and ask where to start.
Consider this very small program for outputting every 10th line:
------------ filt.cpp ----------------
#include <iostream>
#include <string>
using namespace std;
int main() {
    ios::sync_with_stdio(false);
    string line;
    unsigned long nr = 0;
    while (getline(cin, line))
        if (nr++ % 10 == 0)
            cout << line << '\n';
    return 0;
}
-----------------------------------
How can this be made faster? I know very little about C++ I/O. I
usually only do simple numerical stuff, and I think that to speed this
up, one needs to be familiar with how I/O works (internally) in the
C++ standard library.
First I found that ios::sync_with_stdio(false); really does help.
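(A related knob I have not verified: giving the stream a bigger
buffer with pubsetbuf. As far as I understand, whether this has any
effect on cin is implementation-defined, and it has to happen before
the first read, so take it only as a sketch of the idea; the 1 MiB
size is an arbitrary guess.)

#include <iostream>

int main() {
    std::ios::sync_with_stdio(false);
    // Implementation-defined: may or may not replace cin's buffer,
    // and must be done before anything is read from cin.
    static char buf[1 << 20];  // 1 MiB, arbitrary size
    std::cin.rdbuf()->pubsetbuf(buf, sizeof buf);
    // ... same filtering loop as in filt.cpp ...
}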
Then I noticed that compressing the data file with gzip, and piping it
with zcat to this simple program, speeds things up a *lot* (it's
several times faster). So I suppose that before compression was
applied, the speed of reading the uncompressed file was limited by the
hard drive.
Now this is the time it takes to decompress the file (tt2.gz), and
throw away the result:
timethis "zcat tt2.gz > NUL"
TimeThis : Command Line : zcat tt2.gz > NUL
TimeThis : Start Time : Mon Jan 21 19:10:11 2008
TimeThis : End Time : Mon Jan 21 19:10:26 2008
TimeThis : Elapsed Time : 00:00:14.750
This is filtering the decompressed data through the filt program from
above, and throwing away the result:
timethis "zcat tt2.gz |filt > NUL"
TimeThis : Command Line : zcat tt2.gz |filt > NUL
TimeThis : Start Time : Mon Jan 21 18:51:16 2008
TimeThis : End Time : Mon Jan 21 18:51:53 2008
TimeThis : Elapsed Time : 00:00:37.031
This is more than twice as slow. Could some knowledgeable people give
some hints on why simply reading the data line by line and outputting
every tenth line is more than twice as slow as decompressing it?
Is it because of the memory allocations (happening in string)? Is the
limiting factor the C++ I/O routines? Can this be sped up?
The compression ratio of the data is about 1:10, so zcat is reading
approx. the same amount of data that filt is outputting.
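One direction I was wondering about, in case getline and string are
themselves the problem: read fixed-size blocks and scan for newlines
by hand, so no line is ever copied into a string. Below is a minimal,
unmeasured sketch of that idea using C stdio (the 1 MiB block size is
an arbitrary choice). It should produce the same output as filt.cpp,
emitting lines that straddle a block boundary piece by piece:

#include <cstdio>
#include <cstring>

int main() {
    static char buf[1 << 20];                    // 1 MiB block, arbitrary
    unsigned long nr = 0;                        // index of the current line
    std::size_t got;
    while ((got = std::fread(buf, 1, sizeof buf, stdin)) > 0) {
        char *p = buf;
        char *end = buf + got;
        while (p < end) {
            // End of the current line within this block, if it is here.
            char *nl = static_cast<char *>(std::memchr(p, '\n', end - p));
            std::size_t len = nl ? (nl - p + 1) : (end - p);
            if (nr % 10 == 0)                    // emit every 10th line,
                std::fwrite(p, 1, len, stdout);  // possibly in pieces
            if (nl)                              // a line ends only at '\n';
                ++nr;                            // partial pieces keep nr
            p += len;
        }
    }
    return 0;
}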
Any insights will be most welcome!
Szabolcs
(P.S. I'm on WinXP, if this matters. The program was compiled with
MinGW GCC 4.2.1 with the -O3 option.)