Why is the performance of my string formatting code (via snprintf / ostringstream) so poor?

dmtr

I was trying to do some performance testing of google::dense_hash_map and
ran into the following problem: very poor performance from
snprintf/ostringstream. Somehow the same "%0.12f" string formatting code
in C++ appears to perform ~10 times _worse_ than in Python. Can somebody
give me a pointer as to why, and how I can get better performance in C++?

The following code, which does 10,000,000 snprintf("%0.12f") calls, takes
7 seconds to execute (Linux/GCC 4.3.3 with -O2):
time_t seconds = time(NULL);
char buf[20];
for(int i = 0; i < 10000000; i++)
    snprintf(buf, sizeof(buf), "%0.12f", 0.123456789101);
std::cout << time(NULL) - seconds << " seconds" << std::endl;

A more C++-ish approach with ostringstream is even worse - 11 seconds:
std::ostringstream str;
str << std::setprecision(12);
for(int i = 0; i < 10000000; i++)
{
    str.str(std::string());
    str << 0.123456789101;
    str.str();
}

In Python 2.6.2 the same code takes ~1.1 seconds to execute:
I'm positive that in all these cases (including python) actual string
formatting is being performed. By the way, the complete test that I'm
trying to do is:

In Python:
import os, random, collections, time

d = collections.defaultdict(int)
start = time.time()
for i in xrange(10000000):
    d["%0.12f" % random.random()] += 1
print "%f seconds" % (time.time() - start)
print d.iteritems().next()

And in C++ (see below; the code is too large to fit inline). If somebody
could improve my C++ example and outperform Python, that would be nice.
So far my C++ code is slower and eats about the same amount of
memory :'-(

-- With Regards,
Dmitry Chichkov


-- With "gcc -std=c++0x -lstdc++ -O2" --

#include <google/dense_hash_map>
#include <iostream>
#include <string>
#include <random>
#include <sstream>
#include <iomanip>
#include <time.h>

using std::hash;

typedef std::mt19937 eng_t;
typedef std::uniform_real<double> dist_t;

int main()
{
    eng_t eng;
    dist_t dist(0.0, 1.0);
    std::variate_generator<eng_t, dist_t> gen(eng, dist);

    google::dense_hash_map<std::string, int, hash<std::string> > gm;
    gm.set_empty_key("");

    time_t seconds = time(NULL);
    std::ostringstream str;
    str << std::setprecision(12);
    for(int i = 0; i < 10000000; i++)
    {
        str.str(std::string());
        str << gen();
        gm[str.str()] += 1;
    }

    std::cout << time(NULL) - seconds << " seconds" << std::endl;
    std::cout << str.str() << " " << gm[str.str()] << std::endl;
    std::cout << str.str().length() << " " << str.str().capacity() << std::endl;
    return 0;
}
 
dmtr

Without knowing anything about Python's innards, one very plausible
explanation is that the Python compiler/interpreter is simply optimizing
away the entire formatting operation, replacing the assignment statement
with s="0.123456789101", and gcc does not do this. The compiler/interpreter
has all the information it needs to effect this optimization.


It looks like you are right here. I've just tried:
and it takes 18 seconds, versus 11 seconds in C++. But still, it's
difficult to call that fast. Any way to improve on that? And my test
with dense_hash_map is somehow still slower than the Python one....
 
Öö Tiib

It looks like you are right here. I've just tried:

and it takes 18 seconds, versus 11 seconds in C++. But still, it's
difficult to call that fast. Any way to improve on that? And my test
with dense_hash_map is somehow still slower than the Python one....

What is the use-case? Why do you need ten million floating point
numbers in readable form in under 10 seconds? A human cannot read that
many floats in 10 seconds anyway, so for a human reader 10 seconds is
fast enough.

If you have a real use case, then consider using some binary (non-
readable) protocol. To read the protocol (occasionally, when testing or
debugging), write a separate application that translates it into
readable form when needed.
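As a rough sketch of what a binary protocol buys you: instead of formatting each double as text, you just copy its raw bytes into a buffer. The helper names below (`write_raw`/`read_raw`) are made up for illustration, and the sketch assumes the reader and writer share endianness and IEEE-754 doubles; a real protocol would pin both down.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Append the raw bytes of a double to a byte buffer -- no text
// formatting at all, just a fixed-size memcpy per value.
void write_raw(std::vector<unsigned char>& out, double v)
{
    unsigned char bytes[sizeof(double)];
    std::memcpy(bytes, &v, sizeof(double));
    out.insert(out.end(), bytes, bytes + sizeof(double));
}

// Read the i-th double back out of the buffer.
double read_raw(const std::vector<unsigned char>& in, std::size_t i)
{
    double v;
    std::memcpy(&v, &in[i * sizeof(double)], sizeof(double));
    return v;
}
```

The round trip is bit-exact (no "%0.12f" rounding), and a separate tool can format the values for human reading only when somebody actually needs to look at them.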
 
dmtr

What is the use-case? Why do you need ten million floating point
numbers in readable form in under 10 seconds? A human cannot read that
many floats in 10 seconds anyway, so for a human reader 10 seconds is
fast enough.

If you have a real use case, then consider using some binary (non-
readable) protocol. To read the protocol (occasionally, when testing or
debugging), write a separate application that translates it into
readable form when needed.

Uh. The use case was described in my original e-mail; I was using the
"%.12f" string formatting facility as a random string source for
benchmark testing purposes.
Apparently not a good choice, as Sam and Paavo Helde noted: turning a
"floating point value into a string representation is a hideously long
and complicated process".
 
Öö Tiib

Uh. The use case was described in my original e-mail; I was using the
"%.12f" string formatting facility as a random string source for
benchmark testing purposes.
Apparently not a good choice, as Sam and Paavo Helde noted: turning a
"floating point value into a string representation is a hideously long
and complicated process".

Yes; if it does not matter that these are exactly floats, then generate
ints instead. It is faster.
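A minimal sketch of that idea: keep the mt19937 engine but format its raw 32-bit output with "%08x" instead of pushing a double through "%0.12f". The helper name is invented here, but the point is that integer-to-hex conversion skips the expensive decimal float rendering entirely.

```cpp
#include <cassert>
#include <cstdio>
#include <random>
#include <string>

// Build a random 8-character key from one 32-bit engine draw.
// mt19937 always produces values < 2^32, so the cast to unsigned
// is safe and "%08x" yields exactly eight hex digits.
std::string random_int_key(std::mt19937& eng)
{
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%08x", static_cast<unsigned>(eng()));
    return std::string(buf);
}
```

The keys are still uniformly distributed and unique-ish enough for hash map benchmarking, which is all the "%.12f" strings were being used for.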
 
Vaclav Haisman

dmtr wrote, On 25.6.2010 1:59:
[original post quoted in full - snipped]

I think that you are mixing too many things together here. You are not
comparing comparables.

First, I would not be surprised if Python could optimize printing a single
floating point constant into a constant string. That's something C++ cannot do.

Second, when you add the random numbers generation into the loop then you
should be using equivalent random number generators to compare Python and
C++. I suspect that most of the time difference is in the random number
generation. I bet Python is not using Mersenne twister.

Third, the map/hash table structure is different in each and thus has
different performance characteristics. Python's dictionary might be better
tuned for strings than google::dense_hash_map is. Try different maps, e.g.
tr1::unordered_map or just std::map.
 
dmtr

I think that you are mixing too many things together here. You are not
comparing comparables.

I was just trying to do a quick evaluation and see if
google::dense_hash_map is worth wrapping (as a module) in Python. And
I was trying to compare comparables. Only that slow "%.12f" had gotten
in the way. Guess I'll have to use something different as a random
string source. Maybe cache the strings in memory or something.

First, I would not be surprised if Python could optimize printing a single
floating point constant into a constant string. That's something C++ cannot do.

Yep. And I thought that was settled by Sam's comment and my test.

Second, when you add the random numbers generation into the loop then you
should be using equivalent random number generators to compare Python and
C++. I suspect that most of the time difference is in the random number
generation. I bet Python is not using Mersenne twister.

You've just lost your bet. ;) Python has been using MT since 2.3 (released
~7 years ago). See: http://docs.python.org/library/random.html
And actually the Mersenne Twister is very fast and doesn't eat much time.
Not the best RNG, but pretty much the industry standard. It's a shame it
is so difficult to use in C++.

Third, the map/hash table structure is different in each and thus has
different performance characteristics. Python's dictionary might be better
tuned for strings than google::dense_hash_map is. Try different maps, e.g.
tr1::unordered_map or just std::map.

Well, that's what I'm trying to evaluate. AFAIK google::dense_hash_map
(with a proper hash function like MurmurHash) is considered to be
state of the art in terms of speed/memory usage for very large
datasets.
I've run into some problems with the existing hash map solutions in
Python (like dict/defaultdict/Counter/memcached/redis/shelve/etc.) and
was considering google::dense_hash_map as an alternative.
If you know any other hash map implementation that works well for large
(~100,000,000 items) string -> int hash maps/dictionaries, you'd be
very welcome!
 
Thomas J. Gritzan

Am 26.06.2010 10:58, schrieb dmtr:
I was just trying to do a quick evaluation and see if
google::dense_hash_map is worth wrapping (as a module) in Python. And
I was trying to compare comparables. Only that slow "%.12f" had gotten
in the way. Guess I'll have to use something different as a random
string source. Maybe cache the strings in memory or something.

Convert an unsigned 32-bit integer into its hexadecimal representation.
That would only cost some bit-shifting and table lookups, and should be
much faster than outputting floating-point numbers.
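That suggestion can be sketched in a few lines; the function name here is made up, but the technique is exactly shifts plus a digit table, with no locale machinery and no floating point involved.

```cpp
#include <cassert>
#include <string>

// Convert an unsigned 32-bit value to exactly 8 hex digits using only
// shifts, a mask, and a table lookup.
std::string u32_to_hex(unsigned long v)
{
    static const char digits[] = "0123456789abcdef";
    char buf[8];
    for (int i = 7; i >= 0; --i) {
        buf[i] = digits[v & 0xF];  // low nibble -> rightmost digit
        v >>= 4;
    }
    return std::string(buf, 8);
}
```

Eight table lookups per key versus a full decimal double conversion is a large constant-factor win, and the output length is fixed, which also helps the hash map.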
Yep. And I thought that was settled by Sam's comment and my test.



You've just lost your bet. ;) Python has been using MT since 2.3 (released
~7 years ago). See: http://docs.python.org/library/random.html
And actually the Mersenne Twister is very fast and doesn't eat much time.
Not the best RNG, but pretty much the industry standard. It's a shame it
is so difficult to use in C++.

Difficult to use?

std::mt19937 random;
unsigned something_random = random();
// add seeding to get better results ;-)

It's only a bit more complex if you want floating-point randoms or some
non-uniform distribution. I think it would be easy to write a little
class that has the same interface as Python's random, though.
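Such a class might look like the sketch below. The class name and default seed are invented here, and it uses the C++11 spelling std::uniform_real_distribution (the GCC 4.3-era draft spelled it std::uniform_real and required wrapping it in variate_generator, which is exactly what tripped up the original poster).

```cpp
#include <cassert>
#include <random>

// A tiny wrapper mimicking Python's random.random(): returns doubles
// uniformly distributed in [0, 1).
class PyLikeRandom {
public:
    explicit PyLikeRandom(unsigned seed = 5489u) : eng_(seed) {}
    double random() { return dist_(eng_); }  // like Python's random.random()
private:
    std::mt19937 eng_;
    std::uniform_real_distribution<double> dist_;  // defaults to [0, 1)
};
```

Usage is then one line, `double x = rng.random();`, with no distribution or generator plumbing visible at the call site.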
Well, that's what I'm trying to evaluate. AFAIK google::dense_hash_map
(with a proper hash function like MurmurHash) is considered to be
state of the art in terms of speed/memory usage for very large
datasets.
I've run into some problems with the existing hash map solutions in
Python (like dict/defaultdict/Counter/memcached/redis/shelve/etc.) and
was considering google::dense_hash_map as an alternative.
If you know any other hash map implementation that works well for large
(~100,000,000 items) string -> int hash maps/dictionaries, you'd be
very welcome!

For string lookups with a large data set, a trie might be the better
data structure.
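For concreteness, a bare-bones trie for string -> count lookups could look like this. The helper names are invented, and std::map per node is used only to keep the sketch short; a serious implementation would use flat child arrays or a burst trie.

```cpp
#include <cassert>
#include <map>
#include <string>

// One trie node: children keyed by character, and a counter that is
// nonzero only on nodes where an inserted key ends.
struct TrieNode {
    std::map<char, TrieNode> children;
    int count = 0;
};

// Walk (and create) nodes for each character of `key`, then bump the
// counter on the final node.
void trie_bump(TrieNode& root, const std::string& key)
{
    TrieNode* node = &root;
    for (char c : key)
        node = &node->children[c];
    ++node->count;
}

// Look up the count for `key`; 0 if the key was never inserted.
int trie_count(const TrieNode& root, const std::string& key)
{
    const TrieNode* node = &root;
    for (char c : key) {
        auto it = node->children.find(c);
        if (it == node->children.end())
            return 0;
        node = &it->second;
    }
    return node->count;
}
```

The appeal for this workload is that keys sharing prefixes (like "0.12...") share nodes, so memory does not grow linearly with total key bytes the way it does in a hash map of full strings.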
 
dmtr

Thomas said:
Convert an unsigned 32-bit integer into its hexadecimal representation.
That would only cost some bit-shifting and table lookups, and should be
much faster than outputting floating-point numbers.

I'll try a 32-bit -> 4-char conversion via a lookup table. It should not
be very expensive in either C++ or Python.
And it will allow me to adjust character occurrence frequencies to some
degree. A more real-life test.

Difficult to use?

std::mt19937 random;
unsigned something_random = random();
// add seeding to get better results ;-)

It's only a bit more complex if you want floating-point randoms or some
non-uniform distribution. I think it would be easy to write a little
class that has the same interface as Python's random, though.

In fact it is, if you factor in the lack of documentation and
straightforward examples.
Besides, originally I wanted a float between 0 and 1. And the intuitive:
std::mt19937 e;
std::uniform_real<double> dist(0.0, 1.0);
double something_random = dist(e);

had failed to produce it! It took me half an hour to figure out how to
get it to work. So yes, I consider it to be fairly difficult to use.

For string lookup with a large data set, a Trie might be the better data
structure.

Good idea. Thanks!
 
dmtr

I'll try a 32-bit -> 4-char conversion via a lookup table. It should not
be very expensive in either C++ or Python.
And it will allow me to adjust character occurrence frequencies to some
degree. A more real-life test.

Here are a couple of lookup arrays that I've made (256 elements each,
based on character and word-length occurrence frequencies); they could
be useful to someone attempting to generate random strings really quickly:
letters = [
'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e',
'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e',
't', 't', 't', 't', 't', 't', 't', 't', 't', 't', 't',
't', 't', 't',
'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o',
'o', 'o',
'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a',
'a', 'a',
'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r',
'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i',
's', 's', 's', 's', 's', 's', 's', 's', 's', 's', 's',
'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C',
'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F',
'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n',
'K', 'K', 'K', 'K', 'K', 'K', 'K', 'K', 'K', 'K', 'K',
'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U',
'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c',
'h', 'h', 'h', 'h', 'h', 'h', 'h', 'h',
'l', 'l', 'l', 'l', 'l', 'l', 'l',
']', ']', ']', ']', ']', ']',
'[', '[', '[', '[', '[',
'u', 'u', 'u', 'u', 'u',
'd', 'd', 'd', 'd', 'd',
'm', 'm', 'm', 'm', 'm',
'g', 'g', 'g', 'g', 'g',
'p', 'p', 'p', 'p', 'p',
'f', 'f', 'f', 'f',
'y', 'y', 'y', 'y',
'k', 'k', 'k',
'w', 'w', 'w',
'.', '.', '.',
'V', 'V', 'V',
'b', 'b', 'b',
"'", "'", "'",
'v', 'v',
'1', '1',
'|', '|',
'=', '=',
'A', 'A',
'/', '/',
'5', '5',
'!', '!',
'0', '0',
'S', 'S',
'4']

lengths = [
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
1, 1, 1, 1, 1, 1, 1, 1,
11, 11, 11, 11, 11, 11, 11, 11,
12, 12, 12, 12, 12, 12, 12,
13, 13, 13, 13, 13,
0, 0, 0, 0, 0,
14, 14, 14,
15, 15, 15,
16, 16,
21,
22,
20,
17]
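A C++ analogue of using these tables would map each byte of RNG output straight through a 256-entry character table. The function name and the tiny placeholder table below are made up; for real use you would fill the table with the full 256 frequency-weighted entries (so the modulo in the sketch disappears), and drive word lengths from the second table the same way.

```cpp
#include <cassert>
#include <random>
#include <string>

// Build a random word by sampling characters from a frequency-weighted
// table. Each 32-bit engine draw supplies four usable bytes, so we get
// four characters per call to the engine.
std::string random_word(std::mt19937& eng,
                        const std::string& table, std::size_t length)
{
    std::string word;
    word.reserve(length);
    while (word.size() < length) {
        unsigned long r = eng();
        for (int b = 0; b < 4 && word.size() < length; ++b) {
            // The modulo only guards short placeholder tables; with a
            // full 256-entry table the index is already in range.
            word += table[((r >> (8 * b)) & 0xFF) % table.size()];
        }
    }
    return word;
}
```

This keeps the benchmark's string source cheap (table lookups only) while still letting character frequencies resemble real text, which matters for hash quality.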
 
Sprechen sie von C++

When I develop code, I often send results to a file so that I can read the
results over and see if the program is working fine.

My current project is tricky so the file idea is handy.

Writing to file is also way faster.
 
dmtr

When I develop code, I often send results to a file so that I can read the
results over and see if the program is working fine.

My current project is tricky so the file idea is handy.

Writing to file is also way faster.

Hey, that's great :) The ability to write to files is an awesome power :)
BTW, has anyone noticed a nasty easter egg in the letters/lengths lookup
tables I've posted? That's what happens when you use character/word-length
occurrence frequencies from the internet. Happy 4th, everyone! :)
 
