Why is the performance of my string formatting code (via snprintf / ostringstream) so poor?

dmtr

I was trying to do some performance testing of google::dense_hash_map and
ran into the following problem: very poor performance from
snprintf/ostringstream. Somehow the same "%0.12f" string formatting code
in C++ appears to perform ~10 times _worse_ than in Python. Can somebody
give me a pointer as to why, and how I can get better performance in C++?

The following code, which does 10,000,000 snprintf("%0.12f") calls, takes
7 seconds to execute (Linux/GCC 4.3.3 with -O2):
time_t seconds = time(NULL);
char buf[20];
for(int i = 0; i < 10000000; i++)
    snprintf(buf, sizeof(buf), "%0.12f", 0.123456789101);
std::cout << time(NULL) - seconds << " seconds" << std::endl;

A more C++-ish approach with ostringstream is even worse - 11 seconds:
std::ostringstream str;
str << std::setprecision(12);
for(int i = 0; i < 10000000; i++)
{
    str.str(std::string());
    str << 0.123456789101;
    str.str();
}

In Python 2.6.2 the same code takes ~1.1 seconds to execute:
I'm positive that in all these cases (including python) actual string
formatting is being performed. By the way, the complete test that I'm
trying to do is:

In Python:
import os, random, collections, time

d = collections.defaultdict(int)
start = time.time()
for i in xrange(10000000):
    d["%0.12f" % random.random()] += 1
print "%f seconds" % (time.time() - start)
print d.iteritems().next()

And in C++ (see below; the code is too large to fit inline). If somebody
could improve my C++ example and outperform Python, that would be nice.
So far my C++ code is slower and eats about the same amount of
memory :'-(

-- With Regards,
Dmitry Chichkov


-- With "gcc -std=c++0x -lstdc++ -O2" --

#include <google/dense_hash_map>
#include <iostream>
#include <string>
#include <random>
#include <sstream>
#include <iomanip>
#include <time.h>

using std::hash;

typedef std::mt19937 eng_t;
typedef std::uniform_real<double> dist_t;

int main()
{
    eng_t eng;
    dist_t dist(0.0, 1.0);
    std::variate_generator<eng_t, dist_t> gen(eng, dist);

    google::dense_hash_map<std::string, int, hash<std::string> > gm;
    gm.set_empty_key("");

    time_t seconds = time(NULL);
    std::ostringstream str;
    str << std::setprecision(12);
    for(int i = 0; i < 10000000; i++)
    {
        str.str(std::string());
        str << gen();
        gm[str.str()] += 1;
    }

    std::cout << time(NULL) - seconds << " seconds" << std::endl;
    std::cout << str.str() << " " << gm[str.str()] << std::endl;
    std::cout << str.str().length() << " " << str.str().capacity() << std::endl;
    return 0;
}
 
dmtr

Without knowing anything about Python's innards, one very plausible
explanation is that the Python compiler/interpreter is simply optimizing
away the entire formatting operation, replacing the assignment statement
with s="0.123456789101", and gcc does not do this. The compiler/interpreter
has all the information it needs to effect this optimization.


It looks like you are right here. I've just tried:
and it takes 18 seconds, versus 11 seconds in C++. But still, it's
difficult to call that fast. Any way to improve on that? And my test
with dense_hash_map is somehow still slower than the Python one....
 
Öö Tiib

It looks like you are right here. I've just tried:

and it takes 18 seconds, versus 11 seconds in C++. But still, it's
difficult to call that fast. Any way to improve on that? And my test
with dense_hash_map is somehow still slower than the Python one....

What is the use-case? Why do you need ten million floating point
numbers in readable form in under 10 seconds? A human cannot read that
many floats in 10 seconds anyway, so for a human reader 10 seconds is
fast enough.

If you have a real use case, then consider using some binary (non-
readable) protocol. To read the protocol (occasionally, when testing or
debugging), write a separate application that translates it into
readable form when needed.
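As a rough sketch of what a binary protocol buys you: instead of formatting each double as text, you just copy its raw bytes into a buffer. The helper names below (`write_raw`/`read_raw`) are made up for illustration, and the sketch assumes the reader and writer share endianness and IEEE-754 doubles; a real protocol would pin both down.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Append the raw bytes of a double to a byte buffer -- no text
// formatting at all, just a fixed-size memcpy per value.
void write_raw(std::vector<unsigned char>& out, double v)
{
    unsigned char bytes[sizeof(double)];
    std::memcpy(bytes, &v, sizeof(double));
    out.insert(out.end(), bytes, bytes + sizeof(double));
}

// Read the i-th double back out of the buffer.
double read_raw(const std::vector<unsigned char>& in, std::size_t i)
{
    double v;
    std::memcpy(&v, &in[i * sizeof(double)], sizeof(double));
    return v;
}
```

The round trip is bit-exact (no "%0.12f" rounding), and a separate tool can format the values for human reading only when somebody actually needs to look at them.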
 
dmtr

What is the use-case? Why do you need ten million floating point
numbers in readable form in under 10 seconds? A human cannot read that
many floats in 10 seconds anyway, so for a human reader 10 seconds is
fast enough.

If you have a real use case, then consider using some binary (non-
readable) protocol. To read the protocol (occasionally, when testing or
debugging), write a separate application that translates it into
readable form when needed.

Uh. The use case was described in my original e-mail; I was using the
"%.12f" string formatting facility as a random string source for
benchmark testing purposes.
Apparently not a good choice, as Sam and Paavo Helde noted: turning a
"floating point value into a string representation is a hideously long
and complicated process".
 
Öö Tiib

Uh. The use case was described in my original e-mail; I was using the
"%.12f" string formatting facility as a random string source for
benchmark testing purposes.
Apparently not a good choice, as Sam and Paavo Helde noted: turning a
"floating point value into a string representation is a hideously long
and complicated process".

Yes; if it does not matter that these are exactly floats, then generate
ints instead. It is faster.
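A minimal sketch of that idea: keep the mt19937 engine but format its raw 32-bit output with "%08x" instead of pushing a double through "%0.12f". The helper name is invented here, but the point is that integer-to-hex conversion skips the expensive decimal float rendering entirely.

```cpp
#include <cassert>
#include <cstdio>
#include <random>
#include <string>

// Build a random 8-character key from one 32-bit engine draw.
// mt19937 always produces values < 2^32, so the cast to unsigned
// is safe and "%08x" yields exactly eight hex digits.
std::string random_int_key(std::mt19937& eng)
{
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%08x", static_cast<unsigned>(eng()));
    return std::string(buf);
}
```

The keys are still uniformly distributed and unique-ish enough for hash map benchmarking, which is all the "%.12f" strings were being used for.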
 
Vaclav Haisman

dmtr wrote, On 25.6.2010 1:59:
[original post quoted in full - snipped]

I think that you are mixing too many things together here. You are not
comparing comparables.

First, I would not be surprised if Python could optimize printing a single
floating point constant into a constant string. That's something C++ cannot do.

Second, when you add the random numbers generation into the loop then you
should be using equivalent random number generators to compare Python and
C++. I suspect that most of the time difference is in the random number
generation. I bet Python is not using Mersenne twister.

Third, the map/hash table structure is different in each and thus has
different performance characteristics. Python's dictionary might be better
tuned for strings than google::dense_hash_map is. Try different maps, e.g.
tr1::unordered_map or just std::map.
 
dmtr

I think that you are mixing too many things together here. You are not
comparing comparables.

I was just trying to do a quick evaluation and see if
google::dense_hash_map is worth wrapping (as a module) in Python. And
I was trying to compare comparables. Only that slow "%.12f" had gotten
in the way. Guess I'll have to use something different as a random
string source. Maybe cache the strings in memory or something.

First, I would not be surprised if Python could optimize printing a single
floating point constant into a constant string. That's something C++ cannot do.

Yep. And I thought that was settled by Sam's comment and my test.

Second, when you add the random numbers generation into the loop then you
should be using equivalent random number generators to compare Python and
C++. I suspect that most of the time difference is in the random number
generation. I bet Python is not using Mersenne twister.

You've just lost your bet. ;) Python has been using MT since 2.3 (released
~7 years ago). See: http://docs.python.org/library/random.html
And actually the Mersenne Twister is very fast and doesn't eat much time.
Not the best RNG, but pretty much the industry standard. It's a shame it
is so difficult to use in C++.

Third, the map/hash table structure is different in each and thus has
different performance characteristics. Python's dictionary might be better
tuned for strings than google::dense_hash_map is. Try different maps, e.g.
tr1::unordered_map or just std::map.

Well, that's what I'm trying to evaluate. AFAIK google::dense_hash_map
(with a proper hash function like MurmurHash) is considered to be
state of the art in terms of speed/memory usage for very large
datasets.
I've run into some problems with the existing hash map solutions in
Python (like dict/defaultdict/Counter/memcached/redis/shelve/etc.) and
was considering google::dense_hash_map as an alternative.
If you know any other hash map implementation that works well for large
(~100,000,000 items) string -> int hash maps/dictionaries, you'd be
very welcome!
 
Thomas J. Gritzan

Am 26.06.2010 10:58, schrieb dmtr:
I was just trying to do a quick evaluation and see if
google::dense_hash_map is worth wrapping (as a module) in Python. And
I was trying to compare comparables. Only that slow "%.12f" had gotten
in the way. Guess I'll have to use something different as a random
string source. Maybe cache the strings in memory or something.

Convert an unsigned 32-bit integer into its hexadecimal representation.
That would only cost some bit-shifting and table lookups, and should be
much faster than outputting floating-point numbers.
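That suggestion can be sketched in a few lines; the function name here is made up, but the technique is exactly shifts plus a digit table, with no locale machinery and no floating point involved.

```cpp
#include <cassert>
#include <string>

// Convert an unsigned 32-bit value to exactly 8 hex digits using only
// shifts, a mask, and a table lookup.
std::string u32_to_hex(unsigned long v)
{
    static const char digits[] = "0123456789abcdef";
    char buf[8];
    for (int i = 7; i >= 0; --i) {
        buf[i] = digits[v & 0xF];  // low nibble -> rightmost digit
        v >>= 4;
    }
    return std::string(buf, 8);
}
```

Eight table lookups per key versus a full decimal double conversion is a large constant-factor win, and the output length is fixed, which also helps the hash map.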
Yep. And I thought that was settled by Sam's comment and my test.



You've just lost your bet. ;) Python has been using MT since 2.3 (released
~7 years ago). See: http://docs.python.org/library/random.html
And actually the Mersenne Twister is very fast and doesn't eat much time.
Not the best RNG, but pretty much the industry standard. It's a shame it
is so difficult to use in C++.

Difficult to use?

std::mt19937 random;
unsigned something_random = random();
// add seeding to get better results ;-)

It's only a bit more complex if you want floating-point randoms or some
non-uniform distribution. I think it would be easy to write a little
class that has the same interface as Python's random, though.
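Such a class might look like the sketch below. The class name and default seed are invented here, and it uses the C++11 spelling std::uniform_real_distribution (the GCC 4.3-era draft spelled it std::uniform_real and required wrapping it in variate_generator, which is exactly what tripped up the original poster).

```cpp
#include <cassert>
#include <random>

// A tiny wrapper mimicking Python's random.random(): returns doubles
// uniformly distributed in [0, 1).
class PyLikeRandom {
public:
    explicit PyLikeRandom(unsigned seed = 5489u) : eng_(seed) {}
    double random() { return dist_(eng_); }  // like Python's random.random()
private:
    std::mt19937 eng_;
    std::uniform_real_distribution<double> dist_;  // defaults to [0, 1)
};
```

Usage is then one line, `double x = rng.random();`, with no distribution or generator plumbing visible at the call site.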
Well, that's what I'm trying to evaluate. AFAIK google::dense_hash_map
(with a proper hash function like MurmurHash) is considered to be
state of the art in terms of speed/memory usage for very large
datasets.
I've run into some problems with the existing hash map solutions in
Python (like dict/defaultdict/Counter/memcached/redis/shelve/etc.) and
was considering google::dense_hash_map as an alternative.
If you know any other hash map implementation that works well for large
(~100,000,000 items) string -> int hash maps/dictionaries, you'd be
very welcome!

For string lookups with a large data set, a trie might be the better
data structure.
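For concreteness, a bare-bones trie for string -> count lookups could look like this. The helper names are invented, and std::map per node is used only to keep the sketch short; a serious implementation would use flat child arrays or a burst trie.

```cpp
#include <cassert>
#include <map>
#include <string>

// One trie node: children keyed by character, and a counter that is
// nonzero only on nodes where an inserted key ends.
struct TrieNode {
    std::map<char, TrieNode> children;
    int count = 0;
};

// Walk (and create) nodes for each character of `key`, then bump the
// counter on the final node.
void trie_bump(TrieNode& root, const std::string& key)
{
    TrieNode* node = &root;
    for (char c : key)
        node = &node->children[c];
    ++node->count;
}

// Look up the count for `key`; 0 if the key was never inserted.
int trie_count(const TrieNode& root, const std::string& key)
{
    const TrieNode* node = &root;
    for (char c : key) {
        auto it = node->children.find(c);
        if (it == node->children.end())
            return 0;
        node = &it->second;
    }
    return node->count;
}
```

The appeal for this workload is that keys sharing prefixes (like "0.12...") share nodes, so memory does not grow linearly with total key bytes the way it does in a hash map of full strings.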
 
dmtr

Thomas said:
Convert an unsigned 32-bit integer into its hexadecimal representation.
That would only cost some bit-shifting and table lookups, and should be
much faster than outputting floating-point numbers.

I'll try a 32-bit -> 4-char conversion via a lookup table. It should not
be very expensive in either C++ or Python.
And it will allow me to adjust character occurrence frequencies to some
degree. A more real-life test.

Difficult to use?

std::mt19937 random;
unsigned something_random = random();
// add seeding to get better results ;-)

It's only a bit more complex if you want floating-point randoms or some
non-uniform distribution. I think it would be easy to write a little
class that has the same interface as Python's random, though.

In fact it is, if you factor in the lack of documentation and
straightforward examples.
Besides, originally I wanted a float between 0 and 1. And the intuitive:
std::mt19937 e;
std::uniform_real<double> dist(0.0, 1.0);
double something_random = dist(e);

had failed to produce it! It took me half an hour to figure out how to
get it to work. So yes, I consider it to be fairly difficult to use.

For string lookup with a large data set, a Trie might be the better data
structure.

Good idea. Thanks!
 
dmtr

I'll try a 32-bit -> 4-char conversion via a lookup table. It should not
be very expensive in either C++ or Python.
And it will allow me to adjust character occurrence frequencies to some
degree. A more real-life test.

Here are a couple of lookup arrays that I've made (256 elements each,
based on character and word-length occurrence frequencies); they could
be useful to someone attempting to generate random strings really quickly:
letters = [
'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e',
'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e',
't', 't', 't', 't', 't', 't', 't', 't', 't', 't', 't',
't', 't', 't',
'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o', 'o',
'o', 'o',
'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a',
'a', 'a',
'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r',
'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i', 'i',
's', 's', 's', 's', 's', 's', 's', 's', 's', 's', 's',
'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C',
'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F',
'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n', 'n',
'K', 'K', 'K', 'K', 'K', 'K', 'K', 'K', 'K', 'K', 'K',
'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U',
'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c',
'h', 'h', 'h', 'h', 'h', 'h', 'h', 'h',
'l', 'l', 'l', 'l', 'l', 'l', 'l',
']', ']', ']', ']', ']', ']',
'[', '[', '[', '[', '[',
'u', 'u', 'u', 'u', 'u',
'd', 'd', 'd', 'd', 'd',
'm', 'm', 'm', 'm', 'm',
'g', 'g', 'g', 'g', 'g',
'p', 'p', 'p', 'p', 'p',
'f', 'f', 'f', 'f',
'y', 'y', 'y', 'y',
'k', 'k', 'k',
'w', 'w', 'w',
'.', '.', '.',
'V', 'V', 'V',
'b', 'b', 'b',
"'", "'", "'",
'v', 'v',
'1', '1',
'|', '|',
'=', '=',
'A', 'A',
'/', '/',
'5', '5',
'!', '!',
'0', '0',
'S', 'S',
'4']

lengths = [
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
1, 1, 1, 1, 1, 1, 1, 1,
11, 11, 11, 11, 11, 11, 11, 11,
12, 12, 12, 12, 12, 12, 12,
13, 13, 13, 13, 13,
0, 0, 0, 0, 0,
14, 14, 14,
15, 15, 15,
16, 16,
21,
22,
20,
17]
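A C++ analogue of using these tables would map each byte of RNG output straight through a 256-entry character table. The function name and the tiny placeholder table below are made up; for real use you would fill the table with the full 256 frequency-weighted entries (so the modulo in the sketch disappears), and drive word lengths from the second table the same way.

```cpp
#include <cassert>
#include <random>
#include <string>

// Build a random word by sampling characters from a frequency-weighted
// table. Each 32-bit engine draw supplies four usable bytes, so we get
// four characters per call to the engine.
std::string random_word(std::mt19937& eng,
                        const std::string& table, std::size_t length)
{
    std::string word;
    word.reserve(length);
    while (word.size() < length) {
        unsigned long r = eng();
        for (int b = 0; b < 4 && word.size() < length; ++b) {
            // The modulo only guards short placeholder tables; with a
            // full 256-entry table the index is already in range.
            word += table[((r >> (8 * b)) & 0xFF) % table.size()];
        }
    }
    return word;
}
```

This keeps the benchmark's string source cheap (table lookups only) while still letting character frequencies resemble real text, which matters for hash quality.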
 
Sprechen sie von C++

When I develop code, I often send results to a file so that I can read the
results over and see if the program is working fine.

My current project is tricky so the file idea is handy.

Writing to file is also way faster.
 
dmtr

When I develop code, I often send results to a file so that I can read the
results over and see if the program is working fine.

My current project is tricky so the file idea is handy.

Writing to file is also way faster.

Hey, that's great :) The ability to write to files is an awesome power :)
BTW, has anyone noticed a nasty easter egg in the letters/lengths lookup
tables I've posted? That's what happens when you use character/word-length
occurrence frequencies from the internet. Happy 4th, everyone! :)
 
